Human Genomics 2010

A survey of statistical software for analysing RNA-seq data

DOI: 10.1186/1479-7364-5-1-56

Dexiang Gao, Jihye Kim, Hyunmin Kim, Tzu L Phang, Heather Selby, Aik Tan, Tiejun Tong

Keywords: statistical software, RNA-sequencing analysis, normalisation, sequencing data


Abstract:

High-throughput, genome-wide RNA profiling by deep sequencing (RNA-seq) is rapidly emerging as a favoured method for gene expression studies. Compared with hybridisation-based technologies (oligonucleotide and cDNA microarrays), RNA-seq measures transcript levels more precisely over a wide dynamic range and can detect and quantify both known and novel isoforms. In every sequencing run, tens of millions of short reads are sequenced simultaneously in each lane of the next-generation sequencer. After pre-processing and mapping against a reference genome, the total number of reads counted for each mappable transcript is reported. The sequencing results have been reported to be highly reproducible [1]. One of the main applications of RNA-seq is to identify differentially expressed (DE) genes between two or more phenotypes (eg cancer versus normal samples).

Several statistical methods have been proposed to identify DE [1-5]. When choosing a statistical analysis approach, some aspects need to be considered:

(a) Normalisation. The observed number of reads for a gene depends on the expression level and the length of the gene, and also on the RNA composition of the sample [6,7]. The purpose of normalisation is to minimise the influence of gene length and total sample RNA composition, so that the normalised read counts directly reflect the expression level of the targeted gene. The normalisation procedure has been shown to have a great impact on DE detection [2,7]. Depending on the experimental design, different normalisation methods are required.

(b) Statistical model. The Poisson distribution is commonly used to model count data. Owing to biological and genetic variation, however, the variance of the read counts in sequencing data is often much greater than their mean. That is, the data are over-dispersed. In such cases, one natural alternative to the Poisson is the negative binomial (NB) model. In addition to these two
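As an illustration of points (a) and (b), the following Python sketch shows one simple length and library-size normalisation and a rough check for over-dispersion. It is not taken from the surveyed software. The RPKM scaling (reads per kilobase per million mapped reads) is used here only as a familiar example of normalisation and does not correct for RNA composition; the moment-based dispersion estimate assumes the usual NB parameterisation, variance = mean + alpha * mean^2. The function names and the replicate counts are hypothetical.

```python
from statistics import mean, variance


def rpkm(count, gene_length_bp, total_mapped_reads):
    """Scale a raw read count by gene length (kb) and library size (millions)."""
    return count * 1e9 / (gene_length_bp * total_mapped_reads)


def nb_dispersion(counts):
    """Method-of-moments estimate of alpha in variance = mean + alpha * mean**2.

    Returns 0.0 when the sample variance does not exceed the mean,
    i.e. when a Poisson model would already be adequate.
    """
    mu = mean(counts)
    if mu == 0:
        return 0.0
    var = variance(counts)  # sample variance (n - 1 denominator)
    return max((var - mu) / mu ** 2, 0.0)


# Hypothetical counts for one gene across three biological replicates.
replicate_counts = [0, 10, 20]
alpha = nb_dispersion(replicate_counts)
normalised = rpkm(count=100, gene_length_bp=2000, total_mapped_reads=10_000_000)
```

An estimate of alpha well above zero suggests that the counts vary more than a Poisson model allows, which is the situation in which the NB model is the natural alternative.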
