
Genome Biology 2010

Modeling non-uniformity in short-read rates in RNA-Seq data

DOI: 10.1186/gb-2010-11-5-r50

Jun Li, Hui Jiang, Wing Wong


Abstract:

Microarrays are an efficient technology for measuring the expression levels of many genes simultaneously, but the method has limitations. Expression estimates are typically unreliable for lowly expressed genes, because the true signals are masked by cross-hybridization effects [1,2]. Furthermore, the design of the array depends on existing annotation of gene structures, so the method is not well suited to discovering novel splicing events. A recently developed alternative approach, RNA-Seq, has the potential to overcome these difficulties [3]. RNA-Seq uses ultra-high-throughput sequencing [4] to determine the sequences of a large number of cDNA fragments. Depending on the platform, the resulting sequences (reads) can be long (>100 nucleotides) or short [4]. Two currently popular short-read platforms are Illumina's Solexa [5-11] and Applied Biosystems' (ABI's) SOLiD [12]; each can produce tens of millions of short reads in a single run [5-12]. In this paper, we consider only short-read RNA-Seq.

The reads produced by RNA-Seq are first mapped to the genome and/or to reference transcripts using alignment programs. The output of RNA-Seq can then be summarized as a sequence of 'counts': for each position in the genome or on a putative transcript, the count is the number of reads whose mapping starts at that position. As an example (with the gene and reads shortened for simplicity), suppose a gene with a single isoform has the sequence ACGTCCCC, and we have 12 ACGTC reads, 8 CGTCC reads, 9 GTCCC reads, and 5 TCCCC reads. These reads start at positions 1, 2, 3, and 4 of the gene, respectively, so the gene is summarized by the sequence of counts 12, 8, 9, 5.

Quantitative inference from RNA-Seq data, such as estimating gene expression levels [7] and isoform expression levels [13], is based on these counts. To use the data efficiently, it is crucial to have an appropriate statistical model for these counts. Current analysis methods assume, explicitly or implicitly, a naive constant-rate Poisson…
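The count summary described above, together with the naive constant-rate Poisson model, can be sketched in Python. This is a minimal illustration under simplifying assumptions, not the authors' code. It assumes all reads have the same length and match the transcript exactly. A read that matches at several positions is assigned to its first occurrence. The constant-rate model gives every position the same Poisson rate, whose maximum-likelihood estimate is the mean count. The function names `start_counts` and `constant_rate_poisson_fit` are hypothetical.

```python
import math
from collections import Counter


def start_counts(transcript, reads, read_len):
    """Summarize reads as counts of mapping start positions (0-based index).

    Hypothetical helper. Simplifying assumptions: every read has length
    read_len and matches the transcript exactly; a read occurring more than
    once in the transcript is assigned to its first occurrence; reads that
    do not match anywhere are ignored as unmapped.
    """
    counts = [0] * max(len(transcript) - read_len + 1, 0)
    for read, k in Counter(reads).items():
        if len(read) != read_len:
            raise ValueError(f"read {read!r} does not have length {read_len}")
        pos = transcript.find(read)  # first exact match, or -1 if none
        if pos != -1:
            counts[pos] += k
    return counts


def constant_rate_poisson_fit(counts):
    """Fit the naive model n_j ~ Poisson(mu), one rate mu for all positions.

    Hypothetical helper. Returns the maximum-likelihood estimate of mu
    (the mean count) and the log-likelihood of the counts at that estimate.
    """
    if not counts:
        raise ValueError("need at least one position")
    mu = sum(counts) / len(counts)
    if mu == 0:
        return 0.0, 0.0  # all counts are zero, and P(0 | mu = 0) = 1
    loglik = sum(n * math.log(mu) - mu - math.lgamma(n + 1) for n in counts)
    return mu, loglik


# The toy example from the text: gene ACGTCCCC, reads of length 5.
reads = ["ACGTC"] * 12 + ["CGTCC"] * 8 + ["GTCCC"] * 9 + ["TCCCC"] * 5
counts = start_counts("ACGTCCCC", reads, read_len=5)
print(counts)  # [12, 8, 9, 5]
mu, loglik = constant_rate_poisson_fit(counts)
print(mu)  # 8.5
```

Because this model uses a single rate for the whole transcript, it treats position-to-position differences, such as 12 versus 5 reads here, purely as Poisson noise around the common expected count of 8.5.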
