Genome Biology 2010
Cloud-scale RNA-sequencing differential expression analysis with Myrna
Abstract: As cost and throughput continue to improve, second-generation sequencing [1], in conjunction with RNA-Seq [2,3], is becoming an increasingly efficient and popular tool for studying gene expression. Currently, an RNA-Seq sequencing run generates hundreds of millions of reads derived from coding mRNA molecules in one or more biological samples. A typical RNA-Seq differential-expression analysis proceeds in three stages. First, reads are computationally categorized according to the transcribed feature from which each likely originated; features of interest could be genes, exons or isoforms. This categorization might be conducted by comparison with a reference [4], by de novo assembly [5], or by a combination of both [6-8]. Second, a normalized count of the number of reads assigned to each feature is calculated; this count acts as a proxy for the feature's true abundance in the sample. Third, a statistical test is applied to identify which features exhibit differential abundance, or expression, between samples.
Since second-generation sequencing produces a very large number of reads distributed across the entire transcriptome, RNA-Seq affords greater resolution than expression arrays. Preliminary comparisons also suggest that RNA-Seq may measure RNA abundance in spike-in experiments more precisely than gene expression microarrays, provided appropriate normalization is applied [4,9].
But improvements in sequencing cost and throughput also pose a data analysis challenge. While sequencing throughput grows by about 5× per year [10-12], computer speeds are thought to double approximately every 18 or 24 months [13]. Recent studies and commentaries [13-17] propose cloud computing as a paradigm that counteracts this disparity by tapping into the economies of scale afforded by commercial and institutional computing centers.
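The size of the disparity between sequencing throughput and computer speed follows from compounding the two rates quoted above. Taking them at face value (5-fold throughput growth per year; compute speed doubling every m = 18 or 24 months), the ratio R(t) of throughput growth to compute growth after t years works out as follows. This is a back-of-the-envelope illustration of the quoted rates, not a forecast.

```latex
R(t) = \frac{5^{t}}{2^{12t/m}}, \qquad
R(1) = \frac{5}{2^{2/3}} \approx 3.15 \;(m = 18), \qquad
R(1) = \frac{5}{2^{1/2}} \approx 3.54 \;(m = 24),
R(5) = \frac{3125}{2^{10/3}} \approx 310 \;(m = 18), \qquad
R(5) = \frac{3125}{2^{5/2}} \approx 552 \;(m = 24).
```

So at these rates the gap widens roughly three- to four-fold every year, and by a factor of several hundred over five years.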
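The three analysis stages described at the start of the abstract can be illustrated with a small Python sketch. This is not Myrna's implementation. It assumes reads have already been aligned to a reference, so stage one reduces to finding the annotated feature that contains each read's position. It uses log2 counts per million assigned reads as a stand-in normalization, and an exact permutation test on the difference of group means as the stage-three test. The annotation `FEATURES`, the toy reads, and the functions `count_reads`, `normalize` and `permutation_pvalue` are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2
from statistics import mean

# Illustrative sketch only, not Myrna's implementation.
# Hypothetical annotation: feature name -> (chromosome, start, end), half-open.
# Features are assumed not to overlap, so each read maps to at most one.
FEATURES = {
    "geneA": ("chr1", 100, 200),
    "geneB": ("chr1", 300, 400),
}


def count_reads(alignments, features):
    """Stage 1: assign each pre-aligned read (chrom, pos) to the feature that
    contains its position, tallying raw counts; unassigned reads are dropped."""
    counts = Counter()
    for chrom, pos in alignments:
        for name, (f_chrom, start, end) in features.items():
            if chrom == f_chrom and start <= pos < end:
                counts[name] += 1
                break
    return counts


def normalize(counts, features):
    """Stage 2: scale raw counts to counts per million assigned reads, then
    take log2(x + 1) so that groups are compared on a log scale."""
    total = sum(counts.values()) or 1  # avoid dividing by zero for empty samples
    return {f: log2(counts[f] / total * 1e6 + 1) for f in features}


def permutation_pvalue(values, n_group1):
    """Stage 3: exact two-sided permutation test on the difference of group
    means; the first n_group1 values form the observed first group."""
    idx = range(len(values))

    def stat(group1):
        rest = [values[i] for i in idx if i not in group1]
        return mean(values[i] for i in group1) - mean(rest)

    observed = abs(stat(tuple(range(n_group1))))
    splits = list(combinations(idx, n_group1))
    # Count relabelings at least as extreme as the observed one (small
    # tolerance so the observed split itself is counted despite rounding).
    extreme = sum(abs(stat(s)) >= observed - 1e-12 for s in splits)
    return extreme / len(splits)


if __name__ == "__main__":
    # Hypothetical toy data: six samples, the first three from condition 1;
    # geneA has more reads in condition 1, geneB always has 20 reads.
    samples = [
        [("chr1", 150)] * n_a + [("chr1", 350)] * 20
        for n_a in (40, 42, 38, 10, 12, 9)
    ]
    logcpm = [normalize(count_reads(s, FEATURES), FEATURES) for s in samples]
    for gene in FEATURES:
        p = permutation_pvalue([x[gene] for x in logcpm], n_group1=3)
        print(gene, round(p, 3))
```

Running the demo prints p = 0.1 for both genes. With three samples per condition there are only 20 ways to split the six samples, and with equal group sizes each split and its complement give the same absolute statistic, so 0.1 is the smallest attainable two-sided p-value. geneB's raw count never changes, yet it receives the same p-value as geneA: scaling by the total number of assigned reads is compositional, so a large change in one feature shifts the normalized values of the others. This is one reason the choice of normalization matters.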
If an algorithm can be made to run efficiently on many loosely coupled processors, implementing it as a clou