%0 Journal Article
%T Differential Expression Analysis for RNA-Seq Data
%A Rashi Gupta
%A Isha Dewan
%A Richa Bharti
%A Alok Bhattacharya
%J ISRN Bioinformatics
%D 2012
%R 10.5402/2012/817508
%X RNA-Seq is increasingly used for gene expression profiling. In this approach, next-generation sequencing (NGS) platforms are used for sequencing. Owing to their highly parallel nature, these platforms generate millions of reads in a short time and at low cost. Analysis of the data is therefore a major challenge, and the development of statistical and computational methods is essential for drawing meaningful conclusions from such large datasets. Here, we assessed three types of normalization (transcript parts per million, trimmed mean of M values, and quantile normalization) and evaluated whether normalization reduces technical variability across replicates. We also proposed two novel methods for detecting differentially expressed genes between two biological conditions: (i) a likelihood ratio method and (ii) a Bayesian method. The proposed methods were tested on three real datasets. They performed at least as well as, and often better than, existing methods for analysis of differential expression.
1. Introduction
RNA-Seq is one of the recent methods for gene expression profiling. An advantage of RNA-Seq over other gene expression profiling technologies is that it allows a comprehensive assay that does not require probes for targets to be specified in advance. It has been used in particular for de novo detection of splice junctions, and it allows genome-wide expression profiling of organisms with unknown genome sequence [1]. By obtaining millions of short reads from the population of interest and mapping these reads to the reference genome, RNA-Seq produces read count data. With enough reads from a sample, it has the potential to detect and quantify biologically significant RNAs with low and moderate abundances. Before biologically significant RNAs can be detected, systematic technical variation due to experimental variability needs to be removed while retaining the effects that result from the biological process of interest.
This process is known as normalization. Various procedures for normalization of RNA-Seq data have been proposed in the literature, such as transcript parts per million [2], trimmed mean of M values [3], and quantile normalization [4]. Although these methods have been used frequently, no comparative analysis has been presented so far. Previous methods for identifying differentially expressed genes include that of Bloom et al. [5], who identified differential expression by taking the log ratio of the transcript counts; Hoen et al. [6] used a Student's t-test and, alternatively, applied a Bayesian model of Vêncio et al.
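As a rough illustration of the two ideas mentioned above (scaling read counts per library, then comparing conditions by a log ratio), the following Python sketch may help. It assumes that "parts per million" means rescaling each library so that its counts sum to one million; the exact definition used in [2] may differ, and this is not the authors' implementation. The function names, the pseudocount, and the toy counts are hypothetical.
```python
import math
def scale_to_per_million(counts):
    """Rescale one library's raw read counts so that they sum to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {gene: 1e6 * n / total for gene, n in counts.items()}
def log2_ratio(a, b, pseudocount=1.0):
    """log2 ratio of two scaled values; the pseudocount (an assumption) avoids log(0)."""
    return math.log2((a + pseudocount) / (b + pseudocount))
# Hypothetical toy counts for two conditions (not data from the paper).
cond_a = {"gene1": 100, "gene2": 300, "gene3": 600}
cond_b = {"gene1": 400, "gene2": 300, "gene3": 1300}
scaled_a = scale_to_per_million(cond_a)
scaled_b = scale_to_per_million(cond_b)
# Positive values mean relatively higher expression in condition A.
ratios = {g: log2_ratio(scaled_a[g], scaled_b[g]) for g in cond_a}
```
In this toy example gene2 has the same raw count in both libraries, yet its log ratio is positive after scaling because library B is twice as deep; this is the kind of technical effect that normalization is meant to remove before conditions are compared.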
%U http://www.hindawi.com/journals/isrn.bioinformatics/2012/817508/