Search Results: 1 - 10 of 100 matches for " "
All listed articles are free for downloading (OA Articles)
Page 1 /100
iASeq: integrative analysis of allele-specificity of protein-DNA interactions in multiple ChIP-seq datasets
Wei Yingying, Li Xia, Wang Qian-fei, Ji Hongkai
BMC Genomics , 2012, DOI: 10.1186/1471-2164-13-681
Abstract: Background: ChIP-seq provides new opportunities to study allele-specific protein-DNA binding (ASB). However, detecting allelic imbalance from a single ChIP-seq dataset often has low statistical power, since only sequence reads mapped to heterozygous SNPs are informative for discriminating the two alleles. Results: We develop a new method, iASeq, to address this issue by jointly analyzing multiple ChIP-seq datasets. iASeq uses a Bayesian hierarchical mixture model to learn correlation patterns of allele-specificity among multiple proteins. Using the discovered correlation patterns, the model allows one to borrow information across datasets to improve detection of allelic imbalance. Application of iASeq to 77 ChIP-seq samples from 40 ENCODE datasets and 1 genomic DNA sample in GM12878 cells reveals that the allele-specificities of multiple proteins are highly correlated, and demonstrates the ability of iASeq to improve allelic inference compared with analyzing each dataset separately. Conclusions: iASeq illustrates the value of integrating multiple datasets in allele-specificity inference and offers a new tool to better analyze ASB.
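For context, the single-dataset baseline that iASeq aims to improve on can be sketched as an exact binomial test at one heterozygous SNP: if there is no allelic imbalance, reads from the two alleles are assumed to follow Binomial(n, 0.5). This is a minimal illustration under that assumption, not iASeq's Bayesian hierarchical mixture model; the function names are hypothetical, and the test ignores reference-mapping bias and overdispersion.

```python
from math import comb


def binom_pmf(i: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly i successes in n Bernoulli(p) trials."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def allelic_imbalance_pvalue(ref_reads: int, alt_reads: int) -> float:
    """Two-sided exact binomial test of ref vs alt read counts against a 50:50 split.

    Binomial(n, 0.5) is symmetric, so doubling the smaller tail (capped at 1)
    gives the exact two-sided p-value.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no informative reads at this SNP
    k = ref_reads
    lower = sum(binom_pmf(i, n) for i in range(0, k + 1))  # P(X <= k)
    upper = sum(binom_pmf(i, n) for i in range(k, n + 1))  # P(X >= k)
    return min(1.0, 2 * min(lower, upper))


# Hypothetical read counts at one heterozygous SNP in a single ChIP-seq dataset.
print(allelic_imbalance_pvalue(10, 0))  # 0.001953125
```

With only ten informative reads, even a perfect 10:0 split gives p ≈ 0.002. This shows why per-dataset power is limited when few reads overlap heterozygous SNPs, and why pooling evidence across datasets can help.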
RNA CoMPASS: A Dual Approach for Pathogen and Host Transcriptome Analysis of RNA-Seq Datasets
Guorong Xu, Michael J. Strong, Michelle R. Lacey, Carl Baribault, Erik K. Flemington, Christopher M. Taylor
PLOS ONE , 2014, DOI: 10.1371/journal.pone.0089445
Abstract: High-throughput RNA sequencing (RNA-seq) has become an instrumental assay for the analysis of multiple aspects of an organism's transcriptome. Further, the analysis of a biological specimen's associated microbiome can also be performed using RNA-seq data, and this application is gaining interest in the scientific community. There are many existing bioinformatics tools designed for the analysis and visualization of transcriptome data. Despite the availability of an array of next-generation sequencing (NGS) analysis tools, the analysis of RNA-seq data sets poses a challenge for many biomedical researchers who are not familiar with command-line tools. Here we present RNA CoMPASS, a comprehensive RNA-seq analysis pipeline for the simultaneous analysis of transcriptomes and metatranscriptomes from diverse biological specimens. RNA CoMPASS leverages existing tools and parallel computing technology to facilitate the analysis of even very large datasets. RNA CoMPASS has a web-based graphical user interface with intrinsic queuing to control a distributed computational pipeline. RNA CoMPASS was evaluated by analyzing RNA-seq data sets from 45 B-cell samples. Twenty-two of these samples were derived from lymphoblastoid cell lines (LCLs) generated by the infection of naïve B-cells with the Epstein-Barr virus (EBV), while the remaining 23 samples were derived from Burkitt's lymphomas (BL), some of which arose in part through infection with EBV. As expected, RNA CoMPASS identified EBV in all LCLs and in a fraction of the BLs. Cluster analysis of the human transcriptome component of the RNA CoMPASS output clearly separated the BLs (which have a germinal center-like phenotype) from the LCLs (which have a blast-like phenotype), with evidence of activated MYC signaling and lower interferon and NF-κB signaling in the BLs. Together, this analysis illustrates the utility of RNA CoMPASS in the simultaneous analysis of transcriptome and metatranscriptome data.
RNA CoMPASS is freely available at http://rnacompass.sourceforge.net/.
Evaluation of Algorithm Performance in ChIP-Seq Peak Detection
Elizabeth G. Wilbanks, Marc T. Facciotti
PLOS ONE , 2010, DOI: 10.1371/journal.pone.0011471
Abstract: Next-generation DNA sequencing coupled with chromatin immunoprecipitation (ChIP-seq) is revolutionizing our ability to interrogate whole-genome protein-DNA interactions. Identification of protein binding sites from ChIP-seq data has required novel computational tools, distinct from those used for the analysis of ChIP-chip experiments. The growing popularity of ChIP-seq spurred the development of many different analytical programs (at last count, we noted 31 open-source methods), each with some purported advantage. Given that the literature is dense and empirical benchmarking is challenging, selecting an appropriate method for ChIP-seq analysis has become a daunting task. Herein we compare the performance of eleven different peak-calling programs on common empirical transcription factor datasets and measure their sensitivity, accuracy and usability. Our analysis provides an unbiased critical assessment of available technologies, and should assist researchers in choosing a suitable tool for handling ChIP-seq data.
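One simple way to score a peak caller against a set of reference binding sites is sketched below: sensitivity is the fraction of reference sites covered by some called peak, and precision is the fraction of called peaks that contain a reference site. These definitions, the half-open interval convention, and the single-chromosome simplification are assumptions for illustration. The metrics used in the study itself may be defined differently.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def covers(peak: Interval, site: int) -> bool:
    """True if the reference site lies inside the called peak."""
    start, end = peak
    return start <= site < end


def evaluate_peaks(peaks: List[Interval], sites: List[int]) -> Tuple[float, float]:
    """Return (sensitivity, precision) of called peaks against reference sites."""
    found_sites = sum(any(covers(p, s) for p in peaks) for s in sites)
    true_peaks = sum(any(covers(p, s) for s in sites) for p in peaks)
    sensitivity = found_sites / len(sites) if sites else 0.0
    precision = true_peaks / len(peaks) if peaks else 0.0
    return sensitivity, precision


called = [(100, 200), (300, 400), (500, 600)]  # hypothetical peak calls
reference = [150, 350, 800]                     # hypothetical known binding sites
print(evaluate_peaks(called, reference))
```

The pairwise check is O(peaks × sites), which is fine for a sketch. Genome-scale comparisons would normally use sorted intervals or an interval tree, per chromosome.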
A normalization strategy for comparing tag count data
Koji Kadota, Tomoaki Nishiyama, Kentaro Shimizu
Algorithms for Molecular Biology , 2012, DOI: 10.1186/1748-7188-7-5
Abstract: We describe a strategy for normalizing tag count data, focusing on RNA-seq. The key concept is to remove data assigned as potential differentially expressed genes (DEGs) before calculating the normalization factor. Several R packages for identifying DEGs are currently available, and each package uses its own normalization method and gene ranking algorithm. We compared a total of eight package combinations: four R packages (edgeR, DESeq, baySeq, and NBPSeq) with their default normalization settings and with our normalization strategy. Many synthetic datasets under various scenarios were evaluated on the basis of the area under the curve (AUC) as a measure of both sensitivity and specificity. We found that packages using our strategy in the data normalization step performed well overall. This result was also observed for a real experimental dataset. Our results showed that the elimination of potential DEGs is essential for more accurate normalization of RNA-seq data. The concept of this normalization strategy can be widely applied to other types of tag count data and to microarray data.
Development of next-generation sequencing technologies has enabled biological features such as gene expression and histone modification to be quantified as tag count data by ribonucleic acid sequencing (RNA-seq) and chromatin immunoprecipitation sequencing (ChIP-seq) analyses [1,2]. Unlike hybridization-based microarray technologies [3,4], sequencing-based technologies do not require prior information about the genome or transcriptome sequences of the samples of interest [5]. Therefore, researchers can profile the expression of not only well-annotated model organisms but also poorly annotated non-model organisms. RNA-seq in such organisms enables the gene structures and expression levels to be determined.
One important task for RNA-seq is to identify differential expression (DE) for genes or transcripts.
Similar to microarray analysis, we typically start the analysis with a so-ca
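The key idea of the strategy (compute a normalization factor, flag potential DEGs, then recompute the factor without them) can be illustrated with a deliberately simplified two-sample sketch. It uses a total-count ratio in place of TMM and a fixed log2 fold-change cutoff in place of a DEG-ranking package such as baySeq. Both substitutions, the cutoff, the pseudocount, and the function names are assumptions for illustration, not the authors' procedure.

```python
import math
from typing import Sequence, Set, Tuple


def total_count_factor(x: Sequence[float], y: Sequence[float], genes: Sequence[int]) -> float:
    """Scaling factor f such that y * f matches x in total count over `genes`."""
    return sum(x[i] for i in genes) / sum(y[i] for i in genes)


def deg_excluded_factor(
    x: Sequence[float], y: Sequence[float], log2_cutoff: float = 1.0, pseudo: float = 1.0
) -> Tuple[float, float, Set[int]]:
    """Return (naive factor, factor after excluding potential DEGs, flagged genes)."""
    genes = list(range(len(x)))
    f_naive = total_count_factor(x, y, genes)  # step 1: factor from all genes
    # Step 2: flag genes whose log2 fold change after naive scaling is large.
    flagged = {
        i for i in genes
        if abs(math.log2((x[i] + pseudo) / (y[i] * f_naive + pseudo))) > log2_cutoff
    }
    kept = [i for i in genes if i not in flagged]
    if not kept:
        return f_naive, f_naive, flagged  # nothing left to renormalize on
    # Step 3: recompute the factor on the remaining (putatively non-DE) genes.
    return f_naive, total_count_factor(x, y, kept), flagged


# Hypothetical counts: nine unchanged genes and one strongly up-regulated gene in x.
x = [100] * 9 + [1000]
y = [100] * 10
print(deg_excluded_factor(x, y))
```

Here the naive factor (1.9) is inflated by the single DEG. After that gene is excluded, the factor returns to 1.0, which matches the nine unchanged genes. This is the bias the strategy is designed to remove.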
Seq4SNPs: new software for retrieval of multiple, accurately annotated DNA sequences, ready formatted for SNP assay design
Helen I Field, Serena A Scollen, Craig Luccarini, Caroline Baynes, Jonathan Morrison, Alison M Dunning, Douglas F Easton, Paul DP Pharoah
BMC Bioinformatics , 2009, DOI: 10.1186/1471-2105-10-180
Abstract: We created Seq4SNPs, a web-based, walk-away software tool that can process from one to several hundred SNPs given rs numbers as input. It outputs a file of fully annotated sequences formatted for one of three proprietary design software packages (TaqMan's Primer-By-Design FileBuilder, Sequenom's iPLEX or SNPstream's Autoprimer), as well as unannotated FASTA sequences. We found genotyping assays to be inhibited by repetitive sequences or by additional variations flanking the SNP under test, and in multiplexes, repetitive sequence flanking one SNP adversely affects multiple assays. Assay design software programs avoid such regions if the input sequences are appropriately annotated, so we used Seq4SNPs to provide suitably annotated input sequences, and improved our genotyping success rate. Adjacent SNPs can also be avoided by annotating the sequences used as input for primer design.
The accuracy of annotation by Seq4SNPs is significantly better than that of manual annotation (P < 1e-5).
Using Seq4SNPs to incorporate all annotation of additional SNPs and repetitive elements into the sequences supplied to genotyping assay design software minimizes assay failure at the design stage, reducing the cost of genotyping. Seq4SNPs provides a rapid route for replacing poor test SNP sequences. We routinely use this software for assay sequence preparation.
Seq4SNPs is available as a service at http://moya.srl.cam.ac.uk/oncology/bio/s4shome.html and http://moya.srl.cam.ac.uk/cgi-bin/oncology/srl/ncbi/seq4snp1.pl, currently for human SNPs, but it is easily extended to include any species in dbSNP.
A survey of single nucleotide polymorphism (SNP) and primer design software reveals several packages that align EST or genome sequences to discover SNPs [1-6]. SNP-VISTA visualizes SNPs from aligned genome sequences [7].
Other packages take a chromosome region then use recorded SNP genotypes, and additional information, to reduce the set of SNPs that need genotyping [[8,9] and references therein]. SNP i
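The kind of annotation described above can be sketched as follows: repeat regions are soft-masked in lowercase, flanking SNPs are replaced by `N` so primers avoid them, and the target SNP is written in `[A/G]` bracket notation. These conventions and the function name are hypothetical illustrations only. Each vendor's design tool defines its own required input format, and Seq4SNPs' actual output formats are not reproduced here.

```python
from typing import Iterable, Sequence, Tuple


def annotate_snp_sequence(
    seq: str,
    target_pos: int,
    alleles: Sequence[str],
    other_snps: Iterable[int] = (),
    repeats: Iterable[Tuple[int, int]] = (),
) -> str:
    """Annotate a flanking sequence for assay design (hypothetical conventions).

    - repeat intervals (0-based, half-open) are soft-masked in lowercase;
    - additional SNP positions are replaced by 'N' so primers avoid them;
    - the target SNP is written in [A/G]-style bracket notation.
    """
    chars = list(seq.upper())
    for start, end in repeats:
        for i in range(start, end):
            chars[i] = chars[i].lower()
    for pos in other_snps:
        if pos != target_pos:
            chars[pos] = "N"
    chars[target_pos] = "[" + "/".join(alleles) + "]"
    return "".join(chars)


print(annotate_snp_sequence("ACGTACGTAC", 4, ("A", "G"), other_snps=[1], repeats=[(7, 10)]))
# ANGT[A/G]CGtac
```

The masking steps run before the target SNP is written so that the bracket notation is never lowercased or overwritten.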
Error estimates for the analysis of differential expression from RNA-seq count data
Conrad Burden, Sumaira Qureshi, Susan R Wilson
PeerJ , 2015, DOI: 10.7287/peerj.preprints.400v3
Abstract: A number of algorithms exist for analysing RNA-sequencing data to infer profiles of differential gene expression. Problems inherent in building algorithms around statistical models of overdispersed count data are formidable and frequently lead to non-uniform p-value distributions for null-hypothesis data and to inaccurate estimates of false discovery rates (FDRs). This can lead to an inaccurate measure of significance and loss of power to detect differential expression.
Error estimates for the analysis of differential expression from RNA-seq count data
Conrad J. Burden, Sumaira E. Qureshi, Susan R. Wilson
PeerJ , 2015, DOI: 10.7717/peerj.576
Abstract: Background. A number of algorithms exist for analysing RNA-sequencing data to infer profiles of differential gene expression. Problems inherent in building algorithms around statistical models of overdispersed count data are formidable and frequently lead to non-uniform p-value distributions for null-hypothesis data and to inaccurate estimates of false discovery rates (FDRs). This can lead to an inaccurate measure of significance and loss of power to detect differential expression.
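The link between null p-value distributions and FDR estimates can be made concrete with the Benjamini-Hochberg procedure, which many differential-expression workflows use to convert p-values into FDR-adjusted values. Its guarantee assumes that the p-values of truly null genes are uniformly distributed (or conservative). When the count model produces non-uniform null p-values, as the abstract describes, the adjusted values no longer reflect the true FDR. This is a generic sketch of Benjamini-Hochberg, not the error-estimation approach developed by the authors.

```python
from typing import List, Sequence


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

If null p-values are skewed toward zero, too many genes fall below any cutoff on these adjusted values and the realized FDR exceeds the nominal level. If they are skewed toward one, power is lost.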
SERE: Single-parameter quality control and sample comparison for RNA-Seq
Schulze Stefan K, Kanwar Rahul, Gölzenleuchter Meike, Therneau Terry M
BMC Genomics , 2012, DOI: 10.1186/1471-2164-13-524
Abstract: Background: Assessing the reliability of experimental replicates (or global alterations corresponding to different experimental conditions) is a critical step in analyzing RNA-Seq data. Pearson’s correlation coefficient r has been widely used in the RNA-Seq field even though its statistical characteristics may be poorly suited to the task. Results: Here we present a single-parameter test procedure for count data, the Simple Error Ratio Estimate (SERE), that can determine whether two RNA-Seq libraries are faithful replicates or globally different. Benchmarking shows that the interpretation of SERE is unambiguous regardless of the total read count or the range of expression differences among bins (exons or genes): a score of 1 indicates faithful replication (i.e., samples are affected only by Poisson variation of individual counts), a score of 0 indicates data duplication, and scores >1 correspond to true global differences between RNA-Seq libraries. By contrast, the interpretation of Pearson’s r is generally ambiguous and highly dependent on sequencing depth and on the range of expression levels inherent to the sample (the difference between the lowest and highest bin count). Cohen’s simple kappa results are also ambiguous and highly dependent on the choice of bins. For quantifying global sample differences, SERE performs similarly to a measure based on the negative binomial distribution, yet is simpler to compute. Conclusions: SERE can therefore serve as a straightforward and reliable statistical procedure for the global assessment of pairs or large groups of RNA-Seq datasets by a single statistical parameter.
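Below is a minimal sketch of a SERE-style statistic, based on our reading of the abstract: observed dispersion between libraries measured relative to what Poisson variation alone would produce. For each bin, expected counts are taken as proportional to library size. Squared Pearson residuals are pooled over bins with at least one read and divided by their degrees of freedom, and the square root is taken. Pure Poisson replicates then score about 1 and duplicated data score 0. The function name and exact normalization are assumptions, so check them against the paper's definition before relying on this.

```python
import math
from typing import List, Sequence


def sere(counts: Sequence[Sequence[int]]) -> float:
    """SERE-style ratio of observed to Poisson-expected dispersion.

    counts: one row per bin (exon or gene), one column per RNA-Seq library.
    """
    n_samples = len(counts[0])
    lib_sizes: List[int] = [sum(row[j] for row in counts) for j in range(n_samples)]
    total = sum(lib_sizes)
    chi2 = 0.0
    informative_bins = 0
    for row in counts:
        bin_total = sum(row)
        if bin_total == 0:
            continue  # bins without reads carry no information
        informative_bins += 1
        for j, observed in enumerate(row):
            expected = bin_total * lib_sizes[j] / total  # share by library size
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (informative_bins * (n_samples - 1)))


print(sere([[10, 10], [5, 5]]))    # identical libraries: 0.0
print(sere([[10, 20], [20, 10]]))  # swapped counts: well above 1
```

As the abstract notes, a score near 0 flags duplication rather than excellent replication, and values well above 1 indicate global differences between libraries.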
Differential Expression Analysis for RNA-Seq Data
Rashi Gupta, Isha Dewan, Richa Bharti, Alok Bhattacharya
ISRN Bioinformatics , 2012, DOI: 10.5402/2012/817508
Abstract: RNA-Seq is increasingly being used for gene expression profiling. In this approach, next-generation sequencing (NGS) platforms are used for sequencing. Owing to the highly parallel nature of these platforms, millions of reads are generated in a short time and at low cost. Analysis of the data is therefore a major challenge, and the development of statistical and computational methods is essential for drawing meaningful conclusions from such large datasets. Here, we assessed three different types of normalization (transcript parts per million, trimmed mean of M values, quantile normalization) and evaluated whether normalized data reduce technical variability across replicates. In addition, we proposed two novel methods for detecting differentially expressed genes between two biological conditions: (i) a likelihood ratio method and (ii) a Bayesian method. Our proposed methods for finding differentially expressed genes were tested on three real datasets and performed at least as well as, and often better than, existing methods for the analysis of differential expression.
1. Introduction
One of the recent methods for gene expression profiling is RNA-Seq. An advantage of RNA-Seq over other gene expression profiling technologies is that it provides a comprehensive assay that does not require probes for targets to be specified in advance. It has been used particularly for de novo detection of splice junctions, and it allows genome-wide expression profiling of organisms with unknown genome sequences [1]. By obtaining millions of short reads from the population of interest and mapping these reads to the reference genome, RNA-Seq produces read count data. With enough reads from a sample, it has the potential to detect and quantify biologically significant RNAs with low and moderate abundances. Before biologically significant RNAs are detected, systematic technical variation due to experimental variability needs to be removed while retaining the effects of the biological process of interest. This process is known as normalization. Various procedures for normalization of RNA-Seq data have been proposed in the literature, such as transcript parts per million [2], trimmed mean of M values [3], and quantile normalization [4]. Although these methods have been used frequently, no comparative analysis of them has been presented so far. Previous methods for identifying differentially expressed genes include those of Bloom et al. [5], who identified differential expression by taking the log ratio of transcript counts, and Hoen et al. [6], who used a Student's t-test and alternatively also applied a Bayesian model of Vêncio et al.
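Of the three normalizations compared, quantile normalization is the simplest to show compactly. Each sample's values are replaced by the across-sample mean of the values at the same rank, so every sample ends up with an identical distribution. This is a generic sketch, not the authors' code. Ties are broken by order of appearance rather than averaged, which is a simplifying assumption.

```python
from typing import List, Sequence


def quantile_normalize(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Quantile-normalize a genes x samples matrix (rows = genes, columns = samples)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    # For each sample, the row indices ordered from smallest to largest value.
    orders = [sorted(range(n_rows), key=lambda r, c=c: matrix[r][c]) for c in range(n_cols)]
    # Mean across samples of the k-th smallest value, for every rank k.
    rank_means = [
        sum(matrix[orders[c][k]][c] for c in range(n_cols)) / n_cols for k in range(n_rows)
    ]
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        for k, r in enumerate(orders[c]):
            out[r][c] = rank_means[k]  # assign the rank's mean back to its gene
    return out


counts = [[5, 4], [2, 1], [3, 6]]  # hypothetical 3 genes x 2 samples
print(quantile_normalize(counts))
```

After normalization both columns contain exactly the values 1.5, 3.5 and 5.5, only in gene-specific order. This strong assumption of identical distributions is one reason quantile normalization can behave differently from scaling methods such as TMM.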
Biological Averaging in RNA-Seq
Surojit Biswas, Yash N. Agrawal, Tatiana S. Mucyn, Jeffery L. Dangl, Corbin D. Jones
Quantitative Biology , 2013,
Abstract: RNA-seq has become a de facto standard for measuring gene expression. Traditionally, RNA-seq experiments are mathematically averaged: they sequence the mRNA of individuals from different treatment groups, hoping to correlate phenotype with differences in arithmetic read count averages at shared loci of interest. Alternatively, tissue from the same individuals may be pooled prior to sequencing in what we refer to as a biologically averaged design. Because mathematical averaging sequences every individual, it controls for both biological and technical variation; however, is the statistical resolution gained always worth the additional cost? To compare biological and mathematical averaging, we examined theoretical and empirical estimates of statistical efficiency and relative cost efficiency. We found that, although less efficient at a fixed sample size, biological averaging can be more cost-efficient than mathematical averaging. With this motivation, we developed a differential expression classifier, ICRBC, that can detect alternatively expressed genes between biologically averaged samples. In simulation studies, we found that biological averaging followed by analysis with our classifier performed comparably to existing methods, such as ASC, edgeR, and DESeq, especially when individuals were pooled evenly and less than 20% of the regulome was expected to be differentially regulated. In two technically distinct mouse datasets and one plant dataset, we found that our method was over 87% concordant with edgeR for the 100 most significant features. We therefore conclude that biological averaging may control biological variation sufficiently for differences in gene expression to be detectable. In such situations, ICRBC can enable reliable exploratory analysis at a fraction of the cost, especially when interest lies in the most differentially expressed loci.
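The efficiency/cost trade-off can be illustrated with a simple variance model, which is our assumption rather than the authors' formulation. Each individual's true expression varies with biological variance σ_b², each library adds technical variance σ_t², and an even pool measures the mean of its individuals. Sequencing n individuals separately then gives a group mean with variance (σ_b² + σ_t²)/n, whereas one pooled library of the same n individuals gives σ_b²/n + σ_t². Cost efficiency is taken here as 1/(variance × cost), using hypothetical per-individual and per-library costs.

```python
def var_mathematical(n: int, sigma_b2: float, sigma_t2: float) -> float:
    """Variance of the mean of n separately sequenced individuals."""
    return (sigma_b2 + sigma_t2) / n


def var_biological(n: int, sigma_b2: float, sigma_t2: float) -> float:
    """Variance of one library built from an even pool of n individuals."""
    return sigma_b2 / n + sigma_t2


def cost_efficiency(variance: float, cost: float) -> float:
    """Precision per unit cost (higher is better)."""
    return 1.0 / (variance * cost)


n, sb2, st2 = 4, 1.0, 1.0        # hypothetical sample size and variances
cost_ind, cost_lib = 1.0, 10.0   # hypothetical cost per individual / per library
math_eff = cost_efficiency(var_mathematical(n, sb2, st2), n * (cost_ind + cost_lib))
bio_eff = cost_efficiency(var_biological(n, sb2, st2), n * cost_ind + cost_lib)
print(f"mathematical: {math_eff:.4f}, biological: {bio_eff:.4f}")
```

With these made-up numbers the pool has higher variance (1.25 versus 0.5) yet more precision per unit cost, mirroring the abstract's qualitative point. If library preparation were free (cost_lib = 0), the ranking would flip in favour of sequencing every individual.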

Copyright © 2008-2017 Open Access Library. All rights reserved.