We present GobyWeb, a web-based system that facilitates the management and analysis of high-throughput sequencing (HTS) projects. The software provides integrated support for a broad set of HTS analyses and offers a simple plugin extension mechanism. Analyses currently supported include quantification of gene expression for messenger and small RNA sequencing, estimation of DNA methylation (i.e., reduced representation bisulfite sequencing and whole genome methyl-seq), and the detection of pathogens in sequenced data. In contrast to previous pipelines developed for HTS data analysis, GobyWeb requires significantly less storage space, runs analyses efficiently on a parallel grid, and scales gracefully to process tens or hundreds of multi-gigabyte samples, yet can be used effectively by researchers who are comfortable using a web browser. We conducted performance evaluations of the software and found that it either outperforms or performs similarly to programs developed for specialized analyses of HTS data. We found that most biologists who took a one-hour GobyWeb training session were readily able to analyze RNA-Seq data with state-of-the-art analysis tools. GobyWeb can be obtained at http://gobyweb.campagnelab.org and is freely available for non-commercial use. GobyWeb plugins are distributed as source code under the open source LGPL3 license to facilitate code inspection, reuse, and independent extensions (http://github.com/CampagneLaboratory/gobyweb2-plugins).