The increasing availability of time series expression datasets, although promising, raises a number of new computational challenges. Accordingly, developing classification methods that make reliable and sound predictions is becoming a pressing issue. Here, we propose a new method to classify time series gene expression data via integration of biological networks. We evaluated our approach on two different datasets and showed that a hidden Markov model/Gaussian mixture model hybrid exploits the time dependence of the expression data, thereby leading to better prediction results. We demonstrated that the biclustering procedure identifies function-related genes as a whole, giving rise to high accordance in prognosis prediction across independent time series datasets. In addition, we showed that integrating biological networks into our method significantly improves prediction performance. Moreover, we compared our approach with several state-of-the-art algorithms and found that it outperformed them with regard to various criteria. Finally, our approach achieved better prediction results on early-stage data, suggesting its potential for practical prediction.