Discovery of prognostic and diagnostic biomarker gene signatures for diseases such as cancer is seen as a major step towards better personalized medicine. During the last decade, various methods, mainly from machine learning and statistics, have been proposed for that purpose. However, one important obstacle to making gene signatures a standard tool in clinical diagnosis is their typically low reproducibility, combined with the difficulty of achieving a clear biological interpretation. To address these problems, recent years have seen growing interest in approaches that integrate information from molecular interaction networks. Here we review the current state of research in this field by giving an overview of the approaches proposed so far.