PLOS ONE 2011

Predictive Power Estimation Algorithm (PPEA) - A New Algorithm to Reduce Overfitting for Genomic Biomarker Discovery

DOI: 10.1371/journal.pone.0024233

Jiangang Liu, Robert A. Jolly, Aaron T. Smith, George H. Searfoss, Keith M. Goldstein, Vladimir N. Uversky, Keith Dunker, Shuyu Li, Craig E. Thomas, Tao Wei

Abstract:

Toxicogenomics promises to aid in predicting adverse effects, understanding the mechanisms of drug action or toxicity, and uncovering unexpected or secondary pharmacology. However, modeling adverse effects using high-dimensional, high-noise genomic data is prone to overfitting. Models constructed from such data sets often consist of a large number of genes with no obvious functional relevance to the biological effect the model is intended to predict, which makes the modeling results difficult to interpret. To address these issues, we developed a novel algorithm, the Predictive Power Estimation Algorithm (PPEA), which estimates the predictive power of each individual transcript through an iterative two-way bootstrapping procedure. By enforcing, in each iteration of modeling and testing, that the sample number is larger than the transcript number, PPEA reduces the potential risk of overfitting. We show with three different case studies that: (1) PPEA can quickly derive a reliable rank order of the predictive power of individual transcripts in a relatively small number of iterations; (2) the top-ranked transcripts tend to be functionally related to the phenotype they are intended to predict; (3) using only the most predictive top-ranked transcripts greatly facilitates the development of a multiplex assay, such as qRT-PCR, as a biomarker; and (4) more importantly, a small number of genes identified from the top-ranked transcripts are highly predictive of phenotype, as their expression changes distinguished adverse from nonadverse effects of compounds in completely independent tests. Thus, we believe that PPEA effectively addresses the overfitting problem and can be used to facilitate genomic biomarker discovery for predictive toxicology and drug responses.
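
The abstract does not spell out the procedure, so the following Python sketch is only one possible reading of an "iterative two-way bootstrapping" scheme and is not the published PPEA. Everything specific in it is an assumption made for illustration: the function name `estimate_predictive_power`, the parameters `n_iter` and `n_features`, the nearest-centroid classifier, binary 0/1 labels, and the use of out-of-bag accuracy as the score. In each iteration it resamples the samples with replacement, draws a random subset of transcripts that is smaller than the number of samples, fits a small model, tests it on the samples left out of the resample, and credits that accuracy to every transcript in the subset.

```python
import numpy as np

def estimate_predictive_power(X, y, n_iter=500, n_features=5, seed=0):
    """Score each transcript by the mean out-of-bag accuracy of the small
    models it took part in (illustrative stand-in, not the published PPEA).

    X: (n_samples, n_transcripts) expression matrix; y: 0/1 labels.
    """
    n_samples, n_transcripts = X.shape
    # Assumed reading of the constraint: fewer transcripts than samples per model.
    if n_features >= n_samples:
        raise ValueError("n_features must be smaller than the number of samples")

    rng = np.random.default_rng(seed)
    score_sum = np.zeros(n_transcripts)
    times_drawn = np.zeros(n_transcripts)

    for _ in range(n_iter):
        # Way 1: resample the samples with replacement; unused rows are the test set.
        train = rng.choice(n_samples, size=n_samples, replace=True)
        test = np.setdiff1d(np.arange(n_samples), train)
        # Way 2: draw a small random subset of transcripts.
        cols = rng.choice(n_transcripts, size=n_features, replace=False)
        # Skip degenerate draws (no test rows, or only one class to train on).
        if test.size == 0 or np.unique(y[train]).size < 2:
            continue

        # Nearest-centroid classifier on the selected transcripts (assumed model).
        X_train, y_train = X[np.ix_(train, cols)], y[train]
        centroid0 = X_train[y_train == 0].mean(axis=0)
        centroid1 = X_train[y_train == 1].mean(axis=0)
        X_test = X[np.ix_(test, cols)]
        dist0 = np.linalg.norm(X_test - centroid0, axis=1)
        dist1 = np.linalg.norm(X_test - centroid1, axis=1)
        pred = (dist1 < dist0).astype(int)
        accuracy = (pred == y[test]).mean()

        # Credit the test accuracy to every transcript used in this model.
        score_sum[cols] += accuracy
        times_drawn[cols] += 1

    # Transcripts that were never drawn keep a score of 0.
    return score_sum / np.maximum(times_drawn, 1)
```

Sorting the returned scores in descending order gives a rank order of transcripts. Because each transcript is scored only through small models, the number of iterations needs to be large enough for every transcript to be drawn many times.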
