Applications of Bayesian Gene Selection and Classification with Mixtures of Generalized Singular g-Priors

DOI: 10.1155/2013/420412


Abstract:

Recent advances in microarray technologies have led to the collection of an enormous number of genetic markers in disease association studies, yet scientists are interested in selecting a smaller set of genes to explore the relation between genes and disease. Current approaches either adopt a single-marker test, which ignores possible interactions among genes, or use a multistage procedure that reduces the large number of genes before the association is evaluated. Among the latter, Bayesian analysis can further accommodate the correlation between genes through the specification of a multivariate prior distribution and estimate the probabilities of association through latent variables. The covariance matrix, however, depends on an unknown parameter. In this research, we suggested a reference hyperprior distribution for this uncertainty, outlined the implementation of its computation, and illustrated this fully Bayesian approach with a colon and leukemia cancer study. Comparison with other existing methods was also conducted. The classification accuracy of our proposed model is higher with a smaller set of selected genes. The results not only replicated findings in several earlier studies but also quantified the strength of association with posterior probabilities.

1. Introduction

Recent advances in oligonucleotide microarray technologies have made it possible to measure thousands of gene expression levels in a single experiment. With such a vast amount of data, one major task for researchers is to develop classification rules for predicting cancers or cancer subtypes from the gene expression levels of tissue samples. The accuracy of such classification rules may be crucial for diagnosis and treatment, since different cancer subtypes may require different target-specific therapies.

However, developing good and efficient classification rules has not been straightforward, either because of the huge number of genes collected from a relatively small number of tissue samples or because of the model complexity associated with the biological mechanism. Identifying a smaller set of relevant genes to characterize different disease classes has therefore been a challenging task. Procedures that are efficient in gene selection as well as in classification play an important role in cancer research. Many approaches have been proposed for classification. For example, several analyses identified a subset of classifying genes with t-statistics, regression model approach, mixture model, Wilcoxon score test, or the between-within
