Variable Selection in ROC Regression

DOI: 10.1155/2013/436493


Abstract:

Regression models are introduced into receiver operating characteristic (ROC) analysis to accommodate the effects of covariates, such as genes. When many covariates are available, the issue of variable selection arises. The traditional induced methodology models the outcomes of the diseased and nondiseased groups separately; applying variable selection to the two models separately therefore complicates interpretation, because the selected models may differ. Furthermore, in ROC regression the focus should be on the accuracy of the area under the curve (AUC) rather than on the consistency of model selection or on prediction performance. In this paper, we construct a single objective function with the group SCAD penalty to select grouped variables, which accommodates popular model selection criteria, and we propose a two-stage framework for applying the focused information criterion (FIC). Some asymptotic properties of the proposed methods are derived. Simulation studies show that grouped variable selection is superior to separate model selection. Furthermore, the FIC improves the accuracy of the estimated AUC compared with other criteria.

1. Introduction

In modern medical diagnosis and genetic studies, the receiver operating characteristic (ROC) curve is a popular tool for evaluating how well a biomarker discriminates a disease status or a phenotype. For example, in a continuous-scale test, the diagnosis of a disease depends on whether the test result is above or below a specified cutoff value. Likewise, genome-wide association studies in human populations aim to create genomic profiles that combine the effects of many associated genetic variants to predict the disease risk of a new subject with high discriminative accuracy [1]. For a given cutoff value of a biomarker, or of a combination of biomarkers, the sensitivity and the specificity quantify the discriminative performance. Varying the cutoff value over the entire real line and plotting sensitivity against 1-specificity yields the ROC curve. The area under the ROC curve (AUC) is an important one-number summary of the overall discriminative accuracy of a ROC curve, as it takes all cutoff values into account. Let $Y_D$ be the response of a diseased subject, and let $Y_{\bar{D}}$ be the response of a nondiseased subject; then the AUC can be expressed as $\mathrm{AUC} = P(Y_D > Y_{\bar{D}})$ [2]. Pepe [3] and Zhou et al. [4] provide broad reviews of statistical methods for the evaluation of diagnostic tests. Traditional ROC analyses do not consider the effect of characteristics of
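
As a brief numerical illustration of these definitions (not part of the original paper), the following Python sketch computes an empirical ROC curve and the Mann-Whitney estimate of $\mathrm{AUC} = P(Y_D > Y_{\bar{D}})$ from simulated marker values; the data and all variable names are hypothetical.

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical marker values: diseased subjects score higher on average.
y_d = rng.normal(loc=1.0, scale=1.0, size=200)    # diseased group, Y_D
y_nd = rng.normal(loc=0.0, scale=1.0, size=300)   # nondiseased group, Y_Dbar

# Empirical ROC: sweep the cutoff c over all observed values; at each c,
# sensitivity = P(Y_D > c) and 1 - specificity = P(Y_Dbar > c).
cutoffs = np.sort(np.concatenate([y_d, y_nd]))[::-1]
sens = np.array([(y_d > c).mean() for c in cutoffs])
fpr = np.array([(y_nd > c).mean() for c in cutoffs])

# Mann-Whitney estimate of AUC = P(Y_D > Y_Dbar); ties count as 1/2.
diff = y_d[:, None] - y_nd[None, :]
auc_mw = (diff > 0).mean() + 0.5 * (diff == 0).mean()

# The trapezoidal area under the empirical ROC agrees with the Mann-Whitney form.
auc_trapz = np.trapz(sens, fpr)
print(f"AUC (Mann-Whitney) = {auc_mw:.3f}, AUC (trapezoid) = {auc_trapz:.3f}")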
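
The group SCAD penalty mentioned in the abstract applies the SCAD function of Fan and Li [11] to the Euclidean norm of each coefficient group [17, 18]. The sketch below implements the standard SCAD penalty (with the conventional choice a = 3.7) and a generic group-penalized objective; it illustrates the penalty itself under those standard definitions, not the authors' specific ROC objective function.

import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001), evaluated at |theta|.

    Linear (lasso-like) for |theta| <= lam, quadratic transition on
    (lam, a*lam], and constant (no further penalization) beyond a*lam.
    """
    theta = np.abs(theta)
    small = theta <= lam
    mid = (theta > lam) & (theta <= a * lam)
    return np.where(
        small, lam * theta,
        np.where(mid,
                 (2 * a * lam * theta - theta**2 - lam**2) / (2 * (a - 1)),
                 (a + 1) * lam**2 / 2))

def group_scad_objective(beta, groups, loss, lam):
    """Generic penalized objective: loss(beta) + sum_g SCAD(||beta_g||).

    `groups` is a list of index arrays partitioning the coefficients,
    and `loss` is any smooth loss function of beta. This is a sketch of
    the grouped-penalty structure, not the paper's exact formulation.
    """
    penalty = sum(scad_penalty(np.linalg.norm(beta[g]), lam) for g in groups)
    return loss(beta) + penalty

# Example: small coefficients are shrunk like the lasso, while large ones
# incur a flat penalty, which is what yields the oracle property.
print(scad_penalty(np.array([0.1, 1.0, 5.0]), lam=0.5))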

References

[1]  N. R. Wray, J. Yang, M. E. Goddard, and P. M. Visscher, “The genetic interpretation of area under the ROC curve in genomic profiling,” PLoS Genetics, vol. 6, no. 2, Article ID e1000864, 2010.
[2]  D. Bamber, “The area above the ordinal dominance graph and the area below the receiver operating characteristic graph,” Journal of Mathematical Psychology, vol. 12, no. 4, pp. 387–415, 1975.
[3]  M. S. Pepe, The Statistical Evaluation of Medical Tests for Classification and Prediction, Oxford University Press, New York, NY, USA, 2003.
[4]  X. H. Zhou, N. A. Obuchowski, and D. M. McClish, Statistical Methods in Diagnostic Medicine, John Wiley & Sons, New York, NY, USA, 2nd edition, 2011.
[5]  M. X. Rodríguez-Álvarez, P. G. Tahoces, C. Cadarso-Suárez, and M. J. Lado, “Comparative study of ROC regression techniques: applications for the computer-aided diagnostic system in breast cancer detection,” Computational Statistics and Data Analysis, vol. 55, no. 1, pp. 888–902, 2011.
[6]  M. Stone, “Cross-validatory choice and assessment of statistical predictions,” Journal of the Royal Statistical Society B, vol. 36, no. 2, pp. 111–147, 1974.
[7]  P. Craven and G. Wahba, “Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation,” Numerische Mathematik, vol. 31, no. 4, pp. 377–403, 1979.
[8]  H. Akaike, “Information theory and an extension of the maximum likelihood principle,” in Proceedings of the 2nd International Symposium Information Theory, B. N. Petrov and F. Csaki, Eds., pp. 267–281, Akademia Kiado, Budapest, Hungary, 1973.
[9]  G. Schwarz, “Estimating the dimension of a model,” Annals of Statistics, vol. 6, pp. 461–464, 1978.
[10]  R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society B, vol. 58, no. 1, pp. 267–288, 1996.
[11]  J. Fan and R. Li, “Variable selection via nonconcave penalized likelihood and its oracle properties,” Journal of the American Statistical Association, vol. 96, no. 456, pp. 1348–1360, 2001.
[12]  H. Zou, “The adaptive lasso and its oracle properties,” Journal of the American Statistical Association, vol. 101, no. 476, pp. 1418–1429, 2006.
[13]  L. Breiman, “Heuristics of instability and stabilization in model selection,” Annals of Statistics, vol. 24, no. 6, pp. 2350–2383, 1996.
[14]  T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer, New York, NY, USA, 2009.
[15]  G. Claeskens and N. L. Hjort, “The focused information criterion,” Journal of the American Statistical Association, vol. 98, no. 464, pp. 900–916, 2003.
[16]  B. Wang and Y. Fang, “On the focused information criterion for variable selection,” submitted.
[17]  M. Yuan and Y. Lin, “Model selection and estimation in regression with grouped variables,” Journal of the Royal Statistical Society B, vol. 68, no. 1, pp. 49–67, 2006.
[18]  L. Wang, G. Chen, and H. Li, “Group SCAD regression analysis for microarray time course gene expression data,” Bioinformatics, vol. 23, no. 12, pp. 1486–1494, 2007.
[19]  N. L. Hjort and G. Claeskens, “Frequentist model average estimators,” Journal of the American Statistical Association, vol. 98, no. 464, pp. 879–899, 2003.
[20]  P. Bühlmann and S. van de Geer, Statistics for High-Dimensional Data: Methods, Theory and Applications, Springer, 2011.
[21]  H. Wang, R. Li, and C.-L. Tsai, “Tuning parameter selectors for the smoothly clipped absolute deviation method,” Biometrika, vol. 94, no. 3, pp. 553–568, 2007.
[22]  H. Wang, B. Li, and C. Leng, “Shrinkage tuning parameter selection with a diverging number of parameters,” Journal of the Royal Statistical Society B, vol. 71, no. 3, pp. 671–683, 2009.
[23]  Y. Zhang, R. Li, and C.-L. Tsai, “Regularization parameter selections via generalized information criterion,” Journal of the American Statistical Association, vol. 105, no. 489, pp. 312–323, 2010.
[24]  Y. Yang, “Can the strengths of AIC and BIC be shared? A conflict between model identification and regression estimation,” Biometrika, vol. 92, no. 4, pp. 937–950, 2005.
[25]  G. Claeskens, “Focused estimation and model averaging with penalization methods: an overview,” Statistica Neerlandica, vol. 66, no. 3, pp. 272–287, 2012.
[26]  C. Lim and B. Yu, “Estimation Stability with Cross Validation (ESCV),” http://arxiv.org/abs/1303.3128.
[27]  L. Stover, M. P. Gorga, S. T. Neely, and D. Montoya, “Toward optimizing the clinical utility of distortion product otoacoustic emission measurements,” Journal of the Acoustical Society of America, vol. 100, no. 2, part 1, pp. 956–967, 1996.
[28]  M. S. Pepe, “Three approaches to regression analysis of receiver operating characteristic curves for continuous test results,” Biometrics, vol. 54, no. 1, pp. 124–135, 1998.
[29]  L. E. Dodd and M. S. Pepe, “Semiparametric regression for the area under the receiver operating characteristic curve,” Journal of the American Statistical Association, vol. 98, no. 462, pp. 409–417, 2003.
[30]  J. Friedman, T. Hastie, and R. Tibshirani, A Note on the Group Lasso and a Sparse Group Lasso, 2010.
