
Nonparametric Feature Screening via the Variance of the Regression Function

DOI: 10.4236/ojs.2024.144017, PP. 413-438

Keywords: Sure Independence Screening, Nonparametric Regression, Ultrahigh-Dimensional Data, Variable Selection


Abstract:

This article develops a procedure for screening variables in ultrahigh-dimensional settings based on their predictive significance. Variables are ranked according to the variance of their respective marginal regression functions, a procedure referred to as RV-SIS. We show that, under mild technical conditions, RV-SIS possesses the sure screening property as defined by Fan and Lv (2008). Numerical comparisons suggest that RV-SIS performs competitively with other screening procedures and outperforms them in many different model settings.
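
The ranking idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the marginal regression functions are estimated, so this example assumes a Nadaraya-Watson fit with a Gaussian kernel and a rule-of-thumb bandwidth. The function names `marginal_regression_variance` and `rv_screen` are hypothetical and are not taken from the paper.

```python
import numpy as np


def marginal_regression_variance(x, y, bandwidth=None):
    """Estimate Var(m(X)), where m(x) = E[Y | X = x], for one predictor.

    Assumption of this sketch: m is estimated by a Nadaraya-Watson
    smoother with a Gaussian kernel; the paper may use another estimator.
    """
    n = x.shape[0]
    if bandwidth is None:
        # Rule-of-thumb bandwidth, chosen here only for illustration.
        bandwidth = 1.06 * np.std(x) * n ** (-1 / 5)
    # Pairwise Gaussian kernel weights between the observed points.
    diffs = (x[:, None] - x[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs ** 2)
    # Fitted marginal regression function evaluated at each observation.
    fitted = weights @ y / weights.sum(axis=1)
    # Sample variance of the fitted values is the screening statistic.
    return np.var(fitted)


def rv_screen(X, y, d):
    """Rank predictors by estimated regression-function variance.

    Returns the indices of the d highest-scoring columns of X
    (largest score first) together with the score of every column.
    """
    scores = np.array(
        [marginal_regression_variance(X[:, j], y) for j in range(X.shape[1])]
    )
    ranking = np.argsort(scores)[::-1]
    return ranking[:d], scores
```

A call such as `selected, scores = rv_screen(X, y, d=10)` on an `n × p` design matrix `X` keeps the ten predictors with the largest scores. Because the statistic depends only on the fitted marginal function, it can pick up nonlinear marginal effects (for example a purely quadratic one) that a linear-correlation ranking would miss.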

References

[1]  Fan, J.Q., Samworth, R. and Wu, Y.C. (2009) Ultrahigh Dimensional Feature Selection: Beyond the Linear Model. The Journal of Machine Learning Research, 10, 2013-2038.
[2]  Fan, J. and Lv, J. (2008) Sure Independence Screening for Ultrahigh Dimensional Feature Space. Journal of the Royal Statistical Society Series B: Statistical Methodology, 70, 849-911.
https://doi.org/10.1111/j.1467-9868.2008.00674.x
[3]  Fan, J., Feng, Y. and Song, R. (2011) Nonparametric Independence Screening in Sparse Ultra-High-Dimensional Additive Models. Journal of the American Statistical Association, 106, 544-557.
https://doi.org/10.1198/jasa.2011.tm09779
[4]  Li, R., Zhong, W. and Zhu, L. (2012) Feature Screening via Distance Correlation Learning. Journal of the American Statistical Association, 107, 1129-1139.
https://doi.org/10.1080/01621459.2012.695654
[5]  Li, G.R., Peng, H., Zhang, J. and Zhu, L.X. (2012) Robust Rank Correlation Based Screening. The Annals of Statistics, 40, 1846-1877.
[6]  Wang, Z. and Deng, G. (2022) Model-Free Feature Screening Based on Gini Impurity for Ultrahigh-Dimensional Multiclass Classification. Open Journal of Statistics, 12, 711-732.
https://doi.org/10.4236/ojs.2022.125042
[7]  Chen, T. and Deng, G. (2023) Model-Free Feature Screening via Maximal Information Coefficient (MIC) for Ultrahigh-Dimensional Multiclass Classification. Open Journal of Statistics, 13, 917-940.
https://doi.org/10.4236/ojs.2023.136046
[8]  Wang, L., Akritas, M.G. and Van Keilegom, I. (2008) An Anova-Type Nonparametric Diagnostic Test for Heteroscedastic Regression Models. Journal of Nonparametric Statistics, 20, 365-382.
https://doi.org/10.1080/10485250802066112
[9]  Zhu, L., Li, L., Li, R. and Zhu, L. (2011) Model-Free Feature Screening for Ultrahigh-Dimensional Data. Journal of the American Statistical Association, 106, 1464-1475.
https://doi.org/10.1198/jasa.2011.tm10563
[10]  Segal, M.R., Dahlquist, K.D. and Conklin, B.R. (2003) Regression Approaches for Microarray Data Analysis. Journal of Computational Biology, 10, 961-980.
https://doi.org/10.1089/106652703322756177
[11]  Doksum, K. and Samarov, A. (1995) Nonparametric Estimation of Global Functionals and a Measure of the Explanatory Power of Covariates in Regression. The Annals of Statistics, 23, 1443-1473.
https://doi.org/10.1214/aos/1176324307
[12]  Kim, D., Li, R., Dudek, S.M., Frase, A.T., Pendergrass, S.A. and Ritchie, M.D. (2014) Knowledge-Driven Genomic Interactions: An Application in Ovarian Cancer. BioData Mining, 7, Article No. 20.
https://doi.org/10.1186/1756-0381-7-20
[13]  Chernoff, H., Lo, S. and Zheng, T. (2009) Discovering Influential Variables: A Method of Partitions. The Annals of Applied Statistics, 3, 1335-1369.
https://doi.org/10.1214/09-aoas265
[14]  Serfling, R.J. (2009) Approximation Theorems of Mathematical Statistics. Wiley.
[15]  Hansen, B.E. (2008) Uniform Convergence Rates for Kernel Estimation with Dependent Data. Econometric Theory, Cambridge University Press.
