Robust Expectile Regression for High-Dimensional Partially Linear Additive Models
Abstract:
High-dimensional data are often heterogeneous because of heteroscedasticity or non-homogeneous covariates. Quantile regression and expectile regression are powerful tools for analyzing heterogeneous high-dimensional data, but the former is computationally challenging because its loss function is non-smooth, while the latter is not robust to outliers. This paper uses a class of robust asymmetric loss functions to study robust expectile regression for partially linear additive models. The nonparametric components are approximated by B-spline basis functions, and a regularization method with a non-convex penalty performs variable selection and parameter estimation. The method has three advantages: (1) by taking different asymmetry levels, it recovers a more complete picture of the conditional distribution of the response and thus reveals heterogeneity in the data; (2) the partially linear structure accommodates both covariates with linear effects and covariates with nonlinear effects, which makes the model more flexible while retaining a degree of interpretability; (3) robust expectile regression is computationally more efficient than quantile regression and more robust than expectile regression. Both numerical simulations and a real-data analysis show the advantages of the proposed method in model estimation and computational efficiency.
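The abstract does not spell out the exact loss, penalty, or fitting algorithm, so the following is only a minimal sketch under stated assumptions. It assumes the model Y = X^T beta + sum_j g_j(Z_j) + eps, and (i) the asymmetric Huber loss L(u) = |tau - 1{u < 0}| * h_gamma(u) used by the retire estimator of Man, Tan, Wang and Zhou (2024), where h_gamma is the Huber loss; (ii) a clamped cubic B-spline basis with interior knots at sample quantiles for each g_j; and (iii) proximal gradient descent with the MCP penalty on the linear coefficients only, chosen here as one example of a non-convex penalty. The function names `asym_huber`, `bspline_basis`, `mcp_prox`, `fit_plam_retire` and defaults such as gamma = 1.345 and lam = 0.1 are hypothetical.

```python
import numpy as np


def asym_huber(u, tau, gamma):
    """Asymmetric Huber loss |tau - 1{u<0}| * h_gamma(u) (assumed retire-style loss)."""
    w = np.where(u < 0, 1.0 - tau, tau)
    a = np.abs(u)
    h = np.where(a <= gamma, 0.5 * u**2, gamma * a - 0.5 * gamma**2)
    return w * h


def asym_huber_grad(u, tau, gamma):
    """Derivative of asym_huber: asymmetric weight times the clipped residual."""
    w = np.where(u < 0, 1.0 - tau, tau)
    return w * np.clip(u, -gamma, gamma)


def bspline_basis(x, n_interior=3, degree=3):
    """Clamped B-spline basis via the Cox-de Boor recursion.

    Interior knots sit at sample quantiles of x; the result has
    n_interior + degree + 1 columns and each row sums to 1.
    """
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    inner = np.quantile(x, np.linspace(0, 1, n_interior + 2)[1:-1])
    t = np.concatenate([[lo] * (degree + 1), inner, [hi] * (degree + 1)])
    m = len(t) - 1  # number of degree-0 pieces
    B = np.zeros((x.size, m))
    for i in range(m):
        if t[i] < t[i + 1]:
            B[:, i] = (x >= t[i]) & (x < t[i + 1])
    # Close the last non-empty interval so that x == hi is not dropped.
    last = max(i for i in range(m) if t[i] < t[i + 1])
    B[x == hi, last] = 1.0
    for d in range(1, degree + 1):
        Bn = np.zeros((x.size, m - d))
        for i in range(m - d):
            left, right = t[i + d] - t[i], t[i + d + 1] - t[i + 1]
            if left > 0:
                Bn[:, i] += (x - t[i]) / left * B[:, i]
            if right > 0:
                Bn[:, i] += (t[i + d + 1] - x) / right * B[:, i + 1]
        B = Bn
    return B


def mcp_prox(z, lam, a, eta):
    """Proximal map of eta * MCP(lam, a), valid when eta < a."""
    az = np.abs(z)
    shrunk = np.sign(z) * np.maximum(az - eta * lam, 0.0) / (1.0 - eta / a)
    return np.where(az > a * lam, z, shrunk)


def fit_plam_retire(y, X, Z, tau=0.5, gamma=1.345, lam=0.1, a=3.0,
                    n_iter=3000, n_interior=3, degree=3):
    """Proximal gradient for MCP-penalized asymmetric-Huber regression of a
    partially linear additive model; only the linear coefficients are penalized."""
    n, p = X.shape
    # Drop one basis column per covariate: B-splines sum to 1, which would
    # otherwise be collinear with the intercept.
    splines = [bspline_basis(Z[:, j], n_interior, degree)[:, 1:]
               for j in range(Z.shape[1])]
    D = np.column_stack([np.ones(n), X] + splines)
    # Lipschitz bound of the gradient: weight <= max(tau, 1 - tau), Huber curvature <= 1.
    lip = max(tau, 1.0 - tau) * np.linalg.norm(D, 2) ** 2 / n
    eta = min(1.0 / lip, 0.5 * a)  # keeps eta < a so the MCP prox is well defined
    theta = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = -D.T @ asym_huber_grad(y - D @ theta, tau, gamma) / n
        theta = theta - eta * grad
        theta[1:p + 1] = mcp_prox(theta[1:p + 1], lam, a, eta)
    return theta[0], theta[1:p + 1], theta[p + 1:]
```

Refitting with several values of `tau` would trace out different expectile levels, which is how advantage (1) would be used in practice. In this sketch gamma and lam are fixed for simplicity; a data-driven choice (for example cross-validation, or scaling gamma with the noise level as retire does) would normally replace them.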