Abalone Age Prediction Based on Regression Methods
Abstract:
This paper predicts the age of abalone from physical measurements. Using R, linear regression, logistic regression, ridge regression, and LASSO regression models are fitted to the measurement data to predict abalone age. The models are evaluated by mean absolute error (MAE), mean squared error (MSE), and symmetric mean absolute percentage error (SMAPE). The results show that the LASSO regression model achieves the best goodness of fit of the four. The predictor variables are strongly correlated, so multicollinearity may be present. To remove its effect on the model, the variables are reduced in dimension with two methods, partial least squares (PLS) and principal component analysis (PCA), and regression is then performed on the reduced variables. When evaluated by MSE, neither dimensionality-reduction method lowers the error; instead, both reduced models have a larger MSE.
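The abstract reports that LASSO fit best but does not show how a LASSO fit is computed. As an illustration only, the sketch below minimizes the standard LASSO objective (1/(2n)) * ||y - b0 - X b||^2 + lam * ||b||_1. It uses cyclic coordinate descent with soft thresholding, the general approach popularized by R's glmnet package. It is written in plain Python rather than R, and it does not standardize the predictors. The function names, the toy data, and the penalty values are hypothetical. This is not the paper's code, and the excerpt does not say which R package the authors used.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; return exactly 0 inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=1000, tol=1e-8):
    """Hypothetical helper: minimize (1/(2n))*||y - b0 - X b||^2 + lam*||b||_1
    by cyclic coordinate descent. X is a list of rows, y a list of responses.
    Returns (intercept, coefficient list)."""
    n, p = len(y), len(X[0])
    # Center predictors and response; the intercept is recovered at the end.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual for beta = 0
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue  # constant column: coefficient stays 0
            # Correlation of column j with the partial residual (excluding j).
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break  # coefficients have stopped moving
    b0 = y_mean - sum(x_mean[j] * beta[j] for j in range(p))
    return b0, beta
```

With lam = 0 this reduces to ordinary least squares. Increasing lam shrinks the coefficients toward zero and sets some of them exactly to zero. That is how LASSO can drop redundant, highly correlated measurements, which is the multicollinearity concern the abstract raises.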
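The abstract names MAE, MSE and SMAPE as evaluation criteria without giving their formulas. The sketch below uses the commonly cited definitions, in plain Python rather than the R used in the paper. SMAPE has several variants in the literature. The one used here is the mean of |F - A| / ((|A| + |F|) / 2), times 100, where A is the actual value and F the prediction. It also treats the 0/0 case as zero error. Both choices are assumptions, since the excerpt does not state the paper's exact formula. The function names and sample values are hypothetical.

```python
def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    return sum(abs(a - f) for a, f in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error, in squared units of the target."""
    return sum((a - f) ** 2 for a, f in zip(y_true, y_pred)) / len(y_true)


def smape(y_true, y_pred):
    """Symmetric MAPE in percent (assumed variant: denominator (|A|+|F|)/2)."""
    total = 0.0
    for a, f in zip(y_true, y_pred):
        denom = (abs(a) + abs(f)) / 2
        # Assumption: a pair with A = F = 0 contributes zero error.
        total += 0.0 if denom == 0 else abs(f - a) / denom
    return 100.0 * total / len(y_true)


# Hypothetical actual and predicted values, for illustration only.
actual = [10, 9, 7, 11]
predicted = [9.5, 9, 8, 10]
print(mae(actual, predicted), mse(actual, predicted), smape(actual, predicted))
```

Because MSE squares each error, it penalizes a few large misses more heavily than MAE does. SMAPE is expressed as a percentage and is therefore independent of the scale of the target.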