Cumulative Claim Amount Prediction and Pricing Based on a Two-Layer Stacking Model
Abstract:
In traditional claim amount prediction, the generalized linear model (GLM) is a commonly used method. In recent years, machine learning algorithms have also achieved good results in this field, offering a new option for claim amount prediction. In the era of big data, how to predict more accurately remains an urgent problem. To address it, this paper uses a two-layer Stacking model, two other ensemble learning algorithms, and a generalized linear model to predict the cumulative claim amount. Comparing the root mean square error and squared absolute error of each algorithm shows that every ensemble algorithm, including Stacking, is more accurate than the traditional generalized linear model. Finally, the paper uses the cumulative claim amount to establish the transition rules of a bonus-malus system; combining these rules with ensemble learning allows new insurance products to be developed on a more reasonable basis.
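The two-layer Stacking idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the data are synthetic, the first layer uses a least-squares line (a stand-in for a Gaussian GLM with identity link) and a k-nearest-neighbour average, and the second layer is a simple convex blend fitted on out-of-fold predictions. The helper names (`fit_linear`, `fit_knn`, `out_of_fold`, `fit_stacking`) are hypothetical.

```python
import math
import random
from statistics import mean

def fit_linear(xs, ys):
    """Least-squares line y = a + b*x (stand-in for a Gaussian GLM)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x

def fit_knn(xs, ys, k=7):
    """Average of the k nearest training targets on a single feature."""
    data = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(data, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))

def out_of_fold(fit, xs, ys, n_folds=5):
    """Layer 1: predict each point with a model that never saw it."""
    oof = [0.0] * len(xs)
    for f in range(n_folds):
        idx = [i for i in range(len(xs)) if i % n_folds != f]
        model = fit([xs[i] for i in idx], [ys[i] for i in idx])
        for i in range(f, len(xs), n_folds):
            oof[i] = model(xs[i])
    return oof

def fit_stacking(xs, ys, fit_a, fit_b, n_folds=5):
    """Layer 2: blend weight w in [0, 1] fitted on out-of-fold predictions."""
    pa = out_of_fold(fit_a, xs, ys, n_folds)
    pb = out_of_fold(fit_b, xs, ys, n_folds)
    den = sum((a - b) ** 2 for a, b in zip(pa, pb))
    num = sum((y - b) * (a - b) for y, a, b in zip(ys, pa, pb))
    w = min(1.0, max(0.0, num / den)) if den else 0.5
    # Refit both base learners on all training data for final predictions.
    model_a, model_b = fit_a(xs, ys), fit_b(xs, ys)
    return lambda x: w * model_a(x) + (1 - w) * model_b(x)

# Synthetic data (assumption): one rating factor, nonlinear claim amount.
random.seed(0)
xs = [random.uniform(18, 80) for _ in range(300)]
ys = [2000 + 0.8 * (x - 45) ** 2 + random.gauss(0, 150) for x in xs]
train_x, test_x = xs[:200], xs[200:]
train_y, test_y = ys[:200], ys[200:]

models = {
    "linear": fit_linear(train_x, train_y),
    "knn": fit_knn(train_x, train_y),
    "stacking": fit_stacking(train_x, train_y, fit_linear, fit_knn),
}
for name, model in models.items():
    print(name, round(rmse(test_y, [model(x) for x in test_x]), 2))
```

Fitting the second layer on out-of-fold predictions, rather than on in-sample fits, keeps the blend weight from simply rewarding whichever base learner overfits the training data most. A practical implementation would more likely use a library such as scikit-learn with several features and a richer meta-learner.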