A Study of Factors Influencing Car Purchase Intention Based on Multinomial Ordered Logistic Regression
Abstract:
This paper uses the 1,728 car evaluation records in the UCI Machine Learning Repository to examine which factors influence consumers' car purchase intention and how well an ordered logistic model performs in classification. The data set contains seven variables: consumers' purchase intention is the dependent variable, and purchase price, maintenance cost, number of doors, number of seats, interior space, and safety are the independent variables. All variables except the number of doors and the number of seats are categorical. Using the statistical software R, we fit a multinomial ordered logistic regression model and find that: 1) purchase price and maintenance cost have a significant negative effect on purchase intention, while the number of doors, number of seats, interior space, and safety have a significant positive effect; 2) safety has the largest effect, with a regression coefficient of 2.743 and an odds ratio (OR) of 15.531, meaning that a one-unit increase in safety multiplies the odds of falling in a higher purchase-intention category by 15.531; 3) when the fitted model is used to predict the test set (the last 30% of the data), overall multi-class prediction accuracy reaches 0.815. By category of the dependent variable, Class: 0 (unacc) and Class: 3 (Vgood) have the highest prediction accuracy, at 0.830 and 0.829 respectively, while Class: 2 (good) has the lowest, at only 0.499.
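The odds-ratio reading in finding 2 can be checked numerically. The sketch below is illustrative Python, not the R code used in the paper. It assumes the proportional-odds form logit P(Y ≤ j) = θ_j − η (the parameterization used by, for example, `MASS::polr` in R). The cut points in `thresholds` are hypothetical values chosen for illustration, not estimates from the paper; only the coefficient 2.743 comes from the abstract.

```python
import math


def odds_ratio(coef: float) -> float:
    """Odds ratio implied by a coefficient on the logit scale."""
    return math.exp(coef)


def category_probs(eta: float, thresholds: list[float]) -> list[float]:
    """Category probabilities under a proportional-odds model.

    Assumes P(Y <= j) = logistic(theta_j - eta) for each cut point theta_j.
    """
    cum = [1.0 / (1.0 + math.exp(-(t - eta))) for t in thresholds] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


def odds_above(probs: list[float], j: int) -> float:
    """Odds of landing in a category strictly above j."""
    p_above = sum(probs[j + 1:])
    return p_above / (1.0 - p_above)


safety_coef = 2.743            # reported in the abstract
thresholds = [0.0, 2.0, 4.0]   # hypothetical cut points for 4 ordered classes

or_safety = odds_ratio(safety_coef)
p_base = category_probs(0.0, thresholds)          # baseline linear predictor
p_plus = category_probs(safety_coef, thresholds)  # safety raised by one unit

# The ratio of odds is the same at every cut point (proportional odds).
ratios = [odds_above(p_plus, j) / odds_above(p_base, j) for j in range(3)]
print(round(or_safety, 3), [round(r, 3) for r in ratios])
```

Under these assumptions, exponentiating the rounded coefficient gives a value close to the reported 15.531 (the small gap is consistent with rounding of the coefficient), and the same multiplier applies to the odds of exceeding each cut point. This is why the OR describes a change in odds rather than a 15.531-fold change in probability.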