This paper takes the characteristics of customers who purchase insurance as its
research object and, using association analysis and logistic regression,
predicts the policy renewal rate. First, the data are preprocessed: the usable
variables are converted into dummy variables so that association analysis can
be applied. Discrete and continuous data are handled separately. For discrete
data, the discretized intervals or attribute values are converted directly into
“items”. For continuous data, an iterative bisection approach is introduced:
the continuous attribute values are partitioned into optimal intervals on the
basis of support and confidence, and a new item is created for each distinct
attribute value to obtain the dummy variables of the continuous attribute.
Association analysis is then performed on the data produced by this
quantitative association rule step. All other variables that appear in strong
association rules containing the “whether to renew” variable are selected as
the main factors influencing renewal. There are six: vehicle age, renewal year,
age of the insured, new-vehicle purchase price, written premium, and
third-party liability premium. A logistic regression model is used to obtain
the relationship between these factors and the renewal rate, and the renewal
rate is then predicted, with a goodness of fit of 96.7%. Finally, a Z statistic
is constructed so that customer attributes can be identified through
statistical inference, enabling accurate customer profiling.
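
To make the support and confidence criteria behind the strong association rules concrete, the following is a minimal Python sketch, not the paper's implementation. The function names, the item labels such as `vehicle_age=low`, and the four toy records are all hypothetical and chosen only for illustration. It assumes the attributes have already been converted into items.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(transactions: List[Transaction], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(
    transactions: List[Transaction],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> float:
    """support(antecedent and consequent together) / support(antecedent).

    Returns 0.0 when the antecedent never occurs.
    """
    base = support(transactions, antecedent)
    if base == 0.0:
        return 0.0
    return support(transactions, antecedent | consequent) / base


# Hypothetical toy records, already expressed as items (assumption).
records: List[Transaction] = [
    frozenset({"vehicle_age=low", "renewed=yes"}),
    frozenset({"vehicle_age=low", "renewed=yes"}),
    frozenset({"vehicle_age=high", "renewed=no"}),
    frozenset({"vehicle_age=low", "renewed=no"}),
]

antecedent = frozenset({"vehicle_age=low"})
consequent = frozenset({"renewed=yes"})

# Rule under test: {vehicle_age=low} -> {renewed=yes}
rule_support = support(records, antecedent | consequent)
rule_confidence = confidence(records, antecedent, consequent)
```

A rule would count as strong when both values exceed user-chosen minimum thresholds. The thresholds themselves are not stated in the abstract, so none are assumed here.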