Research on Data Filling Method of User Online Shopping Behavior Based on Random Forest

DOI: 10.12677/AIRR.2022.111003, PP. 19-26

Keywords: Users’ Online Purchase Behavior, Machine Learning, Random Forest, Missing Data Filling


Abstract:

Addressing the problem of predicting users' online shopping behavior, this paper studies the use of the random forest method to fill in missing values in user online shopping behavior data. First, data analysis is used to examine the distribution of missing values, the amount of missing data, and the dependencies among the missing values in the data set. Simple cases of missing data are handled with pairwise deletion and object deletion; the data set is then reconstructed, and the remaining missing values are filled in using the random forest method. Finally, prediction models of user online shopping behavior are built with several different algorithms, and their predictive performance on the data set before and after filling is compared. The comparison shows that the random forest method is effective and generally applicable for filling in missing data on user online shopping behavior.
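The filling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the algorithm's details, so the column-by-column iterative scheme below, the function name `rf_impute`, its parameters, and the synthetic data are all assumptions made for illustration. Each column with missing values is treated in turn as a regression target and predicted from the other columns with scikit-learn's `RandomForestRegressor`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rf_impute(X, n_iter=3, n_estimators=50, random_state=0):
    """Fill NaNs in a numeric 2-D array, one column at a time, with random forests.

    Hypothetical helper for illustration only; assumes every column has at
    least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    # Initial guess: replace each missing entry with its column mean.
    col_means = np.nanmean(X, axis=0)
    filled[missing] = np.take(col_means, np.where(missing)[1])

    # Visit columns from the fewest to the most missing values.
    order = np.argsort(missing.sum(axis=0))
    for _ in range(n_iter):
        for j in order:
            rows_missing = missing[:, j]
            if not rows_missing.any() or rows_missing.all():
                continue  # nothing to fill, or nothing to learn from
            others = np.delete(filled, j, axis=1)
            model = RandomForestRegressor(
                n_estimators=n_estimators, random_state=random_state
            )
            # Train on rows where column j is observed, predict the rest.
            model.fit(others[~rows_missing], filled[~rows_missing, j])
            filled[rows_missing, j] = model.predict(others[rows_missing])
    return filled


# Synthetic data (not from the paper): column 3 depends on columns 0 and 1.
rng = np.random.default_rng(0)
X_true = rng.normal(size=(200, 4))
X_true[:, 3] = X_true[:, 0] + 0.5 * X_true[:, 1]

# Remove roughly 10% of the entries at random.
mask = rng.random(X_true.shape) < 0.1
X_missing = X_true.copy()
X_missing[mask] = np.nan

print("missing values per column:", np.isnan(X_missing).sum(axis=0))
X_filled = rf_impute(X_missing)
print("missing values after filling:", int(np.isnan(X_filled).sum()))
```

The sketch changes only the missing entries; observed values are left untouched. The abstract's evaluation would then train the same prediction models on the data before and after filling and compare their scores.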

