Research on Domestic Film Box Office Based on the Random Forest Method
Abstract:
With economic growth and rising living standards, the film industry has developed rapidly. Studying and forecasting the factors that influence box office revenue is necessary for improving the quality of domestic films. Most existing studies use neural networks to model box office revenue, but these methods do not provide a ranking of variable importance, and their predictions are not sufficiently robust. Using data on 225 domestic films released from 2014 to 2018, this paper builds a box office prediction model with the random forest method. The results indicate that the main factors influencing the box office of Chinese domestic films are first-weekend box office, opening-day box office, Baidu Index, Douban score, and advance-screening box office. For comparison, linear regression and neural network models are also fitted, and all three methods are used to predict the box office of 12 domestic films released in 2019. The results show that the random forest is more accurate and robust: for eight films, including "Pegasus" (《飞驰人生》) and "Looking Up" (《银河补习班》), the prediction error is around 10%. The neural network and linear regression models produce larger prediction errors.
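The workflow summarized in the abstract (fit a random forest, rank the predictors by importance, then compare per-film errors against a linear baseline) can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic and randomly generated, not the paper's 225 films; the feature names, the toy relationship between features and total box office, and the use of scikit-learn's impurity-based `feature_importances_` are all choices made here, and the paper's own software and importance measure may differ. Relative error is assumed to mean |predicted - actual| / actual.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data. These values are randomly generated for
# illustration and are NOT the films or figures used in the paper.
rng = np.random.default_rng(0)
feature_names = [
    "first_weekend", "first_day", "baidu_index",
    "douban_score", "advance_screening",
]
n_train, n_test = 225, 12  # sample sizes mirror the abstract; the data do not
X = rng.uniform(0.0, 1.0, size=(n_train + n_test, len(feature_names)))

# Assumed toy relationship: total box office is driven mainly by the
# first weekend, with a nonlinear interaction and a little noise.
y = (
    1.0
    + 3.0 * X[:, 0]
    + 1.5 * X[:, 0] * X[:, 1]
    + 0.3 * X[:, 3]
    + rng.normal(0.0, 0.05, size=n_train + n_test)
)

X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:], y[n_train:]

# Fit the random forest and a linear regression baseline.
forest = RandomForestRegressor(n_estimators=300, random_state=0)
forest.fit(X_train, y_train)
linear = LinearRegression().fit(X_train, y_train)

# Rank predictors by the forest's impurity-based importance (sums to 1).
ranking = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)


def relative_errors(model, features, actual):
    """Per-film relative error |predicted - actual| / |actual|."""
    return np.abs(model.predict(features) - actual) / np.abs(actual)


forest_err = relative_errors(forest, X_test, y_test)
linear_err = relative_errors(linear, X_test, y_test)

for name, importance in ranking:
    print(f"{name:18s} {importance:.3f}")
print("films within 10% (forest):", int((forest_err <= 0.10).sum()))
print("films within 10% (linear):", int((linear_err <= 0.10).sum()))
```

On real data the importance ranking and the error comparison depend entirely on the films and predictors collected, so the numbers printed by this sketch say nothing about the paper's findings.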