This study analyzes longitudinal data on the protein content of cow's milk using R. It compares the predictive performance of six machine learning methods (decision trees, boosting, bagging, random forests, neural networks, and support vector machines) with that of the linear mixed model with random effects, the traditional approach to longitudinal data. The training set is varied and 8-fold cross-validation is applied. Analysis of the resulting normalized mean square error shows that, for this dataset, the traditional linear mixed model is clearly inferior to the machine learning methods for both long-term and short-term forecasting. Here long-term forecasting corresponds to a larger training set and a smaller test set in machine learning terminology. The machine learning methods also prove robust as the training set changes.
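As a rough illustration of the evaluation procedure, the sketch below computes a normalized mean square error (NMSE) under 8-fold cross-validation. It is written in Python rather than the R used in the study, and it rests on several assumptions that are not taken from the paper: NMSE is defined here as the squared prediction error divided by the squared error of predicting the test-set mean, folds are drawn over individual observations, the model is a plain least-squares line, and the data are synthetic. The names `nmse`, `kfold_indices`, `fit_line`, and `cv_nmse` are hypothetical helpers.

```python
import random


def nmse(y_true, y_pred):
    """Normalized MSE: squared error relative to predicting the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean_y) ** 2 for t in y_true)
    return sse / sst


def kfold_indices(n, k=8, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns a function that predicts y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return lambda xs: [a + b * xi for xi in xs]


def cv_nmse(x, y, fit, k=8, seed=0):
    """Average NMSE over k folds; each fold is held out once as the test set."""
    scores = []
    for test in kfold_indices(len(x), k, seed):
        held_out = set(test)
        train = [i for i in range(len(x)) if i not in held_out]
        predict = fit([x[i] for i in train], [y[i] for i in train])
        y_hat = predict([x[i] for i in test])
        scores.append(nmse([y[i] for i in test], y_hat))
    return sum(scores) / len(scores)


# Synthetic stand-in data (NOT the milk protein data): a noisy linear trend.
rng = random.Random(1)
weeks = [rng.uniform(1, 19) for _ in range(160)]
protein = [3.6 - 0.05 * w + rng.gauss(0, 0.1) for w in weeks]

score = cv_nmse(weeks, protein, fit_line)
print(f"8-fold cross-validated NMSE: {score:.3f}")
```

Under this definition, an NMSE below 1 means the model predicts better than the test-set mean, and values closer to 0 are better. Any of the compared methods could be substituted for `fit_line`, provided it takes training inputs and responses and returns a prediction function. For longitudinal data, splitting folds by subject rather than by observation may be more appropriate; the sketch does not do this.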
Collins, L.M. (2006) Analysis of longitudinal data: The integration of theoretical model, temporal design, and statistical model. Annual Review of Psychology, 57, 505-528. http://dx.doi.org/10.1146/annurev.psych.57.102904.190146