2019

Quality Analysis and Prediction Method for Hot-Rolled Strip Based on Random Forests

DOI: 10.12068/j.issn.1005-3026.2019.01.003

Key words: hot-rolled strip; defect prediction; data driven; feature selection; random forests

Abstract: Production data for hot-rolled strip from an iron and steel enterprise were analyzed with an improved random forest algorithm to uncover the implicit relationship between process parameters and product quality. The key process parameters affecting product quality were extracted as features, and a defect prediction model for hot-rolled strip was built. Experimental results show that balancing the imbalanced data sets improves prediction accuracy, and that combining CART and C4.5 improves accuracy further compared with either method alone. Furthermore, taking into account whether features are highly or weakly correlated with the response variable, mutual information was used as the evaluation criterion for feature selection, which improves the classification performance of the random forest algorithm. With these three improvement strategies combined, the recognition rate of hot-rolled strip defects is clearly improved.
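The abstract does not say which balancing technique was applied to the imbalanced defect data. As a minimal sketch, assuming simple random oversampling of the minority classes (undersampling or SMOTE would be equally plausible choices), balancing could look like the following; the helper `random_oversample` and the toy data are hypothetical, not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate randomly chosen rows of each smaller
    class until every class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Indices of all original samples belonging to this class.
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            j = rng.choice(idx)
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out


# Toy data (assumed): 8 defect-free strips (0) and 2 defective strips (1).
X = [[float(i), float(i % 3)] for i in range(10)]
y = [0] * 8 + [1] * 2
Xb, yb = random_oversample(X, y)
print(Counter(yb))  # Counter({0: 8, 1: 8})
```

Oversampling should be applied only to the training split, so that duplicated defect samples do not leak into the data used to measure prediction accuracy.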
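CART selects splits by the reduction in Gini impurity, whereas C4.5 uses the information gain ratio. The abstract does not specify how the two were combined; one plausible reading, assumed here, is that each tree in the forest is grown with one of the two criteria, for example alternating between them. The sketch below scores a single binary split under each criterion; `gini_gain`, `gain_ratio` and `criterion_for_tree` are hypothetical names, and the alternating scheme is an assumption rather than the authors' method.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity used by CART: 1 - sum_k p_k^2 (labels must be non-empty)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits, the basis of C4.5's information gain."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini_gain(parent, left, right):
    """CART score: decrease in weighted Gini impurity (children non-empty)."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


def gain_ratio(parent, left, right):
    """C4.5 score: information gain divided by split information."""
    n = len(parent)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    gain = entropy(parent) - weighted
    # Split information penalizes splits that fragment the data.
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return gain / split_info if split_info > 0 else 0.0


def criterion_for_tree(tree_index):
    """Assumed hybrid scheme: even-numbered trees use CART, odd ones C4.5."""
    return gini_gain if tree_index % 2 == 0 else gain_ratio


parent = [0, 0, 0, 0, 1, 1]
left, right = [0, 0, 0, 0], [1, 1]  # a perfect split
print(round(gini_gain(parent, left, right), 4))   # 0.4444
print(round(gain_ratio(parent, left, right), 4))  # 1.0
```

Mixing criteria in this way adds diversity among the trees, which is one reason a combination could outperform a forest built with a single criterion.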
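Mutual information measures how much knowing a discretized process parameter reduces uncertainty about the defect label, $I(X;Y)=\sum_{x,y} p(x,y)\log_2\frac{p(x,y)}{p(x)\,p(y)}$. The abstract gives neither the exact selection rule nor how highly and weakly correlated features are treated, so the sketch below, under those assumptions, only ranks already-discretized features by their mutual information with the label and keeps the top k; the feature names and data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """I(X;Y) in bits for two equally long sequences of discrete values."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts.
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def select_top_k(features, labels, k):
    """Return the names of the k features with the highest MI to the label."""
    scores = {name: mutual_information(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical discretized process parameters for eight strips.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "finish_temp_bin": [0, 0, 0, 0, 1, 1, 1, 1],  # identical to the label
    "speed_bin": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label
    "thickness_bin": [0, 0, 0, 1, 1, 1, 1, 1],  # partly informative
}
print(select_top_k(features, labels, 2))  # ['finish_temp_bin', 'thickness_bin']
```

Continuous process parameters would first need to be binned; a fuller method would also penalize features that are highly correlated with features already selected.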