Processing large datasets on a single machine takes a long time and places high demands on hardware. To address this, this paper proposes a Distributed Logistic Regression (DLR) model, run in a cloud environment, to analyze the correlation between PM10 and visibility and humidity. Following a binary classification approach, visibility and humidity serve as the features and PM10 as the label, and a logistic regression model is built on them. The experimental results show that, within the same humidity range, lower visibility corresponds to a higher atmospheric aerosol PM10 concentration, and that, within the same visibility range, higher humidity corresponds to a lower PM10 concentration. The DLR model also outperforms the traditional regression model in running time and shows better robustness and real-time performance.
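The setup described in the abstract (visibility and humidity as features, a binary PM10 label, and training spread across workers) can be illustrated with a minimal sketch. It is not the paper's implementation. The data below are synthetic and generated from an assumed rule, the four in-memory partitions stand in for cloud workers, and the names `partition_gradient`, `train`, and `make_synthetic` are hypothetical. The only property it relies on is that the log-loss gradient is a sum over samples, so each partition can compute its share independently before the shares are added together.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def partition_gradient(rows, w):
    """'Map' step: log-loss gradient summed over one data partition."""
    grad = [0.0] * len(w)
    for x, y in rows:
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return grad


def train(partitions, n_features, lr=0.5, epochs=200):
    """'Reduce' step: add the per-partition gradients, then update weights."""
    w = [0.0] * n_features
    n = sum(len(p) for p in partitions)
    for _ in range(epochs):
        # Each call is independent, so it could run on a separate worker.
        grads = [partition_gradient(p, w) for p in partitions]
        total = [sum(g[j] for g in grads) for j in range(n_features)]
        w = [wj - lr * tj / n for wj, tj in zip(w, total)]
    return w


def make_synthetic(n, seed=0):
    """Synthetic rows ([bias, visibility, humidity], label); NOT paper data.

    Assumed generating rule: the label 1 ("high PM10") becomes less likely
    as scaled visibility or scaled humidity increases.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        vis, hum = rng.random(), rng.random()  # both scaled to [0, 1]
        p_high = sigmoid(4.0 * (0.5 - vis) + 2.0 * (0.5 - hum))
        rows.append(([1.0, vis, hum], 1 if rng.random() < p_high else 0))
    return rows


rows = make_synthetic(1000)
partitions = [rows[i::4] for i in range(4)]  # four simulated workers
weights = train(partitions, n_features=3)    # [bias, visibility, humidity]
```

In a real deployment the list comprehension over `partitions` would be replaced by whatever parallel map the cloud framework provides; the weight update itself is unchanged.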