|
- 2018
Expressway Abnormal Event Identification Method Based on Fast Peak Clustering
|
Abstract:
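To perceive expressway traffic operating conditions accurately and comprehensively, a data mining method for identifying abnormal traffic events on expressways is proposed, based on massive expressway toll data. First, expressway toll data for Guizhou Province from January 2017 are selected; records for the specified entry and exit stations are filtered, redundant fields are removed, and each vehicle's travel time on the road section is calculated from the times at which it enters and leaves the toll stations. Then, a fast peak clustering algorithm is used to cluster travel time and total vehicle weight: the Euclidean distances between data points are computed, and the resulting distance matrix is used as the algorithm input to compute two indicators for each data point, its local density ρ and its distance δ to points of higher density. Points for which both indicators are high are taken as cluster centers; the non-center points are then classified and optimized, and the clustering result is output. Besides the normal data, which are divided into several classes, the clustering result contains noise points that differ markedly from most of the normal data, namely abnormal data, and these are analyzed in detail. Next, an isolated point (outlier) detection method is used to clean the filtered data and extract abnormal data, detecting abnormal events such as excessively long or short travel times and excessively high or low total vehicle weight. Finally, the abnormal data obtained by the isolated point detection method are compared with those obtained by the fast peak clustering algorithm. The results show that the accuracy of fast peak clustering in identifying abnormal events is about 20% higher than that of the isolated point detection method, which verifies the effectiveness and accuracy of the proposed algorithm. The proposed algorithm can effectively and accurately identify abnormal events hidden in toll data, such as road congestion, long stops, suspected toll evasion, and network equipment failures, thereby providing data support for expressway operation services and management decisions.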
To sense the expressway traffic operation status more accurately and comprehensively, a data mining method for identifying abnormal traffic events on an expressway from massive toll data was proposed. First, toll data from January 2017 were selected from the massive data available for the Guizhou expressway network. The records for the specified entrance and exit stations were selected, and redundant fields were deleted so that only the data relevant to this study were retained. The times at which a vehicle entered the entrance station and left the exit station were used to calculate its travel time between the two toll stations. The selected data were then analyzed in terms of travel time and total vehicle weight using a fast peak clustering algorithm. The Euclidean distance between each pair of data points was calculated, and the distance matrix was used as the input of the algorithm. For each data point, the local density and the distance to points with higher density were calculated. Points for which both indicators were high were selected as cluster centers. The non-central points were then classified and optimized, and the clustering result was output. In the clustering result, the normal data were divided into several categories, and there remained some noise points that differed significantly from most of the normal data. A specific analysis was conducted for these abnormal data. An isolated point (outlier) detection algorithm was then used to clean the selected data and extract abnormal data, and abnormal events such as excessively long or short transit times and excessively high or low vehicle weights were detected. Finally, the anomalies obtained using the isolated point detection method were compared with the anomalies obtained using the fast peak clustering algorithm. 
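The two indicators described above can be illustrated with a minimal sketch. It assumes a cutoff-kernel density (the number of neighbors within a radius `d_c`), which is one common choice in density peak clustering; the abstract does not state which kernel, cutoff, or thresholds were used. The function names, the synthetic data, the thresholds, and the noise rule (low density combined with a large δ) are all hypothetical and serve only as an illustration.

```python
import numpy as np

def density_peak_indicators(points, d_c):
    """Compute local density rho and distance delta for each point.

    Assumption: cutoff kernel, i.e. rho_i counts the other points
    closer than d_c. Ties in rho are broken by index order.
    """
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distance matrix (n x n), the algorithm input.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Local density: neighbors within d_c, excluding the point itself.
    rho = (dist < d_c).sum(axis=1) - 1
    # Visit points from highest to lowest density (stable for ties).
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(len(points))
    # The densest point has no denser neighbor: use its largest distance.
    delta[order[0]] = dist[order[0]].max()
    for k in range(1, len(order)):
        i = order[k]
        # Distance to the nearest point ranked as denser.
        delta[i] = dist[i, order[:k]].min()
    return rho, delta

def select_centers(rho, delta, rho_min, delta_min):
    """Candidate cluster centers: both indicators are high."""
    return np.flatnonzero((rho >= rho_min) & (delta >= delta_min)).tolist()

def flag_noise(rho, delta, rho_max, delta_min):
    """Hypothetical noise rule: sparse points far from any denser point."""
    return np.flatnonzero((rho <= rho_max) & (delta >= delta_min)).tolist()

# Synthetic, already-scaled (travel time, total weight) pairs:
# two tight groups plus one isolated record at the end.
sample = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
    [10.0, 0.0],
]
rho, delta = density_peak_indicators(sample, d_c=0.5)
print("centers:", select_centers(rho, delta, rho_min=3, delta_min=1.0))
print("noise:", flag_noise(rho, delta, rho_max=0, delta_min=1.0))
```

On this toy input, one point from each tight group is selected as a center and the isolated last record is flagged as noise. With real toll records, travel time and weight would need to be brought to comparable scales before the distances are computed, and the thresholds would have to be chosen from the data, for example by inspecting a plot of δ against ρ.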
The results show that the accuracy of fast peak clustering used to identify anomalous events is higher than that of the isolated point detection method by nearly 20%, which verifies the validity and accuracy of the proposed