计算机应用 (Journal of Computer Applications), 2007
Real-time clustering algorithm for multiple data streams
Abstract:
To overcome the imbalance between clustering quality and efficiency in current clustering algorithms for multiple data streams, a clustering algorithm based on the correlation coefficient was proposed. The algorithm can dynamically discover the clusters in the data streams over a fixed time period. An attenuation coefficient was introduced to improve clustering performance, and the correlation coefficient was used to measure the similarity between data streams. In the algorithm, the time horizon was divided into several equal segments, and statistical information was computed for the stream data in each time segment. The algorithm can modify the clustering structure in real time according to this statistical information. Experimental results show that the algorithm has higher efficiency, clustering quality and stability than other methods.
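
The abstract does not give the formulas, so the following is only a minimal sketch of how a correlation coefficient between two streams could be computed from per-segment statistics with an attenuation factor. It assumes that each segment is summarized by a count and five running sums, and that older segments are down-weighted exponentially. The names `segment_stats`, `decayed_correlation` and the parameter `decay` are hypothetical and are not taken from the paper.

```python
import math

def segment_stats(x, y):
    """Summarize one time segment of two streams: count plus five sums.

    Assumption: these sufficient statistics stand in for the paper's
    unspecified "statistical information" per segment.
    """
    n = len(x)
    return (n, sum(x), sum(y),
            sum(a * a for a in x), sum(b * b for b in y),
            sum(a * b for a, b in zip(x, y)))

def decayed_correlation(segments, decay=0.9):
    """Weighted Pearson correlation from segment summaries, oldest first.

    Assumption: a segment of age k (newest has age 0) gets weight decay ** k.
    """
    totals = [0.0] * 6
    newest = len(segments) - 1
    for i, stats in enumerate(segments):
        weight = decay ** (newest - i)
        for k, value in enumerate(stats):
            totals[k] += weight * value
    n, sx, sy, sxx, syy, sxy = totals
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    if var_x <= 0 or var_y <= 0:
        return 0.0  # constant stream: correlation undefined, treated as 0 here
    return cov / math.sqrt(var_x * var_y)

# Two segments: the streams move oppositely in the old one, together in the new one.
old = segment_stats([1, 2, 3], [3, 2, 1])
new = segment_stats([1, 2, 3], [1, 2, 3])
r_no_decay = decayed_correlation([old, new], decay=1.0)
r_decayed = decayed_correlation([old, new], decay=0.1)
```

Because only the summaries are stored, raw points can be discarded once a segment closes, and a value such as `1 - r` could serve as a distance when the clustering structure is updated. With `decay=1.0` the two segments cancel out, while a small `decay` lets the recent, positively correlated segment dominate.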