Application Research of Computers (计算机应用研究), 2009
Real-time data stream clustering based on damped window and pruning dimension tree
Abstract:
This paper proposes PDStream, a novel real-time data stream clustering algorithm based on a damped window. PDStream first partitions the data space into grids, then uses an improved dimension tree structure to maintain and update the summary statistics of the data stream. A pruning strategy periodically removes sparse grids from the dimension tree. Finally, online clustering requests are answered with a depth-first search (DFS). Experimental results on synthetic and real datasets demonstrate that PDStream discovers clusters of arbitrary shape effectively, with low memory consumption and relatively high precision.
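The pipeline described in the abstract (grid partitioning, damped-window densities, periodic pruning of sparse grids, DFS at query time) can be illustrated with a minimal sketch. This is not the PDStream implementation: the abstract does not specify the dimension tree, so a flat dictionary of grid cells stands in for it, and the class name `DampedGridSummary`, the decay form `2 ** (-decay * dt)`, the thresholds, and the face-adjacency rule are all assumptions made for illustration.

```python
from math import floor


class DampedGridSummary:
    """Illustrative grid summary under a damped window (not the paper's dimension tree)."""

    def __init__(self, cell_width, decay=0.01, sparse_threshold=0.5, dense_threshold=2.0):
        self.cell_width = cell_width              # assumed uniform grid width
        self.decay = decay                        # assumed decay rate of the damped window
        self.sparse_threshold = sparse_threshold  # cells below this are pruned
        self.dense_threshold = dense_threshold    # cells at or above this can form clusters
        self.cells = {}  # cell coordinates -> (density, time of last update)

    def _current_density(self, cell, now):
        # Decay the stored density lazily, from its last update to `now`.
        density, last = self.cells[cell]
        return density * 2.0 ** (-self.decay * (now - last))

    def insert(self, point, now):
        # Map the point to its grid cell, then add 1 to the decayed density.
        cell = tuple(floor(x / self.cell_width) for x in point)
        old = self._current_density(cell, now) if cell in self.cells else 0.0
        self.cells[cell] = (old + 1.0, now)

    def prune(self, now):
        # Periodic pruning: drop cells whose decayed density is below the sparse threshold.
        sparse = [c for c in self.cells if self._current_density(c, now) < self.sparse_threshold]
        for c in sparse:
            del self.cells[c]

    def clusters(self, now):
        # Online request: iterative DFS over dense cells that share a face.
        dense = {c for c in self.cells if self._current_density(c, now) >= self.dense_threshold}
        seen, result = set(), []
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            stack, component = [start], []
            while stack:
                cell = stack.pop()
                component.append(cell)
                for dim in range(len(cell)):
                    for step in (-1, 1):
                        nb = cell[:dim] + (cell[dim] + step,) + cell[dim + 1:]
                        if nb in dense and nb not in seen:
                            seen.add(nb)
                            stack.append(nb)
            result.append(sorted(component))
        return sorted(result)
```

In this sketch, points are fed with `insert(point, now)`, `prune(now)` is called at whatever interval the application chooses, and `clusters(now)` returns each cluster as a sorted list of grid-cell coordinates. Because clusters are connected components of dense cells, they can take arbitrary shapes, and pruning keeps the number of stored cells bounded by recent activity.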