Acta Automatica Sinica (自动化学报), 2009
A Fast Clustering Algorithm for Large-scale and High Dimensional Data
Abstract:
This paper proposes a novel self-organizing map (SOM) algorithm for large-scale, high-dimensional data. By compressing the neurons' feature sets and selecting only relevant features to construct the neurons' feature vectors, the algorithm dramatically reduces clustering time. At the same time, because the selected features effectively distinguish documents that are mapped to different neurons, the algorithm avoids interference from irrelevant features and improves clustering precision. Experimental results demonstrate that the method significantly accelerates clustering, improves clustering precision, and achieves a relatively good clustering result.
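
The idea of compressed neuron feature vectors can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how features are selected, so the sketch assumes each neuron simply keeps its `k` highest-weight features. It also updates only the winning neuron and omits the neighbourhood update of a full SOM. All names (`compress`, `similarity`, `train`, `assign`) and the toy documents are hypothetical.

```python
def compress(weights, k):
    """Keep the k highest-weight features (assumed selection rule).

    Ties are broken by feature name so the result is deterministic.
    """
    top = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return dict(top)


def similarity(doc, neuron):
    """Dot product computed only over the neuron's retained features."""
    return sum(w * doc.get(f, 0.0) for f, w in neuron.items())


def assign(doc, neurons):
    """Index of the best-matching neuron for a document."""
    return max(range(len(neurons)), key=lambda i: similarity(doc, neurons[i]))


def train(docs, n_neurons, k, epochs=5, lr=0.5):
    """Winner-only competitive learning with compressed neurons."""
    # Assumption: initialise neurons from evenly spaced documents.
    step = len(docs) // n_neurons
    neurons = [compress(docs[i * step], k) for i in range(n_neurons)]
    for _ in range(epochs):
        for doc in docs:
            best = assign(doc, neurons)
            old = neurons[best]
            # Move the winner toward the document, then re-compress so the
            # neuron never stores more than k features.
            merged = {
                f: (1 - lr) * old.get(f, 0.0) + lr * doc.get(f, 0.0)
                for f in set(old) | set(doc)
            }
            neurons[best] = compress(merged, k)
    return neurons


# Toy documents as sparse term-weight vectors; "the" is a low-weight,
# uninformative term shared by every document.
docs = [
    {"goal": 1.0, "team": 1.0, "the": 0.1},
    {"goal": 1.0, "match": 1.0, "the": 0.1},
    {"stock": 1.0, "bank": 1.0, "the": 0.1},
    {"stock": 1.0, "market": 1.0, "the": 0.1},
]
neurons = train(docs, n_neurons=2, k=2)
labels = [assign(d, neurons) for d in docs]
```

Because each similarity computation touches at most `k` features rather than the full vocabulary, the cost per comparison no longer grows with the dimensionality of the data, which is the source of the speed-up the abstract describes. In this toy run the shared term "the" is dropped from every neuron, which loosely mirrors the claim that irrelevant features stop interfering with the clustering.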