|
自动化学报 2009
Two Spectral Algorithms for Ensembling Document Clusters
|
Abstract:
A critical problem in cluster ensemble is how to combine multiple clusters to yield a superior result. In this paper, the idea of spectral clustering algorithm is brought into the document cluster ensemble problem. Since spectral clustering algorithm needs to solve eigenvalue decomposition problem of a large scale matrix to get the low dimensional embedding of documents for later clustering, a fast spectral algorithm is first proposed, in which the large scale matrix eigenvalue decomposition problem is transformed to an equivalent singular value decomposition problem and then to a much smaller matrix eigenvalue decomposition problem. The characteristic of spectral clustering algorithm is further investigated and another spectral algorithm is proposed, in which the low dimensional embedding of documents are obtained indirectly by those of hyperedges. Experiments on TREC and Reuters document sets show that both proposed spectral algorithms outperform other cluster ensemble techniques based on graph partitioning, and can effectively solve document cluster ensemble problem.