Journal of Image and Graphics (中国图象图形学报), 2008
A Cluster Validity Function Based on Geometric Probability
Abstract:
Determining the optimal number of clusters is a key research topic in cluster validity, a fundamental unsolved problem in cluster analysis. To determine the optimal number of clusters, this article proposes a new cluster validity function for two-dimensional datasets, theoretically based on geometric probability. The function uses the relationship between a two-dimensional dataset and the corresponding two-dimensional discrete point set to measure the cluster structure of the dataset according to how the point set is distributed in the feature space. It is designed from an intuitive perspective and is therefore easy to understand. During measurement, the structural information of the point set is stored in a set of line segments generated by connecting each pair of points in the point set. The cluster validity function is formed by comparing the direction values of the line segments in this set with those obtained under completely random conditions. A case study shows that the shape of the function curve generated for a given example dataset effectively enables the optimal number of clusters to be determined and supports the design of clustering algorithms.
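The abstract does not give the validity function itself, so the following is only a minimal sketch of the general idea of comparing pairwise segment directions with a random baseline. It assumes that "completely random" can be approximated by a uniform distribution of directions over [0, π). The names `segment_directions` and `direction_deviation`, the histogram binning, and the absolute-deviation score are all illustrative assumptions, not the authors' definition.

```python
import math
from itertools import combinations


def segment_directions(points):
    """Direction angle in [0, pi) of the segment joining each pair of points."""
    angles = []
    for (x1, y1), (x2, y2) in combinations(points, 2):
        dx, dy = x2 - x1, y2 - y1
        if dx == 0 and dy == 0:
            continue  # coincident points define no direction
        # A segment has no orientation, so fold the angle into [0, pi).
        angles.append(math.atan2(dy, dx) % math.pi)
    return angles


def direction_deviation(points, bins=12):
    """Hypothetical score: total absolute deviation of the direction
    histogram from a uniform one (assumed stand-in for 'completely random')."""
    angles = segment_directions(points)
    if not angles:
        return 0.0
    counts = [0] * bins
    for a in angles:
        # Clamp so an angle that rounds up to pi still lands in the last bin.
        idx = min(int(a / math.pi * bins), bins - 1)
        counts[idx] += 1
    expected = 1.0 / bins
    return sum(abs(c / len(angles) - expected) for c in counts)
```

Under these assumptions, a larger score means the segment directions are less evenly spread than the uniform baseline. Perfectly collinear points give the maximum score, because every segment falls in the same bin. How the published function turns such a comparison into a curve over candidate cluster numbers is not stated in the abstract and is not modeled here.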