Fuzzy C-means (FCM) is a simple clustering method that is widely used for
pattern recognition in complex data and for image analysis. Selecting an
appropriate fuzzifier (m) is crucial both for identifying the optimal number
of patterns and for achieving high clustering accuracy, yet few studies have
investigated this choice. Building on two existing methods for selecting the
fuzzifier, we developed an integrated fuzzifier evaluation and selection
algorithm and tested it on real datasets. Our findings indicate that, for each
dataset, a consistent optimal number of clusters can be learned by testing
different fuzzifiers, and that the smallest fuzzifier that yields this
consistent number should be selected for clustering. Our evaluation also shows
that the fuzzifier affects clustering accuracy. For longitudinal data with
missing values, m = 2 could serve as an empirical starting point for fuzzy
clustering; it achieved the best clustering accuracy on the tested data,
especially with our multiple-imputation-based fuzzy clustering.
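To illustrate why the fuzzifier matters, the standard FCM update rules can be sketched as below. This is a minimal sketch of textbook FCM only, not the integrated evaluation and selection algorithm described above; the function names `memberships` and `fcm`, the random initialization, and the stopping tolerance are assumptions made for illustration.

```python
import numpy as np

def memberships(X, centers, m):
    """Standard FCM membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))."""
    # Euclidean distance from every point to every center, shape (n_points, n_clusters).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)  # avoid division by zero when a point sits on a center
    power = 2.0 / (m - 1.0)
    # ratio[k, i, j] = (d_ki / d_kj) ** power
    ratio = (d[:, :, None] / d[:, None, :]) ** power
    return 1.0 / ratio.sum(axis=2)

def fcm(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Alternate membership and center updates until the centers stop moving."""
    rng = np.random.default_rng(seed)
    # Assumption: initialize centers from c distinct randomly chosen data points.
    centers = X[rng.choice(len(X), size=c, replace=False)]
    for _ in range(n_iter):
        U = memberships(X, centers, m)
        W = U ** m  # fuzzified weights
        new_centers = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(X, centers, m)

# A point at distance 1 from one center and distance 3 from the other:
centers = np.array([[0.0, 0.0], [4.0, 0.0]])
point = np.array([[1.0, 0.0]])
u_m2 = memberships(point, centers, m=2.0)  # nearer-cluster membership is 0.9
u_m3 = memberships(point, centers, m=3.0)  # nearer-cluster membership is 0.75
```

The same point is assigned less decisively as m grows, which is why both the apparent number of clusters and the resulting accuracy can change with the fuzzifier.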