Factor Analysis Model K-Means Clustering under Cluster Data

DOI: 10.12677/sa.2024.135172, PP. 1750-1758

Keywords: Cluster Data, Factor Analysis Model, Principal Component Analysis, K-Means Clustering


Abstract:

Clustered data capture the dynamic relationships among research objects within each group and are widely used in economics, sociology, medicine, and other fields. Classical cluster analysis usually measures the similarity between samples in order to cluster samples or indicators, while clustering the subgroups of clustered data has received little attention. This paper fits a factor analysis model to clustered data, uses the principal component method to generate clustered data whose groups differ from one another, and applies K-means to cluster the groups. In a simulation study, clustered data are generated from the factor analysis model by the principal component method, and the results indicate that the clustering method is effective. In a real-data example, the groups of a clustered data set are clustered and the result is evaluated with the silhouette coefficient. The evaluation indicates that the K-means algorithm clusters the subgroups of clustered data well.
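
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no numerical settings, so the loading matrices, group counts, sample size, and noise level below are assumptions chosen for illustration. Representing each group by the upper triangle of its estimated common covariance (loadings times their transpose) is also an assumption, made because that quantity does not depend on factor rotation. The sketch simulates groups from a factor model x = Lf + e, estimates loadings per group with the principal component method, clusters the groups with K-means, and reports the silhouette coefficient.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)


def simulate_group(loadings, n_obs, noise_sd, rng):
    """Draw n_obs observations from x = L f + e with f ~ N(0, I)."""
    p, m = loadings.shape
    factors = rng.standard_normal((n_obs, m))
    noise = noise_sd * rng.standard_normal((n_obs, p))
    return factors @ loadings.T + noise


def pc_common_covariance(x, n_factors):
    """Principal component estimate of L L^T from the sample covariance."""
    cov = np.cov(x, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_factors]
    # Loadings: each leading eigenvector scaled by the root of its eigenvalue.
    loadings = eigvecs[:, top] * np.sqrt(eigvals[top])
    return loadings @ loadings.T


# Hypothetical loading patterns for two kinds of groups (illustrative values).
type_a = 0.8 * np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
type_b = 0.8 * np.array([[1, 0], [0, 1], [1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)

# Assumed design: 10 groups of each kind, 300 observations per group.
true_types = np.repeat([0, 1], 10)
groups = [
    simulate_group(type_a if t == 0 else type_b, n_obs=300, noise_sd=0.5, rng=rng)
    for t in true_types
]

# One feature vector per group: upper triangle of the estimated L L^T.
upper = np.triu_indices(type_a.shape[0])
features = np.array([pc_common_covariance(x, n_factors=2)[upper] for x in groups])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
labels = kmeans.labels_
score = silhouette_score(features, labels)

print("cluster labels:", labels)
print("silhouette coefficient:", round(float(score), 3))
```

In this toy setting the two loading patterns are deliberately far apart, so a high silhouette coefficient here says little about how the method behaves on real data, where the number of clusters and the number of factors would also have to be chosen.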
