Data Clustering Method for Very Large Databases using entropy-based algorithm

Keywords: data mining, categorical clustering, data labeling.

Abstract:

Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multi-dimensional dataset. Prior work does not adequately address the problem of large datasets and the minimization of I/O costs. Clustering of categorical attributes is a difficult problem that has received less attention than its numerical counterpart. In this paper we explore the connection between clustering and entropy: clusters of similar points have lower entropy than clusters of dissimilar ones. We use this connection to design a heuristic algorithm that can efficiently cluster large data sets of records with categorical attributes. In contrast with previously published categorical clustering algorithms, its clustering results are very stable across different sample sizes and parameter settings. The clustering criterion is also very intuitive, since it is deeply rooted in the well-known notion of entropy.
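
The link between clustering and entropy described above can be illustrated with a small sketch. This is not the paper's algorithm, whose details the abstract does not give. It assumes that a cluster's entropy can be approximated by summing the Shannon entropy of each categorical attribute (that is, attributes are treated as independent), and that each new record is placed greedily in the cluster that yields the lowest size-weighted entropy. The function names and the toy records are hypothetical.

```python
from collections import Counter
from math import log2


def cluster_entropy(records):
    """Sum of per-attribute Shannon entropies (in bits) for equal-length tuples.

    Assumption: attributes are treated as independent, so the entropy of a
    cluster is approximated by adding up the entropy of each column.
    """
    if not records:
        return 0.0
    n = len(records)
    total = 0.0
    for column in zip(*records):
        counts = Counter(column)
        total -= sum((c / n) * log2(c / n) for c in counts.values())
    return total


def expected_entropy(clusters):
    """Size-weighted average of the cluster entropies."""
    n = sum(len(c) for c in clusters)
    return sum(len(c) / n * cluster_entropy(c) for c in clusters if c)


def assign_greedily(seeds, records):
    """Place each record in the cluster that minimizes expected entropy.

    Hypothetical heuristic for illustration: one seed record per cluster,
    then a single pass over the remaining records.
    """
    clusters = [[seed] for seed in seeds]
    for record in records:
        def cost(i):
            trial = clusters[:i] + [clusters[i] + [record]] + clusters[i + 1:]
            return expected_entropy(trial)
        best = min(range(len(clusters)), key=cost)
        clusters[best].append(record)
    return clusters


# Similar records share most attribute values, so their entropy is lower.
similar = [("red", "small"), ("red", "small"), ("red", "large")]
mixed = [("red", "small"), ("blue", "large"), ("green", "medium")]
print(cluster_entropy(similar) < cluster_entropy(mixed))

seeds = [("red", "small", "round"), ("blue", "large", "square")]
stream = [
    ("red", "small", "square"),
    ("blue", "large", "round"),
    ("red", "small", "round"),
]
for cluster in assign_greedily(seeds, stream):
    print(cluster)
```

A single pass over the records keeps the sketch in the spirit of minimizing I/O, but a greedy pass of this kind depends on the order of the records and on the choice of seeds.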
