%0 Journal Article
%T A GA-Based Clustering Algorithm for Large Data Sets with Mixed Numerical and Categorical Values
%A Li Jie (李洁)
%A Gao Xin-bo (高新波)
%A Jiao Li-cheng (焦李成)
%J Journal of Electronics & Information Technology (电子与信息学报)
%D 2004
%X In data mining, cluster analysis often has to be performed on large data sets with mixed numerical and categorical values. However, most existing clustering algorithms are efficient only for numerical data, not for mixed data sets. To address this, this paper presents a novel clustering algorithm for such mixed data sets, obtained by modifying a common cost function, the trace of the within-cluster dispersion matrix. A genetic algorithm (GA) is used to optimize the new cost function and obtain a valid clustering result. Experimental results show that the new GA-based clustering algorithm is feasible for large data sets with mixed numerical and categorical values.
%K Cluster analysis
%K Numerical data
%K Categorical data
%K Genetic Algorithm (GA)
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=1319827C0C74AAE8D654BEA21B7F54D3&jid=EFC0377B03BD8D0EF4BBB548AC5F739A&aid=AAC324AA4E8DA82E&yid=D0E58B75BFD8E51C&vid=96C778EE049EE47D&iid=5D311CA918CA9A03&sid=9E33BE6F309BC209&eid=B169842BD52221DD&journal_id=1009-5896&journal_name=电子与信息学报&referenced_num=2&reference_num=10