A Multivariate Discretization Method for Continuous Attributes

pp. 792-797

Keywords: data mining, multivariate discretization, minimum description length principle (MDLP), naive Bayes classifier


Abstract:

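Many existing discretization methods consider only one variable at a time and therefore cannot obtain an optimal discretization scheme. This paper proposes a data discretization method that takes the relationships among multiple attributes into account. Using probabilistic model selection and the minimum description length principle, a multivariate discretization criterion is derived, and based on this criterion an effective heuristic algorithm is proposed to search for the best discretization scheme. Classification experiments on UCI data sets show that the method improves the learning accuracy of the naive Bayes classifier.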
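
The abstract does not give the paper's multivariate criterion, so it cannot be reproduced here. As background only, the sketch below shows the well-known single-attribute MDLP test of Fayyad and Irani, which is the kind of one-variable-at-a-time method the abstract contrasts itself with. It illustrates how a description-length argument decides whether a candidate cut point is worth keeping. The function names `entropy`, `mdlp_accepts_cut`, and `best_cut` are illustrative assumptions, not names from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mdlp_accepts_cut(left, right):
    """Classic univariate MDLP test (assumed baseline, not the paper's criterion).

    `left` and `right` are the non-empty class-label lists on each side of a cut.
    The cut is accepted when the information gain exceeds the MDL cost of
    encoding the cut point and the two resulting class distributions.
    """
    whole = left + right
    n = len(whole)
    ent_s, ent_l, ent_r = entropy(whole), entropy(left), entropy(right)
    gain = ent_s - (len(left) / n) * ent_l - (len(right) / n) * ent_r
    k, k_l, k_r = len(set(whole)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (k * ent_s - k_l * ent_l - k_r * ent_r)
    threshold = (math.log2(n - 1) + delta) / n
    return gain > threshold


def best_cut(values, labels):
    """Return the MDLP-accepted cut value for one attribute, or None."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    ys = [y for _, y in pairs]
    best = None  # (weighted entropy, split index)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal attribute values
        w = (i * entropy(ys[:i]) + (len(ys) - i) * entropy(ys[i:])) / len(ys)
        if best is None or w < best[0]:
            best = (w, i)
    if best is None:
        return None
    i = best[1]
    if not mdlp_accepts_cut(ys[:i], ys[i:]):
        return None
    # Place the cut midway between the two neighbouring attribute values.
    return (pairs[i - 1][0] + pairs[i][0]) / 2


if __name__ == "__main__":
    xs = list(range(20))
    print(best_cut(xs, ["a"] * 10 + ["b"] * 10))  # classes separate cleanly
    print(best_cut(xs, ["a", "b"] * 10))          # classes interleave
```

Because this test looks at one attribute in isolation, it cannot see structure that only appears when attributes are considered jointly, which is the limitation the abstract's multivariate criterion is meant to address.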

