%0 Journal Article
%T A Scalable Classification Algorithm Exploring Database Technology
%T 利用数据库技术实现的可扩展的分类算法
%A LIU Hong-yan
%A LU Hong-jun
%A CHEN Jian
%A 刘红岩
%A 陆宏钧
%A 陈剑
%J Journal of Software (软件学报)
%D 2002
%X This paper studies an efficient and scalable classification algorithm that tightly integrates classification technology with relational database technology. It proposes an approach to building a classifier based on grouping and counting, which uses the SQL (structured query language) provided by the relational database to carry out the major computation tasks. To improve performance, several optimization strategies, a strategy for pruning redundant rules, and a feature selection method integrated into the process of finding classification rules are also proposed. With these methods and strategies, the classification algorithm can quickly find a compact set of classification rules from a large volume of data. In addition to the same classification accuracy as current popular classification algorithms and high training speed, the algorithm's distinguishing features include linear scalability with respect to the number of training samples and the number of attributes, and simplicity of implementation.
%K data mining
%K classification
%K RDBMS (relational database management system)
%K SQL (structured query language)
%K 数据挖掘
%K 分类
%K 关系数据库管理系统
%K 结构化查询语言
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=7735F413D429542E610B3D6AC0D5EC59&aid=C0E903A90E4FC4B2&yid=C3ACC247184A22C1&vid=FC0714F8D2EB605D&iid=B31275AF3241DB2D&sid=6A12B9FCEF71AE29&eid=5A9F0976AE79CB6F&journal_id=1000-9825&journal_name=软件学报&referenced_num=8&reference_num=7