%0 Journal Article %T Outlier Mining of the High-dimension Datasets Based on Information Theory
基于信息论的高维海量数据离群点挖掘 %A ZHANG Jing %A SUN Zhi-hui %A SONG Yu-qing %A NI Wei-wei %A YAN Yan-hua %A
张 净 %A 孙志挥 %A 宋余庆 %A 倪巍伟 %A 晏燕华 %J 计算机科学 %D 2011 %I %X The "curse of dimensionality" undermines the validity of many existing outlier mining algorithms. To address this problem, an outlier mining algorithm for high-dimensional, large datasets based on information theory is proposed. The algorithm uses the concepts of information entropy and mutual information from information theory: it performs feature selection after sorting attributes by entropy power, with estimated mutual information values as the objective basis, and eliminates redundant attributes to reduce dimensionality. Using information entropy as the measure for judging outliers avoids the drawbacks of distance and density metrics. Experimental results on real datasets indicate that the algorithm is effective and feasible for outlier mining in high-dimensional mass data, and that its efficiency and accuracy are significantly improved. %K Outlier mining %K Information theory %K Feature selection %K Entropy %K Mutual information
离群点挖掘,信息论,属性选择,熵,互信息 %U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=64A12D73428C8B8DBFB978D04DFEB3C1&aid=3FDBEA69C21D71CD1001834BFDA93A7E&yid=9377ED8094509821&vid=16D8618C6164A3ED&iid=DF92D298D3FF1E6E&sid=856C2E13D1000DB7&eid=82D9EBF3290C72B6&journal_id=1002-137X&journal_name=计算机科学&referenced_num=0&reference_num=16