Acta Automatica Sinica (自动化学报), 2008
Feature Selection Based on Information-Theoretic Criteria in Classificatory Analysis
DOI: 10.3724/SP.J.1004.2008.00383, PP. 383-392
Keywords: Pattern classification, data mining, feature selection, information-theoretic measures
Abstract: Feature selection aims to reduce the dimensionality of patterns for classificatory analysis by selecting the most informative features instead of irrelevant and/or redundant ones. In this study, two novel information-theoretic measures for feature ranking are presented: one is an improved formula to estimate the conditional mutual information between the candidate feature f_i and the target class C given the subset of selected features S, i.e., I(C; f_i | S), under the assumption that information of features is distributed uniformly; the other is a mutual information (MI) based constructive criterion that is able to capture both irrelevant and redundant input features under arbitrary distributions of information of features. With these two measures, two new feature selection algorithms, called the quadratic MI-based feature selection (QMIFS) approach and the MI-based constructive criterion (MICC) approach, respectively, are proposed, in which no parameters like β in Battiti's MIFS and Kwak and Choi's MIFS-U methods need to be preset. Thus, the intractable problem of how to choose an appropriate value for β to trade off relevance to the target classes against redundancy with the already-selected features is avoided completely. Experimental results demonstrate the good performance of QMIFS and MICC on both synthetic and benchmark data sets.
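To make the role of β concrete, the following is a minimal sketch of the baseline the abstract contrasts against: greedy selection with Battiti's MIFS score as it is commonly stated, I(C; f) − β Σ_{s∈S} I(f; s). It is not the QMIFS or MICC criterion, whose formulas the abstract does not give. The function names `mutual_information` and `mifs`, the toy data, and the plug-in (frequency-count) MI estimate for discrete features are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in bits for two discrete sequences."""
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (vx, vy), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[vx] * count_y[vy]))
    return mi


def mifs(features, target, k, beta):
    """Greedy MIFS-style selection (hypothetical helper).

    features: dict mapping feature name -> list of discrete values
    target:   list of class labels
    k:        number of features to select
    beta:     weight of the redundancy penalty (must be preset by the user)
    """
    relevance = {name: mutual_information(col, target)
                 for name, col in features.items()}
    remaining = list(features)
    selected = []
    while remaining and len(selected) < k:
        def score(name):
            # Redundancy: summed MI with every already-selected feature.
            redundancy = sum(
                mutual_information(features[name], features[s])
                for s in selected
            )
            return relevance[name] - beta * redundancy

        best = max(remaining, key=score)  # ties go to the earliest feature
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (assumed): the class is determined jointly by "a" and "b";
# "a_dup" is an exact copy of "a" and therefore fully redundant.
toy = {
    "a":     [0, 0, 0, 0, 1, 1, 1, 1],
    "a_dup": [0, 0, 0, 0, 1, 1, 1, 1],
    "b":     [0, 0, 1, 1, 0, 0, 1, 1],
}
target = [0, 0, 1, 1, 2, 2, 3, 3]

print(mifs(toy, target, k=2, beta=1.0))
print(mifs(toy, target, k=2, beta=0.0))
```

In this toy setting, β = 1 penalizes the duplicate enough that the second pick is the complementary feature, whereas β = 0 ignores redundancy altogether. The outcome therefore depends on a hand-chosen β, which is the tuning problem the abstract says QMIFS and MICC avoid.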