A Hybrid Feature Selection Method Based on Rough Conditional Mutual Information and Naive Bayesian Classifier

DOI: 10.1155/2014/382738


Abstract:

We introduce a novel hybrid feature selection method based on rough conditional mutual information and a naive Bayesian classifier. Conditional mutual information is an important metric in feature selection, but it is hard to compute. We introduce a new measure, called rough conditional mutual information, that is based on rough sets; it is shown that the new measure can substitute for Shannon's conditional mutual information. Thus rough conditional mutual information can also be used to filter out irrelevant and redundant features. Subsequently, to further reduce the number of features and improve classification accuracy, a wrapper approach based on the naive Bayesian classifier is used to search for the optimal feature subset within the candidate feature set selected by the filter model. Finally, the proposed algorithms are tested on several UCI datasets and compared with other classical feature selection methods. The results show that our approach achieves not only high classification accuracy but also the smallest number of selected features.

1. Introduction

With the increase of data dimensionality in many domains such as bioinformatics, text categorization, and image recognition, feature selection has become one of the most important data-mining preprocessing methods. The aim of feature selection is to find a minimal subset of the original features that is the most characterizing. Since feature selection brings many advantages, such as avoiding overfitting, facilitating data visualization, reducing storage requirements, and reducing training times, it has attracted considerable attention in various areas [1].

In the past two decades, different techniques have been proposed to address this challenging task. Dash and Liu [2] point out that there are four basic steps in a typical feature selection method: subset generation, subset evaluation, stopping criterion, and validation. Most studies focus on the two major steps of feature selection: subset generation and subset evaluation. According to the subset evaluation function, feature selection methods can be divided into two categories: filter methods and wrapper methods [3]. Filter methods are independent of the predictor, whereas wrapper methods use its predictive power as the evaluation function. The merits of filter methods are high computational efficiency and generality. However, the result of a filter method is not always satisfactory, because the filter model separates feature selection from classifier learning and selects feature subsets that are independent of the learning algorithm. On
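To make the filter-plus-wrapper idea described above concrete, the following minimal Python sketch shows one possible organization of such a two-stage method. It is not the authors' exact algorithm: rough conditional mutual information is approximated here by plug-in entropies over the equivalence classes induced by already-discretized features, the filter stage is a simple greedy conditional-relevance ranking, and the wrapper stage is a plain sequential forward search scored by scikit-learn's GaussianNB under cross-validation. All function names (entropy, joint, cond_mutual_info, filter_stage, wrapper_stage) are hypothetical.

```python
# Hypothetical sketch of a filter-plus-wrapper feature selection pipeline.
# Assumes X is a (n_samples, n_features) numpy array of discretized features
# and y a (n_samples,) array of class labels.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score


def entropy(labels):
    """Shannon entropy of one discrete column (its values define equivalence classes)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))


def joint(*cols):
    """Label the joint equivalence classes induced by several discrete columns."""
    return np.unique(np.column_stack(cols), axis=0, return_inverse=True)[1]


def cond_mutual_info(x, y, z):
    """Plug-in estimate of I(x; y | z) = H(x,z) + H(y,z) - H(x,y,z) - H(z)."""
    return (entropy(joint(x, z)) + entropy(joint(y, z))
            - entropy(joint(x, y, z)) - entropy(z))


def filter_stage(X, y, k):
    """Greedy ranking: at each step pick the feature with the largest conditional
    mutual information with the class given the features already selected."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < k:
        # A constant conditioning column reduces CMI to plain mutual information.
        ctx = joint(*(X[:, j] for j in selected)) if selected else np.zeros(len(y))
        scores = {j: cond_mutual_info(X[:, j], y, ctx) for j in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected


def wrapper_stage(X, y, candidates):
    """Sequential forward search over the candidate subset, scored by cross-validated
    naive Bayes accuracy; stops when no addition improves the score."""
    chosen, best_acc, improved = [], 0.0, True
    while improved:
        improved = False
        for j in (c for c in candidates if c not in chosen):
            acc = cross_val_score(GaussianNB(), X[:, chosen + [j]], y, cv=5).mean()
            if acc > best_acc:
                best_acc, best_j, improved = acc, j, True
        if improved:
            chosen.append(best_j)
    return chosen, best_acc
```

A typical call would be candidates = filter_stage(X, y, k=20) followed by wrapper_stage(X, y, candidates). The two-stage layout mirrors the motivation stated above: the cheap information-theoretic filter prunes the search space first, so the more expensive classifier-in-the-loop wrapper only has to evaluate a small candidate set.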

References

[1]  I. Guyon and A. Elisseeff, “An introduction to variable and feature selection,” Journal of Machine Learning Research, vol. 3, pp. 1157–1182, 2003.
[2]  M. Dash and H. Liu, “Feature selection for classification,” Intelligent Data Analysis, vol. 1, no. 1–4, pp. 131–156, 1997.
[3]  M. Dash and H. Liu, “Consistency-based search in feature selection,” Artificial Intelligence, vol. 151, no. 1-2, pp. 155–176, 2003.
[4]  C. J. Merz and P. M. Murphy, UCI Repository of Machine Learning Databases, Department of Information and Computer Science, University of California, Irvine, Calif, USA, 1996, http://mlearn.ics.uci.edu/MLRepository.html.
[5]  K. Kira and L. A. Rendell, “The feature selection problem: traditional methods and a new algorithm,” in Proceedings of the 10th National Conference on Artificial Intelligence (AAAI '92), pp. 129–134, July 1992.
[6]  M. Robnik-Šikonja and I. Kononenko, “Theoretical and empirical analysis of ReliefF and RReliefF,” Machine Learning, vol. 53, no. 1-2, pp. 23–69, 2003.
[7]  I. Kononenko, “Estimating attributes: analysis and extension of RELIEF,” in Proceedings of European Conference on Machine Learning (ECML '94), pp. 171–182, 1994.
[8]  M. A. Hall, Correlation-based feature subset selection for machine learning [Ph.D. thesis], Department of Computer Science, University of Waikato, Hamilton, New Zealand, 1999.
[9]  J. G. Bazan, “A comparison of dynamic and non-dynamic rough set methods for extracting laws from decision table,” in Rough Sets in Knowledge Discovery, L. Polkowski and A. Skowron, Eds., pp. 321–365, Physica, Heidelberg, Germany, 1998.
[10]  R. Battiti, “Using mutual information for selecting features in supervised neural net learning,” IEEE Transactions on Neural Networks, vol. 5, no. 4, pp. 537–550, 1994.
[11]  N. Kwak and C. H. Choi, “Input feature selection for classification problems,” IEEE Transactions on Neural Networks, vol. 13, no. 1, pp. 143–159, 2002.
[12]  H. Peng, F. Long, and C. Ding, “Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226–1238, 2005.
[13]  J. Martínez Sotoca and F. Pla, “Supervised feature selection by clustering using conditional mutual information-based distances,” Pattern Recognition, vol. 43, no. 6, pp. 2068–2081, 2010.
[14]  B. Guo, R. I. Damper, S. R. Gunn, and J. D. B. Nelson, “A fast separability-based feature-selection method for high-dimensional remotely sensed image classification,” Pattern Recognition, vol. 41, no. 5, pp. 1670–1679, 2008.
[15]  D. W. Scott, Multivariate Density Estimation: Theory, Practice and Visualization, John Wiley & Sons, New York, NY, USA, 1992.
[16]  B. W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman & Hall, London, UK, 1986.
[17]  A. Kraskov, H. Stögbauer, and P. Grassberger, “Estimating mutual information,” Physical Review E, vol. 69, no. 6, Article ID 066138, 16 pages, 2004.
[18]  T. Beaubouef, F. E. Petry, and G. Arora, “Information-theoretic measures of uncertainty for rough sets and rough relational databases,” Information Sciences, vol. 109, no. 1–4, pp. 185–195, 1998.
[19]  I. Düntsch and G. Gediga, “Uncertainty measures of rough set prediction,” Artificial Intelligence, vol. 106, no. 1, pp. 109–137, 1998.
[20]  G. J. Klir and M. J. Wierman, Uncertainty Based Information: Elements of Generalized Information Theory, Physica, New York, NY, USA, 1999.
[21]  J. Liang and Z. Shi, “The information entropy, rough entropy and knowledge granulation in rough set theory,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 12, no. 1, pp. 37–46, 2004.
[22]  C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, pp. 379–423, 1948.
[23]  T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley & Sons, New York, NY, USA, 1991.
[24]  Z. Pawlak, “Rough sets,” International Journal of Computer and Information Sciences, vol. 11, no. 5, pp. 341–356, 1982.
[25]  X. Hu and N. Cercone, “Learning in relational databases: a rough set approach,” Computational Intelligence, vol. 11, no. 2, pp. 323–338, 1995.
[26]  J. Huang, Y. Cai, and X. Xu, “A hybrid genetic algorithm for feature selection wrapper based on mutual information,” Pattern Recognition Letters, vol. 28, no. 13, pp. 1825–1844, 2007.
[27]  H. Liu and L. Yu, “Toward integrating feature selection algorithms for classification and clustering,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, pp. 491–502, 2005.
[28]  L. Yu and H. Liu, “Efficient feature selection via analysis of relevance and redundancy,” Journal of Machine Learning Research, vol. 5, pp. 1205–1224, 2004.
[29]  J. D. M. Rennie, L. Shih, J. Teevan, and D. Karger, “Tackling the poor assumptions of naive Bayes text classifiers,” in Proceedings of the 20th International Conference on Machine Learning (ICML '03), pp. 616–623, Washington, DC, USA, August 2003.
[30]  R. Setiono and H. Liu, “Neural-network feature selector,” IEEE Transactions on Neural Networks, vol. 8, no. 3, pp. 654–662, 1997.
[31]  S. Foithong, O. Pinngern, and B. Attachoo, “Feature subset selection wrapper based on mutual information and rough sets,” Expert Systems with Applications, vol. 39, no. 1, pp. 574–584, 2012.
[32]  U. Fayyad and K. Irani, “Multi-interval discretization of continuous-valued attributes for classification learning,” in Proceedings of the 13th International Joint Conference on Artificial Intelligence, pp. 1022–1027, Morgan Kaufmann, San Mateo, Calif, USA, 1993.
