
OALib Journal期刊
ISSN: 2333-9721


Feature Selection Based on Maximizing Joint Mutual Information and Minimizing Joint Entropy

DOI: 10.12677/AAM.2023.124149, PP. 1451-1460

Keywords: Information Entropy, Mutual Information, Joint Mutual Information, Joint Entropy, Feature Selection, Dimension Reduction


Abstract:

With the advent of the big data era, data has become easier and easier to obtain, and its dimensionality keeps growing. High-dimensional data can record the attributes of things in detail, but the higher the dimension, the more redundant the data, so removing redundant features from the data is important. Feature selection methods based on mutual information (MI) can effectively reduce data dimensionality and improve classification accuracy. However, existing methods judge features by a single criterion during selection and cannot effectively eliminate redundant features. This paper therefore proposes a feature selection method based on maximizing joint mutual information and minimizing joint entropy (JMIMJE). JMIMJE considers two factors when selecting features: joint mutual information, which measures the relevance of the whole feature subset to the class label, and joint entropy, which measures the stability of the feature subset. JMIMJE thus balances the relevance and the stability of the feature subset during screening. In terms of prediction accuracy, JMIMJE is 2 percentage points higher than mRMR (minimum Redundancy Maximum Relevance) and 1 percentage point higher than Joint Mutual Information (JMI).
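The criterion described in the abstract — score a candidate subset by its joint mutual information with the label while penalizing its joint entropy — can be sketched for discrete features as follows. The paper's exact trade-off weighting, discretization, and search strategy are not given here, so the balance parameter `lam` and the greedy forward search are illustrative assumptions, not the authors' published algorithm.

```python
from collections import Counter
import math

def entropy(rows):
    """Shannon entropy H(X) of a list of (possibly tuple-valued) samples."""
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in Counter(rows).values())

def joint_entropy(features):
    """H(X1,...,Xk): entropy of the tuple of feature values per sample.
    `features` is a list of columns, one list of discrete values each."""
    return entropy(list(zip(*features)))

def joint_mi(features, y):
    """Joint mutual information I(X1..Xk; Y) = H(X1..Xk) + H(Y) - H(X1..Xk, Y)."""
    return joint_entropy(features) + entropy(y) - joint_entropy(features + [y])

def select_features(X, y, k, lam=0.5):
    """Greedy forward selection: at each step add the feature that maximizes
    joint MI of the enlarged subset with the label, minus lam times the
    subset's joint entropy (lam is an assumed, tunable trade-off weight)."""
    selected, remaining = [], list(range(len(X)))
    for _ in range(k):
        def score(j):
            cols = [X[i] for i in selected] + [X[j]]
            return joint_mi(cols, y) - lam * joint_entropy(cols)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: feature 0 perfectly predicts y, feature 1 is noise.
X = [[0, 0, 1, 1], [1, 0, 1, 0], [0, 0, 0, 1]]
y = [0, 0, 1, 1]
print(select_features(X, y, 2)[0])  # feature 0 is picked first
```

Because both terms are computed over the whole candidate subset rather than feature-by-feature, a feature that is individually informative but redundant given the already-selected ones adds little joint MI while still inflating joint entropy, so it scores poorly.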

References

[1]  Gandhi, S.S. and Prabhune, S.S. (2017) Overview of Feature Subset Selection Algorithm for High Dimensional Data. ICISC 2017: Proceedings of the 2017 IEEE International Conference on Inventive Systems and Control, Coimbatore, 19-20 January 2017, 1-6.
https://doi.org/10.1109/ICISC.2017.8068599
[2]  Fleuret, F. (2004) Fast Binary Feature Selection with Conditional Mutual Information. Journal of Machine Learning Research, 5, 1531-1555.
[3]  Dong, Z.M. and Shi, Q. (2017) Feature Selection Based on Maximizing Normalized Fuzzy Joint Mutual Information. Computer Engineering and Applications, 53, 105-110. (In Chinese)
[4]  Huang, Z.Y. (2013) A Feature Selection Method Based on Information Gain. Journal of Shandong Agricultural University (Natural Science Edition), 44, 252-256. (In Chinese)
[5]  Liu, C., Wang, W., Zhao, Q., et al. (2017) A New Feature Selection Method Based on a Validity Index of Feature Subset. Pattern Recognition Letters, 92, 1-8.
https://doi.org/10.1016/j.patrec.2017.03.018
[6]  Battiti, R. (1994) Using Mutual Information for Selecting Features in Supervised Neural Net Learning. IEEE Transactions on Neural Networks, 5, 537-550.
https://doi.org/10.1109/72.298224
[7]  Hoque, N., Bhattacharyya, D.K. and Kalita, J.K. (2014) MIFS-ND: A Mutual Information-Based Feature Selection Method. Expert Systems with Applications, 41, 6371-6385.
https://doi.org/10.1016/j.eswa.2014.04.019
[8]  Cho, D. and Lee, B. (2017) Optimized Automatic Sleep Stage Classification Using the Normalized Mutual Information Feature Selection (NMIFS) Method. Proceedings of the 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Jeju, 11-15 July 2017, 3094-3097.
https://doi.org/10.1109/EMBC.2017.8037511
[9]  Peng, H., Long, F. and Ding, C. (2005) Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 1226-1238.
https://doi.org/10.1109/TPAMI.2005.159
[10]  Bennasar, M., Hicks, Y. and Setchi, R. (2015) Feature Selection Using Joint Mutual Information Maximisation. Expert Systems with Applications, 42, 8520-8532.
https://doi.org/10.1016/j.eswa.2015.07.007
[11]  Amaratunga, D. and Cabrera, J. (2016) High-Dimensional Data. Journal of the National Science Foundation of Sri Lanka, 44, 3-9.
https://doi.org/10.4038/jnsfsr.v44i1.7976
