A Network Traffic Feature Selection Method Based on Symmetric Uncertainty
Abstract:
The number of network traffic features keeps growing, which increases the training time of the related deep learning models, and there is still no consensus on which of these features to select. Filter-based feature ranking methods make it hard to determine the dimension of the feature subset, while wrapper-based feature subset selection methods are computationally expensive. To address both problems, this paper extends the filter-based feature ranking method based on symmetric uncertainty, which performs well at feature selection, into a filter-based feature subset selection method, and designs an objective function that accounts for both the correlation between features and labels and the redundancy among features. The grey wolf optimizer used to optimize this objective function is improved in two respects, initialization and search strategy, to raise the accuracy of network traffic feature selection. Experimental results show that the proposed method achieves the best feature selection effect while maintaining the detection accuracy of abnormal traffic.
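As a rough illustration of the quantity the method is built on, the sketch below computes symmetric uncertainty from its standard definition, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), for discrete variables. The function names are illustrative, and `subset_score` is only a hypothetical relevance-minus-redundancy objective, not the objective function designed in this paper. Continuous traffic features are assumed to have been discretized beforehand.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant; SU is defined as 0 here
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy    # mutual information I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


def subset_score(features, labels, subset):
    """Hypothetical objective (an assumption, not the paper's):
    mean feature-label SU minus mean pairwise feature-feature SU."""
    relevance = sum(
        symmetric_uncertainty(features[i], labels) for i in subset
    ) / len(subset)
    pairs = [(i, j) for k, i in enumerate(subset) for j in subset[k + 1:]]
    if not pairs:
        return relevance  # a single feature has no redundancy term
    redundancy = sum(
        symmetric_uncertainty(features[i], features[j]) for i, j in pairs
    ) / len(pairs)
    return relevance - redundancy
```

Under this toy objective, a subset containing two identical features scores lower than one pairing a relevant feature with an independent one, which is the trade-off between correlation and redundancy that the abstract describes. A search procedure such as the grey wolf optimizer would then look for the subset that maximizes a score of this kind.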