- 2015
An Ensemble Classification Model for Trojan Traffic Detection
Abstract:
Traditional ensemble learning methods, when applied to Trojan traffic detection, place high demands on the training samples, are hard to push to higher classification accuracy, and generalize poorly. To address these problems, an ensemble classification model for Trojan traffic detection is proposed. The differences between Trojan communication and normal communication, as reflected in traffic statistics, are identified, and behavioral statistical features are extracted to build the training set. Mean normalization is introduced to improve the principal component transform in the rotation forest algorithm, and the improved algorithm is used to rotate the original training samples. Base classifiers are then built with three markedly different classification algorithms: naive Bayes, the C4.5 decision tree, and the support vector machine. The base classifiers are combined by a weighted voting strategy based on dynamic instance selection, which produces the Trojan traffic detection rules. Experimental results show that the model makes full use of the diversity among the training sets and the complementarity of the heterogeneous classifiers: it reaches a detection rate of 96.30% with a false positive rate of no more than 4.21%, improving both the accuracy and the generalization ability of Trojan traffic detection.
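The abstract does not give the details of the voting strategy or of the normalization step, so the following is only a minimal sketch of how these two ideas might look, not the authors' implementation. It assumes that mean normalization means dividing each feature by its column mean, and that dynamic instance-based weighting means scoring each base classifier by its accuracy on the validation instances nearest to the flow being classified. All function names, the neighbourhood size `k`, and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def mean_normalize(rows):
    """Divide every feature by its column mean (an assumed reading of the
    normalization applied before the principal component transform)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x / m if m else 0.0 for x, m in zip(row, means)] for row in rows]


def local_weights(x, val_X, val_y, val_preds, k=3):
    """Weight each base classifier by its accuracy on the k validation
    instances closest to x (Euclidean distance)."""
    nearest = sorted(range(len(val_X)), key=lambda i: math.dist(x, val_X[i]))[:k]
    return [sum(p[i] == val_y[i] for i in nearest) / len(nearest) for p in val_preds]


def weighted_vote(votes, weights):
    """Return the label with the largest total weight."""
    score = defaultdict(float)
    for label, w in zip(votes, weights):
        score[label] += w
    return max(score, key=score.get)


# Toy validation set (made-up values): label 0 = normal flow, 1 = Trojan flow.
val_X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [1.1, 1.0], [1.0, 1.1]]
val_y = [0, 0, 0, 1, 1, 1]
# Validation-set predictions of three hypothetical base classifiers.
val_preds = [
    [0, 0, 0, 0, 0, 0],  # reliable only near the origin
    [1, 1, 1, 1, 1, 1],  # reliable only near (1, 1)
    [0, 1, 0, 1, 0, 1],  # mixed
]

x = [1.05, 1.0]     # new flow to classify
votes = [0, 1, 0]   # label each base classifier assigns to x
weights = local_weights(x, val_X, val_y, val_preds)
print(weights, weighted_vote(votes, weights))
```

On this toy data a plain majority vote would label the flow 0, while the locally weighted vote returns 1, because the only classifier that is accurate in the neighbourhood of `x` receives the largest weight. In a fuller sketch the three voters would be naive Bayes, C4.5, and SVM models trained on the rotated training sets.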