Malware detection at the network infrastructure level remains an open research problem, given the evolution of malware and the high accuracy needed to detect these threats. Content-based classification techniques have been shown to detect malware without matching against malware signatures. However, the performance of these techniques depends on the training samples observed. In this paper, a new detection method that incorporates Snort malware signatures into Naive Bayes model training is proposed. Through experimental work, we show that the proposed method yields a small feature search space for effective detection at the packet level. This paper also demonstrates the viability of detecting malware at the stateless level (using packets) as well as at the stateful level (using TCP byte streams). The results show that it is feasible to detect malware at the stateless level with accuracy similar to that at the stateful level, thus requiring minimal resources for implementation on middleboxes. Stateless detection can better protect end users by detecting malware on middleboxes without reconstructing stateful sessions and before the malware reaches the end users.
1. Introduction
Content-based malware detection can be performed by an antivirus solution at the user’s end station. This requires the code (carried as packet payloads) to be fully reconstructed into files at the end station before detection. Even if the antivirus could detect partial code, the code has already reached the end station. Detection at this level has its limitations because it requires complete observability, that is, reassembly and stateful detection on the Internet byte streams [1]. As network speeds increase, reassembly inside network nodes, even at network boundaries, demands increasing computational resources [2].
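The distinction between stateless (per-packet) and stateful (reassembled byte stream) classification can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the method evaluated in this paper: it assumes a Bernoulli Naive Bayes event model, 4-byte n-grams as features, and a vocabulary taken from a single made-up content string (`SIGNATURE`) standing in for n-grams derived from Snort signatures. The names `BernoulliNB`, `detect_stateless`, and `detect_stateful` are hypothetical.

```python
import math


def ngrams(payload: bytes, n: int = 4) -> set:
    """Return the set of overlapping n-byte windows in a payload."""
    return {payload[i:i + n] for i in range(len(payload) - n + 1)}


class BernoulliNB:
    """Toy Bernoulli Naive Bayes over a fixed n-gram vocabulary (assumed model)."""

    def __init__(self, vocabulary, n=4):
        self.vocab = sorted(set(vocabulary))
        self.n = n
        self.n_docs = {}   # label -> number of training samples
        self.present = {}  # label -> {feature: samples containing it}

    def fit(self, samples):
        for payload, label in samples:
            seen = ngrams(payload, self.n)
            self.n_docs[label] = self.n_docs.get(label, 0) + 1
            counts = self.present.setdefault(label, dict.fromkeys(self.vocab, 0))
            for f in self.vocab:
                if f in seen:
                    counts[f] += 1
        return self

    def predict(self, payload):
        seen = ngrams(payload, self.n)
        total = sum(self.n_docs.values())
        scores = {}
        for label, docs in self.n_docs.items():
            score = math.log(docs / total)  # log prior
            for f in self.vocab:
                # Laplace-smoothed probability that feature f is present
                p = (self.present[label][f] + 1) / (docs + 2)
                score += math.log(p if f in seen else 1.0 - p)
            scores[label] = score
        return max(scores, key=scores.get)


def detect_stateless(model, packets):
    """Classify each packet on its own; nothing is buffered across packets."""
    return any(model.predict(p) == "malware" for p in packets)


def detect_stateful(model, packets):
    """Reassemble the byte stream first, then classify it once."""
    return model.predict(b"".join(packets)) == "malware"


# Hypothetical content string; not taken from any real Snort rule.
SIGNATURE = b"EVILCODE"
model = BernoulliNB(ngrams(SIGNATURE)).fit([
    (b"GET /index.html HTTP/1.1", "benign"),
    (b"POST /login HTTP/1.1", "benign"),
    (b"\x90\x90EVILCODE\x90\x90", "malware"),
    (b"AAAAEVILCODEBBBB", "malware"),
])
packets = [b"GET /a HTTP/1.1\r\n", b"xxEVILCODExx"]
```

In this toy setting both `detect_stateless(model, packets)` and `detect_stateful(model, packets)` flag the flow, but only the stateful path has to hold every payload in memory before it can decide. A feature that straddles a packet boundary is the case in which the two views can differ.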
Therefore, stateless detection is a better alternative: it detects malware while the code is in transit between the source (router or gateway) and the destination, relaxing stateful requirements such as packet buffering and reassembly. This provides early detection and the possibility of not having to reconstruct the whole flow for malware detection. The use of intrusion detection systems (IDS) [3] and intrusion prevention systems (IPS) [4] is a popular retrofit strategy to compensate for the limitations of malware detection at end stations. However, the evolution of modern malware makes these signature-based methods ineffective in detecting fast-spreading, sophisticated malware (e.g.,
References
[1] G. Varghese, J. A. Fingerhut, and F. Bonomi, “Detecting evasion attacks at high speeds without reassembly,” in Proceedings of the SIGCOMM Conference, pp. 327–338, Pisa, Italy, 2006.
[2] E. P. Markatos, “Speeding up TCP/IP: faster processors are not enough,” in Proceedings of the 21st IEEE International Performance, Computing, and Communications Conference (IPCCC '02), pp. 341–345, Phoenix, Ariz, USA, April 2002.
[3] P. Inella, An Introduction to Intrusion IDS, 2001, http://www.securityfocus.com/.
[4] N. Desai, Intrusion Prevention Systems: the Next Step in the Evolution of IDS, 2003, http://www.securityfocus.com/.
[5] J. Zico Kolter and M. A. Maloof, “Learning to detect and classify malicious executables in the wild,” Journal of Machine Learning Research, vol. 7, pp. 2721–2744, 2006.
[6] R. Moskovitch, D. Stopel, C. Feher, N. Nissim, and Y. Elovici, “Unknown malcode detection via text categorization and the imbalance problem,” in Proceedings of the IEEE International Conference on Intelligence and Security Informatics, pp. 156–161, Taiwan, June 2008.
[7] M. Roesch, Snort, 2001, http://www.snort.org/.
[8] T. H. Ptacek and T. N. Newsham, “Insertion, evasion, and denial of service: eluding network intrusion detection,” Tech. Rep. T2R-0Y6, Calgary, Canada, 1998.
[9] M. Z. Shafiq, S. A. Khayam, and M. Farooq, “Improving accuracy of immune-inspired malware detectors by using intelligent features,” in Proceedings of the 10th Annual Genetic and Evolutionary Computation Conference (GECCO '08), pp. 119–126, Atlanta, Ga, USA, July 2008.
[10] C. Sarkar, Connection Establishment in TCP Three Way Handshaking, M. Tech—I, CSE IIT Bombay, 2009.
[11] T. Abou-Assaleh, N. Cercone, V. Keselj, and R. Sweidan, “Detection of new malicious code using N-grams signatures,” in Proceedings of the 2nd Annual Conference on Privacy, Security and Trust, pp. 193–196, Fredericton, NB, Canada.
[12] Y. Yang and J. A. Pedersen, “Comparative study on feature selection in text categorization,” in Proceedings of the 14th International Conference on Machine Learning, pp. 412–420.
[13] I. Ismail, M. N. Marsono, and S. M. Nor, “Detecting worms using data mining techniques: learning in the presence of class noise,” in Proceedings of the 6th International Conference on Signal Image Technology and Internet Based Systems (SITIS '10), pp. 187–194, Kuala Lumpur, Malaysia, December 2010.
[14] A. McCalum and K. A. Nigam, “Comparison of event models for naive bayes text classification,” in Proceedings of the 15th National Conference on Artificial Intelligence (AAAI '98), pp. 41–48, Madison, Wis, USA, 1998.
[15] L. M. Garcia, Tcpdump and Libpcap, 2010, http://www.tcpdump.org/.
[16] L. Zeltser, “Understanding Anti-Virus Software,” The Monthly Security Awareness Newsletter for Computer Users, The SANS Institute, 2011.
[17] P. Simonea, “The OSI Model: understanding the seven layers of computer networks,” Expert Reference Series of White Papers, Global Knowledge, 2006.
[18] C. Fosnock, “Computer worms: past, present and future,” CISSP, MCSE, CNE East Carolina University, 2005.