Survey on Spam Filtering Techniques

DOI: 10.4236/cn.2011.33019, PP. 153-160

Keywords: E-mail Spam, Unsolicited Bulk Messages, Filtering, Traditional Methods, Learning-Based Methods, Classification

Abstract:

In recent years, spam has become a major problem for the Internet and electronic communication, and many techniques have been developed to fight it. This paper gives an overview of existing e-mail spam filtering methods. It classifies, evaluates, and compares traditional and learning-based methods. Several personal anti-spam products are tested and compared. The statement of a new approach to spam filtering is also considered.
