Spam emails pose a threat to individuals. Their daily proliferation has rendered traditional machine learning and deep learning screening methods ineffective and inefficient. In our research, we employ deep neural networks such as RNN, LSTM, and GRU, incorporating attention mechanisms such as Bahdanau, scaled dot product (SDP), and Luong scaled dot product self-attention, for spam email filtering. We evaluate our approach on several datasets: TREC spam, Enron spam emails, the SMS spam collection, and Ling spam, as well as a substantial custom dataset. All these datasets are publicly available. On the Enron dataset, we attain an accuracy of 99.97% using LSTM with SDP self-attention. On our custom dataset, the highest accuracy, 99.01%, is obtained with GRU and SDP self-attention. The SMS spam collection dataset yields a peak accuracy of 99.61% with LSTM and SDP attention. On the Ling spam dataset, GRU (gated recurrent unit) alongside the Luong and SDP attention mechanisms reaches a peak accuracy of 99.89%. For the TREC spam dataset, the most accurate results are achieved using LSTM with Luong attention, with an accuracy of 99.01%. Our performance analyses consistently indicate that employing the scaled dot product attention mechanism in conjunction with gated recurrent neural networks (GRU) delivers the most effective results. In summary, our research underscores the efficacy of advanced deep learning techniques and attention mechanisms for spam email filtering, with high accuracy across multiple datasets. This approach presents a promising solution to the ever-growing problem of spam emails.
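To make the central mechanism concrete, the following is a minimal sketch of scaled dot product (SDP) self-attention applied to a sequence of recurrent hidden states. It is an illustration under stated assumptions, not the implementation used in this work: the function names `softmax` and `sdp_self_attention` are hypothetical, the hidden states are toy values standing in for LSTM or GRU outputs, and the learned query, key, and value projections are omitted so that each hidden state serves as its own query, key, and value.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def sdp_self_attention(hidden_states):
    """Scaled dot product self-attention with queries = keys = values.

    hidden_states: list of T vectors of length d, standing in for the
    per-token outputs of an LSTM or GRU layer (assumption of this sketch:
    no learned projection matrices are applied).
    Returns (context_vectors, attention_weights).
    """
    d = len(hidden_states[0])
    scale = math.sqrt(d)

    # One row of attention weights per query position.
    weights = []
    for query in hidden_states:
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in hidden_states
        ]
        weights.append(softmax(scores))

    # Each context vector is the attention-weighted sum of all hidden states.
    context = [
        [sum(w * value[j] for w, value in zip(row, hidden_states)) for j in range(d)]
        for row in weights
    ]
    return context, weights


# Toy hidden states for a three-token message (illustrative values only).
toy_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = sdp_self_attention(toy_states)
```

In a full classifier, the context vectors would typically be pooled and passed to a dense output layer that predicts spam or not spam; that part is left out here to keep the sketch focused on the attention step.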