This study compares machine learning models for threat detection in Internet of Things (IoT) devices using the CICIoT2023 dataset. We evaluate Logistic Regression, K-Nearest Neighbors, and Random Forest at three classification granularities: binary (benign vs. attack), multi-class (8 categories), and fine-grained (34 subtypes). The preprocessing pipeline comprises feature engineering, variance thresholding, correlation filtering, and dimensionality reduction. We assess accuracy, precision, recall, and F1-score, and we test scalability by training models on a small sample of the data and evaluating them on larger samples. Random Forest achieves the best performance on every classification task (binary: F1 = 0.710; 8-class: F1 = 0.629; 34-class: F1 = 0.590). All models degrade as classification granularity increases, and BruteForce and Web attacks are particularly difficult to detect. Feature importance analysis identifies protocol-specific characteristics and TCP flag information as key features for attack identification. In the scalability tests, performance declines markedly when models trained on a 0.1% sample are applied to larger samples (0.5%, 1%), although Random Forest generalizes better than the other models. An unsupervised autoencoder for anomaly detection reaches an accuracy of 0.881 but a recall of only 0.070. These findings highlight the trade-off between detection granularity and accuracy in IoT security and suggest hierarchical classification as an option for resource-constrained environments. The results can guide the selection of machine learning techniques for real-world IoT security applications.
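The four reported metrics can diverge sharply on imbalanced data, which is one way a detector can pair high accuracy with very low recall. The following is a minimal sketch of how these metrics are computed from predictions; the function names `binary_metrics` and `macro_f1` and the label counts are hypothetical illustrations, not the study's code or data, and the abstract does not state which averaging scheme was used for the multi-class F1 scores.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1, treating `positive` as the target class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (one common multi-class average)."""
    labels = set(y_true) | set(y_pred)
    scores = [binary_metrics(y_true, y_pred, positive=label)["f1"]
              for label in labels]
    return sum(scores) / len(scores)


# Made-up counts (NOT the study's data): 100 anomalies among 1000 samples.
# The detector catches 7 anomalies, misses 93, and raises 10 false alarms.
y_true = [1] * 100 + [0] * 900
y_pred = [1] * 7 + [0] * 93 + [1] * 10 + [0] * 890

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, accuracy is 0.897 while recall is 0.07: the many correctly classified majority-class samples dominate accuracy, so recall and F1 are the more informative figures when the class of interest is rare.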