A Log Sequence Anomaly Detection Method Based on GRU-TextCNN
Abstract:
System logs record a system's operating status and events in detail, so maintenance personnel often analyze logs to assess system state and determine whether anomalies have occurred. As the volume of log data produced by modern systems has grown sharply, traditional log anomaly detection methods are no longer adequate. This paper proposes a deep-learning-based log anomaly detection method built on GRU-TextCNN. The method first preprocesses raw logs into log statements, then uses the SBERT model to convert each log statement into a sentence vector, then extracts log sequences with a sliding window, and finally applies the proposed GRU-TextCNN log sequence anomaly detection model to each sequence. Experiments on two datasets show that the method effectively detects log sequence anomalies.
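The sliding-window step in this pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sliding_windows`, the `window_size` and `step` parameters, and the rule that a window is labeled anomalous when any log line inside it is anomalous are all assumptions made for illustration. The short float lists stand in for the sentence vectors that SBERT would produce.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sliding_windows(
    vectors: Sequence[Vector],
    labels: Sequence[int],
    window_size: int,
    step: int = 1,
) -> List[Tuple[List[Vector], int]]:
    """Cut a stream of per-line sentence vectors into fixed-length log sequences.

    Assumption (not stated in the abstract): a sequence is labeled 1
    (anomalous) if at least one log line inside the window is labeled 1.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    if len(vectors) != len(labels):
        raise ValueError("vectors and labels must have the same length")

    windows: List[Tuple[List[Vector], int]] = []
    # Slide over the stream; a trailing partial window is dropped.
    for start in range(0, len(vectors) - window_size + 1, step):
        end = start + window_size
        sequence = list(vectors[start:end])
        sequence_label = int(any(labels[start:end]))
        windows.append((sequence, sequence_label))
    return windows


# Placeholder 2-dimensional vectors; real SBERT vectors are much larger.
demo_vectors = [[float(i), float(i) + 0.5] for i in range(6)]
demo_labels = [0, 0, 0, 1, 0, 0]  # the fourth log line is anomalous

demo_windows = sliding_windows(demo_vectors, demo_labels, window_size=3, step=1)
demo_sequence_labels = [label for _, label in demo_windows]
```

Each resulting sequence is a list of `window_size` vectors, which is the shape a sequence model such as a GRU followed by convolutional filters would take as input. A larger `step` reduces the overlap between neighboring sequences and the number of training samples.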