A System Log Parsing Algorithm Based on Cluster Algorithm

DOI: 10.12677/CSA.2020.101011, pp. 98-111

Keywords: System Log, Log Parsing, Clustering, Anomaly Detection

Abstract:

System logs are an important source for checking the state of a software system, and the runtime status reports and error messages they contain are widely used in system operation and maintenance. As software systems grow ever larger and more complex, large-scale systems commonly apply log analysis and mining techniques to automatically extract key information from system logs, and log data is used in applications such as anomaly detection, root cause analysis, and behavior analysis. Log data is usually unstructured text, so before data mining algorithms can be trained on it, a log parsing algorithm must first convert the raw logs into a structured form. Based on the characteristics and requirements of log analysis and mining techniques, this paper proposes a clustering-based log parsing algorithm that extracts message templates and event sequences from raw log data. It also proposes a fixed-depth tree model for storing message templates, which enables fast structuring of new log messages and detection of anomalous log messages. Experiments on specific system log datasets show that the proposed parsing algorithm achieves high accuracy and generality.
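The abstract names two components: clustering raw messages into message templates (which also yields an event sequence), and a fixed-depth tree for storing templates so new messages can be matched quickly. The paper's exact clustering criterion, tree depth, and anomaly rule are not given here, so the following is only a minimal Python sketch in the style of fixed-depth parse-tree parsers such as Drain, not the authors' algorithm. Every detail is an assumption: digit-based variable masking in `tokenize`, the position-wise `similarity` score, the 0.5 threshold, a tree keyed by token count and then first token, and treating a message that matches no stored template as anomalous. `FixedDepthTree`, `add`, and `match` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

WILDCARD = "<*>"


@dataclass
class Template:
    tid: int            # template (event) id
    tokens: List[str]   # constant tokens plus WILDCARD at variable positions

    def text(self) -> str:
        return " ".join(self.tokens)


def tokenize(message: str) -> List[str]:
    # Assumption: any token containing a digit is a variable (IPs, ports, ids, ...).
    return [WILDCARD if re.search(r"\d", tok) else tok for tok in message.split()]


def similarity(template: List[str], tokens: List[str]) -> float:
    # Share of positions where the template holds a wildcard or the same token.
    same = sum(1 for a, b in zip(template, tokens) if a == WILDCARD or a == b)
    return same / len(tokens)


class FixedDepthTree:
    """Hypothetical fixed-depth tree: root -> token count -> first token -> leaf clusters."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self.root: Dict[int, Dict[str, List[Template]]] = {}
        self.templates: List[Template] = []

    def _best(self, leaf: List[Template], tokens: List[str]) -> Optional[Template]:
        # Most similar cluster in this leaf, kept only if it clears the threshold.
        best, best_sim = None, 0.0
        for tpl in leaf:
            sim = similarity(tpl.tokens, tokens)
            if sim > best_sim:
                best, best_sim = tpl, sim
        return best if best_sim >= self.threshold else None

    def add(self, message: str) -> int:
        """Clustering (training) step for a non-empty message; returns its event id."""
        tokens = tokenize(message)
        leaf = self.root.setdefault(len(tokens), {}).setdefault(tokens[0], [])
        tpl = self._best(leaf, tokens)
        if tpl is None:
            # No similar cluster: the message starts a new template.
            tpl = Template(len(self.templates), tokens)
            leaf.append(tpl)
            self.templates.append(tpl)
        else:
            # Positions where template and message disagree become wildcards.
            tpl.tokens = [a if a == b else WILDCARD for a, b in zip(tpl.tokens, tokens)]
        return tpl.tid

    def match(self, message: str) -> Optional[int]:
        """Read-only lookup: event id, or None if no template fits (treated as anomalous)."""
        tokens = tokenize(message)
        leaf = self.root.get(len(tokens), {}).get(tokens[0], [])
        tpl = self._best(leaf, tokens)
        return None if tpl is None else tpl.tid


logs = [
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.2 closed",
    "User alice logged in",
    "User bob logged in",
]
tree = FixedDepthTree(threshold=0.5)
events = [tree.add(line) for line in logs]   # event sequence
print(events)                                # [0, 0, 1, 1]
for tpl in tree.templates:
    print(tpl.tid, tpl.text())               # "0 Connection from <*> closed", then "1 User <*> logged in"
print(tree.match("User carol logged in"))    # 1
print(tree.match("Disk sda1 failed"))        # None: no template has 3 tokens
```

Calling `add` on every line of a training log produces both the template set and the event sequence (the list of template ids); `match` is the read-only path used for new messages. Because the tree has a fixed number of levels, a lookup goes straight to one leaf and compares the message only against that leaf's clusters rather than scanning every stored template.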

