%0 Journal Article
%T A System Log Parsing Algorithm Based on Cluster Algorithm
%A 霍文君
%J Computer Science and Application
%P 98-111
%@ 2161-881X
%D 2020
%I Hans Publishing
%R 10.12677/CSA.2020.101011
%X System logs are an important source for checking the state of a software system, and the runtime status reports and error messages they contain are widely used in system operation and maintenance. As software systems grow ever larger and more complex, large-scale systems commonly apply log analysis and mining techniques to extract key system information from logs automatically. Log data is used in applications such as anomaly detection, root cause analysis, and behavior analysis. Because log data is usually unstructured text, a log parsing algorithm must first convert the raw logs into structured form before data mining algorithms can be trained on them. Based on the characteristics and requirements of log analysis and mining, this paper proposes a clustering-based log parsing algorithm that extracts message templates and event sequences from raw log data. It also proposes a fixed-depth tree model for storing message templates, which enables fast structuring of new log messages and detection of abnormal log messages. Experiments on specific system log data show that the proposed log parsing algorithm achieves high accuracy and generality.
%K System Log
%K Log Parsing
%K Clustering
%K Anomaly Detection
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=34007