%0 Journal Article
%T Provenance Graph Redundant Structure Compression for APT Attack Investigation
%A 李苍铭
%A 徐志强
%J Computer Science and Application
%P 35-44
%@ 2161-881X
%D 2025
%I Hans Publishing
%R 10.12677/csa.2025.156155
%X Provenance graph analysis of system audit logs has become a primary method for investigating APT attacks. Nodes in a provenance graph represent system entities (processes, files, and network activities), and edges represent dependency relationships between those entities. Attack investigation traces the origin of an attack on the provenance graph and reconstructs the complete attack path. Dependency explosion, however, makes provenance graphs extremely large, which imposes significant storage and time overhead on attack investigation. To address this problem, this paper proposes process repetition pattern compression and file repetition pattern compression to reduce the size of provenance graphs. A process repetition pattern arises when the system invokes the same process at different times to perform the same file read/write tasks; a file repetition pattern arises when multiple files are processed by the same process. Because these patterns represent repeated behavior and contribute no additional valuable information, compressing them does not affect attack investigation. Experiments on six real-world attack datasets (approximately 19.48 million system events) show average compression rates of 56.5% for nodes and 58.0% for edges. Furthermore, attack investigations (using NoDoze and DepComm) were performed on both the original and the compressed provenance graphs, and the results confirm that the proposed compression method does not affect the investigation results.
%K Advanced Persistent Threat
%K System Audit Logs
%K Attack Investigation
%K Graph Compression
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=117095