%0 Journal Article
%T Chinese Characters and Pinyin Mixed Sensitive Word Filtering Algorithm Based on DFA
%A 杨扬
%A 游福成
%J Software Engineering and Applications
%P 1310-1318
%@ 2325-2278
%D 2022
%I Hans Publishing
%R 10.12677/SEA.2022.116134
%X To address sensitive words that are disguised through various forms of interference on today's internet, this paper proposes a filtering algorithm, based on a deterministic finite automaton (DFA), for sensitive words that mix Chinese characters and Pinyin. It targets a class of sensitive words that general system filtering methods struggle to detect, and it improves both the recall and the precision of filtering text that contains them. The proposed algorithm comprises four parts: an expansion algorithm for the mixed Chinese-character and Pinyin sensitive word lexicon, a construction algorithm for the sensitive word tree, a preprocessing algorithm for the text to be checked, and the sensitive word filtering algorithm itself. In experiments, the algorithm achieved a precision of 100% and a recall of roughly 95% to 100%. Its complexity is low, so it meets the needs of practical applications.
%K DFA
%K Construction of Thesaurus
%K Sensitive Word Filtering
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=59190
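
The four parts named in the abstract (lexicon expansion, word tree construction, text preprocessing, filtering) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `PINYIN` table, the placeholder word "坏蛋", the function names, the rule for what counts as an interference symbol, and the longest-match policy are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy pinyin table; a real system would need a full dictionary.
PINYIN = {"坏": "huai", "蛋": "dan"}
END = "\0"  # marker key for an accepting state; never appears in cleaned text


def expand(word):
    """Return every variant with each character kept or replaced by its pinyin."""
    choices = [(ch, PINYIN[ch]) if ch in PINYIN else (ch,) for ch in word]
    return {"".join(parts) for parts in product(*choices)}


def build_trie(words):
    """Build the word tree as nested dicts, one level per character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def preprocess(text):
    """Lowercase and drop non-alphanumeric symbols, remembering original indices."""
    kept = [(ch.lower(), i) for i, ch in enumerate(text) if ch.isalnum()]
    return [c for c, _ in kept], [i for _, i in kept]


def find_matches(text, trie):
    """Return (start, end) spans in the original text, preferring the longest match."""
    chars, index = preprocess(text)
    spans, i = [], 0
    while i < len(chars):
        node, j, last = trie, i, -1
        while j < len(chars) and chars[j] in node:
            node = node[chars[j]]
            j += 1
            if END in node:
                last = j  # remember the longest accepting position so far
        if last != -1:
            spans.append((index[i], index[last - 1] + 1))
            i = last
        else:
            i += 1
    return spans


def mask(text, trie):
    """Replace every matched span, including embedded symbols, with asterisks."""
    out = list(text)
    for start, end in find_matches(text, trie):
        out[start:end] = "*" * (end - start)
    return "".join(out)


trie = build_trie(expand("坏蛋"))
print(mask("你是Huai-蛋吗", trie))  # 你是******吗
```

Expanding the lexicon once up front keeps the scan itself a plain trie walk, at the cost of a lexicon that grows exponentially with the number of characters per word that have a pinyin entry.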