%0 Journal Article
%T Chinese Hidden-Head Poem Sensitive Word Detection Algorithm Based on BiLSTM-CRF [基于BiLSTM-CRF的中文藏头诗敏感词检测算法]
%A 何亚楠
%A 游福成
%J Software Engineering and Applications
%P 915-921
%@ 2325-2278
%D 2023
%I Hans Publishing
%R 10.12677/SEA.2023.126089
%X In the era of digitization and social media, monitoring the content of acrostic poetry, a literary form that combines cultural heritage with modern expression, has become a challenge for internet platform management. Because of its construction, in which the first characters of the lines spell out a specific message when read together, the form can be used to hide sensitive information. On social media and instant messaging platforms in particular, users may use acrostic poems to circumvent sensitive-word filtering mechanisms. This study proposes a sensitive word detection algorithm for acrostic poetry based on a bidirectional long short-term memory network with a conditional random field layer (BiLSTM-CRF). The algorithm first uses word embedding to represent the text as high-dimensional vectors, then uses the BiLSTM model to capture the semantic context of the poem in both the forward and backward directions, along with the dependencies between acrostic words across sentences in the text sequence. Finally, the CRF model outputs the label sequence according to the correlations between labels. We tested the algorithm on datasets of different types of acrostic poetry, and the results show that it identifies sensitive words effectively, with high accuracy and recall. The algorithm is valuable for regulating automatically generated text content, particularly with respect to protecting cultural heritage and complying with internet regulations.
%K Acrostic Poetry
%K Sensitive Word Detection
%K BiLSTM-CRF
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=78028