A Text Zero-Watermarking Algorithm Based on a Semantic Undirected Weighted Graph
Abstract:
With the rise of generative language models, artificial intelligence has brought unprecedented changes to how text is created and disseminated, but the widespread use of these models has also created copyright-protection problems. Based on the semantic features of text, this study proposes a text zero-watermarking algorithm. A semantic-similarity encoding model first encodes each basic unit (granularity) of the text as a high-dimensional vector. The differences in direction among these semantic embedding vectors are then used to construct a semantic feature graph of the text, and correlation analysis of the resulting text features is used to evaluate similarity. Experiments show that the proposed zero-watermarking algorithm performs comparatively well in terms of false-positive rate. In terms of robustness, it resists synonym-rewriting and text-addition attacks well and is moderately robust to text-deletion attacks.
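The abstract does not specify the encoding model, the text unit, or the exact correlation measure. The following is therefore only a minimal sketch of the graph-and-correlation idea under stated assumptions, not the authors' method. Each text unit (e.g. a sentence) is assumed to be already encoded as a vector; toy 3-dimensional vectors stand in for encoder output. Edge weights of the undirected weighted graph are assumed to be pairwise cosine similarities, which depend only on vector direction. The similarity between a registered graph and a suspect graph is assumed to be the Pearson correlation of their edge weights. The functions `build_semantic_graph` and `graph_similarity`, and the vectors `original`, `rewritten`, and `unrelated`, are all hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; depends only on the directions of u and v.
    Assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_semantic_graph(embeddings: List[Vector]) -> Matrix:
    """Weight matrix of an undirected, complete weighted graph.

    Node i is the i-th text unit. The weight of edge (i, j) is the cosine
    similarity of their embeddings (an assumed weighting). The matrix is
    symmetric with a zero diagonal (no self-loops).
    """
    n = len(embeddings)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weights[i][j] = weights[j][i] = cosine(embeddings[i], embeddings[j])
    return weights


def edge_weights(weights: Matrix) -> List[float]:
    """Upper-triangle edge weights in a fixed (i, j) order. In this sketch,
    this vector plays the role of the registered zero-watermark feature."""
    n = len(weights)
    return [weights[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; assumes neither input is constant."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def graph_similarity(registered: Matrix, suspect: Matrix) -> float:
    """Correlation between the edge weights of two semantic graphs."""
    a, b = edge_weights(registered), edge_weights(suspect)
    if len(a) != len(b):
        # Limitation of this sketch: no node alignment for added/deleted units.
        raise ValueError("sketch assumes both texts have the same number of units")
    return pearson(a, b)


# Hypothetical toy embeddings, one per sentence, standing in for encoder output.
original = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rewritten = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.1], [1.0, 1.0, 0.1], [0.1, 0.0, 1.0]]
unrelated = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]

if __name__ == "__main__":
    g = build_semantic_graph(original)
    print(round(graph_similarity(g, build_semantic_graph(rewritten)), 3))
    print(round(graph_similarity(g, build_semantic_graph(unrelated)), 3))
```

In this toy run, the slightly perturbed `rewritten` vectors stand in for a synonym rewrite and keep a correlation close to 1 with the original graph. The `unrelated` vectors produce a negative correlation. Comparing texts with different numbers of units, as happens after addition or deletion attacks, would require a node-alignment step that this sketch omits.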