%0 Journal Article
%T Research on Innovation of Text Summarization Automation Technology Based on Flexible Granularity
%A 涂著刚
%A 李正军
%A 杨敏
%J Computer Science and Application
%P 2546-2554
%@ 2161-881X
%D 2021
%I Hans Publishing
%R 10.12677/CSA.2021.1110258
%X
This paper studies text summarization with sequence-to-sequence models, focusing on the causes of two shortcomings: out-of-vocabulary words are difficult to generate, and relationships between words are learned slowly. Building on the byte pair encoding (BPE) algorithm, it proposes a flexible-granularity byte pair encoding algorithm, FG-BPE. FG-BPE splits whole words into disjoint subword units, alleviates the out-of-vocabulary generation problem by reducing text granularity, and learns relationships between words better through a secondary segmentation of the subword units. Experiments on the Gigaword dataset show that, compared with the original subword segmentation algorithm, FG-BPE improves overall co-occurrence recall for unigrams, bigrams, and the longest common substring.
%K Text Summarization Automation
%K Subwords
%K Byte Pair Encoding
%K Granularity
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=45962
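
The abstract describes splitting whole words into disjoint subword units so that out-of-vocabulary words can still be produced. As a rough illustration, the sketch below implements the original byte pair encoding procedure that FG-BPE is said to build on. It is not FG-BPE itself: this record does not specify the secondary segmentation step, so it is left out. The function names (`learn_bpe`, `merge_pair`, `segment`), the `</w>` end-of-word marker, and the toy word counts are all assumptions made for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_freqs, num_merges):
    """Learn an ordered list of merge rules from a {word: count} mapping."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for left, right in zip(symbols, symbols[1:]):
                pair_counts[(left, right)] += freq
        if not pair_counts:
            break
        # Most frequent pair wins; ties are broken by the pair itself for determinism.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Split a (possibly unseen) word into subword units by replaying the merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Hypothetical toy counts, for illustration only.
    rules = learn_bpe({"ab": 2, "abc": 1}, num_merges=1)
    print(rules)
    print(segment("abd", rules))  # "abd" never appeared in the training counts
```

Because every word can fall back to single characters, a word absent from the training counts is still segmented into known units, which is the property the abstract relies on to ease out-of-vocabulary generation.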