%0 Journal Article %T 面向严格对齐任务的文本自动生成:以招标技术范本为例
Automatic Text Generation for Strictly Aligned Tasks: Taking the Tendering Technical Template as an Example %A 卢爽 %J Computer Science and Application %P 1923-1930 %@ 2161-881X %D 2021 %I Hans Publishing %R 10.12677/CSA.2021.117197 %X
Automatically generated, strictly aligned text is common in everyday practice; aligned bidding documents are one example. Generating such text, however, first requires structured data. This paper designs a model that automatically generates strictly aligned text from historical bidding documents. The method has four steps: data cleaning and extraction of structured key labels (for example, the technical parameters of bidding documents) based on regular-expression matching; clustering of the structured key labels with k-means; deduplication of the structured key labels using the cosine distance between word2vec word vectors; and, finally, prediction of the final template from the structured key labels. The experiment uses 100 technical templates of bidding documents manually labeled by experts as the reference. The proposed algorithm achieves more than 80% overlap with the expert-compiled templates, while also covering parameters more comprehensively and showing high robustness, which meets production requirements.
%K Automatic Generation of Aligned Text %K Key Label Extraction %K Text Deduplication %U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=43937
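The deduplication step named in the abstract (comparing word vectors by cosine distance) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the greedy keep-first strategy, the 0.9 threshold, and the toy three-dimensional vectors standing in for word2vec embeddings are all assumptions made for illustration. The sketch works with cosine similarity, which is one minus the cosine distance.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def deduplicate_labels(labels, vectors, threshold=0.9):
    """Greedy deduplication (an assumed strategy, not taken from the paper).

    A label is kept only if its vector is less similar than `threshold`
    to the vector of every label already kept.
    """
    kept = []
    for label in labels:
        vec = vectors[label]
        if all(cosine_similarity(vec, vectors[k]) < threshold for k in kept):
            kept.append(label)
    return kept


# Toy vectors standing in for word2vec embeddings (hypothetical values).
toy_vectors = {
    "rated power": [0.90, 0.10, 0.00],
    "power rating": [0.88, 0.12, 0.01],
    "screen size": [0.00, 0.20, 0.95],
}

unique_labels = deduplicate_labels(list(toy_vectors), toy_vectors)
print(unique_labels)
```

With these toy vectors, "power rating" is nearly parallel to "rated power" and is dropped, while "screen size" is kept. In a real pipeline the vectors would come from a trained word2vec model, and the threshold would need tuning against labeled data.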