Automatic Text Generation for Strictly Aligned Tasks: Taking the Tendering Technical Template as an Example

DOI: 10.12677/CSA.2021.117197, pp. 1923-1930

Keywords: Automatic Generation of Aligned Text, Key Label Extraction, Text De-Duplication


Abstract:

Automatically generated, strictly aligned text is widely used in everyday practice, for example in bidding documents. Generating such text, however, first requires structured data. This paper designs a model that automatically generates strictly aligned text from historical bidding documents. The method has four steps: data cleaning and extraction of structured key labels (such as the technical parameters in a bidding document) by regular-expression matching; clustering of the structured key labels with k-means; de-duplication of the structured key labels using the cosine distance between word2vec word vectors; and, finally, prediction of the final template from the structured key labels. In the experiment, 100 technical templates of bidding documents manually labeled by experts serve as the reference. The proposed algorithm reaches an overlap of more than 80% with the expert-compiled templates while covering parameters more comprehensively and showing high robustness, which meets production requirements.
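
To make two of the steps named in the abstract concrete, the following is a minimal Python sketch of regex-based key label extraction and cosine-based de-duplication. It is not the authors' implementation: the `PARAM_LINE` pattern, the sample text, the 0.9 threshold, and the function names are all assumptions made for illustration, and the hand-written `TOY_VECTORS` stand in for real word2vec embeddings. The k-means clustering step is left out to keep the sketch short.

```python
import math
import re

# Assumed line format: an optional item number, then "label: value".
# Both ASCII and full-width colons are accepted. This pattern is hypothetical.
PARAM_LINE = re.compile(
    r"^\s*(?:\d+[.)、]\s*)?(?P<label>[^:：]+?)\s*[:：]\s*(?P<value>.+?)\s*$"
)


def extract_labels(text):
    """Return (label, value) pairs for every line shaped like 'label: value'."""
    pairs = []
    for line in text.splitlines():
        match = PARAM_LINE.match(line)
        if match:
            pairs.append((match.group("label"), match.group("value")))
    return pairs


def cosine_similarity(u, v):
    """Cosine similarity of two vectors; cosine distance is 1 minus this value."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def deduplicate(labels, vectors, threshold=0.9):
    """Keep a label only if it is not near-identical to one already kept."""
    kept = []
    for label in labels:
        if all(
            cosine_similarity(vectors[label], vectors[other]) < threshold
            for other in kept
        ):
            kept.append(label)
    return kept


# Toy vectors standing in for word2vec embeddings (illustrative values only).
TOY_VECTORS = {
    "rated voltage": [0.90, 0.10, 0.00],
    "nominal voltage": [0.88, 0.12, 0.02],
    "weight": [0.00, 0.20, 0.95],
}

SAMPLE = """Technical parameters
1. Rated voltage: 220 V
2. Nominal voltage: 220 V
3. Weight: 15 kg
"""

if __name__ == "__main__":
    pairs = extract_labels(SAMPLE)
    labels = [label.lower() for label, _ in pairs]
    print(deduplicate(labels, TOY_VECTORS))
```

With the toy vectors above, the sketch keeps `rated voltage` and `weight` and drops `nominal voltage` as a near-duplicate. In a real pipeline the vectors would come from a trained word2vec model, and the threshold would need tuning against expert-labeled templates.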

