Topic Detection and Tracking Technology Based on MS-Cluster and Prompt-Learning

DOI: 10.12677/CSA.2023.1310190, PP. 1918-1927

Keywords: Topic Detection and Tracking Technology, Prompt-Learning, Few-Shot Learning, Clustering Analysis


Abstract:

Topic detection and tracking technology has advanced alongside information processing and artificial intelligence, but it remains hard to deploy in practice because existing algorithms require large amounts of annotated data and are costly to train. This paper proposes a topic detection and tracking technology based on MS-Cluster and Prompt-Learning: a clustering analysis stage first aggregates documents into preliminary topics, and prompt-learning inference then performs topic compensation on that result to complete the detection and tracking process. The method was tested on a dataset containing 13 topics. It performs well with zero and few labeled samples, and it improves on other mainstream topic detection and tracking technologies in both accuracy and recall.
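The two-stage flow described in the abstract (cluster first, then compensate by inference) can be illustrated with a minimal sketch. The abstract does not specify how MS-Cluster works or what prompts are used, so everything below is an assumption: a plain single-pass threshold clustering over bag-of-words cosine similarity stands in for MS-Cluster, and a hypothetical `keyword_infer` function stands in for prompt-learning inference with a pretrained language model. The names `single_pass_cluster`, `compensate`, `threshold`, and `min_size` are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass_cluster(docs, threshold=0.3):
    """Stage 1 (assumed stand-in for MS-Cluster): join the most similar
    existing cluster if similarity >= threshold, otherwise open a new one."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [doc indices]}
    for i, doc in enumerate(docs):
        vec = vectorize(doc)
        best, best_sim = None, 0.0
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            best["centroid"].update(vec)  # Counter.update adds term counts
        else:
            clusters.append({"centroid": Counter(vec), "members": [i]})
    return clusters


def compensate(docs, clusters, infer_topic, min_size=2):
    """Stage 2: documents left in undersized clusters are re-assigned by
    infer_topic, which returns an index into the kept clusters or None."""
    kept = [{"centroid": c["centroid"], "members": list(c["members"])}
            for c in clusters if len(c["members"]) >= min_size]
    strays = [i for c in clusters if len(c["members"]) < min_size
              for i in c["members"]]
    for i in strays:
        target = infer_topic(docs[i], kept)
        if target is None:
            kept.append({"centroid": vectorize(docs[i]), "members": [i]})
        else:
            kept[target]["members"].append(i)
    return kept


def keyword_infer(doc, clusters):
    """Hypothetical placeholder for prompt-based inference: choose the
    cluster whose vocabulary shares the most words with the document."""
    words = set(doc.lower().split())
    scores = [len(words & set(c["centroid"])) for c in clusters]
    if not scores or max(scores) == 0:
        return None
    return scores.index(max(scores))


docs = [
    "earthquake hits city rescue teams deployed",
    "rescue teams search city after earthquake",
    "team wins football championship final",
    "football final championship celebrations",
    "aftershock felt following earthquake",
]
stage1 = single_pass_cluster(docs, threshold=0.3)
topics = compensate(docs, stage1, keyword_infer)
print([c["members"] for c in stage1])
print([c["members"] for c in topics])
```

In this toy run the last document falls below the similarity threshold and is left as a singleton after stage 1; the compensation stage then folds it into the earthquake topic. In a real system the placeholder would presumably be replaced by a model call that fills a prompt template with the document text, which is where zero-shot and few-shot behavior would come from.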

