The rise of social media platforms has revolutionized communication, enabling the exchange of vast amounts of data as text, audio, images, and video. These platforms have become critical channels for sharing opinions and insights, shaping daily habits, and driving business, political, and economic decisions. Text posts are particularly significant, and natural language processing (NLP) has emerged as a powerful tool for analyzing them. While traditional NLP methods work well on structured, edited media, social media content poses unique challenges because of its informal and diverse nature, which has spurred the development of techniques tailored to unstructured user-generated text. One key application of NLP is the summarization of user comments to manage overwhelming content volumes. Abstractive summarization is especially effective here, generating concise, human-like summaries that offer clear overviews of key themes and sentiments; this improves comprehension and engagement while reducing users' cognitive load. For businesses, summarization provides actionable insight into customer preferences and feedback, enabling faster trend analysis, improved responsiveness, and strategic adaptability. By distilling complex data into manageable insights, summarization plays a vital role in improving user experience and supporting informed decision-making in a data-driven landscape. This paper proposes an implementation framework that fine-tunes and parameterizes Transformer-based large language models to preserve the linguistic and semantic properties of the source content during abstractive summary generation. The system transforms large volumes of data into meaningful summaries, as evidenced by its strong performance on measures of fluency, consistency, readability, and semantic coherence.
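The qualities named above (fluency, consistency, readability, semantic coherence) are typically approximated in practice with automatic overlap metrics such as ROUGE. As a minimal illustrative sketch, not the paper's actual evaluation pipeline, a ROUGE-1 F1 score compares the unigram overlap between a generated summary and a reference:

```python
from collections import Counter

def rouge1_f(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall
    between a candidate summary and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Example: 5 of 6 unigrams match in each direction -> F1 = 5/6
print(round(rouge1_f("the cat sat on the mat", "the cat lay on the mat"), 3))
```

Production evaluations would use an established implementation with stemming and multi-reference support; this sketch only shows the core computation.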