Journal of Data Analysis and Information Processing 2022

Construction of an Automatic Bengali Text Summarizer Using Machine Learning Approaches

DOI: 10.4236/jdaip.2022.101003, PP. 43-57

Busrat Jahan, Mahfuja Khatun, Zinat Ara Zabu, Afranul Hoque, Sayed Uddin Rayhan

Keywords: Natural Language Processing, Formatting, Bangla Text Summarizer, Bengali Language Processing, Word Tagging, Pronoun Replacement, Sentence Ranking


Abstract:

In this study, we chose Python as the programming platform for building an automatic Bengali document summarizer. English has ample tools for processing documents and producing summaries. However, no comparable tool is specifically applicable to Bengali, because Bengali has a great deal of ambiguity and differs from English in its grammar. The language nevertheless holds an important place, since it is spoken by about 26 crore (260 million) people around the world. We therefore adopt a new method to summarize Bengali documents. The proposed system consists of the following stages: pre-processing of the input document, word tagging, pronoun replacement, sentence ranking, and summary generation. Pronoun replacement is used to reduce the incidence of dangling pronouns in the output summary. Sentences are ranked based on sentence frequency, numerical figures, and pronoun replacement. The similarity between two sentences is also checked so that one of them can be excluded, which reduces duplication. We took 3,000 data items from newspaper and book documents as input and learned the words so that they fit the syntax. To evaluate the performance of the summarizer, the system was tested on a range of documents. According to the assessment, the recall, precision, and F-score were 0.70, 0.82, and 0.74, respectively.
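The ranking, duplicate-exclusion, and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The scoring formula (average term frequency plus a fixed bonus for sentences containing digits), the Jaccard similarity measure, the threshold values, and all function names are assumptions made for illustration. Word tagging and pronoun replacement are not modeled here.

```python
import re
from collections import Counter

# Assumption: a token is a run of characters from the Bengali Unicode block
# (U+0980 to U+09FF) or ASCII letters/digits.
TOKEN = re.compile(r"[\u0980-\u09FFA-Za-z0-9]+")
# ASCII digits or Bengali digits (U+09E6 to U+09EF).
DIGIT = re.compile(r"[0-9\u09E6-\u09EF]")


def split_sentences(text):
    """Split on the Bengali danda, '?' or '!' (assumed sentence terminators)."""
    return [s.strip() for s in re.findall(r"[^।?!]+[।?!]?", text) if s.strip()]


def jaccard(a, b):
    """Jaccard similarity between two token lists, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def summarize(text, k=2, numeric_bonus=0.5, max_similarity=0.6):
    """Return up to k high-scoring, mutually dissimilar sentences in document order."""
    sentences = split_sentences(text)
    tokens = [TOKEN.findall(s) for s in sentences]
    freq = Counter(t for toks in tokens for t in toks)

    def score(i):
        toks = tokens[i]
        if not toks:
            return 0.0
        base = sum(freq[t] for t in toks) / len(toks)  # average term frequency
        bonus = numeric_bonus if DIGIT.search(sentences[i]) else 0.0
        return base + bonus

    ranked = sorted(range(len(sentences)), key=score, reverse=True)
    chosen = []
    for i in ranked:
        if len(chosen) == k:
            break
        # Skip a sentence that is too similar to one already selected.
        if all(jaccard(tokens[i], tokens[j]) < max_similarity for j in chosen):
            chosen.append(i)
    return [sentences[i] for i in sorted(chosen)]


def precision_recall_f(system, reference):
    """Sentence-level precision, recall, and F-score against a reference summary."""
    system, reference = set(system), set(reference)
    overlap = len(system & reference)
    p = overlap / len(system) if system else 0.0
    r = overlap / len(reference) if reference else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

Calling `summarize(document_text, k=3)` returns at most three sentences, and `precision_recall_f(system_sentences, reference_sentences)` compares them with a human-written extract. The F-score here is the harmonic mean of precision and recall for a single document; scores averaged over many documents need not satisfy that identity exactly.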
