Multilingual Text Recognition and Assistance for Low-Resource Languages Using Computer Vision

doi:10.4236/oalib.1113574

OALib Journal
ISSN: 2333-9721

Open Access Library Journal, Vol. 12, 2025

DOI: 10.4236/oalib.1113574, PP. 1-20

Franck Senu Binunya, Huabing Zhou

Subject Areas: Information Management, Artificial Intelligence

Keywords: Long Short-Term Memory, Optical Character Recognition, Reinforced Learning

Abstract

This paper presents a novel multilingual optical character recognition (OCR) method for scanned documents. Current open-source solutions such as Tesseract achieve very high accuracy on Latin script. However, accuracy is typically lower on multilingual text that includes Asian characters than on Latin-only text. The logographic Chinese script and the Korean script are harder for OCR because their characters are complex and multi-stroke, and text segmentation is difficult because word boundaries are not marked as they are in Latin-based languages. As a result, OCR performs substantially worse on documents that mix English, Chinese, and Korean than on English-only documents. Complex character structures, missing word boundaries, and large, dense character sets together lead to subpar performance of current OCR systems on multilingual content.

We propose an architecture that addresses these issues with three neural blocks (a segmenter, a switcher, and multiple recognizers) and with reinforcement learning of the segmenter. The segmenter splits each word image into single-character sub-images, which reduces the difficulty of recognizing the multi-stroke characters of Chinese and Korean. Each sub-image is passed to the switcher, which routes it to a specialized recognizer according to character type, improving accuracy. Because the system operates on individual characters, it avoids the need to identify word boundaries in non-Latin scripts. The recognizers are trained with supervised learning, which improves both character recognition and the overall output for multilingual documents.

Supervised learning of the segmenter, however, has two significant problems: it requires a large amount of annotation work, and its objective function is not optimal. With reinforcement learning, the segmenter can instead be trained to minimise the edit distance of the final recognition results, thereby optimising overall performance. Experimental results show that the proposed approach, which does not use character boundary annotations, greatly improves performance on multilingual scripts and on languages with large character sets.
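
To make the training signal concrete, the following is a minimal sketch of an edit-distance-based reward of the kind the abstract describes. It is not the authors' implementation: the function names `edit_distance` and `reward`, the normalisation by ground-truth length, and the example strings are all assumptions made for illustration. The idea is that a candidate segmentation is scored only by how close the final recognized text is to the ground truth, so no character boundary annotations are needed.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def reward(recognized: str, ground_truth: str) -> float:
    """Hypothetical reward: negative edit distance, normalised by target length.

    The normalisation is an assumption for this sketch; the abstract only
    states that the edit distance of the final result is minimised.
    """
    return -edit_distance(recognized, ground_truth) / max(len(ground_truth), 1)


if __name__ == "__main__":
    truth = "OCR 문자 识别"
    # Made-up recognizer outputs for two candidate segmentations of one image.
    candidates = {"seg_a": "OCR 문자 识别", "seg_b": "OCR 문지 识"}
    for name, text in candidates.items():
        print(name, reward(text, truth))
```

In a policy-gradient setup, a reward like this would weight the log-probability of the segmentation that produced each output, so segmentations that lead to better final text become more likely.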

Cite this paper

Binunya, F. S. and Zhou, H. (2025). Multilingual Text Recognition and Assistance for Low-Resource Languages Using Computer Vision. Open Access Library Journal, 12, e3574. doi: http://dx.doi.org/10.4236/oalib.1113574.
