This paper presents a novel multilingual optical character recognition (OCR) method for scanned documents. Current open-source solutions such as Tesseract achieve very high accuracy on Latin text, but accuracy drops on multilingual text that includes Asian characters. Chinese and Korean are particularly challenging because their characters are complex and multi-stroke, their character sets are large and dense, and their text is harder to segment than Latin-based text, where word boundaries are explicit. As a result, current OCR systems perform substantially worse on documents that mix English, Chinese, and Korean than on English-only documents. We propose an architecture that addresses these issues with three kinds of neural blocks: a segmenter, a switcher, and multiple recognizers, together with reinforcement learning of the segmenter. The segmenter splits each word image into single-character sub-images, which reduces the difficulty of recognizing the multi-stroke characters of Chinese and Korean. The switcher then routes each sub-image to a recognizer specialized for its character type, which improves accuracy. Because recognition operates on individual characters, the approach avoids having to identify word boundaries in non-Latin scripts. The recognizers are trained with supervised learning, which improves both character recognition and the overall output for multilingual documents.
Supervised learning of the segmenter, however, has two significant problems: it requires a large amount of annotation work, and its objective function is not optimal. We therefore train the segmenter with reinforcement learning so that it minimises the edit distance of the final recognition results, thereby optimising overall performance. Experimental results indicate that the proposed approach, which does not use character boundary annotations, substantially improves performance for multilingual scripts and for languages with large character sets.
Cite this paper
Binunya, F. S. and Zhou, H. (2025). Multilingual Text Recognition and Assistance for Low-Resource Languages Using Computer Vision. Open Access Library Journal, 12, e3574. doi: http://dx.doi.org/10.4236/oalib.1113574.