Scene Text Recognition Based on Contextual Attention
Abstract:
Recognizing irregular text in natural scenes is an active research topic in computer vision and remains a challenging task. This paper proposes a simple and effective method for recognizing irregular text. The method applies a thin-plate spline (TPS) transformation to rectify irregular text into regular text, extracts text features with a ResNet34 backbone that incorporates a spatial multi-scale perception module, and then encodes the text features into contextual features with a Bi-LSTM. The model is supervised by two modules: a context-aware module and a text feature enhancement module. The context-aware module attends to a new feature space formed from the text features and the contextual features, while the text feature enhancement module focuses on individual characters so that text lines without contextual semantics can still be handled. Compared with other text recognition models, the proposed method substantially improves recognition of irregular text while maintaining its recognition capability on regular text. Extensive experiments on commonly used scene text datasets verify the effectiveness of the model for irregular text recognition.
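The idea of a feature space that combines text features with contextual features can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the two feature sets are combined, so per-step concatenation is an assumption here, and a plain tanh recurrence with random weights stands in for the trained Bi-LSTM. All function names (`make_weights`, `run_recurrence`, `bidirectional_context`, `fuse`) and sizes are hypothetical.

```python
import math
import random


def make_weights(rows, cols, rng):
    """Hypothetical random weights; a trained model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def run_recurrence(sequence, w_in, w_rec):
    """Plain tanh recurrence, a simplified stand-in for one LSTM direction."""
    hidden = [0.0] * len(w_rec)
    outputs = []
    for x in sequence:
        # Each new hidden unit mixes the current input with the previous state.
        hidden = [
            math.tanh(
                sum(w * v for w, v in zip(w_in[i], x))
                + sum(w * h for w, h in zip(w_rec[i], hidden))
            )
            for i in range(len(w_rec))
        ]
        outputs.append(hidden)
    return outputs


def bidirectional_context(text_features, hidden_size, seed=0):
    """Run the recurrence left-to-right and right-to-left, then join per step."""
    rng = random.Random(seed)
    dim = len(text_features[0])
    fwd_in = make_weights(hidden_size, dim, rng)
    fwd_rec = make_weights(hidden_size, hidden_size, rng)
    bwd_in = make_weights(hidden_size, dim, rng)
    bwd_rec = make_weights(hidden_size, hidden_size, rng)
    forward = run_recurrence(text_features, fwd_in, fwd_rec)
    # Reverse the input, run the recurrence, then restore the original order.
    backward = run_recurrence(text_features[::-1], bwd_in, bwd_rec)[::-1]
    return [f + b for f, b in zip(forward, backward)]


def fuse(text_features, context_features):
    """Assumed fusion: concatenate text and contextual features at each step."""
    return [t + c for t, c in zip(text_features, context_features)]


# Toy input: 5 time steps of 4-dimensional text features.
text_features = [[0.1 * (t + 1)] * 4 for t in range(5)]
context_features = bidirectional_context(text_features, hidden_size=3)
fused = fuse(text_features, context_features)
print(len(fused), len(fused[0]))  # 5 10
```

Each fused step holds the original 4 text-feature values followed by 3 forward and 3 backward context values, so a decoder attending over this sequence would see both the character-level and the sequence-level information at every position.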