Research on Printed Mathematical Formula Recognition Based on SVM
Abstract:
Traditional mathematical formula recognition is usually built on OCR-based recognition of text in images. The target formula is segmented into individual symbols, a database of mathematical symbols is constructed, each segmented symbol is compared for similarity against every database entry, and the name of the most similar symbol is returned as the recognition result. This approach places very high demands on the symbol database, and because real formulas vary in font size, weight (regular or bold), slant (upright or italic), and typeface, its recognition performance is poor. Based on the characteristics of printed mathematical formulas, this paper rebuilds the standard character library, applies the SVM algorithm within a machine learning framework to recognize formulas, and further extracts character features to improve recognition accuracy. Experimental results show good recognition performance.
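The abstract contrasts two pipelines: nearest-template matching, which returns the symbol with maximum similarity, and an SVM classifier trained on extracted character features. The Python sketch below places the two side by side on segmented symbols. The paper does not state its features, kernel, or solver, so every concrete choice here is an assumption for illustration only: the hypothetical 4x4 glyphs, the 2x2 zoning ink-density features, cosine similarity for template matching, and a one-vs-rest linear SVM trained with a Pegasos-style stochastic subgradient method. It is not the authors' implementation.

```python
import random

# Hypothetical 4x4 binary glyphs standing in for segmented symbols (1 = ink).
GLYPHS = {
    "-": [[0, 0, 0, 0],
          [1, 1, 1, 1],
          [0, 0, 0, 0],
          [0, 0, 0, 0]],
    "|": [[0, 1, 0, 0],
          [0, 1, 0, 0],
          [0, 1, 0, 0],
          [0, 1, 0, 0]],
    "+": [[0, 1, 0, 0],
          [1, 1, 1, 1],
          [0, 1, 0, 0],
          [0, 1, 0, 0]],
}


def zoning_features(glyph, zones=2):
    """Assumed feature: split the glyph into zones x zones cells (row-major order)
    and return each cell's ink density. Assumes the glyph is at least zones x zones."""
    h, w = len(glyph), len(glyph[0])
    feats = []
    for zi in range(zones):
        for zj in range(zones):
            cells = [glyph[r][c]
                     for r in range(zi * h // zones, (zi + 1) * h // zones)
                     for c in range(zj * w // zones, (zj + 1) * w // zones)]
            feats.append(sum(cells) / len(cells))
    return feats


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return dot / norm if norm else 0.0


def match_template(feat, library):
    """Traditional baseline: compare with every stored template, return the most similar name."""
    return max(library, key=lambda name: cosine(feat, library[name]))


def train_linear_svm(X, y, lam=0.01, epochs=2000, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss.
    Labels y must be +1/-1; a constant 1.0 feature is appended to act as the bias."""
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    order = list(range(len(Xb)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size 1/(lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # step on (lam/2)||w||^2
            if margin < 1.0:  # hinge loss active: move toward the violated sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


class OneVsRestSVM:
    """One binary linear SVM per symbol class; predict the class with the highest score."""

    def fit(self, X, labels):
        self.weights = {c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels])
                        for c in sorted(set(labels))}
        return self

    def predict(self, x):
        xb = list(x) + [1.0]
        scores = {c: sum(wj * xj for wj, xj in zip(w, xb)) for c, w in self.weights.items()}
        return max(scores, key=scores.get)


if __name__ == "__main__":
    names = list(GLYPHS)
    X = [zoning_features(GLYPHS[n]) for n in names]
    library = dict(zip(names, X))  # template library: one feature vector per symbol
    svm = OneVsRestSVM().fit(X, names)
    noisy_plus = [row[:] for row in GLYPHS["+"]]
    noisy_plus[3][3] = 1  # a stray ink pixel in the bottom-right zone
    q = zoning_features(noisy_plus)
    print("template:", match_template(q, library), "svm:", svm.predict(q))
```

The template baseline depends entirely on how closely a stored template resembles the input, which is why the abstract points to its sensitivity to font size, weight, slant, and typeface. The SVM instead learns per-class weights from training samples, so a realistic setup would train it on many rendered variants of each symbol rather than the single glyph per class used in this toy example.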