|
A novel framework for Farsi and latin script identification and Farsi handwritten digit recognitionDOI: 10.2298/jac1001017b Keywords: curvature scale space , script identification , optical character recognition , CBWSVM , clustering , Handwritten digit recognition , PCA , PCA-LDA Abstract: Optical character recognition is an important task for converting handwritten and printed documents to digital format. In multilingual systems, a necessary process before OCR algorithm is script identification. In this paper novel methods for the script language identification and the recognition of Farsi handwritten digits are proposed. Our method for script identification is based on curvature scale space features. The proposed features are rotation and scale invariant and can be used to identify scripts with different fonts. We assumed that the bilingual scripts may have Farsi and English words and characters together; therefore the algorithm is designed to be able to recognize scripts in the connected components level. The output of the recognition is then generalized to word, line and page levels. We used cluster based weighted support vector machine for the classification and recognition of Farsi handwritten digits that is reasonably robust against rotation and scaling. The algorithm extracts the required features using principle component analysis (PCA) and linear discrimination analysis (LDA) algorithms. The extracted features are then classified using a new classification algorithm called cluster based weighted SVM (CBWSVM). The experimental results showed the promise of the algorithms.
|