In handwritten character recognition, benchmark databases play an important role in evaluating the performance of different algorithms and in comparing the results obtained by different researchers. For the Devnagari script, there is a lack of such an official benchmark. This paper focuses on the generation of an offline benchmark database for Devnagari handwritten numerals and characters. The present work collected 5137 isolated samples for the numeral database and 20305 isolated samples for the character database from 750 writers of varying age, sex, education, and profession. The offline sample images are stored in TIFF format because it occupies less memory, and the data are stored as binary (two-level) images so that the memory requirement is further reduced. The database will facilitate research on handwriting recognition of the Devnagari script by being freely accessible to researchers.

1. Introduction

With advances in computational power, machine simulation of human reading has become a topic of serious research. Optical character recognition (OCR) and document processing have become a necessity with the popularization of desktop publishing and the growing use of the internet. OCR involves the recognition of characters from digitized images of optically scanned document pages. The recognized characters are encoded in the American Standard Code for Information Interchange (ASCII) or another standard code such as Unicode and stored in a file, which can then be edited like any file created with word processing software. A great deal of research has been done in developed countries on English, European, and Chinese languages, but there is a dearth of such research on Indian languages and a need to carry it out. One common problem in this research is the need for a benchmark database. To facilitate the comparison of results on uniform data sets, several document processing research groups have collected large numeral and character databases and made them available to fellow researchers around the world.
However, such databases are available for only a few languages, such as English, Japanese, and Chinese. The standard databases for English include MNIST, CEDAR, and CENPARMI. Some work has also been done for Indic scripts such as Bangla, Kannada, and Devnagari [5–8]. India is a multilingual and multiscript country with a population of more than 1.2 billion, 22 constitutional languages, and 10 different scripts. Devnagari is the most popular script in India. Hindi, an official language of India spoken by more than 500 million people worldwide, is written in the Devnagari script. Moreover, Hindi is the
T. Saito, H. Yamada, and K. Yamamoto, “On the database ELT9 of hand printed characters in JIS Chinese characters and its analysis,” Transactions of the Institute of Electronics and Communication Engineers of Japan, vol. J.68-D, no. 4, pp. 757–764, 1985 (Japanese).
U. Bhattacharya and B. B. Chaudhuri, “Handwritten numeral databases of Indian scripts and multistage recognition of mixed numerals,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 3, pp. 444–457, 2009.
U. Bhattacharya and B. B. Chaudhuri, “Databases for research on recognition of handwritten characters of Indian scripts,” in Proceedings of the 8th International Conference on Document Analysis and Recognition (ICDAR '05), pp. 789–793, September 2005.
R. Sarkar, N. Das, S. Basu, M. Kundu, M. Nasipuri, and D. K. Basu, “CMATERdb1: a database of unconstrained handwritten Bangla and Bangla-English mixed script document image,” International Journal on Document Analysis and Recognition, vol. 15, no. 1, pp. 71–83, 2012.
M. P. Kumar, S. R. Kiran, A. Nayani, C. V. Jawahar, and P. J. Narayanan, “Tools for developing OCRs for Indian scripts,” in Proceedings of the Computer Vision and Pattern Recognition Workshop (CVPRW '03), pp. 33–38, 2003.
A. Alaei, P. Nagabhushan, and U. Pal, “Benchmark Kannada handwritten document database and its segmentation,” in Proceedings of the International Conference on Document Analysis and Recognition (ICDAR '11), pp. 141–145, 2011.
U. Bhattacharya and B. B. Chaudhuri, “A majority voting scheme for multi-resolution recognition of hand printed numerals,” in Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR '03), 2003.
C. V. Jawahar, J. P. Pavan Kumar, and S. S. Ravi Kiran, “A bilingual OCR for Hindi-Telugu documents and its applications,” in Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR '03), pp. 1–7, 2003.
R. J. Ramteke, P. D. Borkar, and S. C. Mehrotra, “Recognition of isolated Marathi handwritten numerals: an invariant moments approach,” in Proceedings of the International Conference on Cognition and Recognition, pp. 482–489, 2005.
T. K. Bhowmik, S. K. Parui, and U. Roy, “Discriminative HMM training with GA for handwritten word recognition,” in Proceedings of the 19th International Conference on Pattern Recognition (ICPR '08), IEEE, Tampa, Fla, USA, December 2008.
B. V. Dhandra, R. G. Benne, and M. Hangarge, “Kannada, Telugu and Devnagari handwritten numeral recognition with probabilistic neural network: a novel approach,” International Journal of Computer Applications, special issue on Recent Trends in Image Processing and Pattern Recognition (RTIPPR), pp. 83–88, 2010.
B. Singh, A. Mittal, and D. Ghosh, “An evaluation of different feature extractors and classifiers for offline handwritten Devnagari character recognition,” Journal of Pattern Recognition Research, vol. 2, pp. 269–277, 2011.
V. J. Dongre and V. H. Mankar, “Devnagari document segmentation using histogram approach,” International Journal of Computer Science, Engineering and Information Technology, vol. 1, no. 3, pp. 46–53, 2011.