%0 Journal Article
%T Development of Comprehensive Devnagari Numeral and Character Database for Offline Handwritten Character Recognition
%A Vikas J. Dongre
%A Vijay H. Mankar
%J Applied Computational Intelligence and Soft Computing
%D 2012
%I Hindawi Publishing Corporation
%R 10.1155/2012/871834
%X In handwritten character recognition, a benchmark database plays an important role in evaluating the performance of various algorithms and in comparing the results obtained by different researchers. For the Devnagari script, there is a lack of such an official benchmark. This paper focuses on the generation of an offline benchmark database for Devnagari handwritten numerals and characters. The present work generated 5137 and 20305 isolated samples for the numeral and character databases, respectively, from 750 writers of varying age, sex, education, and profession. The offline sample images are stored in TIFF image format, as it occupies less memory. The data is also presented at binary level so that the memory requirement is further reduced. The database will facilitate research on handwriting recognition of Devnagari script by being freely accessible to researchers.

1. Introduction

With advances in computational power, machine simulation of human reading has become a topic of serious research. Optical character recognition (OCR) and document processing have become a necessity with the popularization of desktop publishing and the use of the internet. OCR involves recognition of characters from digitized images of optically scanned document pages. The characters recognized from document pages are encoded in the American Standard Code for Information Interchange (ASCII) or another standard code such as Unicode and stored in a file, which can then be edited like any file created with word processing software. Much research has been done in developed countries for English, other European languages, and Chinese, but there is a dearth of research on Indian languages and a need to carry it out.
One common problem in this research is the need for a benchmark database. To allow results to be reported on a uniform data set, several document processing research groups have collected large numeral and character databases and made them available to fellow researchers around the world. However, such databases are available only for a few languages, such as English, Japanese, and Chinese [1]. These standard databases include MNIST, CEDAR [2], and CENPARMI in English. Some work has also been done for Indic scripts such as Bangla [3], Kannada [4], and Devnagari [5–8]. India is a multilingual, multiscript country with a population of more than 1.2 billion, 22 constitutional languages, and 10 different scripts. Devnagari is the most popular script in India. Hindi, the national language of India, which is spoken by more than 500 million people worldwide, is written in the Devnagari script. Moreover, Hindi is the
%U http://www.hindawi.com/journals/acisc/2012/871834/