Journal of Software (软件学报), 2005

Research on Chinese/English Mixed Document Recognition
Original Chinese title: 中英文混合文章识别问题

Keywords: systems design, language discrimination, character segmentation, multilingual OCR (optical character recognition) system, document image processing


Abstract:

Currently, OCR (optical character recognition) classifiers are generally designed for a single character set (or language). Meanwhile, the number of multilingual documents is increasing drastically due to globalization, so designing a document processing system with multilingual capability is very important. A general scheme is presented in this paper, comprising two OCR techniques, one system, and one language classification stage. To embody the scheme, a Chinese/English mixed document processing system is implemented. Three key problems are considered: the control of the system flow, the classification of Chinese/English regions, and the segmentation of English characters. Unlike earlier systems presented in other papers, this system adds a module for classifying Chinese/English regions, and a novel equidistance-based approach is applied in that module. To verify the effectiveness of the system, a second system is implemented for comparison according to the methods presented in those papers. Experiments show that the new system is more effective than the old one: the recognition rate increases from 98.48% to 99.13% on magazine samples and from 98.68% to 99.25% on book samples.
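The reported gains are easier to judge as a reduction in error rate than as a difference in recognition rate. The short sketch below works through that arithmetic using only the four figures quoted in the abstract; the function name `relative_error_reduction` is a hypothetical helper introduced here for illustration and does not come from the paper.

```python
def relative_error_reduction(old_rate: float, new_rate: float) -> float:
    """Percentage reduction in error rate, given recognition rates in percent."""
    old_error = 100.0 - old_rate  # error rate of the old system
    new_error = 100.0 - new_rate  # error rate of the new system
    return 100.0 * (old_error - new_error) / old_error


# Recognition rates (percent) as quoted in the abstract: (old system, new system).
reported = {"magazine": (98.48, 99.13), "book": (98.68, 99.25)}

for sample, (old, new) in reported.items():
    print(f"{sample}: {relative_error_reduction(old, new):.1f}% fewer errors")
```

Under this reading, the error rate drops from 1.52% to 0.87% on magazine samples and from 1.32% to 0.75% on book samples, which is roughly a 43% relative reduction in both cases.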
