%0 Journal Article
%T Research on Chinese/English Mixed Document Recognition
%A WANG Kai (王恺)
%A WANG Qing-Ren (王庆人)
%J Journal of Software (软件学报)
%D 2005
%X Current OCR (optical character recognition) classifiers are generally designed for a single character set (or language). However, the number of multilingual documents is growing rapidly with globalization, so designing a document processing system with multilingual capability is very important. This paper presents a general scheme comprising two OCR techniques, a system, and a language classification step. To realize the scheme, a Chinese/English mixed document processing system is implemented. Three key problems are considered: control of the system flow, classification of Chinese/English regions, and segmentation of English characters. Compared with earlier systems presented in other papers, the system adds a module for classifying Chinese/English regions, and this module uses a novel approach based on equidistance. To verify the effectiveness of the system, a second system was implemented according to the methods presented in other papers. Experiments show that the new system is more effective than the old one: the recognition rate increases from 98.48% to 99.13% on magazine samples and from 98.68% to 99.25% on book samples.
%K systems design
%K language discrimination
%K character segmentation
%K multilingual OCR (optical character recognition) system
%K document image processing
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=7735F413D429542E610B3D6AC0D5EC59&aid=BC50E2826CD4BC28&yid=2DD7160C83D0ACED&vid=7801E6FC5AE9020C&iid=94C357A881DFC066&sid=CFBDB06850C21CC6&eid=B3645A659773B73C&journal_id=1000-9825&journal_name=软件学报&referenced_num=9&reference_num=16