%0 Journal Article
%T Character model optimization for segmentation-free Uyghur text line recognition
%A 姜志威
%A 丁晓青
%A 彭良瑞
%J 清华大学学报（自然科学版）
%D 2015
%X Segmentation-free text line recognition based on the hidden Markov model (HMM) uses a probabilistic graphical formulation to segment and recognize a text line image simultaneously, which avoids recognition errors caused by failed character pre-segmentation. However, the approach places high demands on character model design and training, and model generalization is difficult to improve when multiple fonts are combined. By analyzing what the model states mean as clusters at the image level, this paper first proposes a model structure optimization method based on reasonable clustering of observations, then proposes a character model optimization strategy that combines structure and parameters, and finally applies the strategy to a segmentation-free recognition system for multi-font Uyghur text lines. Experimental results show that the method makes state allocation more reasonable and improves both generalization performance and state utilization efficiency in the multi-font setting.
%K information processing
%K character recognition
%K hidden Markov model (HMM)
%K statistical learning
%K Uyghur
%U http://jst.tsinghuajournals.com/CN/Y2015/V55/I8/873#FigureTableTab