%0 Journal Article
%T Character model optimization for segmentation-free Uyghur text line recognition
%A 姜志威
%A 丁晓青
%A 彭良瑞
%J 清华大学学报(自然科学版) [Journal of Tsinghua University (Science and Technology)]
%D 2015
%X Segmentation-free text line recognition based on a hidden Markov model (HMM) uses a probabilistic graphical model to segment and recognize text line images simultaneously. This avoids recognition errors caused by failed character pre-segmentation. However, the approach places high demands on the design and training of the character models, and it is difficult to improve the models' generalization performance on multi-font texts. By analyzing what the model states represent as clusters at the image level, this paper first proposes a model structure optimization method based on reasonable clustering of observations. It then proposes a character model optimization strategy that combines structure and parameter optimization, and applies it to a segmentation-free recognition system for multi-font Uyghur text lines. Experimental results show that the method makes the allocation of model states more reasonable and improves both the generalization performance and the state utilization efficiency of the character models on multi-font texts.
%K information processing
%K character recognition
%K hidden Markov model (HMM)
%K statistical learning
%K Uyghur
%U http://jst.tsinghuajournals.com/CN/Y2015/V55/I8/873#FigureTableTab