Segmentation and Classification of Text Page Images Based on Pattern-List Analysis

DOI: 10.11834/jig.200506145

Keywords: rectangle linked list, pattern linked list, pattern context, page segmentation and classification


Abstract:

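To separate and classify text and pictures in page images with complex layouts (for example, pages in which irregularly shaped picture regions are embedded in the text), a new page segmentation and classification algorithm based on pattern-list analysis is proposed. The algorithm first encloses all black pixels in the image with bounding rectangles and stores these rectangles in a rectangle linked list. It then merges all adjacent rectangles to form patterns. Finally, it classifies each pattern by its statistical features and outputs two classes of image regions: text regions and picture regions. In addition, the few uncertain patterns that surround large picture patterns are reclassified with a context-based classification algorithm. Experimental results show that the algorithm is fast and that it correctly segments and classifies page images with complex layouts.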
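The three stages named in the abstract (bounding rectangles, merging adjacent rectangles into patterns, classification by pattern features) can be illustrated with a minimal Python sketch. The abstract does not give the paper's adjacency rule, statistical features, or thresholds, so everything here is an assumption made for illustration: the function names, the `gap` adjacency distance, and the use of pattern height with a `max_text_height` threshold as the only feature are all hypothetical. The context-based reclassification step is not covered.

```python
from collections import deque

# Rectangles are (top, left, bottom, right) with inclusive pixel coordinates.

def bounding_boxes(img):
    """Bounding boxes of 8-connected black components (img[r][c] == 1 is black)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def near(a, b, gap):
    """True if a and b overlap or their edges differ by at most `gap` on both axes."""
    return (a[1] - gap <= b[3] and b[1] - gap <= a[3]
            and a[0] - gap <= b[2] and b[0] - gap <= a[2])

def merge_into_patterns(boxes, gap):
    """Repeatedly merge nearby rectangles; each result is one pattern's bounding box."""
    patterns = []
    for box in boxes:
        merged = box
        changed = True
        while changed:  # a grown rectangle may now touch other patterns
            changed = False
            for p in patterns:
                if near(merged, p, gap):
                    patterns.remove(p)
                    merged = (min(merged[0], p[0]), min(merged[1], p[1]),
                              max(merged[2], p[2]), max(merged[3], p[3]))
                    changed = True
                    break
        patterns.append(merged)
    return patterns

def classify(pattern, max_text_height):
    """Hypothetical single-feature rule: short patterns are text, tall ones pictures."""
    height = pattern[2] - pattern[0] + 1
    return "text" if height <= max_text_height else "picture"

def segment_page(img, gap, max_text_height):
    """Run the three stages and return (pattern, label) pairs."""
    patterns = merge_into_patterns(bounding_boxes(img), gap)
    return [(p, classify(p, max_text_height)) for p in patterns]
```

Calling `segment_page(img, gap, max_text_height)` on a binary image given as a list of rows returns each pattern's bounding box with a `"text"` or `"picture"` label. A real implementation would likely use several statistical features per pattern rather than height alone.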

