Journal of Image and Graphics (中国图象图形学报), 2004
Page Segmentation and Classification Algorithm for Document Images
Abstract:
This paper proposes a system for segmenting and classifying skewed document images that contain irregular graphic regions and form regions. The skew angle of the document image is detected with a novel algorithm that combines the morphological Hit-or-Miss operation with a hierarchical Hough transform: the Hit-or-Miss operation detects the baseline points, and the Hough transform estimates the skew angle of the baselines, which is also the skew angle of the page image. To handle document images with irregular graphic regions, a midpoint cut step is introduced into the traditional projection profile cut algorithm so that irregular graphic regions can be approximated by many small rectangles. The segmented regions are then classified using two features computed on their sub-blocks: the black-to-white ratio and the cross-correlation between adjacent pixels. Experimental results show that the proposed system is fast and reliable.
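The two classification features named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the definitions below are assumptions: a sub-block is a binary grid with 1 for black and 0 for white, the black-to-white ratio is the count of black pixels divided by the count of white pixels, and the cross-correlation is taken as the normalized (Pearson) correlation between each pixel and its right-hand neighbour. The function names `black_white_ratio` and `adjacent_correlation` are hypothetical and do not come from the paper.

```python
from typing import List

# Assumed representation: a binary sub-block, 1 = black (ink), 0 = white.
Block = List[List[int]]


def black_white_ratio(block: Block) -> float:
    """Number of black pixels divided by number of white pixels."""
    black = sum(sum(row) for row in block)
    total = sum(len(row) for row in block)
    white = total - black
    # Assumed convention: a block with no white pixels gets an infinite ratio.
    return black / white if white else float("inf")


def adjacent_correlation(block: Block) -> float:
    """Normalized correlation between each pixel and its right-hand neighbour.

    This is one plausible reading of "cross correlation between adjacent
    pixels"; the paper may define it differently.
    """
    left = [row[i] for row in block for i in range(len(row) - 1)]
    right = [row[i + 1] for row in block for i in range(len(row) - 1)]
    n = len(left)
    if n == 0:
        return 0.0
    mean_l = sum(left) / n
    mean_r = sum(right) / n
    cov = sum((a - mean_l) * (b - mean_r) for a, b in zip(left, right))
    var_l = sum((a - mean_l) ** 2 for a in left)
    var_r = sum((b - mean_r) ** 2 for b in right)
    if var_l == 0 or var_r == 0:
        # Constant block: correlation is undefined, so 0 is returned by convention.
        return 0.0
    return cov / (var_l * var_r) ** 0.5


# Thin alternating strokes versus a solid patch, both with equal black and white area.
strokes = [[1, 0, 1, 0], [1, 0, 1, 0]]
patch = [[1, 1, 0, 0], [1, 1, 0, 0]]
print(black_white_ratio(strokes), adjacent_correlation(strokes))
print(black_white_ratio(patch), adjacent_correlation(patch))
```

In this toy example both blocks have the same black-to-white ratio, yet the adjacent-pixel correlation separates them, which suggests why the two features might be used together. Any decision thresholds for text, graphic, or form regions would have to come from the paper itself.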