International Journal of New Computer Architectures and their Applications 2011

AUTOMATIC DOCUMENT STRUCTURE ANALYSIS OF STRUCTURED PDF FILES

Rosmayati Mohemad, Abdul Razak Hamdan, Zulaiha Ali Othman, Noor Maizura Mohamad Noor

Abstract:

Portable Document Format (PDF) is the most convenient way to publish information because it is independent of the operating system. However, information in a PDF document is unstructured and is intended only for human readers. In addition, PDF files have an untagged internal structure, which makes extraction difficult. Automatically analyzing and recognizing PDF document structures in detail, especially paragraphs and tabular areas, is vital for extracting relevant information precisely for use in other domain applications. The motivation of this study is to support knowledge extraction and to exploit the semantics of the extracted content to improve further analysis. This paper proposes an intelligent approach that automatically identifies and recognizes the layout and structure of PDF documents together with their text, and then structures the extracted information into an ontology-based representation. An experimental study was conducted on a collection of construction tender documents in PDF to test the performance of the proposed approach. The precision, recall and F-measure results were significant for the detection of tabular and paragraph structures.
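
The abstract evaluates detection with precision, recall and F-measure. As a minimal sketch of how these standard metrics are computed, the snippet below assumes that detected and ground-truth regions can be compared as sets of identifiers. The function name, the region identifiers and the exact-match criterion are hypothetical; the abstract does not state how the authors matched detected structures to the ground truth.

```python
def precision_recall_f1(detected, expected):
    """Return (precision, recall, F1) for two collections of region ids.

    Assumption: a detected region counts as correct only if its id also
    appears in the ground truth (exact match). This is an illustrative
    criterion, not the one used in the paper.
    """
    detected, expected = set(detected), set(expected)
    true_positives = len(detected & expected)
    # Precision: share of detected regions that are correct.
    precision = true_positives / len(detected) if detected else 0.0
    # Recall: share of ground-truth regions that were found.
    recall = true_positives / len(expected) if expected else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

For example, if four regions are detected and three of them are among five ground-truth regions, precision is 3/4 and recall is 3/5.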
