OALib Journal
ISSN: 2333-9721
AUTOMATIC DOCUMENT STRUCTURE ANALYSIS OF STRUCTURED PDF FILES

Abstract:

Portable Document Format (PDF) is the most convenient way to publish information because it is independent of the operating system. However, the information in PDF documents is unstructured and suited only to human readers. In addition, PDF files have an untagged internal structure, which makes the extraction task difficult. Automatically analyzing and recognizing the detailed structure of PDF documents, especially paragraphs and tabular areas, is vital for precisely extracting relevant information for use in other domain applications. The motivation of this study is to support knowledge extraction and to exploit the actual semantics of the extracted information to improve further analysis. This paper proposes an intelligent approach that automatically identifies and recognizes the layout and structure of PDF documents, together with their text, and then structures the extracted information into an ontology-based representation. An experimental study was conducted on a collection of construction tender documents in PDF format to test the performance of the proposed approach. The precision, recall, and F-measure scores showed significant results in detecting tabular and paragraph structures.
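The evaluation above is reported in terms of precision, recall, and F-measure. As a minimal sketch, assuming each detected paragraph or table region has already been matched to a ground-truth region through a shared identifier (the region IDs below are hypothetical, and the paper's actual matching criterion is not given in this abstract), the three scores are precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2PR / (P + R):

```python
from collections.abc import Hashable, Iterable


def precision_recall_f1(predicted: Iterable[Hashable],
                        gold: Iterable[Hashable]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for a set of detected items.

    Items are compared by exact identity (e.g. hypothetical region IDs).
    A real layout evaluation would first need a rule for deciding when a
    detected region matches a ground-truth region.
    """
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)  # items that were both detected and in the ground truth
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        # Harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical example: 3 tables detected, 4 tables in the ground truth.
    detected = {"table_1", "table_2", "table_3"}
    expected = {"table_1", "table_2", "table_4", "table_5"}
    p, r, f1 = precision_recall_f1(detected, expected)
    print(f"{p:.4f} {r:.4f} {f1:.4f}")  # prints "0.6667 0.5000 0.5714"
```

Scoring tables and paragraphs separately, as the abstract does, amounts to calling the function once per structure type.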
