An Ontology-Based Approach for Automatically Annotating Document Segments

This paper presents an approach for automatically annotating document segments within information-rich texts using a domain ontology. To achieve this, the approach exploits the logical structure of the input documents. The work rests on two assumptions. First, segments in such documents embody self-contained informative units. Second, segment headings, coupled with a document's hierarchical structure, offer informal representations of segment content, so matching segment headings to concepts in an ontology or thesaurus can produce formal labels (metadata) for these segments. A series of experiments was carried out with the presented approach on a set of Arabic agricultural extension documents. The results demonstrate that the proposed approach can automatically annotate segments with concepts that describe each segment's content with a high degree of accuracy.
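The abstract does not specify how headings are matched to ontology concepts. The following is only a minimal sketch of the general idea, under assumptions that are not taken from the paper. It uses a toy English ontology in place of a real agricultural thesaurus and plain token-overlap (Jaccard) matching. It also assumes a hypothetical acceptance threshold of 0.5 and lets ancestor headings in the segment hierarchy contribute a small tie-breaking bonus. The names `Segment`, `annotate`, and `best_concept` are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A document segment: its heading plus nested sub-segments (hypothetical model)."""
    heading: str
    children: list["Segment"] = field(default_factory=list)


def tokens(text: str) -> set[str]:
    # \w is Unicode-aware for str patterns, so Arabic letters are kept as well.
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_concept(heading, context, ontology, threshold):
    """Return the concept whose label best matches the heading, or None.

    Assumption: heading overlap dominates; overlap with ancestor headings
    (context) only adds a small bonus that can break ties between concepts.
    """
    best, best_score = None, 0.0
    for concept, labels in ontology.items():
        for label in labels:
            lab = tokens(label)
            score = jaccard(heading, lab)
            if score == 0:
                continue  # no lexical overlap with the heading itself
            score += 0.1 * jaccard(context, lab)
            if score > best_score:
                best, best_score = concept, score
    return best if best_score >= threshold else None


def annotate(segment, ontology, threshold=0.5, context=frozenset()):
    """Walk the segment tree depth-first, yielding (heading, concept) pairs."""
    head = tokens(segment.heading)
    yield segment.heading, best_concept(head, context, ontology, threshold)
    for child in segment.children:
        # Children inherit the tokens of all ancestor headings as context.
        yield from annotate(child, ontology, threshold, context | head)


# Toy ontology: concept identifier -> lexical labels (hypothetical data).
ontology = {
    "Irrigation": ["irrigation", "watering methods"],
    "PestControl": ["pest control", "insect pests"],
    "Tomato": ["tomato", "tomato cultivation"],
}
doc = Segment("Tomato Cultivation", [
    Segment("Irrigation"),
    Segment("Pest Control"),
    Segment("Harvest Timing"),
])
result = list(annotate(doc, ontology))
for heading, concept in result:
    print(f"{heading!r} -> {concept}")
```

In this toy run, "Harvest Timing" stays unannotated because no concept label shares a token with it. For real Arabic text, exact token overlap is likely too brittle, and some normalization (for example stripping the definite article or stemming) would presumably be needed before matching.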


