CONTENT AND STRUCTURE BASED CLASSIFICATION OF XML DOCUMENTS

Keywords: XML documents, text classification, ‘k’ nearest neighbors, cosine similarity, tree structure


Abstract:

The ever-increasing number of XML documents available on the World Wide Web demands automated tools and techniques that make the search and retrieval of XML documents more effective and efficient. Classification of XML documents is one of the significant tasks being explored by many researchers in this direction. Because XML documents carry inherent structure, conventional text classification methods cannot be applied to them directly. Hence, there is a need for tools and techniques that classify XML documents automatically. In this work, we have developed an algorithm based on ‘k’ nearest neighbors that classifies XML documents by considering both their content and their structure. The algorithm is tested on a subset of the MEDLINE dataset for different values of ‘k’ and varying training set sizes, and the results are tabulated.
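
The abstract does not say how content and structure are combined, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that content is a bag of words, that structure is a bag of root-to-node tag paths, and that the two cosine similarities are mixed with a hypothetical weight alpha. The function names (features, cosine, similarity, knn_classify) are illustrative.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def features(xml_text):
    """Return (content, structure) bags for one XML document.

    Assumption: content = lowercase word counts, structure = counts of
    root-to-node tag paths such as "/article/title".
    """
    root = ET.fromstring(xml_text)
    content, structure = Counter(), Counter()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        structure[path] += 1
        if node.text:
            content.update(node.text.lower().split())
        for child in node:
            walk(child, path)
            if child.tail:  # text that follows a child element
                content.update(child.tail.lower().split())

    walk(root, "")
    return content, structure


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity(x, y, alpha=0.5):
    """Assumed combination: weighted sum of content and structure cosine."""
    return alpha * cosine(x[0], y[0]) + (1 - alpha) * cosine(x[1], y[1])


def knn_classify(query_xml, training, k=3, alpha=0.5):
    """Label a document by majority vote among its k most similar neighbors.

    training is a list of (xml_text, label) pairs.
    """
    query = features(query_xml)
    scored = sorted(
        ((similarity(query, features(doc), alpha), label) for doc, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]
```

Setting alpha to 1.0 reduces this sketch to content-only classification, and 0.0 to structure-only, which makes it easy to compare the contribution of each signal for a given ‘k’.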
