2016
COMPARISON OF DATA MINING ALGORITHMS, INVERTED INDEX SEARCH AND SUFFIX TREE CLUSTERING SEARCH

Keywords: application, clustering, data mining, inverted index, Lucene, suffix tree

Abstract: New documents are created every day, and the number of digital documents in the world is growing exponentially. Search engines do a great job of making these documents easily available to the world's population. Data mining works with large data sets and delivers relevant data to the end user; it comprises many different techniques and algorithms that allow faster and better search over large amounts of data. Clustering is one of the techniques used in the data mining process: it groups data according to features or other properties they have in common, so the search process is faster and the user gets better search results. An inverted index, on the other hand, is a structure that also provides fast search, but it does not create clusters or groups of similar data. Instead, it processes all the data in a document and measures how often specific terms appear in it. The goal of this paper is to compare these two algorithms. The authors created applications that use both algorithms and tested them on the same corpus of documents. For both algorithms, the authors present improvements that provide faster search and better search results.
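The clustering approach summarized in the abstract can be illustrated with a minimal Python sketch of the two core steps of suffix tree clustering (STC): finding base clusters (phrases shared by several documents) and merging base clusters whose document sets overlap strongly. This is a simplified stand-in, not the authors' application: instead of building a generalized suffix tree, it enumerates word phrases of up to `max_len` words directly, and it skips base-cluster scoring. The stopword list and the parameters `max_len`, `min_docs`, and `threshold` (0.5) are assumptions chosen for illustration.

```python
import re
from collections import defaultdict
from itertools import combinations

# Assumed stopword list; stopwords would otherwise link unrelated documents.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def tokenize(text):
    """Lowercase, split into word tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def base_clusters(docs, max_len=3, min_docs=2):
    """Map each phrase of up to max_len words to the set of documents containing it.

    Real STC reads these phrase -> document-set pairs from the internal nodes of a
    generalized suffix tree built over all documents; enumerating n-grams directly
    is a simpler, slower stand-in that can also produce redundant phrases.
    """
    phrase_docs = defaultdict(set)
    for doc_id, text in docs.items():
        words = tokenize(text)
        for start in range(len(words)):
            for n in range(1, max_len + 1):
                if start + n > len(words):
                    break
                phrase_docs[" ".join(words[start:start + n])].add(doc_id)
    # Keep only phrases shared by enough documents to form a cluster.
    return {p: ds for p, ds in phrase_docs.items() if len(ds) >= min_docs}


def merge_clusters(bases, threshold=0.5):
    """Union base clusters whose overlap exceeds threshold relative to BOTH
    document sets; return the final clusters as sorted lists of doc ids."""
    phrases = list(bases)
    parent = {p: p for p in phrases}

    def find(p):
        # Union-find root lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(phrases, 2):
        overlap = len(bases[a] & bases[b])
        if overlap / len(bases[a]) > threshold and overlap / len(bases[b]) > threshold:
            parent[find(a)] = find(b)

    # Each connected component of similar base clusters becomes one cluster.
    clusters = defaultdict(set)
    for p in phrases:
        clusters[find(p)] |= bases[p]
    return sorted((sorted(ds) for ds in clusters.values()), key=lambda c: (-len(c), c))


docs = {
    1: "Suffix tree clustering groups documents",
    2: "Suffix tree clustering of web search results",
    3: "Inverted index search in Lucene",
    4: "Lucene inverted index structure",
}
print(merge_clusters(base_clusters(docs)))  # [[1, 2], [2, 3], [3, 4]]
```

Document 2 ends up in two clusters because it shares "suffix tree clustering" with document 1 and "search" with document 3; such overlapping clusters are a characteristic of STC, unlike a partitioning method that assigns each document to exactly one group.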
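The inverted index described in the abstract can be sketched in a few lines of Python: each term maps to the documents that contain it, together with how often it appears in each, and a query intersects the posting lists of its terms. This is an illustrative sketch under stated assumptions, not Lucene's implementation or API; the tokenizer and the ranking by summed term frequency (a crude stand-in for the weighted scoring a real search engine applies) are hypothetical choices.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to {doc_id: number of times the term occurs in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, query):
    """Return ids of documents containing every query term, best first."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(t, {}) for t in terms]
    # Intersect posting lists: only documents containing all query terms remain.
    matches = set(postings[0])
    for posting in postings[1:]:
        matches &= set(posting)
    # Rank by total query-term occurrences, breaking ties by doc id.
    return sorted(matches, key=lambda d: (-sum(p[d] for p in postings), d))


docs = {
    1: "Data mining finds patterns in data.",
    2: "An inverted index maps terms to documents.",
    3: "Search engines use an inverted index for data search.",
}
index = build_inverted_index(docs)
print(search(index, "data"))            # [1, 3]
print(search(index, "inverted index"))  # [2, 3]
```

Unlike the clustering approach, nothing here groups similar documents: a lookup touches only the posting lists of the query terms, which is what keeps the search fast.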