Enhanced Document Retrieval System Using Suffix Tree Clustering Algorithm

doi:10.4236/oalib.1110228

OALib Journal
ISSN: 2333-9721

Open Access Library Journal 10 2023

DOI: 10.4236/oalib.1110228, PP. 1-10

Linda Uchenna Oghenekaro, Ifeanyi Emmanuel Olughu, Joshua Oluwasegun Jatto

Subject Areas: Information retrieval

Keywords: Retrieval System, Document, Clustering Algorithm, Suffix Tree, Document Clustering

Abstract

Most search engines in use today present the user with a single ordered list of documents matching the search query, which leaves lexically ambiguous queries unresolved. An alternative is to cluster the search results and present a list of clusters to the user. The user selects the cluster that appears most relevant, and the search results in that cluster are then displayed as a list, under the assumption that similar documents are likely to be relevant to the same query. This study implements the suffix tree clustering algorithm to optimize search. The proposed system clusters search results from the file system: the user issues a search query and the results are returned as a set of coherent clusters. The suffix tree clustering algorithm efficiently identifies documents that share common phrases. The nodes in the suffix tree define the initial clusters and, to increase the number of documents in each cluster, clusters that are sufficiently similar are merged. The interface was designed with web technologies, namely Hypertext Markup Language (HTML) and Cascading Style Sheets (CSS); the system itself was implemented in JavaScript, with PHP5 and a MySQL database. The experimental results show that the suffix tree clustering algorithm can be used to cluster documents efficiently. The resulting system demonstrated an optimized search, returning 4.1 trillion results for the word “Electricity”, whereas the conventional Google Search Engine retrieved a total of 4.3 trillion.
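The two stages named in the abstract (initial clusters from suffix tree nodes, then merging of sufficiently similar clusters) can be illustrated with a minimal Python sketch. It is not the authors' implementation, which the abstract describes as using JavaScript, PHP5 and MySQL. Several details are assumptions made for illustration only: a dictionary keyed by word sequences stands in for the suffix tree nodes, phrases are capped at `max_len` words, the function names `base_clusters` and `merge_clusters` are hypothetical, and the merge rule (mutual overlap above 0.5, then connected components) follows the commonly cited formulation of suffix tree clustering, since the abstract does not state the similarity criterion used.

```python
from collections import defaultdict
from itertools import combinations


def base_clusters(docs, max_len=4, min_docs=2):
    """Map each shared phrase (tuple of words) to the ids of documents containing it.

    Assumption: a dict keyed by word sequences stands in for suffix tree nodes.
    Every word-level suffix of every document is inserted, truncated to max_len words.
    """
    nodes = defaultdict(set)
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for start in range(len(words)):
            stop = min(start + max_len, len(words))
            for end in range(start + 1, stop + 1):
                nodes[tuple(words[start:end])].add(doc_id)
    # Only phrases shared by at least min_docs documents form a base cluster.
    return {p: ids for p, ids in nodes.items() if len(ids) >= min_docs}


def merge_clusters(base, threshold=0.5):
    """Merge base clusters whose document sets overlap strongly in both directions.

    Assumption: the 0.5 mutual-overlap rule is illustrative, not taken from the abstract.
    Returns merged clusters as sorted lists of document ids, largest first.
    """
    phrases = list(base)
    parent = {p: p for p in phrases}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(phrases, 2):
        common = len(base[a] & base[b])
        if common / len(base[a]) > threshold and common / len(base[b]) > threshold:
            parent[find(a)] = find(b)

    merged = defaultdict(set)
    for p in phrases:
        merged[find(p)] |= base[p]
    return sorted((sorted(ids) for ids in merged.values()),
                  key=lambda c: (-len(c), c))


if __name__ == "__main__":
    # Hypothetical file contents, used only to exercise the sketch.
    docs = [
        "electricity power grid supply",
        "power grid failure report",
        "static electricity experiment",
        "static electricity safety",
    ]
    print(merge_clusters(base_clusters(docs)))
```

On this toy input the phrases "electricity", "static" and "static electricity" merge into one cluster of documents 0, 2 and 3, while "power", "grid" and "power grid" merge into a second cluster of documents 0 and 1. Document 0 appears in both, which reflects the overlapping nature of phrase-based clusters. A production system would replace the dictionary with a linear-time suffix tree and add stemming and stop-word removal before clustering.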

Cite this paper

Oghenekaro, L. U., Olughu, I. E. and Jatto, J. O. (2023). Enhanced Document Retrieval System Using Suffix Tree Clustering Algorithm. Open Access Library Journal, 10, e228. doi: http://dx.doi.org/10.4236/oalib.1110228.
