|
Document Clustering in Web Search Engine

Keywords: Document clustering, k-means, Fast k-means algorithm

Abstract: As the number of web pages grows, it becomes more difficult to find relevant documents with information retrieval engines. Clustering addresses this by grouping related documents together. The main purpose of clustering techniques is to partition a set of entities into different groups, called clusters, such that the members of each group are similar to one another. As the name suggests, representative-based clustering techniques use some form of representation for each cluster, so every group has a member that represents it. Using representatives reduces the cost of the algorithm and also makes the process easier to understand. The most popular clustering technique is the k-means algorithm, but it has notable drawbacks: it runs slowly and does not scale well to large databases. We therefore use the fast greedy k-means algorithm, which overcomes these drawbacks of k-means while remaining accurate and efficient. We also introduce an efficient method to compute the distortion for this algorithm.
|
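To make the terms in the abstract concrete, the following is a minimal sketch of standard (Lloyd-style) k-means together with the distortion it minimizes, taken here as the sum of squared distances from each point to its nearest cluster center. It is not the fast greedy k-means variant or the efficient distortion computation the paper proposes; the function names (`squared_distance`, `distortion`, `kmeans`) and the sample points are assumptions chosen for illustration only.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distortion(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd-style k-means (illustrative, not the paper's fast greedy variant)."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct points sampled at random.
    centers = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster ends up empty
                dim = len(members[0])
                centers[j] = tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                )
    return centers


# Hypothetical 2-D "document vectors" forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
one_center = kmeans(points, k=1)
two_centers = kmeans(points, k=2)
print(distortion(points, one_center), distortion(points, two_centers))
```

Each iteration compares every point with every center, which is the per-iteration cost that makes plain k-means expensive on large collections and that motivates faster variants.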