%0 Journal Article
%T A Novel Approach to Detect the Near Duplicate by Refining Provenance Matrix
%A Tanvi Gupta
%A Asst. Prof. Latha Banda
%J International Journal of Computer Technology and Applications
%D 2012
%I Technopark Publications
%X In this paper, the provenance matrix is refined to detect near-duplicates more accurately and efficiently by adding two more factors, 'How' and 'Why', because the performance of web search depends on search results that are free of duplicates and redundancy. More redundancy consumes more time and storage, which is why search engines try to avoid indexing duplicate documents. The provenance model combines content-based and trust-based factors to classify documents as near-duplicates or originals, since nowadays many near-duplicates come from distrusted websites.
%K near-duplicates
%K Provenance
%K distrusted
%K provenance matrix
%K trustworthiness
%U http://ijcta.com/documents/volumes/vol3issue1/ijcta2012030142.pdf