现代图书情报技术 (New Technology of Library and Information Service), 2005
on the Specific Topic on Web
Abstract:
Information retrieval (IR) on the Web is the automatic retrieval of all documents relevant to a user's need, that is, finding the intended Web resources, while retrieving as few non-relevant documents as possible. Web IR is currently a very active research area. It combines traditional text IR methods applied to the Internet with the link properties of the Web graph. This research focuses on how to retrieve relevant Web pages and content effectively and with broad coverage, how to filter Web pages, and how to assign proper labels to them. Accurately finding user-specific information on the Web is very difficult. Traditional Web search engines take a query as input and produce a set of (hopefully) relevant pages that match the query terms. While useful in many circumstances, search engines have the disadvantage that users must formulate queries that specify their information need, a step that is prone to error. Based on a discussion of PageRank, HITS, and the similarity between Web texts, new algorithms called RG-HITS (Resemblance Graph-HITS) for finding relevant documents on the Web are introduced.
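For readers unfamiliar with HITS, one of the link-analysis methods the abstract builds on, the following is a minimal sketch of the standard hub/authority iteration. It is not the RG-HITS algorithm, whose details are not given here. The function names `hits` and `normalize`, the dictionary-based graph representation, the iteration count, and the example graph are all assumptions made for illustration.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (leave all-zero scores untouched)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {k: v / norm for k, v in scores.items()}


def hits(graph, iterations=50):
    """Standard HITS iteration (illustrative sketch, not RG-HITS).

    graph: dict mapping each page to the list of pages it links to.
    Returns (hubs, authorities), two dicts of page -> score.
    """
    # Collect every page that appears as a source or as a link target.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    hubs = {n: 1.0 for n in nodes}
    auths = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages linking in.
        new_auths = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                new_auths[dst] += hubs[src]

        # Hub update: sum the new authority scores of the pages linked to.
        new_hubs = {
            n: sum(new_auths[dst] for dst in graph.get(n, []))
            for n in nodes
        }

        auths = normalize(new_auths)
        hubs = normalize(new_hubs)

    return hubs, auths


# Hypothetical four-page graph: pages "a" and "b" both link to "c",
# and "c" links to "d".
example_graph = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
hubs, auths = hits(example_graph)
```

In this toy graph, page "c" is pointed to by two pages and so receives the highest authority score, while "a" and "b" receive equal hub scores. A variant that also weighs textual resemblance between pages, as the abstract describes, would presumably modify the graph or the edge weights before running an iteration of this kind.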