%0 Journal Article %T DF or IDF? On the Use of Primary Feature Model for Web Information Retrieval %A ZHANG Min %A MA Shao-ping %A SONG Rui-hua %J Journal of Software (软件学报) %D 2005 %X In Web information retrieval (IR), input queries are too short and vague to describe user requests, which leads to a mismatch between user queries and documents that are full of redundancy and noise. This paper first studies the features of Web document information and proposes the concepts of primary feature word, primary feature field and primary feature space (PFS). Then a new PFS query term weighting scheme is proposed, which takes document frequency (DF) into account instead of the traditional inverse document frequency (IDF) factor. Finally, a combination strategy for term weighting is given. Using this PFS model, three groups of experiments were performed on the 10 GB and 19 GB large-scale Web collections with the standard TREC9, TREC10 and TREC11 Web track tests. Comparative studies indicate that the new DF-related PFS term weighting improves system performance consistently and effectively in terms of recall, top-n precision and mean average precision, with a maximum improvement of 18.6%. %K Web information retrieval %K primary feature model %K term weighting %K document frequency
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=7735F413D429542E610B3D6AC0D5EC59&aid=7DEDE521DAB83B03&yid=2DD7160C83D0ACED&vid=7801E6FC5AE9020C&iid=94C357A881DFC066&sid=85873A559EE29055&eid=E947FD4445DA7BA0&journal_id=1000-9825&journal_name=软件学报&referenced_num=10&reference_num=15