%0 Journal Article
%T A Method to Query Document Database by Content and Structure
一种通过内容和结构查询文档数据库的方法
%A WANG Xiao-Ling
%A WEN Ji-Rong
%A LUAN Jin-Feng
%A MA Wei-Ying
%A DONG Yi-Sheng
%A 王晓玲
%A 文继荣
%A 栾金锋
%A 马维英
%A 董逸生
%J 软件学报
%D 2003
%I
%X Structured documents are made up of a few logical components, such as titles, sections, subsections and paragraphs. The components of each structured document can be represented by an ordered tree model, which can also be viewed as a hierarchical concept relationship. To meet users' requirements for more precise and concentrated search results, retrieval techniques should allow the user to retrieve document components at varying granularity. This paper presents a method to query a document database by content and structure. The key idea is to construct a more comprehensive similarity function by taking advantage of the inherent hierarchical structure of documents. This work combines Information Retrieval techniques, semi-structured data query and proximate search for documents. The proposed method is evaluated on the Encarta encyclopedia document set, and the experimental results show that it can provide more accurate and focused answers than traditional document retrieval methods.
%K document database
%K information retrieval
%K passage retrieval
%K structured document
%K 文档数据库
%K 信息检索
%K 段落检索
%K 结构化文档
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=7735F413D429542E610B3D6AC0D5EC59&aid=F36997BFCA1C979E&yid=D43C4A19B2EE3C0A&vid=F3583C8E78166B9E&iid=94C357A881DFC066&sid=816AB2919A4FEDD7&eid=88B4027FEBE4F5FF&journal_id=1000-9825&journal_name=软件学报&referenced_num=4&reference_num=8