%0 Journal Article
%T Classification of Deep Web Databases Based on the Context of Web Pages
%A MA Jun
%A SONG Ling
%A HAN Xiao-Hui
%A YAN Po
%A 马 军
%A 宋 玲
%A 韩晓晖
%A 闫 泼
%J Journal of Software (软件学报)
%D 2008
%X This paper discusses new techniques for improving the classification precision of deep Web databases: using the content text of the HTML pages that contain the database entry forms as context, and unifying the database attribute labels. An algorithm for locating the content text in an HTML page is developed, based on multiple statistical characteristics of the page's text blocks. Unification replaces attribute labels that are semantically close with delegates. The domain and language knowledge found in the training samples is represented as hierarchical fuzzy sets, and a unification algorithm is proposed based on this representation. Building on this pre-computation, a k-NN (k nearest neighbors) algorithm is given for deep Web database classification, in which the semantic distance between two databases is computed from both the distance between the content texts of their HTML pages and the distance between the database forms embedded in those pages. Classification experiments compare the algorithm with and without the pre-computation in terms of precision, recall, and F1.
%K deep Web
%K hidden Web
%K database classification
%K content text extraction
%K semantic classification
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=7735F413D429542E610B3D6AC0D5EC59&aid=42FA206B40E3EB083BDB3C553C53CA92&yid=67289AFF6305E306&vid=2A8D03AD8076A2E3&iid=0B39A22176CE99FB&sid=866F8A6B640835A7&eid=6826CBE9C80ACB20&journal_id=1000-9825&journal_name=软件学报&referenced_num=0&reference_num=19