%0 Journal Article %T Implementation of Web information extraction system based on similar pages
基于相似页面的Web信息抽取系统的实现 %A GONG Zheng-xian %A ZHU Qiao-ming %A LI Pei-feng %A
贡正仙 %A 朱巧明 %A 李培峰 %J 计算机应用 %D 2006 %I %X The core algorithm of RoadRunner was analyzed. Based on an analysis of RoadRunner's deficiencies, a Web information extraction system based on similar pages was designed and implemented. The system architecture was introduced, followed by the key techniques: obtaining similar Web pages, reliably handling noisy Web blocks, and automatically deducing rules for extracting data items. %K RoadRunner
Web page %K similar pages %K information extraction %U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=831E194C147C78FAAFCC50BC7ADD1732&aid=3AC0A24610D9C0AF&yid=37904DC365DD7266&vid=96C778EE049EE47D&iid=5D311CA918CA9A03&sid=A7F20A391020FDEE&eid=4E65715CCF57055A&journal_id=1001-9081&journal_name=计算机应用&referenced_num=0&reference_num=10