%0 Journal Article
%T A Method of Web Information Extraction Based on Classification Algorithm
%A WANG Jian-Wei
%A YANG Dong-Qing
%A GAO Jun
%A WANG Teng-Jiao
%J Computer Science (计算机科学)
%D 2008
%X In Web information extraction research, most existing algorithms rely on HTML structure. Because the structure of HTML files changes frequently, wrappers must be updated accordingly, and updating a wrapper requires substantial domain knowledge. This paper proposes a new Web information extraction method based on a classification algorithm, which groups Web text by its HTML text display attributes. Information extraction from Web pages is accomplished by classifying the Web text with different va...
%K Web information extraction
%K Attribute vector
%K Wrapper
%K Display attributes
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=64A12D73428C8B8DBFB978D04DFEB3C1&aid=1B9BB928ECED366A0B15396639669C60&yid=67289AFF6305E306&vid=6209D9E8050195F5&iid=38B194292C032A66&sid=C753EB8AC8F551B9&eid=39EEF47180459690&journal_id=1002-137X&journal_name=计算机科学&referenced_num=0&reference_num=7