%0 Journal Article
%T Content extraction of Web pages based on characteristic symbols
%T 一种基于特征符号的网页主题信息抽取方法
%A WANG Shu
%A ZHU Min
%A ZHANG Ming
%A NIU Hao
%A ZHAO Yu
%A 王舒
%A 朱敏
%A 张明
%A 牛颢
%A 赵瑜
%J 计算机应用研究
%D 2009
%X With the popularity of the Internet, the large amount of data on the Web poses many challenges for data mining techniques, especially for content extraction from Web pages. Existing methods cannot guarantee the generality and effectiveness of Web mining approaches. By studying the internal structure of Web pages, this paper proposes an improved document tree model and identifies general rules for analyzing it. In addition, it extracts content from Web pages based on characteristic symbols. The experimental results show that the proposed method is both accurate and generic.
%K document tree model
%K characteristic symbols
%K relevance
%K content extraction
%K 生成树模型
%K 特征符号
%K 相关度
%K 主题提取
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=A9D9BE08CDC44144BE8B5685705D3AED&aid=A0438C44CDBA5ADF0351DA14B6C479AE&yid=DE12191FBD62783C&vid=96C778EE049EE47D&iid=59906B3B2830C2C5&sid=6CAC94E9D69540E7&eid=CBE1B0E213325D7F&journal_id=1001-3695&journal_name=计算机应用研究&referenced_num=0&reference_num=15