Method of paper information extraction based on HTML tree and template

Keywords: Web information extraction, DOM tree, template, document information extraction

Abstract:

Automatically collecting information about teachers' research papers is an important means of managing scientific research effectively, and applying Web page information extraction to this task has broad application prospects. This paper proposes a method for collecting paper information based on the HTML tree and templates. The method represents a Web page as a DOM tree using the hierarchical relationships among its HTML tags; the DOM tree is then used to measure page similarity and to classify Web pages. Information is extracted from pages with high mutual similarity using a shared template. Experimental results show that the method achieves an accuracy above 94% when collecting paper information from Web databases.
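The abstract describes the pipeline only at a high level (HTML tags, then a DOM tree, then page similarity and classification, then one template per group of similar pages). The following Python sketch illustrates that pipeline under stated assumptions; it is not the authors' implementation. The similarity measure (Jaccard overlap of root-to-node tag paths), the greedy grouping rule with `threshold=0.8`, and the template format (a field name mapped to a tag path such as `html/body/div/h1`) are assumptions chosen for illustration, and every function name (`parse`, `similarity`, `classify`, `extract`) is hypothetical.

```python
from html.parser import HTMLParser

# Void elements never receive an end tag, so they are not kept open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """Tag-tree node: tag name, parent, children, and the node's direct text."""
    def __init__(self, tag, parent=None):
        self.tag, self.parent, self.children, self.text = tag, parent, [], ""


class TagTreeBuilder(HTMLParser):
    """Builds a tag tree from the nesting (hierarchy) of HTML tags."""
    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open ancestor with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def parse(html):
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def walk(root):
    """Yield (node, tag_path) pairs in document order, e.g. (h1, 'html/body/h1')."""
    def visit(node, path):
        for child in node.children:
            child_path = f"{path}/{child.tag}" if path else child.tag
            yield child, child_path
            yield from visit(child, child_path)
    yield from visit(root, "")


def similarity(a, b):
    """ASSUMED measure: Jaccard similarity of the trees' tag-path sets."""
    pa = {p for _, p in walk(a)}
    pb = {p for _, p in walk(b)}
    if not pa and not pb:
        return 1.0
    return len(pa & pb) / len(pa | pb)


def classify(pages, threshold=0.8):
    """ASSUMED grouping rule: each page joins the first cluster whose first
    member is at least `threshold` similar to it, else starts a new cluster."""
    clusters = []
    for name, root in pages.items():
        for cluster in clusters:
            if similarity(pages[cluster[0]], root) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def extract(root, template):
    """Apply one HYPOTHETICAL template (field name -> tag path) to a page."""
    return {field: [n.text for n, p in walk(root) if p == path and n.text]
            for field, path in template.items()}


if __name__ == "__main__":
    pages = {
        "p1": parse("<html><body><div><h1>Paper A</h1><p>Author A</p></div></body></html>"),
        "p2": parse("<html><body><div><h1>Paper B</h1><p>Author B</p></div></body></html>"),
        "p3": parse("<html><body><table><tr><td>x</td></tr></table></body></html>"),
    }
    print(classify(pages))  # [['p1', 'p2'], ['p3']]
    template = {"title": "html/body/div/h1", "author": "html/body/div/p"}
    print(extract(pages["p2"], template))  # {'title': ['Paper B'], 'author': ['Author B']}
```

Pages `p1` and `p2` share all five tag paths (similarity 1.0), while `p3` shares only `html` and `html/body` out of eight distinct paths (similarity 0.25), so one template serves the first group. A tree edit distance is a common, more expensive alternative to the tag-path overlap assumed here.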
