Application Research of Computers (计算机应用研究), 2010

Method of paper information extraction based on HTML tree and template
(Original Chinese title: 树和模板的文献信息提取方法研究)

LI Wen-li (李文立), WANG Le-chao (王乐超), SONG Chun-lei (宋春雷)

Keywords: Web information extraction, DOM tree, template, document information extraction


Abstract:

Automatically collecting information about teachers' research papers is an important means of managing scientific research effectively, and applying Web page information extraction to this task has broad application prospects. This paper proposes a method of paper information collection based on the HTML tree and templates. The method represents each Web page as a DOM tree built from the hierarchical relationships among its HTML tags, then uses the DOM tree to measure page similarity and to classify Web pages. Information from Web pages with high similarity is extracted using the same template. Experimental results show that the method achieves an accuracy above 94% when collecting paper information from Web databases.
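
The pipeline described in the abstract (build a tree from the HTML tags, compare page structure, reuse one template for similar pages) can be illustrated with a minimal Python sketch. The abstract does not say how similarity is computed or how templates are represented, so everything here is an assumption for illustration: the page structure is summarized as a set of root-to-node tag paths, similarity is the Jaccard index of two such sets, the 0.8 threshold is arbitrary, and a template is a hypothetical mapping from field name to tag path. The sample pages and all names (`PathCollector`, `similarity`, `extract`, `TEMPLATE`) are invented and are not taken from the paper.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class PathCollector(HTMLParser):
    """Records the root-to-node tag path of every element and its first text."""

    def __init__(self):
        super().__init__()
        self.stack = []     # (tag, label) pairs for the currently open elements
        self.paths = set()  # structural signature of the page
        self.text = {}      # tag path -> first non-empty text found under it

    def _path(self, extra=None):
        labels = [label for _, label in self.stack]
        if extra is not None:
            labels.append(extra)
        return "/".join(labels)

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        label = f"{tag}.{cls}" if cls else tag  # e.g. "div.title"
        self.paths.add(self._path(label))
        if tag not in VOID_TAGS:
            self.stack.append((tag, label))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed children.
        if any(open_tag == tag for open_tag, _ in self.stack):
            while self.stack.pop()[0] != tag:
                pass

    def handle_data(self, data):
        data = data.strip()
        if data and self.stack:
            self.text.setdefault(self._path(), data)


def parse(html):
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector


def similarity(a, b):
    """Jaccard index of two pages' tag-path sets (an assumed measure)."""
    union = a.paths | b.paths
    return len(a.paths & b.paths) / len(union) if union else 1.0


def same_class(a, b, threshold=0.8):
    """Pages at or above the (arbitrary) threshold share one template."""
    return similarity(a, b) >= threshold


def extract(page, template):
    """Apply a template (field name -> tag path) to a parsed page."""
    return {field: page.text.get(path) for field, path in template.items()}


# Invented sample pages: A and B share a layout, C does not.
PAGE_A = ("<html><body><div class='title'>Paper A</div>"
          "<div class='author'>Li</div></body></html>")
PAGE_B = ("<html><body><div class='title'>Paper B</div>"
          "<div class='author'>Wang</div></body></html>")
PAGE_C = "<html><body><table><tr><td>News</td></tr></table></body></html>"

# Hypothetical template written for the layout of PAGE_A.
TEMPLATE = {"title": "html/body/div.title", "author": "html/body/div.author"}

if __name__ == "__main__":
    a, b, c = parse(PAGE_A), parse(PAGE_B), parse(PAGE_C)
    if same_class(a, b):
        print(extract(b, TEMPLATE))
    print(same_class(a, c))
```

In this sketch, pages A and B produce identical path sets and so share the template, while page C overlaps only on `html` and `html/body` and falls below the threshold. A real system would likely need a similarity measure that accounts for repeated nodes and tree depth, which a plain set of paths ignores.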
