%0 Journal Article %T Method of paper information extraction based on HTML tree and template
[Research on a literature information extraction method based on tree and template] %A LI Wen-li %A WANG Le-chao %A SONG Chun-lei %A
李文立 %A 王乐超 %A 宋春雷 %J 计算机应用研究 (Application Research of Computers) %D 2010 %I %X Automatically collecting information about teachers' research papers is an important means of managing scientific research effectively, and applying Web page information extraction to this task has broad prospects. This paper proposes a method of paper information collection based on the HTML tree and templates. The method represents each Web page as a DOM tree using the hierarchical relationships among its HTML tags, then uses the DOM tree to measure page similarity and to classify Web pages. Information from Web pages with high similarity is extracted using the same template. Experimental results show that the method achieves an accuracy above 94% when collecting paper information from the Web database. %K Web information extraction %K DOM tree %K template %K document information extraction
%K Web page information extraction %K Document Object Model tree %K literature information collection %U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=A9D9BE08CDC44144BE8B5685705D3AED&aid=1DF53C4F75977B9250A55F662F7B74E4&yid=140ECF96957D60B2&vid=DB817633AA4F79B9&iid=59906B3B2830C2C5&sid=4464691482A95011&eid=81DD5B1D08F1ACF9&journal_id=1001-3695&journal_name=计算机应用研究&referenced_num=1&reference_num=11