Application Research of Computers (计算机应用研究), 2009
Template-based information automatic extraction of Web
Abstract:
To resolve the conflict between accuracy and efficiency in traditional Web information extraction, this paper proposes a method for automatically extracting Web information that combines templates with automatic machine diagnosis. First, a set of heuristic automatic-diagnosis rules is used to detect the separator characters between different attributes in the HTML text, and those characters are written into the template. Web pages of the same kind are then analyzed on the basis of the template, and finally s...
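
The first two steps described in the abstract (diagnosing separators, then reusing them through a template) might look roughly like the Python sketch below. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the actual heuristic rules, so the single rule used here (the tag run that most often sits between two text fragments is taken as the separator) is a stand-in, and the names `diagnose_separator`, `build_template`, `apply_template`, and the field names are hypothetical. The final step, which is cut off in the abstract, is not covered.

```python
import re
from collections import Counter

# One or more adjacent tags, with optional whitespace around each tag.
TAG_RUN = re.compile(r"(?:\s*<[^>]+>\s*)+")
SINGLE_TAG = re.compile(r"<[^>]+>")


def diagnose_separator(html: str) -> str:
    """Assumed heuristic (not from the paper): return the tag run that most
    often appears between two text fragments of the sample page."""
    html = html.strip()
    runs = []
    for match in TAG_RUN.finditer(html):
        # Skip runs at the very start or end: they wrap the record
        # instead of separating two attribute values.
        if match.start() == 0 or match.end() == len(html):
            continue
        # Normalize the run by dropping whitespace between its tags.
        runs.append("".join(SINGLE_TAG.findall(match.group(0))))
    if not runs:
        raise ValueError("no candidate separator found in sample page")
    return Counter(runs).most_common(1)[0][0]


def build_template(sample_html: str, fields: list) -> dict:
    """Store the diagnosed separator together with caller-supplied
    (hypothetical) attribute names."""
    return {"separator": diagnose_separator(sample_html), "fields": fields}


def apply_template(template: dict, html: str) -> dict:
    """Parse another page of the same kind using the stored separator."""
    compact = re.sub(r">\s+<", "><", html.strip())
    parts = compact.split(template["separator"])
    values = [SINGLE_TAG.sub("", part).strip() for part in parts]
    return dict(zip(template["fields"], values))


sample_page = "<tr><td>ThinkPad X200</td><td>8999</td><td>Lenovo</td></tr>"
template = build_template(sample_page, ["name", "price", "brand"])
other_page = "<tr><td>MacBook Air</td>\n<td>12999</td><td>Apple</td></tr>"
print(apply_template(template, other_page))
```

In this sketch the separator is diagnosed once per page type and then reused, so pages of the same kind are parsed with a plain string split rather than a fresh analysis; that is one way to read the accuracy and efficiency balance the abstract refers to.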