Journal of Computer Applications (计算机应用), 2008
Optimized Web information extraction based on XQuery
Abstract:
Because they do not analyze how well extraction rules adapt to the characteristics of Web pages, current typical systems can hardly provide robust extraction rules. This paper proposes an optimized Web information extraction method that divides rules into three associated layers, presents an algorithm that optimizes extraction rules with respect to precision and recall by analyzing the adaptability of page characteristics, and expresses complex object rules in standard XQuery. Experiments indicate that the approach improves the robustness and usability of the rules.
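
The abstract does not spell out the optimization algorithm, so the following is only a minimal sketch of what judging extraction rules by precision and recall could look like. The function names, the sample rule names, and the use of the F1 score as the selection criterion are all assumptions made for illustration and are not taken from the paper.

```python
def precision_recall(extracted, relevant):
    """Return (precision, recall) for one rule's output against hand-labelled items."""
    extracted, relevant = set(extracted), set(relevant)
    hits = len(extracted & relevant)  # items the rule extracted correctly
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def pick_rule(candidates, relevant):
    """Pick the candidate rule with the highest F1 score.

    F1 is an assumed criterion; the paper may weigh precision and recall differently.
    `candidates` maps a hypothetical rule name to the items that rule extracted.
    """
    def f1(name):
        p, r = precision_recall(candidates[name], relevant)
        return 2 * p * r / (p + r) if p + r else 0.0
    return max(candidates, key=f1)


# Hypothetical data: a strict rule misses one item, a loose rule adds noise.
relevant = ["a", "b", "c"]
candidates = {
    "strict": ["a", "b"],
    "loose": ["a", "b", "c", "x", "y", "z"],
}
best = pick_rule(candidates, relevant)
```

In this toy setting the strict rule has perfect precision but misses one item, while the loose rule finds every item at half the precision; a combined score is one simple way to trade the two off when comparing candidate rules.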