Journal of Computer Applications (计算机应用), 2006
Text information extraction based on wrapper model
Abstract:
A new wrapper induction algorithm for text information extraction is proposed, following an analysis of two existing classes of algorithms: those based on landmarks and those based on text patterns. The new algorithm combines the advantages of both. It locates information using the landmark information of Web pages, and uses text patterns to extract and filter large quantities of Web text. Experimental results show that the new method achieves higher accuracy and expressiveness in information extraction.
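
As a rough illustration of how landmark-based location and text-pattern filtering might be combined, consider the minimal Python sketch below. It is not the algorithm from the paper: the function names, the sample page, the landmarks, and the regular expression are all hypothetical, and the induction step (learning landmarks and patterns from labeled examples) is not shown.

```python
import re
from typing import List


def locate_by_landmarks(page: str, left: str, right: str) -> List[str]:
    """Return every substring enclosed by a left and a right landmark.

    Hypothetical helper: landmarks are plain strings here, whereas a real
    wrapper would learn them from labeled example pages.
    """
    regions = []
    pos = 0
    while True:
        start = page.find(left, pos)
        if start == -1:
            break
        start += len(left)
        end = page.find(right, start)
        if end == -1:
            break
        regions.append(page[start:end])
        pos = end + len(right)
    return regions


def extract_with_pattern(regions: List[str], pattern: str) -> List[str]:
    """Keep only the first match of the text pattern in each region.

    Regions that do not match the pattern are filtered out.
    """
    compiled = re.compile(pattern)
    results = []
    for region in regions:
        match = compiled.search(region)
        if match:
            results.append(match.group(0))
    return results


def wrap(page: str, left: str, right: str, pattern: str) -> List[str]:
    """Locate candidate regions by landmarks, then extract by text pattern."""
    return extract_with_pattern(locate_by_landmarks(page, left, right), pattern)


# Made-up sample page: three list items, only two of which contain a price.
SAMPLE_PAGE = (
    '<li class="price">Price: $12.50</li>'
    '<li class="price">Call us</li>'
    '<li class="price">Now $8.00 only</li>'
)

if __name__ == "__main__":
    print(wrap(SAMPLE_PAGE, '<li class="price">', "</li>", r"\$\d+\.\d{2}"))
```

In this sketch the landmarks narrow the search to three candidate regions, and the text pattern then discards the region without a price and trims the surrounding words from the other two.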