%0 Journal Article
%T Extracting Semi-Structured Information from the WEB
%T 从WEB文档中构造半结构化信息的抽取器
%A HUANG Yu-qing
%A QI Guang-zhi
%A ZHANG Fu-yan
%A 黄豫清
%A 戚广志
%A 张福炎
%J Journal of Software (软件学报)
%D 2000
%X To integrate and query irregular, dynamic information on the Web in a database-like fashion, the authors use the object exchange model (OEM) to construct an information model of the Web. To express each component of a page as an OEM object, they design an algorithm that extracts semi-structured data from HTML pages, and they report test results. The method can extract both structured and semi-structured data, and the authors state that it is more widely applicable than other existing methods.
%K Heuristic rule
%K data extraction format
%K object exchange model
%K 启发式规则
%K 数据抽取格式
%K 对象交换模型
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=7735F413D429542E610B3D6AC0D5EC59&aid=0C94A01B1E360252&yid=9806D0D4EAA9BED3&vid=708DD6B15D2464E8&iid=CA4FD0336C81A37A&sid=B9704B40A4225A24&eid=46CB27789995047D&journal_id=1000-9825&journal_name=软件学报&referenced_num=32&reference_num=4