%0 Journal Article
%T Research and Realization of a Web Information Extraction and Knowledge Presentation System
%A TAN Shou-Biao
%A XU Chao
%A JIANG Yuan
%A NING Ren-Xia
%A 谭守标
%A 徐超
%A 江元
%A 宁仁霞
%J 计算机系统应用
%D 2010
%X The Web Information Extraction and Knowledge Presentation System is proposed to extract information from data-intensive web pages. Guided by a knowledge database, the system downloads dynamic web pages, preprocesses them, converts them into XML documents, and finds repeated patterns in these documents using a PAT-array-based pattern discovery algorithm. From the repeated patterns and an ontology-based keyword library, it automatically recognizes the pages' data display structure models. It then extracts the data and stores it in the knowledge database using XML object-relational mapping. In this way, web data is extracted automatically and the knowledge database is expanded automatically. Experiments on a system for automatic traffic information extraction and automatic creation of mixed traffic travel schemes show that the system achieves high precision and adapts to web pages from different domains with different structures.
%K web information extraction
%K knowledge presentation
%K data intensive web pages
%K ontology-based keyword library
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=D4F6864C950C88FFCE5B6C948A639E39&aid=B19704F7B701FC29A4A436A5A8C31250&yid=140ECF96957D60B2&vid=2A8D03AD8076A2E3&iid=9CF7A0430CBB2DFD&sid=CA4FD0336C81A37A&eid=E158A972A605785F&journal_id=1003-3254&journal_name=计算机系统应用&referenced_num=0&reference_num=11