%0 Journal Article
%T L-Tree Match: A New Data Extraction Model and Algorithm for Huge Text Stream with Noises
%A Xu-Bin Deng
%A Yang-Yong Zhu
%J Journal of Computer Science and Technology
%D 2005
%X This paper presents a new method, named L-tree match, for extracting data from complex data sources. First, based on the data extraction logic presented in this work, a new data extraction model is constructed in which the model components are structurally correlated via a generalized template. Second, a database-populating mechanism is built, along with the object-manipulating operations needed for flexible database design, to support data extraction from huge text streams. Third, top-down and bottom-up strategies are combined to design a new extraction algorithm that can extract data from data sources with optional, unordered, nested, and/or noisy components. Finally, the method is applied to extract accurate data from 100GB of biological documents for China's first online integrated biological data warehouse.
%K data extraction
%K data model
%K extraction algorithm
%K regular expression
%K wrapper
%K tree matching
%K data extraction model
%K logic
%K database
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=F57FEF5FAEE544283F43708D560ABF1B&aid=B6AAD06A177D0F6AF9E2F852ECE3680E&yid=2DD7160C83D0ACED&vid=A04140E723CB732E&iid=B31275AF3241DB2D&sid=50FF665B2730AEEC&eid=1F7317C17A9AF4FA&journal_id=1000-9000&journal_name=计算机科学技术学报&referenced_num=3&reference_num=12