
L-Tree Match: A New Data Extraction Model and Algorithm for Huge Text Stream with Noises

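Keywords: data extraction, data model, extraction algorithm, regular expression, wrapper
Keywords (translated from the Chinese listing): tree matching, data extraction model, extraction algorithm, logic, database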


Abstract:

This paper presents a new method, named L-tree match, for extracting data from complex data sources. First, based on the data extraction logic presented in this work, a new data extraction model is constructed in which the model components are structurally correlated via a generalized template. Second, a database-populating mechanism is built, along with a set of object-manipulating operations needed for flexible database design, to support data extraction from huge text streams. Third, top-down and bottom-up strategies are combined to design a new extraction algorithm that can extract data from sources with optional, unordered, nested, and/or noisy components. Finally, the method is applied to extract accurate data from biological documents totaling 100 GB for the first online integrated biological data warehouse of China.
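
The abstract does not give the algorithm itself, so the following is only a minimal illustrative sketch of template-driven extraction that tolerates optional, unordered, nested, and noisy components. It is a simplified top-down walk over a template tree, not the L-tree match algorithm described in the paper. The `Node` class, the `//` record terminator, and the `ID`/`OS`/`CC` line codes are all hypothetical choices made for this example.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional, Union

Value = Union[str, dict]


@dataclass
class Node:
    """One template component: a leaf with a regex, or a composite with children."""
    name: str
    pattern: str = ""        # leaf only: regex with exactly one capture group
    optional: bool = False   # a missing optional component does not reject the record
    children: List["Node"] = field(default_factory=list)


def match(node: Node, lines: List[str]) -> Optional[Value]:
    """Match one template node against the lines of a single record."""
    if not node.children:
        # Leaf: scan every line, so field order is irrelevant and
        # lines that match nothing (noise) are simply ignored.
        rx = re.compile(node.pattern)
        for line in lines:
            m = rx.search(line)
            if m:
                return m.group(1)
        return None
    # Composite: recurse into children (nesting); reject on a missing required child.
    record = {}
    for child in node.children:
        value = match(child, lines)
        if value is None and not child.optional:
            return None
        if value is not None:
            record[child.name] = value
    return record


def split_records(stream: Iterable[str], end: str = r"^//") -> Iterator[List[str]]:
    """Cut the stream into records lazily, so the whole input is never held in memory."""
    block: List[str] = []
    for line in stream:
        if re.match(end, line):
            if block:
                yield block
            block = []
        else:
            block.append(line)
    if block:
        yield block


def extract_stream(template: Node, stream: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per record that satisfies the template; skip the rest."""
    for block in split_records(stream):
        rec = match(template, block)
        if rec is not None:
            yield rec


# Hypothetical template: an entry with an id, a nested source, and an optional comment.
template = Node("entry", children=[
    Node("id", r"^ID\s+(\S+)"),
    Node("source", children=[Node("organism", r"^OS\s+(.+)$")]),
    Node("comment", r"^CC\s+(.+)$", optional=True),
])

lines = [
    "### banner noise",
    "OS   Homo sapiens",       # appears before ID: order does not matter
    "ID   P12345",
    "//",
    "ID   Q99999",
    "garbage line",
    "CC   putative",
    "OS   Mus musculus",
    "//",
    "OS   orphan without id",  # required id missing: record is dropped
    "//",
]

results = list(extract_stream(template, lines))
```

Because `extract_stream` is a generator, records could be written to a database one at a time as they are produced, which is the kind of streaming use the abstract's database-populating step suggests.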
