
2018

Examination of Extraction Rules in Web Data Extraction

Keywords: Extraction methods, Web data extraction, DOM, Regular expressions


Abstract:

Extracting the desired data from a web page is an important issue for applications in the fields of data mining and information retrieval. Either DOM-based methods or regular expressions can be used to extract data from a web page, and multiple extraction rules can be prepared for both approaches. This study investigates how effectively extraction rules obtain more than one data item. The data set consists of fifteen websites from the fields of news, film and shopping. For these websites, extraction rule files were created for each of the extraction techniques. Within these websites, the focus is mainly on repetitive data such as reviews. Experiments show that regular expressions, although more laborious and time-consuming to create, give better results than DOM-based methods. Among the DOM-based methods, the lxml parser library provided the best results, as expected. The experiments also indicate that the extraction rules prepared by a developer affect the extraction time. As a result, well-prepared regular expressions make it possible to extract the desired data from web pages much faster.
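To make the two kinds of extraction rule concrete, the following minimal sketch applies a regular-expression rule and a tree-based rule to the same repeated review blocks. It is not the study's code: the page fragment, class names and function names are hypothetical, and Python's standard-library `xml.etree.ElementTree` stands in for a DOM-based parser such as lxml (whose `lxml.etree` module offers a largely compatible API). The fragment is assumed to be well-formed, which real web pages often are not.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment with repeated review blocks.
PAGE = """<html><body>
<div class="review"><span class="author">Ada</span><p class="text">Great film.</p></div>
<div class="review"><span class="author">Bob</span><p class="text">Too long.</p></div>
</body></html>"""

# Regular-expression rule: one pattern captures author and text per review.
REVIEW_RE = re.compile(
    r'<div class="review"><span class="author">(.*?)</span>'
    r'<p class="text">(.*?)</p></div>',
    re.DOTALL,
)


def extract_with_regex(page):
    """Return (author, text) pairs matched directly in the raw markup."""
    return REVIEW_RE.findall(page)


def extract_with_dom(page):
    """Return (author, text) pairs by parsing the page into a tree first."""
    root = ET.fromstring(page)
    results = []
    for div in root.iter("div"):
        if div.get("class") == "review":
            author = div.find("span[@class='author']")
            text = div.find("p[@class='text']")
            results.append((author.text, text.text))
    return results


if __name__ == "__main__":
    print(extract_with_regex(PAGE))
    print(extract_with_dom(PAGE))
```

Both functions return the same pairs for this fragment. The regular-expression rule works on the raw string and is tied to the exact markup, while the tree-based rule pays for a full parse before it can select elements. Wrapping each call in `timeit.timeit` is one way to compare extraction times on a given page.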
