Acta Electronica Sinica (电子学报), 2013

A Deep Web Data Source Classification Method Based on Topics and Form Attributes

DOI: 10.3969/j.issn.0372-2112.2013.02.009, PP. 260-266

Keywords: form topics and attributes, query interface labeling, deep web, automatic data source classification


Abstract:

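The deep web holds a vast and steadily growing amount of high-quality information. Because deep web sources are distributed, heterogeneous, and autonomous, it is very difficult for users to retrieve the information they want quickly and efficiently. Classifying deep web data sources by domain is the foundation for addressing this challenge. Based on a statistical analysis of more than 200 data sources in the airfare booking, book, automobile, and real estate domains, this paper makes full use of topic and form attribute information to propose a new deep web data source classification method, together with an improved similarity measure for query interfaces, so that deep web data sources can be classified automatically. The paper also proposes a query interface labeling strategy to reduce the impact of randomly selected initial center points. Experimental results show that the method achieves high classification accuracy.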
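The abstract does not give the similarity formula or the clustering procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each query interface is reduced to a set of topic terms and a set of form attribute labels, that similarity is a weighted mix of Jaccard overlaps (the weight `topic_weight` is hypothetical), and that a few hand-labeled interfaces per domain serve as initial cluster members instead of randomly chosen centers. All function names, field names, and sample data are illustrative assumptions.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections, 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def interface_similarity(x, y, topic_weight=0.5):
    """Assumed measure: weighted mix of topic-term and form-attribute overlap."""
    return (topic_weight * jaccard(x["topic"], y["topic"])
            + (1 - topic_weight) * jaccard(x["attrs"], y["attrs"]))


def seeded_clustering(sources, seeds, topic_weight=0.5, max_iter=10):
    """Cluster unlabeled sources around labeled seed interfaces.

    sources: dict name -> interface dict with "topic" and "attrs" sets
    seeds:   dict domain -> non-empty list of labeled interface dicts
    Returns a dict name -> domain.
    """
    assignment = {}
    for _ in range(max_iter):
        # Each cluster holds its labeled seeds plus the sources assigned so far.
        members = {domain: list(labeled) for domain, labeled in seeds.items()}
        for name, domain in assignment.items():
            members[domain].append(sources[name])

        new_assignment = {}
        for name, src in sources.items():
            def avg_sim(domain):
                # Average similarity to the cluster, excluding the source itself.
                others = [m for m in members[domain] if m is not src]
                total = sum(interface_similarity(src, m, topic_weight) for m in others)
                return total / len(others)
            new_assignment[name] = max(members, key=avg_sim)

        if new_assignment == assignment:  # stop once assignments are stable
            break
        assignment = new_assignment
    return assignment


# Illustrative (made-up) data: one labeled seed per domain, two unlabeled sources.
seeds = {
    "airfare": [{"topic": {"flight", "airline"}, "attrs": {"from", "to", "departure date"}}],
    "books": [{"topic": {"book", "bookstore"}, "attrs": {"title", "author", "isbn"}}],
}
sources = {
    "s1": {"topic": {"flight", "travel"}, "attrs": {"from", "to", "return date"}},
    "s2": {"topic": {"book"}, "attrs": {"title", "author", "publisher"}},
}
print(seeded_clustering(sources, seeds))
```

Starting every cluster from labeled interfaces makes the result deterministic for a given seed set, which is one plausible way to read the goal of reducing sensitivity to random initial centers.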
