Web Information Extraction Based on Block Importance and Two-Dimensional Conditional Random Fields

DOI: 10.13232/j.cnki.jnju.2014.01.012

Keywords: web object, information extraction, web page segmentation, block importance model, two-dimensional conditional random fields

Abstract:

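Web page segmentation reduces the unit of web information extraction from the whole page to the individual block. Combining the strengths of the block importance model and two-dimensional conditional random fields (2D CRFs), this paper proposes a method for extracting web object information. The method uses the block importance model to label each page block with an importance level, which filters out a large amount of topic-irrelevant information and locates the information to be extracted more accurately. Compared with the traditional linear-chain CRF model, the 2D CRF model fits the two-dimensional structure of page blocks better and effectively improves extraction accuracy. Experimental results show that the method performs well on web object information extraction.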
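
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Block` class, the `importance` scores, the 0.5 threshold, and the grid coordinates are all hypothetical, and no CRF training or inference is performed. The sketch only shows (1) filtering blocks by an importance score and (2) building the horizontal and vertical neighbor relations that a 2D CRF would model, in contrast to the single left-to-right chain of a linear CRF.

```python
from dataclasses import dataclass


@dataclass
class Block:
    text: str
    row: int           # hypothetical grid position of the block on the page
    col: int
    importance: float  # hypothetical score in [0, 1] from a block importance model


def filter_blocks(blocks, threshold=0.5):
    """Keep only blocks whose importance score reaches the (assumed) threshold."""
    return [b for b in blocks if b.importance >= threshold]


def grid_neighbors(blocks):
    """Map each kept block position to its adjacent kept positions (up, down, left, right)."""
    positions = {(b.row, b.col) for b in blocks}
    neighbors = {}
    for r, c in positions:
        candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        neighbors[(r, c)] = sorted(p for p in candidates if p in positions)
    return neighbors


# Toy page: scores are made up for illustration only.
blocks = [
    Block("Site navigation", 0, 0, 0.1),
    Block("Product name", 1, 0, 0.9),
    Block("Price", 1, 1, 0.8),
    Block("Description", 2, 0, 0.7),
    Block("Advertisement", 2, 1, 0.2),
]

kept = filter_blocks(blocks)
edges = grid_neighbors(kept)

for b in kept:
    print(b.text, "->", edges[(b.row, b.col)])
```

In this toy layout the navigation and advertisement blocks are dropped, and the "Product name" block ends up with both a horizontal neighbor ("Price") and a vertical neighbor ("Description"). A linear-chain model would capture only one of these two dependencies.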
