Acta Electronica Sinica, 2012

An ODP-Based Contextual Topic Description Method

DOI: 10.3969/j.issn.0372-2112.2012.11.028, PP. 2320-2323

Keywords: focused crawling, context relevance, feature selection, topic description


Abstract:

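To address the problem that earlier topic description methods do not fully consider the context of a topic, this paper proposes a contextual topic description method based on ODP (the Open Directory Project). A new feature selection algorithm is used to determine the topic features, and the context within the category topic tree is used to optimize the topic description method in order to improve focused crawling performance. Experiments show that the feature selection algorithm effectively extracts topic features and, while preserving accuracy, reduces the feature dimensionality as far as possible to improve computational efficiency. The topic description algorithm also takes full account of topic context relations and performs well in both accuracy and total information content.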
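
The abstract does not give the algorithm itself, so the following is only a minimal illustrative sketch of what using category-tree context in a topic description could look like, not the paper's method. It assumes a simple scheme in which a topic's normalized term frequencies are blended with those of its parent and child categories, scaled by a hypothetical `decay` factor. The function name `describe_topic`, the sample category tree, and the sample texts are all invented for illustration.

```python
from collections import Counter

# Hypothetical miniature category tree (parent -> children) and sample texts.
# Neither comes from the paper or from the real ODP data.
tree = {"Computers": ["Programming"], "Programming": ["Python"]}
docs = {
    "Computers": "computer software hardware",
    "Programming": "code software code",
    "Python": "python code",
}

def describe_topic(tree, docs, topic, decay=0.5, top_k=5):
    """Return the top_k terms for a topic, blending in parent/child context."""
    def term_freq(node):
        # Normalized term frequencies for the text attached to one category.
        counts = Counter(docs.get(node, "").lower().split())
        total = sum(counts.values()) or 1
        return {term: count / total for term, count in counts.items()}

    # Start from the topic's own terms at full weight.
    weights = Counter(term_freq(topic))

    # Context = the parent category (if any) plus the direct children.
    parent = next((p for p, kids in tree.items() if topic in kids), None)
    neighbours = ([parent] if parent else []) + tree.get(topic, [])

    # Context terms contribute at a reduced weight (assumed decay factor).
    for node in neighbours:
        for term, weight in term_freq(node).items():
            weights[term] += decay * weight

    return [term for term, _ in weights.most_common(top_k)]

print(describe_topic(tree, docs, "Programming", top_k=3))
```

With `decay=0` the sketch falls back to a context-free description built only from the topic's own text, which makes the contribution of the tree context easy to compare.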

