Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2008

Research on the Stability of Evaluation Methods for Text Categorization

pp. 12-17

Gong Bihong (龚笔宏), Peng Bo (彭波)

Keywords: classification techniques, evaluation methods, data mining

Abstract:

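Text categorization algorithms are generally evaluated by macro-averaged precision, macro-averaged recall, and macro-averaged F1. However, the scores that one classifier obtains on different datasets often differ widely, so a score is meaningful only for the dataset on which it was measured and carries no meaning for other datasets. To address this problem, this paper proposes three factors that characterize the influence of a dataset on classification results and uses them to construct an evaluation measure, newmacroF1. The measure separates the dataset factors from the evaluation process, so that newmacroF1 reflects only the classification algorithm itself. Experimental results show that, under this measure, the scores of a single classifier fluctuate little across datasets. From a classifier's performance on one dataset, its classification quality on another dataset can be approximately computed.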
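For readers unfamiliar with the baseline measures named in the abstract, the following is a minimal sketch of macro-averaged precision, recall, and F1 for single-label classification. It is not taken from the paper: the function name `macro_scores` is hypothetical, macro F1 is assumed here to be the mean of per-class F1 values (some authors instead compute F1 from the macro precision and macro recall), and classes with no predictions or no true instances are assumed to score 0. The abstract does not define the three dataset factors or newmacroF1, so they are not sketched.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1).

    Assumption: single-label classification; macro F1 is the unweighted
    mean of per-class F1 values; undefined ratios are treated as 0.0.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    # Count per-class true positives, false positives, false negatives.
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    # Macro averaging gives every class equal weight, regardless of size.
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    truth = ["a", "a", "b", "b"]
    predicted = ["a", "b", "b", "b"]
    print(macro_scores(truth, predicted))
```

Because every class contributes equally to the average, the result depends on how many classes a dataset has and how hard each one is, which is one way the dataset dependence described in the abstract can arise.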
