Research on Automatic Text Classification Based on K-Nearest Distance

pp. 42-46

Abstract:

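This paper proposes and implements a feature selection method that combines statistical word-frequency information with linguistic information. Feature weights are computed not only from term frequency but also from each feature's concentration and dispersion. Through training and statistical analysis, a feature weight vector is built for each text class, and the test set is then classified with the K-nearest distance method. Experiments on English text show that the algorithm improves text classification accuracy.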
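The abstract does not give the weighting formulas, so the following is only a minimal sketch of how such a pipeline could look, under stated assumptions. Here concentration is taken to be the share of a term's total occurrences that fall in a class, dispersion is the share of that class's documents containing the term, the weight is their product with the in-class term frequency, and "K-nearest distance" is read as ranking classes by cosine distance between a document and each class vector. All function names and formulas are hypothetical and are not taken from the paper; the linguistic part of the feature selection is not modeled.

```python
import math
from collections import Counter, defaultdict


def train_class_vectors(docs):
    """Build one weight vector per class from (label, tokens) pairs.

    Assumed weighting (not from the paper):
    weight = term frequency * concentration * dispersion.
    """
    tf = defaultdict(Counter)   # term counts per class
    df = defaultdict(Counter)   # number of documents containing the term, per class
    n_docs = Counter()          # number of documents per class
    for label, tokens in docs:
        n_docs[label] += 1
        tf[label].update(tokens)
        df[label].update(set(tokens))

    total_tf = Counter()        # term counts over all classes
    for counts in tf.values():
        total_tf.update(counts)

    vectors = {}
    for label, counts in tf.items():
        vec = {}
        for term, count in counts.items():
            # Share of the term's occurrences that fall in this class.
            concentration = count / total_tf[term]
            # Share of this class's documents that contain the term.
            dispersion = df[label][term] / n_docs[label]
            vec[term] = count * concentration * dispersion
        vectors[label] = vec
    return vectors


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def nearest_classes(vectors, tokens, k=1):
    """Return the k class labels whose vectors are closest to the document."""
    doc = Counter(tokens)
    ranked = sorted(vectors, key=lambda label: cosine_distance(doc, vectors[label]))
    return ranked[:k]
```

Calling `train_class_vectors` on a list of labeled, tokenized documents and then `nearest_classes(vectors, tokens, k=1)` on a new document yields the single closest class. A term that is frequent in one class, rare elsewhere, and spread across most of that class's documents receives the largest weight under these assumptions.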
