2015

An Unbalanced Emotion Classification Method for Interactive Texts Based on Multiple-Domain Instance Transfer

DOI: 10.7652/xjtuxb201504011

Keywords: interactive texts, imbalanced sentiment classification, multiple domain, instance transfer


Abstract:

Interactive texts have short sentences and missing sentence components, and their class distribution across multiple domains is unbalanced; together these lead to high dimensionality, sparse feature values, and a scarcity of positive instances. To address these difficulties, a data-level sampling method based on target-dataset-oriented instance transfer is proposed. First, a function is defined that computes the sum of the information gains of the Top-N features common to the target and source datasets and their proportions in that sum; it is used to select the features for evaluating instance similarity between the two datasets. Second, a consistency processing method is presented for the feature spaces of the target and source datasets, to overcome the inconsistency between them. Third, instances are selected and transferred domain by domain, from each domain of the source dataset to the corresponding domain of the target dataset, to address the unbalanced class distribution across multiple domains. Experimental results show that the method effectively alleviates the imbalance problem in interactive texts: for four classic classifiers (support vector machine, random forest, naive Bayes, and random committee), the weighted-average receiver operating characteristic (ROC) metric improves by 11.3%.
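
The abstract does not give the exact form of the selection function, the similarity measure, or the transfer rule, so the following is only a rough Python sketch of the three steps under stated assumptions. Everything in it is hypothetical rather than taken from the paper: instances are modeled as sets of tokens, information gain is computed for binary "term present" features, the two per-dataset gains are simply added, Jaccard overlap stands in for instance similarity, feature-space consistency is approximated by projecting onto the shared features, and the names `top_n_common_features` and `transfer_positives` are invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without) / n) * entropy(without)
    return entropy(labels) - conditional


def top_n_common_features(src, tgt, n):
    """Rank features shared by both datasets by summed information gain.

    Returns the top n features and their share of the total gain
    (an assumed reading of the 'sum and proportion' function).
    """
    src_docs, src_y = [t for _, t, _ in src], [y for _, _, y in src]
    tgt_docs, tgt_y = [t for _, t, _ in tgt], [y for _, _, y in tgt]
    common = set().union(*src_docs) & set().union(*tgt_docs)
    scores = {t: information_gain(src_docs, src_y, t)
              + information_gain(tgt_docs, tgt_y, t) for t in common}
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    top = ranked[:n]
    total = sum(scores.values())
    share = sum(scores[t] for t in top) / total if total > 0 else 0.0
    return top, share


def jaccard(a, b, features):
    """Jaccard overlap of two token sets restricted to the selected features."""
    a, b = a & features, b & features
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def transfer_positives(src, tgt, features, k):
    """Per domain, copy up to k positive source instances most similar to
    the target positives of the same domain, projected onto `features`."""
    features = set(features)
    transferred = []
    for domain in sorted({d for d, _, _ in tgt}):
        tgt_pos = [t for d, t, y in tgt if d == domain and y == 1]
        candidates = [t for d, t, y in src if d == domain and y == 1]
        ranked = sorted(
            candidates,
            key=lambda t: -max((jaccard(t, p, features) for p in tgt_pos), default=0.0),
        )
        transferred += [(domain, t & features, 1) for t in ranked[:k]]
    return transferred


# Toy data (invented): (domain, tokens, label), where label 1 is the rare class.
src = [
    ("course", {"great", "lesson", "thanks"}, 1),
    ("course", {"lesson", "late"}, 0),
    ("course", {"great", "teacher"}, 1),
    ("exam", {"exam", "hard", "worried"}, 1),
    ("exam", {"exam", "tomorrow"}, 0),
]
tgt = [
    ("course", {"great", "lesson"}, 1),
    ("course", {"lesson", "today"}, 0),
    ("course", {"lesson", "homework"}, 0),
    ("exam", {"exam", "worried"}, 1),
    ("exam", {"exam", "room"}, 0),
    ("exam", {"exam", "time"}, 0),
]

top, share = top_n_common_features(src, tgt, n=3)
augmented = tgt + transfer_positives(src, tgt, top, k=1)
```

In this sketch, `augmented` is the target set plus at most `k` borrowed positive instances per domain, which is one simple way to raise the positive-class share before training a classifier. The choice of `n` and `k` is left open here, as the abstract does not state how they are set.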

