2018

超平面距离的非平衡交互文本情感实例迁移方法
A Transfer Method of Emotional Instances for Unbalanced Interactive Texts Based on Hyperplane Distance

DOI: 10.7652/xjtuxb201810001

Keywords: instance transfer, information utility, unbalanced classification, hyperplane


Abstract:

Unbalanced interactive text datasets contain few minority-class instances, so sentiment classification models trained on them tend to generalize poorly. To address this problem, an emotional instance transfer method for unbalanced interactive texts based on hyperplane distance is proposed. The method takes the source-dataset instances that lie between the support vectors of the minority class and those of the majority class as candidate instances for transfer, and constructs an offset hyperplane from the classification hyperplane learned on the target dataset. Following the principle of optimal information utility, it selects the candidates with the shortest distance to the offset hyperplane, and it adjusts a migration ratio to control how many instances are transferred when generating the synthetic dataset. Experimental results show that as more instances are transferred, the synthetic dataset deviates further from the original distribution, and the generalization performance of a model trained with sequential minimal optimization (SMO) first rises and then falls, similar to the Wundt curve of information utility. Compared with three data-level processing methods (SMOTE, Subsampling, and Oversampling), five classification models (SMO, LibSVM, random forest, cost-sensitive, and CNN) trained with the proposed method gain 11% on average in minority-class F-value, and the best range for the migration ratio is 20% to 30%. The proposed method thus effectively alleviates the imbalance while improving the generalization performance of minority-class recognition.
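The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a linear hyperplane `(w, b)` already trained on the target dataset and scaled so that the support vectors lie at decision values of -1 and +1. It also assumes that `source` holds only minority-class source instances, that the offset hyperplane is `w·x + b = offset`, and that the migration ratio is taken relative to the target dataset size. The abstract specifies none of these details, and the function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def decision_value(w: Vector, b: float, x: Vector) -> float:
    """Value of w.x + b for instance x (assumed linear hyperplane)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def select_transfer_instances(
    source: List[Vector],
    w: Vector,
    b: float,
    offset: float,      # hypothetical shift defining the offset hyperplane
    n_target: int,      # size of the target dataset
    ratio: float,       # migration ratio, e.g. 0.2 to 0.3
) -> List[Vector]:
    """Pick the source instances closest to the offset hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    candidates = []
    for x in source:
        f = decision_value(w, b, x)
        # Candidates lie strictly between the two margins, i.e. between
        # the support vectors of the two classes (assumed at f = -1, +1).
        if -1.0 < f < 1.0:
            distance = abs(f - offset) / norm
            candidates.append((distance, x))
    # Shortest distance to the offset hyperplane first.
    candidates.sort(key=lambda pair: pair[0])
    # Assumption: the ratio is relative to the target dataset size.
    k = int(round(ratio * n_target))
    return [x for _, x in candidates[:k]]
```

Under these assumptions, sweeping `ratio` and retraining on the target data plus the selected instances is one way to reproduce the kind of rise-then-fall curve the abstract reports.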

