Control Theory & Applications (控制理论与应用), 2011
Hybrid algorithm for classification of unbalanced datasets
Abstract:
A novel hybrid algorithm that integrates the radial basis function neural network (RBFNN) with the random forest algorithm is proposed to improve the poor results that traditional algorithms produce when classifying the minority class of unbalanced datasets. First, random interpolations are inserted between adjacent samples in the minority-class data to balance the data distribution. Features whose receiver operating characteristic (ROC) has a degree of confidence below 95% are considered redundant and are deleted. The input data are then perturbed by the Bagging technique, and an RBFNN is employed as the base classifier in the random forest. The decisions are fused, and the outputs are determined by majority voting. The method is applied to UCI datasets. The G-mean and the area under the ROC curve demonstrate improved classification accuracy on medium-size and large-size unbalanced datasets.
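Three of the steps named in the abstract (random interpolation within the minority class, vote-based decision fusion, and the G-mean score) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names `interpolate_minority`, `majority_vote`, and `g_mean` are hypothetical, "adjacent" is assumed here to mean the nearest minority-class neighbour in Euclidean distance, and the RBFNN base classifiers, the Bagging step, and the ROC-based feature screening are omitted.

```python
import math
import random
from collections import Counter


def interpolate_minority(minority, n_new, rng):
    """Create n_new synthetic points, each on the segment between a randomly
    chosen minority sample and its nearest minority neighbour.

    Assumes at least two minority samples.
    """
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Nearest other minority sample (assumed meaning of "adjacent").
        neighbour = min(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(x, p),
        )
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


def majority_vote(votes):
    """Fuse decisions: votes[k][n] is the label classifier k gives sample n."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*votes)]


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity.

    Assumes both classes occur in y_true (otherwise a rate is undefined).
    """
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    n_pos = sum(t == positive for t in y_true)
    n_neg = len(y_true) - n_pos
    return math.sqrt((tp / n_pos) * (tn / n_neg))
```

In this sketch, `interpolate_minority(minority, n_new, random.Random(0))` would be called with `n_new` set to the gap between the majority and minority class sizes, and `majority_vote` would receive one row of predicted labels per base classifier. The G-mean is used rather than plain accuracy because it drops to zero when either class is entirely misclassified.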