This paper proposes an adaptive, diverse, hybrid ensemble method to improve the performance of binary classification. The proposed method combines base models non-linearly and adaptively selects the most suitable model for each data instance. Ensemble learning, an important machine learning technique, uses multiple single models to construct a hybrid model, which generally performs better than any single individual model. On a given dataset, diverse single models trained with different machine learning algorithms have different capabilities in recognizing patterns in the same training sample.
The proposed approach has been validated on the Repeat Buyers Prediction dataset and the Census Income Prediction dataset. The experimental results show up to an 18.5% improvement in F1 score on the Repeat Buyers dataset over the best individual model, which also indicates that the proposed ensemble method deals exceptionally well with imbalanced datasets. In addition, the proposed method outperforms two other commonly used ensemble methods, Averaging and Stacking, in F1 score. Finally, our results produced a slightly higher AUC score of 0.718 than the previous result of 0.712 in the Repeat Buyers competition. This roughly 1% increase in AUC is significant for a dataset as large as Repeat Buyers.
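The per-instance adaptive selection described above can be sketched in the spirit of dynamic classifier selection: for each test instance, the base model with the best accuracy on that instance's nearest validation neighbours makes the prediction. This is a minimal illustration under assumed details (synthetic imbalanced data, two base learners, a k-nearest-neighbour competence region), not the paper's exact method:

```python
# Sketch of per-instance adaptive model selection (dynamic classifier
# selection style). Assumptions: synthetic imbalanced data, two base
# models, local competence measured over 15 validation neighbours.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import NearestNeighbors

# Imbalanced binary dataset (roughly 80/20 class split).
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

# Diverse base models trained with different learning algorithms.
base_models = [LogisticRegression(max_iter=1000),
               DecisionTreeClassifier(max_depth=5, random_state=0)]
for m in base_models:
    m.fit(X_train, y_train)

# Per-model correctness on the validation set: shape (n_models, n_val).
val_correct = np.array([m.predict(X_val) == y_val for m in base_models])

# Nearest validation neighbours define each test instance's local region.
nn = NearestNeighbors(n_neighbors=15).fit(X_val)
_, idx = nn.kneighbors(X_test)

all_preds = np.array([m.predict(X_test) for m in base_models])
preds = np.empty(len(X_test), dtype=int)
for i, neigh in enumerate(idx):
    # Local accuracy of each model in this instance's neighbourhood.
    local_acc = val_correct[:, neigh].mean(axis=1)
    # The locally most competent model decides for this instance.
    preds[i] = all_preds[np.argmax(local_acc), i]
```

The key design choice is that competence is judged locally, so different instances can be handled by different base models, rather than by one fixed global combination as in Averaging or Stacking.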