Bayesian Classifier Based on Robust Kernel Density Estimation and Harris Hawks Optimisation

DOI: 10.4236/ijids.2024.61001, pp. 1-23

Keywords: Classification, Robust Kernel Density Estimation, M-Estimation, Harris Hawks Optimisation Algorithm, Complete Cross-Validation


Abstract:

In real-world applications, datasets frequently contain outliers, which can degrade the generalization ability of machine learning models. Bayesian classifiers, a popular supervised learning method, rely on accurate probability density estimation when classifying continuous data. Achieving precise density estimation from data contaminated by outliers, however, is a significant challenge. This paper introduces a Bayesian classifier that uses optimized robust kernel density estimation to address this issue. The proposed method improves the accuracy of the estimated probability density by mitigating the influence of outliers on the distribution estimated from the training sample. Unlike the conventional kernel density estimator, the robust estimator can be viewed as a weighted sum of kernel mappings of the samples. Each kernel mapping corresponds to an inner product in a reproducing kernel Hilbert space, so the kernel density estimate can be interpreted as the mean of the mapped samples in that space. M-estimation techniques are used to compute this mean robustly and to solve for the sample weights. Complete cross-validation serves as the objective function in the search for the optimal bandwidth, which strongly affects the estimator, and the Harris Hawks Optimisation algorithm optimizes this objective to improve estimation accuracy. Experimental results show that Harris Hawks Optimisation outperforms other optimization algorithms in convergence speed and objective function value during the bandwidth search, that the optimal robust kernel density estimator fits the data better than the traditional kernel density estimator when the training data contain outliers, and that the Naïve Bayesian classifier with optimal robust kernel density estimation generalizes better in classification with outliers.
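The abstract's central construction, a weighted kernel density estimator whose weights come from an M-estimate of the kernel mean in a reproducing kernel Hilbert space, can be sketched as follows. This is a minimal one-dimensional illustration in the spirit of robust KDE by iterative reweighting (Kim and Scott's approach), not the authors' implementation; the Huber threshold `c`, the bandwidth `h`, the iteration count, and all function names are illustrative choices.

```python
import numpy as np

def gauss(x, xi, h):
    """1-D Gaussian kernel with bandwidth h."""
    return np.exp(-0.5 * ((x - xi) / h) ** 2) / (h * np.sqrt(2.0 * np.pi))

def robust_kde_weights(data, h, c=0.5, n_iter=30):
    """Sample weights from a Huber-type M-estimate of the kernel mean.

    Iteratively reweighted scheme: points whose kernel mapping lies far
    from the current weighted mean in the RKHS are downweighted.
    """
    n = len(data)
    w = np.full(n, 1.0 / n)                     # start from the plain KDE
    K = gauss(data[:, None], data[None, :], h)  # Gram matrix K_ij = k(x_i, x_j)
    for _ in range(n_iter):
        # squared RKHS distance ||Phi(x_i) - sum_j w_j Phi(x_j)||^2
        d2 = np.diag(K) - 2.0 * K @ w + w @ K @ w
        d = np.sqrt(np.maximum(d2, 0.0))
        # Huber psi(d)/d: full weight near the mean, c/d for distant points
        u = np.where(d <= c, 1.0, c / np.maximum(d, 1e-12))
        w = u / u.sum()                         # renormalize to a density
    return w

def robust_kde(x, data, w, h):
    """Weighted KDE evaluated at point x."""
    return float(np.sum(w * gauss(x, data, h)))
```

With, say, fifty inliers drawn from N(0, 1) plus a handful of points near 10, the resulting weights on the distant points come out smaller than on the inliers, so the weighted density near the outliers is suppressed relative to a plain, uniformly weighted KDE.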
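The bandwidth search can likewise be illustrated. The paper uses complete cross-validation as the objective and Harris Hawks Optimisation as the search strategy; the sketch below substitutes a simpler leave-one-out log-likelihood objective and a plain grid search, which is enough to show why such an objective penalizes both under- and over-smoothing. The function names and the grid are illustrative assumptions, not the authors' code.

```python
import numpy as np

def loo_log_likelihood(data, h):
    """Leave-one-out log-likelihood of a 1-D Gaussian KDE with bandwidth h."""
    total = 0.0
    for i in range(len(data)):
        rest = np.delete(data, i)
        # density at x_i estimated from the other n - 1 points
        dens = (np.mean(np.exp(-0.5 * ((data[i] - rest) / h) ** 2))
                / (h * np.sqrt(2.0 * np.pi)))
        total += np.log(max(dens, 1e-300))  # guard against log(0)
    return total

def select_bandwidth(data, grid):
    """Pick the bandwidth that maximizes the leave-one-out objective."""
    scores = [loo_log_likelihood(data, h) for h in grid]
    return grid[int(np.argmax(scores))]
```

On a sample of standard-normal data the selected bandwidth lands in the interior of the grid rather than at either degenerate extreme: a tiny bandwidth overfits (leave-one-out densities collapse), while a huge one oversmooths. A metaheuristic such as Harris Hawks Optimisation replaces the grid with an adaptive population-based search over the same kind of objective.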

