Using Decision Tree Classification and Principal Component Analysis to Predict Ethnicity Based on Individual Characteristics: A Case Study of Assam and Bhutan Ethnicities

DOI: 10.4236/jsea.2024.1712046, pp. 833-850

Keywords: Decision Tree Classification, Principal Component Analysis, Anthropometric Features, Dimensionality Reduction, Machine Learning in Anthropology


Abstract:

This study investigates the use of a decision tree classification model, combined with Principal Component Analysis (PCA), to distinguish between Assam and Bhutan ethnic groups based on specific anthropometric features: age, height, tail length, hair length, bang length, reach, and earlobe type. PCA was used to reduce the dimensionality of the dataset and identified height, reach, and age as the features contributing most to variance. Although PCA reduced dimensionality effectively, it did not clearly separate the two ethnic groups, a limitation noted in previous research. The decision tree model performed significantly better, establishing clear decision boundaries and achieving high classification accuracy. It consistently selected height and reach as the most important classifiers, a finding supported by existing studies on ethnic differences in Northeast India. The results highlight the strengths of combining PCA for dimensionality reduction with decision tree models for classification tasks: PCA alone was insufficient for optimal class separation, but its integration with decision trees improved both the model’s accuracy and interpretability. Future research could explore other machine learning models to enhance classification and examine a broader set of anthropometric features for more comprehensive ethnic group classification.
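The abstract's contrast between PCA and the decision tree can be illustrated with a small sketch. PCA is unsupervised: it looks for directions of high variance, which need not be the directions that separate the classes, whereas a decision tree chooses splits using the class labels. The Python below is a from-scratch toy under stated assumptions, not the authors' pipeline: the eight `(age, height)` rows, the two-feature setup, the unscaled inputs, and the helper names `first_principal_component` and `best_split` are all hypothetical and chosen only for illustration. It computes the first principal component by power iteration and fits a depth-1 tree (a single Gini split), once on the raw features and once on the first-component scores.

```python
import math

# Hypothetical toy data, NOT from the study: (age, height) per individual.
# Age varies widely within both groups; height differs between groups.
FEATURES = ["age", "height"]
ROWS = [
    [10, 150], [30, 152], [50, 151], [70, 153],   # group 0
    [12, 170], [32, 172], [52, 171], [72, 173],   # group 1
]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]


def center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def first_principal_component(rows, iters=200):
    """Unit vector of maximum variance, via power iteration on the covariance matrix."""
    c = center(rows)
    n, d = len(c), len(c[0])
    cov = [[sum(r[i] * r[j] for r in c) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels):
    """Depth-1 tree: (feature index, threshold, weighted Gini) of the best split."""
    best = None
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold midway between neighbours
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


pc1 = first_principal_component(ROWS)
# Project the centered rows onto the first component (one score per row).
scores = [[sum(a * b for a, b in zip(row, pc1))] for row in center(ROWS)]

raw_split = best_split(ROWS, LABELS)     # tree sees the original features
pc1_split = best_split(scores, LABELS)   # tree sees only the first component

print("PC1 loadings:", dict(zip(FEATURES, pc1)))
print("best split on raw features:", FEATURES[raw_split[0]], raw_split[1:])
print("best split on PC1 scores (weighted Gini):", pc1_split[2])
```

In this toy data the first component loads mostly on age, the high-variance feature, so the two groups interleave along it and no single threshold on the component separates them. The split on the raw features picks height and separates the groups cleanly. This shows one general mechanism by which PCA can fall short of class separation; it does not establish that the same mechanism explains the study's results, and a real analysis would typically standardize the features and use a tested library implementation.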

