The paper is devoted to optimizing the data structure in classification and clustering problems by mapping the original data onto a set of ordered feature vectors. During ordering, the elements of each feature vector are renumbered so that their values are arranged in non-decreasing order. With this updated structure, the bulk of the computational operations is performed not on the multidimensional quantities describing objects but on one-dimensional ones, namely the values of individual object features. As a result, instead of a rather complex existing algorithm, the same simplest algorithm is applied repeatedly. The transition from original to ordered data reduces the entropy of the data distribution, which makes it possible to reveal their properties. It is shown that the classes differ in the functions relating feature values to ordered object numbers. The set of these functions captures the information contained in the training sample and allows one to compute the class of any object in the test sample from the values of its features using the simplest total probability formula. The paper also discusses the use of the ordered data matrix to solve problems of partitioning a set into clusters of objects that share common properties.
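As a rough illustration of the idea, the sketch below stores each class's feature columns in sorted (non-decreasing) order and classifies a test object by combining per-feature likelihood estimates with class priors. This is a minimal stand-in under stated assumptions, not the paper's exact algorithm: the function names, the bandwidth heuristic, and the Laplace smoothing are all choices made here for illustration.

```python
import numpy as np

def fit_sorted_features(X, y):
    """For each class, store its feature values column-sorted in
    non-decreasing order (the 'ordered feature vectors')."""
    classes = np.unique(y)
    model = {c: np.sort(X[y == c], axis=0) for c in classes}
    return classes, model

def predict(classes, model, x, n_total):
    """Score each class by a simple total-probability-style combination:
    prior times the mean of per-feature likelihood estimates."""
    scores = []
    for c in classes:
        sorted_vals = model[c]              # shape (n_c, n_features)
        n_c = sorted_vals.shape[0]
        per_feature = []
        for j, v in enumerate(x):
            col = sorted_vals[:, j]
            # heuristic bandwidth around v, shrinking with sample size
            band = 0.5 * (col.max() - col.min() + 1e-9) / np.sqrt(n_c)
            # sorted columns let us count neighbours by binary search
            count = (np.searchsorted(col, v + band)
                     - np.searchsorted(col, v - band))
            per_feature.append((count + 1) / (n_c + 2))  # Laplace smoothing
        prior = n_c / n_total
        scores.append(prior * np.mean(per_feature))
    return classes[np.argmax(scores)]
```

Because each class's columns are sorted, the per-feature counts reduce to one-dimensional binary searches, which reflects the abstract's point that the main computational work is done on one-dimensional quantities rather than on whole multidimensional objects.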