%0 Journal Article
%T Imprecise Imputation as a Tool for Solving Classification Problems with Mean Values of Unobserved Features
%A Lev V. Utkin
%A Yulia A. Zhuk
%J Advances in Artificial Intelligence
%D 2013
%I Hindawi Publishing Corporation
%R 10.1155/2013/176890
%X A method for solving a classification problem when there is only partial information about some features is proposed. This partial information comprises the mean values of features for every class and the bounds of the features. In order to maximally exploit the available information, a set of probability distributions is constructed such that two distributions are selected from the set which define the minimax and minimin strategies. Random values of features are generated in accordance with the selected distributions by using the Monte Carlo technique. As a result, the classification problem is reduced to the standard model which is solved by means of the support vector machine. Numerical examples illustrate the proposed method. 1. Introduction There are several major data mining techniques including classification, clustering, and novelty detection. We consider classification as a data mining technique used to predict an unobserved output value based on an observed input vector . This requires us to estimate a predictor from training data or a set of example pairs of . A special very important problem of the statistical machine learning is the binary classification problem which can be regarded as a task of classifying some objects into two classes (groups) in accordance with their properties or features. In other words, we have to classify each pattern into one of the classes by means of a discriminant function . A common assumption in supervised learning is that training and predicted data are drawn from the same (unknown) probability distribution; that is, training and predicted data come from the same statistical model. As a result, most machine learning algorithms and methods exploit this assumption which, unfortunately, does not often hold in practice. This may lead to a performance deterioration in the induced classifiers [1, 2]. This problem may arise if we have imbalanced data [3] or in case of rare events or observations [4]. The assumption does not hold also in case of partially known or observed features. For instance, it may take place when we know only some mean values of the features but cannot get their actual values during training. One of the approaches to handle the above problem and to cope with the imbalance and possible inconsistencies of training and predicted data is the minimax strategy for which the classification parameters are determined by minimizing the maximum possible risk of misclassification [1, 2]. This is an “extreme” strategy of decision making. As pointed out in [1], the minimax classifiers may be seen as
%U http://www.hindawi.com/journals/aai/2013/176890/