A method is proposed for solving a classification problem when only partial information about some features is available. This partial information comprises the mean values of the features for every class and the bounds of the features. In order to exploit the available information as fully as possible, a set of probability distributions is constructed, and two distributions defining the minimax and minimin strategies are selected from this set. Random values of the features are generated in accordance with the selected distributions by means of the Monte Carlo technique. As a result, the classification problem is reduced to the standard model, which is solved by means of the support vector machine. Numerical examples illustrate the proposed method.

1. Introduction

There are several major data mining techniques, including classification, clustering, and novelty detection. We consider classification as a data mining technique used to predict an unobserved output value y on the basis of an observed input vector x. This requires us to estimate a predictor from training data, that is, from a set of example pairs (x_i, y_i). A particularly important problem in statistical machine learning is the binary classification problem, which can be regarded as the task of classifying objects into two classes (groups) in accordance with their properties or features. In other words, we have to assign each pattern x to one of the classes by means of a discriminant function f(x). A common assumption in supervised learning is that training and predicted data are drawn from the same (unknown) probability distribution; that is, training and predicted data come from the same statistical model. As a result, most machine learning algorithms and methods exploit this assumption, which, unfortunately, often does not hold in practice. This may lead to a deterioration in the performance of the induced classifiers [1, 2]. The problem may arise with imbalanced data [3] or in the case of rare events or observations [4]. The assumption also fails in the case of partially known or partially observed features, for instance, when we know only the mean values of the features but cannot obtain their actual values during training. One approach to handling the above problem and coping with the imbalance and possible inconsistencies between training and predicted data is the minimax strategy, for which the classification parameters are determined by minimizing the maximum possible risk of misclassification [1, 2]. This is an “extreme” strategy of decision making. As pointed out in [1], the minimax classifiers may be seen as
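As a minimal illustration of the pipeline summarized in the abstract, the following Python sketch (using NumPy and scikit-learn, whereas the paper's own experiments rely on R and LIBSVM [36, 37]) generates feature values by Monte Carlo from distributions consistent with given per-class means and bounds and then trains a standard SVM on the generated sample. The scaled beta model, the function name sample_features, and all numerical values are assumptions made only for this sketch; the paper instead constructs a set of admissible distributions and selects its minimax and minimin members, which is not reproduced here.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def sample_features(means, lows, highs, n, concentration=4.0):
    # Monte Carlo draws whose per-feature means and bounds match the given
    # partial information (illustrative scaled-beta model, not the paper's set).
    means, lows, highs = map(np.asarray, (means, lows, highs))
    mu = (means - lows) / (highs - lows)       # means rescaled to [0, 1]
    a = mu * concentration                     # beta shape parameters with mean mu
    b = (1.0 - mu) * concentration
    u = rng.beta(a, b, size=(n, len(mu)))      # samples on [0, 1]
    return lows + (highs - lows) * u           # rescale back to the feature bounds

# Assumed partial information: two features, bounds shared by both classes.
X_pos = sample_features(means=[2.0, 5.0], lows=[0.0, 3.0], highs=[4.0, 8.0], n=200)
X_neg = sample_features(means=[1.0, 6.5], lows=[0.0, 3.0], highs=[4.0, 8.0], n=200)
X = np.vstack([X_pos, X_neg])
y = np.r_[np.ones(200), -np.ones(200)]

clf = SVC(kernel="rbf", C=1.0).fit(X, y)       # standard SVM on the generated sample
print(clf.predict([[2.5, 4.0]]))

The scaled beta distribution used above is only one member of the family of distributions compatible with the stated means and bounds; choosing the minimax and minimin members of that family is the subject of the method itself.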
References
[1]
R. Alaiz-Rodríguez, A. Guerrero-Curieses, and J. Cid-Sueiro, “Minimax regret classifier for imprecise class distributions,” Journal of Machine Learning Research, vol. 8, pp. 103–130, 2007.
[2]
R. Alaiz-Rodríguez, A. Guerrero-Curieses, and J. Cid-Sueiro, “Improving classification under changes in class and within-class distributions,” in Bio-Inspired Systems: Computational and Ambient Intelligence, J. Cabestany, F. Sandoval, A. Prieto, and J. Corchado, Eds., vol. 5517 of Lecture Notes in Computer Science, pp. 122–130, Springer, Berlin, Germany, 2009.
[3]
S. Kotsiantis, D. Kanellopoulos, and P. Pintelas, “Handling imbalanced datasets: a review,” GESTS International Transactions on Computer Science and Engineering, vol. 30, no. 1, pp. 25–36, 2006.
[4]
G. M. Weiss, “Mining with rarity: a unifying framework,” ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 7–19, 2004.
[5]
B. Scholkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, The MIT Press, Cambridge, Mass, USA, 2002.
[6]
D. B. Rubin, “Multiple imputation after 18+ years,” Journal of the American Statistical Association, vol. 91, no. 434, pp. 473–489, 1996.
[7]
M. Saar-Tsechansky and F. Provost, “Handling missing values when applying classification models,” Journal of Machine Learning Research, vol. 8, pp. 1625–1657, 2007.
[8]
G. E. A. P. A. Batista and M. C. Monard, “An analysis of four missing data treatment methods for supervised learning,” Applied Artificial Intelligence, vol. 17, no. 5-6, pp. 519–533, 2003.
[9]
A. Farhangfar, L. Kurgan, and J. Dy, “Impact of imputation of missing values on classification error for discrete data,” Pattern Recognition, vol. 41, no. 12, pp. 3692–3705, 2008.
[10]
S. Garcia and F. Herrera, “An extension on “Statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons,” Journal of Machine Learning Research, vol. 9, pp. 2677–2694, 2008.
[11]
J. Grzymala-Busse and M. Hu, “A comparison of several approaches to missing attribute values in data mining,” in Rough Sets and Current Trends in Computing, pp. 378–385, Springer, Berlin, Germany, 2001.
[12]
J. Luengo, S. Garcia, and F. Herrera, “On the choice of the best imputation methods for missing values considering three groups of classification methods,” Knowledge and Information Systems, vol. 32, no. 1, pp. 77–108, 2012.
[13]
J. Ning and P. E. Cheng, “A comparison study of nonparametric imputation methods,” Statistics and Computing, vol. 22, no. 1, pp. 273–285, 2012.
[14]
S. Destercke, D. Dubois, and E. Chojnacki, “Unifying practical uncertainty representations. II: clouds,” International Journal of Approximate Reasoning, vol. 49, no. 3, pp. 664–677, 2008.
[15]
S. Ferson, V. Kreinovich, L. Ginzburg, D. S. Myers, and K. Sentz, “Constructing probability boxes and Dempster-Shafer structures,” Tech. Rep. SAND2002-4015, Sandia National Laboratories, January 2003.
[16]
C. P. Robert, The Bayesian Choice, Springer, New York, NY, USA, 1994.
[17]
L. V. Utkin, “Regression analysis using the imprecise Bayesian normal model,” International Journal of Data Analysis Techniques and Strategies, vol. 2, no. 4, pp. 356–372, 2010.
[18]
L. V. Utkin and F. P. A. Coolen, “On reliability growth models using Kolmogorov-Smirnov bounds,” International Journal of Performability Engineering, vol. 7, no. 1, pp. 5–19, 2011.
[19]
L. V. Utkin and Y. A. Zhuk, “A machine learning algorithm for classification under extremely scarce information,” International Journal of Data Analysis Techniques and Strategies, vol. 4, no. 2, pp. 115–133, 2012.
[20]
J. O. Berger and G. Salinetti, “Approximations of Bayes decision problems: the epigraphical approach,” Annals of Operations Research, vol. 56, no. 1, pp. 1–13, 1995.
[21]
J. Shao, “Monte Carlo approximations in Bayesian decision theory,” Journal of the American Statistical Association, vol. 84, no. 407, pp. 727–732, 1989.
[22]
A. Farhangfar, L. Kurgan, and J. Dy, “Impact of imputation of missing values on classification error for discrete data,” Pattern Recognition, vol. 41, no. 12, pp. 3692–3705, 2008.
[23]
D. Williams, X. Liao, Y. Xue, L. Carin, and B. Krishnapuram, “On classification with incomplete data,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 3, pp. 427–436, 2007.
[24]
R. Esposito and L. Saitta, “Monte Carlo theory as an explanation of bagging and boosting,” in Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI '03), pp. 499–504, 2003.
[25]
P. Sollich, “Bayesian methods for support vector machines: evidence and predictive class probabilities,” Machine Learning, vol. 46, no. 1–3, pp. 21–52, 2002.
[26]
J. E. Hurtado, “An examination of methods for approximating implicit limit state functions from the viewpoint of statistical learning theory,” Structural Safety, vol. 26, no. 3, pp. 271–293, 2004.
[27]
J. E. Hurtado and D. A. Alvarez, “Classification approach for reliability analysis with stochastic finite-element modeling,” Journal of Structural Engineering, vol. 129, no. 8, pp. 1141–1149, 2003.
[28]
A. Frank and A. Asuncion, UCI Machine Learning Repository, 2010.
[29]
V. Vapnik, Statistical Learning Theory, Wiley, New York, NY, USA, 1998.
[30]
P. Walley, “Measures of uncertainty in expert systems,” Artificial Intelligence, vol. 83, no. 1, pp. 1–58, 1996.
[31]
V. P. Kuznetsov, Interval Statistical Models, Radio and Communication, Moscow, Russia, 1991 (in Russian).
[32]
P. Walley, Statistical Reasoning with Imprecise Probabilities, Chapman and Hall, London, UK, 1991.
[33]
S. Ferson, L. Ginzburg, and R. Akcakaya, “Whereof one cannot speak: when input distributions are unknown,” Applied Biomathematics Report, 2001, http://www.ramas.com/whereof.pdf.
[34]
A. N. Tikhonov and V. Y. Arsenin, Solution of Ill-Posed Problems, W.H. Winston, Washington, DC, USA, 1977.
[35]
T. Evgeniou, T. Poggio, M. Pontil, and A. Verri, “Regularization and statistical learning theory for data analysis,” Computational Statistics and Data Analysis, vol. 38, no. 4, pp. 421–432, 2002.
[36]
R Development Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2005.
[37]
C.-C. Chang and C.-J. Lin, LIBSVM: a library for support vector machines, 2001, http://www.csie.ntu.edu.tw/~cjlin/libsvm/.