We revisit a comparison of two discriminant analysis procedures, the linear combination classifier of Chung and Han (2000) and the maximum likelihood estimation (MLE) substitution classifier, for the problem of classifying unlabeled multivariate normal observations with equal covariance matrices into one of two classes. Both classes have matching block monotone missing training data. We demonstrate that, for intra-class covariance structures with at least small correlation between the variables with block missing data and the variables without it, the MLE substitution classifier outperforms the Chung and Han (2000) classifier regardless of the percentage of missing observations. Specifically, we examine the differences in the estimated expected error rates of the two classifiers using a Monte Carlo simulation, and we compare the classifiers on two real data sets with monotone missing data via parametric bootstrap simulations. Our results contradict the conclusion of Chung and Han (2000) that their linear combination classifier is superior to the MLE classifier for block monotone missing multivariate normal data.
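As a rough illustration of the MLE substitution idea described above, the following Python sketch estimates the mean and covariance of a two-block monotone missing normal sample by factoring the likelihood into the always-observed block and the regression of the partly missing block on it, then plugs the estimates into a linear discriminant rule and approximates its error rate by simulation. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the function names (`monotone_mle`, `linear_rule`, `demo_error_rate`), the equicorrelated covariance, the sample sizes, and the simple averaging of the two per-class covariance estimates are all hypothetical choices made for illustration.

```python
import numpy as np

def monotone_mle(complete, partial):
    """MLE of (mu, Sigma) for a two-block monotone missing normal sample.

    complete: (n, p1 + p2) rows observed on every variable.
    partial:  (m, p1) rows observed only on the first p1 variables.
    """
    complete = np.asarray(complete, dtype=float)
    partial = np.asarray(partial, dtype=float)
    p1 = partial.shape[1]
    n = complete.shape[0]

    # Block 1: use every available observation on the always-observed variables.
    x1_all = np.vstack([complete[:, :p1], partial])
    mu1 = x1_all.mean(axis=0)
    d1 = x1_all - mu1
    s11 = d1.T @ d1 / x1_all.shape[0]

    # Regression of block 2 on block 1, estimated from the complete rows only.
    mean_c = complete.mean(axis=0)
    dc = complete - mean_c
    sc = dc.T @ dc / n
    beta = np.linalg.solve(sc[:p1, :p1], sc[:p1, p1:]).T  # shape (p2, p1)
    resid = sc[p1:, p1:] - beta @ sc[:p1, p1:]

    # Recombine the two factors into estimates for the full vector.
    mu2 = mean_c[p1:] + beta @ (mu1 - mean_c[:p1])
    s21 = beta @ s11
    s22 = resid + beta @ s11 @ beta.T
    mu = np.concatenate([mu1, mu2])
    sigma = np.block([[s11, s21.T], [s21, s22]])
    return mu, sigma

def linear_rule(x, mu_a, mu_b, sigma):
    """True where x is assigned to class A by the plug-in linear rule (equal priors)."""
    w = np.linalg.solve(sigma, mu_a - mu_b)
    return (x - 0.5 * (mu_a + mu_b)) @ w > 0

def demo_error_rate(seed=0, n=20, m=20, p1=2, p2=1, rho=0.5, shift=1.5, n_test=2000):
    """Simulated error rate of the plug-in rule for one training sample (illustrative values)."""
    rng = np.random.default_rng(seed)
    p = p1 + p2
    sigma = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)  # assumed equicorrelation
    mus = [np.zeros(p), np.full(p, shift)]
    fits = []
    for mu in mus:
        complete = rng.multivariate_normal(mu, sigma, size=n)
        partial = rng.multivariate_normal(mu, sigma, size=m)[:, :p1]
        fits.append(monotone_mle(complete, partial))
    # Simplification: average the per-class covariance estimates (equal sample sizes).
    pooled = 0.5 * (fits[0][1] + fits[1][1])
    test_a = rng.multivariate_normal(mus[0], sigma, size=n_test)
    test_b = rng.multivariate_normal(mus[1], sigma, size=n_test)
    err_a = np.mean(~linear_rule(test_a, fits[0][0], fits[1][0], pooled))
    err_b = np.mean(linear_rule(test_b, fits[0][0], fits[1][0], pooled))
    return 0.5 * (err_a + err_b)
```

Calling `demo_error_rate()` gives one simulated error rate; averaging it over many seeds would approximate an expected error rate in the spirit of the Monte Carlo comparison. When `partial` has no rows, `monotone_mle` reduces to the ordinary complete-data MLE, which is a convenient sanity check.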
References
[1]
Chung, H.-C. and Han, C.-P. (2000) Discriminant Analysis When a Block of Observations Is Missing. Annals of the Institute of Statistical Mathematics, 52, 544-556.
[2]
Bohannon, T.R. and Smith, W.B. (1975) Classification Based on Incomplete Data Records. ASA Proceedings of the Social Statistics Section, 67, 214-218.
[3]
Jackson, E.C. (1968) Missing Values in Linear Multiple Discriminant Analysis. Biometrics, 24, 835-844. http://dx.doi.org/10.2307/2528874
[4]
Chang, L.S. and Dunn, O.J. (1972) The Treatment of Missing Values in Discriminant Analysis—1. The Sampling Experiment. Journal of the American Statistical Association, 67, 473-477.
[5]
Chang, L.S., Gilman, A. and Dunn, O.J. (1976) Alternative Approaches to Missing Values in Discriminant Analysis. Journal of the American Statistical Association, 71, 842-844. http://dx.doi.org/10.1080/01621459.1976.10480956
[6]
Titterington, D.M. and Jiang, J.-M. (1983) Recursive Estimation Procedures for Missing-Data Problems. Biometrika, 70, 613-624. http://dx.doi.org/10.1093/biomet/70.3.613
[7]
Hocking, R.R. and Smith, W.B. (1968) Estimation of Parameters in the Multivariate Normal Distribution with Missing Observations. Journal of the American Statistical Association, 63, 159-173.
[8]
Anderson, T.W. and Olkin, I. (1985) Maximum-Likelihood Estimation of the Parameters of a Multivariate Normal Distribution. Linear Algebra and Its Applications, 70, 147-171. http://dx.doi.org/10.1016/0024-3795(85)90049-7
[9]
Fisher, R.A. (1936) The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics, 7, 179-188. http://dx.doi.org/10.1111/j.1469-1809.1936.tb02137.x