Statistics 2015
An information criterion for model selection with missing data via complete-data divergence

Abstract: We derive an information criterion for selecting a parametric model of the complete-data distribution when only incomplete or partially observed data are available. Compared with AIC, the new criterion has an additional penalty term for missing data, expressed in terms of the Fisher information matrices of the complete data and the incomplete data. We prove that the new criterion is an asymptotically unbiased estimator of the complete-data divergence, namely, the expected Kullback-Leibler divergence between the true distribution and the estimated distribution of the complete data, whereas AIC is the corresponding estimator for the incomplete data. The information criteria PDIO (Shimodaira 1994) and AICcd (Cavanaugh and Shumway 1998) were previously proposed for estimating the complete-data divergence, and these two criteria have the same penalty term. The additional penalty term of the new criterion for missing data turns out to be only half of what is claimed in PDIO and AICcd. In a simulation study, we observe that the new criterion is unbiased while the other two criteria are biased. Before starting the argument on model selection, we review the geometrical view of the alternating minimization in the EM algorithm, which plays an important role in the derivation of the new criterion.
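The abstract does not state the criteria explicitly, so the following is only an illustrative sketch under an assumed form: suppose the extra missing-data term is built from the "missing information" I_mis = I_com - I_obs (the gap between the complete-data and incomplete-data Fisher information matrices), and that the new criterion adds this term once where PDIO/AICcd add it twice. All matrices below are hypothetical placeholders, not quantities from the paper.

```python
import numpy as np

# Hypothetical Fisher information matrices at the MLE (assumed, for illustration).
p = 3                                    # number of parameters (hypothetical)
rng = np.random.default_rng(0)
A = rng.standard_normal((p, p))
I_com = A @ A.T + p * np.eye(p)          # complete-data Fisher information (positive definite)
I_obs = I_com - 0.5 * np.diag(rng.uniform(0.1, 1.0, p))  # observed-data info: strictly less
I_mis = I_com - I_obs                    # "missing information" (missing-information principle)

# Assumed shape of the penalties: a common AIC-like part 2p plus a
# missing-data term proportional to tr(I_mis I_obs^{-1}).
extra = np.trace(I_mis @ np.linalg.inv(I_obs))
penalty_new = 2 * p + extra              # new criterion: extra term counted once (assumption)
penalty_pdio = 2 * p + 2 * extra         # PDIO/AICcd: extra term counted twice (assumption)

print(penalty_new, penalty_pdio)
```

With no missing data, I_mis vanishes and both penalties reduce to the AIC penalty 2p; the two criteria differ only through the missing-data term, which is twice as large in PDIO/AICcd as in the new criterion.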