
First Use of Multiple Imputation with the National Tuberculosis Surveillance System

DOI: 10.1155/2013/875234



Aims. The purpose of this study was to compare methods for handling missing data in analysis of the National Tuberculosis Surveillance System of the Centers for Disease Control and Prevention. Because of the high rate of missing human immunodeficiency virus (HIV) infection status in this dataset, we used multiple imputation methods to minimize the bias that may result from less sophisticated methods.

Methods. We compared analysis based on multiple imputation methods with analysis based on deleting subjects with missing covariate data from regression analysis (case exclusion), and determined whether the use of increasing numbers of imputed datasets would lead to changes in the estimated association between isoniazid resistance and death.

Results. Following multiple imputation, the odds ratio for initial isoniazid resistance and death was 2.07 (95% CI 1.30, 3.29); with case exclusion, this odds ratio decreased to 1.53 (95% CI 0.83, 2.83). The use of more than 5 imputed datasets did not substantively change the results.

Conclusions. Our experience with the National Tuberculosis Surveillance System dataset supports the use of multiple imputation methods in epidemiologic analysis, but also demonstrates that close attention should be paid to the potential impact of missing covariates at each step of the analysis.

1. Background

Missing data is a common problem in epidemiologic research. Analytic techniques used in multivariable analysis, such as regression models, rely on methods that exclude cases with missing covariate data from analysis. This missing data approach has important limitations. First, case exclusion will always lead to loss of statistical power. Second, case exclusion will introduce bias into the analysis if excluded subjects differ from included subjects in ways that are relevant for the parameter of interest [1]. The potential for bias using case exclusion depends on the mechanism for missingness.
For missing-at-random (MAR) data, the missingness of a particular observation depends only on observed covariates, and for missing-not-at-random (MNAR) data, missingness may depend on both observed and unobserved covariates. For either MAR or MNAR data, case exclusion will introduce bias, as subjects excluded from analysis will differ from subjects included in analysis according to either the measured or unmeasured covariates. In contrast, when data is missing-completely-at-random (MCAR), missingness can be considered a random deletion of observations without respect to measured or unmeasured covariates, and case exclusion does not lead to the
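
The three missingness mechanisms described above can be illustrated with a small simulation. The sketch below is not taken from the paper and does not use surveillance data: the variable names, prevalences, and missingness probabilities are all hypothetical, chosen only so that the effect is easy to see. It estimates the prevalence of a binary covariate `h` (a stand-in for a status such as HIV infection) after excluding records where `h` is missing, and compares that complete-case estimate with the true prevalence. It illustrates bias in a simple prevalence estimate only, not in a regression model.

```python
import random


def complete_case_prevalence(mechanism, n=200_000, seed=0):
    """Return (true_prevalence, complete_case_prevalence) of a binary covariate h.

    All quantities are hypothetical and chosen for illustration:
      x : fully observed binary covariate
      h : binary covariate that is sometimes missing
    """
    rng = random.Random(seed)
    total_h = 0      # count of h == 1 among all simulated subjects
    observed_h = 0   # count of h == 1 among subjects with h observed
    observed_n = 0   # number of subjects with h observed

    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        # h is more common when x == 1, so x carries information about h
        h = 1 if rng.random() < (0.30 if x else 0.10) else 0

        if mechanism == "MCAR":
            p_missing = 0.40                    # unrelated to x or h
        elif mechanism == "MAR":
            p_missing = 0.70 if x else 0.10     # depends only on observed x
        elif mechanism == "MNAR":
            p_missing = 0.70 if h else 0.30     # depends on the unobserved h
        else:
            raise ValueError(f"unknown mechanism: {mechanism!r}")

        total_h += h
        if rng.random() >= p_missing:           # h is observed for this subject
            observed_h += h
            observed_n += 1

    return total_h / n, observed_h / observed_n


for mech in ("MCAR", "MAR", "MNAR"):
    true_p, cc_p = complete_case_prevalence(mech)
    print(f"{mech}: true = {true_p:.3f}, complete-case = {cc_p:.3f}")
```

Under these assumed parameters the true prevalence of `h` is 0.20. In large samples the complete-case estimate stays near 0.20 under MCAR, falls to about 0.15 under MAR (subjects with `x == 1`, who more often have `h == 1`, are disproportionately excluded), and falls further under MNAR. Under MAR the bias could in principle be corrected using the observed `x`, which is the information that imputation methods exploit; under MNAR the observed data alone are not sufficient.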


[1]  J. W. Graham, “Missing data analysis: making it work in the real world,” Annual Review of Psychology, vol. 60, pp. 549–576, 2009.
[2]  A. R. T. Donders, G. J. M. G. van der Heijden, T. Stijnen, and K. G. M. Moons, “Review: a gentle introduction to imputation of missing values,” Journal of Clinical Epidemiology, vol. 59, no. 10, pp. 1087–1091, 2006.
[3]  D. B. Rubin, Multiple Imputation For Nonresponse in Surveys, John Wiley & Sons, New York, NY, USA, 1987.
[4]  J. L. Schafer, Analysis of Incomplete Multivariate Data, Chapman & Hall, New York, NY, USA, 1997.
[5]  J. L. Schafer, “Multiple imputation: a primer,” Statistical Methods in Medical Research, vol. 8, no. 1, pp. 3–15, 1999.
[6]  C. Vinnard, C. A. Winston, E. P. Wileyto, R. R. MacGregor, and G. P. Bisson, “Isoniazid resistance and death in patients with tuberculous meningitis: retrospective cohort study,” British Medical Journal, vol. 341, no. 7773, p. 596, 2010.
[7]  “Trends in tuberculosis—United States, 2008,” Morbidity and Mortality Weekly Report (MMWR), vol. 58, pp. 249–253, 2009.
[8]  S. van Buuren, “Multiple imputation of discrete and continuous data by fully conditional specification,” Statistical Methods in Medical Research, vol. 16, no. 3, pp. 219–242, 2007.
[9]  K. J. Lee and J. B. Carlin, “Multiple imputation for missing data: fully conditional specification versus multivariate normal imputation,” American Journal of Epidemiology, vol. 171, no. 5, pp. 624–632, 2010.
[10]  P. Royston, “Multiple imputation of missing values: further update of ice, with an emphasis on categorical variables,” The Stata Journal, vol. 9, no. 3, pp. 466–477, 2009.
[11]  K. G. Moons, R. A. Donders, T. Stijnen, and F. E. Harrell Jr., “Using the outcome for imputation of missing predictor values was preferred,” Journal of Clinical Epidemiology, vol. 59, pp. 1092–1101, 2006.
[12]  P. Royston, J. B. Carlin, and I. R. White, “Multiple imputation of missing values: new features for mim,” The Stata Journal, vol. 9, no. 2, pp. 252–264, 2009.
[13]  M. A. Klebanoff and S. R. Cole, “Use of multiple imputation in the epidemiologic literature,” American Journal of Epidemiology, vol. 168, no. 4, pp. 355–357, 2008.
[14]  F. E. Harrell Jr., Regression Modeling Strategies: With Applications To Linear Models, Logistic Regression, and Survival Analysis, Springer, New York, NY, USA, 2001.

