Abstract
The goal of multiple imputation is to provide valid inferences for statistical estimates from incomplete data. To achieve that goal, imputed values should preserve the structure in the data, as well as the uncertainty about this structure, and include any knowledge about the process that generated the missing data. Two general approaches exist for imputing multivariate data: joint modeling (JM) and fully conditional specification (FCS). JM is based on parametric statistical theory and leads to imputation procedures whose statistical properties are known. JM is theoretically sound, but the joint model may lack the flexibility needed to represent typical data features, potentially leading to bias. FCS is a semi-parametric and flexible alternative that specifies the multivariate model by a series of conditional models, one for each incomplete variable. FCS provides tremendous flexibility and is easy to apply, but its statistical properties are difficult to establish. Simulation work shows that FCS behaves very well in the cases studied. The present paper reviews and compares the two approaches. JM and FCS were applied to pubertal development data of 3801 Dutch girls who had missing data on menarche (two categories), breast development (five categories) and pubic hair development (six stages). Imputations for these data were created under two models: a multivariate normal model with rounding and a conditionally specified discrete model. The JM approach introduced biases in the reference curves, whereas FCS did not. The paper concludes that FCS is a useful and easily applied flexible alternative to JM when no convenient and realistic joint distribution can be specified.
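The FCS idea of "one conditional model per incomplete variable" can be illustrated with a minimal sketch for purely categorical data. This is not the model used in the paper: as a simplifying assumption, each conditional model here is a saturated categorical distribution of the target given the joint pattern of all other variables, with probabilities drawn from a Dirichlet(counts + 1) posterior so that parameter uncertainty is carried into the imputations. Because categories are drawn directly, no rounding step is needed, unlike the multivariate normal model with rounding. The function name `fcs_impute`, the record layout (dicts with `None` for missing) and the toy records are hypothetical; each variable is assumed to have at least one observed value.

```python
import random
from collections import Counter


def fcs_impute(data, levels, n_iter=10, seed=None):
    """Impute missing (None) categorical values by fully conditional specification.

    data:   list of dicts, one per record; missing entries are None.
    levels: dict mapping each variable name to its list of possible categories.
    Returns a new, completed list of dicts (one imputed data set).
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    names = list(levels)
    # Remember which cells were missing before imputation starts.
    missing = {v: [row[v] is None for row in data] for v in names}
    completed = [dict(row) for row in data]  # never modify the input

    # Initialise: fill each missing cell with a random observed value of that variable.
    for v in names:
        observed = [row[v] for row in data if row[v] is not None]
        for row, miss in zip(completed, missing[v]):
            if miss:
                row[v] = rng.choice(observed)

    for _ in range(n_iter):
        # Visit each incomplete variable in turn with its own conditional model.
        for target in names:
            others = [v for v in names if v != target]
            # Fit on rows where the target is observed, using the current
            # (possibly imputed) values of the other variables as predictors.
            counts = {}
            for row, miss in zip(completed, missing[target]):
                if not miss:
                    key = tuple(row[o] for o in others)
                    counts.setdefault(key, Counter())[row[target]] += 1
            # Draw category weights once per predictor pattern from a
            # Dirichlet(counts + 1) posterior (normalised gamma draws).
            weights = {}
            for row, miss in zip(completed, missing[target]):
                if miss:
                    key = tuple(row[o] for o in others)
                    if key not in weights:
                        c = counts.get(key, Counter())
                        weights[key] = [rng.gammavariate(c[cat] + 1, 1.0)
                                        for cat in levels[target]]
                    row[target] = rng.choices(levels[target], weights=weights[key])[0]
    return completed


# Hypothetical toy records, not data from the study.
records = [
    {"menarche": 0, "breast": 2},
    {"menarche": 1, "breast": 4},
    {"menarche": None, "breast": 3},
    {"menarche": 1, "breast": None},
    {"menarche": 0, "breast": 1},
]
levels = {"menarche": [0, 1], "breast": [1, 2, 3, 4, 5]}
# m = 5 imputed data sets, each generated from its own random seed.
imputations = [fcs_impute(records, levels, n_iter=10, seed=s) for s in range(5)]
```

Each of the m completed data sets would then be analysed separately and the results pooled; running several independent chains with different seeds is what lets the imputations reflect uncertainty about the missing values.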
Publication Info
- Year: 2007
- Type: article
- Volume: 16
- Issue: 3
- Pages: 219-242
- Citations: 2681
- Access: Closed
Identifiers
- DOI: 10.1177/0962280206074463