Robustness of a multivariate normal approximation for imputation of incomplete binary data

Coen Bernaards; Thomas R. Belin; Joseph L. Schafer

doi:10.1002/sim.2619

Abstract

Multiple imputation has become easier to perform with the advent of several software packages that provide imputations under a multivariate normal model, but imputation of missing binary data remains an important practical problem. Here, we explore three alternative methods for converting a multivariate normal imputed value into a binary imputed value: (1) simple rounding of the imputed value to the nearer of 0 or 1, (2) a Bernoulli draw based on a ‘coin flip’ where an imputed value between 0 and 1 is treated as the probability of drawing a 1, and (3) an adaptive rounding scheme where the cut‐off value for determining whether to round to 0 or 1 is based on a normal approximation to the binomial distribution, making use of the marginal proportions of 0's and 1's on the variable. We perform simulation studies on a data set of 206 802 respondents to the California Healthy Kids Survey, where the fully observed data on 198 262 individuals defines the population, from which we repeatedly draw samples with missing data, impute, calculate statistics and confidence intervals, and compare bias and coverage against the true values. Frequently, we found satisfactory bias and coverage properties, suggesting that approaches such as these that are based on statistical approximations are preferable in applied research to either avoiding settings where missing data occur or relying on complete‐case analyses. Considering both the occurrence and extent of deficits in coverage, we found that adaptive rounding provided the best performance. Copyright © 2006 John Wiley & Sons, Ltd.
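The three conversion rules described in the abstract can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' code: the function names (`simple_round`, `coin_flip`, `adaptive_cutoff`, `adaptive_round`) are hypothetical, sending a value of exactly 0.5 to 1 under simple rounding is an assumed tie rule, and the adaptive cut-off c = ω − Φ⁻¹(ω)·√(ω(1 − ω)) is derived here from the normal approximation the abstract mentions, by choosing c so that a normal variable with mean ω and variance ω(1 − ω) exceeds c with probability ω. The symbol ω (`omega`) is assumed to be the marginal proportion of 1's on the variable; the abstract does not say how it is estimated from incomplete data, so check the paper before relying on this exact form.

```python
import math
import random
from statistics import NormalDist


def simple_round(values):
    """Method (1): round each imputed value to the nearer of 0 or 1.

    A value of exactly 0.5 is sent to 1 here; that tie rule is an assumption.
    """
    return [1 if v >= 0.5 else 0 for v in values]


def coin_flip(values, rng):
    """Method (2): Bernoulli draw with P(1) equal to the imputed value.

    Values below 0 or above 1 are clipped first, so they always become 0 or 1.
    """
    draws = []
    for v in values:
        p = min(max(v, 0.0), 1.0)
        draws.append(1 if rng.random() < p else 0)
    return draws


def adaptive_cutoff(omega):
    """Cut-off c such that a N(omega, omega * (1 - omega)) variable exceeds c
    with probability omega (normal approximation to the binomial).

    Solving P(Z > (c - omega) / sigma) = omega for c gives
    c = omega - Phi^{-1}(omega) * sqrt(omega * (1 - omega)).
    """
    if not 0.0 < omega < 1.0:
        raise ValueError("omega must lie strictly between 0 and 1")
    sigma = math.sqrt(omega * (1.0 - omega))
    return omega - NormalDist().inv_cdf(omega) * sigma


def adaptive_round(values, omega):
    """Method (3): round to 1 when the imputed value exceeds the adaptive cut-off."""
    c = adaptive_cutoff(omega)
    return [1 if v > c else 0 for v in values]


if __name__ == "__main__":
    rng = random.Random(2006)
    imputed = [-0.2, 0.1, 0.45, 0.5, 0.62, 1.3]  # hypothetical imputed values
    print(simple_round(imputed))  # [0, 0, 0, 1, 1, 1]
    print(coin_flip(imputed, rng))  # random; first value always 0, last always 1
    omega = 0.3  # hypothetical marginal proportion of 1's
    print(adaptive_cutoff(omega))  # roughly 0.54
    print(adaptive_round(imputed, omega))  # [0, 0, 0, 0, 1, 1]
```

Two properties follow directly from the formula: the cut-off is exactly 0.5 when ω = 0.5, and c(1 − ω) = 1 − c(ω), so the rule treats 0's and 1's symmetrically. By construction, if the imputed values really were distributed as N(ω, ω(1 − ω)), the share rounded to 1 would match ω, which simple rounding does not guarantee.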

Keywords

Rounding; Imputation (statistics); Missing data; Binary data; Statistics; Multivariate statistics; Multivariate normal distribution; Computer science; Binary number; Binomial distribution; Confidence interval; Mathematics

Related Publications

Multiple imputation of discrete and continuous data by fully conditional specification

Stef van Buuren

The goal of multiple imputation is to provide valid inferences for statistical estimates from incomplete data. To achieve that goal, imputed values should preserve the structure...

2007 Statistical Methods in Medical Research 2681 citations

Multiple Imputation of Missing Values

Patrick Royston

Following the seminal publications of Rubin about thirty years ago, statisticians have become increasingly aware of the inadequacy of “complete-case” analysis of datasets with m...

2004 The Stata Journal Promoting communica... 2310 citations

Imputing missing covariate values for the Cox model

Ian R. White , Patrick Royston

Abstract Multiple imputation is commonly used to impute missing data, and is typically more efficient than complete cases analysis in regression analysis when covariates have mi...

2009 Statistics in Medicine 905 citations

A multivariate technique for multiply imputing missing values using a sequence of regression models

Trivellore E. Raghunathan , James M. Lepkowski , John Van Hoewyk +1 more

This article describes and evaluates a procedure for imputing missing values for a relatively complex data structure when the data are missing at random. The imputations are obt...

2001 Survey methodology 1994 citations

Applied Missing Data Analysis

Craig K. Enders

Part 1. An Introduction to Missing Data. 1.1 Introduction. 1.2 Chapter Overview. 1.3 Missing Data Patterns. 1.4 A Conceptual Overview of Missing Data Theory. 1.5 A More Formal De...

2010 6888 citations

Publication Info

Year: 2006
Type: article
Volume: 26
Issue: 6
Pages: 1368-1382
Citations: 191
Access: Closed

Cite This

APA Style

                            
Coen Bernaards, Thomas R. Belin, Joseph L. Schafer (2006). Robustness of a multivariate normal approximation for imputation of incomplete binary data. Statistics in Medicine, 26(6), 1368-1382. https://doi.org/10.1002/sim.2619
