Abstract
The idea of data augmentation arises naturally in missing value problems, as exemplified by the standard ways of filling in missing cells in balanced two-way tables. Thus data augmentation refers to a scheme of augmenting the observed data so as to make it easier to analyze. This device is used to great advantage by the EM algorithm (Dempster, Laird, and Rubin 1977) in solving maximum likelihood problems. In situations where the likelihood cannot be approximated closely by the normal likelihood, maximum likelihood estimates and the associated standard errors cannot be relied upon to make valid inferential statements. From the Bayesian point of view, one must instead calculate the posterior distribution of the parameters of interest. If data augmentation can be used in the calculation of the maximum likelihood estimate, then in the same cases one ought to be able to use it in the computation of the posterior distribution. It is the purpose of this article to explain how this can be done.

The basic idea is quite simple. The observed data y is augmented by the quantity z, which is referred to as the latent data. It is assumed that if y and z are both known, then the problem is straightforward to analyze; that is, the augmented-data posterior p(θ | y, z) can be calculated. But the posterior density that we want is p(θ | y), which may be difficult to calculate directly. If, however, one can generate multiple values of z from the predictive distribution p(z | y) (i.e., multiple imputations of z), then p(θ | y) can be approximately obtained as the average of p(θ | y, z) over the imputed z's. However, p(z | y) depends, in turn, on p(θ | y): if p(θ | y) were known, it could be used to calculate p(z | y). This mutual dependency between p(θ | y) and p(z | y) leads to an iterative algorithm to calculate p(θ | y). Analytically, this algorithm is essentially the method of successive substitution for solving an operator fixed point equation. We exploit this fact to prove convergence under mild regularity conditions.

Typically, to implement the algorithm, one must be able to sample from two distributions, namely p(θ | y, z) and p(z | θ, y). In many cases it is straightforward to sample from either distribution; in general, though, either sampling can be difficult, just as either the E or the M step can be difficult to implement in the EM algorithm. For p(θ | y, z) arising from parametric submodels of the multinomial, we develop a primitive but generally applicable way to approximately sample θ: first sample from the posterior distribution of the cell probabilities, then project to the parametric surface specified by the submodel, giving more weight to those observations lying closer to the surface. This procedure should cover many of the common models for categorical data.

Several examples are given in this article. First, the algorithm is introduced and motivated in the context of a genetic linkage example. Second, we apply the algorithm to inference about the correlation coefficient of the bivariate normal distribution from incomplete data; the algorithm recovers the bimodal nature of the posterior distribution. Finally, the algorithm is used in the analysis of the traditional latent-class model as applied to data from the General Social Survey.
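As a concrete illustration of the two-step iteration described above, the following is a minimal Python sketch for the genetic linkage example mentioned in the abstract, under the usual four-cell model with probabilities (1/2 + θ/4, (1 − θ)/4, (1 − θ)/4, θ/4) and counts (125, 18, 20, 34), assuming a flat prior on θ. The latent data z is the unobserved split of the first cell; the binomial imputation step and beta posterior step below are the standard conditionals for this model, not code taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Genetic linkage counts: the four cells have probabilities
# (1/2 + t/4, (1 - t)/4, (1 - t)/4, t/4) for t = theta in (0, 1).
y1, y2, y3, y4 = 125, 18, 20, 34

m = 1000                     # pool size (number of imputations per pass)
theta = rng.uniform(size=m)  # initial pool of theta draws (flat prior)

for _ in range(50):  # successive substitution until the pool stabilizes
    # Imputation step: split the first cell into its two latent parts;
    # the part attributable to theta/4 is Binomial(y1, theta / (theta + 2)).
    z = rng.binomial(y1, theta / (theta + 2.0))
    # Posterior step: under a flat prior, the augmented-data posterior is
    # p(theta | y, z) = Beta(z + y4 + 1, y2 + y3 + 1); refresh the pool.
    theta = rng.beta(z + y4 + 1.0, y2 + y3 + 1.0)

print("posterior mean ~", theta.mean(), "sd ~", theta.std())
```

Each pass imputes latent counts from p(z | θ, y) using the current pool of θ draws and then refreshes the pool from p(θ | y, z); the density p(θ | y) is then approximated by averaging the resulting Beta densities over the imputed z's, as the abstract describes.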
Related Publications
Approximate Bayesian Inference in Conditionally Independent Hierarchical Models (Parametric Empirical Bayes Models)
Abstract We consider two-stage models of the kind used in parametric empirical Bayes (PEB) methodology, calling them conditionally independent hierarchical models. We suppose th...
Inference and missing data
When making sampling distribution inferences about the parameter of the data, θ, it is appropriate to ignore the process that causes missing data if the missing data are 'missin...
Estimation of Finite Mixture Distributions Through Bayesian Sampling
SUMMARY A formal Bayesian analysis of a mixture model usually leads to intractable calculations, since the posterior distribution takes into account all the partitions of the sa...
Applied Missing Data Analysis
Part 1. An Introduction to Missing Data. 1.1 Introduction. 1.2 Chapter Overview. 1.3 Missing Data Patterns. 1.4 A Conceptual Overview of Missing Data Theory. 1.5 A More Formal De...
Estimation and Hypothesis Testing in Finite Mixture Models
SUMMARY Finite mixture models are a useful class of models for application to data. When sample sizes are not large and the number of underlying densities is in question, likeli...
Publication Info
- Year: 1987
- Type: article
- Volume: 82
- Issue: 398
- Pages: 528-540
- Citations: 3732
- Access: Closed
Identifiers
- DOI: 10.1080/01621459.1987.10478458