Abstract
This article is concerned with the selection of subsets of predictor variables in a linear regression model for the prediction of a dependent variable. It is based on a Bayesian approach, intended to be as objective as possible. A probability distribution is first assigned to the dependent variable through the specification of a family of prior distributions for the unknown parameters in the regression model. The method is not fully Bayesian, however, because the ultimate choice of prior distribution from this family is affected by the data. It is assumed that the predictors represent distinct observables; the corresponding regression coefficients are assigned independent prior distributions. For each regression coefficient subject to deletion from the model, the prior distribution is a mixture of a point mass at 0 and a diffuse uniform distribution elsewhere, that is, a "spike and slab" distribution. The random error component is assigned a normal distribution with mean 0 and standard deviation σ, where ln(σ) has a locally uniform noninformative prior distribution. The appropriate posterior probabilities are derived for each submodel. If the regression coefficients have identical priors, the posterior distribution depends only on the data and the parameter γ, which is the height of the spike divided by the height of the slab for the common prior distribution. This parameter is not assigned a probability distribution; instead, it is considered a parameter that indexes the members of a class of Bayesian methods. Graphical methods are proposed as informal guides for choosing γ, assessing the complexity of the response function and the strength of the individual predictor variables, and assessing the degree of uncertainty about the best submodel. The following plots against γ are suggested: (a) posterior probability that a particular regression coefficient is 0; (b) posterior expected number of terms in the model; (c) posterior entropy of the submodel distribution; (d) posterior predictive error; and (e) posterior probability of goodness of fit. Plots (d) and (e) are proposed as aids for choosing γ. The predictive error is determined using a Bayesian cross-validation approach that generates a predictive density for each observation, given all of the data except that observation, that is, a "leave one out" approach. The goodness-of-fit measure is the sum of the posterior probabilities of all submodels that pass a standard F test for goodness of fit relative to the full model, at a specified level of significance. The dependence of the results on the scaling of the variables is discussed, and some ways to choose the scaling constants are suggested. Examples based on a large data set arising from an energy-conservation study are given to demonstrate the application of the methods.
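The sketch below is a minimal illustration of the spike-and-slab mechanism the abstract describes: each submodel's prior probability picks up a factor of γ for every coefficient set to 0, posterior submodel probabilities combine that prior with a marginal likelihood, and sweeping γ traces out the quantities in plots (a) and (b). The paper's diffuse uniform slab with a log-uniform prior on σ does not yield this simple closed form, so for tractability the sketch substitutes a Zellner g-prior slab (an assumption, not the authors' derivation); the function names, the value of g, and the simulated data are all hypothetical.

```python
# Minimal sketch of spike-and-slab submodel posteriors, assuming a
# Zellner g-prior slab as a tractable stand-in for the paper's uniform slab.
import itertools
import numpy as np
from scipy.special import logsumexp

def log_marginal(y, X, subset, g=50.0):
    """Log marginal likelihood of the submodel using predictors in `subset`
    (g-prior form, up to a constant shared by all submodels)."""
    n = len(y)
    yc = y - y.mean()                      # intercept handled implicitly
    k = len(subset)
    if k == 0:
        r2 = 0.0
    else:
        Xs = X[:, subset] - X[:, subset].mean(axis=0)
        beta, *_ = np.linalg.lstsq(Xs, yc, rcond=None)
        r2 = 1.0 - np.sum((yc - Xs @ beta) ** 2) / np.sum(yc ** 2)
    return 0.5 * (n - 1 - k) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1 - r2))

def submodel_posteriors(y, X, gamma):
    """Posterior probability of every submodel; the prior contributes a
    factor gamma for each coefficient deleted (the spike/slab ratio)."""
    q = X.shape[1]
    subsets = [s for r in range(q + 1) for s in itertools.combinations(range(q), r)]
    logp = np.array([log_marginal(y, X, s) + (q - len(s)) * np.log(gamma)
                     for s in subsets])
    return subsets, np.exp(logp - logsumexp(logp))

# Illustrative data: 3 real predictors among 6 candidates.
rng = np.random.default_rng(0)
n, q = 100, 6
X = rng.normal(size=(n, q))
y = X[:, 0] + 0.8 * X[:, 1] - 0.5 * X[:, 2] + rng.normal(size=n)

for gamma in (0.5, 2.0, 10.0):          # larger gamma favors deletion
    subsets, post = submodel_posteriors(y, X, gamma)
    p_zero = [sum(p for s, p in zip(subsets, post) if j not in s) for j in range(q)]
    e_terms = sum(len(s) * p for s, p in zip(subsets, post))
    print(f"gamma={gamma:5.1f}  P(beta_j=0)={np.round(p_zero, 2)}  E[#terms]={e_terms:.2f}")
```

Consistent with the abstract, γ is not given a prior here; it indexes a class of procedures, and evaluating the loop over a grid of γ values produces the curves for plots (a), the per-coefficient posterior probability of being 0, and (b), the posterior expected number of terms.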
Publication Info
- Year
- 1988
- Type
- article
- Volume
- 83
- Issue
- 404
- Pages
- 1023-1032
- Citations
- 1367
- Access
- Closed
Identifiers
- DOI: 10.1080/01621459.1988.10478694