Abstract

This article is concerned with the selection of subsets of predictor variables in a linear regression model for the prediction of a dependent variable. It is based on a Bayesian approach, intended to be as objective as possible. A probability distribution is first assigned to the dependent variable through the specification of a family of prior distributions for the unknown parameters in the regression model. The method is not fully Bayesian, however, because the ultimate choice of prior distribution from this family is affected by the data. It is assumed that the predictors represent distinct observables; the corresponding regression coefficients are assigned independent prior distributions. For each regression coefficient subject to deletion from the model, the prior distribution is a mixture of a point mass at 0 and a diffuse uniform distribution elsewhere, that is, a "spike and slab" distribution. The random error component is assigned a normal distribution with mean 0 and standard deviation σ, where ln(σ) has a locally uniform noninformative prior distribution. The appropriate posterior probabilities are derived for each submodel. If the regression coefficients have identical priors, the posterior distribution depends only on the data and the parameter γ, which is the height of the spike divided by the height of the slab for the common prior distribution. This parameter is not assigned a probability distribution; instead, it is considered a parameter that indexes the members of a class of Bayesian methods. Graphical methods are proposed as informal guides for choosing γ, assessing the complexity of the response function and the strength of the individual predictor variables, and assessing the degree of uncertainty about the best submodel. The following plots against γ are suggested: (a) posterior probability that a particular regression coefficient is 0; (b) posterior expected number of terms in the model; (c) posterior entropy of the submodel distribution; (d) posterior predictive error; and (e) posterior probability of goodness of fit. Plots (d) and (e) are suggested as ways to choose γ. The predictive error is determined using a Bayesian cross-validation approach that generates a predictive density for each observation, given all of the data except that observation, that is, a type of "leave one out" approach. The goodness-of-fit measure is the sum of the posterior probabilities of all submodels that pass a standard F test for goodness of fit relative to the full model, at a specified level of significance. The dependence of the results on the scaling of the variables is discussed, and some ways to choose the scaling constants are suggested. Examples based on a large data set arising from an energy-conservation study are given to demonstrate the application of the methods.
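The submodel-posterior machinery described in the abstract can be illustrated with a small sketch. The Python code below is not the authors' derivation: for tractability it swaps the paper's uniform slab for a conjugate Zellner g-prior slab, so each submodel has a closed-form marginal likelihood, and it maps the spike/slab ratio γ to a prior deletion probability γ/(1 + γ) purely for illustration. The function submodel_posteriors, the g-prior default g = n, and the simulated data are all hypothetical choices made for this example.

```python
# Illustrative sketch (not the paper's exact formulas): spike-and-slab style
# submodel posteriors for a small linear regression, using a Zellner g-prior
# normal slab in place of the uniform slab so marginal likelihoods are closed form.
import itertools
import numpy as np

def submodel_posteriors(X, y, gamma, g=None):
    """Posterior probability of every subset of the columns of X.

    X     : (n, p) matrix of predictors (intercept handled via centering)
    y     : (n,) response
    gamma : spike/slab ratio; larger gamma favors deleting coefficients
    g     : g-prior scale (defaults to n, a unit-information choice)
    """
    n, p = X.shape
    if g is None:
        g = n
    yc = y - y.mean()
    tss = yc @ yc                        # total sum of squares about the mean
    pi0 = gamma / (1.0 + gamma)          # assumed prior P(coefficient = 0)
    log_weights, subsets = [], []
    for subset in itertools.product([0, 1], repeat=p):
        idx = [j for j, s in enumerate(subset) if s]
        k = len(idx)
        if k == 0:
            r2 = 0.0
        else:
            Xs = X[:, idx] - X[:, idx].mean(axis=0)
            beta, *_ = np.linalg.lstsq(Xs, yc, rcond=None)
            resid = yc - Xs @ beta
            r2 = 1.0 - (resid @ resid) / tss
        # log marginal likelihood under the g-prior, up to a common constant
        log_ml = 0.5 * (n - k - 1) * np.log1p(g) \
                 - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))
        # independent spike-and-slab prior over which coefficients are deleted
        log_prior = (p - k) * np.log(pi0) + k * np.log(1.0 - pi0)
        log_weights.append(log_ml + log_prior)
        subsets.append(subset)
    w = np.exp(np.array(log_weights) - max(log_weights))
    return np.array(subsets), w / w.sum()

# Usage: sweep gamma and trace analogues of plot (a), P(beta_j = 0 | data),
# and plot (b), the posterior expected number of terms in the model.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=1.0, size=50)
for gamma in [0.5, 2.0, 10.0, 50.0]:
    subsets, probs = submodel_posteriors(X, y, gamma)
    p_zero = 1.0 - (probs[:, None] * subsets).sum(axis=0)
    exp_terms = (probs * subsets.sum(axis=1)).sum()
    print(f"gamma={gamma:5.1f}  P(beta_j=0)={np.round(p_zero, 2)}  "
          f"E[#terms]={exp_terms:.2f}")
```

Sweeping γ over a grid and plotting these quantities is the loose analogue of the paper's plots (a) and (b); the paper's plots (d) and (e) would additionally require the leave-one-out predictive densities and the F-test-based goodness-of-fit probability described above.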

Keywords

Bayesian linear regression, Linear regression, Bayesian multivariate linear regression, Bayesian probability, Statistics, Feature selection, Proper linear model, Linear model, Mathematics, Econometrics, Computer science, Bayesian inference, Artificial intelligence

Publication Info

Year: 1988
Type: article
Volume: 83
Issue: 404
Pages: 1023-1032
Citations: 1367 (OpenAlex)
Access: Closed

Cite This

Toby J. Mitchell, John J. Beauchamp (1988). Bayesian Variable Selection in Linear Regression. Journal of the American Statistical Association, 83(404), 1023-1032. https://doi.org/10.1080/01621459.1988.10478694

Identifiers

DOI
10.1080/01621459.1988.10478694