Abstract

This article is concerned with the selection of subsets of predictor variables in a linear regression model for the prediction of a dependent variable. It is based on a Bayesian approach, intended to be as objective as possible. A probability distribution is first assigned to the dependent variable through the specification of a family of prior distributions for the unknown parameters in the regression model. The method is not fully Bayesian, however, because the ultimate choice of prior distribution from this family is affected by the data. It is assumed that the predictors represent distinct observables; the corresponding regression coefficients are assigned independent prior distributions. For each regression coefficient subject to deletion from the model, the prior distribution is a mixture of a point mass at 0 and a diffuse uniform distribution elsewhere, that is, a "spike and slab" distribution. The random error component is assigned a normal distribution with mean 0 and standard deviation σ, where ln(σ) has a locally uniform noninformative prior distribution. The appropriate posterior probabilities are derived for each submodel. If the regression coefficients have identical priors, the posterior distribution depends only on the data and the parameter γ, which is the height of the spike divided by the height of the slab for the common prior distribution. This parameter is not assigned a probability distribution; instead, it is considered a parameter that indexes the members of a class of Bayesian methods. Graphical methods are proposed as informal guides for choosing γ, assessing the complexity of the response function and the strength of the individual predictor variables, and assessing the degree of uncertainty about the best submodel. The following plots against γ are suggested: (a) posterior probability that a particular regression coefficient is 0; (b) posterior expected number of terms in the model; (c) posterior entropy of the submodel distribution; (d) posterior predictive error; and (e) posterior probability of goodness of fit. Plots (d) and (e) are suggested as ways to choose γ. The predictive error is determined using a Bayesian cross-validation approach that generates a predictive density for each observation, given all of the data except that observation, that is, a type of "leave one out" approach. The goodness-of-fit measure is the sum of the posterior probabilities of all submodels that pass a standard F test for goodness of fit relative to the full model, at a specified level of significance. The dependence of the results on the scaling of the variables is discussed, and some ways to choose the scaling constants are suggested. Examples based on a large data set arising from an energy-conservation study are given to demonstrate the application of the methods.
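The submodel-posterior machinery described in the abstract can be illustrated with a small sketch. The Python code below is not the authors' derivation: for tractability it swaps the paper's uniform slab for a conjugate Zellner g-prior slab, so each submodel has a closed-form marginal likelihood, and it maps the spike/slab ratio γ to a prior deletion probability γ/(1 + γ) purely for illustration. The function submodel_posteriors, the g-prior default g = n, and the simulated data are all hypothetical choices made for this example.

```python
# Illustrative sketch (not the paper's exact formulas): spike-and-slab style
# submodel posteriors for a small linear regression, using a Zellner g-prior
# normal slab in place of the uniform slab so marginal likelihoods are closed form.
import itertools
import numpy as np

def submodel_posteriors(X, y, gamma, g=None):
    """Posterior probability of every subset of the columns of X.

    X     : (n, p) matrix of predictors (intercept handled via centering)
    y     : (n,) response
    gamma : spike/slab ratio; larger gamma favors deleting coefficients
    g     : g-prior scale (defaults to n, a unit-information choice)
    """
    n, p = X.shape
    if g is None:
        g = n
    yc = y - y.mean()
    tss = yc @ yc                        # total sum of squares about the mean
    pi0 = gamma / (1.0 + gamma)          # assumed prior P(coefficient = 0)
    log_weights, subsets = [], []
    for subset in itertools.product([0, 1], repeat=p):
        idx = [j for j, s in enumerate(subset) if s]
        k = len(idx)
        if k == 0:
            r2 = 0.0
        else:
            Xs = X[:, idx] - X[:, idx].mean(axis=0)
            beta, *_ = np.linalg.lstsq(Xs, yc, rcond=None)
            resid = yc - Xs @ beta
            r2 = 1.0 - (resid @ resid) / tss
        # log marginal likelihood under the g-prior, up to a common constant
        log_ml = 0.5 * (n - k - 1) * np.log1p(g) \
                 - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))
        # independent spike-and-slab prior over which coefficients are deleted
        log_prior = (p - k) * np.log(pi0) + k * np.log(1.0 - pi0)
        log_weights.append(log_ml + log_prior)
        subsets.append(subset)
    w = np.exp(np.array(log_weights) - max(log_weights))
    return np.array(subsets), w / w.sum()

# Usage: sweep gamma and trace analogues of plot (a), P(beta_j = 0 | data),
# and plot (b), the posterior expected number of terms in the model.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=1.0, size=50)
for gamma in [0.5, 2.0, 10.0, 50.0]:
    subsets, probs = submodel_posteriors(X, y, gamma)
    p_zero = 1.0 - (probs[:, None] * subsets).sum(axis=0)
    exp_terms = (probs * subsets.sum(axis=1)).sum()
    print(f"gamma={gamma:5.1f}  P(beta_j=0)={np.round(p_zero, 2)}  "
          f"E[#terms]={exp_terms:.2f}")
```

Sweeping γ over a grid and plotting these quantities is the loose analogue of the paper's plots (a) and (b); the paper's plots (d) and (e) would additionally require the leave-one-out predictive densities and the F-test-based goodness-of-fit probability described above.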

Keywords

Bayesian linear regression, Linear regression, Bayesian multivariate linear regression, Bayesian probability, Statistics, Feature selection, Proper linear model, Linear model, Mathematics, Econometrics, Computer science, Bayesian inference, Artificial intelligence

Publication Info

Year: 1988
Type: article
Volume: 83
Issue: 404
Pages: 1023-1032
Citations: 1367 (OpenAlex)
Access: Closed

Cite This

Toby J. Mitchell, John J. Beauchamp (1988). Bayesian Variable Selection in Linear Regression. Journal of the American Statistical Association, 83(404), 1023-1032. https://doi.org/10.1080/01621459.1988.10478694

Identifiers

DOI
10.1080/01621459.1988.10478694