Abstract
The use of automated subset search algorithms is reviewed and issues concerning model selection and selection criteria are discussed. In addition, a Monte Carlo study is reported which presents data regarding the frequency with which authentic and noise variables are selected by automated subset algorithms. In particular, the effects of the correlation between predictor variables, the number of candidate predictor variables, the size of the sample, and the level of significance for entry and deletion of variables were studied for three automated subset algorithms: BACKWARD ELIMINATION, FORWARD SELECTION, and STEPWISE. Results indicated that: (1) the degree of correlation between the predictor variables affected the frequency with which authentic predictor variables found their way into the final model; (2) the number of candidate predictor variables affected the number of noise variables that gained entry to the model; (3) the size of the sample was of little practical importance in determining the number of authentic variables contained in the final model; and (4) the population multiple coefficient of determination could be faithfully estimated by adopting a statistic that is adjusted by the total number of candidate predictor variables rather than the number of variables in the final model.
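Result (4) can be illustrated with a small sketch. The abstract does not give the exact statistic, so the code below assumes the familiar adjusted R-squared form, 1 - (1 - R^2)(n - 1)/(n - k - 1), and simply changes what k counts. The function name `adjusted_r2` and all numbers are hypothetical and chosen for illustration; none are taken from the study.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a sample of n cases, charging k predictors.

    Assumes the standard adjustment formula; the abstract does not
    state which formula the authors used.
    """
    if n - k - 1 <= 0:
        raise ValueError("n must exceed k + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical values for illustration only (not from the paper).
n = 100            # sample size
candidates = 24    # all predictors offered to the subset algorithm
selected = 4       # predictors retained in the final model
sample_r2 = 0.40   # R^2 of the final model in the sample

# Conventional adjustment: charge only the variables that were retained.
by_selected = adjusted_r2(sample_r2, n, selected)

# Adjustment described in result (4): charge every candidate that was searched.
by_candidates = adjusted_r2(sample_r2, n, candidates)

print(f"adjusted by selected variables:  {by_selected:.3f}")
print(f"adjusted by candidate variables: {by_candidates:.3f}")
```

Charging every candidate gives the smaller estimate, which fits the abstract's point that the search over all candidates, and not just the final model size, should be reflected in the adjustment.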
Related Publications
Latent Class Model Diagnosis
Summary. In many areas of medical research, such as psychiatry and gerontology, latent class variables are used to classify individuals into disease categories, often with the i...
Interpreting the Likelihood Ratio Statistic in Factor Models When Sample Size is Small
The use of the likelihood ratio statistic in testing the goodness of fit of the exploratory factor model has no formal justification when, as is often the case in pract...
Subset Selection in Regression
OBJECTIVES Prediction, Explanation, Elimination or What? How Many Variables in the Prediction Formula? Alternatives to Using Subsets 'Black Box' Use of Best-Subsets Techniques L...
A simulation study of the number of events per variable in logistic regression analysis
We performed a Monte Carlo study to evaluate the effect of the number of events per variable (EPV) analyzed in logistic regression analysis. The simulations were based on data f...
How to Use a Monte Carlo Study to Decide on Sample Size and Determine Power
A common question asked by researchers is, "What sample size do I need for my study?" Over the years, several rules of thumb have been proposed. In reality there is no ...
Publication Info
- Year: 1992
- Type: article
- Volume: 45
- Issue: 2
- Pages: 265-282
- Citations: 760
- Access: Closed
Identifiers
- DOI: 10.1111/j.2044-8317.1992.tb00992.x