Abstract
This paper takes a broad, pragmatic view of statistical inference to include all aspects of model formulation. The estimation of model parameters traditionally assumes that a model has a prespecified known form and takes no account of possible uncertainty regarding the model structure. This implicitly assumes the existence of a 'true' model, which many would regard as a fiction. In practice model uncertainty is a fact of life and likely to be more serious than other sources of uncertainty which have received far more attention from statisticians. This is true whether the model is specified on subject-matter grounds or, as is increasingly the case, when a model is formulated, fitted and checked on the same data set in an iterative, interactive way. Modern computing power allows a large number of models to be considered and data-dependent specification searches have become the norm in many areas of statistics. The term data mining may be used in this context when the analyst goes to great lengths to obtain a good fit. This paper reviews the effects of model uncertainty, such as too narrow prediction intervals, and the non-trivial biases in parameter estimates which can follow data-based modelling. Ways of assessing and overcoming the effects of model uncertainty are discussed, including the use of simulation and resampling methods, a Bayesian model averaging approach and collecting additional data wherever possible. Perhaps the main aim of the paper is to ensure that statisticians are aware of the problems and start addressing the issues even if there is no simple, general theoretical fix.
Keywords
Affiliated Institutions
Related Publications
Goodness-of-fit measures of <i>R</i> <sup>2</sup> for repeated measures mixed effect models
Linear mixed effects model (LMEM) is efficient in modeling repeated measures longitudinal data. However, little research has been done in developing goodness-of-fit measures tha...
Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses
In this paper, we develop a classical approach to model selection. Using the Kullback-Leibler Information Criterion to measure the closeness of a model to the truth, we propose ...
Latent Class Model Diagnosis
Summary. In many areas of medical research, such as psychiatry and gerontology, latent class variables are used to classify individuals into disease categories, often with the i...
CODA: convergence diagnosis and output analysis for MCMC
[1st paragraph] At first sight, Bayesian inference with Markov Chain Monte Carlo (MCMC) appears to be straightforward. The user defines a full probability model, perhaps using o...
On the Adaptive Control of the False Discovery Rate in Multiple Testing With Independent Statistics
A new approach to problems of multiple significance testing was presented in Benjamini and Hochberg (1995), which calls for controlling the expected ratio of the number of erron...
Publication Info
- Year
- 1995
- Type
- article
- Volume
- 158
- Issue
- 3
- Pages
- 419-419
- Citations
- 1096
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.2307/2983440