MULTIVARIABLE PROGNOSTIC MODELS: ISSUES IN DEVELOPING MODELS, EVALUATING ASSUMPTIONS AND ADEQUACY, AND MEASURING AND REDUCING ERRORS

Abstract

Multivariable regression models are powerful tools that are used frequently in studies of clinical outcomes. These models can use a mixture of categorical and continuous variables and can handle partially observed (censored) responses. However, uncritical application of modelling techniques can result in models that poorly fit the dataset at hand, or, even more likely, inaccurately predict outcomes on new subjects. One must know how to measure qualities of a model's fit in order to avoid poorly fitted or overfitted models. Measurement of predictive accuracy can be difficult for survival time data in the presence of censoring. We discuss an easily interpretable index of predictive discrimination as well as methods for assessing calibration of predicted survival probabilities. Both types of predictive accuracy should be unbiasedly validated using bootstrapping or cross-validation, before using predictions in a new data series. We discuss some of the hazards of poorly fitted and overfitted regression models and present one modelling strategy that avoids many of the problems discussed. The methods described are applicable to all regression models, but are particularly needed for binary, ordinal, and time-to-event outcomes. Methods are illustrated with a survival analysis in prostate cancer using Cox regression.
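The abstract does not name the index of predictive discrimination. As a minimal sketch, assuming it is a concordance-type index (often called the c index) for right-censored data, the idea can be illustrated in plain Python. The function name `concordance_index` and its arguments are hypothetical and are not taken from the paper. Skipping pairs with tied follow-up times is a simplification of this sketch.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Illustrative concordance index for right-censored survival data.

    times       -- observed follow-up time for each subject
    events      -- 1 if the event was observed, 0 if the subject was censored
    risk_scores -- model output, where a higher score means a worse prognosis

    Returns the fraction of usable pairs in which the subject with the
    shorter observed survival time also has the higher predicted risk.
    Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject a has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        if times[a] == times[b]:
            continue  # simplification: pairs with tied times are skipped
        if not events[a]:
            continue  # shorter time is censored, so the ordering is unknown
        usable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs: every comparison is censored or tied")
    return concordant / usable
```

Under these assumptions, a value of 0.5 corresponds to predictions no better than chance and 1.0 to perfect ranking of subjects. A value computed on the same data used to fit the model tends to be optimistic, which is why the abstract calls for validation by bootstrapping or cross-validation.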

Keywords

Computer science, Censoring (clinical trials), Categorical variable, Predictive modelling, Bootstrapping (finance), Regression, Proportional hazards model, Regression analysis, Multivariable calculus, Statistics, Data mining, Machine learning, Econometrics, Artificial intelligence, Mathematics

Publication Info

Year: 1996
Type: review
Volume: 15
Issue: 4
Pages: 361-387
Citations: 9497
Access: Closed

Cite This

APA Style

Frank E. Harrell, Kerry L. Lee, Daniel B. Mark (1996). MULTIVARIABLE PROGNOSTIC MODELS: ISSUES IN DEVELOPING MODELS, EVALUATING ASSUMPTIONS AND ADEQUACY, AND MEASURING AND REDUCING ERRORS. Statistics in Medicine, 15(4), 361-387. https://doi.org/10.1002/(sici)1097-0258(19960229)15:4<361::aid-sim168>3.0.co;2-4

Identifiers

DOI: 10.1002/(sici)1097-0258(19960229)15:4<361::aid-sim168>3.0.co;2-4