A critical appraisal of propensity‐score matching in the medical literature between 1996 and 2003

Peter C. Austin

doi:10.1002/sim.3150

Abstract

Propensity‐score methods are increasingly being used to reduce the impact of treatment‐selection bias in the estimation of treatment effects using observational data. Commonly used propensity‐score methods include covariate adjustment using the propensity score, stratification on the propensity score, and propensity‐score matching. Empirical and theoretical research has demonstrated that matching on the propensity score eliminates a greater proportion of baseline differences between treated and untreated subjects than does stratification on the propensity score. However, the analysis of propensity‐score‐matched samples requires statistical methods appropriate for matched‐pairs data. We critically evaluated 47 articles that were published between 1996 and 2003 in the medical literature and that employed propensity‐score matching. We found that only two of the articles reported the balance of baseline characteristics between treated and untreated subjects in the matched sample and used correct statistical methods to assess the degree of imbalance. Thirteen (28 per cent) of the articles explicitly used statistical methods appropriate for the analysis of matched data when estimating the treatment effect and its statistical significance. Common errors included using the log‐rank test to compare Kaplan–Meier survival curves in the matched sample and using Cox regression, logistic regression, chi‐squared tests, t‐tests, and Wilcoxon rank sum tests in the matched sample, all of which fail to account for the matched nature of the data. We provide guidelines for the analysis and reporting of studies that employ propensity‐score matching. Copyright © 2007 John Wiley & Sons, Ltd.
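The two tasks the abstract highlights, checking baseline balance in the matched sample and testing the treatment effect with a method that respects the pairing, can be illustrated with a minimal Python sketch. The abstract does not name specific procedures at this point, so the choices here are assumptions: the standardized difference of means is used as one common balance measure, and an exact McNemar test is used as one common matched-pairs test for a binary outcome. The function names and all data values are hypothetical and serve only as illustration.

```python
import math
from statistics import mean, variance


def standardized_difference(treated, untreated):
    """Absolute difference in means divided by the pooled standard deviation.

    Assumed here as a balance measure for one continuous baseline covariate.
    """
    pooled_sd = math.sqrt((variance(treated) + variance(untreated)) / 2)
    return abs(mean(treated) - mean(untreated)) / pooled_sd


def mcnemar_exact(pairs):
    """Two-sided exact McNemar test for a binary outcome in matched pairs.

    pairs: list of (treated_outcome, untreated_outcome), each 0 or 1.
    Only discordant pairs carry information; under the null hypothesis
    each discordant pair is equally likely to favour either subject.
    Returns (b, c, p_value) where b and c are the discordant counts.
    """
    b = sum(1 for t, u in pairs if t == 1 and u == 0)
    c = sum(1 for t, u in pairs if t == 0 and u == 1)
    n = b + c
    if n == 0:
        return b, c, 1.0
    k = min(b, c)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical matched sample: ages of treated and untreated subjects.
age_treated = [60, 62, 64, 66, 68]
age_untreated = [61, 63, 65, 67, 69]
print("standardized difference:", standardized_difference(age_treated, age_untreated))

# Hypothetical matched pairs: (event in treated, event in untreated).
matched_pairs = [(1, 0)] * 2 + [(0, 1)] * 8 + [(1, 1)] * 5 + [(0, 0)] * 5
print("discordant counts and p-value:", mcnemar_exact(matched_pairs))
```

An unpaired chi‐squared test on the same data would treat the 40 subjects as independent, which is the kind of error the abstract describes. The paired test instead conditions on the pairing and uses only the discordant pairs.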

Keywords

Propensity score matching; Covariate; Wilcoxon signed-rank test; Statistics; Observational study; Matching (statistics); Selection bias; Logistic regression; Sample size determination; Statistical significance; Medicine; Mathematics; Mann–Whitney U test

Publication Info

Year: 2007
Type: review
Volume: 27
Issue: 12
Pages: 2037-2049
Citations: 1229
Access: Closed

Cite This

APA Style

Peter C. Austin (2007). A critical appraisal of propensity‐score matching in the medical literature between 1996 and 2003. Statistics in Medicine, 27(12), 2037-2049. https://doi.org/10.1002/sim.3150
