A comparison of regression trees, logistic regression, generalized additive models, and multivariate adaptive regression splines for predicting AMI mortality

Peter C. Austin

doi:10.1002/sim.2770

Abstract

Clinicians and health service researchers are frequently interested in predicting patient‐specific probabilities of adverse events (e.g. death, disease recurrence, post‐operative complications, hospital readmission). There is an increasing interest in the use of classification and regression trees (CART) for predicting outcomes in clinical studies. We compared the predictive accuracy of logistic regression with that of regression trees for predicting mortality after hospitalization with an acute myocardial infarction (AMI). We also examined the predictive ability of two other types of data‐driven models: generalized additive models (GAMs) and multivariate adaptive regression splines (MARS). We used data on 9484 patients admitted to hospital with an AMI in Ontario. We used repeated split‐sample validation: the data were randomly divided into derivation and validation samples. Predictive models were estimated using the derivation sample, and the predictive accuracy of the resultant model was assessed using the area under the receiver operating characteristic (ROC) curve in the validation sample. This process was repeated 1000 times: each time, the initial data set was randomly divided into derivation and validation samples, and the predictive accuracy of each method was assessed. The mean ROC curve area for the regression tree models in the 1000 derivation samples was 0.762, while the mean ROC curve area of a simple logistic regression model was 0.845. The mean ROC curve areas for the other methods ranged from a low of 0.831 to a high of 0.851. Our study shows that regression trees do not perform as well as logistic regression for predicting mortality following AMI. However, the logistic regression model had performance comparable to that of more flexible, data‐driven models such as GAMs and MARS. Copyright © 2006 John Wiley & Sons, Ltd.
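The repeated split-sample design described in the abstract can be illustrated with a short sketch. This is a minimal Python example, not the author's code, and it rests on several assumptions: a two-thirds/one-third derivation/validation split (the abstract does not state the proportions), a logistic regression fitted by plain batch gradient descent as a stand-in for standard statistical software, and simulated data rather than the Ontario AMI cohort. The function names (`roc_auc`, `fit_logistic`, `repeated_split_sample_auc`, `simulate`) are hypothetical. The ROC curve area is computed with the rank-sum (Mann-Whitney) identity: it equals the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# (covariate vector, binary outcome: 1 = died, 0 = survived)
Row = Tuple[List[float], int]
Model = Callable[[List[float]], float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC curve area via the Mann-Whitney rank-sum identity (ties get average ranks)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one event and one non-event")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def fit_logistic(rows: List[Row], lr: float = 0.5, epochs: int = 100) -> Model:
    """Hypothetical stand-in: logistic regression by batch gradient descent.

    Returns the linear predictor; AUC depends only on the ranking of scores.
    """
    p = len(rows[0][0])
    w = [0.0] * (p + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for x, y in rows:
            err = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], x))) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(rows) for wj, g in zip(w, grad)]
    return lambda x: w[0] + sum(a * b for a, b in zip(w[1:], x))


def repeated_split_sample_auc(rows: List[Row], fit: Callable[[List[Row]], Model],
                              n_reps: int = 1000, derivation_frac: float = 2 / 3,
                              seed: int = 0) -> float:
    """Mean validation-sample AUC over n_reps random derivation/validation splits.

    derivation_frac = 2/3 is an assumption; the abstract does not state it.
    """
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_reps):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * derivation_frac)
        derivation, validation = shuffled[:cut], shuffled[cut:]
        model = fit(derivation)  # estimate on the derivation sample only
        scores = [model(x) for x, _ in validation]
        labels = [y for _, y in validation]
        aucs.append(roc_auc(scores, labels))  # assess on the validation sample
    return sum(aucs) / len(aucs)


def simulate(n: int, seed: int) -> List[Row]:
    """Simulated illustrative data (NOT the Ontario AMI cohort)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        y = 1 if rng.random() < sigmoid(-1.0 + 1.5 * x[0] - 0.5 * x[1]) else 0
        rows.append((x, y))
    return rows


if __name__ == "__main__":
    data = simulate(600, seed=42)
    mean_auc = repeated_split_sample_auc(data, fit_logistic, n_reps=10)
    print(f"mean validation AUC over 10 splits: {mean_auc:.3f}")
```

In this sketch the other methods compared in the study (regression trees, GAMs, MARS) would be plugged in by passing a different `fit` function that returns a scoring function. The resampling loop and the validation-sample assessment stay unchanged, which is what keeps the comparison between methods like-for-like.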

Keywords

Logistic regression, Multivariate adaptive regression splines, Statistics, Receiver operating characteristic, Multivariate statistics, Regression, Regression analysis, Logistic model tree, Predictive modelling, Mathematics, Bayesian multivariate linear regression, Medicine

Publication Info

Year: 2006
Type: article
Volume: 26
Issue: 15
Pages: 2937-2957
Citations: 155
Access: Closed

Cite This

APA Style

Austin, P. C. (2006). A comparison of regression trees, logistic regression, generalized additive models, and multivariate adaptive regression splines for predicting AMI mortality. Statistics in Medicine, 26(15), 2937-2957. https://doi.org/10.1002/sim.2770