Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models

Daniel W. Apley; Jingyu Zhu

doi:10.1111/rssb.12377

Abstract

In many supervised learning applications, understanding and visualizing the effects of the predictor variables on the predicted response is of paramount importance. A shortcoming of black box supervised learning models (e.g. complex trees, neural networks, boosted trees, random forests, nearest neighbours, local kernel-weighted methods and support vector regression) in this regard is their lack of interpretability or transparency. Partial dependence plots, which are the most popular approach for visualizing the effects of the predictors with black box supervised learning models, can produce erroneous results if the predictors are strongly correlated, because they require extrapolation of the response at predictor values that are far outside the multivariate envelope of the training data. As an alternative to partial dependence plots, we present a new visualization approach that we term accumulated local effects plots, which do not require this unreliable extrapolation with correlated predictors. Moreover, accumulated local effects plots are far less computationally expensive than partial dependence plots. We also provide an R package ALEPlot as supplementary material to implement our proposed method.
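The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It is not the ALEPlot package and not the authors' code: the function names `partial_dependence` and `ale_first_order`, the quantile binning, and the toy `predict` function are all assumptions made for illustration. The partial dependence estimate overwrites one predictor with a grid value for every row, which can create points far from the training data when predictors are correlated. The accumulated local effects estimate only moves each row between the edges of the bin it already falls in, then accumulates the averaged differences.

```python
import numpy as np

def partial_dependence(predict, X, j, grid):
    """Partial dependence of predictor j: for each grid value, overwrite
    column j in every row and average the predictions."""
    values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, j] = v  # may produce rows far outside the training data
        values.append(predict(X_mod).mean())
    return np.array(values)

def ale_first_order(predict, X, j, n_bins=10):
    """First-order accumulated local effects of predictor j (a sketch,
    assuming quantile-based bins and centring by bin counts)."""
    x = X[:, j]
    edges = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1)))
    n_intervals = len(edges) - 1
    # Bin index in 1..n_intervals; the minimum value goes into the first bin.
    idx = np.clip(np.searchsorted(edges, x, side="left"), 1, n_intervals)
    X_lo, X_hi = X.copy(), X.copy()
    X_lo[:, j] = edges[idx - 1]  # each row moves only to its own bin's edges
    X_hi[:, j] = edges[idx]
    diffs = predict(X_hi) - predict(X_lo)
    counts = np.bincount(idx, minlength=n_intervals + 1)[1:]
    sums = np.bincount(idx, weights=diffs, minlength=n_intervals + 1)[1:]
    local = np.divide(sums, counts, out=np.zeros(n_intervals), where=counts > 0)
    ale = np.concatenate([[0.0], np.cumsum(local)])
    # Centre so that the count-weighted average effect is zero.
    centre = np.sum(0.5 * (ale[:-1] + ale[1:]) * counts) / counts.sum()
    return edges, ale - centre

# Toy data with two strongly correlated predictors (illustrative only).
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = x0 + 0.1 * rng.normal(size=500)
X = np.column_stack([x0, x1])

def predict(X):
    # Hypothetical stand-in for a fitted black box model.
    return 2.0 * X[:, 0] + 3.0 * X[:, 1]

edges, ale = ale_first_order(predict, X, j=0, n_bins=10)
pd_vals = partial_dependence(predict, X, j=0, grid=edges)
```

In this sketch the accumulated local effects estimate calls `predict` on two modified copies of the data, whereas the partial dependence estimate calls it once per grid value on a full copy, which is one way to see the computational difference the abstract mentions.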

Keywords

Interpretability; Random forest; Extrapolation; Machine learning; Artificial intelligence; Regression; Multivariate statistics; Black box; Computer science; Visualization; Support vector machine; Supervised learning; Partial least squares regression; Artificial neural network; Mathematics; Statistics

Affiliated Institutions

Northwestern University US

Related Publications

A working guide to boosted regression trees

Jane Elith, John R. Leathwick, Trevor Hastie

1. Ecologists use statistical models for both explanation and prediction, and need techniques that are flexible enough to express typical features of their data, such as nonline...

2008 Journal of Animal Ecology 6183 citations

Model Agnostic Supervised Local Explanations

Gregory Plumb, Denali Molitor, Ameet Talwalkar

Model interpretability is an increasingly important component of practical machine learning. Some of the most common forms of interpretability systems are example-based, local, ...

2018 arXiv (Cornell University) 113 citations

Interpretable Classification Models for Recidivism Prediction

Jiaming Zeng, Berk Ustun, Cynthia Rudin

We investigate a long-debated question, which is how to create predictive models of recidivism that are sufficiently accurate, transparent and interpretable to use for d...

2016 Journal of the Royal Statistical Soci... 141 citations

Classification and Regression by randomForest

Andy Liaw, Matthew C. Wiener

Recently there has been a lot of interest in “ensemble learning” — methods that generate many classifiers and aggregate their results. Two well-known methods are boosting (see, ...

2007 18390 citations

Binarized Support Vector Machines

Emilio Carrizosa, Belén Martín-Barragán, Dolores Romero Morales

The widely used support vector machine (SVM) method has shown to yield very good results in supervised classification problems. Other methods such as classification trees have b...

2009 INFORMS journal on computing 36 citations

Publication Info

Year: 2020
Type: article
Volume: 82
Issue: 4
Pages: 1059-1086
Citations: 1142
Access: Closed

Cite This

APA Style

Daniel W. Apley, Jingyu Zhu (2020). Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models. Journal of the Royal Statistical Society Series B (Statistical Methodology), 82(4), 1059-1086. https://doi.org/10.1111/rssb.12377
