Abstract

Variable importance (VI) tools describe how much covariates contribute to a prediction model's accuracy. However, important variables for one well-performing model (for example, a linear model $f(\mathbf{x})=\mathbf{x}^{T}\beta$ with a fixed coefficient vector $\beta$) may be unimportant for another model. In this paper, we propose model class reliance (MCR) as the range of VI values across all well-performing models in a prespecified class. Thus, MCR gives a more comprehensive description of importance by accounting for the fact that many prediction models, possibly of different parametric forms, may fit the data well. In the process of deriving MCR, we show several informative results for permutation-based VI estimates, based on the VI measures used in Random Forests. Specifically, we derive connections between permutation importance estimates for a single prediction model, U-statistics, conditional variable importance, conditional causal effects, and linear model coefficients. We then give probabilistic bounds for MCR, using a novel, generalizable technique. We apply MCR to a public data set of Broward County criminal records to study the reliance of recidivism prediction models on sex and race. In this application, MCR can be used to help inform VI for unknown, proprietary models.
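As a rough illustration of the idea of a range of VI values, the sketch below computes a permutation-based importance for many linear models $f(\mathbf{x})=\mathbf{x}^{T}\beta$ and reports the smallest and largest value among those that fit well. It is not taken from the paper: the ratio-of-losses importance, the squared-error loss, the tolerance `eps`, the simulated data, and the random sampling of candidate coefficient vectors (a finite stand-in for the full model class) are all assumptions made for illustration, and the names `model_reliance` and `empirical_mcr` are hypothetical.

```python
import numpy as np

def mse(y, pred):
    """Mean squared error."""
    return float(np.mean((y - pred) ** 2))

def model_reliance(beta, X, y, col, perm):
    """Permutation importance of column `col` for the linear model X @ beta:
    loss after shuffling that column, divided by the original loss (assumed form)."""
    base = mse(y, X @ beta)
    X_perm = X.copy()
    X_perm[:, col] = X[perm, col]  # break the link between this column and y
    return mse(y, X_perm @ beta) / base

def empirical_mcr(candidates, X, y, col, eps, rng):
    """Smallest and largest importance among candidates whose loss is
    within `eps` of the best candidate (the 'well-performing' models)."""
    losses = np.array([mse(y, X @ b) for b in candidates])
    keep = losses <= losses.min() + eps
    perm = rng.permutation(len(y))  # one shared shuffle for comparability
    mr = [model_reliance(b, X, y, col, perm) for b in candidates[keep]]
    return min(mr), max(mr)

# Simulated data (assumption): two strongly correlated covariates.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.3 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.5 * rng.normal(size=n)

# Candidate models: the least-squares fit plus random perturbations of it.
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
candidates = np.vstack([beta_ls, beta_ls + 0.5 * rng.normal(size=(2000, 2))])

lo, hi = empirical_mcr(candidates, X, y, col=0, eps=0.05, rng=rng)
print(f"importance of x1 across well-performing models: [{lo:.2f}, {hi:.2f}]")
```

Because the two covariates are strongly correlated, models that lean mostly on one or mostly on the other can reach similar accuracy, so the reported interval for a single covariate can be wide. Sampling candidates only approximates the range over the whole class; it does not provide the probabilistic bounds mentioned in the abstract.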

Keywords

Class (philosophy), Variable (mathematics), Computer science, Artificial intelligence, Machine learning, Econometrics, Mathematics

Publication Info

Year: 2018
Type: preprint
Citations: 1147
Access: Closed

Cite This

Aaron Fisher, Cynthia Rudin, Francesca Dominici (2018). All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1801.01489

Identifiers

DOI: 10.48550/arxiv.1801.01489