Abstract
Variable importance (VI) tools describe how much covariates contribute to a prediction model's accuracy. However, important variables for one well-performing model (for example, a linear model $f(\mathbf{x})=\mathbf{x}^{T}β$ with a fixed coefficient vector $β$) may be unimportant for another model. In this paper, we propose model class reliance (MCR) as the range of VI values across all well-performing model in a prespecified class. Thus, MCR gives a more comprehensive description of importance by accounting for the fact that many prediction models, possibly of different parametric forms, may fit the data well. In the process of deriving MCR, we show several informative results for permutation-based VI estimates, based on the VI measures used in Random Forests. Specifically, we derive connections between permutation importance estimates for a single prediction model, U-statistics, conditional variable importance, conditional causal effects, and linear model coefficients. We then give probabilistic bounds for MCR, using a novel, generalizable technique. We apply MCR to a public data set of Broward County criminal records to study the reliance of recidivism prediction models on sex and race. In this application, MCR can be used to help inform VI for unknown, proprietary models.
Keywords
Affiliated Institutions
Related Publications
Which Method Predicts Recidivism Best?: A Comparison of Statistical, Machine Learning and Data Mining Predictive Models
Summary Using criminal population conviction histories of recent offenders, prediction mod els are developed that predict three types of criminal recidivism: general recidivism,...
Variable selection – A review and recommendations for the practicing statistician
Abstract Statistical models support medical research by facilitating individualized outcome prognostication conditional on independent variables or by estimating effects of risk...
Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction
For large, real-world inductive learning problems, the number of training examples often must be limited due to the costs associated with procuring, preparing, and storing the t...
Unbiased Recursive Partitioning: A Conditional Inference Framework
Recursive binary partitioning is a popular tool for regression analysis. Two fundamental problems of exhaustive search procedures usually applied to fit such models have been kn...
Stochastic Complexity and Modeling
As a modification of the notion of algorithmic complexity, the stochastic complexity of a string of data, relative to a class of probabilistic models, is defined to be the fewes...
Publication Info
- Year
- 2018
- Type
- preprint
- Citations
- 1147
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.48550/arxiv.1801.01489