Optimized Risk Scores

Abstract

Risk scores are simple classification models that let users quickly assess risk by adding, subtracting, and multiplying a few small numbers. Such models are widely used in healthcare and criminal justice, but are often built ad hoc. In this paper, we present a principled approach to learn risk scores that are fully optimized for feature selection, integer coefficients, and operational constraints. We formulate the risk score problem as a mixed integer nonlinear program, and present a new cutting plane algorithm to efficiently recover its optimal solution. Our approach can fit optimized risk scores in a way that scales linearly with the sample size of a dataset, provides a proof of optimality, and obeys complex constraints without parameter tuning. We illustrate these benefits through an extensive set of numerical experiments, and an application where we build a customized risk score for ICU seizure prediction.
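To make the abstract's two central ideas concrete, the following is a minimal sketch, not the authors' implementation. It shows how a risk score with small integer points could be applied to one case, and how a single cutting plane (a first-order lower bound on the logistic loss) could be computed at a given coefficient vector. The feature names, point values, intercept, and function names are all hypothetical, and the logistic loss is an assumption made for illustration.

```python
import math

# Hypothetical point table (not taken from the paper): feature -> integer points.
POINTS = {"age_over_60": 2, "prior_event": 3, "on_medication": -1}
INTERCEPT = -4  # hypothetical integer offset


def total_score(case):
    """Add the points of every binary feature that is present in the case."""
    return INTERCEPT + sum(pts * case.get(name, 0) for name, pts in POINTS.items())


def predicted_risk(score):
    """Map an integer score to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-score))


def loss_and_gradient(coefs, X, y):
    """Mean logistic loss and its gradient, for labels y in {-1, +1}.

    One pass over the n rows of X, so the cost grows linearly with n.
    """
    n = len(X)
    loss = 0.0
    grad = [0.0] * len(coefs)
    for xi, yi in zip(X, y):
        margin = yi * sum(c * v for c, v in zip(coefs, xi))
        loss += math.log1p(math.exp(-margin)) / n
        weight = -yi / (1.0 + math.exp(margin)) / n
        for j, v in enumerate(xi):
            grad[j] += weight * v
    return loss, grad


def cut_value(coefs_at, X, y, coefs_query):
    """Evaluate the cutting plane built at coefs_at at the point coefs_query.

    Because the logistic loss is convex, this value never exceeds the
    true loss at coefs_query.
    """
    loss, grad = loss_and_gradient(coefs_at, X, y)
    return loss + sum(g * (q - a) for g, q, a in zip(grad, coefs_query, coefs_at))


if __name__ == "__main__":
    case = {"age_over_60": 1, "prior_event": 1}
    score = total_score(case)
    print("score:", score, "risk:", round(predicted_risk(score), 3))

    # Tiny made-up dataset: first column is the intercept term.
    X = [[1, 1, 0], [1, 0, 1], [1, 1, 1]]
    y = [1, -1, 1]
    print("cut lower bound:", cut_value([0, 1, 0], X, y, [1, -2, 3]))
    print("true loss:", loss_and_gradient([1, -2, 3], X, y)[0])
```

In a cutting plane method, bounds of this kind would be accumulated and handed to an integer programming solver that searches over the integer coefficients; that solver step is omitted here.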

Keywords

Integer (computer science); Set (abstract data type); Computer science; Cutting-plane method; Selection (genetic algorithm); Feature (linguistics); Simple (philosophy); Mathematical optimization; Artificial intelligence; Integer programming; Algorithm; Mathematics

Related Publications

Least angle regression

Bradley Efron, Trevor Hastie, Iain M. Johnstone, +1 more

The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to whi...

2004 The Annals of Statistics 9329 citations

Feature Selection via Mathematical Programming

Patricia Bradley, O. L. Mangasarian, W. Nick Street

The problem of discriminating between two finite point sets in n-dimensional feature space by a separating plane that utilizes as few of the features as possible is formulated a...

1998 INFORMS journal on computing 209 citations

Feature selection for high-dimensional genomic microarray data

Eric P. Xing, Michael I. Jordan, Richard M. Karp

We report on the successful application of feature selection methods to a classification problem in molecular biology involving only 72 data points in a 7130 dimensional space. ...

2001 628 citations

Covariance selection for nonchordal graphs via chordal embedding

Joachim Dahl, Lieven Vandenberghe, Vwani Roychowdhury

We describe algorithms for maximum likelihood estimation of Gaussian graphical models with conditional independence constraints. This problem is also known as covariance selecti...

2008 Optimization methods & software 117 citations

A learner-independent evaluation of the usefulness of statistical phrases for automated text categorization

Maria Fernanda Caropreso, Stan Matwin, Fabrizio Sebastiani

In this work we investigate the usefulness of n-grams for document indexing in text categorization (TC). We call n-gram a set g_k of n word stems, and we say that g_k occurs in a d...

2001 180 citations

Publication Info

Year: 2017
Type: article
Pages: 1125-1134
Citations: 48
Access: Closed

Cite This

APA Style

Berk Ustun, Cynthia Rudin (2017). Optimized Risk Scores. pp. 1125-1134. https://doi.org/10.1145/3097983.3098161

Identifiers

DOI: 10.1145/3097983.3098161