Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation

2020 arXiv (Cornell University) 4,425 citations

Abstract

Commonly used evaluation measures, including Recall, Precision, F-Measure and Rand Accuracy, are biased and should not be used without a clear understanding of the biases and a corresponding identification of chance or base-case levels of the statistic. A system that performs worse in the objective sense of Informedness can appear to perform better under any of these commonly used measures. We discuss several concepts and measures that reflect the probability that prediction is informed versus chance, in particular Informedness, and introduce Markedness as a dual measure for the probability that prediction is marked versus chance. Finally, we demonstrate elegant connections between the concepts of Informedness, Markedness, Correlation and Significance, as well as their intuitive relationships with Recall and Precision, and outline the extension from the dichotomous case to the general multi-class case.
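The bias described in the abstract can be illustrated with a small sketch for the dichotomous case. It assumes the usual definitions: Informedness = Recall + Inverse Recall - 1, Markedness = Precision + Inverse Precision - 1, and the Matthews correlation computed from the 2x2 contingency table. The `Confusion` container, the `measures` function and the counts for the two systems are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Confusion:
    """Counts of a 2x2 contingency table (hypothetical container)."""
    tp: int  # true positives
    fn: int  # false negatives
    fp: int  # false positives
    tn: int  # true negatives


def measures(c: Confusion) -> dict[str, float]:
    """Return common measures; assumes every row and column total is non-zero."""
    n = c.tp + c.fn + c.fp + c.tn
    recall = c.tp / (c.tp + c.fn)             # true positive rate
    inverse_recall = c.tn / (c.tn + c.fp)     # true negative rate
    precision = c.tp / (c.tp + c.fp)          # positive predictive value
    inverse_precision = c.tn / (c.tn + c.fn)  # negative predictive value
    det = c.tp * c.tn - c.fp * c.fn           # determinant of the table
    return {
        "recall": recall,
        "precision": precision,
        "f1": 2 * c.tp / (2 * c.tp + c.fp + c.fn),
        "accuracy": (c.tp + c.tn) / n,
        "informedness": recall + inverse_recall - 1,
        "markedness": precision + inverse_precision - 1,
        "mcc": det / sqrt(
            (c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn)
        ),
    }


# Illustrative counts on 90 positives and 10 negatives (assumed data).
# System A labels almost everything positive; system B discriminates.
a = measures(Confusion(tp=85, fn=5, fp=9, tn=1))
b = measures(Confusion(tp=63, fn=27, fp=2, tn=8))

for name in ("recall", "f1", "accuracy", "informedness", "markedness", "mcc"):
    print(f"{name:>12}: A={a[name]:.3f}  B={b[name]:.3f}")
```

With these assumed counts, system A scores higher on Recall, F1 and Accuracy while system B has the higher Informedness, and in both cases the squared correlation equals the product of Informedness and Markedness.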

Keywords

Markedness, Measure (data warehouse), Recall, Computer science, Statistic, Correlation, Natural language processing, Artificial intelligence, Class (philosophy), Extension (predicate logic), Statistics, Mathematics, Psychology, Linguistics, Data mining, Cognitive psychology

Related Publications

Some Concepts of Dependence

Problems involving dependent pairs of variables $(X, Y)$ have been studied most intensively in the case of bivariate normal distributions and of $2 \times 2$ tables. This is du...

1966 The Annals of Mathematical Statistics 1501 citations

Testing Statistical Hypotheses

This chapter presents the basic concepts and results of the theory of testing statistical hypotheses. The generalized likelihood ratio tests that are discussed can be applied to...

2021 Wiley series in probability and stati... 5220 citations

Publication Info

Year: 2020
Type: preprint
Volume: 2
Issue: 1
Pages: 37-63
Citations: 4425
Access: Closed

Citation Metrics

4425 (OpenAlex)

Cite This

David Powers (2020). Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv (Cornell University), 2(1), 37-63. https://doi.org/10.48550/arxiv.2010.16061

Identifiers

DOI: 10.48550/arxiv.2010.16061