Abstract

Artificial intelligence (AI) is intrinsically data-driven. It calls for the application of statistical concepts through human-machine collaboration during the generation of data, the development of algorithms, and the evaluation of results. This paper discusses how such human-machine collaboration can be approached through the statistical concepts of population, question of interest, representativeness of training data, and scrutiny of results (PQRS). The PQRS workflow provides a conceptual framework for integrating statistical ideas with human input into AI products and research. These ideas include the experimental design principles of randomization and local control, as well as the principle of stability, which supports reproducibility and interpretability of algorithms and data results. We discuss the use of these principles in the contexts of self-driving cars, automated medical diagnoses, and examples from the authors’ collaborative research.

Keywords

Representativeness heuristic; Interpretability; Computer science; Workflow; Artificial intelligence; Machine learning; Data science; Scrutiny; Population; Statistics; Mathematics

Related Publications

Comprehensible classification models

The vast majority of the literature evaluates the performance of classification models using only the criterion of predictive accuracy. This paper reviews the case for consideri...

2014 ACM SIGKDD Explorations Newsletter 548 citations

A theory of the learnable

Humans appear to be able to learn new concepts without needing to be programmed explicitly in any conventional sense. In this paper we regard learning as the phenomenon of knowl...

1984 4226 citations

Publication Info

Year: 2018
Type: article
Volume: 19
Issue: 1
Pages: 6-9
Citations: 1987
Access: Closed


Citation Metrics

OpenAlex: 1987
Influential: 22

Cite This

Bin Yu, Karl Kumbier (2018). Artificial intelligence and statistics. Frontiers of Information Technology & Electronic Engineering, 19(1), 6-9. https://doi.org/10.1631/fitee.1700813

Identifiers

DOI
10.1631/fitee.1700813
arXiv
1712.03779
