Abstract
Effect size measures are used to quantify treatment effects or associations between variables. Such measures, of which more than 70 have been described in the literature, include unstandardized and standardized differences in means, risk differences, risk ratios, odds ratios, and correlations. While null hypothesis significance testing is the predominant approach to statistical inference on effect sizes, results of such tests are often misinterpreted, provide no information on the magnitude of the estimate, and tell us nothing about the clinical importance of an effect. Hence, researchers should not focus merely on statistical significance but should also report the observed effect size. However, all samples are to some degree affected by randomness, so there is always some uncertainty about how well the observed effect size represents the actual magnitude and direction of the effect in the population. Therefore, point estimates of effect sizes should be accompanied by the entire range of plausible values to quantify this uncertainty. This facilitates assessment of how large or small the observed effect could actually be in the population of interest, and hence how clinically important it could be. This tutorial reviews different effect size measures and describes how confidence intervals can be used to address not only the statistical significance but also the clinical significance of the observed effect or association. Moreover, we discuss what P values actually represent, and how they provide supplemental information about the significant versus nonsignificant dichotomy. This tutorial intentionally focuses on an intuitive explanation of concepts and interpretation of results, rather than on the underlying mathematical theory.
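As a rough illustration of reporting a point estimate together with its range of plausible values, the sketch below computes three of the effect size measures named above (risk difference, risk ratio, odds ratio) from a 2x2 table, each with a large-sample Wald 95% confidence interval. It is not taken from the tutorial itself: the function name `two_by_two_effects`, the result layout, and the event counts are hypothetical, and the normal approximation is an assumption that may be poor for small samples or rare events.

```python
import math
from statistics import NormalDist


def two_by_two_effects(a, b, c, d, level=0.95):
    """Effect sizes with Wald confidence intervals from a 2x2 table.

    a, b: events and non-events in the treated group
    c, d: events and non-events in the control group
    Returns {name: (estimate, (lower, upper))}.
    """
    # Two-sided normal critical value (about 1.96 for a 95% interval).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    n1, n0 = a + b, c + d
    p1, p0 = a / n1, c / n0

    # Risk difference: interval built on the original scale.
    rd = p1 - p0
    se_rd = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    rd_ci = (rd - z * se_rd, rd + z * se_rd)

    # Risk ratio: interval built on the log scale, then back-transformed.
    rr = p1 / p0
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    rr_ci = (math.exp(math.log(rr) - z * se_log_rr),
             math.exp(math.log(rr) + z * se_log_rr))

    # Odds ratio: interval built on the log scale, then back-transformed.
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    or_ci = (math.exp(math.log(odds_ratio) - z * se_log_or),
             math.exp(math.log(odds_ratio) + z * se_log_or))

    return {
        "risk_difference": (rd, rd_ci),
        "risk_ratio": (rr, rr_ci),
        "odds_ratio": (odds_ratio, or_ci),
    }


# Hypothetical data: 15 of 100 treated and 30 of 100 control patients
# experience the event.
result = two_by_two_effects(15, 85, 30, 70)
for name, (estimate, (lower, upper)) in result.items():
    print(f"{name}: {estimate:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

With these made-up counts, each interval excludes the null value (0 for the risk difference, 1 for the two ratios), which corresponds to statistical significance at the 0.05 level. The interval limits additionally indicate how small or how large the effect could plausibly be, which is the information needed to judge clinical importance.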
Publication Info
- Year: 2018
- Type: review
- Volume: 126
- Issue: 3
- Pages: 1068-1072
- Citations: 184
- Access: Closed
Identifiers
- DOI: 10.1213/ane.0000000000002798
- PMID: 29337724
- PMCID: PMC5811238