Abstract

Background and Objectives: Artificial intelligence (AI) has shown promising performance in skin-lesion classification; however, its fairness, external validity, and real-world reliability remain uncertain. This systematic review and meta-analysis evaluated the diagnostic accuracy, equity, and generalizability of AI-based dermatology systems across diverse imaging modalities and clinical settings.

Materials and Methods: A comprehensive search of PubMed, Embase, Web of Science, and ClinicalTrials.gov (inception–31 October 2025) identified diagnostic accuracy studies using clinical, dermoscopic, or smartphone images. Eighteen studies (11 melanoma-focused; 7 mixed benign–malignant) met inclusion criteria. Six studies provided complete 2 × 2 contingency data for bivariate Reitsma HSROC modeling, while seven reported AUROC values with extractable variance. Risk of bias was assessed using QUADAS-2, and evidence certainty was graded using GRADE.

Results: Across more than 70,000 test images, pooled sensitivity and specificity were 0.91 (95% CI 0.74–0.97) and 0.64 (95% CI 0.47–0.78), respectively, corresponding to an HSROC AUROC of 0.88 (95% CI 0.84–0.92). The AUROC-only meta-analysis yielded a similar pooled AUROC of 0.88 (95% CI 0.87–0.90). Diagnostic performance was highest in specialist settings (AUROC 0.90), followed by community care (0.85) and smartphone environments (0.81). Notably, performance was lower in darker skin tones (Fitzpatrick IV–VI: AUROC 0.82) than in lighter skin tones (I–III: 0.89), indicating persistent fairness gaps.

Conclusions: AI-based dermatology systems achieve high diagnostic accuracy but demonstrate reduced performance in darker skin tones and non-specialist environments. These findings emphasize the need for diverse training datasets, skin-tone–stratified reporting, and rigorous external validation before broad clinical deployment.
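As an illustration of the 2 × 2 contingency data mentioned in the Methods, the following minimal Python sketch shows how per-study sensitivity and specificity are derived from such a table, and how they map to the logit scale on which bivariate models of this kind typically operate. The counts are hypothetical and are not taken from any included study; the function names are illustrative only, and this is not the authors' analysis code.

```python
import math

def sens_spec(tp, fn, fp, tn):
    """Return (sensitivity, specificity) from the four cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)   # true positives among all diseased
    specificity = tn / (tn + fp)   # true negatives among all non-diseased
    return sensitivity, specificity

def logit(p):
    """Log-odds transform, the scale commonly used for pooling proportions."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Back-transform from the logit scale to a proportion."""
    return 1 / (1 + math.exp(-x))

# Hypothetical counts for a single study (assumption, for illustration only)
tp, fn, fp, tn = 90, 10, 40, 60
sens, spec = sens_spec(tp, fn, fp, tn)

print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
print(f"logit(sens) = {logit(sens):.3f}, logit(spec) = {logit(spec):.3f}")
```

A full bivariate analysis would additionally model between-study variance and the correlation between the two logit-transformed quantities; the sketch covers only the per-study step.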

Publication Info

Year: 2025
Type: article
Volume: 61
Issue: 12
Pages: 2186
Citations: 0
Access: Closed

Citation Metrics

OpenAlex: 0
Influential: 0
CrossRef: 0

Cite This

Jeng‐Wei Tjiu, Chia‐Fang Lu (2025). Equity and Generalizability of Artificial Intelligence for Skin-Lesion Diagnosis Using Clinical, Dermoscopic, and Smartphone Images: A Systematic Review and Meta-Analysis. Medicina, 61(12), 2186. https://doi.org/10.3390/medicina61122186

Identifiers

DOI: 10.3390/medicina61122186
