Reasoning on the gap between synthetic and authentic diversity: The limits of computational solutions to representation bias
Ruga, Tommaso; Zumpano, Ester
2026-01-01
Abstract
The severe underrepresentation of darker skin tones in dermatological training datasets perpetuates critical healthcare disparities in melanoma detection. Dermatological AI tools trained on predominantly light-skinned datasets degrade sharply on darker skin tones, with diagnostic accuracy for melanoma dropping from 92% to 56%. The Pipsqueak dataset, presented in Ruga et al. (2025), highlighted that fewer than 20 diagnostic-quality melanoma images of Fitzpatrick skin types V-VI exist across publicly available datasets. The ideal solution is to collect real data, but this would take years, and in the meantime algorithms trained on homogeneous data continue to be deployed clinically. This paper introduces two datasets: HAM-SyntheticDarker, a synthetic dataset generated through controlled color-luminosity matching, and HAM-HybridEquity, obtained by combining a real and a synthetic dataset to promote equity. The study extends Ruga et al. (2025) and conducts a series of experiments, using the MultiExCam framework (Ruga et al., 2026), to document the benefits and limitations of synthetic diversity and the extent to which it can overcome bias and promote fairness. The results show that synthetic diversity cannot substitute for authentic diversity. They also show that models trained on darker skin generalize better to lighter skin than the converse, which reveals directional representation biases and provides empirical evidence that synthetic diversity is supplementary rather than substitutive, as it offers only modest interim improvements. The ethical imperative therefore remains: developing dermatological imaging datasets that represent the full spectrum of human skin diversity.


