Underrepresentation of Dark Skin Tone in Skin Lesion Datasets: The Role of the Explainable Techniques in Assessing the Bias

Ruga, Tommaso; Zumpano, Ester; Vocaturo, Eugenio; Caroprese, Luciano
2026-01-01

Abstract

Advanced artificial intelligence models for skin lesion classification often suffer from performance disparities when applied to images of patients with darker skin tones, largely due to the underrepresentation of dark skin tones in training datasets. In this study, we investigate this issue by evaluating a previously proposed explainable framework, MultiExCAM, trained on the widely used ISIC2018 dataset. We test its performance on Pipsqueak, a previously proposed dataset composed of skin lesion images from patients with darker skin tones. As expected, we observe a significant drop in classification performance when the model is applied to Pipsqueak. To better understand the source of these failures, we employ explainable artificial intelligence techniques to visualize and analyze the model's decision-making process on both datasets. Our results highlight clear differences in attention patterns and decision rationales, revealing how the lack of dark skin tone representation in the training data leads to poor generalization and biased behavior. This work emphasizes the critical role of explainable analysis in exposing and understanding model bias in clinical applications, and the necessity of inclusive datasets for fair and reliable skin lesion classification.
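
Illustrative sketch (not from the paper): the abstract references CAM-style attention visualization but does not specify the exact method inside MultiExCAM, so the snippet below is a minimal, generic Grad-CAM implementation in PyTorch of the kind commonly used to inspect where a lesion classifier looks. The ResNet-18 backbone, the choice of layer4 as the target layer, the 224x224 input size, and the dummy input are all assumptions made for illustration; they are not the authors' model, weights, or configuration.

```python
# Minimal Grad-CAM sketch. Assumptions: a generic untrained ResNet-18 stands in
# for the classifier; layer4 is taken as the last convolutional stage to explain.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None)  # placeholder backbone, not the paper's model
model.eval()

activations, gradients = {}, {}

def fwd_hook(module, inp, out):
    # Cache the feature maps of the target layer on the forward pass.
    activations["feat"] = out.detach()

def bwd_hook(module, grad_in, grad_out):
    # Cache the gradient of the class score w.r.t. those feature maps.
    gradients["feat"] = grad_out[0].detach()

target_layer = model.layer4            # layer choice is an assumption
target_layer.register_forward_hook(fwd_hook)
target_layer.register_full_backward_hook(bwd_hook)

def grad_cam(image, class_idx=None):
    """Return an [H, W] heatmap of class-discriminative evidence in [0, 1]."""
    logits = model(image)                        # image: [1, 3, 224, 224]
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()  # explain the predicted class
    model.zero_grad()
    logits[0, class_idx].backward()
    # Global-average-pool the gradients to get per-channel importance weights.
    weights = gradients["feat"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * activations["feat"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                        align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam[0, 0]

heatmap = grad_cam(torch.randn(1, 3, 224, 224))  # dummy input for illustration
```

Comparing such heatmaps across light- and dark-skin-tone images is one way the kind of divergence in attention patterns described above can be made visible.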
ISBN: 9783032057266; 9783032057273
Keywords: Dataset Bias; Explainable AI; Melanoma Classification; Skin Tone Diversity

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/390278
Note: the displayed data have not been validated by the university.

Citations
  • Scopus: 0