Improving local interpretable classifier explanations exploiting self-generated semantic features
Angiulli, Fabrizio; Fassetti, Fabio; Nisticò, Simona
2025-01-01
Abstract
Explaining predictions of classifiers is a fundamental problem in eXplainable Artificial Intelligence (XAI). LIME (Local Interpretable Model-agnostic Explanations) is a popular XAI technique able to explain any classifier by providing an interpretable model that approximates the black box locally to the instance under consideration. In order to build interpretable local models, LIME requires the user to explicitly define a space of interpretable components, also called artefacts, associated with the input instance. To reconstruct the local black-box behaviour, the instance neighbourhood is explored by generating instance neighbours as random subsets of the provided artefacts. In this work, we note that this strategy has a limitation: the local explanation can be expressed only in terms of object artefacts. To overcome this limitation, we propose S-LIME, a variant of the basic LIME method that exploits unsupervised learning to replace object artefacts with self-generated semantic features during neighbourhood generation. This characteristic enables our approach to sample instance neighbours in a more semantic-driven fashion and greatly reduces the bias associated with explanations. We demonstrate the applicability and effectiveness of our proposal in the text classification domain. We also present a further extension for textual data in which word groups are used to obtain richer explanations. Comparison with the baseline highlights the superior quality of the explanations obtained by adopting our strategy.
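To make the baseline strategy the abstract refers to concrete, the following is a minimal sketch of LIME-style neighbourhood generation for text: the words of the input are treated as the interpretable artefacts, neighbours are produced as random subsets of those artefacts, the black box is queried on each neighbour, and a weighted linear surrogate is fitted locally. This is an illustration only, not the authors' S-LIME method; the classifier `toy_black_box` and the function names are hypothetical stand-ins.

import numpy as np
from sklearn.linear_model import Ridge

def toy_black_box(texts):
    """Stand-in binary classifier: probability grows with occurrences of 'good'."""
    return np.array([min(1.0, 0.2 + 0.4 * t.lower().split().count("good")) for t in texts])

def lime_text_explanation(text, black_box, n_samples=500, kernel_width=0.75, rng=None):
    rng = np.random.default_rng(rng)
    words = text.split()                      # the interpretable artefacts
    d = len(words)

    # Binary masks: 1 = keep the word, 0 = drop it (random subsets of artefacts).
    masks = rng.integers(0, 2, size=(n_samples, d))
    masks[0] = 1                              # keep the original instance itself

    neighbours = [" ".join(w for w, m in zip(words, row) if m) for row in masks]
    y = black_box(neighbours)                 # query the black box on each neighbour

    # Weight neighbours by proximity to the original instance.
    distances = 1.0 - masks.sum(axis=1) / d
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)

    # Fit a weighted linear surrogate; coefficients score each word's contribution.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(masks, y, sample_weight=weights)
    return sorted(zip(words, surrogate.coef_), key=lambda p: -abs(p[1]))

print(lime_text_explanation("the movie was good and the cast was good", toy_black_box, rng=0))

S-LIME, as described in the abstract, replaces the word-level artefacts used above with semantic features obtained through unsupervised learning, so that neighbours are sampled in a semantic-driven rather than purely lexical fashion.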


