
LLiMe: enhancing text classifier explanations with large language models

Angiulli, Fabrizio; De Luca, Francesco; Fassetti, Fabio; Nisticò, Simona
2025-01-01

Abstract

The widespread diffusion of black-box text classifiers necessitates explainable AI (XAI) techniques for this domain. A seminal XAI technique is Local Interpretable Model-agnostic Explanations (LIME). For text classification, LIME maps an input sentence and its neighbours into a bag of words, using a linear regressor as an interpretable model. However, this strategy has significant limitations. Neighbouring sentences are constructed solely by extracting subsets of the input sentence, which may fail to accurately capture the local decision boundary. Moreover, these subsets are not guaranteed to be representative of the classification classes, potentially leading to unbalanced or misleading interpretability. Additionally, such generated sentences might lack semantic coherence. Furthermore, the resulting explanation is often limited to confirming the relevance of a term or highlighting the impact of its removal, without providing deeper insights. This work addresses these limitations by proposing LLiMe, an extension of LIME that exploits advances in Large Language Models (LLMs) to perform a classifier-driven generation of the neighbourhood. Our approach allows neighbours to employ a vocabulary larger than that of the input text. A generation procedure is introduced to more effectively capture the local decision boundary by ensuring generated samples span all classes involved in the classification. Additionally, an LLM-driven explanation and a counterfactual generation procedure are presented, returning the most relevant set of editing operations to influence the black-box predictor's decision. Thus, the approach provides a richer, easier-to-interpret explanation and higher-quality counterfactuals compared to standard LIME. Experiments on real datasets demonstrate the technique's effectiveness in providing suitable, relevant, and interpretable explanations.
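The baseline the abstract criticises, LIME for text, can be illustrated with a minimal sketch: neighbours are word subsets of the input, a black-box is queried on them, and a weighted linear surrogate ranks each word's contribution. The classifier `toy_black_box` and all function names here are hypothetical stand-ins, not the paper's implementation.

```python
import numpy as np
from sklearn.linear_model import Ridge

def toy_black_box(sentences):
    # Hypothetical classifier: P(positive) is 1 when "great" appears, else 0.
    return np.array([1.0 if "great" in s.split() else 0.0 for s in sentences])

def lime_text_explanation(sentence, black_box, n_samples=200, seed=0):
    """Sketch of LIME's word-subset neighbourhood + linear surrogate."""
    rng = np.random.default_rng(seed)
    words = sentence.split()
    # Binary masks: 1 keeps a word, 0 drops it (subsets of the input only,
    # which is exactly the limitation LLiMe targets).
    masks = rng.integers(0, 2, size=(n_samples, len(words)))
    masks[0, :] = 1  # include the original sentence itself
    neighbours = [" ".join(w for w, m in zip(words, row) if m) for row in masks]
    preds = black_box(neighbours)
    # Weight each neighbour by its similarity to the input
    # (here: fraction of words retained).
    weights = masks.mean(axis=1)
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(masks, preds, sample_weight=weights)
    # Rank words by the magnitude of their surrogate coefficient.
    return sorted(zip(words, surrogate.coef_), key=lambda t: -abs(t[1]))

ranking = lime_text_explanation("the movie was great fun", toy_black_box)
```

Because every neighbour is a subset of the input, the surrogate can only confirm or deny the relevance of words already present; it cannot suggest substitutions, which motivates the LLM-generated neighbourhoods described above.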
Keywords: Black-box explanation; Explainable AI; Large language models; Local interpretable explanation
Files for this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/394777

Citations: Scopus: 0