Improving local interpretable classifier explanations exploiting self-generated semantic features
Angiulli, Fabrizio; Fassetti, Fabio; Nisticò, Simona
2025-01-01
Abstract
Explaining predictions of classifiers is a fundamental problem in eXplainable Artificial Intelligence (XAI). LIME (Local Interpretable Model-agnostic Explanations) is a popular XAI technique able to explain any classifier by providing an interpretable model that approximates the black box locally to the instance under consideration. In order to build interpretable local models, LIME requires the user to explicitly define a space of interpretable components, also called artefacts, associated with the input instance. To reconstruct the local black-box behaviour, the instance neighbourhood is explored by generating instance neighbours as random subsets of the provided artefacts. In this work, we note that this strategy has a limitation: the local explanation can be expressed only in terms of object artefacts. To overcome this limitation, we propose S-LIME, a variant of the basic LIME method that exploits unsupervised learning to replace object artefacts with self-generated semantic features during neighbourhood generation. This characteristic enables our approach to sample instance neighbours in a more semantic-driven fashion and greatly reduces the bias associated with explanations. We demonstrate the applicability and effectiveness of our proposal in the text classification domain. We also present a further extension for textual data in which word groups are used to obtain richer explanations. Comparison with the baseline highlights the superior quality of the explanations obtained by adopting our strategy.
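To make the baseline strategy the abstract refers to concrete, the following is a minimal sketch of LIME-style neighbourhood generation for text: the words of the input are treated as the interpretable artefacts, neighbours are produced as random subsets of those artefacts, the black box is queried on each neighbour, and a weighted linear surrogate is fitted locally. This is an illustration only, not the authors' S-LIME method; the classifier `toy_black_box` and the function names are hypothetical stand-ins.

import numpy as np
from sklearn.linear_model import Ridge

def toy_black_box(texts):
    """Stand-in binary classifier: probability grows with occurrences of 'good'."""
    return np.array([min(1.0, 0.2 + 0.4 * t.lower().split().count("good")) for t in texts])

def lime_text_explanation(text, black_box, n_samples=500, kernel_width=0.75, rng=None):
    rng = np.random.default_rng(rng)
    words = text.split()                      # the interpretable artefacts
    d = len(words)

    # Binary masks: 1 = keep the word, 0 = drop it (random subsets of artefacts).
    masks = rng.integers(0, 2, size=(n_samples, d))
    masks[0] = 1                              # keep the original instance itself

    neighbours = [" ".join(w for w, m in zip(words, row) if m) for row in masks]
    y = black_box(neighbours)                 # query the black box on each neighbour

    # Weight neighbours by proximity to the original instance.
    distances = 1.0 - masks.sum(axis=1) / d
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)

    # Fit a weighted linear surrogate; coefficients score each word's contribution.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(masks, y, sample_weight=weights)
    return sorted(zip(words, surrogate.coef_), key=lambda p: -abs(p[1]))

print(lime_text_explanation("the movie was good and the cast was good", toy_black_box, rng=0))

S-LIME, as described in the abstract, replaces the word-level artefacts used above with semantic features obtained through unsupervised learning, so that neighbours are sampled in a semantic-driven rather than purely lexical fashion.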


