Local Interpretable Classifier Explanations with Self-generated Semantic Features
Angiulli F.; Fassetti F.; Nisticò S.
2021-01-01
Abstract
Explaining predictions of classifiers is a fundamental problem in eXplainable Artificial Intelligence (XAI). LIME (for Local Interpretable Model-agnostic Explanations) is a recently proposed XAI technique able to explain any classifier by providing an interpretable model which approximates the black-box locally to the instance under consideration. In order to build interpretable local models, LIME requires the user to explicitly define a space of interpretable components, also called artefacts, associated with the input instance. To reconstruct local black-box behaviour, the instance neighbourhood is explored by generating instance neighbours as random subsets of the provided artefacts. In this work we note that the strategy described above has two main flaws: first, it requires user intervention in the definition of the interpretable space and, second, the local explanation can only be expressed in terms of the user-provided artefacts. To overcome these two limitations, we propose S-LIME, a variant of the basic LIME method exploiting unsupervised learning to replace user-provided interpretable components with self-generated semantic features. This characteristic enables our approach to sample instance neighbours in a more semantic-driven fashion and to greatly reduce the bias associated with explanations. We demonstrate the applicability and effectiveness of our proposal in the text classification domain. Comparison with the baseline highlights the superior quality of the explanations provided by our strategy.
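The baseline sampling strategy that the abstract describes (neighbours generated as random subsets of user-provided artefacts, then a local interpretable model fitted on them) can be illustrated with a minimal sketch. It is not the authors' implementation and it does not cover S-LIME's self-generated semantic features. The function names, the toy classifier, the choice of words as artefacts, the exponential proximity kernel and the small ridge term are all assumptions made for illustration.

```python
import numpy as np


def toy_black_box(texts):
    """Hypothetical classifier: returns P(positive) for each text."""
    score = np.array(
        [t.split().count("good") - t.split().count("bad") for t in texts],
        dtype=float,
    )
    return 1.0 / (1.0 + np.exp(-2.0 * score))


def explain_with_word_artefacts(text, predict, n_samples=500,
                                kernel_width=0.75, seed=0):
    """LIME-style local explanation using words as artefacts (assumed setup)."""
    rng = np.random.default_rng(seed)
    words = text.split()              # artefacts: one per word occurrence
    d = len(words)

    # Each neighbour is a random subset of the artefacts (1 = word kept).
    masks = rng.integers(0, 2, size=(n_samples, d))
    masks[0] = 1                      # include the original instance itself
    neighbours = [
        " ".join(w for w, keep in zip(words, m) if keep) for m in masks
    ]
    y = predict(neighbours)           # query the black box on the neighbours

    # Weight neighbours by proximity: fewer removed words means closer.
    dist = 1.0 - masks.mean(axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)

    # Weighted linear surrogate with intercept, solved in closed form.
    X = np.hstack([masks, np.ones((n_samples, 1))])
    A = X.T @ (weights[:, None] * X) + 1e-3 * np.eye(d + 1)
    coef = np.linalg.solve(A, X.T @ (weights * y))
    return list(zip(words, coef[:d]))  # (artefact, local importance) pairs


if __name__ == "__main__":
    sentence = "the plot is good but the acting is bad"
    for word, importance in explain_with_word_artefacts(sentence, toy_black_box):
        print(f"{word:>8s} {importance:+.3f}")
```

In this sketch the explanation can only assign importance to the words the caller chose as artefacts, which is the limitation the abstract attributes to the baseline.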