A Robust and Versatile Multi-View Learning Framework for the Detection of Deviant Business Process Instances

Cuzzocrea Alfredo
2016

Abstract

Increasing attention has been paid to the detection and analysis of "deviant" instances of a business process that are connected with some kind of "hidden" undesired behavior (e.g., frauds and faults). In particular, several recent works have addressed the problem of inducing a binary classification model (here named a deviance detection model) that can discriminate between deviant and normal traces, based on a set of historical log traces (each labeled as either deviant or normal). Current solutions apply standard classifier-induction methods to a feature-based representation of the given traces, where the features include sequence-based patterns extracted from the corresponding sequences of activities. However, there is no consensus on which kinds of patterns are the most suitable for this task. Moreover, mixing multiple pattern families together may produce a heterogeneous, redundant and sparse representation of the traces that is likely to yield poor deviance detection models. In this paper, we propose an ensemble-learning method for solving this problem, where multiple base classifiers are trained on different feature-based views of the log (each obtained by mapping the traces onto a distinct collection of patterns). A stacking procedure is used to combine the discovered base models into an overall probabilistic model that associates any new trace with an estimate of the probability that it reflects a deviant process instance. This helps the analyst prioritize the inspection of the cases that are more likely to be deviant. The method also takes advantage of all nonstructural data available in the log, and employs a resampling mechanism to deal with the rarity of deviances in the training log. It has been conceived as the core of a comprehensive framework for detecting and analyzing business process deviances. The framework supports the analyst in investigating suspect deviances, and provides feedback to the learning method for improving the accuracy of the discovered deviance detection models. Tests on several real-life datasets confirmed the validity of the approach, both in its capability to discover an accurate deviance detection model and in its capability to effectively exploit new (originally unlabeled) traces via active learning and self-training mechanisms.
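
To make the notions of "feature-based views" and "resampling" more concrete, the following is a minimal sketch and not the authors' implementation. It assumes two illustrative pattern families (individual activities and pairs of consecutive activities) and plain random oversampling of the rarer class; the abstract does not say which pattern families or which resampling scheme the method actually uses. The toy log, the activity names, and the function names `activity_view`, `bigram_view` and `oversample` are all hypothetical. Training the base classifiers and the stacking step are left out.

```python
import random
from collections import Counter

# Toy log (invented for illustration): each entry is (trace, label),
# where a trace is a sequence of activity names and label 1 means deviant.
LOG = [
    (["register", "check", "approve", "pay"], 0),
    (["register", "check", "approve", "pay"], 0),
    (["register", "check", "reject"], 0),
    (["register", "check", "check", "approve", "pay"], 0),
    (["register", "approve", "pay"], 1),  # approval without a prior check
]


def activity_view(trace):
    """View 1 (assumed pattern family): count of each individual activity."""
    return Counter(trace)


def bigram_view(trace):
    """View 2 (assumed pattern family): count of each pair of consecutive activities."""
    return Counter(zip(trace, trace[1:]))


def oversample(log, seed=0):
    """Append randomly chosen copies of minority-class traces until classes are balanced."""
    rng = random.Random(seed)
    normal = [item for item in log if item[1] == 0]
    deviant = [item for item in log if item[1] == 1]
    minority, majority = sorted([normal, deviant], key=len)
    if not minority:
        raise ValueError("need at least one trace of each class")
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return log + extra


# Balance the training log first, then map it onto each view separately.
balanced = oversample(LOG)
views = {
    "activities": [activity_view(trace) for trace, _ in balanced],
    "bigrams": [bigram_view(trace) for trace, _ in balanced],
}
labels = [label for _, label in balanced]
```

In a multi-view setting of this kind, each entry of `views` would feed its own base classifier, and the per-view predictions would then be passed to a combiner such as the stacking procedure described in the abstract. Keeping the views separate avoids concatenating all pattern families into a single wide, sparse feature vector.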
Business process intelligence
classification
deviation detection
Information Systems
Computer Science Applications
Computer Vision and Pattern Recognition
Use this identifier to cite or link to this document: http://hdl.handle.net/20.500.11770/312679