A compression-based framework for the efficient analysis of business process logs

Fazzinga, B.; Flesca, Sergio; Furfaro, Filippo
2015-01-01

Abstract

The increasing availability of large process log repositories calls for efficient solutions for their analysis. To this end, a novel specialized compression technique for process logs is proposed. It builds a synopsis that supports fast estimation of aggregate queries, which are of crucial importance in exploratory and high-level analysis tasks. The synopsis is constructed by progressively merging the original log tuples, each representing a single activity execution within a process instance, into aggregate tuples that summarize sets of activity executions. The compression strategy is guided by a heuristic that aims to limit the loss of information caused by summarization, while guaranteeing that no information is lost on the set of activities performed within the process instances or on the order among their executions. The selection conditions of an aggregate query are specified as a graph pattern, which allows precedence relationships over activity executions to be expressed, along with conditions on their starting times, durations, and executors. The efficacy of the compression technique, in terms of both its ability to reduce the size of the log and the accuracy of the estimates retrieved from the synopsis, has been experimentally validated.
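To make the idea of a synopsis concrete, the following is a minimal sketch and not the technique proposed in the paper. It assumes a much simpler merging rule (only consecutive executions of the same activity within one process instance are merged, which keeps the set of activities and their relative order) and a uniform-spread assumption when estimating a count query. All names (`LogTuple`, `AggregateTuple`, `merge_runs`, `estimate_count`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LogTuple:
    """One activity execution within a process instance (hypothetical schema)."""
    instance: str
    activity: str
    start: float
    duration: float
    executor: str


@dataclass
class AggregateTuple:
    """Summary of one or more executions of the same activity in one instance."""
    instance: str
    activity: str
    count: int
    min_start: float
    max_start: float
    total_duration: float
    executors: frozenset


def merge_runs(log):
    """Merge consecutive same-activity executions of each instance.

    Assumption: this fixed rule stands in for the paper's heuristic, which
    is not described in the abstract.
    """
    synopsis = []
    for t in sorted(log, key=lambda t: (t.instance, t.start)):
        last = synopsis[-1] if synopsis else None
        if last and last.instance == t.instance and last.activity == t.activity:
            last.count += 1
            last.max_start = max(last.max_start, t.start)
            last.total_duration += t.duration
            last.executors = last.executors | {t.executor}
        else:
            synopsis.append(AggregateTuple(
                t.instance, t.activity, 1, t.start, t.start,
                t.duration, frozenset({t.executor})))
    return synopsis


def estimate_count(synopsis, activity, start_lo, start_hi):
    """Estimate how many executions of `activity` started in [start_lo, start_hi].

    Assumption: starts are spread uniformly over [min_start, max_start]
    of each aggregate tuple, so the result is an estimate, not an exact count.
    """
    total = 0.0
    for a in synopsis:
        if a.activity != activity:
            continue
        if a.min_start == a.max_start:
            if start_lo <= a.min_start <= start_hi:
                total += a.count
        else:
            overlap = min(start_hi, a.max_start) - max(start_lo, a.min_start)
            if overlap > 0:
                total += a.count * overlap / (a.max_start - a.min_start)
    return total


log = [
    LogTuple("p1", "A", 0.0, 2.0, "ann"),
    LogTuple("p1", "A", 4.0, 1.0, "bob"),
    LogTuple("p1", "B", 6.0, 3.0, "ann"),
    LogTuple("p2", "A", 1.0, 1.0, "cat"),
]
synopsis = merge_runs(log)
print(len(log), "log tuples ->", len(synopsis), "aggregate tuples")
print(estimate_count(synopsis, "A", 0.0, 2.0))
```

The sketch trades accuracy for size in the same spirit as the abstract: individual start times are lost once tuples are merged, so range conditions can only be answered approximately. It does not cover graph-pattern selection conditions.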
2015
ISBN: 978-145033709-0

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/180065