Supporting sequential pattern mining from data streams is nowadays a relevant problem in the area of data stream mining research. Actual proposals available in the literature are based on the well-known PrefixSpan approach and are, indeed, able to effectively bound the error of discovered patterns. This approach foresees the idea of dividing the target stream in a collection of manageable chunks, i.e., pieces of stream, in order to gain into effectiveness and efficiency. Unfortunately, mining patterns from stream chunks indeed introduce additional errors with respect to the basic application scenario where the target stream is mined continuously, in a non-batch manner. This is due to several reasons. First, since batches are processed individually, patterns that contain items from two consecutive batches are lost. Secondly, in most batch-based approaches, the decision about the frequency of a pattern is done locally inside a single batch. Thus, if a pattern is frequent in the stream but its items are scattered over different batches, it will be continuously pruned out and will never become frequent due to the algorithm’s lack of the “complete-picture” perspective. In order to address so-delineated pattern mining problems, this paper introduces and experimentally assesses BFSPMiner, a Batch-Free Sequential Pattern Miner algorithm for effectively and efficiently mining patterns in streams without being constrained to the traditional batch-based processing. This allows us, for instance, to discover frequent patterns that would be lost according to alternative batch-based stream mining processing models. We complement our analytical contributions by means of a comprehensive experimental campaign of BFSPMiner against real-world data stream sets and in comparison with current batch-based stream sequential pattern mining algorithms.

BFSPMiner: an effective and efficient batch-free algorithm for mining sequential patterns over data streams

Cuzzocrea A.;
2019-01-01

Abstract

Supporting sequential pattern mining from data streams is nowadays a relevant problem in the area of data stream mining research. Actual proposals available in the literature are based on the well-known PrefixSpan approach and are, indeed, able to effectively bound the error of discovered patterns. This approach foresees the idea of dividing the target stream in a collection of manageable chunks, i.e., pieces of stream, in order to gain into effectiveness and efficiency. Unfortunately, mining patterns from stream chunks indeed introduce additional errors with respect to the basic application scenario where the target stream is mined continuously, in a non-batch manner. This is due to several reasons. First, since batches are processed individually, patterns that contain items from two consecutive batches are lost. Secondly, in most batch-based approaches, the decision about the frequency of a pattern is done locally inside a single batch. Thus, if a pattern is frequent in the stream but its items are scattered over different batches, it will be continuously pruned out and will never become frequent due to the algorithm’s lack of the “complete-picture” perspective. In order to address so-delineated pattern mining problems, this paper introduces and experimentally assesses BFSPMiner, a Batch-Free Sequential Pattern Miner algorithm for effectively and efficiently mining patterns in streams without being constrained to the traditional batch-based processing. This allows us, for instance, to discover frequent patterns that would be lost according to alternative batch-based stream mining processing models. We complement our analytical contributions by means of a comprehensive experimental campaign of BFSPMiner against real-world data stream sets and in comparison with current batch-based stream sequential pattern mining algorithms.
2019
Batch-free
Data streams
Sequential pattern mining
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11770/312331
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 18
  • ???jsp.display-item.citation.isi??? 14
social impact