Automatic Big Data Provenance Capture at Middleware Level in Advanced Big Data Frameworks

CUZZOCREA, Alfredo Massimiliano
2017-01-01

Abstract

IoT devices generate huge amounts of data, commonly termed ‘Big Data’, which must be reliably stored and analyzed. Capturing the provenance of such data provides a mechanism to explain the results of data analytics and lends greater trustworthiness to the insights gathered from them. Capturing the provenance of data stored in NoSQL databases helps to explain how the data reached its current state. Combining the provenance information of the data with the results of analytics yields a holistic explanation of those results. This chapter explores the challenges of automatic provenance capture at the middleware level in three contexts: in an analytics framework such as MapReduce, in NoSQL data stores analyzed using the MapReduce framework, and in NoSQL stores with SQL front ends. The chapter also shows how provenance captured in the MapReduce framework is useful, in addition to debugging, for improving future job re-runs and for anomaly detection.
2017
978-3-319-70102-8
Provenance
MapReduce
NoSQL Data Stores
Why Provenance
How Provenance: Foreign Data Wrapper


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/312510