Evaluating aggregate range queries by accessing a compressed representation of the data is a widely adopted solution to the problem of efficiently retrieving aggregate information from large amounts of data. Although several summarization techniques have been proposed which are effective in reducing the amount of time needed for computing aggregates, querying summary data often results in dramatically inaccurate estimates, due to the difficulty of limiting the loss of information resulting from data compression. Thus, a crucial issue regarding the definition of summarization techniques is to retain a reasonable degree of approximation in reconstructing query answers. Following the idea that an effective ad-hoc solution to this problem can be found in specific application domains, in this paper we restrict our attention to the case of two-dimensional data, which is relevant for a number of applications. Our proposal is a summarization technique where blocks of data resulting from a quad-tree based partition of the two-dimensional domain are summarized into aggregate values and possibly associated with indices, i.e., compact structures providing an approximate description the original data inside them. Several experimental results are presented showing that our technique results in data synopses providing query estimates having error rates lower than other techniques tailored at data with a generic dimensionality, such as wavelets and various types of multi-dimensional histogram.

A quad-tree based multiresolution approach for two-dimensional summary data

FURFARO, Filippo;SACCA', Domenico
2011-01-01

Abstract

Evaluating aggregate range queries by accessing a compressed representation of the data is a widely adopted solution to the problem of efficiently retrieving aggregate information from large amounts of data. Although several summarization techniques have been proposed which are effective in reducing the amount of time needed for computing aggregates, querying summary data often results in dramatically inaccurate estimates, due to the difficulty of limiting the loss of information resulting from data compression. Thus, a crucial issue regarding the definition of summarization techniques is to retain a reasonable degree of approximation in reconstructing query answers. Following the idea that an effective ad-hoc solution to this problem can be found in specific application domains, in this paper we restrict our attention to the case of two-dimensional data, which is relevant for a number of applications. Our proposal is a summarization technique where blocks of data resulting from a quad-tree based partition of the two-dimensional domain are summarized into aggregate values and possibly associated with indices, i.e., compact structures providing an approximate description the original data inside them. Several experimental results are presented showing that our technique results in data synopses providing query estimates having error rates lower than other techniques tailored at data with a generic dimensionality, such as wavelets and various types of multi-dimensional histogram.
2011
database; multidimensional data; datacube compression
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11770/149521
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 16
  • ???jsp.display-item.citation.isi??? 8
social impact