Hierarchical binary partitions of multi-dimensional data are investigated as a basis for the construction of effective histograms. Specifically, the impact of adopting lossless compression techniques for representing the histogram on both the accuracy and the efficiency of query answering is investigated. Compression is obtained by exploiting the hierarchical partition scheme underlying the histogram, and then introducing further restrictions on the partitioning which enable a more compact representation of bucket boundaries. Basically, these restrictions consist of constraining the splits of the partition to be laid onto regular grids defined on the buckets. Several heuristics guiding the histogram construction are also proposed, and a thorough experimental analysis comparing the accuracy of histograms resulting from combining different heuristics with different representation models (both the new compression-based and the traditional ones) is provided. The best accuracy turns out from combining our grid-constrained partitioning scheme with one of the new heuristics. Histograms resulting from this combination are compared with state-of-the-art summarization techniques, showing that the proposed approach yields lower error rates and is much less sensitive to dimensionality, and that adopting our compression scheme results in improving the efficiency of query estimation.

Compressed hierarchical binary histograms for summarizing multi-dimensional data

FURFARO, Filippo;SACCA', Domenico;
2008-01-01

Abstract

Hierarchical binary partitions of multi-dimensional data are investigated as a basis for the construction of effective histograms. Specifically, the impact of adopting lossless compression techniques for representing the histogram on both the accuracy and the efficiency of query answering is investigated. Compression is obtained by exploiting the hierarchical partition scheme underlying the histogram, and then introducing further restrictions on the partitioning which enable a more compact representation of bucket boundaries. Basically, these restrictions consist of constraining the splits of the partition to be laid onto regular grids defined on the buckets. Several heuristics guiding the histogram construction are also proposed, and a thorough experimental analysis comparing the accuracy of histograms resulting from combining different heuristics with different representation models (both the new compression-based and the traditional ones) is provided. The best accuracy turns out from combining our grid-constrained partitioning scheme with one of the new heuristics. Histograms resulting from this combination are compared with state-of-the-art summarization techniques, showing that the proposed approach yields lower error rates and is much less sensitive to dimensionality, and that adopting our compression scheme results in improving the efficiency of query estimation.
2008
Data reduction; Multi-dimensional data; Query processing
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11770/128470
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 16
  • ???jsp.display-item.citation.isi??? 11
social impact