A Global Paradigm for Designing Parallel Relational Data Warehouses in Distributed Environments
Cuzzocrea Alfredo
2014-01-01
Abstract
Designing a Parallel Relational Data Warehouse (PRDW) involves a set of tasks: (i) choosing the hardware architecture; (ii) fragmenting the data warehouse schema; (iii) allocating the generated fragments; (iv) replicating fragments to ensure high performance; and (v) defining the strategies for load balancing and query processing. The major drawback of this life-cycle is that it ignores the inter-dependencies among the sub-problems of PRDW design and relies on heterogeneous metrics to evaluate the “quality” of the final design. In previous research, we introduced an analytical cost model for parallel OLAP query processing in cluster environments. In a subsequent study, we took into account the inter-dependency between fragmentation and allocation. In this paper, we propose a novel methodology, called F&A&R, which extends these results and defines an approach in which the main PRDW design phases (i.e., fragmentation, allocation, and replication) are performed simultaneously, in a global fashion. In particular, our approach determines whether the fragmentation pattern currently generated is relevant to the allocation process. An original method for supporting data replication, based on fuzzy k-means clustering, is also proposed and integrated into the overall design framework. Finally, we experimentally assess the performance of F&A&R on a well-known data warehouse benchmark, with very promising results.
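To give a rough intuition for why fuzzy k-means lends itself to replication, the sketch below runs a textbook fuzzy c-means over a handful of fragments and places a copy of each fragment on every node whose cluster membership exceeds a threshold. This is not the F&A&R algorithm itself, whose details the abstract does not give. The function names `fuzzy_c_means` and `replication_plan`, the per-fragment access-frequency vectors, the threshold of 0.3, and the one-cluster-per-node mapping are all assumptions made for illustration.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=50, seed=0):
    """Textbook fuzzy c-means. Returns (centers, memberships).

    memberships[i][c] is the degree (0..1) to which point i belongs
    to cluster c; each row sums to 1.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(n_iter):
        # Centers are membership-weighted means of the points.
        centers = []
        for c in range(n_clusters):
            weights = [u[i][c] ** m for i in range(len(points))]
            wsum = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / wsum
                for d in range(dim)
            ])
        # Memberships are recomputed from distances to the centers.
        for i, p in enumerate(points):
            dists = [
                max(1e-12, sum((a - b) ** 2 for a, b in zip(p, ctr)) ** 0.5)
                for ctr in centers
            ]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            u[i] = [v / total for v in inv]
    return centers, u


def replication_plan(memberships, threshold=0.3):
    """Hypothetical rule: copy a fragment to every node (cluster) whose
    membership is at least `threshold`; always keep at least one copy."""
    plan = []
    for row in memberships:
        nodes = [c for c, v in enumerate(row) if v >= threshold]
        if not nodes:
            nodes = [max(range(len(row)), key=row.__getitem__)]
        plan.append(nodes)
    return plan


# Hypothetical fragments, each described by how often two query
# classes access it (made-up numbers, not benchmark data).
fragments = [[9, 1], [8, 2], [1, 9], [2, 8], [5, 5]]
centers, memberships = fuzzy_c_means(fragments, n_clusters=2)
plan = replication_plan(memberships, threshold=0.3)
for frag, nodes in zip(fragments, plan):
    print(frag, "-> nodes", nodes)
```

Unlike hard k-means, the soft memberships allow a fragment that is accessed by several query groups to belong to more than one cluster, which is what makes a threshold-based replication rule of this kind possible.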