GPU accelerated initialization of local maximum-entropy meshfree methods for vibrational and acoustic problems
Cosco, F.; Mundo, D.
2020-01-01
Abstract
This paper presents an efficient strategy for the matrix assembly procedure in a Galerkin implementation of local maximum-entropy (LME) meshfree schemes, using graphics processing units (GPUs) as massively parallel accelerators. LME basis functions show excellent performance in the simulation of vibrational and acoustic problems described by the Helmholtz equation. However, even with a locally truncated support, their evaluation requires significantly more neighbors than finite elements and other meshfree methods, which poses several challenges for the computationally efficient allocation and filling of the required sparse matrix structures. The proposed algorithm relies on a clustering strategy and is structured to exploit the massive parallelism of GPU architectures. Numerical examples demonstrate that this strategy yields a substantial performance boost, arising from the combined effect of the higher computational throughput and typically larger memory bandwidth of GPUs compared to conventional CPUs. For the more demanding stage, we report speedups of up to 1035X using a Titan X GPU hosted in a dedicated workstation, and a more modest yet substantial acceleration of up to 91X using a mobile workstation. This opens up the possibility of handling industrially relevant applications not only on dedicated high-performance computing infrastructures but also on commodity hardware.
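As a rough illustration of why LME evaluation involves many neighbors, the Python sketch below evaluates one-dimensional LME basis functions at a single point using the standard LME construction: a Gaussian prior with locality parameter β = γ/h², plus a Lagrange multiplier λ, found by Newton's method, that makes the basis reproduce linear fields. The function name `lme_basis_1d`, the values of γ and of the truncation tolerance, and the uniform node grid are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def lme_basis_1d(x, nodes, h, gamma=0.8, tol=1e-6, max_iter=50):
    """Evaluate 1D local maximum-entropy (LME) basis functions at point x.

    Illustrative assumptions (not taken from the paper): uniform nodal
    spacing h, locality parameter beta = gamma / h**2, and a support
    truncated where the Gaussian prior exp(-beta * r**2) drops below tol.
    Returns the indices of the neighbor nodes and the basis values there.
    """
    beta = gamma / h**2
    # Truncation radius: exp(-beta * r**2) >= tol  <=>  r <= sqrt(-ln(tol) / beta)
    radius = np.sqrt(-np.log(tol) / beta)
    idx = np.nonzero(np.abs(nodes - x) <= radius)[0]  # neighbor list of x
    dx = x - nodes[idx]                                # x - x_a for each neighbor

    lam = 0.0  # Lagrange multiplier enforcing first-order consistency
    for _ in range(max_iter):
        # Basis values p_a = exp(-beta*dx_a**2 + lam*dx_a) / Z(x, lam)
        w = np.exp(-beta * dx**2 + lam * dx)
        p = w / w.sum()
        # Gradient of log Z w.r.t. lam; it vanishes when sum_a p_a x_a = x
        r = p @ dx
        if abs(r) < 1e-12:
            break
        # Hessian of log Z (a weighted variance, positive for >= 2 neighbors)
        J = p @ dx**2 - r**2
        lam -= r / J  # Newton step on the convex function log Z
    return idx, p


if __name__ == "__main__":
    nodes = np.linspace(0.0, 1.0, 11)  # uniform grid with h = 0.1
    idx, p = lme_basis_1d(0.43, nodes, h=0.1)
    print(len(idx), p.sum(), p @ nodes[idx])
```

With these assumed parameters, an interior point has 8 neighbors in 1D, against 2 nonzero basis functions for linear finite elements; in 3D the neighbor count grows roughly with the cube of the truncation radius (about 4.2h here), which is why allocating and filling the Galerkin sparse matrices becomes costly. The sketch evaluates a single point on the CPU and does not reproduce the GPU clustering algorithm described in the paper.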