
GPU accelerated initialization of local maximum-entropy meshfree methods for vibrational and acoustic problems

Cosco, F.; Mundo, D.
2020-01-01

Abstract

This paper presents an efficient strategy for matrix assembly in a Galerkin implementation of local maximum-entropy (LME) meshfree schemes, using graphics processing units (GPUs) as massively parallel accelerators. LME basis functions perform very well in the simulation of vibrational and acoustic problems described by the Helmholtz equation. However, even with a locally truncated support, their evaluation requires significantly more neighbors than finite elements and other meshfree methods, which makes it challenging to allocate and fill the required sparse matrix structures efficiently. The proposed algorithm relies on a clustering strategy and is structured to exploit the massive parallelism of GPU architectures. Numerical examples demonstrate a substantial performance boost, which derives from the combined effect of the higher computational throughput and typically larger memory bandwidth of GPUs compared to conventional CPUs. For the more demanding stage, we report speedups of up to 1035X with a Titan X GPU hosted in a dedicated workstation, and a more modest yet substantial acceleration of up to 91X on a mobile workstation. These results open up the possibility of handling industrially relevant applications not only on dedicated high-performance computing infrastructures but also on commodity hardware.
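To illustrate why LME basis functions involve many neighbors per evaluation point, here is a minimal sketch. It is a plain CPU, one-dimensional illustration of the usual LME formulation, not the GPU assembly algorithm of the paper. The function names, the Newton solver settings, the cutoff, and the choice of locality parameter are all assumptions made for this example.

```python
import math


def lme_basis_1d(x, nodes, beta, tol=1e-12, max_iter=50):
    """Evaluate 1D LME basis functions at x (hypothetical helper).

    p_a(x) = exp(-beta*(x - x_a)**2 + lam*(x - x_a)) / Z, where lam is
    chosen so that sum_a p_a(x) * (x - x_a) = 0 (first-order consistency).
    """
    d = [x - xa for xa in nodes]  # signed distances to every node

    def weights(lam):
        w = [math.exp(-beta * di * di + lam * di) for di in d]
        z = sum(w)  # partition function Z(x, lam)
        return [wi / z for wi in w]

    lam = 0.0
    for _ in range(max_iter):
        p = weights(lam)
        # Residual of the consistency constraint (gradient of log Z).
        r = sum(pi * di for pi, di in zip(p, d))
        if abs(r) < tol:
            break
        # Second derivative of log Z with respect to lam (a variance).
        j = sum(pi * di * di for pi, di in zip(p, d)) - r * r
        lam -= r / j  # Newton update
    return weights(lam)


def count_neighbors(p, cutoff=1e-6):
    """Count nodes whose basis value exceeds an assumed truncation cutoff."""
    return sum(1 for pi in p if pi > cutoff)
```

For a uniform grid with spacing `h`, a common choice is `beta = gamma / h**2`. For example, calling `lme_basis_1d(0.4, [0.0, 0.25, 0.5, 0.75, 1.0], 1.8 / 0.25**2)` returns values that sum to one and reproduce `x` as a weighted average of the node positions. The number of entries above the cutoff exceeds the two nonzero shape functions of a linear finite element at the same point, and each such neighbor contributes to the sparse matrix row.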
2020
Vibrational and acoustic analysis; Maximum-entropy; Meshfree; Matrix assembly; GPU acceleration; CUDA

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/304561