A Performance Analysis of Leading Many-Core Technologies for Cellular Automata Execution
De Rango A.; D'Ambrosio D.; Senatore A.; Mendicino G.
2024-01-01
Abstract
We extend the panorama of performance analyses of CUDA, OpenCL and SYCL for the execution of Cellular Automata. To this end, we apply the SciddicaT landslide model to a real event, considering two complex topographic surfaces of different granularity and thus two simulations with different computational loads. For each technology, we developed one global-memory implementation and two tiled implementations of SciddicaT, using the Nvidia nvcc compiler for CUDA, the Nvidia implementation of the OpenCL standard, and the CUDA back-end of the Intel DPC++ compiler for SYCL. The experiments, performed on three Nvidia accelerators, show that SYCL performance ranges from good to optimal compared to CUDA, improving on newer device architectures. The Roofline analysis reveals strong cache effects and indicates that tiled implementations are more advantageous on older architectures.
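For readers unfamiliar with the Roofline analysis mentioned in the abstract, the standard formulation (general background, not taken from this paper) bounds the attainable performance $P$ of a kernel by the device's peak compute throughput $P_{\text{peak}}$ and by its peak memory bandwidth $B_{\text{peak}}$ scaled by the kernel's arithmetic intensity $I$, i.e. the work $W$ (floating-point operations) per byte of traffic $Q$ moved to or from a given memory level. In hierarchical variants of the model, $Q$ and $B_{\text{peak}}$ are measured separately per level (for example DRAM, L2, L1/shared memory); how the paper applies this is not stated here, so treat this as a sketch of the general model only.

```latex
\[
  I = \frac{W}{Q}, \qquad
  P \le \min\!\left( P_{\text{peak}},\; I \cdot B_{\text{peak}} \right)
\]
```

A kernel whose intensity measured at the DRAM level is much higher than at the L1 level is obtaining substantial reuse from the caches, which is one way such cache effects become visible in a Roofline plot.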