Performance Analysis and Optimization of the CUDA Implementation of the Three-Dimensional Subsurface XCA-Flow Cellular Automaton

De Rango A.; Furnari L.; Senatore A.; Mendicino G.; Giordano A.; MacRi D.; Utrera G.; D'Ambrosio D.

2023-01-01

Abstract

We present the results of a performance assessment and optimisation of the CUDA implementation of the three-dimensional XCA-Flow subsurface Extended Cellular Automata model. As the benchmark, we used a ten-day simulation already adopted in previous works, characterised by a constant infiltration rate and a heterogeneous hydraulic conductivity field. We ran the experiments on an Nvidia V100 high-performance many-core device. We analysed essential aspects of the XCA-Flow model by updating its kernels. As a first step, we applied classical tiling/shared-memory techniques to the stencil-based and reduction kernels. The results suggested that a more thorough analysis of the model was needed. This analysis, driven by both theoretical and experimental assessments, pointed out the need to increase the achieved warp occupancy to speed up the computation. The resulting general redesign of the application yielded a 20.3% mean performance gain (over the CUDA block configurations considered). We also performed two Roofline analyses to characterise the kernels of the original and improved implementations in terms of arithmetic intensity and performance. Besides the improved performance, we obtained meaningful insights into the CUDA implementation of the XCA-Flow model that could, in principle, allow for further optimisations.
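For readers unfamiliar with the Roofline model mentioned in the abstract, the following is a minimal sketch of the bound it expresses: a kernel's attainable performance is the smaller of the device's compute peak and its arithmetic intensity multiplied by the memory bandwidth. The function names and the device figures are assumptions chosen for illustration only. They are not taken from the paper, and the actual peak values depend on the specific V100 variant and precision used.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound in GFLOP/s.

    arithmetic_intensity: FLOPs per byte moved to/from device memory.
    peak_gflops: compute roof of the device (GFLOP/s).
    bandwidth_gbs: memory roof of the device (GB/s).
    """
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (FLOP/byte) at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gbs


# Hypothetical device figures, for illustration only (not from the paper).
PEAK_GFLOPS = 7000.0
BANDWIDTH_GBS = 900.0

if __name__ == "__main__":
    # A low-intensity kernel (0.25 FLOP/byte, a hypothetical stencil-like value)
    # sits under the memory roof, far from the compute peak.
    print(attainable_gflops(0.25, PEAK_GFLOPS, BANDWIDTH_GBS))
    print(ridge_point(PEAK_GFLOPS, BANDWIDTH_GBS))
```

Under these assumed figures, any kernel whose arithmetic intensity lies below the ridge point is limited by memory traffic rather than by arithmetic throughput, which is the typical situation for stencil-based kernels.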

Year: 2023
ISBN: 979-8-3503-3763-1
Keywords: CUDA Performance Assessment; 3D Structured Grid; Stencil; XCA-Flow Subsurface Model
Appears in types: 4.1 Contribution in conference proceedings

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/355597
