Run-time adaptive hardware accelerator for convolutional neural networks

IRIS

State-of-the-art Convolutional Neural Networks are characterized by heterogeneous convolutional layers to proper balance accuracy and computational complexity. Run-time adaptive convolution architectures able to process feature maps with kernels of various sizes and strides are highly desirable to achieve a favorable speed/power dissipation balance. This paper presents the design of an adaptive architecture able to manage efficiently convolutional layers with different running parameters. In order to guarantee high resources utilization for all the supported kernel sizes and strides, in contrast with existing competitors, the proposed design combines non-uniform basic blocks differently customized from each other. As a further nice characteristic, the hardware architecture here presented efficiently manages both odd and even kernel sizes, useful in models also requiring transposed convolutional layers. When accommodated within a Xilinx XC7Z045 FPGA SoC device, the proposed engine reaches a peak throughput of 217.2 GOPS and dissipates about 2.75 W at the 150 MHz clock frequency.

Run-time adaptive hardware accelerator for convolutional neural networks

Sestito C.;Spagnolo F.;Corsonello P.;Perri S.

2021-01-01

Abstract

State-of-the-art Convolutional Neural Networks are characterized by heterogeneous convolutional layers to proper balance accuracy and computational complexity. Run-time adaptive convolution architectures able to process feature maps with kernels of various sizes and strides are highly desirable to achieve a favorable speed/power dissipation balance. This paper presents the design of an adaptive architecture able to manage efficiently convolutional layers with different running parameters. In order to guarantee high resources utilization for all the supported kernel sizes and strides, in contrast with existing competitors, the proposed design combines non-uniform basic blocks differently customized from each other. As a further nice characteristic, the hardware architecture here presented efficiently manages both odd and even kernel sizes, useful in models also requiring transposed convolutional layers. When accommodated within a Xilinx XC7Z045 FPGA SoC device, the proposed engine reaches a peak throughput of 217.2 GOPS and dissipates about 2.75 W at the 150 MHz clock frequency.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2021
			
	Parole chiave
	
				Convolutional neural networks (CNN)
Field programmable gate array (FPGA)
Heterogeneous embedded systems
Reconfigurable hardware architecture
			
	Appare nelle tipologie:
	
				4.1 Contributo in Atti di convegno

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11770/328595

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

2

ND

social impact