Runtime Reconfigurable Hardware Accelerator for Energy-Efficient Transposed Convolutions
Spagnolo F.; Perri S.
2022-01-01
Abstract
Transposed convolution is a crucial operation in several computer vision applications, including emerging Convolutional Neural Networks for super-resolution, generative adversarial, and segmentation tasks. Such algorithms involve high computational loads and memory requirements, which hinder their deployment in real-time and power-constrained embedded systems. In addition, they may adopt different kernel sizes along the network, making the design of flexible yet efficient hardware architectures highly desirable. This paper presents a reconfigurable accelerator able to adapt its computational capabilities at runtime to perform transposed convolutions with different kernel sizes. When accommodated within the Xilinx XC7Z020 and XC7K410T chips, the proposed design dissipates less than 95 mW at 125 MHz and 179 mW at 250 MHz, exhibiting throughputs of 1.95 and 3.9 Giga outputs per second, respectively. Both implementations outperform state-of-the-art counterparts, achieving an energy efficiency up to 4.4 times higher. When used to accelerate the Fast Super-Resolution Convolutional Neural Network, the novel reconfigurable architecture achieves an energy efficiency at least 23% better than its competitors.
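As background for the operation being accelerated, transposed convolution upsamples a feature map by scattering each input sample, scaled by the kernel weights, into a strided window of the output. The following minimal single-channel NumPy sketch illustrates only this scatter-based formulation; the function name, stride, and kernel values are illustrative assumptions and do not reproduce the hardware architecture described in the paper.

import numpy as np

def transposed_conv2d(x, kernel, stride=2):
    # Scatter-based 2-D transposed convolution (single channel, no padding):
    # each input pixel multiplies the whole kernel, and the result is
    # accumulated into a stride-spaced window of the enlarged output.
    h, w = x.shape
    k = kernel.shape[0]
    out_h = (h - 1) * stride + k
    out_w = (w - 1) * stride + k
    y = np.zeros((out_h, out_w), dtype=x.dtype)
    for i in range(h):
        for j in range(w):
            y[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * kernel
    return y

# Example: a 3x3 input with a 3x3 kernel and stride 2 yields a 7x7 output,
# since each output side is (H - 1) * stride + k.
x = np.arange(9, dtype=np.float32).reshape(3, 3)
k = np.ones((3, 3), dtype=np.float32)
print(transposed_conv2d(x, k, stride=2).shape)  # (7, 7)

When the stride is smaller than the kernel size, neighbouring windows overlap and their contributions must be accumulated; handling this efficiently across several kernel sizes is the kind of flexibility the proposed runtime-reconfigurable accelerator targets.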