Runtime Reconfigurable Hardware Accelerator for Energy-Efficient Transposed Convolutions

Spagnolo F.; Perri S.
2022-01-01

Abstract

Transposed convolution is a crucial operation in several computer vision applications, including emerging Convolutional Neural Networks for super-resolution, generative adversarial, and segmentation tasks. Such algorithms involve high computational loads and memory requirements, which hinder their deployment in real-time and power-constrained embedded systems. In addition, they may adopt different kernel sizes across the network, making the design of flexible yet efficient hardware architectures highly desirable. This paper presents a reconfigurable accelerator able to adapt its computational capabilities at runtime to perform transposed convolutions with different kernel sizes. When accommodated within the Xilinx XC7Z020 and XC7K410T chips, the proposed design dissipates less than 95 mW at 125 MHz and 179 mW at 250 MHz, exhibiting throughputs of 1.95 and 3.9 Giga outputs per second, respectively. Both implementations outperform state-of-the-art counterparts, achieving an energy efficiency up to 4.4 times higher. When used to accelerate the Fast Super-Resolution Convolutional Neural Network, the novel reconfigurable architecture achieves an energy efficiency at least 23% better than its competitors.
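For readers unfamiliar with the operation the abstract refers to, the sketch below shows a plain transposed convolution with a configurable kernel size, which is the computation the accelerator targets. It is a minimal NumPy reference under stated assumptions (function name, stride, and the no-padding choice are illustrative), not the paper's hardware dataflow.

```python
import numpy as np

def transposed_conv2d(x, w, stride=2):
    """Minimal single-channel transposed convolution (no padding).

    x: (H, W) input feature map; w: (K, K) kernel; stride: upsampling factor.
    Illustrative sketch only -- not the accelerator's datapath.
    """
    H, W = x.shape
    K = w.shape[0]
    # Output size of an unpadded transposed convolution: (H - 1) * stride + K
    y = np.zeros(((H - 1) * stride + K, (W - 1) * stride + K))
    for i in range(H):
        for j in range(W):
            # Each input pixel scatters a scaled copy of the kernel into the output
            y[i * stride:i * stride + K, j * stride:j * stride + K] += x[i, j] * w
    return y

# Example: a 3x3 input upsampled with a 5x5 kernel (kernel size chosen arbitrarily
# to illustrate the "different kernel sizes" the accelerator reconfigures for)
y = transposed_conv2d(np.ones((3, 3)), np.ones((5, 5)), stride=2)
print(y.shape)  # (9, 9)
```

As a rough check on the reported figures, dividing throughput by power gives about 1.95 Goutputs/s / 95 mW ≈ 20.5 and 3.9 Goutputs/s / 179 mW ≈ 21.8 Giga outputs per joule for the two implementations (lower bounds, since the powers are stated as upper bounds).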
2022
Convolutional Neural Networks
energy-efficient designs
FPGA
runtime reconfiguration
transposed convolution
Files in this record:
There are no files associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/370838
Warning

The displayed data have not been validated by the university.

Citations
  • PMC: ND
  • Scopus: 1
  • Web of Science: 1