Run-time adaptive hardware accelerator for convolutional neural networks

Sestito C.; Spagnolo F.; Corsonello P.; Perri S.
2021-01-01

Abstract

State-of-the-art Convolutional Neural Networks are characterized by heterogeneous convolutional layers to properly balance accuracy and computational complexity. Run-time adaptive convolution architectures able to process feature maps with kernels of various sizes and strides are highly desirable to achieve a favorable trade-off between speed and power dissipation. This paper presents the design of an adaptive architecture able to efficiently manage convolutional layers with different run-time parameters. To guarantee high resource utilization for all the supported kernel sizes and strides, the proposed design, in contrast with existing competitors, combines non-uniform basic blocks, each customized differently. As a further benefit, the presented hardware architecture efficiently manages both odd and even kernel sizes, which is useful in models that also require transposed convolutional layers. When implemented within a Xilinx XC7Z045 FPGA SoC device, the proposed engine reaches a peak throughput of 217.2 GOPS and dissipates about 2.75 W at a 150 MHz clock frequency.
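
The headline figures in the abstract can be related to each other with simple arithmetic. The sketch below is only an illustration: it derives the number of operations per clock cycle and the energy efficiency implied by the reported peak throughput, clock frequency, and power. The function names are hypothetical, and the convention that one multiply-accumulate (MAC) counts as two operations is an assumption, since the abstract does not state how operations are counted. The power figure is reported as approximate ("about 2.75 W"), so the efficiency value is approximate as well.

```python
def ops_per_cycle(peak_gops: float, clock_mhz: float) -> float:
    """Operations completed per clock cycle implied by a peak throughput."""
    return (peak_gops * 1e9) / (clock_mhz * 1e6)


def energy_efficiency(peak_gops: float, power_w: float) -> float:
    """Peak throughput per watt, in GOPS/W."""
    return peak_gops / power_w


# Figures quoted in the abstract.
PEAK_GOPS = 217.2
CLOCK_MHZ = 150.0
POWER_W = 2.75  # reported as "about 2.75 W"

ops = ops_per_cycle(PEAK_GOPS, CLOCK_MHZ)
# Assumption (not stated in the abstract): 1 MAC = 2 operations.
macs = ops / 2
eff = energy_efficiency(PEAK_GOPS, POWER_W)

print(f"operations per cycle: {ops:.1f}")
print(f"MACs per cycle (assuming 2 ops per MAC): {macs:.1f}")
print(f"energy efficiency: {eff:.1f} GOPS/W")
```

Under these assumptions, the peak figure corresponds to roughly 1448 operations per cycle, or about 724 parallel MACs if the two-operations-per-MAC convention holds.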
Convolutional neural networks (CNN)
Field programmable gate array (FPGA)
Heterogeneous embedded systems
Reconfigurable hardware architecture
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/328595