Deep learning is rapidly becoming a strong boost to the already pervasive field of computer vision. State-of-the-art Convolutional Neural Networks reach accuracies comparable to human senses. However, the high computational load and low energy efficiency make their implementation on modern embedded systems hard. In this paper, several strategies for designing fast convolutional engines suitable to hardware accelerate Convolutional Neural Networks are evaluated. When implemented within a complete embedded system based on a Zynq Ultrascale+ SoC device, two of the proposed architectures achieve a peak performance of 131.6 GMAC/s at 234MHz running frequency, by occupying at most ∼13% of the DSP slices available on chip. All the proposed engines overcome state-of-the-art competitors, exhibiting a performance/DSP utilization ratio up to 29.6 times higher.
Designing Fast Convolutional Engines for Deep Learning Applications
Spagnolo, Fanny;Perri, Stefania;Frustaci, Fabio;Corsonello, Pasquale
2018-01-01
Abstract
Deep learning is rapidly becoming a strong boost to the already pervasive field of computer vision. State-of-the-art Convolutional Neural Networks reach accuracies comparable to human senses. However, the high computational load and low energy efficiency make their implementation on modern embedded systems hard. In this paper, several strategies for designing fast convolutional engines suitable to hardware accelerate Convolutional Neural Networks are evaluated. When implemented within a complete embedded system based on a Zynq Ultrascale+ SoC device, two of the proposed architectures achieve a peak performance of 131.6 GMAC/s at 234MHz running frequency, by occupying at most ∼13% of the DSP slices available on chip. All the proposed engines overcome state-of-the-art competitors, exhibiting a performance/DSP utilization ratio up to 29.6 times higher.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.