Efficient Architecture for Integral Image Computation on Heterogeneous FPGAs
Spagnolo Fanny; Corsonello P.; Perri Stefania
2019-01-01
Abstract
The integral image (IIM) is an intermediate image representation employed in several computer vision algorithms. Although computing an IIM requires only simple arithmetic operations, the total number of additions increases quadratically with the input image size. For this reason, the design of hardware architectures able to accelerate IIM computation has received a great deal of attention. Unfortunately, existing solutions are not suitable for integration within high-performance embedded systems, which are currently realized on modern heterogeneous CPU-FPGA Systems-on-Chip (SoCs). In this paper, we present a novel hardware architecture for accelerating IIM computation. The proposed design outperforms existing competitors by parallelizing operations along both the rows and the columns of the input image. Experiments conducted on a Zynq-7000 XC7Z020 SoC demonstrate that the novel accelerator achieves a speed per computation unit up to 124 times higher than prior works, while saving more than 70 Mbits of on-chip memory resources at 1920×1080 frame resolution.
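
For readers unfamiliar with the representation, the following is a minimal software sketch of the standard serial integral image recurrence, in which each output entry holds the sum of all input pixels above and to the left of it, inclusive. It is not the hardware architecture described in the abstract, and the function names `integral_image` and `box_sum` are illustrative assumptions. The sketch uses roughly two additions per pixel, which is the sequential cost that a design parallelized along rows and columns would aim to reduce.

```python
def integral_image(img):
    """Return the integral image of a 2-D list of numbers.

    ii[y][x] is the sum of img[r][c] for all r <= y and c <= x.
    Illustrative serial version, not the accelerator from the paper.
    """
    rows = len(img)
    cols = len(img[0]) if rows else 0
    ii = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        row_sum = 0  # running sum of the current row
        for x in range(cols):
            row_sum += img[y][x]                   # first addition per pixel
            above = ii[y - 1][x] if y > 0 else 0   # entry directly above
            ii[y][x] = row_sum + above             # second addition per pixel
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of the input pixels in the inclusive rectangle, from four lookups."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total
```

Once the integral image is available, the sum over any rectangular region costs at most four lookups regardless of the region's size, which is why the representation is used as an intermediate step in many vision algorithms.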