Efficient implementation of signed multipliers on FPGAs

Spagnolo, Fanny; Corsonello, Pasquale; Frustaci, Fabio; Perri, Stefania
2024-01-01

Abstract

This paper presents a simple but effective strategy for implementing signed binary multipliers on Field Programmable Gate Arrays (FPGAs). It is based on radix-4 Booth encoding logic but adopts an unconventional sequence of logic operations that allows hardware resources to be exploited more efficiently. The main idea is to first generate incorrect partial products, which simplifies the encoding logic, and then to correct them in the subsequent computational steps without requiring additional logic resources. The approach uses both the look-up tables (LUTs) and the fast carry-chains (CCs) available within FPGA devices. When implemented on a Xilinx xc7v585ttfg1157-3 device, a 32 × 32 multiplier designed as proposed here reaches a maximum running frequency of ∼217 MHz, consumes only ∼216 pJ per operation, and uses 792 LUTs and 147 CCs. When used to build a 3 × 3 Multiply-Accumulate unit, the proposed approach, compared with the lowest-energy competitor, reduces the energy-delay product by 5.2 % and the number of utilized LUTs and carry-chains by 25.2 % and 41.4 %, respectively.
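For readers unfamiliar with the starting point, the Python sketch below shows conventional, textbook radix-4 Booth multiplication. It is offered only as background and is not the unconventional operation sequence proposed in the paper, whose details are not given in this record. The helper names `to_signed`, `booth_radix4_digits` and `booth_multiply` are hypothetical. Each pair of multiplier bits is recoded into a digit in {-2, -1, 0, 1, 2}, so an n-bit multiplier yields n/2 partial products (16 for a 32 × 32 multiplier) instead of n. The sketch also shows a well-known generic form of deferred correction: a negative partial product is emitted as a one's complement, which is off by one, and the missing +1 is added back later. This should not be read as the authors' specific correction scheme.

```python
def to_signed(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a two's-complement integer."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Recode an n-bit two's-complement multiplier y (n even) into radix-4 Booth digits.

    Digit i is -2*y[2i+1] + y[2i] + y[2i-1], with y[-1] = 0, so every digit
    lies in {-2, -1, 0, 1, 2} and y == sum(d * 4**i for i, d in enumerate(digits)).
    """
    if n % 2:
        raise ValueError("n must be even")
    bits = y & ((1 << n) - 1)  # two's-complement bit pattern of y
    digits, prev = [], 0       # prev holds y[2i-1]
    for i in range(0, n, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(x: int, y: int, n: int) -> int:
    """Signed n x n multiply by summing radix-4 Booth partial products (2n-bit result)."""
    width = 2 * n
    mask = (1 << width) - 1
    acc = 0
    for i, d in enumerate(booth_radix4_digits(y, n)):
        multiple = abs(d) * x  # 0, x or 2x: a mux plus a wired left shift in hardware
        if d < 0:
            pp = ~multiple & mask  # one's complement only: deliberately off by one
            correction = 1         # the missing +1, injected later at column 2*i
        else:
            pp = multiple & mask
            correction = 0
        # Both terms land at weight 4**i; the final mask keeps the 2n-bit result.
        acc += (pp << (2 * i)) + (correction << (2 * i))
    return to_signed(acc, width)
```

For example, `booth_multiply(-7, 5, 4)` returns -35. The sketch keeps arithmetic modulo 2^(2n) and masks once at the end for clarity; hardware implementations instead rely on sign-extension tricks and map the additions onto adders, which on FPGAs typically means LUTs feeding the fast carry-chains.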
2024
Hardware accelerators; Multipliers; FPGAs; Energy reduction; High performance computing

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/366179