Efficient addition circuits for modular design of processors-in-memory
CORSONELLO, Pasquale; PERRI, Stefania
2005-01-01
Abstract
This paper presents the design of a new dynamic modular addition circuit optimized for integration into high-speed, low-power processors-in-memory (PIMs). The proposed architecture is based on a hybrid ripple-carry/carry-lookahead/carry-bypass approach. To reach the required computational speed while limiting power dissipation, the circuit is divided into two independent submodules interfaced through dynamic latches. Furthermore, the proposed adder operates in single-instruction multiple-data (SIMD) fashion and can therefore handle different operand wordlengths. Our PIM architecture is based on slices containing 16-bit adders, so the main design goal is to minimize the speed penalty caused by cascading 16-bit blocks. Using a bulk CMOS UMC 0.18-μm 1.8-V process, the optimized version of the proposed 64-bit circuit, built as a ripple chain of four 16-bit blocks, shows a power-delay product of only 38.8 pJ·ns and requires fewer than 4300 transistors.
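As a purely behavioral illustration of the block-level organization described in the abstract, the sketch below models a 64-bit adder assembled from four cascaded 16-bit blocks that can also operate on narrower SIMD lanes. It does not model the hybrid carry logic, the dynamic latches, or any timing. The names `add16` and `simd_add64`, the supported wordlengths (16, 32, 64), and the scheme of suppressing the carry at lane boundaries are assumptions made for illustration and are not taken from the paper.

```python
BLOCK_BITS = 16                      # width of one adder block (from the abstract)
BLOCK_MASK = (1 << BLOCK_BITS) - 1
NUM_BLOCKS = 4                       # four cascaded blocks form the 64-bit adder


def add16(a, b, carry_in):
    """Behavioral model of one 16-bit block: returns (sum, carry_out)."""
    total = a + b + carry_in
    return total & BLOCK_MASK, total >> BLOCK_BITS


def simd_add64(a, b, word_bits=64):
    """Add two 64-bit values as independent lanes of word_bits each.

    word_bits is assumed to be 16, 32, or 64 (hypothetical set of modes).
    """
    if word_bits not in (16, 32, 64):
        raise ValueError("word_bits must be 16, 32, or 64")
    blocks_per_word = word_bits // BLOCK_BITS
    result = 0
    carry = 0
    for i in range(NUM_BLOCKS):
        # Assumed SIMD mechanism: the rippling carry is suppressed
        # whenever a block starts a new lane.
        if i % blocks_per_word == 0:
            carry = 0
        a_blk = (a >> (i * BLOCK_BITS)) & BLOCK_MASK
        b_blk = (b >> (i * BLOCK_BITS)) & BLOCK_MASK
        s, carry = add16(a_blk, b_blk, carry)
        result |= s << (i * BLOCK_BITS)
    return result
```

With `word_bits=64` the carry ripples through all four blocks, whereas with `word_bits=16` each block adds independently and a carry out of one lane never reaches the next.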