AM5: Bulk Logic-in-Memory Using MRAM NAND Crossbar
Garzon E.; Lanuzza M.
In press
Abstract
Application domains such as machine learning and big data analytics pose significant computational challenges for contemporary von Neumann architectures. To address this issue, logic-in-memory (LiM) has emerged as a promising alternative that performs computation within memory arrays, with the aim of alleviating the memory wall, reducing data transfer, and enabling massive parallelism. Spin-transfer torque magnetic tunnel junction (STT-MTJ) based memory is an emerging technology that enables efficient in-memory processing. This paper proposes AM5, a novel LiM architecture that leverages MRAM NAND crossbar technology to support in-memory arithmetic operations efficiently. The proposed LiM scheme is designed using a commercial 28 nm process node and a Verilog-A-based double-barrier MTJ (DMTJ) compact model. Evaluation results show that AM5 consumes about 98 fJ and 40.7 fJ per DMTJ per evaluation and write cycle, respectively (4.2–4.4 ns and 1.9 ns). Additionally, the proposed architecture incorporates an in-situ error correction mechanism to mitigate variability, yielding reliable arithmetic operations. Compared to other MTJ-based LiM designs, AM5 achieves better energy (∼6× lower, on average) and competitive latency (∼1.3× faster, on average). When used as a LiM unit to perform inference in adder attention Vision Transformer networks, AM5 consumes about one-tenth of the energy required by a processor-centric unit.


