SPARCAM: Sparse matrix multiplication accelerator using multi-port dynamic CAM
Garzon, Esteban; Zambrano, Benjamin; Lanuzza, Marco; Teman, Adam
2026-01-01
Abstract
Sparse general matrix multiplication (SpGEMM) is a fundamental kernel in many scientific and engineering fields, including artificial intelligence (AI). However, its intrinsic computational complexity makes efficient hardware implementation particularly difficult. This paper proposes SPARCAM, a novel SpGEMM accelerator developed and optimized for highly energy-efficient AI edge applications. SPARCAM combines low-power, dense gain-cell embedded DRAM (GC-eDRAM) technology, a processing-near-memory paradigm, and a modified outer-product matrix multiplication algorithm. Despite its limited peak theoretical performance, SPARCAM achieves very high energy efficiency owing to its low-power architecture and almost 100% utilization of its computing resources. Designed in a commercial 28 nm FDSOI technology, SPARCAM achieves a 13.9× speedup over a high-performance embedded CPU when processing large-scale sparse matrices. When multiplying limited-size sparse matrices, SPARCAM obtains a 193× speedup over a high-performance GPU. On average, SPARCAM delivers an energy benefit of about 4.3 orders of magnitude, and its energy efficiency (relative to the CPU) is 1892×, 181×, 2×, and 3471× higher than that of the state-of-the-art SpGEMM accelerators SpArch, OuterSPACE, and MatRaptor and a high-performance GPU, respectively.
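For readers unfamiliar with the outer-product formulation mentioned in the abstract, the following is a minimal software sketch of the standard (unmodified) outer-product SpGEMM, in which C = A·B is accumulated as the sum over k of the outer product of column k of A with row k of B. It is not SPARCAM's modified algorithm or its hardware dataflow, which the abstract does not detail. The helper names `to_columns`, `to_rows`, and `spgemm_outer` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def to_columns(dense):
    """Return each column of a dense matrix as a list of (row, value) nonzeros."""
    n_rows, n_cols = len(dense), len(dense[0])
    return [
        [(i, dense[i][k]) for i in range(n_rows) if dense[i][k] != 0]
        for k in range(n_cols)
    ]


def to_rows(dense):
    """Return each row of a dense matrix as a list of (col, value) nonzeros."""
    return [[(j, v) for j, v in enumerate(row) if v != 0] for row in dense]


def spgemm_outer(a_cols, b_rows):
    """Standard outer-product SpGEMM (illustrative, not the paper's variant).

    a_cols[k] holds the nonzeros of column k of A; b_rows[k] holds the
    nonzeros of row k of B. Each k contributes one sparse outer product,
    and partial products that share an (i, j) coordinate are merged.
    """
    if len(a_cols) != len(b_rows):
        raise ValueError("inner dimensions of A and B must match")
    c = defaultdict(float)
    for col_k, row_k in zip(a_cols, b_rows):
        for i, a in col_k:          # nonzeros of A[:, k]
            for j, b in row_k:      # nonzeros of B[k, :]
                c[(i, j)] += a * b  # merge partial products
    return dict(c)


if __name__ == "__main__":
    A = [[1, 0, 2],
         [0, 0, 3]]
    B = [[0, 4],
         [5, 0],
         [6, 0]]
    print(spgemm_outer(to_columns(A), to_rows(B)))
```

Only pairs of nonzeros are ever multiplied, so no work is spent on zero operands; the cost moves to merging partial products that land on the same output coordinate, which is the step that outer-product accelerators typically need to handle efficiently.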


