SPARCAM: Sparse matrix multiplication accelerator using multi-port dynamic CAM

Garzon, Esteban; Zambrano, Benjamin; Lanuzza, Marco; Teman, Adam
2026-01-01

Abstract

Sparse general matrix multiplication (SpGEMM) is a fundamental kernel in many scientific and engineering fields, including Artificial Intelligence (AI). However, its intrinsic computational complexity makes efficient hardware implementation particularly difficult. This paper proposes SPARCAM, a novel SpGEMM accelerator designed and optimized for highly energy-efficient edge AI applications. SPARCAM combines low-power, dense Gain Cell embedded DRAM (GC-eDRAM) technology, a processing-near-memory paradigm, and a modified outer-product matrix multiplication algorithm. Despite its limited theoretical peak performance, SPARCAM achieves very high energy efficiency thanks to its low-power architecture and near-100% utilization of its computing resources. Designed in a commercial 28 nm FDSOI technology, SPARCAM achieves a 13.9× speedup over a high-performance embedded CPU when processing large-scale sparse matrices. When multiplying limited-size sparse matrices, SPARCAM obtains a 193× speedup over a high-performance GPU. On average, SPARCAM delivers energy benefits of about 4.3 orders of magnitude, and 1892×, 181×, 2×, and 3471× higher energy efficiency (over the CPU) than the state-of-the-art SpGEMM accelerators SpArch, OuterSPACE, and MatRaptor, and a high-performance GPU, respectively.
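The abstract does not describe how SPARCAM modifies the outer-product algorithm, so the following is only a minimal software sketch of the textbook outer-product SpGEMM it starts from, not the authors' dataflow. It assumes a hypothetical dictionary-of-keys format (`{(row, col): value}`) and hypothetical helper names. In the outer-product formulation, C = Σ_k A[:, k] · B[k, :]: every nonzero in column k of A pairs with every nonzero in row k of B, so the multiply phase needs no index matching and every multiplication is useful. The remaining work is merging partial products that share an output coordinate (i, j). That associative, key-based accumulation is where a content-addressable lookup naturally fits; here a Python `dict` is used purely as a software stand-in and does not model the multi-port GC-eDRAM CAM.

```python
from collections import defaultdict


def columns_of(a):
    """Group the nonzeros of A by column index k: k -> [(i, a_ik), ...]."""
    cols = defaultdict(list)
    for (i, k), v in a.items():
        cols[k].append((i, v))
    return cols


def rows_of(b):
    """Group the nonzeros of B by row index k: k -> [(j, b_kj), ...]."""
    rows = defaultdict(list)
    for (k, j), v in b.items():
        rows[k].append((j, v))
    return rows


def outer_product_spgemm(a, b):
    """Compute C = A @ B as a sum of outer products (hypothetical sketch).

    Returns (C, number_of_multiplications). Both inputs and the output
    use the {(row, col): value} dictionary-of-keys format.
    """
    a_cols = columns_of(a)
    b_rows = rows_of(b)
    # Associative accumulator keyed by output coordinate (i, j).
    # A dict stands in for a content-addressable lookup: a hit
    # accumulates into an existing entry, a miss inserts a new one.
    acc = {}
    mults = 0
    # Only indices k that are nonzero in column k of A AND row k of B contribute.
    for k in a_cols.keys() & b_rows.keys():
        for i, a_ik in a_cols[k]:
            for j, b_kj in b_rows[k]:
                mults += 1  # every product here is a useful partial product
                acc[(i, j)] = acc.get((i, j), 0) + a_ik * b_kj
    # Drop entries that cancelled to exactly zero.
    return {key: v for key, v in acc.items() if v != 0}, mults
```

For example, with `A = {(0, 0): 1, (1, 0): 2, (0, 2): 3}` and `B = {(0, 1): 4, (2, 1): 5, (2, 2): 6}`, `outer_product_spgemm(A, B)` returns `({(0, 1): 19, (1, 1): 8, (0, 2): 18}, 4)`: four multiplications, two of which (3·5 and 1·4) merge into the same output entry (0, 1).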
Year: 2026
Keywords: CAM; Gain cell; GC-eDRAM; Hardware acceleration on edge; Multi-port; Sparse matrices; Sparse matrix multiplication

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/400817
Warning: the data displayed have not been validated by the university.
