3D-PLANE: A 3D-stacked DRAM-based Programmable SLM Accelerator Capable of Near-Memory and Energy-Efficient Parallel Processing

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Small Language Model (SLM) processing is dominated by matrix multiplication workloads that are memory-bound and data-intensive, which leads to significant data movement overhead and consequent energy inefficiency on traditional processors such as GPUs and TPUs. To reduce these overheads, we propose 3D-PLANE, a novel memory-centric accelerator for SLM processing that incorporates programmable processing logic within the memory. The architecture is integrated in a 3D-stacked DRAM package in which each stacked DRAM chip is enhanced with programmable computing logic, providing competitive compute bandwidth for fast processing of SLMs. Area-efficient programmable processing elements, combined with workload-aware dynamic power gating, near-memory computation, and adaptive dataflow scheduling, minimize energy consumption without compromising performance. We evaluate the effectiveness of 3D-PLANE through hardware simulations and system-level analytical modeling across multiple configurations, using decoder-only SLMs such as Phi-3 Mini and TinyLLaMA.
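
The memory-bound claim in the abstract can be illustrated with a rough arithmetic-intensity estimate. The sketch below is not taken from the paper: it assumes a single-token decode step in which one weight matrix multiplies one activation vector, 2-byte elements, and a hypothetical layer size chosen only for illustration.

```python
def gemv_arithmetic_intensity(d_out: int, d_in: int, bytes_per_elem: int = 2) -> float:
    """Return FLOPs per byte moved for y = W @ x, with W of shape (d_out, d_in).

    Assumes every operand is moved exactly once: the weights W, the input
    vector x, and the output vector y.
    """
    flops = 2 * d_out * d_in  # one multiply and one add per weight
    bytes_moved = bytes_per_elem * (d_out * d_in + d_in + d_out)
    return flops / bytes_moved


# Hypothetical square layer; the size is an assumption, not a figure from the paper.
intensity = gemv_arithmetic_intensity(d_out=3072, d_in=3072)
print(f"{intensity:.3f} FLOPs/byte")
```

Under these assumptions the ratio stays just below 1 FLOP per byte for any layer size, because each weight is fetched for a single multiply-add. That is the data-movement cost that placing compute logic next to the DRAM is meant to reduce.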

Original language: English
Title of host publication: GLSVLSI 2025 - Proceedings of the Great Lakes Symposium on VLSI 2025
Pages: 540-546
Number of pages: 7
ISBN (Electronic): 9798400714962
State: Published - 29 Jun 2025
Event: 35th Edition of the Great Lakes Symposium on VLSI 2025, GLSVLSI 2025 - New Orleans, United States
Duration: 30 Jun 2025 to 2 Jul 2025

Publication series

Name: Proceedings of the ACM Great Lakes Symposium on VLSI, GLSVLSI

Conference

Conference: 35th Edition of the Great Lakes Symposium on VLSI 2025, GLSVLSI 2025
Country/Territory: United States
City: New Orleans
Period: 30/06/25 to 2/07/25
