TY - GEN
T1 - 3D-PLANE
T2 - 35th Edition of the Great Lakes Symposium on VLSI 2025, GLSVLSI 2025
AU - Bavikadi, Sathwika
AU - Sutradhar, Purab Ranjan
AU - Thangellamudi, Jayanth
AU - Manoj Pudukotai Dinakarrao, Sai
N1 - Publisher Copyright:
© 2025 Copyright held by the owner/author(s).
PY - 2025/6/29
Y1 - 2025/6/29
AB - Small Language Model (SLM) processing is characterized by extensive matrix multiplication workloads that are memory-bound and data-intensive, leading to substantial data movement overhead and, consequently, energy inefficiency in traditional processors such as GPUs and TPUs. To minimize these overheads, we propose 3D-PLANE, a novel memory-centric accelerator for SLM processing that incorporates programmable processing logic within the memory. Our proposed architecture is integrated into a 3D-stacked DRAM package in which each stacked DRAM chip is enhanced with programmable computing logic, providing competitive compute bandwidth for fast processing of SLMs. The area-efficient programmable processing elements, together with workload-aware dynamic power gating, near-memory computation, and adaptive dataflow scheduling, enable us to minimize energy consumption without compromising performance. We evaluate the effectiveness of 3D-PLANE through hardware simulations and system-level analytical modeling across multiple configurations, using decoder-only SLMs such as Phi-3 Mini and TinyLLaMA.
UR - https://www.scopus.com/pages/publications/105017771840
U2 - 10.1145/3716368.3735285
DO - 10.1145/3716368.3735285
M3 - Conference contribution
AN - SCOPUS:105017771840
T3 - Proceedings of the ACM Great Lakes Symposium on VLSI, GLSVLSI
SP - 540
EP - 546
BT - GLSVLSI 2025 - Proceedings of the Great Lakes Symposium on VLSI 2025
Y2 - 30 June 2025 through 2 July 2025
ER -