TY - GEN
T1 - FlutPIM: A Look-up Table-based Processing in Memory Architecture with Floating-point Computation Support for Deep Learning Applications
T2 - 33rd Great Lakes Symposium on VLSI, GLSVLSI 2023
AU - Sutradhar, Purab Ranjan
AU - Bavikadi, Sathwika
AU - Indovina, Mark
AU - Pudukotai Dinakarrao, Sai Manoj
AU - Ganguly, Amlan
N1 - Publisher Copyright:
© 2023 ACM.
PY - 2023/6/5
Y1 - 2023/6/5
N2 - Processing-in-Memory (PIM) has shown great potential for a wide range of data-driven applications, especially deep learning and AI. However, it is challenging to facilitate the computational sophistication of a standard processor (e.g., a CPU or GPU) within the limited scope of a memory chip without incurring significant circuit overhead. To address this challenge, we propose a programmable, LUT-based, area-efficient PIM architecture capable of performing various low-precision floating-point (FP) computations using a novel LUT-oriented operand-decomposition technique. We incorporate such compact computational units within the memory banks in large numbers to achieve impressive parallel processing capability, up to 4x higher than that of the state-of-the-art FP-capable PIM. Additionally, we adopt a highly optimized low-precision FP format that maximizes computational performance with minimal compromise in computational precision, especially for deep learning applications. The overall result is 17% higher throughput and an impressive 8-20x higher compute bandwidth per bank compared to the state-of-the-art in in-memory acceleration.
AB - Processing-in-Memory (PIM) has shown great potential for a wide range of data-driven applications, especially deep learning and AI. However, it is challenging to facilitate the computational sophistication of a standard processor (e.g., a CPU or GPU) within the limited scope of a memory chip without incurring significant circuit overhead. To address this challenge, we propose a programmable, LUT-based, area-efficient PIM architecture capable of performing various low-precision floating-point (FP) computations using a novel LUT-oriented operand-decomposition technique. We incorporate such compact computational units within the memory banks in large numbers to achieve impressive parallel processing capability, up to 4x higher than that of the state-of-the-art FP-capable PIM. Additionally, we adopt a highly optimized low-precision FP format that maximizes computational performance with minimal compromise in computational precision, especially for deep learning applications. The overall result is 17% higher throughput and an impressive 8-20x higher compute bandwidth per bank compared to the state-of-the-art in in-memory acceleration.
KW - deep learning
KW - DRAM
KW - floating point
KW - processing in memory
UR - http://www.scopus.com/inward/record.url?scp=85163172368&partnerID=8YFLogxK
U2 - 10.1145/3583781.3590313
DO - 10.1145/3583781.3590313
M3 - Conference contribution
AN - SCOPUS:85163172368
T3 - Proceedings of the ACM Great Lakes Symposium on VLSI, GLSVLSI
SP - 207
EP - 211
BT - GLSVLSI 2023 - Proceedings of the Great Lakes Symposium on VLSI 2023
Y2 - 5 June 2023 through 7 June 2023
ER -