Abstract
Processing-in-Memory (PIM) has shown great potential for a wide range of data-driven applications, especially Deep Learning and AI. However, it is challenging to provide the computational sophistication of a standard processor (i.e., a CPU or GPU) within the limited area of a memory chip without incurring significant circuit overhead. To address this challenge, we propose a programmable, area-efficient LUT-based PIM architecture capable of performing various low-precision floating-point (FP) computations using a novel LUT-oriented operand-decomposition technique. We incorporate a large number of these compact computational units within the memory banks to achieve parallel processing capability up to 4x higher than state-of-the-art FP-capable PIM. Additionally, we adopt a highly optimized low-precision FP format that maximizes computational performance with minimal loss of precision, especially for Deep Learning applications. The overall result is 17% higher throughput and 8-20x higher compute bandwidth per bank compared to state-of-the-art in-memory acceleration.
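The abstract names a LUT-oriented operand-decomposition technique without detailing the scheme. As a minimal illustrative sketch of the general principle (not the paper's actual design), the Python snippet below multiplies two 8-bit unsigned operands, such as FP mantissas, using only a small 4-bit x 4-bit product lookup table; the nibble-wise decomposition, the LUT size, and the function name `lut_mul8` are assumptions for illustration.

```python
# Precompute a 16x16 LUT holding every 4-bit x 4-bit product (256 entries).
LUT = [[a * b for b in range(16)] for a in range(16)]

def lut_mul8(x: int, y: int) -> int:
    """Multiply two 8-bit unsigned operands using only the 4-bit LUT.

    Each operand is decomposed into a high and a low nibble; the four
    partial products are read from the LUT and recombined with shifts:
        x*y = (xh*yh << 8) + ((xh*yl + xl*yh) << 4) + xl*yl
    """
    xh, xl = x >> 4, x & 0xF
    yh, yl = y >> 4, y & 0xF
    return (LUT[xh][yh] << 8) + ((LUT[xh][yl] + LUT[xl][yh]) << 4) + LUT[xl][yl]

# Sanity check: the LUT-based product matches native multiplication.
assert all(lut_mul8(a, b) == a * b for a in range(256) for b in range(256))
```

In a floating-point datapath this decomposition would apply to the mantissa product, with exponents handled by addition; the appeal of the approach is that a tiny LUT replaces a full multiplier, which is what makes replicating many such units per bank area-feasible.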
Original language | American English |
---|---|
Title of host publication | GLSVLSI '23: Proceedings of the Great Lakes Symposium on VLSI 2023 |
State | Published - 2023 |
Externally published | Yes |
Keywords
- DRAM
- Deep Learning
- floating point
- processing in memory
EGS Disciplines
- Electrical and Computer Engineering