TY - JOUR
T1 - SoC Reconfigurable Architecture for Implementing Software Trained Recurrent Neural Networks on FPGA
AU - Wasef, Michael
AU - Rafla, Nader
N1 - Publisher Copyright:
© 2004-2012 IEEE.
PY - 2023/6/1
Y1 - 2023/6/1
N2 - Recurrent neural networks (RNNs) are used extensively in time-series data applications. Modern RNNs consist of three layer types: recurrent, Fully-Connected (FC), and attention. This paper introduces the design, acceleration, implementation, and verification of a complete reconfigurable RNN using a system-on-chip approach on an FPGA. This design is suitable for small-scale projects and Internet of Things (IoT) end devices, as it utilizes a small number of hardware resources compared to previous configurable architectures. The proposed reconfigurable architecture consists of three layers. The first layer is a Python software layer that contains a function serving as the architecture's user interface. The output of the Python function is three binary files containing the RNN architecture description and trained parameters. The second layer of the architecture is the embedded software layer, implemented on an on-chip ARM microcontroller. This layer reads the first layer's output files and configures the hardware layer with the configuration and parameters required to execute each layer in the RNN. The hardware layer consists of two Intellectual Properties (IPs) with different configurations. The Recurrent Layer Hardware IP implements the recurrent layer using either Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU) cells as basic building blocks, while the ATTENTION/FC IP implements the attention layer and the FC layer. The proposed design allows the implementation of a recurrent layer on an FPGA with variable input and hidden vector lengths of up to 100 elements each. It also supports implementing an attention layer with up to 64 input vectors and a maximum vector length of 100 elements. The FC layers can be configured with an input vector length and a number of neurons per layer of up to 256 each. The hardware design of the recurrent layer achieves a maximum performance of 1.958 and 2.479 GOPS for the GRU and LSTM models, respectively. The maximum performance of the attention and FC layers is 2.641 GOPS and 634.3 MOPS, respectively. The hardware design operates at a maximum frequency of 100 MHz.
AB - Recurrent neural networks (RNNs) are used extensively in time-series data applications. Modern RNNs consist of three layer types: recurrent, Fully-Connected (FC), and attention. This paper introduces the design, acceleration, implementation, and verification of a complete reconfigurable RNN using a system-on-chip approach on an FPGA. This design is suitable for small-scale projects and Internet of Things (IoT) end devices, as it utilizes a small number of hardware resources compared to previous configurable architectures. The proposed reconfigurable architecture consists of three layers. The first layer is a Python software layer that contains a function serving as the architecture's user interface. The output of the Python function is three binary files containing the RNN architecture description and trained parameters. The second layer of the architecture is the embedded software layer, implemented on an on-chip ARM microcontroller. This layer reads the first layer's output files and configures the hardware layer with the configuration and parameters required to execute each layer in the RNN. The hardware layer consists of two Intellectual Properties (IPs) with different configurations. The Recurrent Layer Hardware IP implements the recurrent layer using either Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU) cells as basic building blocks, while the ATTENTION/FC IP implements the attention layer and the FC layer. The proposed design allows the implementation of a recurrent layer on an FPGA with variable input and hidden vector lengths of up to 100 elements each. It also supports implementing an attention layer with up to 64 input vectors and a maximum vector length of 100 elements. The FC layers can be configured with an input vector length and a number of neurons per layer of up to 256 each. The hardware design of the recurrent layer achieves a maximum performance of 1.958 and 2.479 GOPS for the GRU and LSTM models, respectively. The maximum performance of the attention and FC layers is 2.641 GOPS and 634.3 MOPS, respectively. The hardware design operates at a maximum frequency of 100 MHz.
KW - attention layer
KW - FPGA
KW - GRU
KW - LSTM
KW - RNN
KW - system-on-chip
UR - http://www.scopus.com/inward/record.url?scp=85153359096&partnerID=8YFLogxK
U2 - 10.1109/TCSI.2023.3262479
DO - 10.1109/TCSI.2023.3262479
M3 - Article
AN - SCOPUS:85153359096
SN - 1549-8328
VL - 70
SP - 2497
EP - 2510
JO - IEEE Transactions on Circuits and Systems I: Regular Papers
JF - IEEE Transactions on Circuits and Systems I: Regular Papers
IS - 6
ER -