TY - JOUR
T1 - A fully pipelined FPGA accelerator for scale invariant feature transform keypoint descriptor matching
AU - Daoud, Luka
AU - Latif, Muhammad Kamran
AU - Jacinto, H. S.
AU - Rafla, Nader
N1 - Publisher Copyright: © 2019 Elsevier B.V.
PY - 2020/2
Y1 - 2020/2
N2 - The scale invariant feature transform (SIFT) algorithm is considered a classical feature extraction algorithm within the field of computer vision. SIFT keypoint descriptor matching is a computationally intensive process due to the amount of data consumed. In this work, we designed a novel fully pipelined hardware accelerator architecture for SIFT keypoint descriptor matching. The accelerator core was implemented and tested on a field programmable gate array (FPGA). The proposed hardware architecture is able to properly handle the memory bandwidth necessary for a fully pipelined implementation and hits the roofline performance model, achieving the potential maximum throughput. The fully pipelined matching architecture was designed based on the cosine angle distance method. Our architecture was optimized for 16-bit fixed-point operations and implemented on hardware using a Xilinx Zynq-based FPGA development board. Our proposed architecture shows a noticeable reduction of area resources compared with its counterparts in the literature, while maintaining high throughput by alleviating memory bandwidth restrictions. The results show a reduction in consumed device resources of up to 91% in LUTs and 79% in BRAMs. Our hardware implementation is 15.7× faster than the comparable software approach.
AB - The scale invariant feature transform (SIFT) algorithm is considered a classical feature extraction algorithm within the field of computer vision. SIFT keypoint descriptor matching is a computationally intensive process due to the amount of data consumed. In this work, we designed a novel fully pipelined hardware accelerator architecture for SIFT keypoint descriptor matching. The accelerator core was implemented and tested on a field programmable gate array (FPGA). The proposed hardware architecture is able to properly handle the memory bandwidth necessary for a fully pipelined implementation and hits the roofline performance model, achieving the potential maximum throughput. The fully pipelined matching architecture was designed based on the cosine angle distance method. Our architecture was optimized for 16-bit fixed-point operations and implemented on hardware using a Xilinx Zynq-based FPGA development board. Our proposed architecture shows a noticeable reduction of area resources compared with its counterparts in the literature, while maintaining high throughput by alleviating memory bandwidth restrictions. The results show a reduction in consumed device resources of up to 91% in LUTs and 79% in BRAMs. Our hardware implementation is 15.7× faster than the comparable software approach.
KW - Acceleration
KW - FPGA
KW - HLS
KW - High level synthesis
KW - Matching algorithm
KW - Pipeline
KW - SIFT
KW - Scale invariant feature transform
UR - https://www.scopus.com/pages/publications/85074680626
U2 - 10.1016/j.micpro.2019.102919
DO - 10.1016/j.micpro.2019.102919
M3 - Article
AN - SCOPUS:85074680626
SN - 0141-9331
VL - 72
JO - Microprocessors and Microsystems
JF - Microprocessors and Microsystems
M1 - 102919
ER -