An SoC-based CNN Accelerator for Face Recognition using HWCK data scheduling
Abstract:
With the rise of smart cities and smart homes, people pay increasing attention to quality of life
and expect technology to improve it. In recent years, GPUs and big data have enabled deep learning to bring
revolutionary progress to many fields, especially computer vision. However, GPU acceleration
is constrained by power and cost, which makes GPU-based products extremely expensive. In this work, we implement an
unsupervised face recognition system based on deep learning and accelerate the algorithm with a combination
of FPGA and ARM to realize an access control system. We propose the design and implementation of a separable convolution
accelerator based on HWCK data scheduling. It accelerates depthwise separable convolution models through dedicated designs
for depthwise convolution, pointwise convolution, and batch normalization. The whole hardware architecture is
implemented on the Xilinx ZCU106 development board. Results show that the design achieves 222 FPS and 60.8 GOPS when running FaceNet.
Power consumption on the Xilinx ZCU106 board is 8.82 W, giving an energy efficiency of 6.89 GOPS/W. Additionally, our design retains
94% accuracy on the VGGFace2 dataset and 99.2% on the LFW dataset.
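
For readers unfamiliar with the operation being accelerated, the following is a minimal software sketch of a depthwise separable convolution block (depthwise convolution, then pointwise convolution, then batch normalization). It is a reference model only, not the accelerator's implementation: the function names, the H x W x C activation layout, valid padding, stride 1, and the folding of batch normalization into a per-channel scale and shift are all assumptions made for illustration. The HWCK scheduling itself is not modeled here.

```python
# Reference sketch of a depthwise separable convolution block.
# All names and layouts are illustrative assumptions, not the paper's design.

def depthwise_conv(x, w):
    """x: H x W x C nested lists; w: kh x kw x C (one filter per channel).
    Valid padding, stride 1. Each channel is filtered independently."""
    H, W, C = len(x), len(x[0]), len(x[0][0])
    kh, kw = len(w), len(w[0])
    out = [[[0.0] * C for _ in range(W - kw + 1)] for _ in range(H - kh + 1)]
    for i in range(H - kh + 1):
        for j in range(W - kw + 1):
            for c in range(C):
                acc = 0.0
                for di in range(kh):
                    for dj in range(kw):
                        acc += x[i + di][j + dj][c] * w[di][dj][c]
                out[i][j][c] = acc
    return out

def pointwise_conv(x, w):
    """x: H x W x C; w: C x K. A 1x1 convolution that mixes channels."""
    C, K = len(w), len(w[0])
    return [[[sum(px[c] * w[c][k] for c in range(C)) for k in range(K)]
             for px in row] for row in x]

def batch_norm(x, scale, shift):
    """Inference-time batch normalization, assumed pre-folded into a
    per-channel affine transform: y = scale[k] * x + shift[k]."""
    return [[[v * scale[k] + shift[k] for k, v in enumerate(px)]
             for px in row] for row in x]

def separable_block(x, dw, pw, scale, shift):
    """Depthwise conv -> pointwise conv -> batch normalization."""
    return batch_norm(pointwise_conv(depthwise_conv(x, dw), pw), scale, shift)

def macs_per_output_pixel(kh, kw, C, K):
    """Multiply-accumulates per output pixel: (standard conv, separable conv)."""
    return kh * kw * C * K, kh * kw * C + C * K

# Tiny example: 3x3 input with 2 channels, 3x3 depthwise filter, 3 output channels.
x = [[[1.0, 2.0] for _ in range(3)] for _ in range(3)]
dw = [[[1.0, 1.0] for _ in range(3)] for _ in range(3)]
pw = [[1.0, 0.0, 1.0],
      [0.0, 1.0, 1.0]]
y = separable_block(x, dw, pw, scale=[1.0, 1.0, 0.5], shift=[0.0, 1.0, 0.0])
print(y)                                  # [[[9.0, 19.0, 13.5]]]
print(macs_per_output_pixel(3, 3, 2, 3))  # (54, 24)
```

The multiply-accumulate count shows why the separable form suits a resource-limited FPGA: even in this tiny case it needs fewer than half the operations of a standard convolution, and the gap widens as the channel counts grow.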
Flowchart of separable network accelerator control module:
Hardware Architecture:
Implementation Results:
Made
by ³\®ÊÞ³