Gate count

An SoC-based CNN Accelerator for Face Recognition using HWCK data scheduling

Abstract:

With the promotion of smart cities and smart homes, people are paying more attention to quality of life and expect technology to improve it. In recent years, GPUs and big data have enabled deep learning to bring revolutionary progress to many fields, especially computer vision. However, GPU acceleration is constrained by power and cost, which makes GPU-based products extremely expensive. In this work, we implement an unsupervised face recognition system based on deep learning and accelerate the algorithm with a combination of FPGA and ARM to realize an access control system. We propose the design and implementation of a separable convolution accelerator based on HWCK data scheduling. It accelerates depthwise separable convolution models through dedicated designs for depthwise convolution, pointwise convolution, and batch normalization. The whole hardware architecture is implemented on the Xilinx ZCU106 development board. The results show that the design achieves 222 FPS and 60.8 GOPS when running FaceNet. Power consumption on the Xilinx ZCU106 board is 8.82 W, which gives an energy efficiency of 6.89 GOPS/W. In addition, our design retains 94% accuracy on the VGGFace2 dataset and 99.2% on the LFW dataset.
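To make the accelerated operations concrete, the following is a minimal software sketch of one depthwise separable block: a depthwise convolution, then a pointwise (1x1) convolution, then batch normalization folded into a per-channel scale and shift. It is a reference model only, not the hardware design. The function names, the nested-list tensor layout (height, width, channel), and the reading of HWCK as height, width, input channels, and output kernels are assumptions made for illustration. The helper `mac_counts` shows why separable convolution needs fewer multiply-accumulate operations than a standard convolution of the same shape.

```python
def depthwise_conv(x, w):
    """Depthwise convolution, stride 1, no padding.

    x: input as nested lists indexed [h][w][c] (assumed layout).
    w: one kh x kw filter per channel, indexed [dh][dw][c].
    """
    H, W, C = len(x), len(x[0]), len(x[0][0])
    kh, kw = len(w), len(w[0])
    out = [[[0.0] * C for _ in range(W - kw + 1)] for _ in range(H - kh + 1)]
    for i in range(H - kh + 1):
        for j in range(W - kw + 1):
            for c in range(C):
                # Each channel is filtered independently of the others.
                out[i][j][c] = sum(
                    x[i + di][j + dj][c] * w[di][dj][c]
                    for di in range(kh)
                    for dj in range(kw)
                )
    return out


def pointwise_conv(x, w):
    """1x1 convolution mixing C input channels into K output channels.

    w is indexed [c][k].
    """
    K = len(w[0])
    return [
        [[sum(px[c] * w[c][k] for c in range(len(px))) for k in range(K)]
         for px in row]
        for row in x
    ]


def batch_norm(x, scale, shift):
    """Inference-time batch normalization folded to y = x * scale + shift."""
    return [
        [[v * scale[k] + shift[k] for k, v in enumerate(px)] for px in row]
        for row in x
    ]


def separable_block(x, dw, pw, scale, shift):
    """Depthwise conv, then pointwise conv, then folded batch norm."""
    return batch_norm(pointwise_conv(depthwise_conv(x, dw), pw), scale, shift)


def mac_counts(H, W, C, K, k=3):
    """Multiply-accumulate counts for an H x W output map.

    Returns (standard k x k convolution, depthwise + pointwise).
    """
    standard = H * W * C * K * k * k
    separable = H * W * C * k * k + H * W * C * K
    return standard, separable
```

For example, `mac_counts(56, 56, 64, 128)` compares the two costs for a hypothetical layer with 64 input channels and 128 output channels. The ratio of separable to standard cost is 1/K + 1/k^2, so the saving grows with the number of output kernels.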

Flowchart of separable network accelerator control module:

Hardware Architecture:

Implementation Results:

Made by 許晉瑋