Gate count

FPGA-Based Reconfigurable Neural Network Accelerator

Applied to Tiny YOLO-V3

Abstract:

In recent years, deep learning has developed rapidly. As hardware has advanced, neural networks have become able to handle more and more tasks, and they are now part of everyday life. From phone unlocking to smart customer service and chatbots, neural networks have replaced traditional algorithmic methods. However, neural networks involve a huge number of parameters and calculations, so they typically need to run on a GPU or on an embedded development board with CUDA acceleration. Methods and hardware architectures that can accelerate neural networks have therefore become a major issue.

This paper proposes a reconfigurable hardware architecture that supports different input image sizes, kernel sizes, and strides. To accelerate neural networks, the architecture implements depthwise convolution, pointwise convolution, standard convolution, batch normalization, activation functions, and max-pooling. The proposed system is an SoC design in which the Programmable Logic (PL) and the Processing System (PS) communicate over the AXI bus protocol. The PS handles data transfer and data sorting, and the PL handles all the calculations. The reconfigurable architecture selects the appropriate computation mode according to the instruction it receives, so other neural networks can also be accelerated on it as long as the operations they need are supported. Because on-chip memory is limited, zero-padding is performed directly in hardware; this reduces the number of data transfers and lets the on-chip memory store more input data. Finally, we implement Tiny YOLO-V3 on the Xilinx ZCU104 development board. Input images are transferred from a CMOS camera to the FPGA, and the object detection result, which includes the input image, bounding boxes, classes, and probabilities, is shown on a monitor through HDMI. On the ZCU104, our design achieves 25.6 GOPs at a power consumption of only 4.959 W, for an efficiency of 5.16 GOPs/W.
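To illustrate the idea of completing zero-padding inside the compute unit rather than storing padded data, here is a minimal software sketch in Python. It is not the hardware design described above: the function name `conv2d_virtual_padding`, the single-channel layout, and the square kernel are all assumptions made for illustration. Out-of-range reads simply return zero, so the padded border is never stored, and the kernel size and stride are run-time parameters in the spirit of the reconfigurable architecture.

```python
def conv2d_virtual_padding(image, kernel, stride=1, pad=0):
    """Single-channel 2D convolution (cross-correlation form).

    Hypothetical illustration: the zero border is never materialized.
    Any read that falls outside the stored image returns 0 instead.
    """
    h, w = len(image), len(image[0])
    k = len(kernel)  # assumes a square k x k kernel

    # Standard output-size formula for padded, strided convolution.
    out_h = (h + 2 * pad - k) // stride + 1
    out_w = (w + 2 * pad - k) // stride + 1

    def read(r, c):
        # Virtual zero-padding: only in-range pixels come from memory.
        if 0 <= r < h and 0 <= c < w:
            return image[r][c]
        return 0

    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for u in range(k):
                for v in range(k):
                    acc += kernel[u][v] * read(i * stride + u - pad,
                                               j * stride + v - pad)
            row.append(acc)
        out.append(row)
    return out


ones_image = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
ones_kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
same_size = conv2d_virtual_padding(ones_image, ones_kernel, stride=1, pad=1)
downsampled = conv2d_virtual_padding(ones_image, ones_kernel, stride=2, pad=1)
```

In this sketch only the unpadded 3x3 image is held in memory, yet `same_size` has the same spatial size as the input. Changing `stride` or the kernel dimensions changes only the loop bounds, which loosely mirrors selecting a different configuration by instruction.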

Hardware Architecture:

Demo:

Made by 童迺婕