FPGA-Based Reconfigurable Neural Network Accelerator
Applied to Tiny YOLO-V3
Abstract:
In recent years, deep learning has developed rapidly.
As hardware has advanced, neural networks have become able to handle more and more tasks, and they are now
part of everyday life. From phone unlocking to smart customer service and chatbots, neural networks have
replaced traditional algorithmic methods. However, a neural network has a huge number of parameters
and calculations, so it usually needs to run on a GPU or on an embedded development board with CUDA acceleration. Therefore,
methods and hardware architectures that can accelerate neural networks have become a major issue.
This paper proposes a reconfigurable hardware architecture that supports different input image sizes,
kernel sizes, and stride sizes. To accelerate neural networks, the architecture implements depthwise convolution,
pointwise convolution, standard convolution, batch normalization, activation functions, and max-pooling. The proposed system is
an SoC design in which the Programmable Logic (PL) and the Processing System (PS) communicate over the AXI bus protocol. The PS handles
data transfer and data sorting, and the PL performs all the calculations. The reconfigurable architecture selects
the computation mode according to the instruction it receives, so other neural networks can also use it for
acceleration as long as the functions they need are supported. Because on-chip memory is limited,
we perform zero-padding directly in hardware, which reduces the number of data transfers and the communication overhead and lets the on-chip
memory store more input data. Finally, we implement Tiny YOLO-V3 on the Xilinx ZCU104 development board. Input data is sent to
the FPGA from a CMOS camera, and the object detection result, which includes the input image, bounding boxes, classes, and probabilities, is shown
on a monitor over HDMI. On the ZCU104, our design achieves 25.6 GOPS with a power consumption of only 4.959 W, for an energy efficiency of 5.16 GOPS/W.
Hardware Architecture:
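The abstract says each layer is configured by input size, kernel size, and stride, with zero-padding done in hardware. The Python sketch below illustrates only the arithmetic behind such a configuration; it is not the project's actual instruction format. The `LayerConfig` fields, the function names, the square feature maps, and the 416x416 example input are all assumptions made for illustration. It also compares the multiply-accumulate (MAC) count of a standard convolution with that of a depthwise plus pointwise pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerConfig:
    """Hypothetical per-layer settings; field names are illustrative only."""
    in_size: int  # input feature-map height and width (square map assumed)
    kernel: int   # kernel height and width
    stride: int   # step between adjacent windows
    pad: int      # zeros added on each border (done in hardware per the abstract)


def out_size(cfg: LayerConfig) -> int:
    """Output height/width of a convolution or pooling layer."""
    return (cfg.in_size + 2 * cfg.pad - cfg.kernel) // cfg.stride + 1


def conv_macs(cfg: LayerConfig, c_in: int, c_out: int) -> int:
    """MACs for a standard convolution with c_in input and c_out output channels."""
    o = out_size(cfg)
    return o * o * cfg.kernel * cfg.kernel * c_in * c_out


def separable_macs(cfg: LayerConfig, c_in: int, c_out: int) -> int:
    """MACs for a depthwise convolution followed by a 1x1 pointwise convolution."""
    o = out_size(cfg)
    depthwise = o * o * cfg.kernel * cfg.kernel * c_in
    pointwise = o * o * c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Assumed example: 3x3 convolution on a 416x416 RGB image, 16 output channels.
    conv = LayerConfig(in_size=416, kernel=3, stride=1, pad=1)
    # Assumed example: 2x2 max-pooling with stride 2 on the convolution output.
    pool = LayerConfig(in_size=out_size(conv), kernel=2, stride=2, pad=0)
    print("conv output size:", out_size(conv))
    print("pool output size:", out_size(pool))
    print("standard conv MACs:", conv_macs(conv, 3, 16))
    print("depthwise + pointwise MACs:", separable_macs(conv, 3, 16))
```

With a padding of 1, a 3x3 stride-1 convolution keeps the feature-map size unchanged. Under the assumptions above, this is the case in which padding inside the accelerator avoids sending an enlarged, pre-padded image over the bus.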
Demo:
Made
by µ£°iÔÐ