Hand Gesture Recognition Using Deep Neural Networks and Its Hardware Architecture Design
Abstract:
Research in deep learning has expanded rapidly in recent years, covering tasks such as image pre-processing,
image segmentation, object recognition, and semantic analysis, and deep learning methods have gradually replaced
traditional algorithms. Traditional hand gesture recognition algorithms depend on depth information to recognize
gestures correctly against complex backgrounds, and even then the recognition rate is poor. Depth information must
be obtained with a depth camera or a dual-CMOS camera, which is inconvenient for ordinary users because of its high
price. This paper therefore proposes a deep-neural-network method for hand gesture recognition together with a
hardware architecture that implements it. The method needs only a single CMOS camera and can recognize hand
gestures against complex backgrounds. The research is divided into two parts: the design of the neural network
model and the implementation of the hardware architecture. In the neural network design part, depthwise separable
convolutions are used to build a model that is divided into a segmentation stage and a classification stage.
Training the segmentation model as an attention model improves the recognition rate of the classification model.
In the inference stage, hand gesture recognition is performed using only the classification model and part of the
segmentation model, avoiding the full model and thereby reducing the number of weights and the amount of computation.
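To illustrate why depthwise separable convolutions reduce weights and computation, the following NumPy sketch (with hypothetical layer sizes, not the thesis's actual model) factors a standard convolution into a per-channel depthwise pass and a 1x1 pointwise pass:

```python
import numpy as np

def depthwise_conv(x, dw_kernels):
    """Depthwise convolution: each input channel is filtered
    independently by its own k x k kernel (no cross-channel mixing).
    x: (H, W, C), dw_kernels: (k, k, C). 'Valid' padding, stride 1."""
    H, W, C = x.shape
    k = dw_kernels.shape[0]
    out = np.zeros((H - k + 1, W - k + 1, C))
    for c in range(C):
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                out[i, j, c] = np.sum(x[i:i+k, j:j+k, c] * dw_kernels[:, :, c])
    return out

def pointwise_conv(x, pw_kernels):
    """Pointwise (1x1) convolution: mixes channels at each spatial
    position. x: (H, W, C), pw_kernels: (C, C_out)."""
    return x @ pw_kernels

# Weight-count comparison for hypothetical sizes: a standard k x k
# convolution needs k*k*C_in*C_out weights; the separable version
# needs only k*k*C_in (depthwise) + C_in*C_out (pointwise).
k, C_in, C_out = 3, 32, 64
standard = k * k * C_in * C_out          # 18432 weights
separable = k * k * C_in + C_in * C_out  # 2336 weights
```

With these example sizes the separable form uses roughly 8x fewer weights, which is the saving the hardware accelerator described below exploits.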
In the hardware implementation part, this work designs dedicated modules for depthwise convolution, pointwise
convolution, batch normalization, and max-pooling to accelerate the depthwise separable convolution. The design
uses on-chip memory to buffer feature data temporarily and then transfers the data to off-chip memory through DMA,
reducing on-chip memory usage. Weights and feature data use a 16-bit fixed-point representation, so their memory
footprint is reduced across the different computations. This work also implements a ping-pong memory scheme that
maximizes on-chip memory utilization and reduces off-chip memory access time. The whole system is implemented on
the Xilinx ZCU106 development board. The image is sent as input to the FPGA by a CMOS camera; after the gesture is
recognized, the original image and the recognition result are output through HDMI and displayed on a monitor. The
implemented system achieves a frame rate of 52.6 FPS and a throughput of 65.6 GOPS.
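The 16-bit fixed-point representation above can be sketched as follows. The Q8.8 split (8 integer bits including sign, 8 fractional bits) and the saturating multiply are illustrative assumptions, since the abstract does not state the exact format used:

```python
import numpy as np

FRAC_BITS = 8  # hypothetical Q8.8 format (assumption, not from the thesis)

def to_fixed(x):
    """Quantize float values to 16-bit signed fixed point, saturating
    at the int16 range instead of wrapping around."""
    scaled = np.round(np.asarray(x) * (1 << FRAC_BITS))
    return np.clip(scaled, -32768, 32767).astype(np.int16)

def to_float(q):
    """Dequantize back to float for inspection."""
    return q.astype(np.float32) / (1 << FRAC_BITS)

def fixed_mul(a, b):
    """Multiply two Q8.8 values: widen to 32 bits, then shift the
    product back so the result is again Q8.8, as a hardware MAC
    unit with a wide accumulator would."""
    prod = a.astype(np.int32) * b.astype(np.int32)
    return np.clip(prod >> FRAC_BITS, -32768, 32767).astype(np.int16)
```

For example, `fixed_mul(to_fixed(0.5), to_fixed(2.25))` dequantizes back to 1.125; values outside the representable range saturate rather than overflow.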
Hardware Architecture:
Made by ¦ó¤¸ºÕ