
Multitask Learning on 3D Hand Pose Estimation with Continuous Joints Heatmap

Abstract:

In recent years, deep learning has been accelerated by GPUs and other hardware accelerators, and deep neural networks have achieved significant improvements across many tasks. From basic image preprocessing and image segmentation to face recognition and speech recognition, they are gradually replacing traditional algorithms, which shows how much the rise of neural networks has reshaped artificial intelligence.

In 3D hand pose estimation, traditional approaches either rely on sensors attached to the body or use random forests to predict joint positions. The drawbacks are that the former requires additional equipment and the latter is not accurate enough.

We propose a multi-task learning approach, based on 2D/3D heatmaps, for training a single-stage 3D hand skeleton prediction network: only one backbone network is needed, and it outputs the 2D and 3D heatmaps simultaneously. Because joints on the same finger are continuously related, we modify the 2D target so that one heatmap contains all 5 nodes of a finger (i.e., the joints of the same finger are predicted in the same heatmap). These finger heatmaps are then used as features to predict separate 3D heatmaps for the left and right hands, and each target joint's (x, y, z) coordinates are taken from the location of the maximum value in its 3D heatmap.

Since most large hand datasets are collected in laboratories, we also propose a hand-segmentation technique that improves on a basic encoder-decoder architecture. It segments the hands out of the dataset images, and the segmented hands are composited onto various landscape photographs, so the pose network trained on them is more robust and not restricted to the dataset's backgrounds.
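The per-finger 2D target from the paragraph above can be sketched as follows. This is a minimal illustration, not the author's code: it assumes a hypothetical 21-joint layout in which joint 0 is the wrist and each finger has four joints, so every finger chain has the 5 nodes mentioned in the abstract, and it assumes each node is drawn as a Gaussian peak merged into its finger's channel with an element-wise maximum. `FINGERS`, `finger_heatmaps`, and the default `sigma` are hypothetical names and values.

```python
import numpy as np

# Hypothetical 21-joint layout: index 0 is the wrist and each finger has
# 4 joints ordered from base to tip, so each chain has 5 nodes.
FINGERS = [[0, 1, 2, 3, 4],      # thumb
           [0, 5, 6, 7, 8],      # index
           [0, 9, 10, 11, 12],   # middle
           [0, 13, 14, 15, 16],  # ring
           [0, 17, 18, 19, 20]]  # little


def finger_heatmaps(joints_2d, height, width, sigma=2.0):
    """Render one heatmap per finger from (21, 2) pixel coordinates (x, y).

    Every node of a finger becomes a Gaussian peak in that finger's channel;
    peaks are merged with an element-wise max so values stay in [0, 1].
    """
    ys, xs = np.mgrid[0:height, 0:width]
    maps = np.zeros((len(FINGERS), height, width), dtype=np.float32)
    for c, chain in enumerate(FINGERS):
        for j in chain:
            x, y = joints_2d[j]
            g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
            maps[c] = np.maximum(maps[c], g)
    return maps
```

Under this assumed layout the wrist appears in all five channels; the abstract does not say whether the original network shares the wrist this way.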
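Taking each joint's (x, y, z) coordinates from the location of the maximum in its 3D heatmap amounts to an argmax over every joint's volume. A minimal NumPy sketch, assuming a hypothetical (J, D, H, W) layout with one volume per joint and axes ordered depth, row, column; with two hands it would be applied to each hand's stack separately. `heatmap_argmax_3d` is a hypothetical helper, and it returns voxel indices only: mapping them to metric camera coordinates depends on the dataset's depth range and camera intrinsics, which the abstract does not specify.

```python
import numpy as np


def heatmap_argmax_3d(heatmaps):
    """Return (J, 3) integer (x, y, z) voxel indices of each joint's peak.

    heatmaps: array of shape (J, D, H, W), one volume per joint, with axes
    ordered (depth z, row y, column x).
    """
    num_joints = heatmaps.shape[0]
    # Flatten each joint's volume and find the index of its maximum value.
    flat_idx = heatmaps.reshape(num_joints, -1).argmax(axis=1)
    # Convert flat indices back to (z, y, x) positions inside the volume.
    z, y, x = np.unravel_index(flat_idx, heatmaps.shape[1:])
    return np.stack([x, y, z], axis=1)
```

A hard argmax like this is not differentiable, which is why coordinate-supervised networks often use a soft-argmax instead; the abstract itself only mentions taking the maximum.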
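The background-replacement augmentation can be illustrated with simple alpha compositing. This sketch assumes the segmentation network outputs a per-pixel hand mask in [0, 1] and that the landscape photograph has already been resized to the training image's size; `composite_on_background` is a hypothetical helper, not part of the original work.

```python
import numpy as np


def composite_on_background(image, mask, background):
    """Paste the segmented hand from `image` onto `background`.

    image, background: (H, W, 3) uint8 arrays of the same size.
    mask: (H, W) array in [0, 1]; 1 where the segmentation marks hand pixels.
    """
    # Add a channel axis so the mask broadcasts over the RGB channels.
    alpha = mask.astype(np.float32)[..., None]
    blended = (alpha * image.astype(np.float32)
               + (1.0 - alpha) * background.astype(np.float32))
    # Round and clip back to the valid 8-bit range.
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)
```

Because the hand pixels keep their positions, the original 2D/3D joint labels remain valid for the composited image.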

Network Architecture:

Made by 黃世安