Multitask Learning on 3D Hand Pose Estimation
with Continuous Joints Heatmap
Abstract:
In recent years, deep learning algorithms have been accelerated by GPUs and
other acceleration hardware, and deep neural networks have achieved significant
improvements in a wide range of tasks. In areas such as basic image pre-processing, image segmentation,
face recognition, and voice recognition, they are gradually replacing traditional algorithms,
which shows that the rise of neural networks has driven broad changes in artificial intelligence.
In the field of 3D hand pose estimation, traditional approaches either require sensors attached
to the body or use random forest algorithms to predict joints. The drawbacks are that the former
needs additional equipment and the latter is not sufficiently accurate.
We propose a multi-task learning approach that uses 2D and 3D heatmaps to train a
single-stage 3D hand skeleton prediction network, which requires only one backbone network
to output the 2D and 3D heatmaps simultaneously. We believe that the joints of the same finger
are continuously related, so we modify the heatmap representation to predict 5 nodes in one heatmap
(i.e., the joints of the same finger are predicted in the same heatmap). We use this heatmap as a feature
to predict the 3D heatmaps of the left and right hands separately, and take the (x, y, z) coordinates
of the target at the location of the maximum value in the 3D heatmap.
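The two heatmap ideas above can be illustrated with a minimal NumPy sketch. It is not the network itself. The function names `finger_heatmap` and `decode_3d`, the Gaussian rendering, the pixel-wise maximum used to merge joints, and the (depth, height, width) volume layout are all assumptions made for illustration; the text above does not specify them.

```python
import numpy as np


def finger_heatmap(joints_xy, size=64, sigma=2.0):
    """Render all joints of one finger into a single 2D heatmap.

    Assumption: each joint is drawn as a Gaussian blob and the blobs are
    merged with a pixel-wise maximum. The source does not state how the
    per-finger heatmap is rendered.
    """
    ys, xs = np.mgrid[0:size, 0:size]
    heatmap = np.zeros((size, size), dtype=np.float64)
    for x, y in joints_xy:
        blob = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        heatmap = np.maximum(heatmap, blob)
    return heatmap


def decode_3d(heatmap_3d):
    """Return the (x, y, z) voxel index of the maximum of a 3D heatmap.

    Assumption: the volume is stored as (depth, height, width), so the
    flat argmax is unravelled to (z, y, x) and then reordered.
    """
    z, y, x = np.unravel_index(np.argmax(heatmap_3d), heatmap_3d.shape)
    return int(x), int(y), int(z)


# Five hypothetical joint positions along one finger, in pixel coordinates.
finger = [(10, 10), (14, 12), (18, 14), (22, 16), (26, 18)]
hm = finger_heatmap(finger)

# A toy 3D heatmap with a single peak, standing in for a network output.
volume = np.zeros((8, 16, 32))
volume[3, 5, 7] = 1.0
peak = decode_3d(volume)
```

Here `hm` holds all five joints of the finger in one channel, and `peak` is the voxel index of the strongest response. Converting that index to metric coordinates would depend on the dataset's calibration, which is outside this sketch.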
Since large hand datasets are mostly collected in the laboratory, we also propose a hand-segmentation technique
that improves on the basic encoder-decoder architecture. It segments the hands out of the dataset images and combines them
with various landscape photographs, so that a more robust network can be trained without being restricted to the backgrounds of the dataset.
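The background-replacement step can be sketched as a simple alpha blend. This is only an illustration of the compositing idea, assuming a segmentation mask is already available. The function name `composite_hand` and the array shapes are hypothetical, and the segmentation network itself is not shown.

```python
import numpy as np


def composite_hand(hand_rgb, hand_mask, background_rgb):
    """Paste a segmented hand onto a new background of the same size.

    hand_rgb, background_rgb: (H, W, 3) uint8 images.
    hand_mask: (H, W) array in [0, 1], 1 where the pixel belongs to the hand
    (assumed here to come from the segmentation network).
    """
    alpha = hand_mask.astype(np.float32)[..., None]  # (H, W, 1) for broadcasting
    blended = (alpha * hand_rgb.astype(np.float32)
               + (1.0 - alpha) * background_rgb.astype(np.float32))
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)


# Toy 4x4 example: a uniform "hand" image, a uniform "landscape", and a
# mask that selects the central 2x2 region.
hand = np.full((4, 4, 3), 200, dtype=np.uint8)
landscape = np.full((4, 4, 3), 50, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.float32)
mask[1:3, 1:3] = 1.0

augmented = composite_hand(hand, mask, landscape)
```

Because the hand pixels are not moved, the joint annotations of the original image remain valid for the composited image under this scheme.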
Network Architecture:
Made by 黃世安