Hanbin Dai - Academia.edu
Papers by Hanbin Dai
2019 16th International Computer Conference on Wavelet Active Media Technology and Information Processing, 2019
Conventional pose distillation methods use the teacher’s output and the label to co-supervise the student model. The architecture of the student model is fixed during training, making it impossible to obtain a model that is both compact and powerful. In this paper, we introduce the theory of evolution to pose distillation and propose the self-evolutionary pose distillation (SEPD) method, which not only improves the performance of the student model but also reduces its size. Specifically, SEPD treats the original model as the teacher and obtains the student by shrinking the teacher with a model reduction strategy. The student model is supervised jointly by the teacher’s output and the label. After optimization, the student is treated as the new teacher, and a new student is obtained by applying the model reduction strategy again. This student is then optimized in turn. A compact and strong model is obtained by repeating this procedure. Experiments on a challenging benchmark validate the effectiveness of our SEPD method.
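The shrink, train, promote cycle described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `shrink`, `train`, the mixing weight `alpha`, and the use of mean squared error are all assumptions made for the example, since the abstract does not specify the reduction strategy or the loss.

```python
from typing import Callable, Sequence, TypeVar

M = TypeVar("M")  # stands in for whatever object represents a pose network


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def joint_loss(student_out: Sequence[float], teacher_out: Sequence[float],
               label: Sequence[float], alpha: float = 0.5) -> float:
    """Blend a teacher-matching term with a label-matching term.

    alpha is an assumed mixing weight; the abstract does not give one.
    """
    return alpha * mse(student_out, teacher_out) + (1.0 - alpha) * mse(student_out, label)


def self_evolve(model: M, shrink: Callable[[M], M],
                train: Callable[[M, M], M], generations: int) -> M:
    """Repeat: shrink the teacher into a student, train it, promote it to teacher."""
    teacher = model
    for _ in range(generations):
        student = shrink(teacher)          # hypothetical model reduction step
        student = train(student, teacher)  # hypothetical joint-supervision training
        teacher = student                  # the student becomes the next teacher
    return teacher


# Toy run: a "model" is just its channel count, shrinking halves it,
# and training is a no-op placeholder.
final = self_evolve(64, lambda m: m // 2, lambda s, t: s, generations=3)
print(final)  # 8
```

In a real setting `train` would minimise something like `joint_loss` over a dataset; here it is left as a placeholder so the control flow stays visible.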
Cornell University - arXiv, Mar 13, 2020
In this paper, we focus on the coordinate representation in human pose estimation. While being the standard choice, the heatmap-based representation has not been systematically investigated. We found that the process of coordinate decoding (i.e. transforming the predicted heatmaps to the coordinates) is surprisingly significant for human pose estimation performance, which nevertheless was not recognised before. In light of the discovered importance, we further probe the design limitations of the standard coordinate decoding method and propose a principled distribution-aware decoding method. Meanwhile, we improve the standard coordinate encoding process (i.e. transforming ground-truth coordinates to heatmaps) by generating accurate heatmap distributions for unbiased model training. Taking them together, we formulate a novel Distribution-Aware coordinate Representation for Keypoint (DARK) method. Serving as a model-agnostic plug-in, DARK significantly improves the performance of a variety of state-of-the-art human pose estimation models. Extensive experiments show that DARK yields the best results on the COCO keypoint detection challenge, validating the usefulness and effectiveness of our novel coordinate representation idea. The project page containing more details is at https://ilovepose.github.io/coco/
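To make the decoding step concrete, the sketch below compares plain argmax decoding with one possible distribution-aware refinement. The abstract does not spell out the procedure, so the refinement here is an assumption: it treats the heatmap as roughly Gaussian near its peak and takes a single Newton step on the log-heatmap (a second-order Taylor expansion). All function names are hypothetical.

```python
import math


def gaussian_heatmap(width, height, cx, cy, sigma):
    """Synthetic heatmap: an unnormalised 2D Gaussian centered at (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)] for y in range(height)]


def argmax_decode(hm):
    """Standard decoding: integer location of the maximal activation."""
    best_x, best_y, best_v = 0, 0, hm[0][0]
    for y, row in enumerate(hm):
        for x, v in enumerate(row):
            if v > best_v:
                best_x, best_y, best_v = x, y, v
    return best_x, best_y


def refine_decode(hm, eps=1e-10):
    """Assumed refinement: one Newton step on log(heatmap) at the argmax."""
    x, y = argmax_decode(hm)
    h, w = len(hm), len(hm[0])
    if not (1 <= x < w - 1 and 1 <= y < h - 1):
        return float(x), float(y)  # no neighbours for finite differences

    def log_at(i, j):
        return math.log(max(hm[j][i], eps))

    # Central finite differences for the gradient and Hessian of the log-heatmap.
    dx = 0.5 * (log_at(x + 1, y) - log_at(x - 1, y))
    dy = 0.5 * (log_at(x, y + 1) - log_at(x, y - 1))
    dxx = log_at(x + 1, y) - 2.0 * log_at(x, y) + log_at(x - 1, y)
    dyy = log_at(x, y + 1) - 2.0 * log_at(x, y) + log_at(x, y - 1)
    dxy = 0.25 * (log_at(x + 1, y + 1) - log_at(x + 1, y - 1)
                  - log_at(x - 1, y + 1) + log_at(x - 1, y - 1))
    det = dxx * dyy - dxy * dxy
    if abs(det) < eps:
        return float(x), float(y)  # Hessian not invertible, keep the argmax
    # Offset = -H^{-1} g, written out for the 2x2 case.
    off_x = -(dyy * dx - dxy * dy) / det
    off_y = -(dxx * dy - dxy * dx) / det
    return x + off_x, y + off_y


hm = gaussian_heatmap(16, 16, cx=7.3, cy=4.6, sigma=2.0)
print(argmax_decode(hm))   # (7, 5): sub-pixel information is lost
print(refine_decode(hm))   # close to (7.3, 4.6) for an ideal Gaussian
```

On an ideal Gaussian the log-heatmap is exactly quadratic, so the single step recovers the true center. Predicted heatmaps are noisier, and a practical version would likely smooth them first.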
ACM Transactions on Multimedia Computing, Communications, and Applications, 2022
The performance of human pose estimation depends on the spatial accuracy of keypoint localization. Most existing methods pursue spatial accuracy by learning a high-resolution (HR) representation from input images. Through experimental analysis, we find that the HR representation leads to a sharp increase in computational cost, while the accuracy improvement remains marginal compared with the low-resolution (LR) representation. In this article, we propose a design paradigm for a cost-effective network with LR representation for efficient pose estimation, named FasterPose. Whereas the LR design largely shrinks the model complexity, how to effectively train the network with respect to spatial accuracy is a concomitant challenge. We study the training behavior of FasterPose and formulate a novel regressive cross-entropy (RCE) loss function for accelerating convergence and promoting accuracy. The RCE loss generalizes the ordinary cross-entropy loss from the binary sup...
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
While being the de facto standard coordinate representation for human pose estimation, the heatmap has not been investigated in depth. This work fills this gap. For the first time, we find that the process of decoding the predicted heatmaps into the final joint coordinates in the original image space is surprisingly significant for the performance. We further probe the design limitations of the standard coordinate decoding method, and propose a more principled distribution-aware decoding method. Also, we improve the standard coordinate encoding process (i.e. transforming ground-truth coordinates to heatmaps) by generating unbiased/accurate heatmaps. Taking the two together, we formulate a novel Distribution-Aware coordinate Representation of Keypoints (DARK) method. Serving as a model-agnostic plug-in, DARK brings about a significant performance boost to existing human pose estimation models. Extensive experiments show that DARK yields the best results on two common benchmarks, MPII and COCO. Besides, DARK achieved the 2nd place entry in the ICCV 2019 COCO Keypoints Challenge. The code is available online [36].
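The encoding side can be illustrated in the same spirit. The sketch below assumes that the bias in standard encoding comes from quantizing the downsampled ground-truth coordinate to the pixel grid before the Gaussian is placed, and that the unbiased variant centers the Gaussian at the real-valued location. The abstract does not state this explicitly, and the function names, `stride`, and `sigma` are illustrative.

```python
import math


def heatmap_center(x, y, stride, quantize):
    """Map an image-space keypoint to heatmap space, optionally snapping to the grid."""
    cx, cy = x / stride, y / stride
    if quantize:
        # Assumed "standard" behaviour: drop the fractional part.
        cx, cy = float(math.floor(cx)), float(math.floor(cy))
    return cx, cy


def encode(x, y, stride, width, height, sigma=2.0, quantize=False):
    """Render a Gaussian target heatmap for one keypoint."""
    cx, cy = heatmap_center(x, y, stride, quantize)
    return [[math.exp(-((u - cx) ** 2 + (v - cy) ** 2) / (2.0 * sigma ** 2))
             for u in range(width)] for v in range(height)]


# A keypoint at (117, 58) in the image, with a heatmap 4x smaller than the input.
print(heatmap_center(117, 58, stride=4, quantize=True))   # (29.0, 14.0)
print(heatmap_center(117, 58, stride=4, quantize=False))  # (29.25, 14.5)

biased = encode(117, 58, 4, width=48, height=64, quantize=True)
unbiased = encode(117, 58, 4, width=48, height=64, quantize=False)
print(biased[14][29])          # 1.0: the peak sits exactly on a grid cell
print(unbiased[14][29] < 1.0)  # True: the true peak lies between grid cells
```

In this toy case the quantized target is off by 0.25 and 0.5 heatmap pixels, which is 1 and 2 pixels in the original image at stride 4.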