Deep High-Resolution Representation Learning for Human Pose Estimation

Overview

Deep High-Resolution Representation Learning for Human Pose Estimation (accepted to CVPR2019)

News

Introduction

This is an official pytorch implementation of Deep High-Resolution Representation Learning for Human Pose Estimation. In this work, we are interested in the human pose estimation problem with a focus on learning reliable high-resolution representations. Most existing methods recover high-resolution representations from low-resolution representations produced by a high-to-low resolution network. Instead, our proposed network maintains high-resolution representations through the whole process. We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the mutli-resolution subnetworks in parallel. We conduct repeated multi-scale fusions such that each of the high-to-low resolution representations receives information from other parallel representations over and over, leading to rich high-resolution representations. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise. We empirically demonstrate the effectiveness of our network through the superior pose estimation results over two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset.

Illustrating the architecture of the proposed HRNet

Main Results

Results on MPII val

Arch Head Shoulder Elbow Wrist Hip Knee Ankle Mean [email protected]
pose_resnet_50 96.4 95.3 89.0 83.2 88.4 84.0 79.6 88.5 34.0
pose_resnet_101 96.9 95.9 89.5 84.4 88.4 84.5 80.7 89.1 34.0
pose_resnet_152 97.0 95.9 90.0 85.0 89.2 85.3 81.3 89.6 35.0
pose_hrnet_w32 97.1 95.9 90.3 86.4 89.1 87.1 83.3 90.3 37.7

Note:

Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset

Arch Input size #Params GFLOPs AP Ap .5 AP .75 AP (M) AP (L) AR AR .5 AR .75 AR (M) AR (L)
pose_resnet_50 256x192 34.0M 8.9 0.704 0.886 0.783 0.671 0.772 0.763 0.929 0.834 0.721 0.824
pose_resnet_50 384x288 34.0M 20.0 0.722 0.893 0.789 0.681 0.797 0.776 0.932 0.838 0.728 0.846
pose_resnet_101 256x192 53.0M 12.4 0.714 0.893 0.793 0.681 0.781 0.771 0.934 0.840 0.730 0.832
pose_resnet_101 384x288 53.0M 27.9 0.736 0.896 0.803 0.699 0.811 0.791 0.936 0.851 0.745 0.858
pose_resnet_152 256x192 68.6M 15.7 0.720 0.893 0.798 0.687 0.789 0.778 0.934 0.846 0.736 0.839
pose_resnet_152 384x288 68.6M 35.3 0.743 0.896 0.811 0.705 0.816 0.797 0.937 0.858 0.751 0.863
pose_hrnet_w32 256x192 28.5M 7.1 0.744 0.905 0.819 0.708 0.810 0.798 0.942 0.865 0.757 0.858
pose_hrnet_w32 384x288 28.5M 16.0 0.758 0.906 0.825 0.720 0.827 0.809 0.943 0.869 0.767 0.871
pose_hrnet_w48 256x192 63.6M 14.6 0.751 0.906 0.822 0.715 0.818 0.804 0.943 0.867 0.762 0.864
pose_hrnet_w48 384x288 63.6M 32.9 0.763 0.908 0.829 0.723 0.834 0.812 0.942 0.871 0.767 0.876

Note:

Results on COCO test-dev2017 with detector having human AP of 60.9 on COCO test-dev2017 dataset

Arch Input size #Params GFLOPs AP Ap .5 AP .75 AP (M) AP (L) AR AR .5 AR .75 AR (M) AR (L)
pose_resnet_152 384x288 68.6M 35.3 0.737 0.919 0.828 0.713 0.800 0.790 0.952 0.856 0.748 0.849
pose_hrnet_w48 384x288 63.6M 32.9 0.755 0.925 0.833 0.719 0.815 0.805 0.957 0.874 0.763 0.863
pose_hrnet_w48* 384x288 63.6M 32.9 0.770 0.927 0.845 0.734 0.831 0.820 0.960 0.886 0.778 0.877

Note:

Environment

The code is developed using python 3.6 on Ubuntu 16.04. NVIDIA GPUs are needed. The code is developed and tested using 4 NVIDIA P100 GPU cards. Other platforms or GPU cards are not fully tested.

Quick start

Installation

  1. Install pytorch >= v1.0.0 following official instruction. Note that if you use pytorch's version < v1.0.0, you should following the instruction at https://github.com/Microsoft/human-pose-estimation.pytorch to disable cudnn's implementations of BatchNorm layer. We encourage you to use higher pytorch's version(>=v1.0.0)

  2. Clone this repo, and we'll call the directory that you cloned as ${POSE_ROOT}.

  3. Install dependencies:

    pip install -r requirements.txt
    
  4. Make libs:

    cd ${POSE_ROOT}/lib
    make
    
  5. Install COCOAPI:

    # COCOAPI=/path/to/clone/cocoapi
    git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
    cd $COCOAPI/PythonAPI
    # Install into global site-packages
    make install
    # Alternatively, if you do not have permissions or prefer
    # not to install the COCO API into global site-packages
    python3 setup.py install --user
    

    Note that instructions like # COCOAPI=/path/to/install/cocoapi indicate that you should pick a path where you'd like to have the software cloned and then set an environment variable (COCOAPI in this case) accordingly.

  6. Init output(training model output directory) and log(tensorboard log directory) directory:

    mkdir output 
    mkdir log
    

    Your directory tree should look like this:

    ${POSE_ROOT}
    ├── data
    ├── experiments
    ├── lib
    ├── log
    ├── models
    ├── output
    ├── tools 
    ├── README.md
    └── requirements.txt
    
  7. Download pretrained models from our model zoo(GoogleDrive or OneDrive)

    ${POSE_ROOT}
     `-- models
         `-- pytorch
             |-- imagenet
             |   |-- hrnet_w32-36af842e.pth
             |   |-- hrnet_w48-8ef0771d.pth
             |   |-- resnet50-19c8e357.pth
             |   |-- resnet101-5d3b4d8f.pth
             |   `-- resnet152-b121ed2d.pth
             |-- pose_coco
             |   |-- pose_hrnet_w32_256x192.pth
             |   |-- pose_hrnet_w32_384x288.pth
             |   |-- pose_hrnet_w48_256x192.pth
             |   |-- pose_hrnet_w48_384x288.pth
             |   |-- pose_resnet_101_256x192.pth
             |   |-- pose_resnet_101_384x288.pth
             |   |-- pose_resnet_152_256x192.pth
             |   |-- pose_resnet_152_384x288.pth
             |   |-- pose_resnet_50_256x192.pth
             |   `-- pose_resnet_50_384x288.pth
             `-- pose_mpii
                 |-- pose_hrnet_w32_256x256.pth
                 |-- pose_hrnet_w48_256x256.pth
                 |-- pose_resnet_101_256x256.pth
                 |-- pose_resnet_152_256x256.pth
                 `-- pose_resnet_50_256x256.pth
    
    

Data preparation

For MPII data, please download from MPII Human Pose Dataset. The original annotation files are in matlab format. We have converted them into json format, you also need to download them from OneDrive or GoogleDrive. Extract them under {POSE_ROOT}/data, and make them look like this:

${POSE_ROOT}
|-- data
`-- |-- mpii
    `-- |-- annot
        |   |-- gt_valid.mat
        |   |-- test.json
        |   |-- train.json
        |   |-- trainval.json
        |   `-- valid.json
        `-- images
            |-- 000001163.jpg
            |-- 000003072.jpg

For COCO data, please download from COCO download, 2017 Train/Val is needed for COCO keypoints training and validation. We also provide person detection result of COCO val2017 and test-dev2017 to reproduce our multi-person pose estimation results. Please download from OneDrive or GoogleDrive. Download and extract them under {POSE_ROOT}/data, and make them look like this:

${POSE_ROOT}
|-- data
`-- |-- coco
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   `-- person_keypoints_val2017.json
        |-- person_detection_results
        |   |-- COCO_val2017_detections_AP_H_56_person.json
        |   |-- COCO_test-dev2017_detections_AP_H_609_person.json
        `-- images
            |-- train2017
            |   |-- 000000000009.jpg
            |   |-- 000000000025.jpg
            |   |-- 000000000030.jpg
            |   |-- ... 
            `-- val2017
                |-- 000000000139.jpg
                |-- 000000000285.jpg
                |-- 000000000632.jpg
                |-- ... 

Training and Testing

Testing on MPII dataset using model zoo's models(GoogleDrive or OneDrive)

python tools/test.py \
    --cfg experiments/mpii/hrnet/w32_256x256_adam_lr1e-3.yaml \
    TEST.MODEL_FILE models/pytorch/pose_mpii/pose_hrnet_w32_256x256.pth

Training on MPII dataset

python tools/train.py \
    --cfg experiments/mpii/hrnet/w32_256x256_adam_lr1e-3.yaml

Testing on COCO val2017 dataset using model zoo's models(GoogleDrive or OneDrive)

python tools/test.py \
    --cfg experiments/coco/hrnet/w32_256x192_adam_lr1e-3.yaml \
    TEST.MODEL_FILE models/pytorch/pose_coco/pose_hrnet_w32_256x192.pth \
    TEST.USE_GT_BBOX False

Training on COCO train2017 dataset

python tools/train.py \
    --cfg experiments/coco/hrnet/w32_256x192_adam_lr1e-3.yaml \

Other applications

Many other dense prediction tasks, such as segmentation, face alignment and object detection, etc. have been benefited by HRNet. More information can be found at Deep High-Resolution Representation Learning.

Citation

If you use our code or models in your research, please cite with:

@inproceedings{sun2019deep,
  title={Deep High-Resolution Representation Learning for Human Pose Estimation},
  author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
  booktitle={CVPR},
  year={2019}
}

@inproceedings{xiao2018simple,
    author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
    title={Simple Baselines for Human Pose Estimation and Tracking},
    booktitle = {European Conference on Computer Vision (ECCV)},
    year = {2018}
}
Owner
HRNet
Code for pose estimation is available at https://github.com/leoxiaobin/deep-high-resolution-net.pytorch
HRNet
DR-GAN: Automatic Radial Distortion Rectification Using Conditional GAN in Real-Time

DR-GAN: Automatic Radial Distortion Rectification Using Conditional GAN in Real-Time Introduction This is official implementation for DR-GAN (IEEE TCS

Kang Liao 18 Dec 23, 2022
AIR^2 for Interaction Prediction

This is the repository for AIR^2 for Interaction Prediction. Explanation of the solution: Video: link License AIR is released under the Apache 2.0 lic

21 Sep 27, 2022
Official code repository for the EMNLP 2021 paper

Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization PyTorch code for the EMNLP 2021 paper "Integrating Visuospatia

Adyasha Maharana 23 Dec 19, 2022
An example of time series augmentation methods with Keras

Time Series Augmentation This is a collection of time series data augmentation methods and an example use using Keras. News 2020/04/16: Repository Cre

九州大学 ヒューマンインタフェース研究室 229 Jan 02, 2023
The code release of paper 'Domain Generalization for Medical Imaging Classification with Linear-Dependency Regularization' NIPS 2020.

Domain Generalization for Medical Imaging Classification with Linear Dependency Regularization The code release of paper 'Domain Generalization for Me

Yufei Wang 56 Dec 28, 2022
Microscopy Image Cytometry Toolkit

Cytokit Cytokit is a collection of tools for quantifying and analyzing properties of individual cells in large fluorescent microscopy datasets with a

Hammer Lab 106 Jan 06, 2023
RGB-stacking 🛑 🟩 🔷 for robotic manipulation

RGB-stacking 🛑 🟩 🔷 for robotic manipulation BLOG | PAPER | VIDEO Beyond Pick-and-Place: Tackling Robotic Stacking of Diverse Shapes, Alex X. Lee*,

DeepMind 95 Dec 23, 2022
PyTorch implementation of ECCV 2020 paper "Foley Music: Learning to Generate Music from Videos "

Foley Music: Learning to Generate Music from Videos This repo holds the code for the framework presented on ECCV 2020. Foley Music: Learning to Genera

Chuang Gan 30 Nov 03, 2022
Implementation of "Debiasing Item-to-Item Recommendations With Small Annotated Datasets" (RecSys '20)

Debiasing Item-to-Item Recommendations With Small Annotated Datasets This is the code for our RecSys '20 paper. Other materials can be found here: Ful

Microsoft 34 Aug 10, 2022
Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis

Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis Website | ICCV paper | arXiv | Twitter This repository contains the official i

Ajay Jain 73 Dec 27, 2022
[ICLR'21] Counterfactual Generative Networks

This repository contains the code for the ICLR 2021 paper "Counterfactual Generative Networks" by Axel Sauer and Andreas Geiger. If you want to take the CGN for a spin and generate counterfactual ima

88 Jan 02, 2023
MediaPipe Kullanarak İleri Seviye Bilgisayarla Görü

MediaPipe Kullanarak İleri Seviye Bilgisayarla Görü

Burak Bagatarhan 12 Mar 29, 2022
StyleGAN2-ada for practice

This version of the newest PyTorch-based StyleGAN2-ada is intended mostly for fellow artists, who rarely look at scientific metrics, but rather need a working creative tool. Tested on Python 3.7 + Py

vadim epstein 170 Nov 16, 2022
High accurate tool for automatic faces detection with landmarks

faces_detanator High accurate tool for automatic faces detection with landmarks. The library is based on public detectors with high accuracy (TinaFace

Ihar 7 May 10, 2022
A Python implementation of the Locality Preserving Matching (LPM) method for pruning outliers in image matching.

LPM_Python A Python implementation of the Locality Preserving Matching (LPM) method for pruning outliers in image matching. The code is established ac

AoxiangFan 11 Nov 07, 2022
CvT2DistilGPT2 is an encoder-to-decoder model that was developed for chest X-ray report generation.

CvT2DistilGPT2 Improving Chest X-Ray Report Generation by Leveraging Warm-Starting This repository houses the implementation of CvT2DistilGPT2 from [1

The Australian e-Health Research Centre 21 Dec 28, 2022
SE3 Pose Interp - Interpolate camera pose or trajectory in SE3, pose interpolation, trajectory interpolation

SE3 Pose Interpolation Pose estimated from SLAM system are always discrete, and

Ran Cheng 4 Dec 15, 2022
Voxel-based Network for Shape Completion by Leveraging Edge Generation (ICCV 2021, oral)

Voxel-based Network for Shape Completion by Leveraging Edge Generation This is the PyTorch implementation for the paper "Voxel-based Network for Shape

10 Dec 04, 2022
Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT

CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT CheXbert is an accurate, automated dee

Stanford Machine Learning Group 51 Dec 08, 2022
AI Virtual Calculator: This is a simple virtual calculator based on Artificial intelligence.

AI Virtual Calculator: This is a simple virtual calculator that works with gestures using OpenCV. We will use our hand in the air to click on the calc

Md. Rakibul Islam 1 Jan 13, 2022