Simple Baselines for Human Pose Estimation and Tracking

Overview

Simple Baselines for Human Pose Estimation and Tracking

News

Introduction

This is an official pytorch implementation of Simple Baselines for Human Pose Estimation and Tracking. This work provides baseline methods that are surprisingly simple and effective, thus helpful for inspiring and evaluating new ideas for the field. State-of-the-art results are achieved on challenging benchmarks. On COCO keypoints valid dataset, our best single model achieves 74.3 of mAP. You can reproduce our results using this repo. All models are provided for research purpose.

Main Results

Results on MPII val

Arch Head Shoulder Elbow Wrist Hip Knee Ankle Mean [email protected]
256x256_pose_resnet_50_d256d256d256 96.351 95.329 88.989 83.176 88.420 83.960 79.594 88.532 33.911
384x384_pose_resnet_50_d256d256d256 96.658 95.754 89.790 84.614 88.523 84.666 79.287 89.066 38.046
256x256_pose_resnet_101_d256d256d256 96.862 95.873 89.518 84.376 88.437 84.486 80.703 89.131 34.020
384x384_pose_resnet_101_d256d256d256 96.965 95.907 90.268 85.780 89.597 85.935 82.098 90.003 38.860
256x256_pose_resnet_152_d256d256d256 97.033 95.941 90.046 84.976 89.164 85.311 81.271 89.620 35.025
384x384_pose_resnet_152_d256d256d256 96.794 95.618 90.080 86.225 89.700 86.862 82.853 90.200 39.433

Note:

  • Flip test is used.

Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset

Arch AP Ap .5 AP .75 AP (M) AP (L) AR AR .5 AR .75 AR (M) AR (L)
256x192_pose_resnet_50_d256d256d256 0.704 0.886 0.783 0.671 0.772 0.763 0.929 0.834 0.721 0.824
384x288_pose_resnet_50_d256d256d256 0.722 0.893 0.789 0.681 0.797 0.776 0.932 0.838 0.728 0.846
256x192_pose_resnet_101_d256d256d256 0.714 0.893 0.793 0.681 0.781 0.771 0.934 0.840 0.730 0.832
384x288_pose_resnet_101_d256d256d256 0.736 0.896 0.803 0.699 0.811 0.791 0.936 0.851 0.745 0.858
256x192_pose_resnet_152_d256d256d256 0.720 0.893 0.798 0.687 0.789 0.778 0.934 0.846 0.736 0.839
384x288_pose_resnet_152_d256d256d256 0.743 0.896 0.811 0.705 0.816 0.797 0.937 0.858 0.751 0.863

Results on Caffe-style ResNet

Arch AP Ap .5 AP .75 AP (M) AP (L) AR AR .5 AR .75 AR (M) AR (L)
256x192_pose_resnet_50_caffe_d256d256d256 0.704 0.914 0.782 0.677 0.744 0.735 0.921 0.805 0.704 0.783
256x192_pose_resnet_101_caffe_d256d256d256 0.720 0.915 0.803 0.693 0.764 0.753 0.928 0.821 0.720 0.802
256x192_pose_resnet_152_caffe_d256d256d256 0.728 0.925 0.804 0.702 0.766 0.760 0.931 0.828 0.729 0.806

Note:

  • Flip test is used.
  • Person detector has person AP of 56.4 on COCO val2017 dataset.
  • Difference between PyTorch-style and Caffe-style ResNet is the position of stride=2 convolution

Environment

The code is developed using python 3.6 on Ubuntu 16.04. NVIDIA GPUs are needed. The code is developed and tested using 4 NVIDIA P100 GPU cards. Other platforms or GPU cards are not fully tested.

Quick start

Installation

  1. Install pytorch >= v0.4.0 following official instruction.

  2. Disable cudnn for batch_norm:

    # PYTORCH=/path/to/pytorch
    # for pytorch v0.4.0
    sed -i "1194s/torch\.backends\.cudnn\.enabled/False/g" ${PYTORCH}/torch/nn/functional.py
    # for pytorch v0.4.1
    sed -i "1254s/torch\.backends\.cudnn\.enabled/False/g" ${PYTORCH}/torch/nn/functional.py
    

    Note that instructions like # PYTORCH=/path/to/pytorch indicate that you should pick a path where you'd like to have pytorch installed and then set an environment variable (PYTORCH in this case) accordingly.

  3. Clone this repo, and we'll call the directory that you cloned as ${POSE_ROOT}.

  4. Install dependencies:

    pip install -r requirements.txt
    
  5. Make libs:

    cd ${POSE_ROOT}/lib
    make
    
  6. Install COCOAPI:

    # COCOAPI=/path/to/clone/cocoapi
    git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
    cd $COCOAPI/PythonAPI
    # Install into global site-packages
    make install
    # Alternatively, if you do not have permissions or prefer
    # not to install the COCO API into global site-packages
    python3 setup.py install --user
    

    Note that instructions like # COCOAPI=/path/to/install/cocoapi indicate that you should pick a path where you'd like to have the software cloned and then set an environment variable (COCOAPI in this case) accordingly.

  7. Download pytorch imagenet pretrained models from pytorch model zoo and caffe-style pretrained models from GoogleDrive.

  8. Download mpii and coco pretrained models from OneDrive or GoogleDrive. Please download them under ${POSE_ROOT}/models/pytorch, and make them look like this:

    ${POSE_ROOT}
     `-- models
         `-- pytorch
             |-- imagenet
             |   |-- resnet50-19c8e357.pth
             |   |-- resnet50-caffe.pth.tar
             |   |-- resnet101-5d3b4d8f.pth
             |   |-- resnet101-caffe.pth.tar
             |   |-- resnet152-b121ed2d.pth
             |   `-- resnet152-caffe.pth.tar
             |-- pose_coco
             |   |-- pose_resnet_101_256x192.pth.tar
             |   |-- pose_resnet_101_384x288.pth.tar
             |   |-- pose_resnet_152_256x192.pth.tar
             |   |-- pose_resnet_152_384x288.pth.tar
             |   |-- pose_resnet_50_256x192.pth.tar
             |   `-- pose_resnet_50_384x288.pth.tar
             `-- pose_mpii
                 |-- pose_resnet_101_256x256.pth.tar
                 |-- pose_resnet_101_384x384.pth.tar
                 |-- pose_resnet_152_256x256.pth.tar
                 |-- pose_resnet_152_384x384.pth.tar
                 |-- pose_resnet_50_256x256.pth.tar
                 `-- pose_resnet_50_384x384.pth.tar
    
    
  9. Init output(training model output directory) and log(tensorboard log directory) directory:

    mkdir output 
    mkdir log
    

    Your directory tree should look like this:

    ${POSE_ROOT}
    ├── data
    ├── experiments
    ├── lib
    ├── log
    ├── models
    ├── output
    ├── pose_estimation
    ├── README.md
    └── requirements.txt
    

Data preparation

For MPII data, please download from MPII Human Pose Dataset. The original annotation files are in matlab format. We have converted them into json format, you also need to download them from OneDrive or GoogleDrive. Extract them under {POSE_ROOT}/data, and make them look like this:

${POSE_ROOT}
|-- data
`-- |-- mpii
    `-- |-- annot
        |   |-- gt_valid.mat
        |   |-- test.json
        |   |-- train.json
        |   |-- trainval.json
        |   `-- valid.json
        `-- images
            |-- 000001163.jpg
            |-- 000003072.jpg

For COCO data, please download from COCO download, 2017 Train/Val is needed for COCO keypoints training and validation. We also provide person detection result of COCO val2017 to reproduce our multi-person pose estimation results. Please download from OneDrive or GoogleDrive. Download and extract them under {POSE_ROOT}/data, and make them look like this:

${POSE_ROOT}
|-- data
`-- |-- coco
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   `-- person_keypoints_val2017.json
        |-- person_detection_results
        |   |-- COCO_val2017_detections_AP_H_56_person.json
        `-- images
            |-- train2017
            |   |-- 000000000009.jpg
            |   |-- 000000000025.jpg
            |   |-- 000000000030.jpg
            |   |-- ... 
            `-- val2017
                |-- 000000000139.jpg
                |-- 000000000285.jpg
                |-- 000000000632.jpg
                |-- ... 

Valid on MPII using pretrained models

python pose_estimation/valid.py \
    --cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml \
    --flip-test \
    --model-file models/pytorch/pose_mpii/pose_resnet_50_256x256.pth.tar

Training on MPII

python pose_estimation/train.py \
    --cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml

Valid on COCO val2017 using pretrained models

python pose_estimation/valid.py \
    --cfg experiments/coco/resnet50/256x192_d256x3_adam_lr1e-3.yaml \
    --flip-test \
    --model-file models/pytorch/pose_coco/pose_resnet_50_256x192.pth.tar

Training on COCO train2017

python pose_estimation/train.py \
    --cfg experiments/coco/resnet50/256x192_d256x3_adam_lr1e-3.yaml

Other Implementations

Citation

If you use our code or models in your research, please cite with:

@inproceedings{xiao2018simple,
    author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
    title={Simple Baselines for Human Pose Estimation and Tracking},
    booktitle = {European Conference on Computer Vision (ECCV)},
    year = {2018}
}
Owner
Microsoft
Open source projects and samples from Microsoft
Microsoft
Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"

Mask2Former: Masked-attention Mask Transformer for Universal Image Segmentation Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Ro

Meta Research 1.2k Jan 02, 2023
SPTAG: A library for fast approximate nearest neighbor search

SPTAG: A library for fast approximate nearest neighbor search SPTAG SPTAG (Space Partition Tree And Graph) is a library for large scale vector approxi

Microsoft 4.3k Jan 01, 2023
Creating a custom CNN hypertunned architeture for the Fashion MNIST dataset with Python, Keras and Tensorflow.

custom-cnn-fashion-mnist Creating a custom CNN hypertunned architeture for the Fashion MNIST dataset with Python, Keras and Tensorflow. The following

Danielle Almeida 1 Mar 05, 2022
Robot Servers and Server Manager software for robo-gym

robo-gym-server-modules Robot Servers and Server Manager software for robo-gym. For info on how to use this package please visit the robo-gym website

JR ROBOTICS 4 Aug 16, 2021
TensorFlow2 Classification Model Zoo playing with TensorFlow2 on the CIFAR-10 dataset.

Training CIFAR-10 with TensorFlow2(TF2) TensorFlow2 Classification Model Zoo. I'm playing with TensorFlow2 on the CIFAR-10 dataset. Architectures LeNe

Chia-Hung Yuan 16 Sep 27, 2022
The goal of the exercises below is to evaluate the candidate knowledge and problem solving expertise regarding the main development focuses for the iFood ML Platform team: MLOps and Feature Store development.

The goal of the exercises below is to evaluate the candidate knowledge and problem solving expertise regarding the main development focuses for the iFood ML Platform team: MLOps and Feature Store dev

George Rocha 0 Feb 03, 2022
Unofficial implementation of Google "CutPaste: Self-Supervised Learning for Anomaly Detection and Localization" in PyTorch

CutPaste CutPaste: image from paper Unofficial implementation of Google's "CutPaste: Self-Supervised Learning for Anomaly Detection and Localization"

Lilit Yolyan 59 Nov 27, 2022
Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue

Realtime Unsupervised Depth Estimation from an Image This is the caffe implementation of our paper "Unsupervised CNN for single view depth estimation:

Ravi Garg 227 Nov 28, 2022
[ICCV2021] Learning to Track Objects from Unlabeled Videos

Unsupervised Single Object Tracking (USOT) 🌿 Learning to Track Objects from Unlabeled Videos Jilai Zheng, Chao Ma, Houwen Peng and Xiaokang Yang 2021

53 Dec 28, 2022
A simple and lightweight genetic algorithm for optimization of any machine learning model

geneticml This package contains a simple and lightweight genetic algorithm for optimization of any machine learning model. Installation Use pip to ins

Allan Barcelos 8 Aug 10, 2022
NeRF visualization library under construction

NeRF visualization library using PlenOctrees, under construction pip install nerfvis Docs will be at: https://nerfvis.readthedocs.org import nerfvis s

Alex Yu 196 Jan 04, 2023
This repository contains the code used in the paper "Prompt-Based Multi-Modal Image Segmentation".

Prompt-Based Multi-Modal Image Segmentation This repository contains the code used in the paper "Prompt-Based Multi-Modal Image Segmentation". The sys

Timo Lüddecke 305 Dec 30, 2022
Computational Pathology Toolbox developed by TIA Centre, University of Warwick.

TIA Toolbox Computational Pathology Toolbox developed at the TIA Centre Getting Started All Users This package is for those interested in digital path

Tissue Image Analytics (TIA) Centre 156 Jan 08, 2023
A simple root calculater for python

Root A simple root calculater Usage/Examples python3 root.py 9 3 4 # Order: number - grid - number of decimals # Output: 2.08

Reza Hosseinzadeh 5 Feb 10, 2022
MazeRL is an application oriented Deep Reinforcement Learning (RL) framework

MazeRL is an application oriented Deep Reinforcement Learning (RL) framework, addressing real-world decision problems. Our vision is to cover the complete development life cycle of RL applications ra

EnliteAI GmbH 222 Dec 24, 2022
PG2Net: Personalized and Group PreferenceGuided Network for Next Place Prediction

PG2Net PG2Net:Personalized and Group Preference Guided Network for Next Place Prediction Datasets Experiment results on two Foursquare check-in datase

Urban Mobility 5 Dec 20, 2022
A python-image-classification web application project, written in Python and served through the Flask Microframework

A python-image-classification web application project, written in Python and served through the Flask Microframework. This Project implements the VGG16 covolutional neural network, through Keras and

Gerald Maduabuchi 19 Dec 12, 2022
An offline deep reinforcement learning library

d3rlpy: An offline deep reinforcement learning library d3rlpy is an offline deep reinforcement learning library for practitioners and researchers. imp

Takuma Seno 817 Jan 02, 2023
Composable transformations of Python+NumPy programsComposable transformations of Python+NumPy programs

Chex Chex is a library of utilities for helping to write reliable JAX code. This includes utils to help: Instrument your code (e.g. assertions) Debug

DeepMind 506 Jan 08, 2023
Look Who’s Talking: Active Speaker Detection in the Wild

Look Who's Talking: Active Speaker Detection in the Wild Dependencies pip install -r requirements.txt In addition to the Python dependencies, ffmpeg

Clova AI Research 60 Dec 08, 2022