[ICCV 2021] Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation

Last update: Dec 15, 2022

Overview

MAED: Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation

Getting Started

Our code is implemented and tested with Python 3.6 and PyTorch 1.5.

Install PyTorch following the official guide on the PyTorch website.

Then install the requirements inside a virtualenv or conda environment:

pip install -r requirements.txt
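
Before training, it can help to confirm that the active environment matches the versions listed above. The following is a minimal sketch and is not part of the repository: check_environment is a hypothetical helper, and it only reports versions rather than enforcing them.

```python
import importlib.util
import sys


def check_environment(expected_python=(3, 6), expected_torch="1.5"):
    """Report whether Python and PyTorch match the versions this README was tested with."""
    report = {
        "python": "%d.%d" % sys.version_info[:2],
        "python_matches": sys.version_info[:2] == tuple(expected_python),
    }

    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment.
        report["torch"] = None
        report["torch_matches"] = False
    else:
        import torch

        # Compare only major.minor, so "1.5.0+cu101" still counts as 1.5.
        major_minor = torch.__version__.split(".")[:2]
        report["torch"] = torch.__version__
        report["torch_matches"] = major_minor == expected_torch.split(".")
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(key, "=", value)
```

A mismatch does not necessarily mean the code will fail; it only means you are outside the tested configuration.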

Data Preparation

Refer to data.md for instructions.

Training

Stage 1 training

In general, you can use PyTorch's distributed launch script to start training.

For example, to train on 2 nodes with 4 GPUs each (2x4=8 GPUs total), run on node 0:

python -u -m torch.distributed.launch \
    --nnodes=2 \
    --node_rank=0 \
    --nproc_per_node=4 \
    --master_port=<MASTER_PORT> \
    --master_addr=<MASTER_NODE_ID> \
    --use_env \
    train.py --cfg configs/config_stage1.yaml

On node 1, run:

python -u -m torch.distributed.launch \
    --nnodes=2 \
    --node_rank=1 \
    --nproc_per_node=4 \
    --master_port=<MASTER_PORT> \
    --master_addr=<MASTER_NODE_ID> \
    --use_env \
    train.py --cfg configs/config_stage1.yaml
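
With --use_env, torch.distributed.launch passes each worker its local rank through the LOCAL_RANK environment variable instead of a --local_rank argument, alongside RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT. The sketch below shows how a script could read those variables and how the global rank follows from the flags above. read_dist_env and expected_rank are hypothetical helpers, the fallback defaults are assumptions for single-process runs, and how train.py actually consumes these values is not shown in this README.

```python
import os


def read_dist_env(env=None):
    """Read the variables set by the launcher; defaults (assumed) describe a single process."""
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", 0)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
        "master_addr": env.get("MASTER_ADDR", "127.0.0.1"),
        "master_port": int(env.get("MASTER_PORT", 29500)),
    }


def expected_rank(node_rank, nproc_per_node, local_rank):
    """Global rank of a worker given --node_rank, --nproc_per_node and its local rank."""
    return node_rank * nproc_per_node + local_rank


# Illustrative values for the second GPU on node 1 in the 2x4 example above.
example_env = {
    "RANK": "5",
    "LOCAL_RANK": "1",
    "WORLD_SIZE": "8",
    "MASTER_ADDR": "10.0.0.1",  # placeholder address
    "MASTER_PORT": "29500",     # placeholder port
}
info = read_dist_env(example_env)
```

In the 2-node example, node 0 hosts global ranks 0 to 3 and node 1 hosts ranks 4 to 7, which is why the two commands differ only in --node_rank.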

If you use a task scheduling system such as Slurm to submit training jobs, you can use this script instead:

# training on 2 nodes, 4 gpus each (2x4=8 gpus total)
sh scripts/run.sh 2 4 configs/config_stage1.yaml
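
This README does not show what scripts/run.sh does internally. As a rough sketch of one common approach, a job started with srun can derive the same rank information from Slurm's own per-task environment variables (SLURM_PROCID, SLURM_NTASKS, SLURM_LOCALID). slurm_to_dist_env below is a hypothetical helper and is not taken from the repository.

```python
def slurm_to_dist_env(slurm_env):
    """Map Slurm per-task variables to the names used by PyTorch's launcher.

    Assumes one Slurm task per GPU, which is an assumption about the job layout.
    """
    return {
        "RANK": slurm_env["SLURM_PROCID"],         # global task index
        "WORLD_SIZE": slurm_env["SLURM_NTASKS"],   # total number of tasks
        "LOCAL_RANK": slurm_env["SLURM_LOCALID"],  # task index within its node
    }


# Illustrative values for the second task on the second node of a 2x4 job.
mapped = slurm_to_dist_env(
    {"SLURM_PROCID": "5", "SLURM_NTASKS": "8", "SLURM_LOCALID": "1"}
)
```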

Training checkpoints are saved in results/ by default. You can change this directory in the config file.

Stage 2 training

Use the last checkpoint of stage 1 to initialize the model, then start stage 2 training.

# On Node 0.
python -u -m torch.distributed.launch \
    --nnodes=2 \
    --node_rank=0 \
    --nproc_per_node=4 \
    --master_port=<MASTER_PORT> \
    --master_addr=<MASTER_NODE_ID> \
    --use_env \
    train.py --cfg configs/config_stage2.yaml --pretrained <PATH_TO_CHECKPOINT_FILE>

On node 1, run the same command with --node_rank=1.
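
Picking the stage 1 checkpoint by hand is easy to get wrong when several runs share an output directory. A minimal sketch that returns the most recently modified checkpoint under a directory follows. find_latest_checkpoint is a hypothetical helper, treating "last" as "most recently modified" is an assumption, and the *.pth.tar pattern is an assumption based on the model_best.pth.tar file name used in the evaluation command, so adjust it to whatever your run actually writes.

```python
from pathlib import Path


def find_latest_checkpoint(root="results", pattern="*.pth.tar"):
    """Return the newest file under root matching pattern, or None if there is none."""
    root = Path(root)
    if not root.is_dir():
        return None
    candidates = [p for p in root.rglob(pattern) if p.is_file()]
    if not candidates:
        return None
    # Newest by modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

The returned path can then be passed as the --pretrained argument in the command above.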

Evaluation

To evaluate a model on the 3DPW test set:

python eval.py --cfg <PATH_TO_EXPERIMENT>/config.yaml --checkpoint <PATH_TO_EXPERIMENT>/model_best.pth.tar --eval_set 3dpw

The evaluation metric is Procrustes-Aligned Mean Per Joint Position Error (PA-MPJPE), in mm. Lower is better for every column in the table below.

Models	PA-MPJPE ↓	MPJPE ↓	PVE ↓	ACCEL ↓
HMR (w/o 3DPW)	81.3	130.0	-	37.4
SPIN (w/o 3DPW)	59.2	96.9	116.4	29.8
MEVA (w/ 3DPW)	54.7	86.9	-	11.6
VIBE (w/o 3DPW)	56.5	93.5	113.4	27.1
VIBE (w/ 3DPW)	51.9	82.9	99.1	23.4
ours (w/o 3DPW)	50.7	88.8	104.5	18.0
ours (w/ 3DPW)	45.7	79.1	92.6	17.6
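
For readers unfamiliar with the metric, PA-MPJPE first aligns the predicted joints to the ground truth with the best-fitting similarity transform (scale, rotation, translation) and then averages the per-joint distances. The NumPy sketch below illustrates that definition for a single frame. It is not the code in eval.py, and the function names are hypothetical.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean Euclidean distance between matching joints; inputs are (J, 3) arrays."""
    return float(np.linalg.norm(pred - gt, axis=1).mean())


def procrustes_align(pred, gt):
    """Return pred after the least-squares similarity transform that fits it to gt."""
    mu_pred, mu_gt = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_pred, gt - mu_gt

    # SVD of the 3x3 cross-covariance between the centered point sets.
    U, S, Vt = np.linalg.svd(p.T @ g)

    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T

    # Optimal isotropic scale for the centered points.
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    return scale * p @ R.T + mu_gt


def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment, in the same units as the inputs."""
    return mpjpe(procrustes_align(pred, gt), gt)
```

If joints are stored in meters, multiply the result by 1000 to report mm as in the table. Because the alignment removes global scale, rotation, and translation, PA-MPJPE is never larger than the best similarity-aligned error and is typically lower than plain MPJPE, which matches the pattern in the table.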

Citation

@inproceedings{wan2021,
  title={Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation},
  author={Ziniu Wan and Zhengjia Li and Maoqing Tian and Jianbo Liu and Shuai Yi and Hongsheng Li},
  booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
  year = {2021}
}

Owner

ZiNiU WaN
