The official code for the ICCV-2021 paper "Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates".

Overview

SpeechDrivesTemplates

The official repo for the ICCV-2021 paper "Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates".

[arxiv / video]

Our paper and this repo focus on upper-body pose generation from audio. To synthesize images from poses, please refer to this Pose2Img repo.

  • Code
  • Model
  • Data preparation

Package Hierarchy

|-- config
|     |-- default.py
|     |-- voice2pose_s2g_speech2gesture.yaml        # baseline: speech2gesture
|     |-- voice2pose_sdt_vae_speech2gesture.yaml    # ours (VAE)
|     |-- pose2pose_speech2gesture.yaml             # gesture reconstruction  
|     `-- voice2pose_sdt_bp_speech2gesture.yaml     # ours (Backprop)
|
|-- core
|     |-- datasets
|     |-- netowrks
|     |-- pipelines
|     \-- utils
|
|-- dataset
|     \-- speech2gesture  # create a soft link here
|
|-- output
|     \-- <date-config-tag>  # A directory for each experiment
|
`-- main.py

Setup the Dataset

Datasets shuold be placed in the dataset directory. Just create a soft link like this:

ln -s <path-to-SPEECH2GESTURE-dataset> ./dataset/speech2gesture

For your own dataset, you need to implement a subclass of torch.utils.data.Dataset in core/datasets/custom_dataset.py.

Train

Train a Model from Scratch

python main.py --config_file configs/voice2pose_sdt_bp_speech2gesture.yaml \
    --tag DEV \
    SYS.NUM_WORKERS 32
  • --tag set the name of the experiment which wil be displayed in the outputfile.
  • You can overwrite the any parameters defined in voice2pose_default.py by simply adding it at the end of the command. The example above set SYS.NUM_WORKERS to 32 temporarily.

Resume Training from an Interrupted Experiment

python main.py --config_file configs/voice2pose_sdt_bp_speech2gesture.yaml \
    --resume_from <checkpoint-to-continue-from>
  • This command will load the state_dict from the checkpoint for both the model and the optimizer, and write results to the original directory that the checkpoint lies in.

Training from a pretrained model

python main.py --config_file configs/voice2pose_sdt_bp_speech2gesture.yaml \
    --pretrain_from <checkpoint-to-continue-from> \
    --tag DEV
  • This command will only load the state_dict for the model, and write results to a new base directory.

Test

To test the model, run this command:

python main.py --config_file configs/voice2pose_sdt_bp_speech2gesture.yaml \
    --tag DEV \
    --test-only \
    --checkpoint <path-to-checkpoint>

Demo

python main.py --config_file configs/voice2pose_sdt_bp_speech2gesture.yaml \
    --tag <DEV> \
    --demo_input <audio.wav> \
    --checkpoint <path-to-checkpoint> \
    DATASET.SPEAKER oliver \
    SYS.VIDEO_FORMAT "['mp4']"

Important Details

Dataset caching

We turn on dataset caching (DATASET.CACHING) by default to speed up training.

If you encounter errors in the dataloader like RuntimeError: received 0 items of ancdata, please increase ulimit by running the command ulimit -n 262144. (refer to this issue)

DataParallel and DistributedDataParallel

We use single GPU (warpped by DataParallel) by default since it is fast enough with dataset caching. For multi-GPU training, we recommand using DistributedDataParallel (DDP) because it provide SyncBN across GPU cards. To enable DDP, set SYS.DISTRIBUTED to True and set SYS.WORLD_SIZE according to the number of GPUs.

When using DDP, assure that the batch_size can be divided exactly by SYS.WORLD_SIZE.

Misc

  • To run any module other than the main files in the root directory, for example the core\datasets\speech2gesture.py file, you should run python -m core.datasets.speech2gesture rather than python core\datasets\speech2gesture.py. This is an interesting problem of Python's relative importing which deserves in-depth thinking.
  • We save a checkpoint and conduct validation after each epoch. You can change the interval in the config file.
  • We generate and save 2 videos in each epoch when training. During validation, we sample 8 videos for each epoch. These videos are saved in tensorborad (without sound) and mp4 (with sound). You can change the SYS.VIDEO_FORMAT parameter to select one or two of them.
  • We usually sett NUM_WORKERS to 32 for best performance. If you encounter any error about memory, try lower NUM_WORKERS.
@inproceedings{qian2021speech,
  title={Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates},
  author={Qian, Shenhan and Tu, Zhi and Zhi, YiHao and Liu, Wen and Gao, Shenghua},
  journal={International Conference on Computer Vision (ICCV)},
  year={2021}
}
Owner
Qian Shenhan
Qian Shenhan
Text Detection from images using OpenCV

EAST Detector for Text Detection OpenCV’s EAST(Efficient and Accurate Scene Text Detection ) text detector is a deep learning model, based on a novel

Abhishek Singh 88 Oct 20, 2022
Msos searcher - A half-hearted attempt at finding a magic square of squares

MSOS searcher A half-hearted attempt at finding (or rather searching) a MSOS (Magic Square of Squares) in the spirit of the Parker Square. Running I r

Niels Mündler 1 Jan 02, 2022
Generate text images for training deep learning ocr model

New version release:https://github.com/oh-my-ocr/text_renderer Text Renderer Generate text images for training deep learning OCR model (e.g. CRNN). Su

Qing 1.2k Jan 04, 2023
Morphological edge detection or object's boundary detection using erosion and dialation in OpenCV python

Morphologycal-edge-detection-using-erosion-and-dialation the task is to detect object boundary using erosion or dialation . Here, use the kernel or st

Tamzid hasan 3 Nov 25, 2022
A curated list of awesome synthetic data for text location and recognition

awesome-SynthText A curated list of awesome synthetic data for text location and recognition and OCR datasets. Text location SynthText SynthText_Chine

Tianzhong 283 Jan 05, 2023
Image augmentation library in Python for machine learning.

Augmentor is an image augmentation library in Python for machine learning. It aims to be a standalone library that is platform and framework independe

Marcus D. Bloice 4.8k Jan 04, 2023
kaldi-asr/kaldi is the official location of the Kaldi project.

Kaldi Speech Recognition Toolkit To build the toolkit: see ./INSTALL. These instructions are valid for UNIX systems including various flavors of Linux

Kaldi 12.3k Jan 05, 2023
[ICCV, 2021] Cloud Transformers: A Universal Approach To Point Cloud Processing Tasks

Cloud Transformers: A Universal Approach To Point Cloud Processing Tasks This is an official PyTorch code repository of the paper "Cloud Transformers:

Visual Understanding Lab @ Samsung AI Center Moscow 27 Dec 15, 2022
Solution for Problem 1 by team codesquad for AIDL 2020. Uses ML Kit for OCR and OpenCV for image processing

CodeSquad PS1 Solution for Problem Statement 1 for AIDL 2020 conducted by @unifynd technologies. Problem Given images of bills/invoices, the task was

Burhanuddin Udaipurwala 111 Nov 27, 2022
Controlling the computer volume with your hands // OpenCV

HandsControll-AI Controlling the computer volume with your hands // OpenCV Step 1 git clone https://github.com/Hayk-21/HandsControll-AI.git pip instal

Hayk 1 Nov 04, 2021
PyNeuro is designed to connect NeuroSky's MindWave EEG device to Python and provide Callback functionality to provide data to your application in real time.

PyNeuro PyNeuro is designed to connect NeuroSky's MindWave EEG device to Python and provide Callback functionality to provide data to your application

Zach Wang 45 Dec 30, 2022
Source Code for AAAI 2022 paper "Graph Convolutional Networks with Dual Message Passing for Subgraph Isomorphism Counting and Matching"

Graph Convolutional Networks with Dual Message Passing for Subgraph Isomorphism Counting and Matching This repository is an official implementation of

HKUST-KnowComp 13 Sep 08, 2022
Ddddocr - 通用验证码识别OCR pypi版

带带弟弟OCR通用验证码识别SDK免费开源版 今天ddddocr又更新啦! 当前版本为1.3.1 想必很多做验证码的新手,一定头疼碰到点选类型的图像,做样本费时

Sml2h3 4.4k Dec 31, 2022
This project modify tensorflow object detection api code to predict oriented bounding boxes. It can be used for scene text detection.

This is an oriented object detector based on tensorflow object detection API. Most of the code is not changed except for those related to the need of

Dafang He 30 Oct 22, 2022
📷 Face Recognition using Haar-Cascade Classifier, OpenCV, and Python

Face-Recognition-System Face Recognition using Haar-Cascade Classifier, OpenCV and Python. This project is based on face detection and face recognitio

1 Jan 10, 2022
Detect textlines in document images

Textline Detection Detect textlines in document images Introduction This tool performs border, region and textline detection from document image data

QURATOR-SPK 70 Jun 30, 2022
Primary QPDF source code and documentation

QPDF QPDF is a command-line tool and C++ library that performs content-preserving transformations on PDF files. It supports linearization, encryption,

QPDF 2.2k Jan 04, 2023
This repository provides train&test code, dataset, det.&rec. annotation, evaluation script, annotation tool, and ranking.

SCUT-CTW1500 Datasets We have updated annotations for both train and test set. Train: 1000 images [images][annos] Additional point annotation for each

Yuliang Liu 600 Dec 18, 2022
[EMNLP 2021] Improving and Simplifying Pattern Exploiting Training

ADAPET This repository contains the official code for the paper: "Improving and Simplifying Pattern Exploiting Training". The model improves and simpl

Rakesh R Menon 138 Dec 26, 2022
fishington.io bot with OpenCV and NumPy

fishington.io-bot fishington.io bot with using OpenCV and NumPy bot can continue to fishing fully automatically how to use Open cmd in fishington.io-b

Bahadır Araz 77 Jan 02, 2023