This repo holds the code for the ICCV21 paper: Visual Alignment Constraint for Continuous Sign Language Recognition.

Last update: Dec 19, 2022

Overview

VAC_CSLR

This repo holds the code for the paper: Visual Alignment Constraint for Continuous Sign Language Recognition (ICCV 2021). [paper]

Prerequisites

This project is implemented in PyTorch (>1.8), so please install PyTorch first.
ctcdecode==0.4 [parlance/ctcdecode], for beam search decoding.
[Optional] sclite [kaldi-asr/kaldi]: install the Kaldi tools to get sclite for evaluation. After installation, create a soft link to sclite:
ln -s PATH_TO_KALDI/tools/sctk-2.4.10/bin/sclite ./software/sclite
We also provide a Python version of the evaluation tool for convenience, but sclite provides more detailed statistics.
[Optional] [SeanNaren/warp-ctc]: at the beginning of this research we adopted warp-ctc for supervision, and we recently found that the PyTorch version of CTC reaches similar results.
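
For intuition about the CTC decoding step mentioned in the prerequisites above, here is a minimal pure-Python sketch of greedy (best-path) CTC decoding. This is a simpler technique than the beam search that ctcdecode performs, and it is not the code used in this repo. The function name, the blank index 0, and the toy scores are assumptions for illustration only.

```python
def ctc_greedy_decode(frame_scores, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_scores: list of per-frame score lists (one score per label).
    blank: index of the CTC blank label (assumed to be 0 here).
    """
    decoded = []
    previous = None
    for scores in frame_scores:
        # Pick the highest-scoring label for this frame.
        label = max(range(len(scores)), key=lambda i: scores[i])
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Toy example with three labels: 0 = blank, 1 and 2 = glosses.
toy_scores = [
    [0.10, 0.80, 0.10],  # label 1
    [0.20, 0.70, 0.10],  # label 1 again (merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank (separates the two occurrences of label 1)
    [0.10, 0.80, 0.10],  # label 1
    [0.10, 0.20, 0.70],  # label 2
]
print(ctc_greedy_decode(toy_scores))
```

Beam search keeps several candidate label sequences per frame instead of only the single best label, which is why a dedicated decoder is listed as a prerequisite.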

Data Preparation

Download the RWTH-PHOENIX-Weather 2014 Dataset [download link]. Our experiments are based on phoenix-2014.v3.tar.gz.
After the download finishes, extract the dataset to ./dataset/phoenix. We suggest making a soft link to the downloaded dataset:
ln -s PATH_TO_DATASET/phoenix2014-release ./dataset/phoenix2014
The original image sequences are 210x260; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dictionary and resize the image sequences.
```
cd ./preprocess
python data_preprocess.py --process-image --multiprocessing
```
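
As a rough illustration of what a gloss dictionary is, the sketch below maps each gloss in a set of annotation sentences to an integer index and an occurrence count. It is a hypothetical stand-in, not the logic of data_preprocess.py: the function name, the choice to start indices at 1 (leaving 0 free for a CTC blank), and the toy annotations are all assumptions.

```python
from collections import Counter


def build_gloss_dict(annotations):
    """Map each gloss to (index, count) from whitespace-separated annotations.

    Indices start at 1 on the assumption that 0 is reserved for the CTC blank.
    """
    counts = Counter()
    for sentence in annotations:
        # Each annotation is assumed to be a space-separated gloss sequence.
        counts.update(sentence.split())
    # Sort glosses alphabetically so the mapping is deterministic.
    return {
        gloss: (index, counts[gloss])
        for index, gloss in enumerate(sorted(counts), start=1)
    }


# Toy annotations (hypothetical, not taken from the dataset).
toy_annotations = ["REGEN MORGEN NORD", "MORGEN SONNE"]
gloss_dict = build_gloss_dict(toy_annotations)
for gloss, (index, count) in gloss_dict.items():
    print(gloss, index, count)
```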

Inference

We provide pretrained models for inference. You can download them from:

| Backbone | WER on Dev | WER on Test | Pretrained model |
| --- | --- | --- | --- |
| ResNet18 | 21.2% | 22.3% | [Baidu] (passwd: qi83) [Dropbox] |

To evaluate the pretrained model, run the command below:
python main.py --load-weights resnet18_slr_pretrained.pt --phase test
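
The WER figures reported above are word error rates. As a minimal sketch of how such a number can be computed, the function below divides the word-level edit distance between a hypothesis and a reference by the reference length. It is not the repo's evaluation tool and may differ from sclite in text normalization and alignment details. The function name and the toy gloss sequences are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only one previous row.
    previous_row = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current_row.append(
                min(
                    previous_row[j] + 1,  # deletion of a reference word
                    current_row[j - 1] + 1,  # insertion of a hypothesis word
                    previous_row[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous_row = current_row
    return previous_row[-1] / len(ref)


# Toy example: one deletion (MORGEN) and one insertion (WIND) over 4 reference words.
wer = word_error_rate("REGEN MORGEN NORD SONNE", "REGEN NORD SONNE WIND")
print(f"WER: {wer:.1%}")
```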

Training

Configuration values are resolved with the following priority: command line > config file > argparse default values. To train the SLR model on phoenix14, run the command below:

python main.py --work-dir PATH_TO_SAVE_RESULTS --config PATH_TO_CONFIG_FILE --device AVAILABLE_GPUS
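
One common way to obtain the priority order described above is to load the config file first and install its values as argparse defaults, so that explicit command-line flags still win. The sketch below illustrates this pattern under stated assumptions and is not necessarily how main.py implements it: the `--batch-size` option, all default values, and the plain dict standing in for a parsed config file are hypothetical.

```python
import argparse


def parse_with_config(argv, file_config):
    """Resolve options as: command line > config file > argparse defaults."""
    parser = argparse.ArgumentParser()
    # Default values below are hypothetical placeholders.
    parser.add_argument("--work-dir", default="./work_dir/")
    parser.add_argument("--device", default="0")
    parser.add_argument("--batch-size", type=int, default=2)
    # Config-file values replace the argparse defaults; keys use the
    # argparse destination names (underscores instead of hyphens).
    parser.set_defaults(**file_config)
    # Flags given explicitly in argv override whatever default is in place.
    return parser.parse_args(argv)


# A dict standing in for the contents of a parsed config file (assumption).
file_config = {"batch_size": 4, "device": "1"}
args = parse_with_config(["--device", "2,3"], file_config)
print(args.work_dir, args.device, args.batch_size)
```

Here `work_dir` falls back to the argparse default, `batch_size` comes from the config values, and `device` comes from the command line.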

Feature Extraction

We also provide a feature extraction function that extracts frame-wise features for other research purposes. Run:

python main.py --load-weights PATH_TO_PRETRAINED_MODEL --phase features

To Do List

Evaluation tools implemented in pure Python.
WAR and WER calculation scripts.

Citation

If you find this repo useful in your research, please consider citing:

@InProceedings{Min_2021_ICCV,
    author    = {Min, Yuecong and Hao, Aiming and Chai, Xiujuan and Chen, Xilin},
    title     = {Visual Alignment Constraint for Continuous Sign Language Recognition},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {11542-11551}
}

Relevant paper

Self-Mutual Distillation Learning for Continuous Sign Language Recognition [paper]

@InProceedings{Hao_2021_ICCV,
    author    = {Hao, Aiming and Min, Yuecong and Chen, Xilin},
    title     = {Self-Mutual Distillation Learning for Continuous Sign Language Recognition},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {11303-11312}
}

Acknowledgements

We appreciate the help from Runpeng Cui, Hao Zhou@Rhythmblue and Xinzhe Han@GeraldHan :)


Owner

Yuecong Min
