Yet another video caption

Last update: May 26, 2022

Overview

yet-another-video-caption

Dataset configuration

Preparing the dataset

Reorganize the raw dataset into a unified format and place it in ./dataset.

The dataset is organized as follows:

./dataset
    train/
        video/
            *.avi
        ...
        info.json
    test/
        video/ 
            *.avi
        ...
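As a quick sanity check before running anything else, the layout above can be verified with a few lines of Python. This is only a sketch: the function name check_dataset_layout is hypothetical and not part of the repository, and it checks only the entries that the tree above names explicitly (the video/ directories, the *.avi files, and train/info.json).

```python
from pathlib import Path


def check_dataset_layout(root="./dataset"):
    """Return a list of problems found in the dataset layout (empty if none)."""
    root = Path(root)
    problems = []
    for split in ("train", "test"):
        video_dir = root / split / "video"
        if not video_dir.is_dir():
            problems.append(f"missing directory: {video_dir}")
        elif not any(video_dir.glob("*.avi")):
            problems.append(f"no .avi files in: {video_dir}")
    # The tree above lists info.json only under train/.
    info_file = root / "train" / "info.json"
    if not info_file.is_file():
        problems.append(f"missing file: {info_file}")
    return problems
```

Calling check_dataset_layout() from the repository root returns an empty list when the expected entries are present.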

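Automatic configuration

Usually you only need a subset of the dataset. In that case, consider running the automatic extraction script makedata.py.

All data is located in ./data.

All videos (train/val/test) are located in ./data/video.

All video information (train/val/test) is recorded in ./data/input.json.

The program generates some intermediate files in ./data. Do not modify them.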
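To illustrate the idea of drawing a train/val/test subset, here is a minimal sketch of a reproducible random split. It is not the implementation of makedata.py, whose code and parameters are not shown here; the function split_videos, its arguments, and the use of a fixed seed are all assumptions made for illustration.

```python
import random


def split_videos(video_ids, n_train, n_val, n_test, seed=0):
    """Randomly pick disjoint train/val/test subsets of the given sizes."""
    total = n_train + n_val + n_test
    if total > len(video_ids):
        raise ValueError("requested more videos than are available")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    chosen = rng.sample(list(video_ids), total)  # sampling without replacement
    return {
        "train": chosen[:n_train],
        "val": chosen[n_train:n_train + n_val],
        "test": chosen[n_train + n_val:],
    }
```

Sampling once and slicing the result keeps the three subsets disjoint by construction.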

Dependencies

pip install tqdm pillow pretrainedmodels nltk

In addition, make sure the CUDA runtime, cuDNN, PyTorch (GPU build), ffmpeg, and a JDK are correctly configured in the current environment.
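The requirements above can be checked up front with a short script. This is a sketch under assumptions: the helper missing_dependencies is hypothetical, pillow is looked up under its import name PIL, and the JDK is assumed to expose a java executable on PATH. It does not verify CUDA or cuDNN versions.

```python
import importlib.util
import shutil

# Import names of the required Python packages (pillow is imported as PIL).
PYTHON_MODULES = ["tqdm", "PIL", "pretrainedmodels", "nltk", "torch"]
# Command-line tools assumed to be on PATH (java stands in for the JDK).
EXECUTABLES = ["ffmpeg", "java"]


def missing_dependencies(modules=PYTHON_MODULES, executables=EXECUTABLES):
    """Return the names of modules and executables that cannot be found."""
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    missing += [e for e in executables if shutil.which(e) is None]
    return missing
```

If torch is installed, torch.cuda.is_available() additionally reports whether PyTorch can see a GPU.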

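Usage steps

1. Make sure the dataset is configured correctly.
2. Make sure the dependencies are installed correctly.
3. Extract the data: supply the train/val/test split parameters you want to makedata.py, then run the script.
4. Run the following commands in order (adjust the batch_size and saved_model parameters yourself!):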

python prepro_feats.py --output_dir data/feats/resnet152 --model resnet152
python prepro_vocab.py
python train.py --epochs 3001 --batch_size 1 --checkpoint_path data/save --feats_dir data/feats/resnet152 --model S2VTAttModel --with_c3d 0 --dim_vid 2048
python eval.py --recover_opt data/save/opt_info.json --saved_model data/save/model_10.pth --batch_size 1
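The four commands above can also be chained from Python so that the run stops at the first failing step. This is a sketch, not part of the repository: build_commands and run_pipeline are hypothetical helpers, and the arguments are copied from the commands above with only batch_size and saved_model made configurable.

```python
import subprocess
import sys


def build_commands(batch_size=1, saved_model="data/save/model_10.pth"):
    """Return the four pipeline commands as argument lists."""
    py = sys.executable  # run the scripts with the current interpreter
    return [
        [py, "prepro_feats.py", "--output_dir", "data/feats/resnet152",
         "--model", "resnet152"],
        [py, "prepro_vocab.py"],
        [py, "train.py", "--epochs", "3001", "--batch_size", str(batch_size),
         "--checkpoint_path", "data/save", "--feats_dir", "data/feats/resnet152",
         "--model", "S2VTAttModel", "--with_c3d", "0", "--dim_vid", "2048"],
        [py, "eval.py", "--recover_opt", "data/save/opt_info.json",
         "--saved_model", saved_model, "--batch_size", str(batch_size)],
    ]


def run_pipeline(commands):
    """Run each command in order, raising CalledProcessError on failure."""
    for cmd in commands:
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

From the repository root, run_pipeline(build_commands(batch_size=32)) would execute all four steps in sequence.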

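Speed test

The following results were measured on a single RTX 2080 Ti GPU.

Preprocessing (ResNet152 feature extraction): 40 min in total

Training speed (batch_size=32): 6.20 it/s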
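The training speed can be turned into a rough per-epoch time estimate. The sketch below assumes that one iteration ("it") corresponds to one batch; the helper estimate_epoch_seconds is hypothetical, and the number of training samples must be supplied by the reader since it is not stated here.

```python
import math


def estimate_epoch_seconds(num_samples, batch_size=32, it_per_s=6.20):
    """Estimate the wall-clock seconds for one epoch at a given speed."""
    steps = math.ceil(num_samples / batch_size)  # batches per epoch
    return steps / it_per_s
```

Multiplying the result by the number of epochs gives a rough total, ignoring evaluation and checkpointing overhead.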

Todo

Letter case (uppercase/lowercase) issue

References

https://github.com/xiadingZ/video-caption.pytorch

Owner

Fan Zhimin
