Automatic Video Captioning Evaluation Metric --- EMScore

Last update: Nov 28, 2022

Related tags

Deep Learning emscore

Overview

Automatic Video Captioning Evaluation Metric --- EMScore

Overview

For an illustration, EMScore can be computed as:

Installation

modify the encode_text() function in CLIP/clip/model.py as follows:

def encode_text(self, text, local=False):
    x = self.token_embedding(text).type(self.dtype)  # [batch_size, n_ctx, d_model]

    x = x + self.positional_embedding.type(self.dtype)
    x = x.permute(1, 0, 2)  # NLD -> LND
    x = self.transformer(x)
    x = x.permute(1, 0, 2)  # LND -> NLD
    x = self.ln_final(x).type(self.dtype)

    if local:
        x = x @ self.text_projection
    else:
        # x.shape = [batch_size, n_ctx, transformer.width]
        # take features from the eot embedding (eot_token is the highest number in each sequence)
        x = x[torch.arange(x.shape[0]), text.argmax(dim=-1)] @ self.text_projection
  
    return x

Push your modified CLIP to your GitHub.

Install

$ conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
$ pip install ftfy regex tqdm
$ pip install git+https://github.com/$Yours_GitHub_name/CLIP

Replace cudatoolkit=11.0 above with the appropriate CUDA version on your machine or cpuonly when installing on a machine without a GPU.

Usage:

A general demo

python demo.py

VATEX-EVAL

download the files in the following link, and save at a storage directory

https://drive.google.com/drive/folders/1jAfZZKEgkMEYFF2x1mhYo39nH-TNeGm6?usp=sharing

run code

python VATEX-EVAL-demo.py --storage_path $storage_path --use_n_refs 1 --use_feat_cache --use_idf

ActivityNet-FOIL

download the files in the following link, and save at a storage directory

https://drive.google.com/drive/folders/1oY9EJiEi_db_1GH-R33JDqfE8txffKR3?usp=sharing

run code

python ActivityNet-FOIL_demo.py --storage_path $storage_path --use_references --use_idf

Others

if you want extract embeddings by yourself:

python extract_video_embeddings.py --videos_path $your_video_path  --save_path $your_storage_path --backbone 'ViT-B/32'

Automatic Video Captioning Evaluation Metric --- EMScore

Related tags

Overview

Overview

Installation

Usage:

A general demo

VATEX-EVAL

ActivityNet-FOIL

Others

Owner

Yaya Shi

Facestar dataset. High quality audio-visual recordings of human conversational speech.

BOVText: A Large-Scale, Multidimensional Multilingual Dataset for Video Text Spotting

🔎 Super-scale your images and run experiments with Residual Dense and Adversarial Networks.

NVIDIA container runtime

HW3 ― GAN, ACGAN and UDA

Fashion Entity Classification

Safe Control for Black-box Dynamical Systems via Neural Barrier Certificates

DISTIL: Deep dIverSified inTeractIve Learning.

To build a regression model to predict the concrete compressive strength based on the different features in the training data.

General Vision Benchmark, a project from OpenGVLab

Simple tools for logging and visualizing, loading and training

DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative Networks

Speech Recognition is an important feature in several applications used such as home automation, artificial intelligence

Video Frame Interpolation without Temporal Priors (a general method for blurry video interpolation)

Solution to the Weather4cast 2021 challenge

Official implementation of our paper "LLA: Loss-aware Label Assignment for Dense Pedestrian Detection" in Pytorch.

Super Resolution for images using deep learning.

Joint Learning of 3D Shape Retrieval and Deformation, CVPR 2021

Official PyTorch implementation and pretrained models of the paper Self-Supervised Classification Network

DiffStride: Learning strides in convolutional neural networks