A repository for the updated version of CoinRun used to collect MUGEN, a multimodal video-audio-text dataset.

Overview

MUGEN Dataset

Project Page | Paper

Setup

conda create --name MUGEN python=3.6
conda activate MUGEN
pip install --ignore-installed https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.12.0-cp36-cp36m-linux_x86_64.whl 
module load cuda/9.0
module load cudnn/v7.4-cuda.10.0
git clone coinrun_MUGEN
cd coinrun_MUGEN
pip install -r requirements.txt
conda install -c conda-forge mpi4py
pip install -e .

Training Agents

Basic training commands:

python -m coinrun.train_agent --run-id myrun --save-interval 1

After each parameter update, this will save a copy of the agent to ./saved_models/. Results are logged to /tmp/tensorflow by default.

Run parallel training using MPI:

mpiexec -np 8 python -m coinrun.train_agent --run-id myrun

Train an agent on a fixed set of N levels. With N = 0, the training set is unbounded.

python -m coinrun.train_agent --run-id myrun --num-levels N

Continue training an agent from a checkpoint:

python -m coinrun.train_agent --run-id newrun --restore-id myrun

View training options:

python -m coinrun.train_agent --help

Example commands for MUGEN agents:

Base model

python -m coinrun.train_agent --run-id name_your_agent \
                --architecture impala --paint-vel-info 1 --dropout 0.0 --l2-weight 0.0001 \
                --num-levels 0 --use-lstm 1 --num-envs 96 --set-seed 80 \
                --bump-head-penalty 0.25 --kill-monster-reward 10.0

Add a squat penalty to reduce excessive squatting

python -m coinrun.train_agent --run-id gamev2_fine_tune_m4_squat_penalty \
                --architecture impala --paint-vel-info 1 --dropout 0.0 --l2-weight 0.0001 \
                --num-levels 0 --use-lstm 1 --num-envs 96 --set-seed 811 \
                --bump-head-penalty 0.1 --kill-monster-reward 5.0 --squat-penalty 0.1 \
                --restore-id gamev2_fine_tune_m4_0

Larger model

python -m coinrun.train_agent --run-id gamev2_largearch_bump_head_penalty_0.05_0 \
                --architecture impalalarge --paint-vel-info 1 --dropout 0.0 --l2-weight 0.0001 \
                --num-levels 0 --use-lstm 1 --num-envs 96 --set-seed 51 \
                --bump-head-penalty 0.05 --kill-monster-reward 10.0

Add a reward for dying (set through a negative --die-penalty)

python -m coinrun.train_agent --run-id gamev2_fine_tune_squat_penalty_die_reward_3.0 \
                --architecture impala --paint-vel-info 1 --dropout 0.0 --l2-weight 0.0001 \
                --num-levels 0 --use-lstm 1 --num-envs 96 --set-seed 857 \
                --bump-head-penalty 0.1 --kill-monster-reward 5.0 --squat-penalty 0.1 \
                --restore-id gamev2_fine_tune_m4_squat_penalty --die-penalty -3.0

Add jump penalty

python -m coinrun.train_agent --run-id gamev2_fine_tune_m4_jump_penalty \
                --architecture impala --paint-vel-info 1 --dropout 0.0 --l2-weight 0.0001 \
                --num-levels 0 --use-lstm 1 --num-envs 96 --set-seed 811 \
                --bump-head-penalty 0.1 --kill-monster-reward 10.0 --jump-penalty 0.1 \
                --restore-id gamev2_fine_tune_m4_0
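
The example commands above share most of their flags. As a convenience, the following Python sketch assembles a coinrun.train_agent command line from a shared base plus per-run overrides. The helper name build_train_command and the BASE_FLAGS dictionary are hypothetical and not part of the repository; only flags that appear in the commands above are used, with values copied from them.

```python
import shlex
import sys

# Flags shared by the impala-based example commands above (values copied from them).
# BASE_FLAGS is a hypothetical name used only for this sketch.
BASE_FLAGS = {
    "architecture": "impala",
    "paint-vel-info": 1,
    "dropout": 0.0,
    "l2-weight": 0.0001,
    "num-levels": 0,
    "use-lstm": 1,
    "num-envs": 96,
}


def build_train_command(run_id, overrides=None, python=sys.executable):
    """Return the argv list for one training run (hypothetical helper)."""
    flags = dict(BASE_FLAGS)
    flags.update(overrides or {})  # per-run flags replace or extend the base
    argv = [python, "-m", "coinrun.train_agent", "--run-id", run_id]
    for name, value in flags.items():
        argv += ["--" + name, str(value)]
    return argv


if __name__ == "__main__":
    # Rebuilds the jump-penalty example and prints it as a shell command.
    argv = build_train_command(
        "gamev2_fine_tune_m4_jump_penalty",
        {
            "set-seed": 811,
            "bump-head-penalty": 0.1,
            "kill-monster-reward": 10.0,
            "jump-penalty": 0.1,
            "restore-id": "gamev2_fine_tune_m4_0",
        },
    )
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The returned list can also be passed directly to subprocess.run, which avoids shell quoting issues when launching several variants in a row.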

Data Collection

Collect video data with a trained agent. The following command creates a folder {save_dir}/{model_name}_seed_{seed}, which contains the audio semantic maps needed to reconstruct the game audio, as well as a csv file containing all game metadata. The csv is used to reconstruct the video data in the next step.

python -m coinrun.collect_data --collect_data --paint-vel-info 1 \
                --set-seed 406 --restore-id gamev2_fine_tune_squat_penalty_timeout_300 \
                --save-dir  \
                --level-timeout 600 --num-levels-to-collect 2000
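
To locate the collected data afterwards, the folder name can be rebuilt from the pattern {save_dir}/{model_name}_seed_{seed} described above. This is a minimal sketch: the helper collected_data_dir is hypothetical, and what {model_name} expands to for a given --restore-id is not specified here, so it is passed in as-is.

```python
from pathlib import Path


def collected_data_dir(save_dir, model_name, seed):
    """Build {save_dir}/{model_name}_seed_{seed} (hypothetical helper)."""
    return Path(save_dir) / "{}_seed_{}".format(model_name, seed)


if __name__ == "__main__":
    # Placeholder values; substitute the ones used for your own collection run.
    out_dir = collected_data_dir("video_data", "my_model", 406)
    print(out_dir)
    # List any csv metadata files if the folder already exists.
    if out_dir.is_dir():
        for csv_path in sorted(out_dir.glob("*.csv")):
            print(csv_path)
```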

The next step is to create 3.2-second videos with audio by running the script gen_videos.sh. The script first parses the csv metadata of agent gameplay into json format. It then samples 3.2-second clips, renders them to RGB, generates audio, and saves the results as .mp4 files. Note that the sampling logic in gen_videos.py only generates videos for levels of sufficient length that contain interesting game events. You can adjust this sampling logic in gen_videos.py to your liking.
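
To illustrate the idea of cutting fixed-length clips out of a longer level, here is a minimal Python sketch. It is not the logic in gen_videos.py: the function name clip_windows, the 30 frames-per-second rate, and the non-overlapping stride are all assumptions made for the example, and the filter for interesting game events is omitted.

```python
def clip_windows(num_frames, fps=30, clip_seconds=3.2, stride_frames=None):
    """Return (start, end) frame ranges for fixed-length clips.

    Hypothetical helper: fps=30 and the default non-overlapping stride
    are assumptions, not values taken from gen_videos.py.
    """
    clip_len = int(round(fps * clip_seconds))  # 96 frames under these assumptions
    stride = stride_frames or clip_len
    if num_frames < clip_len:
        return []  # level too short to yield a single clip
    return [
        (start, start + clip_len)
        for start in range(0, num_frames - clip_len + 1, stride)
    ]


if __name__ == "__main__":
    print(clip_windows(300))  # [(0, 96), (96, 192), (192, 288)]
    print(clip_windows(50))   # []
```

A smaller stride_frames value produces overlapping clips, which yields more samples per level at the cost of more redundancy between them.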

gen_videos.sh produces three outputs:

  1. ./json_metadata - where full level jsons are saved for longer video rendering
  2. ./video_metadata - where 3.2 second video jsons are saved
  3. ./videos - where 3.2s .mp4 videos with audio are saved. We use these videos for human annotation.

bash gen_videos.sh

For example:

bash gen_videos.sh video_data model_gamev2_fine_tune_squat_penalty_timeout_300_seed_406

License Info

The majority of MUGEN is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: CoinRun, VideoGPT, VideoCLIP, and S3D are licensed under the MIT license; Tokenizer is licensed under the Apache 2.0 license; Pycocoevalcap is licensed under the BSD license; VGGSound is licensed under the CC-BY-4.0 license.