[CVPR'21] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

Last update: Jan 05, 2023

Overview

TransFuser

This repository contains the code for the CVPR 2021 paper Multi-Modal Fusion Transformer for End-to-End Autonomous Driving. If you find our code or paper useful, please cite

@inproceedings{Prakash2021CVPR,
  author = {Prakash, Aditya and Chitta, Kashyap and Geiger, Andreas},
  title = {Multi-Modal Fusion Transformer for End-to-End Autonomous Driving},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2021}
}

Setup

Install anaconda

wget https://repo.anaconda.com/archive/Anaconda3-2020.11-Linux-x86_64.sh
bash Anaconda3-2020.11-Linux-x86_64.sh
source ~/.profile

Clone the repo and build the environment

git clone https://github.com/autonomousvision/transfuser
cd transfuser
conda create -n transfuser python=3.7
pip3 install -r requirements.txt
conda activate transfuser

Download and setup CARLA 0.9.10.1

chmod +x setup_carla.sh
./setup_carla.sh

Data Generation

The training data is generated using leaderboard/team_code/auto_pilot.py in 8 CARLA towns and 14 weather conditions. The routes and scenarios files to be used for data generation are provided at leaderboard/data.

Running CARLA Server

With Display

./CarlaUE4.sh -world-port=<port> -opengl

Without Display

Without Docker:

SDL_VIDEODRIVER=offscreen SDL_HINT_CUDA_DEVICE=<gpu_id> ./CarlaUE4.sh -world-port=<port> -opengl

With Docker:

Instructions for setting up docker are available here. Pull the docker image of CARLA 0.9.10.1 docker pull carlasim/carla:0.9.10.1.

Docker 18:

docker run -it --rm -p 2000-2002:2000-2002 --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=<gpu_id> carlasim/carla:0.9.10.1 ./CarlaUE4.sh -world-port=2000 -opengl

Docker 19:

docker run -it --rm --net=host --gpus '"device=<gpu_id>"' carlasim/carla:0.9.10.1 ./CarlaUE4.sh -world-port=2000 -opengl

If the docker container doesn't start properly then add another environment variable -e SDL_AUDIODRIVER=dsp.

Run the Autopilot

Once the CARLA server is running, rollout the autopilot to start data generation.

./leaderboard/scripts/run_evaluation.sh

The expert agent used for data generation is defined in leaderboard/team_code/auto_pilot.py. Different variables which need to be set are specified in leaderboard/scripts/run_evaluation.sh. The expert agent is based on the autopilot from this codebase.

Routes and Scenarios

Each route is defined by a sequence of waypoints (and optionally a weather condition) that the agent needs to follow. Each scenario is defined by a trigger transform (location and orientation) and other actors present in that scenario (optional). The leaderboard repository provides a set of routes and scenarios files. To generate additional routes, spin up a CARLA server and follow the procedure below.

Generating routes with intersections

The position of traffic lights is used to localize intersections and (start_wp, end_wp) pairs are sampled in a grid centered at these points.

python3 tools/generate_intersection_routes.py --save_file <path_of_generated_routes_file> --town <town_to_be_used>

Sampling individual junctions from a route

Each route in the provided routes file is interpolated into a dense sequence of waypoints and individual junctions are sampled from these based on change in navigational commands.

python3 tools/sample_junctions.py --routes_file <xml_file_containing_routes> --save_file <path_of_generated_file>

Generating Scenarios

Additional scenarios are densely sampled in a grid centered at the locations from the reference scenarios file. More scenario files can be found here.

python3 tools/generate_scenarios.py --scenarios_file <scenarios_file_to_be_used_as_reference> --save_file <path_of_generated_json_file> --towns <town_to_be_used>

Training

The training code and pretrained models are provided below.

mkdir model_ckpt
wget https://s3.eu-central-1.amazonaws.com/avg-projects/transfuser/models.zip -P model_ckpt
unzip model_ckpt/models.zip -d model_ckpt/
rm model_ckpt/models.zip

Evaluation

Spin up a CARLA server (described above) and run the required agent. The adequate routes and scenarios files are provided in leaderboard/data and the required variables need to be set in leaderboard/scripts/run_evaluation.sh.

CUDA_VISIBLE_DEVICES=<gpu_id> ./leaderboard/scripts/run_evaluation.sh

Acknowledgements

This implementation is based on codebase from several repositories.

[CVPR'21] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

Related tags

Overview

TransFuser

Setup

Data Generation

Running CARLA Server

With Display

Without Display

Run the Autopilot

Routes and Scenarios

Generating routes with intersections

Sampling individual junctions from a route

Generating Scenarios

Training

Evaluation

Acknowledgements

Owner

Trading Gym is an open source project for the development of reinforcement learning algorithms in the context of trading.

An e-commerce company wants to segment its customers and determine marketing strategies according to these segments.

This repository contains the re-implementation of our paper deSpeckNet: Generalizing Deep Learning Based SAR Image Despeckling

Generating synthetic mobility data for a realistic population with RNNs to improve utility and privacy

An end-to-end PyTorch framework for image and video classification

Job Assignment System by Real-time Emotion Detection

Repository for training material for the 2022 SDSC HPC/CI User Training Course

Complex-Valued Neural Networks (CVNN)Complex-Valued Neural Networks (CVNN)

This is the official pytorch implementation for our ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering" on VQA Task

A collection of random and hastily hacked together scripts for investigating EU-DCC

CSAC - Collaborative Semantic Aggregation and Calibration for Separated Domain Generalization

Piotr - IoT firmware emulation instrumentation for training and research

Bayesian Optimization using GPflow

Code for Parameter Prediction for Unseen Deep Architectures (NeurIPS 2021)

Python Library for Signal/Image Data Analysis with Transport Methods

OntoProtein: Protein Pretraining With Ontology Embedding

PoseCamera is python based SDK for human pose estimation through RGB webcam.

In this project we combine techniques from neural voice cloning and musical instrument synthesis to achieve good results from as little as 16 seconds of target data.

Points2Surf: Learning Implicit Surfaces from Point Clouds (ECCV 2020 Spotlight)

Unified API to facilitate usage of pre-trained "perceptor" models, a la CLIP