SAT: 2D Semantics Assisted Training for 3D Visual Grounding, ICCV 2021 (Oral)

Related tags

Deep LearningSAT
Overview

SAT: 2D Semantics Assisted Training for 3D Visual Grounding

SAT: 2D Semantics Assisted Training for 3D Visual Grounding

by Zhengyuan Yang, Songyang Zhang, Liwei Wang, and Jiebo Luo

IEEE International Conference on Computer Vision (ICCV), 2021, Oral

Introduction

We propose 2D Semantics Assisted Training (SAT) that assists 3D visual grounding with 2D semantics. SAT helps 3D tasks with 2D semantics in training but does not require 2D inputs during inference. For more details, please refer to our paper.

Citation

@inproceedings{yang2021sat,
  title={SAT: 2D Semantics Assisted Training for 3D Visual Grounding},
  author={Yang, Zhengyuan and Zhang, Songyang and Wang, Liwei and Luo, Jiebo},
  booktitle={ICCV},
  year={2021}
}

Prerequisites

  • Python 3.6.9 (e.g., conda create -n sat_env python=3.6.9 cudatoolkit=10.0)
  • Pytorch 1.2.0 (e.g., conda install pytorch==1.2.0 cudatoolkit=10.0 -c pytorch)
  • Install other common packages (numpy, pytorch_transformers, etc.)
  • Please refer to setup.py (From ReferIt3D).

Installation

  • Clone the repository

    git clone https://github.com/zyang-ur/SAT.git
    cd SAT
    pip install -e .
    
  • To use a PointNet++ visual-encoder you need to compile its CUDA layers for PointNet++: Note: To do this compilation also need: gcc5.4 or later.

    cd external_tools/pointnet2
    python setup.py install
    

Data

ScanNet

First you should download the train/val scans of ScanNet if you do not have them locally. Please refer to the instructions from referit3d for more details. The output is the scanfile keep_all_points_00_view_with_global_scan_alignment.pkl/keep_all_points_with_global_scan_alignment.pkl.

Ref3D Linguistic Data

You can dowload the Nr3D and Sr3D/Sr3D+ from Referit3D, and send the file path to referit3D-file.

SAT Processed 2D Features

You can download the processed 2D object image features from here. The cached feature should be placed under the referit3d/data folder, or match the cache path in the dataloader. The feature extraction code will be cleaned and released in the future. Meanwhile, feel free to contact me if you need that before the official release.

Training

Please reference the following example command on Nr3D. Feel free to change the parameters. Please reference arguments for valid options.

cd referit3d/scripts
scanfile=keep_all_points_00_view_with_global_scan_alignment.pkl ## keep_all_points_with_global_scan_alignment if include Sr3D
python train_referit3d.py --patience 100 --max-train-epochs 100 --init-lr 1e-4 --batch-size 16 --transformer --model mmt_referIt3DNet -scannet-file $scanfile -referit3D-file $nr3dfile_csv --log-dir log/$exp_id --n-workers 2 --gpu 0 --unit-sphere-norm True --feat2d clsvecROI --context_2d unaligned --mmt_mask train2d --warmup

Evaluation

Please find the pretrained models here (clsvecROI on Nr3D). A known issue here.

cd referit3d/scripts
python train_referit3d.py --transformer --model mmt_referIt3DNet -scannet-file $scanfile -referit3D-file $nr3dfile --log-dir log/$exp_id --n-workers 2 --gpu $gpu --unit-sphere-norm True --feat2d clsvecROI --mode evaluate --pretrain-path $pretrain_path/best_model.pth

Credits

The project is built based on the following repository:

Part of the code or models are from ScanRef, MMF, TAP, and ViLBERT.

Owner
Zhengyuan Yang
Zhengyuan Yang
Keras Implementation of Neural Style Transfer from the paper "A Neural Algorithm of Artistic Style"

Neural Style Transfer & Neural Doodles Implementation of Neural Style Transfer from the paper A Neural Algorithm of Artistic Style in Keras 2.0+ INetw

Somshubra Majumdar 2.2k Dec 31, 2022
Emblaze - Interactive Embedding Comparison

Emblaze - Interactive Embedding Comparison Emblaze is a Jupyter notebook widget for visually comparing embeddings using animated scatter plots. It bun

CMU Data Interaction Group 77 Nov 24, 2022
Code for "Learning to Regrasp by Learning to Place"

Learning2Regrasp Learning to Regrasp by Learning to Place, CoRL 2021. Introduction We propose a point-cloud-based system for robots to predict a seque

Shuo Cheng (成硕) 18 Aug 27, 2022
Project page for End-to-end Recovery of Human Shape and Pose

End-to-end Recovery of Human Shape and Pose Angjoo Kanazawa, Michael J. Black, David W. Jacobs, Jitendra Malik CVPR 2018 Project Page Requirements Pyt

1.4k Dec 29, 2022
ReGAN: Sequence GAN using RE[INFORCE|LAX|BAR] based PG estimators

Sequence Generation with GANs trained by Gradient Estimation Requirements: PyTorch v0.3 Python 3.6 CUDA 9.1 (For GPU) Origin The idea is from paper Se

40 Nov 03, 2022
A computer vision pipeline to identify the "icons" in Christian paintings

Christian-Iconography A computer vision pipeline to identify the "icons" in Christian paintings. A bit about iconography. Iconography is related to id

Rishab Mudliar 3 Jul 30, 2022
Original code for "Zero-Shot Domain Adaptation with a Physics Prior"

Zero-Shot Domain Adaptation with a Physics Prior [arXiv] [sup. material] - ICCV 2021 Oral paper, by Attila Lengyel, Sourav Garg, Michael Milford and J

Attila Lengyel 40 Dec 21, 2022
Consistency Regularization for Adversarial Robustness

Consistency Regularization for Adversarial Robustness Official PyTorch implementation of Consistency Regularization for Adversarial Robustness by Jiho

40 Dec 17, 2022
Repository accompanying the "Sign Pose-based Transformer for Word-level Sign Language Recognition" paper

by Matyáš Boháček and Marek Hrúz, University of West Bohemia Should you have any questions or inquiries, feel free to contact us here. Repository acco

Matyáš Boháček 30 Dec 30, 2022
FIRM-AFL is the first high-throughput greybox fuzzer for IoT firmware.

FIRM-AFL FIRM-AFL is the first high-throughput greybox fuzzer for IoT firmware. FIRM-AFL addresses two fundamental problems in IoT fuzzing. First, it

356 Dec 23, 2022
GBIM(Gesture-Based Interaction map)

手势交互地图 GBIM(Gesture-Based Interaction map),基于视觉深度神经网络的交互地图,通过电脑摄像头观察使用者的手势变化,进而控制地图进行简单的交互。网络使用PaddleX提供的轻量级模型PPYOLO Tiny以及MobileNet V3 small,使得整个模型大小约10MB左右,即使在CPU下也能快速定位和识别手势。

8 Feb 10, 2022
SpeechBrain is an open-source and all-in-one speech toolkit based on PyTorch.

The SpeechBrain Toolkit SpeechBrain is an open-source and all-in-one speech toolkit based on PyTorch. The goal is to create a single, flexible, and us

SpeechBrain 5.1k Jan 02, 2023
TextureGAN in Pytorch

TextureGAN This code is our PyTorch implementation of TextureGAN [Project] [Arxiv] TextureGAN is a generative adversarial network conditioned on sketc

Patsorn 147 Dec 14, 2022
The repository offers the official implementation of our paper in PyTorch.

Cloth Interactive Transformer (CIT) Cloth Interactive Transformer for Virtual Try-On Bin Ren1, Hao Tang1, Fanyang Meng2, Runwei Ding3, Ling Shao4, Phi

Bingoren 49 Dec 01, 2022
A very simple tool to rewrite parameters such as attributes and constants for OPs in ONNX models. Simple Attribute and Constant Modifier for ONNX.

sam4onnx A very simple tool to rewrite parameters such as attributes and constants for OPs in ONNX models. Simple Attribute and Constant Modifier for

Katsuya Hyodo 6 May 15, 2022
Hyperbolic Hierarchical Clustering.

Hyperbolic Hierarchical Clustering (HypHC) This code is the official PyTorch implementation of the NeurIPS 2020 paper: From Trees to Continuous Embedd

HazyResearch 154 Dec 15, 2022
StocksMA is a package to facilitate access to financial and economic data of Moroccan stocks.

Creating easier access to the Moroccan stock market data What is StocksMA ? StocksMA is a package to facilitate access to financial and economic data

Salah Eddine LABIAD 28 Jan 04, 2023
Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision. ICCV 2021.

Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision Download links and PyTorch implementation of "Towers of Ba

Blakey Wu 40 Dec 14, 2022
Leaf: Multiple-Choice Question Generation

Leaf: Multiple-Choice Question Generation Easy to use and understand multiple-choice question generation algorithm using T5 Transformers. The applicat

Kristiyan Vachev 62 Dec 20, 2022
网络协议2天集训

网络协议2天集训 抓包工具安装 Wireshark wireshark下载地址 Tcpdump CentOS yum install tcpdump -y Ubuntu apt-get install tcpdump -y k8s抓包测试环境 查看虚拟网卡veth pair 查看

120 Dec 12, 2022