Localizing-Visual-Sounds-the-Hard-Way

Code and Dataset for "Localizing Visual Sounds the Hard Way".

This repo contains the code and our pretrained model.

Environment

Python 3.6.8
PyTorch 1.3.0

Flickr-SoundNet

We provide the pretrained model here.

To test the model, download the test data and ground truth from Learning to Localize Sound Source.

Then run

python test.py --data_path "path to downloaded data with structure below/" --summaries_dir "path to pretrained models" --gt_path "path to ground truth" --testset "flickr"

VGG-Sound Source

We provide the pretrained model here.

To test the model, run

python test.py --data_path "path to downloaded data with structure below/" --summaries_dir "path to pretrained models" --testset "vggss"

(Note: some ground-truth bounding boxes were updated recently, so all results on VGG-SS differ by 2~3% in IoU.)
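For readers unfamiliar with the metric in the note above, here is a minimal sketch of IoU (intersection over union) between two axis-aligned boxes. It is only an illustration: the function name `box_iou` and the `(x_min, y_min, x_max, y_max)` box format are assumptions, and the evaluation in `test.py` may compute IoU differently (for example on pixel maps rather than on boxes).

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max).

    Hypothetical helper for illustration; not taken from this repo.
    """
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b

    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h

    # Union = sum of both areas minus the overlap counted twice.
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0
```

Because the union appears in the denominator, a change in the ground-truth boxes shifts the score even when the prediction stays the same, which is why updated annotations change reported results.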

Test data for both datasets should be placed in the following structure.

data path
│
└───frames
│   │   image001.jpg
│   │   image002.jpg
│   │
└───audio
    │   audio011.wav
    │   audio012.wav
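Before running `test.py`, it can help to confirm that the data path matches the structure above. The following is a small sketch under stated assumptions: `check_layout` is a hypothetical helper that is not part of this repo, and it only checks that `frames/` and `audio/` exist and lists their `.jpg` and `.wav` files. It does not assume any naming relationship between frames and audio files.

```python
from pathlib import Path


def check_layout(data_path):
    """Return the sorted file names found under frames/ and audio/.

    Hypothetical helper for illustration; raises FileNotFoundError
    if either sub-directory is missing.
    """
    root = Path(data_path)
    found = {}
    for sub, ext in (("frames", ".jpg"), ("audio", ".wav")):
        folder = root / sub
        if not folder.is_dir():
            raise FileNotFoundError(f"missing directory: {folder}")
        # Keep only files with the expected extension (case-insensitive).
        found[sub] = sorted(
            p.name for p in folder.iterdir() if p.suffix.lower() == ext
        )
    return found


# Example usage (the path is a placeholder):
# files = check_layout("path to downloaded data with structure below/")
# print(len(files["frames"]), "frames,", len(files["audio"]), "audio clips")
```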

Citation

@InProceedings{Chen21,
              title        = "Localizing Visual Sounds the Hard Way",
              author       = "Honglie Chen and Weidi Xie and Triantafyllos Afouras and Arsha Nagrani and Andrea Vedaldi and Andrew Zisserman",
              booktitle    = "CVPR",
              year         = "2021"}
