Official implementation of the paper WAV2CLIP: LEARNING ROBUST AUDIO REPRESENTATIONS FROM CLIP

Overview

Wav2CLIP

🚧 WIP 🚧

Ho-Hsiang Wu, Prem Seetharaman, Kundan Kumar, Juan Pablo Bello

We propose Wav2CLIP, a robust audio representation learning method that distills from Contrastive Language-Image Pre-training (CLIP). We systematically evaluate Wav2CLIP on a variety of audio tasks including classification, retrieval, and generation, and show that Wav2CLIP can outperform several publicly available pre-trained audio representation algorithms. Wav2CLIP projects audio into a shared embedding space with images and text, which enables multimodal applications such as zero-shot classification and cross-modal retrieval. Furthermore, Wav2CLIP needs just ~10% of the data to achieve competitive performance on downstream tasks compared with fully supervised models, and is more efficient to pre-train than competing methods because it does not require learning a visual model in concert with an auditory model. Finally, we demonstrate image generation from Wav2CLIP as a qualitative assessment of the shared embedding space. Our code and model weights are open-sourced and made available for further applications.
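
As a rough illustration of what a shared embedding space allows, zero-shot classification can be reduced to a nearest-neighbor search: embed the audio clip, embed one text prompt per class, and pick the class whose text embedding has the highest cosine similarity to the audio embedding. The sketch below uses tiny made-up vectors in place of real Wav2CLIP and CLIP embeddings; `cosine_similarity`, `zero_shot_label`, the labels, and all values are hypothetical and not part of the package.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def zero_shot_label(audio_embedding, text_embeddings):
    """Return the label whose text embedding is closest to the audio embedding."""
    return max(
        text_embeddings,
        key=lambda label: cosine_similarity(audio_embedding, text_embeddings[label]),
    )


# Toy 3-dimensional vectors standing in for real embeddings (made-up values).
text_embeddings = {
    "dog barking": [0.9, 0.1, 0.0],
    "rain": [0.0, 0.2, 0.9],
    "siren": [0.1, 0.9, 0.1],
}
audio_embedding = [0.8, 0.2, 0.1]

print(zero_shot_label(audio_embedding, text_embeddings))  # dog barking
```

Cross-modal retrieval follows the same pattern, except that the candidates are ranked by similarity instead of keeping only the best one.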

Installation

pip install wav2clip

Usage

Clip-Level Embeddings

import wav2clip

model = wav2clip.get_model()
embeddings = wav2clip.embed_audio(audio, model)
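
The snippet above assumes `audio` already holds the waveform. Here is a minimal sketch of one way to prepare it, assuming `embed_audio` accepts a one-dimensional float32 NumPy array of mono samples (an assumption; this README does not state the expected format or sample rate). The helpers `load_wav_mono` and `embed_wav_file` are hypothetical names, only 16-bit PCM WAV files are handled, and no resampling is done.

```python
import array
import sys
import wave


def load_wav_mono(path):
    """Read a 16-bit PCM WAV file and return (samples, sample_rate).

    Samples are floats in [-1.0, 1.0); multi-channel audio is averaged to mono.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM WAV files are supported")
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    pcm = array.array("h")  # signed 16-bit integers
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian

    samples = []
    for i in range(0, len(pcm), channels):
        frame = pcm[i:i + channels]
        samples.append(sum(frame) / (channels * 32768.0))
    return samples, sample_rate


def embed_wav_file(path):
    """Embed one WAV file. Imports are local so the loader works without wav2clip."""
    import numpy as np
    import wav2clip

    samples, _ = load_wav_mono(path)
    audio = np.asarray(samples, dtype=np.float32)  # assumed input format
    model = wav2clip.get_model()
    return wav2clip.embed_audio(audio, model)
```

Any other loader that yields a mono float array should work the same way under the same assumption.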

Frame-Level Embeddings

import wav2clip

model = wav2clip.get_model(frame_length=16000, hop_length=16000)
embeddings = wav2clip.embed_audio(audio, model)
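
To make the two parameters concrete: if the audio is sampled at 16 kHz (an assumption; this README does not state a sample rate), `frame_length=16000` and `hop_length=16000` correspond to non-overlapping one-second windows. Here is a rough sketch of how such windows could be laid out, assuming trailing samples that do not fill a whole frame are dropped. Whether wav2clip pads or drops them is not stated here. `frame_starts` is a hypothetical helper, not part of the package.

```python
def frame_starts(num_samples, frame_length, hop_length):
    """Return the start index of every full frame in a signal of num_samples."""
    if frame_length <= 0 or hop_length <= 0:
        raise ValueError("frame_length and hop_length must be positive")
    if num_samples < frame_length:
        return []
    return list(range(0, num_samples - frame_length + 1, hop_length))


# 3.5 s of audio at an assumed 16 kHz sample rate
starts = frame_starts(56000, frame_length=16000, hop_length=16000)
print(starts)       # [0, 16000, 32000]
print(len(starts))  # 3
```

A `hop_length` smaller than `frame_length` would give overlapping frames, and therefore more embeddings for the same clip.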
Comments
  • request of projection layer weight

    Hi @hohsiangwu, thanks for the great work! Could you release the pre-trained weights of image_transform (the MLP layer) for the joint audio-image-language embedding space?

    Currently, the get_model function seems to provide only audio encoders. Is there any major problem with using CLIP embeddings (text or image) without the projection layer?

    opened by SeungHeonDoh 2
  • Initial checkin for accessing pre-trained model via pip install

    I am considering using GitHub's release feature to host the model weights. Once the URL is added to MODEL_WEIGHTS_URL and the repository is made public, we should be able to load the model with model = torch.hub.load('descriptinc/lyrebird-wav2clip', 'wav2clip', pretrained=True)

    opened by hohsiangwu 1
  • Adding VQGAN-CLIP with modification to generate audio

    • Adding a working snapshot of original generate.py from https://github.com/nerdyrodent/VQGAN-CLIP/
    • Modify to add audio related params and functions
    • Add scripts to generate image and video with options for conditioning and interpolation
    opened by hohsiangwu 0
  • Supervised scenario no transform

    In the supervised scenario in __init__.py, the transform flag is not set to True, so the model doesn't contain the MLP layer after training. I'm wondering how the MLP layer is trained when the model is used as a pretrained model.

    opened by alirezadir 0
  • Integrated into VQGAN+CLIP 3D Zooming notebook

    Dear researchers,

    I integrated Wav2CLIP into a VQGAN+CLIP animation notebook.

    It is available on colab here: https://colab.research.google.com/github/pollinations/hive/blob/main/notebooks/2%20Text-To-Video/1%20CLIP-Guided%20VQGAN%203D%20Turbo%20Zoom.ipynb

    I'm part of a team creating an open-source generative art platform called Pollinations.AI. It's also possible to use through our frontend if you are interested. https://pollinations.ai/p/QmT7yt67DF3GF4wd2vyw6bAgN3QZx7Xpnoyx98YWEsEuV7/create

    Here is an example output: https://user-images.githubusercontent.com/5099901/168467451-f633468d-e596-48f5-8c2c-2dc54648ead3.mp4

    opened by voodoohop 0
  • The details concerning loading raw audio files

    Hi !

    I have imported wav2clip as a package. However, when testing, the inputs the model uses to extract features are not the original audio files. Could you provide details on how to load audio files and turn them into input data for the model?

    opened by jinx2018 0
  • torch version

    Hi, thanks for sharing this wonderful work! I ran into some issues while installing it with pip, so may I ask which torch version you used? I cannot find the requirements for this project. Thanks!

    opened by annahung31 0
  • Error when importing after fresh installation on colab

    Which CUDA and Python versions have you tested the pip package with? After installing it on a fresh Colab instance, I receive the following error:


    OSError Traceback (most recent call last)
    in ()
    ----> 1 import wav2clip

    7 frames
    /usr/local/lib/python3.7/dist-packages/wav2clip/__init__.py in ()
          2 import torch
          3
    ----> 4 from .model.encoder import ResNetExtractor
          5
          6

    /usr/local/lib/python3.7/dist-packages/wav2clip/model/encoder.py in ()
          4 from torch import nn
          5
    ----> 6 from .resnet import BasicBlock
          7 from .resnet import ResNet
          8

    /usr/local/lib/python3.7/dist-packages/wav2clip/model/resnet.py in ()
          3 import torch.nn as nn
          4 import torch.nn.functional as F
    ----> 5 import torchaudio
          6
          7

    /usr/local/lib/python3.7/dist-packages/torchaudio/__init__.py in ()
    ----> 1 from torchaudio import _extension # noqa: F401
          2 from torchaudio import (
          3     compliance,
          4     datasets,
          5     functional,

    /usr/local/lib/python3.7/dist-packages/torchaudio/_extension.py in ()
         25
         26
    ---> 27 _init_extension()

    /usr/local/lib/python3.7/dist-packages/torchaudio/_extension.py in _init_extension()
         19 # which depends on libtorchaudio and dynamic loader will handle it for us.
         20 if path.exists():
    ---> 21     torch.ops.load_library(path)
         22     torch.classes.load_library(path)
         23 # This import is for initializing the methods registered via PyBind11

    /usr/local/lib/python3.7/dist-packages/torch/_ops.py in load_library(self, path)
        108 # static (global) initialization code in order to register custom
        109 # operators with the JIT.
    --> 110 ctypes.CDLL(path)
        111 self.loaded_libraries.add(path)
        112

    /usr/lib/python3.7/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error)
        362
        363 if handle is None:
    --> 364     self._handle = _dlopen(self._name, mode)
        365 else:
        366     self._handle = handle

    OSError: libcudart.so.10.2: cannot open shared object file: No such file or directory

    opened by janzuiderveld 0
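
An error like the `libcudart.so.10.2` one above typically indicates that the installed torchaudio binary was built against a different CUDA runtime than the one available in the environment, so a reasonable first diagnostic step is to compare the installed torch and torchaudio versions. Here is a small sketch using only the standard library (`importlib.metadata`, Python 3.8 or later); `installed_versions` is a hypothetical helper and the package list is just an example.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(["torch", "torchaudio", "wav2clip"]).items():
    print(f"{name}: {version or 'not installed'}")
```

If the reported torch and torchaudio versions do not belong to the same release pairing, reinstalling a matching pair is the usual remedy.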
Releases(v0.1.0-alpha)
Owner
Descript