Easy to use Audio Tagging in PyTorch

Overview

Audio Classification, Tagging & Sound Event Detection in PyTorch

Progress:

  • Fine-tune on audio classification
  • Fine-tune on audio tagging
  • Fine-tune on sound event detection
  • Add tagging metrics
  • Add Tutorial
  • Add Augmentation Notebook
  • Add more schedulers
  • Add FSDKaggle2019 dataset
  • Add MTT dataset
  • Add DESED

Model Zoo

AudioSet Pretrained Models

Model                   Task     mAP (%)  Sample Rate (kHz)  Window Length  Num Mels  Fmax  Weights
----------------------  -------  -------  -----------------  -------------  --------  ----  --------
CNN14                   Tagging  43.1     32                 1024           64        14k   download
CNN14_16k               Tagging  43.8     16                 512            64        8k    download
CNN14_DecisionLevelMax  SED      38.5     32                 1024           64        14k   download

Note: These models are used as the pretrained starting point for the fine-tuning tasks below. See audioset-tagging-cnn if you want to train on the AudioSet dataset.
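
The front-end settings in the table can be sanity-checked with a little arithmetic. The sketch below assumes that "Window Length" is given in samples and that "14k"/"8k" mean 14,000 Hz and 8,000 Hz; the table does not state the units, so treat both readings as assumptions. The helper names are illustrative and not part of this repository.

```python
# Front-end settings copied from the AudioSet table.
# Assumptions: window length is in samples; fmax values are in Hz.
FRONT_ENDS = {
    "CNN14": {"sample_rate": 32000, "window": 1024, "n_mels": 64, "fmax": 14000},
    "CNN14_16k": {"sample_rate": 16000, "window": 512, "n_mels": 64, "fmax": 8000},
    "CNN14_DecisionLevelMax": {"sample_rate": 32000, "window": 1024, "n_mels": 64, "fmax": 14000},
}


def window_ms(sample_rate, window):
    """Duration of one analysis window in milliseconds."""
    return 1000.0 * window / sample_rate


def nyquist_hz(sample_rate):
    """Highest frequency representable at this sample rate."""
    return sample_rate / 2


def fmax_is_valid(sample_rate, fmax):
    """A mel filterbank cannot extend above the Nyquist frequency."""
    return fmax <= nyquist_hz(sample_rate)


if __name__ == "__main__":
    for name, cfg in FRONT_ENDS.items():
        print(
            name,
            f"window={window_ms(cfg['sample_rate'], cfg['window']):.1f} ms",
            f"nyquist={nyquist_hz(cfg['sample_rate']):.0f} Hz",
            f"fmax_ok={fmax_is_valid(cfg['sample_rate'], cfg['fmax'])}",
        )
```

Under these assumptions both the 32 kHz and 16 kHz models use a 32 ms window, so audio should be resampled to the sample rate of the chosen model before feature extraction.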

Fine-tuned Classification Models

Model  Dataset                      Accuracy (%)  Sample Rate (kHz)  Weights
-----  ---------------------------  ------------  -----------------  --------
CNN14  ESC50 (Fold-5)               95.75         32                 download
CNN14  FSDKaggle2018 (test)         93.56         32                 download
CNN14  SpeechCommandsv1 (val/test)  96.60/96.77   32                 download

Fine-tuned Tagging Models

Model  Dataset        mAP (%)  AUC  d-prime  Sample Rate (kHz)  Config  Weights
-----  -------------  -------  ---  -------  -----------------  ------  -------
CNN14  FSDKaggle2019  -        -    -        32                 -       -
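
The tagging table reports mAP, AUC and d-prime. As a rough illustration of how two of these are commonly computed, here is a dependency-free sketch: per-class average precision (mAP is its mean over classes) and d-prime derived from AUC with the usual relation d' = sqrt(2) * Phi^-1(AUC). This is not this repository's metric code, and the function names are made up for the example. `statistics.NormalDist` needs Python 3.8 or newer.

```python
import math
from statistics import NormalDist  # Python 3.8+


def average_precision(labels, scores):
    """Average precision for one class.

    labels: 0/1 ground truth per clip; scores: predicted confidence per clip.
    Clips are ranked by score, and precision is averaged over the ranks
    at which a positive clip appears.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(label_columns, score_columns):
    """Mean of per-class average precision (one column per class)."""
    aps = [average_precision(y, s) for y, s in zip(label_columns, score_columns)]
    return sum(aps) / len(aps)


def d_prime(auc):
    """d-prime from AUC; auc must lie strictly between 0 and 1."""
    return math.sqrt(2.0) * NormalDist().inv_cdf(auc)
```

An AUC of 0.5 (chance level) maps to a d-prime of 0, and d-prime grows without bound as AUC approaches 1.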

Fine-tuned SED Models

Model                   Dataset  F1  Sample Rate (kHz)  Config  Weights
----------------------  -------  --  -----------------  ------  -------
CNN14_DecisionLevelMax  DESED    -   32                 -       -

Supported Datasets

Dataset           Task            Classes  Train         Val       Test    Audio Length  Audio Spec     Size
----------------  --------------  -------  ------------  --------  ------  ------------  -------------  -----
ESC-50            Classification  50       2,000         5 folds   -       5s            44.1kHz, mono  600MB
UrbanSound8k      Classification  10       8,732         10 folds  -       <=4s          Vary           5.6GB
FSDKaggle2018     Classification  41       9,473         -         1,600   300ms~30s     44.1kHz, mono  4.6GB
SpeechCommandsv1  Classification  30       51,088        6,798     6,835   <=1s          16kHz, mono    1.4GB
SpeechCommandsv2  Classification  35       84,843        9,981     11,005  <=1s          16kHz, mono    2.3GB
FSDKaggle2019*    Tagging         80       4,970+19,815  -         4,481   300ms~30s     44.1kHz, mono  24GB
MTT*              Tagging         50       19,000        -         -       -             -              3GB
DESED*            SED             10       -             -         -       10            -              -

Notes: Datasets marked with * are not available yet. Classification datasets are treated as multi-class, single-label classification; tagging and SED datasets are treated as multi-label classification.
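
To make the single-label versus multi-label distinction concrete, the sketch below shows the usual way predictions are decoded in each case: softmax plus argmax for single-label classification, and an independent sigmoid with a threshold per class for tagging and SED. The activations and the 0.5 threshold are the conventional pairing and are assumptions here; this README does not show which ones the repository uses.

```python
import math


def softmax(logits):
    """Turn logits into probabilities that sum to 1 (classes compete)."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Independent per-class probability (classes do not compete)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_single_label(logits):
    """Multi-class: exactly one class index is returned."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


def predict_multi_label(logits, threshold=0.5):
    """Multi-label: any number of class indices (including none) is returned."""
    return [i for i, x in enumerate(logits) if sigmoid(x) >= threshold]
```

For the logits `[2.0, -3.0, 0.5]`, the single-label decoder returns class 0 only, while the multi-label decoder returns classes 0 and 2.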

Dataset Structure

Download the datasets and arrange them in the following structure.

datasets
|__ ESC50
    |__ audio

|__ Urbansound8k
    |__ audio

|__ FSDKaggle2018
    |__ audio_train
    |__ audio_test
    |__ FSDKaggle2018.meta
        |__ train_post_competition.csv
        |__ test_post_competition_scoring_clips.csv

|__ SpeechCommandsv1/v2
    |__ bed
    |__ bird
    |__ ...
    |__ testing_list.txt
    |__ validation_list.txt
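
Before training, it can help to confirm that a dataset folder matches the layout above. The following is a minimal sketch using only the standard library; `EXPECTED` and `missing_entries` are illustrative names, not part of this repository. The tree labels the Speech Commands folder "SpeechCommandsv1/v2", so the two separate folder names used below are an assumption.

```python
from pathlib import Path

# Entries expected under each dataset folder, taken from the tree above.
EXPECTED = {
    "ESC50": ["audio"],
    "Urbansound8k": ["audio"],
    "FSDKaggle2018": [
        "audio_train",
        "audio_test",
        "FSDKaggle2018.meta/train_post_competition.csv",
        "FSDKaggle2018.meta/test_post_competition_scoring_clips.csv",
    ],
    # Assumption: v1 and v2 live in separately named folders.
    "SpeechCommandsv1": ["testing_list.txt", "validation_list.txt"],
    "SpeechCommandsv2": ["testing_list.txt", "validation_list.txt"],
}


def missing_entries(root, dataset):
    """Return the expected paths that do not exist under root/dataset."""
    base = Path(root) / dataset
    return [rel for rel in EXPECTED[dataset] if not (base / rel).exists()]


if __name__ == "__main__":
    for name in EXPECTED:
        missing = missing_entries("datasets", name)
        print(f"{name}: {'ok' if not missing else 'missing ' + ', '.join(missing)}")
```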


Augmentations

Currently, the following augmentations are supported. More will be added in the future. You can test the effects of the augmentations with this notebook.

Waveform Augmentations:

  • MixUp
  • Background Noise
  • Gaussian Noise
  • Fade In/Out
  • Volume
  • CutMix

Spectrogram Augmentations:

  • Time Masking
  • Frequency Masking
  • Filter Augmentation
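
To give a feel for what two of these augmentations do, here is a dependency-free sketch of MixUp on waveforms and time masking on a spectrogram. It uses plain Python lists so that it runs without PyTorch; it is not the repository's implementation, and the parameter values (`alpha=0.4`, `max_width`) are assumptions chosen only for illustration.

```python
import random


def mixup(wave_a, label_a, wave_b, label_b, alpha=0.4, rng=random):
    """Blend two equal-length waveforms and their label vectors.

    The mixing weight lam is drawn from Beta(alpha, alpha); the same weight
    is applied to the samples and to the (one-hot or multi-hot) labels.
    """
    lam = rng.betavariate(alpha, alpha)
    mixed_wave = [lam * a + (1.0 - lam) * b for a, b in zip(wave_a, wave_b)]
    mixed_label = [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed_wave, mixed_label, lam


def time_mask(spec, max_width, rng=random, fill=0.0):
    """Blank a random run of time frames in spec[mel_bin][frame].

    Returns a masked copy plus the chosen start frame and width.
    """
    n_frames = len(spec[0])
    width = rng.randint(0, min(max_width, n_frames))
    start = rng.randint(0, n_frames - width)
    masked = [
        [fill if start <= t < start + width else value for t, value in enumerate(row)]
        for row in spec
    ]
    return masked, start, width
```

Frequency masking follows the same idea as `time_mask`, but blanks a run of mel bins (rows) instead of time frames (columns).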

Usage

Requirements
  • python >= 3.6
  • pytorch >= 1.8.1
  • torchaudio >= 0.8.1

Other requirements can be installed with pip install -r requirements.txt.


Configuration
  • Create a configuration file in configs. A sample configuration for the ESC50 dataset can be found here.
  • Copy its contents and edit the fields you need to change.
  • This configuration file is required by the training, evaluation, and prediction scripts.
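
Because every script reads the same configuration file, a quick check for the fields mentioned in this README can catch mistakes early. The sketch below works on an already-loaded configuration (a nested dict). The field names MODEL_PATH, DATASET >> NAME and TEST >> FILE come from the sections below; treating MODEL_PATH as a top-level key is an assumption, and the helper names and example values are hypothetical.

```python
def get_field(cfg, dotted):
    """Look up a nested key such as 'DATASET.NAME'; return None if absent."""
    node = cfg
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def missing_fields(cfg, required=("MODEL_PATH", "DATASET.NAME", "TEST.FILE")):
    """Return the required fields that are absent or empty."""
    return [name for name in required if get_field(cfg, name) in (None, "")]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    example = {
        "MODEL_PATH": "output/model.pth",
        "DATASET": {"NAME": "ESC50"},
        "TEST": {"FILE": ""},
    }
    print(missing_fields(example))
```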

Training

To train with a single GPU:

$ python tools/train.py --cfg configs/CONFIG_FILE_NAME.yaml

To train with multiple GPUs, set the DDP field in the config file to true and run:

$ python -m torch.distributed.launch --nproc_per_node=2 --use_env tools/train.py --cfg configs/CONFIG_FILE_NAME.yaml
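
With `--use_env`, the launcher passes each process its rank through environment variables (`LOCAL_RANK`, `RANK`, `WORLD_SIZE`) rather than a `--local_rank` command-line argument. How `tools/train.py` consumes them is not shown here; the following is only a sketch of reading those variables, with single-process defaults when they are absent. `read_ddp_env` is an illustrative name.

```python
import os


def read_ddp_env(environ=os.environ):
    """Read the rank information set by the distributed launcher.

    Falls back to single-process values when the variables are not set,
    for example when the script is started with plain `python`.
    """
    return {
        "local_rank": int(environ.get("LOCAL_RANK", 0)),
        "rank": int(environ.get("RANK", 0)),
        "world_size": int(environ.get("WORLD_SIZE", 1)),
    }


if __name__ == "__main__":
    info = read_ddp_env()
    print(f"process {info['rank']} of {info['world_size']} (local GPU {info['local_rank']})")
```

In a typical DDP script the local rank is then used to select the GPU, for example with `torch.cuda.set_device(local_rank)`, before the process group is initialised.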

Evaluation

Make sure to set MODEL_PATH in the configuration file to your trained model directory.

$ python tools/val.py --cfg configs/CONFIG_FILE.yaml

Audio Classification/Tagging Inference
  • Set MODEL_PATH in the configuration file to your model's trained weights.
  • Set DATASET >> NAME to the dataset your model was trained on.
  • Set the path of the audio file to test in TEST >> FILE.
  • Run the following command:
$ python tools/infer.py --cfg configs/CONFIG_FILE.yaml

## for example
$ python tools/infer.py --cfg configs/audioset.yaml

You will get an output similar to this:

Class                     Confidence
----------------------  ------------
Speech                     0.897762
Telephone bell ringing     0.752206
Telephone                  0.219329
Inside, small room         0.20761
Music                      0.0770325
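
The confidences in this sample sum to more than 1, which is consistent with independent per-class scores rather than a softmax distribution. A table like this can be produced by ranking the scores and keeping the top few; the sketch below shows one way to do that with the standard library. It is not the code in `tools/infer.py`, and the function names are illustrative.

```python
def top_k(class_names, scores, k=5):
    """Return the k (class, score) pairs with the highest scores."""
    ranked = sorted(zip(class_names, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def format_table(rows):
    """Render (class, score) rows as an aligned two-column text table."""
    width = max([len("Class")] + [len(name) for name, _ in rows])
    lines = [f"{'Class':<{width}}  Confidence", f"{'-' * width}  {'-' * 10}"]
    lines += [f"{name:<{width}}  {score:.6f}" for name, score in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    names = ["Music", "Speech", "Telephone", "Telephone bell ringing"]
    scores = [0.0770325, 0.897762, 0.219329, 0.752206]
    print(format_table(top_k(names, scores, k=3)))
```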

Sound Event Detection Inference
  • Set MODEL_PATH in the configuration file to your model's trained weights.
  • Set DATASET >> NAME to the dataset your model was trained on.
  • Set the path of the audio file to test in TEST >> FILE.
  • Run the following command:
$ python tools/sed_infer.py --cfg configs/CONFIG_FILE.yaml

## for example
$ python tools/sed_infer.py --cfg configs/audioset_sed.yaml

You will get an output similar to this:

Class                     Start    End
----------------------  -------  -----
Speech                      2.2    7
Telephone bell ringing      0      2.5
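
Start and end times like these are typically obtained by thresholding the frame-level probabilities of each class and merging consecutive active frames into one event. The sketch below illustrates that idea for a single class. The threshold and the frame duration are assumptions, and this is not the post-processing used in `tools/sed_infer.py`.

```python
def frames_to_events(probs, threshold=0.5, frame_seconds=0.1):
    """Convert per-frame probabilities for one class into (start, end) times.

    A frame is active when its probability is at or above the threshold;
    each run of consecutive active frames becomes one event, in seconds.
    """
    events = []
    start = None
    for index, prob in enumerate(probs):
        active = prob >= threshold
        if active and start is None:
            start = index  # an event begins
        elif not active and start is not None:
            events.append((round(start * frame_seconds, 3), round(index * frame_seconds, 3)))
            start = None  # the event has ended
    if start is not None:  # event still open at the end of the clip
        events.append((round(start * frame_seconds, 3), round(len(probs) * frame_seconds, 3)))
    return events
```

For example, `frames_to_events([0.9, 0.9, 0.1, 0.1, 0.8], frame_seconds=0.5)` yields two events, one from 0.0 s to 1.0 s and one from 2.0 s to 2.5 s.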

If you set PLOT to true, the following plot is also shown:

[Figure: sed_result]


References

Citations
@misc{kong2020panns,
      title={PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition}, 
      author={Qiuqiang Kong and Yin Cao and Turab Iqbal and Yuxuan Wang and Wenwu Wang and Mark D. Plumbley},
      year={2020},
      eprint={1912.10211},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}

@misc{gong2021ast,
      title={AST: Audio Spectrogram Transformer}, 
      author={Yuan Gong and Yu-An Chung and James Glass},
      year={2021},
      eprint={2104.01778},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}

@misc{nam2021heavily,
      title={Heavily Augmented Sound Event Detection utilizing Weak Predictions}, 
      author={Hyeonuk Nam and Byeong-Yun Ko and Gyeong-Tae Lee and Seong-Hu Kim and Won-Ho Jung and Sang-Min Choi and Yong-Hwa Park},
      year={2021},
      eprint={2107.03649},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}

Releases (v0.2.0)
  • v0.2.0 (Aug 17, 2021)

    This release includes the following:

    • Fine-tuned on ESC50, FSDKaggle2018, SpeechCommandsv1
    • Add waveform augmentations
    • Add spectrogram augmentations
    • Add augmentation testing notebook
    • Add tagging metrics
  • v0.1.0 (Aug 13, 2021)

Owner
sithu3
AI Developer