An evaluation toolkit for voice conversion models.

Last update: Aug 29, 2022


Voice-conversion-evaluation


Sample test pairs

Generate the metadata used to evaluate models.
The parsers directory contains the available corpus parsers.

```
  python sampler.py [name of source corpus] [path of source dir] [name of target corpus] [path of target dir] -n [number of samples] -nt [number of target utterances] -o [path of output dir]
```
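The random seeds recorded in the metadata are what make a sampled test set reproducible. The sketch below shows the idea under stated assumptions: the helper `sample_pairs` and its arguments are hypothetical, each corpus is assumed to be a mapping from speaker name to relative utterance paths, and the source and target sides are assumed to use separate seeded generators. It is not the toolkit's actual sampling logic.

```python
import random


def sample_pairs(source_utts, target_utts, n_samples, n_target_samples,
                 source_seed, target_seed):
    """Hypothetical sketch: draw reproducible (source, target) evaluation pairs.

    source_utts / target_utts map speaker name -> list of relative utterance paths.
    """
    # Separate generators, so source and target sampling are each reproducible.
    src_rng = random.Random(source_seed)
    tgt_rng = random.Random(target_seed)
    all_src = [(spk, utt) for spk, utts in source_utts.items() for utt in utts]
    pairs = []
    for src_spk, src_utt in src_rng.sample(all_src, n_samples):
        # Pick a target speaker, then n_target_samples distinct utterances from them.
        tgt_spk = tgt_rng.choice(sorted(target_utts))
        tgt_utts = tgt_rng.sample(target_utts[tgt_spk], n_target_samples)
        pairs.append({"source_speaker": src_spk, "target_speaker": tgt_spk,
                      "src_utt": src_utt, "tgt_utts": tgt_utts})
    return pairs


# Hypothetical toy corpora.
source = {"p1": ["p1/a.wav", "p1/b.wav", "p1/c.wav"], "p2": ["p2/a.wav", "p2/b.wav"]}
target = {"t1": ["t1/x.wav", "t1/y.wav", "t1/z.wav"], "t2": ["t2/x.wav", "t2/y.wav"]}
pairs = sample_pairs(source, target, 3, 2, source_seed=0, target_seed=1)
print(pairs)
```

Running it twice with the same seeds yields identical pairs, which is why the seeds are worth storing alongside the pairs.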

The pairs in the metadata are sorted by src_second, from longest to shortest.
The metadata contains:

- source_corpus: The name of the source corpus.
- source_corpus_speaker_number: The number of speakers in the source corpus.
- source_random_seed: The random seed used for sampling source utterances.
- target_corpus: The name of the target corpus.
- target_corpus_speaker_number: The number of speakers in the target corpus.
- target_random_seed: The random seed used for sampling target utterances.
- n_samples: The number of samples.
- n_target_samples: The number of target utterances.
- pairs: The list of evaluation pairs. Each pair contains:
  - source_speaker: The name of the source speaker.
  - target_speaker: The name of the target speaker.
  - src_utt: The path of the source utterance, relative to the source dir.
  - tgt_utts: The list of paths of the target utterances, relative to the target dir.
  - content: The text content of the source utterance.
  - src_second: The duration of the source utterance, in seconds.
  - converted: This entry is not generated by the sampler; you need to add the relative path of your converted output yourself.
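Because the sampler leaves out the converted entry, each pair must be completed before evaluation. The sketch below assumes the metadata is stored as JSON (an assumption; the on-disk format is not described here), uses made-up field values, and uses a hypothetical "converted/" output layout. Only the field names come from the list above.

```python
import json


def sort_pairs(metadata):
    # Longest source utterance first, i.e. descending src_second.
    metadata["pairs"].sort(key=lambda pair: pair["src_second"], reverse=True)
    return metadata


def add_converted(metadata, converted_for):
    # The sampler does not write "converted"; fill it in from a user-supplied mapping.
    for pair in metadata["pairs"]:
        pair["converted"] = converted_for(pair)
    return metadata


# Hypothetical example values; only the field names are taken from the README.
metadata = {
    "source_corpus": "SourceCorpus",
    "source_corpus_speaker_number": 2,
    "source_random_seed": 0,
    "target_corpus": "TargetCorpus",
    "target_corpus_speaker_number": 2,
    "target_random_seed": 1,
    "n_samples": 2,
    "n_target_samples": 1,
    "pairs": [
        {"source_speaker": "s1", "target_speaker": "t1", "src_utt": "s1/a.wav",
         "tgt_utts": ["t1/x.wav"], "content": "HELLO", "src_second": 1.5},
        {"source_speaker": "s2", "target_speaker": "t2", "src_utt": "s2/b.wav",
         "tgt_utts": ["t2/y.wav"], "content": "GOOD MORNING", "src_second": 3.2},
    ],
}

sort_pairs(metadata)
# Hypothetical layout: one converted file per source utterance under "converted/".
add_converted(metadata, lambda pair: "converted/" + pair["src_utt"])
print(json.dumps(metadata, indent=2))
```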

Metrics

The metrics include automatic mean opinion score assessment, character error rate, and speaker verification acceptance rate.

Automatic mean opinion score assessment:
- Uses an ensemble of several MBNet models, implemented by sky1456723.
```
  python calculate_objective_metric.py -d [data_dir] -r metrics/mean_opinion_score
```
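One common way to ensemble MOS predictors is to average their scores. The sketch below assumes exactly that: each model predicts a score for the same utterances, the per-utterance MOS is the mean across models, and the system MOS is the mean over utterances. The function `ensemble_mos` is hypothetical, and the toolkit may combine its MBNet models differently.

```python
from statistics import mean


def ensemble_mos(per_model_scores):
    """Average MOS predictions across models.

    per_model_scores: one list per model, each holding a predicted MOS for the
    same utterances in the same order. Returns (per-utterance MOS, overall MOS).
    """
    n_utts = len(per_model_scores[0])
    if any(len(scores) != n_utts for scores in per_model_scores):
        raise ValueError("every model must score the same utterances")
    per_utt = [mean(model[i] for model in per_model_scores) for i in range(n_utts)]
    return per_utt, mean(per_utt)


# Three hypothetical models scoring two utterances.
per_utt, overall = ensemble_mos([[3.0, 4.0], [3.5, 4.5], [2.5, 3.5]])
print(per_utt, overall)  # [3.0, 4.0] 3.5
```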
Character error rate:
- Uses an automatic speech recognition model provided by Hugging Face.
- The model's word error rate on LibriSpeech test-other is 3.9%.
```
  python calculate_objective_metric.py -d [data_dir] -r metrics/character_error_rate
```
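Character error rate is the character-level edit distance (substitutions, deletions, insertions) between a reference transcript and the ASR hypothesis, divided by the reference length. The sketch below implements that definition with a two-row Levenshtein table. Using the pair's content field as the reference is presumably how it is applied here, but that, and any text normalization (casing, punctuation), are assumptions; the function names are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,             # delete r
                            curr[j - 1] + 1,         # insert h
                            prev[j - 1] + (r != h)))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


cer = character_error_rate("HELLO WORLD", "HELO WORLD")
print(cer)  # one deletion over 11 reference characters
```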
Speaker verification acceptance rate:
- You can calculate the threshold with the scripts in metrics/speaker_verification/equal_error_rate/.
- Some precalculated thresholds are provided in metrics/speaker_verification/equal_error_rate/threshold.yaml.
```
  python calculate_objective_metric.py -d [data_dir] -r metrics/speaker_verification -t [target_dir] -th [threshold path]
```
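The threshold and the acceptance rate fit together as follows: an equal error rate (EER) threshold is chosen where the false acceptance rate on impostor trials matches the false rejection rate on genuine trials, and the acceptance rate is the fraction of converted-versus-target trials whose score clears it. The sketch below assumes higher scores mean "same speaker" (e.g. cosine similarity of speaker embeddings) and accepts when score >= threshold; the helpers are hypothetical and may differ from the scripts in equal_error_rate/.

```python
def eer_threshold(genuine, impostor):
    """Return (threshold, approximate EER) where FAR and FRR are closest.

    genuine: scores for same-speaker trials; impostor: different-speaker trials.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuines rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2)
    return best[1], best[2]


def acceptance_rate(scores, threshold):
    """Fraction of converted-vs-target trials accepted as the target speaker."""
    return sum(s >= threshold for s in scores) / len(scores)


# Hypothetical similarity scores.
threshold, eer = eer_threshold([0.9, 0.8, 0.7, 0.4], [0.1, 0.3, 0.5, 0.2])
print(threshold, eer)  # 0.5 0.25
print(acceptance_rate([0.6, 0.45, 0.55, 0.2], threshold))  # 0.5
```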

Releases (checkpoints)

checkpoints (May 17, 2021)
