African language Speech Recognition - Speech-to-Text

Last update: Jan 05, 2023


Swahili-Speech-To-Text

Table of Contents

Swahili-Speech-To-Text
- Overview
- Scenario
- Approach
- Project Structure
  - data:
  - models:
  - notebooks:
  - scripts
  - tests:
  - logs:
  - root folder
- Installation guide

Overview

This repository is used for the week 4 challenge of 10 Academy. The instructions for this project can be found in the challenge document.

Scenario

The World Food Program wants to deploy an intelligent form that collects nutritional information about food bought and sold at markets in two African countries, Ethiopia and Kenya. The design of this intelligent form requires selected people to install an app on their mobile phones; whenever they buy food, they use their voice to activate the app and register the list of items they just bought in their own language. The intelligent systems in the app are expected to transcribe the speech to text live and organize the information in an easy-to-process way in a database.

You work for the Tenacious data science consultancy, which has been chosen to deliver speech-to-text technology for Swahili. Your responsibility is to build a deep learning model that is capable of transcribing speech to text. The model you produce should be accurate and robust against background noise.
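The scenario does not say how accuracy will be measured. Word error rate (WER) is the usual metric for speech-to-text: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the model's output, divided by the number of words in the reference. The following is a plain-Python sketch, not code from this repository; the function name `word_error_rate` and the Swahili example sentence are illustrative, and it assumes both transcripts are already normalized and split on whitespace.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # Rolling row of the edit-distance table: at the start of iteration i,
    # prev[j] is the distance between the first i-1 reference words and
    # the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "nimenunua mchele na maharage"
    hypothesis = "nimenunua mchele maharage"
    print(word_error_rate(reference, hypothesis))  # one deletion out of four words: 0.25
```

Lower is better, and WER can exceed 1.0 when the model inserts many extra words, so it is reported as an error rate rather than an accuracy percentage.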

Approach

The project is divided into the following phases:

Data pre-processing
Modelling using deep learning
Serving predictions on a web interface
Interpretation & Reporting
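One common way to work toward the noise-robustness requirement during data pre-processing is to augment clean training clips with background noise at a controlled signal-to-noise ratio (SNR). This is a generic technique, not necessarily what this repository does. The sketch assumes mono audio already decoded to lists of floats at a shared sample rate; `rms` and `add_noise_at_snr` are hypothetical names, and the 440 Hz tone and Gaussian noise in the usage block are synthetic stand-ins for real recordings.

```python
import math
import random
from typing import List, Sequence


def rms(signal: Sequence[float]) -> float:
    """Root-mean-square amplitude of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def add_noise_at_snr(clean: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Mix `noise` into `clean` so that the added noise sits `snr_db` decibels below the signal."""
    if len(noise) < len(clean):
        raise ValueError("noise clip must be at least as long as the clean clip")
    noise = noise[: len(clean)]
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise clip is silent")
    # SNR_dB = 20 * log10(rms(clean) / rms(gain * noise)), solved for the gain.
    target_noise_rms = rms(clean) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    # No clipping is applied; real pipelines may need to rescale into [-1, 1].
    return [c + gain * n for c, n in zip(clean, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    sample_rate = 16_000  # assumed sample rate for this synthetic example
    clean = [0.5 * math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]
    noisy = add_noise_at_snr(clean, noise, snr_db=10.0)
    residual = [n - c for n, c in zip(noisy, clean)]
    print(round(20 * math.log10(rms(clean) / rms(residual)), 3))  # 10.0
```

Drawing `snr_db` at random from a range (for example 0 to 20 dB) for each training clip exposes the model to both mild and heavy background noise.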

Project Structure

The repository has a number of files, including Python scripts, Jupyter notebooks, PDFs and text files. Here is their structure with a brief explanation.

data:

the folder where the dataset csv files are stored

models:

the folder where models' pickle files are stored

notebooks:

EDA.ipynb: a Jupyter notebook for exploratory data analysis
Meta-data Generation.ipynb: a Jupyter notebook for extracting the metadata from the transcription and audio files
Audio preprocessing.ipynb: a Jupyter notebook for preprocessing the audio data
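The metadata step can be sketched with the standard library alone. The data layout here is an assumption, since the README does not describe the dataset format: a transcription text file whose lines look like `<utterance_id> <transcript>`, and one WAV file per utterance named `<utterance_id>.wav` in a single folder. The function names and CSV columns are hypothetical and are not taken from `Meta-data Generation.ipynb`.

```python
import csv
import wave
from pathlib import Path
from typing import Dict, List

FIELDS = ["id", "path", "sample_rate", "channels", "duration_s", "transcript", "n_words"]


def read_transcriptions(path: Path) -> Dict[str, str]:
    """Parse lines of the assumed form '<utterance_id> <transcript>'."""
    transcripts = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split(maxsplit=1)
            if len(parts) == 2:
                transcripts[parts[0]] = parts[1]
    return transcripts


def build_metadata(audio_dir: Path, transcription_file: Path) -> List[dict]:
    """Pair each WAV file with its transcript and basic properties read from the WAV header."""
    transcripts = read_transcriptions(transcription_file)
    rows = []
    for wav_path in sorted(audio_dir.glob("*.wav")):
        utt_id = wav_path.stem
        if utt_id not in transcripts:
            continue  # skip audio that has no transcript
        with wave.open(str(wav_path), "rb") as w:
            frames, rate, channels = w.getnframes(), w.getframerate(), w.getnchannels()
        text = transcripts[utt_id]
        rows.append({
            "id": utt_id,
            "path": str(wav_path),
            "sample_rate": rate,
            "channels": channels,
            "duration_s": frames / rate,
            "transcript": text,
            "n_words": len(text.split()),
        })
    return rows


def write_metadata_csv(rows: List[dict], out_path: Path) -> None:
    """Write the metadata rows to a CSV file with a fixed column order."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Durations come from the WAV header (frame count divided by frame rate), so no samples need to be decoded, and audio without a matching transcript is skipped instead of stopping the whole run.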

scripts

app_logger.py: a Python script for logging
file_handler.py: a Python script for handling the reading and writing of CSV, pickle and other files
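A minimal sketch of what a helper like `file_handler.py` might look like, using only `csv` and `pickle` from the standard library. The `FileHandler` class and its method names are hypothetical; the actual script may rely on other libraries (for example pandas) and expose a different interface.

```python
import csv
import pickle
from pathlib import Path
from typing import Any, Dict, List


class FileHandler:
    """Hypothetical helper for reading and writing CSV and pickle files."""

    def save_csv(self, rows: List[Dict[str, Any]], path: str) -> None:
        if not rows:
            raise ValueError("nothing to write")
        Path(path).parent.mkdir(parents=True, exist_ok=True)
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)

    def read_csv(self, path: str) -> List[Dict[str, str]]:
        # csv.DictReader returns every value as a string.
        with open(path, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))

    def save_pickle(self, obj: Any, path: str) -> None:
        Path(path).parent.mkdir(parents=True, exist_ok=True)
        with open(path, "wb") as f:
            pickle.dump(obj, f)

    def read_pickle(self, path: str) -> Any:
        # Only unpickle trusted files: loading a pickle can execute arbitrary code.
        with open(path, "rb") as f:
            return pickle.load(f)
```

Creating parent folders on save means paths such as `data/...` or `models/...` work even on a fresh clone where those folders do not exist yet.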

tests:

the folder containing unit tests for components in the scripts
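As an illustration of the kind of unit test that could live in `tests/`, the sketch uses the standard `unittest` module against a hypothetical `normalize_transcript` helper (lowercasing, stripping ASCII punctuation, collapsing whitespace). Neither the helper nor the test class comes from this repository.

```python
import string
import unittest


def normalize_transcript(text: str) -> str:
    """Hypothetical helper: lowercase, drop ASCII punctuation, collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


class TestNormalizeTranscript(unittest.TestCase):
    def test_lowercases_and_strips_punctuation(self):
        self.assertEqual(
            normalize_transcript("Nimenunua Mchele, na Maharage!"),
            "nimenunua mchele na maharage",
        )

    def test_collapses_whitespace(self):
        self.assertEqual(normalize_transcript("  mchele \t\n maharage "), "mchele maharage")

    def test_empty_string(self):
        self.assertEqual(normalize_transcript(""), "")
```

Saved as, for example, `tests/test_normalize.py`, tests like these can be discovered and run from the repository root with `python -m unittest discover tests`.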

logs:

the folder containing log files (if it doesn't exist it will be created once logging starts)
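A minimal sketch of a logger in the spirit of `app_logger.py`, showing how the `logs` folder can be created the first time logging is set up. The `get_logger` name, the one-file-per-logger layout and the format string are assumptions, not the repository's actual code.

```python
import logging
import os


def get_logger(name: str, log_dir: str = "logs", level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes to <log_dir>/<name>.log, creating log_dir if it is missing."""
    os.makedirs(log_dir, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Attach the file handler only once, even if get_logger is called repeatedly.
    if not logger.handlers:
        handler = logging.FileHandler(os.path.join(log_dir, f"{name}.log"), encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

For example, `get_logger("preprocessing").info("loaded %d audio files", n)` appends a timestamped line to `logs/preprocessing.log`. The handler check keeps repeated calls from attaching a second file handler, which would otherwise write every message twice.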

root folder

10 Academy Batch 4 - Week 3 Challenge.pdf: the challenge document
requirements.txt: a text file listing the project's dependencies
setup.py: a configuration file for installing the scripts as a package
README.md: Markdown text with a brief explanation of the project and the repository structure.

Installation guide

git clone https://github.com/10-Academy-Batch-4-Week-4/Swahili-Speech-To-Text
cd Swahili-Speech-To-Text
pip install -r requirements.txt
