Audio Visual Emotion Recognition using TDA

Overview

Audio Visual Emotion Recognition using TDA

RAVDESS database with two datasets analyzed: Video and Audio dataset:

Audio-Dataset: https://www.kaggle.com/uwrfkaggler/ravdess-emotional-speech-audio

Video-Dataset: https://zenodo.org/record/1188976#.X7yio2hKjIU

The Final Master project PDF document is available here.

Folder Video_Dataset:

Dataset used is available in this url https://zenodo.org/record/1188976#.X7yio2hKjIU The algorithm works in this order:

  1. delaunay_construction.m: The first step of the algorithm in order to build the Delaunay triangulation in every video associated from dataset, remind that we have videos of 24 people and for each person 60 videos associated to 8 emotions. The first step is to defines the pathdata where it is the dataset address, that it is in format csv with the landmark point of the face. The coordinate of point X is is between position 2:297 and Y from 138:416 return the Delaunay_base, the struct that we will use in the code.

  2. complex_filtration.m: After get the delaunay_construction, we apply complex_filtration(Delaunay). The input is the Delaunay triangulation, in this code we built the complexes using the triangulation, taking the edges which form the squares and used them to form the square in every frame. We are working with 9 frames and this function calls the filtration function. Then, this function the return the complex asociated to each video, and the index position where each 3-cell is formed in the complex

2.1. filtrations.m This function obtains 8 border simplicial complexes filtered, from 4 view directions, 2 by each direction.We applied a set of function in order to get the different complex, as you can see the funcion return Complex X in the direction of axis X, Complex X in direction of Y, Complex XY, Complex YX in diagonal direction and the same complex with the order inverted.

2.2. complex_wtsquare.m In this function we are going to split the complexes which form every cell to see the features which born and died in the same square on the complex.

  1. WORKFLOW.m One time that we have the complexes build, we are going to apply the Incremental Algorithm (Persistence_new) used in this thesis, the Incremental algorithm was implemented in C++ using differente topology libraries which offer this language. Then we get the barcode or persistence diagram associated to each filter complex obtained at begining. In this function we apply also the function (per_entropy) to summarise the information from the persistence diagram

Load each complex and its index and apply:

3.1 complex2matrix.py: converts the complex obtained for the ATR model applied in matricial way as we explained on the thesis(page 50).

3.2 Persistence_new: ATR model defined in C++ to calculate the persisten homology and get the barcode or persistence diagrams associated with each filtration of the complex. The psuedo-code of the algorithm you will find on the thesis.

3.3 create_matrix.m: Built the different matrix based on persistence value to classify.

  1. experiment: the first experiment done based on the entropy values of video, but it sets each filtration compex that we get, then for that we worked with vector of eight elements associated to each filtration. Later this matrix is splitted in training and test set in order to use APP Classificator from Matlab and gets the accuracy.

  2. experiment3: Experiment that construct the matrix with the information of each persisten value associate with one filtration of the complex calculated. Later this matrix is splitted in training and test set in order to use APP Classificator from Matlab and gets the accuracy.

  3. feature24_vector.m: experiment done considering a vector of 24 features for each person. in this experiment we dont get good results.

Folder Audio Dataset:

In this url yo can finde the Audio-Dataset used for this implementation, the formal of the files are in .wav: https://www.kaggle.com/uwrfkaggler/ravdess-emotional-speech-audio

Experiment 1

  1. work_flow.py focuses on the first experiment, load data that will be used in the script, and initialize the dataframe to fill.

1.1 test.py using function emotions to get the embedder and duration in seconds of each audio signal. Read the audio and create the time array (timeline), resample the signal, get the optimal parameter delay, apply the emmbedding algorithm

1.2 get_parameters.py function to get the optimal parameter for taken embedding, which contains datDelayInformation for mutual information, false_nearest_neighours for embedding dimension.

1.3 TakensEmbedding: This function returns the Takens embedding of data with a delay into a dimension

1.4 per_entropy.py: Computes the persistence entropy of a set of intervals according to the diagrama obtained.

1.5 get_diagramas.py used to apply Vietoris-Rips filter and get the persisten_entropy values.

  1. machine_learning.py is used to define classification techniques in the set of entropy values. Create training and test splits. Import the KNeighborsClassifier from library. The parameter K is to plot in graph with corresponding error rate for dataset and calculate the mean of error for all the predicted values where K ranges from 1 to 40.

Experment 2

  1. Work_flow2.py: Second experiment, using function emotions_second to obtain the resampled signal, get_diag2 from test.py to calculates the Vietoris-Rips filter.

  2. machine_learning_second: To construct a distance matrix of persistence diagrams (Bottleneck distance). Upload the csv prueba5.csv that contains the label of the emotion associated to each rows of the matrix. Create the fake data matrix: just the indices of the timeseries. Import the KNeighborsClassifier from library. For evaluating the algorithm, confusion matrix, precision, recall and f1 score are the most commonly used. Testing different classifier to see what is the best one. GaussianNB; DecisionTreeClassifier, knn and SVC.

4.1 my_dist: To get the distance bottleneck between diagrams, function that we use to built the matrix of distance, that will be the input of the KNN algorithm.

Classification folder

In this folder, the persistent entropy matrixes and classification experiments using neural networks for video-only and audiovideo datasets are provided.

Owner
Combinatorial Image Analysis research group
Combinatorial Image Analysis research group
Code of TIP2021 Paper《SFace: Sigmoid-Constrained Hypersphere Loss for Robust Face Recognition》. We provide both MxNet and Pytorch versions.

SFace Code of TIP2021 Paper 《SFace: Sigmoid-Constrained Hypersphere Loss for Robust Face Recognition》. We provide both MxNet, PyTorch and Jittor versi

Zhong Yaoyao 47 Nov 25, 2022
Reference implementation of code generation projects from Facebook AI Research. General toolkit to apply machine learning to code, from dataset creation to model training and evaluation. Comes with pretrained models.

This repository is a toolkit to do machine learning for programming languages. It implements tokenization, dataset preprocessing, model training and m

Facebook Research 408 Jan 01, 2023
Empowering journalists and whistleblowers

Onymochat Empowering journalists and whistleblowers Onymochat is an end-to-end encrypted, decentralized, anonymous chat application. You can also host

Samrat Dutta 19 Sep 02, 2022
Code release for paper: The Boombox: Visual Reconstruction from Acoustic Vibrations

The Boombox: Visual Reconstruction from Acoustic Vibrations Boyuan Chen, Mia Chiquier, Hod Lipson, Carl Vondrick Columbia University Project Website |

Boyuan Chen 12 Nov 30, 2022
A library for implementing Decentralized Graph Neural Network algorithms.

decentralized-gnn A package for implementing and simulating decentralized Graph Neural Network algorithms for classification of peer-to-peer nodes. De

Multimedia Knowledge and Social Analytics Lab 5 Nov 07, 2022
Intrusion Detection System using ensemble learning (machine learning)

IDS-ML implementation of an intrusion detection system using ensemble machine learning methods Data set This project is carried out using the UNSW-15

4 Nov 25, 2022
PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop.

VoiceLoop PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop. VoiceLoop is a n

Meta Archive 873 Dec 15, 2022
Contrastive Language-Image Pretraining

CLIP [Blog] [Paper] [Model Card] [Colab] CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pair

OpenAI 11.5k Jan 08, 2023
Fully Convolutional DenseNet (A.K.A 100 layer tiramisu) for semantic segmentation of images implemented in TensorFlow.

FC-DenseNet-Tensorflow This is a re-implementation of the 100 layer tiramisu, technically a fully convolutional DenseNet, in TensorFlow (Tiramisu). Th

Hasnain Raza 121 Oct 12, 2022
PyTorch code of paper "LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering"

LiVLR-VideoQA We propose a Lightweight Visual-Linguistic Reasoning framework (LiVLR) for VideoQA. The overview of LiVLR: Evaluation on MSRVTT-QA Datas

JJ Jiang 7 Dec 30, 2022
The 3rd place solution for competition

The 3rd place solution for competition "Lyft Motion Prediction for Autonomous Vehicles" at Kaggle Team behind this solution: Artsiom Sanakoyeu [Homepa

Artsiom 104 Nov 22, 2022
Code for NeurIPS 2020 article "Contrastive learning of global and local features for medical image segmentation with limited annotations"

Contrastive learning of global and local features for medical image segmentation with limited annotations The code is for the article "Contrastive lea

Krishna Chaitanya 152 Dec 22, 2022
Cross-platform-profile-pic-changer - Script to change profile pictures across multiple platforms

cross-platform-profile-pic-changer script to change profile pictures across mult

4 Jan 17, 2022
An open-source project for applying deep learning to medical scenarios

Auto Vaidya An open source solution for creating end-end web app for employing the power of deep learning in various clinical scenarios like implant d

Smaranjit Ghose 18 May 29, 2022
PyTorch implementation of UNet++ (Nested U-Net).

PyTorch implementation of UNet++ (Nested U-Net) This repository contains code for a image segmentation model based on UNet++: A Nested U-Net Architect

4ui_iurz1 642 Jan 04, 2023
Repository of Jupyter notebook tutorials for teaching the Deep Learning Course at the University of Amsterdam (MSc AI), Fall 2020

Repository of Jupyter notebook tutorials for teaching the Deep Learning Course at the University of Amsterdam (MSc AI), Fall 2020

Phillip Lippe 1.1k Jan 07, 2023
EZ graph is an easy to use AI solution that allows you to make and train your neural networks without a single line of code.

EZ-Graph EZ Graph is a GUI that allows users to make and train neural networks without writing a single line of code. Requirements python 3 pandas num

1 Jul 03, 2022
Official implementation of Rethinking Graph Neural Architecture Search from Message-passing (CVPR2021)

Rethinking Graph Neural Architecture Search from Message-passing Intro The GNAS can automatically learn better architecture with the optimal depth of

Shaofei Cai 48 Sep 30, 2022
Pytorch implementation of CVPR2021 paper "MUST-GAN: Multi-level Statistics Transfer for Self-driven Person Image Generation"

MUST-GAN Code | paper The Pytorch implementation of our CVPR2021 paper "MUST-GAN: Multi-level Statistics Transfer for Self-driven Person Image Generat

TianxiangMa 46 Dec 26, 2022
Mini Software that give reminder to drink water as per your weight.

Water Notification Desktop Python The Mini Software built in Python (tkinter) that will remind you to drink water on specific time span based on your

Om Jogani 5 Dec 16, 2022