Building a real-time environment that divides webcam frames with OpenCV and classifies the cropped images with a vision transformer fine-tuned on samples from hybrid datasets, for facial emotion recognition.

Last update: Dec 12, 2022

Overview

Visual Transformer for Facial Emotion Recognition (FER)

This project aims to build an efficient Visual Transformer for the Facial Emotion Recognition (FER) task. The project is developed entirely in a Python notebook hosted on Google Colab, with a runtime environment backed by an NVIDIA P100 GPU.

Dataset

The dataset covers 8 different classes and integrates 3 different subsets:

FER-2013: It contains approximately 35,000 grayscale facial images of different expressions, with size restricted to 48×48. Its labels fall into 7 types: 0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral. The Disgust expression has the fewest images (about 600), while the other labels have nearly 5,000 samples each.
CK+: The Extended Cohn-Kanade (CK+) dataset contains images extracted from 593 video sequences of 123 different subjects, ranging from 18 to 50 years of age, with a variety of genders and heritage. Each video shows a facial shift from the neutral expression to a targeted peak expression, recorded at 30 frames per second (FPS) with a resolution of either 640x490 or 640x480 pixels. Unfortunately, we do not have the entire dataset: we stored only 1,000 high-variance images taken from a Kaggle repository.
AffectNet: It is a large facial expression dataset with 41,000 images classified into eight categories of facial expression (neutral, happy, angry, sad, fear, surprise, disgust, contempt), along with the intensity of valence and arousal.

Data loading, integration and analysis are in the first part of the ViT-Emotion-Recognition.ipynb notebook. The resulting integrated dataset is split into two subsets (train and val folders), each with 8 subfolders named after the class labels.
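As a rough illustration of how the three label sets combine into 8 classes, here is a minimal sketch. It is not taken from the notebook: the lowercase class names, the helper names and the "data" root folder are assumptions made for this example.

```python
# FER-2013 integer labels, as listed in the dataset description above.
FER2013_LABELS = {
    0: "angry", 1: "disgust", 2: "fear", 3: "happy",
    4: "sad", 5: "surprise", 6: "neutral",
}

# AffectNet categories, as listed in the dataset description above.
AFFECTNET_LABELS = [
    "neutral", "happy", "angry", "sad",
    "fear", "surprise", "disgust", "contempt",
]


def unified_classes(*label_sets):
    """Return the sorted union of class names across all subsets."""
    classes = set()
    for labels in label_sets:
        classes.update(name.lower() for name in labels)
    return sorted(classes)


CLASSES = unified_classes(FER2013_LABELS.values(), AFFECTNET_LABELS)


def folder_layout(root, splits=("train", "val")):
    """List one subfolder per class under each split (hypothetical paths)."""
    return [f"{root}/{split}/{name}" for split in splits for name in CLASSES]


if __name__ == "__main__":
    print(CLASSES)
    for path in folder_layout("data"):
        print(path)
```

The union has eight entries because "contempt" appears only in AffectNet, while the other seven labels are shared with FER-2013.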

Data Management

Because we fine-tune a transformer on a heterogeneous dataset, we had to harmonize some image features:

Data Scaling: The pre-trained models are transformers in different configurations, trained on the ImageNet dataset for image classification with 224x224 images. We use the same scale and resize the input data to this size.
Data Channels: We use RGB channels for every image, for the same reason as in the previous point.
Data Augmentation: We apply brightness, rotation, scaling, translation and zoom augmentation to increase the number of samples and to balance the classes.
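To make the balancing idea concrete, the sketch below estimates how many augmented images each class would need to reach the size of the largest class. It is only an illustration: the function name and the target choice are assumptions, and the counts are the rough FER-2013 figures quoted above, not the real distribution of the integrated dataset.

```python
import math


def augmentation_plan(class_counts, target=None):
    """Estimate the augmentation needed per class to reach `target` samples.

    `target` defaults to the size of the largest class (an assumption made
    for this sketch, not necessarily the notebook's policy).
    """
    if target is None:
        target = max(class_counts.values())
    plan = {}
    for name, count in class_counts.items():
        missing = max(target - count, 0)
        # Augmented variants to derive from each original image, rounded up.
        per_image = math.ceil(missing / count) if count else 0
        plan[name] = {"missing": missing, "per_image": per_image}
    return plan


# Illustrative counts only, based on the approximate figures given above.
counts = {"disgust": 600, "happy": 5000}
plan = augmentation_plan(counts)

if __name__ == "__main__":
    for name, info in plan.items():
        print(name, info)
```

With these illustrative numbers, the under-represented class needs several augmented variants per original image, while the largest class needs none.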

Model

Overview of the model: The input image is split into fixed-size patches by a convolutional layer with a 16x16 kernel and a 16x16 stride, which precedes the embedding phase. The output of the convolution is then embedded: the resulting vector is the sum of the position embedding and a linear embedding in a 768-dimensional projection space. The embedded patches are then processed by a stack of 11 sequential Transformer encoders. For the classification task, the final layer is a linear layer with an 8-dimensional output, one per emotion. The model we rely on is pre-trained on ImageNet and fine-tuned on the dataset described above.
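The patch arithmetic behind this description can be checked with a few lines of plain Python. This is a sketch under the sizes stated above (224x224 RGB input, 16x16 patches, 8 output classes, 768-dimensional embedding); the function names are made up for the example and it does not load or reproduce the actual model.

```python
def patch_grid(image_size=224, patch_size=16, channels=3):
    """Return (patches per side, total patches, raw values per patch)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size
    num_patches = per_side * per_side
    values_per_patch = patch_size * patch_size * channels
    return per_side, num_patches, values_per_patch


def linear_head_parameters(embed_dim=768, num_classes=8):
    """Weights plus biases of a final linear classification layer."""
    return embed_dim * num_classes + num_classes


per_side, num_patches, values_per_patch = patch_grid()
head_params = linear_head_parameters()

if __name__ == "__main__":
    print(per_side, num_patches, values_per_patch, head_params)
```

A stride equal to the kernel size means the patches do not overlap, so a 224x224 image yields a 14x14 grid of 196 patches. Each patch holds 16 x 16 x 3 = 768 raw values, which is projected into the 768-dimensional embedding space.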

Source: https://github.com/google-research/vision_transformer

Authors

Andrea Gurioli (@andreagurioli1995)
Mario Sessa (@kode-git)

License


Comments

Pre-processing phase removes some images
After the data analysis on AVFER, the data from the splitting phase differs after pre-processing. We need to:

Check whether removing the png files influences the number of images

Check whether anything changes after the reshaping

Watch out for a possible mis-indentation of the os.remove(fl) call

I need to re-run the data integration and data analysis of AVFER before testing feature variations in the pre-processing phase.
bug
opened by kode-git

Releases(0.3.12)

0.3.12(May 16, 2022)
Adding presentation and official documentation

Splitting notebook per sections

Adding additional comments to the code

0.3.11(May 14, 2022)
Adding ViT-B/16/S model on 25 epochs with constant learning rate

Checking on training and validation accuracy/loss parameters according to the training log

Display results on standalone plots

vfer_small_25.pth(327.37 MB)
vfer_small_25_history_loss.pkl(490 bytes)
vfer_small_25_history_train.pkl(233 bytes)
vfer_small_25_history_val.pkl(233 bytes)
0.3.10(May 13, 2022)
Adding evaluation for ResNet18

Debugging on SAM model evaluation

Improving the training plot to support curves on N < 5 lines

Adapting the model to its backbone when loading it for standalone evaluation

0.3.9(May 12, 2022)
Adding ResNet 18 (11M parameters)

Upload history for loss and accuracy

Upload epoch 20 dump

Upload final model checkpoint

resnet18_25.pth(42.72 MB)
resnet18_25_history_loss.pkl(490 bytes)
resnet18_25_history_train.pkl(7.05 KB)
resnet18_25_history_val.pkl(7.05 KB)
0.3.8(May 11, 2022)
Adding ViT-B/16/SG

Gradual learning rate every 10 epochs

SGD optimization

Adding loss and accuracy histories

vfer_grad_25.pth(327.37 MB)
vfer_grad_25_history_loss.pkl(490 bytes)
vfer_grad_25_history_train.pkl(233 bytes)
vfer_grad_25_history_val.pkl(233 bytes)
0.3.7(May 11, 2022)
Adding VIT-B/16 model checkpoint using customized learning rate scheduler

Adding SAM to the model as an optimization algorithm to smooth the loss landscape

Adding history for training and validation loss

Adding history for training and validation accuracy

vfer_sam_25.pth(327.37 MB)
vfer_sam_25_history_loss.pkl(490 bytes)
vfer_sam_25_history_train.pkl(233 bytes)
vfer_sam_25_history_val.pkl(233 bytes)
0.3.6(May 9, 2022)
Configuration of resnet18 with gradual learning rate

Starting learning rate at 0.01

Epochs 50 with plateau at 25

Loading training and validation accuracy histories

resnet18.pth(44.69 MB)
resnet18_25_history_loss.pkl(490 bytes)
resnet18_history_train.pkl(14.17 KB)
resnet18_history_val.pkl(14.17 KB)
0.3.5(May 9, 2022)
Adding SAM optimization for VIT-B/16

Defining closure for sharpness-aware minimization efficiency

Debugging model loader for the checkpoints recovery

0.2.5(May 7, 2022)
Upload optimal model on AffectNet

Defines evaluation plots on accuracy and loss values

vfer_grad_25.pth(327.37 MB)
vfer_grad_25_history_loss.pkl(130 bytes)
vfer_grad_25_history_train.pkl(1.48 KB)
vfer_grad_25_history_val.pkl(1.48 KB)
0.2.4(May 6, 2022)
Adding gradual learning rate

Modify dataset with AffectNet in validation and testing set

Adding scheduler for learning rate adjustment

vfer_grad_50.pth(327.37 MB)
vfer_grad_50_history_train.pkl(2.86 KB)
vfer_grad_50_history_val.pkl(2.86 KB)
0.2.3(Apr 29, 2022)
Extends data analysis for the AffectNet, CK+48 and FER-2013

Creation of AVFER with the following features

Splitting initial dataset in training and testing set with ratio 80/20

Splitting validation and training set with ratio 90/10

Testing and validation set contains only samples from AffectNet (RGB and high quality images)

Drive of AVFER: https://drive.google.com/drive/folders/1-8WG_CNrU3chL_OHpkM8EYx3Bm129cnE?usp=sharing
0.2.2(Apr 27, 2022)
Adjust train and test splitting

Balancing augmentation over 150,000 samples

Removing augmentation on validation to increment variability

Loading of vfer for 5, 15 and 25 epochs of training on the result dataset

Loading history for training and validation accuracy/loss

epoch_15_vfer_small_50(327.37 MB)
epoch_15_vfer_small_50.pth(327.37 MB)
epoch_25_vfer_small.pth(327.37 MB)
epoch_25_vfer_small_50(327.37 MB)
epoch_5_vfer_small_50(327.37 MB)
vfer_small_15_on_50_history_loss.pkl(220 bytes)
vfer_small_15_on_50_history_train.pkl(3.00 KB)
vfer_small_15_on_50_history_val.pkl(3.00 KB)
vfer_small_25_on_50_history_loss.pkl(220 bytes)
vfer_small_25_on_50_history_train.pkl(3.00 KB)
vfer_small_25_on_50_history_val.pkl(3.00 KB)
0.2.1(Apr 24, 2022)
Adding integration with partial training during the transformer weights improvements (best-fit)

Updating the VFER model on 5/50 training epochs with 62% accuracy (state of the art for a visual transformer on AffectNet)

Integrating with fluid system for face detection in the cropping phase

epoch_5_vfer_small_50(327.37 MB)
0.2.0(Apr 22, 2022)
Adjust normalization parameters from [0.48, 0.28] to 0.5

Balancing the dataset with no augmented elements in validation

Doubling the size of the training set so that fewer epochs are needed in the training phase

Adding featuring and inference on video capture tools in OpenCV for models applications

0.1.0(Apr 18, 2022)
Model dump for batch 50 on 12 epochs for the VFER transformer, accuracy of 69%

Model dump for batch 60 on 24 epochs for the VFER transformer, accuracy of 70%

Model dump for batch 60 on 50 epochs for the VFER transformer, accuracy of 71%

Debugging notebook for the loss evaluation

Adding every section until the evaluation

Integration of the dataset available here

vfer_base_12.zip(304.26 MB)
vfer_base_24.zip(304.25 MB)
vfer_base_50.zip(608.51 MB)

Owner

Mario Sessa

Computer Scientist for /dev/null. Master Student in Computer Science.

