FEMDA: Robust classification with Flexible Discriminant Analysis in heterogeneous data

Last update: Sep 06, 2022

Overview

Flexible EM-Inspired Discriminant Analysis (FEMDA) is a robust supervised classification algorithm that performs well on noisy and contaminated datasets.

Authors

Andrew Wang, University of Cambridge, Cambridge, UK
Pierre Houdouin, CentraleSupélec, Paris, France

Installation

pip install -i https://test.pypi.org/simple/ femda

Get started

>>> from sklearn.datasets import load_iris
>>> from femda import FEMDA
>>> X, y = load_iris(return_X_y=True)
>>> clf = FEMDA()
>>> clf.fit(X, y)
FEMDA()
>>> clf.score(X, y)
0.9666666666666667

Using a specific dataset...

>>> import femda.experiments.preprocessing as pre
>>> X_train, y_train, X_test, y_test = pre.statlog(r"root\datasets\\")
>>> FEMDA().fit(X_train, y_train).score(X_test, y_test)
...

Using a sklearn.pipeline.Pipeline...

>>> from sklearn.datasets import load_digits
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.decomposition import PCA
>>> X, y = load_digits(return_X_y=True)
>>> pipe = make_pipeline(PCA(n_components=5), FEMDA()).fit(X, y)
>>> pipe.predict(X)
...

Run all experiments presented in the paper

>>> from femda.experiments import run_experiments
>>> run_experiments()
...

See for more.

Abstract

Linear and Quadratic Discriminant Analysis are well-known classical methods, but they suffer heavily from non-Gaussian class distributions and are very non-robust on contaminated datasets. In this paper, we present a new discriminant-analysis-style classification algorithm that directly models noise and diverse distribution shapes, and can therefore handle a wide range of datasets.
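To see why the Gaussian plug-in estimates behind LDA and QDA are fragile, the following sketch (hypothetical data, NumPy only) adds a handful of gross outliers to a clean 2-D Gaussian sample and compares the sample mean and covariance before and after. It only illustrates the contamination problem the abstract describes; it is not taken from the paper's experiments.

```python
import numpy as np

# Hypothetical data: a clean 2-D standard Gaussian sample (true mean 0, true
# covariance I) plus 10 gross outliers centred near (50, 50).
rng = np.random.default_rng(0)
clean = rng.standard_normal((200, 2))
outliers = rng.standard_normal((10, 2)) + 50.0
dirty = np.vstack([clean, outliers])

# Gaussian maximum-likelihood estimates, i.e. what LDA/QDA plug into their
# decision rules.
mean_clean, mean_dirty = clean.mean(axis=0), dirty.mean(axis=0)
cov_clean = np.cov(clean, rowvar=False)
cov_dirty = np.cov(dirty, rowvar=False)

# Under 5% contamination moves the mean by several standard deviations and
# inflates the covariance by about two orders of magnitude.
print("shift of the mean:", np.linalg.norm(mean_dirty - mean_clean))
print("trace of covariance, clean vs dirty:", np.trace(cov_clean), np.trace(cov_dirty))
```

Because LDA and QDA classify with these estimates directly, a few corrupted points are enough to distort every class boundary they produce.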

Each data point is modelled by its own arbitrary Elliptically Symmetrical (ES) distribution and its own arbitrary scale parameter, which directly models highly heterogeneous, non-i.i.d. datasets. We show that maximum-likelihood parameter estimation and classification are simple and fast under this model.
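The modelling claim above can be made concrete with a small NumPy sketch. It assumes each point of class k follows x_i ~ ES(mu_k, tau_i * Sigma_k) with its own unknown scale tau_i. Profiling tau_i out of the likelihood gives Tyler-type fixed-point estimates (each point weighted by 1/t_i, where t_i is its squared Mahalanobis distance) and a decision rule that maximises -0.5*log|Sigma_k| - 0.5*m*log t_k(x). This is our reading of the model, not the femda package's code. The names `fit_class`, `fit`, `predict` and `sample_t`, the median initialisation, the trace normalisation, the iteration cap, the equal class priors and the demo data are all assumptions.

```python
import numpy as np


def _quad_form(X, mu, sigma):
    """t_i = (x_i - mu)^T Sigma^{-1} (x_i - mu) for every row of X."""
    diff = X - mu
    inv = np.linalg.inv(sigma)
    # Floor at a tiny value so the 1/t weights below stay finite.
    return np.maximum(np.einsum("ij,jk,ik->i", diff, inv, diff), 1e-12)


def fit_class(X, n_iter=50, tol=1e-6):
    """Hypothetical fixed-point estimates of (mu, Sigma) for one class,
    assuming x_i ~ ES(mu, tau_i * Sigma) with a free scale tau_i per point."""
    n, m = X.shape
    mu = np.median(X, axis=0)  # robust starting point (assumption)
    sigma = np.eye(m)
    for _ in range(n_iter):  # capped: 1/t weights can grow large near mu
        w = 1.0 / _quad_form(X, mu, sigma)
        # Location: weighted mean with weights 1/t_i.
        mu_new = (w[:, None] * X).sum(axis=0) / w.sum()
        # Scatter: (m/n) * sum_i w_i (x_i - mu)(x_i - mu)^T.
        diff = X - mu_new
        sigma_new = (m / n) * (w[:, None] * diff).T @ diff
        # tau_i and Sigma are only identified up to a common factor; fix tr(Sigma) = m.
        sigma_new *= m / np.trace(sigma_new)
        done = (np.abs(mu_new - mu).max() < tol
                and np.abs(sigma_new - sigma).max() < tol)
        mu, sigma = mu_new, sigma_new
        if done:
            break
    return mu, sigma


def fit(X, y):
    """One (mu, Sigma) pair per class label, in sorted label order."""
    classes = np.unique(y)
    return classes, [fit_class(X[y == c]) for c in classes]


def predict(X, classes, params):
    """Pick the class maximising -0.5*log|Sigma_k| - 0.5*m*log t_k(x), the
    log-likelihood (up to constants) with each point's scale profiled out;
    equal class priors are assumed."""
    m = X.shape[1]
    scores = []
    for mu, sigma in params:
        _, logdet = np.linalg.slogdet(sigma)
        scores.append(-0.5 * logdet - 0.5 * m * np.log(_quad_form(X, mu, sigma)))
    return classes[np.argmax(np.column_stack(scores), axis=1)]


# Demo on hypothetical heavy-tailed data: Gaussian draws with random
# per-point scales tau = dof / chi2(dof), i.e. multivariate-t-like samples.
rng = np.random.default_rng(0)


def sample_t(mu, n, dof=3.0):
    tau = dof / rng.chisquare(dof, size=n)
    return mu + np.sqrt(tau)[:, None] * rng.standard_normal((n, len(mu)))


X = np.vstack([sample_t(np.zeros(4), 200), sample_t(np.full(4, 4.0), 200)])
y = np.repeat([0, 1], 200)
classes, params = fit(X, y)
accuracy = np.mean(predict(X, classes, params) == y)
print("training accuracy:", accuracy)
```

Note that the scale normalisation does not change the predictions: multiplying Sigma_k by c adds m*log c to log|Sigma_k| and subtracts m*log c from m*log t_k(x).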

We highlight the model's flexibility across a wide range of Elliptically Symmetrical distribution shapes and levels of contamination in synthetic datasets. We then show that our algorithm outperforms other robust methods on contaminated datasets from Computer Vision and NLP.
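The synthetic settings are not described further here. As a hedged illustration of what "different ES shapes and levels of contamination" can look like, the sketch below uses the compound-Gaussian representation x_i = mu + sqrt(tau_i) * L z_i with Sigma = L L^T, which covers a subclass of ES distributions: a constant tau gives a Gaussian, tau = nu / chi2(nu) a multivariate Student-t, and gamma-distributed tau a K-distribution. The helper names (`scale_draws`, `sample_es`), the parameter values and the contamination scheme (replacing a fraction of the points with broad zero-mean Gaussian noise) are assumptions, not the paper's settings.

```python
import numpy as np


def scale_draws(shape, n, rng, param=3.0):
    """Per-point scales tau_i for a few compound-Gaussian (ES) families."""
    if shape == "gaussian":
        return np.ones(n)
    if shape == "student_t":  # tau = nu / chi2(nu)
        return param / rng.chisquare(param, size=n)
    if shape == "k_distribution":  # tau ~ Gamma(nu, scale=1/nu), mean 1
        return rng.gamma(param, 1.0 / param, size=n)
    raise ValueError(f"unknown shape: {shape}")


def sample_es(mu, sigma, n, shape, rng, contamination=0.0, outlier_scale=100.0):
    """Draw x_i = mu + sqrt(tau_i) * L z_i, then replace a fraction
    `contamination` of the rows with N(0, outlier_scale^2 I) noise
    (a hypothetical contamination scheme). Returns the data and the
    indices of the replaced rows."""
    mu = np.asarray(mu, dtype=float)
    L = np.linalg.cholesky(sigma)
    z = rng.standard_normal((n, len(mu)))
    X = mu + np.sqrt(scale_draws(shape, n, rng))[:, None] * (z @ L.T)
    n_bad = int(round(contamination * n))
    bad = rng.choice(n, size=n_bad, replace=False)
    X[bad] = outlier_scale * rng.standard_normal((n_bad, len(mu)))
    return X, bad


rng = np.random.default_rng(0)
sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
for shape in ("gaussian", "student_t", "k_distribution"):
    X, bad = sample_es([1.0, -1.0], sigma, 1000, shape, rng, contamination=0.1)
    print(shape, X.shape, len(bad))
```

Sweeping `shape` and `contamination` in this way produces a grid of datasets on which a classifier's robustness can be compared.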
