Unimodal Face Classification with Multimodal Training

This is a PyTorch implementation of the following paper:

Unimodal Face Classification with Multimodal Training

Wenbin Teng (Boston University), Chongyang Bai (Dartmouth College)

Abstract: We propose a Multimodal Training Unimodal Test (MTUT) framework for robust face classification, which exploits the cross-modality relationship during training and uses it to complement the imperfect single-modality input during testing. Technically, during training, the framework (1) builds both intra-modality and cross-modality autoencoders with the aid of facial attributes to learn latent embeddings as multimodal descriptors, and (2) proposes a novel multimodal embedding divergence loss to align the heterogeneous features from different modalities, which also adaptively prevents a useless modality (if any) from confusing the model. This way, the learned autoencoders can generate robust embeddings for single-modality face classification at test time. We evaluate our framework on two face classification datasets and two kinds of testing input: (1) poor-condition images and (2) point clouds or 3D face meshes, when both 2D and 3D modalities are available for training.

The proposed method uses a 2D encoder and a 3D encoder to extract an embedding from each modality. The divergence between the two embeddings is minimized adaptively, based on the measured classification loss. Depending on the testing modality, the appropriate decoder reconstructs the 2D and 3D inputs from the feature embeddings. An overview of the proposed network is shown in the following picture:

[Figure: overview of the proposed network]
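The description above does not give the formula for the adaptive divergence term, so the following is only a framework-free sketch of one plausible weighting scheme, not the authors' loss. It assumes that each modality has its own classification loss and that an embedding is pulled toward the other modality's embedding only when its own modality classifies worse. The names `adaptive_weight`, `squared_distance`, `divergence_terms`, and the sharpness parameter `beta` are hypothetical and do not come from the repository.

```python
import math


def adaptive_weight(loss_self, loss_other, beta=2.0):
    """Hypothetical weight for pulling one modality toward the other.

    Positive only when this modality classifies worse than the other
    (loss_self > loss_other); zero otherwise, so a weak or useless
    modality does not drag a stronger one toward it.
    """
    gap = loss_self - loss_other
    return math.exp(beta * gap) - 1.0 if gap > 0 else 0.0


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def divergence_terms(z_2d, z_3d, loss_2d, loss_3d, beta=2.0):
    """Return the weighted divergence terms for the 2D and 3D branches.

    Assumption: the divergence is a plain squared distance between the
    two embeddings; the actual method may use a different measure.
    """
    d = squared_distance(z_2d, z_3d)
    term_2d = adaptive_weight(loss_2d, loss_3d, beta) * d
    term_3d = adaptive_weight(loss_3d, loss_2d, beta) * d
    return term_2d, term_3d


# Example: the 2D branch classifies worse (loss 1.5 vs 0.5), so only
# the 2D embedding receives a non-zero alignment term.
term_2d, term_3d = divergence_terms([1.0, 0.0], [0.0, 0.0], 1.5, 0.5)
```

In this example only the 2D term is non-zero, which mirrors the idea that the weaker modality is aligned to the stronger one and not the reverse. In a PyTorch training loop the same arithmetic would be applied to tensors.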
