Adaptive Multi-Teacher Multi-level Knowledge Distillation(AMTML-KD)

Paper has been accepted by Neurocomputing 415(2020): 106–113.

Authors: Yuang Liu, Wei Zhang and Jun Wang.

Links: [ pdf ] [ code ]

Requirements

PyTorch >= 1.0.0
Jupyter
visdom

Introduction

Knowledge distillation (KD) is an effective learning paradigm for improving the performance of light-weight student networks by utilizing additional supervision knowledge distilled from teacher networks. Most pioneering studies either learn from only a single teacher in their distillation learning methods, neglecting the potential that a student can learn from multiple teachers simultaneously, or simply treat each teacher to be equally important, unable to reveal the different importance of teachers for specific examples. To bridge this gap, we propose a novel adaptive multi-teacher multi-level knowledge distillation learning framework (AMTML-KD), which consists two novel insights: (i) associating each teacher with a latent representation to adaptively learn instance-level teacher importance weights which are leveraged for acquiring integrated soft-targets (high-level knowledge) and (ii) enabling the intermediate-level hints (intermediate-level knowledge) to be gathered from multiple teachers by the proposed multi-group hint strategy. As such, a student model can learn multi-level knowledge from multiple teachers through AMTML-KD. Extensive results on publicly available datasets demonstrate the proposed learning framework ensures student to achieve improved performance than strong competitors.

Citation

@article{LIU2020106,
    title = {Adaptive multi-teacher multi-level knowledge distillation},
    author = {Yuang Liu and Wei Zhang and Jun Wang},
    journal = {Neurocomputing},
    volume = {415},
    pages = {106 -- 113},
    year = {2020},
    issn = {0925 -- 2312},
}

AMTML-KD: Adaptive Multi-teacher Multi-level Knowledge Distillation

Related tags

Overview

Adaptive Multi-Teacher Multi-level Knowledge Distillation(AMTML-KD)

Requirements

Introduction

Citation

Owner

Frank Liu

Official implementation of "UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-wise Perspective with Transformer"

Curated list of awesome GAN applications and demo

My solutions for Stanford University course CS224W: Machine Learning with Graphs Fall 2021 colabs (GNN, GAT, GraphSAGE, GCN)

Byzantine-robust decentralized learning via self-centered clipping

Working demo of the Multi-class and Anomaly classification model using the CLIP feature space

Official repository of the AAAI'2022 paper "Contrast and Generation Make BART a Good Dialogue Emotion Recognizer"

A framework for attentive explainable deep learning on tabular data

Continual Learning of Electronic Health Records (EHR).

The undersampled DWI image using Slice-Interleaved Diffusion Encoding (SIDE) method can be reconstructed by the UNet network.

Auditing Black-Box Prediction Models for Data Minimization Compliance

Codes for the ICCV'21 paper "FREE: Feature Refinement for Generalized Zero-Shot Learning"

Colab notebook and additional materials for Python-driven analysis of redlining data in Philadelphia

Dense Prediction Transformers

Baseline powergrid model for NY

Implementation of the paper "Self-Promoted Prototype Refinement for Few-Shot Class-Incremental Learning"

Codebase for "Revisiting spatio-temporal layouts for compositional action recognition" (Oral at BMVC 2021).

Deep Q-network learning to play flappybird.

Implementation of the pix2pix model on satellite images

Pytorch implementation for our ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering".

Official repository for HOTR: End-to-End Human-Object Interaction Detection with Transformers (CVPR'21, Oral Presentation)