Code for the paper Interact, Embed, and EnlargE (IEEE): Boosting Modality-specific Representations for Multi-Modal Person Re-identification.

Last update: Dec 12, 2022

Overview

We provide the code for reproducing the results of our paper, Interact, Embed, and EnlargE (IEEE): Boosting Modality-specific Representations for Multi-Modal Person Re-identification.

Installation

Basic environment: Python 3.6, PyTorch 1.8.0, CUDA 11.1.
Our code structure is based on Torchreid (https://github.com/KaiyangZhou/deep-person-reid). Install the packages listed in the Torchreid requirements.

# create environment
cd AAAI2022_IEEE/
conda create --name ieeeReid python=3.6
conda activate ieeeReid

# install dependencies
# make sure `which python` and `which pip` point to the correct path
pip install -r requirements.txt

# install torch and torchvision (select the proper cuda version to suit your machine)
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge

# install torchreid in development mode (no need to rebuild it after modifying the source code)
python setup.py develop
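
After installation, it can help to confirm that the interpreter and packages match the versions stated above. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `check_environment` are hypothetical, and the check only compares version prefixes, since PyTorch builds often carry a suffix such as `+cu111`.

```python
import importlib
import sys

# Versions stated in the installation notes above.
EXPECTED = {"python": "3.6", "torch": "1.8.0", "torchvision": "0.9.0"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_environment(expected=EXPECTED):
    """Map each expected component to a (found_version, matches_expected) pair."""
    report = {}
    python_version = "{}.{}".format(*sys.version_info[:2])
    report["python"] = (python_version, python_version == expected["python"])
    for name in expected:
        if name == "python":
            continue
        found = installed_version(name)
        # Compare the prefix only, so "1.8.0+cu111" still matches "1.8.0".
        ok = found is not None and found.startswith(expected[name])
        report[name] = (found, ok)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_environment().items():
        status = "OK" if ok else "MISMATCH"
        print("{:<12} found={!s:<14} {}".format(name, found, status))
```

A `MISMATCH` line does not necessarily mean the code will fail; it only flags a difference from the environment the authors list.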

Get started

Use the settings in im_r50_softmax_256x128_amsgrad_RGBNT_ieee_part_margin.yaml to reproduce the results of the full IEEE model.

python ./scripts/mainMultiModal.py --config-file ./configs/im_r50_softmax_256x128_amsgrad_RGBNT_ieee_part_margin.yaml --seed 40

You can run other methods by using the following configuration files:

# MLFN
./configs/im_r50_softmax_256x128_amsgrad_RGBNT_mlfn.yaml

# HACNN
./configs/im_r50_softmax_256x128_amsgrad_RGBNT_hacnn.yaml

# OSNet
./configs/im_r50_softmax_256x128_amsgrad_RGBNT_osnet.yaml

# HAMNet
./configs/im_r50_softmax_256x128_amsgrad_RGBNT_hamnet.yaml

# PFNet
./configs/im_r50_softmax_256x128_amsgrad_RGBNT_pfnet.yaml

# full IEEE
./configs/im_r50_softmax_256x128_amsgrad_RGBNT_ieee_part_margin.yaml
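
To run several of these configurations in sequence, the command shown earlier can be assembled programmatically. This is a sketch under assumptions: it presumes every config accepts the same `--config-file` and `--seed` flags shown for the full IEEE run, and the names `METHODS`, `build_command`, and `run_all` are hypothetical helpers, not part of the repository. Only a subset of the listed configs is included.

```python
import subprocess
import sys

SCRIPT = "./scripts/mainMultiModal.py"
CONFIG_TEMPLATE = "./configs/im_r50_softmax_256x128_amsgrad_RGBNT_{}.yaml"

# Config file suffixes taken from the list above.
METHODS = {
    "MLFN": "mlfn",
    "HACNN": "hacnn",
    "OSNet": "osnet",
    "HAMNet": "hamnet",
    "full IEEE": "ieee_part_margin",
}


def build_command(method, seed=40):
    """Build the argument list for one training run."""
    config = CONFIG_TEMPLATE.format(METHODS[method])
    return [sys.executable, SCRIPT, "--config-file", config, "--seed", str(seed)]


def run_all(methods=METHODS, seed=40, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    commands = [build_command(method, seed) for method in methods]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop at the first failing run.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_all(dry_run=True)
```

With `dry_run=True` the script only prints the commands, which is a cheap way to verify the paths before starting long training runs.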

Details

The Cross-modal Interacting Module (CIM) and the Relation-based Embedding Module (REM) are implemented in ./torchreid/models/ieee3modalPart.py. The Multi-modal Margin Loss (3M loss) is implemented in ./torchreid/losses/multi_modal_margin_loss_new.py.

Ablation study settings

You can enable or disable these two modules and the loss by changing the corresponding code.

Cross-modal Interacting Module (CIM) and Relation-based Embedding Module (REM)

# change the code in ./torchreid/models/ieee3modalPart.py

class IEEE3modalPart(nn.Module):
    def __init__(···
    ):
        modal_number = 3
        fc_dims = [128]
        pooling_dims = 768
        super(IEEE3modalPart, self).__init__()
        self.loss = loss
        self.parts = 6
        
        self.backbone = nn.ModuleList(···
        )
		
        # using Cross-modal Interacting Module (CIM)
        self.interaction = True
        # using channel attention in CIM
        self.attention = True
        
        # using Relation-based Embedding Module (REM)
        self.using_REM = True
        
        ···
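
The three flags above define a small grid of ablation variants. The sketch below enumerates them; it is illustrative only. The names `AblationSetting`, `ablation_settings`, and `describe`, and the label "baseline", are hypothetical. It also assumes that `attention` has no effect when `interaction` is off, since the comment describes it as channel attention inside CIM.

```python
from collections import namedtuple
from itertools import product

# Field names mirror the attributes set in IEEE3modalPart.__init__.
AblationSetting = namedtuple(
    "AblationSetting", ["interaction", "attention", "using_REM"]
)


def ablation_settings():
    """List the distinct flag combinations for an ablation study."""
    settings = []
    for interaction, attention, using_rem in product([False, True], repeat=3):
        # Assumption: channel attention lives inside CIM, so skip
        # combinations that enable it while CIM itself is disabled.
        if attention and not interaction:
            continue
        settings.append(AblationSetting(interaction, attention, using_rem))
    return settings


def describe(setting):
    """Return a short human-readable label for one setting."""
    parts = []
    if setting.interaction:
        suffix = "with" if setting.attention else "no"
        parts.append("CIM ({} channel attention)".format(suffix))
    if setting.using_REM:
        parts.append("REM")
    return " + ".join(parts) if parts else "baseline"


if __name__ == "__main__":
    for setting in ablation_settings():
        print(setting, "->", describe(setting))
```

Each printed setting corresponds to one manual edit of the three `self.*` assignments before training.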

Multi-modal Margin Loss (3M loss)

# change the code in ./configs/your_config_file.yaml

# use the Multi-modal Margin Loss (3M loss); change the margin by modifying the "ieee_margin" parameter
···
loss:
  name: 'margin'
  softmax:
    label_smooth: True
  ieee_margin: 1
  weight_m: 1.0
  weight_x: 1.0
···

# use only the CE loss
···
loss:
  name: 'softmax'
  softmax:
    label_smooth: True
  weight_x: 1.0
···
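
The two configurations differ in which loss terms are active and how they are weighted. The sketch below shows one plausible reading, assuming the terms are combined linearly in the way Torchreid weights its softmax term with `weight_x`; that `weight_m` scales the 3M loss is an assumption, and the repository's engine may combine them differently. The function `total_loss` and the dictionaries are hypothetical stand-ins for the YAML `loss` section, and plain floats stand in for loss tensors.

```python
# Dictionaries mirroring the two YAML "loss" sections above.
MARGIN_CFG = {
    "name": "margin",
    "ieee_margin": 1,
    "weight_m": 1.0,
    "weight_x": 1.0,
}
SOFTMAX_CFG = {"name": "softmax", "weight_x": 1.0}


def total_loss(loss_cfg, ce_loss, margin_loss=0.0):
    """Combine the CE loss and the 3M loss according to the config.

    Assumed convention: weight_x scales the cross-entropy term and
    weight_m scales the margin term; the margin term is used only
    when the loss name is 'margin'.
    """
    name = loss_cfg["name"]
    if name not in ("margin", "softmax"):
        raise ValueError("unsupported loss name: {}".format(name))
    total = loss_cfg.get("weight_x", 1.0) * ce_loss
    if name == "margin":
        total += loss_cfg.get("weight_m", 1.0) * margin_loss
    return total


if __name__ == "__main__":
    print(total_loss(MARGIN_CFG, ce_loss=2.0, margin_loss=0.5))
    print(total_loss(SOFTMAX_CFG, ce_loss=2.0, margin_loss=0.5))
```

Under this reading, switching `name` to `'softmax'` removes the margin term entirely, which matches the "only CE loss" ablation.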
