PyTorch code for the paper "Complementarity is the King: Multi-modal and Multi-grained Hierarchical Semantic Enhancement Network for Cross-modal Retrieval".

Last update: Dec 23, 2022

Overview

Complementarity is the King: Multi-modal and Multi-grained Hierarchical Semantic Enhancement Network for Cross-modal Retrieval (M²HSE)

PyTorch code for M²HSE. The local-level subnetwork of M²HSE is built on top of VSESC.

Xinlei Pei, Zheng Liu, Shaojing Yuan, Shanshan Gao, Huijian Han and Caiming Zhang. "Complementarity is the King: Multi-modal and Multi-grained Hierarchical Semantic Enhancement Network for Cross-modal Retrieval".

Introduction

We provide demo code for the Corel 5K dataset, including the training procedures for the global-level subnetwork and the local-level subnetwork.

Requirements

We recommend the following dependencies.

Python 3.6
PyTorch (1.3.1)
NumPy (1.19.2)
Punkt Sentence Tokenizer:

import nltk
nltk.download('punkt')

Download data

The raw images and the corresponding texts can be downloaded from here. Note that we performed data cleaning on this dataset; the specific operations are described in the paper.

In addition: 1) to extract the fine-grained visual features, the raw images are divided uniformly into 3*3 blocks; 2) we adopt AlexNet, pre-trained on ImageNet, to extract the CNN features; 3) we upload the text data to ./data/coarse-grained-data/ and ./data/fine-grained-data. Therefore, for data preparation you have the following two options:

Download the above raw data and extract the corresponding features according to the strategy we introduced in the paper.
Contact us for relevant data. (Email: [email protected])
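
For readers who choose the first option, the uniform 3*3 division mentioned above can be sketched roughly as follows. This is only an illustration and not the code used for the paper: the function name split_into_blocks and the choice of NumPy arrays as the image format are assumptions, and the handling of image sizes that are not divisible by 3 is a guess, since this README does not specify it.

```python
import numpy as np


def split_into_blocks(image, rows=3, cols=3):
    """Split an H x W (x C) image array into rows * cols blocks, in row-major order.

    Hypothetical helper: the name and the edge handling are assumptions,
    not taken from the M2HSE code.
    """
    height, width = image.shape[:2]
    # Evenly spaced block boundaries; converting to int truncates, so any
    # leftover pixels end up in the later blocks.
    row_edges = np.linspace(0, height, rows + 1).astype(int)
    col_edges = np.linspace(0, width, cols + 1).astype(int)
    blocks = []
    for r in range(rows):
        for c in range(cols):
            block = image[row_edges[r]:row_edges[r + 1],
                          col_edges[c]:col_edges[c + 1]]
            blocks.append(block)
    return blocks


if __name__ == "__main__":
    # Dummy 10 x 12 RGB image, only used to show the resulting block shapes.
    dummy = np.zeros((10, 12, 3), dtype=np.uint8)
    for index, block in enumerate(split_into_blocks(dummy)):
        print(index, block.shape)
```

Each of the nine blocks would then be passed through the CNN feature extractor separately to obtain the fine-grained visual features.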

Training models

For training the global-level subnetwork:

Run train_global.py:

python train_global.py \
    --data_path ./data/coarse-grained-data \
    --data_name corel5k_precomp \
    --vocab_path ./vocab \
    --logger_name ./checkpoint/M2HSE/Global/Corel5K \
    --model_name ./checkpoint/M2HSE/Global/Corel5K \
    --num_epochs 100 \
    --lr_updata 50 \
    --batchsize 100 \
    --gamma_1 1 \
    --gamma_2 .5 \
    --alpha_1 .8 \
    --alpha_2 .8

For training the local-level subnetwork:

Run train_local.py:

python train_local.py \
    --data_path ./data/fine-grained-data \
    --data_name corel5k_precomp \
    --vocab_path ./vocab \
    --logger_name ./checkpoint/M2HSE/Local/Corel5K \
    --model_name ./checkpoint/M2HSE/Local/Corel5K \
    --num_epochs 100 \
    --lr_updata 50 \
    --batchsize 100 \
    --gamma_1 1 \
    --gamma_2 .5 \
    --beta_1 .4 \
    --beta_2 .4

Reference

Stay tuned. :)

License

Apache License 2.0
