2021-IEEE TCYB-DCHN

Peng Hu, Xi Peng, Hongyuan Zhu, Jie Lin, Liangli Zhen, Dezhong Peng, Joint Versus Independent Multiview Hashing for Cross-View Retrieval[J]. IEEE Transactions on Cybernetics, vol. 51, no. 10, pp. 4982-4993, Oct. 2021. (PyTorch Code)

Abstract

Thanks to the low storage cost and high query speed, cross-view hashing (CVH) has been successfully used for similarity search in multimedia retrieval. However, most existing CVH methods use all views to learn a common Hamming space, thus making it difficult to handle the data with increasing views or a large number of views. To overcome these difficulties, we propose a decoupled CVH network (DCHN) approach which consists of a semantic hashing autoencoder module (SHAM) and multiple multiview hashing networks (MHNs). To be specific, SHAM adopts a hashing encoder and decoder to learn a discriminative Hamming space using either a few labels or the number of classes, that is, the so-called flexible inputs. After that, MHN independently projects all samples into the discriminative Hamming space that is treated as an alternative ground truth. In brief, the Hamming space is learned from the semantic space induced from the flexible inputs, which is further used to guide view-specific hashing in an independent fashion. Thanks to such an independent/decoupled paradigm, our method could enjoy high computational efficiency and the capacity of handling the increasing number of views by only using a few labels or the number of classes. For a newly coming view, we only need to add a view-specific network into our model and avoid retraining the entire model using the new and previous views. Extensive experiments are carried out on five widely used multiview databases compared with 15 state-of-the-art approaches. The results show that the proposed independent hashing paradigm is superior to the common joint ones while enjoying high efficiency and the capacity of handling newly coming views.

Framework

Figure 1. Framework of the proposed DCHN method. g is the output of the corresponding view (e.g., image, text, or video). o is the semantic hash code computed from the corresponding label y and the semantic hashing transformation W. W is learned by the proposed semantic hashing autoencoder module (SHAM). sgn is an elementwise sign function. ℒ_R and ℒ_H are the hash reconstruction and semantic hashing functions, respectively. In the training stage, W is first used to recast the label y as a ground-truth hash code o. The obtained hash code then guides the view-specific networks through a semantic hashing reconstruction regularizer. Under this scheme, the v view-specific neural networks (one per view) can be trained separately, since they are decoupled and share no trainable parameters. DCHN therefore scales easily to a large number of views. In the inference stage, each trained view-specific network f_k(x^k, Θ_k) computes the hash code of the sample x^k.
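
The inference step described in the caption can be sketched as follows. This is a minimal pure-Python illustration, not the repository's code: the network outputs are made-up numbers, and the helper names (`sgn`, `hamming_distance`, `rank_by_hamming`) as well as the choice to map zeros to +1 are assumptions introduced for the example.

```python
from typing import List, Sequence


def sgn(values: Sequence[float]) -> List[int]:
    """Elementwise sign; zeros are mapped to +1 (an assumed tie-break)."""
    return [1 if v >= 0 else -1 for v in values]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two +/-1 codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def rank_by_hamming(query_code: Sequence[int],
                    database_codes: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices sorted by increasing Hamming distance to the query."""
    return sorted(range(len(database_codes)),
                  key=lambda i: hamming_distance(query_code, database_codes[i]))


# Made-up real-valued outputs g of two view-specific networks (4 bits each).
image_output = [0.9, -0.2, 0.4, -0.7]   # query from the image view
text_outputs = [
    [-0.8, 0.3, -0.5, 0.6],             # binarized code differs in all 4 bits
    [0.7, -0.1, 0.2, -0.9],             # binarized code is identical
    [0.5, 0.4, 0.1, -0.3],              # binarized code differs in 1 bit
]

query_code = sgn(image_output)
database_codes = [sgn(g) for g in text_outputs]
ranking = rank_by_hamming(query_code, database_codes)
```

Because both views are binarized into the same Hamming space, an image query can be compared directly against text codes, and `ranking` lists the text samples from nearest to farthest.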

Figure 2. The proposed SHAM uses semantic information (e.g., labels or classes) to learn an encoder W and a decoder W^T by mutually converting between the semantic and Hamming spaces. SHAM is a key component of our independent hashing paradigm.
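
The encoder/decoder pairing in this caption can be illustrated with a toy round trip. The sketch below assumes W has shape (bits × classes) and is applied as o = sgn(W y); that orientation, the hand-picked matrix, and the helper names are assumptions for illustration only. In SHAM, W is learned rather than fixed.

```python
from typing import List

Matrix = List[List[int]]


def matvec(m: Matrix, v: List[int]) -> List[int]:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def sgn(values: List[int]) -> List[int]:
    """Elementwise sign; zeros are mapped to +1 (an assumed tie-break)."""
    return [1 if v >= 0 else -1 for v in values]


def encode(W: Matrix, y: List[int]) -> List[int]:
    """Semantic space -> Hamming space: o = sgn(W y)."""
    return sgn(matvec(W, y))


def decode(W: Matrix, o: List[int]) -> List[int]:
    """Hamming space -> semantic space using the tied decoder W^T."""
    return matvec(transpose(W), o)


# Hand-picked W (4 bits x 4 classes) whose columns are mutually orthogonal.
W = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]

y = [0, 0, 1, 0]                      # one-hot label for class 2
code = encode(W, y)                   # hash code assigned to class 2
scores = decode(W, code)              # class scores recovered from the code
recovered = scores.index(max(scores))
```

With orthogonal columns, decoding a class code gives a high score only for that class, which is the sense in which the Hamming space stays discriminative.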

Usage

First, to train SHAM with 64 bits on MIRFLICKR-25K, run trainSHAM.py as follows:

python trainSHAM.py --datasets mirflickr25k --output_shape 64 --gama 1 --available_num 100

Then, to train a model for the image modality with 64 bits on MIRFLICKR-25K, run main_DCHN.py as follows:

python main_DCHN.py --mode train --epochs 100 --view 0 --datasets mirflickr25k --output_shape 64 --alpha 0.02 --gama 1 --available_num 100 --gpu_id 0

For the text modality:

python main_DCHN.py --mode train --epochs 100 --view 1 --datasets mirflickr25k --output_shape 64 --alpha 0.02 --gama 1 --available_num 100 --gpu_id 1

To evaluate the trained models, run main_DCHN.py as follows:

python main_DCHN.py --mode eval --view -1 --datasets mirflickr25k --output_shape 64 --alpha 0.02 --gama 1 --available_num 100 --num_workers 0

Comparison with the State-of-the-Art

Table 1: Performance comparison in terms of MAP scores on the MIRFLICKR-25K and IAPR TC-12 datasets. The highest MAP score is shown in bold.

Method	MIRFLICKR-25K								IAPR TC-12
	Image → Text				Text → Image				Image → Text				Text → Image
	16	32	64	128	16	32	64	128	16	32	64	128	16	32	64	128
Baseline	0.581	0.520	0.553	0.573	0.578	0.544	0.556	0.579	0.329	0.292	0.309	0.298	0.332	0.295	0.311	0.304
SePH [21]	0.729	0.738	0.744	0.750	0.753	0.762	0.764	0.769	0.467	0.476	0.486	0.493	0.463	0.475	0.485	0.492
SePH_lr [12]	0.729	0.746	0.754	0.763	0.760	0.780	0.785	0.793	0.410	0.434	0.448	0.463	0.461	0.495	0.515	0.525
RoPH [34]	0.733	0.744	0.749	0.756	0.757	0.759	0.768	0.771	0.457	0.481	0.493	0.500	0.451	0.478	0.488	0.495
LSRH [22]	0.756	0.780	0.788	0.800	0.772	0.786	0.791	0.802	0.474	0.490	0.512	0.522	0.474	0.492	0.511	0.526
KDLFH [23]	0.734	0.755	0.770	0.771	0.764	0.780	0.794	0.797	0.306	0.314	0.351	0.357	0.307	0.315	0.350	0.356
DLFH [23]	0.721	0.743	0.760	0.767	0.761	0.788	0.805	0.810	0.306	0.314	0.326	0.340	0.305	0.315	0.333	0.353
MTFH [13]	0.581	0.571	0.645	0.543	0.584	0.556	0.633	0.531	0.303	0.303	0.307	0.300	0.303	0.303	0.308	0.302
DJSRH [14]	0.620	0.630	0.645	0.660	0.620	0.626	0.645	0.649	0.368	0.396	0.419	0.439	0.370	0.400	0.423	0.437
DCMH [9]	0.737	0.754	0.763	0.771	0.753	0.760	0.763	0.770	0.423	0.439	0.456	0.463	0.449	0.464	0.476	0.481
SSAH [20]	0.797	0.809	0.810	0.802	0.782	0.797	0.799	0.790	0.501	0.503	0.496	0.479	0.504	0.530	0.554	0.565
DCHN₀	0.806	0.823	0.836	0.842	0.797	0.808	0.823	0.827	0.487	0.492	0.550	0.573	0.481	0.488	0.543	0.567
DCHN₁₀₀	0.813	0.816	0.823	0.840	0.808	0.803	0.814	0.830	0.533	0.558	0.582	0.596	0.527	0.557	0.582	0.595

Table 2: Performance comparison in terms of MAP scores on the NUS-WIDE and MS-COCO datasets. The highest MAP score is shown in bold.

Method	NUS-WIDE								MS-COCO
	Image → Text				Text → Image				Image → Text				Text → Image
	16	32	64	128	16	32	64	128	16	32	64	128	16	32	64	128
Baseline	0.281	0.337	0.263	0.341	0.299	0.339	0.276	0.346	0.362	0.336	0.332	0.373	0.348	0.341	0.347	0.359
SePH [21]	0.644	0.652	0.661	0.664	0.654	0.662	0.670	0.673	0.586	0.598	0.620	0.628	0.587	0.594	0.618	0.625
SePH_lr [12]	0.607	0.624	0.644	0.651	0.630	0.649	0.665	0.672	0.527	0.571	0.592	0.600	0.555	0.596	0.618	0.621
RoPH [34]	0.638	0.656	0.662	0.669	0.645	0.665	0.671	0.677	0.592	0.634	0.649	0.657	0.587	0.628	0.643	0.652
LSRH [22]	0.622	0.650	0.659	0.690	0.600	0.662	0.685	0.692	0.580	0.563	0.561	0.567	0.580	0.611	0.615	0.632
KDLFH [23]	0.323	0.367	0.364	0.403	0.325	0.365	0.368	0.408	0.373	0.403	0.451	0.542	0.370	0.400	0.449	0.542
DLFH [23]	0.316	0.367	0.381	0.404	0.319	0.379	0.386	0.415	0.352	0.398	0.455	0.443	0.359	0.393	0.456	0.442
MTFH [13]	0.265	0.473	0.434	0.445	0.243	0.418	0.414	0.485	0.288	0.264	0.311	0.413	0.301	0.284	0.310	0.406
DJSRH [14]	0.433	0.453	0.467	0.442	0.457	0.468	0.468	0.501	0.478	0.520	0.544	0.566	0.462	0.525	0.550	0.567
DCMH [9]	0.569	0.595	0.612	0.621	0.548	0.573	0.585	0.592	0.548	0.575	0.607	0.625	0.568	0.595	0.643	0.664
SSAH [20]	0.636	0.636	0.637	0.510	0.653	0.676	0.683	0.682	0.550	0.577	0.576	0.581	0.552	0.578	0.578	0.669
DCHN₀	0.648	0.660	0.669	0.683	0.662	0.677	0.685	0.697	0.602	0.658	0.682	0.706	0.591	0.652	0.669	0.696
DCHN₁₀₀	0.654	0.671	0.681	0.691	0.668	0.683	0.697	0.707	0.662	0.701	0.703	0.720	0.650	0.689	0.693	0.714
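
For readers unfamiliar with the metric in the tables, the sketch below shows one common way to compute MAP from ranked retrieval results. It assumes that average precision is taken over the full ranked list and that relevance is already given as 0/1 flags; the evaluation protocol in the repository may differ (for example, a top-k cutoff), and the function names are illustrative.

```python
from typing import List, Sequence


def average_precision(relevance: Sequence[int]) -> float:
    """AP of one ranked list; relevance[i] is 1 if the i-th result is relevant."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank   # precision at this relevant position
    return precision_sum / hits if hits else 0.0


def mean_average_precision(relevance_lists: Sequence[Sequence[int]]) -> float:
    """Mean of the per-query average precisions."""
    return sum(average_precision(r) for r in relevance_lists) / len(relevance_lists)


# Two made-up queries, each with four ranked results.
ranked_relevance: List[List[int]] = [
    [1, 0, 1, 0],   # AP = (1/1 + 2/3) / 2
    [0, 1, 0, 0],   # AP = (1/2) / 1
]
map_score = mean_average_precision(ranked_relevance)
```

In a cross-view setting, each query comes from one view (e.g., an image) and the ranked list is built from the other view (e.g., text), ordered by Hamming distance.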

Citation

If you find DCHN useful in your research, please consider citing:

@article{hu2021joint,
  author={Hu, Peng and Peng, Xi and Zhu, Hongyuan and Lin, Jie and Zhen, Liangli and Peng, Dezhong},
  journal={IEEE Transactions on Cybernetics}, 
  title={Joint Versus Independent Multiview Hashing for Cross-View Retrieval}, 
  year={2021},
  volume={51},
  number={10},
  pages={4982-4993},
  doi={10.1109/TCYB.2020.3027614}
}
