WACV 2022 Paper - Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching

Last update: Dec 17, 2022

Overview

Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching

Code for our WACV 2022 accepted paper: https://arxiv.org/pdf/2110.02623.pdf

The project is built on top of [CVSE](https://github.com/BruceW91/CVSE) in PyTorch. However, it is easy to adapt to other image-text matching models (SCAN, VSRN, SGRAF). For the proposed metric code and evaluation, please visit: https://github.com/furkanbiten/ncs_metric.

Introduction

The task of image-text matching aims to map representations from different modalities into a common joint visual-textual embedding. However, the most widely used datasets for this task, MSCOCO and Flickr30K, are actually image captioning datasets that offer a very limited set of relationships between images and sentences in their ground-truth annotations. This limited ground-truth information forces us to use evaluation metrics based on binary relevance: given a sentence query, we consider only one image as relevant. However, many other relevant images or captions may be present in the dataset. In this work, we propose two metrics that evaluate the degree of semantic relevance of retrieved items, independently of their annotated binary relevance. Additionally, we incorporate a novel strategy that uses an image captioning metric, CIDEr, to define a Semantic Adaptive Margin (SAM) to be optimized in a standard triplet loss. By incorporating our formulation into existing models, a large improvement is obtained in scenarios where available training data is limited. We also demonstrate that, when employing the full training set, the performance on the annotated image-caption pairs is maintained while improving on other non-annotated relevant items. The code for our new metric can be found at https://github.com/furkanbiten/ncs_metric and the model code at https://github.com/andrespmd/semantic_adaptive_margin.
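To make the idea of a semantic adaptive margin more concrete, here is a minimal sketch in plain Python of a triplet loss whose margin depends on caption-level similarity scores. It is not the implementation from this repository: the function names, the `tau` scaling parameter, the clamping of the margin at zero, and the specific margin formula are all assumptions chosen for illustration, and the CIDEr scores are assumed to be precomputed floats.

```python
from typing import Sequence


def adaptive_margin(cider_pos: float, cider_neg: float, tau: float = 3.0) -> float:
    """Hypothetical margin (an assumption, not the paper's exact formula).

    The margin grows with the gap between the CIDEr score of the positive
    caption and that of the negative caption, scaled by `tau`. A negative
    that is semantically close to the positive gets a margin near zero.
    """
    return max(0.0, (cider_pos - cider_neg) / tau)


def sam_triplet_loss(
    sim_pos: Sequence[float],
    sim_neg: Sequence[float],
    cider_pos: Sequence[float],
    cider_neg: Sequence[float],
    tau: float = 3.0,
) -> float:
    """Mean hinge-based triplet loss with a per-triplet adaptive margin.

    sim_pos / sim_neg are embedding similarities between the anchor and the
    positive / negative item; cider_pos / cider_neg are precomputed
    caption-level scores for the same items.
    """
    if not (len(sim_pos) == len(sim_neg) == len(cider_pos) == len(cider_neg)):
        raise ValueError("all inputs must have the same length")
    if not sim_pos:
        raise ValueError("at least one triplet is required")

    total = 0.0
    for s_p, s_n, c_p, c_n in zip(sim_pos, sim_neg, cider_pos, cider_neg):
        margin = adaptive_margin(c_p, c_n, tau)
        # Standard hinge: penalize only when the negative is within the margin.
        total += max(0.0, margin + s_n - s_p)
    return total / len(sim_pos)
```

For example, with an anchor-positive similarity of 0.8, an anchor-negative similarity of 0.7, CIDEr scores of 1.0 and 0.4, and `tau = 3.0`, the margin is 0.2 and the loss is about 0.1. If the negative caption scores as high as the positive one, the margin collapses to zero and the same triplet contributes no loss, which is the behavior a fixed margin cannot express.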

Install Environment

Clone the project with Git.

Create Conda environment:

$ conda env create -f env.yml

Activate the environment:

$ conda activate pytorch12

Download Metric Data

Please download the following compressed file from:

Uncompress the downloaded file under the main project folder. The uncompressed folder name should be "cider".
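As a quick sanity check before training, the expected folder layout can be verified with a few lines of Python. This is only a sketch: the helper name `has_metric_data` is hypothetical and not part of the project, and it checks nothing beyond the presence of a folder named "cider" under the given project root.

```python
from pathlib import Path


def has_metric_data(project_root: str, folder_name: str = "cider") -> bool:
    """Return True if `folder_name` exists as a directory under `project_root`.

    Hypothetical helper for illustration; it does not inspect the contents
    of the folder.
    """
    return (Path(project_root) / folder_name).is_dir()
```

Calling `has_metric_data(".")` from the main project folder should return `True` once the archive has been uncompressed in the right place.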

Owner

Andres