The SVO-Probes Dataset for Verb Understanding

This repository contains the SVO-Probes benchmark, designed to probe for Subject, Verb, and Object understanding in image-language models. For a given sentence, the benchmark provides a pair of images: one positive and one negative. The negative image differs from the positive one with respect to either the subject, the verb, or the object. Given a sentence, we test whether a model can correctly classify both the positive and the negative image.
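As an illustration of the test described above, the following is a minimal sketch of how per-pair scoring might be set up. It is not the evaluation code from the paper: the `pair_accuracy` helper, the match scores, and the 0.5 decision threshold are all assumptions made for this example.

```python
from typing import Dict, Iterable, Tuple


def pair_accuracy(
    scores: Iterable[Tuple[float, float]], threshold: float = 0.5
) -> Dict[str, float]:
    """Score (positive_score, negative_score) pairs for a set of sentences.

    Each tuple holds a hypothetical model's image-sentence match score for
    the positive image and for the negative image of the same sentence.
    The threshold is an assumption, not a value taken from the benchmark.
    """
    pairs = list(scores)
    if not pairs:
        raise ValueError("at least one score pair is required")
    total = len(pairs)
    # A positive image is correct when the model says it matches the sentence.
    pos_ok = sum(p >= threshold for p, _ in pairs)
    # A negative image is correct when the model says it does not match.
    neg_ok = sum(n < threshold for _, n in pairs)
    # A pair is fully correct only when both images are classified correctly.
    both_ok = sum(p >= threshold and n < threshold for p, n in pairs)
    return {
        "positive": pos_ok / total,
        "negative": neg_ok / total,
        "both": both_ok / total,
    }


# Made-up scores for four sentences, for illustration only.
example_scores = [(0.9, 0.2), (0.8, 0.7), (0.4, 0.1), (0.6, 0.3)]
result = pair_accuracy(example_scores)
print(result)
```

Reporting the positive and negative accuracies separately, alongside the stricter "both" figure, makes it visible when a model accepts almost every image regardless of the sentence.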

For a detailed description of our benchmark, please see the paper Probing Image–Language Transformers for Verb Understanding. Please cite this paper if you use the SVO-Probes benchmark in your work.

Files

svo_probes.csv: our raw data. Each row consists of two pairs, <sentence, positive-image> and <sentence, negative-image>. Each image is identified by a URL and a unique ID: pos_image_id (pos_url) for the positive image and neg_image_id (neg_url) for the negative image. Each image is also associated with the subject-verb-object triplets (pos_triplet or neg_triplet) that can be seen in it. The subj_neg, verb_neg, and obj_neg columns specify the type of the negative: for example, subj_neg is True if the negative example is a subject negative.
image_urls.txt: a list of the image URLs used in our benchmark.
A Colab to analyze pre-trained models on SVO-Probes.
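
The column layout described above can be explored with the standard library alone. The sketch below counts rows by negative type. It assumes the subj_neg, verb_neg, and obj_neg flags are serialized as the strings "True" and "False", and it runs on a tiny in-memory stand-in instead of the real svo_probes.csv; the sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter

NEG_COLUMNS = ("subj_neg", "verb_neg", "obj_neg")


def negative_type(row):
    """Return the name of the first negative-type flag set on a row, or None."""
    for col in NEG_COLUMNS:
        # Assumption: flags are stored as the text "True" / "False".
        if row[col].strip().lower() == "true":
            return col
    return None


def count_negative_types(csv_file):
    """Count rows per negative type from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(negative_type(row) for row in reader)


# Invented sample rows; the real file has more columns (URLs, triplets, ...).
sample = io.StringIO(
    "pos_image_id,neg_image_id,subj_neg,verb_neg,obj_neg\n"
    "1,2,True,False,False\n"
    "3,4,False,True,False\n"
    "5,6,False,True,False\n"
)
counts = count_negative_types(sample)
print(counts)
```

To run the same count on the real data, pass an open file handle in place of the sample, for example `open("svo_probes.csv", newline="", encoding="utf-8")`, after checking that the flag encoding matches the assumption above.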

Disclaimer

This is not an official Google product. The SVO-Probes benchmark was created solely for research purposes and is not intended to be used in products. The images in our benchmark are retrieved from Google Image Search; we expect them to reflect distributional properties and biases similar to those of the results returned by the Google Image Search API. Furthermore, our dataset is designed to have a vocabulary similar to that of the Conceptual Captions dataset, so we expect our <Subject, Verb, Object> triplets to reflect the biases in Conceptual Captions.

License

The data is made available under the terms of the Creative Commons Attribution 4.0 International Public License (CC BY 4.0). You can find details at: https://creativecommons.org/licenses/by/4.0/legalcode

If you have concerns or comments about the benchmark, please contact [email protected] and [email protected].

Owner

DeepMind