COPA-SSE contains crowdsourced explanations for the Balanced COPA dataset

Last update: Jul 31, 2022


Overview

COPA-SSE

Repository for COPA-SSE: Semi-Structured Explanations for Commonsense Reasoning.

COPA-SSE contains crowdsourced explanations for the Balanced COPA dataset, a variant of the Choice of Plausible Alternatives (COPA) benchmark. The explanations are formatted as sets of triple-like commonsense statements that use ConceptNet relations but freely written concepts.

Data format

dev-explained.jsonl and test-explained.jsonl contain Balanced COPA samples with added explanations, stored in .jsonl format (one JSON object per line). The question ids match those of the original development and test set questions, respectively.

Each entry contains:

- the original question (same format and ids as in Balanced COPA)
- human-explanations: a list of explanations, each containing:
  - expl-id: the explanation id
  - text: the explanation in plain text (full sentences)
  - worker-id: anonymized worker id (the author of the explanation)
  - worker-avg: the average score the author received across their explanations
  - all-ratings: all collected ratings for the explanation
  - filtered-ratings: the ratings, excluding those that failed the control check
  - filtered-avg-rating: the average of the filtered ratings
  - triples: the triple-form explanation (a list of ConceptNet-like triples)

Example entry:

{"id": 1,
 "asks-for": "cause",
 "most-plausible-alternative": 1,
 "p": "My body cast a shadow over the grass.",
 "a1": "The sun was rising.",
 "a2": "The grass was cut.",
 "human-explanations": [
    {"expl-id": "f4d9b407-681b-4340-9be1-ac044f1c2230",
     "text": "Sunrise causes casted shadows.",
     "worker-id": "3a71407b-9431-49f9-b3ca-1641f7c05f3b",
     "worker-avg": 3.5832864694635025,
     "all-ratings": [1, 3, 3, 4, 3],
     "filtered-ratings": [3, 3, 4, 3],
     "filtered-avg-rating": 3.25,
     "triples": [["sunrise", "Causes", "casted shadows"]]
    }, ...]}
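The fields above can be read with the standard library alone. The following is a minimal sketch, not part of the released code: the helper names (load_explained, best_triples, filtered_avgs_consistent) are hypothetical, the file is assumed to sit in the working directory, and the assumption that filtered-avg-rating is the mean of filtered-ratings is inferred from the example ([3, 3, 4, 3] averages to 3.25).

```python
import json
from statistics import mean


def load_explained(lines):
    """Parse one JSON object per non-empty line (the .jsonl layout of the *-explained files)."""
    return [json.loads(line) for line in lines if line.strip()]


def best_triples(entry):
    """Return the triples of the explanation with the highest filtered-avg-rating."""
    explanations = entry["human-explanations"]
    if not explanations:
        return []
    best = max(explanations, key=lambda e: e["filtered-avg-rating"])
    return best["triples"]


def filtered_avgs_consistent(entry, tol=1e-9):
    """Check, for every explanation with filtered ratings, that filtered-avg-rating
    equals the mean of filtered-ratings (an assumed relationship, consistent with the example)."""
    return all(
        abs(mean(e["filtered-ratings"]) - e["filtered-avg-rating"]) <= tol
        for e in entry["human-explanations"]
        if e["filtered-ratings"]
    )


# Usage with the real file (assumed to be in the working directory):
# with open("dev-explained.jsonl", encoding="utf-8") as f:
#     entries = load_explained(f)
# for entry in entries:
#     print(entry["id"], best_triples(entry))
```

Passing the open file object directly works because iterating a text file yields its lines.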

Aggregated versions

graphs.pkl contains aggregated versions of the triples for each question, stored as a dictionary keyed by COPA question id.

Each entry is a list of edges, each a tuple (u, v, {'rel': relation, 'weight': weight}). Similar nodes were either connected with a relatedto edge or merged, depending on the cosine similarity between their SentenceTransformer embeddings. An edge's weight is the average score of the explanation it originated from (summed if it originated from multiple explanations), or 1.0 if the edge was generated automatically.
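The weighting rule above can be illustrated with a small sketch. This is not the authors' aggregation pipeline: the similarity-based merging step is omitted, the helper names are hypothetical, and the name normalization (lowercase, spaces replaced by underscores) is inferred from node and relation names in the example such as 'casted_shadows' and 'causes'.

```python
from collections import defaultdict


def normalize(concept):
    """Lowercase and join words with underscores, mirroring names like 'casted_shadows' (assumption)."""
    return "_".join(concept.lower().split())


def aggregate_edges(scored_triples):
    """Build weighted edges from ((head, rel, tail), explanation_score) pairs.

    Each edge is weighted by the score of the explanation it came from;
    if the same (u, v, rel) edge comes from several explanations, the scores are summed.
    """
    weights = defaultdict(float)
    for (head, rel, tail), score in scored_triples:
        weights[(normalize(head), normalize(tail), rel.lower())] += score
    return [(u, v, {"rel": rel, "weight": w}) for (u, v, rel), w in weights.items()]


def add_related_edge(edges, u, v):
    """Append an automatically generated relatedto edge, which always gets weight 1.0."""
    edges.append((u, v, {"rel": "relatedto", "weight": 1.0}))
```

For instance, the triple ["sunrise", "Causes", "casted shadows"] from an explanation with filtered-avg-rating 3.25 yields ('sunrise', 'casted_shadows', {'rel': 'causes', 'weight': 3.25}), as in the example entry below.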

Note: not all graphs are (weakly) connected.

Example entry:

1: [('sunrise', 'casted_shadows', {'rel': 'causes', 'weight': 3.25}),
    ('sunrise', 'sun', {'rel': 'relatedto', 'weight': 1.0}),
    ('casted_shadows', 'the_shadow', {'rel': 'relatedto', 'weight': 1.0}),
    ('sun_rising', 'bringing_light', {'rel': 'hasproperty', 'weight': 4.25}),
    ('sun_rising', 'a_sun_raising', {'rel': 'relatedto', 'weight': 1.0}),
    ...
   ]
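Because not all graphs are weakly connected, it can help to split an edge list into its weakly connected components before use. A minimal standard-library sketch follows, assuming graphs.pkl is in the working directory; the function names are hypothetical, and since pickle can execute code while loading, only unpickle files from a trusted source. The five edges shown above, for instance, fall into two components: one around 'sunrise' and one around 'sun_rising'.

```python
import pickle


def load_graphs(path="graphs.pkl"):
    """Load the dict mapping COPA question ids to lists of (u, v, attrs) edges."""
    with open(path, "rb") as f:
        return pickle.load(f)


def weak_components(edges):
    """Group nodes into weakly connected components (edge direction ignored) via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for u, v, _attrs in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge the two components

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


def is_weakly_connected(edges):
    """True if all nodes touched by the edges form at most one weak component."""
    return len(weak_components(edges)) <= 1


# Usage with the real file:
# graphs = load_graphs()
# disconnected = [qid for qid, edges in graphs.items() if not is_weakly_connected(edges)]
```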

Citation

Thank you for your interest in our dataset! If you use it in your research, please cite:

@misc{brassard2022copasse,
    title={COPA-SSE: Semi-structured Explanations for Commonsense Reasoning},
    author={Ana Brassard and Benjamin Heinzerling and Pride Kavumba and Kentaro Inui},
    year={2022},
    eprint={2201.06777},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Owner

Ana Brassard