The InterScript dataset contains interactive user feedback on scripts generated by a T5-XXL model.

Last update: Dec 01, 2022

Related tags

Deep Learning interscript

Overview

Interscript

The Interscript dataset contains interactive user feedback on a T5-11B model generated scripts.

Dataset

data.json contains the data in an easy to read JSON format. data.jsonl contains the data in a JSONL format. The file contains 8466 samples, one sample per line. Every sample is a JSON object with the following fields:

 {
        "input_script": "push chair in -> pull chair in; pull chair in -> push chair against wall; push chair against wall -> straighten chair legs; straighten chair legs -> Push all chairs in; line up the chairs -> push chair in",
        "input_feedback": "One would not pull chair in if they had initially pushed it in.",
        "output_script": "push chair against wall -> straighten chair legs;straighten chair legs -> Push all chairs in;line up the chairs -> push chair in;push chair in -> push chair against wall",
        "metadata": {
            "id": "301KG0KX9BKTC0HB7Z9SV1Y5HAFH2Y.2_implicit.gp",
            "goal": "push all chairs in",
            "is_distractor": false,
            "feedback_type": "implicit.gp",
            "edit": "Remove node 'pull chair in'",
            "input_script_formatted": [
                "1. line up the chairs",
                "2. push chair in",
                "3. pull chair in",
                "4. push chair against wall",
                "5. straighten chair legs",
                "6. Push all chairs in"
            ],
            "output_script_formatted": [
                "1. line up the chairs",
                "2. push chair in",
                "3. push chair against wall",
                "4. straighten chair legs",
                "5. Push all chairs in"
            ]
        }
    }

The description of the fields is as follows:

input_script: Model generated script $y_{bad}$.
input_feedback: User feedback on the input script $f$.
output_script: Fixed output script $y_{good}$.

Metadata contains additional information about the sample. Some important fields are:

id: Unique identifier of the sample.
goal: Goal of the script.
is_distractor: Whether the feedback is a distractor (please see Section 4 for more details).
feedback_type: Type of feedback (please see Section 4 "Annotation" for more details).
edit: The input_feedback presented as an edit operation on the input script, that is, the edit operation that transforms the input script into the output script.
input_script_formatted: The input script presented as a list of sentences.
output_script_formatted: The output script presented as a list of sentences.

Data collection process

We use Amazon Mechanical Turk to collect feedback on erroneous scripts from users.
An overview of the process is captured in the following figure:

Amazon Mechanical Turk Template

turk_template.html contains the template for Amazon Mechanical Turk HITs.

The InterScript dataset contains interactive user feedback on scripts generated by a T5-XXL model.

Related tags

Overview

Interscript

Dataset

Data collection process

Amazon Mechanical Turk Template

Owner

AI2

This repository includes different versions of the prescribed-time controller as Simulink blocks and MATLAB script codes for engineering applications.

Action Segmentation Evaluation

In this project, two programs can help you take full agvantage of time on the model training with a remote server

The Habitat-Matterport 3D Research Dataset - the largest-ever dataset of 3D indoor spaces.

Official code for our CVPR '22 paper "Dataset Distillation by Matching Training Trajectories"

Language-Agnostic Website Embedding and Classification

Source code for our paper "Do Not Trust Prediction Scores for Membership Inference Attacks"

VLGrammar: Grounded Grammar Induction of Vision and Language

[CVPR 2021] Pytorch implementation of Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs

Kaggle Feedback Prize - Evaluating Student Writing 15th solution

PolyTrack: Tracking with Bounding Polygons

Welcome to The Eigensolver Quantum School, a quantum computing crash course designed by students for students.

Bio-Computing Platform Featuring Large-Scale Representation Learning and Multi-Task Deep Learning “螺旋桨”生物计算工具集

Heat transfer problemas solved using python

Official implementation of Rich Semantics Improve Few-Shot Learning (BMVC, 2021)

Using pytorch to implement unet network for liver image segmentation.

scikit-learn: machine learning in Python

A PyTorch implementation of Sharpness-Aware Minimization for Efficiently Improving Generalization

pip install python-office

Logistic Bandit experiments. Official code for the paper "Jointly Efficient and Optimal Algorithms for Logistic Bandits".