Release of SPLASH: Dataset for semantic parse correction with natural language feedback in the context of text-to-SQL parsing

Overview

SPLASH: Semantic Parsing with Language Assistance from Humans

SPLASH is a dataset for the task of semantic parse correction with natural language feedback in the context of text-to-SQL parsing.

The task and dataset, along with baseline results, are presented in
Speak to your Parser: Interactive Text-to-SQL with Natural Language Feedback.
Ahmed Elgohary, Saghar Hosseini and Ahmed Hassan Awadallah.
ACL 2020.

Release

The files train.json, dev.json and test.json contain the training, development and test examples of SPLASH. We also release the 179 examples that are based on the EditSQL parser (see Section 6.3 of the paper for details); these are in editsql.json. SPLASH is distributed under the CC BY-SA 4.0 license.
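
A minimal loading sketch follows. It assumes each release file holds a single JSON array of example objects, which the description above does not state explicitly. The helper name `load_split` and the `data_dir` argument are illustrative and not part of the release.

```python
import json
from pathlib import Path

# File names as listed in the release; the directory layout is an assumption.
SPLIT_FILES = {
    "train": "train.json",
    "dev": "dev.json",
    "test": "test.json",
    "editsql": "editsql.json",
}


def load_split(data_dir, split):
    """Load one SPLASH split, assuming the file holds a JSON array of examples."""
    path = Path(data_dir) / SPLIT_FILES[split]
    with path.open(encoding="utf-8") as handle:
        examples = json.load(handle)
    # Fail early if the file does not have the assumed top-level shape.
    if not isinstance(examples, list):
        raise ValueError(f"{path} does not contain a JSON array")
    return examples
```

For instance, `load_split("path/to/splash", "dev")` would return the development examples as a list of dictionaries, provided the assumption about the file layout holds.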

Format

Each example contains the following fields:

db_id: Name of Spider database.

question: Question (Utterance) as provided in Spider.

predicted_parse: The SQL parse predicted by the relevant model.

predicted_parse_with_values: The predicted SQL with the values (anonymized in predicted_parse) inferred by a rule-based post-processor. Note that we still use Spider's evaluation measure, which ignores the values, but inferring values for the predicted parse is essential for generating meaningful explanations.

predicted_parse_explanation: The generated natural language explanation of the predicted SQL.

feedback: Collected natural language feedback.

gold_parse: The gold parse of the given question as provided in Spider.

beam: The top 20 predictions with corresponding scores produced by Seq2Struct beam search.

Please refer to the paper for more details.
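
The field list above can be turned into a quick sanity check on loaded examples. The sketch below is only illustrative: the field names are copied from the list, while the helper name `missing_fields` is hypothetical.

```python
# Field names copied from the list above.
EXPECTED_FIELDS = (
    "db_id",
    "question",
    "predicted_parse",
    "predicted_parse_with_values",
    "predicted_parse_explanation",
    "feedback",
    "gold_parse",
    "beam",
)


def missing_fields(example):
    """Return the expected field names that are absent from one example dict."""
    return [name for name in EXPECTED_FIELDS if name not in example]
```

Whether every file carries every field is not stated here (beam, for instance, is described as Seq2Struct output), so a non-empty result is best treated as something to inspect rather than as an error.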

Example

    {
        "db_id": "csu_1", 
        "question": "Which university is in Los Angeles county and opened after 1950?", 
        "predicted_parse": "SELECT T1.Campus FROM Campuses AS T1 JOIN faculty AS T2 ON T1.Id = T2.Campus WHERE T1.County = value AND T1.Year > value AND T2.Year > value", 
        "predicted_parse_with_values": "SELECT T1.Campus FROM Campuses AS T1 JOIN faculty AS T2 ON T1.Id = T2.Campus WHERE T1.County = \"Los Angeles\" AND T1.Year > 1950 AND T2.Year > 2002",
        "predicted_parse_explanation": [
            "Step 1: For each row in Campuses table, find the corresponding rows in faculty table", 
            "Step 2: find Campuses's Campus of the results of step 1 whose County equals Los Angeles and Campuses's Year greater than 1950 and faculty's Year greater than 2002"
        ],
        "feedback": "In step 2 Remove faculty 's year greater than 2002\".", 
        "gold_parse": "SELECT campus FROM campuses WHERE county  =  \"Los Angeles\" AND YEAR  >  1950", 
        "beam": [
            [
                "SELECT T1.Campus FROM Campuses AS T1 JOIN faculty AS T2 ON T1.Id = T2.Campus WHERE T1.County = value AND T2.Year > value AND T2.Year > value", 
                -1.5820374488830566
            ], 
            [
                "SELECT T1.County FROM Campuses AS T1 JOIN faculty AS T2 ON T1.Id = T2.Campus WHERE T1.Campus = value AND T2.Year > value AND T2.Year > value", 
                -2.0078020095825195
            ], 
            ..
  }          
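
As a sketch of how the beam field might be consumed, the function below ranks the candidates by score and skips any that are identical to predicted_parse. It assumes each beam entry is a [sql, score] pair and that a higher score is better, as the example above suggests. The function name and the `top_k` parameter are illustrative.

```python
def alternative_candidates(example, top_k=5):
    """Return up to top_k beam entries, best score first, that differ from the predicted parse."""
    predicted = example["predicted_parse"]
    # Each beam entry is assumed to be a [sql_string, score] pair.
    ranked = sorted(example["beam"], key=lambda pair: pair[1], reverse=True)
    return [(sql, score) for sql, score in ranked if sql != predicted][:top_k]
```

The comparison is an exact string match, so candidates that differ from the predicted parse only in whitespace or casing would still be returned.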

Please contact Ahmed Elgohary < [email protected] > with any questions or feedback.

Citation

@inproceedings{Elgohary20Speak,
Title = {Speak to your Parser: Interactive Text-to-SQL with Natural Language Feedback},
Author = {Ahmed Elgohary and Saghar Hosseini and Ahmed Hassan Awadallah},
Year = {2020},
Booktitle = {Association for Computational Linguistics},
}
Owner
Microsoft Research - Language and Information Technologies (MSR LIT)