Code for coreference-aware machine reading comprehension

Last update: Sep 29, 2022

Overview

Data and code for the paper "Tracing Origins: Coreference-aware Machine Reading Comprehension" (ACL 2022).

Dataset

There are three folders, one for each of the three models described in the paper:

Coref_additive_spacy: Coref_additive_attention
Coref_dgl_spacy: GNN
Coref_multiplication_spacy: Coref_multiplication_attention

Each folder contains the training set and the dev set under its quoref folder.

Each sample contains the following fields:

context: the paragraph text
context_id: the unique identifier of the context
qas: a list of questions about the context, each with
  question: the question text
  id: the unique identifier of the question
  answers: a list of answers to the question, each with
    text: the answer text
    answer_start: the start position of the answer in the context
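As a minimal sketch of how these fields nest, the snippet below walks one sample and checks that every answer span lines up with its context. It assumes that answer_start is a character offset into context (as in SQuAD-style data) and that each sample is a plain dict with exactly the fields listed above; the sample values and the helper names iter_answer_spans and find_misaligned_answers are hypothetical, not part of this repository.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical sample shaped like the field list above; the values are
# invented for illustration and are not taken from Quoref.
EXAMPLE_SAMPLE: Dict[str, Any] = {
    "context": "Anna met Tom in Paris. She gave him a book.",
    "context_id": "ctx-0001",
    "qas": [
        {
            "question": "Who gave Tom a book?",
            "id": "q-0001",
            "answers": [{"text": "Anna", "answer_start": 0}],
        }
    ],
}


def iter_answer_spans(sample: Dict[str, Any]) -> Iterator[Tuple[str, str, int, int]]:
    """Yield (question_id, answer_text, start, end) for every answer.

    Assumption: answer_start is a character offset into sample["context"],
    so the answer occupies context[start:end] with end = start + len(text).
    """
    for qa in sample["qas"]:
        for answer in qa["answers"]:  # a question may have several answers
            start = answer["answer_start"]
            end = start + len(answer["text"])
            yield qa["id"], answer["text"], start, end


def find_misaligned_answers(sample: Dict[str, Any]) -> List[str]:
    """Return ids of questions whose answer text differs from the context span."""
    context = sample["context"]
    return [
        qid
        for qid, text, start, end in iter_answer_spans(sample)
        if context[start:end] != text
    ]


if __name__ == "__main__":
    for span in iter_answer_spans(EXAMPLE_SAMPLE):
        print(span)
    print("misaligned:", find_misaligned_answers(EXAMPLE_SAMPLE))
```

Running such a check over a dataset file before training can catch offset errors early; how the samples are stored at the top level of train.json is not described here, so loading the file is left out.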

Models

If you want to use our trained model, please download it from Google Drive.

Training

python run_quoref.py \
  --train_file "quoref/train.json" \
  --predict_file "quoref/dev.json" \
  --model_type "roberta_multi" \
  --model_name_or_path "roberta-large" \
  --output_dir "out" \
  --do_train --do_eval --eval_all_checkpoints \
  --learning_rate 1e-5 \
  --num_train_epochs 6 \
  --overwrite_output_dir \
  --per_gpu_train_batch_size 4 \
  --save_steps 6000 \
  --coref_weight 0.4
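If you want to launch training from a script, for example to sweep --coref_weight, one option is to build the same command as an argument list and run it with subprocess, which avoids shell quoting issues. This is a minimal sketch: the flags and values are copied from the command above, while the helper names build_train_command and main and their parameters are hypothetical. It also assumes you run it from inside one of the model folders, where the quoref data lives.

```python
import shlex
import subprocess
from typing import List


def build_train_command(coref_weight: float = 0.4, output_dir: str = "out") -> List[str]:
    """Return the README training command as an argv list.

    All flags mirror the command above; only coref_weight and output_dir
    are exposed as (hypothetical) parameters for convenience.
    """
    return [
        "python", "run_quoref.py",
        "--train_file", "quoref/train.json",
        "--predict_file", "quoref/dev.json",
        "--model_type", "roberta_multi",
        "--model_name_or_path", "roberta-large",
        "--output_dir", output_dir,
        "--do_train", "--do_eval", "--eval_all_checkpoints",
        "--learning_rate", "1e-5",
        "--num_train_epochs", "6",
        "--overwrite_output_dir",
        "--per_gpu_train_batch_size", "4",
        "--save_steps", "6000",
        "--coref_weight", str(coref_weight),
    ]


def main(dry_run: bool = True) -> None:
    cmd = build_train_command()
    if dry_run:
        # Print the equivalent shell command line without starting training.
        print(shlex.join(cmd))
    else:
        # Start training; raises CalledProcessError if the script fails.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```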

Note

There is an open NeuralCoref issue regarding compatibility with spaCy 3.0. If you intend to use the latest spaCy models, please follow that issue for updates.
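Because of that compatibility issue, it can help to check which spaCy version is installed before setting up NeuralCoref. The sketch below is a hedged example that assumes the pre-3.0 (2.x) spaCy line is the one to pair with NeuralCoref, in line with the issue above; the function names predates_spacy_3 and installed_spacy_version are hypothetical helpers, while importlib.metadata.version is the standard-library call for reading an installed package's version.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def predates_spacy_3(spacy_version: str) -> bool:
    """True if the version string's major number is below 3 (e.g. "2.3.9").

    Assumption: spaCy versions before 3.0 are the ones to use with NeuralCoref.
    """
    return int(spacy_version.split(".")[0]) < 3


def installed_spacy_version() -> Optional[str]:
    """Return the installed spaCy version string, or None if it is missing."""
    try:
        return version("spacy")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_spacy_version()
    if v is None:
        print("spaCy is not installed")
    elif predates_spacy_3(v):
        print(f"spaCy {v}: pre-3.0 release")
    else:
        print(f"spaCy {v}: 3.x or later, see the NeuralCoref compatibility issue")
```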

Cite

If you extend or use this work, please cite the paper where it was introduced:

@article{Huang2021TracingOC,
  title={Tracing Origins: Coref-aware Machine Reading Comprehension},
  author={Baorong Huang and Zhuosheng Zhang and Hai Zhao},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.07961}
}

