System Combination for Grammatical Error Correction Based on Integer Programming

Last update: Mar 29, 2022

This repository contains the code and scripts that implement the system combination approach for grammatical error correction in Lin and Ng (2021).

Reference

Ruixi Lin and Hwee Tou Ng (2021). System Combination for Grammatical Error Correction Based on Integer Programming.

Please cite:

@inproceedings{lin2021gecip,
  author    = "Lin, Ruixi and Ng, Hwee Tou",
  title     = "System Combination for Grammatical Error Correction Based on Integer Programming",
  booktitle = "Proceedings of Recent Advances in Natural Language Processing",
  year      = "2021",
  pages     = "829-834"
}

Table of contents

Prerequisites

Example

License

Prerequisites

conda create --name comb python=3.6
conda activate comb
pip install spacy
python -m spacy download en

For the nonlinear integer programming solver, we use LINGO 10.0.

Note that educational institutions can obtain a free license to use the LINGO solver.

Example

This example combines the three GEC systems listed in the paper using the IP approach: UEdin-MS (https://aclanthology.org/W19-4427), Kakao (https://aclanthology.org/W19-4423), and Tohoku (https://aclanthology.org/D19-1119). The core functions for the IP objective are implemented in model.lg4, which is located under lingo/inputs.

1. Run python prepare_data.py -dir . -list kakao uedinms tohoku to generate the aggregated TP, FP, and FN counts. The counts files are stored under lingo/inputs.
2. Load model.lg4 into the LINGO console, set the input data path to the counts file path, select the INLP model, and run the optimization. Save the solutions to lingo/outputs/sol_kakao_uedinms_tohoku.txt.
3. Run ./comb.sh . sol_kakao_uedinms_tohoku.txt to load the LINGO solutions, then merge and apply the edits. The resulting blind test file can be found under submissions. It can be zipped and submitted to the BEA CodaLab website (https://competitions.codalab.org/competitions/20228) for evaluation.
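
The selection idea behind the steps above can be illustrated in a few lines of Python. This is a minimal sketch, not the contents of model.lg4: it assumes the objective is an F0.5 score computed from aggregated TP, FP, and FN counts, it assigns each error type to exactly one system, and it replaces the LINGO solver with exhaustive search. The counts, the error type names, and the function names are hypothetical.

```python
from itertools import product


def f_beta(tp, fp, fn, beta=0.5):
    """F-beta computed directly from counts; 0.0 when the score is undefined."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def best_assignment(counts, systems, error_types):
    """Choose one system per error type so that the overall F0.5 is maximal.

    counts[system][error_type] is a (TP, FP, FN) tuple.
    Exhaustive search: only feasible for a handful of systems and types.
    """
    best_score, best_choice = -1.0, None
    for choice in product(systems, repeat=len(error_types)):
        picked = [counts[s][t] for s, t in zip(choice, error_types)]
        tp = sum(c[0] for c in picked)
        fp = sum(c[1] for c in picked)
        fn = sum(c[2] for c in picked)
        score = f_beta(tp, fp, fn)
        if score > best_score:
            best_score = score
            best_choice = dict(zip(error_types, choice))
    return best_choice, best_score


# Hypothetical counts for two systems and two error types (illustration only).
toy_counts = {
    "kakao": {"DET": (30, 10, 20), "PREP": (10, 20, 30)},
    "uedinms": {"DET": (20, 20, 30), "PREP": (25, 5, 15)},
}
choice, score = best_assignment(toy_counts, ["kakao", "uedinms"], ["DET", "PREP"])
print(choice, round(score, 4))
```

The number of assignments grows exponentially with the number of error types, which is one reason to hand the real problem to a dedicated solver instead of enumerating.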

The data folder provides the individual GEC system output files and the .m2 files generated using ERRANT for the listed systems. For more information, please visit the ERRANT GitHub page.
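
For readers unfamiliar with the .m2 format, the following sketch shows how such a file can be read in Python. It assumes the usual M2 layout (an S line with the tokenized sentence, followed by A lines whose fields are separated by |||, with blank lines between sentences). The function name parse_m2 and the sample sentences are hypothetical and are not part of this repository.

```python
def parse_m2(text):
    """Parse M2-format text into a list of (tokens, edits) pairs."""
    sentences = []
    for block in text.strip().split("\n\n"):
        tokens, edits = [], []
        for line in block.splitlines():
            if line.startswith("S "):
                tokens = line[2:].split()
            elif line.startswith("A "):
                fields = line[2:].split("|||")
                if fields[1] == "noop":
                    continue  # "noop" marks a sentence without corrections
                start, end = map(int, fields[0].split())
                edits.append({
                    "start": start,            # token offset, inclusive
                    "end": end,                # token offset, exclusive
                    "type": fields[1],         # error type label
                    "correction": fields[2],   # replacement string
                    "annotator": int(fields[-1]),
                })
        if tokens:
            sentences.append((tokens, edits))
    return sentences


# Hypothetical two-sentence sample (illustration only).
sample = """S This are a sentence .
A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0

S No errors here .
A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||0
"""
parsed = parse_m2(sample)
for tokens, edits in parsed:
    print(" ".join(tokens), edits)
```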

We include the IP-combined .m2 files under merged_m2, and the corresponding text files under submissions.
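
The relation between an .m2 file and its corrected text file can be sketched as follows. This is an illustration under stated assumptions, not the logic of comb.sh: edits are given as (start, end, correction) tuples over token offsets, they do not overlap, and the function name apply_edits is hypothetical.

```python
def apply_edits(tokens, edits):
    """Apply non-overlapping (start, end, correction) token edits to a sentence."""
    out = list(tokens)
    # Work from right to left so that earlier token offsets remain valid.
    for start, end, correction in sorted(edits, reverse=True):
        # An empty correction deletes the span; start == end inserts tokens.
        out[start:end] = correction.split()
    return " ".join(out)


source = "This are a sentence .".split()
corrected = apply_edits(source, [(1, 2, "is"), (3, 3, "short")])
print(corrected)
```

Applying edits from the end of the sentence backwards avoids having to shift the offsets of the remaining edits after each insertion or deletion.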

License

The source code and models in this repository are licensed under the GNU General Public License v3.0 (see LICENSE). For further research interests and commercial use of the code and models, please contact Ruixi Lin ([email protected]) and Prof. Hwee Tou Ng ([email protected]).

Owner

NUS NLP Group
