QA-GNN: Question Answering using Language Models and Knowledge Graphs

Last update: Jan 04, 2023

Overview

QA-GNN: Question Answering using Language Models and Knowledge Graphs

This repo provides the source code & data of our paper: QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering (NAACL 2021).

@InProceedings{yasunaga2021qagnn,
  author =  {Michihiro Yasunaga and Hongyu Ren and Antoine Bosselut and Percy Liang and Jure Leskovec},
  title =   {QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering},
  year =    {2021},  
  booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)},  
}

Webpage: https://snap.stanford.edu/qagnn

Usage

0. Dependencies

Python == 3.7
PyTorch == 1.4.0
transformers == 2.0.0
torch-geometric ==1.6.0

Run the following commands to create a conda environment (assuming CUDA10.1):

conda create -n qagnn python=3.7
source activate qagnn
pip install numpy==1.18.3 tqdm
pip install torch==1.4.0 torchvision==0.5.0
pip install transformers==2.0.0 nltk spacy==2.1.6
python -m spacy download en

#for torch-geometric
pip install torch-scatter==2.0.4 -f https://pytorch-geometric.com/whl/torch-1.4.0+cu101.html
pip install torch-cluster==1.5.4 -f https://pytorch-geometric.com/whl/torch-1.4.0+cu101.html
pip install torch-sparse==0.6.1 -f https://pytorch-geometric.com/whl/torch-1.4.0+cu101.html
pip install torch-spline-conv==1.2.0 -f https://pytorch-geometric.com/whl/torch-1.4.0+cu101.html
pip install torch-geometric==1.6.0 -f https://pytorch-geometric.com/whl/torch-1.4.0+cu101.html

1. Download Data

Download all the raw data -- ConceptNet, CommonsenseQA, OpenBookQA -- by

./download_raw_data.sh

You can preprocess the raw data by running

python preprocess.py -p <num_processes>

The script will:

Setup ConceptNet (e.g., extract English relations from ConceptNet, merge the original 42 relation types into 17 types)
Convert the QA datasets into .jsonl files (e.g., stored in data/csqa/statement/)
Identify all mentioned concepts in the questions and answers
Extract subgraphs for each q-a pair

TL;DR. The preprocessing may take long; for your convenience, you can download all the processed data by

./download_preprocessed_data.sh

The resulting file structure will look like:

.
├── README.md
└── data/
    ├── cpnet/                 (prerocessed ConceptNet)
    └── csqa/
        ├── train_rand_split.jsonl
        ├── dev_rand_split.jsonl
        ├── test_rand_split_no_answers.jsonl
        ├── statement/             (converted statements)
        ├── grounded/              (grounded entities)
        ├── graphs/                (extracted subgraphs)
        ├── ...

2. Training

For CommonsenseQA, run

./run_qagnn__csqa.sh

For OpenBookQA, run

./run_qagnn__obqa.sh

As configured in these scripts, the model needs two types of input files

--{train,dev,test}_statements: preprocessed question statements in jsonl format. This is mainly loaded by load_input_tensors function in utils/data_utils.py.
--{train,dev,test}_adj: information of the KG subgraph extracted for each question. This is mainly loaded by load_sparse_adj_data_with_contextnode function in utils/data_utils.py.

Use Your Own Dataset

Convert your dataset to {train,dev,test}.statement.jsonl in .jsonl format (see data/csqa/statement/train.statement.jsonl)
Create a directory in data/{yourdataset}/ to store the .jsonl files
Modify preprocess.py and perform subgraph extraction for your data
Modify utils/parser_utils.py to support your own dataset

Acknowledgment

This repo is built upon the following work:

Scalable Multi-Hop Relational Reasoning for Knowledge-Aware Question Answering. Yanlin Feng*, Xinyue Chen*, Bill Yuchen Lin, Peifeng Wang, Jun Yan and Xiang Ren. EMNLP 2020.
https://github.com/INK-USC/MHGRN

Many thanks to the authors and developers!

QA-GNN: Question Answering using Language Models and Knowledge Graphs

Related tags

Overview

QA-GNN: Question Answering using Language Models and Knowledge Graphs

Usage

0. Dependencies

1. Download Data

2. Training

Use Your Own Dataset

Acknowledgment

Owner

Michihiro Yasunaga

PyTorch Lightning implementation of Automatic Speech Recognition

A Simulation Environment to train Robots in Large Realistic Interactive Scenes

A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more

Official re-implementation of the Calibrated Adversarial Refinement model described in the paper Calibrated Adversarial Refinement for Stochastic Semantic Segmentation

Code for IntraQ, PyTorch implementation of our paper under review

[ACM MM 2019 Oral] Cycle In Cycle Generative Adversarial Networks for Keypoint-Guided Image Generation

Repo for the paper "DiLBERT: Cheap Embeddings for Disease Related Medical NLP"

Run Effective Large Batch Contrastive Learning on Limited Memory GPU

Official Pytorch implementation of the paper "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV 2021

Official implementation of "Watermarking Images in Self-Supervised Latent-Spaces"

git《Commonsense Knowledge Base Completion with Structural and Semantic Context》(AAAI 2020) GitHub: [fig1]

NLP made easy

This is an open source library implementing hyperbox-based machine learning algorithms

Deep Learning GPU Training System

Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis

This is the official pytorch implementation of the BoxEL for the description logic EL++

Fast and robust clustering of point clouds generated with a Velodyne sensor.

Project for tracking occupancy in Tel-Aviv parking lots.

git《Tangent Space Backpropogation for 3D Transformation Groups》(CVPR 2021) GitHub:1]

Official pytorch implementation of "DSPoint: Dual-scale Point Cloud Recognition with High-frequency Fusion"