The official github repository for Towards Continual Knowledge Learning of Language Models

Overview

Towards Continual Knowledge Learning of Language Models

This is the official github repository for Towards Continual Knowledge Learning of Language Models.

In order to reproduce our results, take the following steps:

1. Create conda environment and install requirements

conda create -n ckl python=3.8 && conda activate ckl
pip install -r requirements.txt

Also, make sure to install the correct version of pytorch corresponding to the CUDA version and environment: Refer to https://pytorch.org/

#For CUDA 10.x
pip3 install torch torchvision torchaudio
#For CUDA 11.x
pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

2. Download the data used for the experiments.

To download only the CKL benchmark dataset:

python download_ckl_data.py

To download ALL of the data used for the experiments (required to reproduce results):

python download_all_data.py

To download the (continually pretrained) model checkpoints of the main experiment (required to reproduce results):

python download_model_checkpoints.py

For the other experimental settings such as multiple CKL phases, GPT-2, we do not separately provide the continually pretrained model checkpoints.

3. Reproducing Experimental Results

We provide all the configs in order to reproduce the zero-shot results of our paper. We only provide the model checkpoints for the main experimental setting (full_setting) which can be downloaded with the command above.

configs
├── full_setting
│   ├── evaluation
│   |   ├── invariantLAMA
│   |   |   ├── t5_baseline.json
│   |   |   ├── t5_kadapters.json
│   |   |   ├── ...
│   |   ├── newLAMA
│   |   ├── newLAMA_easy
│   |   ├── updatedLAMA
│   ├── training
│   |   ├── t5_baseline.json
│   |   ├── t5_kadapters.json
│   |   ├── ...
├── GPT2
│   ├── ...
├── kilt
│   ├── ...
├── small_setting
│   ├── ...
├── split
│   ├── ...                    

Components in each configurations file

  • input_length (int) : the input sequence length
  • output_length (int) : the output sequence length
  • num_train_epochs (int) : number of training epochs
  • output_dir (string) : the directory to save the model checkpoints
  • dataset (string) : the dataset to perform zero-shot evaluation or continual pretraining
  • dataset_version (string) : the version of the dataset ['full', 'small', 'debug']
  • train_batch_size (int) : batch size used for training
  • learning rate (float) : learning rate used for training
  • model (string) : model name in huggingface models (https://huggingface.co/models)
  • method (string) : method being used ['baseline', 'kadapter', 'lora', 'mixreview', 'modular_small', 'recadam']
  • freeze_level (int) : how much of the model to freeze during traininig (0 for none, 1 for freezing only encoder, 2 for freezing all of the parameters)
  • gradient_accumulation_steps (int) : gradient accumulation used to match the global training batch of each method
  • ngpu (int) : number of gpus used for the run
  • num_workers (int) : number of workers for the Dataloader
  • resume_from_checkpoint (string) : null by default. directory to model checkpoint if resuming from checkpoint
  • accelerator (string) : 'ddp' by default. the pytorch lightning accelerator to be used.
  • use_deepspeed (bool) : false by default. Currently not extensively tested.
  • CUDA_VISIBLE_DEVICES (string) : gpu devices that are made available for this run (e.g. "0,1,2,3", "0")
  • wandb_log (bool) : whether to log experiment through wandb
  • wandb_project (string) : project name of wandb
  • wandb_run_name (string) : the name of this training run
  • mode (string) : 'pretrain' for all configs
  • use_lr_scheduling (bool) : true if using learning rate scheduling
  • check_validation (bool) : true for evaluation (no training)
  • checkpoint_path (string) : path to the model checkpoint that is used for evaluation
  • output_log (string) : directory to log evaluation results to
  • split_num (int) : default is 1. more than 1 if there are multile CKL phases
  • split (int) : which CKL phase it is

This is an example of getting the invariantLAMA zero-shot evaluation of continually pretrained t5_kadapters

python run.py --config configs/full_setting/evaluation/invariantLAMA/t5_kadapters.json

This is an example of performing continual pretraining on CC-RecentNews (main experiment) with t5_kadapters

python run.py --config configs/full_setting/training/t5_kadapters.json

Reference

@article{jang2021towards,
  title={Towards Continual Knowledge Learning of Language Models},
  author={Jang, Joel and Ye, Seonghyeon and Yang, Sohee and Shin, Joongbo and Han, Janghoon and Kim, Gyeonghun and Choi, Stanley Jungkyu and Seo, Minjoon},
  journal={arXiv preprint arXiv:2110.03215},
  year={2021}
}
Owner
Joel Jang | 장요엘
Aspiring NLP researcher and a MS student at the Graduate School of AI, KAIST advised by Minjoon Seo
Joel Jang | 장요엘
Official implementation for "Symbolic Learning to Optimize: Towards Interpretability and Scalability"

Symbolic Learning to Optimize This is the official implementation for ICLR-2022 paper "Symbolic Learning to Optimize: Towards Interpretability and Sca

VITA 8 Dec 19, 2022
Unofficial implementation of Alias-Free Generative Adversarial Networks. (https://arxiv.org/abs/2106.12423) in PyTorch

alias-free-gan-pytorch Unofficial implementation of Alias-Free Generative Adversarial Networks. (https://arxiv.org/abs/2106.12423) This implementation

Kim Seonghyeon 502 Jan 03, 2023
Official PyTorch implementation of the paper "Deep Constrained Least Squares for Blind Image Super-Resolution", CVPR 2022.

Deep Constrained Least Squares for Blind Image Super-Resolution [Paper] This is the official implementation of 'Deep Constrained Least Squares for Bli

MEGVII Research 141 Dec 30, 2022
Code for ACL 21: Generating Query Focused Summaries from Query-Free Resources

marge This repository releases the code for Generating Query Focused Summaries from Query-Free Resources. Please cite the following paper [bib] if you

Yumo Xu 28 Nov 10, 2022
Wind Speed Prediction using LSTMs in PyTorch

Implementation of Deep-Forecast using PyTorch Deep Forecast: Deep Learning-based Spatio-Temporal Forecasting Adapted from original implementation Setu

Onur Kaplan 151 Dec 14, 2022
Repo for the ACMMM20 submission: "Personalized breath based biometric authentication with wearable multimodality".

personalized-breath Repo for the ACMMM20 submission: "Personalized breath based biometric authentication with wearable multimodality". Guideline To ex

Manh-Ha Bui 2 Nov 15, 2021
XViT - Space-time Mixing Attention for Video Transformer

XViT - Space-time Mixing Attention for Video Transformer This is the official implementation of the XViT paper: @inproceedings{bulat2021space, title

Adrian Bulat 33 Dec 23, 2022
working repo for my xumx-sliCQ submissions to the ISMIR 2021 MDX

Music Demixing Challenge - xumx-sliCQ This repository is the GitHub mirror of my working submission repository for the AICrowd ISMIR 2021 Music Demixi

4 Aug 25, 2021
Convnext-tf - Unofficial tensorflow keras implementation of ConvNeXt

ConvNeXt Tensorflow This is unofficial tensorflow keras implementation of ConvNe

29 Oct 06, 2022
An efficient framework for reinforcement learning.

rl: An efficient framework for reinforcement learning Requirements Introduction PPO Test Requirements name version Python =3.7 numpy =1.19 torch =1

16 Nov 30, 2022
Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields.

This repository contains the code release for Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields. This implementation is written in JAX, and is a fork of Google's JaxNeRF

Google 625 Dec 30, 2022
Semi-supervised Representation Learning for Remote Sensing Image Classification Based on Generative Adversarial Networks

SSRL-for-image-classification Semi-supervised Representation Learning for Remote Sensing Image Classification Based on Generative Adversarial Networks

Feng 2 Nov 19, 2021
Predicting Axillary Lymph Node Metastasis in Early Breast Cancer Using Deep Learning on Primary Tumor Biopsy Slides

Predicting Axillary Lymph Node Metastasis in Early Breast Cancer Using Deep Learning on Primary Tumor Biopsy Slides Project | This repo is the officia

CVSM Group - email: <a href=[email protected]"> 33 Dec 28, 2022
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

WebDataset WebDataset is a PyTorch Dataset (IterableDataset) implementation providing efficient access to datasets stored in POSIX tar archives and us

1.1k Jan 08, 2023
How to Leverage Multimodal EHR Data for Better Medical Predictions?

How to Leverage Multimodal EHR Data for Better Medical Predictions? This repository contains the code of the paper: How to Leverage Multimodal EHR Dat

13 Dec 13, 2022
we propose EfficientDerain for high-efficiency single-image deraining

EfficientDerain we propose EfficientDerain for high-efficiency single-image deraining Requirements python 3.6 pytorch 1.6.0 opencv-python 4.4.0.44 sci

Qing Guo 126 Dec 07, 2022
A tensorflow implementation of Fully Convolutional Networks For Semantic Segmentation

##A tensorflow implementation of Fully Convolutional Networks For Semantic Segmentation. #USAGE To run the trained classifier on some images: python w

Alex Seewald 13 Nov 17, 2022
Neural implicit reconstruction experiments for the Vector Neuron paper

Neural Implicit Reconstruction with Vector Neurons This repository contains code for the neural implicit reconstruction experiments in the paper Vecto

Congyue Deng 35 Jan 02, 2023
Implementation of Bidirectional Recurrent Independent Mechanisms (Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules)

BRIMs Bidirectional Recurrent Independent Mechanisms Implementation of the paper Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neura

Sarthak Mittal 26 May 26, 2022
The Most Efficient Temporal Difference Learning Framework for 2048

moporgic/TDL2048+ TDL2048+ is a highly optimized temporal difference (TD) learning framework for 2048. Features Many common methods related to 2048 ar

Hung Guei 5 Nov 23, 2022