The official GitHub repository for Towards Continual Knowledge Learning of Language Models

Last update: Jan 07, 2023

Overview

Towards Continual Knowledge Learning of Language Models

This is the official GitHub repository for Towards Continual Knowledge Learning of Language Models.

In order to reproduce our results, take the following steps:

1. Create a conda environment and install the requirements

conda create -n ckl python=3.8 && conda activate ckl
pip install -r requirements.txt

Also, make sure to install the version of PyTorch that matches your CUDA version and environment; refer to https://pytorch.org/

#For CUDA 10.x
pip3 install torch torchvision torchaudio
#For CUDA 11.x
pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
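After installing, a short Python check can confirm that the installed PyTorch build matches your CUDA setup. This is a minimal sketch and is not part of the repository: `describe_torch_install` is a hypothetical helper name, and the script only relies on `torch.__version__`, `torch.version.cuda`, and `torch.cuda`. With the CUDA 11.x command above, `built_with_cuda` should read `11.1`.

```python
import importlib.util


def describe_torch_install():
    """Hypothetical helper: report the installed PyTorch build and GPU visibility."""
    # Check for torch first so the script still runs (and says so) before it is installed.
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch

    return {
        "installed": True,
        "torch_version": torch.__version__,
        # None for CPU-only builds, otherwise the CUDA version torch was built against.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "gpu_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_install().items():
        print(f"{key}: {value}")
```

Run it inside the `ckl` environment. If `cuda_available` is False on a GPU machine, a likely cause is a wheel built for a different CUDA version than the local driver supports.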

2. Download the data used for the experiments.

To download only the CKL benchmark dataset:

python download_ckl_data.py

To download ALL of the data used for the experiments (required to reproduce results):

python download_all_data.py

To download the (continually pretrained) model checkpoints of the main experiment (required to reproduce results):

python download_model_checkpoints.py

For the other experimental settings, such as multiple CKL phases and GPT-2, we do not provide separate continually pretrained model checkpoints.

3. Reproducing Experimental Results

We provide all the configs needed to reproduce the zero-shot results in our paper. Model checkpoints are provided only for the main experimental setting (full_setting); they can be downloaded with the command above.

configs
├── full_setting
│   ├── evaluation
│   │   ├── invariantLAMA
│   │   │   ├── t5_baseline.json
│   │   │   ├── t5_kadapters.json
│   │   │   ├── ...
│   │   ├── newLAMA
│   │   ├── newLAMA_easy
│   │   ├── updatedLAMA
│   ├── training
│   │   ├── t5_baseline.json
│   │   ├── t5_kadapters.json
│   │   ├── ...
├── GPT2
│   ├── ...
├── kilt
│   ├── ...
├── small_setting
│   ├── ...
├── split
│   ├── ...
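Within full_setting, training and evaluation configs follow a fixed path pattern, so a config path can be built from a method name and an evaluation set. This is a minimal sketch assuming the layout shown in the tree: `config_path` and `EVAL_SETS` are hypothetical names, method filenames other than t5_baseline and t5_kadapters are not listed in this README, the elided settings (GPT2, kilt, small_setting, split) may use a different layout, and the helper does not check that the file exists.

```python
from pathlib import PurePosixPath

# Evaluation sets listed under configs/full_setting/evaluation in the tree above.
EVAL_SETS = ("invariantLAMA", "newLAMA", "newLAMA_easy", "updatedLAMA")


def config_path(method, stage, eval_set=None, setting="full_setting", root="configs"):
    """Hypothetical helper that builds a config path, assuming the full_setting layout.

    Training configs:   <root>/<setting>/training/<method>.json
    Evaluation configs: <root>/<setting>/evaluation/<eval_set>/<method>.json
    """
    base = PurePosixPath(root, setting)
    if stage == "training":
        if eval_set is not None:
            raise ValueError("training configs are not grouped by evaluation set")
        return str(base / "training" / f"{method}.json")
    if stage == "evaluation":
        if eval_set not in EVAL_SETS:
            raise ValueError(f"eval_set must be one of {EVAL_SETS}, got {eval_set!r}")
        return str(base / "evaluation" / eval_set / f"{method}.json")
    raise ValueError(f"stage must be 'training' or 'evaluation', got {stage!r}")
```

For example, `config_path("t5_kadapters", "evaluation", "invariantLAMA")` yields the same path used in the invariantLAMA evaluation command of this README.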

Components of each configuration file

input_length (int) : the input sequence length
output_length (int) : the output sequence length
num_train_epochs (int) : number of training epochs
output_dir (string) : the directory to save the model checkpoints
dataset (string) : the dataset on which to perform zero-shot evaluation or continual pretraining
dataset_version (string) : the version of the dataset ['full', 'small', 'debug']
train_batch_size (int) : batch size used for training
learning_rate (float) : learning rate used for training
model (string) : model name on the Hugging Face Hub (https://huggingface.co/models)
method (string) : method being used ['baseline', 'kadapter', 'lora', 'mixreview', 'modular_small', 'recadam']
freeze_level (int) : how much of the model to freeze during training (0 for none, 1 for freezing only the encoder, 2 for freezing all of the parameters)
gradient_accumulation_steps (int) : number of gradient accumulation steps, used to match the global training batch size of each method
ngpu (int) : number of GPUs used for the run
num_workers (int) : number of workers for the DataLoader
resume_from_checkpoint (string) : null by default; path to the model checkpoint directory when resuming from a checkpoint
accelerator (string) : 'ddp' by default; the PyTorch Lightning accelerator to use
use_deepspeed (bool) : false by default; not yet extensively tested
CUDA_VISIBLE_DEVICES (string) : GPU devices made available for this run (e.g. "0,1,2,3", "0")
wandb_log (bool) : whether to log the experiment through wandb
wandb_project (string) : wandb project name
wandb_run_name (string) : the name of this training run
mode (string) : 'pretrain' for all configs
use_lr_scheduling (bool) : true to use learning rate scheduling
check_validation (bool) : true for evaluation only (no training)
checkpoint_path (string) : path to the model checkpoint used for evaluation
output_log (string) : directory to log evaluation results to
split_num (int) : 1 by default; greater than 1 when there are multiple CKL phases
split (int) : which CKL phase this run corresponds to
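A malformed value in one of these fields may only surface after a long run has started, so a small checker can catch obvious mistakes first. The sketch below is not part of the repository: `check_config` and `global_batch_size` are hypothetical helpers, the allowed values are copied from the option lists above, and `global_batch_size` assumes `train_batch_size` is the per-GPU batch (the usual meaning under the 'ddp' accelerator), so the effective batch per optimizer step is train_batch_size × gradient_accumulation_steps × ngpu.

```python
import json
import sys

# Allowed values copied from the option lists above.
DATASET_VERSIONS = ("full", "small", "debug")
METHODS = ("baseline", "kadapter", "lora", "mixreview", "modular_small", "recadam")
FREEZE_LEVELS = (0, 1, 2)

# Expected JSON types for the numeric, string, and boolean keys described above.
EXPECTED_TYPES = {
    "input_length": int, "output_length": int, "num_train_epochs": int,
    "train_batch_size": int, "learning_rate": (int, float), "freeze_level": int,
    "gradient_accumulation_steps": int, "ngpu": int, "num_workers": int,
    "split_num": int, "split": int, "dataset": str, "model": str, "method": str,
    "mode": str, "CUDA_VISIBLE_DEVICES": str, "check_validation": bool,
    "use_lr_scheduling": bool, "wandb_log": bool, "use_deepspeed": bool,
}


def check_config(cfg):
    """Hypothetical checker: return a list of problems (an empty list means none found)."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in cfg:
            continue  # only keys that are present are type-checked
        value = cfg[key]
        # bool is a subclass of int in Python, so reject it explicitly for non-bool keys.
        if not isinstance(value, expected) or (isinstance(value, bool) and expected is not bool):
            problems.append(f"{key}: unexpected type {type(value).__name__}")
    if cfg.get("dataset_version", "full") not in DATASET_VERSIONS:
        problems.append(f"dataset_version: {cfg['dataset_version']!r} not in {DATASET_VERSIONS}")
    if cfg.get("method", "baseline") not in METHODS:
        problems.append(f"method: {cfg['method']!r} not in {METHODS}")
    if cfg.get("freeze_level", 0) not in FREEZE_LEVELS:
        problems.append(f"freeze_level: {cfg['freeze_level']!r} not in {FREEZE_LEVELS}")
    if cfg.get("mode", "pretrain") != "pretrain":
        problems.append("mode: expected 'pretrain'")
    # Our own sanity check: ngpu should not exceed the number of visible devices.
    devices = [d for d in str(cfg.get("CUDA_VISIBLE_DEVICES", "")).split(",") if d.strip()]
    ngpu = cfg.get("ngpu")
    if devices and isinstance(ngpu, int) and not isinstance(ngpu, bool) and ngpu > len(devices):
        problems.append(f"ngpu={ngpu} but only {len(devices)} device(s) in CUDA_VISIBLE_DEVICES")
    return problems


def global_batch_size(cfg):
    """Samples per optimizer step, ASSUMING train_batch_size is per GPU (DDP)."""
    return cfg["train_batch_size"] * cfg["gradient_accumulation_steps"] * cfg["ngpu"]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_config.py path/to/config.json
    with open(sys.argv[1]) as f:
        loaded = json.load(f)
    for problem in check_config(loaded):
        print("WARNING:", problem)
    if all(k in loaded for k in ("train_batch_size", "gradient_accumulation_steps", "ngpu")):
        print("global batch size:", global_batch_size(loaded))
```

For example, under that assumption, a batch of 8 per GPU with 4 accumulation steps on 2 GPUs gives 8 × 4 × 2 = 64 samples per optimizer step.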

For example, to run the invariantLAMA zero-shot evaluation of the continually pretrained t5_kadapters model:

python run.py --config configs/full_setting/evaluation/invariantLAMA/t5_kadapters.json

To perform continual pretraining on CC-RecentNews (the main experiment) with t5_kadapters:

python run.py --config configs/full_setting/training/t5_kadapters.json
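The two commands above can be chained, for instance to pretrain t5_kadapters and then evaluate it on all four LAMA variants. This is a hedged sketch rather than a script shipped with the repository: `build_commands` and `run_all` are hypothetical names, it assumes each evaluation folder contains a `<method>.json` file (the tree above only lists the files for invariantLAMA), and it assumes the `checkpoint_path` in each evaluation config points at the checkpoint you want to evaluate (the downloaded one, or the output of your own training run).

```python
import shlex
import subprocess

# Evaluation sets under configs/full_setting/evaluation (from the configs tree).
EVAL_SETS = ("invariantLAMA", "newLAMA", "newLAMA_easy", "updatedLAMA")


def build_commands(method="t5_kadapters", train=False):
    """Hypothetical helper: optional continual pretraining, then zero-shot evaluation on each set."""
    commands = []
    if train:
        commands.append(["python", "run.py", "--config", f"configs/full_setting/training/{method}.json"])
    for eval_set in EVAL_SETS:
        # ASSUMPTION: every evaluation folder has a config named after the method.
        commands.append(
            ["python", "run.py", "--config", f"configs/full_setting/evaluation/{eval_set}/{method}.json"]
        )
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute only when dry_run is False, stopping at the first failure."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(build_commands(train=True))  # dry run: prints the commands only
```

`run_all` defaults to a dry run that only prints the commands; pass `dry_run=False` to execute them. Because `subprocess.run` is called with `check=True`, the sequence stops at the first run that exits with an error.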

Reference

@article{jang2021towards,
  title={Towards Continual Knowledge Learning of Language Models},
  author={Jang, Joel and Ye, Seonghyeon and Yang, Sohee and Shin, Joongbo and Han, Janghoon and Kim, Gyeonghun and Choi, Stanley Jungkyu and Seo, Minjoon},
  journal={arXiv preprint arXiv:2110.03215},
  year={2021}
}

Owner

Joel Jang | 장요엘