Big Bird: Transformers for Longer Sequences

Overview

Not an official Google product.

What is BigBird?

BigBird is a sparse-attention-based transformer that extends Transformer-based models, such as BERT, to much longer sequences. Moreover, BigBird comes with a theoretical understanding of which capabilities of a complete transformer the sparse model can handle.

As a consequence of its capability to handle longer context, BigBird drastically improves performance on various NLP tasks such as question answering and summarization.

More details and comparisons can be found in our presentation.

Citation

If you find this useful, please cite our NeurIPS 2020 paper:

@article{zaheer2020bigbird,
  title={Big bird: Transformers for longer sequences},
  author={Zaheer, Manzil and Guruganesh, Guru and Dubey, Kumar Avinava and Ainslie, Joshua and Alberti, Chris and Ontanon, Santiago and Pham, Philip and Ravula, Anirudh and Wang, Qifan and Yang, Li and others},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  year={2020}
}

Code

The most important directory is core. There are three main files in core.

  • attention.py: Contains BigBird linear attention mechanism
  • encoder.py: Contains the main long sequence encoder stack
  • modeling.py: Contains packaged BERT and seq2seq transformer models with BigBird attention

Colab/IPython Notebook

A quick fine-tuning demonstration for text classification is provided in imdb.ipynb.

Create GCP Instance

Please create a project first and then create an instance in a zone that has the required quota, as follows:

gcloud compute instances create \
  bigbird \
  --zone=europe-west4-a \
  --machine-type=n1-standard-16 \
  --boot-disk-size=50GB \
  --image-project=ml-images \
  --image-family=tf-2-3-1 \
  --maintenance-policy TERMINATE \
  --restart-on-failure \
  --scopes=cloud-platform

gcloud compute tpus create \
  bigbird \
  --zone=europe-west4-a \
  --accelerator-type=v3-32 \
  --version=2.3.1

gcloud compute ssh --zone "europe-west4-a" "bigbird"

For illustration we used the instance name bigbird and the zone europe-west4-a, but feel free to change them. More details about creating a Google Cloud TPU can be found in the online documentation.
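
Once both the VM and the TPU are up, a quick way to confirm from Python on the VM that the TPU is reachable is the standard TensorFlow TPU initialization below. This is a minimal sketch assuming the TPU name bigbird used above; it is not part of the BigBird codebase.

import tensorflow as tf

# Resolve the TPU created above (the name "bigbird" matches the gcloud command).
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="bigbird")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# Lists the logical TPU devices of the slice.
print(tf.config.list_logical_devices("TPU"))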

Installation and checkpoints

git clone https://github.com/google-research/bigbird.git
cd bigbird
pip3 install -e .
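
As a quick smoke test of the editable install (a minimal sketch; it only checks that the three core modules described above import cleanly):

# Import the three main modules from the core directory described above.
from bigbird.core import attention
from bigbird.core import encoder
from bigbird.core import modeling

print(attention.__file__)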

You can find pretrained and fine-tuned checkpoints in our Google Cloud Storage Bucket.

Optionally, you can download them using gsutil as follows:

mkdir -p bigbird/ckpt
gsutil cp -r gs://bigbird-transformer/ bigbird/ckpt/

The storage bucket contains:

  • pretrained BERT models for base (bigbr_base) and large (bigbr_large) sizes. They correspond to BERT/RoBERTa-like encoder-only models. Following the original BERT and RoBERTa implementations, they are transformers with post-normalization, i.e. layer norm happens after the attention layer. However, following Rothe et al., we can use them partially in an encoder-decoder fashion by coupling the encoder and decoder parameters, as illustrated in the bigbird/summarization/roberta_base.sh launch script.
  • a pretrained Pegasus encoder-decoder transformer in large size (bigbp_large). Following the original Pegasus implementation, it is a transformer with pre-normalization and has a full set of separate encoder and decoder weights. For the long-document summarization datasets, we have converted the Pegasus checkpoints (model.ckpt-0) for each dataset and also provide fine-tuned checkpoints (model.ckpt-300000) that work on longer documents.
  • fine-tuned tf.SavedModels for long-document summarization, which can be used directly for prediction and evaluation as illustrated in the colab notebook.

Running Classification

To get started quickly with BigBird, one can run the classification experiment code in the classifier directory. To run the code, simply execute

export GCP_PROJECT_NAME=bigbird-project  # Replace by your project name
export GCP_EXP_BUCKET=gs://bigbird-transformer-training/  # Replace
sh -x bigbird/classifier/base_size.sh

Using the BigBird Encoder instead of BERT/RoBERTa

To use the BigBird encoder directly in place of, say, a BERT model, we can use the following code.

from bigbird.core import modeling

bigb_encoder = modeling.BertModel(...)

It can easily replace BERT's encoder.
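
As a slightly fuller sketch, the encoder is configured with the dictionary of parameters defined in core/flags.py. The snippet below assumes that core/flags.py re-exports absl's FLAGS and provides an as_dictionary() helper, and that modeling.BertModel accepts such a config dictionary and can be called on a batch of token ids, returning sequence and pooled outputs. These names are assumptions rather than a verbatim API reference; check core/flags.py and core/modeling.py for the exact signatures.

import sys
import tensorflow as tf
from bigbird.core import flags
from bigbird.core import modeling

# Assumption: core/flags.py re-exports absl's FLAGS and provides as_dictionary();
# check that file for the actual helper names.
FLAGS = flags.FLAGS
FLAGS(sys.argv)  # parse the default flag values
bert_config = flags.as_dictionary()

bigb_encoder = modeling.BertModel(bert_config)

# Assumption: the model is callable on int32 token ids of static shape
# [batch_size, max_encoder_length] and returns (sequence_output, pooled_output).
input_ids = tf.zeros([2, bert_config["max_encoder_length"]], dtype=tf.int32)
sequence_output, pooled_output = bigb_encoder(input_ids, training=False)
print(sequence_output.shape, pooled_output.shape)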

Alternatively, one can also play with individual layers of the BigBird encoder:

from bigbird.core import encoder

only_layers = encoder.EncoderStack(...)

Understanding Flags & Config

All the flags and config are explained in core/flags.py. Here we explain some of the important config parameters.

attention_type selects the attention implementation to use. Setting it to block_sparse runs the BigBird attention module.

flags.DEFINE_enum(
    "attention_type", "block_sparse",
    ["original_full", "simulated_sparse", "block_sparse"],
    "Selecting attention implementation. "
    "'original_full': full attention from original bert. "
    "'simulated_sparse': simulated sparse attention. "
    "'block_sparse': blocked implementation of sparse attention.")

block_size defines the size of each block, whereas num_rand_blocks sets the number of random blocks. The code currently uses a window size of 3 blocks and 2 global blocks. The current code only supports static tensors.
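
As a back-of-the-envelope illustration of what these parameters mean (not code from the repository): with a 3-block sliding window, 2 global blocks, and num_rand_blocks random blocks, each query block attends to a number of keys that is fixed by the block settings rather than by the sequence length. The block_size and num_rand_blocks values below are illustrative, not necessarily the repository defaults.

# Back-of-the-envelope sketch of the block-sparse pattern described above; not repository code.
def keys_attended_per_query_block(seq_length, block_size, num_rand_blocks):
    window_blocks = 3   # sliding window: the block itself plus its left and right neighbours
    global_blocks = 2   # globally attending blocks
    sparse_keys = (window_blocks + global_blocks + num_rand_blocks) * block_size
    full_keys = seq_length  # what original_full attention would look at
    return sparse_keys, full_keys

for seq_length in (1024, 4096):
    sparse, full = keys_attended_per_query_block(seq_length, block_size=64, num_rand_blocks=3)
    print(f"seq_length={seq_length}: ~{sparse} keys per query block (block_sparse) "
          f"vs {full} (original_full)")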

Important points to note:

  • Hidden dimension should be divisible by the number of heads.
  • Currently the code only handles tensors of static shape, as it is primarily designed for TPUs, which only work with statically shaped tensors.
  • For sequence lengths less than 1024, using original_full is advised, as there is no benefit in using sparse BigBird attention (see the sketch below).
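
These points can be folded into a small pre-flight check before launching a run. The helper below is illustrative only; the parameter names are placeholders, not repository APIs.

# Illustrative pre-flight check based on the points above; not repository code.
def check_bigbird_config(hidden_size, num_attention_heads, max_encoder_length, attention_type):
    if hidden_size % num_attention_heads != 0:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    if attention_type == "block_sparse" and max_encoder_length < 1024:
        print("Note: for sequences shorter than 1024 tokens, 'original_full' is advised; "
              "sparse BigBird attention brings no benefit at this length.")

check_bigbird_config(hidden_size=768, num_attention_heads=12,
                     max_encoder_length=4096, attention_type="block_sparse")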