BERT Attention Analysis

Overview

BERT Attention Analysis

This repository contains code for What Does BERT Look At? An Analysis of BERT's Attention. It includes code for getting attention maps from BERT and writing them to disk, analyzing BERT's attention in general (sections 3 and 6 of the paper), and comparing its attention to dependency syntax (sections 4.2 and 5). We will add the code for the coreference resolution analysis (section 4.3 of the paper) soon!

Requirements

For extracting attention maps from text:

Additional requirements for the attention analysis:

Attention Analysis

Syntax_Analysis.ipynb and General_Analysis.ipynb contain code for analyzing BERT's attention, including reproducing the figures and tables in the paper.

You can download the data needed to run the notebooks (including BERT attention maps on Wikipedia and the Penn Treebank) from here. However, note that the Penn Treebank annotations are not freely available, so the Penn Treebank data only includes dummy labels. If you want to run the analysis on your own data, you can use the scripts described below to extract BERT attention maps.

Extracting BERT Attention Maps

We provide a script for running BERT over text and writing the resulting attention maps to disk. The input data should be a JSON file containing a list of dicts, each one corresponding to a single example to be passed in to BERT. Each dict must contain exactly one of the following fields:

  • "text": A string.
  • "words": A list of strings. Needed if you want word-level rather than token-level attention.
  • "tokens": A list of strings corresponding to BERT wordpiece tokenization.

If the present field is "tokens," the script expects [CLS]/[SEP] tokens to be already added; otherwise it adds these tokens to the beginning/end of the text automatically. Note that if an example is longer than max_sequence_length tokens after BERT wordpiece tokenization, attention maps will not be extracted for it. Attention extraction adds two additional fields to each dict:

  • "attns": A numpy array of size [num_layers, heads_per_layer, sequence_length, sequence_length] containing attention weights.
  • "tokens": If "tokens" was not already provided for the example, the BERT-wordpiece-tokenized text (list of strings).

Other fields already in the feature dicts will be preserved. For example if each dict has a tags key containing POS tags, they will stay in the data after attention extraction so they can be used when analyzing the data.

Attention extraction is run with

python extract_attention.py --preprocessed_data_file 
   
     --bert_dir 
    

    
   

The following optional arguments can also be added:

  • --max_sequence_length: Maximum input sequence length after tokenization (default is 128).
  • --batch_size: Batch size when running BERT over examples (default is 16).
  • --debug: Use a tiny BERT model for fast debugging.
  • --cased: Do not lowercase the input text.
  • --word_level: Compute word-level instead of token-level attention (see Section 4.1 of the paper).

The feature dicts with added attention maps (numpy arrays with shape [n_layers, n_heads_per_layer, n_tokens, n_tokens]) are written to _attn.pkl

Pre-processing Scripts

We include two pre-processing scripts for going from a raw data file to JSON that can be supplied to attention_extractor.py.

preprocess_unlabeled.py does BERT-pre-training-style preprocessing for unlabeled text (i.e, taking two consecutive text spans, truncating them so they are at most max_sequence_length tokens, and adding [CLS]/[SEP] tokens). Each line of the input data file should be one sentence. Documents should be separated by empty lines. Example usage:

python preprocess_unlabeled.py --data-file $ATTN_DATA_DIR/unlabeled.txt --bert-dir $ATTN_DATA_DIR/uncased_L-12_H-768_A-12

will create the file $ATTN_DATA_DIR/unlabeled.json containing pre-processed data. After pre-processing, you can run extract_attention.py to get attention maps, e.g.,

python extract_attention.py --preprocessed-data-file $ATTN_DATA_DIR/unlabeled.json --bert-dir $ATTN_DATA_DIR/uncased_L-12_H-768_A-12

preprocess_depparse.py pre-processes dependency parsing data. Dependency parsing data should consist of two files train.txt and dev.txt under a common directory. Each line in the files should contain a word followed by a space followed by - (e.g., 0-root). Examples should be separated by empty lines. Example usage:

python preprocess_depparse.py --data-dir $ATTN_DATA_DIR/depparse

After pre-processing, you can run extract_attention.py to get attention maps, e.g.,

python extract_attention.py --preprocessed-data-file $ATTN_DATA_DIR/depparse/dev.json --bert-dir $ATTN_DATA_DIR/uncased_L-12_H-768_A-12 --word_level

Computing Distances Between Attention Heads

head_distances.py computes the average Jenson-Shannon divergence between the attention weights of all pairs of attention heads and writes the results to disk as a numpy array of shape [n_heads, n_heads]. These distances can be used to cluster BERT's attention heads (see Section 6 and Figure 6 of the paper; code for doing this clustering is in General_Analysis.ipynb). Example usage (requires that attention maps have already been extracted):

python head_distances.py --attn-data-file $ATTN_DATA_DIR/unlabeled_attn.pkl --outfile $ATTN_DATA_DIR/head_distances.pkl

Citation

If you find the code or data helpful, please cite the original paper:

@inproceedings{clark2019what,
  title = {What Does BERT Look At? An Analysis of BERT's Attention},
  author = {Kevin Clark and Urvashi Khandelwal and Omer Levy and Christopher D. Manning},
  booktitle = {[email protected]},
  year = {2019}
}

Contact

Kevin Clark (@clarkkev).

Owner
Kevin Clark
Kevin Clark
Implementation of ProteinBERT in Pytorch

ProteinBERT - Pytorch (wip) Implementation of ProteinBERT in Pytorch. Original Repository Install $ pip install protein-bert-pytorch Usage import torc

Phil Wang 92 Dec 25, 2022
Fast, DB Backed pretrained word embeddings for natural language processing.

Embeddings Embeddings is a python package that provides pretrained word embeddings for natural language processing and machine learning. Instead of lo

Victor Zhong 212 Nov 21, 2022
Scene Text Retrieval via Joint Text Detection and Similarity Learning

This is the code of "Scene Text Retrieval via Joint Text Detection and Similarity Learning". For more details, please refer to our CVPR2021 paper.

79 Nov 29, 2022
DeepPavlov Tutorials

DeepPavlov tutorials DeepPavlov: Sentence Classification with Word Embeddings DeepPavlov: Transfer Learning with BERT. Classification, Tagging, QA, Ze

Neural Networks and Deep Learning lab, MIPT 28 Sep 13, 2022
Multilingual finetuning of Machine Translation model on low-resource languages. Project for Deep Natural Language Processing course.

Low-resource-Machine-Translation This repository contains the code for the project relative to the course Deep Natural Language Processing. The goal o

Andrea Cavallo 3 Jun 22, 2022
Translates basic English sentences into the Huna language (hoo-NAH)

huna-translator The Huna Language Translates basic English sentences into the Huna language (hoo-NAH). The Huna constructed language was developed in

Miles Smith 0 Jan 20, 2022
Curso práctico: NLP de cero a cien 🤗

Curso Práctico: NLP de cero a cien Comprende todos los conceptos y arquitecturas clave del estado del arte del NLP y aplícalos a casos prácticos utili

Somos NLP 147 Jan 06, 2023
p-tuning for few-shot NLU task

p-tuning_NLU Overview 这个小项目是受乐于分享的苏剑林大佬这篇p-tuning 文章启发,也实现了个使用P-tuning进行NLU分类的任务, 思路是一样的,prompt实现方式有不同,这里是将[unused*]的embeddings参数抽取出用于初始化prompt_embed后

3 Dec 29, 2022
GSoC'2021 | TensorFlow implementation of Wav2Vec2

GSoC'2021 | TensorFlow implementation of Wav2Vec2

Vasudev Gupta 73 Nov 28, 2022
This is the source code of RPG (Reward-Randomized Policy Gradient)

RPG (Reward-Randomized Policy Gradient) Zhenggang Tang*, Chao Yu*, Boyuan Chen, Huazhe Xu, Xiaolong Wang, Fei Fang, Simon Shaolei Du, Yu Wang, Yi Wu (

40 Nov 25, 2022
Script and models for clustering LAION-400m CLIP embeddings.

clustering-laion400m Script and models for clustering LAION-400m CLIP embeddings. Models were fit on the first million or so image embeddings. A subje

Peter Baylies 22 Oct 04, 2022
DataCLUE: 国内首个以数据为中心的AI测评(含模型分析报告)

DataCLUE 以数据为中心的AI测评(DataCLUE) DataCLUE: A Chinese Data-centric Language Evaluation Benchmark 内容导引 章节 描述 简介 介绍以数据为中心的AI测评(DataCLUE)的背景 任务描述 任务描述 实验结果

CLUE benchmark 135 Dec 22, 2022
Translation for Trilium Notes. Trilium Notes 中文版.

Trilium Translation 中文说明 This repo provides a translation for the awesome Trilium Notes. Currently, I have translated Trilium Notes into Chinese. Test

743 Jan 08, 2023
Dope Wars game engine on StarkNet L2 roll-up

RYO Dope Wars game engine on StarkNet L2 roll-up. What TI-83 drug wars built as smart contract system. Background mechanism design notion here. Initia

104 Dec 04, 2022
Installation, test and evaluation of Scribosermo speech-to-text engine

Scribosermo STT Setup Scribosermo is a LGPL licensed, open-source speech recognition engine to "Train fast Speech-to-Text networks in different langua

Florian Quirin 3 Jun 20, 2022
Code for lyric-section-to-comment generation based on huggingface transformers.

CommentGeneration Code for lyric-section-to-comment generation based on huggingface transformers. Migrate Guyu model and code (both 12-layers and 24-l

Yawei Sun 8 Sep 04, 2021
This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.

Speech-Backbones This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab. Grad-TTS Official implementation of the Grad-

HUAWEI Noah's Ark Lab 295 Jan 07, 2023
Utilize Korean BERT model in sentence-transformers library

ko-sentence-transformers 이 프로젝트는 KoBERT 모델을 sentence-transformers 에서 보다 쉽게 사용하기 위해 만들어졌습니다. Ko-Sentence-BERT-SKTBERT 프로젝트에서는 KoBERT 모델을 sentence-trans

Junghyun 40 Dec 20, 2022
ADCS cert template modification and ACL enumeration

Purpose This tool is designed to aid an operator in modifying ADCS certificate templates so that a created vulnerable state can be leveraged for privi

Fortalice Solutions, LLC 78 Dec 12, 2022
NLP Core Library and Model Zoo based on PaddlePaddle 2.0

PaddleNLP 2.0拥有丰富的模型库、简洁易用的API与高性能的分布式训练的能力,旨在为飞桨开发者提升文本建模效率,并提供基于PaddlePaddle 2.0的NLP领域最佳实践。

6.9k Jan 01, 2023