The official code for the paper "R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling".

Overview

R2D2

This is the official code for the paper "R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling". The current repo has been refactored from the original version used in the paper. If you run into any issues, please feel free to give us feedback.

Data

Train

Multi-GPU

To train from scratch on a single machine with multiple GPUs, use the script below:

CORPUS_PATH=
OUTPUT_PATH=
NODE_NUM=

python -m torch.distributed.launch \
    --nproc_per_node $NODE_NUM R2D2_trainer.py --batch_size 16 \
    --min_len 2 \
    --max_batch_len 512 \
    --max_line -1 \
    --corpus_path $CORPUS_PATH \
    --vocab_path data/en_bert/bert-base-uncased-vocab.txt \
    --config_path data/en_bert/config.json \
    --epoch 60 \
    --output_dir $OUTPUT_PATH \
    --window_size 4 \
    --input_type txt

Single-GPU

CORPUS_PATH=
OUTPUT_PATH=

python -m trainer.R2D2_trainer \
    --batch_size 16 \
    --min_len 2 \
    --max_batch_len 512 \
    --max_line -1 \
    --corpus_path $CORPUS_PATH \
    --vocab_path data/en_bert/bert-base-uncased-vocab.txt \
    --config_path data/en_bert/config.json \
    --epoch 10 \
    --output_dir $OUTPUT_PATH \
    --input_type txt

Evaluation

Evaluating the bidirectional language model task.

CORPUS_PATH=    # path to the training corpus
VOCAB_DIR=      # directory containing vocab.txt
MODEL_PATH=     # path to model.bin
CONFIG_PATH=    # path to config.json

python lm_eval_buckets.py \
    --model_name R2D2 \
    --dataset test \
    --config_path $CONFIG_PATH \
    --model_path $MODEL_PATH \
    --vocab_dir $VOCAB_DIR \
    --corpus_path $CORPUS_PATH

To evaluate F1 on constituency trees, please refer to https://github.com/harvardnlp/compound-pcfg/blob/master/compare_trees.py
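
For reference, here is a minimal sketch of the unlabeled bracketing F1 such a comparison computes (illustrative code only; the helper name span_f1 is ours, and the reported numbers should come from compare_trees.py above):

# Unlabeled bracketing F1 between a predicted and a gold tree, each
# represented as a set of (start, end) constituent span tuples.
def span_f1(pred_spans, gold_spans):
    pred, gold = set(pred_spans), set(gold_spans)
    matched = len(pred & gold)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: compare an induced tree with a gold tree over 4 tokens.
print(span_f1({(0, 2), (0, 4), (2, 4)}, {(0, 2), (0, 4), (1, 3)}))  # ≈ 0.667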

To evaluate compatibility with dependency trees, download the WSJ dataset and convert it to dependency trees with Stanford CoreNLP (https://stanfordnlp.github.io/CoreNLP/). As WSJ is not a free dataset, it is not included in our project. Please refer to the files in data/predict_trees for the exact format of the induced trees.

python eval_tree.py \
    --pred_tree_path path_to_tree_induced \
    --ground_truth_path path_to_dependency_trees \
    --vocab_dir $VOCAB_DIR

Ongoing work

  1. Re-implement the whole model to improve the GPU utilization ratio.
  2. Pre-train on a large corpus.

Contact

[email protected] and [email protected]


Comments
  • Question about perplexity measures with the original R2D2 model

    I have a few minor questions about the R2D2 PPPL measurements and their implementation.

    Q1: In the paper, PPPL is defined as exp(-(1/N) sum(L(S)))

    This makes sense. But in the evaluation code here,

                    log_p_sums, b_c, pppl = self.predictor(ids, self.bucket_size, self.get_bucket_id)
                    PPPL += (pppl - PPPL) / counter
                    print(PPPL, file=f_out)
    

    We are outputting PPPL without taking the exponential. I assume the numbers in the paper are actually 2^{PPPL}, right? Assuming base 2: if I simply load a random BERT model, the PPPL output here is around 10.4, and 2^{10.4} ≈ 1351, which is about right.
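
    For concreteness, here is a minimal sketch of the computation as I read it (the helper name corpus_pppl is mine, not from the repo):

        import math

        # Running mean of per-sentence average negative log-likelihoods,
        # matching `PPPL += (pppl - PPPL) / counter` in the snippet above.
        def corpus_pppl(pppl_values):
            running_mean = 0.0
            for counter, pppl in enumerate(pppl_values, start=1):
                running_mean += (pppl - running_mean) / counter
            # Exponentiate the averaged value, per the paper's
            # exp(-(1/N) sum(L(S))) definition; the printed number
            # above appears to skip this final step.
            return math.exp(running_mean)

        print(corpus_pppl([10.4, 10.1, 10.7]))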

    Q2: For pretraining the BERT baseline, are you loading the same dataset as in the link below, or some default Hugging Face dataset? https://github.com/alipay/StructuredLM_RTDT/tree/r2d2/data/en_wiki

    Sorry to throw random questions at you, but this would be very helpful for building something on top of this work.

    Thanks.

    opened by frankaging 4
  • A potential issue with the nn.MultiheadAttention module setup

    Hi Authors!

    Thanks for sharing this repo; I enjoyed reading your paper, and I am working on a related project. While going through the code, I found one potential issue with the current setup. Below I (1) explain the issue and (2) provide a simple test case that I ran on my end. Please help verify it.

    Issue:

    • The nn.MultiheadAttention module inside the BinaryEncoder module is set with batch_first=True; however, it seems we are passing in Q, K, V matrices whose first dimension is not the batch dimension.

    Code analysis: In r2d2.py, the encoder is called as follows:

            tasks_embedding = self.embedding(task_ids)  # (?, 2, dim)
            input_embedding = torch.cat([tasks_embedding, tensor_batch], dim=1)  # (?, 4, dim)
            outputs = self.tree_encoder(input_embedding.transpose(0, 1)).transpose(0, 1)  # (? * batch_size, 4, dim)
    

    We can see that the first dimension of input_embedding is definitely the batch size, since it is concatenated with embeddings from the nn.Embedding module. But before self.tree_encoder is called, .transpose(0, 1) moves the batch size to the second dimension; the first dimension is then always 4.
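
    For a standalone sanity check of the batch_first semantics (this toy snippet is mine, not from the repo):

        import torch
        import torch.nn as nn

        # With batch_first=True, nn.MultiheadAttention expects (batch, seq, dim)
        # inputs and returns averaged attention weights of shape (batch, seq, seq).
        attn = nn.MultiheadAttention(embed_dim=768, num_heads=8, batch_first=True)
        x = torch.randn(4, 8, 768)  # interpreted as batch=4, seq_len=8

        out, weights = attn(x, x, x)
        print(out.shape)      # torch.Size([4, 8, 768])
        print(weights.shape)  # torch.Size([4, 8, 8])

        # If the real batch is the second dimension (as after .transpose(0, 1)),
        # attention mixes the 8 entries along dim 1, i.e. across the true batch,
        # rather than across the 4 intended token positions.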

    Testing done: I simply added some logging inside TreeEncoderLayer:

        def forward(self, src, src_mask=None, pos_ids=None):
            """
            :param src: concatenation of task embeddings and representation for left and right.
                        src shape: (task_embeddings + left + right, batch_size, dim)
            :param src_mask:
            :param pos_ids:
            :return:
            """
            if len(pos_ids.shape) == 1:
                sz = src.shape[0]  # sz: batch_size
                pos_ids = pos_ids.unsqueeze(0).expand(sz, -1)  # (3, batch_size)
            position_embedding = self.position_embedding(pos_ids)
            print("pre: ", src.shape)
            print("pos_emb: ", position_embedding.shape)
            output = self.self_attn(src + position_embedding, src + position_embedding, src, attn_mask=src_mask)
            src2 = output[0]
            attn_weights = output[1]
            print("attn_w: ", attn_weights.shape)
            src = src + self.dropout1(src2)
            src = self.norm1(src)
            src2 = self.linear2(self.dropout(self.activation(self.linear1(src))))
            src = src + self.dropout2(src2)
            src = self.norm2(src)
            print("post: ", src.shape)
            return src
    

    And this is what I get:

    pre:  torch.Size([4, 8, 768])
    pos_emb:  torch.Size([4, 8, 768])
    attn_w:  torch.Size([4, 8, 8])
    post:  torch.Size([4, 8, 768])
    

    Summary: It seems that in r2d2.py, the self-attention is not over those 4 tokens (2 special prefix tokens plus the left and right child embeddings), but over the full collection of candidates with their children.

    I saw that this issue is not present in r2d2_cuda.py, which calls:

                outputs = self.tree_encoder(
                    input_embedding)  # (? * batch_size, 4, dim)
    

    This is great. I have not checked the rest of the r2d2_cuda.py code, though. With this, I am wondering whether the numbers in either of your papers need to be updated given this potential issue. Either way, I am not blocked by it, and I was inspired quite a lot by your codebase. Thanks!

    opened by frankaging 3
  • Questions about the backbone

    Hi authors, thank you very much for your contribution. I find this work very meaningful, and it feels like a new direction. I have two questions:

    1. The encoder uses a Transformer, an attention-based model whose strength largely comes from encoding useful contextual information via the attention mechanism. Here, however, each input consists of only 4 tokens ([SUM], [CLS], [token1], [token2]), so the context is short, and I feel a Transformer may not be the best fit. Have you tried other encoders, such as a GRU or a textCNN?
    2. Is there a way to parallelize the encoding? Although the Transformer has high time complexity, GPU parallelism largely solves the long-training-time problem. Judging from Figure E in the paper, CKY tree encoding encodes each token several times; could this actually make training take longer? For example, would a 3-layer R2D2 take longer to train on the same data than a 12-layer Transformer? Thank you!
    opened by wulaoshi 1