The source codes for ACL 2021 paper 'BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data'

Overview

BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data

This repository provides the implementation details for the ACL 2021 main conference paper:

BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data. [paper]

1. Data Preparation

In this work, we carried out persona-based dialogue generation experiments under a persona-dense scenario (English PersonaChat) and a persona-sparse scenario (Chinese PersonalDialog), with the assistance of a series of auxiliary inference datasets. Here we summarize the key information of these datasets and provide the links to download these datasets if they are directly accessible.

2. How to Run

The setup.sh script contains the necessary dependencies to run this project. Simply run ./setup.sh would install these dependencies. Here we take the English PersonaChat dataset as an example to illustrate how to run the dialogue generation experiments. Generally, there are three steps, i.e., tokenization, training and inference:

  • Preprocessing

     python preprocess.py --dataset_type convai2 \
     --trainset ./data/ConvAI2/train_self_original_no_cands.txt \
     --testset ./data/ConvAI2/valid_self_original_no_cands.txt \
     --nliset ./data/ConvAI2/ \
     --encoder_model_name_or_path ./pretrained_models/bert/bert-base-uncased/ \
     --max_source_length 64 \
     --max_target_length 32
    

    We have provided some data examples (dozens of lines) at the ./data directory to show the data format. preprocess.py reads different datasets and tokenizes the raw data into a series of vocab IDs to facilitate model training. The --dataset_type could be either convai2 (for English PersonaChat) or ecdt2019 (for Chinese PersonalDialog). Finally, the tokenized data will be saved as a series of JSON files.

  • Model Training

     CUDA_VISIBLE_DEVICES=0 python bertoverbert.py --do_train \
     --encoder_model ./pretrained_models/bert/bert-base-uncased/ \
     --decoder_model ./pretrained_models/bert/bert-base-uncased/ \
     --decoder2_model ./pretrained_models/bert/bert-base-uncased/ \
     --save_model_path checkpoints/ConvAI2/bertoverbert --dataset_type convai2 \
     --dumped_token ./data/ConvAI2/convai2_tokenized/ \
     --learning_rate 7e-6 \
     --batch_size 32
    

    Here we initialize encoder and both decoders from the same downloaded BERT checkpoint. And more parameter settings could be found at bertoverbert.py.

  • Evaluations

     CUDA_VISIBLE_DEVICES=0 python bertoverbert.py --dumped_token ./data/ConvAI2/convai2_tokenized/ \
     --dataset_type convai2 \
     --encoder_model ./pretrained_models/bert/bert-base-uncased/  \
     --do_evaluation --do_predict \
     --eval_epoch 7
    

    Empirically, in the PersonaChat experiment with default hyperparameter settings, the best-performing checkpoint should be found between epoch 5 and epoch 9. If the training procedure goes fine, there should be some results like:

     Perplexity on test set is 21.037 and 7.813.
    

    where 21.037 is the ppl from the first decoder and 7.813 is the final ppl from the second decoder. And the generated results is redirected to test_result.tsv, here is a generated example from the above checkpoint:

     persona:i'm terrified of scorpions. i am employed by the us postal service. i've a german shepherd named barnaby. my father drove a car for nascar.
     query:sorry to hear that. my dad is an army soldier.
     gold:i thank him for his service.
     response_from_d1:that's cool. i'm a train driver.
     response_from_d2:that's cool. i'm a bit of a canadian who works for america.  
    

    where d1 and d2 are the two BERT decoders, respectively.

  • Computing Infrastructure:

    • The released codes were tested on NVIDIA Tesla V100 32G and NVIDIA PCIe A100 40G GPUs. Notice that with a batch_size=32, the BoB model will need at least 20Gb GPU resources for training.

MISC

  • Build upon 🤗 Transformers.

  • Bibtex:

      @inproceedings{song-etal-2021-bob,
          title = "BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data",
          author = "Haoyu Song, Yan Wang, Kaiyan Zhang, Wei-Nan Zhang, Ting Liu",
          booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL-2021)",
          month = "Aug",
          year = "2021",
          address = "Online",
          publisher = "Association for Computational Linguistics",
      }
      
  • Email: [email protected].

中文语音识别系列,读者可以借助它快速训练属于自己的中文语音识别模型,或直接使用预训练模型测试效果。

MASR中文语音识别(pytorch版) 开箱即用 自行训练 使用与训练分离(增量训练) 识别率高 说明:因为每个人电脑机器不同,而且有些安装包安装起来比较麻烦,强烈建议直接用我编译好的docker环境跑 目前docker基础环境为ubuntu-cuda10.1-cudnn7-pytorch1.6.

发送小信号 180 Dec 17, 2022
Several simple examples for popular neural network toolkits calling custom CUDA operators.

Neural Network CUDA Example Several simple examples for neural network toolkits (PyTorch, TensorFlow, etc.) calling custom CUDA operators. We provide

WeiYang 798 Jan 01, 2023
An open source library for face detection in images. The face detection speed can reach 1000FPS.

libfacedetection This is an open source library for CNN-based face detection in images. The CNN model has been converted to static variables in C sour

Shiqi Yu 11.4k Dec 27, 2022
Fast Learning of MNL Model From General Partial Rankings with Application to Network Formation Modeling

Fast-Partial-Ranking-MNL This repo provides a PyTorch implementation for the CopulaGNN models as described in the following paper: Fast Learning of MN

Xingjian Zhang 3 Aug 19, 2022
DGN pymarl - Implementation of DGN on Pymarl, which could be trained by VDN or QMIX

This is the implementation of DGN on Pymarl, which could be trained by VDN or QM

4 Nov 23, 2022
Official repository for Natural Image Matting via Guided Contextual Attention

GCA-Matting: Natural Image Matting via Guided Contextual Attention The source codes and models of Natural Image Matting via Guided Contextual Attentio

Li Yaoyi 349 Dec 26, 2022
Weakly Supervised Text-to-SQL Parsing through Question Decomposition

Weakly Supervised Text-to-SQL Parsing through Question Decomposition The official repository for the paper "Weakly Supervised Text-to-SQL Parsing thro

14 Dec 19, 2022
Official implementation of the paper Label-Efficient Semantic Segmentation with Diffusion Models

Label-Efficient Semantic Segmentation with Diffusion Models Official implementation of the paper Label-Efficient Semantic Segmentation with Diffusion

Yandex Research 355 Jan 06, 2023
Bottom-up Human Pose Estimation

Introduction This is the official code of Rethinking the Heatmap Regression for Bottom-up Human Pose Estimation. This paper has been accepted to CVPR2

108 Dec 01, 2022
[ICLR 2021] "CPT: Efficient Deep Neural Network Training via Cyclic Precision" by Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra, Yingyan Lin

CPT: Efficient Deep Neural Network Training via Cyclic Precision Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra, Yingyan Lin Accep

26 Oct 25, 2022
An end-to-end project on customer segmentation

End-to-end Customer Segmentation Project Note: This project is in progress. Tools Used in This Project Prefect: Orchestrate workflows hydra: Manage co

Ocelot Consulting 8 Oct 06, 2022
A flexible framework of neural networks for deep learning

Chainer: A deep learning framework Website | Docs | Install Guide | Tutorials (ja) | Examples (Official, External) | Concepts | ChainerX Forum (en, ja

Chainer 5.8k Jan 06, 2023
Semantic graph parser based on Categorial grammars

Lambekseq "Everyone who failed Greek or Latin hates it." This package is for proving theorems in Categorial grammars (CG) and constructing semantic gr

10 Aug 19, 2022
计算机视觉中用到的注意力模块和其他即插即用模块PyTorch Implementation Collection of Attention Module and Plug&Play Module

PyTorch实现多种计算机视觉中网络设计中用到的Attention机制,还收集了一些即插即用模块。由于能力有限精力有限,可能很多模块并没有包括进来,有任何的建议或者改进,可以提交issue或者进行PR。

PJDong 599 Dec 23, 2022
Discriminative Region Suppression for Weakly-Supervised Semantic Segmentation

Discriminative Region Suppression for Weakly-Supervised Semantic Segmentation (AAAI 2021) Official pytorch implementation of our paper: Discriminative

Beom 74 Dec 27, 2022
Multi-Joint dynamics with Contact. A general purpose physics simulator.

MuJoCo Physics MuJoCo stands for Multi-Joint dynamics with Contact. It is a general purpose physics engine that aims to facilitate research and develo

DeepMind 5.2k Jan 02, 2023
ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation

ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation This repository contains the source code of our paper, ESPNet (acc

Sachin Mehta 515 Dec 13, 2022
PyTorch Implementation for Fracture Detection in Wrist Bone X-ray Images

wrist-d PyTorch Implementation for Fracture Detection in Wrist Bone X-ray Images note: Paper: Under Review at MPDI Diagnostics Submission Date: Novemb

Fatih UYSAL 5 Oct 12, 2022
Spearmint Bayesian optimization codebase

Spearmint Spearmint is a software package to perform Bayesian optimization. The Software is designed to automatically run experiments (thus the code n

Formerly: Harvard Intelligent Probabilistic Systems Group -- Now at Princeton 1.5k Dec 29, 2022
The Medical Detection Toolkit contains 2D + 3D implementations of prevalent object detectors such as Mask R-CNN, Retina Net, Retina U-Net, as well as a training and inference framework focused on dealing with medical images.

The Medical Detection Toolkit contains 2D + 3D implementations of prevalent object detectors such as Mask R-CNN, Retina Net, Retina U-Net, as well as a training and inference framework focused on dea

MIC-DKFZ 1.2k Jan 04, 2023