Source code of SIGIR2021 Paper 'One Chatbot Per Person: Creating Personalized Chatbots based on Implicit Profiles'

Last update: Dec 06, 2022

Overview

DHAP

Source code of SIGIR2021 Long Paper:

One Chatbot Per Person: Creating Personalized Chatbots based on Implicit User Profiles .

Preinstallation

First, install the python packages in your Python3 environment:

  git clone https://github.com/zhengyima/DHAP.git DHAP
  cd DHAP
  pip install -r requirements.txt

Then, you should download the pre-trained word embeddings to initialize the model training. We provide two word embeddings in the Google Drive:

sgns.weibo.bigram-char, folloing Li et al., Chinese word embeddings pre-trained on Weibo. Google Drive
Fasttext embeddings, English word embedding pre-trained on Reddit set. Google Drive

You can pre-train your own embeddings(with the same format, i.e., the standard txt format), and use it in the model.

After downloading, you should put the embedding file to the path EMB_FILE.

Data

You should provide the dialogue history of users for training the model. For convenience, we provide a very small subset of PChatbot in the data/ as the demo data. In the direcotry, each user's dialogue history is saved in one text file. Each line in the file should contain post text, user id of post, post timestamp, response text, user id of response, response timestamp, _, _ , with tab as the seperator.

You can refer to seq2seq/dataset/perdialogDatasets.py for more details about the data processing.

If you are interested in the dataset PChatbot, please go to its official repository for more details.

Model Training

We provide a shell script scripts/train_chat.sh to start model pre-training. You should modify the DATA_DIR and EMB_FILE to your own paths. Then, you can start training by the following command:

bash scripts/train_chat.sh

The hyper-parameters are defined and set in the configParser.py.

After training, the trained checkpoints are saved in outputs. The inferenced result is saved in RESULT_FILE, which you define in bash scripts/train_chat.sh

Evaluating

For calculating varities of evaluation metrics(e.g. BLEU, P-Cover...), we provide a shell script scripts/eval.sh. You should modify the EMB_FILE to your own path, then evaluate the results by the following command:

bash scripts/eval.sh

Citations

If our code helps you, please cite our work by:

@inproceedings{DBLP:conf/sigir/madousigir21,
     author = {Zhengyi Ma and Zhicheng Dou and Yutao Zhu Hanxun Zhong and Ji-Rong Wen}, 
     title = {One Chatbot Per Person: Creating Personalized Chatbots based onImplicit User Profiles}, 
     booktitle = {Proceedings of the {SIGIR} 2021}, 
     publisher = {{ACM}}, 
     year = {2021}, 
     url = {https://doi.org/10.1145/3404835.3462828}, 
     doi = {10.1145/3404835.3462828}}

Source code of SIGIR2021 Paper 'One Chatbot Per Person: Creating Personalized Chatbots based on Implicit Profiles'

Related tags

Overview

DHAP

Preinstallation

Data

Model Training

Evaluating

Citations

Links

Owner

ZYMa

A numpy-based implementation of RANSAC for fundamental matrix and homography estimation. The degeneracy updating and local optimization components are included and optional.

A set of tests for evaluating large-scale algorithms for Wasserstein-2 transport maps computation.

This is the official implementation of TrivialAugment and a mini-library for the application of multiple image augmentation strategies including RandAugment and TrivialAugment.

Official Implementation of "Designing an Encoder for StyleGAN Image Manipulation"

Bridging Vision and Language Model

PyTorch implementation for MINE: Continuous-Depth MPI with Neural Radiance Fields

Official implementation of particle-based models (GNS and DPI-Net) on the Physion dataset.

Symbolic Music Generation with Diffusion Models

Generating Digital Painting Lighting Effects via RGB-space Geometry (SIGGRAPH2020/TOG2020)

PolyphonicFormer: Unified Query Learning for Depth-aware Video Panoptic Segmentation

Code release for The Devil is in the Channels: Mutual-Channel Loss for Fine-Grained Image Classification (TIP 2020)

Automatic caption evaluation metric based on typicality analysis.

CVPRW 2021: How to calibrate your event camera

The code repository for "PyCIL: A Python Toolbox for Class-Incremental Learning" in PyTorch.

This project hosts the code for implementing the ISAL algorithm for object detection and image classification

Code for One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning (AAAI 2022)

Bravia core script for python

PCAM: Product of Cross-Attention Matrices for Rigid Registration of Point Clouds

NeuralTalk is a Python+numpy project for learning Multimodal Recurrent Neural Networks that describe images with sentences.

Official implementation of Deep Convolutional Dictionary Learning for Image Denoising.