The Easy-to-use Dialogue Response Selection Toolkit for Researchers

Overview

Easy-to-use toolkit for retrieval-based Chatbot

Our released data can be found at this link. Make sure the following steps are adopted to use our codes.

How to Use

  1. Init the repo

    Before using the repo, please run the following command to init:

    # create the necessay folders
    python init.py
    
    # prepare the environment
    # if some package cannot be installed, just google and install it from other ways
    pip install -r requirements.txt
  2. train the model

    ./scripts/train.sh <dataset_name> <model_name> <cuda_ids>
  3. test the model [rerank]

    ./scripts/test_rerank.sh <dataset_name> <model_name> <cuda_id>
  4. test the model [recal]

    # different recall_modes are available: q-q, q-r
    ./scripts/test_recall.sh <dataset_name> <model_name> <cuda_id>
  5. inference the responses and save into the faiss index

    Somethings inference will missing data samples, please use the 1 gpu (faiss-gpu search use 1 gpu quickly)

    It should be noted that: 1. For writer dataset, use extract_inference.py script to generate the inference.txt 2. For other datasets(douban, ecommerce, ubuntu), just cp train.txt inference.txt. The dataloader will automatically read the test.txt to supply the corpus.

    # work_mode=response, inference the response and save into faiss (for q-r matching) [dual-bert/dual-bert-fusion]
    # work_mode=context, inference the context to do q-q matching
    # work_mode=gray, inference the context; read the faiss(work_mode=response has already been done), search the topk hard negative samples; remember to set the BERTDualInferenceContextDataloader in config/base.yaml
    ./scripts/inference.sh <dataset_name> <model_name> <cuda_ids>

    If you want to generate the gray dataset for the dataset:

    # 1. set the mode as the **response**, to generate the response faiss index; corresponding dataset name: BERTDualInferenceDataset;
    ./scripts/inference.sh <dataset_name> response <cuda_ids>
    
    # 2. set the mode as the **gray**, to inference the context in the train.txt and search the top-k candidates as the gray(hard negative) samples; corresponding dataset name: BERTDualInferenceContextDataset
    ./scripts/inference.sh <dataset_name> gray <cuda_ids>
    
    # 3. set the mode as the **gray-one2many** if you want to generate the extra positive samples for each context in the train set, the needings of this mode is the same as the **gray** work mode
    ./scripts/inference.sh <dataset_name> gray-one2many <cuda_ids>

    If you want to generate the pesudo positive pairs, run the following commands:

    # make sure the dual-bert inference dataset name is BERTDualInferenceDataset
    ./scripts/inference.sh <dataset_name> unparallel <cuda_ids>
  6. deploy the rerank and recall model

    # load the model on the cuda:0(can be changed in deploy.sh script)
    ./scripts/deploy.sh <cuda_id>

    at the same time, you can test the deployed model by using:

    # test_mode: recall, rerank, pipeline
    ./scripts/test_api.sh <test_mode> <dataset>
  7. test the recall performance of the elasticsearch

    Before testing the es recall, make sure the es index has been built:

    # recall_mode: q-q/q-r
    ./scripts/build_es_index.sh <dataset_name> <recall_mode>
    # recall_mode: q-q/q-r
    ./scripts/test_es_recall.sh <dataset_name> <recall_mode> 0
  8. simcse generate the gray responses

    # train the simcse model
    ./script/train.sh <dataset_name> simcse <cuda_ids>
    # generate the faiss index, dataset name: BERTSimCSEInferenceDataset
    ./script/inference_response.sh <dataset_name> simcse <cuda_ids>
    # generate the context index
    ./script/inference_simcse_response.sh <dataset_name> simcse <cuda_ids>
    # generate the test set for unlikelyhood-gen dataset
    ./script/inference_simcse_unlikelyhood_response.sh <dataset_name> simcse <cuda_ids>
    # generate the gray response
    ./script/inference_gray_simcse.sh <dataset_name> simcse <cuda_ids>
    # generate the test set for unlikelyhood-gen dataset
    ./script/inference_gray_simcse_unlikelyhood.sh <dataset_name> simcse <cuda_ids>
Owner
GMFTBY
Those who are crazy enough to think they can change the world are the ones who can.
GMFTBY
TwitterBot-ImageCollector - Twitter bot that collects images from likes saves the image

TwitterBot-ImageCollector Bot de Twitter que recolecta imagenes a partir de los

Gx3 Studios 4 Jun 01, 2022
pokemon-colorscripts compatible for mac

Pokemon colorscripts some scripts to print out images of pokemons to terminal. Inspired by DT's colorscripts compilation Description Prints out colore

43 Jan 06, 2023
My Advent of Code solutions. I also upload videos of my solves: https://www.youtube.com/channel/UCuWLIm0l4sDpEe28t41WITA

My solutions to adventofcode.com puzzles. I post videos of me solving the puzzles in real-time at https://www.youtube.com/channel/UCuWLIm0l4sDpEe28t41

195 Jan 04, 2023
DeleteAllBot - Telegram bot to delete all messages in a group

Delete All Bot A star ⭐ from you means a lot to me ! Telegram bot to delete all

Stark Bots 15 Dec 26, 2022
A simple bot that looks for names and cpfs in the vaccination list made available by the government Fortaleza - CE

A simple bot that looks for names and cpfs in the vaccination list made available by the government Fortaleza - CE

Breno Aquino 1 Dec 21, 2021
An open-source, multipurpose, configurable discord bot that does it all

Spacebot is an open source discord bot that is designed to be fun, easy to use, and replace every other discord bot out there!! Feel free to add a star ⭐ to the repository to promote the project!

Dhravya Shah 41 Dec 10, 2022
Paid Udemy Courses with Coupons

Freedemy Paid Udemy Courses with Coupons Steps to run pip3 install -r requirements.txt python3 free-courses.py Then you can click the Enroll Link and

GOKUL A.P 23 Dec 14, 2022
📅 Calendar file generator for triathlonlive.tv upcoming events

Triathlon Live Calendar Calendar file generator for triathlonlive.tv upcoming events. Install Requires Python 3.9.4 and Poetry. $ poetry install Runni

Eduardo Cuducos 4 Sep 02, 2022
Clash of Clans developer unofficial api Wrapper to generate ip based token

Clash of Clans developer unofficial api Wrapper to generate ip based token

Aryan Vikash 6 Apr 01, 2022
An Open-Source Discord bot created to provide basic functionality which should be in every discord guild. We use this same bot with additional configurations for our guilds.

A Discord bot completely written to be taken from the source and built according to your own custom needs. This bot supports some core features and is

Tesseract Coding 14 Jan 11, 2022
ClearML - Auto-Magical Suite of tools to streamline your ML workflow. Experiment Manager, MLOps and Data-Management

ClearML - Auto-Magical Suite of tools to streamline your ML workflow Experiment Manager, MLOps and Data-Management ClearML Formerly known as Allegro T

ClearML 3.9k Jan 01, 2023
Enumerate Microsoft 365 Groups in a tenant with their metadata

Enumerate Microsoft 365 Groups in a tenant with their metadata Description The all_groups.py script allows to enumerate all Microsoft 365 Groups in a

Clément Notin 46 Dec 26, 2022
Repositório para a Live Coding do dia 22/12/2021 sobre AWS Step Functions

DIO Live Step Functions - 22/12/2021 Serviços AWS utilizados AWS Step Functions AWS Lambda Amazon S3 Amazon Rekognition Amazon DynamoDB Amazon Cloudwa

Cassiano Ricardo de Oliveira Peres 5 Mar 01, 2022
LOL-banner - A discord bot that bans anybody playing league of legends

LOL-banner A discord bot that bans anybody playing league of legends This bot ha

bsd_witch 46 Dec 17, 2022
Tomli is a Python library for parsing TOML. Tomli is fully compatible with TOML v1.0.0.

Tomli A lil' TOML parser Table of Contents generated with mdformat-toc Intro Installation Usage Parse a TOML string Parse a TOML file Handle invalid T

Taneli Hukkinen 313 Dec 26, 2022
CRUD database for python discord bot developers that stores data on discord text channels

Discord Database A CRUD (Create Read Update Delete) database for python Discord bot developers. All data is stored in key-value pairs directly on disc

Ankush Singh 7 Oct 22, 2022
this is a telegram torrent bot

owner of this repo :- AYUSH contact me :- AYUSH Slam Mirror Bot This is a telegram bot writen in python for mirroring files on the internet to our bel

AYUSH 237 Dec 16, 2021
Send alert to telegram use telegram cli

Run standalone: Rename conf.yml.example to conf.yml Change block cli(Add your api_id and hash) Install requirements.txt Run python AlertManagerTG.py I

Eugene Arkharov 1 Nov 12, 2021
Trellox Tool is written in Python3 and designed to pull and list Trello boards.

TrelloX Trellox Tool is written in Python3 and designed to list and pull Trello boards. It can be used by penetration testers/bug bounty hunters to de

Ali Fathi Ali Sawehli 1 Dec 05, 2021
Github Workflows üzerinde Çalışan A101 Aktüel Telegam Bot

A101AktuelRobot Github Workflows üzerinde Çalışan A101 Aktüel Telegam Bot @A101AktuelRobot 💸 Bağış Yap ☕️ Kahve Ismarla 🌐 Telif Hakkı ve Lisans Copy

Ömer Faruk Sancak 10 Nov 02, 2022