Bert Axioms

This repository contains the code for the paper "Diagnosing BERT with Retrieval Heuristics".

Required Data

To run this code, first download the dataset listed in the TREC 2019 Deep Learning Track guidelines. Specify the path to the data in the config file.

You also need a working installation of the Indri Toolkit for indexing and retrieval.

Parameters

A number of hyperparameters need to be set (such as the Indri path, the number of candidates to retrieve, and the random seed). Set them in the YAML config file at scripts/config-defaults.yaml. The parameters are handled by wandb, but the code can easily be adapted to any YAML reader (for example, PyYAML).
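As a rough illustration of reading the config without wandb, here is a minimal sketch using PyYAML. It assumes the file follows wandb's config-defaults.yaml convention, where each parameter may be wrapped in a mapping with a `value` field and an optional `desc` field. The function names and the keys in the usage note (`indri_path`, `random_seed`) are hypothetical and not taken from this repository.

```python
from typing import Any, Dict


def unwrap_wandb_defaults(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten wandb-style entries ({"value": ..., "desc": ...}) to plain values."""
    config: Dict[str, Any] = {}
    for key, entry in raw.items():
        if isinstance(entry, dict) and "value" in entry:
            # wandb-style entry: keep only the value, drop the description
            config[key] = entry["value"]
        else:
            # already a plain value, keep it unchanged
            config[key] = entry
    return config


def load_config(path: str = "scripts/config-defaults.yaml") -> Dict[str, Any]:
    """Read the YAML file with PyYAML and return a flat parameter dictionary."""
    import yaml  # PyYAML; imported here so unwrap_wandb_defaults has no dependencies

    with open(path, encoding="utf-8") as handle:
        raw = yaml.safe_load(handle) or {}
    return unwrap_wandb_defaults(raw)
```

With a file in that shape, `load_config()["random_seed"]` would return the plain seed value instead of the wrapping mapping. Check the actual key names against scripts/config-defaults.yaml before relying on them.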

Observations

Note that, for LNC2, we use external C++ code to interact with Indri. This lets us add the duplicated documents to the index without compromising scores. This code should be compiled with Indri's Makefile.app, which should be as easy as editing Makefile.app from Indri and running make -f Makefile.app (see https://lemur.sourceforge.io/indri/ for more details).

Removing documents from the Indri index does not guarantee that the index statistics change immediately. This can cause slight differences from the more "correct" approach of re-creating the index from scratch for every duplicated document.

Expected Results

The results from this repository may not exactly replicate the ones that appear in the paper, because of a few performance improvements made after the paper submission. These, however, do not change the final scores and conclusions. Mostly, you may see an increase in alpha-nDCG for all methods and an increase in QL performance across the board.

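| Method | `nDCG_cut` | `TFCI` | `TFCII` | `MTDC` | `LNC1` | `LNC2` | `TP` | `STMC1` | `STMC2` | `STMC3` |
|---|---|---|---|---|---|---|---|---|---|---|
| QL | 0.3633 | 0.9936 | 0.7008 | 0.8759 | 0.5021 | 1.000 | 0.3852 | 0.4855 | 0.7047 | 0.7011 |
| DistilBERT | 0.4537 | 0.6109 | 0.3945 | 0.5130 | 0.5006 | 0.0003 | 0.4105 | 0.5040 | 0.5120 | 0.5099 |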

Code for ECIR'20 paper Diagnosing BERT with Retrieval Heuristics

Owner

Arthur Câmara
