Establishing Strong Baselines for TripClick Health Retrieval; ECIR 2022

TripClick Baselines with Improved Training Data

Welcome 🙌 to the hub-repo of our paper:

Establishing Strong Baselines for TripClick Health Retrieval
Sebastian Hofstätter, Sophia Althammer, Mete Sertkan and Allan Hanbury

https://arxiv.org/abs/2201.00365

tl;dr We create strong re-ranking and dense retrieval baselines (BERT_CAT, BERT_DOT, ColBERT, and TK) for TripClick (health ad-hoc retrieval). We improve the originally too noisy training data with a simple negative sampling policy. On TripClick, we achieve large gains over BM25 in both the re-ranking and the retrieval setting, which the original baselines did not achieve. We publish the improved training files for everyone to use.

If you have any questions or suggestions, or want to collaborate, please don't hesitate to contact us via Twitter or mail to [email protected]

Please cite our work as:

@misc{hofstaetter2022tripclick,
      title={Establishing Strong Baselines for TripClick Health Retrieval}, 
      author={Sebastian Hofst{\"a}tter and Sophia Althammer and Mete Sertkan and Allan Hanbury},
      year={2022},
      eprint={2201.00365},
      archivePrefix={arXiv},
      primaryClass={cs.IR}
}

Training Files

We publish the improved training files without the text content; instead, they use the ids from TripClick (with permission from the TripClick owners). For the text content, please get the full TripClick dataset from the TripClick GitHub page.

Our training files have the format query_id pos_passage_id neg_passage_id (with tab separation) and are available as a HuggingFace dataset: https://huggingface.co/datasets/sebastian-hofstaetter/tripclick-training
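
As a minimal sketch of how such a file could be read, the snippet below parses tab-separated triples into named tuples. The helper name `read_triples` and the sample ids are hypothetical and only illustrate the three-column format described above; resolving the ids to text still requires the full TripClick dataset.

```python
import io
from typing import Iterable, Iterator, NamedTuple


class Triple(NamedTuple):
    query_id: str
    pos_passage_id: str
    neg_passage_id: str


def read_triples(lines: Iterable[str]) -> Iterator[Triple]:
    """Yield one Triple per non-empty, tab-separated line."""
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"line {line_number}: expected 3 fields, got {len(fields)}")
        yield Triple(*fields)


# Hypothetical sample data; real files would be opened with open(path, encoding="utf-8").
sample = io.StringIO("q1\tp10\tp99\nq2\tp11\tp98\n")
triples = list(read_triples(sample))
```

Keeping the ids as strings avoids assumptions about their numeric range; they can be converted later if the downstream pipeline needs integers.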

Source Code

The full source code for our paper is available as part of our matchmaker library: https://github.com/sebastian-hofstaetter/matchmaker

We provide getting started guides for training re-ranking and retrieval models, as well as a range of evaluation setups.

Pre-Trained Models

Unfortunately, the license of TripClick does not allow us to publish the trained models.

TripClick Baselines Results

For more information and commentary on the results, please see our ECIR paper.

BM25 Top200 Re-Ranking

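| Model | BERT Instance | HEAD nDCG | HEAD MRR | TORSO nDCG | TORSO MRR | TAIL nDCG | TAIL MRR |
|---|---|---|---|---|---|---|---|
| Original Baselines | | | | | | | |
| BM25 | -- | .140 | .276 | .206 | .283 | .267 | .258 |
| ConvKNRM | -- | .198 | .420 | .243 | .347 | .271 | .265 |
| TK | -- | .208 | .434 | .272 | .381 | .295 | .280 |
| Our Improved Baselines | | | | | | | |
| TK | -- | .232 | .472 | .300 | .390 | .345 | .319 |
| ColBERT | SciBERT | .270 | .556 | .326 | .426 | .374 | .347 |
| ColBERT | PubMedBERT-Abstract | .278 | .557 | .340 | .431 | .387 | .361 |
| BERT_CAT | DistilBERT | .272 | .556 | .333 | .427 | .381 | .355 |
| BERT_CAT | BERT-Base | .287 | .579 | .349 | .453 | .396 | .366 |
| BERT_CAT | SciBERT | .294 | .595 | .360 | .459 | .408 | .377 |
| BERT_CAT | PubMedBERT-Full | .298 | .582 | .365 | .462 | .412 | .381 |
| BERT_CAT | PubMedBERT-Abstract | .296 | .587 | .359 | .456 | .409 | .380 |
| Ensemble (Last 3 BERT_CAT) | | .303 | .601 | .370 | .472 | .420 | .392 |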
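
For readers unfamiliar with the two metrics reported above, here is a minimal sketch of how MRR and nDCG can be computed for a single query. It is not the evaluation code used for the paper: the cutoff `k=10`, the linear gain in the nDCG formula, and the function names are assumptions made for illustration only.

```python
import math


def mrr_at_k(ranked_ids, relevant_ids, k=10):
    """Reciprocal rank of the first relevant document within the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids, gains, k=10):
    """nDCG with linear gain: DCG of the ranking divided by DCG of the ideal ranking."""

    def dcg(values):
        # Position i (0-based) is discounted by log2(i + 2).
        return sum(g / math.log2(i + 2) for i, g in enumerate(values))

    actual = dcg([gains.get(doc_id, 0.0) for doc_id in ranked_ids[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


# Hypothetical ranking for one query: the only relevant document is at rank 2.
ranking = ["d3", "d1", "d2"]
mrr = mrr_at_k(ranking, {"d1"})        # 0.5
ndcg = ndcg_at_k(ranking, {"d1": 1.0})
```

Per-query values like these would then be averaged over all queries of a split to obtain one number per table cell.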

Dense Retrieval Results

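| Model | BERT Instance | Head(DCTR) | | | | | |
|---|---|---|---|---|---|---|---|
| | | [email protected] | [email protected] | [email protected] | [email protected] | [email protected] | [email protected] |
| Original Baselines | | | | | | | |
| BM25 | -- | 31% | .140 | .276 | .499 | .621 | .834 |
| Our Improved Baselines | | | | | | | |
| BERT_DOT | DistilBERT | 39% | .236 | .512 | .550 | .648 | .813 |
| BERT_DOT | SciBERT | 41% | .243 | .530 | .562 | .640 | .793 |
| BERT_DOT | PubMedBERT | 40% | .235 | .509 | .582 | .673 | .828 |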