Overview

Validating Simulations of User Query Variants

This repository contains the scripts of the experiments and evaluations, simulated queries, as well as the figures of:

Timo Breuer, Norbert Fuhr, and Philipp Schaer. 2022. Validating Simulations of User Query Variants. In Proceedings of the 44th European Conference on IR Research, ECIR 2022.

System-oriented IR evaluations are limited to rather abstract understandings of real user behavior. As a solution, simulating user interactions provides a cost-efficient way to support system-oriented experiments with more realistic directives when no interaction logs are available. While there are several user models for simulated clicks or result list interactions, very few attempts have been made towards query simulations, and it has not been investigated if these can reproduce properties of real queries. In this work, we validate simulated user query variants with the help of TREC test collections in reference to real user queries that were made for the corresponding topics. Besides, we introduce a simple yet effective method that gives better reproductions of real queries than the established methods. Our evaluation framework validates the simulations regarding the retrieval performance, reproducibility of topic score distributions, shared task utility, effort and effect, and query term similarity when compared with real user query variants. While the retrieval effectiveness and statistical properties of the topic score distributions as well as economic aspects are close to that of real queries, it is still challenging to simulate exact term matches and later query reformulations.

Directory overview

| Directory | Description |
| --- | --- |
| config/ | Contains configuration files for the query simulations, experiments, and evaluations. |
| data/ | Contains (intermediate) output data of the simulations and experiments as well as the figures of the paper. |
| eval/ | Contains scripts of the experiments and evaluations. |
| sim/ | Contains scripts of the query simulations. |

Setup

  1. Install Anserini and index Core17 (The New York Times Annotated Corpus) according to the regression guide:
anserini/target/appassembler/bin/IndexCollection \
    -collection NewYorkTimesCollection \
    -input /path/to/core17/ \
    -index anserini/indexes/lucene-index.core17 \
    -generator DefaultLuceneDocumentGenerator \
    -threads 4 \
    -storePositions \
    -storeDocvectors \
    -storeRaw \
    -storeContents \
    > anserini/logs/log.core17 &
  2. Install the required Python packages:
pip install -r requirements.txt

Query simulation

To prepare the language models and simulate the queries, the scripts have to be executed in the order shown in the following table. All of the outputs can be found in the data/ directory. For better code readability, the names of the query reformulation strategies used in the paper have been mapped to consecutive numbers (paper name → name in code): S1 → S1; S2 → S2; S2' → S3; S3 → S4; S3' → S5; S4 → S6; S4' → S7; S4'' → S8. The names of the scripts and output files comply with this name mapping.
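The name mapping can also be written down as a small lookup table, which makes it easier to translate script and file names back to the strategy names used in the paper. This is only an illustrative sketch: `PAPER_TO_CODE`, `CODE_TO_PAPER`, and `paper_names` are hypothetical names and are not part of the repository.

```python
# Strategy names as used in the paper, mapped to the names used in the code.
PAPER_TO_CODE = {
    "S1": "S1",
    "S2": "S2",
    "S2'": "S3",
    "S3": "S4",
    "S3'": "S5",
    "S4": "S6",
    "S4'": "S7",
    "S4''": "S8",
}

# Reverse lookup: name in the code -> name in the paper.
CODE_TO_PAPER = {code: paper for paper, code in PAPER_TO_CODE.items()}


def paper_names(file_suffix):
    """Translate a suffix such as 's678' into the paper's strategy names."""
    digits = file_suffix.lstrip("s")
    return [CODE_TO_PAPER["S" + digit] for digit in digits]


print(paper_names("s12345"))
print(paper_names("s678"))
```

For example, `paper_names("s678")` yields the strategies S4, S4', and S4'', which are the strategies covered by sim/simulate_queries_s678.py.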

| Script | Description | Output files |
| --- | --- | --- |
| sim/make_background.py | Make the background language model from all index terms of Core17. The background model is required for Controlled Query Generation (CQG) by Jordan et al. | data/lm/background.csv |
| sim/make_cqg.py | Make the CQG language models with different parameters of lambda from 0.0 to 1.0. | data/lm/cqg.json |
| sim/simulate_queries_s12345.py | Simulate TTS and KIS queries with strategies S1 to S3'. | data/queries/s12345.csv |
| sim/simulate_queries_s678.py | Simulate TTS and KIS queries with strategies S4 to S4''. | data/queries/s678.csv |
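To illustrate the role of lambda in the CQG language models, the following minimal sketch interpolates a topic language model with a background language model and samples query terms from the result. It assumes the linear mixture P(t) = (1 - lambda) * P_topic(t) + lambda * P_background(t); the actual computation in sim/make_cqg.py may differ. The function names and the toy term probabilities are hypothetical and only serve as an example.

```python
import random


def mix_language_models(topic_lm, background_lm, lam):
    """Linearly interpolate two unigram language models (assumed mixture).

    lam = 0.0 uses only the topic model, lam = 1.0 only the background model.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0.0, 1.0]")
    vocabulary = set(topic_lm) | set(background_lm)
    return {
        term: (1.0 - lam) * topic_lm.get(term, 0.0)
        + lam * background_lm.get(term, 0.0)
        for term in vocabulary
    }


def sample_query(language_model, length, seed=None):
    """Draw query terms (with replacement) according to their probabilities."""
    rng = random.Random(seed)
    terms = sorted(language_model)
    weights = [language_model[term] for term in terms]
    return rng.choices(terms, weights=weights, k=length)


# Toy models for illustration only; the real models are built from Core17.
topic_lm = {"climate": 0.5, "change": 0.3, "effects": 0.2}
background_lm = {"the": 0.6, "of": 0.3, "climate": 0.1}

# One mixed model per lambda in 0.0, 0.1, ..., 1.0.
models = {
    step / 10: mix_language_models(topic_lm, background_lm, step / 10)
    for step in range(11)
}

print(sample_query(models[0.2], length=3, seed=42))
```

With a small lambda, the sampled terms mostly come from the topic model; with a lambda close to 1.0, they are dominated by frequent background terms.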

Experimental evaluation and results

To reproduce the experiments of the study, the scripts have to be executed in the order shown in the following table.

| Script | Description | Output files | Reproduction of ... |
| --- | --- | --- | --- |
| eval/arp.py, eval/arp_first.py, eval/arp_max.py | Retrieval performance: Evaluate the Average Retrieval Performance (ARP). | data/experimental_results/arp.csv, data/experimental_results/arp_first.csv, data/experimental_results/arp_max.csv | Tab. A.1 |
| eval/rmse_s12345.py, eval/rmse_s678.py | Retrieval performance: Evaluate the Root-Mean-Square-Error (RMSE). | data/experimental_results/rmse_map.csv, data/experimental_results/rmse_ndcg.csv, data/experimental_results/rmse_p1000.csv, data/experimental_results/rmse_uqv_vs_s12345_kis_ndcg.csv, data/experimental_results/rmse_uqv_vs_s12345_tts_ndcg.csv, data/figures/rmse_map.pdf, data/figures/rmse_ndcg.pdf, data/figures/rmse_p1000.pdf, data/figures/rmse_uqv_vs_s12345_kis_ndcg.pdf, data/figures/rmse_uqv_vs_s12345_tts_ndcg.pdf | Fig. A.1, Fig. 1 |
| eval/t-test.py | Retrieval performance: Evaluate the p-values of paired t-tests. | data/experimental_results/ttest.csv, data/figures/ttest.pdf | Fig. A.2 |
| eval/system_orderings.py | Shared task utility: Evaluate Kendall's tau between relative system orderings. | data/experimental_results/system_orderings.csv, data/figures/system_orderings.pdf | Fig. 2 (left) |
| eval/sdcg.py | Effort and effect: Evaluate the Session Discounted Cumulative Gain (sDCG). | data/experimental_results/sdcg_3queries.csv, data/experimental_results/sdcg_5queries.csv, data/experimental_results/sdcg_10queries.csv, data/figures/sdcg_3queries.pdf, data/figures/sdcg_5queries.pdf, data/figures/sdcg_10queries.pdf | Fig. 3 (top) |
| eval/economic.py | Effort and effect: Evaluate tradeoffs between number of queries and browsing depth by isoquants. | data/experimental_results/economic0.3.csv, data/experimental_results/economic0.4.csv, data/experimental_results/economic0.5.csv, data/figures/economic0.3.pdf, data/figures/economic0.4.pdf, data/figures/economic0.5.pdf | Fig. 3 (bottom) |
| eval/jaccard_similarity.py | Query term similarity: Evaluate query term similarities. | data/experimental_results/jacc.csv, data/figures/jacc.pdf | Fig. 2 (right) |
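As a rough illustration of three of the measures named in the table, the following dependency-free sketch computes the RMSE between per-topic scores, Kendall's tau between two system rankings, and the Jaccard similarity between the term sets of two queries. It uses the textbook definitions (tau-a without tie correction, whitespace tokenization, lowercasing); the scripts in eval/ may use other libraries, preprocessing, or variants. All function names and example values are hypothetical.

```python
import math
from itertools import combinations


def rmse(real_scores, simulated_scores):
    """Root-mean-square error between two equally long lists of topic scores."""
    if len(real_scores) != len(simulated_scores):
        raise ValueError("score lists must have the same length")
    squared = [(r - s) ** 2 for r, s in zip(real_scores, simulated_scores)]
    return math.sqrt(sum(squared) / len(squared))


def kendall_tau_a(ranks_a, ranks_b):
    """Kendall's tau-a between two rankings of the same systems (no ties)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(ranks_a)), 2):
        product = (ranks_a[i] - ranks_a[j]) * (ranks_b[i] - ranks_b[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    pairs = len(ranks_a) * (len(ranks_a) - 1) / 2
    return (concordant - discordant) / pairs


def jaccard(query_a, query_b):
    """Jaccard similarity between the term sets of two queries."""
    terms_a = set(query_a.lower().split())
    terms_b = set(query_b.lower().split())
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)


# Made-up example values, not results from the paper.
print(rmse([0.5, 0.3], [0.4, 0.5]))
print(kendall_tau_a([1, 2, 3, 4], [1, 2, 4, 3]))
print(jaccard("climate change effects", "effects of climate change"))
```

In the last example, the two queries share three of four distinct terms, so the Jaccard similarity is 0.75 even though the term order differs.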
Owner
IR Group at Technische Hochschule Köln