End-To-End Crowdsourcing

Overview

End-To-End Crowdsourcing

Comparison of traditional crowdsourcing approaches to a state-of-the-art end-to-end crowdsourcing approach LTNet on sentiment analysis. LTNet is adapted from "Facial Expression Recognition with Inconsistently Annotated Datasets" to text data. It encompasses a simple attention based neural network and utilizes confusion matrices as a noise reduction technique. For comparison, the traditional ground truth estimators "Fast-Dawid-Skene" and "MACE" are applied.

This codebase was used in both "End-to-End Annotator Bias Approximation on Crowdsourced Single-Label Sentiment Analysis" and "Deep End-to-End Learning for Noisy Annotations and Crowdsourcing in Natural Language Processing".

Training

This is an example training procedure for the TripAdvisor dataset. The dataset and solver objects are initialized before a standard LTNet model is trained for 300 epochs.

import torch
import pytz
import datetime

from datasets.tripadvisor import TripAdvisorDataset
from solver import Solver
from utils import *

# gpu
DEVICE = torch.device('cuda')

# cpu
# DEVICE = torch.device('cpu')

label_dim = 2
annotator_dim = 2
loss = 'nll'
one_dataset_one_annotator = False
dataset = TripAdvisorDataset(device=DEVICE, one_dataset_one_annotator=one_dataset_one_annotator)

lr = 1e-5
batch_size = 64
current_time = datetime.datetime.now(pytz.timezone('Europe/Berlin')).strftime("%Y%m%d-%H%M%S")
hyperparams = {'batch': batch_size, 'lr': lr}
writer = get_writer(path=f'../logs/test',
                    current_time=current_time, params=hyperparams)

solver = Solver(dataset, lr, batch_size, 
                writer=writer,
                device=DEVICE,
                label_dim=label_dim,
                annotator_dim=annotator_dim)

model, f1 = solver.fit(epochs=300, return_f1=True,
                       deep_randomization=True)

These initialization and training steps of a network are abstracted away into src/training. Scripts with many more details on training procedures and different configurations can be found in src/scripts. All are best loaded into an ipython terminal with the %load command.

Databases

How to use them from outside the src folder?

It makes us able to refer to the classes properly.

import sys
sys.path.append("src/")

Pass the root folders of the embeddings and the data.

from datasets.emotion import EmotionDataset

dataset = EmotionDataset(
        text_processor='word2vec', 
        text_processor_filters=['lowercase', 'stopwordsfilter'],
        embedding_path='data/embeddings/word2vec/glove.6B.50d.txt',
        data_path='data/'
        )

Datasets are available at "TripAdvisor", "Emotion" and "Organic".

TripAdvisor Dataset

code

from datasets.tripadvisor import TripAdvisorDataset

dataset = TripAdvisorDataset(text_processor='word2vec', text_processor_filters=['lowercase', 'stopwordsfilter'])

print(f'Dataset is in {dataset.mode} mode')
print(f'Train-Validation split is {dataset.train_val_split}')
print(f'1st train datapoint: {dataset[0]}')

output

Dataset is in train mode
Train-Validation split is 0.8
1st train datapoint: {'label': 0, 'annotator':'f', 'rating': 4, 'text': 'I realise ...', 'embedding': array}

Emotion Dataset

Every headline has been annotated on each emotion. One can select one emotion as the label by the set_emotion method.

code

from datasets.emotion import EmotionDataset

dataset = TripAdvisorDataset(text_processor='word2vec', text_processor_filters=['lowercase', 'stopwordsfilter'])

print(f'Dataset is in {dataset.mode} mode')
print(f'Train-Validation split is {dataset.train_val_split}')
dataset.set_emotion('anger')
print(f'1st train datapoint: {dataset[0]}') # select anger_label as label
dataset.set_emotion('disgust')
print(f'1st train datapoint: {dataset[0]}') # select disgust_label as label

output

Dataset is in train mode
Train-Validation split is 0.8
1st train datapoint: {'label': 0, 'annotator':'xxx1', 'anger_response':0, 'anger_label':0, 'anger_gold'=1, 'disgust_response':0 ... 'text': 'I realise ...', ... 'embedding': array}
1st train datapoint: {'label': 1, 'annotator':'xxx1', 'anger_response':0, 'anger_label':0, 'anger_gold'=1, 'disgust_response':0 ... 'text': 'I realise ...', ... 'embedding': array}
Owner
Andreas Koch
Robotics Graduate @ TU Munich
Andreas Koch
Very large and sparse networks appear often in the wild and present unique algorithmic opportunities and challenges for the practitioner

Sparse network learning with snlpy Very large and sparse networks appear often in the wild and present unique algorithmic opportunities and challenges

Andrew Stolman 1 Apr 30, 2021
A little Python application to auto tag your photos with the power of machine learning.

Tag Machine A little Python application to auto tag your photos with the power of machine learning. Report a bug or request a feature Table of Content

Florian Torres 14 Dec 21, 2022
Cleaned test data list of DukeMTMC-reID, ICCV2021

Cleaned DukeMTMC-reID Cleaned data list of DukeMTMC-reID released with our paper accepted by ICCV 2021: Learning Instance-level Spatial-Temporal Patte

14 Feb 19, 2022
Attentive Implicit Representation Networks (AIR-Nets)

Attentive Implicit Representation Networks (AIR-Nets) Preprint | Supplementary | Accepted at the International Conference on 3D Vision (3DV) teaser.mo

29 Dec 07, 2022
Code for our NeurIPS 2021 paper 'Exploiting the Intrinsic Neighborhood Structure for Source-free Domain Adaptation'

Exploiting the Intrinsic Neighborhood Structure for Source-free Domain Adaptation (NeurIPS 2021) Code for our NeurIPS 2021 paper 'Exploiting the Intri

Shiqi Yang 53 Dec 25, 2022
ConformalLayers: A non-linear sequential neural network with associative layers

ConformalLayers: A non-linear sequential neural network with associative layers ConformalLayers is a conformal embedding of sequential layers of Convo

Prograf-UFF 5 Sep 28, 2022
A PyTorch implementation: "LASAFT-Net-v2: Listen, Attend and Separate by Attentively aggregating Frequency Transformation"

LASAFT-Net-v2 Listen, Attend and Separate by Attentively aggregating Frequency Transformation Woosung Choi, Yeong-Seok Jeong, Jinsung Kim, Jaehwa Chun

Woosung Choi 29 Jun 04, 2022
Wordle-solver - Wordle answer generation program in python

🟨 Wordle Solver 🟩 Wordle answer generation program in python ✔️ Requirements U

Dahyun Kang 4 May 28, 2022
Code for Two-stage Identifier: "Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition"

Code for Two-stage Identifier: "Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition", accepted at ACL 2021. For details of the model and experiments, please see our paper.

tricktreat 87 Dec 16, 2022
Unimodal Face Classification with Multimodal Training

Unimodal Face Classification with Multimodal Training This is a PyTorch implementation of the following paper: Unimodal Face Classification with Multi

Wenbin Teng 3 Jul 06, 2022
An Implementation of Transformer in Transformer in TensorFlow for image classification, attention inside local patches

Transformer-in-Transformer An Implementation of the Transformer in Transformer paper by Han et al. for image classification, attention inside local pa

Rishit Dagli 40 Jul 25, 2022
This repository contains code to run experiments in the paper "Signal Strength and Noise Drive Feature Preference in CNN Image Classifiers."

Signal Strength and Noise Drive Feature Preference in CNN Image Classifiers This repository contains code to run experiments in the paper "Signal Stre

0 Jan 19, 2022
[CVPR 2021] NormalFusion: Real-Time Acquisition of Surface Normals for High-Resolution RGB-D Scanning

NormalFusion: Real-Time Acquisition of Surface Normals for High-Resolution RGB-D Scanning Project Page | Paper | Supplemental material #1 | Supplement

KAIST VCLAB 49 Nov 24, 2022
PyTorch implementation of DUL (Data Uncertainty Learning in Face Recognition, CVPR2020)

PyTorch implementation of DUL (Data Uncertainty Learning in Face Recognition, CVPR2020)

Mouxiao Huang 20 Nov 15, 2022
Instance-level Image Retrieval using Reranking Transformers

Instance-level Image Retrieval using Reranking Transformers Fuwen Tan, Jiangbo Yuan, Vicente Ordonez, ICCV 2021. Abstract Instance-level image retriev

UVA Computer Vision 87 Jan 03, 2023
Pytorch Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic

Pytorch Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic [Paper] [Colab is coming soon] Approach Example Usage To r

170 Jan 03, 2023
MonoScene: Monocular 3D Semantic Scene Completion

MonoScene: Monocular 3D Semantic Scene Completion MonoScene: Monocular 3D Semantic Scene Completion] [arXiv + supp] | [Project page] Anh-Quan Cao, Rao

298 Jan 08, 2023
🛠 All-in-one web-based IDE specialized for machine learning and data science.

All-in-one web-based development environment for machine learning Getting Started • Features & Screenshots • Support • Report a Bug • FAQ • Known Issu

Machine Learning Tooling 2.9k Jan 09, 2023
ChebLieNet, a spectral graph neural network turned equivariant by Riemannian geometry on Lie groups.

ChebLieNet: Invariant spectral graph NNs turned equivariant by Riemannian geometry on Lie groups Hugo Aguettaz, Erik J. Bekkers, Michaël Defferrard We

haguettaz 12 Dec 10, 2022
OpenLT: An open-source project for long-tail classification

OpenLT: An open-source project for long-tail classification Supported Methods for Long-tailed Recognition: Cross-Entropy Loss Focal Loss (ICCV'17) Cla

Ming Li 37 Sep 15, 2022