Data and code accompanying the paper Politics and Virality in the Time of Twitter

Last update: Jul 02, 2022

Overview

Politics and Virality in the Time of Twitter

Data and code accompanying the paper Politics and Virality in the Time of Twitter.

Specifically, this repository contains:

the code used to train our models (./code/finetune_models.py and ./code/finetune_multi_cv.py)
a Jupyter Notebook containing the main parts of our analysis (./code/analysis.ipynb)
the model that was selected and used for the sentiment analysis
the manually annotated data used for training (./data/annotation/)
the IDs of the tweets used in our analysis and control experiments (./data/main/ & ./data/control)
the names, parties and handles of the MPs that were tracked (./data/mps_list.csv)

Annotated Data (./data/annotation/)

One folder for each language (English, Spanish, Greek).
In each directory there are three files:
1. *_900.csv contains the 900 tweets that annotators labelled individually (300 tweets per annotator).
2. *_tiebreak_100.csv contains the initial 100 tweets that all annotators labelled. 'annotator_3' indicates the annotator who was used as a tiebreaker.
3. *_combined.csv contains all tweets labelled for the language.
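
As a minimal sketch of working with this layout, the snippet below locates the three files inside one language folder and reads a CSV into rows. Only the three suffixes come from the description above. The function names are hypothetical, the prefix in front of each suffix is matched with a wildcard because it is not specified here, and no column names are assumed.

```python
import csv
from pathlib import Path

# Suffixes taken from the file descriptions above; the part of the file
# name before each suffix is not specified, so it is matched with "*".
SUFFIXES = {
    "individual": "_900.csv",
    "tiebreak": "_tiebreak_100.csv",
    "combined": "_combined.csv",
}


def find_annotation_files(lang_dir):
    """Map each file role to the single matching CSV in one language folder."""
    lang_dir = Path(lang_dir)
    found = {}
    for role, suffix in SUFFIXES.items():
        matches = sorted(lang_dir.glob(f"*{suffix}"))
        if len(matches) != 1:
            raise FileNotFoundError(
                f"expected exactly one '*{suffix}' file in {lang_dir}, "
                f"found {len(matches)}"
            )
        found[role] = matches[0]
    return found


def read_rows(csv_path):
    """Read a CSV into a list of dicts keyed by whatever header the file has."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Calling find_annotation_files on one of the language folders under ./data/annotation/ should return the three paths, and read_rows(files["combined"]) then gives every labelled tweet for that language as a list of dictionaries.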

Model

We plan to upload all the models trained for our experiments to huggingface.co. For now, only the main model used in our analysis is available, at: https://drive.google.com/file/d/1_Ngmh-uHGWEbKHFpKmQ1DhVf6LtDTglx/view?usp=sharing

The model, 'xlm-roberta-sentiment-multilingual', is based on 'cardiffnlp/twitter-xlm-roberta-base-sentiment' and was further fine-tuned on the annotated dataset.

Example usage

from transformers import AutoModelForSequenceClassification, pipeline

# Load the fine-tuned model from a local copy of the downloaded model directory.
model = AutoModelForSequenceClassification.from_pretrained('./xlm-roberta-sentiment-multilingual/')
# The tokenizer is taken from the base model that the fine-tuned model builds on.
sentiment_analysis_task = pipeline("sentiment-analysis", model=model, tokenizer="cardiffnlp/twitter-xlm-roberta-base-sentiment")

sentiment_analysis_task('Today is a good day')
# Out: [{'label': 'Positive', 'score': 0.978614866733551}]
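
To score many tweets rather than a single sentence, the pipeline can be called on a list of strings, in which case it returns one prediction dictionary per input. The helper below is a hypothetical sketch, not part of this repository: it batches the inputs and reports the share of each predicted label. It accepts any callable with that list-in, list-out behaviour, so it makes no assumption about which label names the model emits.

```python
from collections import Counter


def label_distribution(classifier, texts, batch_size=32):
    """Return the share of each predicted label over `texts`.

    `classifier` is any callable that takes a list of strings and returns
    one {'label': ..., 'score': ...} dict per string, which is how the
    pipeline in the example above behaves when given a list.
    """
    counts = Counter()
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for prediction in classifier(batch):
            counts[prediction["label"]] += 1
    total = sum(counts.values())
    # An empty input yields an empty distribution instead of dividing by zero.
    return {label: n / total for label, n in counts.items()} if total else {}
```

With the pipeline from the example, label_distribution(sentiment_analysis_task, tweets) would return a dictionary mapping each predicted label to its fraction of the list tweets.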

Reference paper

For more details, please check the reference paper. If you use the data contained in this repository for your research, please cite the paper using the following BibTeX entry:

@inproceedings{antypas2022politics,
  title={{Politics and Virality in the Time of Twitter: A Large-Scale Cross-Party Sentiment Analysis in Greece, Spain and United Kingdom}},
  author={Antypas, Dimosthenis and Preece, Alun and Camacho-Collados, Jose},
  booktitle={arXiv preprint arXiv:2202.00396},
  year={2022}
}

Owner

Cardiff NLP
