Article Reranking by Memory-enhanced Key Sentence Matching for Detecting Previously Fact-checked Claims.

Overview

This is the official repository of the paper:

Article Reranking by Memory-enhanced Key Sentence Matching for Detecting Previously Fact-checked Claims.

Qiang Sheng, Juan Cao, Xueyao Zhang, Xirong Li, and Lei Zhong.

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)

PDF / Poster / Code / Chinese Dataset / Chinese Blog 1 / Chinese Blog 2

Datasets

There are two experimental datasets: the Twitter Dataset and the Weibo Dataset newly proposed in this paper. Note that the Weibo Dataset can be downloaded only after an "Application to Use the Chinese Dataset for Detecting Previously Fact-Checked Claim" has been submitted.

Code

Key Requirements

python==3.6.10
torch==1.6.0
torchvision==0.7.0
transformers==3.2.0
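
If you manage packages with pip, the pinned versions above can be installed in one step (a minimal sketch; GPU-enabled torch builds may need the wheels from the official PyTorch site):

pip install torch==1.6.0 torchvision==0.7.0 transformers==3.2.0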

Usage for Weibo Dataset

After you download the dataset (the access procedure is described in the Datasets section above), move FN_11934_filtered.json and DN_27505_filtered.json into the path MTM/dataset/Weibo/raw:

mkdir MTM/dataset/Weibo/raw
mv FN_11934_filtered.json MTM/dataset/Weibo/raw
mv DN_27505_filtered.json MTM/dataset/Weibo/raw
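
As an optional sanity check (a sketch, not part of the official pipeline; it assumes the two raw files are UTF-8 JSON with a top-level list or dict), you can verify that both files parse:

python - <<'EOF'
import json

# Both raw files should load as JSON; print the top-level type and size.
for name in ['FN_11934_filtered.json', 'DN_27505_filtered.json']:
    with open(f'MTM/dataset/Weibo/raw/{name}', encoding='utf-8') as f:
        data = json.load(f)
    print(name, type(data).__name__, len(data))
EOF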

Preparation

Tokenize

cd MTM/preprocess/tokenize
sh run_weibo.sh

ROT

cd MTM/preprocess/ROT

You can refer to run_weibo.sh, which includes three steps:

  1. Prepare RougeBert's training data:

    python prepare_for_rouge.py --dataset Weibo --pretrained_model bert-base-chinese
    
  2. Training:

    CUDA_VISIBLE_DEVICES=0 python main.py --debug False \
    --dataset Weibo --pretrained_model bert-base-chinese --save './ckpts/Weibo' \
    --rouge_bert_encoder_layers 1 --rouge_bert_regularize 0.01 \
    --fp16 True
    

    Then you can get ckpts/Weibo/[EPOCH].pt.

  3. Vectorize the claims and articles (get embeddings):

    CUDA_VISIBLE_DEVICES=0 python get_embeddings.py \
    --dataset Weibo --pretrained_model bert-base-chinese \
    --rouge_bert_model_file './ckpts/Weibo/[EPOCH].pt' \
    --batch_size 1024 --embeddings_type static
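
To confirm the checkpoint loads before moving on (a sketch, not part of the official pipeline; replace [EPOCH] with the epoch number you actually got):

python - <<'EOF'
import torch

# Load on CPU so no GPU is required; the file layout is whatever main.py saved.
ckpt = torch.load('./ckpts/Weibo/[EPOCH].pt', map_location='cpu')
print(type(ckpt).__name__)
EOF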
    

PMB

cd MTM/preprocess/PMB

  1. Prepare the clustering data:

    mkdir data
    mkdir data/Weibo
    

    and you can get data/Weibo/clustering_training_data_[TS_SMALL]<[TS_LARGE].pkl after running calculate_init_thresholds.ipynb.

  2. K-means clustering. You can refer to run_weibo.sh:

    python kmeans_clustering.py --dataset Weibo --pretrained_model bert-base-chinese \
    --clustering_data_file 'data/Weibo/clustering_training_data_[TS_SMALL]<[TS_LARGE].pkl'

    Then you can get data/Weibo/kmeans_cluster_centers.npy.
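
As an optional check (a sketch, not part of the official pipeline; it assumes the file is a standard .npy array of cluster centers), you can inspect the centers that later initialize the pattern memory:

python - <<'EOF'
import numpy as np

# Expect a 2-D array: (number of clusters, embedding dimension).
centers = np.load('data/Weibo/kmeans_cluster_centers.npy')
print(centers.shape, centers.dtype)
EOF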

Besides, you can browse some examples of key sentence selection in key_sentences_selection_cases_Weibo.ipynb.

Training and Inference

cd MTM/model
mkdir data
mkdir data/Weibo

You can refer to run_weibo.sh:

CUDA_VISIBLE_DEVICES=0 python main.py --debug False --save 'ckpts/Weibo' \
--dataset 'Weibo' --pretrained_model 'bert-base-chinese' \
--rouge_bert_model_file '../preprocess/ROT/ckpts/Weibo/[EPOCH].pt' \
--memory_init_file '../preprocess/PMB/data/Weibo/kmeans_cluster_centers.npy' \
--claim_sentence_distance_file './data/Weibo/claim_sentence_distance.pkl' \
--pattern_sentence_distance_init_file './data/Weibo/pattern_sentence_distance_init.pkl' \
--memory_updated_step 0.3 --lambdaQ 0.6 --lambdaP 0.4 \
--selected_sentences 3 \
--lr 5e-6 --epochs 10 --batch_size 32

Then the results and ranking reports will be saved in ckpts/Weibo.
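
To review what was written (plain shell; the exact filenames depend on the code):

ls -lt ckpts/Weibo | head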

Usage for Twitter Dataset

The description of the dataset can be seen here.

Preparation

Tokenize

cd MTM/preprocess/tokenize
sh run_twitter.sh

ROT

cd MTM/preprocess/ROT

You can refer to run_twitter.sh, which includes three steps:

  1. Prepare RougeBert's training data:

    python prepare_for_rouge.py --dataset Twitter --pretrained_model bert-base-uncased
    
  2. Training:

    CUDA_VISIBLE_DEVICES=0 python main.py --debug False \
    --dataset Twitter --pretrained_model bert-base-uncased --save './ckpts/Twitter' \
    --rouge_bert_encoder_layers 1 --rouge_bert_regularize 0.05 \
    --fp16 True
    

    Then you can get ckpts/Twitter/[EPOCH].pt.

  3. Vectorize the claims and articles (get embeddings):

    CUDA_VISIBLE_DEVICES=0 python get_embeddings.py \
    --dataset Twitter --pretrained_model bert-base-uncased \
    --rouge_bert_model_file './ckpts/Twitter/[EPOCH].pt' \
    --batch_size 1024 --embeddings_type static
    

PMB

cd MTM/preprocess/PMB

  1. Prepare the clustering data:

    mkdir data
    mkdir data/Twitter
    

    and you can get data/Twitter/clustering_training_data_[TS_SMALL]<[TS_LARGE].pkl after running calculate_init_thresholds.ipynb.

  2. K-means clustering. You can refer to run_twitter.sh:

    python kmeans_clustering.py --dataset Twitter --pretrained_model bert-base-uncased \
    --clustering_data_file 'data/Twitter/clustering_training_data_[TS_SMALL]<[TS_LARGE].pkl'

    Then you can get data/Twitter/kmeans_cluster_centers.npy.

Besides, you can browse some examples of key sentence selection in key_sentences_selection_cases_Twitter.ipynb.

Training and Inference

cd MTM/model
mkdir data
mkdir data/Twitter

You can refer to run_twitter.sh:

CUDA_VISIBLE_DEVICES=0 python main.py --debug False --save 'ckpts/Twitter' \
--dataset 'Twitter' --pretrained_model 'bert-base-uncased' \
--rouge_bert_model_file '../preprocess/ROT/ckpts/Twitter/[EPOCH].pt' \
--memory_init_file '../preprocess/PMB/data/Twitter/kmeans_cluster_centers.npy' \
--claim_sentence_distance_file './data/Twitter/claim_sentence_distance.pkl' \
--pattern_sentence_distance_init_file './data/Twitter/pattern_sentence_distance_init.pkl' \
--memory_updated_step 0.3 --lambdaQ 0.6 --lambdaP 0.4 \
--selected_sentences 5 \
--lr 1e-4 --epochs 10 --batch_size 16

Then the results and ranking reports will be saved in ckpts/Twitter.

Citation

@inproceedings{MTM,
  author    = {Qiang Sheng and
               Juan Cao and
               Xueyao Zhang and
               Xirong Li and
               Lei Zhong},
  title     = {Article Reranking by Memory-Enhanced Key Sentence Matching for Detecting
               Previously Fact-Checked Claims},
  booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational
               Linguistics and the 11th International Joint Conference on Natural
               Language Processing, {ACL/IJCNLP} 2021},
  pages     = {5468--5481},
  publisher = {Association for Computational Linguistics},
  year      = {2021},
  url       = {https://doi.org/10.18653/v1/2021.acl-long.425},
  doi       = {10.18653/v1/2021.acl-long.425},
}