Code for Discovering Topics in Long-tailed Corpora with Causal Intervention.

Last update: Dec 16, 2022

Overview

Code for the paper "Discovering Topics in Long-tailed Corpora with Causal Intervention" (Findings of ACL 2021).

Usage

0. Prepare environment

Requirements:

python==3.6
tensorflow-gpu==1.13.1
scipy==1.5.2
scikit-learn==0.23.2
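
The pins above can be checked before running anything. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and only the package names and versions come from the list above.

```python
import sys

# Pins copied from the requirements list above.
PINNED = {
    "tensorflow-gpu": "1.13.1",
    "scipy": "1.5.2",
    "scikit-learn": "0.23.2",
}
PINNED_PYTHON = (3, 6)


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        from importlib import metadata  # available from Python 3.8
    except ImportError:
        metadata = None
    if metadata is not None:
        try:
            return metadata.version(package)
        except metadata.PackageNotFoundError:
            return None
    # Fallback for Python 3.6/3.7, where importlib.metadata does not exist.
    import pkg_resources
    try:
        return pkg_resources.get_distribution(package).version
    except pkg_resources.DistributionNotFound:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that is not met."""
    mismatches = {}
    for package, expected in pinned.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    if sys.version_info[:2] != PINNED_PYTHON:
        print("Python %d.%d expected, found %d.%d"
              % (PINNED_PYTHON + sys.version_info[:2]))
    for package, (expected, found) in find_mismatches(PINNED).items():
        print("%s: expected %s, found %s" % (package, expected, found))
```

An empty output means the interpreter and all three packages match the pinned versions.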

1. Prepare data

Download the preprocessed datasets from Google Drive and extract the files into ./data.

2. Run the model

python main.py --data_dir ./data/{dataset} --output_dir ./output
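
To train on several datasets in a row, the command above can be wrapped in a small driver. This is a sketch under assumptions: it treats every sub-directory of ./data as one dataset, and the helper names `build_command` and `list_datasets` are hypothetical. Only the `--data_dir` and `--output_dir` flags come from the command above.

```python
import os
import subprocess


def build_command(dataset, data_root="./data", output_dir="./output"):
    """Build the argument list for one run of main.py on `dataset`."""
    return [
        "python", "main.py",
        "--data_dir", "{}/{}".format(data_root, dataset),
        "--output_dir", output_dir,
    ]


def list_datasets(data_root="./data"):
    """Return the sorted names of the sub-directories of `data_root`."""
    if not os.path.isdir(data_root):
        return []
    return sorted(
        name for name in os.listdir(data_root)
        if os.path.isdir(os.path.join(data_root, name))
    )


if __name__ == "__main__":
    for dataset in list_datasets():
        # check=True stops the loop as soon as one run fails.
        subprocess.run(build_command(dataset), check=True)
```

Whether main.py keeps the results of different datasets apart inside one output directory is not stated here, so passing a separate `output_dir` per dataset may be the safer choice.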

3. Evaluation

topic coherence: coherence score.

topic diversity:

python utils/TU.py --data_path {path of topic word file}
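
The contents of utils/TU.py are not shown here. Its name suggests the Topic Uniqueness (TU) metric, which scores each top word by the inverse of the number of times it occurs among the top words of all topics, then averages over words and topics. Assuming that, and assuming the topic word file holds one topic per line with whitespace-separated top words, a minimal sketch of the computation could look like this (`topic_uniqueness` and `read_topics` are hypothetical names, not the script's API):

```python
from collections import Counter


def read_topics(path):
    """Read one topic per line; each line is a whitespace-separated word list."""
    with open(path, encoding="utf-8") as f:
        return [line.split() for line in f if line.strip()]


def topic_uniqueness(topics):
    """Return the mean TU over `topics`, a list of lists of top words."""
    # How often each word occurs among the top words of all topics.
    counts = Counter(word for topic in topics for word in topic)
    per_topic = [
        sum(1.0 / counts[word] for word in topic) / len(topic)
        for topic in topics
    ]
    return sum(per_topic) / len(per_topic)


if __name__ == "__main__":
    # Toy input: "apple" is shared by both topics, the other words are unique.
    toy_topics = [["apple", "banana"], ["apple", "cherry"]]
    print(topic_uniqueness(toy_topics))
```

The score is 1.0 when no top word is shared between topics and approaches 1/K (for K topics) when all topics list the same words.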

Citation

If you are interested in our work, please cite it as:

@inproceedings{wu2021discovering,
    title = "Discovering Topics in Long-tailed Corpora with Causal Intervention",
    author = "Wu, Xiaobao  and
    Li, Chunping  and
    Miao, Yishu",
    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-acl.15",
    doi = "10.18653/v1/2021.findings-acl.15",
    pages = "175--185",
}

Other related works

EMNLP 2020: Short Text Topic Modeling with Topic Distribution Quantization and Negative Sampling Decoder

NLPCC 2020: Learning Multilingual Topics with Neural Variational Inference

Owner

Xiaobao Wu
