How Effective is Incongruity? Implications for Code-mix Sarcasm Detection.

Last update: Jun 05, 2022

Overview

This repository contains the code for the following paper:

How Effective is Incongruity? Implications for Code-mix Sarcasm Detection.
Aditya Shah, Chandresh Kumar Maurya. In Proceedings of the 18th International Conference on Natural Language Processing (ICON 2021).

The presentation slides are available here

Requirements

Python 3.6 or higher
PyTorch >= 1.3.0
pytorch_transformers (also known as transformers)
pandas, NumPy, pickle
fastText

Download the fasttext embed file:

The fasttext embedding file can be obtained here

Dataset

We release a benchmark sarcasm dataset for the Hinglish language to facilitate further research on code-mix NLP.

We create the dataset using TweetScraper, which is built on top of Scrapy, to extract code-mix Hindi-English tweets. Each query combines search tags such as #sarcasm, #humor, #bollywood, and #cricket with the most commonly used code-mix Hindi words. All tweets with hashtags such as #sarcasm, #sarcastic, #irony, and #humor are treated as positive (sarcastic). Non-sarcastic tweets are extracted using general hashtags such as #politics, #food, and #movie. The balanced dataset comprises 166K tweets.
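The hashtag-based labeling rule described above can be sketched as follows. This is a minimal illustration, not the authors' scraping pipeline: the tag set contains only the four hashtags named in the text (the original list is longer), and the function name label_tweet is hypothetical. It also simplifies by labeling every tweet without a sarcasm tag as negative, whereas the dataset draws its negatives from specific general hashtags.

```python
import re

# Only the hashtags named in the text; the real list is longer ("etc.").
SARCASM_TAGS = {"#sarcasm", "#sarcastic", "#irony", "#humor"}
HASHTAG_RE = re.compile(r"#\w+")


def label_tweet(text: str) -> int:
    """Return 1 (sarcastic) if the tweet carries a sarcasm hashtag, else 0."""
    # Lowercase the tags so that #Sarcasm and #sarcasm are treated alike.
    tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
    return 1 if tags & SARCASM_TAGS else 0


sarcastic = label_tweet("Wah kya traffic hai aaj #Sarcasm")
neutral = label_tweet("Aaj ka khana mast tha #food")
```

Labels must be assigned before the cleaning step, since hashtags are removed during preprocessing.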

Finally, we preprocess and clean the data by removing URLs, hashtags, mentions, and punctuation. The respective files can be found here as train.csv, val.csv, and test.csv.
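A cleaning step of this kind might look like the sketch below. The regular expressions and the function name clean_tweet are assumptions made for illustration; the repository's actual preprocessing may differ in order and detail.

```python
import re
import string

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
# Translation table that deletes ASCII punctuation only.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text: str) -> str:
    """Strip URLs, mentions, hashtags, and ASCII punctuation from a tweet."""
    # URLs go first so their punctuation does not leave fragments behind.
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    # Collapse the whitespace left by the removals.
    return " ".join(text.split())


cleaned = clean_tweet("@user kya baat hai! #sarcasm https://t.co/abc")
```

Here the whole hashtag token is dropped, not just the # symbol, so the label-bearing tags do not leak into the model input.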

Arguments:

--epochs: total number of epochs to run, default=10

--batch-size: training batch size, default=2

--lr: learning rate for the model, default=5.16e-05

--hidden_size_lstm: hidden size of lstm, default=1024

--hidden_size_linear: hidden size of linear layer, default=128

--seq_len: sequence length of input text, default=56

--clip: gradient clipping, default=0.218

--dropout: dropout value, default=0.198

--num_layers: number of lstm layers, default=1

--lstm_bidirectional: bidirectional lstm, default=False

--fasttext_embed_file: path to fasttext embedding file, default='new_hing_emb'

--train_dir: path to train file, default='train.csv'

--valid_dir: path to validation file, default='valid.csv'

--test_dir: path to test file, default='test.csv'

--checkpoint_dir: path to the saved model checkpoint, default='selfnet.pt'

--test: testing the model, default=False
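For reference, the arguments above could be declared with argparse roughly as follows. This is a sketch built from the flag names and defaults listed here, not a copy of main.py. The helper str2bool and the function build_parser are hypothetical names. The helper is included because the test command passes the literal string True, and argparse with type=bool would treat any non-empty string, including "False", as true.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse 'True'/'False' style strings into booleans (hypothetical helper)."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Code-mix sarcasm detection")
    parser.add_argument("--epochs", type=int, default=10)
    # argparse exposes --batch-size as args.batch_size.
    parser.add_argument("--batch-size", type=int, default=2)
    parser.add_argument("--lr", type=float, default=5.16e-05)
    parser.add_argument("--hidden_size_lstm", type=int, default=1024)
    parser.add_argument("--hidden_size_linear", type=int, default=128)
    parser.add_argument("--seq_len", type=int, default=56)
    parser.add_argument("--clip", type=float, default=0.218)
    parser.add_argument("--dropout", type=float, default=0.198)
    parser.add_argument("--num_layers", type=int, default=1)
    parser.add_argument("--lstm_bidirectional", type=str2bool, default=False)
    parser.add_argument("--fasttext_embed_file", type=str, default="new_hing_emb")
    parser.add_argument("--train_dir", type=str, default="train.csv")
    parser.add_argument("--valid_dir", type=str, default="valid.csv")
    parser.add_argument("--test_dir", type=str, default="test.csv")
    parser.add_argument("--checkpoint_dir", type=str, default="selfnet.pt")
    parser.add_argument("--test", type=str2bool, default=False)
    return parser


# Parse an explicit argument list here; a script would call parse_args() with no arguments.
args = build_parser().parse_args(["--test", "True", "--epochs", "5"])
```

With this setup, omitting a flag falls back to the default listed above, so running the script with no arguments trains with the default settings.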

Train

python main.py

Test

python main.py --test True
