PyTorch implementation of Hierarchical Multi-label Text Classification: An Attention-based Recurrent Network

Last update: Dec 13, 2022

Overview

hierarchical-multi-label-text-classification-pytorch

Hierarchical Multi-label Text Classification: An Attention-based Recurrent Network Approach

This repository is a PyTorch implementation made with reference to this research project.

The main objective of the project is to solve the hierarchical multi-label text classification (HMTC) problem. Unlike flat multi-label text classification, HMTC assigns each instance (object) to multiple categories that are organized in a hierarchy. It is a fundamental but challenging task in numerous applications.

Introduction

Many real-world applications organize data in a hierarchical structure, where classes are specialized into subclasses or grouped into superclasses. For example, an electronic document (e.g., a web page, digital library entry, patent, or e-mail) is associated with multiple categories, and all these categories are stored hierarchically in a tree or Directed Acyclic Graph (DAG).

The hierarchy provides an elegant way to show the characteristics of the data and a multi-dimensional perspective from which to tackle the classification problem.

The figure shows an example of predefined labels in hierarchical multi-label classification of patent documents.

Documents are shown as colored rectangles, labels as rounded rectangles.
Circles in the rounded rectangles indicate that the corresponding document has been assigned the label.
Arrows indicate a hierarchical structure between labels.
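
As a rough illustration of the label structure described in this section, the sketch below stores a small label hierarchy as a mapping from each label to its parents (so it covers a DAG as well as a tree), expands a document's labels to include every ancestor, and turns the result into a multi-hot vector. The label names and the helpers `with_ancestors` and `multi_hot` are hypothetical, made up for this example; they are not taken from this repository or from the paper.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical label hierarchy, stored as label -> list of parent labels.
# "AB" has two parents, which makes the structure a DAG rather than a tree.
PARENTS: Dict[str, List[str]] = {
    "A": [],
    "B": [],
    "A1": ["A"],
    "A2": ["A"],
    "B1": ["B"],
    "AB": ["A1", "B1"],
}


def with_ancestors(labels: Iterable[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Return the given labels together with all of their ancestors."""
    seen: Set[str] = set()
    stack = list(labels)
    while stack:
        label = stack.pop()
        if label not in seen:
            seen.add(label)
            stack.extend(parents[label])
    return seen


def multi_hot(labels: Set[str], vocab: List[str]) -> List[int]:
    """Encode a label set as a 0/1 vector over a fixed label vocabulary."""
    return [1 if name in labels else 0 for name in vocab]


VOCAB = sorted(PARENTS)  # ['A', 'A1', 'A2', 'AB', 'B', 'B1']
doc_labels = with_ancestors(["A2", "B1"], PARENTS)
print(sorted(doc_labels))            # ['A', 'A2', 'B', 'B1']
print(multi_hot(doc_labels, VOCAB))  # [1, 0, 1, 0, 1, 1]
```

Storing parents rather than children keeps the ancestor lookup simple; a per-level encoding could be derived from the same mapping if a model needs one vector per hierarchy level.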

Data

See the data folder, which includes sample data files, for the data format.

Text Segment

You can use the jieba package if you need to segment Chinese text.
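
A minimal sketch of what that segmentation step might look like is shown below. The `segment` helper and its `cut` parameter are assumptions made for this example and are not part of this repository; `jieba.lcut` is the jieba call that returns a list of tokens. The sketch falls back to whitespace splitting when jieba is not installed, which is only meaningful for text that already contains spaces.

```python
from typing import Callable, List, Optional


def segment(text: str, cut: Optional[Callable[[str], List[str]]] = None) -> List[str]:
    """Split text into tokens, dropping tokens that are only whitespace.

    `cut` is a hypothetical hook: any function that maps a string to a list
    of tokens. When it is omitted, jieba is used if available.
    """
    if cut is None:
        try:
            import jieba  # optional dependency: pip install jieba
            cut = jieba.lcut
        except ImportError:
            # Assumption for this sketch: fall back to whitespace splitting.
            cut = str.split
    return [token for token in cut(text) if token.strip()]


if __name__ == "__main__":
    # With jieba installed this prints a list of Chinese word tokens.
    print(segment("我来到北京清华大学"))
    # A custom tokenizer can be passed explicitly.
    print(segment("hierarchical multi-label  text", cut=str.split))
```

Passing the tokenizer as a parameter keeps the rest of the preprocessing independent of the language of the corpus.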

Data Format

This repository can be used with other text classification datasets in two ways:

Convert your dataset into the same format as the sample.
Modify the data preprocessing code in data_helpers.py and data_loader.py.

Which approach is more suitable depends on your data and task.

Pre-trained Word Vectors

~~You can pre-train your word vectors (based on your corpus) in many ways:~~

~~Use the gensim package to pre-train word vectors.~~
~~Use the GloVe tools to pre-train word vectors.~~
~~You can even use a fastText network to pre-train word vectors.~~
This implementation uses an embedding layer, whereas the original paper uses word2vec.

Network Structure

Built with

Python 3.8
PyTorch
NumPy
scikit-learn

Owner

Mingu Kang
