This repository contains code accompanying the paper "An End-to-End Chinese Text Normalization Model based on Rule-Guided Flat-Lattice Transformer"

Last update: Nov 28, 2022

Related tags

Deep Learning FlatTN

Overview

FlatTN

This repository contains code accompanying the paper "An End-to-End Chinese Text Normalization Model based on Rule-Guided Flat-Lattice Transformer" published on ICASSP 2022.

Requirement

Python: 3.7.3
PyTorch: 1.2.0
FastNLP: 0.5.0
Numpy: 1.16.4
fitlog

For more about FastNLP, please visit here. For Fitlog, please refer to this.

Dataset download

We release a large-scale Chinese Text Normalization (TN) Dataset in corporatioin with Databaker (Beijing) Technology Co., Ltd.

To download the dataset, please visit https://www.data-baker.com/en/#/data/index/TNtts.

(For Chinese version of the download page, please visit https://www.data-baker.com/data/index/TNtts.)

Data preprocessing

The raw dataset in jsonl format are saved at: dataset/processed/CN_TN_epoch-01-28645_2.jsonl

We preprocessed the data into the BMES format, and divided the data into train 、dev 、test by 8:1:1.

dataset/processed/shuffled_BMES
                      ├── train.char.bmes
                      ├── dev.char.bmes
                      └── test.char.bmes

An example of the processed data in BMES format is as follows:

2 B-DIGIT
0 M-DIGIT
1 M-DIGIT
5 E-DIGIT
年 S-SELF
， S-PUNC
只 S-SELF
剩 S-SELF
3 B-CARDINAL
9 E-CARDINAL
天 S-SELF
。 S-PUNC

You can re-run our code to preprocess and divide the raw dataset again:

cd dataset/processed
python preprocess.py

You can also used the following code to get statistics of all NSW categories of the data:

cd dataset/processed
python stat.py

Training

Our code are in version V1, run training code

cd V1
python flat_main.py --dataset databaker

Our proposed rule base are saved in a python file: V1/add_rule.py

Acknowledgement

Our code is based on Flat-Lattice-Transformer (FLAT) from LeeSureman.

For more information about FLAT, please refer to LeeSureman/Flat-Lattice-Transformer.

This repository contains code accompanying the paper "An End-to-End Chinese Text Normalization Model based on Rule-Guided Flat-Lattice Transformer"

Related tags

Overview

FlatTN

Requirement

Dataset download

Data preprocessing

Training

Acknowledgement

Owner

THUHCSI

Code for the Convolutional Vision Transformer (ConViT)

GemNet model in PyTorch, as proposed in "GemNet: Universal Directional Graph Neural Networks for Molecules" (NeurIPS 2021)

Binary Stochastic Neurons in PyTorch

This is the official implementation of "One Question Answering Model for Many Languages with Cross-lingual Dense Passage Retrieval".

Efficiently Disentangle Causal Representations

Nested cross-validation is necessary to avoid biased model performance in embedded feature selection in high-dimensional data with tiny sample sizes

An official PyTorch Implementation of Boundary-aware Self-supervised Learning for Video Scene Segmentation (BaSSL)

Code for paper [ACE: Ally Complementary Experts for Solving Long-Tailed Recognition in One-Shot] (ICCV 2021, oral))

ADOP: Approximate Differentiable One-Pixel Point Rendering

Transfer style api - An API to use with Tranfer Style App, where you can use two image and transfer the style

An efficient framework for reinforcement learning.

LogDeep is an open source deeplearning-based log analysis toolkit for automated anomaly detection.

Reimplementation of the paper "Attention, Learn to Solve Routing Problems!" in jax/flax.

Intel® Neural Compressor is an open-source Python library running on Intel CPUs and GPUs

Official code repository of the paper Learning Associative Inference Using Fast Weight Memory by Schlag et al.

The codes and related files to reproduce the results for Image Similarity Challenge Track 2.

(ICCV 2021) ProHMR - Probabilistic Modeling for Human Mesh Recovery

This is the pytorch implementation of the paper - Axiomatic Attribution for Deep Networks.

PFENet: Prior Guided Feature Enrichment Network for Few-shot Segmentation (TPAMI).

Astrostatistics class for the MSc degree in Astrophysics at the University of Milan-Bicocca (Italy)