Chinese NER with albert/electra or other BERT-derived models (Keras)

Overview

Chinese NLP (albert/electra with Keras)

Named Entity Recognition

Project Structure

./
├── NER
│   ├── __init__.py
│   ├── log                                     training nohup logs
│   │   ├── albert.out
│   │   ├── albert_crf.out
│   │   ├── electra.out
│   │   ├── electra_crf.out
│   │   ├── electra_regulization.out
│   │   └── electra_tiny.out
│   └── train.py
├── README.md
├── albert_base_google_zh                       albert_base weights
│   ├── albert_config.json
│   ├── albert_model.ckpt.data-00000-of-00001
│   ├── albert_model.ckpt.index
│   ├── checkpoint
│   └── vocab.txt
├── albert_tiny_google_zh                       albert_tiny weights
│   ├── albert_config.json
│   ├── albert_model.ckpt.data-00000-of-00001
│   ├── albert_model.ckpt.index
│   ├── checkpoint
│   └── vocab.txt
├── chinese_electra_small_ex_L-24_H-256_A-4     electra_small weights
│   ├── electra_small_ex.data-00000-of-00001
│   ├── electra_small_ex.index
│   ├── electra_small_ex.meta
│   ├── small_ex_discriminator_config.json
│   ├── small_ex_generator_config.json
│   └── vocab.txt
├── data                                        dataset
│   ├── pulmonary.test
│   ├── pulmonary.train
│   └── sict_train.txt
├── electra_180g_base                           electra_base weights
│   ├── base_discriminator_config.json
│   ├── base_generator_config.json
│   ├── electra_180g_base.ckpt.data-00000-of-00001
│   ├── electra_180g_base.ckpt.index
│   ├── electra_180g_base.ckpt.meta
│   └── vocab.txt
├── environment.yaml                            conda environment spec file
├── main.py
├── path.py                                     all paths
├── requirements.txt
├── utils                                       bert4keras package (can also be installed via pip)
│   ├── __init__.py
│   ├── backend.py
│   ├── layers.py
│   ├── models.py
│   ├── optimizers.py
│   ├── snippets.py
│   └── tokenizers.py
└── weights                                     saved weight files
    ├── pulmonary_albert_ner.h5
    ├── pulmonary_electra_ner.h5
    └── pulmonary_electra_tiny_ner_crf.h5

9 directories, 48 files

Dataset

Pulmonary nodule dataset from a Grade-A tertiary hospital, 20,000+ characters, in BIO format, for example:

中	B-ORG
共	I-ORG
中	I-ORG
央	I-ORG
致	O
中	B-ORG
国	I-ORG
致	I-ORG
公	I-ORG
党	I-ORG
十	I-ORG
一	I-ORG
大	I-ORG
的	O
贺	O
词	O

ATTENTION: When preparing your own dataset, note:

  • Each character and its label are separated by a space ("\ ")
  • Sentences are separated by a blank line
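
For reference, a minimal loading sketch for this format (the helper name load_bio is illustrative and not part of the project's code; the file path comes from the data directory above):

def load_bio(path):
    """Read a BIO-tagged file: one 'char label' pair per line,
    sentences separated by blank lines."""
    sentences, chars, labels = [], [], []
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.rstrip('\n')
            if not line:                      # blank line closes the current sentence
                if chars:
                    sentences.append((chars, labels))
                    chars, labels = [], []
                continue
            char, label = line.split()        # character and tag are whitespace-separated
            chars.append(char)
            labels.append(label)
    if chars:                                 # in case the file has no trailing blank line
        sentences.append((chars, labels))
    return sentences

# e.g. train_data = load_bio('data/pulmonary.train')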

Steps

  1. Replace the dataset
  2. Adjust maxlen in NER/train.py (longer sentences are truncated, shorter ones are padded; ideally set MAX_SEQ_LEN to the length of the longest sentence in the training and test sets)
  3. Download the pretrained weights and place them in the project
  4. Update the paths in path.py
  5. Modify the model structure in NER/train.py as needed
  6. Before training, debug and inspect the train_generator data (see the sketch after this list)
  7. Train
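
For step 6, one way to eyeball a batch before training. This is a hedged sketch that assumes NER/train.py follows the usual bert4keras pattern, i.e. a data_generator subclass instance named train_generator with a forfit() iterator and a Tokenizer instance named tokenizer; the actual names in the project may differ:

# Temporary snippet placed just before model.fit(...) in NER/train.py
for (batch_token_ids, batch_segment_ids), batch_labels in train_generator.forfit():
    print('token_ids:', batch_token_ids.shape)    # (batch_size, maxlen)
    print('segments :', batch_segment_ids.shape)  # (batch_size, maxlen)
    print('labels   :', batch_labels.shape)       # (batch_size, maxlen) label ids
    # Decode the first sample to check that tokens line up with the BIO labels.
    print(tokenizer.decode(batch_token_ids[0]))
    print(batch_labels[0])
    break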

Model

albert

electra
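
The model structure itself lives in NER/train.py. As a rough orientation, below is a hedged sketch of the standard bert4keras recipe for this kind of NER model (Dense + CRF on top of a pretrained encoder), which matches the sparse_accuracy metric in the training logs further down; it is not a copy of the project's code. The bert4keras.* imports could equally be the vendored utils.* modules, and the paths and number of entity types are placeholders taken from the tree above.

from bert4keras.models import build_transformer_model
from bert4keras.layers import ConditionalRandomField
from keras.layers import Dense
from keras.models import Model
from keras.optimizers import Adam

# Placeholders: in this project they come from path.py and the label set.
config_path = 'electra_180g_base/base_discriminator_config.json'
checkpoint_path = 'electra_180g_base/electra_180g_base.ckpt'
num_labels = 2 * 5 + 1          # B-/I- per entity type plus O; 5 types is illustrative

# model='electra' for the electra weights, model='albert' for the albert ones
base = build_transformer_model(
    config_path=config_path,
    checkpoint_path=checkpoint_path,
    model='electra',
)

output = Dense(num_labels)(base.output)            # per-token label scores
CRF = ConditionalRandomField(lr_multiplier=1000)   # train the CRF layer with a larger lr
output = CRF(output)

model = Model(base.input, output)
model.compile(
    loss=CRF.sparse_loss,
    optimizer=Adam(1e-5),
    metrics=[CRF.sparse_accuracy],
)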

Train

Run NER/train.py

Evaluate

The F1 reported during training is the entity-level F1.
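
"Entity-level" means a predicted entity only counts as correct when both its span and its type match the gold annotation, which is stricter than the per-character sparse_accuracy in the logs. A rough sketch of how such an F1 can be computed from BIO sequences (not necessarily the exact code in NER/train.py):

def bio_to_spans(tags):
    """Convert a BIO tag sequence into a set of (start, end, type) spans, end exclusive."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ['O']):   # sentinel 'O' closes a trailing span
        if start is not None and (tag == 'O' or tag.startswith('B-') or tag[2:] != etype):
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith('B-'):
            start, etype = i, tag[2:]
    return spans

def entity_f1(gold_seqs, pred_seqs):
    """Micro-averaged entity-level precision/recall/F1 over BIO sequences."""
    tp = n_pred = n_gold = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = bio_to_spans(gold), bio_to_spans(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / max(n_pred, 1)
    recall = tp / max(n_gold, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return f1, precision, recall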

Best F1 with albert

Epoch 61/300
13/13 [==============================] - 16s 1s/step - loss: 0.1343 - sparse_accuracy: 0.9713
test:  f1: 0.82428, precision: 0.81775, recall: 0.83092

electra

Epoch 29/300
13/13 [==============================] - 16s 1s/step - loss: 0.3487 - sparse_accuracy: 0.9146
test:  f1: 0.83189, precision: 0.81579, recall: 0.84863