NLP: SLU tagging

Last update: Jan 14, 2022

Related tags

Text Data & NLP slu-homework

Overview

创建环境

conda create -n slu python=3.6
source activate slu
pip install torch==1.7.1

运行

训练：在根目录下运行

python scripts/slu_baseline.py

测试：在根目录下运行（将会读取test_unlabelled.json并在data目录下生成test.json）环境与原始相同

python scripts/slu_evaluate.py

代码说明

utils/args.py:定义了所有涉及到的可选参数，如需改动某一参数可以在运行的时候将命令修改成
```
  python scripts/slu_baseline.py --
      
      

      
     
```
其中，为要修改的参数名，为修改后的值
utils/initialization.py:初始化系统设置，包括设置随机种子和显卡/CPU
utils/vocab.py:构建编码输入输出的词表
utils/word2vec.py:读取词向量
utils/example.py:读取数据
utils/batch.py:将数据以批为单位转化为输入
model/slu_baseline_tagging.py:baseline模型
scripts/slu_baseline.py:主程序脚本

有关预训练语言模型

本次代码中没有加入有关预训练语言模型的代码，如需使用预训练语言模型我们推荐使用下面几个预训练模型，若使用预训练语言模型，不要使用large级别的模型

Bert: https://huggingface.co/bert-base-chinese
Bert-WWM: https://huggingface.co/hfl/chinese-bert-wwm-ext
Roberta-WWM: https://huggingface.co/hfl/chinese-roberta-wwm-ext
MacBert: https://huggingface.co/hfl/chinese-macbert-base

推荐使用的工具库

transformers
- 使用预训练语言模型的工具库: https://huggingface.co/
nltk
- 强力的NLP工具库: https://www.nltk.org/
stanza
- 强力的NLP工具库: https://stanfordnlp.github.io/stanza/
jieba
- 中文分词工具: https://github.com/fxsjy/jieba

Owner

北海若

Undergraduate, at SJTU & MSRA.

北海若

GitHub Repository

PyTorch Implementation of Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation

StyleSpeech - PyTorch Implementation PyTorch Implementation of Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation. Status (2021.06.09

142 Jan 06, 2023

Dense Passage Retriever - is a set of tools and models for open domain Q&A task.

Dense Passage Retrieval Dense Passage Retrieval (DPR) - is a set of tools and models for state-of-the-art open-domain Q&A research. It is based on the

1.1k Jan 07, 2023

It analyze the sentiment of the user, whether it is postive or negative.

Sentiment-Analyzer-Tool It analyze the sentiment of the user, whether it is postive or negative. It uses streamlit library for creating this sentiment

18 Dec 17, 2022

✨Rubrix is a production-ready Python framework for exploring, annotating, and managing data in NLP projects.

✨A Python framework to explore, label, and monitor data for NLP projects

1.5k Jan 02, 2023

Use the power of GPT3 to execute any function inside your programs just by giving some doctests

gptrun Don't feel like coding today? Use the power of GPT3 to execute any function inside your programs just by giving some doctests. How is this diff

11 Nov 11, 2022

Text classification on IMDB dataset using Keras and Bi-LSTM network

Text classification on IMDB dataset using Keras and Bi-LSTM Text classification on IMDB dataset using Keras and Bi-LSTM network. Usage python3 main.py

2 Sep 27, 2022

DLO8012: Natural Language Processing & CSL804: Computational Lab - II

NATURAL-LANGUAGE-PROCESSING-AND-COMPUTATIONAL-LAB-II DLO8012: NLP & CSL804: CL-II [SEMESTER VIII] Syllabus NLP - Reference Books THE WALL MEGA SATISH

7 Apr 28, 2022

Nystromformer: A Nystrom-based Algorithm for Approximating Self-Attention

Nystromformer: A Nystrom-based Algorithm for Approximating Self-Attention April 6, 2021 We extended segment-means to compute landmarks without requiri

322 Jan 01, 2023

Python implementation of TextRank for phrase extraction and summarization of text documents

PyTextRank PyTextRank is a Python implementation of TextRank as a spaCy pipeline extension, used to: extract the top-ranked phrases from text document

1.9k Jan 06, 2023

Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing

Token Shift GPT Implementation of Token Shift GPT - An autoregressive model that relies solely on shifting along the sequence dimension and feedforwar

32 Oct 14, 2022

Ελληνικά νέα (Python script) / Greek News Feed (Python script)

Ελληνικά νέα (Python script) / Greek News Feed (Python script) Ελληνικά English Το 2017 είχα υλοποιήσει ένα Python script για να εμφανίζει τα τωρινά ν

1 Jun 14, 2022

PyABSA - Open & Efficient for Framework for Aspect-based Sentiment Analysis

PyABSA - Open & Efficient for Framework for Aspect-based Sentiment Analysis

567 Jan 07, 2023

sangha, pronounced "suhng-guh", is a social networking, booking platform where students and teachers can share their practice.

Flask React Project This is the backend for the Flask React project. Getting started Clone this repository (only this branch) git clone https://github

17 Sep 29, 2021

Just a Basic like Language for Zeno INC

zeno-basic-language Just a Basic like Language for Zeno INC This is written in 100% python. this is basic language like language. so its not for big p

1 Dec 18, 2021

Fine-tuning scripts for evaluating transformer-based models on KLEJ benchmark.

The KLEJ Benchmark Baselines The KLEJ benchmark (Kompleksowa Lista Ewaluacji Językowych) is a set of nine evaluation tasks for the Polish language und

17 Oct 18, 2022

Transformer training code for sequential tasks

Sequential Transformer This is a code for training Transformers on sequential tasks such as language modeling. Unlike the original Transformer archite

578 Dec 13, 2022

Deep Learning for Natural Language Processing - Lectures 2021

This repository contains slides for the course "20-00-0947: Deep Learning for Natural Language Processing" (Technical University of Darmstadt, Summer term 2021).

0 Feb 21, 2022

Use AutoModelForSeq2SeqLM in Huggingface Transformers to train COMET

Training COMET using seq2seq setting Use AutoModelForSeq2SeqLM in Huggingface Transformers to train COMET. The codes are modified from run_summarizati

9 Dec 17, 2022

Easy-to-use CPM for Chinese text generation

CPM 项目描述 CPM（Chinese Pretrained Models）模型是北京智源人工智能研究院和清华大学发布的中文大规模预训练模型。官方发布了三种规模的模型，参数量分别为109M、334M、2.6B，用户需申请与通过审核，方可下载。由于原项目需要考虑大模型的训练和使用，需要安装较为复杂

382 Jan 07, 2023

A Plover python dictionary allowing for consistent symbol input with specification of attachment and capitalisation in one stroke.

Emily's Symbol Dictionary Design This dictionary was created with the following goals in mind: Have a consistent method to type (pretty much) every sy

68 Jan 07, 2023