Constituency Tree Labeling Tool

The purpose of this package is to make labeling (annotating) constituency trees easier.

Judging from datasets labeled in the NLTK format, that format is somewhat counter-intuitive, and labeling data directly in it is tedious.

This package therefore provides a LabelTree class. You can use it to build a dataset from trees such as those in "convert example 1" and "convert example 2", and then call the label_tree_to_nltk method to convert them into data that conforms to the NLTK labeling format.
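The README does not document the LabelTree API, so the following is only a minimal sketch of what converting a tree into NLTK's format involves. It uses a hypothetical `Node` class and hypothetical helpers `to_bracketed` and `preterminal`, none of which are part of this package. The output is the parenthesized notation that `nltk.Tree.fromstring` reads. With NLTK installed, calling `pretty_print()` on the parsed tree should draw diagrams like the ones in the examples.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node: a label plus child nodes (a leaf has no children)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def to_bracketed(node: Node) -> str:
    """Serialize a tree into the parenthesized format read by nltk.Tree.fromstring."""
    if not node.children:  # a leaf is just the word itself
        return node.label
    inner = " ".join(to_bracketed(child) for child in node.children)
    return f"({node.label} {inner})"


def preterminal(tag: str, word: str) -> Node:
    """Build a POS-tag node over a single word, e.g. (VA 清新)."""
    return Node(tag, [Node(word)])


def make_clause() -> Node:
    """One clause of example 1: IP -> VP -> VA -> 清新."""
    return Node("IP", [Node("VP", [preterminal("VA", "清新")])])


example1 = Node("TOP", [Node("IP-HLN", [make_clause(), make_clause(), make_clause()])])

print(to_bracketed(example1))
# (TOP (IP-HLN (IP (VP (VA 清新))) (IP (VP (VA 清新))) (IP (VP (VA 清新)))))
```

Keeping words as childless nodes makes the serializer a single recursive rule. A leaf prints as its word, and every other node prints as `(label children...)`.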

examples

example1

NLTK example 1

     TOP      
      |        
    IP-HLN    
  ____|_____   
 IP   IP    IP
 |    |     |  
 VP   VP    VP
 |    |     |  
 VA   VA    VA
 |    |     |  
 清新   清新    清新
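In parenthesized notation, the tree above reads `(TOP (IP-HLN (IP (VP (VA 清新))) (IP (VP (VA 清新))) (IP (VP (VA 清新)))))`. In practice, NLTK's `Tree.fromstring` parses such strings. The dependency-free reader below, with the hypothetical helpers `parse` and `tagged_words`, is only an illustration of that structure and is not part of this package. It recovers each word together with its POS tag.

```python
import re
from typing import List, Tuple, Union

# A parsed tree is (label, [children]); a leaf is a plain string.
Tree = Union[str, Tuple[str, list]]

EXAMPLE1 = "(TOP (IP-HLN (IP (VP (VA 清新))) (IP (VP (VA 清新))) (IP (VP (VA 清新)))))"


def parse(text: str) -> Tree:
    """Parse one parenthesized tree with a small recursive-descent reader."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read() -> Tree:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok  # a word
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(read())
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def tagged_words(tree: Tree) -> List[Tuple[str, str]]:
    """Collect (word, POS tag) pairs from preterminal nodes, left to right."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        pairs.extend(tagged_words(child))
    return pairs


print(tagged_words(parse(EXAMPLE1)))
# [('清新', 'VA'), ('清新', 'VA'), ('清新', 'VA')]
```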

convert example 1

example2

NLTK example 2

                      TOP                 
                       |                   
                     IP-HLN               
                 ______|________________   
              IP-TPC              |     | 
     ___________|______           |     |  
    |                  VP         |     | 
    |            ______|_____     |     |  
    |         PP-DIR         |    |     | 
    |       ____|______      |    |     |  
NP-PN-SBJ  |           NP    VP NP-SBJ  VP
    |      |           |     |    |     |  
    NR     P           NN    VV   NN    VV
    |      |           |     |    |     |  
    广西     对           外     开放   成绩    斐然
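To make the tree above easier to read, the sketch below writes it as nested Python tuples of the form `(label, child, child, ...)` and lists the words each phrase covers. The tuple encoding and the helpers `leaves` and `constituents` are illustrative assumptions, not this package's data format. The same tree in parenthesized notation is `(TOP (IP-HLN (IP-TPC (NP-PN-SBJ (NR 广西)) (VP (PP-DIR (P 对) (NP (NN 外))) (VP (VV 开放)))) (NP-SBJ (NN 成绩)) (VP (VV 斐然))))`.

```python
# Example 2 as nested tuples: (label, child, child, ...); a leaf is a plain str.
# This encoding is an illustrative assumption, not this package's format.
EXAMPLE2 = (
    "TOP",
    ("IP-HLN",
     ("IP-TPC",
      ("NP-PN-SBJ", ("NR", "广西")),
      ("VP",
       ("PP-DIR", ("P", "对"), ("NP", ("NN", "外"))),
       ("VP", ("VV", "开放")))),
     ("NP-SBJ", ("NN", "成绩")),
     ("VP", ("VV", "斐然"))),
)


def leaves(tree):
    """Return the words under a (sub)tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def constituents(tree):
    """Yield (label, covered text) for every phrase-level node, in pre-order."""
    if isinstance(tree, str):
        return
    label, children = tree[0], tree[1:]
    # Skip preterminals (a POS tag directly over a single word).
    if not (len(children) == 1 and isinstance(children[0], str)):
        yield label, "".join(leaves(tree))
    for child in children:
        yield from constituents(child)


for label, text in constituents(EXAMPLE2):
    print(f"{label:10} {text}")
```

For instance, this lists `PP-DIR` as covering 对外 and `IP-TPC` as covering 广西对外开放.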

convert example 2

More examples can be found in the tests.

Owner

张宇