Pretrained language models and their related optimization techniques developed by Huawei Noah's Ark Lab.

Last update: Jan 08, 2023

Overview

Pretrained Language Model

This repository provides the latest pretrained language models and their related optimization techniques developed by Huawei Noah's Ark Lab.

Directory structure

PanGu-α is a large-scale autoregressive pretrained Chinese language model with up to 200B parameters. The models are developed under MindSpore and trained on a cluster of Ascend 910 AI processors.
NEZHA-TensorFlow is a pretrained Chinese language model, developed under TensorFlow, that achieves state-of-the-art performance on several Chinese NLP tasks.
NEZHA-PyTorch is the PyTorch version of NEZHA.
NEZHA-Gen-TensorFlow provides two GPT models: Yuefu (乐府), a Chinese classical poetry generation model, and a general-purpose Chinese GPT model.
TinyBERT is a compressed BERT model that is 7.5x smaller and 9.4x faster at inference.
TinyBERT-MindSpore is a MindSpore version of TinyBERT.
DynaBERT is a dynamic BERT model with adaptive width and depth.
BBPE provides a byte-level vocabulary building tool and its corresponding tokenizer.
PMLM is a probabilistically masked language model. Trained without the complex two-stream self-attention, PMLM can be treated as a simple approximation of XLNet.
TernaryBERT is a weight ternarization method for BERT models, developed under PyTorch.
TernaryBERT-MindSpore is the MindSpore version of TernaryBERT.
HyperText is an efficient text classification model based on hyperbolic geometry theories.
BinaryBERT is a weight binarization method for BERT models that uses ternary weight splitting, developed under PyTorch.
AutoTinyBERT provides a model zoo that can meet different latency requirements.
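
To make the BBPE entry above more concrete, here is a minimal sketch of the general idea behind a byte-level vocabulary: text is encoded as UTF-8 bytes (a base vocabulary of 256 symbols), and frequent adjacent pairs are merged into new symbols. This is an illustration only and is not the tool shipped in the BBPE directory; the function names `to_byte_tokens`, `most_frequent_pair`, and `merge_pair` are hypothetical, and only a single merge step is shown.

```python
from collections import Counter


def to_byte_tokens(text):
    """Encode text as UTF-8 and return one integer token (0-255) per byte."""
    return list(text.encode("utf-8"))


def most_frequent_pair(tokens):
    """Return the most frequent adjacent token pair, or None if there is none."""
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return None
    return pairs.most_common(1)[0][0]


def merge_pair(tokens, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(new_id)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical example input: each Chinese character here is 3 UTF-8 bytes.
tokens = to_byte_tokens("乐府乐府")
pair = most_frequent_pair(tokens)
# Ids 0-255 are the raw bytes, so the first learned symbol gets id 256.
merged = merge_pair(tokens, pair, 256)
print(len(tokens), len(merged))
```

A real vocabulary builder would repeat the merge step over a large corpus until a target vocabulary size is reached; the single step here only shows the mechanism.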


