AllenNLP integration for Shiba: Japanese CANINE model

Last update: Feb 16, 2022

Overview

AllenNLP Integration for Shiba

allennlp-shiba-model is a Python library that provides AllenNLP integration for shiba-model.

SHIBA is an approximate reimplementation of CANINE [1] in raw PyTorch, pretrained on the Japanese Wikipedia corpus using random span masking. If you are unfamiliar with CANINE, you can think of it as a very efficient (approximately 4x as efficient) character-level BERT model. The name SHIBA comes from the Japanese dog breed of the same name.
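To make "character-level" concrete, the sketch below shows the general idea behind CANINE-style input: text is split into individual characters and each character is mapped to an integer ID, here simply its Unicode code point. This is an illustration only. The function names are hypothetical, and Shiba's actual tokenizer may assign IDs differently.

```python
from typing import List


def to_char_ids(text: str) -> List[int]:
    """Map each character to its Unicode code point (illustrative ID scheme)."""
    return [ord(ch) for ch in text]


def from_char_ids(ids: List[int]) -> str:
    """Invert to_char_ids by turning code points back into characters."""
    return "".join(chr(i) for i in ids)


text = "柴犬はかわいい"
ids = to_char_ids(text)

# One ID per character: no word segmentation or subword vocabulary is needed.
print(len(text), len(ids))
print(from_char_ids(ids) == text)
```

Because every character becomes its own input position, sequences are longer than with subword tokenization, which is why the efficiency of the architecture matters.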

Installation

Install the library and its dependencies with pip:

pip install allennlp-shiba
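One way to confirm that the install succeeded is to ask Python for the installed version of the distribution. The snippet below is a minimal sketch using only the standard library. The helper name installed_version is hypothetical, and the distribution name is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("allennlp-shiba"))
```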

Example

This library enables users to specify the Shiba tokenizer, token indexer, and token embedder in a Jsonnet config file. Here is an example config:

{
    "dataset_reader": {
        "tokenizer": {
            "type": "shiba",
        },
        "token_indexers": {
            "tokens": {
                "type": "shiba",
            }
        },
    },
    "model": {
        "shiba_embedder": {
            "type": "basic",
            "token_embedders": {
                "shiba": {
                    "type": "shiba",
                    "eval_model": true,
                }
            }
        }
    }
}
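The same structure can also be generated from Python, which may be convenient when configs are produced by a script. The sketch below mirrors the example above as a plain dictionary and serializes it to JSON; since Jsonnet is a superset of JSON, the output is also a valid Jsonnet document. The variable names are illustrative and not part of the library, and the config remains a fragment, just like the example it mirrors.

```python
import json

# Registered name used for all three Shiba components in the example above.
COMPONENT_TYPE = "shiba"

config = {
    "dataset_reader": {
        "tokenizer": {"type": COMPONENT_TYPE},
        "token_indexers": {"tokens": {"type": COMPONENT_TYPE}},
    },
    "model": {
        "shiba_embedder": {
            "type": "basic",
            "token_embedders": {
                "shiba": {"type": COMPONENT_TYPE, "eval_model": True},
            },
        },
    },
}

# JSON output; Python's True is written as the JSON/Jsonnet literal true.
config_text = json.dumps(config, indent=4)
print(config_text)
```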

Reference

Joshua Tanner and Masato Hagiwara (2021). SHIBA: Japanese CANINE model. GitHub repository, GitHub.

Releases (v0.1.1)

v0.1.1 (Jun 26, 2021)

v0.1.0 (Jun 26, 2021)

v0.0.1 (Jun 26, 2021)

Owner

Shunsuke KITADA
