An open collection of annotated voices in Japanese language

Last update: Dec 14, 2022

Related tags

Text Data & NLP koniwa

Overview

Koniwa (声庭): An open collection of annotated voices in Japanese language

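Summary

Koniwa (声庭) is an open collection of speech audio and annotations that anyone is free to use, modify, and redistribute.
(Commercial use is also permitted.)

Annotation work has only just begun. Contributions are welcome.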

File links

sound: Audio data (Google Drive)
source: Reference data (Google Drive): source texts and other material useful as a reference when annotating
data: Bibliographic information and annotation data

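Series

This collection currently uses the following open audio data. We are deeply grateful to everyone involved in making it public.

amagasaki: CC BY 4.0
- April 2011 to November 2015
- Radio programs from Amagasaki City, Hyogo Prefecture (FMあまがさき)
  - Mayor Inamura's "ひと咲きまち咲きあまがさき"
  - Mayor Inamura's "い～なこの街あまがさき" (renamed in November 2014)
free_culture_2012: CC BY 3.0
- August 2012
- J-WAVE radio program "J-WAVE 360° Forum 〜Seek and Find〜"
higashiyodogawa: CC BY 4.0
- November 2017 to July 2021
- Audio edition of "広報ひがしよどがわ", the public information bulletin of Higashiyodogawa Ward, Osaka City
librivox: Public domain
- Works recorded on LibriVox.org
- Some items, such as songs, are excluded
minato: CC BY 4.0
- May 2019 to December 2020
- Audio edition of "広報みなと", the public information bulletin of Minato Ward, Osaka City
nishiyodogawa: CC BY 4.0
- August 2018 to July 2021
- Audio edition of "きらり☆にしよど", the public information paper of Nishiyodogawa Ward, Osaka City
roudoku_toshokan: CC BY 2.1 JP (the source texts are in the public domain)
- Reading audio distributed by 池田英生's 朗読図書館 (Roudoku Toshokan)
tnc: CC BY 3.0 (the source texts are in the public domain)
- Reading audio by announcers of テレビ西日本 (Television Nishinippon)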

Licence

Licences for source texts and audio

Only audio released under one of the following is included in this collection.

Public domain
- PDM
- CC0
Creative Commons
- CC BY
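
As a rough illustration of the rule above, the following Python sketch checks each series' licence string against the accepted families. The series names and licence strings are taken from the series list; the `SERIES_LICENCES` table, the `ACCEPTED_PREFIXES` tuple, and the `is_accepted` helper are hypothetical names introduced here and are not part of the Koniwa repository.

```python
# Series names and licence strings are copied from the series list above.
SERIES_LICENCES = {
    "amagasaki": "CC BY 4.0",
    "free_culture_2012": "CC BY 3.0",
    "higashiyodogawa": "CC BY 4.0",
    "librivox": "Public domain",
    "minato": "CC BY 4.0",
    "nishiyodogawa": "CC BY 4.0",
    "roudoku_toshokan": "CC BY 2.1 JP",
    "tnc": "CC BY 3.0",
}

# Hypothetical prefixes used to recognise the accepted licence families.
# "CC BY " ends with a space so that variants such as "CC BY-SA" or
# "CC BY-NC" do not match.
ACCEPTED_PREFIXES = ("Public domain", "PDM", "CC0", "CC BY ")


def is_accepted(licence: str) -> bool:
    """Return True if the licence string belongs to an accepted family."""
    return licence == "CC BY" or licence.startswith(ACCEPTED_PREFIXES)


def rejected_series(table: dict[str, str]) -> list[str]:
    """List the series whose licence is outside the accepted families."""
    return [name for name, licence in table.items() if not is_accepted(licence)]
```

Under these assumptions, `rejected_series(SERIES_LICENCES)` returns an empty list, while a share-alike string such as `"CC BY-SA 4.0"` is rejected because the prefix match requires the space after `CC BY`.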

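Licences for annotations and documents

All of the following are licensed under CC0 1.0:

The derivative portions of annotations that qualify as derivative works
Original works in this repository, such as annotation comments and the annotation manual (excluding programs)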

Licence for programs

Programs are licensed under the Apache License 2.0.

Maintainer

shirayu

Owner

Koniwa project
