Translate .sbv subtitle files

Overview

deepl4subtitle

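Translates subtitle files (.sbv) using DeepL. The output keeps the timestamps, but during translation they are separated from the sentence text, so the result should be more accurate than feeding the .sbv file straight into a translator.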

Usage

As a preprocessing step, add a period (.) at the end of every sentence in the input .sbv file. This lets DeepL recognize sentence boundaries correctly.

# install deepl 
# https://pypi.org/project/deepl/
pip3 install deepl
python3 deepl4subtitle.py -i sample.sbv -o output.sbv -k YOUR_DEEPL_API_KEY

Sample

sample video: https://www.youtube.com/watch?v=CL7HuMLIPO0

  • sample.sbv: subtitles auto-generated by YouTube, lightly corrected by hand
  • sample_deepl4subtitle.sbv: translated with deepl4subtitle
  • sample_raw_deepl.sbv: the contents of sample.sbv pasted directly into DeepL and translated

In sample_raw_deepl, the timestamps are treated as part of the text, which produces questionable translations in many places. In sample_deepl4subtitle, these problems are mostly resolved.

How it works

original

(The periods at the end of each sentence have to be added by hand.)

0:00:01.340,0:00:04.780
クラウドコンピューティングという言葉を
知っているだろうか.

0:00:04.780,0:00:08.110
クラウドコンピューティングとは
インターネットの先にあるデータセンター

0:00:08.110,0:00:12.420
のサーバーに処理してもらうシステム形態
を指す言葉である.

↓ move each timestamp into an XML tag, remove newlines

<timestamp ts="0:00:01.340,0:00:04.780"/>クラウドコンピューティングという言葉を知っているだろうか.<timestamp ts="0:00:04.780,0:00:08.110"/>クラウドコンピューティングとはインターネットの先にあるデータセンター<timestamp ts="0:00:08.110,0:00:12.420"/>のサーバーに処理してもらうシステム形態を指す言葉である.
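
The step above can be sketched in Python roughly as follows. This is a minimal illustration, not the code in deepl4subtitle.py: the function name `sbv_to_tagged` and the timestamp regular expression are assumptions made for this example.

```python
import re

# Assumed shape of an .sbv timestamp line, e.g. "0:00:01.340,0:00:04.780".
TIMESTAMP_RE = re.compile(r"^\d+:\d{2}:\d{2}\.\d{3},\d+:\d{2}:\d{2}\.\d{3}$")


def sbv_to_tagged(sbv_text: str) -> str:
    """Replace each timestamp line with a <timestamp/> tag and join caption lines."""
    parts = []
    for line in sbv_text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines only separate cues
        if TIMESTAMP_RE.match(line):
            parts.append(f'<timestamp ts="{line}"/>')
        else:
            parts.append(line)  # caption text, joined without newlines
    return "".join(parts)
```

Caption lines are joined with no separator, which suits Japanese input like the sample. For a source language that separates words with spaces, the lines would probably need to be joined with a space instead.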

↓ translate with DeepL through the API, ignoring XML tags

<timestamp ts="0:00:01.340,0:00:04.780"/>Do you know the term "cloud computing"? <timestamp ts="0:00:04.780,0:00:08.110"/> Cloud computing is a term that refers to a form of system that is processed by servers in a data center <timestamp ts="0:00:08.110,0:00:12.420"/>located beyond the Internet. 
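
A hedged sketch of the translation call, using the `deepl` package installed in the usage section. The helper names `make_translator` and `translate_tagged` and the default target language `EN-US` are assumptions for this example, and the options the actual script passes may differ. Setting `tag_handling="xml"` asks DeepL to treat the `<timestamp/>` tags as markup rather than as text to translate.

```python
def make_translator(auth_key):
    """Create a DeepL client from the `deepl` package (pip3 install deepl)."""
    import deepl  # imported lazily so the helper below can be tried with a stub

    return deepl.Translator(auth_key)


def translate_tagged(translator, tagged_text, target_lang="EN-US"):
    """Translate the tagged text, asking DeepL to handle the tags as XML."""
    result = translator.translate_text(
        tagged_text,
        target_lang=target_lang,
        tag_handling="xml",  # tags are kept as markup, not translated
    )
    return result.text
```

As the sample output shows, DeepL may place a tag at a slightly different position in the translated sentence, because word order differs between the two languages.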

↓ put the timestamps and newlines back

0:00:01.340,0:00:04.780
Do you know the term "cloud computing"? 

0:00:04.780,0:00:08.110
 Cloud computing is a term that refers to a form of system that is processed by servers in a data center 

0:00:08.110,0:00:12.420
located beyond the Internet. 
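
The last step can be sketched as below. This is an illustration under assumptions, not the script's own code: the name `tagged_to_sbv` is hypothetical, and the sketch assumes the tags come back from the translator exactly in the form `<timestamp ts="..."/>`. Unlike the sample output above, it trims the whitespace around each caption.

```python
import re

# Assumed tag form; the capture group holds the original timestamp line.
TAG_RE = re.compile(r'<timestamp ts="([^"]*)"/>')


def tagged_to_sbv(tagged_text: str) -> str:
    """Rebuild .sbv cues: a timestamp line followed by the text after its tag."""
    # With one capture group, re.split yields [prefix, ts1, text1, ts2, text2, ...].
    pieces = TAG_RE.split(tagged_text)
    cues = []
    for ts, text in zip(pieces[1::2], pieces[2::2]):
        cues.append(f"{ts}\n{text.strip()}\n")
    return "\n".join(cues)  # a blank line separates consecutive cues
```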
Owner
Yasunori Toshimitsu
Yasunori Toshimitsu 1 Oct 20, 2021