Simple Python script to scrape the YouTube channels of "Parity Technologies" and "Web3 Foundation" and translate their transcripts into Braille or any other language

Overview


The script can scrape any channel or video, and it also lets you fetch YouTube's automatic captions. Automatic captions are available in Dutch, English, French, German, Indonesian, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Turkish, Vietnamese and other languages, so adapt it as you wish.

Usage:

pip install youtube_transcript_api scrapetube codext

Default channel

python tube.py 

Custom channel

python tube.py UCSs5vZi0U7qHLkUjF3QnaWg
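The source of tube.py is not shown here. A minimal sketch of the default/custom behaviour described above, assuming the script reads an optional channel ID from the first command-line argument and otherwise falls back to a built-in default. `DEFAULT_CHANNEL_ID` and `pick_channel_id` are hypothetical names; the default value is assumed to be the channel ID this README lists later as the current channel.

```python
import sys
from typing import List

# Hypothetical default: assumed to be the channel ID listed later in this README.
DEFAULT_CHANNEL_ID = "UClnw_bcNg4CAzF772qEtq4g"


def pick_channel_id(argv: List[str]) -> str:
    """Return the channel ID passed on the command line, or the default one."""
    # argv[0] is the script name, so a custom channel ID would be argv[1].
    if len(argv) > 1 and argv[1].strip():
        return argv[1].strip()
    return DEFAULT_CHANNEL_ID


if __name__ == "__main__":
    print(pick_channel_id(sys.argv))
```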

Get all videos for a channel

import scrapetube

videos = scrapetube.get_channel("UCCezIgC97PvUuR4_gbFUs5g")

for video in videos:
    print(video['videoId'])
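The loop above walks every video the channel returns, which can be a long list. A small sketch, assuming only that each item is a dict with a `videoId` key (as the loop above uses), that caps the number of videos with `itertools.islice` and builds the watch URLs in the same form as the example output further down. `watch_urls` is a hypothetical helper, and the fake items stand in for real scrapetube results so the sketch runs offline.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator


def watch_urls(videos: Iterable[Dict], limit: int = 5) -> Iterator[str]:
    """Yield watch URLs for at most `limit` videos.

    `videos` can be whatever scrapetube.get_channel(...) returns; each item
    is assumed to be a dict with a 'videoId' key.
    """
    for video in islice(videos, limit):
        yield "https://www.youtube.com/watch?v=" + video["videoId"]


# Offline stand-in for scrapetube results (hypothetical IDs except the first).
fake_videos = iter([{"videoId": "ouMK-Q9S7cc"}, {"videoId": "video-2"}, {"videoId": "video-3"}])
for url in watch_urls(fake_videos, limit=2):
    print(url)
```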

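Filter for manually created transcripts (transcript_list is the list of transcripts available for a video):

from youtube_transcript_api import YouTubeTranscriptApi

transcript_list = YouTubeTranscriptApi.list_transcripts("ouMK-Q9S7cc")
transcript = transcript_list.find_manually_created_transcript(['de', 'en'])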

or automatically generated ones

transcript = transcript_list.find_generated_transcript(['de', 'en'])
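The list ['de', 'en'] is a priority order: German is tried first and English is the fallback. A stdlib-only sketch of that selection rule, working on plain language codes rather than the library's objects; `first_available` is a hypothetical helper, not part of youtube_transcript_api.

```python
from typing import Iterable, Optional, Sequence


def first_available(priority: Sequence[str], available: Iterable[str]) -> Optional[str]:
    """Return the first code in `priority` that is present in `available`."""
    available_set = set(available)
    for code in priority:
        if code in available_set:
            return code
    # The real library raises NoTranscriptFound here instead of returning None.
    return None


print(first_available(["de", "en"], ["en", "fr"]))
```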

The methods find_transcript, find_manually_created_transcript and find_generated_transcript return Transcript objects, which contain metadata about the transcript:

print(
    transcript.video_id,
    transcript.language,
    transcript.language_code,
    # whether it has been manually created or generated by YouTube
    transcript.is_generated,
    # whether this transcript can be translated or not
    transcript.is_translatable,
    # a list of languages the transcript can be translated to
    transcript.translation_languages,
)
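Before translating, it helps to check that the target language appears in `transcript.translation_languages`. A minimal sketch, assuming each entry is a dict such as {'language': 'German', 'language_code': 'de'} (newer library versions may use objects instead); `translation_target` and the sample list are hypothetical.

```python
from typing import Dict, List, Optional


def translation_target(translation_languages: List[Dict[str, str]], code: str) -> Optional[str]:
    """Return the language name for `code` if the transcript can be translated to it."""
    for entry in translation_languages:
        if entry["language_code"] == code:
            return entry["language"]
    return None


# Hypothetical sample of transcript.translation_languages.
sample = [{"language": "German", "language_code": "de"},
          {"language": "French", "language_code": "fr"}]
print(translation_target(sample, "de"))
```

If a name comes back, transcript.translate(code).fetch() would fetch the translated captions (this needs network access).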

Codext, a contraction of "codecs" and "extension", is a tiny library that gathers a few additional encodings for use with codecs. When imported, it registers the new encodings in a proxy codecs registry, making them available through the codecs.(decode|encode|open) calls.
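Codext's internals are not reproduced here. The mechanism it extends is the standard library's codecs registry: a search function registered with `codecs.register` maps an encoding name to a `codecs.CodecInfo`. A toy stdlib-only sketch with a hypothetical "reverse" codec, to show how a name passed to codecs.encode/decode gets resolved; this is not codext's code or its proxy registry.

```python
import codecs


def _reverse(text, errors="strict"):
    # Codec functions return (output, number of input items consumed).
    return text[::-1], len(text)


def _search(name):
    # Called by the codecs registry with a normalized encoding name.
    if name == "reverse":
        return codecs.CodecInfo(encode=_reverse, decode=_reverse, name="reverse")
    return None  # unknown name: let other search functions handle it


codecs.register(_search)

print(codecs.encode("Little Endian", "reverse"))
```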

The script is currently set to Braille, e.g. codext.encode("Little Endian", "braille"); other codecs such as morse are accepted as well.
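To show what the Braille step does, here is a minimal stdlib-only sketch of uncontracted English Braille for letters and spaces. It is not codext's table: codext also maps digits and punctuation (the example output renders "3" as ⠒), while this sketch passes them through unchanged. Unicode Braille cells start at U+2800, and dot n of a cell sets bit n-1.

```python
# Uncontracted English Braille: dot numbers (1-6) for the letters a-j.
FIRST_DECADE = ["1", "12", "14", "145", "15", "124", "1245", "125", "24", "245"]

LETTER_DOTS = {}
for i, dots in enumerate(FIRST_DECADE):
    LETTER_DOTS[chr(ord("a") + i)] = dots         # a-j
    LETTER_DOTS[chr(ord("k") + i)] = dots + "3"   # k-t: a-j plus dot 3
for letter, dots in zip("uvxyz", FIRST_DECADE[:5]):
    LETTER_DOTS[letter] = dots + "36"             # u, v, x, y, z: a-e plus dots 3 and 6
LETTER_DOTS["w"] = "2456"                         # w does not follow the pattern


def dots_to_cell(dots: str) -> str:
    """Build a Unicode Braille cell: base U+2800, dot n sets bit n-1."""
    code = 0
    for d in dots:
        code |= 1 << (int(d) - 1)
    return chr(0x2800 + code)


def to_braille(text: str) -> str:
    cells = []
    for ch in text.lower():
        if ch == " ":
            cells.append("\u2800")               # blank Braille cell
        elif ch in LETTER_DOTS:
            cells.append(dots_to_cell(LETTER_DOTS[ch]))
        else:
            cells.append(ch)                     # digits/punctuation: left as-is here
    return "".join(cells)


print(to_braille("The next evolution"))
```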

Codec categories

  • native: the built-in codecs from the original codecs package
  • non-native: a special category that groups all the categories listed below
  • base: baseX codecs (e.g. base, base100)
  • binary: codecs working on strings but applying their algorithms on their binary forms (e.g. baudot, manchester)
  • common: common codecs not included in the native ones or simply added for the purpose of standardization (e.g. octal, ordinal)
  • crypto: codecs related to cryptography algorithms (e.g. barbie, rot, xor)
  • language: language-related codecs (e.g. morse, navajo)
  • other: uncategorized codecs (e.g. letters, url)
  • stegano: steganography-related codecs (e.g. sms, resistor)

Apart from native and non-native, the category names are simply the names of the codext package's subdirectories with the trailing "s" stripped. Listing the codecs of a category:

>>> codext.list("binary")
['baudot', 'baudot-spaced', 'baudot-tape', 'bcd', 'bcd-extended0', 'bcd-extended1', 'excess3', 'gray', 'manchester', 'manchester-inverted']
>>> codext.list("language")
['braille', 'leet', 'morse', 'navajo', 'radio', 'southpark', 'southpark-icase', 'tom-tom']
>>> codext.list("native")
['ascii', 'base64_codec', 'big5', 'big5hkscs', 'bz2_codec', 'cp037', 'cp273', 'cp424', 'cp437', 'cp500', 'cp775', 'cp850', 'cp852', 'cp855', 'cp857', 'cp858', 'cp860', 'cp861', 'cp862', 'cp863', ...]

Current channel for scraping English transcript subtitles and translating them into Braille

The list is up to you: just replace the YouTube channel ID string at 🤯

videoListName = scrapetube.get_channel("UClnw_bcNg4CAzF772qEtq4g")
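When swapping in a new channel, a quick format check catches copy-paste mistakes such as pasting a channel name instead of its ID. A heuristic sketch, assuming channel IDs look like the ones in this README: "UC" followed by 22 URL-safe characters (24 in total). `looks_like_channel_id` is a hypothetical helper; passing it does not prove the channel exists.

```python
import re

# Heuristic pattern: "UC" + 22 characters from [A-Za-z0-9_-].
CHANNEL_ID_RE = re.compile(r"UC[A-Za-z0-9_-]{22}")


def looks_like_channel_id(value: str) -> bool:
    return CHANNEL_ID_RE.fullmatch(value) is not None


for candidate in ["UClnw_bcNg4CAzF772qEtq4g", "Web3 Foundation"]:
    print(candidate, looks_like_channel_id(candidate))
```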

YouTube uses automatic speech recognition (ASR) to add automatic captions to videos. The feature is available in English, Dutch, French, German, Italian, Japanese, Korean, Portuguese, Russian, and Spanish. ASR is not available for all videos.

You can edit the language at 😇

transcript = transcript_list.find_generated_transcript(['en']).fetch()
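The example output prints each transcript as a single line. A sketch of that joining step, assuming `.fetch()` returns a list of dicts with a 'text' key (as in the youtube_transcript_api versions this script was written against; newer versions return snippet objects). `transcript_to_line` and the sample segments are hypothetical.

```python
from typing import Dict, List


def transcript_to_line(segments: List[Dict]) -> str:
    """Join caption segments into one line of text."""
    # Automatic captions often contain line breaks inside a segment; flatten them.
    words = (seg["text"].replace("\n", " ").strip() for seg in segments)
    return " ".join(w for w in words if w)


# Hypothetical sample segments shaped like the output of .fetch().
sample = [
    {"text": "i think there were a lot", "start": 0.0, "duration": 2.1},
    {"text": "of people\nthat really believed", "start": 2.1, "duration": 2.4},
]
print(transcript_to_line(sample))
```

The joined line can then be passed to the Braille encoder in one call.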

Example output:

https://www.youtube.com/watch?v=ouMK-Q9S7cc
Web3 Foundation - The Next Evolution of the Internet - Dr. Gavin Wood
⠺⠑⠃⠒⠀⠋⠕⠥⠝⠙⠁⠞⠊⠕⠝⠀⠤⠀⠞⠓⠑⠀⠝⠑⠭⠞⠀⠑⠧⠕⠇⠥⠞⠊⠕⠝⠀⠕⠋⠀⠞⠓⠑⠀⠊⠝⠞⠑⠗⠝⠑⠞⠀⠤⠀⠙⠗⠨⠀⠛⠁⠧⠊⠝⠀⠺⠕⠕⠙
⠊⠀⠞⠓⠊⠝⠅⠀⠞⠓⠑⠗⠑⠀⠺⠑⠗⠑⠀⠁⠀⠇⠕⠞⠀⠕⠋⠀⠏⠑⠕⠏⠇⠑⠀⠞⠓⠁⠞⠀⠗⠑⠁⠇⠇⠽⠀⠃⠑⠇⠊⠑⠧⠑⠙⠀⠞⠓⠑⠀⠊⠝⠞⠑⠗⠝⠑⠞⠀⠺⠁⠎⠀⠺⠁⠎⠀⠛⠕⠝⠝⠁⠀⠃⠑⠀⠁⠀⠞⠗⠁⠝⠎⠋⠕⠗⠍⠁⠞⠊⠧⠑⠀⠞⠑⠉⠓⠝⠕⠇⠕⠛⠽⠀⠋⠕⠗⠀⠎⠕⠉⠊⠑⠞⠽⠀⠁⠝⠙⠀⠊⠀⠞⠓⠊⠝⠅⠀⠺⠓⠁⠞⠀⠓⠁⠏⠏⠑⠝⠑⠙⠀⠺⠁⠎⠀⠞⠓⠑⠀⠊⠝⠞⠑⠗⠝⠑⠞⠀⠺⠁⠎⠀⠙⠑⠎⠊⠛⠝⠑⠙⠀⠊⠝⠀⠎⠥⠉⠓⠀⠁⠀⠺⠁⠽⠀⠞⠓⠁⠞⠀⠊⠞⠀⠁⠇⠇⠕⠺⠑⠙⠀⠊⠞⠀⠺⠁⠎⠀⠋⠇⠑⠭⠊⠃⠇⠑⠀⠊⠞⠀⠁⠇⠇⠕⠺⠑⠙⠀⠑⠭⠊⠎⠞⠊⠝⠛⠀⠎⠞⠗⠥⠉⠞⠥⠗⠑⠎⠀⠕⠋⠀⠎⠕⠉⠊⠑⠞⠽⠀⠑⠭⠊⠎⠞⠊⠝⠛⠀⠺⠁⠽⠎⠀⠕⠋⠀⠙⠕⠊⠝⠛⠀⠃⠥⠎⠊⠝⠑⠎⠎⠀⠞⠕⠀⠎⠊⠍⠏⠇⠽⠀⠍⠕⠧⠑⠀⠕⠧⠑⠗⠀⠕⠝⠞⠕⠀⠞⠓⠑⠀⠙⠊⠛⠊⠞⠁⠇⠀⠙⠕⠍⠁⠊⠝⠀⠎⠕⠀⠺⠓⠑⠝⠀⠺⠑⠀⠙⠕⠀⠃⠁⠝⠅⠊⠝⠛⠀⠕⠝⠀⠞⠓⠑⠀⠊⠝⠞⠑⠗⠝⠑⠞⠀⠺⠑⠀⠎⠞⠊⠇⠇⠀⠥⠎⠑⠀⠁⠀⠃⠁⠝⠅⠀⠺⠑⠀⠎⠞⠊⠇⠇⠀⠥⠎⠑⠀⠕⠥⠗⠀⠑⠭⠊⠎⠞⠊⠝⠛⠀⠃⠗⠊⠉⠅⠤⠁⠝⠙⠤⠍⠕⠗⠞⠁⠗⠀⠞⠗⠁⠙⠊⠞⠊⠕⠝⠁⠇⠀⠲⠴⠴⠀⠽⠑⠁⠗⠀⠕⠇⠙⠀⠃⠁⠝⠅⠊⠝⠛⠀⠕⠗⠛⠁⠝⠊⠵⠁⠞⠊⠕⠝⠀⠊⠞⠄⠎⠀⠚⠥⠎⠞⠀⠞⠓⠁⠞⠀⠺⠑⠀⠁⠉⠉⠑⠎⠎⠀⠞⠓⠑⠍⠀⠞⠓⠗⠕⠥⠛⠓⠀⠁⠀⠺⠑⠃⠀⠏⠁⠛⠑⠀⠊⠞⠀⠓⠁⠎⠝⠄⠞⠀⠗⠑⠁⠇⠇⠽⠀⠁⠇⠞⠑⠗⠑⠙⠀⠎⠕⠉⠊⠑⠞⠽⠀⠊⠞⠀⠗⠑⠁⠇⠇⠽⠀⠺⠁⠎⠝⠄⠞⠀⠞⠗⠁⠝⠎⠋⠕⠗⠍⠁⠞⠊⠧⠑⠀⠁⠝⠙⠀⠊⠀⠞⠓⠊⠝⠅⠀⠞⠓⠁⠞⠄⠎⠀⠞⠓⠁⠞⠄⠎⠀⠑⠧⠑⠗⠍⠕⠗⠑⠀⠉⠇⠑⠁⠗⠀⠺⠓⠑⠝⠀⠺⠑⠀⠺⠓⠑⠝⠀⠺⠑⠀⠞⠓⠊⠝⠅⠀⠁⠃⠕⠥⠞⠀⠋⠁⠉⠑⠃⠕⠕⠅⠀⠁⠝⠙⠀⠺⠑⠀⠞⠓⠊⠝⠅⠀⠁⠃⠕⠥⠞⠀⠛⠕⠕⠛⠇⠑⠀⠞⠓⠑⠎⠑⠀⠁⠗⠑⠀⠝⠕⠞⠀⠝⠑⠺⠀⠺⠁⠽⠎⠀⠕⠋⠀⠺⠕⠗⠅⠊⠝⠛⠀⠝⠑⠺⠀⠺⠁⠽⠎⠀⠕⠋⠀⠏⠑⠕⠏⠇⠑⠀⠺⠕⠗⠅⠊⠝⠛⠀⠞⠕⠛⠑⠞⠓⠑⠗⠀⠊⠝⠀⠗⠑⠁⠇⠊⠞⠽⠀⠞⠓⠑⠽⠄⠗⠑⠀⠞⠓⠑⠀⠎⠁⠍⠑⠀⠅⠊⠝⠙⠎⠀⠕⠋⠀⠎⠞⠗⠥⠉⠞⠥⠗⠑⠎⠀⠞⠓⠁⠞⠀⠞⠓⠑⠀⠎⠁⠍⠑⠀⠓⠊⠑⠗⠁⠗⠉⠓⠊⠉⠁⠇⠀⠕⠗⠛⠁⠝⠊⠵⠁⠞⠊⠕⠝⠎⠀⠞⠓⠁⠞⠀⠓⠁⠧⠑⠀⠞⠓⠑⠀⠎⠁⠍⠑⠀⠉⠑⠝⠞⠗⠁⠇⠊⠵⠑⠙⠀⠃⠁⠝⠅⠀⠁⠉⠉⠕⠥⠝⠞⠎⠀⠞⠓⠁⠞⠀⠓⠁⠧⠑⠀⠞⠓⠑⠀⠎⠁⠍⠑⠀⠎⠕⠗⠞⠀⠕⠋⠀⠍⠥⠇⠞⠊⠝⠁⠞⠊⠕⠝⠁⠇⠀⠎⠞⠗⠥⠉⠞⠥⠗⠑⠀⠁⠎⠀⠁⠇⠇⠀⠕⠋⠀⠞⠓⠑⠀⠧⠁⠗⠊⠕⠥⠎⠀⠕⠞⠓⠑⠗⠀⠋⠕⠗⠞⠥⠝⠑⠀⠢⠴⠴⠀⠉⠕⠗⠏⠕⠗⠁⠞⠑⠀⠉⠕⠍⠏⠁⠝⠊⠑⠎⠀⠊⠝⠀⠗⠑⠁⠇⠊⠞⠽⠀⠞⠕⠀⠉⠓⠁⠝⠛⠑⠀⠎⠕⠉⠊⠑⠞⠽⠀⠺⠑⠀⠗⠑⠁⠇⠇⠽⠀⠝⠑⠑⠙⠀⠞⠕⠀⠙⠕⠀⠎⠕⠍⠑⠞⠓⠊⠝⠛⠀⠃⠑⠞⠞⠑⠗⠀⠞⠓⠁⠝⠀⠉⠗⠑⠁⠞⠊⠝⠛⠀⠞⠑⠉⠓⠝⠕⠇⠕⠛⠊⠑⠎⠀⠞⠓⠁⠞⠀⠚⠥⠎⠞⠀⠁⠇⠇⠕⠺⠀⠥⠎⠀⠞⠕⠀⠍⠊⠗⠗⠕⠗⠀⠓⠕⠺⠀⠎⠕⠉⠊⠑⠞⠽⠀⠺⠕⠗⠅⠎⠀⠁⠝⠽⠺⠁⠽⠀⠺⠑⠀⠝⠑⠑⠙⠀⠞⠕⠀⠉⠗⠑⠁⠞⠑⠀⠞⠑⠉⠓⠝⠕⠇⠕⠛⠊⠑⠎⠀⠞⠓⠁⠞⠀⠋⠕⠗⠛⠑⠀⠝⠑⠺⠀⠺⠁⠽⠎⠀⠕⠋⠀⠃⠑⠊⠝⠛⠀⠁⠃⠇⠑⠀⠞⠕⠀⠺⠕⠗⠅⠀⠺⠊⠞⠓⠀⠑⠁⠉⠓⠀⠕⠞⠓⠑⠗⠀⠁⠝⠙⠀⠞⠓⠁⠞⠄⠎⠀⠙⠊⠋⠋⠑⠗⠑⠝⠞⠀⠞⠕⠀⠝⠑⠺⠀⠺⠁⠽⠎⠀⠕⠋⠀⠃⠑⠊⠝⠛⠀⠁⠃⠇⠑⠀⠞⠕⠀⠉⠕⠍⠍⠥⠝⠊⠉⠁⠞⠑⠀⠺⠊⠞⠓⠀⠑⠁⠉⠓⠀⠕⠞⠓⠑⠗⠀⠊⠞⠄⠎⠀⠁⠇⠎⠕⠀⠛⠕⠞⠀⠞⠕⠀⠃⠑⠀⠝⠑⠺⠀⠺⠁⠽⠎⠀⠕⠋⠀⠃⠑⠊⠝⠛⠀⠁⠃⠇⠑⠀⠞⠕⠀⠕⠗⠛⠁⠝⠊⠵⠑⠀⠁⠝⠙⠀⠞⠗⠥⠎⠞⠀⠞⠓⠁⠞⠀⠑⠁⠉⠓⠀⠕⠞⠓⠑⠗⠀⠊⠎⠀⠛⠕⠊⠝⠛⠀⠞⠕⠀⠙⠕⠀⠺⠓⠁⠞⠀⠺⠓⠁⠞⠀⠞⠓⠑⠽⠀⠝⠑⠑⠙⠀⠞⠕⠀⠙⠕⠀⠊⠝⠀⠕⠗⠙⠑⠗⠀⠞⠕⠀⠓⠁⠧⠑⠀⠎⠕⠍⠑⠀⠎⠕⠗⠞⠀⠕⠋⠀⠎⠓⠁⠗⠑⠙⠀⠉⠕⠝⠉⠇⠥⠎⠊⠕⠝⠀⠕⠗⠀⠗⠁⠍⠊⠋⠊⠉⠁⠞⠊⠕⠝⠀⠞⠕⠀⠞⠓⠑⠀⠉⠕⠕⠏⠑⠗⠁⠞⠊⠕⠝⠀⠁⠝⠙⠀⠞⠓⠁⠞⠄⠎⠀⠗⠑⠁⠇⠇⠽⠀⠁⠀⠃⠊⠛⠀⠉⠕⠍⠏⠕⠝⠑⠝⠞⠀⠕⠋⠀⠺⠑⠃⠀⠒⠀⠺⠑⠃⠀⠒⠀⠊⠎⠀⠗⠑⠁⠇⠇⠽⠀⠁⠃⠕⠥⠞⠀⠁⠇⠇⠕⠺⠊⠝⠛⠀⠏⠑⠕⠏⠇⠑⠀⠞⠕⠀⠉⠕⠍⠑⠀⠞⠕⠛⠑⠞⠓⠑⠗⠀⠁⠝⠙⠀⠉⠕⠕⠗⠙⠊⠝⠁⠞⠑⠀⠞⠓⠑⠊⠗⠀⠑⠋⠋⠕⠗⠞⠎⠀⠋⠕⠗⠀⠎⠕⠍⠑⠞⠓⠊⠝⠛⠀⠛⠗⠑⠁⠞⠑⠗⠀⠞⠓⠑⠀⠞⠓⠁⠝⠀⠞⠓⠑⠀⠎⠥⠍⠀⠕⠋⠀⠊⠞⠎⠀⠏⠁⠗⠞⠎⠀⠪⠍⠥⠎⠊⠉⠻

GitHub Actions workflow file for this run, as a real-time example

Available OSes: [windows-latest, macos-latest, ubuntu-latest]

name: Cross-platform matrix run
on: [push]
jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest]
        python-version: ['3.6', '3.9']
        exclude:
          - os: ubuntu-latest
            python-version: '3.6'
    steps:
      - uses: actions/[email protected]
      - name: Set up Python
        uses: actions/[email protected]
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies 
        run: pip install youtube_transcript_api scrapetube codext
      - name: Web3 Foundation videos to braille language 
        run: python tube.py
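The matrix above looks like it runs two jobs, but the exclude rule removes one. A simplified model of GitHub Actions matrix expansion (cross product, then drop any combination that matches every key of an exclude entry; `include` is ignored here) makes that explicit. `expand` is a hypothetical helper, not part of GitHub Actions.

```python
from itertools import product

# Values copied from the workflow above.
matrix = {"os": ["ubuntu-latest"], "python-version": ["3.6", "3.9"]}
exclude = [{"os": "ubuntu-latest", "python-version": "3.6"}]


def expand(matrix, exclude):
    keys = list(matrix)
    jobs = [dict(zip(keys, values)) for values in product(*(matrix[k] for k in keys))]
    # Drop a combination if it matches all key/value pairs of some exclude entry.
    return [job for job in jobs
            if not any(all(job.get(k) == v for k, v in rule.items()) for rule in exclude)]


print(expand(matrix, exclude))
```

So this workflow runs a single job on ubuntu-latest with Python 3.9. Listing all three available OSes under os would, by the same rule, give 3 × 2 - 1 = 5 jobs.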

For Support && Nominations

  • Display name: KSMNETWORK

  • Email [email protected]

  • Riot @gtoocool:matrix.org

  • Kusama (KSM) address: H1bSKJxoxzxYRCdGQutVqFGeW7xU3AcN6vyEdZBU7Qb1rsZ

  • Polkadot (DOT) address: 15FxvBFDd3X7H9qcMGqsiuvFYEg4D3mBoTA2LQufreysTHKA

  • https://ksm.network

Owner
Little Endian