Simple Python script to scrape the YouTube channels of "Parity Technologies" and "Web3 Foundation" and translate their transcripts to Braille or any other language

Overview

The script can scrape any channel or video, and it also lets you fetch automatic captions. Automatic captions are available in Dutch, English, French, German, Indonesian, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Turkish, Vietnamese and more, so use it as you wish.

Usage:

pip install youtube_transcript_api scrapetube codext

Default channel

python tube.py 

Custom channel

python tube.py UCSs5vZi0U7qHLkUjF3QnaWg
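
One way the optional command-line argument could be handled is sketched below. This is an assumption about how tube.py reads its input, not its actual code: resolve_channel_id and DEFAULT_CHANNEL_ID are hypothetical names, and the default value is simply the channel ID that appears elsewhere in this README.

```python
import sys

# Hypothetical default: the channel ID that appears elsewhere in this README.
DEFAULT_CHANNEL_ID = "UClnw_bcNg4CAzF772qEtq4g"


def resolve_channel_id(argv):
    """Return the channel ID given as the first CLI argument, else the default."""
    if len(argv) > 1 and argv[1].strip():
        return argv[1].strip()
    return DEFAULT_CHANNEL_ID


if __name__ == "__main__":
    print(resolve_channel_id(sys.argv))
```

Keeping the lookup in a small function makes the default easy to change in one place.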

Get all videos for a channel

import scrapetube

videos = scrapetube.get_channel("UCCezIgC97PvUuR4_gbFUs5g")

for video in videos:
    print(video['videoId'])
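
Each item yielded by the loop carries a videoId, which is enough to rebuild the video's watch URL. A minimal sketch, where watch_url is a hypothetical helper and the sample dictionary stands in for a real scrapetube result:

```python
def watch_url(video):
    """Build the public watch URL from a dict that has a 'videoId' key."""
    return "https://www.youtube.com/watch?v=" + video["videoId"]


# Stand-in for one item yielded by scrapetube.get_channel(...)
sample = {"videoId": "ouMK-Q9S7cc"}
print(watch_url(sample))
```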

Filter for manually created transcripts

transcript = transcript_list.find_manually_created_transcript(['de', 'en'])

or automatically generated ones

transcript = transcript_list.find_generated_transcript(['de', 'en'])
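
The two lookups can be combined so that a manually created transcript is preferred and a generated one is used as a fallback. The sketch below is only an illustration: pick_transcript and FakeTranscriptList are hypothetical names, and the not_found parameter defaults to Exception so the example runs without the library installed. With the real library you would presumably pass its own "transcript not found" exception type instead (believed to be NoTranscriptFound, which should be checked against the installed version).

```python
def pick_transcript(transcript_list, languages, not_found=Exception):
    """Prefer a manually created transcript, else fall back to a generated one.

    `not_found` is the exception type that signals "no such transcript";
    it defaults to Exception so this sketch runs without the library.
    """
    try:
        return transcript_list.find_manually_created_transcript(languages)
    except not_found:
        return transcript_list.find_generated_transcript(languages)


class FakeTranscriptList:
    """Stand-in that only has auto-generated captions (no network needed)."""

    def find_manually_created_transcript(self, languages):
        raise LookupError("no manual transcript")

    def find_generated_transcript(self, languages):
        return "generated:" + languages[0]


print(pick_transcript(FakeTranscriptList(), ["en"]))
```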

The methods find_transcript, find_manually_created_transcript and find_generated_transcript return Transcript objects, which carry metadata about the transcript:

print(
    transcript.video_id,
    transcript.language,
    transcript.language_code,
    # whether it has been manually created or generated by YouTube
    transcript.is_generated,
    # whether this transcript can be translated or not
    transcript.is_translatable,
    # a list of languages the transcript can be translated to
    transcript.translation_languages,
)

Codext, a contraction of "codecs" and "extension", is a tiny library that gathers a few additional encodings for use with codecs. When imported, it registers the new encodings in a proxy codecs registry, which makes them available from the codecs.(decode|encode|open) calls.

The script is currently set to Braille, e.g. codext.encode("Little Endian", "braille"); other codecs such as morse are accepted too.
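
To give a feel for what a braille encoding produces, here is a hand-rolled sketch that uses only the standard library. It is not codext's implementation and covers only the 26 letters and the space; LETTER_DOTS, braille_cell and to_braille are hypothetical names. Each letter is described by its raised dots (numbered 1 to 6), which map directly onto the Unicode braille patterns block starting at U+2800.

```python
# Dot numbers (1-6) for each lowercase letter in standard six-dot braille.
LETTER_DOTS = {
    "a": "1", "b": "12", "c": "14", "d": "145", "e": "15",
    "f": "124", "g": "1245", "h": "125", "i": "24", "j": "245",
    "k": "13", "l": "123", "m": "134", "n": "1345", "o": "135",
    "p": "1234", "q": "12345", "r": "1235", "s": "234", "t": "2345",
    "u": "136", "v": "1236", "w": "2456", "x": "1346", "y": "13456",
    "z": "1356",
}


def braille_cell(dots):
    """Map a string of dot numbers to its Unicode braille pattern."""
    # Dot n corresponds to bit n-1 of the offset from U+2800.
    return chr(0x2800 + sum(1 << (int(d) - 1) for d in dots))


def to_braille(text):
    """Lowercase the text, encode letters and spaces, pass the rest through."""
    out = []
    for ch in text.lower():
        if ch == " ":
            out.append(chr(0x2800))  # blank braille cell
        elif ch in LETTER_DOTS:
            out.append(braille_cell(LETTER_DOTS[ch]))
        else:
            out.append(ch)  # digits and punctuation are left unchanged here
    return "".join(out)


print(to_braille("little endian"))
```

In the script itself the conversion is delegated to codext, which also handles digits and punctuation.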

Codecs categories

  • native: the built-in codecs from the original codecs package
  • non-native: this special category regroups all the categories mentioned hereafter
  • base: baseX codecs (e.g. base, base100)
  • binary: codecs working on strings but applying their algorithms on their binary forms (e.g. baudot, manchester)
  • common: common codecs not included in the native ones or simply added for the purpose of standardization (e.g. octal, ordinal)
  • crypto: codecs related to cryptography algorithms (e.g. barbie, rot, xor)
  • language: language-related codecs (e.g. morse, navajo)
  • other: uncategorized codecs (e.g. letters, url)
  • stegano: steganography-related codecs (e.g. sms, resistor)

Apart from native and non-native, each category is simply named after a subdirectory of the codext package (with the trailing "s" stripped).

>>> codext.list("binary")
['baudot', 'baudot-spaced', 'baudot-tape', 'bcd', 'bcd-extended0', 'bcd-extended1', 'excess3', 'gray', 'manchester', 'manchester-inverted']
>>> codext.list("language")
['braille', 'leet', 'morse', 'navajo', 'radio', 'southpark', 'southpark-icase', 'tom-tom']
>>> codext.list("native")
['ascii', 'base64_codec', 'big5', 'big5hkscs', 'bz2_codec', 'cp037', 'cp273', 'cp424', 'cp437', 'cp500', 'cp775', 'cp850', 'cp852', 'cp855', 'cp857', 'cp858', 'cp860', 'cp861', 'cp862', 'cp863', ...]

Current channel whose English transcript subtitles are scraped and translated to Braille

The channel list is up to you: just replace the YouTube channel ID string at 🤯

videoListName = scrapetube.get_channel("UClnw_bcNg4CAzF772qEtq4g")

YouTube uses automatic speech recognition to add automatic captions to videos. The feature is available in English, Dutch, French, German, Italian, Japanese, Korean, Portuguese, Russian, and Spanish. ASR is not available for all videos.

You can edit the language at 😇

transcript = transcript_list.find_generated_transcript(['en']).fetch()
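
Depending on the installed version of youtube_transcript_api, fetch() has returned a list of dictionaries with text, start and duration keys. That shape is an assumption here, and newer releases may return objects instead. Under that assumption, a hypothetical transcript_to_text helper could flatten the entries into one string before it is encoded:

```python
def transcript_to_text(entries):
    """Join the 'text' field of each caption entry into one string."""
    parts = (entry["text"].replace("\n", " ").strip() for entry in entries)
    return " ".join(part for part in parts if part)


# Stand-in data shaped like the assumed fetch() result.
sample = [
    {"text": "i think there were", "start": 0.0, "duration": 2.1},
    {"text": "a lot of people", "start": 2.1, "duration": 1.8},
]
print(transcript_to_text(sample))
```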

Example output:

https://www.youtube.com/watch?v=ouMK-Q9S7cc
Web3 Foundation - The Next Evolution of the Internet - Dr. Gavin Wood
⠺⠑⠃⠒⠀⠋⠕⠥⠝⠙⠁⠞⠊⠕⠝⠀⠤⠀⠞⠓⠑⠀⠝⠑⠭⠞⠀⠑⠧⠕⠇⠥⠞⠊⠕⠝⠀⠕⠋⠀⠞⠓⠑⠀⠊⠝⠞⠑⠗⠝⠑⠞⠀⠤⠀⠙⠗⠨⠀⠛⠁⠧⠊⠝⠀⠺⠕⠕⠙
⠊⠀⠞⠓⠊⠝⠅⠀⠞⠓⠑⠗⠑⠀⠺⠑⠗⠑⠀⠁⠀⠇⠕⠞⠀⠕⠋⠀⠏⠑⠕⠏⠇⠑⠀⠞⠓⠁⠞⠀⠗⠑⠁⠇⠇⠽⠀⠃⠑⠇⠊⠑⠧⠑⠙⠀⠞⠓⠑⠀⠊⠝⠞⠑⠗⠝⠑⠞⠀⠺⠁⠎⠀⠺⠁⠎⠀⠛⠕⠝⠝⠁⠀⠃⠑⠀⠁⠀⠞⠗⠁⠝⠎⠋⠕⠗⠍⠁⠞⠊⠧⠑⠀⠞⠑⠉⠓⠝⠕⠇⠕⠛⠽⠀⠋⠕⠗⠀⠎⠕⠉⠊⠑⠞⠽⠀⠁⠝⠙⠀⠊⠀⠞⠓⠊⠝⠅⠀⠺⠓⠁⠞⠀⠓⠁⠏⠏⠑⠝⠑⠙⠀⠺⠁⠎⠀⠞⠓⠑⠀⠊⠝⠞⠑⠗⠝⠑⠞⠀⠺⠁⠎⠀⠙⠑⠎⠊⠛⠝⠑⠙⠀⠊⠝⠀⠎⠥⠉⠓⠀⠁⠀⠺⠁⠽⠀⠞⠓⠁⠞⠀⠊⠞⠀⠁⠇⠇⠕⠺⠑⠙⠀⠊⠞⠀⠺⠁⠎⠀⠋⠇⠑⠭⠊⠃⠇⠑⠀⠊⠞⠀⠁⠇⠇⠕⠺⠑⠙⠀⠑⠭⠊⠎⠞⠊⠝⠛⠀⠎⠞⠗⠥⠉⠞⠥⠗⠑⠎⠀⠕⠋⠀⠎⠕⠉⠊⠑⠞⠽⠀⠑⠭⠊⠎⠞⠊⠝⠛⠀⠺⠁⠽⠎⠀⠕⠋⠀⠙⠕⠊⠝⠛⠀⠃⠥⠎⠊⠝⠑⠎⠎⠀⠞⠕⠀⠎⠊⠍⠏⠇⠽⠀⠍⠕⠧⠑⠀⠕⠧⠑⠗⠀⠕⠝⠞⠕⠀⠞⠓⠑⠀⠙⠊⠛⠊⠞⠁⠇⠀⠙⠕⠍⠁⠊⠝⠀⠎⠕⠀⠺⠓⠑⠝⠀⠺⠑⠀⠙⠕⠀⠃⠁⠝⠅⠊⠝⠛⠀⠕⠝⠀⠞⠓⠑⠀⠊⠝⠞⠑⠗⠝⠑⠞⠀⠺⠑⠀⠎⠞⠊⠇⠇⠀⠥⠎⠑⠀⠁⠀⠃⠁⠝⠅⠀⠺⠑⠀⠎⠞⠊⠇⠇⠀⠥⠎⠑⠀⠕⠥⠗⠀⠑⠭⠊⠎⠞⠊⠝⠛⠀⠃⠗⠊⠉⠅⠤⠁⠝⠙⠤⠍⠕⠗⠞⠁⠗⠀⠞⠗⠁⠙⠊⠞⠊⠕⠝⠁⠇⠀⠲⠴⠴⠀⠽⠑⠁⠗⠀⠕⠇⠙⠀⠃⠁⠝⠅⠊⠝⠛⠀⠕⠗⠛⠁⠝⠊⠵⠁⠞⠊⠕⠝⠀⠊⠞⠄⠎⠀⠚⠥⠎⠞⠀⠞⠓⠁⠞⠀⠺⠑⠀⠁⠉⠉⠑⠎⠎⠀⠞⠓⠑⠍⠀⠞⠓⠗⠕⠥⠛⠓⠀⠁⠀⠺⠑⠃⠀⠏⠁⠛⠑⠀⠊⠞⠀⠓⠁⠎⠝⠄⠞⠀⠗⠑⠁⠇⠇⠽⠀⠁⠇⠞⠑⠗⠑⠙⠀⠎⠕⠉⠊⠑⠞⠽⠀⠊⠞⠀⠗⠑⠁⠇⠇⠽⠀⠺⠁⠎⠝⠄⠞⠀⠞⠗⠁⠝⠎⠋⠕⠗⠍⠁⠞⠊⠧⠑⠀⠁⠝⠙⠀⠊⠀⠞⠓⠊⠝⠅⠀⠞⠓⠁⠞⠄⠎⠀⠞⠓⠁⠞⠄⠎⠀⠑⠧⠑⠗⠍⠕⠗⠑⠀⠉⠇⠑⠁⠗⠀⠺⠓⠑⠝⠀⠺⠑⠀⠺⠓⠑⠝⠀⠺⠑⠀⠞⠓⠊⠝⠅⠀⠁⠃⠕⠥⠞⠀⠋⠁⠉⠑⠃⠕⠕⠅⠀⠁⠝⠙⠀⠺⠑⠀⠞⠓⠊⠝⠅⠀⠁⠃⠕⠥⠞⠀⠛⠕⠕⠛⠇⠑⠀⠞⠓⠑⠎⠑⠀⠁⠗⠑⠀⠝⠕⠞⠀⠝⠑⠺⠀⠺⠁⠽⠎⠀⠕⠋⠀⠺⠕⠗⠅⠊⠝⠛⠀⠝⠑⠺⠀⠺⠁⠽⠎⠀⠕⠋⠀⠏⠑⠕⠏⠇⠑⠀⠺⠕⠗⠅⠊⠝⠛⠀⠞⠕⠛⠑⠞⠓⠑⠗⠀⠊⠝⠀⠗⠑⠁⠇⠊⠞⠽⠀⠞⠓⠑⠽⠄⠗⠑⠀⠞⠓⠑⠀⠎⠁⠍⠑⠀⠅⠊⠝⠙⠎⠀⠕⠋⠀⠎⠞⠗⠥⠉⠞⠥⠗⠑⠎⠀⠞⠓⠁⠞⠀⠞⠓⠑⠀⠎⠁⠍⠑⠀⠓⠊⠑⠗⠁⠗⠉⠓⠊⠉⠁⠇⠀⠕⠗⠛⠁⠝⠊⠵⠁⠞⠊⠕⠝⠎⠀⠞⠓⠁⠞⠀⠓⠁⠧⠑⠀⠞⠓⠑⠀⠎⠁⠍⠑⠀⠉⠑⠝⠞⠗⠁⠇⠊⠵⠑⠙⠀⠃⠁⠝⠅⠀⠁⠉⠉⠕⠥⠝⠞⠎⠀⠞⠓⠁⠞⠀⠓⠁⠧⠑⠀⠞⠓⠑⠀⠎⠁⠍⠑⠀⠎⠕⠗⠞⠀⠕⠋⠀⠍⠥⠇⠞⠊⠝⠁⠞⠊⠕⠝⠁⠇⠀⠎⠞⠗⠥⠉⠞⠥⠗⠑⠀⠁⠎⠀⠁⠇⠇⠀⠕⠋⠀⠞⠓⠑⠀⠧⠁⠗⠊⠕⠥⠎⠀⠕⠞⠓⠑⠗⠀⠋⠕⠗⠞⠥⠝⠑⠀⠢⠴⠴⠀⠉⠕⠗⠏⠕⠗⠁⠞⠑⠀⠉⠕⠍⠏⠁⠝⠊⠑⠎⠀⠊⠝⠀⠗⠑⠁⠇⠊⠞⠽⠀⠞⠕⠀⠉⠓⠁⠝⠛⠑⠀⠎⠕⠉⠊⠑⠞⠽⠀⠺⠑⠀⠗⠑⠁⠇⠇⠽⠀⠝⠑⠑⠙⠀⠞⠕⠀⠙⠕⠀⠎⠕⠍⠑⠞⠓⠊⠝⠛⠀⠃⠑⠞⠞⠑⠗⠀⠞⠓⠁⠝⠀⠉⠗⠑⠁⠞⠊⠝⠛⠀⠞⠑⠉⠓⠝⠕⠇⠕⠛⠊⠑⠎⠀⠞⠓⠁⠞⠀⠚⠥⠎⠞⠀⠁⠇⠇⠕⠺⠀⠥⠎⠀⠞⠕⠀⠍⠊⠗⠗⠕⠗⠀⠓⠕⠺⠀⠎⠕⠉⠊⠑⠞⠽⠀⠺⠕⠗⠅⠎⠀⠁⠝⠽⠺⠁⠽⠀⠺⠑⠀⠝⠑⠑⠙⠀⠞⠕⠀⠉⠗⠑⠁⠞⠑⠀⠞⠑⠉⠓⠝⠕⠇⠕⠛⠊⠑⠎⠀⠞⠓⠁⠞⠀⠋⠕⠗⠛⠑⠀⠝⠑⠺⠀⠺⠁⠽⠎⠀⠕⠋⠀⠃⠑⠊⠝⠛⠀⠁⠃⠇⠑⠀⠞⠕⠀⠺⠕⠗⠅⠀⠺⠊⠞⠓⠀⠑⠁⠉⠓⠀⠕⠞⠓⠑⠗⠀⠁⠝⠙⠀⠞⠓⠁⠞⠄⠎⠀⠙⠊⠋⠋⠑⠗⠑⠝⠞⠀⠞⠕⠀⠝⠑⠺⠀⠺⠁⠽⠎⠀⠕⠋⠀⠃⠑⠊⠝⠛⠀⠁⠃⠇⠑⠀⠞⠕⠀⠉⠕⠍⠍⠥⠝⠊⠉⠁⠞⠑⠀⠺⠊⠞⠓⠀⠑⠁⠉⠓⠀⠕⠞⠓⠑⠗⠀⠊⠞⠄⠎⠀⠁⠇⠎⠕⠀⠛⠕⠞⠀⠞⠕⠀⠃⠑⠀⠝⠑⠺⠀⠺⠁⠽⠎⠀⠕⠋⠀⠃⠑⠊⠝⠛⠀⠁⠃⠇⠑⠀⠞⠕⠀⠕⠗⠛⠁⠝⠊⠵⠑⠀⠁⠝⠙⠀⠞⠗⠥⠎⠞⠀⠞⠓⠁⠞⠀⠑⠁⠉⠓⠀⠕⠞⠓⠑⠗⠀⠊⠎⠀⠛⠕⠊⠝⠛⠀⠞⠕⠀⠙⠕⠀⠺⠓⠁⠞⠀⠺⠓⠁⠞⠀⠞⠓⠑⠽⠀⠝⠑⠑⠙⠀⠞⠕⠀⠙⠕⠀⠊⠝⠀⠕⠗⠙⠑⠗⠀⠞⠕⠀⠓⠁⠧⠑⠀⠎⠕⠍⠑⠀⠎⠕⠗⠞⠀⠕⠋⠀⠎⠓⠁⠗⠑⠙⠀⠉⠕⠝⠉⠇⠥⠎⠊⠕⠝⠀⠕⠗⠀⠗⠁⠍⠊⠋⠊⠉⠁⠞⠊⠕⠝⠀⠞⠕⠀⠞⠓⠑⠀⠉⠕⠕⠏⠑⠗⠁⠞⠊⠕⠝⠀⠁⠝⠙⠀⠞⠓⠁⠞⠄⠎⠀⠗⠑⠁⠇⠇⠽⠀⠁⠀⠃⠊⠛⠀⠉⠕⠍⠏⠕⠝⠑⠝⠞⠀⠕⠋⠀⠺⠑⠃⠀⠒⠀⠺⠑⠃⠀⠒⠀⠊⠎⠀⠗⠑⠁⠇⠇⠽⠀⠁⠃⠕⠥⠞⠀⠁⠇⠇⠕⠺⠊⠝⠛⠀⠏⠑⠕⠏⠇⠑⠀⠞⠕⠀⠉⠕⠍⠑⠀⠞⠕⠛⠑⠞⠓⠑⠗⠀⠁⠝⠙⠀⠉⠕⠕⠗⠙⠊⠝⠁⠞⠑⠀⠞⠓⠑⠊⠗⠀⠑⠋⠋⠕⠗⠞⠎⠀⠋⠕⠗⠀⠎⠕⠍⠑⠞⠓⠊⠝⠛⠀⠛⠗⠑⠁⠞⠑⠗⠀⠞⠓⠑⠀⠞⠓⠁⠝⠀⠞⠓⠑⠀⠎⠥⠍⠀⠕⠋⠀⠊⠞⠎⠀⠏⠁⠗⠞⠎⠀⠪⠍⠥⠎⠊⠉⠻

GitHub Actions workflow file used for this run, as a real-world example

Available OSes: [ windows-latest, macos-latest, ubuntu-latest ]

name: Cross-platform matrix run
on: [push]
jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest]
        python-version: ['3.6', '3.9']
        exclude:
          - os: ubuntu-latest
            python-version: '3.6'
    steps:
      - uses: actions/[email protected]
      - name: Set up Python
        uses: actions/[email protected]
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies 
        run: pip install youtube_transcript_api scrapetube codext
      - name: Web3 Foundation videos to braille language 
        run: python tube.py

For Support && Nominations

  • Display name: KSMNETWORK

  • Email: [email protected]

  • Riot: @gtoocool:matrix.org

  • KUSAMA (KSM) Address: H1bSKJxoxzxYRCdGQutVqFGeW7xU3AcN6vyEdZBU7Qb1rsZ

  • PolkaDOT (DOT) Address: 15FxvBFDd3X7H9qcMGqsiuvFYEg4D3mBoTA2LQufreysTHKA

  • https://ksm.network
