whylogs Workshop

Code from the whylogs workshop held at DataTalks.Club on 29 March 2022

whylogs - The open source standard for data logging (Don't forget to give it a star!)

Workshop

In this hands-on workshop, you’ll learn how to set up a system for monitoring your data pipelines, ensuring data quality, and detecting changes in your data.

Without data monitoring, you can’t guarantee to your stakeholders that the data they use for analytics and machine learning is trustworthy. A data observability system gives you visibility into the health of your data pipelines and helps build your customers’ trust in your work.

We’ll cover the following:

Introduction to data observability and monitoring
whylogs — the open source standard for data logging
How to monitor batch Python or Spark data pipelines with whylogs
How to monitor Kafka streaming pipelines with whylogs

By the end of this workshop, you’ll be able to set up such a system yourself.
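
To make the idea of data logging concrete, here is a minimal pure-Python sketch of profiling a batch of records and comparing it with a reference batch. It does not use the whylogs API: `profile_batch` and `null_rate_changed` are hypothetical helpers introduced only for illustration, the sample rows are made up, and the `tolerance` value is an arbitrary assumption.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Optional[Any]]
Profile = Dict[str, Dict[str, Any]]


def profile_batch(rows: List[Row]) -> Profile:
    """Summarize each column: row count, null count, and min/max/mean of numeric values."""
    profile: Profile = {}
    columns = {name for row in rows for name in row}
    for name in sorted(columns):
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        # bool is a subclass of int, so exclude it from the numeric statistics.
        numeric = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        stats: Dict[str, Any] = {
            "count": len(values),
            "nulls": len(values) - len(present),
        }
        if numeric:
            stats.update(
                min=min(numeric),
                max=max(numeric),
                mean=sum(numeric) / len(numeric),
            )
        profile[name] = stats
    return profile


def null_rate_changed(
    reference: Profile, current: Profile, column: str, tolerance: float = 0.1
) -> bool:
    """Return True if the column's null rate moved by more than `tolerance`."""

    def rate(profile: Profile) -> float:
        stats = profile[column]
        return stats["nulls"] / stats["count"] if stats["count"] else 0.0

    return abs(rate(current) - rate(reference)) > tolerance


# Made-up sample batches: yesterday's data is complete, today's has gaps.
yesterday = [{"price": 10.0, "city": "Berlin"}, {"price": 12.0, "city": "Paris"}]
today = [{"price": None, "city": "Berlin"}, {"price": 11.0, "city": None}]

reference = profile_batch(yesterday)
current = profile_batch(today)
print(null_rate_changed(reference, current, "price"))  # True
```

The point of the sketch is that a small statistical summary per batch, rather than the raw data, is enough to compare batches over time. A data logging library such as whylogs produces much richer profiles than this.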

Code

This repository contains the files needed for the workshop:

ccloud_lib.py - helper for connecting to Confluent Cloud
confluent_credentials.txt - configuration template (put your credentials there, but don't commit them!)
producer.py - the code for producing events to Kafka
requirements.txt - all the dependencies for the workshop
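
The following sketch shows how a credentials file and an event producer might fit together. It is not the code from `ccloud_lib.py` or `producer.py`: `parse_credentials`, `send_events`, and `InMemoryProducer` are hypothetical names, and the `key=value` file format, the placeholder values, and the topic name are assumptions. The stand-in producer only mimics the `produce` and `flush` methods so the example runs without a Kafka cluster.

```python
import json
from typing import Any, Dict, Iterable, List, Tuple


def parse_credentials(text: str) -> Dict[str, str]:
    """Parse `key=value` lines, skipping blank lines and `#` comments."""
    config: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


class InMemoryProducer:
    """Stand-in for a Kafka producer; it only records what was sent."""

    def __init__(self) -> None:
        self.messages: List[Tuple[str, bytes]] = []

    def produce(self, topic: str, value: bytes) -> None:
        self.messages.append((topic, value))

    def flush(self) -> None:
        pass  # Nothing is buffered in this stand-in.


def send_events(producer: Any, topic: str, events: Iterable[Dict[str, Any]]) -> int:
    """Serialize each event as JSON, pass it to the producer, and return the count."""
    sent = 0
    for event in events:
        producer.produce(topic, value=json.dumps(event).encode("utf-8"))
        sent += 1
    producer.flush()
    return sent


# Assumed template contents; the angle-bracket values are placeholders.
template = """
# Confluent Cloud connection settings
bootstrap.servers=<BOOTSTRAP_SERVER>
security.protocol=SASL_SSL
sasl.mechanisms=PLAIN
sasl.username=<API_KEY>
sasl.password=<API_SECRET>
"""

config = parse_credentials(template)
producer = InMemoryProducer()
send_events(producer, "workshop-events", [{"sensor": "a", "value": 1.5}])
```

With a real cluster, the parsed dictionary would typically be passed to the Kafka client's producer constructor in place of the stand-in. Reading the credentials from a separate file keeps secrets out of the code, which is why the template should never be committed once it is filled in.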

Confluent Cloud

For this workshop, you'll need:

An account in Deepnote
An account in Confluent Cloud (instructions)
