Data pipelines, 2021.

Last update: May 19, 2022

Overview

This repo illustrates a simple process for automating data transformation and modeling through a pipeline built with Luigi.

Main stack

Python 3.7+
Streamlit
Scikit-learn
Pandas
Luigi

Idea

The whole process is described in an interactive app, which you can find in the script app.py. See the section on running the scripts for details on how to launch the app.

Setup

Create a virtual environment (I recommend using conda):
```
conda create --name data-pipes python=3.7
```
Activate the virtual environment:
```
conda activate data-pipes
```
Install requirements:
```
pip install -r requirements.txt
```

Run the scripts

Interactive app

To run the interactive app, run the Streamlit command with the virtual environment activated:

(data-pipes) streamlit run app.py

This starts a local server at http://localhost:8501.

Data pipeline

To run a specific task, say TareaX defined in the script tareas.py, run the command:

PYTHONPATH=. luigi --module tareas TareaX --local-scheduler

You can extend the code and add whatever tasks you like!

Owner

Rodolfo Ferro