Integrate bus data from a variety of sources (batch processing and real time processing).

Last update: Nov 25, 2021

Related tags

Data Analysis bus_data_ingestion_pipeline

Overview

Purpose: This is integrate bus data from a variety of sources such as: csv, json api, sensor data ... into Relational Database (batch processing and real time processing)

Technique:

Python
Application: Kafka, MQTT Explorer, Grafana, Influxdb, MS VS Studio 2019, MS SQL Server, PowerBI Desktop
Framework: kafka-python, numpy, paho-mqtt, pandas, pyodbc, pyspark
Database: sql -- install MS SQL Server
Evironment: window 10 64bit
Editor: cmd

Workflow:

Import raw data offline from csv, txt file source into DataLake (stored in MS SQL Server) with python. Then ETL (Extract Transform Load) data from DataLake into Data Warehouse with SSIS (SQL Server Integration Services).
Setup schedule for pipeline ETL.
Modeling and Visualization from DWH.
Crawl the online General Transport Feed Spec (GTFS) file into JSON file. Convert from Protobuf to JSON file or CSV then save it to my database with python and kafka streaming. Source: https://developer.nationaltransport.ie/
Streaming and draw the data into the dashboard to show the performance by sensor data with paho-mqtt (or kafka-python) and BI tool Grafana.

Output:

Data pipeline from data sources into target data.
Data stored in Data warehouse for analysis.
Raw data from Crawl the online General Transport Feed Spec.
Real-time dashboard with streaming processing.

Next Step:

Analysis data in DWH
Build Real-time dashboard for raw data from Crawl the online General Transport Feed Spec.

Owner

GitHub Repository

This repository contains some analysis of possible nerdle answers

Nerdle Analysis https://nerdlegame.com/ This repository contains some analysis of possible nerdle answers. Here's a quick overview: nerdle.py contains

0 Dec 16, 2022

yt is an open-source, permissively-licensed Python library for analyzing and visualizing volumetric data.

The yt Project yt is an open-source, permissively-licensed Python library for analyzing and visualizing volumetric data. yt supports structured, varia

367 Dec 25, 2022

Larch: Applications and Python Library for Data Analysis of X-ray Absorption Spectroscopy (XAS, XANES, XAFS, EXAFS), X-ray Fluorescence (XRF) Spectroscopy and Imaging

Larch: Data Analysis Tools for X-ray Spectroscopy and More Documentation: http://xraypy.github.io/xraylarch Code: http://github.com/xraypy/xraylarch L

95 Dec 13, 2022

Projeto para realizar o RPA Challenge . Utilizando Python e as bibliotecas Selenium e Pandas.

RPA Challenge in Python Projeto para realizar o RPA Challenge (www.rpachallenge.com), utilizando Python. O objetivo deste desafio é criar um fluxo de

1 Apr 12, 2022

Weather Image Recognition - Python weather application using series of data

Weather Image Recognition - Python weather application using series of data

1 Feb 04, 2022

Spectacular AI SDK fuses data from cameras and IMU sensors and outputs an accurate 6-degree-of-freedom pose of a device.

Spectacular AI SDK examples Spectacular AI SDK fuses data from cameras and IMU sensors (accelerometer and gyroscope) and outputs an accurate 6-degree-

94 Jan 04, 2023

Data Analytics: Modeling and Studying data relating to climate change and adoption of electric vehicles

Correlation-Study-Climate-Change-EV-Adoption Data Analytics: Modeling and Studying data relating to climate change and adoption of electric vehicles I

1 Jan 03, 2022

Stochastic Gradient Trees implementation in Python

Stochastic Gradient Trees - Python Stochastic Gradient Trees1 by Henry Gouk, Bernhard Pfahringer, and Eibe Frank implementation in Python. Based on th

2 Nov 18, 2022

CSV database for chihuahua (HUAHUA) blockchain transactions

super-fiesta Shamelessly ripped components from https://github.com/hodgerpodger/staketaxcsv - Thanks for doing all the hard work. This code does only

1 Jan 07, 2022

A data structure that extends pyspark.sql.DataFrame with metadata information.

MetaFrame A data structure that extends pyspark.sql.DataFrame with metadata info

8 Feb 15, 2022

An Indexer that works out-of-the-box when you have less than 100K stored Documents

U100KIndexer An Indexer that works out-of-the-box when you have less than 100K stored Documents. U100K means under 100K. At 100K stored Documents with

7 Mar 15, 2022

Exploratory data analysis

Exploratory data analysis An Exploratory data analysis APP TAPIWA CHAMBOKO 🚀 About Me I'm a full stack developer experienced in deploying artificial

1 Nov 07, 2021

Used for data processing in machine learning, and help us to construct ML model more easily from scratch

Used for data processing in machine learning, and help us to construct ML model more easily from scratch. Can be used in linear model, logistic regression model, and decision tree.

0 Jul 05, 2022

Wafer Fault Detection - Wafer circleci with python

Wafer Fault Detection Problem Statement: Wafer (In electronics), also called a slice or substrate, is a thin slice of semiconductor, such as a crystal

14 Nov 21, 2022

🧪 Panel-Chemistry - exploratory data analysis and build powerful data and viz tools within the domain of Chemistry using Python and HoloViz Panel.

🧪📈 🐍. The purpose of the panel-chemistry project is to make it really easy for you to do DATA ANALYSIS and build powerful DATA AND VIZ APPLICATIONS within the domain of Chemistry using using Python a

97 Dec 08, 2022

wikirepo is a Python package that provides a framework to easily source and leverage standardized Wikidata information

Python based Wikidata framework for easy dataframe extraction wikirepo is a Python package that provides a framework to easily source and leverage sta

35 Jan 04, 2023

Exploring the Top ML and DL GitHub Repositories

This repository contains my work related to my project where I scraped data on the most popular machine learning and deep learning GitHub repositories in order to further visualize and analyze it.

17 Aug 21, 2022

VHub - An API that permits uploading of vulnerability datasets and return of the serialized data

VHub - An API that permits uploading of vulnerability datasets and return of the serialized data

2 Feb 14, 2022

A DSL for data-driven computational pipelines

"Dataflow variables are spectacularly expressive in concurrent programming" Henri E. Bal , Jennifer G. Steiner , Andrew S. Tanenbaum Quick overview Ne

1.9k Jan 03, 2023

PyClustering is a Python, C++ data mining library.

pyclustering is a Python, C++ data mining library (clustering algorithm, oscillatory networks, neural networks). The library provides Python and C++ implementations (C++ pyclustering library) of each

1k Jan 05, 2023