A real-time financial data streaming pipeline and visualization platform using Apache Kafka, Cassandra, and Bokeh.

Last update: Sep 07, 2022

Overview

Realtime Financial Market Data Visualization and Analysis

Introduction

This repo contains my project on a real-time stock data pipeline. All the code is written in Python. In this project, I experiment with several data engineering frameworks to build a financial data processing and visualization platform using Apache Kafka, Apache Cassandra, and Bokeh. I use Kafka to stream real-time stock prices and market news, Cassandra to store historical and real-time stock data, and Bokeh for visualization in the web browser. I also wrote a web crawler to scrape companies' financial statements and basic information from Yahoo Finance, and experimented with various economic data APIs.

Architecture

There are currently 3 tabs on the webpage:

Stock: Streaming & Fundamental
- A single stock's candlestick plot, with basic company and financial information
- Real-time S&P 500 price during trading hours (fake data during non-trading hours)
Stock: Comparison
- Prices of 2 user-selected stocks, with their statistical summary and correlation
- 5-, 10-, and 30-day moving averages of the adjusted close price
Economy
- Geomap of various economic data by state
- 4 nationwide economic indicators for comparison
- The most recent market news
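
The moving averages and correlation shown in the Stock: Comparison tab can be illustrated with a small standard-library sketch. This is not the project's own code: the function names `moving_average` and `pearson_correlation` are hypothetical, and computing the correlation directly on price series (rather than on returns) is an assumption.

```python
from collections import deque
from math import sqrt


def moving_average(prices, window):
    """Simple moving average; entries are None until `window` values exist."""
    if window <= 0:
        raise ValueError("window must be positive")
    buf = deque(maxlen=window)
    total = 0.0
    averages = []
    for price in prices:
        if len(buf) == window:
            total -= buf[0]  # oldest value is evicted by the append below
        buf.append(price)
        total += price
        averages.append(total / window if len(buf) == window else None)
    return averages


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

For the three windows listed above, one could call `{w: moving_average(adj_close, w) for w in (5, 10, 30)}`, where `adj_close` is a list of adjusted close prices ordered by date.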

Here is the architecture of the platform.

How Stock Data is Streamed via Kafka to Cassandra:
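
As a rough sketch of this flow, the snippet below serializes a price tick to JSON for Kafka and turns a consumed message into a row for Cassandra. It assumes the `kafka-python` and `cassandra-driver` packages; the topic `stock_ticks`, keyspace `stocks`, table `tick`, its columns, and all function names are hypothetical, since the project's actual schema is not described here.

```python
import json
from datetime import datetime, timezone

TOPIC = "stock_ticks"   # hypothetical topic name
KEYSPACE = "stocks"     # hypothetical keyspace name
# Hypothetical table and columns; prepared statements use "?" placeholders.
INSERT_CQL = "INSERT INTO tick (symbol, ts, price, volume) VALUES (?, ?, ?, ?)"


def encode_tick(symbol, ts, price, volume):
    """Serialize one tick to UTF-8 JSON bytes for use as a Kafka message value."""
    payload = {"symbol": symbol, "ts": ts.isoformat(),
               "price": price, "volume": volume}
    return json.dumps(payload).encode("utf-8")


def decode_tick(raw):
    """Turn a Kafka message value back into a row tuple matching INSERT_CQL."""
    data = json.loads(raw.decode("utf-8"))
    return (data["symbol"], datetime.fromisoformat(data["ts"]),
            float(data["price"]), int(data["volume"]))


def publish_tick(producer, symbol, ts, price, volume):
    """Send one tick; `producer` is expected to be a kafka.KafkaProducer."""
    producer.send(TOPIC, encode_tick(symbol, ts, price, volume))


def consume_into_cassandra(bootstrap="localhost:9092",
                           contact_points=("127.0.0.1",)):
    """Read ticks from Kafka and insert each one into Cassandra."""
    # Imported lazily so the helpers above work without these packages.
    from kafka import KafkaConsumer
    from cassandra.cluster import Cluster

    session = Cluster(list(contact_points)).connect(KEYSPACE)
    insert = session.prepare(INSERT_CQL)
    consumer = KafkaConsumer(TOPIC, bootstrap_servers=bootstrap)
    for message in consumer:  # blocks and yields messages as they arrive
        session.execute(insert, decode_tick(message.value))
```

Keeping the encode and decode steps as plain functions makes the message format checkable without a running broker or cluster; `consume_into_cassandra` needs both services to be reachable.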

Please check each tab's screenshot:

Tab 1:

Tab 2:

Tab 3:
