Pyspark sam - Analyze Big Sequence Alignments with PySpark in AWS EMR

Overview

pyspark_sam

This repo hosts my code for the article "Analyze Big Sequence Alignments with PySpark in AWS EMR".

Prerequisite

  1. Spark

  2. AWS CLI

  3. AWS Account

Run

Follow the instruction in the article. Once you have uploaded the files into your S3 bucket, run

aws emr create-cluster --name "Spark_step_pip" \
    --release-label emr-6.5.0 \
    --applications Name=Spark \
    --log-uri s3://[your_S3_bucket]/logs/ \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --bootstrap-actions Path=s3://[your_S3_bucket]/emr_bootstrap.sh \
    --use-default-roles --auto-terminate \
    --steps "Type=Spark,Name=SparkProgram,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn,--py-files,s3://[your_S3_bucket]/helper_function.py,s3://[your_S3_bucket]/spark_3mer.py,s3://[your_S3_bucket]/test.sam,[your_S3_bucket],sankey.json]" 

When the job finishes, download the sankey.json. And run this command to visualize:

python sankey.py sankey.json

Authors

  • Sixing Huang - Concept and Coding

License

This project is licensed under the MIT License - see the LICENSE file for details

Owner
Sixing Huang
A triple Neo4j certified data scientist. I am currently working at BGI in Shenzhen.
Sixing Huang
Upbit(업비트) Cryptocurrency Exchange OPEN API Client for Python

Base Repository Python Upbit Client Repository Upbit OPEN API Client @Author: uJhin @GitHub: https://github.com/uJhin/upbit-client/ @Officia

Yu Jhin 37 Nov 06, 2022
An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.

TWINT - Twitter Intelligence Tool No authentication. No API. No limits. Twint is an advanced Twitter scraping tool written in Python that allows for s

TWINT Project 14.2k Jan 03, 2023
MSE5050/7050 Materials Informatics course at the University of Utah

MaterialsInformatics MSE5050/7050 Materials Informatics course at the University of Utah This github repo contains coursework content such as class sl

41 Dec 30, 2022
AWS DeepRacer Free Student Workshop: Run faster by using your custom waypoints

AWS DeepRacer Free Student Workshop: Run faster by using your custom waypoints Reward Function Template for waypoints def reward_function(params):

Yuen Cheuk Lam 88 Nov 27, 2022
42-event-notifier - 42 Event notifier using 42API and Github Actions

42 Event Notifier 42서울 Agenda에 새로운 이벤트가 등록되면 알려드립니다! 현재는 Github Issue로 등록되므로 상단

6 May 16, 2022
Project developed as part of a selection process for the company Denox

📝 Tabela de conteúdos Sobre Requisitos para rodar o projeto Instalação Rotas da API Observações 🧐 Sobre Projeto desenvolvido como parte de um proces

Ícaro Sant'Ana 1 Mar 01, 2022
This is a simple program that uses Python and pyTwitchAPI to retrieve the list of users in a streamer's chat and then checks each one of these users to see if they follow the broadcaster or not

This is a simple program that uses Python and pyTwitchAPI to retrieve the list of users in a streamer's chat and then checks each one of these users to see if they follow the broadcaster or not

RwinShow 57 Dec 18, 2022
A code that can make your 5 accounts stay 24/7 in a discord voice channel!

Voicecord A code that can make your 5 accounts stay 24/7 in a discord voice channel! Usage ・Fork the repo ・Clone it to replit ・Install the required pa

DraKenCodeZ 3 Jan 09, 2022
A updated and improved version from the original Discord-Netflix from Nirewen.

Discord-Netflix A updated version from the original Discord-Netflix from nirewen A Netflix wrapper that uses Discord RPC to show what you're watching

Void 42 Jan 02, 2023
ESOLinuxAddonManager - Very simple addon manager for Elder Scrolls Online running on Linux.

ESOLinuxAddonManager Very simple addon manager for Elder Scrolls Online running on Linux. Well, more a downloader for now. Currently it's quite ugly b

Akseli 25 Aug 28, 2022
Python wrapper for CoWin API's

Cowin Tracker Python API wrapper for CoWin, India's digital platform launched by the government to help citizens register themselves for the vaccinati

Saiprasad Balasubramanian 43 Jun 11, 2022
A python package to easy the integration with Direct Online Pay (Mpesa, TigoPesa, AirtelMoney, Card Payments)

A python package to easy the integration with Direct Online Pay (DPO) which easily allow you easily integrate with payment options once without having to deal with each of them individually;

Jordan Kalebu 2 Nov 25, 2021
Threat Intel Platform for T-POTs

T-Pot 20.06 runs on Debian (Stable), is based heavily on docker, docker-compose

Deutsche Telekom Security GmbH 4.3k Jan 07, 2023
A Python library for the Docker Engine API

Docker SDK for Python A Python library for the Docker Engine API. It lets you do anything the docker command does, but from within Python apps – run c

Docker 6.1k Jan 03, 2023
Telegram üzerinden paylaşılan kısa linkleri geçmenin daha hızlı bir yolu

Telegram Url skipper Telegramda paylaşılan kısa linkleri geçmenin daha hızlı bir yolu · Hata Raporla · Öneri Yap İçerik Tablosu Kurulum Kullanım Lisan

WarForPeace 6 Oct 07, 2022
Esse script procura qualquer, dados que você queira na wikipedia! Em breve traremos um com dados em toda a internet.

Buscador de dados simples Dependências necessárias Para você poder começar a utilizar esta ferramenta, você vai precisar da dependência "wikipedia", p

Erick Campoy 4 Feb 24, 2022
A python script to extract information from a Microsoft Remote Desktop Web Access (RDWA) application

This python script allow to extract various information from a Microsoft Remote Desktop Web Access (RDWA) application, such as the FQDN of the remote server, the internal AD domain name (from the FQD

Podalirius 60 Dec 09, 2022
Automate coin farming for dankmemer. Unlimited accounts at once. Uses a proxy

dankmemer-farm Simple script to farm Dankmemer coins with multiple accounts at once. Requires: Proxies, Discord Tokens Disclaimer I don't take respons

Scobra 12 Dec 20, 2022
Telegram PHub Bot using ARQ Api and Pyrogram. This Bot can Download and Send PHub HQ videos in Telegram using ARQ API.

Tg_PHub_Bot Telegram PHub Bot using ARQ Api and Pyrogram. This Bot can Download and Send PHub HQ videos in Telegram using ARQ API. OS Support All linu

TheProgrammerCat 13 Oct 21, 2022
Python app to notify via slack channel the status_code change from an URL

Python app to notify, via slack channel you choose to be notified, for the status_code change from the URL list you setup to be checked every yy seconds

Pedro Nunes 1 Oct 25, 2021