Github scraper app is used to scrape data for a specific user profile created using streamlit and BeautifulSoup python packages

Last update: Apr 05, 2022

Related tags

Web Crawling github-scraper-app

Overview

Github Scraper

Github scraper app is used to scrape data for a specific user profile.
Github scraper app gets a github profile name and check whether the given user name is exists or not.
If the user name exists, app will scrape the data from that github profile.
If the user name doesn't exists, app displays a info message.
You can download the scraped data in CSV,JSON and pandas profiling HTML report formats.

Installation :-

To install all necessary requirement packages for the app 👇

pip install -r requirements.txt

Packages Used :-

import requests
import pandas as pd
import streamlit as st
from bs4 import BeautifulSoup
from pandas_profiling import ProfileReport
from streamlit_pandas_profiling import st_profile_report

Function To Scrape the Data :-

def ScrapeData(user_name):
    url = "https://github.com/{}?tab=repositories".format(user_name)
    page = requests.get(url) 
    soup = BeautifulSoup(page.content, "html.parser")
    info = {"name": soup.find(class_="vcard-fullname").get_text()}
    info["image_url"] = soup.find(class_="avatar-user")["src"]
    info["followers"] = (
        soup.select_one("a[href*=followers]").get_text().strip().split("\n")[0]
    )
    info["following"] = (
        soup.select_one("a[href*=following]").get_text().strip().split("\n")[0]
    )

    try:
        info["location"] = soup.select_one("li[itemprop*=home]").get_text().strip()
    except:
        info["location"] = ""

    try:
        info["url"] = soup.select_one("li[itemprop*=url]").get_text().strip()
    except:
        info["url"] = ""

    repositories = soup.find_all(class_="source")
    repo_info = []
    for repo in repositories:
        try:
            name = repo.select_one("a[itemprop*=codeRepository]").get_text().strip()
            link = "https://github.com/{}/{}".format(user_name, name)
        except:
            name = ""
            link = ""
            
        try:
            updated = repo.find("relative-time").get_text()
        except:
            updated = ""

        try:
            language = repo.select_one("span[itemprop*=programmingLanguage]").get_text()
        except:
            language = ""

        try:
            description = repo.select_one("p[itemprop*=description]").get_text().strip()
        except:
            description = ""

        repo_info.append(
            {
                "name": name,
                "link": link,
                "updated ": updated,
                "language": language,
                "description": description,
            }
        )
    repo_info = pd.DataFrame(repo_info)
    return info, repo_info

Github scraper app is used to scrape data for a specific user profile created using streamlit and BeautifulSoup python packages

Related tags

Overview

Github Scraper

Installation :-

Packages Used :-

Function To Scrape the Data :-

Demo GIF Image 👇 :-

Owner

Siva Prakash

A multithreaded tool for searching and downloading images from popular search engines. It is straightforward to set up and run!

A Powerful Spider(Web Crawler) System in Python.

Scraping web pages to get data

This is a web crawler that works on employ email data by gmane.org and visualizes it in different ways.

An arxiv spider

Bigdata - This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster

Scrapes the Sun Life of Canada Philippines web site for historical prices of their investment funds and then saves them as CSV files.

A simple reddit scraper to get memes (only images) from r/ProgrammerHumor.

A simple flask application to scrape gogoanime website.

Footballmapies - Football mapies for learning webscraping and use of gmplot module in python

京东云无线宝积分推送，支持查看多设备积分使用情况

Lovely Scrapper

Python based Web Scraper which can discover javascript files and parse them for juicy information (API keys, IP's, Hidden Paths etc)

A web crawler for recording posts in "sina weibo"

An utility library to scrape data from TikTok, Instagram, Twitch, Youtube, Twitter or Reddit in one line!

Automated Linkedin bot that will improve your visibility and increase your network.

A web scraper for nomadlist.com, made to avoid website restrictions.

A universal package of scraper scripts for humans

A package that provides you Latest Cyber/Hacker News from website using Web-Scraping.

feapder 是一款简单、快速、轻量级的爬虫框架。以开发快速、抓取快速、使用简单、功能强大为宗旨。支持分布式爬虫、批次爬虫、多模板爬虫，以及完善的爬虫报警机制。