A universal package of scraper scripts for humans

Last update: Dec 15, 2022

Table of Contents

About The Project
Getting Started
- Prerequisites
- Installation
Usage
Contributing
Sponsors
License
Contact
Acknowledgements

About The Project

Scrapera is a completely Chromedriver-free package that provides a variety of scraper scripts for the most commonly used machine learning and data science domains. It scrapes directly and asynchronously from public API endpoints, which removes the heavy browser overhead and makes Scrapera fast and robust to DOM changes. Currently, Scrapera supports the following crawlers:

Images

Text

Audio

Youtube Playlist Scraper

Videos

Miscellaneous

Yahoo Stocks Scraper

The main aim of this package is to group common scraping tasks in one place, so that ML researchers and engineers can focus on their models rather than on the data collection process.
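To illustrate the idea of scraping public API endpoints asynchronously instead of driving a browser, here is a minimal sketch using only the Python standard library. It is not Scrapera's actual implementation: the names `fetch_json` and `fetch_all` are hypothetical, and the sketch assumes every endpoint returns a JSON body.

```python
import asyncio
import json
import urllib.request
from typing import Awaitable, Callable, Dict, List


def _blocking_get_json(url: str) -> dict:
    """Download one URL and decode its body as JSON (blocking call)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


async def fetch_json(url: str) -> dict:
    """Run the blocking download in a worker thread so requests can overlap."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, _blocking_get_json, url)


async def fetch_all(
    urls: List[str],
    fetch: Callable[[str], Awaitable[dict]] = fetch_json,
) -> Dict[str, dict]:
    """Fetch every URL concurrently and map each URL to its decoded payload.

    `fetch` is injectable (an assumption of this sketch) so the function can
    be exercised without network access.
    """
    results = await asyncio.gather(*(fetch(url) for url in urls))
    return dict(zip(urls, results))
```

Calling `asyncio.run(fetch_all([...]))` with a list of endpoint URLs returns a dictionary keyed by URL. Because no browser is started, the cost per request is roughly that of a plain HTTP call.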

DISCLAIMER: The owner and contributors do not take any responsibility for misuse of data obtained through Scrapera. Contact the owner if copyright terms are violated by any module provided by Scrapera.

Prerequisites

Prerequisites can be installed separately through the requirements.txt file:

pip install -r requirements.txt

Installation

Scrapera is built with Python 3 and can be installed directly with pip:

pip install scrapera

Alternatively, to install the latest version directly from GitHub, run:

pip install git+https://github.com/DarshanDeshpande/Scrapera.git
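After either install command, a quick way to confirm that the package is importable is to ask Python whether it can locate it. The helper name `is_installed` below is hypothetical and only wraps the standard library's `importlib.util.find_spec`; it is a sketch, not part of Scrapera.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if Python can locate a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Reports whether the `scrapera` package is visible to this interpreter.
    print("scrapera installed:", is_installed("scrapera"))
```

If this reports `False` right after installing, the most likely cause is that pip installed into a different Python environment than the one running the check.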

Usage

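To use any sub-module, import it, instantiate the scraper, and call its scrape method:

# Import the scraper for the target site
from scrapera.video.vimeo import VimeoScraper

# Instantiate it, then pass the video URL and the requested quality
scraper = VimeoScraper()
scraper.scrape('https://vimeo.com/191955190', '540p')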

For more examples, refer to the test folder inside each module.
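When several URLs need to be scraped, one failing URL should not stop the rest. The sketch below shows one way to do that. `scrape_all` is a hypothetical helper, not part of Scrapera; it only assumes that the scraper object exposes `scrape(url, quality)` as in the Vimeo example above, and that a failure surfaces as a raised exception.

```python
from typing import Iterable, List, Tuple


def scrape_all(
    scraper,
    urls: Iterable[str],
    quality: str = "540p",
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Call scraper.scrape(url, quality) for each URL and record the outcome.

    Returns the URLs that succeeded and (url, error) pairs for those that
    raised, so a single failure does not abort the whole batch.
    """
    succeeded: List[str] = []
    failed: List[Tuple[str, str]] = []
    for url in urls:
        try:
            scraper.scrape(url, quality)
            succeeded.append(url)
        except Exception as exc:  # broad on purpose: error types vary per scraper
            failed.append((url, repr(exc)))
    return succeeded, failed
```

With the example above, this could be called as `scrape_all(VimeoScraper(), ['https://vimeo.com/191955190'])`. The list of failures is a convenient starting point for a bug report.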

Contributing

Scrapera welcomes any and all contributions and scraper requests. Please raise an issue if a scraper fails in any situation. Feel free to fork the repository and add your own scrapers to help the community!
For more guidelines, refer to CONTRIBUTING.

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Feel free to reach out about any issues or requests related to Scrapera:

Darshan Deshpande (Owner) - Email | LinkedIn

Acknowledgements

PyTube
