A universal package of scraper scripts for humans


Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. Sponsors
  6. License
  7. Contact
  8. Acknowledgements

About The Project

Scrapera is a completely Chromedriver-free package that provides scraper scripts for the most commonly used machine learning and data science domains. It scrapes directly and asynchronously from public API endpoints, which removes the heavy browser overhead and makes Scrapera fast and robust to DOM changes. Currently, Scrapera supports the following crawlers:

  • Images
  • Text
  • Audio
  • Videos
  • Miscellaneous

  • The main aim of this package is to collect common scraping tasks in one place, so that ML researchers and engineers can focus on their models rather than on the data collection process.
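To illustrate what scraping asynchronously from API endpoints looks like in general, here is a minimal sketch using only the standard library. It is not Scrapera's actual implementation: `fetch_endpoint`, `fetch_all`, the `limit` parameter, and the example URLs are all hypothetical, and the network call is simulated with a short sleep so the snippet runs offline.

```python
import asyncio


async def fetch_endpoint(url: str) -> dict:
    """Hypothetical stand-in for an HTTP request to a public API endpoint."""
    await asyncio.sleep(0.01)  # simulates network latency; no real request is made
    return {"url": url, "status": "ok"}


async def fetch_all(urls, limit=5):
    """Fetch all URLs concurrently, with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        # The semaphore caps concurrency so an endpoint is not flooded.
        async with semaphore:
            return await fetch_endpoint(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


if __name__ == "__main__":
    urls = [f"https://example.com/api/items/{i}" for i in range(3)]
    for result in asyncio.run(fetch_all(urls)):
        print(result)
```

Because no browser is started, the cost per request is limited to the request itself, which is the overhead saving described above.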

    DISCLAIMER: The owner and contributors do not take any responsibility for misuse of data obtained through Scrapera. Contact the owner if any module provided by Scrapera violates copyright terms.

    Prerequisites

    Prerequisites can be installed separately from the requirements.txt file:

    pip install -r requirements.txt

    Installation

    Scrapera is built with Python 3 and can be installed directly with pip:

    pip install scrapera

    Alternatively, to install the latest version directly from GitHub, run:

    pip install git+https://github.com/DarshanDeshpande/Scrapera.git

    Usage

    To use any sub-module, import it, instantiate the scraper, and call it:

    from scrapera.video.vimeo import VimeoScraper
    scraper = VimeoScraper()
    scraper.scrape('https://vimeo.com/191955190', '540p')

    For more examples, refer to the test folders in the respective modules.
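The import, instantiate, execute pattern above extends naturally to several URLs. The sketch below assumes only the `scrape(url, quality)` call shown in the usage example; `scrape_many` and `_StubScraper` are hypothetical names introduced for illustration, and since the exceptions a scraper may raise are not documented here, failures are caught broadly.

```python
def scrape_many(scraper, urls, quality):
    """Call scraper.scrape(url, quality) for each URL and collect failures.

    Returns a list of (url, exception) pairs for the URLs that failed.
    """
    failed = []
    for url in urls:
        try:
            scraper.scrape(url, quality)
        except Exception as exc:  # exception types are an assumption; catch broadly
            failed.append((url, exc))
    return failed


class _StubScraper:
    """Stand-in used only to exercise scrape_many without network access."""

    def __init__(self):
        self.calls = []

    def scrape(self, url, quality):
        if "bad" in url:
            raise ValueError("simulated failure")
        self.calls.append((url, quality))


stub = _StubScraper()
failed = scrape_many(
    stub,
    ["https://example.com/ok", "https://example.com/bad"],
    "540p",
)
print(len(stub.calls), "succeeded;", len(failed), "failed")
```

With Scrapera installed, the stub would be replaced by a real scraper, for example `scrape_many(VimeoScraper(), ['https://vimeo.com/191955190'], '540p')`.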

    Contributing

    Scrapera welcomes all contributions and scraper requests. Please raise an issue if a scraper fails. Feel free to fork the repository and add your own scrapers to help the community!
    For more guidelines, refer to CONTRIBUTING.

    License

    Distributed under the MIT License. See LICENSE for more information.

    Sponsors

    Contact

    Feel free to reach out with any issues or requests related to Scrapera.

    Darshan Deshpande (Owner) - Email | LinkedIn

    Acknowledgements
