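A collection of web-scraping examples, including but not limited to: Taobao, JD.com, Tmall, Douban, Douyin, Kuaishou, Weibo, WeChat, Alibaba, Toutiao, Pinduoduo (pdd), Youku, iQiyi, Ctrip, 12306, 58.com, Sohu, Baidu Index, VIP (Weipu) and Wanfang, Z-Library, Oalib, novel sites, tender sites, procurement sites, and Xiaohongshu.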

Last update: Jan 05, 2023

Overview

lxSpider


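Introduction:

Time flies, and I have lost count of how many examples I have written. The author's articles are published on CSDN, and the code is then updated on GitHub. Some of the CSDN articles are paid examples; subscribe as you see fit.

Disclaimer:

This repository is intended for teaching; what it provides must not be used for any commercial purpose or in any illegal or non-compliant scenario.
The author accepts no liability for any loss or harm of any kind to users or others arising, for any reason, from use of the code and strategies provided in this repository.
Any dispute arising from or related to this repository should be resolved through friendly negotiation between the parties; the author bears no responsibility for any consequences if negotiation fails.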

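Columns

Web scraping basics: for readers who know basic Python syntax and are getting ready to learn scraping

Web reverse-engineering basics: prior scraping experience is enough (includes walkthroughs of the Yuanrenxue scraping challenges)

Android reverse-engineering basics: tool introductions, reverse-engineering notes, case studies

Scraping example collection: paid column, classic examples, continuously updated

Blog

Contact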

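Releases (Kuaishou live-comment collection tool)

Kuaishou live-comment (danmaku) collection tool (Jan 30, 2021)
Usage:

1. Start run.exe in the dist directory.

2. Enter the streamer uid, your cookie, and the room id.

3. Click Start and wait. Do not click it again.

4. Make sure the streamer is still live.

Getting the parameters:

Streamer uid: the last path segment of the URL shown in the browser.

For example, if the URL is: https://live.kuaishou.com/u/yingjia2019

the streamer uid is: yingjia2019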
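
The uid rule above can be sketched in Python. This is only an illustration of "take the last path segment of the URL"; the function name `extract_uid` is hypothetical and is not part of the released tool.

```python
from urllib.parse import urlparse


def extract_uid(live_url: str) -> str:
    """Return the last non-empty path segment of a live-room URL."""
    path = urlparse(live_url).path
    # Drop empty pieces so a trailing slash does not yield an empty uid.
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        raise ValueError(f"no path segment found in {live_url!r}")
    return segments[-1]


if __name__ == "__main__":
    print(extract_uid("https://live.kuaishou.com/u/yingjia2019"))
```

Using `urlparse` rather than a plain string split means a query string or fragment on the URL does not end up in the uid.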

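Your cookie:

1. Open the developer tools: right-click and choose Inspect, or press F12.

2. Click the Network tab.

3. Refresh the page; you can press F5.

4. Find the HTML document whose name matches the streamer uid, then click Headers on the right.

5. Scroll to the bottom and find the cookie line. Copy the did=web_xxxxxxxxxxxxxx; part.

6. The cookie to enter in the tool is web_xxxxxxxxxxxxxx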
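
If you copy the whole cookie line instead of picking out the did by hand, a small helper can isolate the value. This is a minimal sketch, assuming the header is the usual `name=value; name=value` format; `extract_did` is a hypothetical name, and the sample cookie in the usage line is made up for illustration.

```python
def extract_did(cookie_header: str) -> str:
    """Return the value of the `did` cookie from a raw Cookie header string."""
    for part in cookie_header.split(";"):
        # Split each pair on the first "=" only, since values may contain "=".
        name, separator, value = part.strip().partition("=")
        if separator and name == "did":
            return value
    raise KeyError("did cookie not found")


if __name__ == "__main__":
    # Made-up sample header; a real one comes from the Network tab.
    sample = "clientid=3; did=web_xxxxxxxxxxxxxx; kpn=EXAMPLE"
    print(extract_did(sample))
```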

Room id:

1. Click the Elements tab in the developer tools, press Ctrl+F to open the search box, and enter: live-stream-id

2. Copy live-stream-id="Zo9Upaz8w90"

3. The room id to enter is Zo9Upaz8w90
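
The same lookup can be done on saved page source with a regular expression. This is a sketch only: it assumes the attribute appears in the HTML exactly as `live-stream-id="..."`, as in the steps above, and `extract_room_id` is a hypothetical helper, not a function of the tool.

```python
import re

# Matches live-stream-id="VALUE" and captures VALUE.
LIVE_STREAM_ID = re.compile(r'live-stream-id="([^"]+)"')


def extract_room_id(page_html: str) -> str:
    """Return the first live-stream-id attribute value found in the HTML."""
    match = LIVE_STREAM_ID.search(page_html)
    if match is None:
        raise ValueError("live-stream-id attribute not found")
    return match.group(1)


if __name__ == "__main__":
    # Illustrative fragment; the real page markup will be much larger.
    snippet = '<div class="player" live-stream-id="Zo9Upaz8w90"></div>'
    print(extract_room_id(snippet))
```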

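Keep the page open while the tool is running; some time after the page is closed, the cookie expires.

This tool is intended for learning; do not abuse it.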
Source code(tar.gz)
Source code(zip)
default.rar(21.47 MB)
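Novel downloader (Feb 2, 2021)
Introduction

1. Novel download (advantage: fast, because complete txt files are collected directly from the web)
2. Online novel scraping (advantage: broad coverage; almost every published novel can be found)

Special statement:

These scripts are for testing, study, and research only; commercial use is prohibited. Their legality, accuracy, completeness, and validity cannot be guaranteed; judge for yourself according to your circumstances.

No WeChat official account or self-media outlet may repost or publish any resource file in this project in any form.

No responsibility is accepted for any problem with any script in this project, including but not limited to any loss or damage caused by script errors.

Do not use any content of this project for commercial or illegal purposes; otherwise you bear the consequences yourself.

This project follows the GPL-3.0 License. Where this special statement conflicts with the GPL-3.0 License, this special statement prevails.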

Source code(tar.gz)
Source code(zip)
default.zip(44.16 MB)

Owner

lx

Every noble work is at first impossible.
