A web crawler for recording posts in "sina weibo"

Last update: Aug 20, 2022

Overview

Web Crawler for "sina weibo"

Introduction

This project helps collect the attributes of posts on "sina weibo". Posts can be recorded from different lists (also called flows or collections), such as search results. The supported lists are described in the "Functions" section.

Functions

Scripts currently available:

Name Description

search.py Searches for a string within a specified time interval and records every post in the search results.
Parameters (edit these at the head of the script):
search_string: The string to search for. All posts containing this string are recorded, up to a maximum of 50 pages of results.
start_time: Only posts published after this time are recorded (hour-level precision).
end_time: Only posts published before this time are recorded (hour-level precision).
rest_time: The interval between two consecutive requests, in seconds.
Results are saved in Python pickle format at results/weibo-{search_string}-{start_time}-{end_time}.pkl. The start_time and end_time in the filename are Unix timestamps in seconds.
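
As a rough illustration of how these parameters fit together, the sketch below builds the output path described above and paces a loop over result pages. It assumes start_time and end_time are timezone-aware datetime values (the time format the script actually expects is not documented here). The names result_path, crawl_pages, and fetch_page are hypothetical and are not taken from search.py.

```python
import time
from datetime import datetime, timezone
from pathlib import Path

MAX_PAGES = 50  # the search records at most 50 pages of results


def result_path(search_string, start_time, end_time, root="results"):
    """Build results/weibo-{search_string}-{start_time}-{end_time}.pkl."""
    # The filename stores both times as Unix timestamps in whole seconds.
    start_ts = int(start_time.timestamp())
    end_ts = int(end_time.timestamp())
    return Path(root) / f"weibo-{search_string}-{start_ts}-{end_ts}.pkl"


def crawl_pages(fetch_page, rest_time, max_pages=MAX_PAGES):
    """Call fetch_page(page) for pages 1..max_pages, pausing between requests.

    fetch_page is a hypothetical callable that returns a list of posts
    for one result page.
    """
    posts = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:  # assumed stop condition: an empty page ends the search
            break
        posts.extend(batch)
        time.sleep(rest_time)  # rest_time is given in seconds
    return posts
```

Keeping rest_time at a second or more reduces the chance of requests being throttled; the right value depends on the site's current limits.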

Installation

Run pip install -r requirements.txt.
Find the script you need in the "Functions" section.
Edit parameters at the head of the script.
Run the script with Python.
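
After a run, the saved results can be read back with the standard pickle module. This is a minimal sketch: load_results is a hypothetical helper, and the structure of the stored object is not specified in this README, so inspect it before relying on any particular layout. Only unpickle files you produced yourself, since loading a pickle can execute arbitrary code.

```python
import pickle
from pathlib import Path


def load_results(path):
    """Return the object stored in a results .pkl file."""
    with Path(path).open("rb") as f:
        return pickle.load(f)
```

Pass the path of the file written under results/ by the script, then check the type of the returned object to see how the posts are stored.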
