This is a webscraper for a specific website

Last update: Dec 13, 2021

Overview

Web-Scraper-for-a-news-website

This is a webscraper for a specific website (Economic Times). It is tuned to extract the site's headlines. With minor adjustments, it can extract any other part of the website.
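To illustrate what "minor adjustments" means, here is a minimal sketch of headline extraction. It is not the code in this repository. It uses Python's standard-library `html.parser` instead of BeautifulSoup so it runs without extra dependencies, and the markup it targets (headlines as `<a>` links inside a `<ul class="content">` list) is a hypothetical assumption, not the actual Economic Times page structure. Retargeting the scraper amounts to changing which container tag and class the parser matches.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text of <a> tags nested inside a container element.

    The default container ("ul" with class "content") is hypothetical;
    pass a different tag/class to target another part of the page.
    """

    def __init__(self, container_tag="ul", container_class="content"):
        super().__init__()
        self.container_tag = container_tag
        self.container_class = container_class
        self.depth = 0          # > 0 while inside the matching container
        self.in_link = False    # True while inside an <a> within the container
        self.headlines = []
        self._link_text = []

    def handle_starttag(self, tag, attrs):
        if tag == self.container_tag:
            classes = (dict(attrs).get("class") or "").split()
            # Enter the container, or track nesting of the same tag inside it.
            if self.depth or self.container_class in classes:
                self.depth += 1
        elif tag == "a" and self.depth:
            self.in_link = True
            self._link_text = []

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            # Collapse internal whitespace and line breaks into single spaces.
            text = " ".join("".join(self._link_text).split())
            if text:
                self.headlines.append(text)
            self.in_link = False
        elif tag == self.container_tag and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.in_link:
            self._link_text.append(data)


def extract_headlines(html):
    """Return the link texts found inside the (hypothetical) headline container."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.headlines
```

Links outside the container (navigation, ads) are ignored, which is the kind of filtering a real headline selector has to do.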

Installation

Install the following:

Selenium: Follow the instructions at https://selenium-python.readthedocs.io/installation.html to install Selenium.
Chromedriver: Check your Chrome browser's version (Menu -> Help -> About Google Chrome) and download the matching Chromedriver from https://sites.google.com/chromium.org/driver/home
TQDM: https://pypi.org/project/tqdm/
BeautifulSoup4: https://pypi.org/project/beautifulsoup4/
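Before running the scripts, it can help to confirm that the requirements above are in place. The sketch below is not part of the repository. It checks that the Python packages are importable under their module names (`selenium`, `tqdm`, and `bs4` for BeautifulSoup4), looks for a `chromedriver` executable on the `PATH`, and prints the driver's major version so you can compare it with the Chrome version shown under Help -> About Google Chrome.

```python
import importlib.util
import re
import shutil
import subprocess

# Import names of the required packages; BeautifulSoup4 is imported as "bs4".
REQUIRED_MODULES = ("selenium", "tqdm", "bs4")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def chromedriver_path(executable="chromedriver"):
    """Return the full path of the driver executable, or None if it is not on PATH."""
    return shutil.which(executable)


def major_version(version_string):
    """Extract the major version from text like 'ChromeDriver 96.0.4664.45 (...)'."""
    match = re.search(r"(\d+)\.\d+", version_string)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing Python packages:", ", ".join(missing))
    else:
        print("All required Python packages are installed.")

    driver = chromedriver_path()
    if driver is None:
        print("chromedriver: not found on PATH")
    else:
        result = subprocess.run([driver, "--version"], capture_output=True, text=True)
        print("chromedriver major version:", major_version(result.stdout))
```

This only checks presence and the major version number. Recent Selenium releases (4.6 and later) can also locate or download a matching driver through Selenium Manager, so a missing `chromedriver` on the `PATH` is not necessarily fatal there.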

Using the webscraper

The scripts must be run in the following order:

ET_Archive_Links.py: Run this script first; everything that follows depends on its output. It collects the initial links from the website's Archive page.
ET_All_Links_Inside_Archive.py: This script takes the output (CSV file) of the previous script and produces a new file containing the URLs of all news articles archived on the website since 2002.
ET_Content.py: Finally, this script scrapes the headlines along with their dates. (If you want to scrape any other part of the website, this is the script to edit.)
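The order matters because each stage reads the file written by the stage before it. The sketch below models that hand-off with the standard-library `csv` module. It is a hypothetical illustration, not the repository's code: the file names, the `url`/`date`/`headline` column names, and the three `fetch_*` callables are placeholders standing in for the Selenium and BeautifulSoup logic of the actual scripts, and it assumes every intermediate file is a CSV.

```python
import csv
import tempfile
from pathlib import Path


def write_column(path, header, values):
    """Write one value per row under a single header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([header])
        writer.writerows([value] for value in values)


def read_column(path, header):
    """Read back the values of one named column."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row[header] for row in csv.DictReader(f)]


def run_pipeline(workdir, fetch_archive_links, fetch_article_urls, fetch_headline):
    """Run the three stages in order; each stage reads the previous stage's file.

    fetch_archive_links() -> list of archive-page URLs       (stage 1)
    fetch_article_urls(archive_url) -> list of article URLs  (stage 2)
    fetch_headline(article_url) -> (date, headline)          (stage 3)
    """
    workdir = Path(workdir)
    archive_csv = workdir / "archive_links.csv"    # hypothetical file names
    articles_csv = workdir / "article_links.csv"
    headlines_csv = workdir / "headlines.csv"

    # Stage 1 (cf. ET_Archive_Links.py): collect links from the Archive page.
    write_column(archive_csv, "url", fetch_archive_links())

    # Stage 2 (cf. ET_All_Links_Inside_Archive.py): expand every archive link.
    article_urls = []
    for archive_url in read_column(archive_csv, "url"):
        article_urls.extend(fetch_article_urls(archive_url))
    write_column(articles_csv, "url", article_urls)

    # Stage 3 (cf. ET_Content.py): scrape the date and headline of each article.
    with open(headlines_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["date", "headline"])
        for url in read_column(articles_csv, "url"):
            writer.writerow(fetch_headline(url))
    return headlines_csv


if __name__ == "__main__":
    # Demo with stub fetchers, so no browser or network access is needed.
    with tempfile.TemporaryDirectory() as tmp:
        out = run_pipeline(
            tmp,
            fetch_archive_links=lambda: ["archive/1", "archive/2"],
            fetch_article_urls=lambda a: [a + "/story-a", a + "/story-b"],
            fetch_headline=lambda u: ("2021-12-13", "Headline for " + u),
        )
        print(out.read_text(encoding="utf-8"))
```

Passing the fetch logic in as parameters keeps the CSV hand-off testable without a browser. Because the real project splits the stages into separate scripts, a later stage can be rerun from the previous stage's saved file without repeating the earlier scraping.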

Dataset

I used the scraper on another news website, Businessline. Its dataset is available on Kaggle (https://www.kaggle.com/rsiyanwal/20182019-businessline-headlines).

Owner

Rahul Siyanwal

Related projects

This code will be able to scrape movies from a movie website and also provide download links to newly uploaded movies.

Shopee Scraper - A web scraper in Python that extracts sales, price, available stock, location, and more for a given seller in Brazil

Example of scraping a paginated API endpoint and dumping the data into a DB

Pyrics is a tool to scrape lyrics, get rhymes, and generate relevant lyrics with rhymes.

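Web-based automatic check-in built with Python + Selenium, plus daily email sending, the Kingsoft PowerWord (金山词霸) daily sentence, and a daily "toxic chicken soup" quote (running stably since February)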

IGLS - Instagram Like Scraper CLI tool

A Python script to scrape company overviews and reviews from Glassdoor.

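Latest optimized version of a Taobao Moutai rush-buying script: flash-sale purchasing of Moutai on Taobao, with an optimized purchasing thread queue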

The open-source web scrapers that feed the Los Angeles Times California coronavirus tracker.

API that uses Discord to scrape NameMC searches, drop times, and dropping status of Minecraft names

Ready-to-use scheduled HITsz epidemic health-reporting script based on GitHub Actions

Web Scraping Framework

Simple Python tool for swapping Latin letters with Cyrillic ones, and vice versa, in Serbian-language txt, docx, and pdf files

Automatic off-campus check-in for Henan University of Technology via Perfect Campus (完美校园)

Subscrape - A Python scraper for substrate chains

Linkedin webscraping - Linkedin web scraping with python

A training task for web scraping using python multithreading and a real-time-updated list of available proxy servers.

Facebook Group Scraping Using Beautiful Soup & Selenium

Web Crawlers for Data Labelling of Malicious Domain Detection & IP Reputation Evaluation
