Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

Integrating Oxylabs' Residential Proxies with AIOHTTP

Requirements for the Integration

For the integration to work, you'll need to install the aiohttp library, use Python 3.6 or higher, and have Residential Proxies.
If you don't have the aiohttp library installed, you can install it using the pip command:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials together with the proxy URL using aiohttp.BasicAuth:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{END_POINT}",
                proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())

asyncio.run(fetch())

The second is to embed the authentication credentials directly in the proxy URL:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())

asyncio.run(fetch())

To use your own proxies, replace the user and pass values with your Oxylabs account credentials.
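Rather than hardcoding credentials, you may prefer to load them from environment variables. A minimal sketch (the variable names OXYLABS_USER and OXYLABS_PASSWORD are this guide's assumptions, not an Oxylabs convention):

import os

# Assumed variable names; export them in your shell before running the script.
USER = os.environ["OXYLABS_USER"]
PASSWORD = os.environ["OXYLABS_PASSWORD"]
END_POINT = "pr.oxylabs.io:7777"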

Testing Proxies

To see if the proxy is working, try visiting https://ip.oxylabs.io. If everything is working correctly, it will return the IP address of the proxy that you're currently using.
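To verify this from code rather than a browser, you can request the endpoint once directly and once through the proxy and compare the two addresses. A minimal sketch, reusing the placeholder credentials from the snippets above:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def check_proxy():
    async with aiohttp.ClientSession() as session:
        # Request without a proxy to capture your own IP address.
        async with session.get("http://ip.oxylabs.io") as resp:
            direct_ip = (await resp.text()).strip()
        # The same request routed through the residential proxy.
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            proxy_ip = (await resp.text()).strip()
    # If the proxy is in use, the two addresses should differ.
    print(f"Direct IP: {direct_ip}, proxy IP: {proxy_ip}")

asyncio.run(check_proxy())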

Sample Project: Extracting Data From Multiple Pages

To better understand how residential proxies can be utilized for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation allows us to send multiple requests at once with little risk of running into CAPTCHAs or IP blocks, which makes the web scraping process extremely fast and efficient: you can extract data from thousands of products in a matter of seconds!

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            # Trim the leading "../.." from the relative link; the base URL
            # is prepended before the results are saved.
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            # The second CSS class of the rating <p> holds the star count,
            # e.g. "Three" in <p class="star-rating Three">.
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # A different EventLoopPolicy must be set on Windows (Python 3.8+).
    # This helps to avoid the "Event loop is closed" error.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install some additional packages. To do that, simply download the requirements.txt file and use the pip command:

pip install -r requirements.txt
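The requirements.txt file itself isn't reproduced here, but judging from the script's imports it needs at least the following packages (an unpinned sketch; the actual file may pin specific versions):

aiohttp
beautifulsoup4
lxml
pandas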

If you're having any trouble integrating proxies with aiohttp and this guide didn't help you, feel free to contact Oxylabs customer support at [email protected].
