Generate a repository with mirror links for DriveDroid app

Last update: Nov 19, 2022

Overview

DriveDroid Repository Generator

Generate a repository for DriveDroid, the app that lets you boot a PC from ISO files stored on your Android phone.

See also the official scraper written in JavaScript.

Try Already Built Repo

Add one of the following links to the image repositories in the DriveDroid app:

https://dd.hexed.pw

https://raw.githubusercontent.com/flameshikari/ddrg/master/repo/repo.json

Requirements
Usage
How to Make a Scraper
Misc
Roadmap
Credits
License

Requirements

Python 3.6+ with packages included in requirements.txt.

I recommend creating a venv and installing the packages there.

Usage

python ./src/main.py [-i dir] [-o dir] [-g]

-i dir sets the directory with distro scrapers (default: ./src/distros).

-o dir sets the directory where the built repo will be saved (default: ./build).

-g generates a webpage that presents the content of repo.json.

-h is also available and prints the help message.
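As a rough illustration of how these options fit together, here is a minimal argparse sketch. It is not the project's actual main.py: the function name `build_parser` and the attribute names `input_dir`, `output_dir` and `generate_page` are assumptions made for this example; only the flags and defaults come from the usage text above.

```python
import argparse


def build_parser():
    # Hypothetical parser: flags and defaults mirror the usage text,
    # the dest names are invented for this sketch.
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-i", dest="input_dir", metavar="dir",
                        default="./src/distros",
                        help="directory with distro scrapers")
    parser.add_argument("-o", dest="output_dir", metavar="dir",
                        default="./build",
                        help="directory where the built repo is saved")
    parser.add_argument("-g", dest="generate_page", action="store_true",
                        help="generate a webpage presenting repo.json")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-o", "./out", "-g"])
print(args.input_dir, args.output_dir, args.generate_page)
```

argparse adds -h automatically, so no extra code is needed for the help option.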

How to Make a Scraper

Create a folder in ./src/distros with the following structure:

distro_name
├── info.toml
├── logo.png
└── scraper.py

If distro_name starts with an underscore (e.g. _disabled), the folder is skipped.
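The underscore rule can be pictured with a short sketch. The helper name `enabled_distros` is hypothetical and the generator may discover scrapers differently; this only shows the filtering described above.

```python
from pathlib import Path
import tempfile


def enabled_distros(distros_dir):
    """Return sorted names of subfolders that do not start with an underscore."""
    return sorted(
        entry.name
        for entry in Path(distros_dir).iterdir()
        if entry.is_dir() and not entry.name.startswith("_")
    )


# Demonstrate on a throwaway directory rather than the real ./src/distros.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("arch", "debian", "_disabled"):
        (Path(tmp) / name).mkdir()
    found = enabled_distros(tmp)

print(found)
```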

Let's take a look at each file.

`info.toml`

info.toml contains the distro name and a link to its official website. Example info.toml for Arch Linux:

name = "Arch Linux" # name of distro
url  = "https://example.com" # official site

If info.toml is missing or a value isn't provided, fallback values are used. For Arch Linux the fallback values are:

name = "arch" # distro folder name as value, also used in url
url  = "https://distrowatch.com/table.php?distribution=arch"
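A minimal sketch of the fallback rule, assuming the values from info.toml have already been parsed into a dict. The function name `resolve_info` is hypothetical; the DistroWatch URL pattern is the one shown in the fallback example above.

```python
def resolve_info(folder_name, info=None):
    """Fill in a missing name or url using the folder name as the fallback."""
    info = info or {}
    fallback_url = f"https://distrowatch.com/table.php?distribution={folder_name}"
    return {
        # An absent or empty value falls back to the folder-based default.
        "name": info.get("name") or folder_name,
        "url": info.get("url") or fallback_url,
    }


print(resolve_info("arch"))
print(resolve_info("arch", {"name": "Arch Linux"}))
```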

`logo.png`

Should be 128x128 px with a transparent background. If logo.png is missing, a fallback logo is used.

`scraper.py`

A scraper can be written however you like, as long as it returns the desired values.

It must return a list of tuples, each containing iso_url, iso_arch, iso_size and iso_version, in that order.

The Arch Linux scraper returns the following values:

[
  (
    'https://mirror.yandex.ru/archlinux/iso/2021.05.01/archlinux-2021.05.01-x86_64.iso',
    'x86_64',
    792014848,
    '2021.05.01'
  ),
  (
    'https://mirror.yandex.ru/archlinux/iso/2021.06.01/archlinux-2021.06.01-x86_64.iso',
    'x86_64',
    811937792,
    '2021.06.01'
  ),
  (
    'https://mirror.yandex.ru/archlinux/iso/2021.07.01/archlinux-2021.07.01-x86_64.iso',
    'x86_64',
    817180672,
    '2021.07.01'
  ),
  (
    'https://mirror.yandex.ru/archlinux/iso/archboot/2020.07/archlinux-2020.07-1-archboot-network.iso',
    'x86_64',
    516947968,
    '2020.07'
  ),
  (
    'https://mirror.yandex.ru/archlinux/iso/archboot/2020.07/archlinux-2020.07-1-archboot.iso',
    'x86_64',
    1280491520,
    '2020.07'
  )
]
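While developing a scraper it can help to check the returned list against this shape. The validator below is a sketch, not part of the project: the name `validate_entries` is hypothetical, and the type expectations (string URL, string architecture, positive integer size, string version) are inferred from the example values above.

```python
def validate_entries(entries):
    """Raise ValueError unless every entry is (iso_url, iso_arch, iso_size, iso_version)."""
    for index, entry in enumerate(entries):
        if not isinstance(entry, tuple) or len(entry) != 4:
            raise ValueError(f"entry {index}: expected a tuple of 4 values")
        iso_url, iso_arch, iso_size, iso_version = entry
        if not (isinstance(iso_url, str)
                and iso_url.startswith(("http://", "https://"))):
            raise ValueError(f"entry {index}: iso_url must be an http(s) URL")
        if not isinstance(iso_arch, str) or not iso_arch:
            raise ValueError(f"entry {index}: iso_arch must be a non-empty string")
        # bool is a subclass of int, so it is rejected explicitly.
        if (isinstance(iso_size, bool) or not isinstance(iso_size, int)
                or iso_size <= 0):
            raise ValueError(f"entry {index}: iso_size must be a positive integer")
        if not isinstance(iso_version, str) or not iso_version:
            raise ValueError(f"entry {index}: iso_version must be a non-empty string")
    return True


sample = [
    (
        "https://mirror.yandex.ru/archlinux/iso/2021.07.01/archlinux-2021.07.01-x86_64.iso",
        "x86_64",
        817180672,
        "2021.07.01",
    ),
]
print(validate_entries(sample))
```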

A scraper starts with from public import *, which brings the following into the namespace:

bs (short for BeautifulSoup)
json
re
requests

It also provides these functions:

get_afh_url(iso_url): returns a download link for a file hosted on AndroidFileHost; iso_url must look like https://androidfilehost.com/?fid=8889791610682936459
get_iso_arch(iso_url): returns the processor architecture of iso_url
get_iso_size(iso_url): returns the file size of iso_url in bytes
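To give a feel for what an architecture helper might do, here is a sketch that guesses the architecture from the file name. It is not the project's get_iso_arch: the name `guess_iso_arch`, the token table and the x86_64 default are all assumptions (the default is only suggested by the archboot entries above, which carry no architecture token yet are listed as x86_64).

```python
import re

# Hypothetical token table; the real helper may recognise different names.
_ARCH_PATTERNS = [
    ("x86_64", re.compile(r"x86[_-]64|amd64|x64", re.IGNORECASE)),
    ("aarch64", re.compile(r"aarch64|arm64", re.IGNORECASE)),
    ("i386", re.compile(r"i[3-6]86|x86(?![_-]64)", re.IGNORECASE)),
]


def guess_iso_arch(iso_url, default="x86_64"):
    """Guess the architecture from the last path segment of iso_url."""
    filename = iso_url.rsplit("/", 1)[-1]
    for arch, pattern in _ARCH_PATTERNS:
        if pattern.search(filename):
            return arch
    # No recognisable token in the file name: fall back to the default.
    return default


print(guess_iso_arch(
    "https://mirror.yandex.ru/archlinux/iso/2021.07.01/archlinux-2021.07.01-x86_64.iso"
))
```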

Arch Linux scraper.py example:

from public import *  # noqa


def init():

    array = []
    base_urls = [
        "https://mirror.yandex.ru/archlinux/iso/latest",
        "https://mirror.yandex.ru/archlinux/iso/archboot/latest"
    ]

    for base_url in base_urls:

        html = bs(requests.get(base_url).text, "html.parser")

        for filename in html.find_all("a", {"href": re.compile(r"^.*\.iso$")}):

            iso_url = f"{base_url}/{filename['href']}"
            iso_arch = get_iso_arch(iso_url)
            iso_size = get_iso_size(iso_url)
            iso_version = re.search(r"-(\d+\.\d+(\.\d+)?)", iso_url).group(1)

            array.append((iso_url, iso_arch, iso_size, iso_version))

    return array
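The version extraction can be tried in isolation on the URLs listed earlier. This sketch uses a pattern with escaped dots, so that only literal dots are matched, and returns None instead of raising when nothing matches; the helper name `extract_version` is hypothetical.

```python
import re

# A hyphen followed by two or three dot-separated numeric groups.
VERSION_RE = re.compile(r"-(\d+\.\d+(\.\d+)?)")


def extract_version(iso_url):
    """Return the first version-like token after a hyphen, or None."""
    match = VERSION_RE.search(iso_url)
    return match.group(1) if match else None


urls = [
    "https://mirror.yandex.ru/archlinux/iso/2021.07.01/archlinux-2021.07.01-x86_64.iso",
    "https://mirror.yandex.ru/archlinux/iso/archboot/2020.07/archlinux-2020.07-1-archboot-network.iso",
]
for url in urls:
    print(extract_version(url))
```

With unescaped dots the second URL would yield 2020.07-1, because an unescaped dot also matches the hyphen.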

Misc

Here's an nginx snippet for when you self-host the repository together with the website and want DriveDroid to reach repo.json by hostname alone. Place it in the server section of your config:

location = / {
  if ($http_user_agent ~* 'okhttp') {
    rewrite ^/(.*)$ /repo.json break;
  }
}

Roadmap

Option to generate a webpage
Add a mechanism to retry scraping if a network error occurs
Option to select mirrors (mainly uses mirrors based in Russia)
Package this project perhaps
Probably make the code better

Credits

afh-dl by kade-robertson
Yandex.Disk direct links by DokPub

License

MIT License

Owner

Evgeny
