Confirm that files have been uploaded to Backblaze Cloud Backup successfully

Overview

Backblaze Backup Checker

This Python script compares metadata captured from files within source folders against data parsed from Backblaze Cloud Backup's log files (bz_done files, stored locally), reporting any files that do not appear to have been uploaded or that have conflicting cryptographic hash values, as well as files that would not be uploaded due to filter/exclusion rules.

The primary use case is to provide confidence that files have been backed up with expected integrity, or if not, what filter/exclusion rules may be blocking the backup.

Due to limited availability of test data, this script is currently in alpha state. It is likely that false positives will be reported (see Known Issues below).

Prerequisites

Python 3.7 or later is required, with the tqdm progress bar module installed (pip install tqdm).

This script has been tested using Backblaze client version 7.0.2.470 on macOS 11.5, and client version 8.0.0.517 on Windows 10 20H2.

This script is only compatible with 'Version 5' log records (the standard used by Backblaze clients since October 2014). Any log records that are in an older format will not be processed.

Script behaviour

Backblaze configuration and log data, as well as files in source folders, are opened in read-only mode and will not be modified.

For each source folder specified by the user, file path and size metadata will be only captured for those files that 'pass' the filter/exclusion rules and have not been created or modified in the last 24 hours (to allow the Backblaze client time to upload them). Filter/exclusion rules will be read automatically from Backblaze configuration data; both Backblaze-default and user-configured rules will be processed.

Under default script configuration, SHA1 hash metadata will also be generated for files that are <= 100MB in size, in order to confirm file integrity. (Files > 100MB in size are split in the Backblaze logs; the original hash value is not retained, and I am unaware of any technique to construct the true SHA1 hash from the available split data.)

Symlinks will be ignored during processing as these are not backed up by Backblaze. Mac hidden files .DS_Store and .localized will also be ignored.

Progress is reported within the console (and if specified, to log files). At script conclusion, up to four text files will be created in either the current working folder or a user-specified output folder, with filenames and comma-separated contents as follows:

  1. [datetime]_backblaze_uploaded_files.txt: list of files that are present and correct in Backblaze log data, indicating successful upload.
  2. [datetime]_backblaze_mismatch_files.txt: list of files whose path is either not present in Backblaze log data, or the path is present but SHA1/size metadata does not match between the local file and Backblaze log data. Files listed in this output may not have been uploaded to Backblaze successfully - but be aware that there are likely to be false positives (see Known Issues below).
  3. [datetime]_backblaze_excluded_by_filter_files.txt: list of files that have not been uploaded to Backblaze due to filter/exclusion rules, with details of which rules the files are 'failing' on.
  4. [datetime]_backblaze_recent_files_not_processed.txt: list of files created or modified in the last 24 hours are listed in this output and not processed during script execution (as Backblaze may not have had time to upload them yet).

Recommended usage

It is recommended that:

  1. Specific folders containing user data are processed rather than full operating system drives. This is because various permissions errors and temporary files will be encountered if an entire operating system drive is processed, which will complicate the output (although the script should still run).
  2. The script is run only after sufficient time has been allowed for files to have been uploaded to Backblaze. Files created or updated within the last 24 hours will be identified and not processed, but this approach is not robust (further script development to account for data within bz_todo log files would enable accurate filtering of files that are still being uploaded).

Script usage

Syntax:

python3 bbcheck.py source_path [source_path ...] [flags]

Usage example (Windows) with two source folders:

python3 bbcheck.py C:\Users\john\Documents "E:\data folder"

Usage example (Mac) with one source folder:

python3 bbcheck.py /Users/john/Documents

Absolute or relative paths may be provided; all metadata generated will revert to absolute paths.

Flags can be viewed using: python3 bbcheck.py --help, and are as follows:

  • -l or --log: opt to write the status messages displayed in the console to a log file in the log folder.
  • -d or --debug: display debug messages and write these to a dedicated debug log file in the log folder.
  • --logfolder [str]: folder to write logs to (if not specified, the default of bbcheck_logs will be used).
  • -o [str] or --output [str]: folder to write check results to (if not specified, results will be written to the current working folder).
  • -b [str] or --bzdata-folder [str]: by default, the script will attempt to read Backblaze configuration and log data from standard client install locations for Windows and Mac. An alternative path to the bzdata folder may be provided with this flag.
  • -s or --only-check-size: by default, SHA1 hash values will be generated for the integrity check for files <= 100MB in size. Hash data may take a long time to generate for large source folders; this flag sets the script to instead check integrity using file size metadata, which should execute quickly (but is not a true integrity check and is therefore less reliable).
  • --hash-files [str ... str]: instead of generating SHA1 hash data during script execution, hash file(s) created using the hash mode in Vericopy may be used as a lookup. This approach allows for quick successive executions of this script, and is reliable on condition that files within the source folders are not changed between script executions.

Usage example (Windows) incorporating flags:

python3 bbcheck.py C:\Users\john\Documents "E:\data folder" -l -o output_folder -s --hash-files e_drive_hashes.txt -b "C:\ProgramData\Backblaze Alternative Location\bzdata"

Known issues

  1. Hash/size mismatches may be reported for some files, where Backblaze logs reference an incorrect hash/size value for a file - but upon recovery of the file using the Backblaze web interface, the correct original file (with correct hash/size) will be downloaded. In testing this behaviour seems prevalent for certain file types, such as .doc files, but as yet I have not discovered the reason for this behaviour.
  2. Although symlinks are not processed during script execution, Mac alias links will be processed and raised (incorrectly) as missing files.
  3. Depending on upload speed and duration the client has been running since files are created or modified, it is possible that files may be reported as missing that are still uploading (further script development to account for data within bz_todo log files would mitigate this).

Privacy, log data, and uninstallation

This script runs entirely locally; neither Backblaze nor any other third party services are communicated with.

Script output is stored by default in the folder the script is executed in. If -l or -d is used to output logs to a file, these are stored by default in folder bbcheck_logs (created in the folder that the script is executed in). Debug logs may contain sensitive information, such as system details (including Python version and operating system), command line arguments used, and events occurring with data processed during script execution.

Full uninstallation can be achieved by:

  1. Deleting the script and any other downloaded files (e.g. the readme and license).
  2. Deleting script output and the logs folder (bbcheck_logs by default).
  3. If desired, removing the tqdm library and Python runtime.

Contributing

If you would like to contribute, please fork the repository and use a feature branch. Pull requests are warmly welcome.

Licensing

The code in this project is licensed under the MIT License.

You might also like...
💻  A fully functional local AWS cloud stack. Develop and test your cloud & Serverless apps offline!
💻 A fully functional local AWS cloud stack. Develop and test your cloud & Serverless apps offline!

LocalStack - A fully functional local AWS cloud stack LocalStack provides an easy-to-use test/mocking framework for developing Cloud applications. Cur

Cloud-native, data onboarding architecture for the Google Cloud Public Datasets program
Cloud-native, data onboarding architecture for the Google Cloud Public Datasets program

Public Datasets Pipelines Cloud-native, data pipeline architecture for onboarding datasets to the Google Cloud Public Datasets Program. Overview Requi

Prisma Cloud utility scripts, and a Python SDK for Prisma Cloud APIs.

pcs-toolbox Prisma Cloud utility scripts, and a Python SDK for Prisma Cloud APIs. Table of Contents Support Setup Configuration Script Usage CSPM Scri

Python client for using Prefect Cloud with Saturn Cloud

prefect-saturn prefect-saturn is a Python package that makes it easy to run Prefect Cloud flows on a Dask cluster with Saturn Cloud. For a detailed tu

:electric_plug: Generating short urls with python has never been easier
:electric_plug: Generating short urls with python has never been easier

pyshorteners A simple URL shortening API wrapper Python library. Installing pip install pyshorteners Documentation https://pyshorteners.readthedocs.i

API to retrieve the number of grades on the OGE website (Website listing the grades of students) to know if a new grade is available. If a new grade has been entered, the program sends a notification e-mail with the subject.
API to retrieve the number of grades on the OGE website (Website listing the grades of students) to know if a new grade is available. If a new grade has been entered, the program sends a notification e-mail with the subject.

OGE-ESIREM-API Introduction API to retrieve the number of grades on the OGE website (Website listing the grades of students) to know if a new grade is

A Bot to Upload files to Many Cloud services. Powered by Telethon.
A Bot to Upload files to Many Cloud services. Powered by Telethon.

oVo MultiUpload V1.0 👀 A Bot to Upload files to Many Cloud services. Powered by Telethon _ 🎯 Follow me and star this repo for more telegram bots. @H

Bot simply search for the files from provided channel according to given query and gives link to those files as buttons!

Auto Filter Bot ㅤㅤㅤㅤㅤㅤㅤ ㅤㅤㅤㅤㅤㅤㅤ You can call this as an Auto Filter Bot if you like :D Bot simply search for the files from provided channel according

Aggrokatz is an aggressor plugin extension for Cobalt Strike which enables pypykatz to interface with the beacons remotely and allows it to parse LSASS dump files and registry hive files to extract credentials and other secrets stored without downloading the file and without uploading any suspicious code to the beacon.
Comments
  • _csv.Error: line contains NUL

    _csv.Error: line contains NUL

    I tried to run it but ran into an error (see below).

    My Environment:

    • Windows 10 Version 20H2 (OS Build 19042.1237)
    • Python 3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)] on win32
    • Backblaze log data location: C:\ProgramData\Backblaze\bzdata

    Output: python3 bbcheck.py -d E:\

    2021-10-02 21:39:15,419 - DEBUG - python_version: 3.9.7
    2021-10-02 21:39:15,444 - DEBUG - system: Windows
    2021-10-02 21:39:15,444 - DEBUG - machine: AMD64
    2021-10-02 21:39:15,457 - DEBUG - platform: Windows-10-10.0.19042-SP0
    2021-10-02 21:39:15,458 - DEBUG - version: 10.0.19042
    2021-10-02 21:39:15,459 - DEBUG - mac_ver: ('', ('', '', ''), '')
    2021-10-02 21:39:15,460 - DEBUG - commandline_args: {'source_paths': ['E:\\'], 'log': False, 'debug': True, 'logfolder': 'bbcheck_logs', 'output': None, 'bzdata_folder': None, 'only_check_size': False, 'hash_files': None}
    2021-10-02 21:39:15,460 - INFO - Logs will be stored in folder 'bbcheck_logs'
    2021-10-02 21:39:15,461 - INFO - Source path(s) 'E:\' will be compared against Backblaze log data in 'C:\ProgramData\Backblaze\bzdata'
    2021-10-02 21:39:15,467 - INFO - Findings will be written to current working folder
    Traceback (most recent call last):
      File "D:\Downloads\bbz\bbcheck.py", line 1073, in <module>
        main()
      File "D:\Downloads\bbz\bbcheck.py", line 1050, in main
        check_backup(
      File "D:\Downloads\bbz\bbcheck.py", line 753, in check_backup
        bzdone_metadata = parse_bz_done_files(bzdone_file_paths, only_check_size_flag)
      File "D:\Downloads\bbz\bbcheck.py", line 483, in parse_bz_done_files
        for row in reader:
    _csv.Error: line contains NUL
    

    *EDIT: Just re-ran script with '-d' option and replaced output with the new output.

    opened by jeffkujath 7
  • XML parsing error

    XML parsing error

    Hi, first of all - thanks for making this nice script!

    I run into XML parsing error:

    INFO - Source path(s) '/Users/msoida/Downloads' will be compared against Backblaze log data in '/Library/Backblaze.bzpkg/bzdata'
    INFO - Findings will be written to current working folder
    
    Traceback (most recent call last):
      File "/Users/msoida/Downloads/backblaze-backup-checker-0.1.1/bbcheck.py", line 1102, in <module>
        main()
      File "/Users/msoida/Downloads/backblaze-backup-checker-0.1.1/bbcheck.py", line 1085, in main
        datetime_string=datetime_string,
      File "/Users/msoida/Downloads/backblaze-backup-checker-0.1.1/bbcheck.py", line 759, in check_backup
        exclude_rules = get_excludes(bzdata_folder_path)
      File "/Users/msoida/Downloads/backblaze-backup-checker-0.1.1/bbcheck.py", line 586, in get_excludes
        root = xml.etree.ElementTree.parse(exclude_file_path).getroot()
      File "/Users/msoida/.pyenv/versions/3.7.3/lib/python3.7/xml/etree/ElementTree.py", line 1197, in parse
        tree.parse(source, parser)
      File "/Users/msoida/.pyenv/versions/3.7.3/lib/python3.7/xml/etree/ElementTree.py", line 598, in parse
        self._root = parser._parse_whole(source)
    xml.etree.ElementTree.ParseError: XML or text declaration not at start of entity: line 2, column 0
    

    After some debugging, I found the problem to be in /Library/Backblaze.bzpkg/bzdata/bzexcluderules_mandatory.xml - this file had an empty line at the start. After removing this empty line, the script works great.

    Now, I have no idea why this line was there. Since I have BackBlaze running for many years now, it's quite possible this happend due to some automatic file structure update many years ago. Whatever the reason is, it seems the BB client does not mind this extra line, and the Python XML parser does not like it.

    While I expect that such problems are probably not too common, it would be nice to defend the rest of this awesome script against them ;)

    PS: I'm using MacOS 10.15.7/Python 3.7.3, but I doubt it's relevant in this case.

    opened by msoida 2
Releases(v0.1.2)
  • v0.1.2(Oct 4, 2021)

  • v0.1.1(Oct 2, 2021)

  • v0.1(Oct 1, 2021)

    Initial release - due to limited availability of test data, this script is currently in alpha state. It is likely that false positives will be reported (see Known Issues in README).

    Source code(tar.gz)
    Source code(zip)
Just another Shiny and Greninja-ash killing preventor for Myuu

Myuu-Anti-Shiny-Discord-Bot Why I made it? Since, I was legit fed up of NebbyBot's lag (not criticising it), I decided to make my own but in python an

5 Nov 12, 2022
Simple Reddit bot that replies to comments containing a certain word.

reddit-replier-bot Small comment reply bot based on PRAW. This script will scan the comments of a subreddit as they come in and look for a trigger wor

Kefendy 0 Jun 04, 2022
A wrapper for the Discord Python Pixels API.

DPYPX A simple wrapper around Python Discord Pixels. Requires Python 3.7+ (3.x where x = 7). Requires pillow and aiohttp from pip. Example import dpy

Artemis 3 Oct 01, 2022
A Discord Self bot written in python

WitheredBot A Discord Self bot written in python Requirement Python = 3.9 How to Configure git clone https://github.com/a-a-a-aa/WitheredBot.git cd W

......... 0 Jan 05, 2023
A simple python oriented telegram bot to give out creative font style's

Font-Bot A simple python oriented telegram bot to give out creative font style's REQUIREMENTS tgcrypto pyrogram==1.2.9 Installation Fork this reposito

BL4CK H47 4 Jan 30, 2022
A quick-and-dirty script to scrape the daily menu of Leipzig University Mensa and send it to a telegram channel.

Feed me Mensa UL A quick-and-dirty script to scrape the daily menu of Leipzig University Mensa and send it to a telegram channel. For food and cat lov

3 Apr 08, 2022
A telegram bot to track whales activities on multiple blockchains.

Telegram Bot : Whale Watcher A straightforward telegram bot written in python to track whales activity on multiple blockchains, using whale-alert API

Laurenz Bougan 1 Dec 10, 2021
Python wrapper to simplify calls to AncestryDNA API.

AncestryDNA API wrapper Ancestry exposes an undocumented REST API for its DNA features. This Python wrapper inventories the available calls, and expos

Matt 2 Jun 10, 2022
A google search telegram bot.

Google-Search-Bot A google search telegram bot. Made with Python3 (C) @FayasNoushad Copyright permission under MIT License License - https://github.c

Fayas Noushad 37 Nov 24, 2022
GTK3-based panel for sway window manager

nwg-panel I have been using sway since 2019 and find it the most comfortable working environment, but... Have you ever missed all the graphical bells

Piotr Miller 290 Jan 07, 2023
This repository will (hopefully) always contain the latest version of the libProfessorP.asm.so shared object.

libPuhfessorP - Deploy Repo This repo should (hopefully) always contain the latest version of the libPuhfessorP.asm.so shared object, to be linked wit

Puhfessor P - CPSC 240 3 Sep 30, 2021
Due to changes to the discord API and discord.py being discontinued

Talia Due to changes to the discord API and discord.py being discontinued, Talia development has been halted permanently A customizable economy discor

2 Mar 08, 2022
Rocks vc Userbot: A Telegram Bot Project That's Allow You To Play Audio And Video Music On Telegram Voice Chat Group

⭐️ Rocks VC Userbot ⭐️ Telegram Userbot To Play Audio And Video Song On VC Chat

Dr Asad Ali 10 Jul 18, 2022
Exchange indicators & Basic functions for Binance API.

binance-ema Exchange indicators & Basic functions for Binance API. This python library has been written to calculate SMA, EMA, MACD etc. functions wit

Emre MENTEŞE 24 Jan 06, 2023
Automatically load stolen cookies from ChromePass

AutoCookie - Automatically loading stolen cookies from ChromePass View Demo · Report Bug · Request Feature Table of Contents About the Project Getting

darkArp 21 Oct 11, 2022
twitter bot tha uses tweepy library class to connect to TWITTER API

TWITTER-BOT-tweepy- twitter bot that uses tweepy library class to connect to TWITTER API replies to mentions automatically and follows the tweet.autho

Muziwandile Nkomo 2 Jan 08, 2022
A simple Discord bot wrote with Python. Kizmeow let you track your NFT project and display some useful information

Kizmeow-OpenSea-and-Etherscan-Discord-Bot 中文版 | English Ver A Discord bot wrote with Python. Kizmeow let you track your NFT project and display some u

Xeift 93 Dec 31, 2022
With Google Drive API. My computer and my phone are in love now.

Channel trought Google Drive Google Drive API In this case, "Google Drive App" is the program. To install everything you need(has some extra things),

Luis Quiñones Requelme 1 Dec 15, 2021
Script que realiza a identificação de todos os logins e senhas dos wifis conectados em uma máquina e envia os dados para um e-mail especificado.

getWIFIConnection Script que realiza a identificação de todos os logins e senhas dos wifis conectados em uma máquina e envia os dados para um e-mail e

Vinícius Azevedo 3 Nov 27, 2022
Detects members having unicode names. Public bot: @scarletwitchprobot

✨ Scarletwitch bot ✨ Detects unicode names members in a tg chat & provides a option to take action on that user ! Public bot: @scarletwitchprobot Supp

ÁÑÑÍHÌLÅTØR SPÄRK 18 Nov 12, 2022