Find exposed data in Azure with this public blob scanner

Last update: Jan 03, 2023

BlobHunter

A tool for scanning Azure blob storage accounts for publicly accessible blobs.
BlobHunter is part of the "Hunting Azure Blobs Exposes Millions of Sensitive Files" research:
https://www.cyberark.com/resources/threat-research-blog/hunting-azure-blobs-exposes-millions-of-sensitive-files

Overview

BlobHunter helps you identify Azure blob storage containers that store files publicly accessible to anyone on the internet.
It can help you find misconfigured containers that store sensitive data.
This is especially useful in large Azure subscriptions with many storage accounts that are hard to track.
BlobHunter produces an informative CSV result file with key details on each publicly accessible container in the scanned environment.
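The core idea of flagging public containers and writing a CSV report can be sketched as follows. This is a minimal illustration, not BlobHunter's actual code. It assumes container metadata has already been collected (for example with the azure-storage-blob SDK, where a container's public access level is `'container'`, `'blob'` or `None`) and stored as plain dicts; the record keys, the helper names `public_blob_url`, `find_public_containers` and `write_report`, and the CSV column set are all hypothetical.

```python
import csv
import io

# Anonymous access levels an Azure blob container can have:
#   "container": anyone can list the container and read every blob in it
#   "blob":      anyone can read a blob if they know its exact URL
#   None:        private, no anonymous access
PUBLIC_LEVELS = ("container", "blob")

# Hypothetical report columns, not BlobHunter's real CSV layout.
FIELDS = ["subscription", "storage_account", "container", "access_level", "url"]


def public_blob_url(account, container, blob=""):
    """Build the public endpoint URL of a container or of a blob inside it."""
    base = "https://{}.blob.core.windows.net/{}".format(account, container)
    return "{}/{}".format(base, blob) if blob else base


def find_public_containers(records):
    """Yield one report row per container that allows anonymous access."""
    for rec in records:
        level = rec.get("public_access")
        if level in PUBLIC_LEVELS:
            yield {
                "subscription": rec["subscription"],
                "storage_account": rec["account"],
                "container": rec["container"],
                "access_level": level,
                "url": public_blob_url(rec["account"], rec["container"]),
            }


def write_report(rows, stream):
    """Write report rows as CSV to an open text stream and return the row count."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    sample = [
        {"subscription": "sub-1", "account": "acct1",
         "container": "backups", "public_access": "container"},
        {"subscription": "sub-1", "account": "acct1",
         "container": "logs", "public_access": None},
    ]
    buf = io.StringIO()
    write_report(find_public_containers(sample), buf)
    print(buf.getvalue())
```

To write a real file instead of a string buffer, pass a stream opened with `open("results.csv", "w", newline="")`. Of the two public levels, `container` is the more severe, because it also lets anonymous users enumerate the container's contents.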

Requirements

Python 3.5+
Azure CLI
The packages listed in requirements.txt

Azure user with one of the following built-in roles:

Or any Azure user with a role that allows the following Azure actions:

Microsoft.Resources/subscriptions/read
Microsoft.Resources/subscriptions/resourceGroups/read
Microsoft.Storage/storageAccounts/read
Microsoft.Storage/storageAccounts/listkeys/action
Microsoft.Storage/storageAccounts/blobServices/containers/read
Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read
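To check whether a custom role covers the actions listed above, you could compare them against the role definition's action patterns. The sketch below is a hypothetical helper, not part of BlobHunter. It assumes `*` wildcards and case-insensitive matching, and for simplicity it treats the role's permissions as one flat list of patterns plus an exclusion list, whereas real Azure role definitions separate control-plane `Actions`/`NotActions` from `DataActions`/`NotDataActions`.

```python
from fnmatch import fnmatchcase

# The Azure actions listed above.
REQUIRED_ACTIONS = [
    "Microsoft.Resources/subscriptions/read",
    "Microsoft.Resources/subscriptions/resourceGroups/read",
    "Microsoft.Storage/storageAccounts/read",
    "Microsoft.Storage/storageAccounts/listkeys/action",
    "Microsoft.Storage/storageAccounts/blobServices/containers/read",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
]


def is_allowed(action, actions, not_actions=()):
    """Return True if `action` matches a granted pattern and no excluded one.

    Both sides are lower-cased (assumed case-insensitive matching), and "*"
    matches any run of characters, including "/".
    """
    a = action.lower()
    granted = any(fnmatchcase(a, p.lower()) for p in actions)
    excluded = any(fnmatchcase(a, p.lower()) for p in not_actions)
    return granted and not excluded


def missing_actions(actions, not_actions=()):
    """List the required actions that the given patterns do not cover."""
    return [r for r in REQUIRED_ACTIONS if not is_allowed(r, actions, not_actions)]


if __name__ == "__main__":
    # A role granting only storage permissions lacks the subscription reads.
    for action in missing_actions(["Microsoft.Storage/*"]):
        print("Missing:", action)
```

An empty result from `missing_actions` means every listed action is covered by the supplied patterns.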

Build

Example for installation on Ubuntu:

curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash

pip3 install -r requirements.txt
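Before running the tool, a quick sanity check of the basic prerequisites can save a confusing failure. The following is a hypothetical helper, not shipped with BlobHunter: it only checks the Python version and whether the `az` executable is on `PATH`, and it avoids f-strings so it still parses on Python 3.5.

```python
import shutil
import sys


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of problems; an empty list means the basics are in place.

    `version_info` and `which` are parameters so the check can be tested
    without changing the real interpreter or PATH.
    """
    problems = []
    if tuple(version_info[:2]) < (3, 5):
        problems.append("Python 3.5+ is required")
    if which("az") is None:
        problems.append("Azure CLI ('az') was not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("Missing prerequisite:", issue)
    if not issues:
        print("Python version and Azure CLI look fine.")
```

This does not verify the packages from requirements.txt; `pip3 install -r requirements.txt` remains the way to install them.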

Usage

Simply run

python3 BlobHunter.py

If you are not logged in to the Azure CLI, a browser window will open and prompt you for your Azure user credentials.
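The login behavior described above can be approximated with two standard Azure CLI commands: `az account show` exits with a non-zero status when there is no active session, and `az login` opens a browser for interactive sign-in. This is a sketch of one way to do it, assuming the CLI is installed; the function name `ensure_az_login` is hypothetical, and the code does not claim to reproduce BlobHunter's own logic.

```python
import shutil
import subprocess


def ensure_az_login(run=subprocess.run):
    """Make sure the Azure CLI has an active session, logging in if needed.

    Returns True if an interactive login was triggered, False otherwise.
    `run` is a parameter so the function can be tested without the real CLI.
    """
    # Resolve the full path so this also works where "az" is a .cmd script.
    az = shutil.which("az") or "az"
    status = run([az, "account", "show"],
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    if status.returncode == 0:
        return False
    # Opens a browser window for the user's Azure credentials.
    run([az, "login"], check=True)
    return True


if __name__ == "__main__":
    if ensure_az_login():
        print("Logged in to Azure CLI.")
    else:
        print("Already logged in.")
```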

References

For any questions or feedback, please contact Daniel Niv, Asaf Hecht and CyberArk Labs. This project is not accepting contributions at this time.

Owner

CyberArk