Clustering is a popular approach to detect patterns in unlabeled data

Overview

Visual Clustering

Clustering is a popular approach to detect patterns in unlabeled data. Existing clustering methods typically treat samples in a dataset as points in a metric space and compute distances to group together similar points. Visual Clustering a different way of clustering points in 2-dimensional space, inspired by how humans "visually" cluster data. The algorithm is based on trained neural networks that perform instance segmentation on plotted data.

For more details, see the accompanying paper: "Clustering Plotted Data by Image Segmentation", arXiv preprint, and please use the citation below.

@article{naous2021clustering,
  title={Clustering Plotted Data by Image Segmentation},
  author={Naous, Tarek and Sarkar, Srinjay and Abid, Abubakar and Zou, James},
  journal={arXiv preprint arXiv:2110.05187},
  year={2021}
}

Installation

pip install visual-clustering

Usage

The algorithm can be used the same way as the classical clustering algorithms in scikit-learn:
You first import the class VisualClustering and create an instance of it.

from visual_clustering import VisualClustering

model = VisualClustering(median_filter_size = 1, max_filter_size= 1)

The parameters median_filter_size and max_filter_size are set to 1 by default.
You can experiment with different values to see what works best for your dataset !

Let's create a simple synthetic dataset of blobs.

from sklearn import datasets

data = datasets.make_blobs(n_samples=50000, centers=6, random_state=23,center_box=(-30, 30))
plt.scatter(data[0][:, 0], data[0][:, 1], s=1, c='black')

blobs

To cluster the dataset, use the fit function of the model:

predictions = model.fit(data[0])

Visualizing the results

You can visualize the results using matplotlib as you would normally do with classical clustering algorithms:

import matplotlib.pyplot as plt
from itertools import cycle, islice
import numpy as np

colors = np.array(list(islice(cycle(["#000000", '#377eb8', '#ff7f00', '#4daf4a', '#f781bf', '#a65628', '#984ea3']), int(max(predictions) + 1))))
#Black color for outliers (if any)
colors = np.append(colors, ["#000000"])
plt.scatter(data[0][:, 0], data[0][:, 1], s=10, color=colors[predictions.astype('int8')])

clustered_blobs

Run this code inside a colab notebook:
https://colab.research.google.com/drive/1DcZXhKnUpz1GDoGaJmpS6VVNXVuaRmE5?usp=sharing

Dependencies

Make sure that you have the following libraries installed:

transformers 4.15.0
scipy 1.4.1
tensorflow 2.7.0
keras 2.7.0
numpy 1.19.5
cv2 4.1.2
skimage 0.18.3

Contact

Tarek Naous: Scholar | Github | Linkedin | Research Gate | Personal Wesbite | [email protected]

Owner
Tarek Naous
Tarek Naous
Natural Intelligence is still a pretty good idea.

Human Learn Machine Learning models should play by the rules, literally. Project Goal Back in the old days, it was common to write rule-based systems.

vincent d warmerdam 641 Dec 26, 2022
A check for whether the dependency jobs are all green.

alls-green A check for whether the dependency jobs are all green. Why? Do you have more than one job in your GitHub Actions CI/CD workflows setup? Do

Re:actors 33 Jan 03, 2023
Scheme for training and applying a label propagation framework

Factorisation-based Image Labelling Overview This is a scheme for training and applying the factorisation-based image labelling (FIL) framework. Some

Wellcome Centre for Human Neuroimaging 2 Dec 17, 2021
Libraries, tools and tasks created and used at DeepMind Robotics.

dm_robotics: Libraries, tools, and tasks created and used for Robotics research at DeepMind. Package overview Package Summary Transformations Rigid bo

DeepMind 273 Jan 06, 2023
RealFormer-Pytorch Implementation of RealFormer using pytorch

RealFormer-Pytorch Implementation of RealFormer using pytorch. Includes comparison with classical Transformer on image classification task (ViT) wrt C

Simo Ryu 90 Dec 08, 2022
Аналитика доходности инвестиционного портфеля в Тинькофф брокере

Аналитика доходности инвестиционного портфеля Тиньков Видео на YouTube Для работы скрипта нужно установить три переменных окружения: export TINKOFF_TO

Alexey Goloburdin 64 Dec 17, 2022
A Free and Open Source Python Library for Multiobjective Optimization

Platypus What is Platypus? Platypus is a framework for evolutionary computing in Python with a focus on multiobjective evolutionary algorithms (MOEAs)

Project Platypus 424 Dec 18, 2022
Convolutional 2D Knowledge Graph Embeddings resources

ConvE Convolutional 2D Knowledge Graph Embeddings resources. Paper: Convolutional 2D Knowledge Graph Embeddings Used in the paper, but do not use thes

Tim Dettmers 586 Dec 24, 2022
Model search is a framework that implements AutoML algorithms for model architecture search at scale

Model search (MS) is a framework that implements AutoML algorithms for model architecture search at scale. It aims to help researchers speed up their exploration process for finding the right model a

Google 3.2k Dec 31, 2022
PuppetGAN - Cross-Domain Feature Disentanglement and Manipulation just got way better! 🚀

Better Cross-Domain Feature Disentanglement and Manipulation with Improved PuppetGAN Quite cool... Right? Introduction This repo contains a TensorFlow

Giorgos Karantonis 5 Aug 25, 2022
Official Pytorch Implementation of Relational Self-Attention: What's Missing in Attention for Video Understanding

Relational Self-Attention: What's Missing in Attention for Video Understanding This repository is the official implementation of "Relational Self-Atte

mandos 43 Dec 07, 2022
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

Tensor2Tensor Tensor2Tensor, or T2T for short, is a library of deep learning models and datasets designed to make deep learning more accessible and ac

12.9k Jan 09, 2023
CryptoFrog - My First Strategy for freqtrade

cryptofrog-strategies CryptoFrog - My First Strategy for freqtrade NB: (2021-04-20) You'll need the latest freqtrade develop branch otherwise you migh

Robert Davey 137 Jan 01, 2023
YOLOX Win10 Project

Introduction 这是一个用于Windows训练YOLOX的项目,相比于官方项目,做了一些适配和修改: 1、解决了Windows下import yolox失败,No such file or directory: 'xxx.xml'等路径问题 2、CUDA out of memory等显存不

5 Jun 08, 2022
百度2021年语言与智能技术竞赛机器阅读理解Pytorch版baseline

项目说明: 百度2021年语言与智能技术竞赛机器阅读理解Pytorch版baseline 比赛链接:https://aistudio.baidu.com/aistudio/competition/detail/66?isFromLuge=true 官方的baseline版本是基于paddlepadd

周俊贤 54 Nov 23, 2022
NOD: Taking a Closer Look at Detection under Extreme Low-Light Conditions with Night Object Detection Dataset

NOD (Night Object Detection) Dataset NOD: Taking a Closer Look at Detection under Extreme Low-Light Conditions with Night Object Detection Dataset, BM

Igor Morawski 17 Nov 05, 2022
Naszilla is a Python library for neural architecture search (NAS)

A repository to compare many popular NAS algorithms seamlessly across three popular benchmarks (NASBench 101, 201, and 301). You can implement your ow

270 Jan 03, 2023
[ICLR 2021] HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark

HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark Accepted as a spotlight paper at ICLR 2021. Table of content File structure Prerequi

72 Jan 03, 2023
OpenMMLab Text Detection, Recognition and Understanding Toolbox

Introduction English | 简体中文 MMOCR is an open-source toolbox based on PyTorch and mmdetection for text detection, text recognition, and the correspondi

OpenMMLab 3k Jan 07, 2023
GNN-based Recommendation Benchma

GRecX A Fair Benchmark for GNN-based Recommendation Preliminary Comparison DiffNet-Yelp dataset (featureless) Algo 73 Oct 17, 2022