Vector AI — A platform for building vector based applications. Encode, query and analyse data using vectors.

Overview




Vector AI is a framework designed to make building production-grade vector-based applications as quick and easy as possible. Create, store, manipulate, search and analyse vectors alongside JSON documents to power applications such as neural search, semantic search and personalised recommendations.


Features

  • Multimedia Data Vectorisation: Image2Vec, Audio2Vec, etc. (any data can be turned into vectors through machine learning).
  • Document Orientated Store: Store your vectors alongside documents without having to do a db lookup for metadata about the vectors.
  • Vector Similarity Search: Enable searching of vectors and rich multimedia with vector similarity search. This is the backbone of many popular AI use cases such as reverse image search, recommendations and personalisation.
  • Hybrid Search: There are scenarios where vector search is not as effective as traditional search, e.g. searching for SKUs. Vector AI lets you combine vector search with the features of traditional search, such as filtering, fuzzy search and keyword matching, to create an even more powerful search.
  • Multi-Model Weighted Search: Our vector search is highly customisable. You can perform searches with multiple vectors from multiple models and give them different weightings.
  • Vector Operations: Flexible search with out-of-the-box operations on vectors, e.g. mean, median, sum.
  • Aggregation: All the traditional aggregations you'd expect, e.g. group by, mean, pivot tables.
  • Clustering: Interpret your vectors and data by allocating them to buckets, and get statistics about these buckets based on data you provide.
  • Vector Analytics: Understand the quality of your vectors with practical, out-of-the-box vector analytics.
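
To make the ideas behind vector similarity search, vector operations and multi-model weighted search more concrete, here is a minimal standard-library sketch. It is not Vector AI's implementation: the helper names (cosine_similarity, mean_vector, weighted_search), the toy documents and the weights are all assumptions made up for illustration, and the managed service may score and rank differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def weighted_search(documents, queries, weights, top_k=2):
    """Rank documents by a weighted sum of per-field cosine similarities.

    queries maps a vector field name to a query vector;
    weights maps the same field names to their weighting.
    """
    scored = []
    for doc in documents:
        score = sum(
            weights[field] * cosine_similarity(doc[field], query)
            for field, query in queries.items()
        )
        scored.append((score, doc['_id']))
    scored.sort(reverse=True)  # highest combined score first
    return [doc_id for _, doc_id in scored[:top_k]]


# Toy collection: each document carries vectors from two hypothetical models.
toy_documents = [
    {'_id': 'a', 'text_vector_': [1.0, 0.0], 'image_vector_': [0.0, 1.0]},
    {'_id': 'b', 'text_vector_': [0.0, 1.0], 'image_vector_': [1.0, 0.0]},
    {'_id': 'c', 'text_vector_': [1.0, 1.0], 'image_vector_': [1.0, 1.0]},
]

ranked = weighted_search(
    toy_documents,
    queries={'text_vector_': [1.0, 0.0], 'image_vector_': [1.0, 0.0]},
    weights={'text_vector_': 0.8, 'image_vector_': 0.2},
)
print(ranked)
```

Because the text vector is weighted at 0.8, document 'a' (a perfect text match) ranks above 'c' (a partial match on both fields). A vector operation such as mean_vector can be used to build a single query vector out of several example vectors before searching.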

Quick Terminologies

  • Models/Encoders (aka. Embedders) ~ turn data into vectors, e.g. Word2Vec turns words into vectors
  • Vector Similarity Search (aka. Nearest Neighbor Search, Distance Search) ~ finds the documents whose vectors are closest to a query vector
  • Collection (aka. Index, Table) ~ a collection is made up of multiple documents
  • Documents (aka. Json, Item, Dictionary, Row) ~ a document can contain vectors, text and links to videos/images/audio.

QuickStart

Install via pip! Compatible with any OS.

pip install vectorai

If you need the latest in-progress improvements, you can install the nightly version instead:

pip install vectorai-nightly

Note: while the nightly version will still pass automated tests, it may not be stable.

Check out our quickstart notebook on how to make a text/image/audio search engine in 5 minutes: quickstart.ipynb

from vectorai import ViClient, request_api_key

# Replace these placeholders with your own details
username = '<username>'
email = '<email>'
description = '<description>'

api_key = request_api_key(username=username, email=email, description=description, referral_code="github_referred")

vi_client = ViClient(username=username, api_key=api_key)

from vectorai.models.deployed import ViText2Vec
text_encoder = ViText2Vec(username, api_key)

documents = [
    {
        '_id': 0,
        'color': 'red'
    },
    {
        '_id': 1,
        'color': 'blue'
    }
]

# Insert the data
vi_client.insert_documents('test-collection', documents, models={'color': text_encoder.encode})

# Search the data
vi_client.search('test-collection', text_encoder.encode('maroon'), 'color_vector_', page_size=2)

# Get Recommendations
vi_client.search_by_id('test-collection', '1', 'color_vector_', page_size=2)
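
In the quickstart, the 'color' field is passed to an encoder through the models argument and the search then targets 'color_vector_'. The offline sketch below illustrates that naming pattern only. It is an assumption inferred from the example above, not the SDK's actual code: apply_models and toy_encode are hypothetical helpers, and toy_encode is a deliberately crude stand-in for a real encoder such as ViText2Vec.

```python
def toy_encode(text, dimensions=4):
    """Hypothetical stand-in encoder: bucket letter counts into a small vector."""
    vector = [0.0] * dimensions
    for character in text.lower():
        if character.isalpha():
            vector[(ord(character) - ord('a')) % dimensions] += 1.0
    return vector


def apply_models(documents, models):
    """Return copies of the documents with '<field>_vector_' added per model.

    models maps a field name to an encode function, mirroring the shape of
    the models argument used in the quickstart.
    """
    encoded = []
    for doc in documents:
        new_doc = dict(doc)  # leave the caller's documents untouched
        for field, encode in models.items():
            if field in doc:
                new_doc[field + '_vector_'] = encode(doc[field])
        encoded.append(new_doc)
    return encoded


sample_documents = [{'_id': 0, 'color': 'red'}, {'_id': 1, 'color': 'blue'}]
encoded_documents = apply_models(sample_documents, {'color': toy_encode})

for doc in encoded_documents:
    print(doc)
```

Each encoded document keeps its original fields and gains a 'color_vector_' field, which is the kind of field name the search calls above refer to.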

Access Powerful Vector Analytics

Vector AI has powerful visualisations to allow you to analyse your vectors as easily as possible - in 1 line of code.

vi_client.plot_dimensionality_reduced_vectors(documents, 
    point_label='title', 
    dim_reduction_field='_dr_ivis', 
    cluster_field='centroid_title', cluster_label='centroid_title')

View Dimensionality-Reduced Vectors

vi_client.plot_2d_cosine_similarity(
    documents,
    documents[0:2],
    vector_fields=['use_vector_'],
    label='name',
    anchor_document=documents[0]
)

Compare vectors and their search performance on your documents easily! 1D plot cosine similarity


Why Vector AI compared to other Nearest Neighbor implementations?

  • Production Ready: Our API is fully managed and can scale to power hundreds of millions of searches a day. Even at millions of searches it is blazing fast through edge caching, GPU utilisation and software optimisation so you never have to worry about scaling your infrastructure as your use case scales.
  • Simple to use and quick to get started: One of our core design principles is to help people start using Vector AI as quickly as possible, whilst still offering a tonne of functionality and customisation options.
  • Richer understanding of your vectors and their properties: Our library is designed to let people do more than just obtain nearest neighbors. You can experiment with, analyse, interpret and improve your vectors the moment the data is added to the index.
  • Store vector data with ease: The document-orientated nature of Vector AI allows users to label, filter, search and understand their vectors as much as possible.
  • Real time access to data: Vector AI data is accessible in real time, as soon as the data is inserted it is searchable straight away. No need to wait hours to build an index.
  • Framework agnostic: We are never going to force a specific framework on Vector AI. If you have a framework of choice, you can use it - as long as your documents are JSON-serializable!

Using VectorHub Models

VectorHub is Vector AI's main model repository. Models from VectorHub are built with scikit-learn interfaces and all have examples of Vector AI integration. If you are looking to experiment with new off-the-shelf models, we recommend giving VectorHub models a go - all of them have been tested on Colab and are able to be used in as little as 3 lines of code!

Schema Rules for documents (BYO Vectors and IDs)

Ensure that every vector field contains '_vector_' in its name and that every ID field is named exactly '_id'.

For example:

example_item = {
    '_id': 'James',
    'skills_vector_': [0.123, 0.456, 0.789, 0.987, 0.654, 0.321]
}

The following will not be recognised as ID columns or vector columns, because 'name_id' is not exactly '_id' and 'skillsvector_' does not contain '_vector_'.

example_item = {
    'name_id': 'James',
    'skillsvector_': [0.123, 0.456, 0.789, 0.987, 0.654, 0.321]
}
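
The naming rules above can be checked locally before inserting documents. The helper below is a minimal sketch and not part of the Vector AI SDK: find_schema_fields is a hypothetical name, and it only applies the two rules as stated in this section (a key equal to '_id', and keys containing '_vector_').

```python
def find_schema_fields(document):
    """Report which keys would count as ID or vector fields under the rules above."""
    return {
        'id_fields': [key for key in document if key == '_id'],
        'vector_fields': [key for key in document if '_vector_' in key],
    }


valid_item = {
    '_id': 'James',
    'skills_vector_': [0.123, 0.456, 0.789, 0.987, 0.654, 0.321],
}
invalid_item = {
    'name_id': 'James',
    'skillsvector_': [0.123, 0.456, 0.789, 0.987, 0.654, 0.321],
}

print(find_schema_fields(valid_item))
print(find_schema_fields(invalid_item))
```

For the first item both an ID field and a vector field are found. For the second item both lists come back empty, which matches the two examples in this section.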

How does this differ from the VectorAI API?

The Python SDK is designed to give Pythonistas a way to unlock the power of VectorAI in as few lines of code as possible. It exposes all the elements of the API through our open-sourced automation tool, and it is the main way our data scientists and engineers interact with the VectorAI engine for quick prototyping before developers utilise API requests.

Note: The VectorAI SDK is built on the development server which can sometimes cause errors. However, this is important to ensure that users are able to access the most cutting-edge features as required. If you run into such issues, we recommend creating a GitHub Issue if it is non-urgent, but feel free to ping the Discord channel for more urgent enquiries.


Building Products with Vector AI

Creating a multi-language AI fashion assistant: https://fashionfiesta.me | Blog

Demo

Do share with us any blogs or websites you create with Vector AI!


Comments
  • Accessing Discord

    Hi Vector AI Team!

    I'm trying to access the Discord invite link mentioned in the readme: https://discord.gg/CbwUxyD But getting an "invalid invite link".

    I'm writing a new blog post covering the many neural search frameworks, in spirit of my blog post on Vector DBs: https://towardsdatascience.com/milvus-pinecone-vespa-weaviate-vald-gsi-what-unites-these-buzz-words-and-what-makes-each-9c65a3bd0696

    If that's okay, I'd like to ask a couple of questions on the inner workings of the framework and some of its features.

    Thanks,

    Dmitry

    opened by DmitryKey 0
  • Same search results for searching very different images.

    Using the unsplash-images collection: https://playground.getvectorai.com/collections/?collection=unsplash-images

    result for: vi_client.search_image('unsplash-images', image_url, ['image_url_vector_']) with image_url as: https://www.rover.com/blog/wp-content/uploads/2020/06/siberian-husky-4735878_1920.jpg https://davidkerrphotography.co.nz/wp-content/uploads/2016/10/Slide01.jpg

    identical result for both:

    {'count': 17506,
     'results': [{'_clusters_': {},
                  '_id': 'tLUgvVaCQnY',
                  '_search_score': 0.6311334,
                  'dictionary_label_1': 'wineglasses',
                  'dictionary_label_2': 'delftware',
                  'image_url': 'https://images.unsplash.com/photo-1540735242080-bc0ad0cdcd1e?w=300&q=80',
                  'insert_date_': '2021-02-25T03:38:08.205446',
                  'likes': 150005},
                 {'_clusters_': {},
                  '_id': 'wVMuNOSt5KY',
                  '_search_score': 0.6278121000000001,
                  'dictionary_label_2': 'bootstrapping',
                  'image_url': 'https://images.unsplash.com/photo-1556912743-90a361c19b16?w=300&q=80',
                  'insert_date_': '2021-02-25T03:38:08.018132',
                  'likes': 173693},
                 {'_clusters_': {},
                  '_id': 'kkBXGVE9k-8',
                  '_search_score': 0.626989,
                  'dictionary_label_1': 'occupant',
                  'dictionary_label_2': 'catabolized',
                  'image_url': 'https://images.unsplash.com/photo-1526529516337-f40ddc5532e2?w=300&q=80',
                  'insert_date_': '2021-02-25T03:38:08.129598',
                  'likes': 627490},
                 {'_clusters_': {},
                  '_id': 'pLshzlb5yOA',
                  '_search_score': 0.6268415,
                  'dictionary_label_2': 'wood',
                  'image_url': 'https://images.unsplash.com/photo-1582459208380-f99d357adf33?w=300&q=80',
                  'insert_date_': '2021-02-25T03:38:08.096761',
                  'likes': 173756},
                 {'_clusters_': {},
                  '_id': 'sHmW616civc',
                  '_search_score': 0.6268100999999999,
                  'dictionary_label_2': 'trail',
                  'image_url': 'https://images.unsplash.com/photo-1556674524-65bf99573bef?w=300&q=80',
                  'insert_date_': '2021-02-25T03:38:08.000302',
                  'likes': 682592},
                 {'_clusters_': {},
                  '_id': 'VoTqMJLLSI8',
                  '_search_score': 0.6235797000000001,
                  'dictionary_label_1': 'trays',
                  'dictionary_label_2': 'dishware',
                  'image_url': 'https://images.unsplash.com/photo-1569272559969-2a9275513966?w=300&q=80',
                  'insert_date_': '2021-02-25T03:38:08.202763',
                  'likes': 172006},
                 {'_clusters_': {},
                  '_id': 'XcWKh-GF69M',
                  '_search_score': 0.6210401999999999,
                  'dictionary_label_2': 'obliging',
                  'image_url': 'https://images.unsplash.com/photo-1581280227715-56d3062138a9?w=300&q=80',
                  'insert_date_': '2021-02-25T03:38:20.517206',
                  'likes': 678324},
                 {'_clusters_': {},
                  '_id': 'b2_pVdk4lGI',
                  '_search_score': 0.6187004,
                  'dictionary_label_2': 'jukebox',
                  'image_url': 'https://images.unsplash.com/photo-1568967906094-1d0acfbf0676?w=300&q=80',
                  'insert_date_': '2021-02-25T03:38:20.509971',
                  'likes': 138088},
                 {'_clusters_': {},
                  '_id': '22HltbHJbPI',
                  '_search_score': 0.6182232000000001,
                  'dictionary_label_1': 'shoreline',
                  'dictionary_label_2': 'buckeens',
                  'image_url': 'https://images.unsplash.com/photo-1541514467948-60ec8a24e84f?w=300&q=80',
                  'insert_date_': '2021-02-25T09:44:25.156647',
                  'likes': 758805},
                 {'_clusters_': {},
                  '_id': 'uM3pEsEkPHA',
                  '_search_score': 0.6179558,
                  'dictionary_label_2': 'dewclaw',
                  'image_url': 'https://images.unsplash.com/photo-1572725364984-c2a074c6740c?w=300&q=80',
                  'insert_date_': '2021-02-25T03:38:08.111128',
                  'likes': 655907}]}
    
    opened by elliotsayes 4
  • Build type-safe assertive decorator

    Type safety is difficult to enforce in Python, but it can be approximated through smart use of decorators. An interesting example can be seen below:

    import itertools as it

    def parametrized(dec):
        """Let a decorator take its own arguments, e.g. @types(str, int)."""
        def layer(*args, **kwargs):
            def repl(f):
                return dec(f, *args, **kwargs)
            return repl
        return layer

    @parametrized
    def types(f, *types):
        def rep(*args):
            # Compare each positional argument against its expected type
            for a, t, n in zip(args, types, it.count()):
                if type(a) is not t:
                    raise TypeError('Value %d has not type %s. %s instead' %
                        (n, t, type(a))
                    )
            return f(*args)
        return rep

    @types(str, int)  # arg1 is str, arg2 is int
    def string_multiply(text, times):
        return text * times

    print(string_multiply('hello', 3))    # Prints hellohellohello
    print(string_multiply(3, 3))          # Raises TypeError

    # From: https://stackoverflow.com/questions/5929107/decorators-with-parameters
    
    enhancement 
    opened by boba-and-beer 0
Releases(v0.2.5)