Bayesian A/B testing

Last update: Dec 15, 2022

Overview

bayesian_testing is a small package for quick evaluation of A/B (or A/B/C/...) tests using a Bayesian approach.

The package currently supports these data inputs:

binary data ([0, 1, 0, ...]) - convenient for conversion-like A/B testing
normal data with unknown variance - convenient for normal data A/B testing
delta-lognormal data (lognormal data with zeros) - convenient for revenue-like A/B testing

The core evaluation metric is the Probability of Being Best (PBB), i.e. the probability that a variant has the largest value of the underlying parameter (for example, conversion rate or mean). It is estimated by simulating from the posterior distributions given the observed data.
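The idea can be sketched for binary data with the standard library alone. This is a minimal illustration, not the package's implementation: it assumes a Beta prior with a=b=1/2 and the usual conjugate update to Beta(a + positives, b + negatives), and the helper name prob_being_best is hypothetical.

```python
import random


def prob_being_best(posteriors, sim_count=20000, seed=None):
    """Estimate P(variant has the largest rate) by posterior simulation.

    posteriors maps a variant name to the (alpha, beta) parameters of its
    Beta posterior. Hypothetical helper, not part of bayesian_testing.
    """
    rng = random.Random(seed)
    names = list(posteriors)
    wins = dict.fromkeys(names, 0)
    for _ in range(sim_count):
        # One joint draw: a plausible conversion rate for every variant.
        draws = {n: rng.betavariate(*posteriors[n]) for n in names}
        wins[max(draws, key=draws.get)] += 1
    return {n: wins[n] / sim_count for n in names}


# Assumed conjugate update: Beta(1/2 + positives, 1/2 + negatives).
posteriors = {
    "A": (0.5 + 80, 0.5 + 1420),  # 80 positives out of 1500
    "B": (0.5 + 80, 0.5 + 1120),  # 80 positives out of 1200
    "C": (0.5 + 50, 0.5 + 950),   # 50 positives out of 1000
}
pbb = prob_being_best(posteriors, seed=52)
print(pbb)
```

Each simulation round draws one rate per variant and credits the variant with the largest draw; the share of rounds a variant wins is its estimated probability of being best.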

Installation

bayesian_testing can be installed using pip:

pip install bayesian_testing

Alternatively, you can clone the repository and use poetry manually:

cd bayesian_testing
pip install poetry
poetry install
poetry shell

Basic Usage

The primary features are BinaryDataTest, NormalDataTest and DeltaLognormalDataTest classes.

In all cases, there are two methods to insert data:

add_variant_data - adding raw data for a variant as a list of numbers (or numpy 1-D array)
add_variant_data_agg - adding aggregated variant data (this can be practical for large data, as the aggregation can be done on a database level)

Both methods allow the prior distribution to be specified through optional parameters (see the respective docstrings for details). The default priors should be sufficient in most cases, for example when no prior knowledge is available or when there is a large amount of data.
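For binary data, the influence of a prior can be illustrated with the standard Beta-Binomial update. The sketch below is an assumption about the underlying model, not code from the package; the names a_prior and b_prior mirror the BinaryDataTest parameters, while posterior_mean is a hypothetical helper.

```python
def posterior_mean(positives, totals, a_prior=0.5, b_prior=0.5):
    """Mean of the Beta(a_prior + positives, b_prior + negatives) posterior.

    Hypothetical helper illustrating the conjugate update; not part of
    bayesian_testing.
    """
    negatives = totals - positives
    a_post = a_prior + positives
    b_post = b_prior + negatives
    return a_post / (a_post + b_post)


# With 80 positives out of 1200 the data dominates either prior.
default_prior = posterior_mean(80, 1200)
informative_prior = posterior_mean(80, 1200, a_prior=1, b_prior=20)

# With only 2 positives out of 10 the prior matters much more.
small_default = posterior_mean(2, 10)
small_informative = posterior_mean(2, 10, a_prior=1, b_prior=20)

print(default_prior, informative_prior)
print(small_default, small_informative)
```

The gap between the two priors shrinks as the sample grows, which is why the default setup is usually adequate for large data.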

To get the results of the test, call the evaluate method, or probabs_of_being_best to return only the probabilities.

Probabilities of being best are approximated using simulations, so evaluate can return slightly different values on different runs. To stabilize the results, set the sim_count parameter of evaluate to a higher value (the default is 20K), or use the seed parameter to make them fully reproducible.
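As a rough guide to how much run-to-run variation to expect, the estimate of a probability p from sim_count independent simulation rounds has a standard error of about sqrt(p * (1 - p) / sim_count). The sketch below only evaluates that formula; the helper name is hypothetical and the value 0.89 is just an example probability.

```python
import math


def monte_carlo_standard_error(probability, sim_count):
    """Approximate standard error of a simulated probability estimate."""
    return math.sqrt(probability * (1 - probability) / sim_count)


# Ten times more simulations shrink the error by a factor of sqrt(10).
for sim_count in (20_000, 200_000, 2_000_000):
    se = monte_carlo_standard_error(0.89, sim_count)
    print(f"sim_count={sim_count:>9,}: standard error ~ {se:.5f}")
```

At the default 20K simulations the noise is already in the third decimal place, so a higher sim_count mainly matters when probabilities are compared at that precision.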

BinaryDataTest

Class for Bayesian A/B test for binary-like data (e.g. conversions, successes, etc.).

import numpy as np
from bayesian_testing.experiments import BinaryDataTest

# generating some random data
rng = np.random.default_rng(52)
# 1-D array of 1500 random 0/1 values with 5.2% probability of 1:
data_a = rng.binomial(n=1, p=0.052, size=1500)
# 1-D array of 1200 random 0/1 values with 6.7% probability of 1:
data_b = rng.binomial(n=1, p=0.067, size=1200)

# initialize a test
test = BinaryDataTest()

# add variant using raw data (arrays of zeros and ones):
test.add_variant_data("A", data_a)
test.add_variant_data("B", data_b)
# priors can be specified like this (default for this test is a=b=1/2):
# test.add_variant_data("B", data_b, a_prior=1, b_prior=20)

# add variant using aggregated data (same as raw data with 950 zeros and 50 ones):
test.add_variant_data_agg("C", totals=1000, positives=50)

# evaluate test
test.evaluate()

[{'variant': 'A',
  'totals': 1500,
  'positives': 80,
  'conv_rate': 0.05333,
  'prob_being_best': 0.06625},
 {'variant': 'B',
  'totals': 1200,
  'positives': 80,
  'conv_rate': 0.06667,
  'prob_being_best': 0.89005},
 {'variant': 'C',
  'totals': 1000,
  'positives': 50,
  'conv_rate': 0.05,
  'prob_being_best': 0.0437}]

NormalDataTest

Class for Bayesian A/B test for normal data.

import numpy as np
from bayesian_testing.experiments import NormalDataTest

# generating some random data
rng = np.random.default_rng(21)
data_a = rng.normal(7.2, 2, 1000)
data_b = rng.normal(7.1, 2, 800)
data_c = rng.normal(7.0, 4, 500)

# initialize a test
test = NormalDataTest()

# add variant using raw data:
test.add_variant_data("A", data_a)
test.add_variant_data("B", data_b)
# test.add_variant_data("C", data_c)

# add variant using aggregated data:
test.add_variant_data_agg("C", len(data_c), sum(data_c), sum(np.square(data_c)))

# evaluate test
test.evaluate(sim_count=20000, seed=52)

[{'variant': 'A',
  'totals': 1000,
  'sum_values': 7294.67901,
  'avg_values': 7.29468,
  'prob_being_best': 0.1707},
 {'variant': 'B',
  'totals': 800,
  'sum_values': 5685.86168,
  'avg_values': 7.10733,
  'prob_being_best': 0.00125},
 {'variant': 'C',
  'totals': 500,
  'sum_values': 3736.91581,
  'avg_values': 7.47383,
  'prob_being_best': 0.82805}]
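The aggregated call above passes three numbers per variant: the count, the sum, and the sum of squares. Those are enough to recover the sample mean and variance, which is why the aggregation can be pushed down to a database. The following sketch shows the arithmetic on made-up values; the function and argument names are hypothetical and not taken from the package.

```python
import statistics


def normal_aggregates(values):
    """Return (count, sum, sum of squares) for a list of numbers."""
    return len(values), sum(values), sum(x * x for x in values)


def mean_and_variance(totals, sum_values, sum_values_2):
    """Recover sample mean and unbiased sample variance from aggregates."""
    mean = sum_values / totals
    variance = (sum_values_2 - totals * mean**2) / (totals - 1)
    return mean, variance


data = [7.1, 6.4, 8.3, 7.9, 5.6, 7.2]  # illustrative values only
agg = normal_aggregates(data)
mean, variance = mean_and_variance(*agg)

# Cross-check against the statistics module, which uses the raw data.
reference_mean = statistics.mean(data)
reference_variance = statistics.variance(data)
print(mean, variance)
```

For very large values with a small spread, the sum-of-squares formula can lose floating-point precision, so it may be worth computing the aggregates in higher precision on the database side.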

DeltaLognormalDataTest

Class for Bayesian A/B test for delta-lognormal data (log-normal with zeros). Delta-lognormal data is typical of revenue-per-session data, where many sessions have zero revenue and the non-zero values are positive numbers that may follow a log-normal distribution. To handle this data, the calculation combines a binary Bayesian model for zero vs. non-zero "conversions" with a log-normal model for the non-zero values.

import numpy as np
from bayesian_testing.experiments import DeltaLognormalDataTest

test = DeltaLognormalDataTest()

data_a = [7.1, 0.3, 5.9, 0, 1.3, 0.3, 0, 0, 0, 0, 0, 1.5, 2.2, 0, 4.9, 0, 0, 0, 0, 0]
data_b = [4.0, 0, 3.3, 19.3, 18.5, 0, 0, 0, 12.9, 0, 0, 0, 0, 0, 0, 0, 0, 3.7, 0, 0]

# adding variant using raw data
test.add_variant_data("A", data_a)

# alternatively, variant can be also added using aggregated data:
test.add_variant_data_agg(
    name="B",
    totals=len(data_b),
    positives=sum(x > 0 for x in data_b),
    sum_values=sum(data_b),
    sum_logs=sum([np.log(x) for x in data_b if x > 0]),
    sum_logs_2=sum([np.square(np.log(x)) for x in data_b if x > 0])
)

test.evaluate(seed=21)

[{'variant': 'A',
  'totals': 20,
  'positives': 8,
  'sum_values': 23.5,
  'avg_values': 1.175,
  'avg_positive_values': 2.9375,
  'prob_being_best': 0.18915},
 {'variant': 'B',
  'totals': 20,
  'positives': 6,
  'sum_values': 61.7,
  'avg_values': 3.085,
  'avg_positive_values': 10.28333,
  'prob_being_best': 0.81085}]
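To make the two-part model more concrete, the sketch below computes the aggregates used in the example and a simple plug-in estimate of the mean, P(non-zero) * exp(mu + sigma^2 / 2), where mu and sigma^2 are the mean and variance of the logs of the non-zero values. This is only a point estimate under the delta-lognormal assumption, not the posterior simulation the package performs, and both helper names are hypothetical.

```python
import math


def delta_lognormal_aggregates(values):
    """Aggregate raw values into the quantities used in the example above."""
    positive = [x for x in values if x > 0]
    logs = [math.log(x) for x in positive]
    return {
        "totals": len(values),
        "positives": len(positive),
        "sum_values": sum(values),
        "sum_logs": sum(logs),
        "sum_logs_2": sum(v * v for v in logs),
    }


def plug_in_mean(agg):
    """Plug-in mean estimate; needs at least two non-zero values."""
    n_pos = agg["positives"]
    p_nonzero = n_pos / agg["totals"]            # binary part
    mu = agg["sum_logs"] / n_pos                  # mean of logs
    var = (agg["sum_logs_2"] - n_pos * mu**2) / (n_pos - 1)  # variance of logs
    return p_nonzero * math.exp(mu + var / 2)     # log-normal part


data_b = [4.0, 0, 3.3, 19.3, 18.5, 0, 0, 0, 12.9, 0,
          0, 0, 0, 0, 0, 0, 0, 3.7, 0, 0]
agg_b = delta_lognormal_aggregates(data_b)
estimate_b = plug_in_mean(agg_b)
print(agg_b)
print(estimate_b)
```

The plug-in estimate need not equal the plain average of the data, because it relies on the log-normal assumption for the non-zero values.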

Development

To set up a development environment, use Poetry and pre-commit:

pip install poetry
poetry install
poetry run pre-commit install

Roadmap

Test classes to be added:

PoissonDataTest
ExponentialDataTest

Metrics to be added:

Expected Loss
Potential Value Remaining
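For orientation, one common definition of expected loss for a variant is the posterior expectation of how far it falls short of the best variant, E[max_j(theta_j) - theta_i]. The sketch below estimates it by simulation for binary data, assuming Beta posteriors with the a=b=1/2 prior; it is an illustration of the concept with a hypothetical helper name, not the package's implementation.

```python
import random


def expected_loss(posteriors, sim_count=20000, seed=None):
    """Estimate E[best rate - own rate] for each variant by simulation.

    posteriors maps a variant name to (alpha, beta) of its Beta posterior.
    Hypothetical helper, not part of bayesian_testing.
    """
    rng = random.Random(seed)
    names = list(posteriors)
    totals = dict.fromkeys(names, 0.0)
    for _ in range(sim_count):
        draws = {n: rng.betavariate(*posteriors[n]) for n in names}
        best = max(draws.values())
        for n in names:
            # Zero for the winner of this round, positive for the others.
            totals[n] += best - draws[n]
    return {n: totals[n] / sim_count for n in names}


posteriors = {
    "A": (0.5 + 80, 0.5 + 1420),  # 80 positives out of 1500
    "B": (0.5 + 80, 0.5 + 1120),  # 80 positives out of 1200
}
loss = expected_loss(posteriors, seed=52)
print(loss)
```

A variant with a small expected loss is a safe choice even when its probability of being best is not overwhelming, because little is given up if it turns out not to be the best.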

References

The bayesian_testing package itself depends only on the numpy package.
Work on this package (including the selection of default priors) was inspired mainly by the Coursera course Bayesian Statistics: From Concept to Data Analysis.

Comments

Results are different from online tool

Hi,

I tested your library and cross-checked against this online calculator: Here is the result from your library:

[{'variant': 'True True True False False False False',
  'totals': 1172,
  'positives': 461,
  'positive_rate': 0.39334,
  'prob_being_best': 0.7422,
  'expected_loss': 0.0582635},
 {'variant': 'False True True False False False False',
  'totals': 222,
  'positives': 27,
  'positive_rate': 0.12162,
  'prob_being_best': 0.0,
  'expected_loss': 0.3280173},
 {'variant': 'False False True False False False False',
  'totals': 1363,
  'positives': 63,
  'positive_rate': 0.04622,
  'prob_being_best': 0.0,
  'expected_loss': 0.4051768},
 {'variant': 'False False False False False False False',
  'totals': 1052,
  'positives': 0,
  'positive_rate': 0.0,
  'prob_being_best': 0.0,
  'expected_loss': 0.4512031},
 {'variant': 'True False True False False False False',
  'totals': 1,
  'positives': 0,
  'positive_rate': 0.0,
  'prob_being_best': 0.2578,
  'expected_loss': 0.1997566}]

So the best variant has 74% probability to be the winner. On the online calculator it is 63.48% instead (last variant is 36.52% instead of 25.78%).

I used the BinaryDataTest() without any priors.

I did not dig deeper on what might be right here, but wanted to drop this as feedback.

opened by ThomasMeissnerDS 6

Minimum sample size

First, this package is great! I wanted to know if the probability estimates rely on a minimum sample size or how one might go about determining minimum sample size for a Binary test, for example.

opened by abrunner94 5

Releases(v0.5.3)

v0.5.3(Dec 26, 2022)

Fixing normal posterior mean in evaluate display.

v0.5.2(Dec 26, 2022)

Adding posterior mean to binary and normal data tests.

v0.5.1(Dec 11, 2022)

Cosmetic fixes of Poisson outputs.
Outputs in main Readme file as markdown tables.

v0.5.0(Dec 11, 2022)

Introducing Poisson data test which allows A/B (or as always A/B/C/...) testing for data of event counts (e.g. goals scored, goals received, number of orders in a session, etc.). 🎉

v0.4.0(Dec 10, 2022)

Adding a bool option min_is_best into evaluation so "being best" of PBB can be changed from "being greatest" to "being smallest". Default value remains as before ("being best" = "being greatest").

v0.3.0(Aug 5, 2022)

Adding expected loss. 🎉 🎊🍾

v0.2.3(Jul 16, 2022)

Updating poetry packages to resolve security alerts.

v0.2.2(Mar 30, 2022)

Updating poetry and packages to solve vulnerabilities alerts.

v0.2.1(Feb 21, 2022)

Renaming categories to states in discrete test.

v0.2.0(Feb 20, 2022)

Added new class for discrete data (e.g. dice rolls, rating data, etc.).

v0.1.4(Jan 3, 2022)

Fixing example in readme.

v0.1.3(Jan 2, 2022)

Updating readme examples and adding PyPI badge.

v0.1.2(Jan 1, 2022)

Relaxing numpy dependency to ">=1.19" or higher (before it was ">=1.20").

v0.1.1(Jan 1, 2022)

Adding readme to package.

v0.1.0(Jan 1, 2022)

Initial release of Bayesian Testing.

Owner

Matus Baniar

Data data data
