Bayesian A/B Test Calculator

The Beta-Bernoulli model in the context of A/B testing.

TL;DR: Instructions

  1. Specify the prior alpha and beta parameters.
  2. Plot the priors and revise the parameters as necessary.
  3. Enter the number of successes and failures in the test and control groups.
  4. Plot to see the posterior distributions.

Test and Control Probability Density Functions

The success probability distributions in test and control groups.
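A minimal Python sketch of what these density plots show, assuming the standard conjugate update in which each group's posterior is Beta(alpha + successes, beta + failures). The Beta(10, 10) prior and all counts are made-up values, and `beta_pdf` is a hypothetical helper, not part of the calculator:

```python
import math

def beta_pdf(p: float, alpha: float, beta: float) -> float:
    """Density of Beta(alpha, beta) at p, computed in log space for stability."""
    if not 0.0 < p < 1.0:
        return 0.0
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(log_norm + (alpha - 1) * math.log(p) + (beta - 1) * math.log(1 - p))

# Hypothetical inputs: a Beta(10, 10) prior and made-up counts for each group.
PRIOR_ALPHA, PRIOR_BETA = 10, 10
test_successes, test_failures = 60, 440
control_successes, control_failures = 45, 455

# Conjugate update (assumed): posterior is Beta(alpha + successes, beta + failures).
test_params = (PRIOR_ALPHA + test_successes, PRIOR_BETA + test_failures)
control_params = (PRIOR_ALPHA + control_successes, PRIOR_BETA + control_failures)

# Evaluate both densities on an evenly spaced grid, as a density plot would.
grid = [i / 200 for i in range(1, 200)]
test_curve = [beta_pdf(p, *test_params) for p in grid]
control_curve = [beta_pdf(p, *control_params) for p in grid]
```

Plotting `grid` against `test_curve` and `control_curve` gives the two posterior curves; with zero counts the same code draws the prior.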

Histogram of Test - Control Probability

Distribution of differences in success probability between test and control groups.
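One way to build such a histogram, assumed here rather than taken from the calculator, is Monte Carlo sampling: draw a success probability from each group's Beta distribution, subtract, and bin the differences. The posterior parameters below are hypothetical (a Beta(10, 10) prior plus made-up counts), and `sample_differences` and `histogram` are illustrative helper names:

```python
import random

def sample_differences(test_params, control_params, n_draws=100_000, seed=0):
    """Draw p_test - p_control from two independent Beta distributions."""
    rng = random.Random(seed)
    return [rng.betavariate(*test_params) - rng.betavariate(*control_params)
            for _ in range(n_draws)]

def histogram(values, n_bins=20):
    """Equal-width bin edges and counts spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so the maximum value falls into the last bin instead of past it.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts

# Hypothetical posteriors: Beta(10, 10) prior updated with made-up counts.
diffs = sample_differences(test_params=(70, 450), control_params=(55, 465))
edges, counts = histogram(diffs)
```

Sampling avoids any closed-form work for the difference of two Betas; more draws give a smoother histogram at the cost of run time.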

Quantiles of the differences distribution.

Posterior probability that the difference lies below the value x.
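Both summaries can be read off sorted Monte Carlo draws of the difference. This sketch assumes hypothetical posterior parameters, computes deciles with `statistics.quantiles`, and estimates P(difference < x) by counting draws with `bisect`; `prob_below` is a hypothetical helper name:

```python
import bisect
import random
import statistics

# Hypothetical posterior parameters (Beta(10, 10) prior plus made-up counts).
TEST_PARAMS, CONTROL_PARAMS = (70, 450), (55, 465)

rng = random.Random(1)
diffs = sorted(rng.betavariate(*TEST_PARAMS) - rng.betavariate(*CONTROL_PARAMS)
               for _ in range(100_000))

# Deciles of the difference distribution: 9 cut points splitting it into 10 groups.
deciles = statistics.quantiles(diffs, n=10)

def prob_below(x: float) -> float:
    """Empirical posterior probability that p_test - p_control < x."""
    return bisect.bisect_left(diffs, x) / len(diffs)

print([round(q, 4) for q in deciles])
print(prob_below(0.0))  # share of draws in which test is worse than control
```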

The average difference between test and control is: 0. The probability that test performs better: 0.5
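Both numbers also have exact forms. The average difference is the difference of the two Beta means, and for integer parameters P(p_test > p_control) has a known closed-form sum (given, for example, by Evan Miller). The sketch below is an alternative to sampling, not necessarily how the calculator computes its readout; the function names are hypothetical:

```python
import math

def log_beta(a: float, b: float) -> float:
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def prob_test_better(a_t: int, b_t: int, a_c: int, b_c: int) -> float:
    """P(p_test > p_control) for independent Beta(a_t, b_t) and Beta(a_c, b_c).

    Closed-form sum over i = 0 .. a_t - 1; requires an integer a_t.
    """
    total = 0.0
    for i in range(a_t):
        total += math.exp(
            log_beta(a_c + i, b_c + b_t)
            - math.log(b_t + i)
            - log_beta(1 + i, b_t)
            - log_beta(a_c, b_c)
        )
    return total

def mean_difference(a_t: float, b_t: float, a_c: float, b_c: float) -> float:
    """E[p_test - p_control] = a_t / (a_t + b_t) - a_c / (a_c + b_c)."""
    return a_t / (a_t + b_t) - a_c / (a_c + b_c)
```

With identical priors and no data, for example `mean_difference(10, 10, 10, 10)` and `prob_test_better(10, 10, 10, 10)`, these return 0 and 0.5 (up to floating-point rounding), consistent with the default readout above.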

Explanation

This simple calculator applies the Beta-Bernoulli model (a binary-outcome model in which the prior for the success probability is a Beta distribution) to A/B testing, where the goal of inference is to estimate the probability that the test group performs better than the control group.
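In symbols (notation introduced here for illustration: p is a group's success probability, y_i its binary outcomes, s the number of successes and f the number of failures out of n trials), the model and its conjugate posterior are:

```latex
\begin{aligned}
p &\sim \mathrm{Beta}(\alpha, \beta),
  \qquad y_1, \dots, y_n \mid p \overset{\text{iid}}{\sim} \mathrm{Bernoulli}(p), \\
p \mid y_{1:n} &\sim \mathrm{Beta}(\alpha + s,\ \beta + f),
  \qquad s = \sum_{i=1}^{n} y_i, \quad f = n - s.
\end{aligned}
```

Applied separately to the test and control groups, this gives two posteriors whose comparison, P(p_T > p_C | data), is the target of inference.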

Bayesian inference consists of first specifying a prior belief about which effects are likely, and then updating that prior with incoming data.

For example, if our conversion rate is 5%, we may consider it reasonably likely that the change we want to test could improve it by 5 percentage points, but most likely that the change will have no effect, and very unlikely that the conversion rate will shoot up to 30% (after all, we are only making a small change).
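To make the example concrete, one hypothetical way to encode a belief centred on a 5% conversion rate is a Beta(5, 95) prior, whose mean is 5 / (5 + 95) = 0.05. This is a prior on each group's conversion rate, as in the model used here, rather than a prior placed directly on the effect. The sketch computes how much prior mass lies above 10% and above 30%, using the identity that for integer parameters P(p > x) equals the probability that a Binomial(alpha + beta - 1, x) count is at most alpha - 1; `beta_tail_prob` is a hypothetical helper:

```python
import math

def beta_tail_prob(x: float, alpha: int, beta: int) -> float:
    """P(p > x) for p ~ Beta(alpha, beta) with integer parameters.

    Uses the identity P(p > x) = P(Binomial(alpha + beta - 1, x) <= alpha - 1).
    """
    n = alpha + beta - 1
    return sum(math.comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(alpha))

# Hypothetical prior centred on a 5% conversion rate.
ALPHA, BETA = 5, 95
prior_mean = ALPHA / (ALPHA + BETA)
p_at_least_10pct = beta_tail_prob(0.10, ALPHA, BETA)  # a +5 percentage point rate
p_at_least_30pct = beta_tail_prob(0.30, ALPHA, BETA)  # the "shoot up to 30%" case
```

Under this prior a 30% rate is essentially ruled out (probability below one in a billion), while a rate of 10% or more keeps roughly 2.5% of the prior mass; smaller alpha and beta with the same ratio would spread that mass further.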

As data come in, we update our beliefs. If the incoming data point to an improvement in the conversion rate, our estimate of the effect moves upward from the prior; the more data we collect, the more confident we become in that estimate and the further it can move away from the prior. The end result is called the posterior: a probability distribution describing the likely effect of our treatment.
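This updating can be sketched with the conjugate Beta update: after each batch of data, the successes are added to alpha and the failures to beta. The Beta(5, 95) prior and the batches (80 conversions per 1,000 visitors) are hypothetical, as are the helper names. The printout shows the posterior mean moving from 0.05 towards the observed 8% while the standard deviation shrinks:

```python
import math

def posterior(alpha: float, beta: float, successes: int, failures: int):
    """Conjugate Beta-Bernoulli update: add the counts to the prior parameters."""
    return alpha + successes, beta + failures

def mean_and_sd(alpha: float, beta: float):
    """Mean and standard deviation of Beta(alpha, beta)."""
    total = alpha + beta
    mean = alpha / total
    sd = math.sqrt(alpha * beta / (total**2 * (total + 1)))
    return mean, sd

# Hypothetical prior centred on 5% and made-up batches converting at 8%.
alpha, beta = 5, 95
history = [mean_and_sd(alpha, beta)]
for batch in range(5):
    alpha, beta = posterior(alpha, beta, successes=80, failures=920)
    history.append(mean_and_sd(alpha, beta))

for step, (mean, sd) in enumerate(history):
    print(f"after {step * 1000:>5} visitors: mean={mean:.4f}, sd={sd:.4f}")
```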

  1. Specify the prior through the alpha and beta parameters of the Beta distribution. The parameter values govern two things: the prior mean success probability, alpha / (alpha + beta) (our belief about the average conversion rate, for example), and the variance of the prior distribution (small alpha and beta lead to a prior in which success probabilities can vary widely around their mean; large values lead to a distribution with a small variance). For example, setting alpha and beta both to 10 gives a prior whose expected success probability is 0.5, with a fair amount of uncertainty around that value (a standard deviation of about 0.11). Setting both to 100 gives the same expected probability of 0.5, but a much smaller spread (a standard deviation of about 0.035).
  2. Look at the histogram of differences in success probability between the test and control groups. Before any data are entered, it expresses the prior belief about how large the difference between the two groups is likely to be. Because the same prior is specified for both groups, the belief is centered on a difference of zero (a priori, a test is just as likely to do worse than the control as it is to do better). If the priors have a low variance, the histogram puts little weight on large differences (a test is unlikely to do much better or much worse than the control); if the priors have a high variance, large differences become much more likely.
  3. Gather data!
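  4. Enter the number of successes (conversions, clicks and so on) and failures in both the test and control groups. This updates the priors with the data: each group's posterior is a Beta distribution with alpha increased by that group's successes and beta increased by its failures.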
  5. The prior plots shift to show the posterior (prior updated with data) distributions. The density plots may diverge, showing the posterior distributions of the success probability in the test and control groups. The difference histogram shifts as well: the share of the distribution lying to the right of zero is the posterior probability that the test performs better, and the share to the left is the probability that it performs worse.
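The effect of prior strength described in steps 1, 2 and 5 can be checked numerically. This sketch (hypothetical helper names; made-up counts of 60/200 conversions in test and 40/200 in control) prints, for Beta(10, 10) and Beta(100, 100), the prior mean and standard deviation, the prior standard deviation of the test-control difference when both groups share the prior independently, and the posterior mean difference after the conjugate update:

```python
import math

def beta_mean_sd(alpha: float, beta: float):
    """Mean alpha / (alpha + beta) and standard deviation of Beta(alpha, beta)."""
    total = alpha + beta
    return alpha / total, math.sqrt(alpha * beta / (total**2 * (total + 1)))

def prior_difference_sd(alpha: float, beta: float) -> float:
    """SD of p_test - p_control when both groups get the same independent prior."""
    return math.sqrt(2) * beta_mean_sd(alpha, beta)[1]

# Made-up data as (successes, failures): test converts 60/200, control 40/200.
TEST = (60, 140)
CONTROL = (40, 160)

for alpha, beta in [(10, 10), (100, 100)]:
    mean, sd = beta_mean_sd(alpha, beta)
    test_mean, _ = beta_mean_sd(alpha + TEST[0], beta + TEST[1])
    control_mean, _ = beta_mean_sd(alpha + CONTROL[0], beta + CONTROL[1])
    print(f"prior Beta({alpha}, {beta}): mean={mean:.2f}, sd={sd:.3f}, "
          f"prior diff sd={prior_difference_sd(alpha, beta):.3f}, "
          f"posterior mean diff={test_mean - control_mean:.3f}")
```

The observed difference in the made-up data is 0.10. The weaker Beta(10, 10) prior leaves the posterior mean difference close to it (about 0.09), while Beta(100, 100) pulls it back towards zero (0.05).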