Don’t Overpromise Based on A/B Tests


We all know that A/B tests are a fundamental tool in any modern marketer’s toolkit, whether for email campaigns, webpage UX testing, or anything else that is customer-facing and content-driven.

The development of more sophisticated test methods has marketers working with data scientists to figure out how to measure true campaign-to-revenue attribution. Even so, A/B tests that use click-throughs to determine the best content and practices are still the foundation for most data-driven marketers, and they are also very helpful when laying the groundwork for personalization.

While A/B tests are useful for predicting the success of campaigns, it’s wise to avoid overpromising results based on the success of a test campaign. The further your test campaign’s performance was from the mean, the more likely it is that the actual campaign will come up short in comparison, a concept known as “regression towards the mean”.

Instead of diving into regression curves and complex mathematics, a simpler explanation using email campaigns and Akitas should make the effect of “regression towards the mean” on A/B testing much more digestible.

Pretend that we’ve recently performed an A/B test of an email campaign and found two wildly different click-through rates:

Campaign A: 5.5%
Campaign B: 2.3%

Given that the result reached statistical significance, A would be the obvious choice to send, but beware of overpromising results. Because the gap in performance is so large, and A’s click-through rate is probably well above what a typical campaign achieves, it is highly likely that a live follow-up campaign using A will produce a lower click-through rate than the test did.
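For readers who want to check the “statistical significance” condition themselves, here is a minimal sketch of a two-proportion z-test in Python. The article gives no sample sizes, so the 2,000 recipients per variant used here are purely an assumption for illustration, and the function name is hypothetical.

```python
import math

def two_proportion_z_test(clicks_a, sends_a, clicks_b, sends_b):
    """Return (z, two-sided p-value) for the difference in click-through rates."""
    rate_a = clicks_a / sends_a
    rate_b = clicks_b / sends_b
    # Pooled rate under the null hypothesis that A and B perform the same.
    pooled = (clicks_a + clicks_b) / (sends_a + sends_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / sends_a + 1 / sends_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# ASSUMPTION: 2,000 sends per variant (not stated in the article).
# 5.5% of 2,000 = 110 clicks; 2.3% of 2,000 = 46 clicks.
z, p_value = two_proportion_z_test(110, 2000, 46, 2000)
print(f"z = {z:.2f}, p = {p_value:.2g}")
```

A small p-value only says the gap between A and B is unlikely to be pure chance. It does not say that a live send of A will repeat the 5.5% seen in the test.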

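In statistical terms, the more extreme a measurement is (the further it sits from the average), the more likely it is that a second, near-identical experiment will return a number closer to the average. Part of any extreme result is luck, and luck does not reliably repeat.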

Imagine that you’re a dog trainer given the task of teaching a large group of stubborn Akitas to run through an obstacle course. If the average completion rate is 50%, and your first group finishes at a staggering 75%, guess what?

The second time you run the course with another group of Akitas, the odds of beating that 75% completion rate are low; a result below 75% is far more likely.
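The same effect can be seen in a quick simulation. The sketch below is an illustration under stated assumptions, not a model of any real campaign: every simulated campaign has the same true click-through rate of 3% (an assumed figure), each goes to 500 recipients, and all function and parameter names are hypothetical. Each campaign is run twice, and we look only at the campaigns whose first run landed in the top 10%.

```python
import random

def observed_ctr(true_rate, sends, rng):
    """Simulate one campaign: each recipient clicks with probability true_rate."""
    clicks = sum(1 for _ in range(sends) if rng.random() < true_rate)
    return clicks / sends

def winners_then_rerun(num_campaigns=1000, sends=500, true_rate=0.03,
                       top_fraction=0.1, seed=7):
    """Run each campaign twice; return average (test, rerun) CTR of the top tests."""
    rng = random.Random(seed)
    runs = [(observed_ctr(true_rate, sends, rng), observed_ctr(true_rate, sends, rng))
            for _ in range(num_campaigns)]
    # Keep only the campaigns whose FIRST run looked best.
    runs.sort(key=lambda pair: pair[0], reverse=True)
    top = runs[:max(1, int(num_campaigns * top_fraction))]
    avg_test = sum(test for test, _ in top) / len(top)
    avg_rerun = sum(rerun for _, rerun in top) / len(top)
    return avg_test, avg_rerun

avg_test, avg_rerun = winners_then_rerun()
print(f"top tests averaged {avg_test:.2%}, their reruns averaged {avg_rerun:.2%}")
```

Because the simulated campaigns are identical by construction, the “winners” won on luck alone and their reruns fall back to roughly the true rate. Real campaigns do differ in quality, so in practice the pullback is usually partial rather than complete.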

Knowing that expectations should be set lower when a test lands far above average is one thing, but setting the right expectations with a CMO or whoever you report to is a whole other ball game.

Misunderstanding the impact of “regression towards the mean” can place unrealistic expectations on your marketing team’s shoulders, and there are plenty of things that can muddy your understanding of “regression towards the mean” when it comes to A/B testing.

Since the variables that determine how we segment our lists will keep growing in number and variety, understanding how your customer data variables shape your lists will help you avoid assuming that “regression towards the mean” is the reason a live campaign did worse than the initial test.

Say you are sending out a test campaign to determine whether a small discount will have a positive effect on your loyal users (whom you classified based on purchase frequency and total spend).

This is a little more complicated than a standard test, because you are not only comparing the click-throughs of two campaigns; you are also looking at how your loyal users’ behavior changes over time after the campaign to determine the right track. We’ll look at whether or not the discounts positively impacted loyal users in another article.

By doing an A/B test, you find the following results:

Test A: 4.3%
Test B: 3.1%

You choose Test A as your live campaign and send it out, and as expected, the live campaign performs slightly worse, with a 3.7% click-through rate.

Assuming that this drop was simply “regression towards the mean” would be a mistake. It turns out that the test sample contained a much higher percentage of users who purchase less frequently but have a high total spend (they typically wait for sales and holidays to buy in bulk), while the live campaign went to the full list, whose customers have a more balanced ratio of purchase frequency to total spend.
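This composition effect is just a weighted average, which a few lines of Python can make concrete. The per-segment click-through rates and audience mixes below are invented for illustration (the article does not report them), and the segment and function names are hypothetical. They were chosen so that the blended rates match the 4.3% test result and the 3.7% live result.

```python
def blended_ctr(segment_ctr, segment_share):
    """Overall click-through rate as a share-weighted average of segment rates."""
    if abs(sum(segment_share.values()) - 1.0) > 1e-9:
        raise ValueError("segment shares must sum to 1")
    return sum(segment_ctr[name] * share for name, share in segment_share.items())

# ASSUMPTION: hypothetical per-segment click-through rates for the discount email.
segment_ctr = {"sale_waiters": 0.055, "balanced_buyers": 0.031}

# ASSUMPTION: hypothetical audience mixes for the test sample and the full list.
test_mix = {"sale_waiters": 0.50, "balanced_buyers": 0.50}
live_mix = {"sale_waiters": 0.25, "balanced_buyers": 0.75}

print(f"test: {blended_ctr(segment_ctr, test_mix):.1%}")
print(f"live: {blended_ctr(segment_ctr, live_mix):.1%}")
```

In this made-up scenario neither segment behaves any differently between the test and the live send. The overall rate drops only because the audience mix changed, which is why it pays to compare the makeup of the test sample against the full list before blaming chance.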

A/B testing is a simple concept embraced (and rightly so) by marketers around the world, but it should always be done with as many variables in mind as possible. As we move towards hyper-segmentation and personalization, knowing what your sample consists of and how to set the right expectations from your test campaigns will become even more important. As marketers, our task won’t be easy, but I have faith that our community will continue to exchange knowledge to build a better understanding of data, technology, and marketing.


About Author

I juggle marketing, business development, and multiple other hats at marketing personalization and automation company nectarOM. Recent innovations in marketing tech have me hooked on analytics and also the creative side of data usage. In my free time, I like to immerse myself in the Dallas start-up scene and frequently visit the numerous coworking spaces and incubators around town. I don't like to call myself a total geek, but I'm kind of into science fiction (Star Wars Expanded Universe, anyone?) and continuously try to find zen in balancing piano, writing, gaming, and lifting.
