There are lots of great A/B split testing tools at our disposal these days, and my favourites include…


But there’s a HUGE flaw in almost every single one of these tools, and it’s causing countless Internet marketers to make decisions that could actually be losing them money when they’re expecting the opposite.

Here’s the problem…

When someone runs a test using any of these tools, they’re typically looking at a few key metrics, including…

[list style="arrow"]
  • % change in conversion
  • Confidence score
[/list]

For those of you who are new to split testing, “confidence score” is basically a measure of how confident you can be that the measured increase/decrease reflects a real difference. The more visitors and conversions measured, the higher the confidence score and the more accurate the results.
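
If you’re curious where that number comes from, here’s a rough sketch of the kind of math behind it, a standard two-proportion z-test. To be clear, this is my own simplified illustration (not the exact formula any particular tool uses), and the visitor/conversion numbers below are made up.

[code language="python"]
from math import sqrt, erf

def ab_confidence(visitors_a, conversions_a, visitors_b, conversions_b):
    """Rough confidence that B truly differs from A, via a two-proportion z-test.
    A simplified illustration -- real tools vary in the exact method they use."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    # Pooled conversion rate under the assumption A and B actually perform the same
    p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / se
    confidence = erf(abs(z) / sqrt(2))  # two-sided normal probability
    lift = (p_b - p_a) / p_a            # relative change in conversion rate
    return lift, confidence

# Hypothetical numbers: 75,000 visitors per version, 3.00% vs 3.18% conversion
lift, conf = ab_confidence(75_000, 2_250, 75_000, 2_385)
print(f"Measured lift: {lift:.1%}, confidence: {conf:.1%}")
[/code]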

For example, if I ran an A/B test on a landing page and version B showed a 6% increase in conversion with a 95% confidence score, then version B should be the winner.

Unfortunately that is not necessarily true…

You see, not too long ago I ran a VERY interesting experiment on a very high volume ecommerce website. Because it was a client’s website I cannot disclose the actual site but I can share the details that matter…

Over the past year I’d been working with my client to set up numerous split tests on their landing pages. In many cases a test would beat the control, so we would roll it out and it would become the new control. But after running quite a few of these tests and rolling out the winners, I noticed we were not seeing the expected bump in sales.

Something wasn’t right.

Then, to fuel the fire, I brought this up with a few other Internet marketing buddies to see what they had to say, and the feedback was the same… they would run a test, find a winner that showed an increase, but after rolling it out their conversion stayed flat.

What was going on? Were these testing solutions wrong? 

Now I’m fortunate to have access to some very high volume websites to run tests on, so I set up an A/B split test experiment.

The goal of my experiment was to determine the margin of error in the predicted conversion increase/decrease when running a split test. The only way to do this was to take a website and set up a split test where the control and the test version were identical (what’s often called an A/A test).

So version A and version B were identical. No changes.

In theory, if these testing tools really work and I ran the test until the confidence score was 90%+, the two versions should report almost identical conversion rates, right?

Wrong.

This is the dirty secret that these split testing companies either don’t want to tell us or don’t actually know…

After running this test until we reached a 90%+ confidence score, the tool showed the test version outperforming the control by 6%.

WTF?

How is that possible? They were identical.

But I did not stop there. I let the test run for several more weeks and watched the confidence score continue to creep up and up while the ‘winner’ bounced back and forth between the two versions. Some weeks the control showed a higher conversion rate, some weeks the test version did… even when the confidence score was almost 100%.
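
If you want to see this effect for yourself without burning real traffic, here’s a toy simulation. It’s just a rough Python sketch I’m including for illustration (not the platform my client’s site runs on): it pits two identical pages against each other over and over, ‘peeks’ at the results periodically the way we all do with a testing dashboard, and counts how often a 90%+ confident ‘winner’ shows up purely by chance.

[code language="python"]
import random
from math import sqrt, erf

def confidence(n_a, c_a, n_b, c_b):
    """Two-sided confidence from a simplified two-proportion z-test."""
    p_pool = (c_a + c_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0
    z = (c_b / n_b - c_a / n_a) / se
    return erf(abs(z) / sqrt(2))

TRUE_RATE = 0.03      # both versions convert at exactly the same rate
VISITORS = 20_000     # visitors per version in each simulated test
PEEK_EVERY = 1_000    # "check the dashboard" every 1,000 visitors per version
SIMULATIONS = 100     # number of identical-page (A/A) tests to simulate

random.seed(7)
tests_with_false_winner = 0
for _ in range(SIMULATIONS):
    conv_a = conv_b = 0
    found_winner = False
    for n in range(1, VISITORS + 1):
        conv_a += random.random() < TRUE_RATE
        conv_b += random.random() < TRUE_RATE
        if n % PEEK_EVERY == 0 and confidence(n, conv_a, n, conv_b) >= 0.90:
            found_winner = True
    tests_with_false_winner += found_winner

print(f"{tests_with_false_winner} of {SIMULATIONS} identical-page tests showed "
      f"a 90%+ confident 'winner' at some point while peeking.")
[/code]

Run it a few times and you’ll typically see noticeably more identical-page tests crowned as ‘winners’ than the 10% you might naively expect from a 90% confidence bar.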

So does that mean these tools are now useless?

No.

What it means is that there’s a margin of error we need to account for when reading the results from these testing platforms. Based on the results of this experiment, my conversion needs to increase by 10% or more before I can confidently call a variation a real winner, because I know there’s a margin of error of around 8% I need to account for.
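
If it helps, here’s how I translate that rule of thumb into a quick sanity check. This is just a sketch, and the 8% noise floor and 10% bar are my estimates from this one experiment, so run your own identical-versions test to find the right numbers for your site.

[code language="python"]
MARGIN_OF_ERROR = 0.08   # the lift my identical pages showed in the A/A test
MIN_REAL_LIFT = 0.10     # what I now require before calling a real winner
MIN_CONFIDENCE = 0.90    # still require the tool's confidence score to be high

def is_real_winner(reported_lift, confidence_score):
    """Only trust a result whose lift clears the noise floor with room to spare."""
    return confidence_score >= MIN_CONFIDENCE and reported_lift >= MIN_REAL_LIFT

print(is_real_winner(0.06, 0.95))   # False: a 6% lift is inside the noise
print(is_real_winner(0.14, 0.95))   # True: comfortably above the 10% bar
[/code]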

So the moral of the story is that A/B split testing tools like Google Website Optimizer are not perfect. There’s a margin of error, which I estimate to be around 8%, that needs to be accounted for.

So if you’ve been running tests that predict increased conversions/sales, only to be disappointed by the results when the winner is rolled out, this might be the problem.

Happy Testing!

Derek