Exploration vs. Exploitation: how to balance short-term results with long-term impact (https://conversion.com/blog/exploration-vs-exploitation/, 5 March 2024)

Today in 2024, pretty much every functional experimentation team understands the importance of iteration:

  1. You run a test;
  2. You analyze the results to work out why the test did or didn’t work;
  3. You build another test to exploit this learning;
  4. You analyze this second test to work out why it did or didn’t work;
  5. You build another test to exploit this learning;
  6. etc.

This approach can yield some very strong results in the short-to-medium term, but in the longer-term, you’re likely to find that the wins from your chosen line of testing begin to dry up.

From here, it’s only a matter of time until you encounter the most feared phenomenon in all of optimization:

The plateau.

We’ve had a lot of experience helping in-house teams push through this kind of performance plateau, and in our experience, it’s almost always caused by the same thing:

A less-than-optimal approach to the explore-exploit tradeoff.

In this article, we’re going to explain what the explore-exploit tradeoff is, how we’re using our Levers™ Framework to optimally solve it for our clients – and how you too can do the same for your program.

Contents:

  1. What is the explore-exploit tradeoff?
  2. What is the Levers™ Framework?
  3. Using the Levers Framework to balance exploration and exploitation

 

1. What is the explore-exploit tradeoff?

In essence, the explore-exploit tradeoff is the tradeoff between gathering new information (exploration) and using that information to improve performance (exploitation).

When you’re exploring new information, you’re not exploiting the information you already have to drive impact now.

When you’re exploiting preexisting information, you’re not gathering new information that might drive an even bigger impact in the future.

As it turns out, the explore-exploit tradeoff shows up in a ridiculously broad range of contexts. For example,

  • In reinforcement learning, where the goal is to teach an AI agent to make decisions based on feedback from its environment: should the agent select the best-known option based on past experience (exploit) or should it explore new options that may lead to a more optimal solution in the future (explore)? (See the bandit sketch after this list.)
  • In CRO: should we iterate on what’s already proven to be effective (exploit) or should we trial something completely new (explore)?
  • And also in much more everyday contexts: should I pick my favorite burger that I know I’ll enjoy (exploit) or should I venture further afield and try the escargots instead (explore)?
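To make the reinforcement-learning example above concrete, here is a minimal epsilon-greedy bandit sketch. The options and reward probabilities are invented purely for illustration – it simply shows the mechanics of the tradeoff, and is not part of our Levers Framework:

```python
import random

# Toy epsilon-greedy bandit. 'Exploit' picks the best-known option,
# 'explore' occasionally tries something else. The options and reward
# probabilities are invented for illustration.
true_rates = {"burger": 0.8, "escargots": 0.6, "salad": 0.5}
estimates = {option: 0.0 for option in true_rates}
counts = {option: 0 for option in true_rates}
epsilon = 0.1  # fraction of choices spent exploring

for _ in range(10_000):
    if random.random() < epsilon:
        choice = random.choice(list(true_rates))    # explore
    else:
        choice = max(estimates, key=estimates.get)  # exploit
    reward = 1 if random.random() < true_rates[choice] else 0
    counts[choice] += 1
    # Incremental mean update of the estimated reward for this option
    estimates[choice] += (reward - estimates[choice]) / counts[choice]

print(estimates)  # well-sampled options converge towards their true rates
```

Tuning epsilon is exactly the dilemma described above: a higher value spends more of your budget learning, a lower value spends more of it earning.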

We’re not saying that we’ve found a global solution to the explore-exploit dilemma – one that will apply to all of the various domains mentioned above.

What we are saying, though, is that we believe we’ve found a strong, close-to-optimal solution to this problem within the context of CRO/experimentation – one that will ultimately allow you to move away from the current local maximum (plateau) you are stuck on, and towards your global maximum.

 

Global maximum diagram

 

Central to this approach is our Levers™ Framework.

2. What is the Levers™ Framework?

We’re not going to delve too deeply into our Levers™ Framework here, since we’ve got other pieces of content that fulfill this purpose already – see our recent white paper, webinar, and blog.

Saying that, our entire solution to the explore-exploit tradeoff is built around the Levers Framework, so it’s worth offering up a quick high-level overview before we go any further. If you’re already acquainted with this stuff, feel free to skip ahead.

So, to begin: in order to explain what the Levers™ Framework is, we first need to define what we mean by the word ‘lever’.

For us, a lever is any feature of the user experience that influences user behavior.

For instance, sales countdown timers exploit a sense of urgency. Within the Levers Framework, an experiment that deploys a sales countdown timer would therefore be categorized under the urgency lever, since this is the means by which it influences user behavior.

In essence, the Levers Framework is a comprehensive taxonomy of the user experience features that influence user behavior (see below). The framework is a treelike structure that aims to categorize these features of user experience at three levels of generality: Master Levers (most general); Levers (middle layer); and Sub-levers (most specific).

Levers Framework overview

High-level overview of our Levers Framework.

In principle, this means that every experiment we run – every lever we pull – can be categorized at three different levels of generality.

For example, let’s say we’ve added a Trustpilot Logo to a landing page of one of our clients. By adding this logo we are:

  • Trying to elicit trust from the user, so this test is classed under the Trust Master Lever.
  • Trying to enhance our client’s credibility in the eyes of the user, so this test is classed under the Credibility Lever.
  • Appealing to a broader source of endorsement from people who will be seen as representative of the typical user, so this test is classed under the Social Proof Sub Lever.

The Levers Framework is the product of more than 16 years’ worth of iterations, and it has been validated both by its efficacy in our day-to-day client work and by its profound predictive power.

The framework has a huge range of applications that we won’t touch on here (check out the white paper for more), but one thing worth flagging is that it serves as a fine-grained, comprehensive map of the various user experience solutions that influence conversion…

…and once you have a trustworthy map, exploring the territory becomes a whole lot easier.

3. Using the Levers Framework to balance exploration and exploitation

On the surface, our approach to the explore-exploit tradeoff is quite simple:

When we first start working on a website, we make a conscious effort to run exploratory experiments on all 5 of our Master Levers, i.e. Cost, Trust, Motivation, Usability, and Comprehension.

While this approach means that our initial win rate may be a little lower than it would have been had we focused solely on low-hanging fruit, it allows us to gather valuable information about the kinds of interventions that are likely to be most effective on any given website (and those that aren’t).

Graph showing the win rate of programs focused on short-term wins vs. programs with a structured approach

Programs that pursue quick wins tend to show lower win rates in the long term than programs focused on balancing exploration and exploitation in a structured way.

By intentionally collecting information about a broad selection of levers, we are then able to explore the full range of possible solutions in a structured way.

Once we’re confident in our results, we can finally shift into exploit mode and start ruthlessly folding poorly performing levers while doubling down on successful ones to drive maximum impact for our clients.

Putting this in terms of the Local/Global Maximum analogy above:

At the start of a program, we will essentially fly over the entire optimization landscape in search of the region within which the Global Maximum exists.

Once we think we’ve found it, we’ll then drop down into this general region and begin performing a ‘hill-climbing’ operation, which basically involves iteratively improving the website to gradually ascend to the global maximum.
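For readers who like to see the analogy in code, here is a toy, purely illustrative sketch of hill-climbing on a made-up one-dimensional ‘landscape’. Real optimization landscapes are many-dimensional and noisy, so treat this as a metaphor rather than a method:

```python
import math

# A made-up 1-D 'conversion landscape' with a local peak near x=2 and a
# higher, global peak near x=8. Purely illustrative.
def conversion_rate(x: float) -> float:
    return 0.05 * math.exp(-(x - 2) ** 2) + 0.09 * math.exp(-(x - 8) ** 2)

def hill_climb(start: float, step: float = 0.1, iterations: int = 200) -> float:
    x = start
    for _ in range(iterations):
        # Move to whichever neighbour improves the metric (or stay put)
        x = max([x - step, x, x + step], key=conversion_rate)
    return x

print(hill_climb(start=1.0))  # gets stuck near the local peak (~2)
print(hill_climb(start=6.0))  # starting in the right region, it climbs to the global peak (~8)
```

The starting point matters more than the climbing: exploration is what gets you into the right region in the first place.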

Of course, the reality of the situation is a good deal more complicated than this theoretical sketch may suggest, but we’ve found that the principle behind this approach is sound – and that it often provides an optimal path through the explore-exploit quandary, allowing us to maximize long-term value for our clients.

To support you in actually applying this approach to your own work, we’re going to run through the key steps in our process, with the goal of adding some additional detail and actionability to the picture painted thus far.

1. Research

As mentioned above, our approach involves distributing our experiments across all 5 Master Levers.

However, before we do this, we first need to identify the most impactful levers within those 5 Master Levers, as well as the types of experiments that are most likely to succeed for each.

We therefore typically begin by running a UX research project, which will include methodologies like analytics reviews, user testing, surveys, scroll/heatmaps, competitor analysis, and more.

This allows us to collect a huge range of observations about the barriers and motivations that are active on any given website.

We will then combine sets of these individual observations into what we term ‘insights’, which are the unifying themes under which observations can be grouped.

So, to give a more concrete example:

  • 50% of survey respondents said low confidence in the service was a barrier to conversion.
  • User testing found that 3 out of 5 participants shared feedback to the effect of ‘I’m not familiar with this service. I’m not confident it will work.’
  • The insight for these two observations would then be something like ‘users lack trust in the efficacy of the service.’

Once we’ve combined all of our observations into insights, we start assigning each insight to a Master Lever, Lever, and Sub Lever, working our way down our framework to establish an increasingly specific understanding of the problem we’re trying to address.

Observations to insights to levers

Different research methods generate observations, which we cluster together under different themes known as insights. We then aim to map each of these insights to the lever that relates most closely to it.

So, returning to the example from above: if our insight is ‘users lack trust in the efficacy of the service,’ this is clearly a trust issue, so we will assign this insight to the Trust Master Lever.

Visual representation of the ‘Trust’ Master Lever

The Trust Master Lever, along with its constituent Levers and Sub-Levers

Moving one layer further down the framework, we must then ask: ‘is this a legitimacy question, a credibility question, or a security question?’

In our framework, Credibility is about whether a company is able to live up to the claims that it makes on its website, so clearly this is a Credibility question rather than one pertaining to Security or Legitimacy.

Interpreting the Sub Lever may be slightly more difficult, but for now, we may tentatively identify this as an Authority issue, since an increase in Authority would likely assuage the trust-related concerns associated with this insight.

Using this approach, we will attempt to tag every insight we’ve collected from our research to a specific Master Lever, Lever, and Sub Lever.
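If you track your research in a spreadsheet or database, this tagging step is easy to represent in code. The structure below is a hypothetical illustration – the field names and example values are ours, not an official schema:

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical structure for tagging research insights against the framework.
@dataclass
class Insight:
    summary: str
    observations: list[str]
    master_lever: str
    lever: str
    sub_lever: str

insights = [
    Insight(
        summary="Users lack trust in the efficacy of the service",
        observations=[
            "50% of survey respondents cited low confidence in the service as a barrier",
            "3 of 5 user-testing participants said they weren't confident it would work",
        ],
        master_lever="Trust",
        lever="Credibility",
        sub_lever="Authority",
    ),
]

# Distribution of insights across Master Levers (feeds the kind of chart shown below)
print(Counter(i.master_lever for i in insights))
```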

This will then typically leave us with a huge range of insights, distributed across all 5 of our master levers (see graph below).

Insights vs. Master Levers

Insights from one of our programs, distributed across our 5 Master Levers.

2. Ideation

Now that you have each of your insights assigned to a lever, you’ll need to develop the execution for each of these specific levers.

To clarify, let’s once again return to the example from the previous section, where we found that the Master Lever was Trust, the Lever was Credibility, and the Sub Lever was Authority.

So far, we have quite a specific idea about what we might want to test, i.e. anything that is going to enhance the authority of our brand. This solution, however, still leaves us with some scope with respect to the actual experiment we want to run.

For example,

  • On the homepage we could add a logo of an institution that we’ve been endorsed by;
  • We could include a testimonial from an authority figure in our industry endorsing our brand;
  • We could roll either of these changes out at different stages of the funnel or on different parts of the page;
  • etc.

We have an entire process for developing high-impact experiment concepts from our initial research – we’ll be sharing more info about this in the future – but for now, here are a few considerations to keep in mind during this initial ideation session:

  • We want to gather data about our Levers at minimum cost and effort. We therefore recommend using a Minimum Viable Experiment (MVE) approach, which essentially means creating the smallest (in terms of build time, sign-off, etc.) experiment possible that will allow you to validate your selected levers. This means you can gather valuable information at minimal upfront expense. With this data in hand, you can then be more confident running resource-expensive experiments on this lever further down the line.
  • Make sure that the test is conducted on an area of the website with sufficient traffic to ensure a conclusive result, e.g. if your About Us page only gets 100 visitors a month but your PDPs receive millions, PDPs are the only viable option out of the two.
  • If you have a high minimum detectable effect, ensure that your concept is bold enough to potentially achieve this threshold (see the sample-size sketch after this list).
  • Ensure that the execution you’ve developed actually aligns with the various levers of interest you have identified. Ultimately, each execution should derive from insights for one lever.
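As a quick sanity check on the traffic and minimum-detectable-effect points above, a standard power calculation will tell you roughly how many visitors per variant a test needs. The baseline conversion rate and MDE below are placeholder assumptions – swap in your own figures:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.05       # control conversion rate (placeholder)
relative_mde = 0.10   # smallest relative uplift worth detecting (placeholder)
target = baseline * (1 + relative_mde)

# Two-sided test at 95% confidence and 80% power
effect = proportion_effectsize(target, baseline)
n_per_variant = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.8, ratio=1.0
)
print(round(n_per_variant))  # visitors needed per variant; compare against the page's real traffic
```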

3. Roadmap strategy

Once you’ve developed executions for each of your initial concepts, you’ll then need to prioritize these executions.

We’ve developed a machine learning assisted prioritization tool for this purpose, but feel free to apply whatever prioritization framework you’re currently using.

The goal is to end up with a relatively long list of experiment executions, prioritized based on things like:

  • Ease of implementation
  • Potential impact
  • No. of observations/insights supporting that concept
  • etc.

Once you’ve got this prioritized list together, you then need to build your roadmap.

This is the stage where you get to exert more intentional control over your balance of exploration vs. exploitation.

We would always recommend testing across all 5 Master Levers, but perhaps you want to weight exploitation slightly more heavily than exploration. In this case, for the first 20 experiments in your roadmap, you might run 8 on the Master Lever with the most insights attached to it and only 1 or 2 on the Master Lever with the fewest.

Conversely, if you want to ensure that your initial exploration is as thorough as possible, you will need to make sure that your tests are fairly evenly diversified across all 5 Master Levers so as to gather as much information about their respective efficacies as possible.

Saying that, we do not recommend running tests without any supporting research. If one of your Master Levers has few or no insights attached to it, we would recommend shifting your attention to the other Master Levers.

CRO is often a balance between earning and learning; by intentionally setting your balance of exploration and exploitation at this step, you can ensure that your roadmap aligns as closely with your goals as possible.
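One simple way to make that weighting explicit is to allocate roadmap slots in proportion to the number of insights supporting each Master Lever, while reserving a minimum number of exploratory slots for every lever that has research behind it. The insight counts below are invented for illustration:

```python
# Hypothetical 20-test roadmap, weighted by how many insights support each
# Master Lever. Levers with no supporting research get no slots, per the
# recommendation above.
insight_counts = {"Trust": 14, "Motivation": 9, "Usability": 7, "Comprehension": 5, "Cost": 0}
roadmap_size = 20
min_tests = 2  # exploratory floor for every researched lever

supported = {lever: n for lever, n in insight_counts.items() if n > 0}
allocation = {lever: min_tests for lever in supported}
remaining = roadmap_size - sum(allocation.values())
total_insights = sum(supported.values())
for lever, n in supported.items():
    allocation[lever] += round(remaining * n / total_insights)

# Note: rounding can leave the total a slot or two off target; adjust by hand.
print(allocation)  # -> {'Trust': 7, 'Motivation': 5, 'Usability': 4, 'Comprehension': 4}
```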

4. Experiment

Once you’ve decided which tests you want to run, the next thing to do is actually run them!

5. Iterate

You may think that once you’ve run your initial tests, you’re ready to start folding losers and doubling down on winners to drive value now.

For winning tests, this is more or less how it works. When you find an effective lever, our recommendation is that you exploit that lever relentlessly, for as long as it continues to deliver value.

With one of our clients, we’ve run 46 iterations on a single lever – and it still delivers results to this day!

For losing experiments, on the other hand, there are some additional considerations worth factoring in. As a pretty reliable rule of thumb, a test will lose for one of two reasons:

  1. Execution – the lever you selected may actually be effective, but the execution you chose may be poor.
  2. Lever – the specific levers you’ve tested are themselves ineffective. This reason breaks down into two further types:
    • The specific Master Levers and Levers you’ve tested on are ineffective.
    • The specific Sub Lever you’ve chosen to test is ineffective, but the general formulation of your problem re: Master Levers and Levers is correct.

We’ve previously written in detail about our process for diagnosing the cause of a test’s loss, as well as how to iterate on this type of result. This post is already pretty long, so we won’t go into that again now, but if you’d like to read about the subject in detail, click here.

One important thing to keep in mind is that losing experiments are not ‘the end of the line’ for a lever. In fact, they often tell you a lot more than inconclusive experiments – and therefore provide direction for future testing.

Ultimately, while no single non-winning experiment is sufficient to rule out a lever’s importance, losing experiments have the advantage of telling you something more: that what you are doing at least matters to users.

This suggests that a better message or change relevant to that lever might well intervene in a way that makes a positive rather than negative difference. Equally, simply doing less of – or the opposite of – what had the negative effect might be most effective.

Final thoughts

In our experience, a sub-optimal balance between exploration and exploitation is the cause of 9 out of 10 performance plateaus. In this blog, we shared the approach we’ve been using to help our clients successfully navigate the explore-exploit tradeoff and begin once again driving revenue with CRO/experimentation.

As will be clear by now, much of this approach is driven by our Levers Framework, so if you’re keen to put the method laid out here into practice, we’d recommend that you download our recent white paper. This should hopefully give you everything you need to get started.

And in the meantime, if you’ve got any further questions about how any of this works, feel free to drop us a line – we’re passionate about experimentation and are always keen to share our expertise where we can!

Confidence AI: the next generation of A/B test prioritization (https://conversion.com/blog/confidence-ai/, 21 February 2024)

Most experimentation teams have far more test ideas than they could ever conceivably launch.

This creates a problem:

How do we decide, in as objective a manner as possible, which tests to prioritize in order to maximize a program’s overall impact?

Historically, there have been a number of prioritization frameworks developed to solve this problem. Unfortunately, all of them have fallen short in a number of fairly big ways.

Now, with the help of artificial intelligence, we think we’ve finally found a close-to-optimal solution.

In this article we’re going to briefly discuss the shortcomings of past prioritization approaches, before sharing an overview of our new prioritization tool – Confidence AI – and how it overcomes them.

And for those of you who are skeptical, here’s a fact to keep you reading:

Confidence AI is able to predict winning a/b tests with 57% accuracy.* Based on standard industry win rates, this suggests it is several times more accurate than the average experimentation practitioner.

Contents:

  1. Prioritization tools of the past
  2. What is Confidence AI and how does it work?
  3. How accurate is Confidence AI?
  4. How we actually use Confidence AI: beyond win rate
  5. Final thoughts: Confidence AI’s limitations

*Concepts with a confidence score of 50% and above are classed as predictions of a winner. Those with a confidence score of under 50% are classed as predictions of losers. For more on how confidence scores work, see the What is Confidence AI and how does it work? section of this page.

1. Prioritization tools of the past

Prior to now, all leading prioritization tools have fallen short in at least one of two ways:

  1. Subjectivity – at its core, our goal as experimenters is to use objective data to make better decisions. Most prioritization tools rely heavily on human interpretation, which introduces an undesired element of subjectivity into the experimentation process.
  2. One size fits all – every page, every website, every company, and every industry is different. To prioritize our tests appropriately, we need to evaluate test ideas using flexible, data-backed criteria that adapt to the unique context within which each test is being conducted (more on this shortly). This is a big ask, and unsurprisingly, every prioritization tool up until now falls short in this respect too.

To illustrate these points, consider two of our industry’s favorite prioritization frameworks: ICE and PXL.

(If you’re not interested in a comparison of different prioritization methods, feel free to skip ahead!)

GrowthHackers’ ICE framework prioritizes a/b test concepts based on three factors:

  • Impact – if this works, how big will its impact be?
  • Confidence – how confident are we that this will work?
  • Ease – how easy will this be to implement?

In essence, you score each concept out of ten for each of these three factors, and you then add up the scores and divide them by three to give an average out of ten. Concepts with the highest scores are prioritized.
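The ICE arithmetic is trivially easy to compute – which is part of its appeal. A quick sketch, with made-up scores for three hypothetical concepts:

```python
# The ICE arithmetic described above, with made-up scores for three hypothetical concepts.
concepts = {
    "Add trust logos to checkout": {"impact": 7, "confidence": 6, "ease": 9},
    "Rebuild the pricing page":    {"impact": 9, "confidence": 5, "ease": 2},
    "Reword the hero headline":    {"impact": 4, "confidence": 5, "ease": 10},
}
ice = {name: sum(scores.values()) / 3 for name, scores in concepts.items()}
for name, score in sorted(ice.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{score:.1f}  {name}")
```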

As will be evident from this brief explanation, this method – while undeniably useful – is extremely flawed. After all, there’s no reliable way of generating an estimate of a test’s ‘impact’ or ‘confidence’ short of actually running the test itself. Any estimate we do make is therefore bound to be a very crude guess.

Experimentation is supposed to be about eliminating gut-feel and intuition where at all possible, and yet this prioritization framework is literally built on these things.

CXL’s PXL framework attempts to minimize subjectivity by creating a set of criteria that is believed to predict the impact of any given experiment. These criteria are weighted based on expected importance, and the scores for each individual criterion are then summed to give an overall prioritization score. As with ICE, experiments with higher prioritization scores are prioritized.
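In code, a PXL-style score is just a weighted checklist. The criteria and point values below are simplified stand-ins rather than CXL’s actual scoring sheet:

```python
# Each criterion carries a point value, and the points for criteria a concept
# meets are summed. Criteria and point values are simplified stand-ins.
criteria_points = {
    "above_the_fold": 1,
    "noticeable_within_5s": 2,
    "adds_or_removes_an_element": 2,
    "supported_by_user_testing": 1,
    "supported_by_analytics": 1,
    "low_effort_build": 2,
}
concept = {"noticeable_within_5s": True, "supported_by_user_testing": True, "low_effort_build": True}
pxl_score = sum(points for criterion, points in criteria_points.items() if concept.get(criterion))
print(pxl_score)  # 5 for this concept; higher-scoring concepts sit higher in the backlog
```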

 

PXL framework

 

This approach has a number of advantages over ICE:

  • Many of the questions involve factual yes or no answers, e.g. whether or not the test is above the fold, whether or not it involves removing or adding an element, etc. The less ambiguity involved in the scoring, the less room there is for subjectivity to creep in.
  • Many of the questions can be answered using empirical data (rather than intuition), e.g. whether or not the change is noticeable within 5 seconds can be answered using a simple usability test. Again, this minimizes subjectivity.

But while PXL does a laudable job of side-stepping the subjectivity objection, it falls quite badly afoul of the one-size-fits-all objection.

To see why, consider the first two criteria in the PXL framework:

  1. ‘Above the fold?’
  2. ‘Noticeable within 5 sec?’

On a homepage experiment, there’s every possibility that each of these criteria will be strongly correlated with the impact of the test. With a product page experiment, on the other hand, users are often prepared to delve deep beneath the fold, so the importance of these criteria is likely to be much less pronounced.

In actual fact, based on our own internal analysis, we’ve found that experiments beneath the fold have a similar – sometimes even higher – win-rate than those above it (see graph below).

 

Win-rate of tests above the fold vs. below

On mobile, above the fold tests have a win rate of 31% vs. 41% for below the fold. On desktop, it is 37% vs. 36%. Sample size: 505 a/b tests.

All this to say, this one-size-fits-all approach means that certain kinds of tests will be prioritized ahead of others for no other reason than that the set of criteria being used is biased in their favor.

Like ICE, this framework is undeniably useful, but also like ICE, it falls far short of the ideal.

So, this raises a question:

What is the ideal?

Well, in our view, an ideal prioritization tool would:

  • Be as close to completely objective as possible;
  • Produce flexible criteria for each test concept;
  • Use past results from each individual program to inform prioritization criteria and scores;
  • Require no additional work on top of what you already do;
  • Be completely dynamic, so that concepts are continuously reprioritized when new results and insights are uncovered. (This is another big issue with current prioritization approaches: prioritization only happens once at the beginning of a quarter or strategy. That means the backlog is a freeze frame of what the priority was at the time of creation, rather than being continuously updated based on the latest learnings from the experiments that are being run.)

This is, we realize, a huge ask.

Up until recently, it has been far beyond the capabilities of any prioritization tool in existence.

With Confidence AI, however, that’s all changed…

2. What is Confidence AI and how does it work?

Confidence AI is a machine learning model that we’ve developed to predict the results of a/b tests. By embedding Confidence AI into our prioritization approach, we’re able to almost completely remove both of the shortcomings of traditional prioritization methods identified above (subjectivity and one-size-fits-all-ness).

Here’s an outline of how it works:

As some of you may know, here at Conversion we store extensive data on every experiment we run. This includes data about the client, the client’s industry, the page, the levers, the change type, the psychological principle, and more.

We’ve been in business for over 15 years, and we’ve worked with over 200 clients in over 30 industries. This means that we now have a huge experiment database, consisting of over 20,000 experiments and hundreds of thousands of data points.

Having trained Confidence AI on this vast dataset, we can now input the parameters of tests we plan to run into the model and it will compute a confidence score based on how likely it predicts each experiment is to win.

Confidence AI is integrated within each client’s Experiment OS, where it prioritizes their experiment backlog based on the confidence score of each test concept. Concepts with a high confidence score are pushed to the top of the priority list and those with a low confidence score are pushed to the bottom.
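We won’t share the internals of Confidence AI here, but the general shape of this kind of model is straightforward to sketch: encode categorical experiment metadata, fit a classifier on past win/lose outcomes, then score and sort the backlog. The snippet below is a generic illustration of that approach – the file names, column names, and model choice are assumptions, not our production setup:

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical files and columns: one row per past experiment with its
# metadata and a boolean 'won' outcome, plus a backlog of planned concepts.
history = pd.read_csv("past_experiments.csv")
features = ["industry", "page_type", "master_lever", "change_type"]

model = make_pipeline(
    make_column_transformer((OneHotEncoder(handle_unknown="ignore"), features)),
    GradientBoostingClassifier(),
)
model.fit(history[features], history["won"])

backlog = pd.read_csv("backlog.csv")
backlog["confidence"] = model.predict_proba(backlog[features])[:, 1] * 100
print(backlog.sort_values("confidence", ascending=False).head())
```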

 

Screenshot taken from Experiment OS

Screenshot of Experiment OS

What’s more, as we run more tests and gather more data, Confidence AI dynamically updates the confidence scores of experiments in the backlog to reflect new learnings from each client’s experiments as they come in.

3. How accurate is Confidence AI?

This may all sound very impressive in theory, but in practice, Confidence AI is really only as good as the predictions it produces – so, the question on everyone’s mind:

Does Confidence AI actually work?

Here’s the data:

10 months ago we rolled out Confidence AI across our entire consulting team. What this means, in practice, is that each time our consultants develop a new concept for one of our clients, the concept is fed into Confidence AI.

Confidence AI then takes this concept and computes a confidence score out of 100 based on how likely it believes an experiment is to win.

We grouped these confidence scores into 3 categories – low confidence (0-33), medium confidence (33-66), and high confidence (66-100) – and then looked at the actual average win rates for each of these categories.

Here are the results:

  • Low confidence (0-33) – 223 completed experiments – win rate of 27%.
  • Medium confidence (33-66) – 120 completed experiments – win rate of 43%.
  • High confidence (66-100) – 49 completed experiments – win rate of 63%.
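If you want to run the same kind of calibration check on your own program, the analysis is straightforward, assuming you keep one row per completed experiment with its confidence score and outcome (the column names below are assumptions):

```python
import pandas as pd

# One row per completed experiment: a 'confidence' score (0-100) and a boolean 'won'.
results = pd.read_csv("completed_experiments.csv")
results["bucket"] = pd.cut(
    results["confidence"],
    bins=[0, 33, 66, 100],
    labels=["low (0-33)", "medium (33-66)", "high (66-100)"],
    include_lowest=True,
)
summary = results.groupby("bucket", observed=True)["won"].agg(["count", "mean"])
summary["win_rate_%"] = (summary["mean"] * 100).round(1)
print(summary[["count", "win_rate_%"]])
```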

 

Confidence AI - prediction data

Confidence AI prediction data

Given that elite experimentation organizations like Microsoft, Airbnb, Google, and Booking.com report win rates in the range of 8-30%* – and assuming that every experiment is launched because the practitioner expects it to win – it seems that Confidence AI is massively outperforming the average practitioner in terms of its ability to predict winners.

*There may be a little bit of noise here. There are a number of reasons that the win-rates of these tech giants are often so low. To name two: 1) no-brainers are often implemented without testing; 2) website assets are often already very well-optimized, making wins harder to come by. Nonetheless, Confidence AI’s accuracy is significantly higher than any industry win-rate we’ve ever come across – including our own! – which gives us good reason to be confident in its utility.

4. How we actually use Confidence AI: beyond win rate

The eagle-eyed amongst you may have spotted what seems to be a bit of a discrepancy in the last section:

If Confidence AI is supposedly prioritizing experiments with the highest confidence scores, why did we run so few high confidence tests during the trial period (49 high confidence, 120 medium confidence, and 223 low confidence)?

This raises an important point: when deciding which experiments to run, there are many factors that come into play in addition to likelihood of winning. On a basic level, we also need to consider things like build size and dependencies, which will determine how resource expensive a test is likely to be.

For example, if we have a high confidence test concept that is going to take a month to build and that also requires signoff from our client’s entire board of directors, we might choose to prioritize a lower confidence test with fewer dependencies and a quicker build.

This approach allows us to move at speed and gather learnings as we go – learnings that feed back into Confidence AI and that generate more accurate predictions.

But going beyond considerations about resource constraints and velocity, it’s also worth making a more general point: experimentation is about more than exploiting low-hanging fruit (as identified by Confidence AI); it’s about exploring the full landscape of potential solutions to help our clients find the overall optimal solution to their problem – the global maximum.

Global maximum diagram

Our goal is to help our clients find their global maxima.

 

Confidence AI is a tool – an extremely powerful one – in our consultants’ toolkit. It provides them with a high-fidelity picture of the risk-reward landscape that they are operating within, which means they can – if they choose to – aggressively exploit high confidence tests that are likely to deliver strong short-term ROI for our clients.

But our goal as an agency is to maximize long-term – not just short-term – ROI for our clients.

In our experience, safe, high-confidence tests are great for generating incremental uplifts, but the most value tends to come when we use experimentation to help our clients take bold risks with a safety net.

This is a big part of the reason that we’re so often able to help our clients move beyond the plateaus on which they were previously stuck and continue their ascent towards their respective global maxima.

Saying that, we of course allow each of our clients to set the agenda for their own programs. If our client asks us to use Confidence AI to drive as much short-term value as possible, this is absolutely what we will do. But more often than not, our clients understand the long-term value that experimentation can bring to their entire business, so the emphasis of most programs is on measured exploration as much as it is on exploitation.

Circling back to the question asked at the start of this section, then: the primary reason that we often run fewer high-confidence tests than might be expected is that we often choose to explore uncharted territory and help our clients discover the high-risk, huge-reward solutions that have the power to revolutionize their businesses.

5. Final thoughts: silver bullet or something else?

Confidence AI is an incredibly exciting development that has significantly improved our ability to prioritize promising avenues of experimentation.

What’s more, the model is still in its infancy. This is the first iteration of Confidence AI, and we have every reason to believe that as we gather more data and roll out further iterations, the model will get more accurate over time.

But it’s important to emphasize that Confidence AI is not a silver bullet.

Ultimately, it’s only as effective as the experiment concepts themselves. If supporting research is poor or if experiment executions are weak, then Confidence AI’s predictive power goes way down.

Equally important, while Confidence AI may be able to give us a strong indication about how to maximize the short-term impact of a program, ascending to a program’s global maximum requires trialling bold, innovative ideas – the kind of ideas that a tool like Confidence AI is not particularly well equipped to evaluate.

All this to say, maybe one day Confidence AI will be able to predict every kind of experiment result with unerring accuracy. Until then, it is simply a powerful tool that is as effective – or as ineffective – as the practitioner wielding it.

5 experimentation industry predictions for 2024 (https://conversion.com/blog/experimentation-industry-predictions-2024/, 30 January 2024)

As we begin rolling out our plans for 2024, we’ve been doing a lot of thinking about what’s likely to come next for the CRO/experimentation industry.

Will AI finally make true 1-to-1 personalization possible?

How will economic pressures affect accountability and decision making?

How is new legislation likely to impact our space?

In this post, we share five of our predictions for the industry in 2024.

1. Increased budget scrutiny will lead to more experiment-backed decision making

In today’s struggling global economy, many businesses are looking to make cuts and minimize waste. Such cost cutting means that budgets are under huge scrutiny, which is likely to raise the evidence threshold for certain types of budget-related decisions.

For example, should we invest in branding or performance marketing? Should we hold our ad spend steady during the downturn or should we tighten our belts while CACs are high? Which pricing model works best for our business?

No longer will the gut feelings of HIPPOs – or herds of HIPPOs – be sufficient to answer these questions. Instead, we believe that business leaders are going to seek out high-quality, reliable evidence to guide their decisions during these challenging times…and what evidence is higher quality and more reliable than experiment data (randomised controlled trials)?

 

Hierarchy of evidence – experiment data (randomised controlled trials) provides the most reliable data available to any business leader.

We therefore believe – and are hopeful – that economic pressures are going to lead to a rise in experiment-backed decision making.

2. More budget scrutiny will force a more honest accounting of program results (decay effects, holdout groups, etc.)

Tying in with the point above, we’ve recently found that many of our clients are becoming increasingly keen to demonstrate the value of their experimentation programs beyond any shadow of a doubt.

In such instances, sharing experiment results and running through the usual statistics is not enough. This type of data tells us how each treatment fared against its control, but it does not tell us how these uplifts evolve over time nor how each individual experiment works in conjunction with others.

In a bid to provide data-backed answers to these questions for our clients, we’ve recently run a number of holdouts and even a master experiment (combining all winners into a single experiment and running against the original site) to help our clients demonstrate conclusively that their program is driving significant value for their business. These examples are maybe on the extreme end of the spectrum – long-term holdouts can be quite impractical and master experiments are often fairly labor intensive – but they serve to illustrate a growing desire from stakeholders for more honest accounting of program results.
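The analysis behind a holdout is conceptually simple: compare conversion in the group exposed to the winning changes against the holdout that never saw them, and check whether the difference is statistically meaningful. A minimal sketch with placeholder numbers (not client data):

```python
from statsmodels.stats.proportion import proportions_ztest

# Placeholder numbers: visitors and conversions in the group exposed to all
# winning changes vs. a holdout that never saw them.
exposed_conversions, exposed_visitors = 5_400, 100_000
holdout_conversions, holdout_visitors = 4_900, 100_000

stat, p_value = proportions_ztest(
    count=[exposed_conversions, holdout_conversions],
    nobs=[exposed_visitors, holdout_visitors],
)
lift = (exposed_conversions / exposed_visitors) / (holdout_conversions / holdout_visitors) - 1
print(f"relative lift: {lift:.1%}, p-value: {p_value:.4f}")
```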

We believe that throughout 2024 this trend is likely to continue. Budgets are tight and business leaders understandably want to make absolutely certain that their experimentation spend is yielding the results it was assigned to achieve.

3. New legislation to cause a surge in subscription experimentation

As things currently stand, there is a new bill moving through the UK’s House of Lords that is set to have a significant impact on digital markets.

The implications of this bill are quite far reaching, but one particularly significant point (for our clients, at least) relates to subscription brands and the rules surrounding sign-up, renewal, free trials, and more. Specifically, should the bill go through, subscription businesses will be required to:

  • provide an easy way of canceling online
  • be transparent with costs
  • send renewal notices prior to payments being made

While some subscription brands already meet all of these requirements, many do not.

We therefore believe that the bill is likely to stimulate new innovation and experimentation within this space, encouraging subscription brands to develop and optimize new user journeys and policies that keep them on the right side of these new laws.

4. A (partial) return to client-side testing

In 2023, many experimentation organizations made the switch to server-side testing. This came with all kinds of benefits, but it also had its drawbacks.

One particular drawback relates to the increased development time associated with server-side testing. Extended development times ultimately mean that testing velocity takes a hit, which in turn reduces an organization’s ability to learn and optimize at speed.

In 2024, we predict that many teams that made a complete switch to server-side testing will begin to shift some percentage of their experiments back to client-side.

In our view, both types of testing have their place in any testing program; the best teams will find a way to get the best out of both.

5. AI will change everything…but maybe not this year

We, like most people, are excited and optimistic (and maybe a little scared!) by the possibilities presented by AI.

In fact, as an agency, we’ve already started putting AI to good use by developing our own machine-learning-assisted prioritization tool, Confidence AI, which, as of a couple of months ago, has been predicting the results of experiments with significantly higher accuracy than the average practitioner.

Screenshot of LinkedIn post by our Director of Experimentation, Sim Lenz – full post here

While we really do believe that AI has the potential to change the game for our industry, we’re not 100% sure that this is all going to happen in 2024.

On the optimistic side of things, we believe that at some point AI may finally make true, 1-to-1 personalization a reality. After all, if AI becomes sufficiently sophisticated to understand individual user behavior and how best to influence it, why shouldn’t we be able to create completely personalized user experiences that respond to the unique needs and psychologies of each and every user – at scale?

Saying this, we’re well aware of the fact that most new technologies fall prey to the Gartner Hype Cycle (see below). While it’s perfectly conceivable that AI may be the exception to this rule, we don’t want to get carried away just yet.

 


Gartner Hype Cycle

 

Needless to say, we’ll be keeping a close eye on the AI space in the hope of finding new ways to use this technology to deliver more value for our clients’ programs than ever before.

Final thoughts

If last year is anything to go by, the best any of us can do in 2024 is hope for the best, prepare for the worst, and watch in awe as the world’s leading tech companies tear up the rule book once again.

As something of a tech-adjacent space, we believe the experimentation industry has a lot to benefit from incorporating the latest technologies – particularly AI – into the ways we work. The onus is therefore on all of us to keep abreast of these developments and look for opportunities to apply them to our programs to drive more value than ever before.

This would be true even at the best of times, but in an economic downturn, when the need to find a competitive edge is greater than ever, this becomes absolutely imperative.

If you’d like to see how we as an agency are incorporating AI into our processes and approach – plus, how other innovative teams are doing the same – subscribe to our newsletter and we’ll do our best to keep you in the know.

Spotlight: Kevin Turchyn on how Whirlpool Corporation relentlessly tests key assumptions for breakthrough results (https://conversion.com/blog/whirlpool_spotlight/, 24 January 2024)

Whirlpool Corporation is a world leader in the appliance industry, bringing its users a cutting edge digital experience with the help of a robust experimentation program. We sat down with Kevin Turchyn, Senior Manager – Digital Products to discuss his career growth at Whirlpool and how experimentation has played a critical role in his success at the organization by relentlessly testing key assumptions for breakthrough results.

Note: if you’re interested in hearing more about how experimentation works at Whirlpool, sign up to our upcoming Open Book Session on the 15th of Feb, where Kevin will be chatting with our very own James Flory about how he and the Whirlpool team have built and scaled a high-impact testing program.

You’ve been at Whirlpool for over 13 years, how has your understanding of the relationship between successfully developing digital products and experimentation grown over that time?

I’ve been fortunate enough to have split my career here between B2B and D2C. Through this, I’ve seen the differences in access to information with which to make confident digital product decisions. In contrast to our over 110 years of experience and collaboration with our B2B customers that supports building successful physical and digital products, D2C as a whole is relatively new for us and comes with less accessible clarity around the needs of users. This has elevated the importance of experimentation in supporting product discovery and decision making with confidence.

What are some standout ways you have integrated experimentation into your decision making at Whirlpool? Are there any key-takeaways you’d like to highlight to our readers?

We’ve integrated experimentation into our Digital Product organization’s decision making in two main ways. The first is to use User Experience Research (UXR) to help generate and prioritize the problems or opportunities that are most valuable to solve. As an outcome-oriented group, this has helped us better understand the opportunity spaces we have with our users that match well with our business goals.

The second would be to leverage A/B testing to test our assumptions and assess the user and business value of proposed solutions to those problems or opportunities that were identified and prioritized with the help of UXR.

This mixed-methods approach helps us build empathy for our users, avoid unconscious biases, and scientifically understand the value of our proposed solutions before committing them to development. This method is the decision science behind many of our best decisions.

How has experimentation helped with making key business decisions at Whirlpool? Any standout decisions you can highlight?

Even in situations where we are working with experienced professionals with deep industry insight, being able to offer an assumption check has proven to be invaluable to de-risking our solutions. Our experimentation program has confirmed new user experiences that we had strong gut feelings about but also disproved assumptions we held about some “no-brainers”. In the webinar James Flory and I host, I’ll get into more detail about a few specific assumptions that, when tested, generated breakthrough results and another that prevented significant revenue loss.

What are some of the biggest challenges and obstacles in turning experimentation results into business decisions? How have you overcome them?

I’ll address one of the largest issues that I know many experimenters are familiar with, the challenge of winning experiments not being implemented or hard-won insight simply not being acted upon. We’ve been minimizing this challenge in two ways. The first is clear stakeholder sponsorship (or at least ride-along) of most experiments. When we are collaborating closely with the business in discovery, rather than running on a bit of a tangent of our own, the goals are aligned and the results are co-created. That makes acceptance and implementation of the experiment or insight much easier. The second is that our experimentation backlog, roadmap, live tests, and results are shared broadly and regularly in the light of day. Transparency builds our collective understanding and helps us all stay accountable to our broader partners in achieving our goals. This combination of stakeholder involvement and transparency has helped us make sure that good experiment findings make it from the lab to the user.

How have you helped foster a culture of experimentation at Whirlpool? How can teams at other companies learn from the experimentation engine you’ve helped build at the organization?

We’ve had success with our Digital Product organization exercising three principles that have supported a healthy experimentation culture in our organization. These rose naturally over time but we’ve also made conscious efforts to practice them once we knew the impact they had on our experimentation culture.

The first principle is exercised with respect to our stakeholders; “We love your goals”. Through seeing my team use this statement, I’ve witnessed it build a collective outcome-orientation. That phrase also deliberately ends as quoted because this leaves the problem, opportunity, and solution space open to collaboratively use experimentation to discover value with a solution that works for users and our business.

The second is that we are committed to making evidence-based decisions. Here, experimentation is fully embedded in our Digital Products group, so our extended team is in constant contact with business stakeholders who are looking for or even recommending specific solutions. When the entire product group understands how to leverage experimentation in pursuit of an outcome, evidence finding becomes accessible, and the value provided to stakeholders increases because there is commitment to shared goals (see first principle) and evidence-based ways to get there are prioritized.

The third is that we share our work. As I alluded to earlier, our entire program is proactively transparent. We’ve made it a mission to level-up our organization’s knowledge and capability through experimentation. So we share our work in a couple core ways. The first way is through an internal Google Space. This simple space started with a small Canadian following but has grown to an enthusiastic mass throughout Whirlpool globally. This group gets a standardized, predictable, and detailed share-out of what we are doing and learning. I’ve never seen so many 🎉 and 🚀 reactions used to what we publish there. This commitment to sharing has built a strong grass-roots following that continues to send evidence-seekers around Whirlpool’s world our way. We’ve also made sure to get in front of leadership to demonstrate how we can use experimentation to achieve our goals, share select highlights, and detail a clear ROI. Between the grassroots enthusiasm our Google Space has built and the leadership support, we’ve seen experimentation go from sub-culture to cultural norm.

Kevin Turchyn is a Senior Manager – Digital Products at Whirlpool Corporation, where he leads the product group and progresses the entire digital experience of the company. To join his experimentation deep dive on the 15th of Feb with our VP of Delivery, James Flory, sign up here.

And for those of you who are interested in learning about what Mixed Method Experimentation is and how it helped Whirlpool imagine a new interstitial experience, read Part 1 of our case study!

Conversion gets certified as a Great Place to Work® (https://conversion.com/blog/great-place-to-work/, 20 December 2023)

Conversion gets certified as a Great Place to Work®, demonstrating commitment to team well-being and innovation.

Conversion, a leading organization specializing in evidence-based decisions through A/B testing, UX research, and personalization, proudly announces its certification as a Great Place to Work®. At Conversion, we firmly believe that our people are the driving force behind our success. Our unique culture serves as the fuel that propels us toward achieving excellence for our clients. This certification emphasizes our continued commitment to fostering an environment built on shared values, equality, inclusion, empowerment, and respect.

Our core values – GRIT, REAL, MAVERICK, CURIOSITY, and INTEGRITY – form the foundation of our organizational culture. These principles guide our daily practices, behaviors, and attitudes, creating a workplace that encourages creativity, innovation, and collaboration.

“We are thrilled to be recognized as a Great Place to Work®. Our commitment to excellence extends beyond our client relationships; it encompasses our team members and the environment we work in every day. The Great Place to Work assessment process was a great opportunity to measure our workplace culture, understand our strengths, and identify areas where we can further enhance our work environment. It’s a chance for us to shine a light on what makes us an outstanding agency, both to our clients and our team members.”

— Mike St. Laurent, Managing Director at Conversion, North America.

As we celebrate this achievement, Conversion remains committed to maintaining an inclusive and supportive workplace that encourages continuous improvement and excellence. The certification not only validates our existing efforts but also inspires us to explore new ways to enhance team member satisfaction and engagement.

“It feels surreal to be certified as a Great Place to Work®. It’s the result of our teams’ dedication to our values that have been instrumental in creating a workplace where everyone feels supported, inspired, and empowered to turn creativity into innovation, benefitting both our clients and each other.”

— Victoria Petriw, COO at Conversion, North America.


About Conversion

Conversion is a leading organization specializing in evidence-based decisions through A/B testing, UX research, and personalization. Our unique culture, built on the values of GRIT, REAL, MAVERICK, CURIOSITY, and INTEGRITY, is at the core of our commitment to creating an environment focused on equality, inclusion, empowerment, and respect. We are dedicated to turning creativity into innovation to serve our clients and foster a workplace where everyone can thrive.

Is Goodyear’s UX as effective as its tires? (https://conversion.com/blog/goodyear-teardown/, 4 December 2023)

Introduction

Goodyear is one of the world’s largest tire companies. As part of our Levers Framework teardown series, we ran a heuristic analysis on their ‘Roll’ service page to highlight the power of the framework in identifying strengths – and weaknesses – in a user experience.

As part of our Levers™ Framework White Paper release, we wanted to show you how powerful the framework can be for diagnosing and fixing conversion issues.

Methodology: The Levers™ Framework
The Levers™ Framework

In this analysis, we aimed to identify the most important Master Levers that we believed could be used to positively impact conversions on this page.

Additionally, at the end of the analysis, we provide the outline of three experiment concepts that Goodyear could run if they wanted to optimize the page further.

Applying the Levers Framework

We’re going to move through Goodyear’s service page one lever at a time. Whenever we introduce a new Master Lever, we’ll give a quick rundown of what that specific lever entails, plus common user questions associated with it.

Lever: Usability

Unsurprisingly, the Usability Master Lever is all about how easily users can progress through the website, from arrival on the site through to fulfilling their desired goal, i.e. converting.

Common user questions around Usability
  1. Do I know where I am, and what I have to do next?
  2. Does this look like it’s going to take a lot of effort?
  3. Do I feel like persisting with the difficult parts?
  4. Are my product options arranged in a findable and easily understood way?
  5. Is my attention being focused on genuinely useful things?
Usability: User flow

The first usability issue we identified relates to the lack of a clear CTA above the fold.

While there admittedly is a ‘Find Tires’ CTA at the top of the page, the styling of this CTA is very discreet, meaning many users may miss it as they work through the page’s content.

Other than this, the user will need to scroll beneath the fold before they encounter a prominent CTA.

Usability: User flow

We also found that there was no clear, consistent CTA styling across the page. This lack of consistency is likely to make parsing the contents of the page more difficult for users, which could result in fewer users entering the sign-up flow.

Another issue relating to this lack of consistency was that many of these CTAs were presented so discreetly that users are unlikely to see them, e.g. ‘Find tires for your vehicle’.

Usability: Attention

Eye-tracking studies have shown that if a photo includes a person staring directly at the user, the user’s attention may be drawn away from other content on the page and towards the person in the photo.

Goodyear includes an image of a tire installer staring straight at the camera. This may distract users away from some of the more important motivational content.

Lever: Comprehension

Comprehension is about how well a website explains the information about the company, product, and industry that might help a user feel comfortable in converting.

Common user questions around Comprehension
  1. Do I understand enough about this industry and type of product to feel comfortable purchasing this service/product at all?
  2. Do I understand everything I need to know about this product and company to convert on this site?
  3. Do I understand everything I need to about the transaction I’m agreeing to, to convert?
Comprehension: Product Understanding

We believe some of the messaging on the page is slightly conflicting and may cause confusion for some users.

Specifically, the ‘Find Tires’ CTA seems to suggest that users will be given an opportunity to browse tires, whereas the ‘We Come to You’ heading is geared more towards mobile installation and booking an appointment. We would recommend unifying the messaging on the page so that users can more easily comprehend exactly what is being offered.

Lever: Trust

In our framework, the Master Lever Trust is about assessments of risk that a user makes when interacting with a website. Depending on the severity of the trust question, this may relate to several categories of problem.

Common user questions around Trust
  1. Is this a legitimate website? (or a scam?)
  2. Do I believe their claims about the quality of their product/service? (or are they likely exaggerating?)
  3. Is there proper protection of sensitive information? (How comfortable do I feel entering confidential information on this site, even if it is a real company?)
Trust: Credibility

Towards the bottom of the page, Goodyear has included a testimonials carousel, which is likely to help establish social proof and build the brand’s credibility. However, we believe this execution could be improved by presenting additional details about these users. For example, Goodyear could share info about where these users are from, what vehicle they drive, etc.

This will make the testimonials feel more real, while giving users a chance to relate more closely to the content.

Lever: Motivation

Motivation is the broadest category of change in our model. It is concerned with the ‘upside’ of the product or service. Fundamentally, it is asking “What’s in it for me, and/or the person for whom I’m purchasing?”

Common user questions around Motivation
  1. Do I feel inspired and excited by the benefits of this product?
  2. Do I feel a sense of obligation to convert?
  3. Do I feel a sense of urgency to convert?
  4. Does this product/service give me access to an imagined community?
  5. Is there a way to try it out?
Motivation: Value statement

Goodyear does a very good job of emphasizing two of the service’s key benefits: convenience and timing.

However, another important benefit of the product – one which will be a decisive factor for many users – is its price. The service comes at an extremely competitive price, but there is no mention of this anywhere on the page.

In fact, it’s not until you reach the end of a multi-step tire selection funnel that you finally encounter information about the service’s price. In our view, this is a significant missed opportunity.

Recommended Experiments
1. Sub Lever: Userflow (Lever: Usability)

Without a clear CTA above the fold, users may struggle to enter the signup flow. We would therefore recommend testing moving the secondary ‘Find tires’ CTA above the fold.

2. Sub Lever: Userflow (Lever: Usability)

The sticky ‘Find Tires’ CTA at the top of the page (across the entire site) is extremely discreet and likely to be missed by many users. We would recommend testing restyling this CTA to increase its prominence.

3. Sub Lever: Value Proposition (Lever: Motivation)

We believe the service’s value proposition could be made significantly more impactful by highlighting the service’s price alongside its convenience. We would therefore recommend testing a newly formulated value prop that also emphasizes price.

Conclusion

The Levers™ Framework is an incredibly powerful tool in any optimizer’s arsenal for diagnosing and fixing conversion issues. As part of this teardown series, we will be providing in-depth analyses of pages from some of the most popular global brands. Check out our previous teardown on McDonald’s homepage.

Also, if you would like to learn more about different proven use cases for the Levers™ Framework, like how it can help you compare user experiences between brands, check out our newest Subscription Benchmarking Report.

Does McDonald’s homepage McDeliver? https://conversion.com/blog/mcdonalds-teardown/ Fri, 24 Nov 2023 21:15:52 +0000

Introduction

We use our Levers™ Framework to perform a heuristic analysis on the fast food giant’s US homepage.

As part of our Levers™ Framework White Paper release, we wanted to show you how powerful the framework can be for diagnosing and fixing conversion issues.

McDonald’s is one of the world’s most beloved food brands – so we decided to run a heuristic analysis on their US homepage, using the framework as a lens through which possible improvements become apparent.

Methodology: The Levers™ Framework

In this analysis, we aim to identify the top 3 Master Levers that we believe could be used to positively impact conversions on this page.

Additionally, at the end we provide the outline of three experiment concepts that McDonald’s might want to test in order to optimize the page further.

Worth noting:

Almost every single call to action on this page is geared towards persuading users to download the McDonald’s app, so we’ve provided our recommendations with that in mind.

Overview of Levers used in this analysis
Lever: Comprehension

Comprehension is about how well a website explains the information about the company, product, and industry that might help a user feel comfortable in converting.

Common user questions around Comprehension
  1. Do I understand enough about this industry and type of product to feel comfortable purchasing this service/product at all?
  2. Do I understand everything I need to know about this product and company to convert on this site?
  3. Do I understand everything I need to about the transaction I’m agreeing to, to convert?
Lever: Usability

Unsurprisingly, the Usability Master Lever is all about how easily users can progress through the website, from arrival on the site through to fulfilling their desired goal, i.e. converting.

Common user questions around Usability
  1. Do I know where I am, and what I have to do next?
  2. Does this look like it’s going to take a lot of effort?
  3. Do I feel like persisting with the difficult parts?
  4. Are my product options arranged in a findable and easily understood way?
  5. Is my attention being focused on genuinely useful things?
Lever: Motivation

Motivation is the broadest category of change in our model. It is concerned with the ‘upside’ of the product or service. Fundamentally, it is asking “What’s in it for me, and/or the person for whom I’m purchasing?”

Common user questions around Motivation
  1. Do I feel inspired and excited by the benefits of this product?
  2. Do I feel a sense of obligation to convert?
  3. Do I feel a sense of urgency to convert?
  4. Does this product/service give me access to an imagined community?
  5. Is there a way to try it out?

We’ll now move down McDonald’s homepage from top to bottom, using the three Master Levers above to analyze each section.

Applying the Levers
Lever: Comprehension

At the top of the homepage, McDonald’s is promoting its exciting new offer – but there’s a problem:
It’s not actually clear what’s being promoted.

Is this the promotion of a limited edition sauce?
Or a new menu?
Or of something else entirely?
Moreover, what’s in it for the user?
Do they receive a free meal?

It doesn’t actually say.

As an added obstacle to comprehension, the extremely long second sentence makes reading the copy particularly difficult.

Lever: Usability

In the second viewport, lots of page space has been dedicated to an extensive list of all the blockbuster hits that have featured McDonald’s meals in them over the years.

This shows the huge cultural influence McDonald’s has had – but the content as it is currently displayed is not at all engaging for a user, and it pushes lots of valuable content further down the page, reducing the likelihood that it will be seen.

Moreover, this is the entire second viewport, and yet there is no call to action at the end of it.

Lever: Motivation

In our view, this is a strong value statement, explaining a unique and compelling benefit that users stand to gain from downloading the app.

This is likely to boost motivation and increase the number of users who are willing to click on the call to action.

Lever: Comprehension

McDelivery is an exciting new(ish) option for McDonald’s aficionados. The fact that ordering a McDelivery through the app will generate points is likely to persuade a distinct subset of users to download the app and start ordering their deliveries right away.

One downside, though, is that it’s not clear exactly how many points people will receive – nor what specifically they can redeem them for. We believe that adding specificity here will aid with comprehension and make the offer that much more compelling.

Lever: Usability

In our view, of all the specific information and promotions on offer, this is one of the most compelling – after all, who doesn’t want a free box of delicious fries?

Unfortunately, this offer is very close to the bottom of the page – where almost nobody is likely to see it.

Lever: Comprehension

This section is at the very bottom of the page, and in our view, it suffers from a lack of comprehensibility.

Firstly, the sole sentence in the body copy is long-winded, with numerous prepositional phrases lined up one after the other. This makes understanding its meaning quite fatiguing.

Even if you are able to parse the sentence, it is unclear what deals are actually on offer. Is contactless Mobile Order & Pay the deal in question – or is this unrelated to the exclusive deals mentioned at the start?

Recommended Experiments
1. Sub Lever: Benefits (Lever: Motivation)

The free large fries offer is compelling – but it’s too far down the page, beneath other elements that are less likely to drive conversions. We would recommend testing moving this up into a higher viewport.

2. Sub Lever: Distraction (Lever: Usability)

In the second viewport, rather than simply listing all the movies in which McDonald’s meals have been featured, try a shorter ‘Top 5’ list that includes the specific meals that were featured in those movies – plus hyperlinks to their pages on the menu.

3. Sub Lever: Value Proposition (Lever: Motivation)

The ‘A-Listers Only’ value statement suffers from a lack of clarity. Try sharpening it up with a statement that clearly describes what the user stands to gain from your offer.

Conclusion

The Levers™ Framework is an incredibly powerful tool in any optimizer’s arsenal for diagnosing and fixing conversion issues. As part of this teardown series, we will be providing in-depth analyses of pages from some of the most popular global brands.

Also, if you would like to learn more about different proven use cases for the Levers™ Framework, like how it can help you compare user experiences between brands, check out our newest Subscription Benchmarking Report.

A/B Testing During Black Friday Promotions https://conversion.com/blog/holiday-testing/ Wed, 25 Oct 2023 21:04:55 +0000

Testing during the holidays – BJ Fogg Model

Black Friday, and its counterpart Cyber Monday, are some of the largest consumer transaction days for the American economy. Shoppers come out in droves to either complete their holiday shopping, upgrade large ticket appliances/electronics, or just take advantage of great deals and promotions. Those of us who have ventured to participate in Black Friday or Cyber Monday have experienced long lineups in every part of the retail process, and even worse: slow and buggy websites.

However, many of us are so motivated by the fear of missing out on a great deal that we are willing to put up with seemingly anything to check off our holiday shopping list! Let’s take a closer look at why this happens, and how companies can build a testing and shopping experience around it to better understand their shoppers and lift conversion rates throughout the year.

The BJ Fogg Model: Understanding consumer behaviors to test better

The BJ Fogg model can help companies understand persuasive design and user behavior. The three parts of this model are “Motivation” on the y-axis, “Ability” on the x-axis, and an action line curve, as illustrated below:

What the Fogg model tells us is that:
  1. If you have low motivation to do something, but that thing is “easy to do”, there is a strong likelihood that you will do it
  2. If something is “hard to do”, but you are highly motivated to do it, then you are also likely to put up with the difficulty and still do it

What we help our partners do with experimentation is take the baseline point, which sits somewhere in the space of the Fogg model, and move the desired action above the action line; it is here that the user is most likely to convert and complete their checkout journey. If the desired action remains below the action line, then the shopper is much less likely to convert.
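
To make the action-line idea concrete, here is a minimal sketch in Python. It is our own toy illustration, not an official formulation of the Fogg model: we assume motivation and ability can each be scored between 0 and 1, and that the action line can be approximated by a fixed threshold on their product.

```python
ACTION_THRESHOLD = 0.25  # hypothetical position of the action line


def crosses_action_line(motivation: float, ability: float) -> bool:
    """Return True if the combination of motivation and ability sits above the action line.

    Both inputs are assumed scores between 0 and 1; the curve
    motivation * ability = threshold stands in for the action line.
    """
    return motivation * ability >= ACTION_THRESHOLD


# A highly motivated Black Friday shopper on a slow, hard-to-use site:
print(crosses_action_line(motivation=0.9, ability=0.2))  # False - below the line

# The same shopper after usability improvements raise ability:
print(crosses_action_line(motivation=0.9, ability=0.5))  # True - above the line
```

Under this framing, once one of the two scores is already near its ceiling, further gains come mostly from moving along the other axis.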

How do we do this? We can think of the optimizations we make to a website as focused on improving a user’s motivation to convert, ability to convert, or both.

Hypotheses aimed at improving a user’s ability to convert may be focused around reducing steps in the funnel or process, improving product discoverability, or increasing ease of use on the site.

Alternatively, hypotheses aimed at improving a user’s motivation to convert may be focused around improving the value perception of the product, presence and depth of product information, or – you guessed it – sales (an urgency tactic).

Externality and the BJ Fogg Model: Capturing the Boost on Black Friday

Events like Black Friday are unique in that they cause a dramatic shake-up in both user motivation & ability, allowing an opportunity to convert individuals who may otherwise be below the action line.

We can see below the substantial shift in the model, with shoppers becoming highly motivated during the Black Friday period; however, the ability to shop during this period also decreases. This can be because of long lineups at crowded stores, or websites experiencing an influx of traffic and rapidly fluctuating stock levels that make them slow, unresponsive, or unreliable.

The focus here should be on increasing the shoppers’ ability to purchase and not their motivation, as their motivation is already at very high levels. We can see in the graphic below that moving a user along the ability axis is most likely to make the shopper convert, while making equivalent improvements in motivation will still likely not yield any conversion. We are experiencing diminishing returns on motivation.

Black Friday and Cyber Monday can sometimes see increases in sales in the range of 173%-380%. A massive increase that no retailer can afford to ignore! As such, many retailers with an e-commerce presence will have developed plans for Black Friday far in advance, whether it be items they wish to feature or customized items stocked directly for the event.

As a result, users are flocking to shop online with a heightened motivation to purchase. They likely have a product (or a list of products) they’re looking to shop for and while price will be the key driver of their purchase decision, the ability to purchase will also play an important role.

Historically, some companies might be hesitant to test during Black Friday — the reason being they don’t want to “break something”, and consequently risk losing customers because of a sub-optimal process. Some companies will institute a “code freeze” on Black Friday/Cyber Monday, where no new code is to be added to a site at all for a period of 1-2 weeks around the actual date in an attempt to mitigate the risk of any unintentional website performance issues caused by new code releases.

With more mature experimentation programs, we have seen companies capitalize on Black Friday to run tests that help them gain critical new insights and maximize the impact of their site performance, which pays dividends over the long term.

While it’s entirely reasonable to acknowledge the potential risk of testing through the Black Friday period as we have already mentioned, there also exists a large opportunity cost if choosing not to test during this time.

By not experimenting on this surge of traffic and conversions, you’re not only missing a high volume of users to test on, you’re also essentially making a bold statement that your website as it exists today is the most optimal and performant version of your site – which as optimizers we know is never the case!

While there is a risk of running a test and sending traffic to a suboptimal Variation A or Variation B, there is also the risk that your control IS the suboptimal experience and you’re missing out on big gains throughout the year by not capitalizing on the opportunity of Black Friday testing.

So how do we mitigate the risk? Companies can focus on testing methodologies like a Multi-Armed Bandit (MAB) through this time period. A MAB automatically diverts traffic to the better performing variation in your experiment. You gain fewer insights, because traffic is automatically diverted to the best performing variation, however revenue generation is maximized and risk is mitigated. If a variation (or your control!) is underperforming, the algorithm figures it out and reduces the traffic exposed to that version during the test period. Sounds pretty optimal for a critical time like Black Friday, doesn’t it?
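
As an illustration of how a MAB shifts traffic, here is a minimal Thompson Sampling sketch in Python. It is our own toy implementation, not the algorithm of any particular testing platform, and the variation names and conversion rates are made up for the example.

```python
import random


class ThompsonSamplingMAB:
    """Minimal Thompson Sampling bandit for allocating traffic between page variations.

    Each variation's conversion rate is modelled with a Beta distribution that is
    updated as conversions (successes) and non-conversions (failures) are observed.
    """

    def __init__(self, variation_names):
        # Start every variation with an uninformative Beta(1, 1) prior.
        self.stats = {name: {"successes": 0, "failures": 0} for name in variation_names}

    def choose_variation(self):
        # Sample a plausible conversion rate for each variation and serve the best one.
        sampled = {
            name: random.betavariate(s["successes"] + 1, s["failures"] + 1)
            for name, s in self.stats.items()
        }
        return max(sampled, key=sampled.get)

    def record_result(self, variation, converted):
        # Feed the observed outcome back into the model.
        key = "successes" if converted else "failures"
        self.stats[variation][key] += 1


# Example: the bandit gradually shifts traffic towards the better-performing variation.
bandit = ThompsonSamplingMAB(["control", "variation_a"])
true_rates = {"control": 0.04, "variation_a": 0.06}  # hypothetical conversion rates

for _ in range(10_000):
    shown = bandit.choose_variation()
    bandit.record_result(shown, random.random() < true_rates[shown])

print(bandit.stats)
```

Because the allocation adapts as results come in, the underperforming experience receives progressively less of the Black Friday traffic, which is exactly the risk-mitigation property described above.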

The MAB strategy may be the better option for a momentary period of high externality like Black Friday, as the question still remains how representative shopper behavior is during this time. As we demonstrated above, users are likely far more motivated in this period than on average. As a result, the tactics and variations that perform best during Black Friday may not be the best tactics during normal periods. For this reason, you may choose to revalidate some winners from MAB testing in a different period after the holidays to uncover even more insight about the differences in your shopper behaviors during these periods. The difference in performance post-externality could be a big unlock for what you do next year. 

If you choose not to test during the Black Friday/Cyber Monday shopping period, you can still take this time to learn about users through qualitative research. Interviews can be conducted at a later date with some of the shoppers that chose to visit your site during the Black Friday/Cyber Monday period for further external insights.

Customer Insights: Introducing new Shoppers to your Brand

Customer insights can help you develop future tests and even help inform brand strategy. Many shoppers will be newly introduced to your site during the uptick in visits during Black Friday and Cyber Monday. Conducting experiments and testing during that time period can help you gain new insights about these users with the potential of turning them into repeat customers.

Ready to learn more about how Conversion can help you with your CRO efforts, including testing during the holidays? Get in touch with our team today.

SEO and CRO: A symbiotic relationship for growth https://conversion.com/blog/seo-and-cro/ Wed, 20 Sep 2023 23:37:22 +0000

Contents:

  1. SEO and User Intent
  2. Mixed Methods Experimentation: CRO and UX Research
  3. CRO and SEO
  4. Concerns with SEO and CRO – What do we need to be aware of?
  5. So how should SEO and CRO work together?

Conversion Rate Optimization (CRO) and Search Engine Optimization (SEO) are two of the most powerful tools in a marketer’s arsenal.

Search Engine Optimization is about understanding how search engines index and rank content. Using this understanding, SEOs aim to maximize the visibility of websites or other digital assets in search engines.

CRO is the process of testing two or more versions of a webpage against each other to determine which one performs better. This involves changing elements such as design, layout, copy, and images to see which version results in more conversions. This process helps businesses to make data-driven decisions about website design and user experience.

At a high level, how we like to look at it is that SEO is one of the ways to “get attention”, driving users to your page, and CRO happens once you have someone’s attention and want to remove their barriers and increase their motivation to take an action.

As enterprises continue to expand their online presence, they must stay ahead of the competition. This is where CRO, SEO, and UX Research come into play. These practices work hand in hand to improve the online presence of a company, leading to increased traffic, conversions, and ultimately, revenue.

SEO and User Intent

In SEO, efforts are focused on optimizing search results to reflect user intent. User intent refers to the reason behind a search query, and it can be classified into four categories: navigational, informational, commercial, and transactional.

Navigational intent refers to when a user is looking for a specific page or website. In this case, the user already knows what they are looking for and is trying to find it quickly. For example, a user might search for “Apple” if they want to visit the Apple website.

Informational intent refers to when a user is looking to learn more about a particular topic. In this case, the user is trying to gather information and might not have a specific website or page in mind. For example, a user might search for “best restaurants in Seattle” if they are planning a trip and want to find some recommendations.

Commercial intent refers to when a user is researching products or services before making a decision on what to buy. In this case, the user is interested in buying something but wants to compare different options first. For example, a user might search for “best cell phone” if they are looking to upgrade their phone.

Transactional intent refers to when a user wants to complete a specific action, usually a purchase. In this case, the user has already decided to buy something and is looking for a place to do so. For example, a user might search for “buy Adidas shoes online” if they want to purchase a specific brand of shoes.

Often the key when approaching intent for keywords is to understand where a brand wants to be positioned, and how search engines approach presenting the ‘right’ result for a term. For example, do you want to strategically focus on high volume top of the funnel terms for the purpose of brand awareness, or on lower volume bottom of the funnel terms to focus directly on conversions? We can optimize our web pages to provide the information that users are looking for, which can increase click-through rates and ultimately drive more conversions.
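
As a toy illustration of the intent categories above, the sketch below labels queries with a few hand-written keyword rules. The rules and example queries are our own assumptions; a real SEO workflow would rely on search-volume data, SERP features, or a trained model rather than regexes.

```python
import re

# Hypothetical keyword heuristics for two of the four intent categories.
# Navigational queries (e.g. a brand name like "Apple") would need a brand/domain
# lexicon rather than generic keywords, so they are left out of this toy example.
INTENT_RULES = [
    ("transactional", re.compile(r"\b(buy|order|purchase|coupon|discount code)\b")),
    ("commercial", re.compile(r"\b(best|top|review|vs|versus|compare|cheapest)\b")),
]


def classify_intent(query: str) -> str:
    """Return a rough intent label for a search query; defaults to informational."""
    q = query.lower()
    for label, pattern in INTENT_RULES:
        if pattern.search(q):
            return label
    return "informational"


for q in ["buy adidas shoes online", "best cell phone", "how often should tires be rotated"]:
    print(f"{q!r} -> {classify_intent(q)}")
```

Even a rough labelling like this can help decide whether a page should be positioned for awareness (top of funnel) or for conversion (bottom of funnel).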

Mixed Methods Experimentation: CRO and UX Research

Mixed Methods is the intricate process in which different methodologies are combined to generate different types of insights. Some of these techniques are quantitative in nature, like CRO, and some are qualitative, such as certain User Experience Research (UX Research) methodologies.

Utilizing mixed methods helps teams keep a constant pulse on the visitors to their site, and helps with:

  1. Understanding users and identifying hypotheses: UX Research helps us understand what’s most important, or what causes the biggest friction for prospects and customers. This allows us to prioritize the testing hypotheses that are likely to be most impactful.
  2. Avoiding stagnation through constant innovation: By utilizing the synergies created by UX Research and A/B testing, teams can continually be ideating, testing and iterating, supporting the business’ growth and keeping ahead of the curve.
CRO and SEO

By conducting CRO and testing different versions of a page, we can identify the elements that resonate with users and optimize our pages accordingly.

There exists a common misconception that CRO is bad for SEO. While there are several high-level risks that we discuss in a later section, there are a multitude of reasons why combining CRO and SEO is not only a winning strategy, but one that can drive bottom-line revenue at your organization. When CRO and SEO are combined, they can result in significant benefits for enterprises:

Improved User Experience

CRO and SEO can both improve the user experience of a website. By testing different versions of a webpage, businesses can identify which design elements, layout, or copy works best for an intended audience. Optimizing a website for search intent ensures that users find what they are looking for when they search for relevant keywords, resulting in a positive user experience. With a positive user experience, users stay on websites longer and don’t bounce back to Google’s results, which would send Google negative signals that your website didn’t solve their need.

Increased Conversion Rates

CRO leads to increased conversion rates by optimizing a website’s design and user experience. Additionally, SEO ensures that the website ranks for the relevant keywords that users search for, leading to more qualified traffic. Together, these practices can increase conversion rates and ultimately revenue for businesses. Similarly, increased conversion rates are signals to search engines that users are finding what they need and that the site should rank higher.

Improved Search Engine Rankings

Optimizing a website for search intent can lead to improved search engine rankings. By ensuring that a website’s content aligns with user intent, search engines will recognize the website as an authoritative source and rank it higher in the SERPs (Search Engine Results Page).

In a similar way, the improvements to user experience, content layout and design informed by CRO data can help to improve on content that is already properly optimized for relevant keywords in organic search. CRO input can therefore often be valuable in further improving the organic ranking potential of content.

Cost-Effective

CRO and SEO are both cost-effective practices that can deliver significant return on investment. By identifying the design elements that work best for their audience, businesses can improve their website’s user experience without investing in expensive redesigns. Additionally, SEO ensures that the website ranks for the relevant keywords, resulting in more qualified traffic without the need for expensive advertising campaigns.

CRO and SEO are essential practices that enterprises should integrate into their digital marketing strategy. Combining CRO and SEO can be a powerful strategy for increasing sales. By understanding user intent and testing different versions of our pages, we can optimize to provide the information that users are looking for and encourage them to take action.

Concerns with SEO and CRO – What do we need to be aware of?

CRO and SEO work best together; however, there are some nuances and potential pitfalls to look out for, and we’ve outlined some of the most relevant and common considerations when applying both.

Potential unintended consequences

Given that both SEO and CRO are based on making alterations to the content of a website – whether from a design standpoint, a content standpoint, or both – without proper communication between the stakeholders responsible for these two disciplines, you run the risk of negatively impacting the performance of each one.

Say, for example, the CRO team substantially changes or even removes the content of a page as part of their optimization efforts. While their data showed that this may have been the right decision in terms of the conversion rate of the page, what they didn’t consider is that this same content had been properly structured and optimized for relevant organic keywords. As a result, removing this content leads to a substantial reduction in organic traffic to the page due to lost visibility in Google’s SERPs.

Of course, this example works in reverse too. For example, if the SEO team adds content to a page or amends its structure in order to better rank for relevant keywords in Google’s SERPs, they could severely impact the conversion rate of that same page. If these changes added a substantial amount of content to a URL that converted its users at a high rate precisely because it was short and to the point, the organic traffic to that URL may well increase, but conversions may drop drastically.

In order to avoid the potential for unintended consequences, SEO and CRO need to work together, share data and align on their approaches.

Googlebot’s limitations and the risks of cloaking

One of the core aspects of a harmonious relationship between SEO and CRO is to make use of A/B testing.

However, there are some risks teams should be aware of in order to apply CRO and SEO correctly. Collaboration between SEO and CRO can make or break the efficacy of both.

The other major consideration from an SEO perspective is what’s called ‘cloaking’. This term refers to instances where Google is served different content than an actual user, with the former often being served the ‘SEO ideal’ version of a page whereas a user is served something substantially different – with an emphasis on substantially.

If Google understands that this is happening, it will be much less inclined to return that URL in favorable positions within its SERPs. In some extreme cases, the Web Spam Team at Google can also issue manual penalties to webmasters based on extensive use of cloaking practices.

This could be an issue when it comes to split testing URL designs and content for CRO if you’re making significant enough changes. After all, you need to test different versions of the page in order to understand how best to optimize for engagement and conversion with its content. So if Google is served one version of a page at one time and then another, totally different version another time, it may perceive this as webspam and therefore lose any incentive to rank it competitively for relevant keywords. The best way to minimize the risk is to stay focused on what Google refers to as the “spirit” of the page.

The “Spirit” of the Page: Avoiding the pitfalls of Cloaking

Google actively encourages A/B testing as a means to improve site performance and user experience. They recommend multiple ways to prevent Googlebot from interpreting an A/B test as cloaking content, but the most important is to maintain the “spirit” of the page.

The spirit of the page is the core content and purpose for which the page is designed. If it’s a product page for a t-shirt, make sure its content and layout continue to communicate and focus on the t-shirt’s features, price, images, etc. Don’t all of a sudden swap the page to be talking about pants, or furniture, or an investing scheme. This is the type of “bait and switch” scenario Google is concerned about. Google wants to penalize sites that are intentionally trying to deceive search engines to boost their PageRank performance – not sites that are simply trying to optimize their page layout and content to improve their own site’s performance.

Maintaining the ‘spirit’ of the page is critical for conducting CRO without being flagged as a bad actor by Googlebot. To do this, variants should always maintain and give users the general points of the original content, regardless of variation.

If your experimentation does require you to make large changes to the page, such as removing significant portions of content, there are additional ways to mitigate the risk, such as adjusting the targeting of your experiments to specifically exclude Googlebot from qualifying for the experiment. This way you can ensure Googlebot always sees the same control version of your site and reduce the risk that it ever crawls or indexes a variant and incorrectly flags your experiments as deceitful.
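
As a rough illustration of that targeting idea, here is a minimal server-side sketch in Python. The function, bot list, and variation names are hypothetical; in practice you would use your testing tool’s own audience-targeting or bot-filtering options rather than hand-rolled code.

```python
import hashlib
import re

# Hypothetical crawler signatures; real testing tools and bot-filtering services
# maintain far more complete lists than this.
BOT_USER_AGENTS = re.compile(r"googlebot|bingbot|duckduckbot|baiduspider", re.IGNORECASE)


def assign_variation(user_agent: str, visitor_id: str) -> str:
    """Bucket a visitor into the experiment, always serving crawlers the control."""
    if BOT_USER_AGENTS.search(user_agent or ""):
        return "control"  # crawlers always see the unmodified page
    # Deterministic 50/50 split for real visitors, keyed on a stable visitor id.
    bucket = int(hashlib.sha256(visitor_id.encode()).hexdigest(), 16) % 100
    return "control" if bucket < 50 else "variation_a"


print(assign_variation(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", "visitor-123"))
print(assign_variation("Mozilla/5.0 (Windows NT 10.0; Win64; x64)", "visitor-123"))
```

Keeping crawlers pinned to the control means the indexed content never diverges from what a bot can see, while real visitors still split evenly between experiences.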

Improving your website via A/B testing and experimentation has significant benefits and foregoing the strategy altogether because of a concern about SEO impact is likely more detrimental than simply taking the time to do it right. SEO and CRO experts can help you navigate through the nuances of these considerations to help avoid any pitfalls or unintended consequences of your experimentation program.

So how should SEO and CRO work together?

As referenced briefly above, the key to a successful website, whether in terms of organic visibility or conversion performance, is communication.

From an SEO perspective, any changes to a page that remove substantial content altogether represent a massive risk to organic traffic performance. So any long-form CRO tests that require this should be brought to the attention of the SEO team as soon as possible in order to mitigate any unintended negative outcomes.

Likewise, any change that would see substantial amounts of content being added or altered on a page could be detrimental to the conversion rate of that same page.

So in short, the best way to balance the demands of SEO and CRO is to collaborate and to communicate. Make sure that each team has visibility over the other’s strategic roadmap, and has the opportunity to provide relevant expertise and consultation prior to any changes being made to a URL. This will more than likely save each team a lot of stress and frustration.

Putting It All Together

SEO is ensuring that content lives on the site in intentional and functional ways for users and search engines to find it, and then CRO is a way to optimize that content to ensure it has the right layout, emphasis, priority, usability, etc. to maximize conversion rates. These two disciplines should be viewed as symbiotic: SEO helps content gain visibility in search engines, while CRO enables that traffic to convert into specific actions on your website which in turn could send more positive signals to Google about that content’s quality.

About the Authors

Conversion and Reddico are members of the Sideshow Group of agencies – a global challenger in digital experience and marketing services.

James Vatter is an SEO Consultant at Reddico with over eight years of experience in SEO and digital marketing, encompassing technical SEO, content optimization and strategy, and website migrations in the financial services, insurance, medical and pharmaceutical sectors. Reddico works with leading brands such as BlackRock, Love Holidays, Auto Trader, Compare the Market and the Cotswold Company.

James Flory is the VP of Delivery at Conversion with over seven years of experience in experimentation, CRO, and client service. He is widely considered a subject matter expert in experimentation among some of the world’s leading companies like Microsoft, HP, and Whirlpool. He is also an Instructor of Digital Analytics at Simon Fraser University, one of Canada’s top Universities.

 

Sources:

https://support.google.com/optimize/answer/6218011

https://developers.google.com/search/docs/crawling-indexing/website-testing

Are you thinking about experimentation too narrowly? https://conversion.com/blog/are-you-thinking-about-experimentation-too-narrowly/ Tue, 30 May 2023 22:20:36 +0000

In our recent study of experimentation programs, we interviewed program leaders at 42 different companies to learn how they are defining and applying experimentation. Surprisingly, they answered with many distinctly different definitions and applications of how experimentation is best applied in business.

Which begs the question: if leaders have many different definitions of what experimentation means, does that mean some may be applying experimentation/testing too narrowly within their companies? 

To answer, it helps to first consider the growth stages found along a typical life cycle (or S-curve) of a product or business. As a product or business launches, its products sell slowly. As it gains customers and popularity, it may have a time of rapid growth until it eventually slows to relative consistency due to one or more internal or external events.

Put into the context of experimentation, different mindsets typically occur at each of these different growth stages.

For instance, in the early phases, teams take on a more explorative growth mindset as they search for product-market fit. From there, they work on optimizing and fine tuning their product — exhibiting more of an exploitative mindset. Then, as growth starts to peter out, chances are companies need to move into more expansive thinking to continue to innovate and meet customer needs, often by looking for new product opportunities altogether.

We found that a large majority of those we interviewed still primarily limit their thinking around experimentation to the Exploit phase of the S curve — focusing their efforts almost exclusively on conversion rate optimization.

Good experimentation programs use experimentation to power growth across both the exploration and expansion phases of a product: validating product-market fit, identifying and testing new features, boosting conversion rates, identifying new segments, and testing new channels.

Great, leading programs that reported the largest impact apply experimentation across the entire spectrum of growth stages in the life cycles of their products and services – not just the optimization phase. These programs will continually apply experimentation in the expansive phase to innovate and discover entirely new product opportunities.

By doing so, they’re using experimentation to generate multiple S-Curves — continuously finding new opportunities and verticals to be optimized individually to drive growth.

A prime example of this kind of expansive experimentation is at Nationwide. Led by Julia Barham, Nationwide’s Innovation Product Team uses experimentation as a tool to find new, more expansive opportunities for growth.

To deliver on that mandate, Julia’s team tests hundreds of concepts every year to learn about users’ needs, validate (or invalidate) problem-solution fit, and determine what new products and services Nationwide should pursue.

From lightweight concepts to high fidelity prototypes, it’s all about helping to attract and retain customers by putting them at the center of their work.

“We test hundreds of concepts every year to learn about users’ needs and validate (or invalidate) problem-solution fit. We test lightweight concepts to learn if a new feature is attractive and useful to customers, and we also test higher-fidelity concepts when we’re developing a brand new product. All of them are designed to help us attract and retain customers by putting them at the center of our work.”

— Julia Barham, Nationwide

If you’re only thinking about experimentation through the lens of conversion rate optimization, chances are you’re missing out on important opportunities for more expansive growth. It’s not to say conversion rate optimization isn’t important; but if your capacity is limited, it’s important to consider other means by which you can achieve more meaningful growth with the resources you have available, beyond just more iterative product or website improvements.

Learn more about expansive experimentation at Nationwide in our study, Maximum Impact: How digital experimentation leaders are doing more with less, here.
