Tag Archives: a/b testing

Green Marketing: The psychological impact of an eco-conscious marketing campaign

The following research was first published in the MECLABS Quarterly Research Digest, July 2014.

Almost every industry has seen a shift toward “green technology” or “eco-friendly materials.” While this is certainly a positive step for the earth, it can rightly be questioned whether the marketing that touts this particular aspect of the business is really effective.

Marketing offices across the globe face some very real questions:

  • Does highlighting your green practices actually cause more people to buy from you?
  • Does it have any impact at all?
  • Does it, much to our shock and dismay, temper conversion?

When we find an issue like this, we are inclined to run a test rather than trust our marketing intuition.

Experiment: Does green marketing impact conversion?

The Research Partner for Test Protocol (TP) 11009 is a furniture company wanting to increase sales of its eco-friendly mattresses. Our key tracking metric was simple: purchases. Our research question was this: Which landing page would create more mattress sales, A or B?

As you can see in Figure 1.1, the pages were identical save for one key aspect: Version B included an extra section that Version A left out. In this section, we went into more detail about the green aspects of the mattress. It should be noted, however, that both pages included the “GreenGuard Gold Certification Seal,” so it is not as if Version A is devoid of the green marketing angle. Version B simply spelled it out more clearly.

Figure 1.1

Did the change make a difference? Yes, Version B outperformed Version A by 46%. Remember, this lift is in purchases, not simply clickthrough.

 

 

We have established that green marketing can be effective. But in what cases? How can we put that knowledge to good use and navigate the waters of green marketing with a repeatable methodology?

Four ways to create effective green marketing campaigns

In the test above, green marketing made a clear and significant difference. We made four observations as to why this particular green marketing strategy succeeded. You can use them as guides toward your own green marketing success.

Key Observation #1. The value was tangible. The value created by the copy was directly connected to the customer experience.

In the case of the GreenGuard Certified mattress, the value of being green was not solely based on its being eco-friendly. It also was customer-friendly. The green nature of the manufacturing process directly affected and increased the quality of the product. The copy stated that the mattress “meets the world’s most rigorous, third-party chemical emissions standards with strict low emission levels for over 360 volatile organic compounds.” Not only is it good for the earth, but it is also good for your toddler and your grandmother.

This tangible benefit to the customer experience is not always present in green marketing. In Figure 2.1, you see three examples of green marketing that fail to leverage a tangible benefit to the customer:

Figure 2.1

 

  1. When a hotel encourages you to reuse your towels to “save water,” it does nothing to improve the value of your experience with them. If anything, it may come off as an attempt to guilt the guest into reducing the hotel’s water bill.
  2. GE’s “Ecomagination” campaign is devoid of a tangible benefit to the customer. How does GE being green make my microwave better for me? The campaign doesn’t offer an answer.
  3. Worse still, “100% recycled toilet tissue” not only fails to offer a tangible benefit to the customer, it also implies that the customer might not receive the same quality experience they would have with a non-green option.

For green marketing to operate optimally, you must be able to point out a tangible benefit to the customer, in addition to the earth-friendly nature of the product.

Key Observation #2. The issue was relevant. The issue addressed by the copy dealt with a key concern already present in the mind of the prospect.

For people in the market for a new mattress, especially those with young children, sensitive skin or allergies, there are well-founded concerns regarding the chemicals and other materials that go into the production of the mattress. This concern already exists in the mind of the customer. It does not need to be raised or hyped by the marketer. Again, not all green marketing campaigns address relevant concerns.

Figure 3.1

 

  1. People are more concerned with safety, comfort and affordability when traveling. Whether the airline is green or not is not generally a concern.
  2. When choosing a sunscreen, most people don’t go in with aspirations of choosing a green option. Their top concern is sun protection, and biodegradable sunscreen doesn’t appear to meet that need as well as another option can.
  3. Again, “biodegradable” is not a common concern brought to the table by people buying pens.

All of these, while potentially noble causes, do not directly connect to a relevant problem the customer experiences. On the other hand, the GreenGuard Certified mattress immediately addressed a pressing concern held by the customer. It is “perfect for those with skin sensitivity or allergies.”

Key Observation #3. The claim was unique. The claim of exclusivity in the copy intensified the “only” factor of the product itself.

Just like any other benefit, green marketing benefits gain or lose value based on how many others can make the claim. If a web hosting platform touts itself as green or eco-friendly, the claim doesn’t hold as much force because the industry is saturated with green options (Figure 4.1). The same is true of BPA-free water bottles (Figure 4.2).

 

Figure 4.1

 

Figure 4.2

 

However, in the case of our Research Partner, not many of its competitors could make the “GreenGuard Gold Certification” claim (Figure 4.3). This added exclusivity — not to mention that Gold status implied they achieved the highest level of certification. Uniqueness drives value up, as long as the benefit in question is actually in demand.

Figure 4.3

 

Key Observation #4. The evidence was believable. The evidence provided in the copy lent instant credibility to any of the claims.

After the initial wave of green marketing techniques and practices took the industry by storm, there was a very justified backlash against those simply trying to cash in on the trend. Lawsuits were filed against marketers exaggerating their green-ness, including the likes of SC Johnson, Fiji Water, Hyundai and others. As a result, consumers became wary of green claims and must now be persuaded by believable data.

In the winning design above, we did this in three ways:

  1. Verification: “100% Certified by GreenGuard Gold”
  2. Specification: “Our mattresses get reviewed quarterly to maintain this seal of approval. Last certification: January 4th, 2014.”
  3. Quantification: “Low emission levels for over 360 volatile organic compounds.”

The ability to prove that your green practices or eco-friendly products are truly as earth-friendly — and tangibly beneficial — as you claim is a crucial component in creating a green marketing angle that produces a significant increase in conversion.

How to approach your green marketing challenges

We have seen that green marketing can work. Still, this is not a recommendation to throw green marketing language into everything you put out. Green marketing is not a cure-all.

However, given the right circumstances, the right green positioning can certainly achieve lifts, and we want you to be able to capitalize on that. Therefore, we have created this checklist to help you analyze and improve your green marketing tactics.

☐  Is your green marketing tangible?

Does the nature of the green claims actually make the end product more appealing?

☐  Is your green marketing relevant?

Does the fact that your offer is green solve an important problem in the mind of the customer?

☐  Is your green marketing unique?

Can anyone else in your vertical make similar claims? If so, how do your claims stand apart?

☐  Is your green marketing believable?

Are your claims actually true? If so, how can you quantify, verify or specify your particular claims?

Of course, this checklist is only a starting point. Testing your results is the only true way to discover if your new green techniques are truly improving conversion.

Related Resources

Learn how Research Partnerships work, and how you can join MECLABS in discovering what really works in marketing

Read this MarketingExperiments Blog post to learn how to craft the right research question

Sometimes we only have intangible benefits to market. In this interview, Tim Kachuriak, Founder and Chief Innovation & Optimization Officer, Next After, explains how to get your customers to say, “heck yes”

One way to be relevant is to better understand your customers through data-driven marketing

Discover three techniques for standing out in a competitive market, including focusing on your “only” factor

Read on for nine elements that help make your marketing claims more believable


Get Your Free Test Discovery Tool to Help Log all the Results and Discoveries from Your Company’s Marketing Tests

Come budget time, do you have an easy way to show all the results from your testing? Not just conversion lifts, but the golden intel that senior business leaders crave — key insights into customer behavior.

To help you do that, we’ve created the free MECLABS Institute Test Discovery Tool, so you can build a custom discovery library for your organization. This simple tool is an easy way of helping your company create a repository of discoveries from its behavioral testing with customers and showing business leaders all the results of your testing efforts. Just click the link below to get yours.

 

Click Here to Download Your FREE Test Discovery Tool Instantly

(no form to fill out, just click to get your instant download of this Excel-based tool)

 

In addition to enabling you to show comprehensive test results to business leaders, a custom test discovery library for your brand helps improve your overall organization’s performance. You probably have an amazing amount of institutional knowledge stuck in your cranium. From previous campaigns and tests, you have a good sense of what will work with your customers and what will not. You probably use this info to inform future tests and campaigns, measure what works and build your knowledge base even more.

But to create a truly successful organization, you have to get that wisdom out of your head and make sure everyone in your marketing department and at your agencies has access to that valuable intel. Plus, you want the ability to learn from everyone in your organization as well.

 


 

This tool was created to help a MECLABS Research Partner keep track of all the lessons learned from its tests.

“The goal of building this summary spreadsheet was to create a functional and precise approach to document a comprehensive summary of results. The template allows marketers to form a holistic understanding of their test outcomes in an easily digestible format, which is helpful when sharing and building upon future testing strategy within your organization. The fields within the template are key components that all testing summaries should possess to clearly understand what the test was measuring and impacting, and the validity of the results,” said Delaney Dempsey, Data Scientist, MECLABS Institute.

“Basically, the combination of these fields provides a clear understanding of what worked and what did not work. Overall, the biggest takeaway for marketers is that having an effective approach to documenting your results is an important element in creation of your customer theory and impactful marketing strategies. Ultimately, past test results are the root of our testing discovery about our customers,” she explained.

 


 

Here is a quick overview for filling out the fields in this tool (we’ve also included this info in the tool) …


How to use this tool to organize your company’s customer discoveries from real-world behavioral tests

For a deeper exploration of testing, and to learn where to test, what to test and how to turn basic testing data into customer wisdom, you can take the MECLABS Institute Online Testing on-demand certification course.

Test Dashboard: This provides an overview of your tests. The info automatically pulls from the information you input for each individual test on the other sheets in this Excel document. You may decide to color code each test stream (say blue for email, green for landing pages, etc.) to more easily read the dashboard. (For instructions on adding more rows to the Test Dashboard, and thus more test worksheets to the Excel tool, scroll down to the “Adding More Tests” section.)

Your Test Name Here: Create a name for each test you run. (To add more tabs to run more tests, scroll down to the “Adding More Tests” section.)

Test Stream: Group tests in a way that makes the most sense for your organization. Some examples might be the main site, microsite, landing pages, homepage, email, specific email lists, PPC ads, social media ads and so on.

Test Location: Where in your test stream did this specific test occur? For example, if the Test Stream was your main site, the Test Location may have been on product pages, a shopping page or on the homepage. If one of your testing streams is Landing Pages, the test location may have been a Facebook landing page for a specific product.

Test Tracking Number: To organize your tests, it can help to assign each test a unique tracking number. For example, every test MECLABS Institute conducts for a company has a Test Protocol Number.

Timeframe Run: Enter the dates the test ran and the number of days it ran. MECLABS recommends you run your tests for at least a week, even if it reaches a statistically significant sample size, to help reduce the chances of a validity threat known as History Effect.

Hypothesis: The reason to run a test is to prove or disprove a hypothesis.

Do you know how you can best serve your customer to improve results? What knowledge gaps do you have about your customer? What internal debates do you have about the customer? What have you debated with your agency or vendor partner? Settle those debates and fill those knowledge gaps by crafting a hypothesis and running a test to measure real-world customer behavior.

Here is the approach MECLABS uses to formulate a hypothesis, with an example filled in …

# of Treatments: This is the number of versions you are testing. For example, if you had Landing Page A and Landing Page B, that would be two treatments. The more treatments you test in one experiment, the more samples you need to avoid a Sampling Distortion Effect validity threat, which can occur when you do not collect a significant number of observations.

Valid/Not Valid: A valid test measures what it claims to measure. Valid tests are well-founded and correspond accurately to the real world. Results of a valid test can be trusted to be accurate and to represent real-world conditions. Invalid tests fail to measure what they claim to measure and cannot be trusted as being representative of real-world conditions.

Conclusive/Inconclusive: A Conclusive Test is a valid test that has reached the desired Level of Confidence (95% is the most commonly used standard). An Inconclusive Test is a valid test that failed to reach the desired Level of Confidence for the primary KPI (95% is the most commonly used standard). Inconclusive tests, while not the marketer’s goal, are not innately bad. They offer insights into the cognitive psychology of the customer. They help marketers discover which mental levers do not have a significant impact on the decision process.

KPIs — MAIN, SECONDARY, TERTIARY

Name: KPIs are key performance indicators. They are the yardstick for measuring your test. The main KPI is what ultimately determines how well your test performed, but secondary and tertiary KPIs can be insightful as well. For example, the main KPI for a product page test might be the add-to-cart rate. That is the main action you are trying to influence with your test treatment(s). A secondary KPI might be a change in revenue. Perhaps you get fewer orders, but at a higher value per order, and thus more revenue. A tertiary KPI might be checkout rate, tracking how many people complete the action all the way through the funnel. There may be later steps in the funnel that are affecting that checkout rate beyond what you’re testing, which is why it is not the main KPI of the test but still important to understand. (Please note, every test does not necessarily have to have a main, secondary and tertiary KPI, but every test should at least have a main KPI.)

Key Discoveries: This is the main benefit of running tests — to make new discoveries about customer behavior. This Test Discovery Library gives you a central, easily accessible place to share those discoveries with the entire company. For example, you could upload this document to an internal SharePoint or intranet, or even email it around every time a test is complete.

The hypothesis will heavily inform the key discoveries section, but you may also learn something you weren’t expecting, especially from secondary KPIs.

What did the test results tell you about the perceived credibility of your product and brand? The level of brand exposure customers have previously had? Customers’ propensity to buy or become a lead? The difference in the behavior of new and returning visits to your website? The preference for different communication mechanisms (e.g., live chat vs. video chat)? Behavior on different devices (e.g., desktop vs. mobile)? These are just examples; the list could go on forever … and you likely have some that are unique to your organization.

Experience Implemented? This is pretty straightforward. Has the experience that was tested been implemented as the new landing page, home page, etc., after the test closed?

Date of implementation: If the experience has been implemented, when was it implemented? Recording this information can help you go back and make sure overall performance correlated with your expectations from the test results.
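
If you want to mirror these fields outside of Excel (for example, to feed an internal dashboard), here is a minimal Python sketch of one record in such a discovery library. It is only an illustration of the fields described above; the field names and types are assumptions, and the actual tool remains the downloadable Excel workbook.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TestRecord:
    """One entry in a test discovery library, mirroring the fields described above.

    Illustrative sketch only; field names and types are assumptions,
    not the structure of the Excel tool itself.
    """
    test_name: str
    test_stream: str                    # e.g., "Email" or "Landing Pages"
    test_location: str                  # e.g., "Product page" or "Homepage"
    test_tracking_number: str           # e.g., a Test Protocol Number
    timeframe_run: str                  # dates the test ran and number of days
    hypothesis: str
    num_treatments: int
    valid: bool                         # did the test measure what it claims to measure?
    conclusive: bool                    # did it reach the desired Level of Confidence (e.g., 95%)?
    kpis: dict = field(default_factory=dict)  # {"main": ..., "secondary": ..., "tertiary": ...}
    key_discoveries: str = ""
    experience_implemented: bool = False
    date_of_implementation: Optional[str] = None

# Hypothetical usage
record = TestRecord(
    test_name="Homepage headline test",
    test_stream="Main site",
    test_location="Homepage",
    test_tracking_number="TP 0001",
    timeframe_run="2014-01-01 to 2014-01-14 (14 days)",
    hypothesis="Clarifying the value proposition in the headline will increase clickthrough.",
    num_treatments=2,
    valid=True,
    conclusive=True,
    kpis={"main": "clickthrough rate"},
)
```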

ADDING MORE TESTS TO THE TOOL

The Test Dashboard tab dynamically pulls in all information from the subsequent test worksheets, so you do not need to manually enter any data here except for the test sequence number in Column A. If you want to create a new test tab and the corresponding row in the “Test Dashboard,” follow these instructions:

    • Right click on the bottom tab titled “Template – Your Test Name Here.” Choose “Move or Copy.” From the list of sheets, choose “Template – Your Test Name Here.” Check the box “Create a Copy” and click OK. Right click on your new “Template – Your Test Name Here (2)” tab and rename as “Your Test Name Here (7).”
    • Now, you’ll need to add a new row to your “Test Dashboard” tab. Copy the last row. For example, select row 8 on the “Test Dashboard” tab, copy/paste those contents into row 9. You will need to make the following edits to reference your new tab, “Your Test Name Here (7).” This can be done in the following way:
      • Manually enter the test as “7” in cell A9.
      • The remaining cells pull their data in dynamically. However, because they were copied and pasted, they still reference the test above. To update this, select row 9 again. On the Home tab, in the Editing group, select “Find & Select” (located on the far right) > “Replace,” or use “CTRL+F” and switch to the Replace tab.
      • On the Replace tab of the box, enter Find What: “Your Test Name (6)” and Replace with: “Your Test Name (7).”
      • Click “Replace All”
      • All cells in the row should now reference your new tab, “Your Test Name (7)” properly.
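
If you would rather script the tab-copying step than do it by hand, here is a minimal sketch using the openpyxl library. The file name and sheet titles are assumptions based on the tab names described above; adjust them to match your copy of the tool, and note that the Test Dashboard row and its references still need the copy/paste and find-and-replace steps.

```python
from openpyxl import load_workbook

# Assumed file name and sheet titles; adjust them to match your copy of the tool.
wb = load_workbook("Test_Discovery_Tool.xlsx")
template = wb["Template - Your Test Name Here"]

# Copy the template worksheet and give the new tab your test's name
# (the equivalent of "Move or Copy" + "Create a Copy" + rename).
new_sheet = wb.copy_worksheet(template)
new_sheet.title = "Your Test Name Here (7)"

# The Test Dashboard row still needs to be added and its references updated
# (the steps described above), either by hand or with additional code,
# because the dashboard formulas reference sheets by name.
wb.save("Test_Discovery_Tool.xlsx")
```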

 

Click Here to Download Your FREE Test Discovery Tool Instantly

(no form to fill out, just click to get your instant download of this Excel-based tool)

 

Special thanks to Research Manager Alissa Shaw, Data Scientist Delaney Dempsey, Associate Director of Design Lauren Leonard, Senior Director of Research Partnerships Austin McCraw, and Copy Editor Linda Johnson for helping to create the Test Discovery Library tool.

If you have any questions, you can email us at info@MECLABS.com. And here are some more resources to help with your testing …

Lead your team to breakthrough results with A Model of your Customer’s Mind: These 21 charts and tools have helped capture more than $500 million in (carefully measured) test wins

Test Planning Scenario Tool – This simple tool helps you visualize factors that affect the ROI implications of test sequencing

Customer Theory: How we learned from a previous test to drive a 40% increase in CTR


A/B Testing: Why do different sample size calculators and testing platforms produce different estimates of statistical significance?

A/B testing is a powerful way to increase conversion (e.g., 638% more leads, 78% more conversion on a product page, etc.).

Its strength lies in its predictive ability. When you implement the alternate version suggested by the test, your conversion funnel actually performs the way the test indicated that it would.

To help determine that, you want to ensure you’re running valid tests. And before you decide to implement related changes, you want to ensure your test is conclusive and not just a result of random chance. One important element of a conclusive test is that the results show a statistically significant difference between the control and the treatment.

Many platforms will include something like a “statistical significance status” with your results to help you determine this. There are also several sample size calculators available online, and different calculators may suggest you need different sample sizes for your test.

But what do those numbers really mean? We’ll explore that topic in this MarketingExperiments article.

A word of caution for marketing and advertising creatives: This article includes several paragraphs that talk about statistics in a mathy way — and even contains a mathematical equation (in case these may pose a trigger risk for you). Even so, we’ve done our best to use them only where they serve to clarify rather than complicate.

Why does statistical significance matter?

To set the stage for talking about sample size and statistical significance, it’s worth mentioning a few words about the nature and purpose of testing (aka inferential experimentation) and the nomenclature we’ll use.

We test in order to infer some important characteristics about a whole population by observing a small subset of members from the population called a “Sample.”

MECLABS metatheory dubs a test that successfully accomplishes this purpose a “Useful” test.

The Usefulness (predictiveness) of a test is affected by two key features: “Validity” and “Conclusiveness.”

Statistical significance is one factor that helps to determine if a test is useful. A useful test is one that can be trusted to accurately reflect how the “system” will perform under real-world conditions.

Having an insufficient sample size presents a validity threat known as Sample Distortion Effect. This is a danger because if you don’t get a large enough sample size, any apparent performance differences may have been due to random variation and not true insights into your customers’ behavior. This could give you false confidence that a landing page change that you tested will improve your results if you implement it, when it actually won’t.

“Seemingly unlikely things DO sometimes happen, purely ‘by coincidence’ (aka due to random variation). Statistical methods help us to distinguish between valuable insights and worthless superstitions,” said Bob Kemper, Executive Director, Infrastructure Support Services at MECLABS Institute.

“By our very nature, humans are instinctively programmed to seek out and recognize patterns: think ‘Hmm, did you notice that the last five people who ate those purplish berries down by the river died the next day?’” he said.

A conclusive test is a valid test (there are other validity threats in addition to the sample distortion effect) that has reached a desired Level of Confidence, or LoC (95% is the most commonly used standard).

In practice, at 95% LoC, the 95% confidence interval for the difference between control and treatment rates of the key performance indicator (KPI) does not include zero.

A simple way to think of this is that a conclusive test means you are 95% confident the treatment will perform at least as well as the control on the primary KPI.  So the performance you’ll actually get, once it’s in production for all traffic, will be somewhere inside the Confidence Interval (shown in yellow above).  Determining level of confidence requires some math.
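
As a rough illustration of that idea, the sketch below computes a normal-approximation confidence interval for the difference between treatment and control conversion rates and checks whether it excludes zero. It is a generic two-proportion calculation with hypothetical counts, not the MECLABS protocol or any platform's exact method.

```python
from math import sqrt
from statistics import NormalDist

def diff_confidence_interval(conv_control, n_control, conv_treatment, n_treatment,
                             confidence=0.95):
    """Confidence interval for (treatment rate - control rate).

    Uses the normal approximation for the difference of two proportions;
    an illustrative sketch, not any platform's exact method.
    """
    p_c = conv_control / n_control
    p_t = conv_treatment / n_treatment
    se = sqrt(p_c * (1 - p_c) / n_control + p_t * (1 - p_t) / n_treatment)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # ~1.96 for 95% confidence
    diff = p_t - p_c
    return diff - z * se, diff + z * se

# Hypothetical counts: 300 of 10,000 control conversions vs. 360 of 10,000 treatment conversions
low, high = diff_confidence_interval(300, 10_000, 360, 10_000)
print(f"95% CI for the difference: [{low:.4f}, {high:.4f}]")
print("Conclusive at 95% LoC" if (low > 0 or high < 0) else "Not conclusive at 95% LoC")
```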

Why do different testing platforms and related tools offer such disparate estimates of required sample size? 

One of MECLABS Institute’s Research Partners who is president of an internet company recently asked our analysts about this topic. His team found a sample size calculator tool online from a reputable company and noticed how different its estimate of minimum sample size was compared to the internal tool MECLABS analysts use when working with Research Partners (MECLABS is the parent research organization of MarketingExperiments).

The simple answer is that the two tools approach the estimation problem using different assumptions and statistical models, much the way there are several competing models for predicting the path of hurricanes and tropical storms.

Living in Jacksonville, Florida, an area that is often under hurricane threats, I can tell you there’s been much debate over which among the several competing models is most accurate (and now there’s even a newer, Next Gen model). Similarly, there is debate in the optimization testing world about which statistical models are best.

The goal of this article isn’t to take sides, just to give you a closer look at why different tools produce different estimates. Not because the math is “wrong” in any of them, they simply employ different approaches.

“While the underlying philosophies supporting each differ, and they approach empirical inference in subtly different ways, both can be used profitably in marketing experimentation,” said Danitza Dragovic, Digital Optimization Specialist at MECLABS Institute.

In this case, in seeking to understand the business implications of test duration and confidence in results, it was understandably confusing for our Research Partner to see different sample size calculations based upon the tool used. It wasn’t clear that a pre-determined sample size is fundamental to testing in some calculations, while other platforms ultimately determine test results irrespective of pre-determined sample sizes, using prior probabilities assigned by the platform, and provide sample size calculators simply as a planning tool.

Let’s take a closer look at each …

Classical statistics 

The MECLABS Test Protocol employs a group of statistical methods based on the “Z-test,” arising from “classical statistics” principles that adopt a Frequentist approach, which makes predictions using only data from the current experiment.

With this method, recent traffic and performance levels are used to compute a single fixed minimum sample size before launching the test.  Status checks are made to detect any potential test setup or instrumentation problems, but LoC (level of confidence) is not computed until the test has reached the pre-established minimum sample size.

While historically the most commonly used for scientific and academic experimental research for the last century, this classical approach is now being met by theoretical and practical competition from tools that use (or incorporate) a different statistical school of thought based upon the principles of Bayesian probability theory. Though Bayesian theory is far from new (Thomas Bayes proposed its foundations more than 250 years ago), its practical application for real-time optimization research required computational speed and capacity only recently available.

Breaking Tradition: Toward optimization breakthroughs

“Among the criticisms of the traditional frequentist approach has been its counterintuitive ‘negative inference’ approach and thought process, accompanied by a correspondingly ‘backwards’ nomenclature. For instance, you don’t ‘prove your hypothesis’ (like normal people), but instead you ‘fail to reject your Null hypothesis’ — I mean, who talks (or thinks) like that?” Kemper said.

He continued, “While Bayesian probability is not without its own weird lexical contrivances (Can you say ‘posterior predictive’?), its inferential frame of reference is more consistent with the way most people naturally think, like assigning the ’probability of a hypothesis being True’ based on your past experience with such things. For a purist Frequentist, it’s impolite (indeed sacrilegious) to go into a test with a preconceived ‘favorite’ or ‘preferred answer.’ One must simply objectively conduct the test and ‘see what the data says.’ As a consequence, the statement of the findings from a typical Bayesian test — i.e., a Bayesian inference — is much more satisfying to a non-specialist in science or statistics than is an equivalent traditional/frequentist one.”

Hybrid approaches

Some platforms use a sequential likelihood ratio test that combines a Frequentist approach with a Bayesian approach. The adjective “sequential” refers to the approach’s continual recalculation of the minimum sample size for sufficiency as new data arrives, with the goal of minimizing the likelihood of a false positive arising from stopping data collection too soon.

Although an online test estimator using this method may give a rough sample size, this method was specifically designed to avoid having to rely on a predetermined sample size, or predetermined minimum effect size. Instead, the test is monitored, and the tool indicates at what point you can be confident in the results.

In many cases, this approach may result in shorter tests due to unexpectedly high effect sizes. But when tools employ proprietary methodologies, the way that minimum sample size is ultimately determined may be opaque to the marketer.
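
For contrast with the classical approach, here is a minimal sketch of one common Bayesian formulation: a Beta-Binomial model with uniform priors, simulated to estimate the probability that the treatment beats the control. The priors, counts and model choice are illustrative assumptions, not any platform's proprietary method.

```python
import random

def prob_treatment_beats_control(conv_control, n_control, conv_treatment, n_treatment,
                                 draws=100_000, seed=42):
    """Monte Carlo estimate of P(treatment conversion rate > control conversion rate).

    Assumes independent Beta(1 + conversions, 1 + non-conversions) posteriors,
    i.e., uniform Beta(1, 1) priors on each rate (an illustrative choice only).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        p_c = rng.betavariate(1 + conv_control, 1 + n_control - conv_control)
        p_t = rng.betavariate(1 + conv_treatment, 1 + n_treatment - conv_treatment)
        wins += p_t > p_c
    return wins / draws

# Hypothetical counts: 300 of 10,000 control conversions vs. 360 of 10,000 treatment conversions
print(f"P(treatment > control) is roughly {prob_treatment_beats_control(300, 10_000, 360, 10_000):.1%}")
```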

CONSIDERATIONS FOR EACH OF THESE APPROACHES

Classical “static” approaches

Classical statistical tests, such as Z-tests, are the de facto standard across a broad spectrum of industries and disciplines, including academia. They arise from the concepts of normal distribution (think bell curve) and probability theory described by mathematicians Abraham de Moivre and Carl Friedrich Gauss in the 17th to 19th centuries. (Normal distribution is also known as Gaussian distribution.)  Z-tests are commonly used in medical and social science research.

They require you to estimate the minimum detectable effect-size before launching the test and then refrain from “peeking at” Level of Confidence until the corresponding minimum sample size is reached.  For example, the MECLABS Sample Size Estimation Tool used with Research Partners requires that our analysts make pre-test estimates of:

  • The projected success rate — for example, conversion rate, clickthrough rate (CTR), etc.
  • The minimum relative difference you wish to detect — how big a difference is needed to make the test worth conducting? The greater this “effect size,” the fewer samples are needed to confidently assert that there is, in fact, an actual difference between the treatments. Of course, the smaller the design’s “minimum detectable difference,” the harder it is to achieve that threshold.
  • The statistical significance level — this is the probability of accidentally concluding there is a difference due to sampling error when really there is no difference (aka Type-I error). MECLABS recommends a five percent statistical significance which equates to a 95% desired Level of Confidence (LoC).
  • The arrival rate in terms of total arrivals per day — this would be your total estimated traffic level if you’re testing landing pages. “For example, if the element being tested is a page in your ecommerce lower funnel (shopping cart), then the ‘arrival rate’ would be the total number of visitors who click the ‘My Cart’ or ‘Buy Now’ button, entering the shopping cart section of the sales funnel and who will experience either the control or an experimental treatment of your test,” Kemper said.
  • The number of primary treatments — for example, this would be two if you’re running an A/B test with a control and one experimental treatment.

Typically, analysts draw upon a forensic data analysis conducted at the outset combined with test results measured throughout the Research Partnership to arrive at these inputs.

“Dynamic” approaches 

Dynamic, or “adaptive” sampling approaches, such as the sequential likelihood ratio test, are a more recent development and tend to incorporate methods beyond those recognized by classical statistics.

In part, these methods weren’t introduced sooner due to technical limitations. Because adaptive sampling employs frequent computational reassessment of sample size sufficiency and may even adjust the balance of incoming traffic among treatments, these methods were impractical until they could be hosted on machines with the computing capacity to keep up.

One potential benefit can be the test duration. “Under certain circumstances (for example, when actual treatment performance is very different from test-design assumptions), tests may be able to be significantly foreshortened, especially when actual treatment effects are very large,” Kemper said.

This is where prior data is so important to this approach. The model can shorten test duration specifically because it takes prior data into account. An attendant limitation is that it can be difficult to identify what prior data is used and exactly how statistical significance is calculated. This doesn’t necessarily make the math any less sound or valid; it just makes it somewhat less transparent. And the quality and applicability of the priors can be critical to the accuracy of the outcome.

As Georgi Z. Georgiev explains in Issues with Current Bayesian Approaches to A/B Testing in Conversion Rate Optimization, “An end user would be left to wonder: what prior exactly is used in the calculations? Does it concentrate probability mass around a certain point? How informative exactly is it and what weight does it have over the observed data from a particular test? How robust with regards to the data and the resulting posterior is it? Without answers to these and other questions an end user might have a hard time interpreting results.”

As with other things unique to a specific platform, it also impinges on the portability of the data, as Georgiev explains:

A practitioner who wants to do that [compare results of different tests run on different platforms] will find himself in a situation where it cannot really be done, since a test ran on one platform and ended with a given value of a statistic of interest cannot be compared to another test with the same value of a statistic of interest ran on another platform, due to the different priors involved. This makes sharing of knowledge between practitioners of such platforms significantly more difficult, if not impossible since the priors might not be known to the user.

Interpreting MECLABS (classical approach) test duration estimates 

At MECLABS, the estimated minimum required sample size for most experiments conducted with Research Partners is calculated using classical statistics. For example, the number of samples needed per treatment for two proportions that are evenly split (uneven splits use a different and slightly more complicated formula) is found by writing the Z statistic for the difference between the two proportions and solving for n.
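
A minimal sketch of that relationship, assuming a single Z statistic at the desired Level of Confidence and a pooled success proportion (consistent with the variable definitions below), is:

\[
z \;=\; \frac{\delta}{\sqrt{\,2\,p\,(1-p)\,/\,n\,}}
\qquad\Longrightarrow\qquad
n \;=\; \frac{2\,p\,(1-p)\,z^{2}}{\delta^{2}}
\]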

Variables:

  • n: the minimum number of samples required per treatment
  • z: the Z statistic value corresponding with the desired Level of Confidence
  • p: the pooled success proportion — a value between 0 – 1 — (i.e., of clicks, conversions, etc.)
  • δ: the difference of success proportions among the treatments

This formula is used for tests that have an even split among treatments.

Once “samples per treatment” (n) has been calculated, it is multiplied by the number of primary treatments being tested to estimate the minimum number of total samples required to detect the specified amount of “treatment effect” (performance lift) with at least the specified Level of Confidence, presuming the selection of test subjects is random.

The estimated test duration, typically expressed in days, is then calculated by dividing the required total sample size by the expected average traffic level, expressed as visitors per day arriving at the test.
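
To make that arithmetic concrete, here is a small, illustrative Python sketch of the classical calculation described above. It is not the MECLABS Sample Size Estimation Tool; the input names and example numbers are assumptions, and the z value is taken from the standard normal distribution at the desired Level of Confidence.

```python
from statistics import NormalDist

def estimate_samples_and_duration(success_rate, min_relative_difference,
                                  confidence=0.95, arrivals_per_day=1_000,
                                  num_treatments=2):
    """Classical (Z-test) sample size and duration estimate for an evenly split test.

    An illustrative sketch of the calculation described above, not the MECLABS
    Sample Size Estimation Tool: it assumes a two-sided Z test at the given
    Level of Confidence, a pooled success proportion and an even traffic split.
    """
    p1 = success_rate
    p2 = success_rate * (1 + min_relative_difference)   # rate at the minimum detectable lift
    p = (p1 + p2) / 2                                    # pooled success proportion
    delta = abs(p2 - p1)                                 # difference of success proportions
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # ~1.96 for 95% confidence

    n_per_treatment = 2 * p * (1 - p) * z**2 / delta**2
    total_samples = n_per_treatment * num_treatments
    duration_days = total_samples / arrivals_per_day
    return n_per_treatment, total_samples, duration_days

# Hypothetical inputs: 3% conversion rate, 15% minimum relative lift, 2,000 arrivals/day
n, total, days = estimate_samples_and_duration(0.03, 0.15, arrivals_per_day=2_000)
print(f"~{n:,.0f} samples per treatment, ~{total:,.0f} total, ~{days:.1f} days")
```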

Finding your way 

“As a marketer using experimentation to optimize your organization’s sales performance, you will find your own style and your own way to your destination,” Kemper said.

“Like travel, the path you choose depends on a variety of factors, including your skills, your priorities and your budget. Getting over the mountains, you might choose to climb, bike, drive or fly; and there are products and service providers who can assist you with each,” he advised.

Understanding sampling method and minimum required sample size will help you to choose the best path for your organization. This article is intended to provide a starting point. Take a look at the links to related articles below for further research on sample sizes in particular and testing in general.

Related Resources

17 charts and tools have helped capture more than $500 million in (carefully measured) test wins

MECLABS Institute Online Testing on-demand certification course

Marketing Optimization: How To Determine The Proper Sample Size

A/B Testing: Working With A Very Small Sample Size Is Difficult, But Not Impossible

A/B Testing: Split Tests Are Meaningless Without The Proper Sample Size

Two Factors that Affect the Validity of Your Test Estimation

Frequentist A/B test (good basic overview by Ethen Liu)

Bayesian vs Frequentist A/B Testing – What’s the Difference? (by Alex Birkett on ConversionXL)

Thinking about A/B Testing for Your Client? Read This First. (by Emīls Vēveris on Shopify)

On the scalability of statistical procedures: why the p-value bashers just don’t get it. (by Jeff Leek on SimplyStats)

Bayesian vs Frequentist Statistics (by Leonid Pekelis on Optimizely Blog)

Statistics for the Internet Age: The Story Behind Optimizely’s New Stats Engine (by Leonid Pekelis on Optimizely Blog)

Issues with Current Bayesian Approaches to A/B Testing in Conversion Rate Optimization (by Georgi Z. Georgiev on Analytics-Toolkit.com)

 


A/B Testing Prioritization: The surprising ROI impact of test order

I want everything. And I want it now.

I’m sure you do, too.

But let me tell you about my marketing department. Resources aren’t infinite. I can’t do everything right away. I need to focus myself and my team on the right things.

Unless you found a genie in a bottle and wished for an infinite marketing budget (right after you wished for unlimited wishes, natch), I’m guessing you’re in the same boat.

When it comes to your conversion rate optimization program, it means running the most impactful tests. As Stephen Walsh said when he wrote about 19 possible A/B tests for your website on Neil Patel’s blog, “testing every random aspect of your website can often be counter-productive.”

Of course, you probably already know that. What may surprise you is this …

It’s not enough to run the right tests, you will get a higher ROI if you run them in the right order

To help you discover the optimal testing sequence for your marketing department, we’ve created the free MECLABS Institute Test Planning Scenario Tool (MECLABS is the parent research organization of MarketingExperiments).

Let’s look at a few example scenarios.

Scenario #1: Level of effort and level of impact

Tests will have different levels of effort to run. For example, it’s easier to make a simple copy change to a headline than to change a shopping cart.

This level of effort (LOE) sometimes correlates to the level of impact the test will have to your bottom line. For example, a radical redesign might be a higher LOE to launch, but it will also likely produce a higher lift than a simple, small change.

So how does the order in which you run a high-effort, high-return test and a low-effort, low-return test affect results? Again, we’re not saying choose one test over another. We’re simply talking about timing. To the test planning scenario tool …

Test 1 (Low LOE, low level of impact)

  • Business impact — 15% more revenue than the control
  • Build Time — 2 weeks

Test 2 (High LOE, high level of impact)

  • Business impact — 47% more revenue than the control
  • Build Time — 6 weeks

Let’s look at the revenue impact over a six-month period. According to the test planning tool, if the control is generating $30,000 in revenue per month, running a test where the treatment has a low LOE and a low level of impact (Test 1) first will generate $22,800 more revenue than running a test where the treatment has a high LOE and a high level of impact (Test 2) first.

Scenario #2: An even larger discrepancy in the level of impact

It can be hard to predict the exact level of business impact. So what if the business impact differential between the two tests is even greater than in Scenario #1, and both treatments perform even better than they did in Scenario #1? How would test sequence affect results in that case?

Let’s run the numbers in the Test Planning Scenario Tool.

Test 1 (Low LOE, low level of impact)

  • Business impact — 25% more revenue than the control
  • Build Time — 2 weeks

Test 2 (High LOE, high level of impact)

  • Business impact — 125% more revenue than the control
  • Build Time — 6 weeks

According to the test planning tool, if the control is generating $30,000 in revenue per month, running Test 1 (low LOE, low level of impact) first will generate $45,000 more revenue than running Test 2 (high LOE, high level of impact) first.

Again, same tests (over a six-month period) just a different order. And you gain $45,000 more in revenue.

“It is particularly interesting to see the benefits of running the lower LOE and lower impact test first so that its benefits could be reaped throughout the duration of the longer development schedule on the higher LOE test. The financial impact difference — landing in the tens of thousands of dollars — may be particularly shocking to some readers,” said Rebecca Strally, Director, Optimization and Design, MECLABS Institute.

Scenario #3: Fewer development resources

In the above two examples, the tests were able to be developed simultaneously. What if the tests cannot be developed simultaneously (they must be developed sequentially), and each cannot be developed until the previous test has been implemented? Perhaps this is because of your organization’s development methodology (Agile vs. Waterfall, etc.), or there is simply a limit on your development resources. (Your developers likely have many other projects besides your tests.)

Let’s look at that scenario, this time with three treatments.

Test 1 (Low LOE, low level of impact)

  • Business impact — 10% more revenue than the control
  • Build Time — 2 weeks

Test 2 (High LOE, high level of impact)

  • Business impact — 360% more revenue than the control
  • Build Time — 6 weeks

Test 3 (Medium LOE, medium level of impact)

  • Business impact — 70% more revenue than the control
  • Build Time — 3 weeks

In this scenario, the two highest-performing sequences were Test 2, then Test 1, then Test 3, and Test 2, then Test 3, then Test 1. The lowest-performing sequence was Test 3, then Test 1, then Test 2. Using one of the highest-performing sequences rather than the lowest-performing one produced $894,000 more revenue.

“If development for tests could not take place simultaneously, there would be a bigger discrepancy in overall revenue from different test sequences,” Strally said.

“Running a higher LOE test first suddenly has a much larger financial payoff. This is notable because once the largest impact has been achieved, it doesn’t matter in what order the smaller LOE and impact tests are run, the final dollar amounts are the same. Development limitations (although I’ve rarely seen them this extreme in the real world) created a situation where whichever test went first had a much longer opportunity to impact the final financial numbers. The added front time certainly helped to push running the highest LOE and impact test first to the front of the financial pack,” she added.
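
If you want to play with a rough model of this kind of sequential-development scenario before opening the tool, here is a deliberately simplified Python sketch. It assumes builds happen one after another, each winning lift is realized the moment its build finishes, and lifts compound; the MECLABS Test Planning Scenario Tool applies its own assumptions, so its dollar figures (including those quoted in the scenarios above) will differ. The example lifts and build times below are hypothetical.

```python
def cumulative_revenue(tests, monthly_baseline=30_000, horizon_weeks=26):
    """Cumulative revenue over the horizon for one test sequence (simplified model).

    Assumptions (a sketch, not the MECLABS Test Planning Scenario Tool):
    - tests is an ordered list of (relative_lift, build_weeks) tuples
    - builds happen back to back; each winning lift applies from the end of its
      build through the end of the horizon, and lifts compound multiplicatively
    """
    weekly_baseline = monthly_baseline * 12 / 52
    factor, week, total = 1.0, 0, 0.0
    for lift, build_weeks in tests:
        built_at = min(week + build_weeks, horizon_weeks)
        total += weekly_baseline * factor * (built_at - week)  # revenue while this test is built
        factor *= 1 + lift                                     # winner implemented; lift compounds
        week = built_at
    total += weekly_baseline * factor * (horizon_weeks - week)  # remainder of the horizon
    return total

# Hypothetical tests: a quick 10% winner (1-week build) vs. a larger 40% winner (8-week build)
quick, big = (0.10, 1), (0.40, 8)
print(f"Quick test first: ${cumulative_revenue([quick, big]):,.0f}")
print(f"Big test first:   ${cumulative_revenue([big, quick]):,.0f}")
```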

The Next Scenario Is Up To You: Now forecast your own most profitable test sequences

You likely don’t have the exact perfect information we provided in the scenarios. We’ve provided model scenarios above, but the real world can be trickier. After all, as Nobel Prize-winning physicist Niels Bohr said, “Prediction is very difficult, especially if it’s about the future.”

“We rarely have this level of information about the possible financial impact of a test prior to development and launch when working to optimize conversion for MECLABS Research Partners. At best, the team often only has a general guess as to the level of impact expected, and it’s rarely translated into a dollar amount,” Strally said.

That’s why we’re providing the Test Planning Scenario Tool as a free, instant download. It’s easy to run a few different scenarios in the tool based on different levels of projected results and see how the test order can affect overall revenue. You can then use the visual charts and numbers created by the tool to make the case to your team, clients and business leaders about what order you should run your company’s tests.

Don’t put your tests on autopilot

Of course, things don’t always go according to plan. This tool is just a start. To have a successful conversion optimization practice, you have to actively monitor your tests and advocate for the results because there are a number of additional items that could impact an optimal testing sequence.

“There’s also the reality of testing which is not represented in these very clean charts. For example, things like validity threats popping up midtest and causing a longer run time, treatments not being possible to implement, and Research Partners requesting changes to winning treatments after the results are in, all take place regularly and would greatly shift the timing and financial implications of any testing sequence,” Strally said.

“In reality though, the number one risk to a preplanned DOE (design of experiments) in my experience is an unplanned result. I don’t mean the control winning when we thought the treatment would outperform. I mean a test coming back a winner in the main KPI (key performance indicator) with an unexpected customer insight result, or an insignificant result coming back with odd customer behavior data. This type of result often creates a longer analysis period and the need to go back to the drawing board to develop a test that will answer a question we didn’t even know we needed to ask. We are often highly invested in getting these answers because of their long-term positive impact potential and will pause all other work — lowering financial impact — to get these questions answered to our satisfaction,” she said.

Related Resources

MECLABS Institute Online Testing on-demand certification course

Offline and Online Optimization: Cabela’s shares tactics from 51 years of offline testing, 7 years of digital testing

Landing Page Testing: Designing And Prioritizing Experiments

Email Optimization: How To Prioritize Your A/B Testing


In Conversion Optimization, The Loser Takes It All

Most of us at some point in our lives have experienced that creeping, irrational fear of failure, of being an imposter in our chosen profession or deemed “a Loser” for not getting something right the first time. This is especially true for marketers who work in A/B testing and conversion optimization.

We are constantly tasked with creating new, better experiences for our company or client and in turn the customers they serve. Yet unlike many business ventures or fire-and-forget ad agency work, we then willingly set out to definitively prove that our new version is better than the old, thus throwing ourselves upon the dual fates of customer decision making and statistical significance.

And that’s when the sense of failure begins to creep in, when you have to present a losing test to well-meaning clients or peers who were so convinced that this was a winner, a surefire hit. The initial illusion they had — that you knew all the right answers — so clinically shattered by that negative percentage sign in front of your results.

Yet herein lies the mistake: both of the client or peer who understandably needs quick, short-term results, and of the marketer whose bravado convinces them they can always get it right the first time.

A/B testing and conversion optimization, like the scientific method these disciplines apply to marketing, is merely a process to get you to the right answer, and to view it as the answer itself is to mistake the map for the territory.

I was reminded of this the other day while listening to one of my favorite science podcasts, “The Skeptics’ Guide to the Universe,” hosted by Dr. Steven Novella, which ends each week with a relevant quote. That week they quoted Brazilian-born British zoologist and Nobel Prize winner Sir Peter B. Medawar (1915-1987), who wrote in his 1979 book “Advice to a Young Scientist”: “All experimentation is criticism. If an experiment does not hold out the possibility of causing one to revise one’s views, it is hard to see why it should be done at all.”

This quote captures many of the truisms I’ve learnt in my experience as a conversion optimization marketer, and it also addresses much of the confusion I see among MECLABS Institute Research Partners and colleagues who are less familiar with the nature and process of conversion optimization.

Here are four points to keep in mind if you choose to take a scientific approach to your marketing:

1. If you truly knew what the best customer experience was, then you wouldn’t test

After presenting a thoroughly researched outline of planned testing, I have been asked whether, much as the methodical learning process we had just outlined was appreciated, we knew a shortcut we could take to get to a big success.

Now, this is a fully understandable sentiment, especially in the business world where time is money and everyone needs to meet their targets yesterday. That said, the question fundamentally misses the value of conversion optimization testing, if not the value of the scientific method itself. Remember, this method of inquiry has allowed us — through experimentation and the repeated failure of educated but ultimately false hypotheses — to finally develop the correct hypothesis and understanding of the available facts. As a result, we are able to cure disease, put humans on the moon and develop better-converting landing pages.

In the same vein, as marketers we can do in-depth data and customer research to get us closer to identifying the correct conversion problems in a marketing funnel and to work out strong hypotheses about what the best solutions are, but ultimately we can’t know the true answer until we test it.

A genuine scientific experiment should try to prove itself wrong as much as it tries to prove itself right. Only by testing and discarding our false hypotheses can we as marketers confirm the true hypothesis: the correct interpretation of the available data and understanding of our customers that allows us to achieve the big success we seek for our clients and their customers.

2. If you know the answer, just implement it

This particularly applies to broken elements in your marketing or conversion funnel.

An example from my own recent experience with a client: in our initial forensic conversion analysis of their site, we noticed that the design of their cart made it almost impossible to convert on a small mobile or desktop screen if you had more than two products in your cart.

Looking at the data and the results from our own user testing, we could see that this was clearly broken and not just an underperformance. So we just recommended that they fix it, which they did.

We were then able to move on and optimize the now-functioning cart and lower funnel through testing, rather than wasting everyone’s time with a test that was a foregone conclusion.

3. If you see no compelling reason why a potential test would change customer behavior, then don’t do it

When creating the hypothesis (the supposition that can be supported or refuted via the outcome of your test), make sure it is a hypothesis based upon an interpretation of available evidence and a theory about your customer.

Running the test should teach you something about both your interpretation of the data and the empathetic understanding you think you have of your customer.

If running the test will do neither, then it is unlikely to be impactful and probably not worth running.

4. Make sure that the changes you make are big enough and loud enough to impact customer behavior

You might have data to support the changes in your treatment and a well-thought-out customer theory, but if the changes are implemented in a way that customers won’t notice, then you are unlikely to elicit the behavior change you expect to see, and you lose any possibility of learning something.

Failure is a feature, not a bug

So next time you are feeling like a loser, when you are trying to explain why your conversion optimization test lost:

  • Remind your audience that educated failure is an intentional part of the process.
  • Focus on what you learnt about your customer and how you have improved upon your initial understanding of the data.
  • Explain how you helped the client avoid implementing the initial “winning idea” that, it turns out, wasn’t such a winner — and all the money this saved them.

Remember, like all scientific testing, conversion optimization might be slow, methodical and paved with losing tests, but it is ultimately the only guaranteed way to build repeatable, iterative, transferable success across a business.

Related Resources:

Optimizing Headlines & Subject Lines

Consumer Reports Value Proposition Test: What You Can Learn From A 29% Drop In Clickthrough

MarketingExperiments Research Journal (Q1 2011) — See “Landing Page Optimization: Identifying friction to increase conversion and win a Nobel Prize” starting on page 106


Conversion Optimization Testing: Validity threats from running multiple tests at the same time

A/B testing is popular among marketers and businesses because it gives you a way to determine what really works between two (or more) options.

However, to truly extract value from your testing program, it requires more than simply throwing some headlines or images into a website testing tool. There are ways you can undermine your testing tool that the tool itself can’t prevent.

It will still spit out results for you. And you’ll think they’re accurate.

These are called validity threats. In other words, they threaten the ability of your test to give you information that accurately reflects what is really happening with your customer. Instead, you’re seeing skewed data from not running the test in a scientifically sound manner.

In the MECLABS Institute Online Testing certification course, we cover validity threats like history effect, selection effect, instrumentation effect and sampling distortion effect. In this article, we’ll zoom in on one example of a selection effect that might cause a validity threat and thus misinterpretation of results — running multiple tests at the same time — which increases the likelihood of a false positive.

Interaction Effect — different variations in the tests can influence each other and thus skew the data

The goal of an experiment is to isolate a scenario that accurately reflects how the customer experiences your sales and marketing path. If you’re running two tests at the same time, the first test could influence how they experience the second test and therefore their likelihood to convert.

This is a psychological phenomenon known as priming. If we talk about the color yellow and then I ask you to mention a fruit, you’re more likely to answer banana. But if we talk about red and I ask you to mention a fruit, you’re more likely to answer apple. 

Another way the interaction effect can threaten validity is through a selection effect. In other words, the way you advertise near the beginning of the funnel affects the type of customer, and the motivations of the customer, you’re bringing through your funnel.

Taylor Bartlinski, Senior Manager, Data Analytics, MECLABS Institute, provides this example:

“We run an SEO test where a treatment that uses the word ‘cheap’ has a higher clickthrough rate than the control, which uses the word ‘trustworthy.’ At the same time, we run a landing page test where the treatment also uses the word ‘cheap’ and the control uses ‘trustworthy.’  The treatments in both tests with the ‘cheap’ language work very well together to create a higher conversion rate, and the controls in each test using the ‘trustworthy’ language work together just as well.  Because of this, the landing page test is inconclusive, so we keep the control. Thus, the SEO ad with ‘cheap’ language is implemented and the landing page with ‘trustworthy’ language is kept, resulting in a lower conversion rate due to the lack of continuity in the messaging.”

Running multiple tests and hoping for little to no validity threat

The level of risk depends on the size of the change and the amount of interaction. However, that can be difficult to gauge before, and even after, the tests are run.

“Some people believe (that) unless you suspect extreme interactions and huge overlap between tests, this is going to be OK. But it is difficult to know to what degree you can suspect extreme interactions. We have seen very small changes have very big impacts on sites,” Bartlinski says.

Another example Bartlinski provides is one in which there is little interaction between the tests. For example, testing PPC landing pages that do not interact with organic landing pages that are part of another test — or testing separate things in mobile and desktop at the same time. “This lowers the risk, but there still may be overlap. It’s still an issue if a percentage gets into both tests; not ideal if we want to isolate findings and be fully confident in customer learnings,” Bartlinski said.

How to overcome the interaction effect when testing at the speed of business

In a perfect scientific experiment, multiple tests would not be run simultaneously. However, science often has the luxury of moving at the speed of academia. In addition, many scientific experiments are seeking to discover knowledge that can have life or death implications.

If you’re reading this article, you likely don’t have the luxury of taking as much time with your tests. You need results — and quick. You also are dealing with business risk, and not the high stakes of, for example, human life or death.

There is a way to run simultaneous tests while limiting validity threats — running multiple tests on (or leading to) the same webpage but splitting traffic so people do not see different variations at the same time.

“Running mutually exclusive tests will eliminate the above validity threats and will allow us to accurately determine which variations truly work best together,” Bartlinski said.

There is a downside though. It will slow down testing since an adequate sample size is needed for each test. If you don’t have a lot of traffic, it may end up taking the same amount of time as running tests one after another.
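In practice, mutually exclusive tests are often implemented with deterministic bucketing: each visitor ID is hashed once to decide which experiment that visitor is eligible for, and hashed again to pick the variation within that experiment. The sketch below is a minimal illustration of that idea; the experiment names, the even split and the `visitor_id` parameter are assumptions for the example, not features of any particular testing tool.

```python
import hashlib

# Illustrative experiment names, not from the article.
EXPERIMENTS = ["seo_message_test", "landing_page_test"]

def _bucket(key: str, buckets: int) -> int:
    """Deterministically map a string key to an integer in [0, buckets)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets

def assign(visitor_id: str):
    """Give each visitor exactly one experiment, then one variation within it."""
    experiment = EXPERIMENTS[_bucket(visitor_id + ":experiment", len(EXPERIMENTS))]
    variation = "control" if _bucket(visitor_id + ":" + experiment, 2) == 0 else "treatment"
    return experiment, variation

print(assign("visitor-12345"))
```

Because each experiment now sees only a share of total traffic, this is exactly where the sample-size tradeoff described above comes from.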

What’s the big idea?

Another important factor to consider is that the results from grouping the tests should lead to a new understanding of the customer — or what’s the point of running the test?

Bartlinski explains, “Grouping tests makes sense if tests measure the same goal (e.g., reservations), they’re in the same flow (e.g., same page/funnel), and you plan to run them for the same duration.”

The messaging should be parallel as well so you get a lesson. Pointing a treatment ad that focuses on cost to a treatment landing page that focuses on luxury, and pointing a treatment ad that focuses on luxury to a landing page that focuses on cost, will not teach you much about your customer’s motivations.

If you’re running multiple tests on different parts of the funnel and aligning them, you should think of each flow as a test of a certain assumption about the customer as part of your overall hypothesis.

It is similar to a radical redesign. Much like testing multiple steps of the funnel can cause an interaction effect, testing multiple elements on a single landing page or in a single email can cause an attribution issue. Which change caused the result we see?

Bartlinski provides this example, “On the same landing page, we run a test where both the call-to-action (CTA) and the headline have been changed in the treatment. The treatment wins, but is it because of the CTA change or the headline? It is possible that the increase comes exclusively from the headline, while the new CTA is actually harming the clickthrough rate. If we tested the headline in isolation, we would be able to determine whether the combination of the new headline and old CTA actually has the best clickthrough, and we are potentially missing out on an even bigger increase.”

While running single-factorial A/B tests is the best way to isolate variables and determine with certainty which change caused a result, if you’re testing at the speed of business you don’t have that luxury. You need results and you need them now!

However, if you align several changes in a single treatment around a common theme that represents something you’re trying to learn about the customer (aka radical redesign), you can get a lift while still attaining a customer discovery. And then, in follow-up single-factorial A/B tests, narrow down which variables had the biggest impact on the customer.

Another cause of an attribution issue is running multiple tests on different parts of a landing page because you assume they don’t interact. Perhaps you run a test on two different ways to display locations on a map in the upper left corner of the page. Then a few days later, while that test is still running, you launch a second test on the same page, but in the lower right corner, on how star ratings are displayed in the results.

You could assume these two changes won’t have an effect on each other. However, the variables haven’t been isolated from the tests, and they might influence each other. Again, small changes can have big effects. The speed of your testing might necessitate testing like this; just know the risk involved in terms of skewed results.

To avoid that risk, you could run multivariate tests or mutually exclusive tests, which essentially turn each combination of the variables into a separate treatment (see the sketch below). Again, the “cost” is that it takes longer for the test to reach a statistically significant sample size, since the traffic is split among more treatments.
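As a rough illustration of that cost: if every combination needs roughly the same per-arm sample size, the time to significance scales with the number of arms you split traffic across. The figures below are assumptions for the example, not data from the article.

```python
# Back-of-the-envelope math with assumed numbers, for illustration only.
required_per_arm = 5_000   # visitors needed in each treatment for a valid read
daily_traffic = 2_000      # visitors per day entering the test

for arms in (2, 4, 8):     # e.g., a simple A/B vs. 2x2 vs. 2x2x2 combinations
    days = required_per_arm * arms / daily_traffic
    print(f"{arms} arms -> about {days:.0f} days to reach sample size")
```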

Test strategically

The big takeaway here is — you can’t simply trust a split testing tool to give you accurate results. And it’s not necessarily the tool’s fault. It’s yours. The tool can’t possibly know ways you are threatening the validity of your results outside that individual split test.

If you take a hypothesis-driven approach to your testing, you can test fast AND smart, getting a result that accurately reflects the real-world situation while discovering more about your customer.

You might also like:

Online Testing certification course — Learn a proven methodology for executing effective and valid experiments

Optimization Testing Tested: Validity threats beyond sample size

Validity Threats: 3 tips for online testing during a promotion (if you can’t avoid it)

B2B Email Testing: Validity threats cause Ferguson to miss out on lift from Black Friday test

Validity Threats: How we could have missed a 31% increase in conversions


Call Center Optimization: How a nonprofit increased donation rate 29% with call center testing

If you’ve read MarketingExperiments for any length of time, you know that most of our marketing experiments occur online because we view the web as a living laboratory.

However, if your goal is to learn more about your customers so you can practice customer-first marketing and improve business results, don’t overlook other areas of customer experimentation as well.

To wit, this article is about a MECLABS Institute Research Partner who engaged in call center testing.

Overall Research Partnership Objective

Since the Research Partner was a nonprofit, the objective of the overall partnership focused on donations. Specifically, to increase the total amount of donations (number and size) given by both current and prospective members.

While MECLABS engaged with the nonprofit in digital experimentation as well (for example, on the donation form), the telephone was a key channel for this nonprofit to garner donations.

Call Script Test: Initial Analysis

After analyzing the nonprofit’s call scripts, the MECLABS research analysts identified several opportunities for optimization. For the first test, they focused on the script’s failure to establish rapport with the caller and on its mention of only a $20-per-month donation, which mentally created a ceiling for the donation amount.

Based on that analysis, the team formulated a test. The team wanted to see if they could increase overall conversion rate by establishing rapport early in the call. The previous script jumped in with the assumption of a donation before connecting with the caller.

Control Versus Treatment

In digital A/B testing, traffic is split between a control and treatment. For example, 50% of traffic to a landing page is randomly selected to go to the control. And the other 50% is randomly selected to go to the treatment that includes the optimized element or elements: optimized headline, design, etc. Marketers then compare performance to see if the tested variable (e.g., the headline) had an impact on performance.

In this case, the Research Partner had two call centers. To run this test, we provided optimized call scripts to one call center and left the other call center as the control.

We made three key changes in the treatment with the following goals in mind:

  • Establish greater rapport at the beginning of the call: The control went right into asking for a donation (“How may I assist you in giving today?”), while the treatment asked for the caller’s name and expressed gratitude for their previous giving.
  • Leverage choice framing by recommending $20/month, $40/month, or more: The control only mentioned the $20/month option. The addition of options allows potential donors to make a choice and not have only one option thrust upon them.
  • Include an additional one-time cause-related donation for both monthly givers and other appropriate calls: The control did not ask for a one-time additional donation. The ongoing donation supported the nonprofit’s overall mission; however, the one-time donation provided another opportunity for donors to give by tying specifically into a real-time pressing matter that the nonprofit’s leaders were focused on. If they declined to give more per month for financial reasons, they were not asked about the one-time donation.

To calibrate the treatment before the experimentation began, a MECLABS researcher flew to the call center site to train the callers and pretest the treatment script.

While the overall hypothesis stayed the same, after four hours of pretesting the callers reconvened to make minor tweaks to the wording. It was important to preserve the key components of the hypothesis; however, the callers could adjust the phrasing so the script stayed in their own voice.

The treatment was used on a large enough sample size — in this case, 19,655 calls — to detect a statistically valid difference between the control and the treatment.

Results

The treatment script increased the donation rate from 14.32% to 18.47% at a 99% Level of Confidence for a 29% relative increase in the donation rate.
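For readers who want to check results like this themselves, here is a minimal Python sketch of the relative lift and a standard two-proportion z-test. The article does not publish how the 19,655 calls were split between the two call centers, so an even split is assumed purely for illustration.

```python
from math import sqrt

# Donation rates reported above; the per-center call counts are assumed (even split).
p_control, p_treatment = 0.1432, 0.1847
n_control = n_treatment = 19_655 // 2

relative_lift = (p_treatment - p_control) / p_control   # ~0.29, i.e., a 29% relative increase

# Standard two-proportion z-test with a pooled standard error.
pooled = (p_control * n_control + p_treatment * n_treatment) / (n_control + n_treatment)
se = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
z = (p_treatment - p_control) / se

print(f"Relative lift: {relative_lift:.0%}")
print(f"z = {z:.1f} (a z of roughly 2.58 corresponds to a 99% level of confidence)")
```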

Customer Insights

The benefits of experimentation go beyond the incremental increase in revenue from this specific test. By running the experiment in a rigorously scientific fashion — accounting for validity threats and formulating a hypothesis — marketers can build a robust customer theory that helps them create more effective customer-first marketing.

In this case, the “customers” were donors. After analyzing the data in this experiment, the team discovered three customer insights:

  • Building rapport on the front end of the script generated a greater openness with donors and made them more likely to consider donating.
  • Asking for a one-time additional donation was aligned with the degree of motivation for many of the callers. The script realized a 90% increase in one-time gifts.
  • The team discovered an overlooked customer motivation — to make one-time donations, not only the ongoing donations sought by the organization. Part of the reason may be that the ideal donors were in an older demographic, which made it difficult for them to commit at a long-term macro level and much easier to commit at a one-time micro level. (It also gave the nonprofit an opportunity to tap into not only the overall motivation of contributing to the organization’s mission but also the motivation to contribute to a specific, timely issue.)

The experimentation allowed the calling team to look at their role in a new way. Many had been handling these donors’ calls for several years, even decades, and there was an initial resistance to the script. But once they saw the results, they were more eager to do future testing.

Can You Improve Call Center Performance?

Any call center script is merely a series of assumptions. Whether your organization is nonprofit or for-profit, B2B or B2C, you must ask a fundamental question — what assumptions are we making about the person on the other end of the line with our call scripts?

And the next step is — how can we learn more about that person to draft call center scripts with a customer-first marketing approach that will ultimately improve conversion?

You can follow Daniel Burstein, Senior Director, Content & Marketing, MarketingExperiments, on Twitter @DanielBurstein.

You Might Also Like

Lead Nurturing: Why good call scripts are built on storytelling

Online Ads for Inbound Calls: 5 Tactics to get customers to pick up the phone

B2B Lead Generation: 300% ROI from email and teleprospecting combo to house list

Learn more about MECLABS Research Partnerships


7 Lessons for Testing with Limited Data and Resources

It feels like there’s never enough time, money or headcount to do marketing testing and optimization “right.” The BIG WIN in our marketing experimentation, whether it is conversions, revenue or leads, never seems to come quickly enough. I’ve been there with you. However, after running 33+ tests in 18 months with our team, I can testify that there are simple lessons and tools to help you optimize your marketing campaigns.

Here is a list of 7 simple, effective lessons and tools to leverage in your marketing optimization.

1. Determine optimal testing sequence based on projected impact, level of importance and level of effort

Marketers are continually asking, “How do I know what to test first?” I recommend prioritizing tests with higher potential revenue impact first. Remember to factor in the time for key phases of testing (build time, run time, analysis and decision time, implementation time), as well as the organization’s ability to develop and implement tests simultaneously.

Sample Test Sequence Calculation Sheet
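The calculation sheet above shows one way to do this. A simple stand-in is to score each candidate test by projected impact per week of total effort, where effort covers build, run, analysis/decision and implementation time. The sketch below uses made-up test names and numbers; it illustrates the idea only and is not the MECLABS sheet itself.

```python
# A simple prioritization sketch with assumed names and figures.
candidate_tests = [
    {"name": "Checkout copy", "projected_monthly_impact": 12_000,
     "build": 2, "run": 3, "analysis": 1, "implementation": 1},
    {"name": "Pricing page", "projected_monthly_impact": 30_000,
     "build": 6, "run": 4, "analysis": 1, "implementation": 2},
    {"name": "Lead form", "projected_monthly_impact": 8_000,
     "build": 1, "run": 2, "analysis": 1, "implementation": 1},
]

for t in candidate_tests:
    total_weeks = t["build"] + t["run"] + t["analysis"] + t["implementation"]
    t["score"] = t["projected_monthly_impact"] / total_weeks  # impact per week of effort

for t in sorted(candidate_tests, key=lambda t: t["score"], reverse=True):
    print(f"{t['name']}: {t['score']:,.0f} projected $/week of effort")
```

Sorting by that score puts high-impact, low-effort tests first while still penalizing tests that tie up development for a long time.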

2. Save development time and money by creating wireframes and prototypes first

Get organizational buy-in using landing page, email, etc. prototypes before spending time and resources on development. Axure is my personal wireframe tool of choice because of the drag and drop nature of the pre-built widgets, the ability to lay out the page with pixel perfect accuracy, and the flexibility of the prototyping functions.

Wireframe [in Axure RP], then develop your landing pages, saving time and money on revisions

3. Determine minimum sample size, test duration and level of confidence

Before testing, determine your necessary sample size, estimated test duration and number of treatments to achieve test results you can feel confident in. Save yourself the frustration and time of conducting inconclusive tests that do not achieve a desired level of confidence. Here’s a simple, free tool to help you get started.

Free Testing Tool from MECLABS
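The free tool above handles this for you. For anyone curious about the underlying math, here is a minimal sketch using the standard two-proportion sample-size approximation; the baseline conversion rate, minimum detectable lift and traffic figures are assumptions for the example.

```python
from statistics import NormalDist

def sample_size_per_arm(baseline_rate, min_detectable_lift,
                        confidence=0.95, power=0.80):
    """Visitors needed in each treatment to detect a relative lift at the given
    confidence and power (standard two-proportion approximation)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + min_detectable_lift)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return numerator / (p2 - p1) ** 2

# Illustrative inputs: a 3% baseline conversion rate, a 15% relative lift you
# would like to detect, and an assumed amount of traffic per treatment.
n = sample_size_per_arm(0.03, 0.15)
daily_visitors_per_arm = 500
print(f"About {n:,.0f} visitors per treatment, roughly {n / daily_visitors_per_arm:.0f} days each")
```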

4. Get qualitative, crowd-sourced feedback before launching

Leverage User Testing to better understand consumer preferences and avoid the marketer’s blind spot.

5. Content test emails for increased clickthrough rates

Increase open rates with subject, from-field, and time/day testing; increase clickthrough rates with content testing. Test up to eight versions of each email for increased performance. I am personally a fan of Mailchimp Pro and have driven a lot of sales through this easy to operate ESP.

6. Results are good, repeatable methods are better

A single test that increases conversion rates, sales and revenue is worth celebrating. However, developing customer insights that allow you to apply what you learned to future marketing campaigns will result in a much larger cumulative impact on the business. What was the impact from the test, AND more importantly, what did we learn?

Summarize results AND customer insights after each test

7. Bring your team with you

Organizational transformation toward a culture of testing and optimization only occurs when others believe in it with you. The most practical education I received to increase my marketing performance was from the world’s first (and only) graduate education program focused specifically on how to help marketers increase conversion. Having three people from our organization in the program changed how we talk about marketing. We moved from making decisions based on our intuition to testing our hypotheses to improve performance. 

Sample lecture from Graduate Certificate Program with Dr. Flint McGlaughlin


The Hypothesis and the Modern-Day Marketer

On the surface, the words “hypothesis” and “marketing” seem like they would never be in the same sentence, let alone the same paragraph. Hypotheses are for scientists in fancy lab coats, right? I totally understand this perspective because, unless A/B testing and CRO (conversion rate optimization) are part of your company’s culture, you may ask yourself, “Where would I ever use a hypothesis in my daily activities?” To this question, I would answer “EVERYWHERE.”

By everywhere, I don’t just mean for your marketing collateral but also for any change within your company. For example, I oversee the operations of our Research Partnerships, and when making a team structure change earlier this year, I created a hypothesis that over the next few months I will prove or disprove. Here’s a modified version of this hypothesis:

IF we centralize partnership launches BY creating a dedicated Research Partner launch team and by building out processes that systematically focus on the most critical business objectives from the start, we WILL increase efficiency and effectiveness (and ultimately successful Partnerships) BECAUSE our Research Partners place high value on efforts that achieve their most critical business objectives first.

Can your offers and messaging be optimized?

The American Marketing Association defines marketing as “the activity, set of institutions, and processes for creating, communicating, delivering, and exchanging offerings that have value for customers, clients, partners, and society at large.”

So, if you are reading this blog right now and your job function reflects the above definition even in the slightest, can you answer one question for me: Do you have 100% confidence that each offer and message you create and deliver to your customers is the best it could possibly be?

If you answered yes, then our country desperately needs you. Please apply here: https://www.usa.gov/government-jobs.

If you answered no, then you are like most of us (and Socrates, by the way, i.e., “I know that I know nothing”) and you should have hypothesis creation operationalized into your process, because why wouldn’t you want to test message A versus message B to a small audience before sending an email out to your entire list?

How to create a marketing hypothesis

So first, let’s discuss what a hypothesis is and how to create one.

While there are several useful definitions of the word hypothesis, in Session 05: Developing an Effective Hypothesis of the University of Florida’s graduate course MMC 5422, Flint McGlaughlin proposes the following definition as a useful starting point for behavioral testing: “A hypothesis is a supposition or proposed statement made on the basis of limited evidence that can be supported or refuted and is used as a starting point for further investigation.”

Now that we know what we are looking for, we need a tool to help us get there. In Session 04: Crafting a Powerful Research Question of the same course, McGlaughlin reveals that this tool is the MECLABS Discovery Triad, a conceptual thinking tool that leads to the creation of an effective hypothesis — the “h” in the center of the triad represents the hypothesis.

Before creating a hypothesis, the scientists at MECLABS Institute use this Discovery Triad to complete the following steps for all of our Research Partners:

  1. We uncover the business objective (or business question) driving the effort. Typically, we find two patterns regarding the business objective. First, it is broader in scope than a research question that would be suitable for an experiment. Second, this objective takes the form of a question, which typically starts with the interrogative “How.” For example, “How do I get more leads?” “How do I drive more traffic?” or “How do I increase clickthrough rate (CTR)?” My business objective in the examples above was, “How do I create a more valuable research partnership from the perspective of the research partner?”
  2. Now that we are focused on an objective, we ask a series of “What” questions. For example, “What is happening on this webpage?” or “Where are visitors to page A going if they do not make it to page B?” Essentially, we are looking to understand what the data can tell us about the behavior of the prospective customer. This series of “What” questions should encompass both quantitative questions (e.g., on-page clicks, next-page visits) and qualitative questions (e.g., What is the customer’s experience on this page?).
  3. We ask a question which starts with the interrogative “Why.” A “Why” question enables us to make a series of educated guesses as to why the customer is behaving in a certain way. For example, “Why are 75% of visitors not clicking the ‘Continue to Checkout’ button?” “Why are 20% of shoppers not adding the blue widget to their cart?” or “Why are only 5% of visitors starting the free trial from this page?” To answer “Why” questions, the research scientists at MECLABS apply the patented Conversion Heuristic to the page:
    1. What can we remove, add or change to minimize perceived cost?
    2. What can we remove, add or change to intensify perceived value?
    3. What can we remove, add or change to leverage motivation?

  4. We ask a second, more refined, “How” question (research question) that identifies the best testing objective. For example, if your business question was, “How do we sell more blue widgets?” and during the “What” stage, you analyzed your entire funnel, discovering that the biggest performance leak is on your blue widget product page, then your Research Question could be something like, “How do we increase CTR from the blue widget product page to the shopping cart checkout page?”

Essentially, a powerful Research Question focuses your broader Business Question around a specific subject of research. After all, there are many ways to sell more blue widgets but only a handful of possible ways to sell more blue widgets from the product page.

The four components of a hypothesis

With a powerful Research Question created, you are now ready to develop a series of hypotheses that will help you discover how to express your offer to achieve more of your desired customer behavior using the MECLABS Four-step Hypothesis Development Framework:

  1. The IF statement. This is your summary description of the “mental lever” your primary change will pull. In fact, the mental lever is usually a cognitive shift you are trying to achieve in the mind of your prospective customers. For example, “IF we emphasize the savings they receive” or “IF we bring clarity to the product page.”
  2. The BY statement. This statement lists the variable(s) or set of variables (variable cluster) you are testing. This statement typically involves the words “add, remove or change.” For example, “BY removing unnecessary calls-to-action (CTAs)” or “BY adding a relevant testimonial and removing the video player.” (Tip: This statement should not contain detailed design requirements. That next level of precision occurs when you develop treatments or articulate your hypothesis.)
  3. The WILL statement. This should be the easiest statement to compose because it is the desired result you hope to achieve from the changes you are proposing. For example, “We WILL increase clickthrough rate,” or “We WILL increase the number of video views.” (Tip: This statement should tightly align with the Test Question.)
  4. The BECAUSE statement. While last in order of appearance, this statement is the most critical as it’s what connects your work deeply into your customer’s being. By that I mean, the metric identified in your WILL statement either increased or decreased because the change you made resonated or did not resonate in the mind of your customer. For example, “BECAUSE prospective customers were still searching for the best deal, and every distraction made them think that there’s a better deal still out there,” or “BECAUSE prospective customers clearly see the savings they receive.” (Tip: Your BECAUSE statement should be centered around a single customer insight that, through testing, adds to your broader customer theory.)

So, if you put all this together, you have:

IF we bring clarity to the product page BY removing unnecessary CTAs, we WILL increase clickthrough rate BECAUSE prospective customers were still searching for the best deal, and every distraction made them think that there’s a better deal still out there.
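If your team keeps a testing backlog, it can help to store the four statements as structured fields and assemble the sentence automatically, so no component gets skipped. This is a minimal sketch; the class, field names and example values are illustrative, not part of the MECLABS framework itself.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """The four components of the hypothesis, captured as separate fields."""
    if_statement: str       # the mental lever the primary change pulls
    by_statement: str       # the variable(s) being added, removed or changed
    will_statement: str     # the desired, measurable result
    because_statement: str  # the customer insight the result would support or refute

    def sentence(self) -> str:
        return (f"IF we {self.if_statement} BY {self.by_statement}, "
                f"we WILL {self.will_statement} "
                f"BECAUSE {self.because_statement}.")

h = Hypothesis(
    if_statement="bring clarity to the product page",
    by_statement="removing unnecessary CTAs",
    will_statement="increase clickthrough rate",
    because_statement=("prospective customers were still searching for the best deal, "
                       "and every distraction made them think a better deal was still out there"),
)
print(h.sentence())
```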

Just as I used the Discovery Triad and the Four-Step Hypothesis Development Framework for a team structure change, these processes can be used for any type of collateral you are “creating, communicating, delivering.” Even if you are not A/B testing and are just making changes or updates, it is a valuable exercise to go through to ensure you are not constantly throwing spaghetti at the wall but, rather, systematically thinking through your changes and how they impact the customer.

You might also like

Online Marketing Tests: How Could You Be So Sure?

The world’s first (and only) graduate education program focused specifically on how to help marketers increase conversion

Optimizing Forms: How To Increase The Perceived Value For Your Customers

Download the Executive Series: The Web as a Living Laboratory
