6910 week 8 - testing & optimization

Transcript of 6910 week 8 - testing & optimization

Page 1:

Testing & Optimization
ISM 6910 – Week 8

Page 2:

Week 8 Topics

• Testing

• End Action

• Attribution

• Media Mix Modeling

Page 3:

Testing

Page 4:

Testing

A/B Tests: compare two versions of a page (A vs. B).

Multivariate Tests: test combinations of multiple elements at once.

[Diagram: grid of test cells combining hero images (It's Time, Expedia, Numbers, Offer only), calls to action (Go, Search Now, Book Now), and offers (Up to 50% off, Book together and Save, Start Saving). Legend: filled cells = test cells executed during test; open cells = test cells evaluated but not executed.]

Page 5:

A/B Test Example: Videos continue to score higher in NSAT and PSAT, but underperform in conversions.

NSAT Results – FYQ2 2011: with video, results were stat. significantly higher.

                    | Unique Visitors | NSAT | PSAT | FPP Upgrade Conv. Rate | Avg. Revenue (All SKUs)
Compare w/o Video   | 186,330         | 108  | 134  | 0.35%                  | $1.13
Compare with Video  | 187,185         | 124  | 136  | 0.30%                  | $0.95
Lift                |                 | 16   | 2    | (0.05%)                | ($0.18)

*PSAT lift only has a stat. significance of 91%; all others are +99%.

FYQ2 results are in line with September's findings, which showed that adding a video to the compare page has had a positive impact on visitors' NSAT and PSAT scores.

A possible downside to adding more videos is that they may serve as a distraction, causing visitors to miss the Buy Now button and lowering conversion rates.

Numbers have been doctored to hide client sensitive data

Page 6:

Multivariate Tests

Full factorial – test every possible variation. For example, if one element has four variations and two other elements have three variations each, you are looking at 4 x 3 x 3 = 36 combinations.

Partial factorial – partial factorial tests execute only a subset of the combinations, set up in a way that allows you to infer the results of the untested cells; the Taguchi method is probably the most commonly used approach.

[Diagram repeated from Page 4: grid of test cells combining heroes, calls to action, and offers; filled cells were executed during the test, the rest were evaluated but not executed.]
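A full-factorial cell list like the one diagrammed above can be enumerated directly. The sketch below uses Python's itertools with element names taken from the Expedia example; the "partial" subset shown is a toy illustration only, not a real Taguchi orthogonal array.

```python
# Sketch of enumerating multivariate test cells (element names from the
# Expedia example; the partial subset below is illustrative, not Taguchi).
from itertools import product

heroes = ["It's Time", "Expedia as Hero", "Numbers", "Offer only"]    # 4 variations
ctas = ["Go", "Search Now", "Book Now"]                               # 3 variations
offers = ["Up to 50% off", "Book together and Save", "Start Saving"]  # 3 variations

# Full factorial: every combination is a test cell.
full = list(product(heroes, ctas, offers))
print(len(full))  # 4 x 3 x 3 = 36 cells

# Partial factorial: execute only a structured subset and infer the rest.
partial = [cell for i, cell in enumerate(full) if i % 4 == 0]
print(len(partial))  # 9 cells executed, remaining 27 inferred
```

Even a modest number of elements multiplies quickly, which is why partial designs matter when traffic is limited.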

Page 7:

Multivariate Example – elements tested: Offer, Call to Action, Message, Image. (All charts show shopping rate per million cookies.)

Shopping Rate by Call to Action:
  Book Now 165 | Search Now 191 | Go 180

Shopping Rate by Image:
  Tanning 157 | Beach Cruise 200 | Bay Bridge 179

Shopping Rate by Message:
  Expedia as Hero 176 | It's Time 159 | Offer Only 201 | Numbers 179

Shopping Rate by Offer:
  Book Together and Save 174 | 50% off Hotels 183 | Generic 180

Page 8:

Pros and Cons

A/B Test
  Pros:
  • Setup is relatively easy
  • Analysis is easier
  • Don't need much of a stats background to interpret results
  Cons:
  • Easy to get sucked into testing too many things at once
  • A and B need to be different enough to get results
  • Time consuming to test one element at a time

Multivariate Test
  Pros:
  • Less political pushback; everyone gets to test their idea
  • Get all of the analysis done in one shot
  Cons:
  • Easy to mess up
  • Tools are a black box, or do it yourself + your best PhD stats buddy
  • Need a lot of volume or time

Page 9:

Testing Recommendations

Start with high impact tests:
• Test home/landing pages
• Test conversion points, i.e. sign-up forms, cart/purchase pages, etc.
• Test ad design
• Price tests (a hard one politically to pull off)

Other great things to test:
• Test landing page/deep linking
• Page heros

Page 10:

Testing Best Practices

• Start with a hypothesis – Don't just start testing random stuff like colors unless you have a good reason.

• Set goals – e.g., looking to improve conversion rate by x%.

• Decide what is significant – We're not testing drugs; no one's life is on the line, so 99.9% statistical significance is probably overkill. But what about, say, 60%?
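As a sketch of checking significance, a two-proportion z-test over A/B conversion counts can be done with the standard library alone; the counts below are made up for illustration.

```python
# Hedged sketch: two-proportion z-test for an A/B conversion-rate test,
# using only the standard library (counts are hypothetical).
from math import sqrt, erf

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic and two-sided p-value for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)           # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # normal tail approx.
    return z, p_value

# Hypothetical counts roughly matching a 0.35% vs 0.30% conversion split
z, p = two_proportion_z(652, 186330, 562, 187185)
print(f"z = {z:.2f}, p = {p:.4f}, significant at 95%: {p < 0.05}")
```

The point of the slide stands: pick the significance bar before the test, because with enough volume even tiny differences clear 95%.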

Page 11:

More Testing Tips

• Get help – Setting up the test and searching through your old stats notes can be a challenge. Don't be afraid to ask for help.

• Make it fun/interesting – It takes a lot to pull off a good test: UX, creative team, site dev, analysts, and maybe more. Plus someone's budget. Everyone has an opinion and/or theory; you can use that to get momentum for a testing project.

At Getty we held a company-wide contest to see who could pick the winner of a multivariate test. There were 300+ possible combinations, and everyone got to vote on which one they thought would be the winner.

Page 12:

End Action – Site Surveys

Page 13:

How It Works

• NSAT
• PSAT
• Value Prop
• Purch. Intent

Page 14:

Attitudinal End Action Process

EA captures both behavioral and attitudinal data and correlates shifts in attitude with end actions taken on site:

Page 15:

What is it good for?

Page 16:

Combining attitudinal & behavioral data

The End Action scorecard was originally designed to value experiences based on shifts in attitudes. For Q4 2010, we added Microsoft Store Purchase behavior as well:

Page 17:

End Action conversion rates

Using End Action cookie data we can report on a more accurate conversion rate. If we assume most site visits don't last longer than 30 minutes, we can conclude that less than half of Store buyers (43%) make a purchase during their first site visit. The remaining purchasers return later (sometimes days later) to complete their purchase. Using End Action cookie data, site visitors who read a product review, leave the Shop page, and return later to finally make a purchase will still be counted when reporting on site visitors who read a product review and then made a purchase.
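The cookie-level counting described above can be sketched as follows; the event schema, cookie IDs, and helper names are hypothetical, not the actual End Action data model.

```python
# Sketch (hypothetical event schema): a cookie-level conversion rate, so a
# visitor who reads a review, leaves, and purchases days later still counts.
from collections import defaultdict

events = [  # (cookie_id, day, event)
    ("c1", 1, "read_review"), ("c1", 1, "leave"),
    ("c1", 4, "purchase"),     # returns three days later to buy
    ("c2", 1, "read_review"),  # never purchases
    ("c3", 2, "purchase"),     # purchased without reading a review
]

by_cookie = defaultdict(list)
for cookie, day, event in events:
    by_cookie[cookie].append((day, event))

def first_day(evs, name):
    days = [d for d, e in evs if e == name]
    return min(days) if days else None

# Readers who purchased on the same day or any LATER visit count as converted.
readers = {c for c, evs in by_cookie.items() if first_day(evs, "read_review") is not None}
converted = {c for c in readers
             if (p := first_day(by_cookie[c], "purchase")) is not None
             and p >= first_day(by_cookie[c], "read_review")}
print(f"{len(converted)} of {len(readers)} review readers purchased")
```

A session-based report would miss c1's purchase entirely; keying on the cookie rather than the visit is what recovers it.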

Numbers have been doctored to hide client sensitive data

Page 18:

Challenges

Page 19:

Measurement overload

Numbers have been doctored to hide client sensitive data

Page 20:

EA measures correlations, not causation

Example: People who watch 7 Second demos have 10% higher Win.com NSAT than people who don't watch demos.

However, EA quantifies correlation, not causation:
• We cannot immediately say that watching videos makes people 10% more satisfied
• This requires additional information such as specific testing, observation, and insight

Page 21:

Survey timing can create respondent biases

Site visitors are invited to take the EA survey as soon as they leave the Windows domain. So, as site visitors move further down the funnel, survey respondents start to look more like visitors who are abandoning their cart vs. purchasers. This can be seen in the illustration below.

In this example, Visitor A takes the survey and will be included in the NSAT results for the Visit Shop EA, but does not purchase. Visitor B completes the purchase process, but by doing so never receives a survey invite.

Page 22:

Key Insights

Page 23:

Compare Page Video

Videos continue to score higher in NSAT and PSAT, but underperform in conversions. (Results table repeated from Page 5.)

Page 24:

Attitudes Influence Buying Behavior

As site visitors move deeper into the site and further down the purchase funnel, we start to see an increase in both site satisfaction (NSAT) and the Windows 7 Upgrade conversion rate. Based on the EA survey data, we know that we have some levers for improving site satisfaction – using video or interactive experiences, providing value-added downloads, etc. From this data, we can see that by first improving NSAT, we can push more people into a transactional mode on the site.

(1) NSAT % ∆ from FYQ4 EAA Scorecard
(2) FPP Upgrade # ∆ from FYQ4 Sales Trans Scorecard
Note: Purchase NSAT & Video Conv. Rate were not statistically significant by +/-5%

*Conversion rate = purchasers who took the end action / count of unique cookies who took the end action.

Numbers have been doctored to hide client sensitive data

Page 25:

Target Content = Higher Scores

Windows 7 visitors – Visitors who visited the Compare pages, Anytime Upgrade, and Features pages had higher NSAT scores while less relevant pages like the Upgrade Advisor and the Get win7 default page scored lower.

Vista Visitors – Vista users who visited the Compare pages and Upgrade Advisor related pages had higher NSAT scores. The less relevant Anytime Upgrade pages scored lower.

XP Visitors – Similar to the Vista users, the Compare pages and Upgrade Advisor related pages had higher NSAT scores, while the less relevant Anytime Upgrade pages scored lower.

Numbers have been doctored to hide client sensitive data

Page 26:

Multi Touch Attribution

Page 27:

Ad Conversions

When a user clicks on an ad, they are redirected through Atlas to the destination page. Atlas records the click and redirects the user on to the destination page.

[Diagram: Atlas Ad Server (img server, CDN) → Atlas redirect → destination page with 1x1 action tag]

If the site has an action tag on the landing page, the visit can now be directly tied back to the ad.

Atlas can then tie each ad impression and click back to the action tag (per cookie). This data is then used to optimize the ad campaign.

Page 28:

GA Video

http://youtu.be/Cz4yHOKE5j8

Page 29:

Advanced Attribution: Details

Problem: Ad-server rules are heavily biased in favor of click-based and 'last-touch' exposures (i.e. branded search) and undervalue a person's history of exposure to display media.

Objective: Correct this bias by reallocating credit for conversions in proportion to the relative contribution of past exposures.

Approach: Model cookie-exposure history to estimate relative contribution. Use model estimates to 'score' the individual placements, awarding each placement some, all, or no credit for a cookie's conversion.

Action: Media-planners may optimize online media budget, either during or after a campaign, towards those publishers and engagements that drive the greatest ROI.

Page 30:

There are several approaches

Method I: Even Distribution
• Score = 1/n (n is total exposure frequency): each of the n exposures before the conversion C gets credit 1/n
• Simple approach, but flawed in that it's really a "welfare state" for media that does not address relative efficacy

Method II: Recency-weighted Attribution
• A score S(t) is assigned to each exposure according to its time distance to the conversion C
• Special weight might be given to the first and last touch points
• More nuanced approach that differentiates by recency, but does not account for relative performance differences of different formats

Method III: Probabilistic Attribution
• Each exposure is weighted by the change in conversion probability (∆P) it produces
• Probabilities are calculated from predictive models on ad frequency and attributes
• More complex, performance-based approach that uses the change in historical conversion probability per exposure to allocate credit
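Methods I and II can be sketched directly; the decay formula below is an assumed illustration, and Method III is omitted because it requires a fitted predictive model.

```python
# Sketch of attribution Methods I and II (formulas are illustrative
# assumptions; Method III needs a fitted conversion-probability model).
def even_credit(n):
    """Method I: each of the n exposures gets 1/n of the conversion credit."""
    return [1 / n] * n

def recency_credit(days_before_conversion, decay=0.5):
    """Method II: weight each exposure by decay**days, then normalize to 1."""
    raw = [decay ** d for d in days_before_conversion]
    total = sum(raw)
    return [w / total for w in raw]

print(even_credit(4))             # [0.25, 0.25, 0.25, 0.25]
print(recency_credit([0, 1, 3]))  # most credit to the most recent exposure
```

Either scheme reallocates the single conversion across the exposure history instead of handing it all to the last touch.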

Page 31:

Outcome Example

Using the conversion rates under the attribution model, certain placements and networks look better or worse; this directly affects how and where the media team purchases ad placements.

Page 32:

Incremental revenue from attribution

Incremental revenue increase is calculated by comparing attribution-based media optimization against last-touch media optimization. The increase varies with the degree of optimization shift from least to most efficient media:

• 5% optimization: +$15 million (+2.67%)
• 10% optimization: +$29 million (+5.09%)
• 15% optimization: +$45 million (+7.95%)

Revenue with optimization:

Shift      | Standard Last Touch | Razorfish Advanced Attribution | Incremental
Base       | $563,957,214        | $563,957,214                   | –
Lowest 5%  | $576,048,162        | $591,141,240                   | +$15 MM
Lowest 10% | $579,374,307        | $608,291,365                   | +$29 MM
Lowest 15% | $582,368,257        | $627,830,612                   | +$45 MM

Page 33:

Case Study

48% lift in Paid Search Click-Through Rate due to Banner Ad Exposure

[Chart: Control 0.69% vs. Test 1.02% paid search CTR]

The test group was exposed to client media when encountering campaign placements; the control group was exposed to PSA media when encountering campaign placements.

• Across clients and advertisers, banner exposure consistently drives incremental search clicks and conversions

• Clearly, some portion of credit for search conversion belongs to prior display (and other media) exposure

• Attribution quantifies the relative contributions of each touch point and allocates credit accordingly

The example is from an apparel retailer. We ran a "true lift test" – we held out a random control group from all display media for a period and evaluated performance differences between control and exposed. These results are consistent with similar tests run for other clients.

Page 34:

Media Mix Models

Page 35:

Media Mix Models

[Diagram: TV, Radio, Display, Mobile, Cinema, and Print channels all contributing to Conversions]

Problem: When multi-channel marketing efforts occur simultaneously, it can be hard to identify which of these channels is responsible for conversions. Answers are difficult to come by when direct measurement of individual-level exposure is not feasible (i.e. OOH, TV, etc.).

Objective: Create a model that accurately reflects how well each channel operates within a general business/marketing environment.

Approach: Use daily (or weekly) tracking data to specify the relationship between channel activity and conversion volume. Incorporate into the models channel-specific accumulation and decay effects as well as relevant macroeconomic indicators and historical events.

Action: Using the results to estimate each channel's point of diminishing returns, the optimal spend per channel is appraised for future campaigns.
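As a toy illustration of the Approach step, the sketch below fits a two-channel model by ordinary least squares on synthetic weekly tracking data. The coefficients, noise level, and channel set are all made up; a real media mix model also adds adstock, saturation, seasonality, and macro indicators.

```python
# Toy media-mix fit: regress conversions on channel activity (synthetic data).
import random

random.seed(0)
weeks = 52                       # one year of weekly tracking data
tv = [random.uniform(0, 100) for _ in range(weeks)]
disp = [random.uniform(0, 50) for _ in range(weeks)]
# Simulated "truth": base demand 200, +2 conversions per TV unit, +3 per display
y = [200 + 2.0 * t + 3.0 * d + random.gauss(0, 5) for t, d in zip(tv, disp)]

def demean(v):
    m = sum(v) / len(v)
    return [x - m for x in v], m

(t_c, t_m), (d_c, d_m), (y_c, y_m) = demean(tv), demean(disp), demean(y)
s11 = sum(a * a for a in t_c)
s22 = sum(a * a for a in d_c)
s12 = sum(a * b for a, b in zip(t_c, d_c))
s1y = sum(a * b for a, b in zip(t_c, y_c))
s2y = sum(a * b for a, b in zip(d_c, y_c))
det = s11 * s22 - s12 ** 2       # closed-form OLS for two predictors
b_tv = (s22 * s1y - s12 * s2y) / det
b_disp = (s11 * s2y - s12 * s1y) / det
base = y_m - b_tv * t_m - b_disp * d_m
print(f"tv: {b_tv:.2f}, display: {b_disp:.2f}, base: {base:.1f}")
```

With enough weeks of data the fitted coefficients recover the per-unit contribution of each channel, which is exactly the relationship the model needs before optimizing spend.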

Page 36:

Factors and media effects

The most important aim of the attribution analysis is to get to the relationship between media spend and the KPI we are optimizing for. To get there, we need to understand how each media type impacts the KPIs and the other media types.

Page 37:

Ad stocking effects

Adding the ad stocking effect of media to the model helps account for the diminishing effect of an ad over time. The chart below shows the approximate half-life of each media type modeled. Note that some media types have a longer half-life than others; the effect of a TV ad tends to last longer than that of a banner ad, for example.

[Sidebar: Optimizer inputs – Ad Stocking Effects, Effectiveness Curves, Media Cost Curves, Total Budget]

[Chart: Typical Half-Lives by Media Type]
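The carry-over idea can be sketched with a simple geometric adstock transformation; the half-life values here are assumed for illustration, not taken from the modeled chart.

```python
# Minimal adstock sketch: carry-over with a per-channel half-life (assumed values).
def adstock(spend, half_life_weeks):
    """Geometric decay: stock halves every `half_life_weeks` with no new spend."""
    decay = 0.5 ** (1 / half_life_weeks)
    stock, out = 0.0, []
    for s in spend:
        stock = s + decay * stock
        out.append(stock)
    return out

tv = adstock([100, 0, 0, 0], half_life_weeks=4)       # TV decays slowly
banner = adstock([100, 0, 0, 0], half_life_weeks=1)   # display decays fast
print([round(x) for x in tv])      # [100, 84, 71, 59]
print([round(x) for x in banner])  # [100, 50, 25, 12]
```

A single burst of TV spend keeps contributing for weeks, while the banner's effect is mostly gone after two; the model sees this residual stock rather than the raw spend.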

Page 38:

Media effectiveness curves

The effectiveness of media diminishes as the volume of exposure increases. Eventually the incremental change in media will have little to no effect on the reached audience: the saturation point. Each media type reaches its saturation point at a different level of exposure (GRPs).

[Chart: Diminishing Returns by Media Type]
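One common way to model this saturation is sketched below with an assumed exponential form; practitioners also use Hill or logistic curves, and the ceiling and saturation parameters here are made up.

```python
# Sketch of a diminishing-returns response curve (functional form assumed).
from math import exp

def response(grps, ceiling, saturation):
    """Response rises toward `ceiling` and flattens as GRPs pass `saturation`."""
    return ceiling * (1 - exp(-grps / saturation))

# Marginal return shrinks as exposure grows: the saturation point on the slide.
for grps in (50, 100, 200, 400):
    print(grps, round(response(grps, ceiling=1000, saturation=100), 1))
```

Doubling GRPs from 200 to 400 buys far less incremental response than doubling from 50 to 100, which is the whole argument for capping spend per channel.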

Page 39:

Media cost effects

Media Reach Curves:
• Inventory constraints for each media type.
• Planner judgment on maximum feasible investment levels.

Media Cost Curves:
• These reflect how media costs scale as spend scales.
• They need to capture realities such as increasing cost per reach point, seasonality, etc. in order to pragmatically reflect the media landscape.

Page 40:

Budget effects

Because the saturation point and level of effectiveness change at a different rate for each media type, the optimal mix will shift with the overall media budget. The example below shows how the optimal spend mix shifts from one media type to another depending on the level of spend.

[Charts: Diminishing Returns by Media Spend – Budget A vs. Budget B]

Page 41:

Optimization

The optimizer takes into account all of the factors – ad stocking, diminishing returns, cost, and inventory constraints – and through an iterative process chooses the optimal media channel for each incremental dollar spent.

[Charts: Diminishing Returns by Media Spend; Final Optimized Results]
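The iterative process described above can be sketched as a greedy allocator; the channel names and response curves below are made up, and a real optimizer would also apply the cost and inventory constraints.

```python
# Sketch of the iterative allocation: give each incremental slice of budget
# to the channel with the best marginal return (curves are illustrative).
from math import exp

# channel -> (ceiling, saturation) for a simple diminishing-returns curve
curves = {"tv": (500, 300), "display": (200, 80), "search": (300, 120)}

def resp(ch, spend):
    ceiling, sat = curves[ch]
    return ceiling * (1 - exp(-spend / sat))

def allocate(budget, step=10):
    spend = {ch: 0.0 for ch in curves}
    for _ in range(int(budget / step)):
        # pick the channel where the next `step` dollars buys the most response
        best = max(curves, key=lambda ch: resp(ch, spend[ch] + step) - resp(ch, spend[ch]))
        spend[best] += step
    return spend

print(allocate(1000))   # the optimal mix shifts as the total budget changes
```

Because each channel's marginal return falls at a different rate, rerunning the allocator at different total budgets reproduces the budget effect from the previous slide.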