Modeling Online Hotel Choice: Conjoint analysis as a multivariate alternative to A/B-testing


Transcript of Modeling Online Hotel Choice: Conjoint analysis as a multivariate alternative to A/B-testing

# 1

This presentation was prepared for GOR15 – the General Online Research Conference – taking place from March 19-20, 2015 in Cologne, Germany.

# 2

Let’s start with a story…

It was November 2007 and Barack Obama, then a Democratic candidate for president, was at Google’s headquarters in Mountain View, California, to speak. Obama said to the Googlers: “I am a big believer in reason and facts and evidence and science and feedback—everything that allows you to do what you do. That’s what we should be doing in our government.” He then hired some people from Google to optimize his online campaign through simple tests: they broke the landing page of the Obama campaign into its component parts and prepared a handful of alternatives for each.

For the button, an A/B test of two new word choices—“Learn More” and “Sign Up Now”—revealed that “Learn More” garnered 18.6 percent more signups per visitor than the default of “Sign Up.” Similarly, a black-and-white photo of the Obama family outperformed the default turquoise image by 13.1 percent. Using both the family image and “Learn More,” signups increased by a thundering 40 percent. This is one of the most prominent examples of A/B-testing, and it marked the start of the rise of A/B-testing (although Google had already been running such tests since 2000).


# 3

What is A/B-testing?

A/B testing is a method of website optimization in which the conversion rates of two versions of a page — version A and version B — are compared to one another using live traffic. Site visitors, such as online shoppers, are assigned to see one version or the other. By tracking the way visitors interact with the page you can determine which version of the page is most effective for what you want to achieve with it, e.g. gaining more clicks, longer visits, or more purchases. It has been used for improving websites (organizations such as Google or Amazon are famous for doing so) as well as apps.
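
As a rough illustration (not part of the presentation), here is a minimal A/B-test sketch in Python: live traffic is split between two page versions, conversion rates are compared, and a two-proportion z-test indicates whether the difference is more than noise. The function names and numbers are made up.

```python
import random
from math import sqrt

def assign_variant(visitor_id: int) -> str:
    """Deterministically split incoming visitors between version A and B."""
    return "A" if random.Random(visitor_id).random() < 0.5 else "B"

def conversion_rate(conversions: int, visitors: int) -> float:
    return conversions / visitors

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """z-statistic for the difference between the two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Illustrative numbers only (in the spirit of the "Sign Up" vs. "Learn More" test)
print(conversion_rate(1150, 10000))               # version A
print(conversion_rate(1364, 10000))               # version B
print(two_proportion_z(1150, 10000, 1364, 10000))
```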

Advantages of A/B-testing
• Easy setup, super-simple
• Results available quickly
• Demonstrates quantifiable impact of incremental innovation


# 4

A/B works fine if you want to check out how site visitors react to a single new feature, like a redesigned purchase button or a new search form. In most cases, however, we want to test a variety of changes on a page or an app, e.g. different configurations of search forms and purchase buttons. You could say that A/B testing has kind of outgrown itself: it has become so popular over time that people now want to test so many features that it is not all that simple anymore.

Current challenges of (multivariate) A/B-testing
• Requires a lot of iterations if you want to test multiple variations and isolate the effects of changes on multiple features (with just five features at three variants each there are already 3^5 = 243 possible page versions)
• Required traffic on a site must be high to allow for many individual comparisons


# 5

A question we asked ourselves: what are possible alternatives to A/B-testing for multiple feature changes?

One alternative we tried: A/Z-testing with experimental designs, e.g. using conjoint analysis
• Lets you test many different variations of a feature and possible scenarios in one go without having to run a gazillion individual comparisons
• Well-chosen experimental designs maximize the amount of information that can be obtained for a given amount of experimental effort
• You can do this, e.g., by employing choice-based conjoint

In most conjoint applications, attributes and levels pertain to product features. The method can also be applied to optimizing user interfaces.
• Model estimates allow for optimization across various attributes and levels
• Allows insights into how sensitive site visitors are to feature changes
• Gives an indication of where behavior comes from at the feature level, not just by isolating features
• Gives options to play what-if games and to optimize for the best combination of features


# 6

The study

We replicated an online hotel booking site (based on booking.com), where N=1492 respondents chose one out of 50 available hotels as if on a live booking site.
• We varied product features, but also user interaction design elements, to measure the impact on conversion and online booking behavior.
• We manipulated these features using an orthogonal statistical design.
• We built a model of conversion assuming that each respondent goes through only one choice task, to mimic the live purchase process on a booking site.


# 7

How did we do this?

In most conjoint applications attributes and levels pertain to products and a purchase decision, but the method can also be applied to optimizing the user interface (where elements are placed, how they are designed and executed, the use of rankings, reviews, and promotions).

We replicated an online hotel booking site (based on booking.com). We manipulated these features using an orthogonal statistical design (a small code sketch follows after the list):
• brand and type of hotel and price per night
• including a filter and sort option for site visitors
• call to action: including a “number of rooms left” tag
• including a metric for the distance to the city center
• including customer review scores for cleanliness, staff and facilities
• varying positions of hotels on the page (top search item vs. on the first page, second page, etc.)
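
A minimal sketch of how such features can be translated into conjoint-style attributes and levels and how candidate page configurations can be generated from them. The level values are hypothetical (the study's actual brands, prices, etc. are not reproduced here), and the random subset is only a stand-in for a properly balanced orthogonal design, e.g. one drawn from a DOE package.

```python
import itertools
import random

# Hypothetical attributes and levels loosely following the features listed above
attributes = {
    "price_per_night":    ["low", "medium", "high"],
    "filter_sort_option": ["shown", "hidden"],
    "rooms_left_tag":     ["shown", "hidden"],
    "distance_to_center": ["shown", "hidden"],
    "review_scores":      ["none", "cleanliness", "staff", "facilities", "all three"],
    "position_on_page":   ["top item", "first page", "second page"],
}

# Full factorial: every possible page configuration
full_factorial = list(itertools.product(*attributes.values()))
print(len(full_factorial))  # 3 * 2 * 2 * 2 * 5 * 3 = 360 configurations

# Stand-in for an orthogonal fraction: a smaller subset of runs shown to visitors
random.seed(1)
design = random.sample(full_factorial, 24)
for run in design[:3]:
    print(dict(zip(attributes, run)))
```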


# 8

What did we find?

We estimated a logit model of conversion depending on the configuration of the product features and user interaction design elements manipulated in the statistical design. We tested the accuracy of the resulting parameters using out-of-sample tests of forecast performance.

Ideally, of course, we would have had a perfect prediction (where x = y). With real-life data we naturally did not achieve this, but our model managed to predict the trend of choices fairly well.
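
For readers who want to see the mechanics, here is a conditional-logit sketch on synthetic data (not the study's data), assuming numpy and scipy; it mimics the setup of one choice task per respondent and a simple out-of-sample check.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, J, K = 500, 10, 4                    # respondents, alternatives, features
X = rng.normal(size=(N, J, K))          # feature configuration of each alternative
beta_true = np.array([0.8, -0.5, 0.3, 1.0])

def choice_probs(X, beta):
    """Conditional-logit choice probabilities per respondent."""
    u = X @ beta                        # utilities, shape (N, J)
    expu = np.exp(u - u.max(axis=1, keepdims=True))
    return expu / expu.sum(axis=1, keepdims=True)

# One simulated choice per respondent, mimicking a single live booking decision
p = choice_probs(X, beta_true)
y = np.array([rng.choice(J, p=p[i]) for i in range(N)])

def neg_loglik(beta, X, y):
    p = choice_probs(X, beta)
    return -np.log(p[np.arange(len(y)), y]).sum()

fit = minimize(neg_loglik, np.zeros(K), args=(X, y), method="BFGS")
print("estimated coefficients:", fit.x.round(2))

# Out-of-sample check: predicted choice shares for a hold-out sample
X_new = rng.normal(size=(200, J, K))
pred_shares = choice_probs(X_new, fit.x).mean(axis=0)
print("predicted shares:", pred_shares.round(3))
```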


# 9

What did we find?

Predictions of hotel choice* (logit predictions vs. random guess):
*Note that this means correctly predicting which 1 out of 50 hotels was chosen!
• MAE: 1.60% vs. 3.15%
• R^2: 0.5486 vs. 0.00
• Correlation: 0.7407 vs. 0.00
• The mean absolute error of the prediction is significantly lower than that of a random guess (t = 4.41, p < 0.001)

We conclude from this that with this way of modelling online choices we manage to predict actual choices.
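
These accuracy measures are straightforward to compute once predicted and observed choice shares are available; the sketch below uses made-up shares for ten alternatives rather than the study's 50 hotels.

```python
import numpy as np

# Illustrative observed and predicted choice shares (not the study's data)
observed  = np.array([0.10, 0.05, 0.02, 0.20, 0.08, 0.15, 0.12, 0.07, 0.11, 0.10])
predicted = np.array([0.12, 0.04, 0.03, 0.17, 0.09, 0.14, 0.13, 0.06, 0.10, 0.12])
random_guess = np.full_like(observed, 1 / len(observed))   # uniform benchmark

def mae(a, b):
    return np.abs(a - b).mean()

def r_squared(obs, pred):
    ss_res = ((obs - pred) ** 2).sum()
    ss_tot = ((obs - obs.mean()) ** 2).sum()
    return 1 - ss_res / ss_tot

print("MAE  model / random:", mae(observed, predicted), mae(observed, random_guess))
print("R^2  model         :", r_squared(observed, predicted))
print("corr model         :", np.corrcoef(observed, predicted)[0, 1])
```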


# 10

What can we do with this kind of model?

1) The same as with an A/B-test: check what the effect of including, e.g., consumer review scores is on the click-through rate. In this example:
• Including a consumer review score for facilities increased click-through by 11%
• Including a consumer review score for staff increased click-through by 15%
• Including a consumer review score for cleanliness increased click-through by 21%

2) More than with an A/B-test: we also varied combinations of the three consumer review scores in our statistical design. We learn that including all three of them in combination gives us an increase in click-through of 63%.
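
A sketch of how such what-if numbers can be read off a fitted binary logit model; the baseline utility and coefficients below are made up for illustration and are not the study's estimates.

```python
import numpy as np

def click_through(base_utility, extra_utility=0.0):
    """Click-through probability from a binary logit."""
    return 1 / (1 + np.exp(-(base_utility + extra_utility)))

base = -2.0                                                        # hypothetical baseline listing
coef = {"facilities": 0.10, "staff": 0.15, "cleanliness": 0.25}    # hypothetical coefficients

baseline_ctr = click_through(base)
for feature, b in coef.items():
    lift = click_through(base, b) / baseline_ctr - 1
    print(f"{feature:12s}: {lift:+.0%} click-through vs. baseline")

# The fitted model also lets us score combinations of features in one go
all_three = click_through(base, sum(coef.values())) / baseline_ctr - 1
print(f"all three   : {all_three:+.0%}")
```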


# 11

1. The answer to our question is: YES, experimental designs as employed in Conjoint Analysis can be a viable multivariate alternative to A/B-testing.
2. A little more difficult to set up than A/B-testing (you need an experimental design!), but much more efficient in testing multiple feature changes in one go (e.g. when time or site traffic are limited): instead of just changing the media and the button, the Obama campaign team could have tested several features, from the background colour of the page and the button colour to the placement of the logo or the button itself.
3. Provides insights into which features site visitors are sensitive to and which feature configurations drive conversion, and allows for what-if games with various possible feature combinations.


# 12

Applications: When to use it?

Not a replacement for A/B-testing, but rather an addition for when A/B-testing is not appropriate:
• When you have too many alternative features to test and no clear hypothesis on what may be driving conversion
• When you have too little time / site traffic for multiple isolated tests (relevant for small sites/startups)
• Also: on occasions where you really do NOT want to test something live, e.g. new-to-market sites or apps, new prices

What to do when you want to use it?
1. Think about which features could drive click-through / conversion
2. Translate features into conjoint-like attributes and levels
3. Create balanced orthogonal designs for your experiments
4. Assign site visitors randomly to different design versions and measure click-through / conversion
5. Estimate a logit model and simulate outcomes
6. Simulate & optimize what works best for your site or app (see the sketch below)
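
Step 6 could look like the sketch below: enumerate the candidate feature configurations and keep the one the fitted model scores highest. Feature names and coefficients are made up for illustration, not taken from the study.

```python
import itertools
import numpy as np

# Hypothetical per-feature utility contributions from a fitted logit model
coef = {
    "show_review_scores": 0.40,
    "rooms_left_tag":     0.15,
    "distance_to_center": -0.05,
    "filter_sort_option": 0.25,
}
base_utility = -2.0

def predicted_conversion(config):
    """Binary-logit conversion probability for one on/off feature configuration."""
    u = base_utility + sum(coef[f] for f, on in config.items() if on)
    return 1 / (1 + np.exp(-u))

# Score every on/off combination and keep the best one
configs = [dict(zip(coef, flags))
           for flags in itertools.product([True, False], repeat=len(coef))]
best = max(configs, key=predicted_conversion)
print(best, round(predicted_conversion(best), 3))
```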


# 13

Don’t hesitate to contact us with questions, ideas or feedback!

We look forward to hearing from you.