Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Experimenting on Humans
Aviran Mordo, Head of Back-end Engineering, @aviranm, http://www.linkedin.com/in/aviran, http://www.aviransplace.com
Talya Gendler, Back-end Team Leader, www.linkedin.com/in/talyagendler

description

How do you know what 55 million users like? Wix.com conducts hundreds of experiments in production every month to understand which features our users like and which ones hurt or improve our business. In this talk we’ll explain how our engineering team supports our product managers in making the right decisions and getting our product road map on the right path. We will also present some of the open source tools we developed to help us experiment with our products on humans. A/B testing is a well-known methodology for conducting experiments in production, but doing it at large scale, with system behavior changing every 9 minutes, creates many organizational challenges for developers, product managers, QA, marketing and management. We will explain the life cycle of an experiment, some of the challenges we faced, and the effect on our development process and product evolution.

Transcript of Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Page 1: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Experimenting on Humans

Aviran Mordo

Head of Back-end Engineering

@aviranm

http://www.linkedin.com/in/aviran

http://www.aviransplace.com

Talya Gendler

Back-end Team Leader

www.linkedin.com/in/talyagendler

Page 2: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014
Page 3: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Wix In Numbers

Over 55M users + 1M new users/month

Static storage is >1.5 PB of data

3 data centers + 3 clouds (Google, Amazon, Azure)

1.5B HTTP requests/day

800 people work at Wix, of whom ~300 are in R&D

Page 4: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

1542 (A/B Tests in 3 months)

Page 5: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Basic A/B testing

Experiment driven development

PETRI – Wix’s 3rd generation open source experiment system

Challenges and best practices

Complexities and effect on product

Agenda

Page 6: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

A/B Test

Page 7: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

To B or NOT to B?

A

B

Page 8: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Home page results (How many registered)

Page 9: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Experiment Driven Development

Page 10: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

This is the Wix editor

Page 11: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Our gallery manager

What can we improve?

Page 12: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Is this better?

Page 13: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Don’t be a loser

Page 14: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Product Experiments Toggles & Reporting

Infrastructure

Page 15: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

How do you know what is running?

Page 16: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

If I “know” it is better, do I really need to test it?

Why so many?

Page 17: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014
Page 18: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Sign-up → Choose Template → Edit site → Publish → Premium

The theory

Page 19: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Result = Fail

Page 20: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Intent matters

Page 21: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

EVERY new feature is A/B tested

We open the new feature to a % of users

Measure success

If it is better, we keep it

If worse, we check why and improve

If flawed, the impact is limited to a % of our users

Conclusion

Page 22: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Start with 50% / 50% ?

Page 23: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014
Page 24: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

New code can have bugs

Conversion can drop

Usage can drop

Unexpected cross-test dependencies

Sh*t happens (Test could fail)

Page 25: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Language

GEO

Browser

User-agent

OS

Minimize affected users (in case of failure)

Gradual exposure (percentage of…)

Company employees

User roles

Any other criteria you have (extendable)

All users
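The criteria on this slide can be read as composable predicates over the incoming request. The following is a minimal sketch under that assumption; the `Visitor` record and the filter helper names are hypothetical and are not taken from PETRI.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Visitor:
    # Hypothetical request context; field names are illustrative only.
    language: str
    country: str
    browser: str
    is_employee: bool = False


# A filter is any predicate over a visitor, so new criteria are just new functions.
Filter = Callable[[Visitor], bool]


def language_in(*languages: str) -> Filter:
    return lambda v: v.language in languages


def country_in(*countries: str) -> Filter:
    return lambda v: v.country in countries


def employees_only() -> Filter:
    return lambda v: v.is_employee


def is_eligible(visitor: Visitor, filters: Iterable[Filter]) -> bool:
    """A visitor enters the experiment only if every filter accepts them."""
    return all(f(visitor) for f in filters)


filters = [language_in("en"), country_in("US", "CA")]
print(is_eligible(Visitor("en", "US", "Chrome"), filters))
print(is_eligible(Visitor("de", "DE", "Firefox"), filters))
```

An empty filter list accepts everyone, which corresponds to the "All users" case.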

Page 26: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

First time visitors = Never visited wix.com

New registered users = Untainted users

Not all users are equal

Page 27: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Start new experiment (limited population)

Page 28: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

We need that feature

…and failure is not an option

Page 29: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Adding a mobile view

Page 30: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

First trial failed

Performance had to be improved

Page 31: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Halting the test results in loss of data.

What can we do about it?

Page 32: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Solution – Pause the experiment!

• Maintain NEW experience for already exposed users

• No additional users will be exposed to the NEW feature

Page 33: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

PETRI’s pause implementation

Use cookies to persist assignment

If the user changes browser, the assignment is unknown

Server side persistence solves this

You pay in performance & scalability
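A rough sketch of the cookie-based pause described above. The function, the state names, and the choice to serve the old experience to unexposed visitors while paused are assumptions for illustration, not PETRI's actual code.

```python
import random
from enum import Enum
from typing import Optional, Tuple


class State(Enum):
    ACTIVE = "active"
    PAUSED = "paused"


def resolve_group(
    state: State,
    cookie_value: Optional[str],
    test_share: float,
    rng: random.Random,
) -> Tuple[str, Optional[str]]:
    """Return (group, cookie_to_set); cookie_to_set is None when nothing new is persisted."""
    # Already-exposed visitors keep the experience they saw, paused or not.
    if cookie_value in ("A", "B"):
        return cookie_value, None
    # Paused: no additional visitors are exposed to the new feature.
    # Assumption: they see the old experience and no assignment is stored.
    if state is State.PAUSED:
        return "A", None
    # Active: toss once and persist the result in the cookie.
    group = "B" if rng.random() < test_share else "A"
    return group, group


rng = random.Random()
print(resolve_group(State.PAUSED, "B", 0.5, rng))   # exposed user keeps B
print(resolve_group(State.PAUSED, None, 0.5, rng))  # new visitor is not exposed
```

Because the assignment lives only in the cookie, a visitor who switches browsers arrives with `cookie_value=None`, which is exactly the "assignment is unknown" case on the slide.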

Page 34: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Decision

Keep feature
• Improve code & resume experiment

Drop feature
• Keep backwards compatibility for exposed users forever?
• Migrate users to another equivalent feature
• Drop it altogether (users lose data/work)

Page 35: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

The road to success

Page 36: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Numbers look good but sample size is small

We need more data!

Expand

Reaching statistical significance

Test Group (B):     25%   50%   75%   100%

Control Group (A):  75%   50%   25%   0%

Page 37: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Keep user experience consistent

Control Group (A)

Test Group (B)
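One common way to keep the experience consistent while the test group grows is to give each user a stable bucket and only move the threshold. This is a sketch of that general technique, assuming a hash-based bucket; it is not claimed to be how PETRI does it, and the experiment name is made up.

```python
import hashlib


def bucket(experiment: str, user_id: str) -> int:
    """Map a user to a stable bucket in [0, 100) for a given experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def group(experiment: str, user_id: str, test_percent: int) -> str:
    # Raising test_percent only moves users from A to B, never back.
    return "B" if bucket(experiment, user_id) < test_percent else "A"


users = [f"user-{i}" for i in range(1000)]
in_b_at_25 = {u for u in users if group("new-gallery", u, 25) == "B"}
in_b_at_50 = {u for u in users if group("new-gallery", u, 50) == "B"}

# Everyone who saw B at 25% still sees B at 50%.
print(in_b_at_25 <= in_b_at_50)  # True
```

Including the experiment name in the hash keeps bucket assignments independent between experiments.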

Page 38: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Signed-in user (Editor)
• Test group assignment is determined by the user ID
• Guarantees toss persistency across browsers

Anonymous user (Home page)
• Test group assignment is randomly determined
• Cannot guarantee a persistent experience if the user changes browser

11% of Wix users use more than one desktop browser

Keeping persistent UX
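The two cases can be contrasted in a short sketch. Both functions and the cookie key format are hypothetical; a plain dictionary stands in for the browser's cookie jar.

```python
import hashlib
import random
from typing import Dict


def toss_for_user(experiment: str, user_id: str) -> str:
    """Signed-in: derive the group from the user ID, so any browser gives the same answer."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return "B" if digest[0] % 2 else "A"


def toss_for_anonymous(experiment: str, cookies: Dict[str, str], rng: random.Random) -> str:
    """Anonymous: toss randomly once, then remember the result in this browser's cookies."""
    key = f"exp_{experiment}"  # hypothetical cookie name
    if key not in cookies:
        cookies[key] = rng.choice(["A", "B"])
    return cookies[key]


rng = random.Random()
chrome_cookies: Dict[str, str] = {}
firefox_cookies: Dict[str, str] = {}

# Same signed-in user, any browser: always the same group.
print(toss_for_user("home-page", "user-42") == toss_for_user("home-page", "user-42"))  # True

# Same anonymous visitor in one browser: stable across requests.
first = toss_for_anonymous("home-page", chrome_cookies, rng)
print(first == toss_for_anonymous("home-page", chrome_cookies, rng))  # True

# A second browser has its own cookies, so the toss there is independent.
toss_for_anonymous("home-page", firefox_cookies, rng)
```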

Page 39: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Robots are users too!

Page 40: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Always exclude robots

Don’t let Google index a losing page

Don’t let bots affect statistics
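A minimal sketch of excluding robots, assuming detection by user-agent substring. The pattern below is a deliberately crude heuristic and the function names are illustrative; real crawler detection usually needs a maintained list.

```python
import re
from typing import Tuple

# Assumption: a crude heuristic, not an exhaustive list of crawlers.
BOT_PATTERN = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)


def is_robot(user_agent: str) -> bool:
    # Treat a missing user agent as a robot to stay on the safe side.
    return not user_agent or bool(BOT_PATTERN.search(user_agent))


def variant_for_request(user_agent: str, tossed_group: str) -> Tuple[str, bool]:
    """Return (group_to_serve, count_in_statistics)."""
    if is_robot(user_agent):
        # Robots always get the existing page and never enter the numbers.
        return "A", False
    return tossed_group, True


googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(variant_for_request(googlebot, "B"))  # ('A', False)
print(variant_for_request("Mozilla/5.0 (Windows NT 10.0) Firefox/120.0", "B"))  # ('B', True)
```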

Page 41: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

There is MORE than one

Page 42: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

# of active experiments    Possible # of states

10                         1,024

20                         1,048,576

30                         1,073,741,824

Possible states >= 2^(# experiments)

Wix has ~200 active experiments, so at least 2^200 ≈ 1.606938e+60 possible states
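The figures on this slide follow directly from 2^N, assuming each experiment has exactly two groups (experiments with more variants only raise the count). A quick check, with a hypothetical helper name:

```python
def possible_states(experiments: int, variants: int = 2) -> int:
    """Lower bound on distinct system states with independent experiments."""
    return variants ** experiments


for n in (10, 20, 30):
    print(n, f"{possible_states(n):,}")
# 10 1,024
# 20 1,048,576
# 30 1,073,741,824

print(f"{possible_states(200):.6e}")  # 1.606938e+60
```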

Page 43: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Supporting 2^N different users is challenging

How do you know which experiment causes errors?

Managing an ever changing production env.

Page 44: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Override options (URL parameters, cookies, headers…)

Near real time user BI tools

Specialized tools
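An override lets a developer or QA engineer force a specific group in order to reproduce what one user saw. The sketch below assumes a URL parameter; the parameter name `experiment_override` and its `name:group` format are invented for illustration and are not PETRI's actual syntax.

```python
from typing import Dict
from urllib.parse import parse_qs, urlsplit

OVERRIDE_PARAM = "experiment_override"  # hypothetical parameter name


def parse_overrides(url: str) -> Dict[str, str]:
    """Parse '?experiment_override=name:group,name:group' into a dict."""
    query = parse_qs(urlsplit(url).query)
    overrides: Dict[str, str] = {}
    for item in query.get(OVERRIDE_PARAM, []):
        for pair in item.split(","):
            name, sep, value = pair.partition(":")
            if sep and name and value:  # skip malformed pairs
                overrides[name] = value
    return overrides


def resolve(experiment: str, tossed_group: str, url: str) -> str:
    # An explicit override wins over the normal toss.
    return parse_overrides(url).get(experiment, tossed_group)


url = "https://example.com/editor?experiment_override=new-gallery:B,mobile-view:A"
print(resolve("new-gallery", "A", url))  # B
print(resolve("other-test", "A", url))   # A
```

The same lookup could read from a cookie or a request header; in practice an override of this kind would normally be restricted to trusted users.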

Page 45: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Integrated into the product

Page 46: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Why should product care about the system architecture?

Page 47: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Share document with other users

Page 48: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Document owner is part of a test that enables a new video component

Page 49: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

What will the other user experience when editing a shared document?

Owner Friend

Page 50: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Assignment may be different than owner’s

Owner (B) Friend (A)

Page 51: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Enable features by existing content

Enable features by document owner’s assignment

Exclude experimental features from shared documents

Possible solutions
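The three options can be compared side by side as small policies. This is only a sketch of how each rule might be expressed; the `Document` type, the function names, and the use of "B" for the group with the video component are assumptions.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Document:
    owner_group: str                      # owner's assignment: "A" or "B"
    components: Set[str] = field(default_factory=set)
    is_shared: bool = False


def by_existing_content(doc: Document, editor_group: str) -> bool:
    # Enable the feature if the document already uses it, whoever is editing.
    return "video" in doc.components or editor_group == "B"


def by_owner_assignment(doc: Document, editor_group: str) -> bool:
    # Every editor follows the owner's group for this document.
    return doc.owner_group == "B"


def exclude_from_shared(doc: Document, editor_group: str) -> bool:
    # Experimental features are never available on shared documents.
    return editor_group == "B" and not doc.is_shared


# Owner is in B and added a video; a friend in A opens the shared document.
doc = Document(owner_group="B", components={"video"}, is_shared=True)
for policy in (by_existing_content, by_owner_assignment, exclude_from_shared):
    print(policy.__name__, policy(doc, "A"))
```

Under the first two policies the friend can work with the video component; under the third, neither the friend nor the owner gets it on a shared document.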

Page 52: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

A/B testing introduces complexity

Page 53: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Petri is more than just an A/B test framework

Feature toggle

A/B Test

Personalization

Internal testing

Continuous deployment

Jira integration

Experiments

Dynamic configuration

QA

Automated testing

Page 55: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Q&A

Aviran Mordo

Head of Back-end Engineering

@aviranm

http://www.linkedin.com/in/aviran

http://www.aviransplace.com

Talya Gendler

Back-end Team Leader

www.linkedin.com/in/talyagendler

https://github.com/wix/petri

http://goo.gl/L7pHnd

Page 56: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Credits

http://upload.wikimedia.org/wikipedia/commons/b/b2/Fiber_optics_testing.jpg

http://goo.gl/nEiepT

https://www.flickr.com/photos/ilo_oli/2421536836

https://www.flickr.com/photos/dexxus/5791228117

http://goo.gl/SdeJ0o

https://www.flickr.com/photos/112923805@N05/15005456062

https://www.flickr.com/photos/wiertz/8537791164

https://www.flickr.com/photos/laenulfean/5943132296

https://www.flickr.com/photos/torek/3470257377

https://www.flickr.com/photos/i5design/5393934753

https://www.flickr.com/photos/argonavigo/5320119828

Page 57: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

Modeled experiment lifecycle

Open source (developed using TDD from day 1)

Running at scale on production

No deployment necessary

Both back-end and front-end experiments

Flexible architecture

Why Petri

Page 58: Experimenting on Humans - Advanced A/B Tests - QCon SF 2014

PETRI Server

Your app

Laboratory

DB Logs