Ad serving kdd2008_2



Wu, G. and Kitts, B. (2008), Experimental comparison of scalable online ad serving, Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2008), pp. 1008-1015. [PPT Presentation]

Transcript of Ad serving kdd2008_2

Page 1: Ad serving kdd2008_2

KDD2008

April 10, 2010

Scalable Ad Serving

Human Relevance Team

Pascale Queva

Woohee Kwak

Gang Wu and Brendan Kitts

http://team/sites/Broadmatch

Revenue Team

Shuzhen Nong

Paul Clark

Jeremy Tantrum

Test

Binu John

Harish Krishnan

Martin Markov

Gong Cheng

Deepika Othuluru Sharat

Development

Hung Nguyen

Ashok Madala

Gang Wu

Program Management

Gang Wu

Brendan Kitts

Brian Burdick

Algorithms

Ewa Dominowska

Shuzhen Nong

Susan Dumais

Donald Metzler

Chris Meek

Max Chickering

Jesper Lind

Abhinai Srivastava

Gang Wu

Hua Li

Jian Hu

Hua-Jun Zeng

Zheng Chen

Jody Biggs

Bo Thiesson

Kathy Dai

Silviu-Petru Cucerzan

Robert Ragno

Page 2: Ad serving kdd2008_2

The Ad Serving Problem

Banner Advertising

Ad Response

Trigger: Pageview from User

Page 3: Ad serving kdd2008_2

The Ad Serving Problem

Trigger: Pageview from User

Ad Response

Paid Search Advertising

Page 4: Ad serving kdd2008_2

The Ad Serving Problem: Technical

Challenge: to do this at Scale!

• Problem: Given any Trigger, respond with an Ad that maximizes Revenue…

• Scale: For a simple Bayesian or codebook method, Scale = Triggers × Ads

· 5 million × 9 million = 45 trillion possible pairs to evaluate for suitability

• Speed: Ad serving should be completed in around 50 milliseconds.

· Can't store 45 trillion pairs in memory.

• Ad Serving Algorithm: Maintain a code-book of triggers and the ads that should be presented, using a hash for rapid serving. Distribute the hash across machines.

• Data mining problem: Come up with a good code-book to use the precious memory resource.
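The serving side described above can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the machine count, entry names, and sharding scheme are assumptions. It shards the code-book across machines by hashing the trigger, so a lookup at serve time is one shard routing step plus one hash-table probe. A stable digest (here `md5`) is used rather than Python's built-in `hash()`, since the latter varies across processes.

```python
# Illustrative sketch: a trigger -> ads code-book, sharded across machines
# by hashing the trigger. NUM_MACHINES and the entries are hypothetical.
import hashlib

NUM_MACHINES = 8

def shard_of(trigger: str) -> int:
    """Pick which machine holds this trigger's code-book entries."""
    digest = hashlib.md5(trigger.encode()).hexdigest()
    return int(digest, 16) % NUM_MACHINES

# Each shard is a plain hash table: trigger -> list of ad ids to serve.
shards = [{} for _ in range(NUM_MACHINES)]

def add_entry(trigger: str, ad_id: str) -> None:
    """Offline: add one (trigger, ad) pair chosen by the data mining step."""
    shards[shard_of(trigger)].setdefault(trigger, []).append(ad_id)

def serve(trigger: str):
    """Online: route to the shard, then O(1) hash lookup."""
    return shards[shard_of(trigger)].get(trigger, [])

add_entry("shoes", "ad_nike_sneakers")
print(serve("shoes"))  # → ['ad_nike_sneakers']
```

Because each trigger's entries live on exactly one shard, memory scales with the number of code-book pairs actually kept, not with the 45 trillion possible pairs.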

Page 5: Ad serving kdd2008_2

The Ad Serving Problem: Definition

• Given any Trigger, respond with an Ad that maximizes Revenue subject to some constraints

• Constraints include:

· Relevance: CTR > x

· Storage limit: Number of code-book pairs < N

· Frequency capping

· Sequence constraints

· Competitive exclusion

· Mainline Reserve constraints

· And lots more

• Let's have a look at Revenue….

Rev vs Rel

Page 6: Ad serving kdd2008_2

Revenue in the Ad Business

Revenue = Σ_{k,t} I_{k,t} · r_{k,t} · c_{k,t}

Should we serve the ad? I_{k,t} (0 or 1) × Revenue per action r_{k,t} × Probability of action c_{k,t}

Page 7: Ad serving kdd2008_2

Probability of Action (CTR)

Global CTR = Pr(k): CTR of the advertisement without conditioning, i.e. the popularity of the advertisement.

Conditional CTR = Pr(k|t): CTR of the advertisement conditional upon the trigger, i.e. basic historical performance.

Smoothed CTR: smoothly vary between the two.

Feature-based model (dtree, linear regression, etc.): the disadvantage is that this requires some knowledge of the ads.

Revenue = Σ_{k,t} I_{k,t} · r_{k,t} · c_{k,t}

Ad Serving 101
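One common way to "smoothly vary between the two" is shrinkage: weight the conditional CTR by how much history the trigger-ad pair has, and fall back to the global CTR otherwise. The slides do not give the formula, so this is a sketch under that assumption; the pseudo-count `alpha` and all numbers are made up.

```python
# Illustrative smoothed-CTR estimate (assumed form, not from the paper):
# shrink the trigger-conditional CTR toward the ad's global CTR when the
# pair has little history. `alpha` acts as a count of pseudo-impressions.

def smoothed_ctr(clicks: int, impressions: int,
                 global_ctr: float, alpha: float = 100.0) -> float:
    """Blend Pr(k|t) = clicks/impressions with Pr(k) = global_ctr.

    With few impressions the estimate stays near global_ctr; with lots
    of history it approaches the raw conditional CTR.
    """
    return (clicks + alpha * global_ctr) / (impressions + alpha)

# Sparse history: the estimate stays close to the 2% global CTR.
print(smoothed_ctr(clicks=1, impressions=10, global_ctr=0.02))
# Rich history: the estimate approaches the 10% conditional CTR.
print(smoothed_ctr(clicks=1000, impressions=10000, global_ctr=0.02))
```

This matches the qualitative behavior on the next slide: conditional CTR "peters out because we lack data", and smoothing hands those sparse pairs back to the global estimate.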

Page 8: Ad serving kdd2008_2

CTR Prediction Accuracy

[Figure: CTR prediction accuracy, comparing globalctr, history, smoothed, linearreg, and dtree (the latter two being feature-based methods); both axes run 0 to 1.]

Global CTR does surprisingly well…..

Conditional CTR does well but peters out because we lack data.

One could go model-less, at least for the top 15% of data as measured by conditional probabilities, and generate fairly good results. Ad servers purportedly use this technique….

Page 9: Ad serving kdd2008_2

Revenue in the Ad Business

Revenue = Σ_{k,t} I_{k,t} · r_{k,t} · c_{k,t}

Should we serve the ad? I_{k,t} (0 or 1) × Revenue per action r_{k,t} × Probability of action c_{k,t}

Page 10: Ad serving kdd2008_2

Ad Serving: Solution

• Greedy optimization:

· Add the I_{k,t} to the code-book that have the highest expected revenue (meaning probability of action × payout for action).

· Add while constraints are met (the constraints listed earlier).

Revenue = Σ_{k,t} I_{k,t} · r_{k,t} · c_{k,t}

For each trigger-ad pair, compute d_{k,t} = I_{k,t} · r_{k,t} · c_{k,t}, sort, and pick the highest E[Revenue] pairs up to the capacity constraint.

[Figure: greedy allocation of trigger-ads to the ad server. Predicted CTR (log scale, 10^-7 to 10^-1) versus number of trigger-ads being served (0 to 3×10^5).]
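The greedy step above can be sketched in a few lines. This is a simplified illustration, not the production system: the candidate tuples are invented, and the `min_ctr` floor stands in for the "CTR > x" relevance constraint.

```python
# Illustrative sketch of greedy code-book construction: score every candidate
# (trigger, ad) pair by expected revenue per display c * r, sort descending,
# and admit pairs until the storage capacity N is reached. All data invented.

def build_codebook(candidates, capacity, min_ctr=0.001):
    """candidates: iterable of (trigger, ad, ctr, revenue_per_action) tuples.

    Returns a dict trigger -> [ads], holding at most `capacity` pairs.
    """
    # Expected revenue per display for each pair: c_{k,t} * r_{k,t}.
    scored = sorted(candidates, key=lambda c: c[2] * c[3], reverse=True)
    codebook, n_pairs = {}, 0
    for trigger, ad, ctr, rev in scored:
        if n_pairs >= capacity:   # storage limit: number of pairs < N
            break
        if ctr < min_ctr:         # relevance constraint: CTR > x
            continue
        codebook.setdefault(trigger, []).append(ad)
        n_pairs += 1
    return codebook

pairs = [
    ("shoes", "ad1", 0.05, 0.40),    # E[rev/display] = 0.020
    ("shoes", "ad2", 0.02, 0.50),    # 0.010
    ("flights", "ad3", 0.04, 0.30),  # 0.012
    ("flights", "ad4", 0.0005, 9.0), # high payout but fails the CTR floor
]
print(build_codebook(pairs, capacity=2))
# → {'shoes': ['ad1'], 'flights': ['ad3']}
```

The sort is the expensive offline part; serving stays a cheap hash lookup into the resulting code-book.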

Page 11: Ad serving kdd2008_2

Some curious things about maximizing revenue….

Page 12: Ad serving kdd2008_2

Some curious things about maximizing revenue….

[Figure: Revenue per display versus global CTR of expansion. Each knot is a decile of the trigger-ad population.]

Property noted by Jensen and other authors: a tendency for relevance to be correlated with revenue. Advertisers have to be highly relevant to offer to pay such high prices, since otherwise they pay for lots of non-converting clicks.

Hey, what happened!?!? Might advertisers with poor CTRs be trying to make up for it by increasing their bid price?

Page 13: Ad serving kdd2008_2

Ad Serving Application

• Use a lookup table to map a trigger to a keyword-tagged advertisement. When a user types in "shoes", map it to the keyword-tagged advertisement "nike sneakers" (for example).

• The keyword tags and ad creatives are entered by the advertiser.

• We can choose whether to add a code-book entry or leave it out.

Page 14: Ad serving kdd2008_2

Building it will be a piece of cake…. Not really!

• 1 year to launch

• 55 algorithms tested from 10 teams! Turned into a competition

• Unexpected challenges, including porn, trademarks, bad expansions, editorial policy, and adoption and acceptance by internal teams

Page 15: Ad serving kdd2008_2

Results

• Implemented on Live.com search engine paid advertisements.

• Data for 4 months analyzed in this paper, although the system has been running for the past two years.

• 3 billion impressions

• Experimental test setup:

· Test split randomly on Live search traffic

· Control = Basic Ad Serving Algorithm

· Experimental = Optimized Ad Serving Algorithm

• Positive on all metrics, including advertiser value, searcher value, and adCenter performance, but required some work to achieve this

Page 16: Ad serving kdd2008_2

Algorithms which are positive on both CTR and RPS, Oct-Nov 2006

[Figure: scatter plot of algorithms by RPS % versus CTR % (scale removed), both axes 0.0% to 3.5%, each point labeled with an algorithm id.]

Page 17: Ad serving kdd2008_2

Ad Serving Revenue vs Control

[Figure: Smart vs Control, May 2007 - Jan 2008 (scale removed): RPS % = 5.7%, RPBS % = 4.1%, CTR % (CPBS) = 0.7%.]

Page 18: Ad serving kdd2008_2

Ad Serving Revenue vs Control

[Figure: Smartmatch Revenue vs Control (scale removed), y-axis $0 to $30,000,000.]

Page 19: Ad serving kdd2008_2

Algorithms in Public Domain

Alg 14 and Alg 24:
Jidong Wang, Hua-Jun Zeng, Zheng Chen, Hongjun Lu, Li Tao, Wei-Ying Ma. ReCoM: Reinforcement Clustering of Multi-Type Interrelated Data Objects. In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR'03), pp. 274-281, Toronto, Canada, July 2003. http://team/sites/Broadmatch/Shared%20Documents/p16477-wang.pdf

Alg 11:
Donald Metzler, Susan Dumais, Chris Meek (2006), Similarity Measures for Short Segments of Text, preprint. http://team/sites/Broadmatch/Shared%20Documents/MetzlerDumaisMeekECIR07-Final.doc

Page 20: Ad serving kdd2008_2

Conclusion

• Greedy optimization method for maximizing Revenue or CTR.

• Used very simple features, e.g. CTR and Conditional CTR, as well as more complex ones we haven't discussed.

• Running live, at scale (7% of US traffic), with control groups.

• Revenue and relevance are generally correlated (as noted by Jensen and other authors), but very high revenue is not correlated with relevance: an inverted "U"-shaped function! Hypothesis: high-revenue advertisers may be compensating for poor CTR by boosting their prices as high as possible.

• Conditional CTR and Global CTR are effective methods for predicting ad performance. They also avoid training.

• Feature-based prediction is most effective.