howto serve 2500 ad requests / second
Uploaded by miguel-mascarenhas-sousa-filipe
8/8/2019 howto serve 2500 ad requests / second
-
Who am I ?
Team lead of the Anúncios SAPO (text ads) team
Team of 5 (+ me) software engineers, designing & developing the text ads platform
-
How to serve VS How we serve
Are we a role-model?
Are there recipes ?
Should there be a how to?
-
Summary
Project Overview
Full System Overview
Serving Text Ads
Latency & Speed
Scaling on the Front
Backend & Backend Services
Availability, Reliability & Fault Tolerance
Scalability Issues
Monitoring & Logging
Programming Languages & Technologies
-
Anúncios SAPO
Text Advertisement System
Search Engines:
BING ( http://www.bing.pt )
SAPO's Search ( http://sl.pt )
SAPO & Partners' Content Network:
~200 web sites
Impresa, Publico, Cofina, IOL, Controlinveste
Peaks at 90M ad requests per day, 2500/sec
-
Serving Text Ads
-
Concepts
Pay Per Click Business Model
CPC - Cost Per Click
QPS - Queries Per Second
CTR - Click Through Rate (clicks / impressions)
-
Serving text-ads ..
Major features:
choose & serve ads
register requests, impressions, clicks, conversions
keep user budgets up to date
quickly reflect changes in the ad pool
-
Serving text-ads ...
elect_get_ads() {
    if (words)
        ads = get_ads_keywords()
    else {
        if (crawled_site)
            ads = get_ads_crawler()
        else
            ads = get_ads_site_keywords()
    }
    site_ads = get_ads_site_targeting()
    merge_ads(ads, site_ads)
}
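A minimal Python sketch of the selection flow above. The `get_ads_*` and `merge_ads` names come from the slide; their bodies are stubs with made-up data, and the request fields (`site`, `words`, `crawled_site`) and the de-duplicating merge are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class AdRequest:
    site: str
    words: list = field(default_factory=list)  # search keywords, if any
    crawled_site: bool = False  # True when the crawler already knows the site


# Stub lookups: stand-ins for the real index queries (hypothetical data).
def get_ads_keywords(req):
    return [f"kw:{w}" for w in req.words]


def get_ads_crawler(req):
    return [f"crawl:{req.site}"]


def get_ads_site_keywords(req):
    return [f"sitekw:{req.site}"]


def get_ads_site_targeting(req):
    return [f"target:{req.site}"]


def merge_ads(ads, site_ads):
    """Concatenate both lists, dropping duplicates while keeping order."""
    return list(dict.fromkeys(ads + site_ads))


def elect_get_ads(req):
    if req.words:
        ads = get_ads_keywords(req)       # search request: match the keywords
    elif req.crawled_site:
        ads = get_ads_crawler(req)        # content page the crawler has seen
    else:
        ads = get_ads_site_keywords(req)  # fallback: keywords set for the site
    return merge_ads(ads, get_ads_site_targeting(req))
```

Site-targeted ads are merged in on every path, which matches the slide: only the keyword source changes with the kind of request.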
-
Serving text-ads ...
Election of ads:
requires index
ads['word'] -> [ad1, ad2, ad3, ...]
ads ordered by 'score' -> f(CTR, CPC, quality)
Auction based on the Generalized Second-Price auction
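A sketch of how a quality-weighted generalized second-price auction over such an index could look. The slides do not give the actual form of f(CTR, CPC, quality); the plain product used here, the `reserve` price and the sample ads are all assumptions, so this is the textbook mechanism rather than the production one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    name: str
    cpc: float      # advertiser's maximum cost per click (the bid)
    ctr: float      # estimated click-through rate
    quality: float  # quality factor

    @property
    def weight(self):
        return self.ctr * self.quality

    @property
    def score(self):
        # Assumed form of f(CTR, CPC, quality): a plain product.
        return self.cpc * self.weight


def run_auction(index, word, slots, reserve=0.01):
    """Rank the ads indexed under `word`; return (ad, price per click) pairs."""
    ranked = sorted(index.get(word, []), key=lambda ad: ad.score, reverse=True)
    results = []
    for pos, ad in enumerate(ranked[:slots]):
        if pos + 1 < len(ranked):
            # Second-price rule: pay the smallest bid that still beats the next ad.
            price = ranked[pos + 1].score / ad.weight
        else:
            price = reserve  # nobody ranked below: pay the assumed reserve price
        results.append((ad, round(max(price, reserve), 4)))
    return results


index = {
    "hotel": [
        Ad("a", cpc=0.50, ctr=0.02, quality=1.0),
        Ad("b", cpc=0.40, ctr=0.04, quality=1.0),
        Ad("c", cpc=0.30, ctr=0.01, quality=1.0),
    ]
}
winners = run_auction(index, "hotel", slots=2)
```

In this example ad "b" wins the first slot despite bidding less than "a", because its higher CTR gives it the higher score. Each winner is charged based on the score of the ad ranked just below it, not on its own bid.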
-
Serving text-ads..
Other essential features:
contextualization of webpages/sites
blacklisting of ads per site
reporting information
scoring quality of ads
anti-fraud systems/fraud detectors
-
LATENCY & SPEED
.. and their importance
-
Latency
Low latency is required:
Search pages (BING, SAPO, ...) have to:
search ads (that's us!)
search results
and merge the results together.
Ads are added last: site developers put the ad-request code at the end of the page (usually the last thing to load).
-
Latency
Without good latency,
ads are slow to appear and users have moved on...
-
Latency
Slow ads -> Low CTR -> BAD!
Latency has a BIG impact on REVENUE.
-
Latency Service Level Agreement
99.9% of requests under: 150 milliseconds
Average response time: 20 milliseconds
Never take more than 1 second; serve blank ads in that case.
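One way to enforce the "never more than 1 second" rule is a hard deadline around ad selection. The sketch below assumes a thread pool and an empty string as the blank ad; `serve_with_deadline`, `BLANK_AD` and the pool size are hypothetical names and values, not taken from the slides.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

BLANK_AD = ""  # assumed representation of a blank ad
_pool = ThreadPoolExecutor(max_workers=8)


def serve_with_deadline(select_ads, request, deadline_s=1.0):
    """Return select_ads(request), or a blank ad if it misses the deadline."""
    future = _pool.submit(select_ads, request)
    try:
        return future.result(timeout=deadline_s)
    except TimeoutError:
        future.cancel()  # best effort: a task that is already running keeps going
        return BLANK_AD


def fast(request):
    return "ad"


def slow(request):
    time.sleep(0.2)  # simulates a stalled index lookup
    return "late ad"
```

With a 50 ms deadline, `serve_with_deadline(slow, {}, deadline_s=0.05)` returns the blank ad while `fast` is served normally. The page gets a bounded response time at the cost of an empty ad slot.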
-
How to keep low Latency ?
Pre-computing everything is essential
Fast contextualization lookup
Handle lack of information gracefully (turning essential into optional)
-
How to keep low Latency ?
Decouple (and postpone) everything that isn't essential to serve ads
... such as DB writes & other side effects of serving ads.
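The idea can be sketched with an in-process queue: the request path only enqueues an event, and a background worker applies it later. Here `queue.Queue` stands in for a message broker and the `stored` list for the database; both, along with the event layout, are assumptions for illustration.

```python
import queue
import threading

events = queue.Queue()  # stand-in for a message broker
stored = []             # stand-in for the database


def serve_ad(request_id):
    ad = f"ad-for-{request_id}"                 # the only work on the critical path
    events.put(("impression", request_id, ad))  # side effect is deferred
    return ad


def writer():
    """Background consumer: applies queued events to the store."""
    while True:
        event = events.get()
        if event is None:     # sentinel: stop the worker
            events.task_done()
            break
        stored.append(event)  # a real system would batch DB writes here
        events.task_done()


worker = threading.Thread(target=writer, daemon=True)
worker.start()

served = [serve_ad(i) for i in range(3)]
events.put(None)
events.join()  # wait until the worker has drained the queue
```

The response is returned before anything is written, so a slow or unavailable store delays the bookkeeping but not the ad.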
-
Fast word lookups - LiveDB
Fast word/site lookup (inverted index of ads)
cache it in local RAM (memcached)
'persistent' backing store is RAM
-
Fast word lookups - LiveDB
Offline creation of index:
ads['word'] -> [ ad1, ad2, ad3, ad4, ... ]
Lots of details: we need to compute additional information for each tuple (word, ad, CPC):
CTR, Evaluation Score
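A small sketch of such an offline build, assuming the input is a list of (word, ad, CPC) tuples plus per-ad click and impression counts. The `score = ctr * cpc` formula and the field names are placeholders; the slides do not state how the evaluation score is computed.

```python
from collections import defaultdict


def build_index(bids, stats):
    """bids: iterable of (word, ad, cpc); stats: ad -> (clicks, impressions)."""
    index = defaultdict(list)
    for word, ad, cpc in bids:
        clicks, impressions = stats.get(ad, (0, 0))
        ctr = clicks / impressions if impressions else 0.0
        score = ctr * cpc  # assumed scoring; the real formula is not given
        index[word].append({"ad": ad, "cpc": cpc, "ctr": ctr, "score": score})
    for entries in index.values():
        entries.sort(key=lambda e: e["score"], reverse=True)  # best ad first
    return dict(index)


bids = [("hotel", "ad1", 0.50), ("hotel", "ad2", 0.40), ("car", "ad1", 0.20)]
stats = {"ad1": (10, 1000), "ad2": (30, 1000)}
index = build_index(bids, stats)
```

Because each list is sorted at build time, serving a request only needs one lookup per word and a slice of the first few entries.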
-
Fast word lookups - LiveDB
We chose MySQL for:
fast 'inverted index' creation (using stored procedures & replication)
fast index lookups, based on MySQL's reputation for speed in simple workloads
replication for free, using MySQL's master-slave replication
-
Fast word lookups - LiveDB
Workload is almost read-only.
(in fact, we can make it read-only with some tricks)
Storage engines FAST for read-only workloads:
MySQL MEMORY
MySQL MyISAM
Very, very similar
MEMORY has more restrictions & limitations
-
MySQL MEMORY
Extremely fast lookup.
data is guaranteed to be in RAM (or in swap..)
Benchmarked the MySQL MEMORY engine:
avg response time was around 10-20 msecs, within our needs!
Constraints:
There is a maximum table size in bytes
VARCHAR is in fact stored as CHAR() (fixed-length rows)
-
MySQL MyISAM
After months of production use, the MEMORY engine proved problematic.
We evaluated MyISAM and ran benchmarks:
same speed, lower standard deviation.
-
Speed
Speed is .. ?
Queries per second ?
Sequential or concurrent (throughput)?
-
Speed!
avg time is 20 msecs = 50 QPS (if served one at a time)
but... it's a totally parallel workload, and most of the time is I/O wait on the index lookup.
1 server CPU can do ~6x this: ~300 QPS
current servers: ~1200 QPS
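The arithmetic behind these figures, worked through in Python. The latency, the ~6x factor, the ~1200 QPS per server and the 2500 QPS peak are the slide's numbers; the server count and the in-flight estimate (Little's law) are derived here and ignore any headroom or redundancy the real deployment keeps.

```python
import math

avg_latency_s = 0.020                # 20 ms average response time
sequential_qps = 1 / avg_latency_s   # one request at a time -> 50 QPS
per_cpu_qps = 6 * sequential_qps     # ~6x by overlapping I/O waits -> ~300 QPS
per_server_qps = 1200                # figure quoted for current servers
peak_qps = 2500                      # peak load quoted for the system

# Minimum number of frontends at peak, with no headroom at all.
servers_at_peak = math.ceil(peak_qps / per_server_qps)

# Little's law: requests in flight = arrival rate * time in system.
in_flight_at_peak = peak_qps * avg_latency_s
```

On these numbers the peak needs at least 3 servers, and only about 50 requests are in flight across the whole system at any instant.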
-
Scaling on the Front..
We scale horizontally because:
We can add more Frontends
to handle more QPS
We can add more LiveDB slaves
to handle more SQL Queries
-
Backend
Message queueing system: SAPO BROKER
-
Backend Services
'compact' & apply operations to the DBMS
runs anti-fraud system
runs contextualizer & crawler
runs semantic analyser
runs reports & suggestions system
-
Building the LiveDB
MySQL is the ACID DBMS.
MySQL is also the non-ACID LiveDB.
Python & stored procedures create the LiveDB in a MySQL DBMS slave;
MySQL replication pushes it to the read-only slaves.
-
Availability & Reliability
(no downtime please..)
-
Reliability & Fault Tolerance
Almost every service/component is redundant.
Currently there are only 2 single points of failure:
Master DB server
Master LiveDB server
And even if BOTH FAIL, we keep serving ads...
-
Reliability & Fault Tolerance
Failure in Master LiveDB server:
We have a hot spare
Can change master in approx. 5 to 10 minutes
Failure in Master DB:
Data starts to pile up on Backend services
Backoffices are unable to operate (no new ads)
If the failure lasts a long time, we might serve ads without budget
Electing a new Master is performed manually
Has happened before, with no ad serving downtime.
-
Scalability Issues
We can currently scale horizontally in all but two components:
Master DBMS MySQL server (but we are far from saturating it...)
we currently don't plan to 'solve' this
Master LiveDB server
...
-
Scalability Issues
Building LiveDB doesn't scale:
We build a full new LiveDB every time
It isn't distributed, nor is it easily made parallel
Build time is proportional to the number of active bids
LiveDB should be updated only with recent changes in the ad pool.
Impossible to do with the current main DB data model and the current LiveDB design
We are currently investing heavily in a solution to this: LiveCouchDB
-
Monitoring & Logging
(is everything okay?)
-
Monitoring & Logging
Bad things happen:
log it, deal with it...
We need to know about it:
monitor logs
trigger an alarm if there are errors in the log
-
Monitoring & Alarmistics
frontend code failures
intercept error
serve blank ad
log error
trigger alarm
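These four steps map naturally onto a wrapper around ad selection. In the sketch below, `safe_serve`, `trigger_alarm`, the `alarms` list and the empty-string blank ad are hypothetical stand-ins; the real frontend is written in Perl and its alerting channel is not described in the slides.

```python
import logging

log = logging.getLogger("frontend")
BLANK_AD = ""  # assumed representation of a blank ad
alarms = []    # stand-in for a real alerting channel


def trigger_alarm(message):
    alarms.append(message)


def safe_serve(select_ads, request):
    """Never let a frontend failure reach the page: fall back to a blank ad."""
    try:
        return select_ads(request)
    except Exception as exc:                  # intercept error
        log.exception("ad selection failed for %r", request)  # log error
        trigger_alarm(f"frontend error: {exc}")               # trigger alarm
        return BLANK_AD                       # serve blank ad


def broken(request):
    raise KeyError("missing index")
```

Calling `safe_serve(broken, {"q": "x"})` returns the blank ad and records one alarm, while a healthy selector is passed through untouched.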
-
Monitoring & Alarmistics
network failures
reconnect with exponential backoff
log error
trigger alarm ?
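A minimal sketch of reconnecting with exponential backoff. The base delay, cap, attempt limit and jitter are illustrative values chosen here, not the system's settings; `connect` is any callable that raises `ConnectionError` on failure, and `sleep` is injectable so the function can be exercised without waiting.

```python
import logging
import random
import time

log = logging.getLogger("adserver")


def connect_with_backoff(connect, max_attempts=6, base_s=0.1, cap_s=5.0,
                         sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError as exc:
            delay = min(cap_s, base_s * 2 ** attempt)
            delay *= random.uniform(0.5, 1.0)  # jitter: avoid synchronized retries
            log.error("connect failed (attempt %d): %s; retrying in %.2fs",
                      attempt + 1, exc, delay)
            sleep(delay)
    # Out of attempts: this is the point where an alarm would be raised.
    raise ConnectionError(f"gave up after {max_attempts} attempts")
```

Every failure is logged, but only the final give-up is escalated. That is one possible answer to the slide's "trigger alarm ?": short network blips heal on their own and need no alarm.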
-
Monitoring & Alarmistics
machine failures
replication & redundancy
save state to disk
-
Monitoring & Alarmistics
software bugs..
bad (or lack of) data
radio silence
log error
trigger alarm
-
Programming Languages
.. and software used
-
Programming Languages
Python (backend)
Perl (frontend code)
C (1 app only)
Java (broker & reporting)
PHP (backoffices)
SQL
JavaScript
-
Software used
Linux
memcached
MySQL
squid
nginx
-
Currently Evaluating
Languages:
Hadoop PIG
Technologies:
Hadoop
CouchDB
-
Questions?