howto serve 2500 ad requests / second


  • 8/8/2019 howto serve 2500 ad requests / second

    1/54


    Who am I ?

    Team lead of the Anúncios SAPO (text ads) team

    Team of 5 (+ me) software engineers, designing & developing the text ads platform


    How to serve VS How we serve

    Are we a role model?

    Are there recipes?

    Should there be a how-to?


    Summary

    Project Overview

    Full System Overview

    Serving Text Ads

    Latency & Speed

    Scaling on the Front

    Backend & Backend Services

    Availability, Reliability & Fault Tolerance

    Scalability Issues

    Monitoring & Logging

    Programming Languages & Technologies


    Anúncios SAPO

    Text Advertisement System

    Search Engines:

    BING ( http://www.bing.pt )

    SAPO's Search ( http://sl.pt )

    SAPO & Partners' Content Network:

    ~200 web sites

    Impresa, Publico, Cofina, IOL, Controlinveste

    Peaks at 90M ad requests per day, 2500/sec


    Serving

    Text Ads


    Concepts

    Pay Per Click Business Model

    CPC - Cost Per Click

    QPS - Queries Per Second

    CTR - Click Through Rate (clicks / impressions)


    Serving text-ads ..

    Major features:

    choose & serve ads

    register requests, impressions, clicks, conversions

    keep user budgets up to date

    quickly reflect changes in the ad pool


    Serving text-ads ...

    elect_get_ads() {
        if (words)
            ads = get_ads_keywords()
        else {
            if (crawled_site)
                ads = get_ads_crawler()
            else
                ads = get_ads_site_keywords()
        }
        site_ads = get_ads_site_targeting()
        merge_ads(ads, site_ads)
    }
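The selection logic above could be sketched in Python roughly as follows. This is only an illustration: the four `get_ads_*` lookups are stubbed with made-up data, and the parameters (`site`, `words`, `crawled_site`) and the deduplicating merge rule are assumptions, not the production behaviour.

```python
def get_ads_keywords(words):
    # Stub: ads matching the search keywords (hypothetical data).
    return [f"kw-ad:{w}" for w in words]

def get_ads_crawler(site):
    # Stub: ads matching keywords the crawler extracted for this site.
    return [f"crawl-ad:{site}"]

def get_ads_site_keywords(site):
    # Stub: ads matching keywords configured for a site that was not crawled.
    return [f"sitekw-ad:{site}"]

def get_ads_site_targeting(site):
    # Stub: ads that target the site directly.
    return [f"target-ad:{site}"]

def merge_ads(ads, site_ads):
    # Assumed merge rule: keep order, drop duplicates.
    return list(dict.fromkeys(ads + site_ads))

def elect_get_ads(site, words=None, crawled_site=False):
    if words:
        ads = get_ads_keywords(words)
    elif crawled_site:
        ads = get_ads_crawler(site)
    else:
        ads = get_ads_site_keywords(site)
    site_ads = get_ads_site_targeting(site)
    return merge_ads(ads, site_ads)
```

Keyword ads take priority when the request carries search words; site-targeted ads are merged in every case.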


    Serving text-ads ...

    Election of ads:

    requires index

    ads['word'] -> [ad1, ad2, ad3..]

    ads ordered by: 'score' -> f(CTR, CPC, quality)

    Auction based on Generalized Second-Price Auction
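A generalized second-price auction over a scored ad list could look like the sketch below. The slides only say the score is some f(CTR, CPC, quality); here it is assumed to be `max_cpc * ctr`, every CTR is assumed to be greater than zero, and the `Bid` class, `reserve_cpc` and the rounding are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Bid:
    ad: str
    max_cpc: float   # advertiser's maximum cost per click
    ctr: float       # estimated click-through rate (clicks / impressions)

def score(bid):
    # Assumed scoring function: expected revenue per impression.
    return bid.max_cpc * bid.ctr

def run_gsp_auction(bids, slots, reserve_cpc=0.01):
    """Return [(ad, price_per_click)] for the top `slots` ads."""
    ranked = sorted(bids, key=score, reverse=True)
    results = []
    for i, bid in enumerate(ranked[:slots]):
        if i + 1 < len(ranked):
            # Pay just enough to match the score of the ad ranked below.
            price = score(ranked[i + 1]) / bid.ctr
        else:
            # Nobody ranked below: pay the reserve price.
            price = reserve_cpc
        results.append((bid.ad, round(max(price, reserve_cpc), 4)))
    return results
```

Each winner pays the smallest bid that would still keep it ahead of the ad ranked just below it, so the price never exceeds its own maximum CPC.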


    Serving text-ads..

    Other essential features:

    contextualization of webpages/sites

    blacklisting of ads per site

    reporting information

    scoring quality of ads

    anti-fraud systems/fraud detectors


    LATENCY & SPEED

    .. and their importance


    Latency

    Low latency is required:

    Search pages (BING, SAPO, ..) have to:

    search ads (that's us!)

    search results

    and merge the results together.

    Ads are added last: site developers put the ad-request code at the end of the page (usually the last thing to load)


    Latency

    Without good latency

    ads are slow to appear and users have moved on...


    Latency

    Slow ads -> Low CTR -> BAD!

    Latency has a BIG impact on REVENUE.


    Latency Service Level Agreement

    99.9% of requests under: 150 milliseconds

    Average response time is: 20 milliseconds

    Never take more than 1 second; serve blank ads in that case
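The "never take more than 1 second" rule can be sketched as a hard deadline around ad selection. This is an illustration, not the platform's actual frontend code: `serve_with_deadline`, `BLANK_AD` and the thread-pool size are hypothetical names and values.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

BLANK_AD = {"ads": []}                      # hypothetical "blank ad" response
_pool = ThreadPoolExecutor(max_workers=8)   # assumed pool size

def serve_with_deadline(select_ads, request, deadline_s=1.0):
    """Run select_ads(request); fall back to a blank ad after deadline_s."""
    future = _pool.submit(select_ads, request)
    try:
        return future.result(timeout=deadline_s)
    except FutureTimeout:
        # Best effort only: a task that is already running keeps running.
        future.cancel()
        return BLANK_AD
```

The caller always gets an answer within the deadline, although the slow lookup may still finish in the background and have its result discarded.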


    How to keep low Latency ?

    Pre-computing everything is essential

    Fast contextualization lookup

    Handle lack of information gracefully (turning essential into optional)


    How to keep low Latency ?

    Decouple (and postpone) everything that isn't essential to serve ads

    .. such as DB writes & other side effects of serving ads.
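One way to picture this decoupling is a request path that only enqueues an event, with a background consumer applying the write later. In the sketch below an in-process `queue.Queue` stands in for a real message broker and a list stands in for the database; all names are hypothetical.

```python
import queue
import threading

events = queue.Queue()      # stand-in for a message broker
stored = []                 # stand-in for the database

def serve_ad(request_id):
    ad = f"ad-for-{request_id}"               # essential work only
    events.put(("impression", request_id))    # the side effect is postponed
    return ad

def writer():
    # Background consumer: applies the postponed writes.
    while True:
        event = events.get()
        if event is None:        # sentinel: stop the worker
            events.task_done()
            break
        stored.append(event)     # a real system would write to the DBMS here
        events.task_done()

worker = threading.Thread(target=writer, daemon=True)
worker.start()
```

The response time of `serve_ad` no longer depends on how fast the write can be applied, only on how fast the event can be queued.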


    Fast word lookups - LiveDB

    Fast word/site lookup (inverted index of ads)

    cache it in local RAM (memcached)

    'persistent' backing store is RAM
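The lookup path described here resembles a cache-aside pattern: ask the local cache first and query the backing index only on a miss. In this sketch a plain dict stands in for memcached and `backing_lookup` stands in for a LiveDB query; both are assumptions, and expiry is left out for brevity.

```python
class LiveDBCache:
    """Cache-aside lookup: try the cache first, then the backing index."""

    def __init__(self, backing_lookup):
        self._cache = {}                     # stand-in for memcached
        self._backing_lookup = backing_lookup
        self.misses = 0

    def ads_for(self, word):
        ads = self._cache.get(word)
        if ads is None:
            self.misses += 1
            ads = self._backing_lookup(word)   # e.g. a LiveDB query
            self._cache[word] = ads            # empty results are cached too
        return ads
```

A real deployment would also need an expiry policy so that changes in the ad pool become visible quickly.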


    Fast word lookups - LiveDB

    Offline creation of index: ads['word'] -> [ ad1, ad2, ad3, ad4, ... ]

    Lots of details: need to compute additional information for each tuple (word, ad, CPC):

    CTR, Evaluation Score
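A minimal sketch of such an offline index build is shown below. The input shapes (`bids` as (word, ad, CPC) tuples, `stats` as per-ad click and impression counts) and the score formula `cpc * ctr` are assumptions made for illustration; the slides do not specify the real evaluation score.

```python
from collections import defaultdict

def build_index(bids, stats):
    """bids: (word, ad, cpc) tuples; stats: ad -> (clicks, impressions)."""
    index = defaultdict(list)
    for word, ad, cpc in bids:
        clicks, impressions = stats.get(ad, (0, 0))
        ctr = clicks / impressions if impressions else 0.0
        entry = {"ad": ad, "cpc": cpc, "ctr": ctr, "score": cpc * ctr}
        index[word].append(entry)
    for entries in index.values():
        # Pre-sort so serving only has to read the head of the list.
        entries.sort(key=lambda e: e["score"], reverse=True)
    return dict(index)
```

Because everything is computed and sorted ahead of time, a request only needs one lookup per word.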


    Fast word lookups - LiveDB

    We chose MySQL for:

    fast 'inverted index' creation (by using stored procedures & replication)

    fast index lookup, based on MySQL's 'fame' for speed in simple workloads

    replication for free, using MySQL's master-slave replication


    Fast word lookups - LiveDB

    Workload is almost read-only (in fact, we can make it read-only with some tricks).

    Storage engines that are FAST for read-only workloads:

    MySQL MEMORY

    MySQL MyISAM

    Very, very similar

    MEMORY has more restrictions & limitations


    MySQL MEMORY

    Extremely fast lookup.

    data is guaranteed to be in RAM (or in swap..)

    Benchmarked the MySQL MEMORY engine: avg response time was around 10-20 msecs,

    .. within our needs!

    Constraints:

    There is a maximum table size in bytes

    VARCHAR is in fact.. CHAR()


    MySQL MyISAM

    .. After months in production use, the MEMORY engine proved problematic..

    Evaluated MyISAM and ran benchmarks:

    same speed, lower standard deviation.


    Speed

    Speed is .. ?

    Queries per second?

    Sequential or concurrent (throughput)?


    Speed!

    avg time is 20 msecs = 50 QPS

    but... it's a totally parallel workload. And most of the time is I/O wait on index lookup.

    1 server CPU can do ~6x this: ~300 QPS

    .. current servers: ~1200 QPS ..
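The arithmetic behind these figures can be written out as below. The 20 msecs, ~6x and ~1200 QPS values come from this slide, and the 2500 QPS peak from the project overview. The minimum frontend count is a derived illustration, not a number from the slides, and it ignores headroom and redundancy.

```python
import math

avg_latency_s = 0.020                 # 20 msecs average response time
sequential_qps = 1 / avg_latency_s    # one request at a time
per_cpu_qps = 6 * sequential_qps      # ~6x from overlapping I/O waits
per_server_qps = 1200                 # figure quoted for current servers
peak_qps = 2500                       # peak load from the project overview

# Derived, not from the slides: frontends needed just to absorb the peak.
min_frontends = math.ceil(peak_qps / per_server_qps)
print(sequential_qps, per_cpu_qps, min_frontends)
```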


    Scaling on the Front..

    We scale horizontally because:

    We can add more Frontends

    to handle more QPS

    We can add more LiveDB slaves

    to handle more SQL Queries


    Backend

    Message queueing system: SAPO BROKER


    Backend Services

    'compact' & apply operations to the DBMS

    runs anti-fraud system

    runs contextualizer & crawler

    runs semantic analyser

    runs reports & suggestions system


    Building the LiveDB

    MySQL is the ACID DBMS

    MySQL is the non-ACID LiveDB.

    Python & stored procedures create the LiveDB in a MySQL DBMS slave,

    MySQL replication pushes it to the read-only slaves


    Availability & Reliability

    (no downtime please..)


    Reliability & Fault Tolerance

    Almost every service/component is redundant.

    Currently there are only 2 single points of failure:

    Master DB server

    Master LiveDB server

    And even if BOTH FAIL, we keep serving ads...


    Reliability & Fault Tolerance

    Failure in Master LiveDB server:

    We have a hot spare

    Can change master in approx. 5 to 10 minutes

    Failure in Master DB:

    Data starts to pile up on backend services

    Backoffices are unable to operate (no new ads)

    If the failure lasts a long time, we might serve ads without budget

    Electing a new master is performed manually

    Has happened before, with no ad serving downtime.


    Scalability Issues

    We can currently scale horizontally in all but two components:

    Master DBMS MySQL server (but we are far from saturating it..)

    we currently don't plan to 'solve' this

    Master LiveDB server

    ...


    Scalability Issues

    Building LiveDB doesn't scale:

    We build a full new LiveDB every time

    It isn't distributed, nor is it easily made parallel

    Time is proportional to the number of active bids

    LiveDB should be updated only with recent changes in the ad pool.

    Impossible to do with the current main DB data model and with the current LiveDB design

    We are currently investing heavily in a solution to this: LiveCouchDB


    Monitoring & Logging

    (is everything okay?)


    Monitoring & Logging

    Bad things happen: log it, deal with it...

    We need to know about it:

    monitor logs

    trigger an alarm if there are errors in the log..


    Monitoring & Alarmistics

    frontend code failures

    intercept error

    serve blank ad

    log error

    trigger alarm
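The four steps for frontend code failures could be sketched as a wrapper around the request handler. The actual frontend is written in Perl, so this Python version is only an illustration; `safe_serve`, `trigger_alarm`, `BLANK_AD` and the in-memory `alarms` list are hypothetical stand-ins.

```python
import logging

log = logging.getLogger("adserver")
BLANK_AD = ""                      # hypothetical empty ad markup
alarms = []                        # stand-in for a real alerting channel

def trigger_alarm(message):
    alarms.append(message)

def safe_serve(handler, request):
    """Never let a frontend failure reach the page: serve a blank ad."""
    try:
        return handler(request)
    except Exception as exc:       # intercept any error in frontend code
        log.exception("ad request failed: %r", request)
        trigger_alarm(f"frontend failure: {exc}")
        return BLANK_AD
```

The page that embeds the ad never sees the failure; the operators do, through the log and the alarm.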


    Monitoring & Alarmistics

    network failures

    reconnect with exponential backoff

    log error

    trigger alarm ?
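Reconnecting with exponential backoff can be sketched as follows. The delay values, the attempt limit and the added jitter are illustrative assumptions; `connect` is any callable that raises `ConnectionError` on failure.

```python
import random
import time

def reconnect(connect, max_attempts=6, base_delay=0.1, max_delay=5.0,
              sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise                     # give up: let the caller log/alarm
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter avoids many clients retrying at the same instant.
            sleep(delay * random.uniform(0.5, 1.0))
```

Once the attempts are exhausted the exception propagates, which is a natural place to log the error and decide whether to trigger an alarm.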


    Monitoring & Alarmistics

    machine failures

    replication & redundancy

    save state to disk


    Monitoring & Alarmistics

    software bugs..

    bad (or lack of) data

    radio silence

    log error

    trigger alarm


    Programming Languages

    .. and software used


    Programming Languages

    Python (backend)

    Perl (frontend code)

    C (1 app only)

    Java (broker & reporting)

    PHP (backoffices)

    SQL

    JavaScript


    Software used

    Linux

    memcached

    MySQL

    squid

    nginx


    Currently Evaluating

    Languages:

    Hadoop PIG

    Technologies:

    Hadoop

    CouchDB


    Questions?