Efsf2012 Whatsapp Scaling

Transcript of Efsf2012 Whatsapp Scaling

  • 8/10/2019 Efsf2012 Whatsapp Scaling

    Scaling to Millions of Simultaneous Connections

    Rick Reed

    WhatsApp

    Erlang Factory SF

    March 30, 2012

    About ...

    Joined WhatsApp in 2011

    New to Erlang

    Background in performance of C-based systems on FreeBSD and Linux

    Prior work at Yahoo!, SGI

    Overview

    The good problem to have

    Performance Goals

    Tools and Techniques

    Results

    General Findings

    Specific Scalability Fixes

    The Problem

    A good problem, but a problem nonetheless

    Growth, Earthquakes, and Soccer!

    [Chart: msg rates for past four weeks, annotated with "Mexican earthquake", "goals", "HT", "FT"]

    Performance Goals

    1 Million connections per server ...!

    Resilience against disruptions under load

    Software failures

    Hardware failures (servers, network gear)

    World events (sports, earthquakes, etc.)

    Tools and Techniques

    System activity monitoring (sar)

    OS-level

    BEAM

    Tools and Techniques

    Processor hardware perf counters (pmcstat)

    dtrace, kernel lock-counting, gprof

    Tools and Techniques

    fprof (w/ and w/o cpu_timestamp)

    BEAM lock-counting (invaluable!!!)
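A minimal sketch of running fprof with and without CPU-time timestamps, using only the stock OTP `fprof` API. The module name `prof_sketch`, the function `run/2`, and the output file name are hypothetical; the talk's own `cpu_timestamp` setup for FreeBSD is not shown in the slides, and the `cpu_time` trace option may be unsupported on some platforms.

```erlang
-module(prof_sketch).
-export([run/2]).

%% Profile a zero-arity Fun with fprof.
%% UseCpuTime = true asks for CPU-time timestamps (may be unsupported);
%% UseCpuTime = false uses the default wall-clock timestamps.
run(Fun, UseCpuTime) when is_function(Fun, 0) ->
    Opts = case UseCpuTime of
               true  -> [cpu_time];
               false -> []
           end,
    %% Trace the call; the trace is written to the default "fprof.trace".
    fprof:apply(Fun, [], Opts),
    %% Turn the raw trace into profiling data held by the fprof server.
    ok = fprof:profile(),
    %% Write the analysis to a file (hypothetical name) instead of stdout.
    fprof:analyse([{dest, "fprof.analysis"}]).
```

Comparing the two analyses for the same workload gives a rough idea of how much time is spent waiting rather than executing.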

    Tools and Techniques

    Synthetic workload

    Good for subsystems with simple interfaces

    Limited value for user-facing systems

    Tools and Techniques

    Tee'd workload

    Where side-effects can be contained

    Extremely useful for tuning

    Tools and Techniques

    Diverted workload

    Add additional production load to server

    DNS via extra IP aliases

    TTL issues

    IPFW forwarding

    Ran into a few kernel panics at high conn counts

    Results

    Initial bottlenecks appeared around 425K

    First round of fixes got us to 1M conns

    Fruit was hanging pretty low

    Results

    Continued attacking similar bottlenecks

    Achieved 2M conns about a month later

    Put further optimizations on back burner

    Results

    Began optimizing app code after New Years

    Unintentional record attempt in Feb

    Peaked at 2...

    Results

    Still trying to obtain elusive 3M conns

    St. Patrick's Day wasn't as lucky as hoped

    General Findings

    Contention, contention, contention

    From 200k to 2M were all contention fixes

    Some issues are internal to BEAM

    Some addressable with app changes

    Most required BEAM patches

    Some required app changes

    Especially: partitioning workload correctly

    Some common Erlang idioms come at a price
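As an illustration of what partitioning a workload can look like, the sketch below spreads keys over several independent ETS tables so that unrelated keys do not contend on one table. It is an assumption-laden example, not the talk's implementation: the module name `shard_tab` and its functions are hypothetical, and hashing with `erlang:phash2/2` is just one possible partitioning scheme.

```erlang
-module(shard_tab).
-export([new/1, store/3, lookup/2]).

%% Create N independent public ETS tables and return them as a tuple.
%% The tables are owned by the calling process and vanish when it exits.
new(N) when is_integer(N), N > 0 ->
    list_to_tuple([ets:new(shard, [set, public, {write_concurrency, true}])
                   || _ <- lists:seq(1, N)]).

%% Pick the table for Key. phash2/2 returns 0..N-1; tuples are 1-indexed.
shard(Shards, Key) ->
    element(erlang:phash2(Key, tuple_size(Shards)) + 1, Shards).

%% Insert or overwrite Key in its shard.
store(Shards, Key, Value) ->
    true = ets:insert(shard(Shards, Key), {Key, Value}),
    ok.

%% Look Key up in its shard only; other shards are never touched.
lookup(Shards, Key) ->
    case ets:lookup(shard(Shards, Key), Key) of
        [{_, Value}] -> {ok, Value};
        []           -> not_found
    end.
```

For example, `S = shard_tab:new(16)` followed by `shard_tab:store(S, some_key, some_value)` touches exactly one of the sixteen tables.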

    Specific Scalability Fixes

    FreeBSD

    Backported TSC-based kernel timecounter

    gettimeofday(2) calls much less expensive

    Backported igb network driver

    Had issues with MSI-X queue stalls

    sysctl tuning

    Obvious limits (e.g. ...

    Specific Scalability Fixes

    BEAM metrics

    Scheduler (%util, csw, waits, sleeps, ...)

    statistics(message_queues)

    Msgs queued, #non-empty queues, longest queue

    process_info(message_queue_stats)

    Enq/deq/send count & rates (1s, 10s, 100s)

    statistics(message_counts)

    Aggregation of message_queue_stats

    Enable fprof cpu_timestamp for FreeBSD
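The `message_queues`, `message_queue_stats`, and `message_counts` items above are not part of stock OTP. As a rough approximation of the first one (msgs queued, number of non-empty queues, longest queue), the sketch below walks all local processes with the standard `erlang:process_info/2`. The module name `mq_stats` and function `snapshot/0` are hypothetical, and walking every process is itself costly at high connection counts, so this is for illustration only.

```erlang
-module(mq_stats).
-export([snapshot/0]).

%% Returns {TotalQueued, NonEmptyQueues, LongestQueue} over local processes.
snapshot() ->
    lists:foldl(
      fun(Pid, {Total, NonEmpty, Longest} = Acc) ->
              case erlang:process_info(Pid, message_queue_len) of
                  {message_queue_len, 0} ->
                      Acc;
                  {message_queue_len, Len} ->
                      {Total + Len, NonEmpty + 1, max(Longest, Len)};
                  undefined ->
                      %% The process exited while we were walking the list.
                      Acc
              end
      end,
      {0, 0, 0},
      erlang:processes()).
```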

    Specific Scalability Fixes

    BEAM metrics (cont.)

    Specific Scalability Fixes

    BEAM tuning

    +Mulmbcs 32... +Mumbcgs 1

    +Musmbcs 204...

    Want large 2M-aligned mseg allocations to maximize superpage promotions

    Run with real-time scheduling priority

    +ssct 1 (via patch; scheduler spin count)

    Specific Scalability Fixes

    BEAM contention

    timeofday lock (esp. ... (port reuse)

    Disable mseg max check

    Specific Scalability Fixes

    BEAM contention (cont.)

    Specific Scalability Fixes

    Erlang usage

    Prefer os:timestamp to erlang:now

    Implement cross-node gen_server calls without using monitors (reduces dist traffic and proc link lock contention)

    Partition ets and mnesia tables and localize access to smaller number of processes

    Small mnesia clusters
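The first two items above can be sketched as follows. This is a hedged illustration, not the code used in the talk: the module name `usage_sketch` and both function names are hypothetical, and `call_nomon/3` relies on the `'$gen_call'` message shape that OTP's gen behaviours use internally, which is an implementation detail rather than a documented API.

```erlang
-module(usage_sketch).
-export([now_us/0, call_nomon/3]).

%% Wall-clock microseconds from os:timestamp/0. Unlike erlang:now/0, the
%% values are not guaranteed to be unique or strictly increasing, which is
%% what lets the runtime avoid a global lock here.
now_us() ->
    {Mega, Sec, Micro} = os:timestamp(),
    (Mega * 1000000 + Sec) * 1000000 + Micro.

%% gen_server-style call that sets up no monitor. ServerRef may be a pid,
%% a locally registered name, or {Name, Node}. Only the timeout detects a
%% dead or unreachable server, and a reply that arrives after the timeout
%% is left in the caller's mailbox.
%% ASSUMPTION: the server answers {'$gen_call', {Pid, Tag}, Request} by
%% sending {Tag, Reply} back to Pid, as gen_server:reply/2 does for
%% reference tags.
call_nomon(ServerRef, Request, Timeout) ->
    Tag = make_ref(),
    ServerRef ! {'$gen_call', {self(), Tag}, Request},
    receive
        {Tag, Reply} -> {ok, Reply}
    after Timeout ->
        {error, timeout}
    end.
```

The trade-off in `call_nomon/3` is deliberate: skipping the monitor saves distribution traffic, but a crashed server is noticed only when the timeout fires.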

    Specific Scalability Fixes

    Operability fixes

    Added [prepend] option to erlang:send

    Added process_flag(flush_message_queue)

    Questions? Comments?

    rrhatsapp