Efsf2012 Whatsapp Scaling
Transcript of Efsf2012 Whatsapp Scaling
-
8/10/2019 Efsf2012 Whatsapp Scaling
Scaling to Millions of Simultaneous Connections
Rick Reed, WhatsApp
Erlang Factory SF
March 30, 2012
-
About ...
Joined WhatsApp in 2011
New to Erlang
Background in performance of C-based systems on FreeBSD and Linux
Prior work at Yahoo!, SGI
-
Overview
The good problem to have
Performance Goals
Tools and Techniques
Results
General Findings
Specific Scalability Fixes
-
The Problem
A good problem, but a problem nonetheless
Growth, Earthquakes, and Soccer!
[Chart: msg rates for past four weeks, annotated "Mexican earthquake" and "goals"]
-
Performance Goals
1 Million connections per server ...!
Resilience against disruptions under load
Software failures
Hardware failures (servers, network gear)
World events (sports, earthquakes, etc.)
-
Tools and Techniques
System activity monitoring (sar)
OS-level
BEAM
-
Tools and Techniques
Processor hardware perf counters (pmcstat)
dtrace, kernel lock-counting, gprof
-
Tools and Techniques
fprof (w/ and w/o cpu_timestamp)
BEAM lock-counting (invaluable!!!)
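As a rough illustration of a lock-counting workflow, the sketch below uses the `lcnt` module from OTP's tools application. It assumes an emulator built with lock counting enabled, and it is not taken from the talk: the module name `lock_profile` and the function `conflicts_during/1` are hypothetical names chosen for this example.

```erlang
-module(lock_profile).
-export([conflicts_during/1]).

%% Runs Fun/0 while lock statistics are being gathered, prints the
%% most contended locks, and returns whatever Fun returned.
%% Assumption: the emulator was built with lock counting enabled;
%% on a stock emulator the lcnt calls below will not yield useful data.
conflicts_during(Fun) when is_function(Fun, 0) ->
    _ = lcnt:start(),     % start the lcnt server (may already be running)
    lcnt:clear(),         % reset counters so only Fun's activity is measured
    Result = Fun(),       % the workload under test
    lcnt:collect(),       % pull lock statistics from the runtime
    lcnt:conflicts(),     % print locks sorted by contention
    Result.
```

A call such as `lock_profile:conflicts_during(fun() -> my_load:run() end)` (where `my_load:run/0` stands in for any workload) would print a table of the most contended locks after the workload finishes.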
-
Tools and Techniques
Synthetic workload
Good for subsystems with simple interfaces
Limited value for user-facing systems
-
Tools and Techniques
Tee'd workload
Where side-effects can be contained
Extremely useful for tuning
-
Tools and Techniques
Diverted workload
Add additional production load to server
DNS via extra IP aliases
TTL issues
IPFW forwarding
Ran into a few kernel panics at high conn counts
-
Results
Initial bottlenecks appeared around 425K
First round of fixes got us to 1M conns
Fruit was hanging pretty low
-
Results
Continued attacking similar bottlenecks
Achieved 2M conns about a month later
Put further optimizations on back burner
-
Results
Began optimizing app code after New Years
Unintentional record attempt in Feb
Peaked at 2...
-
Results
Still trying to obtain elusive 3M conns
St. Patrick's Day wasn't as lucky as hoped
-
General Findings
Contention, contention, contention
From 200k to 2M were all contention fixes
Some issues are internal to BEAM
Some addressable with app changes
Most required BEAM patches
Some required app changes
Especially: partitioning workload correctly
Some common Erlang idioms come at a price
-
Specific Scalability Fixes
FreeBSD
Backported TSC-based kernel timecounter
gettimeofday(2) calls much less expensive
Backported igb network driver
Had issues with MSI-X queue stalls
sysctl tuning
Obvious limits (e.g. ...)
-
Specific Scalability Fixes
BEAM metrics
Scheduler (%util, csw, waits, sleeps, ...)
statistics(message_queues)
Msgs queued, #non-empty queues, longest queue
process_info(message_queue_stats)
Enq/deq/send count & rates (1s, 10s, 100s)
statistics(message_counts)
Aggregation of message_queue_stats
Enable fprof cpu_timestamp for FreeBSD
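The message-queue metrics on this slide are listed among the fixes and do not appear to be part of stock OTP. As a rough approximation, assuming only the standard `erlang:processes/0` and `erlang:process_info/2` calls, the same three figures (msgs queued, number of non-empty queues, longest queue) could be sampled like this. The module name `mq_stats` is a hypothetical name for this sketch, and walking every process is far more expensive than a metric maintained inside the VM.

```erlang
-module(mq_stats).
-export([summary/0]).

%% Returns {TotalQueued, NonEmptyQueues, LongestQueue} for all
%% processes on the local node at the moment of the call.
summary() ->
    lists:foldl(fun add/2, {0, 0, 0}, erlang:processes()).

%% Fold one process's queue length into the running totals.
add(Pid, {Total, NonEmpty, Longest} = Acc) ->
    case erlang:process_info(Pid, message_queue_len) of
        {message_queue_len, 0} ->
            Acc;
        {message_queue_len, Len} ->
            {Total + Len, NonEmpty + 1, max(Longest, Len)};
        undefined ->
            Acc   % process exited between processes/0 and process_info/2
    end.
```

Calling `mq_stats:summary()` periodically and logging the result gives a coarse view of where messages are piling up.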
-
Specific Scalability Fixes
BEAM metrics (cont.)
-
Specific Scalability Fixes
BEAM tuning
+Mulmbcs 32767 +Mumbcgs 1
+Musmbcs 2047
Want large 2M-aligned mseg allocations to maximize superpage promotions
Run with real-time scheduling priority
+ssct 1 (via patch; scheduler spin count)
-
Specific Scalability Fixes
BEAM contention
timeofday lock (esp. ... (port reuse)
Disable mseg max check
-
Specific Scalability Fixes
BEAM contention (cont.)
-
Specific Scalability Fixes
Erlang usage
Prefer os:timestamp to erlang:now
Implement cross-node gen_server calls without using monitors (reduces dist traffic and proc_link lock contention)
Partition ets and mnesia tables and localize access to smaller number of processes
Small mnesia clusters
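One way the ets partitioning advice might look in practice is sketched below: keys are hashed with `erlang:phash2/2` onto one of N tables, so unrelated keys rarely touch the same table lock. This is an illustrative assumption, not the scheme described in the talk. The module name `part_ets` and its functions are hypothetical, and the sketch does not address the second half of the advice (restricting which processes touch each table).

```erlang
-module(part_ets).
-export([new/2, insert/3, lookup/2]).

%% Creates N unnamed ETS tables and returns them as a tuple.
%% The tables are owned by the calling process and die with it.
new(Name, N) when is_atom(Name), is_integer(N), N > 0 ->
    Opts = [set, public, {write_concurrency, true}],
    list_to_tuple([ets:new(Name, Opts) || _ <- lists:seq(1, N)]).

%% Picks the partition for Key; phash2/2 returns 0..N-1.
table_for(Key, Tables) ->
    element(erlang:phash2(Key, tuple_size(Tables)) + 1, Tables).

%% Stores Value under Key in the partition that Key hashes to.
insert(Tables, Key, Value) ->
    ets:insert(table_for(Key, Tables), {Key, Value}).

%% Looks Key up in its partition only.
lookup(Tables, Key) ->
    case ets:lookup(table_for(Key, Tables), Key) of
        [{Key, Value}] -> {ok, Value};
        [] -> not_found
    end.
```

For example, `T = part_ets:new(sessions, 16)` followed by `part_ets:insert(T, UserId, Pid)` spreads entries over 16 tables; the partition count is a tuning parameter that would need measuring under real load.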
-
Specific Scalability Fixes
Operability fixes
Added "prepend" option to erlang:send
Added process_flag(flush_message_queue)
-
Questions? Comments?
rr@whatsapp.com