Erlang and Scalability
Jan Henry [email protected]
Percona Performance 2009
Percona Performance Conference © 2009-2009, Erlang Training and Consulting · Erlang and Scalability
Introduction
• Scalability Killers
• Design Decisions - Language and Yours
• Thinking Scalable/Parallel
• Code for the correct case
• Rules of Thumb
• Scalability in the small: SMP
Scalability Killers
• Synchronization
• Resource contention
Design Decisions
No sharing
• Processes
• Encapsulation
• No implicit synchronization
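A minimal sketch of "no sharing" in practice, using a hypothetical `counter` module: the state lives only inside one process, and every other process reaches it through messages. `incr/1` is fire-and-forget, while `value/1` waits for a reply only because the caller needs the answer.

```erlang
-module(counter).
-export([start/0, incr/1, value/1]).

%% The count exists only inside the counter process; no other process
%% can read or modify it directly, only send it messages.
start() -> spawn(fun() -> loop(0) end).

%% Fire-and-forget: the caller does not wait for anything.
incr(Counter) ->
    Counter ! incr,
    ok.

%% Synchronous only because the caller needs the current value.
value(Counter) ->
    Ref = make_ref(),
    Counter ! {value, self(), Ref},
    receive {Ref, N} -> N end.

loop(N) ->
    receive
        incr -> loop(N + 1);
        {value, From, Ref} -> From ! {Ref, N}, loop(N)
    end.
```

Because messages between one pair of processes arrive in the order they were sent, two `incr/1` calls followed by `value/1` from the same caller return 2.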
Design Decisions
No implicit synchronization
• Spawn always succeeds
• Sending always succeeds
• Random access message buffer (selective receive)
• Fire and forget unless you need the synchronization
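The hypothetical `no_sync` module below illustrates three of these points: `spawn` hands back a pid even for a process that finishes at once, sending to a process that has already exited does not raise an error, and `receive` can pick a later message out of the mailbox ahead of an earlier one.

```erlang
-module(no_sync).
-export([demo/0]).

demo() ->
    %% spawn/1 returns a pid straight away; it does not wait for the new
    %% process to do anything (this one terminates immediately).
    Short = spawn(fun() -> ok end),
    MRef = monitor(process, Short),
    receive {'DOWN', MRef, process, Short, _Reason} -> ok end,
    %% Sending to a process that no longer exists still succeeds; the
    %% message is simply dropped.
    Short ! hello,
    %% The mailbox is searched by pattern: a receive can select a later
    %% message before an earlier one.
    self() ! first,
    self() ! second,
    A = receive second -> second end,
    B = receive first -> first end,
    {A, B}.
```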
Design Decisions
Concurrency oriented programming
• Concurrency support is an integral part of the language
• Distribution support
• Sets the focus firmly on the concurrent tasks
• Code for the correct case
• Clear code
Clarity is King!
I'd rather try to get clear code correct than correct code clear
Thinking Scalable/Parallel
List length: obviously linear
[Figure: list elements counted one by one, from 0 up to 4]
But not when you have n processors?
Thinking Scalable/Parallel
List length: O(log N) with sufficient processors
[Figure: pairwise reduction tree; counts of 1 combine into 2s, and the 2s combine into 4]
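The reduction tree in the figure can be sketched as follows, assuming a hypothetical `tree_count` module: every element starts as a count of 1, and each round adds neighbouring counts in parallel, one process per pair. The function returns `{Length, Rounds}` so the logarithmic depth is visible. On an Erlang cons list, building the initial counts and pairing them up is still a linear walk, so this shows the O(log N) depth of the combining step rather than a faster `length/1`.

```erlang
-module(tree_count).
-export([count/1]).

%% Returns {Length, Rounds}; Rounds grows as ceil(log2(Length)).
count([]) -> {0, 0};
count(List) -> reduce([1 || _ <- List], 0).

reduce([Total], Rounds) ->
    {Total, Rounds};
reduce(Counts, Rounds) ->
    Parent = self(),
    %% One process per pair; all pairs in a round are summed in parallel.
    Pids = [spawn(fun() -> Parent ! {self(), lists:sum(Pair)} end)
            || Pair <- pairs(Counts)],
    %% Collect the partial sums in the original order.
    Next = [receive {Pid, Sum} -> Sum end || Pid <- Pids],
    reduce(Next, Rounds + 1).

%% Group counts into pairs; an odd element out is carried over alone.
pairs([A, B | Rest]) -> [[A, B] | pairs(Rest)];
pairs([A]) -> [[A]];
pairs([]) -> [].
```

For the four-element list in the figure, `tree_count:count([a, b, c, d])` gives `{4, 2}`: the 1s become 2s in the first round and 4 in the second.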
Thinking Scalable/Parallel
In the Erlang setting
• Do not introduce unneeded synchronization
• Remember processes are cheap
• Do not introduce unneeded synchronization
• A terminated process is all garbage: its whole heap is freed at once
• Do not introduce unneeded synchronization
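One way to exploit "a terminated process is all garbage" is to run allocation-heavy work in a short-lived process. The hypothetical `throwaway:compute/1` below is a sketch of that idea: only the result is copied back to the caller, and everything else the worker allocated is reclaimed when it exits. The caller waits here only because it needs the result.

```erlang
-module(throwaway).
-export([compute/1]).

%% Run Fun in a fresh process and return its result. The worker's heap,
%% including every intermediate term Fun built, is freed when it exits.
compute(Fun) ->
    Parent = self(),
    Ref = make_ref(),
    {Pid, MRef} = spawn_monitor(fun() -> Parent ! {Ref, Fun()} end),
    receive
        {Ref, Result} ->
            %% The result arrives before the 'DOWN' message; drop the
            %% monitor and flush any 'DOWN' that is already queued.
            erlang:demonitor(MRef, [flush]),
            Result;
        {'DOWN', MRef, process, Pid, Reason} ->
            erlang:error({worker_failed, Reason})
    end.
```

For example, `throwaway:compute(fun() -> lists:sum(lists:seq(1, 1000000)) end)` builds a million-element list inside the worker, yet only the integer 500000500000 ends up on the caller's heap.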
Code for the Correct Case
[Figure: three requests, each handled as: set timer, send request, receive answer, then release the timer and check]
Code for the Correct Case
[Figure: a single timer is set, three requests are sent, and the answer is received]
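One reading of the two diagrams is that per-request timers, each of which must be set, released and checked, can be replaced by a single deadline for the whole exchange. The hypothetical `correct_case:gather/2` sketches that: it sends every request up front and then collects answers against one deadline. The `{request, From, Ref}` / `{answer, Ref, Value}` protocol is an assumption for illustration.

```erlang
-module(correct_case).
-export([gather/2]).

%% Send all requests first, then wait for the answers against one
%% overall deadline instead of one timer per request.
gather(Servers, TimeoutMs) ->
    Deadline = erlang:monotonic_time(millisecond) + TimeoutMs,
    Self = self(),
    Refs = lists:map(fun(Server) ->
                             Ref = make_ref(),
                             Server ! {request, Self, Ref},
                             Ref
                     end, Servers),
    collect(Refs, Deadline, []).

%% Answers are matched by Ref, so they may arrive in any order.
collect([], _Deadline, Acc) ->
    {ok, lists:reverse(Acc)};
collect([Ref | Rest], Deadline, Acc) ->
    Left = max(0, Deadline - erlang:monotonic_time(millisecond)),
    receive
        {answer, Ref, Value} -> collect(Rest, Deadline, [Value | Acc])
    after Left ->
        {error, timeout}
    end.
```

The common case (every answer arrives in time) is a straight line through the code; the timeout is the only exception path. Answers that arrive after a timeout stay in the caller's mailbox, so a long-lived caller would need to discard them.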
Rules of Thumb
• Rule 1 - All independent tasks should be processes
• Rule 2 - Do not invent concurrency that is not there!
[Figure: f(), g() and h() drawn as separate stages; then the same work as the single sequential call h(g(f())), replicated across several processes]
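Both rules together can be sketched as follows, with hypothetical stage functions `f/1`, `g/1` and `h/1`. Each stage needs the previous stage's result, so splitting them into three processes would only add messaging (Rule 2). The independent inputs, on the other hand, each get their own process (Rule 1), and inside it the stages run as the plain nested call `h(g(f(X)))`.

```erlang
-module(rules).
-export([handle_all/1]).

%% Hypothetical dependent stages: there is no concurrency between them.
f(X) -> X + 1.
g(X) -> X * 2.
h(X) -> X - 3.

%% One process per independent input; results are returned in input order.
handle_all(Inputs) ->
    Parent = self(),
    Pids = [spawn(fun() -> Parent ! {self(), h(g(f(X)))} end)
            || X <- Inputs],
    [receive {Pid, Result} -> Result end || Pid <- Pids].
```

For example, `rules:handle_all([1, 2, 3])` returns `[1, 3, 5]`.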
Scalability in the small: SMP
Erlang SMP “Credo”
SMP should be transparent to the programmer in much the same way as Erlang distribution
• You shouldn’t have to think about it ...but sometimes you must
• Use SMP mainly for stuff that you’d make concurrent anyway
• Erlang uses concurrency as a structuring principle
• Model for the natural concurrency in your problem
Scalability in the small: SMP
• Erlang on multicore
• SMP prototype ’97, first OTP release May ’06
• Mid-’06 benchmark mimicking call handling (axdmark) on the (experimental) SMP emulator. Observed speedup per core: 0.95
• First Ericsson product (TGC) released on SMP Erlang in Q2 ’07
“Big bang” benchmark on Sun Fire T2000
[Figure: x-axis: simultaneous processes; curves for 1 scheduler and 16 schedulers]
Scalability in the small: SMP
Case Study: Telephony Gateway Controller
• Mediates between legacy telephony and multimedia networks
• Hugely complex state machines
• + massive concurrency
• Developed in Erlang
• Multicore version shipped to customer Q2 ’07
• Porting from 1-core PPC to 2-core Intel took < 1 man-year (including testing)
[Figure: the TGC sits between the AXE and several media gateways (GW)]
Scalability in the small: SMP
Case Study: Telephony Gateway Controller
Throughput per traffic scenario, in multiples of X call/sec. Scenarios, in order: ISUP-ISUP/Intra MGW, ISUP-ISUP/Inter MGW, POTS-POTS/AGW
• AXD CPB5: 3.17X, 1.55X, 0.4X
• AXD CPB6: 14X, 7.6X, 2.1X
• IS/GCP, 1 slot/board: 5.5X, 3.6X, X
• IS/GEP dual core, one core running, 2 slots/board: 7.7X and 2.3X (one core used)
• IS/GEP dual core, two cores running, 2 slots/board: 26X, 13X, 4.3X (13X and 4.3X on OTP R11_3 beta + patches)
Scalability in the small: SMP
Speedup on 4 Hyper-Threaded Pentium 4
# Schedulers:  1     2     3     4     5     6     7     8
Speedup:       1.00  1.92  2.05  2.73  3.11  3.63  3.79  3.96
Benchmark: “Chatty”
• 1000 processes created
• Each process randomly sends requests to all other processes and receives acks from them
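A rough sketch of a Chatty-style load, assuming a hypothetical `chatty` module (this is not the benchmark's actual code): `N` processes each send `Rounds` requests to randomly chosen peers and wait for an ack after each one. While waiting for its own ack, a process keeps answering other requests, which rules out two processes deadlocking on each other.

```erlang
-module(chatty).
-export([run/2]).

%% Returns the elapsed wall-clock time in microseconds.
run(N, Rounds) ->
    Parent = self(),
    Pids = [spawn(fun() -> wait_for_peers(Parent, Rounds) end)
            || _ <- lists:seq(1, N)],
    Peers = list_to_tuple(Pids),
    T0 = erlang:monotonic_time(microsecond),
    lists:foreach(fun(Pid) -> Pid ! {peers, Peers} end, Pids),
    lists:foreach(fun(Pid) -> receive {done, Pid} -> ok end end, Pids),
    T1 = erlang:monotonic_time(microsecond),
    lists:foreach(fun(Pid) -> Pid ! stop end, Pids),
    T1 - T0.

wait_for_peers(Parent, Rounds) ->
    receive {peers, Peers} -> chat(Parent, Peers, Rounds) end.

chat(Parent, _Peers, 0) ->
    Parent ! {done, self()},
    serve();
chat(Parent, Peers, Left) ->
    Peer = element(rand:uniform(tuple_size(Peers)), Peers),
    Ref = make_ref(),
    Peer ! {req, self(), Ref},
    await_ack(Ref),
    chat(Parent, Peers, Left - 1).

%% Keep answering other peers while waiting for our own ack.
await_ack(Ref) ->
    receive
        {ack, Ref} -> ok;
        {req, From, R} -> From ! {ack, R}, await_ack(Ref)
    end.

%% After our own rounds are done, keep acking until told to stop.
serve() ->
    receive
        {req, From, R} -> From ! {ack, R}, serve();
        stop -> ok
    end.
```

Timing, say, `chatty:run(1000, 100)` in nodes started with `erl +S 1` and `erl +S 8` gives a feel for the scheduler speedup on your own hardware; it is not expected to reproduce the figures above.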
Scalability in the small: SMP
Erlang VM
[Figure: non-SMP VM; a single scheduler taking processes from one run queue]
Scalability in the small: SMP
Erlang VM
[Figure: current SMP VM (OTP R11/R12); schedulers #1 to #N all taking processes from one shared run queue]
Scalability in the small: SMP
Erlang VM
[Figure: new SMP VM (OTP R13); schedulers #1 to #N each with its own run queue, balanced by migration logic]
Released 21 April
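To look at the schedulers and their run queues from inside a running node, something like the hypothetical `sched_info` module below can be used. It assumes a much newer OTP release than R13: `erlang:statistics(run_queue_lengths)` was added later, and on current releases the list also carries entries for the dirty run queues after the normal per-scheduler ones. The scheduler count is chosen at startup with the `+S` emulator flag, e.g. `erl +S 4`.

```erlang
-module(sched_info).
-export([report/0]).

%% Print the number of schedulers and a snapshot of the run queue
%% lengths. With one run queue per scheduler (R13 and later), the list
%% has one normal entry per scheduler, possibly followed by dirty queues.
report() ->
    Schedulers = erlang:system_info(schedulers),
    Online = erlang:system_info(schedulers_online),
    QueueLengths = erlang:statistics(run_queue_lengths),
    io:format("schedulers: ~p (online: ~p)~n", [Schedulers, Online]),
    io:format("run queue lengths: ~p~n", [QueueLengths]),
    {Schedulers, Online, QueueLengths}.
```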
Scalability in the small: SMP
• Speedup of “Big Bang” on a Tilera Tile64 chip (R13A)
• 1000 processes, all talking to each other
[Figure: speedup with multiple run queues vs. a single run queue]
Speedup: ca. 0.43 × N at 32 cores
Memory allocation locks dominate...
Scalability in the small: SMP
Shift in Bottlenecks
• All scalable Erlang systems were stress tested for CPU usage and for network usage
• With SMP hardware we must stress test for memory usage
• In the typical SMP system, the bottleneck has shifted from the CPU to the memory
Scalability in the small: SMP
Death by a thousand cuts
• Many requests that generate short spikes in memory usage
• Limit or serialize those requests
• More on this in a forthcoming paper by CTO Ulf Wiger
loop(State) ->
    receive
        {request, typeA, Data} ->
            Data1 = allocate_lots_of_memory(Data),
            a_server ! {request, typeA, self()},
            receive
                {answer, …
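"Limit or serialize those requests" could be sketched with a small gate process, as in the hypothetical `mem_gate` module below (not the approach from the paper mentioned above). At most `Max` memory-hungry jobs hold a slot at once; with `Max = 1` they are fully serialized. When no slot is free, the gate simply does not match `acquire` messages, so waiting callers stay queued in its mailbox. This sketch does not handle a caller that is killed while holding a slot; a monitor on each holder would be needed for that.

```erlang
-module(mem_gate).
-export([start/1, run/2]).

%% Start a gate that allows at most Max concurrent jobs.
start(Max) when is_integer(Max), Max > 0 ->
    spawn(fun() -> gate(Max) end).

%% Wait for a slot, run Fun, and give the slot back even if Fun raises.
run(Gate, Fun) ->
    Ref = make_ref(),
    Gate ! {acquire, self(), Ref},
    receive {granted, Ref} -> ok end,
    try
        Fun()
    after
        Gate ! release
    end.

%% With no free slots only 'release' is accepted; pending acquire
%% requests wait in the mailbox until selective receive picks them up.
gate(0) ->
    receive release -> gate(1) end;
gate(Free) ->
    receive
        {acquire, From, Ref} ->
            From ! {granted, Ref},
            gate(Free - 1);
        release ->
            gate(Free + 1)
    end.
```

In a server loop such as the one above, the memory-heavy step would be wrapped as `mem_gate:run(Gate, fun() -> allocate_lots_of_memory(Data) end)`, with `Gate` started once via `mem_gate:start(1)`.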
Questions
???