Erlang and Scalability


Jan Henry

Percona Performance 2009

Percona Performance Conference © 2009, Erlang Training and Consulting

Introduction

• Scalability Killers
• Design Decisions – Language and Yours
• Thinking Scalable/Parallel
• Code for the correct case
• Rules of Thumb
• Scalability in the small: SMP


Scalability Killers

• Synchronization
• Resource contention
• Synchronization


Design Decisions

No sharing

• Processes
• Encapsulation
• No implicit synchronization


Design Decisions

No implicit synchronization

• Spawn always succeeds
• Sending always succeeds
• Random access message buffer
• Fire and forget unless you need the synchronization
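The difference between fire-and-forget and an explicit synchronization can be sketched as follows. This is a minimal illustration, not code from the talk: the module name `send_demo`, the message shapes `{event, _}`, `{call, _, _, _}` and `{reply, _, _}`, and the timeout parameter are all assumptions. The `receive` in `call/3` picks out the one reply carrying the right reference, which is one way to use the random access message buffer.

```erlang
-module(send_demo).
-export([notify/2, call/3]).

%% Fire and forget: the send returns at once and no reply is awaited.
%% Sending to a pid whose process has already terminated still succeeds.
notify(Pid, Event) ->
    Pid ! {event, Event},
    ok.

%% Synchronize only because the caller needs the answer.
%% The message format is hypothetical; the server must reply with
%% {reply, Ref, Reply} for this to work.
call(Pid, Request, TimeoutMs) ->
    Ref = make_ref(),
    Pid ! {call, Ref, self(), Request},
    receive
        %% Selective receive: only the reply tagged with Ref matches,
        %% other messages stay in the mailbox.
        {reply, Ref, Reply} -> {ok, Reply}
    after TimeoutMs ->
        {error, timeout}
    end.
```

Nothing in `notify/2` blocks, so a caller that does not need the result never waits on the receiver.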


Design Decisions

Concurrency oriented programming

• Concurrency support is an integral part of the language
• Distribution support
• Sets the focus firmly on the concurrent tasks
• Code for the correct case
• Clear Code

Clarity is King!

I would rather try to get clear code correct than correct code clear


Thinking Scalable/Parallel

List length: obviously linear. But not when you have n processors?

List length: O(log N) with sufficient processors

[Figure: tree reduction over a four-element list: 1 1 1 1 → 2 2 → 4]
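A sketch of the tree reduction, assuming every element is first turned into a 1 and adjacent pairs are then added in parallel rounds. The module name `tree_len` and its helper functions are made up for this illustration. The number of rounds is logarithmic in the list length, but note the assumption that weakens the example in practice: the coordinating process still walks the list to spawn the workers, so this shows the shape of the computation rather than a real speedup.

```erlang
-module(tree_len).
-export([len/1]).

%% Count the elements of a list by summing a 1 per element,
%% combining adjacent pairs in parallel rounds.
len([]) -> 0;
len(List) -> reduce([1 || _ <- List]).

%% A single remaining value is the total; otherwise run another round.
reduce([Total]) -> Total;
reduce(Values) -> reduce(combine_round(Values)).

%% One round: one process per adjacent pair, results collected in order.
combine_round(Values) ->
    Refs = spawn_pairs(Values, self()),
    [receive {Ref, Sum} -> Sum end || Ref <- Refs].

spawn_pairs([A, B | Rest], Parent) ->
    Ref = make_ref(),
    spawn(fun() -> Parent ! {Ref, A + B} end),
    [Ref | spawn_pairs(Rest, Parent)];
spawn_pairs([A], Parent) ->
    %% An odd element has no partner; carry it to the next round.
    Ref = make_ref(),
    Parent ! {Ref, A},
    [Ref];
spawn_pairs([], _Parent) ->
    [].
```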



In the Erlang setting

• Do not introduce unneeded synchronization
• Remember processes are cheap
• Do not introduce unneeded synchronization
• A terminated process is all garbage
• Do not introduce unneeded synchronization
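One way to use cheap processes and the fact that a terminated process is all garbage: run a job that builds a lot of temporary data in its own short-lived process, so the whole heap is reclaimed when the process exits. This is a minimal sketch; the module name `scratch` and the function `in_worker/1` are hypothetical. The caller waits here only because it needs the result.

```erlang
-module(scratch).
-export([in_worker/1]).

%% Run Fun in a short-lived process and return its result.
%% All temporary data built by Fun lives on the worker's heap and is
%% freed as a whole when the worker terminates.
in_worker(Fun) ->
    {Pid, MRef} = spawn_monitor(fun() -> exit({done, Fun()}) end),
    receive
        {'DOWN', MRef, process, Pid, {done, Result}} -> {ok, Result};
        {'DOWN', MRef, process, Pid, Reason} -> {error, Reason}
    end.
```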


Code for the Correct Case

[Figure: sequence diagram with three requests, each with its own set timer, request, answer and release timer / check]


Code for the Correct Case

[Figure: sequence diagram with one set timer, three requests, one answer and one release timer / check]
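One possible reading of the second diagram is that a single timer guards the whole exchange instead of one timer per request. The sketch below follows that reading and is an assumption, not the speaker's code: the module name `one_timer`, the function `call_all/3` and the message shapes `{request, Tag, From, Request}` and `{answer, Tag, Answer}` are hypothetical.

```erlang
-module(one_timer).
-export([call_all/3]).

%% Send Request to every server, then wait for all answers under ONE timer.
%% Answers are returned in arrival order, not in server order.
call_all(Servers, Request, TimeoutMs) ->
    Tag = make_ref(),
    Timer = erlang:send_after(TimeoutMs, self(), {timeout, Tag}),
    Self = self(),
    lists:foreach(fun(S) -> S ! {request, Tag, Self, Request} end, Servers),
    Result = collect(length(Servers), Tag, []),
    erlang:cancel_timer(Timer),
    %% The timer may have fired just before it was cancelled; drop that message.
    receive {timeout, Tag} -> ok after 0 -> ok end,
    Result.

collect(0, _Tag, Acc) ->
    {ok, lists:reverse(Acc)};
collect(N, Tag, Acc) ->
    receive
        {answer, Tag, Answer} -> collect(N - 1, Tag, [Answer | Acc]);
        {timeout, Tag} -> {error, timeout}
    end.
```

The common, correct case pays for one timer. After a timeout, late answers carrying the old tag remain in the mailbox, which a production version would have to handle.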


Rules of Thumb

• Rule 1 - All independent tasks should be processes
• Rule 2 - Do not invent concurrency that is not there!

[Figure: f(), g() and h() as three separate processes, contrasted with several processes that each evaluate h(g(f()))]
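The two rules can be sketched together: each independent input gets its own process (Rule 1), while the dependent steps f, g and h stay a plain sequential call chain inside that process (Rule 2). The module name `rules`, the function `run_all/1` and the bodies of `f/1`, `g/1` and `h/1` are placeholders invented for this illustration.

```erlang
-module(rules).
-export([run_all/1]).

%% Placeholder stages; each depends on the previous one's result,
%% so there is no concurrency to exploit between them.
f(X) -> X + 1.
g(X) -> X * 2.
h(X) -> X - 3.

%% One process per independent input; each evaluates h(g(f(X))) sequentially.
run_all(Inputs) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, h(g(f(X)))} end),
                Ref
            end || X <- Inputs],
    %% Collect the results in input order.
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```

Splitting f, g and h into three processes would only add message passing between steps that must run one after the other anyway.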


Scalability in the small: SMP

Erlang SMP “Credo”

SMP should be transparent to the programmer in much the same way as Erlang Distribution

• You shouldn’t have to think about it... but sometimes you must

• Use SMP mainly for stuff that you’d make concurrent anyway
• Erlang uses concurrency as a structuring principle

• Model for the natural concurrency in your problem



• Erlang on multicore

• SMP prototype ‘97, first OTP release May ‘06.

• Mid-‘06 benchmark mimicking call handling (axdmark) on the (experimental) SMP emulator. Observed speedup/core: 0.95

• First Ericsson product (TGC) released on SMP Erlang in Q2 ‘07.

[Figure: “Big bang” benchmark on Sunfire T2000, plotted against the number of simultaneous processes, with curves for 16 schedulers and 1 scheduler]



Case Study: Telephony Gateway Controller

• Mediates between legacy telephony and multimedia networks.

• Hugely complex state machines
• + massive concurrency.
• Developed in Erlang.
• Multicore version shipped to customer Q2 ‘07.
• Porting from 1-core PPC to 2-core Intel took < 1 man-year (including testing).

[Figure: network diagram showing AXE, TGC and three gateways (GW)]



Case Study: Telephony Gateway Controller

Platform \ Traffic scenario                          POTS-POTS /AGW                            ISUP-ISUP /Inter MGW                     ISUP-ISUP /Intra MGW
AXD CPB5                                             0.4X call/sec                             1.55X call/sec                           3.17X call/sec
AXD CPB6                                             2.1X call/sec                             7.6X call/sec                            14X call/sec
IS/GCP, 1 slot/board                                 X call/sec                                3.6X call/sec                            5.5X call/sec
IS/GEP dual core, one core running, 2 slots/board    2.3X call/sec (one core used)             7.7X call/sec (one core used)            -
IS/GEP dual core, two cores running, 2 slots/board   4.3X call/sec (OTP R11_3 beta+patches)    13X call/sec (OTP R11_3 beta+patches)    26X call/sec


Scalability in the small: SMP

Speedup on 4 Hyper Threaded Pentium4

# Schedulers   1     2     3     4     5     6     7     8
Speedup        1     1.92  2.05  2.73  3.11  3.63  3.79  3.96

• Chatty
• 1000 processes created
• Each process randomly sends req/receives ack from all other processes


Scalability in the small: SMP

[Figure: non-SMP Erlang VM with one scheduler and one run queue]


Scalability in the small: SMP

[Figure: current SMP VM (OTP R11/R12): Erlang VM with schedulers #1 to #N sharing a single run queue]



[Figure: new SMP VM (OTP R13): Erlang VM with schedulers #1 to #N, each with its own run queue, and migration logic between the run queues]

New SMP VM, OTP R13: released 21st April


• Speedup of “Big Bang” on a Tilera Tile64 chip (R13A)
• 1000 processes, all talking to each other

Memory allocation locks dominate...

[Figure: speedup curves for multiple run queues and for a single run queue]

Speedup: ca. 0.43 * N @ 32 cores



Shift in Bottlenecks

• All scalable Erlang systems were stress tested for CPU usage and for network usage

• With SMP hardware we must stress test for memory usage
• In the typical SMP system, the bottleneck has shifted from the CPU to the memory



Death by a thousand cuts

• Many requests that generate short spikes in memory usage
• Limit or serialize those requests
• More on this in coming paper from CTO Ulf Wiger

loop(State) ->
    receive
        {request, typeA, Data} ->
            Data1 = allocate_lots_of_memory(Data),
            a_server ! {request, typeA, self()},
            receive
                {answer, …
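One way to limit or serialize such requests is a small gatekeeper process that hands out a fixed number of permits, so that only a bounded number of callers run their memory-hungry step at once. This is a sketch under stated assumptions, not the approach from the slide or the announced paper: the module name `limiter`, its functions and its message shapes are hypothetical, and a permit is lost if the holding process is killed, since the sketch does no monitoring.

```erlang
-module(limiter).
-export([start/1, with_permit/2]).

%% Start a gatekeeper that lets at most Max callers run at once.
%% Max = 1 serializes the callers completely.
start(Max) when is_integer(Max), Max > 0 ->
    spawn(fun() -> loop(Max, queue:new()) end).

%% Run Fun while holding a permit; the permit is returned even if Fun raises.
with_permit(Limiter, Fun) ->
    Ref = make_ref(),
    Limiter ! {acquire, self(), Ref},
    receive {granted, Ref} -> ok end,
    try Fun() after Limiter ! release end.

loop(Free, Waiting) ->
    receive
        {acquire, Pid, Ref} when Free > 0 ->
            Pid ! {granted, Ref},
            loop(Free - 1, Waiting);
        {acquire, Pid, Ref} ->
            %% No permit free: park the caller until someone releases.
            loop(Free, queue:in({Pid, Ref}, Waiting));
        release ->
            case queue:out(Waiting) of
                {{value, {Pid, Ref}}, Rest} ->
                    %% Hand the permit straight to the next waiting caller.
                    Pid ! {granted, Ref},
                    loop(Free, Rest);
                {empty, _} ->
                    loop(Free + 1, Waiting)
            end
    end.
```

A handler could then wrap the expensive step, for example `limiter:with_permit(L, fun() -> allocate_lots_of_memory(Data) end)`, where `L` is the pid returned by `limiter:start/1`.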


Questions

???
