Performance optimization 101 - Erlang Factory SF 2014
![Page 1: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/1.jpg)
Performance Optimization 101
Louis-Philippe Gauthier
Team leader @ AdGear Trader
![Page 2: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/2.jpg)
Exercise
![Page 3: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/3.jpg)
HTTP API server - API
• GET /date - returns today’s date
• GET /time - returns the unix time in seconds
![Page 4: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/4.jpg)
HTTP API server - TODO
• accepting connections
• parsing HTTP requests
• routing
• building responses
![Page 5: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/5.jpg)
HTTP API server - accepting connections
![Page 6: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/6.jpg)
HTTP API server - accepting connections
![Page 7: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/7.jpg)
HTTP API server - accepting connections
• gen_tcp:controlling_process/2 is slow
• spawn a worker with the ListenSocket
• the worker accepts the connection and acks the listener
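The slide's own code is not in the transcript, so the following is only a minimal sketch of the pattern the bullets describe. The module name `acceptor_sketch`, the `accepted` message and the echo handler are illustrative assumptions. Because the worker calls `gen_tcp:accept/1` itself, it already owns the accepted socket and no `gen_tcp:controlling_process/2` call is needed.

```erlang
-module(acceptor_sketch).
-export([start/1]).

%% Start a listener process that owns the listen socket.
%% Returns {ok, ListenerPid, Port}; pass 0 to let the OS pick a port.
start(Port) ->
    Parent = self(),
    Listener = spawn(fun() ->
        {ok, LSock} = gen_tcp:listen(Port, [binary, {active, false},
                                            {reuseaddr, true}]),
        {ok, BoundPort} = inet:port(LSock),
        Parent ! {listening, self(), BoundPort},
        listener_loop(LSock)
    end),
    receive
        {listening, Listener, ActualPort} -> {ok, Listener, ActualPort}
    end.

%% Spawn one worker with the listen socket, then wait for its ack
%% before spawning the next one.
listener_loop(LSock) ->
    Self = self(),
    spawn(fun() -> worker(Self, LSock) end),
    receive
        accepted -> listener_loop(LSock)
    end.

%% The worker accepts the connection, so it is the socket owner from
%% the start, and then acks the listener.
worker(Listener, LSock) ->
    case gen_tcp:accept(LSock) of
        {ok, Socket} ->
            Listener ! accepted,
            echo(Socket);
        {error, _Reason} ->
            ok
    end.

%% Placeholder handler (assumption): echo bytes until the peer closes.
echo(Socket) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Data} ->
            _ = gen_tcp:send(Socket, Data),
            echo(Socket);
        {error, _Reason} ->
            ok
    end.
```

A real server would supervise these processes and handle accept errors in the listener; the sketch leaves that out to keep the ownership point visible.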
![Page 8: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/8.jpg)
HTTP API server - accepting connections
![Page 9: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/9.jpg)
HTTP API server - accepting connections
![Page 10: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/10.jpg)
HTTP API server - accepting connections
![Page 11: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/11.jpg)
HTTP API server - accepting connections
• use proc_lib instead of gen_server
• socket options:
  • binary
  • {backlog, 4196}
  • {raw, 6, 9, <<30:32/native>>} (on Linux: IPPROTO_TCP = 6, TCP_DEFER_ACCEPT = 9, with a 30 second timeout)
![Page 12: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/12.jpg)
HTTP API server - parsing request
![Page 13: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/13.jpg)
HTTP API server - parsing request
![Page 14: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/14.jpg)
HTTP API server - parsing request
• binary matching is very powerful!
• working with binaries is more memory efficient
  • binaries over 64 bytes are shared (not copied)
• faster than the built-in http parser (BIF) when running on many cores and using hipe
• keep state in a record
  • O(1) lookups
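As a rough illustration of binary matching with the parsed state kept in a record, here is a small sketch that parses only the request line of a GET request. The module name, the `#request{}` record and its fields are assumptions for illustration, not the parser shown in the talk.

```erlang
-module(http_parse_sketch).
-export([parse_request_line/1, method/1, path/1]).

%% Parsed state lives in a record; field access is O(1).
-record(request, {method, path, version}).

%% Match the method as a binary prefix, then split off the path.
parse_request_line(<<"GET ", Rest/binary>>) ->
    case binary:split(Rest, <<" ">>) of
        [Path, <<"HTTP/1.1\r\n", _Headers/binary>>] ->
            {ok, #request{method = get, path = Path, version = {1, 1}}};
        [Path, <<"HTTP/1.0\r\n", _Headers/binary>>] ->
            {ok, #request{method = get, path = Path, version = {1, 0}}};
        _ ->
            {error, bad_request}
    end;
parse_request_line(_Other) ->
    {error, unsupported_method}.

%% Accessors so callers do not need the record definition.
method(#request{method = Method}) -> Method.
path(#request{path = Path}) -> Path.
```

The path returned here is a sub-binary of the input, so no bytes are copied while parsing.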
![Page 15: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/15.jpg)
HTTP API server - routing
![Page 16: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/16.jpg)
HTTP API server - routing
pattern matching is awesome!!
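For the two endpoints of the exercise, routing by pattern matching could look like the sketch below. The module name and the returned handler atoms are assumptions made for illustration.

```erlang
-module(router_sketch).
-export([route/2]).

%% One function clause per route; the last clause is the catch-all.
route(get, <<"/date">>) -> {ok, date};
route(get, <<"/time">>) -> {ok, time};
route(_Method, _Path)   -> {error, not_found}.
```

The compiler turns these clauses into a decision tree, so there is no routing table to look up at run time.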
![Page 17: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/17.jpg)
HTTP API server - building response
![Page 18: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/18.jpg)
HTTP API server - building response
![Page 19: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/19.jpg)
HTTP API server - building response
![Page 20: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/20.jpg)
HTTP API server - building response
• ETS is your friend!
• cache the time and date in a public ETS table
  • {read_concurrency, true}
  • if you store a binary over 64 bytes, it won’t get copied!
• have a gen_server update the cache
  • every second for the time
  • every day for the date
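A minimal sketch of such a cache follows. The module, table and key names are assumptions, and as a simplification it refreshes both values on the same one-second tick rather than updating the date once a day.

```erlang
-module(time_cache_sketch).
-behaviour(gen_server).

-export([start_link/0, unix_time/0, today/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(TABLE, time_cache_sketch).
-define(INTERVAL_MS, 1000).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Readers go straight to ETS and never message the gen_server.
unix_time() -> ets:lookup_element(?TABLE, unix_time, 2).
today()     -> ets:lookup_element(?TABLE, today, 2).

init([]) ->
    ?TABLE = ets:new(?TABLE, [named_table, public,
                              {read_concurrency, true}]),
    refresh(),
    schedule(),
    {ok, no_state}.

handle_call(_Request, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.

handle_info(refresh, State) ->
    refresh(),
    schedule(),
    {noreply, State};
handle_info(_Other, State) ->
    {noreply, State}.

schedule() ->
    erlang:send_after(?INTERVAL_MS, self(), refresh).

%% Store ready-to-send binaries: unix time in seconds and YYYY-MM-DD (UTC).
refresh() ->
    {Mega, Sec, _Micro} = os:timestamp(),
    {{Y, M, D}, _Time} = calendar:universal_time(),
    Date = iolist_to_binary(
             io_lib:format("~4..0B-~2..0B-~2..0B", [Y, M, D])),
    true = ets:insert(?TABLE,
                      [{unix_time, integer_to_binary(Mega * 1000000 + Sec)},
                       {today, Date}]).
```

These two values are short binaries, so they are still copied on lookup; the no-copy behaviour mentioned above only applies once the cached binary is larger than 64 bytes, for example a complete pre-built response.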
![Page 21: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/21.jpg)
HTTP API server - building response
• do not try to rewrite everything
• use community projects and contribute back!
• often your application will spend most of its time talking to external services
• premature optimization is usually bad
![Page 22: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/22.jpg)
Gotchas - slow functions / modules
• erlang:now/0 vs os:timestamp/0
• proplists:get_value() vs lists:keyfind()
• timer:send_after() vs erlang:send_after()
• gen_udp:send() vs erlang:port_command()
• avoid gen_tcp:controlling_process() if you can
• avoid base64, string, unicode modules
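Three of these swaps are easy to show side by side. The sketch below uses an assumed module name and helper names; note that `lists:keyfind/3` only handles `{Key, Value}` tuples, while `proplists:get_value/2` also accepts bare atoms, so the replacement is only valid when the list contains tuples.

```erlang
-module(gotchas_sketch).
-export([get_value/3, unix_seconds/0, ping_later/1]).

%% lists:keyfind/3 instead of proplists:get_value/3.
get_value(Key, List, Default) ->
    case lists:keyfind(Key, 1, List) of
        {Key, Value} -> Value;
        false -> Default
    end.

%% os:timestamp/0 instead of erlang:now/0.
unix_seconds() ->
    {Mega, Sec, _Micro} = os:timestamp(),
    Mega * 1000000 + Sec.

%% erlang:send_after/3 instead of timer:send_after/3: no round trip
%% through the timer server process. Returns a timer reference.
ping_later(Ms) ->
    erlang:send_after(Ms, self(), ping).
```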
![Page 23: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/23.jpg)
Tools
![Page 24: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/24.jpg)
Profiling - info
• useful to find slow code paths
• fprof
  • uses erlang:trace/3
  • output is really hard to understand
  • erlgrind to read in kcachegrind
• eflame
  • also uses erlang:trace/3
  • nice graphical output
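For reference, a basic fprof run has three steps: trace, profile, analyse. The sketch below profiles a single `lists:sort/1` call; the module name, the workload and the output file name `fprof.analysis` are assumptions chosen for illustration.

```erlang
-module(fprof_sketch).
-export([profile_sort/1]).

%% Trace one call, process the trace, and write the analysis to a file.
profile_sort(N) ->
    List = [(I * 7919) rem 1000 || I <- lists:seq(1, N)],
    %% 1. run the function under trace (writes fprof.trace by default)
    Sorted = fprof:apply(fun lists:sort/1, [List]),
    %% 2. turn the raw trace into profiling data
    ok = fprof:profile(),
    %% 3. dump the (hard to read) report to a file
    ok = fprof:analyse([{dest, "fprof.analysis"}]),
    Sorted.
```

Tracing slows the profiled code down considerably, so the absolute numbers in the report are mostly useful for comparing functions against each other.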
![Page 25: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/25.jpg)
Eflame - info
• flamechart.pl (from Joyent)
• makes it visually easy to find slow function calls
![Page 26: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/26.jpg)
Eflame - how to
![Page 27: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/27.jpg)
Eflame - info
![Page 28: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/28.jpg)
Micro benchmarks - info
• start with profiling
• useful for experimentation and to validate hypotheses
• small benchmarking library called timing
• uses the excellent bear (statistics) library
![Page 29: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/29.jpg)
Micro benchmarks - how to
![Page 30: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/30.jpg)
Micro benchmarks - info

| # parallel processes | erlang:now/0 | os:timestamp/0 |
|---|---|---|
| 1 | 0.99 | 0.87 |
| 10 | 22.87 | 2.54 |
| 100 | 168.23 | 16.99 |
| 1000 | 664.46 | 51.98 |
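The talk uses the timing library for measurements like these; its API is not shown in the transcript, so the sketch below uses only `timer:tc/1` from OTP instead. The module name and the result unit (mean microseconds per call) are assumptions and may not match the unit of the figures above.

```erlang
-module(bench_sketch).
-export([run/3]).

%% Run Fun Iterations times in each of Procs parallel processes and
%% return the mean wall-clock microseconds per call.
run(Fun, Procs, Iterations) ->
    Parent = self(),
    Pids = [spawn_link(fun() ->
                {Elapsed, ok} = timer:tc(fun() -> loop(Fun, Iterations) end),
                Parent ! {self(), Elapsed}
            end) || _ <- lists:seq(1, Procs)],
    Total = lists:sum([receive {Pid, Micros} -> Micros end || Pid <- Pids]),
    Total / (Procs * Iterations).

loop(_Fun, 0) -> ok;
loop(Fun, N) -> Fun(), loop(Fun, N - 1).
```

For example, `bench_sketch:run(fun os:timestamp/0, 100, 10000)` measures the call under contention from 100 processes, which is where a function that takes a global lock starts to stand out.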
![Page 31: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/31.jpg)
Hipe - info
• native, {hipe, [o3]}
• doesn’t mix with NIFs
  • on_load
• switching between non-native and native code is expensive
  • different call stacks
• might overload the code_server (bug?)
  • --enable-native-libs
• hipe_bifs (sshhh)
![Page 32: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/32.jpg)
Hipe - how to
![Page 33: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/33.jpg)
NIFs - info
• function that is implemented in C instead of Erlang
• can be dangerous…
  • crash VM (segfault)
  • OOM (memory leak)
• must return < 500 us (to be safe…)
  • ideally should yield and use enif_consume_timeslice
  • what is a reduction?
• dirty schedulers (R17)
  • finally!
![Page 34: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/34.jpg)
Process Tuning - info
• tune min_heap_size on spawn
• fullsweep_after if you have memory issues
• force gc
• +hms (set default min_heap_size)
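The first three knobs can be set per process from Erlang code. In the sketch below the module name and the concrete values (4096 words, 10 collections) are placeholders for illustration, not recommendations from the talk.

```erlang
-module(proc_tuning_sketch).
-export([start_worker/1, force_gc/1]).

%% Spawn with a larger initial heap (in words) to avoid early heap
%% growth, and force a full sweep after at most 10 minor collections.
start_worker(Fun) ->
    spawn_opt(Fun, [{min_heap_size, 4096}, {fullsweep_after, 10}]).

%% Explicitly garbage collect a process, e.g. after it has finished
%% handling a large message.
force_gc(Pid) ->
    erlang:garbage_collect(Pid).
```

The VM may round `min_heap_size` up to the next heap size it supports, so `erlang:process_info(Pid, min_heap_size)` can report a slightly larger value than the one requested.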
![Page 35: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/35.jpg)
Process Tuning - info
![Page 36: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/36.jpg)
Monitoring - info
• statsderl for application metrics
• vmstats for VM metrics
• system_stats for OS metrics
• erlang:system_monitor/2
• entop for live system exploration
![Page 37: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/37.jpg)
Statsderl - info
• statsd client
• very cheap to call (async)
• offers 3 kinds of metrics:
  • counters - for counting (e.g. QPS)
  • gauges - for absolute values (e.g. system memory)
  • timers - similar to gauges but with extra statistics
![Page 38: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/38.jpg)
Statsderl - how to
![Page 39: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/39.jpg)
VM Stats - info
• process count
• messages in queues
• run queue length
• memory (total, proc_used, atom_used, binary, ETS)
• scheduler utilization (per scheduler)
• garbage collection (count, words reclaimed)
• reductions
• IO bytes (in/out)
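Most of these numbers come straight from BIFs. The sketch below is not how vmstats is implemented; it is a hand-rolled sample under an assumed module name and assumed key names, and it leaves out message queue totals and per-scheduler utilization.

```erlang
-module(vm_stats_sketch).
-export([sample/0]).

%% Take one snapshot of VM-level counters as a list of {Key, Value}.
sample() ->
    {{input, In}, {output, Out}} = erlang:statistics(io),
    {GCs, WordsReclaimed, _} = erlang:statistics(garbage_collection),
    {Reductions, _SinceLastCall} = erlang:statistics(reductions),
    [{process_count, erlang:system_info(process_count)},
     {run_queue, erlang:statistics(run_queue)},
     {memory_total, erlang:memory(total)},
     {memory_proc_used, erlang:memory(processes_used)},
     {memory_atom_used, erlang:memory(atom_used)},
     {memory_binary, erlang:memory(binary)},
     {memory_ets, erlang:memory(ets)},
     {gc_count, GCs},
     {gc_words_reclaimed, WordsReclaimed},
     {reductions, Reductions},
     {io_bytes_in, In},
     {io_bytes_out, Out}].
```

Calling `sample/0` on a fixed interval and pushing each pair as a gauge is enough to chart these values over time.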
![Page 40: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/40.jpg)
VM Stats - info
![Page 41: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/41.jpg)
System Stats - info
• load1, load5, load15
• cpu percent
  • can be misleading because of spinning schedulers
• virtual memory size
• resident memory size
  • very useful to track those OOM crashes
![Page 42: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/42.jpg)
System Stats - info
![Page 43: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/43.jpg)
System Monitor - info
• monitoring for:
  • busy_port
  • busy_dist_port
  • long_gc
  • long_schedule
  • large_heap
• riak_sysmon + lager / statsderl handler
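Without riak_sysmon, the underlying BIF can be used directly. The sketch below registers a process for all five event types and logs whatever arrives; the module name and the thresholds (50 ms, 1,000,000 words) are assumptions picked for illustration.

```erlang
-module(sysmon_sketch).
-export([start/0]).

%% Spawn a receiver and register it as the node's system monitor.
%% Only one system monitor process can be active per node.
start() ->
    Pid = spawn(fun loop/0),
    _Previous = erlang:system_monitor(Pid, [busy_port,
                                            busy_dist_port,
                                            {long_gc, 50},
                                            {long_schedule, 50},
                                            {large_heap, 1000000}]),
    Pid.

%% Every event arrives as {monitor, PidOrPort, Event, Info}.
loop() ->
    receive
        {monitor, PidOrPort, Event, Info} ->
            error_logger:warning_msg("system_monitor ~p: ~p ~p~n",
                                     [Event, PidOrPort, Info]),
            loop();
        _Other ->
            loop()
    end.
```

The receiver should stay cheap: on a busy node these events can arrive in bursts, which is one reason to forward counts to a metrics client rather than log every message.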
![Page 44: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/44.jpg)
System Monitor - how to
![Page 45: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/45.jpg)
Dashboard - info
![Page 46: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/46.jpg)
Entop - info
• top(1)-like tool for the Erlang VM
• can be used remotely
• gives per process:
  • pid / name
  • reductions
  • message queue length
  • heap size
![Page 47: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/47.jpg)
Entop - info
![Page 48: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/48.jpg)
VM Tuning - info
• +K true (kernel polling)
• +sbt db (scheduler bind type)
• +scl false (disable scheduler compaction of load)
• +sfwi 500 (scheduler forced wakeup interval, useful with NIFs)
• +spp true (port parallelism)
• +zdbbl (distribution buffer busy limit)
• test with production load (synthetic benchmarks can be misleading)
![Page 49: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/49.jpg)
Cset - info
• tool to help create cpusets
• reduces involuntary context switches
• reserve the first two CPUs for interrupts and background jobs
• reserve the rest of the CPUs for the Erlang VM
• Linux only
![Page 50: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/50.jpg)
Cpuset - how to
![Page 51: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/51.jpg)
Lock counter - info
![Page 52: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/52.jpg)
Other tools - info
• system limits
  • ulimit -n
  • sysctl
• dtrace / systemtap
  • application + OS tracing
![Page 53: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/53.jpg)
Links - info
• https://github.com/proger/eflame
• https://github.com/lpgauth/timing
• https://github.com/lpgauth/statsderl
• https://github.com/ferd/vmstats
• https://github.com/lpgauth/system-stats
• https://github.com/mazenharake/entop
• https://github.com/ratelle/cpuset
![Page 54: Performance optimization 101 - Erlang Factory SF 2014](https://reader033.fdocuments.in/reader033/viewer/2022051819/54c5ecec4a795935028b45fb/html5/thumbnails/54.jpg)
github: lpgauth
irc: lpgauth (@erlounge)
twitter: lpgauth
Thank you!