![Page 1: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/1.jpg)
Investigating and reducing latency of trading applications
Konstantin Volkov, Tracing Summit 2017
![Page 2: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/2.jpg)
About me
• DevOps engineer
• Infrastructure for trading applications
• Containers, configuration automation, kernel technologies, performance/tracing tools
![Page 3: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/3.jpg)
Agenda
• Use cases
![Page 4: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/4.jpg)
Case #1
• Program allocates several gigabytes of memory
• Performs math calculations
• After a system software update on one of the servers, the program runs ~50% slower.
![Page 5: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/5.jpg)
Assumptions:
• Configuration issue
• Increased load on the system
• Hardware problem
![Page 6: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/6.jpg)
Conventional diagnosis
• uptime(1), top(1), ps(1)
basic investigation reveals no additional running processes or parasitic load
![Page 7: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/7.jpg)
Conventional diagnosis (cont.)
time(1) utility:
• healthy server: 0.14user 2.67system 0:02.84elapsed
• impacted server: 0.14user 4.98system 0:05.14elapsed
elapsed: 2.84 s → 5.14 s (about 1.8x); system: 2.67 s → 4.98 s (about 1.9x); user time unchanged
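The change between the two time(1) readings can be recomputed relative to the healthy server. This is a minimal Python sketch; the helper name `relative_increase` is illustrative and not part of the talk.

```python
def relative_increase(before: float, after: float) -> float:
    """Return the growth from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


# Readings copied from the time(1) output of the two servers.
healthy = {"user": 0.14, "system": 2.67, "elapsed": 2.84}
impacted = {"user": 0.14, "system": 4.98, "elapsed": 5.14}

for field in ("user", "system", "elapsed"):
    change = relative_increase(healthy[field], impacted[field])
    print(f"{field}: {change:+.0f}%")
```

User time does not move at all, while system and elapsed time grow by roughly the same absolute amount (about 2.3 s), which points at the kernel rather than at the program's own calculations.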
![Page 8: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/8.jpg)
Conventional diagnosis (cont.)
# strace -c <program>
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- -------
100.00    0.259030       43172         6           munmap
  0.00    0.000000           0         1           read
------ ----------- ----------- --------- --------- -------
100.00    0.262200       43700         6           munmap
  0.00    0.000000           0         1           read
total time spent in syscalls increased by only ~3 ms, far too little to explain the slowdown
![Page 9: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/9.jpg)
Conventional diagnosis (cont.)
mpstat(1) (%irq and %soft)
neither server experiences any significant interrupt load
![Page 10: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/10.jpg)
Advanced diagnosis
# perf record <program>
# perf report
![Page 11: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/11.jpg)
Advanced diagnosis (output)
# Overhead Command Shared Object Symbol
# ........ ........ ................ ..............................
#
57.68% program [kernel.kallsyms] [k] clear_page_c
7.76% program [kernel.kallsyms] [k] page_fault
6.40% program [kernel.kallsyms] [k] _raw_spin_lock
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
29.30% program [kernel.kallsyms] [k] clear_page_c
19.67% program [kernel.kallsyms] [k] isolate_migratepages_range
16.52% program [kernel.kallsyms] [k] compaction_alloc
different sets of functions contribute to the profile
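Comparing the two listings by eye works for three rows, but the same comparison can be scripted for longer reports. The sketch below is an assumption-laden illustration: it parses only the simple row layout shown on the slide, the names `parse_perf_report`, `first_profile` and `second_profile` are made up for this example, and symbols containing spaces (for instance demangled C++ names) are not handled.

```python
def parse_perf_report(text: str) -> dict:
    """Map symbol name -> overhead percent for each data row of `perf report`."""
    overhead = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the commented header
        fields = line.split()
        # Expected row: "<pct>% <command> <shared object> [k] <symbol>"
        if len(fields) < 5 or not fields[0].endswith("%"):
            continue
        overhead[fields[-1]] = float(fields[0].rstrip("%"))
    return overhead


# Rows copied from the two listings on the slide.
first_profile = """
57.68% program [kernel.kallsyms] [k] clear_page_c
 7.76% program [kernel.kallsyms] [k] page_fault
 6.40% program [kernel.kallsyms] [k] _raw_spin_lock
"""
second_profile = """
29.30% program [kernel.kallsyms] [k] clear_page_c
19.67% program [kernel.kallsyms] [k] isolate_migratepages_range
16.52% program [kernel.kallsyms] [k] compaction_alloc
"""

first = parse_perf_report(first_profile)
second = parse_perf_report(second_profile)
only_in_second = sorted(set(second) - set(first))
print("symbols only in the second profile:", only_in_second)
```

Since only the top rows are compared here, a symbol reported as unique may still appear further down the other profile; feeding the full reports avoids that.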
![Page 12: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/12.jpg)
Advanced diagnosis (cont.)
isolate_migratepages_range()
compaction_alloc()
Both defined in mm/compaction.c
![Page 13: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/13.jpg)
Advanced diagnosis (cont.)
Documentation/sysctl/vm.txt
compact_memory
Available only when CONFIG_COMPACTION is set. When 1 is written to the file, all zones are compacted such that free memory is available in contiguous blocks where possible. This can be important for example in the allocation of huge pages although processes will also directly compact memory as required.
![Page 14: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/14.jpg)
Case #1 remediation
# echo never > /sys/kernel/mm/transparent_hugepage/defrag
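To check which mode is in effect before and after the change, the sysfs file can be read back: the kernel lists all allowed values and puts the active one in square brackets. The following Python sketch shows one way to extract it; the helper names `active_choice` and `main` are illustrative.

```python
from pathlib import Path

THP_DEFRAG = Path("/sys/kernel/mm/transparent_hugepage/defrag")


def active_choice(sysfs_text: str) -> str:
    """Return the bracketed (currently selected) value, e.g. 'never'."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no bracketed value in {sysfs_text!r}")


def main() -> None:
    try:
        print("THP defrag mode:", active_choice(THP_DEFRAG.read_text()))
    except (OSError, ValueError) as exc:
        # The file is absent when the kernel has no transparent hugepage support.
        print("could not determine THP defrag mode:", exc)


if __name__ == "__main__":
    main()
```

A value written through sysfs lasts only until the next reboot, so a permanent fix also needs a boot-time mechanism of your choice.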
![Page 15: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/15.jpg)
Case #2
• A freshly set up server constantly spends 30% of CPU time in system mode
• No production software running yet
![Page 16: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/16.jpg)
Assumptions:
• A huge number of interrupts? But no load has been applied yet
![Page 17: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/17.jpg)
Advanced diagnosis
• perf to collect an execution profile of the whole system
![Page 18: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/18.jpg)
Advanced diagnosis (cont.)
# Overhead Command Shared Object Symbol
# ........ ......... ................. ..........................
#
60.29% swapper [kernel.kallsyms] [k] intel_idle
5.20% swapper [kernel.kallsyms] [k] acpi_os_read_port
3.54% swapper [kernel.kallsyms] [k] menu_select
3.34% swapper [kernel.kallsyms] [k] _raw_spin_lock_irqsave
• the idle task dominates the profile
• no other visible time consumer
![Page 19: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/19.jpg)
Advanced diagnosis (cont.)
CPU flame graphs to the rescue
![Page 20: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/20.jpg)
![Page 21: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/21.jpg)
Advanced diagnosis (cont.)
• _raw_spin_lock_irqsave() comes from the CPU frequency scaling code
• it looks like the cpufreq code takes one global lock; on a system with 64 CPUs this leads to noticeable contention
![Page 22: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/22.jpg)
Case #2 remediation
# echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
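The governor is a per-CPU setting, so the value has to be written to one sysfs file per CPU. The Python sketch below loops over the matching files. The function name `set_governor` and its `cpu_root` parameter are hypothetical, introduced so the loop can be exercised against a scratch directory instead of the real sysfs tree.

```python
import glob
import os
import tempfile


def set_governor(governor: str, cpu_root: str = "/sys/devices/system/cpu") -> list:
    """Write `governor` to every cpuN/cpufreq/scaling_governor under cpu_root.

    Returns the list of files written. Needs root on a real system.
    """
    # "cpu[0-9]*" skips sibling directories such as cpufreq/ and cpuidle/.
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    written = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as f:
            f.write(governor)
        written.append(path)
    return written


# Dry run against a scratch directory that mimics the sysfs layout.
with tempfile.TemporaryDirectory() as scratch:
    for n in range(2):
        cpufreq_dir = os.path.join(scratch, f"cpu{n}", "cpufreq")
        os.makedirs(cpufreq_dir)
        with open(os.path.join(cpufreq_dir, "scaling_governor"), "w") as f:
            f.write("powersave")
    changed = set_governor("performance", cpu_root=scratch)
    print(len(changed), "governor files updated")
```

On a real machine the call would be `set_governor("performance")`, run as root; which governors are accepted depends on the cpufreq driver in use.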
![Page 23: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/23.jpg)
Case #3
• synchronous writes take too much time to complete (10 s).
![Page 24: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/24.jpg)
Assumptions
• Hardware problem
• Increased load
![Page 25: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/25.jpg)
Conventional diagnosis
# iostat -x
%util: 100.00
svctm: 2.24 ms
w_await: 322.17 ms
![Page 26: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/26.jpg)
Advanced diagnosis
• ftrace events via trace-cmd(1)
3094618.749527: block_rq_insert: 386645440
3094618.753639: block_rq_complete: 386645440
it takes ~4 ms to service an I/O request
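The service time is simply the gap between the insert and complete events of the same request. The Python sketch below pairs events and subtracts timestamps. It assumes the simplified "timestamp: event: number" layout shown on the slide and that the trailing number (presumably the starting sector) identifies the request; real trace-cmd output carries more fields, and `request_latencies` is an illustrative name.

```python
from decimal import Decimal, InvalidOperation


def request_latencies(lines):
    """Pair block_rq_insert/block_rq_complete events by request id.

    Each line is expected as "<timestamp>: <event>: <id>".
    Returns {id: latency_in_seconds} using exact decimal arithmetic.
    """
    inserted = {}
    latencies = {}
    for line in lines:
        parts = [p.strip() for p in line.split(":")]
        if len(parts) != 3:
            continue
        try:
            timestamp = Decimal(parts[0])
        except InvalidOperation:
            continue  # not a trace line in the expected layout
        event, request_id = parts[1], parts[2]
        if event == "block_rq_insert":
            inserted[request_id] = timestamp
        elif event == "block_rq_complete" and request_id in inserted:
            latencies[request_id] = timestamp - inserted.pop(request_id)
    return latencies


# The two events copied from the slide.
events = [
    "3094618.749527: block_rq_insert: 386645440",
    "3094618.753639: block_rq_complete: 386645440",
]
for request_id, seconds in request_latencies(events).items():
    print(f"request {request_id}: {seconds * 1000} ms")
```

Decimal is used instead of float so that subtracting two large timestamps does not lose microsecond precision.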
![Page 27: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/27.jpg)
Advanced diagnosis (cont.)
• ftrace function_graph
3094618.749248: funcgraph_entry: SyS_fsync()
3094628.729051: funcgraph_exit:
the fsync() system call takes ~10 s to complete
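The same delay can be cross-checked from user space without any tracing by timing fsync() around a small write. This is a rough Python sketch, not the method used in the talk; `timed_fsync` is an illustrative name, and the directory passed in should live on the filesystem under investigation (the system temporary directory used below may be a tmpfs, where fsync is nearly free).

```python
import os
import tempfile
import time


def timed_fsync(directory: str, payload: bytes = b"x" * 4096) -> float:
    """Write `payload` to a temp file in `directory` and return seconds spent in fsync()."""
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        f.write(payload)
        f.flush()  # push Python's userspace buffer to the kernel first
        start = time.perf_counter()
        os.fsync(f.fileno())  # blocks until the data reaches stable storage
        return time.perf_counter() - start


elapsed = timed_fsync(tempfile.gettempdir())
print(f"fsync took {elapsed * 1000:.3f} ms")
```

Running it in a loop while the heavy writeback is in progress should show whether the multi-second stalls are reproducible outside the application.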
![Page 28: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/28.jpg)
Advanced diagnosis (cont.)
jbd2_log_wait_commit() {
  _raw_read_lock();
  __wake_up() {
    _raw_spin_lock_irqsave();
    __wake_up_common();
    _raw_spin_unlock_irqrestore();
  }
  prepare_to_wait_event() {
    _raw_spin_lock_irqsave();
    _raw_spin_unlock_irqrestore();
  }
  schedule() {
![Page 29: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/29.jpg)
Advanced diagnosis (cont.)
kworker/u8:2-1718 [000] 3094619.035436: block_rq_insert:
kworker/u8:2-1718 [000] 3094619.035463: kernel_stack:
=> blk_flush_plug_list (ffffffff81285258)
=> blk_queue_bio (ffffffff812854ca)
=> generic_make_request (ffffffff81280cb0)
…….
=> __writeback_single_inode (ffffffff811d1c09)
=> writeback_sb_inodes (ffffffff811d2964)
=> __writeback_inodes_wb (ffffffff811d2c56)
=> wb_writeback (ffffffff811d2f03)
Lots of similar events happen while our task is waiting
![Page 30: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/30.jpg)
Advanced diagnosis (cont.)
It looks like the journal commit cannot advance while the device is under heavy writeback
![Page 31: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/31.jpg)
Case #3 remediation
• Decrease the writeback buffer, e.g. lower vm.dirty_ratio
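To get a feel for how much dirty data a given setting allows to pile up, the limit can be estimated as a percentage of the memory available for the page cache. The Python sketch below uses a hypothetical machine with 64 GiB of such memory and two example ratios; none of these numbers come from the talk, and the helper names are illustrative.

```python
from pathlib import Path

VM_SYSCTL = Path("/proc/sys/vm")
GIB = 1024 ** 3


def dirty_limit_bytes(available_bytes: int, ratio_percent: int) -> int:
    """Approximate amount of dirty page cache allowed before writers are throttled."""
    return available_bytes * ratio_percent // 100


def read_vm_setting(name: str):
    """Return the integer value of /proc/sys/vm/<name>, or None if unreadable."""
    try:
        return int((VM_SYSCTL / name).read_text())
    except (OSError, ValueError):
        return None


# Hypothetical machine: 64 GiB of free plus reclaimable memory.
for ratio in (20, 5):
    limit = dirty_limit_bytes(64 * GIB, ratio)
    print(f"dirty_ratio={ratio}: up to {limit / GIB:.1f} GiB of dirty data")

print("current vm.dirty_ratio:", read_vm_setting("dirty_ratio"))
```

The smaller the limit, the less data a background flush has to push through the device at once, which shortens the time a journal commit can be stuck behind it; the right value depends on the workload and the storage.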
![Page 32: Investigating and reducing latency of trading applications · kernel technologies, performance/tracing tools Konstantin Volkov 2 Tracing Summit 2017. Agenda ... different sets of](https://reader033.fdocuments.in/reader033/viewer/2022050420/5f8f8da699a9b70e5a5c61be/html5/thumbnails/32.jpg)
Thank you!