FlexSC : Flexible System Call Scheduling with Exception-Less System Calls
![Page 1: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/1.jpg)
FlexSC: Flexible System Call Scheduling with Exception-Less System Calls
Livio Soares, Michael Stumm
University of Toronto
Presented by Md. Maksudul Alam
29 September, 2011
CS 5204
![Page 2: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/2.jpg)
2
Mechanism used by an application program to request a service from the operating system kernel
De facto interface for 30+ years
Synchronous in nature
The application blocks until the call completes
System Calls
![Page 3: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/3.jpg)
3
How do system calls work?
User mode (application):
    int main() { fork(); return 0; }
User mode (libc.a):
    fork() { MOVL 2, EAX; INT 0x80 }
Interrupt Descriptor Table:
• 0x00 Division By Zero
• 0x01 Debugger
• 0x02 NMI
• ……
• 0x80 System Calls
Kernel mode (syscall handler):
    system_call() { /* EAX = 2 */ call *sys_call_table[EAX] }
Syscall Table:
• 1 sys_exit()
• 2 sys_fork()
• 3 sys_read()
• 4 sys_write()
    sys_fork() { … }
System Call: fork()
The user process is blocked while the kernel executes the call
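The dispatch step on this slide, `call *sys_call_table[EAX]`, amounts to indexing a table of function pointers with the system call number. Below is a minimal user-space sketch of that idea in C. The `demo_*` names and their return values are hypothetical stand-ins, not kernel code; only the slot numbers are taken from the slide.

```c
#include <stddef.h>

/* Hypothetical stand-ins for kernel handlers; they only return marker values. */
typedef long (*syscall_fn)(void);

static long demo_exit(void)  { return 100; }
static long demo_fork(void)  { return 200; }
static long demo_read(void)  { return 300; }
static long demo_write(void) { return 400; }

/* Slot numbers follow the slide's table: 1 exit, 2 fork, 3 read, 4 write. */
static const syscall_fn demo_call_table[] = {
    NULL, demo_exit, demo_fork, demo_read, demo_write,
};

/* Mimics "call *sys_call_table[EAX]": bounds-check, then make an indirect call. */
long demo_dispatch(size_t nr)
{
    size_t n = sizeof demo_call_table / sizeof demo_call_table[0];
    if (nr >= n || demo_call_table[nr] == NULL)
        return -1; /* unknown system call number */
    return demo_call_table[nr]();
}
```

Calling `demo_dispatch(2)` reaches the fork stand-in, just as EAX = 2 selects `sys_fork()` on the slide.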
![Page 5: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/5.jpg)
5
Direct Cost:
• Mode switch time: switching between user mode and kernel mode
• Flushing of the processor pipeline
Indirect Cost:
• Processor pollution: user-mode state held in critical processor structures is displaced by kernel state during the mode switch, affecting:
  – Translation Look-aside Buffer (TLB)
  – Branch prediction tables
  – Caches (L1, L2, L3)
System Call Costs!
![Page 6: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/6.jpg)
6
Direct and indirect costs are both significant
User-mode IPC degrades by up to 65%
System Call Impact on User IPC
System call pwrite impact on user mode IPC
![Page 7: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/7.jpg)
7
Opposite of the user-mode trend
The more frequent the system calls, the more kernel state is maintained
Mode Switching Cost on Kernel IPC
System call pwrite impact on kernel mode IPC
![Page 8: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/8.jpg)
8
Temporal Locality: if at one point in time a particular memory location is referenced, then it is likely that the same location will be referenced again in the near future
Spatial Locality: if a particular memory location is referenced at a particular time, then it is likely that nearby memory locations will be referenced in the near future
Principle of Locality
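As an illustration of the two definitions, the C sketch below sums the same array in two orders. The array size and function names are chosen here only for the example. The row-major walk touches adjacent addresses (spatial locality) and reuses one accumulator on every iteration (temporal locality). The column-major walk computes the same sum but jumps a full row ahead on each access.

```c
#include <stddef.h>

#define ROWS 256
#define COLS 256

static int grid[ROWS][COLS];

/* Fill with a known pattern so both traversals can be compared. */
void fill_grid(void)
{
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            grid[r][c] = (int)((r + c) % 7);
}

/* Row-major walk: consecutive accesses touch adjacent addresses. */
long sum_row_major(void)
{
    long total = 0; /* reused on every iteration: temporal locality */
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            total += grid[r][c];
    return total;
}

/* Column-major walk: each access jumps COLS * sizeof(int) bytes ahead. */
long sum_col_major(void)
{
    long total = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            total += grid[r][c];
    return total;
}
```

Both functions return the same value; only the memory access pattern differs, and that pattern is what caches and TLBs reward or penalize.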
![Page 9: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/9.jpg)
9
Synchronous System Calls are BAD!!
[Figure: Traditional Synchronous System Call. User and kernel timelines, with two exceptions marked at the boundary.]
![Page 10: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/10.jpg)
10
Minimize Mode Switch – eliminate Direct Cost
• Asynchronous system calls
Maximize Locality – eliminate Indirect Cost
• Batch processing of system calls
Decouple Execution from Invocation
Improve Performance
![Page 11: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/11.jpg)
11
Exception-Less System Call
• Asynchronous system call: minimize mode switches
• Syscall threads: maximize locality
[Figure: Asynchronous Exception-Less System Call. A user-mode invoker and a kernel-mode executor communicate through the sys call page in shared memory (steps 1 and 2).]
![Page 12: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/12.jpg)
12
FlexSC:
• Implementation of exception-less system calls on Linux
• A library exposing an API for applications that use exception-less system calls
• Has a thread scheduler to schedule syscall threads
• Applicable to event-driven applications
FlexSC-Threads:
• A POSIX-compliant thread library
• Binary compatible with NPTL
• For thread-oriented applications such as MySQL and Apache
• M-on-N threading model (M user threads on N kernel-visible threads)
FlexSC: Flexible System Call
![Page 13: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/13.jpg)
13
• Shared memory pages between user and kernel
• Each entry contains one syscall request
• Each entry is 64 bytes
Syscall Page
Entry layout: syscall number | # of args | status | arg 0 … arg 6 | return value
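One way to picture a 64-byte entry is as a C struct. The sketch below follows the field order on the slide, but the field widths, the `ENTRY_FREE` and `ENTRY_BUSY` states, and the 4 KiB page size are assumptions made for illustration. It uses six argument slots, since Linux system calls take at most six arguments, whereas the slide's figure labels the slots arg 0 … arg 6.

```c
#include <stdint.h>

/* Only SUBMIT and DONE appear on the slides; FREE and BUSY are assumed here. */
enum entry_status { ENTRY_FREE = 0, ENTRY_SUBMIT, ENTRY_BUSY, ENTRY_DONE };

/* Field order follows the slide; widths are an assumption chosen to total 64 bytes. */
struct syscall_entry {
    uint16_t syscall;      /* system call number */
    uint16_t num_args;     /* number of argument slots in use */
    uint32_t status;       /* one of enum entry_status */
    uint64_t args[6];      /* argument slots */
    int64_t  return_code;  /* result written by the kernel side */
};

_Static_assert(sizeof(struct syscall_entry) == 64, "entry must be 64 bytes");

/* Assuming a 4 KiB page, one syscall page would hold this many entries. */
enum {
    ASSUMED_PAGE_SIZE = 4096,
    ENTRIES_PER_PAGE  = ASSUMED_PAGE_SIZE / sizeof(struct syscall_entry)
};
```

With these assumptions a page holds 64 entries, and a 64-byte entry matches the cache line size of many x86 processors, so an entry need not straddle two lines.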
![Page 14: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/14.jpg)
14
System Call Interface
Traditional call:
    write(fd, buf, 4096);
Exception-less equivalent:
    entry = free_syscall_entry();

    /* write syscall */
    entry->syscall = 1;
    entry->num_args = 3;
    entry->args[0] = fd;
    entry->args[1] = buf;
    entry->args[2] = 4096;
    entry->status = SUBMIT;

    while (entry->status != DONE)
        do_something_else();

    return entry->return_code;
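The post-then-poll pattern above can be tried out entirely in user space. The sketch below is an assumption-laden simulation, not the FlexSC API: `struct entry`, `post_write`, `run_submitted` and `collect` are hypothetical names, and `run_submitted` plays the role that a kernel syscall thread has in FlexSC by executing each submitted entry through the ordinary Linux `syscall()` wrapper.

```c
#define _GNU_SOURCE
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

enum { ST_FREE, ST_SUBMIT, ST_DONE };

struct entry {
    long syscall_nr;
    int num_args;
    long args[6];
    long return_code;
    atomic_int status;
};

#define NUM_ENTRIES 8
static struct entry page[NUM_ENTRIES]; /* stand-in for the shared syscall page */

/* Hypothetical helper named after the slide's free_syscall_entry(). */
struct entry *free_syscall_entry(void)
{
    for (size_t i = 0; i < NUM_ENTRIES; i++)
        if (atomic_load(&page[i].status) == ST_FREE)
            return &page[i];
    return NULL;
}

/* Invoker side: fill in the request, then publish it by setting the status last. */
struct entry *post_write(int fd, const void *buf, size_t len)
{
    struct entry *e = free_syscall_entry();
    if (e == NULL)
        return NULL;
    e->syscall_nr = SYS_write;
    e->num_args = 3;
    e->args[0] = fd;
    e->args[1] = (long)(intptr_t)buf;
    e->args[2] = (long)len;
    atomic_store(&e->status, ST_SUBMIT);
    return e;
}

/* Executor side, simulated in user space; in FlexSC a kernel syscall thread does this. */
int run_submitted(void)
{
    int executed = 0;
    for (size_t i = 0; i < NUM_ENTRIES; i++) {
        struct entry *e = &page[i];
        if (atomic_load(&e->status) != ST_SUBMIT)
            continue;
        e->return_code = syscall(e->syscall_nr, e->args[0], e->args[1], e->args[2]);
        atomic_store(&e->status, ST_DONE);
        executed++;
    }
    return executed;
}

/* Invoker side: read the result and release the entry for reuse. */
long collect(struct entry *e)
{
    long rc = e->return_code;
    atomic_store(&e->status, ST_FREE);
    return rc;
}
```

Because this sketch is single-threaded, the caller has to invoke `run_submitted()` itself before reading a result. In FlexSC the syscall threads run independently, which is why the invoker can keep doing other work while it polls the status field.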
![Page 16: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/16.jpg)
16
Two new system calls are implemented:
• flexsc_register – registers the FlexSC syscall threads
• flexsc_wait – notifies the kernel
flexsc_register explicitly creates the syscall threads
When a user-level thread has nothing to do, it issues the flexsc_wait system call
FlexSC wakes the thread up again later
FlexSC System calls
![Page 17: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/17.jpg)
17
• Kernel-only threads
• They share the virtual address space of the application
• Created during the flexsc_register system call, which preserves this address space
• Each thread processes a syscall page entry
• Only one thread is active per application per core
Syscall Threads
[Figure: syscall threads on the kernel side and the sys call page shared between user and kernel]
![Page 18: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/18.jpg)
18
• Post as many requests as possible
• No more threads to run? Call flexsc_wait and enter the kernel
• The syscall scheduler wakes up syscall threads
• It does not wake up the user thread until:
  – all calls have been issued, and
  – at least one has finished and the others are blocked
• Only one syscall thread can be active per core
• On a multi-core system, more threads can be active on different cores
Syscall Thread Scheduler
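The wake-up rule above can be written as a small predicate. This is a sketch of the rule as the slide states it, not code from the FlexSC kernel patch. The state names are assumptions, with `CALL_SUBMITTED` meaning a call that no syscall thread has started yet.

```c
#include <stdbool.h>
#include <stddef.h>

/* State names are assumptions; the slide describes the conditions only in prose. */
enum call_state { CALL_FREE, CALL_SUBMITTED, CALL_BLOCKED, CALL_DONE };

/*
 * Returns true when the user thread may be woken: no call is still waiting
 * to be issued, and at least one call has finished (any remaining calls are
 * blocked inside the kernel).
 */
bool should_wake_user(const enum call_state *calls, size_t n)
{
    size_t done = 0;
    for (size_t i = 0; i < n; i++) {
        if (calls[i] == CALL_SUBMITTED)
            return false; /* a syscall thread can still start this one */
        if (calls[i] == CALL_DONE)
            done++;
    }
    return done > 0;
}
```

Holding the user thread back until this predicate is true keeps the core in kernel mode for as long as useful kernel work remains, which is the locality argument made earlier in the talk.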
![Page 19: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/19.jpg)
19
To use FlexSC directly, existing applications need modification
• Not feasible for thread-based applications
• Instead, use FlexSC-Threads, a thread library supporting FlexSC
• M-on-N thread model, POSIX compliant and compatible with NPTL
• One kernel-visible thread per core
FlexSC-Threads
![Page 20: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/20.jpg)
20
1. Redirect system calls (libc) to FlexSC
2. Post system calls to the syscall page and switch to another user thread
3. If no more user threads are ready, check the syscall page for finished entries
4. Call flexsc_wait to block the kernel-visible thread
How FlexSC-Threads works
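The four steps can be simulated with a toy scheduler to show why batching reduces kernel entries. Everything in the sketch below is hypothetical: user threads are plain structs rather than real contexts, and `sim_flexsc_wait` simply marks every posted call as complete, whereas the real kernel completes calls concurrently.

```c
#include <stddef.h>

enum ut_state { UT_READY, UT_WAITING, UT_FINISHED };

struct user_thread {
    enum ut_state state;
    int calls_left; /* system calls this thread still wants to make */
    int call_done;  /* set by the simulated kernel when its posted call completes */
};

struct sim_stats {
    int syscalls;       /* calls posted to the simulated syscall page */
    int kernel_entries; /* times flexsc_wait would have been called */
};

/* Simulated flexsc_wait: the kernel side completes every posted call. */
static void sim_flexsc_wait(struct user_thread *t, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (t[i].state == UT_WAITING)
            t[i].call_done = 1;
}

struct sim_stats run_threads(struct user_thread *t, size_t n)
{
    struct sim_stats s = {0, 0};
    for (;;) {
        int progressed = 0;
        size_t finished = 0;

        /* Steps 1-2: each ready thread posts a call, then we switch to the next. */
        for (size_t i = 0; i < n; i++) {
            if (t[i].state != UT_READY)
                continue;
            if (t[i].calls_left == 0) {
                t[i].state = UT_FINISHED;
            } else {
                t[i].calls_left--;
                t[i].state = UT_WAITING;
                s.syscalls++;
            }
            progressed = 1;
        }
        /* Step 3: look for finished entries and make their threads ready again. */
        for (size_t i = 0; i < n; i++) {
            if (t[i].state == UT_WAITING && t[i].call_done) {
                t[i].call_done = 0;
                t[i].state = UT_READY;
                progressed = 1;
            }
            if (t[i].state == UT_FINISHED)
                finished++;
        }
        if (finished == n)
            return s;
        /* Step 4: nothing runnable and nothing finished, so block in the kernel. */
        if (!progressed) {
            sim_flexsc_wait(t, n);
            s.kernel_entries++;
        }
    }
}
```

With four simulated threads making two calls each, the run posts eight system calls but enters the kernel only twice, where a synchronous design would enter it once per call.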
![Page 21: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/21.jpg)
21
• FlexSC-Threads supports multi-core specialization
• A single kernel-visible thread per core
• Private syscall page per core
• User threads and kernel syscall threads can run on different cores
Multi Core specialization
![Page 22: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/22.jpg)
22
Linux kernel 2.6.33
Intel Nehalem (Core i7) processor with 4 Cores
Remote client connected over a 1 Gbps network
Results are the average of 5 runs
Performance Evaluation
![Page 23: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/23.jpg)
23
• getppid() measures the direct cost
• For a single system call, FlexSC is 43% slower than a synchronous call
• For larger batches of system calls, FlexSC is up to 130% faster
Direct Cost – Single Core
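The synchronous baseline of such a measurement can be approximated with a short micro-benchmark. The sketch below assumes Linux and is not the authors' harness. It times repeated `getppid` calls, which do almost no work in the kernel, so the elapsed time is dominated by the mode switch. The function names are chosen here for illustration.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Average wall-clock cost, in nanoseconds, of one synchronous getppid call. */
double avg_getppid_ns(unsigned iterations)
{
    if (iterations == 0)
        return 0.0;
    uint64_t start = now_ns();
    for (unsigned i = 0; i < iterations; i++)
        (void)syscall(SYS_getppid); /* raw call through the generic wrapper */
    return (double)(now_ns() - start) / iterations;
}
```

The figure it returns depends heavily on the processor and kernel version. Reproducing the FlexSC side of the comparison would need the modified kernel, which this sketch does not attempt.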
![Page 24: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/24.jpg)
24
• An inter-processor interrupt (IPI) is needed to wake remote threads
• For a single system call, FlexSC is 10 times slower than a synchronous call
• With batches of 32 or more, FlexSC catches up with synchronous calls
• However, this is the worst-case scenario
Direct Cost – Remote-core
![Page 25: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/25.jpg)
25
Apache 2.2.15
Two scenarios:
• NPTL: 200 user threads
• FlexSC-Threads: 1000 user threads
ApacheBench as workload
Apache
![Page 26: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/26.jpg)
26
Single core (syscall batching only): 86% improvement
4 cores (with multi-core specialization): 115% improvement
Apache Throughput
![Page 27: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/27.jpg)
27
One would expect higher latency because of batching and asynchronous processing, but the opposite is true in this case
Up to 50% reduction in latency
Apache Latency with 256 requests
Comparison of Apache latency of Linux/NPTL and FlexSC
![Page 28: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/28.jpg)
28
MySQL 5.5.4 with InnoDB
sysbench as workload generator
Database with 5M rows = 1.2GB data
Disabled synchronous transactions
MySQL
![Page 29: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/29.jpg)
29
Single core (syscall batching only): 15% improvement
4 cores (with multi-core specialization): 40% improvement
MySQL Throughput
![Page 30: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/30.jpg)
30
FlexSC response time is 70–88% of NPTL's
Modest improvement
MySQL Latency with 256 requests
Comparison of MySQL latency of Linux/NPTL and FlexSC
![Page 31: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/31.jpg)
31
How many syscall entries work best?
Apache serving 2048 concurrent requests:
• Single core: 200–250 entries (3 to 4 syscall pages)
• 4 cores: 300–400 entries (6 to 7 syscall pages)
Sensitivity Analysis
![Page 32: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/32.jpg)
32
• Synchronous system calls degrade performance
• The indirect cost is huge
• Exception-less system calls improve the performance of applications that need many system calls
• Event-based applications can use FlexSC directly
• Thread-oriented applications can use FlexSC-Threads by just swapping out the NPTL library
• Performance of server-centric applications is very good
• Multi-core support will be a big advantage
Summary
![Page 34: FlexSC : Flexible System Call Scheduling with Exception-Less System Calls](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163a3550346895dd4a7a6/html5/thumbnails/34.jpg)
Thank You!