Threads and multi threading
![Page 1: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/1.jpg)
Threads
Cesarano Antonio, Del Monte Bonaventura
Università degli studi di Salerno
7th April 2014
Operating Systems II
![Page 2: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/2.jpg)
Agenda
• Introduction
• Thread models
• Multithreading: single-core vs multicore
• Implementation
• A case study
• Conclusions
![Page 3: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/3.jpg)
CPU Trends
![Page 4: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/4.jpg)
Introduction: What’s a Thread?
![Page 5: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/5.jpg)
Memory: Heavy vs Light processes
Introduction
![Page 6: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/6.jpg)
Why should I care about Threads?
Pros:
• Responsiveness
• Resource sharing
• Economy
• Scalability
Cons:
• Hard to implement
• Synchronization
• Critical sections, deadlock, livelock…
Introduction
![Page 7: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/7.jpg)
Thread Models
Two kinds of Threads
User Threads
Kernel Threads
![Page 8: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/8.jpg)
Thread Models: User-level Threads
Implemented in a software library:
• Pthread
• Win32 API
Pros:
• Easy handling
• Fast context switch
• Transparent to the OS
• No new address space, no need to change address space
Cons:
• Do not benefit from multithreading or multiprocessing
• Thread blocked → process blocked
![Page 9: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/9.jpg)
Thread Models: Kernel-level Threads
• Executed only in kernel mode, managed by the OS
• Kthreadd children
Pros:
• Resource aware
• No need to use a new address space
• Thread blocked → another thread can be scheduled
Cons:
• Slower than user-level threads
![Page 10: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/10.jpg)
Thread Models
Thread implementation models:
• From many to one
• From one to one
• From many to many
![Page 11: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/11.jpg)
Thread Models: From many to one
• The whole process is blocked if one thread is blocked
• Useless on multicore architectures
![Page 12: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/12.jpg)
Thread Models: From one to one
• Works fine on multicore architectures
  o Many kernel threads = high overhead
![Page 13: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/13.jpg)
Thread Models: From many to many
• Works fine on multicore architectures
• Less overhead than the “one to one” model
![Page 14: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/14.jpg)
Multithreading: Multitasking
[Diagram panels: Single core; Symmetric Multi-Processor]
![Page 15: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/15.jpg)
Multithreading
![Page 16: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/16.jpg)
Multithreading
HyperThreading
![Page 17: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/17.jpg)
Multithreading
![Page 18: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/18.jpg)
How can we use multithreading architectures?
• Thread Level Parallelism
• Data Level Parallelism
Multithreading
![Page 19: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/19.jpg)
Multithreading: Thread Level Parallelism
![Page 20: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/20.jpg)
Multithreading: Data Level Parallelism
![Page 21: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/21.jpg)
Multithreading: Granularity
Coarse-grained:
• Context switch on a high-latency event
• Very fast thread switching; no thread is slowed down
• Loss of throughput on short stalls, because of the pipeline start-up cost
![Page 22: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/22.jpg)
Multithreading: Granularity
Fine-grained:
• Context switch on every cycle
• Interleaved execution of multiple threads: it can hide both short and long stalls
• Rarely-stalling threads are slowed down
![Page 23: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/23.jpg)
Multithreading: Granularity
![Page 24: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/24.jpg)
Single-core Vs Multi-core: Context Switching
    Xthread_ctxtswitch:
        pusha
        movl esp, [eax]
        movl edx, esp
        popa
        ret
[Diagram: CPU ESP with Thread 1 regs; Thread 2 registers; Thread 1 TCB and Thread 2 TCB, each with a saved SP. States: Thread 1 Running, Thread 2 Ready]
![Page 25: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/25.jpg)
Single-core Vs Multi-core: Pushing old context
Step in Xthread_ctxtswitch: pusha
[Diagram: CPU ESP with Thread 1 regs; Thread 2 registers; Thread 1 registers now pushed; Thread 1 TCB and Thread 2 TCB, each with a saved SP. States: Thread 1 Running, Thread 2 Ready]
![Page 26: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/26.jpg)
Single-core Vs Multi-core: Saving old stack pointer
Step in Xthread_ctxtswitch: movl esp, [eax]
[Diagram: CPU ESP with Thread 1 regs; Thread 2 registers; Thread 1 registers; Thread 1 TCB and Thread 2 TCB, each with a saved SP. States: Thread 1 Running, Thread 2 Ready]
![Page 27: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/27.jpg)
Single-core Vs Multi-core: Changing stack pointer
Step in Xthread_ctxtswitch: movl edx, esp
[Diagram: CPU ESP with Thread 1 regs; Thread 2 registers; Thread 1 registers; Thread 1 TCB and Thread 2 TCB, each with a saved SP. States: Thread 1 Ready, Thread 2 Running]
![Page 28: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/28.jpg)
Single-core Vs Multi-core: Popping off thread #2 old context
Step in Xthread_ctxtswitch: popa
[Diagram: CPU ESP with Thread 2 regs; Thread 1 registers saved; Thread 1 TCB and Thread 2 TCB, each with a saved SP. States: Thread 1 Ready, Thread 2 Running]
![Page 29: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/29.jpg)
Single-core Vs Multi-core: Done: return
Step in Xthread_ctxtswitch: ret
[Diagram: CPU ESP with Thread 2 regs; Thread 1 registers saved; Thread 1 TCB and Thread 2 TCB, each with a saved SP. States: Thread 1 Ready, Thread 2 Running]
RET pops the return address off the stack and assigns it to the PC register.
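The same save-and-restore idea can be tried from user space without writing assembly. The sketch below is not the Xthread code from the slides: it assumes a Linux/glibc system and uses the `ucontext` calls (`getcontext`, `makecontext`, `swapcontext`), with `main_ctx` and `thread_ctx` standing in for the two TCBs. The names `thread_body`, `trace` and `run_context_switch_demo` are hypothetical.

```c
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

/* The two saved contexts play the role of the two TCBs. */
static ucontext_t main_ctx, thread_ctx;
static char thread_stack[STACK_SIZE];

/* Records the order in which the two flows of control run. */
static int trace[4];
static int trace_len = 0;

static void thread_body(void) {
    trace[trace_len++] = 2;               /* first run, on its own stack */
    swapcontext(&thread_ctx, &main_ctx);  /* save ourselves, resume main */
    trace[trace_len++] = 4;               /* resumed right after the yield */
}   /* returning here resumes uc_link, i.e. main_ctx */

int run_context_switch_demo(void) {
    trace_len = 0;

    /* Build a context that starts in thread_body on a private stack. */
    getcontext(&thread_ctx);
    thread_ctx.uc_stack.ss_sp = thread_stack;
    thread_ctx.uc_stack.ss_size = sizeof thread_stack;
    thread_ctx.uc_link = &main_ctx;
    makecontext(&thread_ctx, thread_body, 0);

    trace[trace_len++] = 1;
    swapcontext(&main_ctx, &thread_ctx);  /* save main, load the thread */
    trace[trace_len++] = 3;
    swapcontext(&main_ctx, &thread_ctx);  /* resume the thread once more */
    return trace_len;
}
```

Each `swapcontext` call does in one step what the `pusha` / save SP / load SP / `popa` / `ret` sequence does: it stores the current registers and stack pointer in the first argument and restores those of the second.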
![Page 30: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/30.jpg)
Problems
Critical section: a thread A tries to access a shared variable at the same time as a thread B
Deadlock: a process A is waiting for a resource held by B, which is in turn waiting for a resource held by A
Race condition: the result of an execution depends on the order in which different threads execute
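As an illustration of the critical-section problem, here is a minimal sketch assuming POSIX threads. Several threads increment one shared counter; the increment is the critical section, and a mutex serializes it so that the final value does not depend on scheduling order. `NTHREADS`, `INCREMENTS` and the function names are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4
#define INCREMENTS 100000

static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread adds INCREMENTS to the shared counter. */
static void *add_locked(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;                        /* critical section */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts NTHREADS threads, waits for them, returns the final count. */
long run_locked_counter(void) {
    pthread_t tid[NTHREADS];
    counter = 0;
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, add_locked, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return counter;
}
```

Without the lock/unlock pair, `counter++` is a read-modify-write that two threads can interleave, so updates may be lost and the result becomes a race condition.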
![Page 31: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/31.jpg)
More Issues
fork() and exec() system calls: to duplicate or not to duplicate all threads?
Signal handling in multithreaded applications.
Scheduler activation: kernel threads have to communicate with user threads, e.g. through upcalls
Thread cancellation: terminating a thread before it has completed.
• Deferred cancellation
• Asynchronous cancellation: immediate
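Deferred cancellation can be sketched with the Pthreads cancellation calls. In this hedged example (function names are hypothetical) the worker is only cancelled when it reaches a cancellation point, here an explicit `pthread_testcancel()`, and the joining thread sees `PTHREAD_CANCELED` as the exit value.

```c
#include <pthread.h>
#include <stddef.h>
#include <time.h>

static void *cancellable_worker(void *arg) {
    (void)arg;
    int old_type;
    struct timespec delay = {0, 1000000};   /* 1 ms */

    /* Deferred is already the POSIX default; set here for clarity. */
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &old_type);

    for (;;) {
        /* ... one unit of work would go here ... */
        pthread_testcancel();               /* explicit cancellation point */
        nanosleep(&delay, NULL);
    }
    return NULL;                            /* not reached */
}

/* Returns 1 if the worker terminated because it was cancelled. */
int run_cancel_demo(void) {
    pthread_t tid;
    void *result = NULL;
    if (pthread_create(&tid, NULL, cancellable_worker, NULL) != 0)
        return -1;
    pthread_cancel(tid);                    /* request, not immediate kill */
    pthread_join(tid, &result);
    return result == PTHREAD_CANCELED;
}
```

With `PTHREAD_CANCEL_ASYNCHRONOUS` the thread could be stopped at any instruction, which is why the deferred type is usually the safer choice when the thread holds locks or other resources.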
![Page 32: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/32.jpg)
Designing a thread library
Multiprocessor support
Virtual processor
RealTime support
Memory Management
Provide a function library rather than a module
Portability
No Kernel mode
![Page 33: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/33.jpg)
Implementation
POSIX Threads
• POSIX standard for threads: IEEE POSIX 1003.1c
• Library made up of a set of types and procedure calls written in C, for UNIX platforms
• It supports:
  a) Thread management
  b) Mutexes
  c) Condition variables
  d) Synchronization between threads using R/W locks and barriers
![Page 34: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/34.jpg)
Implementation
Thread Pool
• Several threads are kept available in a pool
• When a task arrives, it gets assigned to a free thread
• Once a thread completes its task, it returns to the pool and waits for more work.
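The pool described above can be sketched with a mutex, a condition variable and a fixed-size task queue. This is a minimal illustration under stated assumptions, not a production design: `POOL_THREADS`, `QUEUE_CAP`, the `pool_*` functions and the `pool_demo` driver are all hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

#define POOL_THREADS 4
#define QUEUE_CAP 64

typedef void (*task_fn)(void *);
typedef struct { task_fn fn; void *arg; } task_t;

typedef struct {
    pthread_t workers[POOL_THREADS];
    task_t queue[QUEUE_CAP];          /* circular buffer of pending tasks */
    int head, count, shutting_down;
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
} pool_t;

static void *pool_worker(void *p) {
    pool_t *pool = (pool_t *)p;
    for (;;) {
        pthread_mutex_lock(&pool->lock);
        while (pool->count == 0 && !pool->shutting_down)
            pthread_cond_wait(&pool->not_empty, &pool->lock);
        if (pool->count == 0) {       /* shutting down and queue drained */
            pthread_mutex_unlock(&pool->lock);
            return NULL;
        }
        task_t task = pool->queue[pool->head];
        pool->head = (pool->head + 1) % QUEUE_CAP;
        pool->count--;
        pthread_mutex_unlock(&pool->lock);
        task.fn(task.arg);            /* run the task outside the lock */
    }
}

void pool_init(pool_t *pool) {
    pool->head = pool->count = pool->shutting_down = 0;
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->not_empty, NULL);
    for (int i = 0; i < POOL_THREADS; i++)
        pthread_create(&pool->workers[i], NULL, pool_worker, pool);
}

/* Returns 0 on success, -1 if the queue is full. */
int pool_submit(pool_t *pool, task_fn fn, void *arg) {
    int rc = -1;
    pthread_mutex_lock(&pool->lock);
    if (pool->count < QUEUE_CAP) {
        int tail = (pool->head + pool->count) % QUEUE_CAP;
        pool->queue[tail].fn = fn;
        pool->queue[tail].arg = arg;
        pool->count++;
        rc = 0;
        pthread_cond_signal(&pool->not_empty);   /* wake one free worker */
    }
    pthread_mutex_unlock(&pool->lock);
    return rc;
}

/* Lets the workers drain the queue, then joins them. */
void pool_shutdown(pool_t *pool) {
    pthread_mutex_lock(&pool->lock);
    pool->shutting_down = 1;
    pthread_cond_broadcast(&pool->not_empty);
    pthread_mutex_unlock(&pool->lock);
    for (int i = 0; i < POOL_THREADS; i++)
        pthread_join(pool->workers[i], NULL);
    pthread_mutex_destroy(&pool->lock);
    pthread_cond_destroy(&pool->not_empty);
}

/* Demo: ten tasks each add one value to a shared total. */
static int demo_total;
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

static void demo_add(void *arg) {
    pthread_mutex_lock(&demo_lock);
    demo_total += *(int *)arg;
    pthread_mutex_unlock(&demo_lock);
}

int pool_demo(void) {
    static int values[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    pool_t pool;
    demo_total = 0;
    pool_init(&pool);
    for (int i = 0; i < 10; i++)
        pool_submit(&pool, demo_add, &values[i]);
    pool_shutdown(&pool);
    return demo_total;
}
```

Creating the workers once and reusing them avoids paying the thread creation cost for every task, which is the main reason to use a pool.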
![Page 35: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/35.jpg)
Implementation: Pthread library basic operations
pthread_create() - creates and launches a new thread
pthread_exit() - terminates the calling thread
pthread_attr_init() - initializes a thread attributes object with default values
pthread_join() - the calling thread blocks and waits for another thread to finish
pthread_self() - returns the ID of the calling thread
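The five calls listed above fit together as in the following small sketch. The `job_t` struct and the function names are hypothetical and only serve the illustration; the Pthreads calls themselves are used with their standard signatures.

```c
#include <pthread.h>
#include <stddef.h>

typedef struct {
    int input;
    int output;
    pthread_t self;     /* filled in by the worker with pthread_self() */
} job_t;

static void *square(void *arg) {
    job_t *job = (job_t *)arg;
    job->self = pthread_self();             /* ID of the calling thread */
    job->output = job->input * job->input;
    pthread_exit(job);                      /* value handed to pthread_join */
    return NULL;                            /* not reached */
}

/* Runs square(n) in a new thread and returns the result, or -1 on error. */
int run_basic_demo(int n) {
    pthread_attr_t attr;
    pthread_t tid;
    job_t job;
    void *result = NULL;

    job.input = n;
    job.output = 0;
    pthread_attr_init(&attr);               /* default attributes */
    if (pthread_create(&tid, &attr, square, &job) != 0) {
        pthread_attr_destroy(&attr);
        return -1;
    }
    pthread_attr_destroy(&attr);
    pthread_join(tid, &result);             /* block until the thread ends */

    /* The ID seen by the creator matches the one the thread saw itself. */
    if (result != &job || !pthread_equal(tid, job.self))
        return -1;
    return job.output;
}
```

For example, `run_basic_demo(7)` creates one thread, waits for it and returns 49.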
![Page 36: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/36.jpg)
Implementation Example
N x N Matrix Multiplication
![Page 37: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/37.jpg)
Implementation Example
A simple algorithm
    for (int i = 0; i < MATRIX_ELEMENTS; i += MATRIX_LINE) {
        for (int j = 0; j < MATRIX_LINE; ++j) {
            float tmp = 0;
            for (int k = 0; k < MATRIX_LINE; k++) {
                tmp += A[i + k] * B[(MATRIX_LINE * k) + j];
            }
            C[i + j] = tmp;
        }
    }
![Page 38: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/38.jpg)
Implementation Example
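SIMD Approach
    transpose(B);  // make the columns of B contiguous in memory
    for (int i = 0; i < MATRIX_LINE; i++) {
        for (int j = 0; j < MATRIX_LINE; j++) {
            __m128 tmp = _mm_setzero_ps();
            // 4 floats per step: MATRIX_LINE must be a multiple of 4 and
            // _mm_load_ps needs 16-byte aligned addresses
            for (int k = 0; k < MATRIX_LINE; k += 4) {
                tmp = _mm_add_ps(tmp, _mm_mul_ps(_mm_load_ps(&A[MATRIX_LINE * i + k]),
                                                 _mm_load_ps(&B[MATRIX_LINE * j + k])));
            }
            tmp = _mm_hadd_ps(tmp, tmp);  // two horizontal adds (SSE3) sum the 4 lanes
            tmp = _mm_hadd_ps(tmp, tmp);
            _mm_store_ss(&C[MATRIX_LINE * i + j], tmp);
        }
    }
    transpose(B);  // restore B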
![Page 39: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/39.jpg)
Implementation Example
TLP Approach
    struct thread_params {
        pthread_t id;
        float* a;
        float* b;
        float* c;
        int low;
        int high;
        bool flag;
    };
    ………
    int main(int argc, char** argv) {
        int ncores = sysconf(_SC_NPROCESSORS_ONLN);
        int stride = MATRIX_LINE / ncores;  // rows per thread; leftover rows are skipped if this does not divide evenly
        for (int j = 0; j < ncores; ++j) {
            pthread_attr_t attr;
            pthread_attr_init(&attr);
            thread_params* par = new thread_params;
            par->low = j * stride;
            par->high = j * stride + stride;
            par->a = A;
            par->b = B;
            par->c = C;
            par->flag = false;  // not completed yet
            pthread_create(&(par->id), &attr, runner, par);
            // set cpu affinity for thread
            // sched_setaffinity
        }
    }
![Page 40: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/40.jpg)
Implementation Example
TLP Approach
    int main(int argc, char** argv) {
        ….
        int completed = 0;
        while (true) {  // poll the per-thread flags every 100 ms
            if (completed >= ncores) break;
            completed = 0;
            usleep(100000);
            for (int j = 0; j < ncores; ++j) {
                if (p[j]->flag) completed++;
            }
        }
        ….
    }

    void* runner(void* p) {  // pthread_create expects a void* (*)(void*) start routine
        thread_params* params = (thread_params*) p;
        int low = params->low;
        // unpack the other values (high, A, B, C)
        for (int i = low; i < high; i++) {
            for (int j = 0; j < MATRIX_LINE; j++) {
                float tmp = 0;
                for (int k = 0; k < MATRIX_LINE; k++) {
                    tmp += A[MATRIX_LINE * i + k] * B[(MATRIX_LINE * k) + j];
                }
                C[MATRIX_LINE * i + j] = tmp;  // i is a row index here, so scale it by the row length
            }
        }
        params->flag = true;
        pthread_exit(0);
    }
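As an alternative to polling a completion flag with `usleep`, the main thread can simply call `pthread_join` on each worker. The sketch below is a variation on the slides' approach, not the authors' code: the matrix side `N`, the worker count `WORKERS`, the `slice_t` struct and all function names are hypothetical, and the last worker also takes any leftover rows.

```c
#include <pthread.h>
#include <stddef.h>

#define N 8          /* hypothetical matrix side */
#define WORKERS 4    /* hypothetical number of threads */

typedef struct {
    const float *a, *b;
    float *c;
    int low, high;   /* rows [low, high) handled by this thread */
} slice_t;

static void *multiply_rows(void *p) {
    slice_t *s = (slice_t *)p;
    for (int i = s->low; i < s->high; i++) {
        for (int j = 0; j < N; j++) {
            float tmp = 0;
            for (int k = 0; k < N; k++)
                tmp += s->a[N * i + k] * s->b[N * k + j];
            s->c[N * i + j] = tmp;
        }
    }
    return NULL;
}

/* C = A * B, with the rows of C split across WORKERS threads. */
void parallel_matmul(const float *a, const float *b, float *c) {
    pthread_t tid[WORKERS];
    slice_t slice[WORKERS];
    int stride = N / WORKERS;
    for (int t = 0; t < WORKERS; t++) {
        slice[t].a = a;
        slice[t].b = b;
        slice[t].c = c;
        slice[t].low = t * stride;
        slice[t].high = (t == WORKERS - 1) ? N : (t + 1) * stride;
        pthread_create(&tid[t], NULL, multiply_rows, &slice[t]);
    }
    for (int t = 0; t < WORKERS; t++)
        pthread_join(tid[t], NULL);   /* blocks until worker t is done */
}

/* Demo: A is all ones and B[k][j] = j, so every C[i][j] must be N * j. */
int matmul_demo(void) {
    static float A[N * N], B[N * N], C[N * N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[N * i + j] = 1.0f;
            B[N * i + j] = (float)j;
        }
    parallel_matmul(A, B, C);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (C[N * i + j] != (float)(N * j))
                return 0;
    return 1;
}
```

Each thread writes a disjoint set of rows of C, so no lock is needed, and joining removes both the sleep latency and the unsynchronized read of a shared flag.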
![Page 41: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/41.jpg)
Implementation Performance
[Bar chart: Simple, SIMD, TLP and SIMD&TLP approaches, measured on 4 cores and 8 cores; vertical axis from 0 to 9000, units not given in the transcript]
![Page 42: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/42.jpg)
A case study
Using threads in Interactive Systems
• Research by Xerox PARC, Palo Alto
• Analysis of two large interactive systems: Cedar and GVX
• Goals:
  i. Identifying paradigms of thread usage
  ii. Architecture analysis of thread-based environments
  iii. Pointing out the most important properties of an interactive system
![Page 43: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/43.jpg)
A case study: Thread model
Mesa language
Multiple, lightweight, pre-emptively scheduled threads in shared address space, threads may have different priorities
FORK, JOIN, DETACH
Support for condition variables and monitors: critical sections and mutexes
Finer grain for locks: directly on data structures
![Page 44: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/44.jpg)
A case study
Three types of thread
1. Eternal: run forever, waiting on a condition variable
2. Worker: perform some computation
3. Transient: short-lived threads, forked off by long-lived threads
![Page 45: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/45.jpg)
A case study
Dynamic analysis
[Bar chart: Cedar vs. GVX; series: # threads idle, fork rate max, # threads max; vertical axis from 0 to 45]
Switching intervals: (130/sec, 270/sec) vs. (33/sec, 60/sec)
![Page 46: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/46.jpg)
A case study
Paradigms of thread usage
• Defer work: forking to reduce latency
  o Printing documents
• Pumps or slack processes: components of a pipeline
  o Preprocessing user input
  o Requests to the X server
• Sleepers and one-shots: wait for some event and then execute
  o Blinking the cursor
  o Double click
• Deadlock avoiders: avoid violating lock order constraints
  o Window repainting
![Page 47: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/47.jpg)
A case study
Paradigms of thread usage
• Task rejuvenation: recover a service from a bad state, either by forking a new thread or by reporting the error
  o Avoid fork overhead in the input event dispatcher of Cedar
• Serializers: a thread processing a queue
  o A window system with input events from many sources
• Concurrency exploiters: for using multiple processors
• Encapsulated forks: a mix of the previous paradigms, code modularity
![Page 48: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/48.jpg)
A case study
Common Mistakes and Issues
o Timeout hacks to compensate for a missing NOTIFY
o IF instead of WHILE for monitors
o Handling resource consumption
o Slack processes may need the YieldButNotToMe hack
o Using libraries designed for a single thread in a multi-threading environment: Xlib and XI
o Spurious lock
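The "IF instead of WHILE" mistake carries over directly from Mesa monitors to Pthreads condition variables. The sketch below is an analogy written with POSIX calls, not code from the Xerox study, and every name in it is hypothetical. The consumer rechecks its predicate in a `while` loop, because a wakeup does not guarantee that the condition still holds when the thread reacquires the lock.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t box_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t box_nonempty = PTHREAD_COND_INITIALIZER;
static int box_items = 0;

/* Takes one item; sets *out to 1 once it has done so. */
static void *consumer(void *out) {
    pthread_mutex_lock(&box_lock);
    while (box_items == 0)            /* WHILE, not IF: recheck after waking */
        pthread_cond_wait(&box_nonempty, &box_lock);
    box_items--;
    *(int *)out = 1;
    pthread_mutex_unlock(&box_lock);
    return NULL;
}

/* Adds one item and notifies one waiting consumer. */
static void deposit(void) {
    pthread_mutex_lock(&box_lock);
    box_items++;
    pthread_cond_signal(&box_nonempty);   /* the NOTIFY */
    pthread_mutex_unlock(&box_lock);
}

/* Two consumers, two deposits: returns how many consumers got an item. */
int run_monitor_demo(void) {
    pthread_t tid[2];
    int got[2] = {0, 0};
    box_items = 0;
    for (int i = 0; i < 2; i++)
        pthread_create(&tid[i], NULL, consumer, &got[i]);
    deposit();
    deposit();
    for (int i = 0; i < 2; i++)
        pthread_join(tid[i], NULL);
    return box_items == 0 ? got[0] + got[1] : -1;
}
```

Because the predicate is tested under the lock before every wait, a deposit that happens before a consumer starts waiting is never lost, so no timeout hack is needed to recover from a missed notification.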
![Page 49: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/49.jpg)
A case study
Xerox scientists’ conclusions
Interesting difficulties were discovered in both the use and the implementation of multi-threading environments
Starting point for new studies
![Page 50: Threads and multi threading](https://reader035.fdocuments.in/reader035/viewer/2022062219/58f316cb1a28ab604b8b45bf/html5/thumbnails/50.jpg)
Conclusion