Fast switching of threads between cores - Advanced Operating Systems
![Page 1: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/1.jpg)
Fast Switching of Threads Between Cores
Richard Strong & Dean Tullsen (University of California, San Diego)
Jayaram Mudigonda, Jeffrey C. Mogul & Nathan Binkert (HP Labs)
Ruhaim Izmeth | MS14901218
Nipuna Pannala | MS14902208
![Page 2: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/2.jpg)
Introduction
● We are now in the multicore era.
● Multicore CPUs enable inter-core communication at a cost that is orders of magnitude lower than in traditional multiprocessors. [This reduces the time the hardware needs to move a migrating thread's working set.]
● But the software cost of moving a thread remains high.
![Page 3: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/3.jpg)
Asymmetric Multicore Processor
● Core-to-core performance asymmetry appears to be a very useful way to improve energy and area efficiency.
● It comes at relatively little performance cost but gives greater throughput per watt.
● Asymmetric multicore processors increase the need for frequent, efficient migration of threads between cores.
![Page 4: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/4.jpg)
Fast Switching of Threads between Cores
● To get good performance when switching threads between cores:
○ The OS scheduler needs to migrate a thread from a slow core to a fast or ideal core.
○ It is also necessary to balance the load between cores (in a symmetric or asymmetric system).
○ All thread execution time segments should be relatively short.
![Page 5: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/5.jpg)
Simple Cores…
● Simple cores are normally a better match for memory-bound application code.
○ Operating systems and OS-like code are typical memory-bound workloads.
![Page 6: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/6.jpg)
Thread Migration Techniques
● Migration Mechanism 1: Constantinou
○ This work considered a variety of costs associated with thread migration, but its primary focus was on warming up threads (caches and branch predictors).
○ It does not address the software cost of migrating threads between cores.
![Page 7: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/7.jpg)
Thread Migration Techniques
● Migration Mechanism 2: Choi
○ This work covers the specific case of migrating the branch predictor state when a thread switches cores.
○ It does not address the software overhead issues.
![Page 8: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/8.jpg)
Thread Migration Techniques
Shared Thread Multiprocessor: Brown & Tullsen
● Hardware manages the thread movements.
● Thread state is represented in hardware and is shared among all the cores on a chip.
● Therefore hardware can move threads between cores without direct OS involvement.
![Page 9: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/9.jpg)
Software Approaches to Core Switching
• Is core B idle?
• Is there any thread to run on core A after T switches to B?
• Can we ensure that T is the most appropriate thread to run on B?
Transfer the architectural state of the thread from A to B.
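The three questions above can be read as guard conditions that are checked before the architectural state is transferred. A minimal Python sketch, assuming a hypothetical `Core` record and a caller-supplied `is_best_fit` predicate (neither appears in the slides):

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Core:
    """Hypothetical model of a core: the running thread plus its run queue."""
    name: str
    running: Optional[str] = None
    run_queue: List[str] = field(default_factory=list)

    def is_idle(self) -> bool:
        return self.running is None and not self.run_queue


def try_core_switch(a: Core, b: Core,
                    is_best_fit: Callable[[str, Core], bool]) -> bool:
    """Move the thread running on `a` to `b` if the three checks pass."""
    t = a.running
    if t is None:
        return False
    if not b.is_idle():          # Is core B idle?
        return False
    if not is_best_fit(t, b):    # Is T the most appropriate thread for B?
        return False
    # Is there a thread to run on A once T leaves? If not, A becomes idle.
    a.running = a.run_queue.pop(0) if a.run_queue else None
    # The "transfer" is only a name here; a real kernel moves registers, PC, etc.
    b.running = t
    return True
```

In this model the switch is refused as soon as any check fails, so a busy target core never has its current thread displaced.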
![Page 10: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/10.jpg)
Approaches used in the research
● V1: Linux’s thread-migration mechanism
● V2: Modified scheduler
● V3: Scheduler fast-paths
● V4: Addressing IPI costs
● V5: Cross-core wakeup from quiesce
![Page 11: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/11.jpg)
V1: Linux Thread Migration Mechanism
● Normally used for relatively long-term load balancing across the cores.
● Linux's thread-migration mechanism is the state of the art for core switching.
● One thread is available to initiate the migration.
![Page 12: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/12.jpg)
V1: Linux Thread Migration Mechanism
● When a task wants to migrate, it puts itself on the per-core migration queue.
● If the target core is idle, the thread wakes up from the per-core migration queue and moves to the run queue of the target core.
● After getting approval from the target queue, the thread executes on the target core.
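As a rough illustration of these steps, the following Python sketch models the per-core migration queue and run queue as plain deques and counts context switches. The class and function names are hypothetical, and the accounting (one switch away from the requesting thread so the migration queue can be serviced, one to start the thread on the target) is an assumption made for illustration, not a description of the Linux code.

```python
from collections import deque


class Core:
    """Hypothetical core with a run queue and a migration queue."""

    def __init__(self, name):
        self.name = name
        self.running = None
        self.run_queue = deque()
        self.migration_queue = deque()

    def is_idle(self):
        return self.running is None and not self.run_queue


def migrate_v1(thread, src, dst):
    """Move `thread` from `src` to `dst`; return the context switches counted."""
    switches = 0
    # 1. The task puts itself on the source core's migration queue.
    src.migration_queue.append((thread, dst))
    src.running = None
    switches += 1  # assumed: switch away so the queue can be serviced
    # 2. The request is taken off the queue; if the target is idle,
    #    the task moves to the target core's run queue.
    task, target = src.migration_queue.popleft()
    if not target.is_idle():
        src.run_queue.append(task)  # assumed fallback: stay on the source
        return switches
    target.run_queue.append(task)
    # 3. The target core picks the task from its run queue and runs it.
    target.running = target.run_queue.popleft()
    switches += 1
    return switches
```

Under these assumptions a successful migration costs two context switches, one of which exists only to service the migration queue.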
![Page 13: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/13.jpg)
V1: Linux Thread Migration Mechanism
Cons...
● This migration approach involves an “extra” context switch between the initiating thread and the migrating thread.
![Page 14: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/14.jpg)
Linux Thread Migration Mechanism: Increasing Efficiency
● To remove the extra context switch:
○ Threads can make migration decisions themselves
○ Centralize the thread status
○ Increase the number of per-core queues
○ Create cross-core signals
![Page 15: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/15.jpg)
V2: Modified scheduler
[Figure: V2 core switch between Core 0 and Core 1, shown as numbered steps 1-7. Labels: N, T; a Run Queue on each core; Alternative Queue (AQ); SwitchCore(); schedule(); interrupt; Control Block: T, Core: 1.]
● Removes the extra context switch described in V1.
● Thread migration is initiated by the process itself.
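The elements named on this slide (per-core run queues, the Alternative Queue, SwitchCore(), and the interrupt that triggers schedule() on the target) can be sketched as follows. This is a simplified Python model, not the kernel code: the class layout, the `pending_interrupt` flag, and the rule that `schedule()` looks at the AQ before the run queue are all assumptions.

```python
from collections import deque


class Core:
    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.running = None
        self.run_queue = deque()
        self.alt_queue = deque()        # Alternative Queue (AQ)
        self.pending_interrupt = False  # stands in for the cross-core signal

    def schedule(self):
        """Pick the next thread; threads arriving via the AQ go first (assumed)."""
        self.pending_interrupt = False
        if self.alt_queue:
            self.running = self.alt_queue.popleft()
        elif self.run_queue:
            self.running = self.run_queue.popleft()
        else:
            self.running = None
        return self.running


def switch_core(src, dst):
    """Called by the running thread itself; no helper thread is involved."""
    thread = src.running
    dst.alt_queue.append(thread)  # hand the thread to the target's AQ
    dst.pending_interrupt = True  # signal the target core
    src.schedule()                # the source moves on to its next thread
    if dst.pending_interrupt:     # the target handles the interrupt
        dst.schedule()
    return thread
```

Because the migrating thread places itself on the target's queue, the model needs no intermediate thread on the source core, which is the point of V2.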
![Page 16: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/16.jpg)
V3: Scheduler fast-paths
● The original modified schedule()
● A fast schedule source version (FSS), called to initiate a core switch
● A fast schedule target version (FST), called at the target core in response to the cross-core signal
FSS and FST omit a number of housekeeping functions normally done in schedule() (e.g. priority calculation).
FSS only gives a hint to FST, so no locking takes place.
FST checks the AQ; FSS does not.
![Page 17: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/17.jpg)
V3: Scheduler fast-paths
![Page 18: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/18.jpg)
V4: Addressing inter-processor interrupt (IPI) costs
Inter-processor interrupts are sent to ‘wake up’ polling or paused processors.
The modified scheduler wakes up the target core if it is idle.
The “IPI sending code”, which sends interrupts to all members of a specified set, was modified to be more efficient.
schedule() is invoked on the target core by the interrupt.
![Page 19: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/19.jpg)
Modified System Calls
Long-running system calls were modified to initiate CoreSwitch().
Modified system calls: open, stat, read, write, readv, writev, select, poll, fsync, fdatasync, recvfrom, sendto and sendfile.
4096 bytes
![Page 20: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/20.jpg)
Simulation Environment
The M5 simulator was used to generate detailed timelines showing when interesting events such as procedure calls, cache misses, and long-latency instructions occur.
The x86 models in M5 are not debugged.
Complex core: Alpha EV6 (21264), 64 KB L1
Simple core: EV4-based (21064), 8 KB L1
Simulated with a shared 3.5 MB L2
Main-memory access time of 25 ns.
![Page 21: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/21.jpg)
Simulation Environment - Configuration naming scheme
sim_XXX - the number of ‘X’ characters denotes the number of processors
e.g. sim_c - single processor
     sim_sC - dual processor
Prefix letters:
Core type   750 MHz   3 GHz
Complex     c         C
Simple      s         S
Tests were run on the Linux 2.6.18 kernel.
Only one trial was run per experiment, as the simulator is deterministic.
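The naming scheme can be decoded mechanically. A small Python sketch, reading the table as lowercase = 750 MHz and uppercase = 3 GHz, and assuming names always start with `sim_` and use only those four letters (the function name is illustrative):

```python
CORE_TABLE = {
    "c": ("complex", 750),   # lowercase letter: 750 MHz
    "C": ("complex", 3000),  # uppercase letter: 3 GHz
    "s": ("simple", 750),
    "S": ("simple", 3000),
}


def parse_config(name):
    """Return a list of (core_type, clock_mhz), one entry per processor."""
    prefix = "sim_"
    if not name.startswith(prefix):
        raise ValueError(f"expected a name starting with {prefix!r}: {name!r}")
    suffix = name[len(prefix):]
    if not suffix:
        raise ValueError("configuration names need at least one core letter")
    try:
        return [CORE_TABLE[ch] for ch in suffix]
    except KeyError as exc:
        raise ValueError(f"unknown core letter {exc.args[0]!r}") from None


print(parse_config("sim_c"))   # [('complex', 750)]
print(parse_config("sim_sC"))  # [('simple', 750), ('complex', 3000)]
```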
![Page 22: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/22.jpg)
Microbenchmark results
gettid() was modified to call coreswitch() and was run N = 1,000,000 times in a tight loop.
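The measurement pattern is a tight loop around a cheap system call, with the total time divided by N. The Python sketch below shows only that pattern in user space: it calls the unmodified `os.getpid()` as a stand-in, so it measures ordinary call overhead rather than core-switch cost, and the helper name is hypothetical.

```python
import os
import time


def mean_call_ns(fn, n):
    """Call fn() n times in a tight loop and return the mean cost in ns."""
    start = time.perf_counter_ns()
    for _ in range(n):
        fn()
    elapsed = time.perf_counter_ns() - start
    return elapsed / n


if __name__ == "__main__":
    n = 1_000_000
    print(f"{mean_call_ns(os.getpid, n):.1f} ns per call")
```

Averaging over a large N amortizes the timer overhead, which is why a single timed call would not be a useful measurement.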
![Page 23: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/23.jpg)
Cross-core wakeup from quiesce
● Idle-loop polling is inefficient.
● Initiating a cross-CPU interrupt is slow, as a powered-down CPU needs to be awakened.
● The kernel should decide dynamically between spinning (polling) and powering down, based on recent history.
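One way to make that decision from recent history is to keep a running average of recent idle periods and poll only when the next wakeup is expected soon. The Python sketch below is an illustration under stated assumptions, not the mechanism from the paper: the exponentially weighted average, the smoothing factor, and the threshold value are all made up for the example.

```python
class IdlePolicy:
    """Choose between polling and powering down from recent idle durations."""

    def __init__(self, threshold_us=50.0, alpha=0.5):
        self.threshold_us = threshold_us  # hypothetical break-even idle time
        self.alpha = alpha                # weight given to the newest sample
        self.avg_idle_us = None

    def record_idle(self, duration_us):
        """Fold one observed idle period into the running average."""
        if self.avg_idle_us is None:
            self.avg_idle_us = duration_us
        else:
            self.avg_idle_us = (self.alpha * duration_us
                                + (1 - self.alpha) * self.avg_idle_us)

    def decide(self):
        """Return 'poll' for short expected idles, otherwise 'quiesce'."""
        if self.avg_idle_us is None or self.avg_idle_us >= self.threshold_us:
            return "quiesce"
        return "poll"
```

With no history the sketch defaults to powering down; a core that has recently seen short idle gaps keeps polling so that a cross-core wakeup does not pay the cost of waking a quiesced CPU.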
![Page 24: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/24.jpg)
Macrobenchmark results - Web Benchmark
![Page 25: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/25.jpg)
Macrobenchmark results - Database Benchmark
Uses the “TPC-B-like” example from the Berkeley DB distribution.
Core switches are done only on fdatasync().
Disk I/O delays were eliminated by using a RAM disk on the real hardware, and by setting the access time to zero in M5’s disk simulator.
![Page 26: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/26.jpg)
Future Work
● Energy measurement/savings benchmarks for the above tests
● Determining the best core to switch to and the best time to switch
● An optimal mechanism for deciding whether to poll or power down a processor
![Page 27: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/27.jpg)
Summary
● The cost of core switching matters more when using asymmetric multicores.
● Core switching to slower OS cores on frequent, expensive system calls sometimes reduces performance.
○ But it also makes it possible to power down the complex application cores.
![Page 28: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/28.jpg)
References
● J. Aas. Understanding the Linux 2.6.8.1 CPU Scheduler. http://josh.trancesoftware.com/linux/, Feb. 2005.
● S. Balakrishnan, R. Rajwar, M. Upton, and K. Lai. The Impact of Performance Asymmetry in Emerging Multicore Architectures. In Proc. ISCA, pages 506–517, 2005.
● M. Becchi and P. Crowley. Dynamic Thread Assignment on Heterogeneous Multiprocessor Architectures. J. Instruction Level Parallelism, pages 1–26, June 2008.
● N. L. Binkert, R. G. Dreslinski, L. R. Hsu, K. T. Lim, A. G. Saidi, and S. K. Reinhardt. The M5 Simulator: Modeling Networked Systems. IEEE Micro, 26(4):52–60, 2006.
● D. Brooks, V. Tiwari, and M. Martonosi. Wattch: A Framework for Architectural-Level Power Analysis and Optimizations. In Proc. ISCA, pages 83–94, Jun. 2000.
![Page 29: Fast switching of threads between cores - Advanced Operating Systems](https://reader033.fdocuments.in/reader033/viewer/2022052908/5596d9f71a28abd56a8b4574/html5/thumbnails/29.jpg)
Q / A
Thank You