Post on 04-Jan-2016
1
Virtual Private Caches
ISCA ’07
Kyle J. Nesbit, James Laudon, James E. Smith
Presenter: Yan Li
2
CMP-based System
Chip-level Multiprocessor (CMP)
  Multiple processor cores are implemented on a single chip
  Multithreading support
Intel Core 2 Duo E6750
3
CMP-based System (2)
Resource sharing
  Cache capacity/bandwidth, main memory, ...
Pros: higher resource utilization
Cons: inter-thread interference
  Unpredictable performance / no QoS!
Many applications running on CMP-based systems require Quality of Service (QoS)
4
Quality of Service
QoS is required by many applications:
  Soft real-time applications: video games
  Fine-grain parallel applications: scheduling & synchronization
  Server consolidation: hosting services
QoS objective in a CMP-based system: provide an upper bound on thread execution time regardless of other thread activity
5
Outline
Introduction
QoS Framework
Virtual Private Cache - VPC Arbiter
Virtual Private Cache - Capacity Manager
Performance Evaluation
Conclusions
6
Overview of VPM
Virtual Private Machine (VPM): a set of allocated hardware resources
  Processors, bandwidth, memory space, ...
Each thread is allocated a share of the hardware resources based on policies
  Applications & system software
Hardware mechanisms enforce the allocated resources
7
[Figure: system hardware and VPM]
8
Objectives of VPM
Performance isolation
  Thread performance is as good as on a real private machine having the same resources
Dynamic distribution of excess resources
  Unallocated resources
  Allocated but unused resources
9
Virtual Private Cache
Microarchitecture-level mechanism
Main components
  VPC Arbiter: tag & data array bandwidth sharing
  VPC Capacity Manager: cache capacity sharing
Advantages
  Performance isolation
  Improved utilization
10
Outline
Introduction
QoS Framework
Virtual Private Cache - VPC Arbiter
Virtual Private Cache - Capacity Manager
Performance Evaluation
Conclusions
11
VPC Arbiter - Implementation (1)
Each data & tag array has an arbiter
Each arbiter has
  A FIFO buffer for each thread
  1 clock register R.clk: determines arrival time
  R.Li & R.Si for thread i: virtual service/start time
12
VPC Arbiter - Implementation (2)
R.Li: virtual service time of a request from thread i
  Computed from L, the latency of the shared cache, and thread i's fraction of the resources
R.Si: virtual start time of the next request of thread i
  Time at which the resource is available for the next request of thread i
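As a rough illustration of the virtual service time above, the sketch below assumes the usual fair-queuing scaling, in which the cache latency L is divided by the thread's share. The formula itself is not legible on the slide, so the division, the function name `virtual_service_time`, and the parameter names are all assumptions made for this example.

```python
from fractions import Fraction

def virtual_service_time(latency_cycles, share):
    """Assumed form of R.Li: cache latency L divided by thread i's share.

    latency_cycles: L, the latency of the shared cache (in cycles).
    share:          thread i's fraction of the resource, in (0, 1].
    """
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    # Exact rational arithmetic avoids rounding when comparing times later.
    return Fraction(latency_cycles) / Fraction(share)
```

Under this assumption, a thread that holds half of the bandwidth sees each request cost twice the real latency in virtual time, as it would on a private cache running at half speed.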
13
Fair Queuing Scheduling
Request Arrival:
Arbiter Calculation of virtual finish time:
Arbiter Selection: select the request with the earliest virtual finish time Fi
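The three steps above can be sketched as a small arbiter model. The slide's formulas did not survive, so the update rules in the docstring are the standard start-time fair-queuing rules and should be read as assumptions; the class name, method names, and the `shares` parameter are hypothetical.

```python
from collections import deque

class FairQueuingArbiter:
    """Sketch of one arbiter with a FIFO per thread.

    Assumed update rules (not taken from the slide):
      arrival at an empty FIFO:  S_i = max(clk, S_i)
      virtual finish time:       F_i = S_i + L_i, with L_i = L / share_i
      after service:             S_i = F_i
    """

    def __init__(self, latency, shares):
        self.clk = 0                                   # R.clk
        self.service = [latency / s for s in shares]   # R.Li per thread
        self.start = [0.0] * len(shares)               # R.Si per thread
        self.fifos = [deque() for _ in shares]

    def tick(self, cycles=1):
        self.clk += cycles

    def arrive(self, thread, request):
        if not self.fifos[thread]:
            # A thread that was idle cannot bank credit from the past.
            self.start[thread] = max(self.clk, self.start[thread])
        self.fifos[thread].append(request)

    def select(self):
        """Grant the head request with the earliest virtual finish time."""
        ready = [i for i, q in enumerate(self.fifos) if q]
        if not ready:
            return None
        # Ties go to the lowest thread index (min keeps the first minimum).
        winner = min(ready, key=lambda i: self.start[i] + self.service[i])
        self.start[winner] += self.service[winner]     # next start = this finish
        return winner, self.fifos[winner].popleft()
```

With a latency of 2 and shares of 0.5 and 0.25, and both threads continuously backlogged, this model grants thread 0 two requests for every one granted to thread 1, which matches the ratio of the shares.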
14
Arbiter Fairness Policy
Excess bandwidth is distributed to the threads that have received the least excess bandwidth in the past
15
Outline
Introduction
QoS Framework
Virtual Private Cache - VPC Arbiter
Virtual Private Cache - Capacity Manager
Performance Evaluation
Conclusions
16
Implementation
Set associative replacement policy
Each thread receives
  The same number of sets as the shared cache
  At least its allocated fraction of the <ways in the shared cache>
Replacement policy
  LRU line owned by a thread i, such that thread i owns more than its allocated ways
  LRU line owned by the thread that is requesting the replacement
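The replacement policy above can be sketched as a victim-selection function for a single cache set. This is only an illustration: it assumes the two rules are tried in the order listed, represents each thread's capacity share as a per-set way quota, and adds a plain-LRU fallback that the slide does not mention. The names `choose_victim`, `set_lines`, and `quota` are hypothetical.

```python
def choose_victim(set_lines, quota, requester):
    """Return the index of the line to evict from one cache set.

    set_lines: list of (owner_thread, tag), ordered from LRU to MRU.
    quota:     dict mapping thread -> ways allocated to it in each set
               (an assumed representation of the capacity share).
    requester: thread whose miss triggers the replacement.
    """
    owned = {}
    for owner, _ in set_lines:
        owned[owner] = owned.get(owner, 0) + 1

    # Rule 1: LRU line of a thread that holds more ways than it was allocated.
    for index, (owner, _) in enumerate(set_lines):
        if owned[owner] > quota.get(owner, 0):
            return index

    # Rule 2: LRU line owned by the requesting thread.
    for index, (owner, _) in enumerate(set_lines):
        if owner == requester:
            return index

    # Fallback (assumption, not on the slide): evict the overall LRU line.
    return 0
```

For example, in a 4-way set where each of two threads is allocated 2 ways, a miss by thread 1 evicts from thread 0 only while thread 0 holds more than 2 lines; otherwise thread 1 replaces its own LRU line.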
17
Outline
Introduction
QoS Framework
Virtual Private Cache - VPC Arbiter
Virtual Private Cache - Capacity Manager
Performance Evaluation
Conclusions
18
Experiment Setup
Two microbenchmarks to stress the performance isolation feature
  Loads: load operations with continuous read hits
  Stores: store operations with continuous write hits
SPEC CPU2000 benchmark suite
QoS performance metrics
  IPC
  Data array utilization
19
Other Arbiters
Read over Write
  Prioritize reads over writes
Read over Write, First Come First Serve
  Prioritize reads over writes
  Prioritize oldest requests
Round Robin
  Interleave requests uniformly and consistently
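For comparison with the VPC arbiter, the baseline policies above can be sketched as selection functions over a list of pending requests. The `Request` tuple, its fields, and both function names are hypothetical, and the round-robin variant shown (rotate over threads, oldest request within a thread) is one plausible reading of the slide rather than the evaluated implementation.

```python
from collections import namedtuple

# Hypothetical request record: issuing thread, read/write flag, arrival cycle.
Request = namedtuple("Request", "thread is_read arrival")

def row_fcfs(pending):
    """Read over write, then first come first served.

    pending must be non-empty. Reads sort before writes (False < True),
    and older requests sort before newer ones within each class.
    """
    return min(pending, key=lambda r: (not r.is_read, r.arrival))

def round_robin(pending, last_thread, num_threads):
    """Grant the next thread after last_thread that has a pending request."""
    for step in range(1, num_threads + 1):
        thread = (last_thread + step) % num_threads
        candidates = [r for r in pending if r.thread == thread]
        if candidates:
            return min(candidates, key=lambda r: r.arrival)
    return None  # nothing pending
```

Neither policy takes a thread's allocated share into account, which is the contrast the evaluation draws with the VPC arbiter.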
20
Microbenchmark
21
SPEC
22
Conclusions
VPC: hardware mechanism of the VPM QoS framework
  VPC arbiter & capacity manager
VPC can achieve global QoS objectives
Issues: local QoS objectives assume performance monotonicity
23
Thank You!
&
Questions?