Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research)
* Work done during summer internship at Microsoft Research
Symmetric multicore processor (SMP)
Asymmetric multicore processor (AMP): different types of CPU cores
[Figure: CPU core layouts of an SMP and an AMP; application timeline (T, 2T, 3T) on SMP vs. AMP, labeled "Speedup!"]
• How good are AMPs as compared to SMPs?
• Can datacenter applications save power using AMPs?
[Figure: a datacenter made up of servers (S); each server contains a processor (SMP or AMP) and other components]
$\lambda_{datacenter}$ (throughput)
• Constant work
• Meet latency SLA
$P_{datacenter}^{AMP} < P_{datacenter}^{SMP}$ ?
• Energy Scaling
• Parallel Speedup …
[Figure: sequential application on an area-equivalent SMP and AMP; timeline showing $T_{SMP}$, $T_{AMP}$, $T_{SLA}$ and the slack, with $t_{large}$ and $t_{small}$; legend: sequential execution, parallel execution]
Smaller core = lower power
[Figure: parallel application on SMP and AMP]
Sequential Phase
Small cores: Bottleneck
Run on the fast core
Speedup = Higher throughput
M/M/1 Queuing Model
[Figure: server with a request queue; arrival rate $\lambda$, service rate $\mu$, latency SLA]
Avg. response time: $E[T] = \frac{1}{\mu - \lambda}$
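To make the M/M/1 response-time formula concrete, here is a minimal Python sketch. The function name and the sample rates are illustrative assumptions, not values from the talk.

```python
def mean_response_time(mu: float, lam: float) -> float:
    """Average response time E[T] = 1 / (mu - lam) of an M/M/1 queue."""
    if lam >= mu:
        # The formula only holds for a stable queue (arrival rate < service rate).
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (mu - lam)


# Hypothetical server that completes 100 requests/s on average.
mu = 100.0
for lam in (50.0, 90.0, 99.0):
    ms = mean_response_time(mu, lam) * 1000
    print(f"lambda = {lam:5.1f} req/s -> E[T] = {ms:.1f} ms")
```

The response time grows sharply as the arrival rate approaches the service rate, which is why a latency SLA caps the load a server can accept.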
Amdahl’s Law for Multicores
Parallel Speedup (PS) (refer to paper for ES)
Small core: Area = 1
Big core: Area = r, Perf = perf(r)
n = chip area
r = area of a big core
f = fraction of computation that can be parallelized
[Figure: example layouts: SMP (n=16, r=1), AMP (n=16, r=4), SMP (n=16, r=4)]
$\mu_{AMP}(f, n, r) = \frac{1}{\frac{1-f}{perf(r)} + \frac{f}{n-r}}$
$\mu_{SMP}(f, n, r) = \frac{1}{\frac{1-f}{perf(r)} + \frac{f}{\frac{n}{r} \cdot perf(r)}}$
Ref: Hill and Marty, Amdahl's law in the multicore era (IEEE Computer '08)
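The two service-rate expressions above can be evaluated with a short Python sketch. The slides do not define perf(r); the code assumes perf(r) = sqrt(r), the choice made in the cited Hill and Marty article, and the function names are illustrative.

```python
import math


def perf(r: float) -> float:
    """Performance of a core of area r. ASSUMPTION: sqrt(r), as in Hill and Marty."""
    return math.sqrt(r)


def mu_smp(f: float, n: float, r: float) -> float:
    """SMP: n/r identical cores of area r run both the sequential and parallel phases."""
    return 1.0 / ((1.0 - f) / perf(r) + f / ((n / r) * perf(r)))


def mu_amp(f: float, n: float, r: float) -> float:
    """AMP: one big core of area r runs the sequential phase,
    n - r small cores (area 1 each) run the parallel phase."""
    return 1.0 / ((1.0 - f) / perf(r) + f / (n - r))


# Hypothetical comparison on a chip of area n = 16.
for f in (0.5, 0.9, 0.99):
    print(f"f = {f}: SMP(r=1) = {mu_smp(f, 16, 1):.2f}, AMP(r=4) = {mu_amp(f, 16, 4):.2f}")
```

Both functions return the service rate relative to a single small core, so the results are only meaningful as ratios between configurations.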
$\lambda_{datacenter} = N_{server}^{SMP} \cdot \lambda_{server}^{SMP}$
$\lambda_{datacenter} = N_{server}^{AMP} \cdot \lambda_{server}^{AMP}$
Datacenter capacity = No. of servers * Server throughput
Constant Work
$\lambda_{server}^{peak} = \mu - \frac{1}{T_{SLA}}$
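The peak-throughput expression follows from setting the M/M/1 average response time 1/(µ − λ) equal to the SLA latency and solving for λ. A minimal Python sketch of that step, and of the server count needed to carry a fixed datacenter load, is given below; the function names and numbers are hypothetical.

```python
import math


def peak_server_throughput(mu: float, t_sla: float) -> float:
    """Largest arrival rate for which E[T] = 1 / (mu - lam) stays within t_sla."""
    lam = mu - 1.0 / t_sla
    if lam <= 0:
        raise ValueError("SLA cannot be met: service rate is too low")
    return lam


def servers_needed(lam_datacenter: float, mu: float, t_sla: float) -> int:
    """Servers required so that N * lam_server_peak covers the datacenter load."""
    return math.ceil(lam_datacenter / peak_server_throughput(mu, t_sla))


# Hypothetical example: 9000 req/s of constant work and a 100 ms SLA.
print(servers_needed(9000.0, mu=100.0, t_sla=0.1))
print(servers_needed(9000.0, mu=150.0, t_sla=0.1))
```

Under constant work, a higher per-server service rate translates directly into fewer servers.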
$P_{datacenter}^{SMP} = N_{server}^{SMP} \cdot P_{server}^{SMP}$
$P_{datacenter}^{AMP} = N_{server}^{AMP} \cdot P_{server}^{AMP}$
Datacenter power (P) = No. of servers * Server power
[Figure: server power consumption P(U) vs. CPU utilization (U), with idle power and peak power marked]
Ref: The Case for Energy-Proportional Computing, Barroso & Hölzle, IEEE Computer 2007
[Figure: server load distribution ($W_{load}$): fraction of time vs. CPU utilization (U)]
$P_{server} = \sum_{U} W_{load}(U) \cdot P_{server}(U)$
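The weighted sum for average server power can be sketched in Python as follows. The slides only name idle and peak power; the linear interpolation between them, the function names, and the sample load distribution are assumptions made for illustration.

```python
def server_power_at(u: float, p_idle: float, p_peak: float) -> float:
    """Power at utilization u in [0, 1]. ASSUMPTION: linear between idle and peak."""
    return p_idle + (p_peak - p_idle) * u


def average_server_power(w_load: dict, p_idle: float, p_peak: float) -> float:
    """P_server = sum over U of W_load(U) * P_server(U).

    w_load maps a utilization level to the fraction of time spent at that level.
    """
    if abs(sum(w_load.values()) - 1.0) > 1e-9:
        raise ValueError("load distribution must sum to 1")
    return sum(w * server_power_at(u, p_idle, p_peak) for u, w in w_load.items())


# Hypothetical load distribution: mostly low-to-mid utilization.
w_load = {0.1: 0.2, 0.3: 0.5, 0.6: 0.2, 0.9: 0.1}
print(f"{average_server_power(w_load, p_idle=100.0, p_peak=200.0):.1f} W")
```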
$P_{datacenter}^{AMP} < P_{datacenter}^{SMP}$ ?
Up to 52% power savings (n = 64)
[Figure: power savings of AMP over SMP (0% to 60%) vs. fraction of work that can be parallelized (f, 0 to 1); curves for r = 32, 16, 8, 4]
Up to 14% power savings
[Figure: power savings of AMP over SMP (-25% to 20%) vs. fraction of area sacrificed for small core (5% to 45%); labels: small core bias, uniform bias, large core bias; Application A, Application B, Application C]
• PS looks more promising than ES
• Can we achieve these savings in reality?
High r (realistic r = 3)
High (but not too high!) f
[Figure: power savings of AMP over SMP (0% to 60%) vs. fraction of work that can be parallelized (f, 0 to 1); curves for r = 32, 16, 8, 4]
• Scalability: Amdahl’s law assumes unbounded scalability
• Migration overhead: the model assumes zero migration overhead
• Perfect scheduling: the model assumes an oracle scheduler
Actual savings are going to be lower
• Potential for power savings in datacenters using AMPs
• Parallel Speedup more promising than Energy Scaling
• Practical considerations to realize full benefits
Future work: Extend our analysis to functional asymmetry