Measuring and Discussing Computer System Performance
1
Measuring and Discussing Computer System Performance
or “My computer is faster than your computer”
Reading: 2.4, 3.1-3.5.
Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
2
Match (Best) Performance Metric to Domain
Performance Metrics
1. Network Bandwidth (data/sec)
2. Network Latency (ms)
3. Frame Rate (frames/sec)
4. Throughput (ops/sec)
Domains
Selection  Online WoW  Crysis (FPS)  Torrent Download  Google Server Farm
A          4           3             1                 2
B          4           1             3                 2
C          2           1             3                 4
D          2           3             1                 4
E          None of the above
Jack’s car buying analogy, we care about many….
Prep with explanation of metrics and domains.
3
Measures of “Performance”
• Execution Time
• Frame Rate
• Throughput (operations/time)
• Responsiveness
• Performance / Cost
• Performance / Power
• Performance / Power^2
4
Recall our O(n) discussion
• Much of computer science focuses on execution time – and much of our class will as well.
• Ultimately, we want much of what we do to be fast (response time).
• So is time a reasonable metric?
People sit in front of computers / iPhones / etc.
User time / clock time / CPU time
CPU Time for this class – for now.
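The difference between clock time and CPU time can be seen with a small measurement. This is a minimal sketch, assuming Python's standard `time` module; the helper names `measure`, `mostly_waiting`, and `mostly_computing` are hypothetical and not part of the lecture.

```python
import time

def measure(fn):
    """Return (wall_seconds, cpu_seconds) spent running fn()."""
    wall_start = time.perf_counter()   # wall-clock ("clock") time
    cpu_start = time.process_time()    # user + system CPU time of this process
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu

def mostly_waiting():
    # Sleeping consumes wall-clock time but almost no CPU time.
    time.sleep(0.2)

def mostly_computing():
    # Hypothetical CPU-bound work: wall and CPU time should be close.
    sum(i * i for i in range(200_000))

if __name__ == "__main__":
    for fn in (mostly_waiting, mostly_computing):
        wall, cpu = measure(fn)
        print(f"{fn.__name__}: wall={wall:.3f}s cpu={cpu:.3f}s")
```

A program that waits on I/O or the network can have a long clock time and a short CPU time, which is why the two have to be named separately.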
5
All Together Now
CPU Execution Time = Instruction Count × CPI × Clock Cycle Time
6
All Together Now
CPU Execution Time = Instruction Count × CPI × Clock Cycle Time
seconds = instructions × cycles/instruction × seconds/cycle
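As a rough illustration of the equation and its units, the sketch below multiplies the three terms directly. The function name and the example numbers are assumptions chosen for illustration, not values from the lecture.

```python
def cpu_execution_time(instruction_count, cpi, clock_cycle_time_s):
    """seconds = instructions * (cycles/instruction) * (seconds/cycle)."""
    return instruction_count * cpi * clock_cycle_time_s

# Hypothetical program: 2 billion instructions, CPI of 2,
# on a 1 GHz clock (cycle time of 1 nanosecond).
seconds = cpu_execution_time(2e9, 2.0, 1e-9)
print(f"{seconds:.2f} seconds")
```

Note that the instruction and cycle units cancel, leaving seconds.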
7
• IC = 1 billion, 500 MHz processor, execution time of 3 seconds. What is the CPI for this program?
CPU Execution Time = Instruction Count × CPI × Clock Cycle Time
Selection
CPI
A 3
B 15
C 1.5
D 15*10^9
E None of the above
3 sec = (1*10^9 insts) * CPI * (1 sec / (5*10^8 cycles))
1.5*10^9 cycles = 10^9 insts * CPI
CPI = 1.5
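The same arithmetic can be checked in a few lines. This is only a sketch; `cpi_from_time` is a hypothetical helper that rearranges the performance equation to solve for CPI.

```python
def cpi_from_time(execution_time_s, instruction_count, clock_rate_hz):
    """Rearrange time = IC * CPI * (1 / clock rate) to solve for CPI."""
    total_cycles = execution_time_s * clock_rate_hz  # seconds * cycles/second
    return total_cycles / instruction_count          # cycles per instruction

# Values from the question: 3 seconds, 1 billion instructions, 500 MHz.
print(cpi_from_time(3, 1e9, 500e6))  # 1.5
```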
8
Who Affects Performance?
• There are a number of people involved in processor / programming design
• Each of these elements of the performance equation can be impacted by different designer(s)
• Next slides will be about who can impact what. We’ll do speed voting (1 min ind, 1 min group) then discuss each slide.
CPU Execution Time = Instruction Count (IC) × CPI × Clock Cycle Time (CT)
Individual only
9
Who Affects Performance?
• What can a programmer influence?
CPU Execution Time = Instruction Count (IC) × CPI × Clock Cycle Time (CT)
1 min ind / 1 min group
Selection  Impacts
A          IC
B          IC, CPI
C          IC, CPI, and CT
D          IC and CT
E          None of the above
10
Who Affects Performance?
• What can a compiler influence?
CPU Execution Time = Instruction Count (IC) × CPI × Clock Cycle Time (CT)
Selection  Impacts
A          IC
B          IC, CPI
C          IC, CPI, and CT
D          CPI and CT
E          None of the above
1 min ind / 1 min group
11
Who Affects Performance?
• What can an instruction set architect influence?
CPU Execution Time = Instruction Count (IC) × CPI × Clock Cycle Time (CT)
Selection  Impacts
A          IC
B          IC, CPI
C          IC, CPI, and CT
D          CPI and CT
E          None of the above
1 min ind / 1 min group
12
Who Affects Performance?
• What can a hardware designer influence?
CPU Execution Time = Instruction Count (IC) × CPI × Clock Cycle Time (CT)
Selection  Impacts
A          IC
B          IC, CPI
C          IC, CPI, and CT
D          CPI and CT
E          None of the above
1 min ind / 1 min group
13
Performance Variation
CPU Execution Time = Instruction Count × CPI × Clock Cycle Time
Row  Scenario                                          Number of instructions  CPI   Clock Cycle Time
1    Same machine, different programs                  Same                    Diff  Same
2    Same programs, different machines, same ISA       Same                    Same  Diff
3    Same programs, different machines, different ISA  Same                    Diff  Diff
Selection  Row(s) Correct
A          1
B          1 and 3
C          2
D          2 and 3
E          None of the above
DIFF DIFF DIFF
14
Other Performance Metrics
• Time is useful – but how might we try to measure the “performance” of a machine?
- MIPS
- MFLOPS
15
MIPS
MIPS = Millions of Instructions Per Second
     = Instruction Count / (Execution Time × 10^6)
     = Clock Rate / (CPI × 10^6)
• program-independent
• deceptive
Just crank up the clock rate and have it execute tons of no-ops.
But we need to sell processors, what do we market?
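A small sketch of why MIPS can be deceptive: both forms of the definition give the same number, yet a version of a program padded with cheap instructions can score a higher MIPS rating while taking longer to run. The helper names and the two program versions are hypothetical numbers chosen for illustration.

```python
def mips_from_counts(instruction_count, execution_time_s):
    """MIPS = Instruction Count / (Execution Time * 10^6)."""
    return instruction_count / (execution_time_s * 1e6)

def mips_from_clock(clock_rate_hz, cpi):
    """MIPS = Clock Rate / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)

def execution_time(instruction_count, cpi, clock_rate_hz):
    """Execution time = IC * CPI * CT, with CT = 1 / clock rate."""
    return instruction_count * cpi / clock_rate_hz

CLOCK_HZ = 1e9  # assumed 1 GHz machine

# Hypothetical versions of the same program:
# "lean" has fewer, slower instructions; "padded" adds many cheap no-ops.
versions = {"lean": (1e9, 2.0), "padded": (3e9, 1.0)}

for name, (ic, cpi) in versions.items():
    t = execution_time(ic, cpi, CLOCK_HZ)
    print(f"{name}: time={t:.1f}s  MIPS={mips_from_counts(ic, t):.0f}")
```

Under these assumed numbers the padded version reports twice the MIPS of the lean one but needs more time to finish, so execution time remains the metric to trust.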
16
Trying to market the “performance” of a processor.
• “Speed Demons” vs. “Brainiacs”
If we can’t use something like raw processor speed (CT) and we want to account for CPI, we need to look at performance on benchmarks.
Intel vs. Alpha, Intel wins… but ends up remarketing.
17
Benchmarks - Which Programs?
• peak throughput measures (simple programs)?
18
Benchmarks - Which Programs?
• peak throughput measures (simple programs)?
• synthetic benchmarks (whetstone, dhrystone,...)?
19
Benchmarks - Which Programs?
• peak throughput measures (simple programs)?
• synthetic benchmarks (whetstone, dhrystone,...)?
• Real applications
20
Benchmarks - Which Programs?
• peak throughput measures (simple programs)?
• synthetic benchmarks (whetstone, dhrystone,...)?
• Real applications
• SPEC (best of both worlds, but with problems of their own)
- System Performance Evaluation Cooperative
- Provides a common set of real applications along with strict guidelines for how to run them.
- Provides a relatively unbiased means to compare machines.
21
Danger in Benchmark-Specific Performance Measures
• measures compiler as much as architecture!
22
SPEC Performance on Pentium III and Pentium 4
Focus on clock rate relative to change in INT vs. FP performance. FP on the P3 used the x87 FP stack; the P4, with SSE2, had independent registers.
23
Speedup
• Often want to compare performance of one machine against another
Performance = 1 / Execution Time
Speedup (A over B) = Performance_A / Performance_B
Speedup (A over B) = ET_B / ET_A
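The two forms of the speedup definition can be checked against each other. This is a minimal sketch; the function names and the 10-second and 15-second timings are hypothetical.

```python
def performance(execution_time_s):
    """Performance = 1 / Execution Time."""
    return 1.0 / execution_time_s

def speedup(et_a, et_b):
    """Speedup of A over B = ET_B / ET_A."""
    return et_b / et_a

# Hypothetical timings: machine A takes 10 s, machine B takes 15 s.
et_a, et_b = 10.0, 15.0
print(speedup(et_a, et_b))                     # ratio of execution times
print(performance(et_a) / performance(et_b))   # ratio of performances
```

Both lines print approximately the same value (they may differ in the last floating-point digit), since dividing the performances inverts the execution times.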
24
Amdahl’s Law
Execution time after improvement = (Execution Time Affected / Amount of Improvement) + Execution Time Unaffected
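Amdahl's Law translates directly into code. The sketch below uses a hypothetical program (100 seconds total, 80 of them spent in an operation that is made 5 times faster); the function name and the numbers are illustrative assumptions.

```python
def time_after_improvement(time_affected_s, improvement, time_unaffected_s):
    """Amdahl's Law: affected time shrinks by the improvement factor,
    unaffected time is carried over unchanged."""
    return time_affected_s / improvement + time_unaffected_s

# Hypothetical: 80 s of a 100 s run is sped up by a factor of 5.
before = 100.0
after = time_after_improvement(80.0, 5.0, 20.0)
print(f"after = {after:.1f} s, overall speedup = {before / after:.2f}")
```

Even with a 5x improvement to most of the program, the overall speedup stays well below 5 because the unaffected 20 seconds remain.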
25
Amdahl’s Law and Parallelism
• Our program is 90% parallelizable (segment of code executable in parallel on multiple cores) and runs in 100 seconds with a single core. What is the execution time if you use 4 cores (assume no overhead for parallelization)?
Selection
Execution Time
A 25 seconds
B 32.5 seconds
C 50 seconds
D 92.5 seconds
E None of the above
ISOMORPHIC
Execution time after improvement = (Execution Time Affected / Amount of Improvement) + Execution Time Unaffected
26
Amdahl’s Law and Parallelism
• Our program is 90% parallelizable (segment of code executable in parallel on multiple cores) and runs in 100 seconds with a single core. What is the execution time if you use 2 cores (assume no overhead for parallelization)?
Selection
Execution Time
A 55 seconds
B 50 seconds
C 100 seconds
D 95 seconds
E None of the above
ISOMORPHIC
Execution time after improvement = (Execution Time Affected / Amount of Improvement) + Execution Time Unaffected
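The parallelism questions above can be worked through by treating the core count as the amount of improvement applied to the parallelizable part. This is a sketch under the slides' stated assumption of no parallelization overhead; `parallel_time` is a hypothetical helper name.

```python
def parallel_time(total_time_s, parallel_fraction, cores):
    """Amdahl's Law with the core count as the improvement factor.
    Assumes perfect scaling of the parallel part and no overhead."""
    affected = total_time_s * parallel_fraction   # parallelizable time
    unaffected = total_time_s - affected          # serial time
    return affected / cores + unaffected

# The program from the questions: 100 s on one core, 90% parallelizable.
for cores in (1, 2, 4, 8, 16):
    t = parallel_time(100.0, 0.9, cores)
    print(f"{cores:>2} cores: {t:6.2f} s  (speedup {100.0 / t:.2f})")
```

Each doubling of cores saves less time than the previous one, and the execution time can never drop below the 10 seconds of serial work.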
27
Amdahl’s Law
• So what does Amdahl’s Law mean at a high level?
Selection
“BEST” message from Amdahl’s Law
A Parallel programming is critical for improving performance
B Improving serial code execution is ultimately the most important goal.
C Performance is strictly tied to the ability to determine which percentage of code is parallelizable.
D The impact of a performance improvement is limited by the percent of execution time affected by the improvement
E None of the above
28
Point out Phenom II x4 and x2 get same performance – but one has 4 cores the other 2. What does this tell us? (Note – with some slower speeds of the Phenom this isn’t the case – 4 cores help.)
29
Speedup vs. Sizeup
• Speedup runs into problems for parallelization because of diminishing returns and dominance of serial execution.
• What if time were a constant?
- Human perception (graphics)
- Earthquake prediction
- Weather prediction
30
Key Points
• Be careful how you specify “performance”
• Execution time = IC * CPI * CT
• Use real applications, if possible
• Use standards, if possible
• Make the common case fast