![Page 1: Computer Performance Evaluation and Benchmarkingusers.ece.utexas.edu/~ljohn/teaching/382m-15/lectures/lec1.pdfEvaluation of Modern and Future Processors Huge Challenge Evaluating one](https://reader034.fdocuments.in/reader034/viewer/2022042307/5ed35bdb080258622969baac/html5/thumbnails/1.jpg)
Computer Performance
Evaluation and Benchmarking
EE 382M
Dr. Lizy Kurian John
Evolution of Single-Chip Microprocessors

| | 1970s | 1980s | 1990s | 2010s |
|---|---|---|---|---|
| Transistor Count | 10K-100K | 100K-1M | 1M-100M | 100M-10B |
| Clock Frequency | 0.2-2 MHz | 2-20 MHz | 20 MHz-1 GHz | 0.1-4 GHz |
| Instructions/Cycle | < 0.1 | 0.1-0.9 | 0.9-2.0 | 1-100 |
| MIPS/MFLOPS | < 0.2 | 0.2-20 | 20-2,000 | 100-10,000 |
Hot Chips 2014 (August 2014)
AMD Kaveri, Hot Chips 2014
AMD KAVERI
HOTCHIPS 2014
Hotchips 2014
Hotchips 2014 - NVIDIA
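Power Density in Microprocessors
[Figure: power density (W/cm²) on a logarithmic axis from 1 to 10,000, plotted against year from 1970 to 2010, for Intel processors: 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® processors, and Core 2. Reference levels marked on the plot: Hot Plate, Nuclear Reactor, Rocket Nozzle, Sun’s Surface. Source: Intel]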
Why Performance Evaluation?
• For better Processor Designs
• For better Code on Existing Designs
• For better Compilers
• For better OS and Runtimes
Design Analysis
Lord Kelvin
“To measure is to know.”
“If you can not measure it, you can not improve it.”
"I often say that when you can measure what you are
speaking about, and express it in numbers, you know
something about it; but when you cannot measure it, when
you cannot express it in numbers, your knowledge is of a
meagre and unsatisfactory kind; it may be the beginning of
knowledge, but you have scarcely in your thoughts advanced
to the state of Science, whatever the matter may be." [PLA,
vol. 1, "Electrical Units of Measurement", 1883-05-03]
Designs evolve based on Analysis
• Good designs are impossible without good analysis
• Workload Analysis
• Processor Analysis
Design Analysis
Performance Evaluation - an integral
part of good computer architecture
Graphic in Patterson & Hennessy’s first edition of the
Computer Organization book – Five Classic Components of a
Computer
Metrics
• Latency: time to completely execute a certain task
• Throughput: amount of work that can be done over
a period of time
• Power: instantaneous power during execution of a
program
• Energy: Total energy consumption during the
execution of the whole program
• Reliability: Failure rate
• CPI, IPC, MIPS, MFLOPS, MTTF, MTBF, AVF,
Transactions/minute, Transactions/hour,
MIPS/watt, Watts, Joules, Joules/instr, etc
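As a rough illustration of how several of these metrics relate to one another, the sketch below derives them from a handful of raw measurements. The function name `summarize_run` and all input values are hypothetical, and average power is assumed constant over the run, so energy is simply power times time.

```python
def summarize_run(instructions, cycles, clock_hz, avg_power_watts):
    """Derive common metrics from raw counts (all inputs are made-up examples)."""
    seconds = cycles / clock_hz               # latency of the whole run
    cpi = cycles / instructions               # cycles per instruction
    ipc = instructions / cycles               # instructions per cycle
    mips = instructions / seconds / 1e6       # millions of instructions per second
    joules = avg_power_watts * seconds        # energy, assuming constant power
    return {
        "latency_s": seconds,
        "CPI": cpi,
        "IPC": ipc,
        "MIPS": mips,
        "MIPS_per_watt": mips / avg_power_watts,
        "energy_J": joules,
        "J_per_instr": joules / instructions,
    }

# Hypothetical run: 2 billion instructions in 1 billion cycles at 2 GHz, 10 W.
metrics = summarize_run(2_000_000_000, 1_000_000_000, 2e9, 10.0)
for name, value in metrics.items():
    print(f"{name}: {value:g}")
```

Note that power and energy can rank two designs differently: a design that draws more power but finishes sooner may still consume less energy overall.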
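“Iron Law” of Processor Performance
Processor Performance = Execution Time

$$\text{Execution Time} = \underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{(code size)}} \times \underbrace{\frac{\text{Cycles}}{\text{Instruction}}}_{\text{(CPI)}} \times \underbrace{\frac{\text{Time}}{\text{Cycle}}}_{\text{(cycle time)}}$$
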
CPI is often used to compare single-core processors when the code size and the cycle time are the same across the cases being compared.
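A small worked example may help make the three factors concrete. The sketch below multiplies them out for two hypothetical design points; the function name and the numbers are illustrative assumptions, not measurements from the lecture.

```python
def execution_time(instructions, cpi, cycle_time_s):
    """Iron Law: time = instructions x (cycles/instruction) x (seconds/cycle)."""
    return instructions * cpi * cycle_time_s

# Two hypothetical designs running the same binary at the same clock,
# so only CPI differs and CPI alone is enough to rank them.
INSTRUCTIONS = 1_000_000_000      # assumed dynamic instruction count
CYCLE_TIME = 0.5e-9               # assumed 2 GHz clock

t_a = execution_time(INSTRUCTIONS, cpi=1.5, cycle_time_s=CYCLE_TIME)
t_b = execution_time(INSTRUCTIONS, cpi=1.2, cycle_time_s=CYCLE_TIME)
print(f"design A: {t_a:.3f} s, design B: {t_b:.3f} s, speedup: {t_a / t_b:.2f}x")
```

If the two designs ran different binaries or at different clock rates, the full product would be needed, since a lower CPI could be offset by more instructions or a longer cycle.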
Challenges in Performance
Evaluation
• Complexity of Processors
• Complexity of Modern Workloads
Simple non-pipelined processors/microcontrollers
Attached is a datasheet from Motorola 68HC11
Non-overlapped operations
Fixed number of cycles
Add up the cycles according to the addressing mode of the instruction
Performance Evaluation of Early Non-pipelined Processors
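The "add up the cycles" approach can be sketched in a few lines. The cycle counts in the table below are placeholders chosen for illustration rather than values copied from the 68HC11 datasheet, and the clock rate is likewise an assumption; real numbers would have to come from the datasheet itself.

```python
# Cycle counts keyed by (mnemonic, addressing mode). These numbers are
# placeholders for illustration; real values must come from the datasheet.
CYCLES = {
    ("LDAA", "immediate"): 2,
    ("LDAA", "direct"): 3,
    ("LDAA", "extended"): 4,
    ("ADDA", "immediate"): 2,
    ("ADDA", "direct"): 3,
    ("STAA", "direct"): 3,
    ("STAA", "extended"): 4,
}

def total_cycles(program):
    """Sum fixed per-instruction cycle counts; no operations overlap."""
    return sum(CYCLES[instr] for instr in program)

program = [
    ("LDAA", "direct"),
    ("ADDA", "immediate"),
    ("STAA", "extended"),
]
cycles = total_cycles(program)
CLOCK_HZ = 2_000_000  # assumed 2 MHz clock, also a placeholder
print(f"{cycles} cycles, {cycles / CLOCK_HZ * 1e6:.1f} microseconds")
```

Because nothing overlaps, the total is exact for a given instruction sequence, which is why no simulator is needed at this level of complexity.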
Early Pipelined Processors
• Use datapath figure to represent pipeline
[Figure: five-stage pipeline datapath. Stages: 1. Instruction Fetch (IFtch), 2. Decode/Register Read (Dcd), 3. Execute (Exec), 4. Memory (Mem), 5. Write Back (WB). Components shown: PC, +4 adder, instruction memory (I$), registers (rs, rt, rd), imm, ALU, data memory (D$).]
Pipelined Execution Representation
• Evaluate by creating a simulator that mimics this process. The handling of instruction dependencies, data forwarding, etc. is modeled in the simulator.
[Figure: pipeline timing diagram. Six instructions each pass through IFtch, Dcd, Exec, Mem, and WB, overlapping in time; the horizontal axis is Time.]
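A toy version of such a simulator might look like the sketch below. It is a minimal model under stated assumptions, not the simulator used in the course: in-order issue, full forwarding, and a single bubble for a load-use dependence. The `Instr` class, the mnemonics, and the register names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

STAGES = ("IFtch", "Dcd", "Exec", "Mem", "WB")

@dataclass
class Instr:
    op: str                          # hypothetical mnemonic, e.g. "lw" or "add"
    dest: Optional[str] = None       # register written, if any
    srcs: Tuple[str, ...] = ()       # registers read

def simulate(program: List[Instr]) -> Tuple[int, List[int]]:
    """Return (total cycles, cycle in which each instruction enters IFtch).

    Assumptions of this toy model: in-order issue, one instruction per cycle,
    full forwarding, and a single bubble when an instruction uses the result
    of the load immediately before it. The bubble is modeled as a one-cycle
    delay of the dependent instruction. Branches and cache misses are ignored.
    """
    starts: List[int] = []
    cycle = 0
    prev: Optional[Instr] = None
    for ins in program:
        cycle += 1
        if prev is not None and prev.op == "lw" and prev.dest in ins.srcs:
            cycle += 1               # load-use hazard: insert one bubble
        starts.append(cycle)
        prev = ins
    total = starts[-1] + len(STAGES) - 1 if starts else 0
    return total, starts

program = [
    Instr("lw", "r1", ("r2",)),
    Instr("add", "r3", ("r1", "r4")),   # depends on the load: one stall
    Instr("sub", "r5", ("r3", "r6")),   # forwarded from add: no stall
]
total, starts = simulate(program)
print(f"cycles={total}, CPI={total / len(program):.2f}, starts={starts}")
```

Even this small model shows why cycle counts can no longer be read off a datasheet: the cost of an instruction now depends on its neighbors.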
Processor Challenges
Superscalar Processors
Simultaneously Multithreaded Processors (SMT)
(Also called Hyperthreading)
Multicore Processors
Each core can be Single-threaded
Each core can be Hyperthreaded
Superscalar Processors
Multicore Processors
• Efficient utilization of big transistor budgets
• Wide superscalars are power hungry
• Have several cores albeit simple
• Operate at a lower energy point
• Run in parallel to recoup lost performance
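The argument in these bullets can be sketched with the usual dynamic-power approximation, P ≈ C·V²·f. The capacitance, voltages, and frequencies below are made-up values, and the sketch assumes a perfectly parallel workload and ignores static power, so it illustrates only the trend.

```python
def dynamic_power(cap_farads, volts, freq_hz):
    """Approximate switching power: P = C * V^2 * f (static power ignored)."""
    return cap_farads * volts ** 2 * freq_hz

C = 1e-9                      # assumed switched capacitance per core

# One core at high voltage and frequency.
single = dynamic_power(C, volts=1.2, freq_hz=4e9)

# Two cores at half the frequency and a lower voltage.
# Throughput matches only if the work splits perfectly across both cores.
dual = 2 * dynamic_power(C, volts=0.9, freq_hz=2e9)

print(f"single core: {single:.2f} W, dual core: {dual:.2f} W")
print(f"power ratio dual/single: {dual / single:.2f}")
```

Under these assumed numbers the two slower cores deliver the same aggregate clock cycles per second for less power, which is the sense in which parallelism recoups the performance lost by running each core at a lower energy point.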
Heterogeneous Architectures
[Figure: static single-ISA heterogeneous multi-core, illustrating inter-program diversity]
Single ISA Heterogeneous: cores with same ISA, but with different microarchitectures
Multiple ISA Heterogeneous: one or more ISAs and accelerators (main ISA, DSP processor ISA, hardware accelerators)
GPGPUs
Workload Challenges
Virtualized Workloads
Multiple non-parallelizable applications may be
running on multiple cores
Parallelizable Applications
Operating Systems and Runtimes –
Dynamic Mapping, Scheduling
Compiler optimizations
Complex Workloads - Heterogeneous Architectures
Multiprogrammed workloads: e.g. SPEC CPU
Multithreaded workloads: e.g. PARSEC
Diversity inside programs
[Figure: static single-ISA heterogeneous multi-core, illustrating inter-program diversity and intra-program diversity]
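“Iron Law” of Processor Performance
Processor Performance = Execution Time

$$\text{Execution Time} = \underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{(code size)}} \times \underbrace{\frac{\text{Cycles}}{\text{Instruction}}}_{\text{(CPI)}} \times \underbrace{\frac{\text{Time}}{\text{Cycle}}}_{\text{(cycle time)}}$$
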
CPI is often used to compare single-core processors when the code size and the cycle time are the same across the cases being compared.
Simulation Methods
Classification of Techniques
• Performance Modeling
  – Simulation
    • Trace-Driven Simulation
    • Execution-Driven Simulation
    • Complete System Simulation
    • Event-Driven Simulation
    • Statistical Simulation
  – Analytical Modeling
    • Probabilistic Models
    • Queuing Models
    • Markov Models
    • Petri Net Models
• Performance Measurement
  – On-Chip Hardware Monitoring
  – Off-Chip Hardware Monitoring
  – Software Monitoring
  – Microcoded Instrumentation
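To give a flavor of one entry in this classification, the sketch below shows trace-driven simulation in miniature: a recorded stream of memory addresses is replayed through a model of a direct-mapped cache. The cache parameters and the address trace are hypothetical, and the function name `simulate_cache` is an illustrative choice rather than part of any real tool.

```python
def simulate_cache(trace, num_lines=4, line_bytes=16):
    """Replay an address trace through a direct-mapped cache; return (hits, misses)."""
    tags = [None] * num_lines          # one stored tag per cache line
    hits = misses = 0
    for addr in trace:
        block = addr // line_bytes     # which memory block the address is in
        index = block % num_lines      # cache line that block maps to
        tag = block // num_lines       # identifies the block within that line
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag          # fill the line on a miss
    return hits, misses

# Hypothetical trace of byte addresses; a real one would come from a tracing tool.
trace = [0, 4, 8, 64, 0, 16, 20, 64]
hits, misses = simulate_cache(trace)
print(f"hits={hits} misses={misses} miss rate={misses / len(trace):.2f}")
```

The same trace can be replayed against many cache configurations, which is the main appeal of the trace-driven approach; its limitation is that the trace is fixed and cannot react to timing changes in the simulated design.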
PRE-SILICON EVALUATION
• Required in early design stages
• Before prototypes can be built
• Pre-silicon
• Very important because many design decisions are made based on this
• Timeliness of products is important in today’s competitive world
POST-SILICON EVALUATION
• To improve current generation compilers
• To improve current generation operating
systems and runtimes
• To improve current generation hardware
• To improve next generation of products
Evaluation of Modern and Future Processors
Huge Challenge
Evaluating one processor is hard enough
Evaluating all the software and hardware layers involved
The design process, the tradeoff evaluation, depends largely
on the performance evaluation. Your company’s future
depends on the performance (P, P, E) estimates you project
for potential designs.