Performance
![Page 1: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/1.jpg)
Computer Architecture and Performance
![Page 2: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/2.jpg)
Objectives:
Understand the concepts of computer architecture
Understand how performance is measured
Know the different ways to measure computer performance
![Page 3: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/3.jpg)
Computer Architecture
The task the computer designer faces is a complex one: Determine what attributes are important for a new computer, then design a computer to maximize performance while staying within cost, power, and availability constraints.
![Page 4: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/4.jpg)
Computer Architecture cont’d
In the past, the term computer architecture often referred only to instruction set design.
Other aspects of computer design were called implementation.
![Page 5: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/5.jpg)
Instruction Set Architecture
Instruction set architecture (ISA) refers to the actual programmer-visible instruction set.
Ex. LMC instruction set
The ISA serves as the boundary between the software and hardware.
![Page 6: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/6.jpg)
Implementation
The implementation of a computer has two components: organization and hardware.
The term organization includes the high-level aspects of a computer’s design, such as the memory system, the memory interconnect, and the design of the internal processor or CPU.
Hardware refers to the specifics of a computer, including the detailed logic design and the packaging technology of the computer.
![Page 7: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/7.jpg)
Goal
Computer architects must design a computer to meet functional requirements.
![Page 8: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/8.jpg)
Time discovers truth. (Seneca)
![Page 9: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/9.jpg)
Performance
In general, performance describes how quickly a given system can execute a program or programs.
Systems that execute programs in less time are said to have higher performance.
![Page 10: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/10.jpg)
Response Time/Execution Time
The time between the start and completion of a task
To maximize performance, we want to minimize response time or execution time for some task.
![Page 11: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/11.jpg)
Response Time/Execution Time
Thus we can relate performance and execution time for a computer X:
$$\text{Performance}_X = \frac{1}{\text{Execution Time}_X}$$
![Page 12: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/12.jpg)
Response Time/Execution Time
This means that for two computers X and Y, if the performance of X is greater than the performance of Y, we have
$$\text{Performance}_X > \text{Performance}_Y$$
$$\frac{1}{\text{Execution Time}_X} > \frac{1}{\text{Execution Time}_Y}$$
$$\text{Execution Time}_Y > \text{Execution Time}_X$$
![Page 13: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/13.jpg)
Response Time/Execution Time
In discussing a computer design, we often want to relate the performance of two different computers quantitatively. We will use the phrase “X is n times faster than Y”—or equivalently “X is n times as fast as Y”—to mean
$$\text{Performance}_X = n \times \text{Performance}_Y$$
![Page 14: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/14.jpg)
Response Time/Execution Time
If X is n times faster than Y, then the execution time on Y is n times longer than it is on X:
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution Time}_Y}{\text{Execution Time}_X} = n$$
![Page 15: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/15.jpg)
Relative Performance
Ex. If computer A runs a program in 10 seconds and computer B runs the same program in 15 seconds, how much faster is A than B?
![Page 16: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/16.jpg)
Ex.
$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution Time}_B}{\text{Execution Time}_A} = n$$
Thus, the performance ratio is
$$\frac{15}{10} = 1.5$$
and A is therefore 1.5 times faster than B.
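The same calculation can be sketched in a few lines of Python. The function name `relative_performance` is an illustrative choice, not something defined in the slides; it simply returns the ratio of execution times from the formula above.

```python
def relative_performance(exec_time_x: float, exec_time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    Uses Performance_X / Performance_Y = ExecutionTime_Y / ExecutionTime_X.
    """
    if exec_time_x <= 0 or exec_time_y <= 0:
        raise ValueError("execution times must be positive")
    return exec_time_y / exec_time_x


# Computer A: 10 seconds, computer B: 15 seconds (values from the example).
n = relative_performance(10.0, 15.0)
print(f"A is {n:.1f} times faster than B")  # A is 1.5 times faster than B
```

Both times must be measured on the same program for the ratio to be meaningful.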
![Page 17: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/17.jpg)
Measuring Performance
Time is the measure of computer performance: the computer that performs the same amount of work in the least time is the fastest.
![Page 18: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/18.jpg)
Performance Metrics
Cycles per Instruction (CPI)
Number of clock cycles required to execute each instruction
$$\text{CPI} = \frac{\text{number of clock cycles required to execute the program}}{\text{number of instructions executed in running the program}}$$
Instructions executed Per Cycle (IPC)
For systems that can execute more than one instruction per cycle, the IPC is used instead of CPI
$$\text{IPC} = \frac{\text{number of instructions executed in running the program}}{\text{number of clock cycles required to execute the program}}$$
Note: IPC is the reciprocal of CPI
![Page 19: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/19.jpg)
Ex.
A given program consists of a 100-instruction loop that is executed 42 times. If it takes 16,000 cycles to execute the program on a given system, what are the system’s CPI and IPC values for the program?
Soln:
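The worked solution did not survive in this transcript, so the following is a sketch that applies the CPI and IPC definitions directly. The loop executes 100 × 42 = 4,200 instructions, so CPI = 16,000 / 4,200 ≈ 3.81 and IPC = 4,200 / 16,000 ≈ 0.26. The helper names `cpi` and `ipc` are illustrative.

```python
def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction: total cycles / instructions executed."""
    return cycles / instructions


def ipc(cycles: int, instructions: int) -> float:
    """Instructions per cycle: the reciprocal of CPI."""
    return instructions / cycles


# A 100-instruction loop executed 42 times, taking 16,000 cycles in total.
instructions = 100 * 42
cycles = 16_000

print(f"Instructions executed = {instructions}")       # 4200
print(f"CPI = {cpi(cycles, instructions):.2f}")        # CPI = 3.81
print(f"IPC = {ipc(cycles, instructions):.4f}")        # IPC = 0.2625
```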
![Page 20: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/20.jpg)
Benchmark Suites
A benchmark suite consists of a set of programs that are believed to be typical of the programs that will be run on the system.
Benchmark suites generate estimates of a system’s performance on different types of applications.
Ex. SPEC (Standard Performance Evaluation Corporation) is a non-profit corporation formed to establish, maintain and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers.
SPEC CPU2006, SPEC CPUv6
![Page 21: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/21.jpg)
Speedup
Used to describe how the performance of an architecture changes as different improvements are made to the architecture
It is the ratio of the execution times before and after a change is made
$$\text{Speedup} = \frac{\text{Execution Time}_{\text{before}}}{\text{Execution Time}_{\text{after}}}$$
![Page 22: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/22.jpg)
Ex.
If a program takes 25 seconds to run on one version of an architecture and 15 seconds to run on a new version, the overall speedup = 25 sec/15 sec = 1.67
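As a quick check of this arithmetic, here is a minimal Python sketch of the speedup ratio. The function name `speedup` is chosen here for illustration only.

```python
def speedup(time_before: float, time_after: float) -> float:
    """Speedup = execution time before the change / execution time after."""
    if time_before <= 0 or time_after <= 0:
        raise ValueError("execution times must be positive")
    return time_before / time_after


# 25 seconds on the old version, 15 seconds on the new version.
print(f"Speedup = {speedup(25.0, 15.0):.2f}")  # Speedup = 1.67
```

A value above 1 means the change made the program faster; a value below 1 means it became slower.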
![Page 23: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/23.jpg)
Amdahl’s Law
The most important rule for designing high-performance computer systems is to make the common case fast.
Qualitatively, this means that the impact of a given performance improvement on overall performance depends on both how much the improvement improves performance when it is in use and how often the improvement is in use.
![Page 24: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/24.jpg)
Amdahl’s Law
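$$\text{Execution Time}_{\text{new}} = \text{Execution Time}_{\text{old}} \times \left[ \text{Frac}_{\text{unused}} + \frac{\text{Frac}_{\text{used}}}{\text{Speedup}_{\text{used}}} \right]$$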
![Page 25: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/25.jpg)
where:
$\text{Frac}_{\text{unused}}$ = fraction of time that the improvement is not in use
$\text{Frac}_{\text{used}}$ = fraction of time that the improvement is in use
$\text{Speedup}_{\text{used}}$ = speedup that occurs when the improvement is used
Note that $\text{Frac}_{\text{used}}$ and $\text{Frac}_{\text{unused}}$ are computed using the execution time before the modification is applied.
![Page 26: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/26.jpg)
Amdahl’s Law can be rewritten using the definition of speedup:
$$\text{Speedup} = \frac{\text{Execution Time}_{\text{old}}}{\text{Execution Time}_{\text{new}}} = \frac{1}{\text{Frac}_{\text{unused}} + \dfrac{\text{Frac}_{\text{used}}}{\text{Speedup}_{\text{used}}}}$$
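The intermediate step is to substitute the execution-time form of Amdahl’s Law into the denominator, after which the old execution time cancels:

```latex
\begin{align*}
\text{Speedup}
  &= \frac{\text{Execution Time}_{\text{old}}}{\text{Execution Time}_{\text{new}}} \\
  &= \frac{\text{Execution Time}_{\text{old}}}
          {\text{Execution Time}_{\text{old}} \times
           \left[ \text{Frac}_{\text{unused}} +
                  \dfrac{\text{Frac}_{\text{used}}}{\text{Speedup}_{\text{used}}} \right]} \\
  &= \frac{1}{\text{Frac}_{\text{unused}} +
              \dfrac{\text{Frac}_{\text{used}}}{\text{Speedup}_{\text{used}}}}
\end{align*}
```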
![Page 27: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/27.jpg)
Ex.
Suppose that a given architecture does not have hardware support for multiplication, so multiplications have to be done through repeated addition (this was the case on some early microprocessors). If it takes 200 cycles to perform a multiplication in software, and 4 cycles to perform a multiplication in hardware, what is the overall speedup from hardware support for multiplication if a program spends 10% of its time doing multiplications? What about a program that spends 40% of its time doing multiplications?
![Page 28: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/28.jpg)
Soln:
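The worked solution is missing from this transcript; what follows is a sketch that applies Amdahl’s Law as stated earlier. It assumes the speedup while multiplying is 200 / 4 = 50, that the 10% and 40% figures are fractions of the original (software-multiply) execution time, and that the unused fraction is 1 minus the used fraction. Under those assumptions the overall speedup is 1 / (0.9 + 0.1/50) ≈ 1.11 for the 10% case and 1 / (0.6 + 0.4/50) ≈ 1.64 for the 40% case. The function name `amdahl_speedup` is illustrative.

```python
def amdahl_speedup(frac_used: float, speedup_used: float) -> float:
    """Overall speedup from Amdahl's Law.

    frac_used    -- fraction of the ORIGINAL execution time in which the
                    improvement applies (between 0 and 1)
    speedup_used -- speedup achieved while the improvement is in use
    """
    if not 0.0 <= frac_used <= 1.0:
        raise ValueError("frac_used must be between 0 and 1")
    if speedup_used <= 0:
        raise ValueError("speedup_used must be positive")
    frac_unused = 1.0 - frac_used  # assumption: the two fractions sum to 1
    return 1.0 / (frac_unused + frac_used / speedup_used)


# Software multiply: 200 cycles; hardware multiply: 4 cycles.
speedup_used = 200 / 4  # 50

for frac in (0.10, 0.40):
    overall = amdahl_speedup(frac, speedup_used)
    print(f"{frac:.0%} of time multiplying: overall speedup = {overall:.2f}")
# 10% of time multiplying: overall speedup = 1.11
# 40% of time multiplying: overall speedup = 1.64
```

Even with a 50× faster multiplier, the overall gain is modest when multiplication is a small share of the run time, which is the point of making the common case fast.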
![Page 29: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/29.jpg)
Seatwork:
1. If the 2011 version of a computer executes a program in 200ns and the version of the computer made in the year 2013 executes the same program in 150ns, what is the speedup that the manufacturer had achieved over the two-year period?
![Page 30: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/30.jpg)
2. To achieve a speedup of 3 on a program that originally took 78s to execute, what must the execution time of the program be reduced to?
![Page 31: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/31.jpg)
3. When run on a given system, a program takes 1,000,000 cycles. If the system achieves a CPI of 40, how many instructions were executed in running the program?
![Page 32: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/32.jpg)
4. What is the IPC of a program that executes 35,000 instructions and requires 17,000 cycles to execute?
![Page 33: Performance](https://reader036.fdocuments.in/reader036/viewer/2022062423/55cf931a550346f57b9bbdf7/html5/thumbnails/33.jpg)
5. Suppose a computer spends 90% of its time handling a particular type of computation when running a given program, and its manufacturers make a change that improves its performance on that type of computation by a factor of 10.
a. If the program originally took 100s to execute, what will its execution time be after the change?
b. What is the speedup from the old system to the new system?