Computer Organization and Design Performance Montek Singh Mon, Aug 26, 2013 Lecture 2 1.
-
Upload
susan-hall -
Category
Documents
-
view
217 -
download
3
Transcript of Computer Organization and Design Performance Montek Singh Mon, Aug 26, 2013 Lecture 2 1.
Computer Organization and Computer Organization and DesignDesign
PerformancePerformance
Montek SinghMontek Singh
Mon, Aug 26, 2013Mon, Aug 26, 2013
Lecture 2Lecture 2
1
TopicsTopics Defining “Performance”Defining “Performance” Performance MeasuresPerformance Measures How to improve performanceHow to improve performance Performance pitfallsPerformance pitfalls ExamplesExamples
Reading: Ch 1.4Reading: Ch 1.4
2
Why Study Performance?Why Study Performance? Helps us make intelligent design choicesHelps us make intelligent design choices
See through the marketing hypeSee through the marketing hype
Key to understanding underlying computer Key to understanding underlying computer organizationorganization Why is some hardware faster than others for different Why is some hardware faster than others for different
programs?programs? What factors of system performance are hardware What factors of system performance are hardware
related?related?e.g., Do we need a new machine …e.g., Do we need a new machine …… … or a new operating system?or a new operating system?
How does a machineHow does a machine’’s instruction set affect its s instruction set affect its performance?performance?
3
What is Performance?What is Performance? Loosely:Loosely:
How fast can a computer complete a taskHow fast can a computer complete a task
Examples of “tasks”:Examples of “tasks”: Short tasks:Short tasks:
Crunch a bunch of numbers (say calculate mean)Crunch a bunch of numbers (say calculate mean)Display a PDF documentDisplay a PDF documentRespond to a game console button pressRespond to a game console button press
Longer ones:Longer ones:Search for a document on hard driveSearch for a document on hard driveRip a CD track into an mp3 fileRip a CD track into an mp3 fileApply a Photoshop filterApply a Photoshop filterTranscode/recode a videoTranscode/recode a video
4
Which Which airplaneairplane is is ““bestbest””??Airplane
Passenger capacity
Cruising range (miles)
Cruising speed (mph)
Passenger throughput (passengers x mph)
5
Which Which airplaneairplane is is ““bestbest””??
How much faster is the Concorde than the 747?How much faster is the Concorde than the 747? 2.2 X 2.2 X (“X” means “factor of”)(“X” means “factor of”)
How much larger is the 747How much larger is the 747’’s capacity than the s capacity than the Concorde?Concorde? 3.6 X3.6 X
It is roughly 4000 miles from Raleigh to Paris. What is the It is roughly 4000 miles from Raleigh to Paris. What is the throughputthroughput of the 747 in passengers/hr? The Concorde? of the 747 in passengers/hr? The Concorde?
What is the What is the latencylatency (trip time) of the 747? The Concorde? (trip time) of the 747? The Concorde? 6.56 hours, 2.96 hours6.56 hours, 2.96 hours
AirplanePassenger capacity
Cruising range (miles)
Cruising speed (mph)
Passenger throughput (passengers x mph)
6
Performance MetricsPerformance Metrics Latency: Time from input to corresponding Latency: Time from input to corresponding
outputoutput How long does it take for my program to run?How long does it take for my program to run? How long must I wait after typing return for the How long must I wait after typing return for the
result?result? Other examples?Other examples?
Throughput: Results produced per unit timeThroughput: Results produced per unit time How many results can be processed per second?How many results can be processed per second? What is the average execution rate of my program?What is the average execution rate of my program? How much work is getting done each second?How much work is getting done each second? Other examples?Other examples?
7
Design TradeoffsDesign Tradeoffs Performance is rarely the sole factorPerformance is rarely the sole factor
Cost is important tooCost is important too Energy/power consumption is often criticalEnergy/power consumption is often critical
Frequently used compound metricsFrequently used compound metrics Performance/Cost (throughput/$)Performance/Cost (throughput/$) Performance/Power (throughput/watt)Performance/Power (throughput/watt) Work/Energy (total work done per joule)Work/Energy (total work done per joule)
for battery-powered devicesfor battery-powered devices
8
Execution TimeExecution Time Elapsed Time/Wall Clock TimeElapsed Time/Wall Clock Time
counts everything (disk and memory accesses, I/O, etc.)counts everything (disk and memory accesses, I/O, etc.) includes the impact of other programsincludes the impact of other programs a useful number, but often not good for comparison a useful number, but often not good for comparison
purposespurposes
CPU timeCPU time does not include I/O or time spent running other does not include I/O or time spent running other
programsprograms can be broken up into system time, and can be broken up into system time, and user timeuser time
Our focus: user CPU time Our focus: user CPU time time spent executing actual instructions of time spent executing actual instructions of ““ourour””
programprogram9
PerformancePerformance For some program running on machine X,For some program running on machine X,
PerformancePerformanceXX = Program Executions / Time = Program Executions / TimeXX (executions/sec)(executions/sec)
= 1 / Execution Time= 1 / Execution TimeXX
Relative PerformanceRelative Performance "X is "X is nn times faster than Y times faster than Y”” PerformancePerformanceXX / Performance / PerformanceYY = = nn
Example:Example: Machine A runs a program in 20 secondsMachine A runs a program in 20 seconds Machine B runs the same program in 25 secondsMachine B runs the same program in 25 seconds
By how much is A faster than B?By how much is A faster than B?By how much is B slower than A?By how much is B slower than A?PerformanceA = 1/20 PerformanceB = 1/25
Machine A is (1/20)/(1/25) = 1.25 times faster than Machine B10
Performance: Pitfalls of using %Performance: Pitfalls of using % Same Example:Same Example:
Machine A runs a program in 20 sec; B takes 25 secMachine A runs a program in 20 sec; B takes 25 sec By how much is A faster than B?By how much is A faster than B?
Is it (25-20)/25 = 20% faster?Is it (25-20)/25 = 20% faster? Is it (25-20)/20 = 25% faster?Is it (25-20)/20 = 25% faster?Correct answer is: A is (PerfCorrect answer is: A is (PerfAA-Perf-PerfBB)/Per)/PerBB faster faster
– (0.05 - 0.04) / (0.04) = 25% faster(0.05 - 0.04) / (0.04) = 25% faster
By how much is B slower than A?By how much is B slower than A?Correct answer is: B is (PerfCorrect answer is: B is (PerfBB-Perf-PerfAA)/Per)/PerAA faster/slower faster/slower
– (0.04 - 0.05) / (0.05) = -20% faster = 20% slower(0.04 - 0.05) / (0.05) = -20% faster = 20% slower
Confusing: A is 25% faster than B; B is 20% slowerConfusing: A is 25% faster than B; B is 20% slower Better: A is 1.25 times faster than B; B is 1/1.25 as Better: A is 1.25 times faster than B; B is 1/1.25 as
fastfast
Also: %ages are only good up to 100%Also: %ages are only good up to 100% Never say A is 13000% faster than BNever say A is 13000% faster than B 11
CPU ClockingCPU Clocking Operation of digital hardware governed by a Operation of digital hardware governed by a
constant-rate clockconstant-rate clock
Clock period: duration of a clock cycleClock period: duration of a clock cycle e.g., 250ps = 0.25ns = 250×10e.g., 250ps = 0.25ns = 250×10–12–12ss
Clock frequency (rate): cycles per secondClock frequency (rate): cycles per second e.g., 1/(0.25ns) = 4GHz = 4000MHz = 4.0×10e.g., 1/(0.25ns) = 4GHz = 4000MHz = 4.0×1099HzHz
Clock (cycles)
Data transferand computation
Update state
Clock period
12
Program Clock CyclesProgram Clock Cycles Instead of reporting execution time in seconds, Instead of reporting execution time in seconds,
we often use clock cycle countswe often use clock cycle counts Why? A newer generation of the same processor…Why? A newer generation of the same processor…
Often has the same cycle counts for the same programOften has the same cycle counts for the same programBut often has different clock speed (ex, 1 GHz changes to But often has different clock speed (ex, 1 GHz changes to
1.5 GHz)1.5 GHz)
Clock “ticks” indicate when machine state Clock “ticks” indicate when machine state changes changes an abstraction: allows time to be discrete instead of an abstraction: allows time to be discrete instead of
continuouscontinuous
Simple relation:Simple relation:
time
Rate Clock
Cycles Clock CPU
Time Cycle ClockCycles Clock CPUTime CPU
13
Program Clock CyclesProgram Clock Cycles Relation:Relation:
or:or:
cycle time = time between ticks = seconds per cyclecycle time = time between ticks = seconds per cycle clock rate (frequency) = cycles per second (1 Hz = 1 clock rate (frequency) = cycles per second (1 Hz = 1
cycle/s)cycle/s)
A 200 MHz clock has a cycle time of:A 200 MHz clock has a cycle time of:
Rate Clock
Cycles Clock CPU
Time Cycle ClockCycles Clock CPUTime CPU
14
Instruction Count and CPIInstruction Count and CPI
Instruction Count for a programInstruction Count for a program Determined byDetermined by
programprogram Instruction set architecture (ISA)Instruction set architecture (ISA)compilercompiler
Average cycles per instruction (“CPI”)Average cycles per instruction (“CPI”) Determined by CPU hardwareDetermined by CPU hardware If different instructions have different CPIIf different instructions have different CPI
Average CPI affected by instruction mixAverage CPI affected by instruction mix
Rate Clock
CPICount nInstructio
Time Cycle ClockCPICount nInstructioTime CPU
nInstructio per CyclesCount nInstructioCycles Clock
15
Computer Performance MeasureComputer Performance Measure
Millions of Instructions per Second Frequency in MHz
Historically: PDP-11, VAX, Intel 8086: CPI > 1
Load/Store RISC machinesMIPS, SPARC, PowerPC, miniMIPS: CPI = 1Modern CPUs, Pentium, Athlon : CPI < 1
CPI (Average Clocks Per Instruction)
Unfortunate coincidence:This “MIPS” has nothing to do with the name of the MIPS processor we will be studying!
16
How to Improve Performance?How to Improve Performance? Many ways to write the same equations:Many ways to write the same equations:
So, to improve performance (everything else So, to improve performance (everything else being equal) you can eitherbeing equal) you can either ________ the # of required cycles for a program;________ the # of required cycles for a program; ________ the clock cycle time or, said another way, ________ the clock cycle time or, said another way, ________ the clock rate;________ the clock rate; ________ the CPI (average clocks per instruction).________ the CPI (average clocks per instruction).
MIPS PitfallMIPS Pitfall Cannot compare MIPS of two different processors if they run Cannot compare MIPS of two different processors if they run
different sets of instructions!different sets of instructions!
MMeaningless eaningless IIndicator of ndicator of PProcessor rocessor SSpeed!peed!
Decrease Decrease
Increase
Decrease
17
How Many Cycles in a Program?How Many Cycles in a Program? For For somesome processors (e.g., MIPS processor) processors (e.g., MIPS processor)
# of cycles = # of instructions# of cycles = # of instructions
This assumption can be incorrect,Different instructions take different amounts of time on different machines.Memory accesses might require more cycles than other instructions.Floating-Point instructions might require multiple clock cycles to execute.Branches might stall execution rate
time
1st
inst
ruct
ion
2nd
inst
ruct
ion
3rd
inst
ruct
ion
4th
5th
6th ..
.
18
ExampleExample Our favorite program runs in 10 seconds on computer Our favorite program runs in 10 seconds on computer
A, which has a 2 GHz clock. We are trying to help a A, which has a 2 GHz clock. We are trying to help a computer designer build a new machine B, to run this computer designer build a new machine B, to run this program in 6 seconds. The designer can use new (or program in 6 seconds. The designer can use new (or perhaps more expensive) technology to substantially perhaps more expensive) technology to substantially increase the clock rate, but has informed us that this increase the clock rate, but has informed us that this increase will affect the rest of the CPU design, causing increase will affect the rest of the CPU design, causing machine B to require 1.2 times as many clock cycles machine B to require 1.2 times as many clock cycles as machine A for the same program. What clock rate as machine A for the same program. What clock rate should we tell the designer to target?should we tell the designer to target?
19
Now that we understand cyclesNow that we understand cycles A given program will requireA given program will require
some number of instructions (machine instructions)some number of instructions (machine instructions) some number of cyclessome number of cycles some number of secondssome number of seconds
We have a vocabulary that relates these We have a vocabulary that relates these quantities:quantities: cycle time cycle time (seconds per cycle)(seconds per cycle) clock rate clock rate (cycles per second)(cycles per second) CPICPI (average clocks per instruction) (average clocks per instruction)
a floating point intensive application might have a higher a floating point intensive application might have a higher CPICPI
MIPS MIPS (millions of instructions per second)(millions of instructions per second) this would be higher for a program using simple this would be higher for a program using simple
instructionsinstructions20
Performance TrapsPerformance Traps Performance is determined by the execution Performance is determined by the execution
time of a program that you care about.time of a program that you care about.
Do any of the other variables equal Do any of the other variables equal performance?performance? # of cycles to execute program?# of cycles to execute program? # of instructions in program?# of instructions in program? # of cycles per second?# of cycles per second? average # of cycles per instruction?average # of cycles per instruction? average # of instructions per second?average # of instructions per second?
Common pitfall:Common pitfall: Thinking that only one of the variables is indicative of Thinking that only one of the variables is indicative of
performance when it really is not!performance when it really is not! 21
CPI ExampleCPI Example Suppose we have two implementations of the Suppose we have two implementations of the
same instruction set architecture (ISA)same instruction set architecture (ISA) If two machines have the same ISA which quantity If two machines have the same ISA which quantity
(e.g., clock rate, CPI) is the same?(e.g., clock rate, CPI) is the same? For some program:For some program:
Computer A has a clock cycle time of 250 ps and a CPI of Computer A has a clock cycle time of 250 ps and a CPI of 2.02.0
Computer B has a clock cycle time of 500 ps and a CPI of Computer B has a clock cycle time of 500 ps and a CPI of 1.21.2
What machine is faster for this program, and by how What machine is faster for this program, and by how much?much?
A is faster by 1.2 X
22
CPI in More DetailCPI in More Detail If different instruction classes take different If different instruction classes take different
numbers of cyclesnumbers of cycles
Weighted average CPI:Weighted average CPI:
n
1iii )Count nInstructio(CPICycles Clock
n
1i
ii Count nInstructio
Count nInstructioCPI
Count nInstructio
Cycles ClockCPI
Relative frequency
23
Example: CompilerExample: Compiler’’s Impacts Impact Two different compilers are being tested for a 500 MHz Two different compilers are being tested for a 500 MHz
machine with three different classes of instructions: Class machine with three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three A, Class B, and Class C, which require one, two, and three cycles (respectively). Both compilers are used to produce cycles (respectively). Both compilers are used to produce code for a large piece of software. The first compiler's code code for a large piece of software. The first compiler's code uses 5 million Class A instructions, 1 million Class B uses 5 million Class A instructions, 1 million Class B instructions, and 2 million Class C instructions. The second instructions, and 2 million Class C instructions. The second compiler's code uses 7 million Class A instructions, 1 million compiler's code uses 7 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions.Class B instructions, and 1 million Class C instructions.
Which program uses fewer instructions?Which program uses fewer instructions? InstructionsInstructions11 = (5+1+2) x 10 = (5+1+2) x 1066 = 8 x 10 = 8 x 1066
InstructionsInstructions22 = (7+1+1) x 10 = (7+1+1) x 1066 = 9 x 10 = 9 x 1066
Which sequence uses fewer clock cycles?Which sequence uses fewer clock cycles? CyclesCycles11 = (5(1)+1(2)+2(3)) x 10 = (5(1)+1(2)+2(3)) x 1066 = 13 x 10 = 13 x 1066
CyclesCycles22 = (7(1)+1(2)+1(3)) x 10 = (7(1)+1(2)+1(3)) x 1066 = 12 x 10 = 12 x 1066
CPI1 = ?13/8 = 1.625
CPI2 = ?12/9 = 1.33
24
CPI ExampleCPI Example Alternative compiled code versions using Alternative compiled code versions using
instructions in classes A, B, Cinstructions in classes A, B, C
Version 1: IC = 5Version 1: IC = 5 Clock CyclesClock Cycles
= 2×1 + 1×2 + 2×3= 2×1 + 1×2 + 2×3= 10= 10
Avg. CPI = 10/5 = 2.0Avg. CPI = 10/5 = 2.0
Class A B C
CPI for class 1 2 3
IC for version 1 2 1 2
IC for version 2 4 1 1
Version 2: IC = 6Version 2: IC = 6 Clock CyclesClock Cycles
= 4×1 + 1×2 + 1×3= 4×1 + 1×2 + 1×3= 9= 9
Avg. CPI = 9/6 = 1.5Avg. CPI = 9/6 = 1.525
Performance SummaryPerformance Summary
Performance depends onPerformance depends on Algorithm: affects IC, possibly CPIAlgorithm: affects IC, possibly CPI Programming language: affects IC, CPIProgramming language: affects IC, CPI Compiler: affects IC, CPICompiler: affects IC, CPI Instruction set architecture: affects IC, CPI, Cycle TimeInstruction set architecture: affects IC, CPI, Cycle Time
The BIG Picture
26
BenchmarksBenchmarks Performance best determined by running a real Performance best determined by running a real
applicationapplication Use programs typical of expected workloadUse programs typical of expected workload Or, typical of expected class of applicationsOr, typical of expected class of applications
e.g., compilers/editors, scientific applications, graphics, etc.e.g., compilers/editors, scientific applications, graphics, etc.
Small benchmarksSmall benchmarks nice for architects and designersnice for architects and designers easy to standardizeeasy to standardize can be abusedcan be abused
SPEC (System Performance Evaluation Cooperative)SPEC (System Performance Evaluation Cooperative) companies have agreed on a set of real program and inputscompanies have agreed on a set of real program and inputs can still be abusedcan still be abused valuable indicator of performance (and compiler valuable indicator of performance (and compiler
technology)technology)
27
SPEC ’95SPEC ’95
Periodically updated with newer benchmarksPeriodically updated with newer benchmarks SPEC 2000, SPEC 2006SPEC 2000, SPEC 2006
Benchmark Description
go Artificial intelligence; plays the game of Gom88ksim Motorola 88k chip simulator; runs test programgcc The Gnu C compiler generating SPARC codecompress Compresses and decompresses file in memoryli Lisp interpreterijpeg Image compression and decompressionperl Manipulates strings and prime numbers in the special-purpose programming language Perlvortex A database programtomcatv A mesh generation programswim Shallow water model with 513 x 513 gridsu2cor quantum physics; Monte Carlo simulationhydro2d Astrophysics; Hydrodynamic Naiver Stokes equationsmgrid Multigrid solver in 3-D potential fieldapplu Parabolic/elliptic partial differential equationstrub3d Simulates isotropic, homogeneous turbulence in a cubeapsi Solves problems regarding temperature, wind velocity, and distribution of pollutantfpppp Quantum chemistrywave5 Plasma physics; electromagnetic particle simulation
28
SPEC 2006SPEC 2006 Interesting to see how the mix of applications Interesting to see how the mix of applications
has changed over the years…has changed over the years… See http://www.spec.org/cpu2006/{CINT2006, See http://www.spec.org/cpu2006/{CINT2006,
CFP2006}CFP2006}
29
Other Popular BenchmarksOther Popular Benchmarks Several others Several others
popularpopular industry uses SPECindustry uses SPEC but ordinary but ordinary
consumers use othersconsumers use othersmore representative of more representative of
the work they do!the work they do!– e.g., gaming, e.g., gaming,
Photoshop/Aperture, Photoshop/Aperture, copying huge files, copying huge files, multimedia multimedia coding/decoding, etc.coding/decoding, etc.
Geekbench is quite Geekbench is quite popular!popular!
30
Amdahl’s LawAmdahl’s Law Possibly the most important law regarding Possibly the most important law regarding
computer performance:computer performance:
Principle: Principle: Make the common case fast!Make the common case fast! Eventually, performance gains will be limited by what Eventually, performance gains will be limited by what
cannot be improvedcannot be improvede.g., you can raise the speed limit, but there are still traffic e.g., you can raise the speed limit, but there are still traffic
lightslights
31
Amdahl’s Law: ExampleAmdahl’s Law: Example
Example: "Suppose a program runs in 100 Example: "Suppose a program runs in 100 seconds on a machine, where multiplies are seconds on a machine, where multiplies are executed 80% of the time. How much do we executed 80% of the time. How much do we need to improve the speed of multiplication if need to improve the speed of multiplication if we want the program to run 4 times faster?"we want the program to run 4 times faster?"
How about making it 5 times faster?How about making it 5 times faster?
25 = 80/r + 20 r = 16x
20 = 80/r + 20 r = ?
32
ExampleExample Suppose we enhance a machine making all floating-Suppose we enhance a machine making all floating-
point instructions run FIVE times faster. If the point instructions run FIVE times faster. If the execution time of some benchmark before the execution time of some benchmark before the floating-point enhancement is 10 seconds, what will floating-point enhancement is 10 seconds, what will the speedup be if only half of the 10 seconds is spent the speedup be if only half of the 10 seconds is spent executing floating-point instructions?executing floating-point instructions?
We are looking for a benchmark to show off the new We are looking for a benchmark to show off the new floating-point unit described above, and want the floating-point unit described above, and want the overall benchmark to show at least a speedup of 3. overall benchmark to show at least a speedup of 3. What percentage of the execution time would floating-What percentage of the execution time would floating-point instructions have to account for in this program point instructions have to account for in this program in order to yield our desired speedup on this in order to yield our desired speedup on this benchmark?benchmark?
5/5 + 5 = 6 Relative Perf = 10/6 = 1.67 x
100/3 = f/5 + (100 – f) = 100 – 4f/5 f = 83.33 33
Power TrendsPower Trends
In CMOS IC technologyIn CMOS IC technology
×1000×30 5V → 1V
34
Reducing PowerReducing Power Suppose a new CPU hasSuppose a new CPU has
85% of capacitive load of old CPU85% of capacitive load of old CPU 15% voltage and 15% frequency reduction15% voltage and 15% frequency reduction
The power wallThe power wall We cannot reduce voltage furtherWe cannot reduce voltage further We cannot remove more heatWe cannot remove more heat
How else can we improve performance?How else can we improve performance?
35
Uniprocessor PerformanceUniprocessor Performance
Constrained by power, instruction-level parallelism, memory latency
36
MultiprocessorsMultiprocessors ““Multicore” microprocessorsMulticore” microprocessors
More than one processor core per chipMore than one processor core per chip
Requires explicitly parallel programmingRequires explicitly parallel programming Hardware executes multiple instructions at onceHardware executes multiple instructions at once
Ideally, hidden from the programmerIdeally, hidden from the programmer Hard to doHard to do
Programming for performanceProgramming for performanceLoad balancingLoad balancingOptimizing communication and synchronizationOptimizing communication and synchronizationBut, newer OSes and libraries have been designed for thisBut, newer OSes and libraries have been designed for this
37
Manufacturing ICsManufacturing ICs IC = Integrated Circuit (“chip”)IC = Integrated Circuit (“chip”)
Yield: proportion of working dies per waferYield: proportion of working dies per wafer38
Integrated Circuit CostIntegrated Circuit Cost
Nonlinear relation to area and defect rateNonlinear relation to area and defect rate Wafer cost and area are fixedWafer cost and area are fixed Defect rate determined by manufacturing processDefect rate determined by manufacturing process Die area determined by architecture and circuit Die area determined by architecture and circuit
designdesign
39
Fallacy: Low Power at IdleFallacy: Low Power at Idle AMD X4 power benchmark:AMD X4 power benchmark:
At 100% load: 295W (max power)At 100% load: 295W (max power) At 50% load: 246W (83% max power)At 50% load: 246W (83% max power) At 10% load: 180W (still consumes 61% of max At 10% load: 180W (still consumes 61% of max
power)power)
Google data centerGoogle data center Mostly operates at 10% - 50% loadMostly operates at 10% - 50% load At 100% load less than 1% of the timeAt 100% load less than 1% of the time
Industry challenge: Design processors to Industry challenge: Design processors to make power proportional to loadmake power proportional to load
40
RememberRemember Performance is specific to a particular programPerformance is specific to a particular program
Total execution time is a consistent summary of Total execution time is a consistent summary of performanceperformance
For a given architecture performance comes from:For a given architecture performance comes from: increases in clock rate (without adverse CPI affects)increases in clock rate (without adverse CPI affects) improvements in processor organization that lower CPIimprovements in processor organization that lower CPI compiler enhancements that lower CPI and/or instruction compiler enhancements that lower CPI and/or instruction
countcount
Pitfall: Expecting improvements in one aspect of Pitfall: Expecting improvements in one aspect of a machinea machine’’s performance to affect the total s performance to affect the total performanceperformance
You should not always believe everything you You should not always believe everything you read! read! Read carefully! Especially what the business guys say!Read carefully! Especially what the business guys say!
41
Concluding RemarksConcluding Remarks Cost/performance is improvingCost/performance is improving
Due to underlying technology developmentDue to underlying technology development
Hierarchical layers of abstractionHierarchical layers of abstraction In both hardware and softwareIn both hardware and software
Instruction set architectureInstruction set architecture The hardware/software interfaceThe hardware/software interface
Execution time: the best performance measureExecution time: the best performance measure Power is a limiting factorPower is a limiting factor
Use parallelism to improve performanceUse parallelism to improve performance
42