TK2123: COMPUTER ORGANISATION & ARCHITECTURE
Prepared By: Associate Prof. Dr Masri Ayob
Lecture 5: Computer Performance
Prepared by: Dr Masri Ayob - TK2123
Contents
This lecture will discuss:
• Speeding up computer operation.
• Improvements in Chip Organisation and Architecture.
• Multilevel Machines
Speeding up computer operation
Pipelining
On-board cache
On-board L1 & L2 cache
Branch prediction
Data flow analysis
Speculative execution
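The slide only names branch prediction. One widely described scheme keeps a 2-bit saturating counter per branch; the sketch below assumes that scheme (it is not stated to be the one used by any processor in this lecture) and a hypothetical loop branch that is taken nine times and then falls through, repeated three times.

```python
def count_correct_predictions(outcomes, initial_state=0):
    """Simulate one 2-bit saturating counter (states 0..3).

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    Each actual outcome moves the counter one step toward that outcome,
    so a single surprise does not flip a strongly held prediction.
    """
    state = initial_state
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Saturating update: the counter is clamped to the range 0..3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct


# Hypothetical loop branch: taken 9 times, then not taken once, run 3 times.
pattern = ([True] * 9 + [False]) * 3
print(count_correct_predictions(pattern), "of", len(pattern))
```

Starting from "strongly not taken", the counter mispredicts twice while warming up and then only once per loop exit, so it gets 25 of the 30 branches right in this run.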
Performance Balance
Processor speed increased.
Memory capacity increased.
Memory speed lags behind processor speed.
Logic and Memory Performance Gap
Solutions
Increase number of bits retrieved at one time
  • Make DRAM “wider” rather than “deeper”
Change DRAM interface
  • Cache
Reduce frequency of memory access
  • More complex cache and cache on chip
Increase interconnection bandwidth
  • High-speed buses
  • Hierarchy of buses
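Making memory "wider" helps because peak bandwidth is the number of bytes moved per transfer times the number of transfers per second. The sketch below assumes a hypothetical interface running at 100 million transfers per second and ignores latency, refresh and bus overheads, so it gives peak figures only.

```python
def peak_bandwidth_bytes_per_s(bus_width_bits, transfers_per_s):
    """Peak transfer rate: bytes moved per transfer times transfers per second."""
    return (bus_width_bits / 8) * transfers_per_s


# Hypothetical memory interface running at 100 million transfers per second.
narrow = peak_bandwidth_bytes_per_s(64, 100e6)   # 8 bytes per transfer
wide = peak_bandwidth_bytes_per_s(128, 100e6)    # 16 bytes per transfer
print(narrow / 1e6, "MB/s vs", wide / 1e6, "MB/s")
```

Doubling the width doubles the peak rate without making the DRAM cells themselves any faster.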
I/O Devices
Peripherals with intensive I/O demands
Large data throughput demands
Processors can handle this
Problem is moving the data between processor and peripherals
Solutions:
  • Caching
  • Buffering
  • Higher-speed interconnection buses
  • More elaborate bus structures
  • Multiple-processor configurations
Typical I/O Device Data Rates
Key is Balance
Processor components
Main memory
I/O devices
Interconnection structures
Improvements in Chip Organization and Architecture
Increase hardware speed of processor
  • Fundamentally due to shrinking logic gate size
  • More gates, packed more tightly, increasing clock rate
  • Propagation time for signals reduced
Increase size and speed of caches
  • Dedicating part of processor chip to cache
  • Cache access times drop significantly
Change processor organization and architecture
  • Increase effective speed of execution
  • Parallelism
Problems with Clock Speed and Logic Density
Power
  • Power density increases with density of logic and clock speed.
  • Dissipating the resulting heat becomes difficult.
RC delay
  • Speed at which electrons can flow between transistors is limited by the resistance and capacitance of the metal wires connecting them.
  • Delay increases as the RC product increases.
  • Wire interconnects become thinner, increasing resistance.
  • Wires are closer together, increasing capacitance.
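Both problems on this slide have simple first-order models. Dynamic (switching) power in CMOS logic is commonly written P = a·C·V²·f, so at fixed voltage power grows linearly with clock frequency, and the delay of a wire scales with its RC product. The sketch below uses these textbook approximations with hypothetical component values (the wire model treats coupling to a neighbouring wire as a parallel-plate capacitor), so only the ratios it prints are meaningful.

```python
def dynamic_power_watts(activity, capacitance_f, voltage_v, frequency_hz):
    """First-order CMOS switching power: P = activity * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


def wire_rc_delay_s(resistivity, permittivity, length, width, thickness, spacing):
    """Rough RC product of a wire running next to a neighbouring wire.

    R = resistivity * length / (width * thickness)   (thinner wire -> larger R)
    C ~ permittivity * length * thickness / spacing  (closer wires -> larger C)
    """
    resistance = resistivity * length / (width * thickness)
    capacitance = permittivity * length * thickness / spacing
    return resistance * capacitance


# Doubling the clock (same activity, capacitance and voltage) doubles power.
p1 = dynamic_power_watts(0.1, 1e-9, 1.0, 1e9)
p2 = dynamic_power_watts(0.1, 1e-9, 1.0, 2e9)

# Hypothetical copper-like wire (values approximate, in SI units).
base = wire_rc_delay_s(1.7e-8, 3.5e-11, 1e-3, 100e-9, 200e-9, 100e-9)
# Same wire with width and spacing both halved.
shrunk = wire_rc_delay_s(1.7e-8, 3.5e-11, 1e-3, 50e-9, 200e-9, 50e-9)
print(p2 / p1, shrunk / base)
```

Halving wire width raises resistance and halving spacing raises capacitance, so together they roughly quadruple the RC delay in this simplified model, which is the trend the slide describes.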
Problems with Clock Speed and Logic Density
Memory latency
  • Memory speeds lag processor speeds.
Solution:
  • More emphasis on organisational and architectural approaches
Intel Microprocessor Performance
Increased Cache Capacity
Typically two or three levels of cache between processor and main memory.
Chip density increased
  • More cache memory on chip
  • Faster cache access
Pentium chip devoted about 10% of chip area to cache.
Pentium 4 devotes about 50%.
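The benefit of stacking two or three cache levels can be estimated with the usual average-access-time recurrence: each level costs its hit time, plus its miss rate times the cost of going one level further out. The timings and miss rates below are hypothetical round numbers, not measurements of any Pentium part, and the L2 rate is assumed to be a local miss rate (misses per L2 access).

```python
def average_access_time(hit_times, local_miss_rates, memory_time):
    """Average memory access time for a chain of cache levels (L1 first).

    Works from main memory back toward L1: each level costs its hit time,
    and a miss at that level (at its local miss rate) pays the cost of the
    next level out.
    """
    time = memory_time
    for hit_time, miss_rate in reversed(list(zip(hit_times, local_miss_rates))):
        time = hit_time + miss_rate * time
    return time


# Hypothetical timings in ns: L1 hit 1, L2 hit 10, main memory 100.
no_cache = average_access_time([], [], 100)
l1_only = average_access_time([1], [0.05], 100)
l1_l2 = average_access_time([1, 10], [0.05, 0.20], 100)
print(no_cache, l1_only, l1_l2)
```

With these assumed figures, one cache level cuts the average from 100 ns to about 6 ns, and a second level brings it to about 2.5 ns.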
More Complex Execution Logic
Enable parallel execution of instructions
Pipeline works like assembly line
  • Different stages of execution of different instructions at same time along pipeline
Superscalar allows multiple pipelines within single processor
  • Instructions that do not depend on one another can be executed in parallel
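The assembly-line analogy can be made concrete by counting cycles. The sketch assumes an ideal pipeline: every stage takes one cycle and there are no hazards or stalls, which real pipelines do not achieve. Once the pipeline is full it completes one instruction per cycle, so the speedup approaches the number of stages as the instruction count grows.

```python
def unpipelined_cycles(stages, instructions):
    """Each instruction passes through every stage before the next starts."""
    return stages * instructions


def pipelined_cycles(stages, instructions):
    """Ideal pipeline: fill it once (stages cycles), then finish one per cycle."""
    if instructions == 0:
        return 0
    return stages + instructions - 1


k, n = 5, 100  # hypothetical 5-stage pipeline running 100 instructions
speedup = unpipelined_cycles(k, n) / pipelined_cycles(k, n)
print(pipelined_cycles(k, n), unpipelined_cycles(k, n), round(speedup, 2))
```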
Diminishing Returns
Internal organisation of processors complex
  • Can get a great deal of parallelism
  • Further significant increases likely to be relatively modest.
Benefits from cache are reaching limit.
Increasing clock rate runs into power dissipation problem.
  • Some fundamental physical limits are being reached.
New Approach – Multiple Cores
Multiple processors on single chip
  • Large shared cache
Within a processor, increase in performance is proportional to the square root of the increase in complexity
If software can use multiple processors, doubling the number of processors almost doubles performance
So, use two simpler processors on the chip rather than one more complex processor
With two processors, larger caches are justified
  • Power consumption of memory logic is less than that of processing logic
Example: IBM POWER4
  • Two cores based on PowerPC
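The argument for two simpler cores can be checked with the slide's own rule of thumb (often called Pollack's rule): spending twice the complexity on one core gives only about √2 ≈ 1.41 times the performance, while two cores can give close to 2 times if the software parallelises. The `efficiency` parameter below is a hypothetical stand-in for the "almost" in "almost doubles"; it is not a measured value.

```python
import math


def single_core_speedup(complexity_factor):
    """Slide's rule of thumb: performance grows with sqrt(complexity)."""
    return math.sqrt(complexity_factor)


def multicore_speedup(cores, efficiency=1.0):
    """Speedup of `cores` simple cores over one, for software that parallelises.

    `efficiency` is a hypothetical factor (at most 1.0) standing in for
    overheads such as synchronisation and shared-cache contention.
    """
    return cores * efficiency


# Spend twice the transistor budget in two different ways.
print(round(single_core_speedup(2), 2))      # one core twice as complex
print(multicore_speedup(2, efficiency=0.9))  # two simple cores
```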
POWER4 Chip Organization
Pentium Evolution (1)
8080
  • first general-purpose microprocessor
  • 8-bit data path
  • Used in first personal computer – Altair
8086
  • much more powerful
  • 16-bit
  • instruction cache (queue) that prefetches a few instructions
  • 8088 (8-bit external bus) used in first IBM PC
80286
  • 16 Mbyte memory addressable
  • up from 1 Mbyte
80386
  • 32-bit
  • Support for multitasking
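The jump from 1 Mbyte to 16 Mbyte follows directly from the width of the physical address: the 8086 used 20-bit addresses, the 80286 24-bit and the 80386 32-bit, and a byte-addressed memory with n address bits reaches 2ⁿ bytes. The sketch assumes the binary sense of "Mbyte" (2²⁰ bytes) used for memory sizes.

```python
def addressable_bytes(address_bits):
    """Bytes reachable with a byte-addressed memory and the given address width."""
    return 2 ** address_bits


MBYTE = 2 ** 20  # 1 Mbyte in the binary sense used for memory sizes

# Physical address widths: 8086 = 20 bits, 80286 = 24 bits, 80386 = 32 bits.
for chip, bits in [("8086", 20), ("80286", 24), ("80386", 32)]:
    print(chip, bits, "bits ->", addressable_bytes(bits) // MBYTE, "Mbyte")
```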
Pentium Evolution (2)
80486
  • sophisticated, powerful cache and instruction pipelining
  • built-in maths co-processor
Pentium
  • Superscalar
  • Multiple instructions executed in parallel
Pentium Pro
  • Increased superscalar organization
  • Aggressive register renaming
  • Branch prediction
  • Data flow analysis
  • Speculative execution
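Data flow analysis means looking at which instructions actually depend on the results of others, so that independent ones can be issued together. The sketch below is a simplified illustration, not the Pentium Pro's actual mechanism: it tracks only read-after-write dependencies (assuming register renaming has removed write-after-read and write-after-write conflicts), assumes unlimited execution units, and uses a hypothetical four-instruction fragment with made-up register names.

```python
def issue_levels(program):
    """Earliest parallel step for each instruction, from true data dependencies.

    `program` is a list of (destination_register, source_registers) tuples.
    An instruction must wait one step after the latest earlier instruction
    that produced any of its sources (read-after-write). Write-after-read
    and write-after-write conflicts are ignored, on the assumption that
    register renaming has removed them.
    """
    last_writer = {}  # register name -> index of the most recent writer
    levels = []
    for index, (dest, sources) in enumerate(program):
        level = 0
        for reg in sources:
            if reg in last_writer:
                level = max(level, levels[last_writer[reg]] + 1)
        levels.append(level)
        last_writer[dest] = index
    return levels


def group_by_level(levels):
    """Collect instruction indices that could issue in the same step."""
    groups = {}
    for index, level in enumerate(levels):
        groups.setdefault(level, []).append(index)
    return groups


# Hypothetical four-instruction fragment (register names are illustrative).
program = [
    ("r1", ("r2", "r3")),  # i0: r1 = r2 + r3
    ("r4", ("r5", "r6")),  # i1: r4 = r5 * r6  (independent of i0)
    ("r7", ("r1", "r4")),  # i2: r7 = r1 - r4  (needs i0 and i1)
    ("r8", ("r2",)),       # i3: r8 = -r2      (independent)
]
print(group_by_level(issue_levels(program)))
```

Instructions i0, i1 and i3 can go in the first step and i2 in the second, even though i3 comes after i2 in program order; issuing it early is the kind of reordering that data flow analysis makes possible.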
Pentium Evolution (3)
Pentium II
  • MMX technology
  • graphics, video & audio processing
Pentium III
  • Additional floating-point instructions for 3D graphics
Pentium 4
  • Note Arabic rather than Roman numerals
  • Further floating-point and multimedia enhancements
Itanium
  • 64-bit
Itanium 2
  • Hardware enhancements to increase speed
Intel Computer Family (3)
Moore’s law for (Intel) CPU chips.
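The chart itself is not reproduced here, but Moore's law is a doubling rule, so a projection is a one-line formula. The sketch assumes a doubling period of two years (the figure usually attributed to Moore's 1975 revision; 18 months is also often quoted) and a hypothetical starting count of 2,300 transistors, roughly the Intel 4004 of 1971. It projects a trend and does not reproduce actual Intel product data.

```python
def projected_transistors(initial_count, years, doubling_period_years=2.0):
    """Moore's-law style projection: the count doubles every doubling period."""
    return initial_count * 2 ** (years / doubling_period_years)


# Hypothetical start of 2,300 transistors (roughly the Intel 4004, 1971).
for years in (0, 10, 20):
    print(years, round(projected_transistors(2300, years)))
```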
Intel Computer Family (1)
The Intel CPU family. Clock speeds are measured in MHz (megahertz), where 1 MHz is 1 million cycles/sec.
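Since 1 MHz is 1 million cycles per second, the length of one clock cycle is the reciprocal of the frequency. The sketch converts a few example clock rates (chosen for round numbers, not taken from the table) into nanoseconds per cycle.

```python
def clock_period_ns(frequency_mhz):
    """Length of one clock cycle in nanoseconds (1 MHz = 1e6 cycles/s)."""
    cycles_per_second = frequency_mhz * 1e6
    return 1e9 / cycles_per_second


for mhz in (1, 100, 1000):
    print(mhz, "MHz ->", clock_period_ns(mhz), "ns per cycle")
```

A 1 MHz clock ticks every 1000 ns, while a 1000 MHz (1 GHz) clock ticks every 1 ns.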
PowerPC
1975, 801 minicomputer project (IBM): RISC
Berkeley RISC I processor
1986, IBM commercial RISC workstation product, RT PC.
  • Not a commercial success
  • Many rivals with comparable or better performance
1990, IBM RISC System/6000
  • RISC-like superscalar machine
  • POWER architecture
IBM alliance with Motorola (68000 microprocessors) and Apple (used 68000 in Macintosh)
Result is PowerPC architecture
  • Derived from the POWER architecture
  • Superscalar RISC
  • Apple Macintosh
  • Embedded chip applications
PowerPC Family (1)
601:
  • Brought quickly to market. 32-bit machine
603:
  • Low-end desktop and portable
  • 32-bit
  • Comparable performance with 601
  • Lower cost and more efficient implementation
604:
  • Desktop and low-end servers
  • 32-bit machine
  • Much more advanced superscalar design
  • Greater performance
620:
  • High-end servers
  • 64-bit architecture
PowerPC Family (2)
740/750:
  • Also known as G3
  • Two levels of cache on chip
G4:
  • Increases parallelism and internal speed
G5:
  • Improvements in parallelism and internal speed
  • 64-bit organization
Internet Resources
http://www.intel.com/
  • Search for the Intel Museum
http://www.ibm.com
http://www.dec.com
Charles Babbage Institute
PowerPC
Intel Developer Home
Languages, Levels, Virtual Machines
A multilevel machine
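In a multilevel machine, a program written for one level is either translated into the language of a lower level or interpreted by a program running on that lower level. The sketch below shows both ideas with a made-up two-level example: nested tuples stand in for a higher-level language and are translated into instructions for a hypothetical stack machine, and Python plays the lower level that interprets them. The instruction names PUSH, ADD and MUL are invented for illustration.

```python
OPCODES = {"+": "ADD", "*": "MUL"}  # hypothetical stack-machine instruction set


def translate(expr):
    """Translation: turn a level-2 expression tree into level-1 instructions."""
    if isinstance(expr, int):
        return [("PUSH", expr)]
    op, left, right = expr
    return translate(left) + translate(right) + [(OPCODES[op],)]


def interpret(program):
    """Interpretation: execute level-1 instructions one at a time."""
    stack = []
    for opcode, *operands in program:
        if opcode == "PUSH":
            stack.append(operands[0])
        elif opcode == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return stack


level2_program = ("*", ("+", 2, 3), 4)  # (2 + 3) * 4
level1_program = translate(level2_program)
print(level1_program)
print(interpret(level1_program))
```

Translation happens once, before execution; interpretation decodes and executes each level-1 instruction as the program runs.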
Contemporary Multilevel Machines
Evolution of Multilevel Machines
• Invention of microprogramming
• Invention of operating system
• Migration of functionality to microcode
• Elimination of microprogramming
The Computer Spectrum
The current spectrum of computers available.
Metric Units
The principal metric prefixes.
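The prefixes are powers of ten when they describe times and frequencies, but for memory sizes kilo, mega, giga and tera are conventionally taken as powers of 2¹⁰ = 1024. The sketch encodes both readings for a subset of prefixes; the example values are arbitrary.

```python
# Decimal (SI) exponents: value = mantissa * 10**exponent.
SI_EXPONENTS = {
    "pico": -12, "nano": -9, "micro": -6, "milli": -3,
    "kilo": 3, "mega": 6, "giga": 9, "tera": 12,
}

# For memory sizes, kilo/mega/giga/tera are conventionally powers of 1024.
BINARY_POWERS = {"kilo": 1, "mega": 2, "giga": 3, "tera": 4}


def si_value(mantissa, prefix):
    """Convert e.g. (3, 'giga') to 3e9 using the decimal prefix."""
    return mantissa * 10.0 ** SI_EXPONENTS[prefix]


def memory_bytes(mantissa, prefix):
    """Convert e.g. (16, 'mega') bytes to a byte count using powers of 1024."""
    return mantissa * 1024 ** BINARY_POWERS[prefix]


print(si_value(2, "nano"))       # a 2 ns interval, in seconds
print(si_value(3, "giga"))       # a 3 GHz clock, in cycles per second
print(memory_bytes(16, "mega"))  # a 16 Mbyte memory, in bytes
```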
Thank you
Q & A