Csa 01
Transcript of Csa 01
-
7/31/2019 Csa 01
1/72
Computer Architecture and Organization
Chapter 1
General Introduction
-
7/31/2019 Csa 01
2/72
Architecture & Organization 1
Architecture is those attributes visible to theprogrammer
Instruction set, number of bits used for data
representation, I/O mechanisms, addressingtechniques.
e.g. Is there a multiply instruction?
Organization is how features are implemented
Control signals, interfaces, memory technology. e.g. Is there a hardware multiply unit or is it done by
repeated addition?
2
-
7/31/2019 Csa 01
3/72
Architecture & Organization 2
All Intel x86 family share the same basic
architecture
The IBM System/370 family share the same
basic architecture
This gives code compatibility
Organization differs between differentversions
3
-
7/31/2019 Csa 01
4/72
Structure & Function
Structure is the way in which components
related to each other
Function is the operation of individual
components as part of the structure
4
-
7/31/2019 Csa 01
5/72
Function
All computer functions are:
Data processing
Data storage (on the fly also )
Data movement (b/w itself and outside)
Input output (peripherals)
Data communications
Control
5
-
7/31/2019 Csa 01
6/72
Functional View
6
-
7/31/2019 Csa 01
7/72
Operations (a) Data movement
7
-
7/31/2019 Csa 01
8/72
Operations (b) Storage
8
-
7/31/2019 Csa 01
9/72
Operation (c) Processing from/to storage
9
-
7/31/2019 Csa 01
10/72
-
7/31/2019 Csa 01
11/72
Structure - Top Level
Computer
MainMemory
InputOutput
SystemsInterconnection
Peripherals
Communicationlines
CentralProcessingUnit
Computer
11
-
7/31/2019 Csa 01
12/72
Structure - The CPU
Computer ArithmeticandLogic Unit
ControlUnit
Internal CPUInterconnection
Registers
CPU
I/O
Memory
SystemBus
CPU
12
-
7/31/2019 Csa 01
13/72
Structure - The Control Unit
CPU
ControlMemory
Control UnitRegisters and
Decoders
SequencingLogic
ControlUnit
ALU
Registers
InternalBus
Control Unit
13
-
7/31/2019 Csa 01
14/72
Computer Evolution
-
7/31/2019 Csa 01
15/72
ENIAC - background
Electronic Numerical Integrator And Computer
John presper Eckert and John Mauchly
University of Pennsylvania Trajectory tables for weapons
Started 1943
Finished 1946 Used until 1955
15
-
7/31/2019 Csa 01
16/72
ENIAC - details
Decimal (not binary)
20 accumulators of 10 digit decimal number
Programmed manually by switches and cables
18,000 vacuum tubes
30 tons
15,000 square feet
140 kW power consumption
5,000 additions per second
16
-
7/31/2019 Csa 01
17/72
ENIAC
17
-
7/31/2019 Csa 01
18/72
ENIAC
18
-
7/31/2019 Csa 01
19/72
von Neumann/Turing
Stored Program concept
Main memory storing programs and data
ALU operating on binary data
Control unit interpreting instructions frommemory and executing
Input and output equipment operated by controlunit
Princeton Institute for Advanced Studies IAS
Completed 1952
19
-
7/31/2019 Csa 01
20/72
Structure of von Neumann machine
20
-
7/31/2019 Csa 01
21/72
IAS - details
1000 x 40 bit words Binary number 2 x 20 bit instructions
Set of registers (storage in CPU) Memory Buffer Register
Memory Address Register
Instruction Register
Instruction Buffer Register
Program Counter
Accumulator
Multiplier Quotient
21
-
7/31/2019 Csa 01
22/72
Structure of IASdetail
22
-
7/31/2019 Csa 01
23/72
Instruction set of IAS
23
-
7/31/2019 Csa 01
24/72
Commercial Computers
1947 - Eckert-Mauchly Computer Corporation
UNIVAC I (Universal Automatic Computer)
US Bureau of Census 1950 calculations Late 1950s - UNIVAC II
Faster
More memory
24
-
7/31/2019 Csa 01
25/72
IBM
Punched-card processing equipment
1953 - the 701
IBMs first stored program computer
Scientific calculations
1955 - the 702
More hardware feautres
Business applications
Lead to 700/7000 series
Made them as a dominant computer manufacturer25
-
7/31/2019 Csa 01
26/72
Second generation - Transistors
Replaced vacuum tubes
Smaller
Cheaper
Less heat dissipation
Solid State device
Made from Silicon (Sand)
Invented 1947 at Bell Labs
William Shockley
26
-
7/31/2019 Csa 01
27/72
Transistor Based Computers
Second generation machines
NCR & RCA produced small transistor
machines
IBM 7000
DEC (Digital Equipment Corporation) 1957
Produced PDP-1 (Programmed Data Processor)
27
-
7/31/2019 Csa 01
28/72
Third Generation : IC
Discrete components.
Difficult to attach in circuit board
10,000 transistors and more
1958 era of microelectronics - IC
28
-
7/31/2019 Csa 01
29/72
Microelectronics
Literally - small electronics
A computer is made up of gates, memory cells
and interconnections
These can be manufactured on a
semiconductor
e.g. silicon wafer
29
-
7/31/2019 Csa 01
30/72
Data storage:
Memory cells
Processing
gates
Movement
Paths between components
Control
Paths carry control signals also
30
-
7/31/2019 Csa 01
31/72
Generations of Computer
Vacuum tube - 1946-1957 Transistor - 1958-1964 Small scale integration - 1965 on
Up to 100 devices on a chip
Medium scale integration - to 1971 100-3,000 devices on a chip
Large scale integration - 1971-1977 3,000 - 100,000 devices on a chip
Very large scale integration - 1978 -1991 100,000 - 100,000,000 devices on a chip
Ultra large scale integration 1991 - Over 100,000,000 devices on a chip
31
-
7/31/2019 Csa 01
32/72
Moores Law
Increased density of components on chip
Gordon Moore co-founder of Intel
Number of transistors on a chip will double every year
Since 1970s development has slowed a little
Number of transistors doubles every 18 months
Cost of a chip has remained almost unchanged
Higher packing density means shorter electrical paths, giving higherperformance
Smaller size gives increased flexibility
Reduced power and cooling requirements
Fewer interconnections increases reliability when compared tosolder connections
32
-
7/31/2019 Csa 01
33/72
Growth in CPU Transistor Count
33
-
7/31/2019 Csa 01
34/72
IBM 360 series
1964
Replaced (& not compatible with) 7000 series
First planned family of computers
Similar or identical instruction sets
Similar or identical O/S
Increasing speed
Increasing number of I/O ports (i.e. more terminals) Increased memory size
Increased cost
34
-
7/31/2019 Csa 01
35/72
35
-
7/31/2019 Csa 01
36/72
DEC PDP-8
1964 First minicomputer
Did not need air conditioned room
Small enough to sit on a lab bench
$16,000 $100k+ for IBM 360
Embedded applications & OEM(original equipment
manufacturers) Another manufactures purchase it and integrate
into a new system
36
-
7/31/2019 Csa 01
37/72
DEC - PDP-8 Bus Structure
Not Central switched architecture like IBM
96 separate signal paths
Carry control, address and data signals
Highly flexible
37
-
7/31/2019 Csa 01
38/72
Later Generations Ics used for construction of processor
Construct memories also Tiny rings of ferromagnetic materials
16 of an inch in diameter
Strung up on grids of fine wires
Magnetized one way represents one and Magnetized other way
represents Zero Its was faster but expensive and destructive
1970
Fairchild
Size of a single core
i.e. 1 bit of magnetic core storage Holds 256 bits
Non-destructive read
Much faster than core
Capacity approximately doubles each year
38
-
7/31/2019 Csa 01
39/72
39
-
7/31/2019 Csa 01
40/72
Designing for Performance
Microprocessor speed
Capacity of RAM
Reducing the distance between the circuits
But the raw speed is not increased
Constant stream of instruction Branch prediction ( predicts which branch or group of
instructions to be processed next and prefetch it)
Data flow analysis( Which instruction depends on others
results and schedule it properly) Speculative execution (executes ahead of their actual
appearances)
40
-
7/31/2019 Csa 01
41/72
Performance Balance
Processor speed increased Memory capacity increased
Memory speed lags behind processor speed
Interface b/w processor and memory is crucial path Its carrying constant flow of program instructions
and data
If memory or pathway fails to match speed thenprocessor stalls in wait state and processing time islost
41
-
7/31/2019 Csa 01
42/72
Logic and Memory Performance Gap
42
-
7/31/2019 Csa 01
43/72
Solutions
Increase number of bits retrieved at one time Make DRAM wider rather than deeper
Change DRAM interface
Cache
Reduce frequency of memory access
More complex cache and cache on chip
On chip and off chip near to processor
Increase interconnection bandwidth
High speed buses
43
-
7/31/2019 Csa 01
44/72
I/O Devices
Peripherals with intensive I/O demands
Large data throughput demands Processors can handle this
Problem moving data
Solutions: Caching
Buffering
Higher-speed interconnection buses
More elaborate bus structures Multiple-processor configurations
44
/
-
7/31/2019 Csa 01
45/72
Typical I/O Device Data Rates
45
-
7/31/2019 Csa 01
46/72
Key is Balance
Balance in Processor components, Main memory,
I/O devices, Interconnection structures
46
Improvements in Chip Organization
-
7/31/2019 Csa 01
47/72
Improvements in Chip Organizationand Architecture
Increase hardware speed of processor Fundamentally due to shrinking logic gate size
More gates, packed more tightly, increasing clock rate
Propagation time for signals reduced
Increase size and speed of caches Dedicating part of processor chip
Cache access times drop significantly
Change processor organization and architecture
Increase effective speed of execution
Parallelism
47
Problems with Clock Speed and Logic
-
7/31/2019 Csa 01
48/72
Problems with Clock Speed and Logic
Density Power
Power density increases with density of logic and clock speed
Dissipating heat
RC delay Speed at which electrons flow limited by resistance and capacitance of metal
wires connecting them Delay increases as RC product increases
As size of the components in chip decreases Wire interconnects becomethinner, increasing resistance and Wires closer together, increasing capacitance
Memory latency
Memory speeds lag processor speeds
Solution:
More emphasis on organizational and architectural approaches
48
l i f
-
7/31/2019 Csa 01
49/72
Intel Microprocessor Performance
49
-
7/31/2019 Csa 01
50/72
Increased Cache Capacity
Typically two or three levels of cache between
processor and main memory
Chip density increased
More cache memory on chip
Faster cache access
Pentium chip devoted about 10% of chip area
to cache
Pentium 4 devotes about 50%
50
-
7/31/2019 Csa 01
51/72
More Complex Execution Logic
Enable parallel execution of instructions
Pipeline works like assembly line
Different stages of execution of different
instructions at same time along pipeline
Superscalar allows multiple pipelines within
single processor
Instructions that do not depend on one anothercan be executed in parallel
51
-
7/31/2019 Csa 01
52/72
Diminishing Returns
Internal organization of processors complex Can get a great deal of parallelism
Further significant increases likely to be relatively
modest
Benefits from cache are reaching limit
Increasing clock rate runs into power dissipation
problem
Some fundamental physical limits are being reached
52
-
7/31/2019 Csa 01
53/72
New Approach Multiple Cores
Multiple processors on single chip
Large shared cache
Within a processor, increase in performance proportional to
square root of increase in complexity
If software can use multiple processors, doubling number of
processors almost doubles performance
So, use two simpler processors on the chip rather than one
more complex processor
With two processors, larger caches are justified
Power consumption of memory logic less than processing logic
53
86 E l ti (1)
-
7/31/2019 Csa 01
54/72
x86 Evolution (1) 8080
first general purpose microprocessor
8 bit data path Used in first personal computer Altair
8086 5MHz 29,000 transistors
much more powerful
16 bit
instruction cache, pre fetch few instructions 8088 (8 bit external bus) used in first IBM PC
80286
16 Mbyte memory addressable
up from 1Mb
80386
32 bit
Support for multitasking
80486
sophisticated powerful cache and instruction pipelining
built in maths co-processor (Main CPU dont do any maths calculations)
54
86 E l ti (2)
-
7/31/2019 Csa 01
55/72
x86 Evolution (2) Pentium
Superscalar
Multiple instructions executed in parallel
Pentium Pro
Increased superscalar organization
branch prediction
data flow analysis
speculative execution
Pentium II
MMX technology (Multimedia extensions)
graphics, video & audio processing
Pentium III Additional floating point instructions for 3D graphics
55
86 E l ti (3)
-
7/31/2019 Csa 01
56/72
x86 Evolution (3) Pentium 4
Further floating point and multimedia enhancements Core
First x86 with dual core (two processors in a chip)
Core 2
64 bit architecture (4 processors on a single chip)
Core 2 Quad 3GHz 820 million transistors Four processors on chip
x86 architecture dominant outside embedded systems
Instruction set architecture evolved with backwards compatibility
~1 instruction per month added
500 instructions available
56
-
7/31/2019 Csa 01
57/72
ARM Evolution
Designed by ARM Inc., Cambridge, England Licensed to manufacturers
High speed, small die, low power consumption
PDAs, hand held games, phones E.g. iPod, iPhone
Widely used embedded processor
Acorn produced ARM1 & ARM2 in 1985 andARM3 in 1989
Acorn, VLSI and Apple Computer founded ARMLtd.
57
-
7/31/2019 Csa 01
58/72
ARM Systems Categories
Embedded real time
Storage systems , industrial and networking
applications
Application platform Linux, Palm OS, Symbian OS, Windows mobile
Secure applications
Smart cards and payment terminals
58
-
7/31/2019 Csa 01
59/72
T02-Vertical.pdf
59
http://t02-vertical.pdf/http://t02-vertical.pdf/http://t02-vertical.pdf/http://t02-vertical.pdf/ -
7/31/2019 Csa 01
60/72
Top Level View ofComputer Function and
Interconnection
-
7/31/2019 Csa 01
61/72
Consists of CPU, Memory, I/O components,
and Interconnection
At a top level, computer can described as
The external behavior of each component the
data and control signals that it exchanges withother components
The interconnection structure and the controls
required to manage the use of the interconnectionstructure
61
-
7/31/2019 Csa 01
62/72
Program Concept Hardwired systems are inflexible
General purpose hardware can do different
tasks, given correct control signals
Instead of re-wiring, supply a new set of controlsignals
62
-
7/31/2019 Csa 01
63/72
63
-
7/31/2019 Csa 01
64/72
What is a program?
A sequence of steps
For each step, an arithmetic or logical
operation is done
For each operation, a different set of control
signals is needed
64
-
7/31/2019 Csa 01
65/72
Function of Control Unit
For each operation a unique code is provided
e.g. ADD, MOVE
A hardware segment accepts the code and
issues the control signals
65
-
7/31/2019 Csa 01
66/72
Components
The Control Unit and the Arithmetic and Logic
Unit constitute the Central Processing Unit
Data and instructions need to get into the
system and results out
Input/output
Temporary storage of code and results is
needed Main memory
66
Computer Components: Top Level View
-
7/31/2019 Csa 01
67/72
Computer Components: Top Level View
67
Computer function
-
7/31/2019 Csa 01
68/72
Computer function
Basic function is execution of instruction
Two steps: Fetch
Execute
Repeating process
Instruction Cycle
68
-
7/31/2019 Csa 01
69/72
69
Fetch Cycle
-
7/31/2019 Csa 01
70/72
Fetch Cycle
Program Counter (PC) holds address of next
instruction to fetch
Processor fetches instruction from memory
location pointed to by PC
Increment PC
Unless told otherwise
Instruction loaded into Instruction Register (IR)
Processor interprets instruction and performs
required actions
70
Execute Cycle
-
7/31/2019 Csa 01
71/72
Execute Cycle Processor-memory
data transfer between CPU and main memory
Processor I/O
Data transfer between CPU and I/O module
Data processing
Some arithmetic or logical operation on data
Control
Alteration of sequence of operations
e.g. jump
Combination of above71
Example of Program Execution
-
7/31/2019 Csa 01
72/72
Example of Program Execution