Computer Architecture - OS3
Computer Architecture
Sebastian Altmeyer
What is computer architecture?
(a) How to build a processor: the common understanding among computer architects
(b) How to build a computer: the common understanding among many others …
(c) How a computer system works: including all system layers
Aim of this lecture
● How a processor works
● How instructions are processed
● What the following items mean:
➢ Registers and Gates
➢ ALU
➢ Program Counter
➢ Instructions, Machine Code, Assembler
➢ Pipelining
➢ Caches
Aim of the next lecture
● How the peripherals of a processor work
● What the following items mean:
➢ Interrupts?
➢ Memory Management?
➢ Virtual Memory?
➢ Runtime systems?
(Actual content not yet fixed ...)
Unfortunately ...
that is way too much
Hence:
– rough overview and basic concepts
– highly simplified explanation
– focus on the necessary abstractions
– and at some points you simply have to believe me ;)
– Feedback!
Intel Core i7
Quadcore with 731,000,000 transistors
… and its natural habitat
Highly complex, but simple components
● registers (to store information)
● gates (to process information)
● buses (to connect registers and gates)
● a clock
A processor can only flip 0 to 1 and vice versa
… but that is all we need
Gottfried Leibniz: Binary Numbers
George Boole: Boolean Algebra
John von Neumann: First Processor
Claude Shannon: Expressiveness of Relays
Leibniz: Binary Numbers
Or, how to represent any natural number using just bits
n denotes the register/bus width, e.g. a 32-bit architecture means n = 32
Example (8 bits): 00101010 = 42
Range: $[0;\, 2^{n}-1]$
$$b_{n-1}\, b_{n-2}\, b_{n-3} \ldots b_2\, b_1\, b_0 \equiv \sum_{i=0}^{n-1} b_i 2^i$$
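The positional formula can be checked with a few lines of Python. This is only a minimal sketch; the function name `unsigned_value` is made up for illustration and is not part of the lecture material.

```python
def unsigned_value(bits: str) -> int:
    """Interpret a bit string b_{n-1} ... b_0 as an unsigned number."""
    n = len(bits)
    # bits[0] is b_{n-1} (most significant bit), bits[-1] is b_0.
    return sum(int(b) << (n - 1 - k) for k, b in enumerate(bits))


print(unsigned_value("00101010"))  # 42
print(unsigned_value("11111111"))  # 255, the largest 8-bit value (2**8 - 1)
```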
Two's Complement: how to represent a negative number
Example (8 bits):
11010110 = -42
00101010 = 42
Range: $[-2^{n-1};\, 2^{n-1}-1]$
$$b_{n-1}\, b_{n-2}\, b_{n-3} \ldots b_2\, b_1\, b_0 \equiv -b_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} b_i 2^i$$
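The two's complement formula translates almost literally into code. The following sketch (with the illustrative, made-up name `twos_complement_value`) gives the top bit the weight -2^(n-1) and sums the remaining bits as usual.

```python
def twos_complement_value(bits: str) -> int:
    """Interpret a bit string b_{n-1} ... b_0 as a two's complement number."""
    n = len(bits)
    # The most significant bit carries the negative weight -2^(n-1).
    sign_part = -int(bits[0]) * 2 ** (n - 1)
    # The remaining bits b_{n-2} ... b_0 carry their usual positive weights.
    rest = sum(int(b) << (n - 2 - k) for k, b in enumerate(bits[1:]))
    return sign_part + rest


print(twos_complement_value("11010110"))  # -42
print(twos_complement_value("00101010"))  # 42
print(twos_complement_value("10000000"))  # -128, the smallest 8-bit value
```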
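Floating Point (IEEE 754 single-precision, 32-bit)
Sign: b31
Exponent: b30 b29 ... b23
Fraction: b22 b21 ... b0
$$(-1)^{\text{Sign}} \cdot (1.\text{Fraction})_2 \cdot 2^{(\text{Exponent})_2 - 127}$$
(for normalized numbers; the leading 1 of the significand is implicit and not stored)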
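As a rough illustration, the three fields can be cut out of a 32-bit word with shifts and masks. The sketch below assumes a normalized number, for which IEEE 754 adds an implicit leading 1 to the fraction; zero, subnormals, infinities and NaN are deliberately left out. The name `decode_float32` is hypothetical, and Python's `struct` module is used only to cross-check the result.

```python
import struct


def decode_float32(word: int) -> float:
    """Decode a normalized IEEE 754 single-precision bit pattern."""
    sign = (word >> 31) & 0x1        # b31
    exponent = (word >> 23) & 0xFF   # b30 ... b23
    fraction = word & 0x7FFFFF       # b22 ... b0
    if exponent == 0 or exponent == 0xFF:
        raise ValueError("special values are not handled in this sketch")
    significand = 1 + fraction / 2**23  # implicit leading 1
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)


word = 0x42280000  # sign 0, exponent 10000100, fraction 0101000...
print(decode_float32(word))  # 42.0
# Cross-check against the platform's own interpretation of the same bits.
print(struct.unpack(">f", struct.pack(">I", word))[0])  # 42.0
```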
Boole: Boolean Algebra
Variables:
z, v, w ∈ {0, 1}
Operators:
z = ¬v
z = v ∧ w
z = v ∨ w
Claude Shannon
Showed in his Master's thesis, „A Symbolic Analysis of Relay and Switching Circuits“, how to implement Boolean logic using electronic circuits
Boole and Shannon: Gates
z=¬v
z=v∧w
z=v∨w
Half Adder/Full Adder
(Figure: half adder and full adder built from xor, or and and gates)
Adder
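To make the step from gates to an adder concrete, here is a small Python sketch that models each gate as a function on the values 0 and 1 and chains full adders into a ripple-carry adder. All function names are chosen for illustration, and the ripple-carry arrangement is one common way to build an adder, not necessarily the exact circuit shown on the slide.

```python
def not_gate(v):
    return 1 - v


def and_gate(v, w):
    return v & w


def or_gate(v, w):
    return v | w


def xor_gate(v, w):
    # xor expressed with the three basic operations
    return or_gate(and_gate(v, not_gate(w)), and_gate(not_gate(v), w))


def half_adder(a, b):
    """Return (sum, carry) for two input bits."""
    return xor_gate(a, b), and_gate(a, b)


def full_adder(a, b, carry_in):
    """Return (sum, carry_out) for two input bits plus a carry."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, or_gate(c1, c2)


def ripple_carry_add(a_bits, b_bits):
    """Add two equally long bit lists, least significant bit first."""
    carry = 0
    result = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    return result, carry


# 42 + 1 on 8 bits, least significant bit first
print(ripple_carry_add([0, 1, 0, 1, 0, 1, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0]))
# ([1, 1, 0, 1, 0, 1, 0, 0], 0), i.e. 00101011 = 43
```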
How to subtract?
$$a - b = a + \bar{b} + 1$$
where $\bar{b}$ is the bitwise complement of b
Add/Subtract
$$a - b = a + \bar{b} + 1$$
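The identity says that an adder can subtract if it inverts every bit of b and feeds in a carry of 1. A minimal sketch on n-bit values, assuming wrap-around arithmetic modulo 2^n (the name `subtract` and the default width of 8 bits are illustrative):

```python
def subtract(a: int, b: int, n: int = 8) -> int:
    """Compute a - b on n bits as a + (bitwise complement of b) + 1."""
    mask = (1 << n) - 1
    not_b = ~b & mask              # invert all n bits of b
    return (a + not_b + 1) & mask  # drop the carry out of bit n-1


print(subtract(42, 1))                 # 41
print(format(subtract(0, 42), "08b"))  # 11010110, the 8-bit pattern of -42
```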
Arithmetic Logic Unit (ALU)
ALU and Register
How to control ALU/Registers? Machine Code
s2 s1 s0 | rs3 rs2 rs1 rs0 | rt3 rt2 rt1 rt0 | rd3 rd2 rd1 rd0
OP       | regS            | regT            | regD
regD = regS [OP] regT
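The instruction format can be illustrated by packing and unpacking the four fields with shifts and masks. The sketch assumes that the bits are stored in the order listed, with the OP field in the most significant position of a 15-bit word; the slides do not state the bit order, so treat this layout and the names `Instruction`, `encode` and `decode` as assumptions.

```python
from typing import NamedTuple


class Instruction(NamedTuple):
    op: int     # 3 bits: s2 s1 s0
    reg_s: int  # 4 bits: rs3 ... rs0
    reg_t: int  # 4 bits: rt3 ... rt0
    reg_d: int  # 4 bits: rd3 ... rd0


def encode(op: int, reg_s: int, reg_t: int, reg_d: int) -> int:
    """Pack the four fields into one 15-bit machine word."""
    return (op << 12) | (reg_s << 8) | (reg_t << 4) | reg_d


def decode(word: int) -> Instruction:
    """Split a 15-bit machine word back into its fields."""
    return Instruction(
        op=(word >> 12) & 0b111,
        reg_s=(word >> 8) & 0b1111,
        reg_t=(word >> 4) & 0b1111,
        reg_d=word & 0b1111,
    )


word = encode(op=1, reg_s=1, reg_t=2, reg_d=3)
print(format(word, "015b"))  # 001000100100011
print(decode(word))
```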
PC + Instruction Memory
s2 s1 s0 | rs3 rs2 rs1 rs0 | rt3 rt2 rt1 rt0 | rd3 rd2 rd1 rd0
OP       | regS            | regT            | regD
regD = regS [OP] regT
Calculator Instructions
Calculator
Clock
Increase PC, write result to register
one cycle
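The behaviour of the calculator, one instruction per clock cycle, can be mimicked in Python (this is not the SIM-PL model). The opcode numbers in `OPS` and the 32-bit register width are hypothetical; the slides do not list the actual encoding.

```python
# Hypothetical opcode assignment, chosen only for this sketch.
OPS = {
    0: lambda s, t: s + t,  # add
    1: lambda s, t: s - t,  # subtract
    2: lambda s, t: s & t,  # bitwise and
    3: lambda s, t: s | t,  # bitwise or
}


def run(program, registers):
    """Execute (op, reg_s, reg_t, reg_d) tuples, one per clock cycle."""
    pc = 0
    while pc < len(program):
        op, reg_s, reg_t, reg_d = program[pc]                 # fetch and decode
        result = OPS[op](registers[reg_s], registers[reg_t])  # ALU operation
        registers[reg_d] = result & 0xFFFFFFFF                # write result (assumed 32-bit)
        pc += 1                                               # increase PC
    return registers


regs = [0] * 16
regs[1], regs[2] = 40, 2
run([(0, 1, 2, 3), (1, 3, 2, 4)], regs)
print(regs[3], regs[4])  # 42 40
```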
Here comes the example (SIM-PL)
Calculator with Immediates
Instructions
Machine Code
Instructions with 2 registers
t0   | s2 s1 s0 | rs3 rs2 rs1 rs0 | rt3 rt2 rt1 rt0 | rd3 rd2 rd1 rd0
Type | OP       | regS            | regT            | regD
regD = regS [OP] regT if Type = 1
Type with immediates
t0   | s2 s1 s0 | rs3 rs2 rs1 rs0 | x x x x       | rd3 rd2 rd1 rd0 | i15 i14 ... i0
Type | OP       | regS            | regT (unused) | regD            | IMM
regD = regS [OP] IMM if Type = 0
Loop Instructions
Calculator with Loops
condition
Registers are very limited ...
Little Endian/Big Endian
describes order in which bytes are written to memory ...
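A quick way to see the difference is to write the same 32-bit value to a byte sequence in both orders. The sketch uses Python's built-in `int.to_bytes` and `int.from_bytes`; the helper `swap_endianness` and the 4-byte word size are assumptions made for the example.

```python
value = 0x0A0B0C0D

# Little endian: least significant byte at the lowest address.
print(value.to_bytes(4, byteorder="little").hex())  # 0d0c0b0a
# Big endian: most significant byte at the lowest address.
print(value.to_bytes(4, byteorder="big").hex())     # 0a0b0c0d


def swap_endianness(word: int) -> int:
    """Reverse the byte order of a 4-byte word."""
    return int.from_bytes(word.to_bytes(4, "little"), "big")


print(f"{swap_endianness(value):08x}")  # 0d0c0b0a
```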
Harvard Machine
Load Store machine
Complete Instruction Set
Turing Complete
We can now compute whatever we want …
Further improvements target:
- increasing usability
- speed
Usability
von Neumann Architecture
Just like Harvard Architecture, with one little difference
Instruction Memory and Data Memory are the same
(predates Harvard Architecture)
Procedure Call
PC ← PC + Offset
$ra ← PC + 1; store PC+1 in register
Harvard Machine: Return
Load PC from register $ra
more details (recursive functions, stack, heap) in next lecture
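The call and return mechanism boils down to two register transfers, sketched below. Registers are modelled as a plain dictionary and the function names `call` and `ret` are illustrative; the sketch saves the return address before it computes the new PC so that both transfers use the old PC value.

```python
def call(pc: int, offset: int, registers: dict) -> int:
    """Procedure call: remember where to continue, then jump."""
    registers["$ra"] = pc + 1  # $ra <- PC + 1
    return pc + offset         # PC  <- PC + Offset


def ret(registers: dict) -> int:
    """Return: load the PC from register $ra."""
    return registers["$ra"]


registers = {}
pc = call(10, 5, registers)
print(pc, registers["$ra"])  # 15 11
pc = ret(registers)
print(pc)  # 11
```

A second call made inside the procedure would overwrite $ra, which is one reason a stack is needed for nested and recursive calls.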
Interrupts
ways to interrupt the current execution in order to handle other events
Examples:
● pressing a key on the keyboard
● network data available
● shutting down processes
● moving mouse
more details in the next lecture
Speed
Pipelining
currently:
● multiple cycles for one instruction
● large parts of the pipeline are idle
with pipelining:
● one cycle for one instruction
● nearly all parts are nearly always busy
(similar to conveyor belt)
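The gain from pipelining can be estimated with a simple cycle count. The sketch assumes an ideal pipeline without hazards in which every stage takes one cycle; the 5-stage depth used in the example is a hypothetical value, not taken from the slides.

```python
def cycles_unpipelined(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through all stages before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions: int, n_stages: int) -> int:
    """The first instruction fills the pipeline, then one finishes per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


print(cycles_unpipelined(100, 5))  # 500
print(cycles_pipelined(100, 5))    # 104
```

For long instruction sequences the pipelined count approaches one cycle per instruction, which matches the conveyor belt picture.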
Harvard Pipelining
Pipeline Hazards/Multicycle Instructions
One instruction per cycle is the best case … but that's not always possible:
● Memory accesses
● Branches
● Dependencies
ADD $r1, $r2, $r3
ADD $r4, $r1, $r2
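In the two ADD instructions, the second one reads $r1 before the first one has written it back. Such a read-after-write dependency can be detected by comparing register names, as in the sketch below. It assumes that the first operand is the destination register and the remaining operands are sources; the helper names are made up for illustration.

```python
def parse(instr: str):
    """Split 'ADD $r1, $r2, $r3' into (opcode, destination, sources)."""
    opcode, operands = instr.split(maxsplit=1)
    regs = [r.strip() for r in operands.split(",")]
    return opcode, regs[0], regs[1:]


def has_raw_dependency(first: str, second: str) -> bool:
    """True if `second` reads the register that `first` writes."""
    _, dest, _ = parse(first)
    _, _, sources = parse(second)
    return dest in sources


print(has_raw_dependency("ADD $r1, $r2, $r3", "ADD $r4, $r1, $r2"))  # True
print(has_raw_dependency("ADD $r1, $r2, $r3", "ADD $r4, $r5, $r2"))  # False
```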
Forwarding
Data dependency
Harvard Machine with Forwarding
Branch Prediction
Clock rate scaling
easiest way to improve performance … but it mostly raises processor speed; memory accesses do not get faster with it
Load $r2, _a
Load $r1, _b
Add $r3, $r2, $r1
MPC 5xx
Load $r2, _a
Load $r1, _b
Add $r3, $r2, $r1
MPC 755
Memory Hierarchy
emulates a fast and large memory
● on top: small and fast
● on bottom: large and slow
each level contains a subset of the data below
rough idea:
books on your desk, books in your shelf, books in the library
Principle of Locality
● Spatial Locality: neighboring memory blocks are likely to be accessed at around the same time
● Temporal Locality: recently accessed memory blocks are likely to be accessed again in the near future
Harvard Architecture with cache
direct-mapped cache
Internal cache organization
2 way set-associative cache
Concept can be extended to fully-associative caches
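The difference between the cache organizations can be tried out with a small simulator. The sketch below splits an address into block number, set index and tag, and uses least-recently-used (LRU) replacement within a set; the slides do not name a replacement policy, so LRU is an assumption, as are the class name `Cache` and the sizes in the example. With `ways=1` it behaves like a direct-mapped cache, with `n_sets=1` like a fully-associative one.

```python
from collections import OrderedDict


class Cache:
    def __init__(self, n_sets: int, ways: int, line_size: int):
        self.n_sets = n_sets
        self.ways = ways
        self.line_size = line_size
        # One ordered dict of tags per set, oldest entry first.
        self.sets = [OrderedDict() for _ in range(n_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Access one address; return True on a hit, False on a miss."""
        block = address // self.line_size
        index = block % self.n_sets
        tag = block // self.n_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)     # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[tag] = True
        return False


# Both caches hold 4 lines of 16 bytes; addresses 0 and 64 share a set.
direct_mapped = Cache(n_sets=4, ways=1, line_size=16)
two_way = Cache(n_sets=2, ways=2, line_size=16)
print([direct_mapped.access(a) for a in (0, 64, 0, 64)])  # [False, False, False, False]
print([two_way.access(a) for a in (0, 64, 0, 64)])        # [False, False, True, True]
```

In the direct-mapped case the two addresses keep evicting each other although the rest of the cache is empty, while the 2-way set-associative cache can hold both lines at once.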
Different types of cache misses
● Compulsory (cold) misses: caches are initially empty, so the first access to a block is always a miss.
● Capacity misses: due to the limited cache capacity (i.e. the cache is full).
● Conflict misses: due to unbalanced cache usage (eviction in one cache set while lines in other sets are still empty).
See you in the lab session!
Exercise 1 (Boolean Algebra)
a) What is the minimal subset of the set of basic operations {and, or, not} sufficient to derive all logical operations? Justify your answer.
b) What is the minimal subset if we add the nand operation (i.e. not and) to the set {and, or, not}? Justify your answer.
Exercise 2 (Digital Circuit)
a) Draw a digital circuit of a 4-bit incrementer, i.e., a circuit that satisfies the equation
$b = (a + 1) \bmod 2^4$
You can use the operations from the set {and, or, nand, nor, not}.
b) Try to find an incrementer with the minimal number of operations needed.
c) Try to minimize the depth of the circuit, i.e. the maximal length of any path from input to output.
Exercise 3 (Assembler)
Write an assembler program (for the Harvard machine) that converts memory data from little endian to big endian. Assume that the address of the memory data that should be converted is stored in register 1.
Test your assembly code using SIM-PL.
Exercise 4 (Loops)
Extend your assembly code from Exercise 3 so that a complete block of data is converted.
Assume that the initial memory address is stored in register 1 and the number of blocks that should be converted in register 2. Your code shall convert each memory block in the range
[r1; r1 + r2]