Post on 31-Dec-2015
TDC 311
The Microarchitecture
Introduction
As mentioned earlier in the class, one Java statement generates multiple machine code statements
Then each machine code statement generates one or more micro-code statements (microinstructions)
Introduction Continued
For example, in Java:
counter += 1;
Might generate the following machine code:
load reg1,counter
inc reg1
store reg1,counter
[Figure: data path connecting the registers, the A, B, and C buses, the ALU, the MIR, the Control Store, and Memory]
Register numbers: PC 1, MAR 2, MDR 3, Reg A 4, Reg B 5, Reg C 6, Reg BB 31
ALU control: Add 0, Multiply 1, Inc A 2, Inc B 3, Dec A 4, Dec B 5, AND 6, OR 7, Pass A 8, TwosC A 9
A Bus Decoder (assume 31 registers, 0 means no register)
B Bus Decoder
C Bus (32 individual signals)
Memory: Addr, machine code instr, Read and write signals
Clock Subcycles
Subcycle 1 – set up signals to drive data path
Subcycle 2 – drive A and B buses
Subcycle 3 – ALU operation
Subcycle 4 – drive C bus
[Figure: timing diagram of one clock cycle split into subcycles 1 2 3 4. Annotations: Cycle starts here; Registers loaded from C Bus; Next microinstruction loaded from control store]
Requires 2 complete clock cycles to perform a microinstruction.
Simple Example
Java statement: counter += 1;
What might the microinstructions look like?
load reg1,counter
• (Assume the address of counter is currently in Register C)
• Rd=1; Wr=0; A=00110 (Reg C); B=00000; C=00010 (MAR); ALU=1000 (Pass A)
• Rd=1; all else 0 (counter should now be sitting in MDR)
• Rd=0; Wr=0; A=00011 (MDR); B=00000; C=00100 (Reg A = reg1); ALU=1000 (Pass A)
inc reg1
• Rd=0; Wr=0; A=00100 (Reg A = reg1); B=00000; C=00100 (Reg A); ALU=0010 (Inc A)
store reg1,counter
• (Assume the address of counter is still in MAR)
• Rd=0; Wr=1; A=00100 (Reg A); B=00000; C=00011 (MDR); ALU=1000 (Pass A)
• Rd=0; Wr=1; all else 0
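The six microinstructions can be traced with a small simulator. This is only a sketch under stated assumptions: the class MicroSim, its step method, the memory address 100, and the starting value 41 are made up for illustration, and a memory read or write is assumed to finish inside the microinstruction that raises Rd or Wr (so the second Rd and second Wr steps simply repeat the operation). Register numbers and ALU codes are the ones listed for the data path.

```java
import java.util.HashMap;
import java.util.Map;

class MicroSim {
    // Register numbers and ALU control codes as listed for the data path.
    static final int MAR = 2, MDR = 3, REG_A = 4, REG_C = 6;
    static final int ADD = 0, INC_A = 2, PASS_A = 8;

    final int[] reg = new int[32];                        // index 0 means "no register"
    final Map<Integer, Integer> memory = new HashMap<>(); // hypothetical word-addressed memory

    // One microinstruction: the ALU result goes to register c (if c is not 0),
    // then the Rd / Wr signals act on memory using MAR and MDR.
    void step(int rd, int wr, int a, int b, int c, int alu) {
        if (c != 0) {
            switch (alu) {
                case ADD:    reg[c] = reg[a] + reg[b]; break;
                case INC_A:  reg[c] = reg[a] + 1;      break;
                case PASS_A: reg[c] = reg[a];          break;
                default: throw new IllegalArgumentException("ALU code not modeled: " + alu);
            }
        }
        if (rd == 1) reg[MDR] = memory.getOrDefault(reg[MAR], 0);
        if (wr == 1) memory.put(reg[MAR], reg[MDR]);
    }

    // Runs counter += 1 and returns the value left in memory.
    static int runCounterIncrement(int address, int initial) {
        MicroSim m = new MicroSim();
        m.memory.put(address, initial);
        m.reg[REG_C] = address;                      // slide assumption: address is in Reg C
        // load reg1,counter
        m.step(1, 0, 0b00110, 0b00000, 0b00010, 0b1000);
        m.step(1, 0, 0, 0, 0, 0);
        m.step(0, 0, 0b00011, 0b00000, 0b00100, 0b1000);
        // inc reg1
        m.step(0, 0, 0b00100, 0b00000, 0b00100, 0b0010);
        // store reg1,counter
        m.step(0, 1, 0b00100, 0b00000, 0b00011, 0b1000);
        m.step(0, 1, 0, 0, 0, 0);
        return m.memory.get(address);
    }

    public static void main(String[] args) {
        System.out.println(runCounterIncrement(100, 41)); // prints 42
    }
}
```

A more faithful model would delay the memory result by one microinstruction, which is what the separate "Rd=1; all else 0" step suggests.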
Design Issues
Speed vs. cost
• reduce the number of clock cycles needed to execute an instruction
• simplify the organization so that the clock cycle can be shorter
• overlap the execution of instructions
Any way to improve upon the micro-architecture?
Design Issues
Create independent units that fetch and process the instructions? (double-up on other things? Everything?)
Pre-fetch one/two/three instructions?
Perform pipelining?
Pipeline Example
Pipeline Problems
Pipe stall – when a subsequent instruction must wait before it can proceed
What causes stalls?
• waiting for memory
• waiting for the result of a previous instruction
• determining the next instruction
What if you encounter a branch instruction?
Also takes time to fill the pipeline
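The cost of filling the pipeline and of stalls can be put into numbers. A rough sketch, assuming every stage takes exactly one clock cycle; the class PipelineTiming, the 5-stage depth, and the instruction and stall counts are hypothetical and not taken from the slides.

```java
class PipelineTiming {
    // Cycles to finish the given number of instructions on a pipeline:
    // the first instruction needs one cycle per stage (this is the fill time),
    // each later instruction finishes one cycle after the one before it,
    // and every stall cycle adds one more cycle to the total.
    static long pipelinedCycles(int stages, long instructions, long stallCycles) {
        if (instructions == 0) return 0;
        return stages + (instructions - 1) + stallCycles;
    }

    // Without pipelining, each instruction passes through all stages alone.
    static long unpipelinedCycles(int stages, long instructions) {
        return (long) stages * instructions;
    }

    public static void main(String[] args) {
        int stages = 5; // hypothetical pipeline depth
        System.out.println(pipelinedCycles(stages, 1, 0));    // a single instruction gains nothing
        System.out.println(pipelinedCycles(stages, 100, 0));  // fill time is paid only once
        System.out.println(pipelinedCycles(stages, 100, 10)); // ten stall cycles added
        System.out.println(unpipelinedCycles(stages, 100));
    }
}
```

With these made-up numbers, 100 instructions take 104 cycles pipelined versus 500 unpipelined, so the fill time matters mostly for short instruction runs.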
Design Issues
Perform branch prediction?
Perform out-of-order execution?
• add two register contents and store in register
• increment counter by 1
• start a write operation
changed to:
• add two register contents and store in register
• start a write operation
• increment counter by 1
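Whether two instructions may trade places depends on the registers they touch. The sketch below is one simple way to check that, not necessarily what real hardware does. The register names r1, r2, r3, and counter are hypothetical, and it assumes the write operation uses the sum produced by the add.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class ReorderCheck {
    // A hypothetical instruction, described only by the registers it reads and writes.
    static final class Instr {
        final Set<String> reads, writes;
        Instr(String[] reads, String[] writes) {
            this.reads = new HashSet<>(Arrays.asList(reads));
            this.writes = new HashSet<>(Arrays.asList(writes));
        }
    }

    // The three instructions from the slide, with made-up register names.
    static final Instr ADD = new Instr(new String[] {"r1", "r2"}, new String[] {"r3"});
    static final Instr INC = new Instr(new String[] {"counter"}, new String[] {"counter"});
    static final Instr WRITE = new Instr(new String[] {"r3"}, new String[] {});

    // Two adjacent instructions can be swapped only if neither one
    // reads or writes a register that the other one writes.
    static boolean canSwap(Instr first, Instr second) {
        return Collections.disjoint(first.writes, second.reads)
            && Collections.disjoint(first.reads, second.writes)
            && Collections.disjoint(first.writes, second.writes);
    }

    public static void main(String[] args) {
        System.out.println(canSwap(INC, WRITE)); // true: the reordering shown on the slide
        System.out.println(canSwap(ADD, WRITE)); // false: the write needs the sum in r3
    }
}
```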
Design Issues
Perform speculative execution?
Re-use registers that are no longer used?
Have a large register set and keep all current values in registers?
Use cache memory?
Cache Memory
Main memory references usually fall near recently referenced locations (locality principle)
Program code should be in one location (if good programmer) and data often in another (but grouped together)
Bring most recently referenced values into a high speed cache
How does the CPU know something is in cache or not?
Direct-mapped Cache
Most common form of cache memory
Let’s consider a cache which has 2048 entries, each entry holding 32 bytes (not bits) of data
2048 entries times 32 bytes per entry equals 64 KB
Entry   V bit   Tag (16 bits)   Data (32 bytes)   Addresses that use this entry
2047                                              65504-65535, 131040-131071, …
2046
2045
:
2                                                 64-95, 65600-65631, …
1                                                 32-63, 65568-65599, …
0                                                 0-31, 65536-65567, 131072-131103, …
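The address ranges listed for each entry follow from simple arithmetic on the cache dimensions. A minimal sketch, assuming consecutive 32-byte blocks go to consecutive entries and wrap around after entry 2047; the class and method names are made up.

```java
class CacheEntryMap {
    static final int ENTRIES = 2048;   // entries in the cache
    static final int LINE_BYTES = 32;  // bytes of data per entry

    // Entry that a byte address maps to: block number modulo the entry count.
    static int entryFor(long address) {
        return (int) ((address / LINE_BYTES) % ENTRIES);
    }

    // First address of the n-th block (n = 0, 1, 2, ...) that shares the given entry.
    static long firstAddress(int entry, int n) {
        return (long) n * ENTRIES * LINE_BYTES + (long) entry * LINE_BYTES;
    }

    public static void main(String[] args) {
        // Prints the first two address ranges for a few entries.
        for (int entry : new int[] {0, 1, 2, 2047}) {
            long a = firstAddress(entry, 0);
            long b = firstAddress(entry, 1);
            System.out.println(entry + ": " + a + "-" + (a + LINE_BYTES - 1)
                    + ", " + b + "-" + (b + LINE_BYTES - 1));
        }
    }
}
```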
Cache Address
When a program generates a 32-bit address, it has the following form:
Tag – 16 bits Line – 11 bits Word – 3 bits Byte – 2 bits
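Splitting an address into these four fields takes only shifts and masks. A sketch assuming the fields are laid out in the order listed, with Tag in the most significant bits and Byte in the least significant; the class name, method names, and the sample address 0x0001004E are made up.

```java
class CacheAddress {
    static int tag(int address)        { return (address >>> 16) & 0xFFFF; } // top 16 bits
    static int line(int address)       { return (address >>> 5) & 0x7FF; }   // next 11 bits
    static int word(int address)       { return (address >>> 2) & 0x7; }     // next 3 bits
    static int byteOffset(int address) { return address & 0x3; }             // low 2 bits

    public static void main(String[] args) {
        int address = 0x0001004E; // hypothetical address
        System.out.println("Tag  = " + tag(address));
        System.out.println("Line = " + line(address));
        System.out.println("Word = " + word(address));
        System.out.println("Byte = " + byteOffset(address));
    }
}
```

The unsigned shift >>> keeps addresses with the top bit set from dragging sign bits into the Tag.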
Cache Hit
To see if a data item is in the cache, use the 11-bit LINE portion (of the address) to point to one of the 2048 cache row entries
Then the 16-bit TAG of the address is compared to the 16-bit TAG value in the cache entry
If there is a match (and the entry's V bit shows the entry is valid), the data is there
Cache Hit
If the data is there, use the 3-bit WORD portion of the address to tell you which word from the 8 words (32 bytes) in the cache line should be fetched
If necessary, the 2-bit BYTE address will tell you which one of the four bytes to fetch
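Once a hit is confirmed, the WORD and BYTE fields act as indexes into the 32 bytes of the cache line. A small sketch, assuming the bytes of a line are stored in increasing address order so that the offset within the line is word × 4 + byte; the class LineSelect and the demo data are hypothetical.

```java
import java.util.Arrays;

class LineSelect {
    // Returns the 4 bytes of the selected word (0 to 7) from a 32-byte cache line.
    static byte[] selectWord(byte[] line, int word) {
        return Arrays.copyOfRange(line, word * 4, word * 4 + 4);
    }

    // Returns a single byte: the word field picks the word, the byte field picks within it.
    static byte selectByte(byte[] line, int word, int byteInWord) {
        return line[word * 4 + byteInWord];
    }

    // Demo line whose byte at offset i holds the value i.
    static byte[] demoLine() {
        byte[] line = new byte[32];
        for (int i = 0; i < line.length; i++) line[i] = (byte) i;
        return line;
    }

    public static void main(String[] args) {
        byte[] line = demoLine();
        System.out.println(Arrays.toString(selectWord(line, 1)));
        System.out.println(selectByte(line, 3, 2));
    }
}
```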
Cache Memory
Note that since this cache only holds 64 KB, it can hold the data for addresses 0 – 65535.
But each entry may instead hold data for the matching addresses in 65536 – 131071, or in any higher 64 KB range of addresses.
That is why you must compare the TAG fields to see if there is a match
Cache Miss
If no match (of TAG fields), then there is a cache miss
The CPU goes to main memory, fetches the 32-byte block containing the requested address, and stores it in the cache (thus wiping out the old block in that cache entry)
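Hits, misses, and the wiping out of an old block can be seen in a tiny simulation. This sketch tracks only the valid bit and the tag of each entry, not the 32 data bytes, and assumes the 16-bit tag / 11-bit line address layout; the class DirectMappedCache and the access sequence are made up for illustration.

```java
class DirectMappedCache {
    static final int ENTRIES = 2048;
    final boolean[] valid = new boolean[ENTRIES]; // V bit per entry
    final int[] tags = new int[ENTRIES];          // 16-bit tag per entry
    int hits, misses;

    // Returns true on a hit. On a miss, the entry is refilled with the new block's tag,
    // replacing whatever block was held there before.
    boolean access(int address) {
        int line = (address >>> 5) & 0x7FF;
        int tag = (address >>> 16) & 0xFFFF;
        if (valid[line] && tags[line] == tag) {
            hits++;
            return true;
        }
        valid[line] = true;
        tags[line] = tag;
        misses++;
        return false;
    }

    public static void main(String[] args) {
        DirectMappedCache cache = new DirectMappedCache();
        System.out.println(cache.access(0));     // false: entry 0 starts out empty
        System.out.println(cache.access(4));     // true: same 32-byte block as address 0
        System.out.println(cache.access(65536)); // false: same entry, different tag
        System.out.println(cache.access(0));     // false: the first block was wiped out
    }
}
```

Addresses 0 and 65536 compete for entry 0, so alternating between them misses every time even though the rest of the cache is empty.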
Cache Example
Consider that the CPU wants to fetch data from location 36 decimal (or 00000024 in hex)
Tag = 0000 0000 0000 0000
Line = 0000 0000 001
Word = 001
Byte = 00