Post on 04-Feb-2016
Technische Universiteit Eindhoven
‘Nothing is built on stone; all is built on sand, but we must build as if the sand were stone.’
Jorge Luis Borges (Argentine writer 1899-1986)
Department of Electrical Engineering, Electronic Systems
Modeling of Architectures
Platform-based Design 5KK70
Henk Corporaal, Bart Mesman, Hamed Fatemi
2010
Platform-based Design 5KK70 Electronic Systems
Outline
• We will look at models for area, delay and energy
• Processor structure
• Register files
  • Register cell
  • Model (area, power, delay)
  • Details for several register file configurations
• Apply this to the Imagine architecture
  • Stream register file
  • Network
Processor
• Single processor
  • Instruction Memory (IM)
  • Controller
  • Processing Element (PE)
    • Register File (RF)
    • ALU
    • Data Memory (DM)
• SIMD
  • Multiple PEs
• VLIW
  • Multiple ALUs
• Multi-processor
  • Several processors
  • Connected by a bus or network
[Figure: block diagram with IM, Controller and a PE (RF, ALU, DM), connected to a Network]
Register File (RF) Area model
• Assume:
  • p = number of ports
  • For a large RF, the row decoder is small compared to the cell area
  • 1-bit area = w × h (tracks)
[Figure: schematic of one register cell]
$A_{\text{cell}}(p) = (w+p)(h+p)$
If p is large: $A_{\text{1-bit}}(p) \propto p^2$
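The cell area model can be evaluated numerically to see how quickly the port count takes over. This is a minimal Python sketch, assuming a hypothetical base cell of w = h = 4 tracks; the function name `cell_area` is illustrative and not part of the course material.

```python
# Sketch of the register-cell area model A_cell(p) = (w + p)(h + p), in tracks^2.
# The base cell size w = h = 4 tracks is a hypothetical example value.

def cell_area(p: int, w: float = 4.0, h: float = 4.0) -> float:
    """Area of one 1-bit register cell with p ports.

    Model assumption: each port adds about one wire track to both the
    width and the height of the cell.
    """
    return (w + p) * (h + p)


if __name__ == "__main__":
    for p in (1, 3, 6, 12, 24, 48):
        a = cell_area(p)
        # The ratio A_cell / p^2 tends to 1 as p grows, i.e. A_cell ~ p^2.
        print(f"p={p:3d}  A_cell={a:8.1f}  A_cell/p^2={a / p**2:6.2f}")
```

For small p the fixed w × h term dominates; once p is several times larger than w and h, the area is essentially quadratic in the number of ports.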
Register file (RF) Delay model
$d \propto (w+p)(bR)^{1/2}$; for large $p$: $d \propto p\,(bR)^{1/2}$
Delay (d):
• Wire propagation delay
• Fan-in/fan-out delay
• Wire propagation dominates the delay when the number of ports is large
• R = number of registers
Register file: square layout assumed, R registers of b bits
Note: for N FUs (ALUs), p ~ 3N and R ~ N, so $d \propto N^{3/2}$
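The $N^{3/2}$ scaling can be checked by plugging p = 3N and R = rN into the delay model. A minimal Python sketch, assuming hypothetical values w = 4 tracks, b = 32 bits and r = 16 registers per ALU; the function names are illustrative, and only the growth rate (not the absolute delay) is meaningful.

```python
import math

# Sketch of the RF delay model d ~ (w + p) * (b * R)^(1/2) for a central RF
# serving N ALUs (p = 3N ports, R = r*N registers of b bits).
# w, b and r are hypothetical example values; only the growth with N matters.

def rf_delay(p: int, R: int, b: int = 32, w: float = 4.0) -> float:
    """Relative wire-propagation delay of a square-layout RF (arbitrary units)."""
    return (w + p) * math.sqrt(b * R)


def central_rf_delay(n_alus: int, r: int = 16) -> float:
    """Delay of one shared RF with 2 read ports + 1 write port per ALU."""
    return rf_delay(p=3 * n_alus, R=r * n_alus)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32, 64):
        # Once N is large, doubling N multiplies d by roughly 2^(3/2) ~ 2.83.
        growth = central_rf_delay(2 * n) / central_rf_delay(n)
        print(f"N={n:2d} -> {2 * n:3d}: delay grows by x{growth:.2f}")
```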
Register file (RF) Power model
• Power (P):
  • Proportional to the capacitance that must be switched for each access
  • In each access, every bit line and one word line are switched; the model counts the bit-line capacitance
  • Each port drives $(bR)^{1/2}$ bit lines
  • Each bit line has length $(h+p)(bR)^{1/2}$
$P_{\text{1 port}} = bR\,(h+p)\,C_w$, where $C_w$ is the wire capacitance per track
If p is large, power is dominated by wire capacitance:
$P_{p\ \text{ports}} \propto p^2 R$
Note: for N FUs (ALUs), p ~ 3N and R ~ N, so $P \propto N^3$
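The power model follows directly from the bit-line count and length above. A minimal Python sketch, assuming that all p ports are accessed every cycle (so the total is p times the single-port power) and using hypothetical values h = 4 tracks, b = 32, r = 16 and C_w = 1; the function names are illustrative.

```python
# Sketch of the RF power model. One port switches (bR)^(1/2) bit lines of
# length (h + p)(bR)^(1/2) tracks, so P_1port = b*R*(h + p)*C_w.
# Assumption: all p ports are accessed every cycle, so P = p * P_1port ~ p^2 * R.
# h, b, r and C_w are hypothetical example values.

def power_one_port(p: int, R: int, b: int = 32, h: float = 4.0, c_w: float = 1.0) -> float:
    bitlines = (b * R) ** 0.5              # bit lines driven by one port
    length = (h + p) * (b * R) ** 0.5      # length of each bit line, in tracks
    return bitlines * length * c_w         # equals b*R*(h + p)*C_w


def power_all_ports(p: int, R: int, **kw) -> float:
    return p * power_one_port(p, R, **kw)


def central_rf_power(n_alus: int, r: int = 16) -> float:
    """Power of one shared RF with p = 3N ports and R = r*N registers."""
    return power_all_ports(p=3 * n_alus, R=r * n_alus)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32, 64):
        # P / N^3 levels off for large N, i.e. P ~ N^3.
        print(f"N={n:2d}  P/N^3={central_rf_power(n) / n**3:10.1f}")
```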
Register File organization
• Processor with one level of registers:
  • Central (shared) register file
  • DRF (distributed register file)
[Figure: ALU 1 to ALU N served by one central RF, and by one distributed RF per ALU]
Comparing Area model of Central and Distributed RF
Central (shared) RF:
• 2 read ports and 1 write port per ALU
• R = rN: number of registers of b bits
• r: number of registers per ALU
• N: number of ALUs
$A = rNb\,(3N+w)(3N+h) \propto N^3$
DRF:
• Only 2 ports per RF: one read, one write
• The register files alone would give $A \sim N$
• The area of the switch (the 2N × N crossbar) grows as $N^2$, so overall $A \propto N^2$
[Figure: square layout & organization of the DRF, including the 2N × N crossbar]
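The two area models can be compared side by side. A minimal Python sketch, assuming hypothetical values r = 16, b = 32 and w = h = 4 tracks, and assuming the 2N × N crossbar is laid out as a grid of 2N·b by N·b wire tracks (b-bit buses); that crossbar layout is an illustrative assumption, not a figure from the slides.

```python
# Sketch comparing the area models of a central RF and a DRF for N ALUs.
# Central: A = r*N*b*(3N + w)(3N + h), i.e. ~N^3.
# DRF: N two-port RFs (~N) plus a 2N x N crossbar. Assumption: the crossbar
# is a grid of 2N*b by N*b wire tracks (b-bit buses), i.e. ~N^2.
# r, b, w, h are hypothetical example values.

def central_rf_area(n: int, r: int = 16, b: int = 32, w: float = 4.0, h: float = 4.0) -> float:
    return r * n * b * (3 * n + w) * (3 * n + h)


def drf_area(n: int, r: int = 16, b: int = 32, w: float = 4.0, h: float = 4.0) -> float:
    register_files = n * r * b * (2 + w) * (2 + h)   # N RFs of fixed size: ~N
    crossbar = (2 * n * b) * (n * b)                  # 2N x N switch: ~N^2
    return register_files + crossbar


if __name__ == "__main__":
    for n in (2, 4, 8, 16, 32, 64):
        # The ratio grows roughly linearly with N for large N (N^3 vs N^2).
        ratio = central_rf_area(n) / drf_area(n)
        print(f"N={n:2d}  central/DRF area = {ratio:7.2f}")
```

With these constants the fixed-size register files dominate the DRF area for small N, and the quadratic crossbar term takes over as N grows.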
Delay and Power models of central versus distributed RF
Assume N ALUs
• Central RF:
  • #registers R = rN
  • #ports p = 3N
  • For large N: $d \propto N^{3/2}$, $P \propto N^3$
• DRF:
  • Constant #registers per ALU
  • #ports p = 2 (also constant!)
  • Each DRF has a fixed delay and power
  • Wire propagation determines delay and power (for large N)
  • For large N: $d \propto N$, $P \propto N^2$
Register File
Register (memory) storage and communication between ALUs are critical for area, energy and performance in media processors.
Hierarchical register storage
2-levels register files (Hierarchical)
Central:
• RF1 serves the ALUs, while RF2 is used to cover the memory latency
• The overall area trend is the same as with a one-level RF
[Figure: ALU 1 to ALU N served by a central RF1 (level 1), backed by RF2 (level 2)]
DRF:
[Figure: ALU 1 to ALU N with distributed RF1 (level 1), backed by RF2 (level 2)]
Register Files
• Processor with stream register files:
  • Replace each port into the memory staging RF with a stream buffer
  • All stream buffers share a single port into the memory staging RF, allowing that single physical port to act as many logical ports
Central:
[Figure: central organization, ALU 1 to ALU N]
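How one physical port can stand in for many logical ports can be illustrated with a toy cycle-level model. This Python sketch rests on hypothetical assumptions: each ALU reads one word per cycle from its own stream buffer, the shared port writes a block of `block_words` words into one buffer per cycle in round-robin order, and buffers are unbounded. The function `simulate` is illustrative, not taken from the Imagine design.

```python
from collections import deque


def simulate(n_buffers: int, block_words: int, cycles: int) -> int:
    """Count ALU stall cycles after warm-up in a toy stream-buffer model.

    Hypothetical model: buffer i feeds ALU i, which consumes one word per
    cycle. A single physical port into the memory staging RF writes
    `block_words` words into one buffer per cycle, visiting buffers round-robin.
    """
    buffers = [deque() for _ in range(n_buffers)]
    next_word = 0
    stalls = 0
    for cycle in range(cycles):
        # The one shared port serves a single buffer this cycle.
        target = buffers[cycle % n_buffers]
        for _ in range(block_words):
            target.append(next_word)
            next_word += 1
        # Each ALU reads one word from its own buffer (its "logical port").
        for buf in buffers:
            if buf:
                buf.popleft()
            elif cycle >= n_buffers:  # ignore the initial fill (warm-up)
                stalls += 1
    return stalls


if __name__ == "__main__":
    n = 8
    for block in (2, 4, 8):
        print(f"N={n}, {block} words per port access: {simulate(n, block, 1000)} stalls")
```

In this model the single port keeps all N ALUs busy once each access moves at least N words, because every buffer is refilled every N cycles and consumes N words in that time; narrower accesses leave ALUs stalling.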
Register Files
DRF:
• The payoff of the transformation into a stream architecture is that we can achieve an area proportional to $N^2$, since RF2 (memory staging) needs only 1 port. We also have to add the area of the stream buffers, which grows as $N^2$ with a very small constant.
[Figure: DRF organization, ALU 1 to ALU N]
Results
[Figure: area per ALU (normalized to 1 ALU)]
Results
[Figure: local delay]
Results
[Figure: power overhead]
Imagine Architecture
[Figure: die photo of Imagine; cell placement of Imagine]
Imagine Floorplan
• 22 million transistors
• 500 MHz
• Area, Energy, Delay models
• Clusters, Micro-controller, SRF, Network Interface
[Figure: Imagine floorplan, 7.6 mm by 7.8 mm: micro-controller, ALU clusters 0 to 7, SRF, memory system, stream controller, network interface]
Stream Register File
Network:
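• The area of the network grows (like the DRF switch) as $A_{\text{comm}} \propto C^2$, where C = number of clusters
$A_{\text{total}} = C\,A_{\text{SRF}} + A_{\text{micro}} + C\,A_{\text{cluster}} + A_{\text{comm}}$
$E_{\text{total}}$ is broken down into the same components: $E_{\text{SRF}}$, $E_{\text{micro}}$, $E_{\text{cluster}}$ and $E_{\text{comm}}$
More details in Khailany et al. [2003]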
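The area decomposition shows a trade-off between amortizing shared hardware and paying for the network. A minimal Python sketch of the formula above, in which every component area (A_SRF, A_micro, A_cluster and the network constant `k_comm`) is a hypothetical placeholder in arbitrary units, not a measured Imagine value.

```python
# Sketch of the area decomposition A_total = C*A_SRF + A_micro + C*A_cluster + A_comm
# with A_comm ~ C^2. All component areas are hypothetical placeholders
# (arbitrary units), not measured Imagine values.

def total_area(c: int, a_srf: float = 1.0, a_micro: float = 4.0,
               a_cluster: float = 6.0, k_comm: float = 0.05) -> float:
    a_comm = k_comm * c * c                   # inter-cluster network: ~C^2
    return c * a_srf + a_micro + c * a_cluster + a_comm


def area_per_cluster(c: int, **kw) -> float:
    # The shared micro-controller is amortized over C clusters (~1/C),
    # while the network's share per cluster grows (~C).
    return total_area(c, **kw) / c


if __name__ == "__main__":
    for c in (1, 2, 4, 8, 16, 32, 64, 128):
        print(f"C={c:3d}  area per cluster = {area_per_cluster(c):6.2f}")
```

With these made-up constants the area per cluster is smallest near $C = \sqrt{A_{\text{micro}}/k_{\text{comm}}} \approx 9$; the actual trade-off for Imagine-style stream processors is explored in Khailany et al. [2003].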
Exploration
Intra-cluster scaling
Exploration
Inter-cluster scaling
end
• More details:
  • Scott Rixner, William J. Dally, Brucek Khailany, Peter Mattson, Ujval J. Kapasi, and John D. Owens. Register Organization for Media Processing. In Proceedings of the 6th International Symposium on High-Performance Computer Architecture (HPCA), pages 375–386, Toulouse, France, January 2000. IEEE Computer Society.
  • Brucek Khailany, William J. Dally, Scott Rixner, Ujval J. Kapasi, John D. Owens, and Brian Towles. Exploring the VLSI Scalability of Stream Processors. In Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA), pages 153–164, Anaheim, California, USA, February 2003. IEEE Computer Society.