Metrics for Reconfigurable Architectures Characterization: Remanence and Scalability
Pascal BENOIT – G. Sassatelli – L. Torres – D. Demigny – M. Robert – G.
Outline
• Context
• Remanence
• Operative Density
• Case Study: the Systolic Ring
• Conclusion and perspectives
Context: SoC and Customizable Platform-Based Design
• Specifications: processing power, area, power consumption, etc.
• Candidate blocks: ASIC 1, ASIC 2, DSP, Reconfigurable Hardware (Coarse Grain), Reconfigurable Hardware (Fine Grain)
• We need metrics to compare!
[Figure: SoC platform with controller, CPU, RAM, ROM, Flash and an open slot marked "?"]
Context: Architecture characterization
• Processing power
• Power consumption
• Flexibility
• Parallelism potential
• Dynamism
• Silicon area
• Scalability
• …
Metrics
• DeHon criterion
• Remanence
• Operative density
Generalisation to the architectural model: characterisation and metrics depend on architectural parameters
« Comparing architectures with a minimum of criteria »
Remanence Definition
• NPE: # of processing elements (PE)
• Nc: # of PEs configurable per cycle
• Fe: operating frequency
• Fc: configuration frequency
$$R = \frac{N_{PE} \cdot F_e}{N_c \cdot F_c}$$
Characterizes the dynamism:
• # of cycles to (re)configure the whole architecture
• Amount of data to compute between 2 configurations
[Figure: architectural model made of a sequencing unit (sequencer issuing inst0 … instn), a configuration memory, routing (interconnection) and processing elements (PEs), annotated with the frequencies Fe and Fc]
Remanence Comparisons
• Only 1 cycle to (re)configure the DSP
• Few cycles to (re)configure coarse-grain RAs (up to 8)
• Many cycles to (re)configure the fine-grain RA

Name            Type              NPE    Nc     F (MHz)   R
ARDOISE         Fine Grain RA     2304   0.14   33        16457
Systolic Ring   Coarse Grain RA   24     4      200       6
DART            Coarse Grain RA   24     4      130       6
MorphoSys       Coarse Grain RA   128    16     100       8
TMS320C62       DSP VLIW          8      8      300       1

$$R = \frac{N_{PE} \cdot F_e}{N_c \cdot F_c}$$
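The remanence values in the comparison can be recomputed directly from the definition. The following is a minimal sketch, assuming that the operating and configuration frequencies are equal (Fe = Fc = F) for every architecture listed, which is what makes the tabulated R values come out; the function and variable names are illustrative only.

```python
def remanence(n_pe, n_c, f_e, f_c):
    """R = (N_PE * F_e) / (N_c * F_c): cycles needed to (re)configure everything."""
    return (n_pe * f_e) / (n_c * f_c)

# (name, N_PE, N_c, F in MHz), values copied from the comparison table.
# Assumption of this sketch: F_e = F_c = F for each architecture.
architectures = [
    ("ARDOISE", 2304, 0.14, 33),
    ("Systolic Ring", 24, 4, 200),
    ("DART", 24, 4, 130),
    ("MorphoSys", 128, 16, 100),
    ("TMS320C62", 8, 8, 300),
]

results = {name: remanence(n_pe, n_c, f, f) for name, n_pe, n_c, f in architectures}

for name, r in results.items():
    print(f"{name:14s} R = {r:.0f}")
```

With equal frequencies the ratio reduces to NPE / Nc, so the DSP (all 8 units configured every cycle) sits at R = 1 while the fine-grain fabric needs thousands of cycles.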
Operative Density Definition
• NPE: # of PEs
• A: core area (relative unit λ²)
• Area can be expressed as a function of NPE (architectural model)
$$OD(N_{PE}) = \frac{N_{PE}}{A(N_{PE})}$$
Characterizes:
• Fixed NPE: # of operators per relative area unit
• Variable NPE: OD as a function of NPE
$$A(N_{PE}) = N_{PE} \cdot A_{PE} + A_{interconnect}(N_{PE}) + A_{memory}(N_{PE}) + A_{sequencer}(N_{PE})$$
• OD(NPE) = k (constant) ⇔ A(NPE) = NPE / k, i.e. the area grows linearly with NPE ⇒ the architectural model is scalable
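To make the scalability criterion concrete, the sketch below evaluates OD(NPE) for two area models built from the four terms of A(NPE). All coefficients are hypothetical and chosen only for illustration; they are not taken from any of the architectures discussed here. In the first model every term grows linearly with NPE, so OD stays constant; in the second the interconnect term grows quadratically, so OD drops as PEs are added.

```python
def area(n_pe, a_pe, interconnect, memory, sequencer):
    """A(N_PE) = N_PE*A_PE + A_interconnect + A_memory + A_sequencer."""
    return n_pe * a_pe + interconnect(n_pe) + memory(n_pe) + sequencer(n_pe)

def operative_density(n_pe, area_of):
    """OD(N_PE) = N_PE / A(N_PE)."""
    return n_pe / area_of(n_pe)

A_PE = 10.0  # hypothetical PE area, in relative units

def linear_model(n_pe):
    # Hypothetical: every overhead term is proportional to N_PE.
    return area(n_pe, A_PE, lambda n: 2.0 * n, lambda n: 1.0 * n, lambda n: 0.5 * n)

def quadratic_interconnect_model(n_pe):
    # Hypothetical: interconnect cost grows with N_PE squared.
    return area(n_pe, A_PE, lambda n: 0.1 * n * n, lambda n: 1.0 * n, lambda n: 0.5 * n)

sizes = [8, 16, 32, 64, 128]
od_linear = [operative_density(n, linear_model) for n in sizes]
od_quadratic = [operative_density(n, quadratic_interconnect_model) for n in sizes]

for n, a, b in zip(sizes, od_linear, od_quadratic):
    print(f"N_PE={n:4d}  OD linear={a:.4f}  OD quadratic interconnect={b:.4f}")
```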
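Operative Density Comparisons
• DSP: sequencer area
• ARDOISE: fine granularity
• Coarse-granularity reconfigurable architectures
  • Scalability of interconnect resources?
  • Generalization to architectural models
$$A(\lambda^2) = \frac{A(\mu m^2)}{\left(W(\mu m)/2\right)^2}$$
where W is the technology feature size (λ = W/2); areas are reported in Mλ².
$$OD(N_{PE}) = \frac{N_{PE}}{A(N_{PE})}$$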
Name                             Type              NPE   Area (Mλ²)   OD(NPE)
ARDOISE                          Fine Grain RA     26    12300        0.2
Systolic Ring (S=1, C=6, N=2)    Coarse Grain RA   24    500          4.8
Systolic Ring (S=1, C=16, N=4)   Coarse Grain RA   128   7600         1.7
DART                             Coarse Grain RA   24    300          8.0
MorphoSys                        Coarse Grain RA   128   5500         2.3
TMS320C62                        DSP VLIW          8     12300        0.1
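The tabulated densities can be cross-checked against OD = NPE / A. A minimal sketch follows; the factor of 100 is an assumption inferred from the numbers (the listed OD values match 100 × NPE / A rounded to one decimal), not something the table states.

```python
# (name, N_PE, area in the table's relative unit, OD as listed)
rows = [
    ("ARDOISE", 26, 12300, 0.2),
    ("Systolic Ring (S=1, C=6, N=2)", 24, 500, 4.8),
    ("Systolic Ring (S=1, C=16, N=4)", 128, 7600, 1.7),
    ("DART", 24, 300, 8.0),
    ("MorphoSys", 128, 5500, 2.3),
    ("TMS320C62", 8, 12300, 0.1),
]

SCALE = 100  # assumption: scaling inferred from the listed values

def operative_density(n_pe, area):
    """OD = N_PE / A."""
    return n_pe / area

computed = {name: SCALE * operative_density(n_pe, a) for name, n_pe, a, _ in rows}

for name, n_pe, a, listed in rows:
    print(f"{name:32s} computed {computed[name]:5.2f}   listed {listed}")
```

Under that reading, DART and the small Systolic Ring are the densest designs, while the DSP and the fine-grain fabric sit more than an order of magnitude lower.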
Architectural Model Characterization
A Case Study: The Systolic Ring
Architectural model characterization: the Systolic Ring
• Architectural model based on a coarse-grained configurable PE, the Dnode
[Figure: Dnode with a register file, an ALU + MULT unit and two inputs, IN 1 and IN 2]
• Circular datapaths
[Figure: ring of 8 Dnodes and 4 switches]
• 3 parameters
  • C: # of layers
  • N: # of Dnodes per layer
[Figure: ring with layers 1 to 4; # of layers: 4 (C = 4), # of Dnodes per layer: 2 (N = 2)]
• Larger instance of the same model
[Figure: ring with layers 1 to 8; # of layers: 8 (C = 8), # of Dnodes per layer: 2 (N = 2)]
• Third parameter, S: # of rings
[Figure: 1 Systolic Ring (S = 1) with layers 1 to 8; # of layers: 8 (C = 8), # of Dnodes per layer: 2 (N = 2)]
• Several rings can be instantiated and connected through a global bus
[Figure: 4 Systolic Rings (S = 4) on a global bus; # of layers: 4 (C = 4), # of Dnodes per layer: 2 (N = 2)]
• Control units: local Dnode units
[Figure: 4 Systolic Rings on a global bus, with a Dnode Sequencer]
• Control units: local Ring unit
[Figure: 4 Systolic Rings on a global bus, with four Local Ring Sequencers]
• Control units: global unit
[Figure: 4 Systolic Rings on a global bus, with a Global Sequencer and four Local Ring Sequencers]
Architectural model characterization: remanence
• Only one Systolic Ring: S = 1, so NPE = # of Dnodes = N·C·S = N·C
• Remanence formalisation, with k = C/N:
$$R(N_{PE}) = \sqrt{k \cdot N_{PE}}$$
[Figure: remanence (0 to 40) versus # Dnodes (0 to 180), curves for k = 1, 2, 4, 8]
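The remanence of a single ring can be evaluated numerically. This is a minimal sketch, assuming the closed form R(NPE) = sqrt(k · NPE) with k = C/N and NPE = N·C. Substituting those definitions gives R = C, which matches the general definition R = NPE·Fe / (Nc·Fc) if one layer of N Dnodes is configured per cycle and Fe = Fc; that interpretation is an assumption of this sketch, not something the slides state. The function name is illustrative.

```python
import math

def ring_remanence(n_layers, n_per_layer):
    """Remanence of a single ring (S = 1), using R = sqrt(k * N_PE)."""
    n_pe = n_layers * n_per_layer   # N_PE = N * C
    k = n_layers / n_per_layer      # k = C / N
    return math.sqrt(k * n_pe)

# (C, N) pairs; (6, 4) reproduces N_PE = 24 and R = 6 from the comparison table.
configs = [(4, 2), (8, 2), (6, 4)]
values = {(c, n): ring_remanence(c, n) for c, n in configs}

for (c, n), r in values.items():
    print(f"C={c}, N={n}: N_PE={c * n}, k={c / n}, R={r:.1f}")
```

For a fixed number of Dnodes, a larger k (more layers, fewer Dnodes per layer) therefore means a higher remanence, which is the ordering of the k = 1, 2, 4, 8 curves.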
Architectural model characterization: A(NPE) formalisation for OD(NPE)
• 0.18 µm CMOS technology
  • C = 4, N = 2, S = 1
  • A(8) = 3.3 mm²
  • A(8) = 407 Mλ²
• Area formalisation
  • A(NPE) = f(N, C, S): depends on the C/N ratio and on S
  • NPE = N·C·S
• Area formalisation calibrated on these results
[Figure: Systolic Ring layout (C = 4, N = 2, S = 1) showing switches 1 to 4, Dnodes N1,1 to N4,2 and blocks BN1 to BN4]
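The two area figures for the 8-Dnode ring are consistent with a technology-independent unit λ², where λ is half the feature size. The sketch below assumes exactly that convention (λ = 0.18 µm / 2 = 0.09 µm); the helper name is illustrative.

```python
def area_in_lambda2(area_um2, feature_um):
    """Convert a physical area in um^2 to lambda^2, with lambda = feature size / 2."""
    lam = feature_um / 2.0
    return area_um2 / (lam * lam)

area_um2 = 3.3 * 1e6                                    # 3.3 mm^2 expressed in um^2
a_mlambda2 = area_in_lambda2(area_um2, 0.18) / 1e6      # in millions of lambda^2
od_8 = 8 / a_mlambda2                                   # OD for N_PE = 8

print(f"A(8) = {a_mlambda2:.0f} M lambda^2")
print(f"OD(8) = {od_8:.4f} Dnodes per M lambda^2")
```

The resulting density, close to 0.02, falls within the 0 to 0.040 range used on the operative density plots.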
Architectural model characterization: OD(NPE)
• OD(NPE) for 1 Systolic Ring (S = 1)
  • k = C/N in [0.25 ; 4]
  • OD(NPE) decreases as NPE grows
[Figure: operative density (0 to 0.040) versus # Dnodes (0 to 200), curves for C/N = 4, 0.5 and 0.25]
• OD(NPE) for several Systolic Rings
  • k = C/N = 4
  • Multi-ring instantiations increase scalability
[Figure: operative density (0 to 0.045) versus # Dnodes (0 to 200), curves for 1, 2, 4 and 8 Systolic Rings]
Architectural model characterization: customisation and design technique
• Between 60 and 80 processing elements
[Figure: operative density (0 to 0.040) and remanence (0 to 20) versus # Dnodes (0 to 140), curves for S = 1, 2, 4, 8]
Design space
• Best OD and remanence: worst interconnect resources and processing power
• Worst OD and remanence: best interconnect resources and processing power
R and OD can be integrated into CAD tools to observe the effects of the architectural parameters and to choose the best trade-offs in the design space.
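As an illustration of what such tool support might look like, the sketch below enumerates (N, C, S) configurations, keeps those with 60 to 80 processing elements, and reports R and OD for each. Everything quantitative in it is hypothetical: the area model and its coefficients are invented for the example and are not the calibrated model of the case study, and R = C assumes that one layer of every ring is configured per cycle with Fe = Fc.

```python
from itertools import product

def area_model(n, c, s):
    """Hypothetical area model in relative units (NOT the calibrated one)."""
    n_pe = n * c * s
    dnodes = 40.0 * n_pe              # assumed PE area, linear in N_PE
    switches = 6.0 * s * c * n * n    # assumed switch cost per layer, grows with N^2
    bus = 20.0 * s * s                # assumed global-bus cost, grows with S^2
    return dnodes + switches + bus

def explore(n_values, c_values, s_values, min_pe, max_pe):
    """List the design points whose N_PE = N*C*S lies in [min_pe, max_pe]."""
    points = []
    for n, c, s in product(n_values, c_values, s_values):
        n_pe = n * c * s
        if min_pe <= n_pe <= max_pe:
            od = n_pe / area_model(n, c, s)
            r = c  # assumption: N Dnodes per ring configured each cycle, Fe = Fc
            points.append({"N": n, "C": c, "S": s, "N_PE": n_pe, "R": r, "OD": od})
    # Highest operative density first.
    return sorted(points, key=lambda p: p["OD"], reverse=True)

candidates = explore([2, 4, 8], [2, 4, 8, 16], [1, 2, 4, 8], 60, 80)

for p in candidates:
    print(f"N={p['N']} C={p['C']:2d} S={p['S']} N_PE={p['N_PE']} "
          f"R={p['R']:2d} OD={p['OD']:.4f}")
```

Swapping in a calibrated A(N, C, S) and the actual configuration scheme would turn the same loop into a simple trade-off table for a given specification.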
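Conclusion and perspectives
• Specifications: processing power, area, power consumption, etc.
• Candidate IPs (IP 1, IP 2, IP 3, …, IP n), each characterized by its remanence and operative density (R1/OD1, R2/OD2, R3/OD3, …, Rn/ODn)
[Figure: SoC platform with controller, CPU, RAM, ROM, Flash and an open slot marked "?"]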
• Architectural models: comparisons
• Architectural model: customisation
[Figure: the open slot of the SoC platform filled with IP 3, customised with N = 4, C = 8, S = 2]
Thank You