Date posted: 20-Dec-2015
NoC: Network OR Chip?
Israel Cidon
Technion
Israel Cidon, Technion
Technion’s NoC Research:
PIs: Israel Cidon (networking), Ran Ginosar (VLSI), Idit Keidar (Dist. Systems), Avinoam Kolodny (VLSI)
Students: Evgeny Bolotin, Reuven Dobkin, Zvika Guz, Arkadiy Morgenshtein, Zigi Walter, Roman Gindin
Origins of the NoC concept
Early publications:
Guerrier and Greiner (2000) – “A generic architecture for on-chip packet-switched interconnections”
Hemani, Jantsch, Kumar, Postula, Oberg, Millberg and Lindqvist (2000) – “Network on chip: An architecture for billion transistor era”
Dally and Towles (2001) – “Route packets, not wires: on-chip interconnection networks”
Wingard (2001) – “MicroNetwork-based integration of SoCs”
Rijpkema, Goossens and Wielage (2001) – “A router architecture for networks on silicon”
De Micheli and Benini (2002) – “Networks on chip: A new paradigm for systems on chip design”
Bolotin, Cidon, Ginosar and Kolodny (2004) – “QNoC: QoS architecture and design process for network on chip”
Evolution or Paradigm Shift?
[Figure: a bus compared with a NoC; legend: computing module, network router, network link]
Architectural paradigm shift: replace wire spaghetti by an intelligent network infrastructure
Design paradigm shift: busses and signals replaced by packets
Organizational paradigm shift: create a new discipline, a new infrastructure responsibility
Characteristics of a successful paradigm shift
Addresses a critical and topical need
Enables a quantum leap in productivity and application
Resistance from legacy experts
Requires a major change of mindset and skills!
Think: Networking not Bus evolution!
Critical needs addressed by NoC
1) Efficient interconnect: delay, power, noise, scalability, reliability
2) Increase system integration productivity
3) Enable Chip Multi Processors
NoC offers Area and Power Scalability
For same performance, compare the wire area and power:
[Figures: modules laid out with spacing d, connected by a simple bus, a segmented bus, point-to-point links and a NoC]
Architecture      Wire area             Power
Simple bus        $O(n^3\sqrt{n})$      $O(n\sqrt{n})$
Segmented bus     $O(n^2\sqrt{n})$      $O(n\sqrt{n})$
Point-to-point    $O(n^2\sqrt{n})$      $O(n\sqrt{n})$
NoC               $O(n)$                $O(n)$
E. Bolotin et al., “Cost Considerations in Network on Chip”, Integration, special issue on Network on Chip, October 2004
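To make the asymptotic comparison concrete, the sketch below tabulates how wire area and power grow relative to a NoC. It assumes n is the number of modules and reads the bounds on this slide as: area $O(n^3\sqrt{n})$ for a simple bus, $O(n^2\sqrt{n})$ for a segmented bus and for point-to-point links, $O(n)$ for a NoC; power $O(n\sqrt{n})$ for the three non-NoC options and $O(n)$ for a NoC. Constant factors are ignored, so only the ratios are meaningful. The names AREA_EXPONENT, POWER_EXPONENT and growth are hypothetical.

```python
# Asymptotic wire-area and power growth, ignoring constant factors.
# Each exponent k stands for n**k, e.g. n^3 * sqrt(n) = n**3.5.
# These bounds are an assumed reading of the slide, see the text above.
AREA_EXPONENT = {"simple bus": 3.5, "segmented bus": 2.5,
                 "point-to-point": 2.5, "NoC": 1.0}
POWER_EXPONENT = {"simple bus": 1.5, "segmented bus": 1.5,
                  "point-to-point": 1.5, "NoC": 1.0}


def growth(exponents, n):
    """Return the cost of each architecture relative to the NoC for n modules."""
    noc = n ** exponents["NoC"]
    return {arch: (n ** k) / noc for arch, k in exponents.items()}


if __name__ == "__main__":
    for n in (16, 64, 256):
        area = growth(AREA_EXPONENT, n)
        power = growth(POWER_EXPONENT, n)
        for arch in AREA_EXPONENT:
            print(f"n={n:4d} {arch:15s} area x{area[arch]:12.1f} power x{power[arch]:6.1f}")
```

Under these assumptions the gap widens with n: the relative power cost of the non-NoC options grows as $\sqrt{n}$, and the relative area cost of a simple bus grows as $n^2\sqrt{n}$.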
4 Decades of Network 101
Evolved from busses and p-t-p connections
Extensive architectures, modeling and analysis research
Architecture is about optimizing network costs
Different goals and element costs => different architectures:
  Local Area Networks (LANs)
  Metropolitan Area Networks (MANs)
  System interconnect networks (SAN, InfiniBand …)
  WAN (TCP/IP, ATM…)
  Wireless networks
Cross layered design
Early architecture standardization is an optimization burden!
Local Area Networks (LANs)
Critical need: distributing operations and sharing of heterogeneous systems
Constraints: standardization
Main cost: incremental cost (NICs, wiring)
Typical optimized architecture:
  Low cost hubs/switches
  Tree like architecture
  Exploit low cost local BW
  Shared media
  Broadcast
  Host embedded NICs
System interconnect (SAN, InfiniBand)
Critical need: create a powerful specialized system from low cost units
Constraints: low latency
Main cost: total system cost per MIP
Typical architecture:
  Wormhole/cut through
  Connection based
  Over-provisioned network
  High degree/regular topology
  Specific optimizations (e.g. RDMA)
WAN (TCP/IP, ATM…)
Critical need: global application networking (collaboration, WWW, file sharing, voice)
Constraints:
  Scalability
  Heterogeneous user and application QoS requirements
Main cost: physical infrastructure (mainly long distance trunks)
Typical architecture of choice:
  Packet switching
  Irregular, small degree networks of high speed trunks
  Optimization of topology and link capacities
CAN optimization
The main cost(s)
  Total area
  Power
  Others: design time, verification and testability
The design envelope (constraints)
  Collection of designs supported by a given chip
  Convex hull of traffic requirements of all configurations
  QoS constraints
  Other requirements (e.g. design automation…)
Optimization variables
  Switching mechanism
  QoS
  Topology (incl. link capacities)
  Routing
  Flow and congestion control
  Buffering
  Application support
  …
One NoC does not fit all!
[Figure: ASIC, ASSP, FPGA and CMP placed on two axes. Flexibility: from single application to general purpose computer. Reconfiguration rate: at design time, at boot time, during run time.]
I. Cidon and K. Goossens, in “Networks on Chips”, G. De Micheli and L. Benini, Morgan Kaufmann, 2006
One NoC does not fit all!
A large solution range!
[Figure: ASIC, ASSP, FPGA and CMP placed on two axes. Flexibility: from single application to general purpose computer. Traffic unpredictability: at design time, at configuration, run time.]
I. Cidon and K. Goossens, in “Networks on Chips”, G. De Micheli and L. Benini, Morgan Kaufmann, 2006
Apply paradigm to ASIC based NoC
Main cost: power and area
Design envelope / constraints:
  Well-defined inter-module traffic
  Automatic synthesis
  Variable QoS requirement
Architecture of choice:
  Wormhole or small frame switching
  Small # of buffers, VCs, tables
  Simple QoS mechanisms (which?)
  Topology and routing optimized for cost
Example: QNoC
Quality-of-service NoC architecture for ASICs
Traffic requirements are known a priori
Overall approach:
  Wormhole switching
  QoS based on priority classes
  Small buffer/VC budget
  In-order SP XY routing
  Irregular topology
  Optimized link capacities
* E. Bolotin, I. Cidon, R. Ginosar and A. Kolodny, “QNoC: QoS architecture and design process for Network on Chip”, JSA special issue on NoC, 2004.
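A minimal sketch of dimension-ordered XY routing on a full mesh, as one plausible reading of the "in-order SP XY routing" item: a packet first travels along X until its column matches the destination, then along Y. Every packet of a flow follows the same shortest path, so flits arrive in order. The coordinate convention and the function name xy_route are assumptions; an irregular QNoC mesh would also have to handle missing routers, which this sketch does not.

```python
def xy_route(src, dst):
    """Return the list of (x, y) hops from src to dst, X dimension first."""
    x, y = src
    dx, dy = dst
    path = [(x, y)]
    while x != dx:                 # travel along X first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                 # then along Y
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 3)))
```

The hop count always equals the Manhattan distance between source and destination, which is what makes the route a shortest path on a full mesh.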
[Figure: example irregular mesh of routers (R), each labeled with its grid coordinates]
Quality-of-Service in QNoC
Multiple priority classes
  Define latency
  Preemptive
Possible ASIC classes
  Signaling
  Real Time Stream
  Read-Write
  DMA Block Transfer
Statistical guarantees
  E.g. <0.01% arrive later than required
[Figure: plot with axes N and T]
* E. Bolotin, I. Cidon, R. Ginosar and A. Kolodny, “QNoC: QoS architecture and design process for Network on Chip”, JSA special issue on NoC, 2004.
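The priority classes on this slide can be illustrated by a strict-priority output arbiter: in each cycle the link forwards a flit from the highest-priority class that has one waiting, so a higher class preempts a lower one at flit granularity. This is a sketch under assumptions: the class order (Signaling highest, Block Transfer lowest) simply follows the order listed on the slide, and PriorityArbiter and its methods are hypothetical names.

```python
from collections import deque

# Assumed order, highest priority first (taken from the order on the slide).
CLASSES = ("signaling", "real_time", "read_write", "block_transfer")


class PriorityArbiter:
    """Strict-priority arbiter with one FIFO of flits per service class."""

    def __init__(self):
        self.queues = {c: deque() for c in CLASSES}

    def enqueue(self, service_class, flit):
        self.queues[service_class].append(flit)

    def next_flit(self):
        """Return (class, flit) for the highest-priority waiting flit, or None."""
        for c in CLASSES:
            if self.queues[c]:
                return c, self.queues[c].popleft()
        return None


if __name__ == "__main__":
    arb = PriorityArbiter()
    arb.enqueue("block_transfer", "B0")
    arb.enqueue("block_transfer", "B1")
    print(arb.next_flit())          # a block-transfer flit goes first
    arb.enqueue("signaling", "S0")  # an urgent flit arrives mid-packet
    print(arb.next_flit())          # signaling preempts the remaining block flit
    print(arb.next_flit())
```

Strict priority keeps the hardware small, at the cost that a lower class can starve under sustained high-priority load; that is one reason the guarantees are statistical.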
QNoC Design Flow
Extract inter-module traffic
Place modules
Allocate link capacities
Verify QoS and cost
QNoC Design Flow
[Figure: modules attached to a mesh of routers (R)]
Optimize capacity for performance/power tradeoff
Capacity allocation is a traditional WAN optimization problem, however:
Wormhole Delay Modeling
Approximate delay analysis in wormhole networks
  Multiple Virtual-Channels
  Different link capacities
  Different communication demands
Flit interleaving delay approximation: [equation not recoverable from the transcript]
Queuing delay: [equation not recoverable from the transcript]
* I. Walter, Z. Guz, I. Cidon, R. Ginosar and A. Kolodny, “Efficient Link Capacity and QoS Design for Wormhole Network-on-Chip,” DATE 2006.
The Capacity Allocation Problem
Given:
  System topology and routing
  Each flow’s bandwidth ($f_i$) and delay bound ($T_i^{REQ}$)
Minimize total link capacity:
  $$\min \sum_{e \in E} C_e$$
Such that:
  $$\forall\ \text{link } e:\quad \sum_{i \,\mid\, e \in path(i)} f_i < C_e$$
  $$\forall\ \text{flow } i:\quad T_i \le T_i^{REQ}$$
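The per-link capacity constraint of this problem can be checked mechanically. The sketch below sums, for every link, the bandwidth of the flows routed over it, verifies that a candidate capacity assignment exceeds that load, and reports the total capacity that the optimization tries to minimize. It does not model the delay constraint, which needs the wormhole delay model. Flow names, link names and all numbers are made up for illustration.

```python
def link_loads(flows):
    """flows: {name: (bandwidth, [links on path])} -> {link: total bandwidth}."""
    loads = {}
    for bandwidth, path in flows.values():
        for link in path:
            loads[link] = loads.get(link, 0.0) + bandwidth
    return loads


def capacity_feasible(flows, capacity):
    """True if every used link has capacity strictly above its load."""
    return all(capacity.get(link, 0.0) > load
               for link, load in link_loads(flows).items())


def total_capacity(capacity):
    """Objective value: the sum of all link capacities."""
    return sum(capacity.values())


if __name__ == "__main__":
    # Hypothetical flows: bandwidth in Gbit/s and the links each one crosses.
    flows = {"cpu->mem": (2.0, ["e1", "e2"]), "dma->mem": (1.5, ["e3", "e2"])}
    capacity = {"e1": 2.5, "e2": 4.0, "e3": 2.0}
    print(link_loads(flows))
    print(capacity_feasible(flows, capacity), total_capacity(capacity))
```

The inequality is kept strict in this sketch because a link loaded to exactly its capacity would leave no slack for queuing, so no finite delay bound could hold.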
Capacity Allocation – Realistic Example
A SoC-like system with realistic traffic demands and delay requirements
“Classic” design: 41.8 Gbit/sec
Using the algorithm: 28.7 Gbit/sec
Total capacity reduced by about 30%
[Figure: link capacities before and after optimization on a mesh with nodes labeled 00 to 23]
Optimizing routing on Irregular Mesh
Goal: Minimize the total size of routing tables
[Figure: irregular-mesh routing cases “Around the Block” and “Dead End”]
E. Bolotin, I. Cidon, R. Ginosar and A. Kolodny, "Routing Table Minimization for Irregular Mesh NoCs", DATE 2007.
Saving Table Hardware
Traditional solutions - full routing tables
  Destination Based Routing - at router
  Source Routing - at sources
Solution idea: Use Reduced Tables
  Store only relevant destinations (PLA)
  Default function (“Go XY” or “Don’t turn”) + Table for deviations
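A minimal sketch of the reduced-table idea, assuming the "Go XY" default: each router stores entries only for destinations whose next hop differs from what plain XY routing would choose, and falls back to the XY rule otherwise. The port names, the coordinate convention (y grows toward "N") and the example deviation entry are hypothetical.

```python
def xy_port(here, dst):
    """Default function: output port chosen by plain XY routing."""
    (x, y), (dx, dy) = here, dst
    if dx != x:
        return "E" if dx > x else "W"
    if dy != y:
        return "N" if dy > y else "S"
    return "LOCAL"


def next_port(here, dst, deviations):
    """Look up the deviation table first, otherwise apply the default."""
    return deviations.get(dst, xy_port(here, dst))


if __name__ == "__main__":
    # Hypothetical router at (1, 1): the router to its east is missing (a hole
    # in the irregular mesh), so traffic to (3, 1) must leave northward.
    deviations = {(3, 1): "N"}
    print(next_port((1, 1), (3, 1), deviations))  # table entry overrides XY
    print(next_port((1, 1), (0, 1), deviations))  # default XY applies
```

The table grows only with the number of destinations that need a detour, not with the size of the network.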
Routing Heuristics for Irregular Mesh
[Figure: “Routing Cost Reduction in Real Applications” (systems with real applications): log of routing cost for MPEG4 and VOPD under DR, TT, XYDT, SR and SRDP]
[Figure: “Routing Cost in 12x12 NoC (many holes, high hotspot probability)” (random problem instances): routing cost in gates, from 0 to 90,000, versus hotspot number (10, 30, 50) for DR, XYDT, SR and SRDP]
Legend:
  DR: Distributed Routing (full tables)
  XYDT: X-Y Routing with Deviation Tables
  SR: Source Routing
  SRDP: Source Routing for Deviation Points
Efficient Routing Results
Scaling of Savings
[Figure: savings versus network size]
NoC for Shared Memory CMP
Constraints
  Multiple access to coherent cache
  Unpredictable traffic pattern
  QoS requirements (fetch, pre-fetch)
Main cost: CMP power / area per performance
Architecture of choice:
  Tailored for a given CMP
  In-order/adaptive routing?
  Simple QoS mechanisms?
  Regular topology? Is CMP symmetric?
  Built in support functions (multicast, search…)
[Figure: CMP with processors P0 to P7 around a distributed L2 whose banks are numbered 0 to 63]
NoC can facilitate critical transactions
[Figure: cache-coherence messages carried over the NoC between processors P0, P1 and P2 (each with an L1) and the L2 directory. First panel: 1. READ REQ, with P1 and P2 holding the line as shared. Second panel: 3. READ EXCL. REQ, 5. INVALID. ACK, 6. Read EXCL. RESP (data transfer), ending with P0 holding the line as modified.]
* E. Bolotin, Z. Guz, I. Cidon, R. Ginosar and A. Kolodny, “The Power of Priority: NoC based Distributed Cache Coherency”, NoCs 2007.
Priority NoC: Results
[Figure: “L2 Access Delay Reduction by Priority-based NoC”: delay reduction [%] for Read and Read Exclusive on apache, zeus, fft, ocean and radix; data labels as extracted: 22.6, 31.8, 19.6, 28.4, 13.5, 25.3, 18.3, 32.9, 22.3, 28.0]
[Figure: “Total Program Speedup by Priority-based NoC”: speedup [%] on apache, zeus, fft, ocean and radix; data labels as extracted: 9.4, 8.7, 9.0, 8.6, 5.0]
NoC Based FPGA Architecture
[Figure: FPGA floorplan with functional units (SERDES, CPU, DSP, PCI, DRAM, ETH I/F, D/A, A/D), routers (R), a NoC for inter-routing, configurable regions (CR) for user logic, and configurable network interfaces (CNI)]
NoC for FPGA
Design envelope / constraints
  Many ASIC like applications for a given FPGA
  Hard NoC infrastructure – efficient but inflexible
  Soft logic is reusable but has inferior performance
Average NoC cost of most demanding designs
  Hard grid links and router logic
  Total configured NoC logic used
Architecture of choice:
  Regular and uniform grid
  In-order/load balanced routing
  Hard logic for links, routers
  Soft logic for routing algorithms, headers, CNIs
  Soft NoC tuning (routing, CNI) for a given implementation
Source Toggle XY
Unlike TXY, traffic to same destination is not split
Maximum capacity similar to TXY
The route is a bitwise XOR of source and destination ID
Can be extended to weighted source toggle (WOT)
[Figure: per-destination route choice (XY or YX) for one source node on a 5x5 mesh]
XY YX XY YX XY
YX YX XY YX
XY YX XY YX XY
YX XY YX XY YX
XY YX XY YX XY
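One way to read "the route is a bitwise XOR of source and destination ID" is that a single bit, obtained by XOR-ing the low-order bits of the source and destination coordinates, selects XY or YX routing for the whole flow. This is an assumption, not the authors' stated definition. It does reproduce the checkerboard of XY/YX labels shown above if the source sits at the blank cell, taken here to be column 1 of row 1 (counting from 0). All traffic of a source-destination pair uses one route, so it is not split. The name stxy_choice is hypothetical.

```python
def stxy_choice(src, dst):
    """Pick 'XY' or 'YX' from the XOR of the coordinates' low-order bits."""
    (sx, sy), (dx, dy) = src, dst
    toggle = (sx ^ sy ^ dx ^ dy) & 1
    return "YX" if toggle else "XY"


def choice_grid(src, size=5):
    """Rows of route choices for every destination; '--' marks the source."""
    return [" ".join("--" if (x, y) == src else stxy_choice(src, (x, y))
                     for x in range(size))
            for y in range(size)]


if __name__ == "__main__":
    # (1, 1) is the assumed position of the source in the 5x5 figure.
    for row in choice_grid((1, 1)):
        print(row)
```

Because the choice depends only on the endpoint IDs, each source can compute it locally without any routing state.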
Two Hotspots
[Figure: “Maximum Capacity”: capacity (axis from 15 to 30) versus minimum distance between the hotspots (1 to 5) for XY, TXY, STXY, WTXY and WOT]
Design Envelope for various distances between the hotspots for WOT
Generic NoC Problems
Many shared problems across design spectrum, examples:
  Need for a low latency class of service
  Verification and predictability
  Power control of NoCs
  Centralized vs. distributed control
  Is a single NoC enough per chip?
    Bus examples suggest otherwise
  Hot modules slow incoming NoC traffic
    Off chip systems
    Shared memory subsystems
    Expensive functional units
NoC clogging by hot modules
HM is not a local problem
Transparent to NoC performance
[Figure: IP1, IP2 and IP3 attached to the NoC through their interfaces]
Walter, Cidon, Ginosar and Kolodny, “Access Regulation to Hot-Modules in Wormhole NoCs”, NOCS 2007.
Source Fairness
[Figure: hot module IP (HM) attached to the NoC through its interface]
No “fairness” is guaranteed since routers’ arbitration is based on local state
The further the source is from the destination, the more arbitrations its worm has to win
The HM bandwidth isn’t fairly shared
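The unfairness can be quantified with a small model. It assumes a chain of routers leading to the hot module, saturated sources, and round-robin arbitration at each router between its local source and the traffic arriving from upstream. Each arbitration halves the share of everything coming from further away, so a source's share falls geometrically with its distance. The chain topology and the function name hm_shares are assumptions for illustration.

```python
from fractions import Fraction


def hm_shares(num_sources):
    """Share of hot-module bandwidth per source; index 0 is nearest the HM.

    Every router splits its output evenly between its local source and the
    upstream link; the farthest source has no upstream competitor.
    """
    shares = []
    remaining = Fraction(1)          # bandwidth still to be divided
    for i in range(num_sources):
        if i == num_sources - 1:     # last router: only the local source
            shares.append(remaining)
        else:
            shares.append(remaining / 2)
            remaining /= 2
    return shares


if __name__ == "__main__":
    for hops, share in enumerate(hm_shares(5)):
        print(f"source {hops} routers upstream of the HM gets {share} of the bandwidth")
```

Although every individual router is locally fair, the nearest source ends up with half of the hot module's bandwidth, which is why regulation at the hot module itself is attractive.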
Hot Module Distributed Arbitration
Control is distributed or centralized
Centralized control can account for dependencies
Requests and grants are sent at high service level
Requests and grants include additional data as needed
  Requested quota, source queue size, priority, deadline, etc.
  Granted quota, scheduling of transmissions, etc.
Initial credits hide light load request-grant latency
Hot vs. non-Hot Module Traffic
[Figure: four panels: HM traffic without control, other traffic without control, HM traffic with control, other traffic with control]
Conclusions
NoC is a chip design paradigm shift
Introduces many diverse and new networking challenges
No killer NoC for all chips
Should not comply with any X-AN concept
  May include centralized mechanisms
  May involve more than one NoC/Bus mechanism
  May combine several communication methodologies
    Low latency NoC/Bus for metadata and urgent signals
Beware of early standardization and legacy barriers
Mutual benefit for VLSI-Networking collaboration
NoC: A Network AND A Chip