An introduction on the on-chip networks (NoC) Davide Zoni PhD Student email: [email protected] webpage: home.dei.polimi.it/zoni October 31st, 2013


Outline

• Introduction to Network-on-Chip
– The interconnection problem
– New challenges
– Scenario
– Cache implications
– Topologies and abstract metrics

• Router microarchitecture
– Basic blocks
– Wormhole
– Virtual channelled NoC
– Architecture optimizations
– Optimization dimensions


Some slides adapted from ...

Specific references
– Timothy M. Pinkston, University of Southern California, http://ceng.usc.edu/smart/slides/appendixE.html
– On-Chip Networks, Natalie E. Jerger and Li-Shiuan Peh
– Principles and Practices of Interconnection Networks, William J. Dally and Brian Towles
– Designing Network-on-Chip Architecture in the Nanoscale Era, José Flich, Davide Bertozzi

Other people
– Chita R. Das, Penn State NoC Research Group
– Li-Shiuan Peh, MIT
– Onur Mutlu, CMU
– Karen Bergman, Columbia
– Bill Dally, Stanford
– Rajeev Balasubramonian, Utah
– Steve Keckler, UT Austin
– Valeria Bertacco, University of Michigan


What about an interconnection network ?

Applications: low-latency, high-bandwidth, dedicated channels between logic and memory

Technology: Dedicated channels too expensive in terms of area, power and reliability


What about an interconnection network ?

An interconnection network is a programmable system that transports data between terminals

Technology: an interconnection network helps to use scarce resources efficiently

Application: managing communication can be critical to performance


What about a classification ?

Interconnection networks can be grouped into four domains, depending on the number and proximity of the devices to be connected

1 Networks on Chip (NoCs or OCNs)

Devices include: microarchitectural elements (functional units, register files), caches, directories, processors

Current/future systems: dozens to hundreds of devices

– Ex: Intel TeraFLOPS research prototype – 80 cores

– Ex: Intel Single-chip Cloud Computer – 48 cores

Proximity: millimeters


2 System/Storage Area Networks (SANs)

Multiprocessor and multicomputer systems

– Interprocessor and processor-memory interconnections

Server and data center environments

– Storage and I/O components

Hundreds to thousands of devices interconnected

– IBM Blue Gene/L supercomputer (64K nodes, each with 2 processors)

Maximum interconnect distance

– tens of meters (typical) to a few hundred meters

– Examples (standards and proprietary): InfiniBand, Myrinet, Quadrics, Advanced Switching Interconnect


3 LANs and 4 WANs

Local Area Networks (LANs)
– Interconnect autonomous computer systems
– Machine room, or throughout a building or campus
– Hundreds of devices interconnected (1,000s with bridging)
– Maximum interconnect distance: a few kilometers to a few tens of kilometers
– Example (most popular): Ethernet, with 10 Gbps over 40 km

Wide Area Networks (WANs)
– Interconnect systems distributed across the globe
– Internetworking support required
– Many millions of devices interconnected
– Maximum distance: many thousands of kilometers
– Example: ATM (asynchronous transfer mode)


Network scenario

Why networks?

A step back to the interconnections... p2p and bus

Point-to-point solution between 2 devices (baseline)
– The link controller manages the physical link, i.e. communication and flow control
– What if multiple sources are attached to the link controller?
– The link controller must arbitrate and store the data in the correct destination buffer

What about the bus?
– Used when multiple actors access a shared medium, hence arbitration is needed
– Split transactions are used to increase bus performance, as the link can be pipelined
– The bus adds the Network Interface Controller (NIC) [as an enhanced link controller]
  – Interconnection service abstraction, i.e. portability
  – Additional services, i.e. QoS, src/dest error check, end-to-end data compression

A step back to the interconnections... crossbar

– Used when there are more than 2 devices and performance is critical
– Non-blocking or reconfigurable non-blocking communication
– 2 arbitration levels: at the input ports and at the internal switch (as in the hierarchical bus)
– Flow control and routing

Issues?
– Power and latency issues, as for the bus; latency is mainly due to arbitration
– Increased complexity with the same global interconnect
– Contention issues: multiple input ports requesting the same output port (HoL blocking)

A step back to the interconnections... NoC

– As the number of devices increases, the crossbar fails due to power and latency (Orion lecture)
– Solution: insert multiple switches to reduce the number of devices connected to a single switch
– More flexibility, huge design space to explore, local links, scalable
– Issues inherited from the previous solutions: contention, arbitration, performance, power
– Additional new issues: routing, topology organization

A per-tile enhanced switch (i.e. with routing computation)

Why networks? (again)

Why NoCs, if they are so difficult to design?

– Increasing number of cores inside a single chip

– Reliability, flexibility, scalability, etc.

What about computing demands?

The energy-performance wall

Why on-chip networks?

Networks provide external connectivity from the system to the outside world
– Also, connectivity within a single computer system at many levels: I/O units, boards, chips, modules and blocks inside chips

Trends: high demand on communication bandwidth
– Increased computing power and storage capacity
– Switched networks are replacing buses

Integral part of many-core architectures
– Energy consumed by communication will exceed that of computation in future systems
– Lots of innovation needed!

Computer architects/engineers must understand interconnect problems and solutions in order to design and evaluate systems more effectively

On-chip vs off-chip

Significant research in multi-chassis interconnection networks (off-chip)
– Supercomputers and clusters of workstations
– Internet routers
– Leverage research and insight, but...

Constraints are different
– Pin-limited bandwidth
– Mix of short and long packets on-chip
– Inherent overheads of off-chip I/O transmission

New research area to meet performance, area, thermal, power and reliability needs (on-chip)

Wiring constraints and metal layer limitations
– Horizontal and vertical layout
– Short, fixed length
– Repeater insertion limits routing of wires
  – Avoid routing over dense logic
  – Impact wiring density

Some examples

Blue Gene/L
– Huge power consumption: one million watts
– Complicated network structure

Mellanox Server Blade
– Total power budget constrained by packaging and cooling costs: <= 30 W
– Network power consumption: ~10 to 15 W

IP routers
– Constrained by costs + regulatory limits
– ~200 W line card
– ~60 W interconnection network

Alpha 21364
– Packaging and cooling costs – Dell’s law: <= $25
– Router + link: ~25 W

MIT Raw CMP
– Complicated communication networks
– On-chip network consumes about 36% of total chip power

Intel SCC 48-core

Figure labels: Alpha 21364 and its thermal profile; MIT Raw CMP; CPU, system logic, IB4X

On-chip Networks

Figure: grid of PEs

On-chip Networks: outline

– Nomenclature and topology
– Cache implications
– Router microarchitecture: baseline model, optimizations
– Metrics: power, performance

On-chip Network: Where we are ...

General-purpose multi-cores
– Distributed memory (or message passing)
– Shared memory (here we are)

Shared memory multi-core

Memory Model in CMPs

Message passing
– Explicit movement of data between nodes and address spaces
– Programmers manage communication

Shared memory
– Communication occurs implicitly through loads/stores and other memory-accessing instructions
– Will focus on shared memory
– Look at optimizations for cache coherence protocols

Memory Model in CMPs

Logically
– All processors access some shared memory

Practically...
– Cache hierarchies reduce access latency to improve performance
– This requires a cache coherence protocol to maintain a coherent view in the presence of multiple shared copies
– Consistency model: the behaviour of the memory model in a multi-core environment, i.e. what is allowed and what is not allowed
– Coherence: hide the cache hierarchy from the programmer (without losing the performance improvement)

Tiled multi-core architecture with shared memory

Source: Natalie Jerger, ACACES Summer School, 2012

Intel SCC

– 2D mesh
– State-of-the-art VC routers, message passing
– 2 cores per tile
– Multiple voltage islands
  – 1 Vdd per tile
  – 1 NoC Vdd island

Source: Natalie Jerger, ACACES Summer School, 2012

Tilera 64-core

– 2D mesh
– State-of-the-art VC routers, shared memory
– Multiple networks
– Single processor per tile

NoC impacts on coherence

Coherence Protocol on Network Performance

Coherence protocol shapes the communication needed by the system

Single-writer, multiple-reader invariant

Requires:
– Data requests
– Data responses
– Coherence permissions

Suggested reading for a quick review of coherence: “A Primer on Memory Consistency and Cache Coherence”, Daniel Sorin, Mark Hill and David Wood. Morgan & Claypool Publishers, 2011.

Hardware cache coherence

Rough goal: all caches have the same data at all times
– Minimal flushing, maximum caching → best performance

Two solutions:

Broadcast-based protocol
– All processors see all requests at the same time, in the same order
– Often relies on a bus
– But can broadcast on an unordered interconnect

Directory-based protocol
– Ordering of the requests relies on a mechanism other than the bus
– Possibly better flexibility and scalability
– Possibly higher latency

Broadcast-based coherence

Source: Natalie Jerger, ACACES Summer School, 2012

Coherence Bandwidth Requirements

How much address bus bandwidth does snooping need? Coherence events are generated on...
– Misses (only in L2, not so bad)
– Dirty replacements

Some parameters:
– 2 GHz CPUs, 2 IPC
– 33% memory operations, 2% of which miss in L2
– 50% of evictions are dirty

Some results:
– (0.33 * 0.02) + (0.33 * 0.02 * 0.50) ≈ 0.01 events/insn
– 0.01 events/insn * 2 insn/cycle * 2 cycles/ns = 0.04 events/ns
– Request: 0.04 events/ns * 4 B/event = 0.16 GB/s = 160 MB/s
– Data response: 0.04 events/ns * 64 B/event = 2.56 GB/s

What about scalability?
– That’s about 2.5 GB/s ... per processor
– With 16 processors, that’s 40 GB/s!
– With 128 processors, that’s 320 GB/s!!
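The arithmetic on this slide can be reproduced with a short script. This is only a sketch: the constant and function names are made up for illustration, and the parameters are the ones listed above. Without the slide's intermediate rounding to 0.01 events/insn, the data-response figure comes out slightly lower (about 2.53 GB/s per processor instead of 2.56 GB/s).

```python
# Back-of-the-envelope snooping bandwidth, using the parameters from the slide.
# All names below are illustrative, not taken from the original deck.
CLOCK_GHZ = 2.0          # cycles per ns
IPC = 2.0                # instructions per cycle
MEM_OP_FRACTION = 0.33   # fraction of instructions that access memory
L2_MISS_RATE = 0.02      # fraction of memory operations that miss in L2
DIRTY_FRACTION = 0.50    # fraction of evictions that are dirty
REQUEST_BYTES = 4        # address-only request
DATA_BYTES = 64          # one cache line


def events_per_instruction():
    """Coherence events per instruction: L2 misses plus dirty replacements."""
    misses = MEM_OP_FRACTION * L2_MISS_RATE
    dirty_replacements = misses * DIRTY_FRACTION
    return misses + dirty_replacements


def bandwidth_gb_per_s(bytes_per_event, processors=1):
    """Aggregate bandwidth; bytes per ns is numerically equal to GB/s."""
    events_per_ns = events_per_instruction() * IPC * CLOCK_GHZ
    return events_per_ns * bytes_per_event * processors


if __name__ == "__main__":
    for n in (1, 16, 128):
        print(n, "processors:",
              round(bandwidth_gb_per_s(REQUEST_BYTES, n), 2), "GB/s requests,",
              round(bandwidth_gb_per_s(DATA_BYTES, n), 2), "GB/s data")
```

The linear growth with the processor count is the point of the slide: every processor's events land on the same shared medium.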


Scalable Cache Coherence

Two-part solution:

Part 1: bus-based interconnect
– Replace the non-scalable bandwidth substrate (bus)...
– ...with a scalable bandwidth substrate (point-to-point network, e.g. mesh)

Part 2: processor snooping bandwidth
– Interesting: most snoops result in no action
– Replace the non-scalable broadcast protocol (it spams everyone)...
– ...with a scalable directory protocol (it only spams the processors that care)

NOTE: the physical address space is statically partitioned (still shared!!)
– Can easily determine which memory module holds a given line
  – That memory module is sometimes called “home”
– Can’t easily determine which processors have the line in their caches
  – Bus-based protocol: broadcast events to all processors/caches
  – Simple and fast, but non-scalable

Scalable Cache Coherence

Source: Natalie Jerger, ACACES Summer School, 2012

Coherence Protocol Requirements

Different message types
– Unicast, multicast, broadcast

Directory protocol
– Majority of requests: unicast
– Lower bandwidth demands on the network
– More scalable due to point-to-point communication

Broadcast protocol
– Majority of requests: broadcast
– Higher bandwidth demands
– Often relies on network ordering
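To make the difference in network load concrete, the following sketch counts the request messages injected for a single write under each protocol. It is a deliberately simplified model with hypothetical function names: acknowledgements and data responses are ignored, and the directory is assumed to know the exact set of sharers.

```python
def broadcast_request_messages(num_nodes):
    """A snooping request is delivered to every other node in the system."""
    return num_nodes - 1


def directory_request_messages(sharers):
    """One unicast to the home node, then one invalidation per sharer."""
    return 1 + len(sharers)


if __name__ == "__main__":
    nodes = 64
    sharers = {3, 17}  # hypothetical nodes currently caching the line
    print("broadcast:", broadcast_request_messages(nodes))
    print("directory:", directory_request_messages(sharers))
```

In this model the broadcast cost grows with the size of the system, while the directory cost grows only with the number of nodes that actually hold the line.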


Impact of Cache Hierarchy

Sharing of injection/ejection port among cores and caches

Caches reduce average memory latency

Private caches
– Multiple L2 copies
– Data can be replicated to be close to the processor

Shared caches
– Data can only exist in one L2 bank
– Addresses striped across banks (lots of different ways to do this)
– Aside: lots of research on cache block placement, replication and migration

Caches serve as a filter for interconnect traffic
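One simple way to stripe addresses across shared L2 banks is line interleaving: consecutive cache lines map to consecutive banks. The sketch below assumes 64-byte lines and 16 banks laid out on a 4x4 mesh; these values and the function names are illustrative only, and real designs use many other mappings.

```python
LINE_BYTES = 64   # assumed cache-line size
NUM_BANKS = 16    # assumed: one L2 bank per tile
MESH_WIDTH = 4    # assumed 4x4 mesh of tiles


def home_bank(address):
    """Bank holding the line that contains `address` (line-interleaved)."""
    return (address // LINE_BYTES) % NUM_BANKS


def bank_coordinates(bank):
    """(x, y) position of a bank's tile, numbering tiles row by row."""
    return bank % MESH_WIDTH, bank // MESH_WIDTH


if __name__ == "__main__":
    for address in (0x0000, 0x0040, 0x0140, 0x0400):
        bank = home_bank(address)
        print(hex(address), "-> bank", bank, "at tile", bank_coordinates(bank))
```

With such a mapping any requester can compute the destination tile of an L1 miss locally, without a lookup.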


Private vs. Shared Caches

Private caches
– Reduce latency of L2 cache hits: keep frequently accessed data close to the processor
– Increase off-chip pressure

Shared caches
– Better use of storage
– Non-uniform L2 hit latency
– More on-chip network pressure: all L1 misses go onto the network

On-chip Network: Private L2 Cache Hit

Figure: a tile with core, L1 I/D cache, private L2 cache (tags, data, controller logic) and router, plus the memory controller

1 LD A
2 L1 miss on A
3 Private L2 hit on A

Source: Chita Das, ACACES Summer School, 2011

On-chip Network: Private L2 Cache Miss (off-chip)

Figure: a tile with core, L1 I/D cache, private L2 cache (tags, data, controller logic) and router, plus the memory controller

1 LD A
2 L1 miss on A
3 Private L2 miss on A
4 Format message to memory controller
5 Request sent off-chip
6 Data received, sent to L2

Source: Chita Das, ACACES Summer School, 2011

On-chip Network: Shared L2 Local Cache Miss (on-chip)

Figure: two tiles, each with core, L1 I/D cache, shared L2 cache bank (tags, data, controller logic) and router, plus the memory controller

1 LD A
2 L1 miss on A
3 Format request message and send it to the L2 bank that A maps to
4 Receive message and send it to L2
5 L2 hit
6 Send data to requestor
7 Receive data, send it to L1 and core

Source: Chita Das, ACACES Summer School, 2011

Network-on-Chip details


Topology nomenclature 1

• Two broad classes: direct and indirect networks

• Direct networks: every node is both a terminal and a switch
– Examples: mesh, torus, k-ary n-cubes…

• Indirect networks: the network is basically composed of switches that connect the end nodes
– Examples: MIN, crossbar, etc.

Figure: direct vs. indirect network

Source: Natalie Jerger, ACACES Summer School, 2012

Topology abstract metrics 1

• Switch degree: number of links/edges incident on a node
– Proxy for estimating cost
– Higher degree requires more links and port counts at each router

Figure labels (node degree): 2; 2, 3, 4; 4

Source: Natalie Jerger, ACACES Summer School, 2012

Topology abstract metrics 2

• Hop count: number of hops a message takes from source to destination
– Proxy for network latency
– Every node and link incurs some propagation delay, even when there is no contention

• Network diameter: largest minimum hop count in the network
– Average minimum hop count: average across all source/destination pairs

• Minimal hop count: smallest hop count connecting two nodes
– Implementation may incorporate non-minimal paths (increases the average hop count)

Figure labels: Max=4, Avg=1.77; Max=4, Avg=2.2; Max=2, Avg=1.33

Source: Natalie Jerger, ACACES Summer School, 2012
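The hop-count metrics can be computed for small topologies with a breadth-first search. The sketch below is an illustration, not part of the original slides: it assumes 9-node examples (a ring, a 3x3 mesh and a 3x3 torus), and it averages over all source/destination pairs including a node paired with itself, which is the convention that appears to reproduce the Max/Avg labels of the figure.

```python
from collections import deque
from itertools import product

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def ring(n):
    """n-node bidirectional ring as an adjacency map."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def mesh(k):
    """k x k 2D mesh: no wrap-around links at the edges."""
    return {(x, y): {(x + dx, y + dy) for dx, dy in STEPS
                     if 0 <= x + dx < k and 0 <= y + dy < k}
            for x, y in product(range(k), repeat=2)}


def torus(k):
    """k x k 2D torus: a mesh with wrap-around links."""
    return {(x, y): {((x + dx) % k, (y + dy) % k) for dx, dy in STEPS}
            for x, y in product(range(k), repeat=2)}


def min_hops(graph, source):
    """Minimum hop count from `source` to every node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def hop_metrics(graph):
    """(diameter, average minimum hop count over all ordered pairs)."""
    hops = [h for s in graph for h in min_hops(graph, s).values()]
    return max(hops), sum(hops) / len(hops)


if __name__ == "__main__":
    for name, graph in (("ring", ring(9)), ("mesh", mesh(3)), ("torus", torus(3))):
        degrees = sorted({len(neighbours) for neighbours in graph.values()})
        print(name, "degree", degrees, "max/avg hops", hop_metrics(graph))
```

The same adjacency maps also give the switch degree of each node, so both abstract metrics can be compared side by side for a candidate topology.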


Topology abstract metrics implications

• Abstract metrics are just proxies: they do not always correlate with the real metric they represent
– Example:
  • Network A with 2 hops, 5-stage pipeline, 4-cycle link traversal vs.
  • Network B with 3 hops, 1-stage pipeline, 1-cycle link traversal
  • Hop count says A is better than B
  • But A has 2 × (5 + 4) = 18 cycles of latency vs. 3 × (1 + 1) = 6 cycles for B

• Topologies typically trade off hop count and node degree

Traffic patterns

• How to stress a NoC?
– Synthetic traffic patterns
  • Uniform random: optimistic, it can make a bad network look like a good one
  • Matrix transpose
  • Many others based on probabilistic distributions and pattern selection algorithms
– Real traffic patterns
  • Real benchmarks executed on the simulated architecture
  • More accurate
  • Complete evaluation of the system performance
  • Time-consuming simulation

Is the selected traffic suitable for my application?
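The two synthetic patterns named above can be sketched as destination generators. The code assumes nodes of a square mesh numbered row by row (node id = y * width + x); the function names and this numbering are assumptions made for the example.

```python
import random


def uniform_random_destination(source, num_nodes, rng):
    """Pick any node other than the source with equal probability."""
    dest = rng.randrange(num_nodes - 1)
    return dest if dest < source else dest + 1


def transpose_destination(source, mesh_width):
    """Node at (x, y) sends to the node at (y, x) on a square mesh."""
    x, y = source % mesh_width, source // mesh_width
    return x * mesh_width + y


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so a run can be repeated
    width = 4
    for src in range(width * width):
        print(src,
              "uniform ->", uniform_random_destination(src, width * width, rng),
              "transpose ->", transpose_destination(src, width))
```

Uniform random spreads load evenly over the links, which is why it tends to flatter a network; transpose concentrates traffic on specific paths and exposes imbalance.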


Routing, Arbitration, and Switching

Routing
– Defines the “allowed” path(s) for each packet (Which paths?)
– Problems: livelock and deadlock

Arbitration
– Determines the use of paths supplied to packets (When allocated?)
– Problems: starvation

Switching
– Establishes the connection of paths for packets (How allocated?)
– Switching techniques: circuit switching, packet switching
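As an illustration of what "defining the allowed paths" means, here is a sketch of dimension-order (XY) routing on a 2D mesh. This algorithm is not named in the slides; it is used here only as one common, simple example. A packet first travels along X until the column matches, then along Y. The path is minimal, so livelock cannot occur, and forbidding Y-to-X turns is the usual argument for deadlock freedom on a mesh. The port names are arbitrary.

```python
def xy_route(src, dst):
    """Output ports taken from src to dst on a 2D mesh, X first and then Y.

    Coordinates are (x, y) tuples; E/W move along X, N/S move along Y.
    """
    (x, y), (dst_x, dst_y) = src, dst
    path = []
    while x != dst_x:
        step = 1 if dst_x > x else -1
        path.append("E" if step == 1 else "W")
        x += step
    while y != dst_y:
        step = 1 if dst_y > y else -1
        path.append("N" if step == 1 else "S")
        y += step
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
    print(xy_route((3, 2), (1, 0)))
```

In a router, only the first element of such a path is needed at each hop: that is the output port computed during the routing computation stage.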


Until now old wine in a new bottle... but for caches

Where is the difference?

Router/switch, routing algorithm, packets, flow control, deadlock, throughput, latency

On-chip network criticalities
– Low power
– Limited resources
– High performance
– High reliability
– Thermal issues

NoC granularity overview

Messages: composed of one or more packets

(NOTE: if the message size is ≤ the maximum packet size, only one packet is created)

Packets: composed of one or more flits

Flit: flow control digit

Phit: physical digit (subdivides a flit into chunks equal to the link width)

Off-chip: channel width limited by pins

On-chip: abundant wiring means phit size == flit size
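The message, packet and flit hierarchy can be sketched as a small segmentation routine. This is a simplified model under stated assumptions: header overhead is ignored, sizes are in bytes, and the function names are hypothetical. The head/body/tail labels follow the usual flit naming.

```python
def split(total, chunk):
    """Sizes of the pieces when `total` units are cut into `chunk`-sized parts."""
    full, rest = divmod(total, chunk)
    return [chunk] * full + ([rest] if rest else [])


def packetize(message_bytes, max_packet_bytes, flit_bytes):
    """Return one list of flit kinds (head/body/tail) per packet of the message."""
    packets = []
    for packet_bytes in split(message_bytes, max_packet_bytes):
        num_flits = -(-packet_bytes // flit_bytes)  # ceiling division
        if num_flits == 1:
            packets.append(["head+tail"])  # single-flit packet
        else:
            packets.append(["head"] + ["body"] * (num_flits - 2) + ["tail"])
    return packets


if __name__ == "__main__":
    # A 64-byte cache line with 16-byte flits, and a short 8-byte request.
    print(packetize(64, 64, 16))
    print(packetize(8, 64, 16))
```

A message that fits in the maximum packet size yields a single packet, matching the note above.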


NoC microarchitecture based on granularity

Message-based: allocation made at message granularity
– Circuit switching

Packet-based: allocation made to whole packets
– Store and forward (SaF): large latency and buffers required
– Virtual cut-through (VCT): improves on SaF, but still large buffers and latency

Flit-based: allocation made on a flit-by-flit basis
– Wormhole: efficient buffer utilization, low latency; suffers from head-of-line (HoL) blocking
– Virtual channels: primarily introduced to deal with deadlock, then also used against HoL blocking

Switch/Router Wormhole Microarchitecture

– Flit-based, i.e. packets are divided into flits
– Pipelined in 4 stages plus link traversal: BW, RC, SA, ST, LT
– Buffers organized on a flit basis
– Single buffer per port
– Buffer states:
  – G: idle, routing, active, waiting
  – R: output port (route)
  – C: credit count
  – P: pointers to data

Switch/Router Virtual Channel Microarchitecture

Router components

– Input buffers, route computation logic, virtual channel allocator, switch allocator, crossbar switch
– Most OCN routers are input buffered
  – Use single-ported memories
– Buffers store flits for their duration in the router
  – Contrast with a processor pipeline, which latches between stages

Basic router pipeline (canonical 5-stage pipeline, followed by link traversal)
– BW: Buffer Write
– RC: Routing Computation
– VA: Virtual Channel Allocation
– SA: Switch Allocation
– ST: Switch Traversal
– LT: Link Traversal

Router components

– Routing computation performed once per packet
– Virtual channel allocated once per packet
– Body and tail flits inherit this information from the head flit

Router performance
– Baseline (no load) delay: (5 cycles + link delay) x hops + t_serialization

How to reduce latency?
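The no-load delay can be evaluated with a small helper. This is a sketch under stated assumptions: the per-hop cost is read as router pipeline cycles plus link cycles, the serialization term is passed in directly rather than derived from packet length, and the function name is hypothetical. With zero serialization it reproduces the 18-cycle and 6-cycle figures of networks A and B from the abstract-metrics example.

```python
def no_load_latency(hops, router_cycles=5, link_cycles=1, serialization_cycles=0):
    """Zero-load latency in cycles: per-hop cost times hops, plus serialization."""
    return (router_cycles + link_cycles) * hops + serialization_cycles


if __name__ == "__main__":
    print("A:", no_load_latency(hops=2, router_cycles=5, link_cycles=4))
    print("B:", no_load_latency(hops=3, router_cycles=1, link_cycles=1))
    # Shortening the router pipeline reduces the cost paid at every hop.
    for stages in (5, 4, 3):
        print(stages, "stages:", no_load_latency(hops=4, router_cycles=stages,
                                                  serialization_cycles=3))
```

Because the router term is multiplied by the hop count, each pipeline stage removed saves one cycle per hop, which motivates the pipeline optimizations that follow.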


Pipeline optimization: lookahead router

– RC is overlapped with BW
– Precomputing the route allows flits to compete for VCs immediately after BW
– RC only decodes the route header
– The routing computation needed at the next hop can be computed in parallel with VA

Pipeline optimization: speculation

– Assume that the Virtual Channel Allocation stage will be successful
  – Valid under low to moderate loads
– Entire VA and SA in parallel
– If VA is unsuccessful (no virtual channel returned)
  – Must repeat VA/SA in the next cycle
– Prioritize non-speculative requests

Router Pipeline: module dipendencies64

Dependence between output of one module and input of another Determine critical path through router Cannot bid for switch port until routing performed

Li-Shiuan Peh and William J. Dally. 2001. A Delay Model and Speculative Architecture for Pipelined Routers

Page 65: An introduction on the on-chip networks (NoC) - Intranet … · An introduction on the on-chip networks (NoC) Davide Zoni PhD Student email: ... Designing Network-on-Chip Architecture

Something on the router implementation... (blackboard :| )

65

Page 66: An introduction on the on-chip networks (NoC) - Intranet … · An introduction on the on-chip networks (NoC) Davide Zoni PhD Student email: ... Designing Network-on-Chip Architecture

Router Pipeline: delay model

Li-Shiuan Peh and William J. Dally. 2001. A Delay Model and Speculative Architecture for Pipelined Routers


Switch/Router Flow Control

Flow control determines how network resources, such as bandwidth, buffer capacity and control state, are allocated to packets traversing the network

Resource allocation problem: from the resources' point of view

Contention resolution: from the packets' point of view

Two families: bufferless and buffered


Switch/Router Bufferless Flow Control

No buffers

Allocate channels and bandwidth to competing packets

Two modes:

Dropping flow control

Circuit-switching flow control

William Dally and Brian Towles. 2003. Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.


Bufferless Dropping Flow Control 1

Simplest flow control form

Allocate channel and bandwidth to competing packets

Packets are dropped when they collide

Collisions may or may not be signaled with ack/nack messages

William Dally and Brian Towles. 2003. Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.


Bufferless Dropping Flow Control 2

Without ack messages, the sender can only detect a drop through a timeout timer

Ack messages can reduce latency

William Dally and Brian Towles. 2003. Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.


Bufferless Circuit switching Flow Control 1

Allocates all needed resources before sending the message

When no further packets must be sent, the circuit is deallocated

The head flit arbitrates for resources; if it stalls, no resend is needed

William Dally and Brian Towles. 2003. Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.


Switch/Router Buffered Flow Control

Buffers

More flexibility, with the possibility to decouple resource allocation in steps

Two modes:

Wormhole flow control

Virtual channel flow control

William Dally and Brian Towles. 2003. Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.


Switch/Router Buffered Wormhole Flow Control

Allocate on a per flit basis

More efficient in buffer consumption

Head of Line (HOL) blocking issues

Buffered solutions allow resource allocation to be decoupled

U: upper outport, L: lower outport

Input port states (I, W, A): idle, waiting, allocated

Flits (H, B, T): head, body, tail

William Dally and Brian Towles. 2003. Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.


Switch/Router Virtual Channel Flow Control

Multiple buffers on the same input port

Need for a state on each virtual channel

More complex to manage than wormhole

Allows different flows to be managed at the same time

Solves the HoL issues

Deadlock avoidance property

William Dally and Brian Towles. 2003. Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.


Wormhole HoL issues

William Dally and Brian Towles. 2003. Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
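A small sketch of head-of-line blocking, assuming one input port, two output ports labeled "U" and "L", and packets represented as (name, output) pairs. The function names are hypothetical. With a single FIFO only the packet at the head may advance; with one FIFO per virtual channel each head is considered independently.

```python
from collections import deque

def forwardable_wormhole(fifo, blocked_outputs):
    """Single FIFO per input port: only the head packet may advance."""
    if fifo and fifo[0][1] not in blocked_outputs:
        return [fifo[0][0]]
    return []

def forwardable_vc(vcs, blocked_outputs):
    """One FIFO per virtual channel: each VC head is checked on its own."""
    return [vc[0][0] for vc in vcs if vc and vc[0][1] not in blocked_outputs]

# Packet A wants the blocked upper output, packet B the free lower output.
fifo = deque([("A", "U"), ("B", "L")])
vcs = [deque([("A", "U")]), deque([("B", "L")])]

print(forwardable_wormhole(fifo, {"U"}))  # []    : B is stuck behind A
print(forwardable_vc(vcs, {"U"}))         # ['B'] : B bypasses A
```

The sketch only lists which packets are eligible; a real router would still arbitrate among the eligible virtual channels for the single switch input.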


Buffer Management and Backpressure

How to manage buffers between neighbors (i.e. how can I know the downstream destination router buffer is full?)

Three ways:

Credit based

The upstream router keeps track of the flit slots available in the downstream router

The upstream router decrements its counter when it sends a flit; the downstream router sends a credit back when a flit leaves its buffer, and the upstream router then increments the counter

Accurate, fine-grained flow control, but a lot of messages

On/off

Threshold mechanism: a single low-overhead bit signals to the upstream router whether it has permission to send

Ack/nack

No state in the upstream node

Sends and waits for ack/nack: no net gain

Waste of bandwidth: flits are sent with no guarantee of being acknowledged


Credit-based flow control

William Dally and Brian Towles. 2003. Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
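A minimal sketch of the upstream side of credit-based flow control, assuming one counter per downstream buffer (or per virtual channel) and ignoring the latency of the credit wire. The class and method names are illustrative.

```python
class CreditCounter:
    """Upstream-side credit counter for one downstream buffer (or VC)."""

    def __init__(self, buffer_slots):
        # Free flit slots the upstream router believes are available.
        self.credits = buffer_slots

    def can_send(self):
        return self.credits > 0

    def send_flit(self):
        if not self.can_send():
            raise RuntimeError("no credits: downstream buffer may be full")
        self.credits -= 1  # one downstream slot is now occupied

    def receive_credit(self):
        self.credits += 1  # downstream freed a slot and returned a credit

up = CreditCounter(buffer_slots=2)
up.send_flit()
up.send_flit()
print(up.can_send())   # False: both downstream slots are in use
up.receive_credit()
print(up.can_send())   # True: one slot was freed
```

In a real link the credit takes several cycles to travel back, so the buffer must be deep enough to cover that round trip if the link is to stay fully utilized.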


On-off flow control

William Dally and Brian Towles. 2003. Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
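A sketch of the downstream side of on/off flow control, assuming two thresholds on the number of free slots: the signal turns off when free slots fall to `off_threshold` and turns back on when they rise to `on_threshold`. The class name, attribute names and threshold values are assumptions for illustration.

```python
class OnOffReceiver:
    """Downstream buffer that drives a single on/off bit to the upstream router."""

    def __init__(self, slots, off_threshold, on_threshold):
        assert off_threshold <= on_threshold <= slots
        self.free = slots
        self.off_threshold = off_threshold
        self.on_threshold = on_threshold
        self.signal_on = True  # upstream may send while this is True

    def _update(self):
        # The bit changes only when a threshold is crossed (hysteresis).
        if self.signal_on and self.free <= self.off_threshold:
            self.signal_on = False
        elif not self.signal_on and self.free >= self.on_threshold:
            self.signal_on = True

    def accept_flit(self):
        if self.free == 0:
            raise RuntimeError("buffer overflow")
        self.free -= 1
        self._update()

    def drain_flit(self):
        self.free += 1
        self._update()

rx = OnOffReceiver(slots=4, off_threshold=1, on_threshold=3)
for _ in range(3):
    rx.accept_flit()
print(rx.signal_on)  # False: only 1 slot left
rx.drain_flit()
print(rx.signal_on)  # False: 2 free slots, still below on_threshold
rx.drain_flit()
print(rx.signal_on)  # True: 3 free slots
```

The off threshold has to leave room for the flits already in flight when the off signal is raised, so its value depends on the link round-trip time.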


Ack-nack flow control

William Dally and Brian Towles. 2003. Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.


Evaluation metrics for NoCs

Performance

Network centric

• Latency

• Throughput

Application centric

• System throughput (Weighted Speedup)

• Application throughput (IPC)

Power/Energy: Watts/Joules, Energy Delay Product (EDP)

Fault-Tolerance: Process variation/Reliability

Thermal: Temperature
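Two of the metrics listed above have simple closed forms. The sketch below uses the common definitions: weighted speedup as the sum, over applications, of IPC when sharing the system divided by IPC when running alone, and EDP as energy multiplied by execution time. The function names and sample values are illustrative only.

```python
def weighted_speedup(ipc_shared, ipc_alone):
    """Sum over applications of IPC_shared / IPC_alone."""
    return sum(s / a for s, a in zip(ipc_shared, ipc_alone))

def energy_delay_product(energy_joules, delay_seconds):
    """EDP in joule-seconds: lower is better."""
    return energy_joules * delay_seconds

# Two applications, each running at half of its stand-alone IPC.
print(weighted_speedup([1.0, 0.5], [2.0, 1.0]))
print(energy_delay_product(2.0, 3.0))
```

A weighted speedup equal to the number of applications would mean that sharing the network causes no slowdown at all.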



Network-on-Chip power consumption

[Figure: Network power breakdown]

- Buffer power, crossbar power and link power are comparable

- Arbiter power is negligible

Source: Chita Das, ACACES summer school 2011


Bibliography 2

Dally, W. J., and B. Towles [2004]. Principles and Practices of Interconnection Networks, Morgan Kaufmann Publishers, San Francisco.

C. A. Nicopoulos, N. Vijaykrishnan, and C. R. Das, "Network-on-Chip Architectures: A Holistic Design Exploration," Lecture Notes in Electrical Engineering Book Series, Springer, October 2009.

G. De Micheli, L. Benini, "Networks on Chips: Technology and Tools," Morgan Kaufmann, 2006.

J. Duato, S. Yalamanchili, and L. Ni, Interconnection Networks: An Engineering Approach, Morgan Kaufmann, 2002.

R. Marculescu, U. Y. Ogras, L.-S. Peh, N. E. Jerger, Y. Hoskote, "Outstanding Research Problems in NoC Design: System, Microarchitecture, and Circuit Perspectives," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 28, pp. 3-21, Jan. 2009.

T. Bjerregaard and S. Mahadevan, "A survey of research and practices of network-on-chip," ACM Comput. Surv., vol. 38, no. 1, pp. 1-51, Mar. 2006.

Natalie Enright-Jerger and Li-Shiuan Peh, "On-Chip Networks," Synthesis Lecture, Morgan-Claypool Publishers, Aug. 2009.

Agarwal, A. [1991]. "Limits on interconnection network performance," IEEE Trans. on Parallel and Distributed Systems 2:4 (April), 398-412.

Dally, W. J., and B. Towles [2001]. "Route packets, not wires: On-chip interconnection networks," Proc. of the Design Automation Conference, Las Vegas (June).

Ho, R., K. W. Mai, and M. A. Horowitz [2001]. "The future of wires," Proc. of the IEEE 89:4 (April).

Hangsheng Wang, Xinping Zhu, Li-Shiuan Peh and Sharad Malik, "Orion: A Power-Performance Simulator for Interconnection Networks," in Proceedings of MICRO 35, Istanbul, November 2002.

D. Brooks, R. Dick, R. Joseph, and L. Shang, "Power, thermal, and reliability modeling in nanometer-scale microprocessors," IEEE Micro, 2007.


Thank you

Any questions?
