An NoC Architecture for Inductive Coupling Wireless ...

Post on 12-Dec-2021


An NoC Architecture for Inductive Coupling Wireless Interconnect

H. Amano

Keio University

Special thanks to Prof. Kuroda and Prof. Matsutani

Outline: Wireless 3D NoC

• 3D IC technologies

– Wired approach vs. wireless approach

– Inductive-coupling technology

• Design Examples

– MuCCRA-Cube

– Cube-1

• Simple wireless 3D NoC

– Ring-based 3D network

– Bubble flow control

• CoC (Castle of Chips)

– Large scale system by wireless links

Design cost of LSI increasing…

• System-on-Chip (SoC)

– Required components are integrated on a single chip

– A different LSI must be developed for each application

• System-in-Package (SiP) or 3D IC

– Required components are stacked for each application

By changing the chips in a package, we can provide a wider range of chip families at modest design cost.

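3D IC technology for going vertical

[Figure: 3D IC technologies arranged by wired vs. wireless and by the number of chips that can be stacked; axes labeled Scalability and Flexibility]

• Two chips (face-to-face)

– Wired: Microbump

– Wireless: Capacitive coupling

• More than three chips

– Wired: Through silicon via

– Wireless: Inductive coupling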

Inductive coupling link for 3D ICs

• Stacking after chip fabrication: only known-good dies are selected

• More than 3 chips can be stacked

• Bonding wires for power supply

• Inductor for transceiver: implemented as a square coil with metal in common CMOS

• Footprint of inductor: not a serious problem, since only metal layers are occupied

Inductive-coupling I/F: An Example

[Figure: block diagram of the interface. Labels: digital Txdata and Rxdata (8), system clock, local clock oscillator, Tx and Rx circuits, timing labels tMUX, tDEMUX and tTx-Rx, numeric labels 240 and 8. Each side has a data link (8ch) and a clock link; one clock link comes from the upper chip.]

• Phase control unit (PCU): generates Rx and Tx enable signals based on the counter value

Prof. Kuroda’s recent projects

• Non-contact memory cards with wireless data/power supply [Chung2012]

• Digital Rosetta Stone [Yuan2010]

• Non-contact wafer-level testing [Radecki2012]

Today, I focus on our joint project for developing systems using wireless inductive coupling.

MuCCRA-Cube (2008)

[Figure: chip layout with a 4×4 array of PEs and a data memory; dimension labels 5.0 mm and 2.5 mm; inductive-coupling up link and down link]

Technology: 90 nm, chip thickness: 85 um, glue: 10 um

• 4 MuCCRA chips are stacked on a PCB board

[Saito,FPL’09]

MuCCRA-Cube using inductive coupling

• MuCCRA: a dynamically reconfigurable processor

• The number of MuCCRA chips stacked in a package (SiP) can be changed

MuCCRA-Cube: Application mapping

[Figure: Chip0 and Chip1, each with a row of four memory modules (Mem) and a 4×4 array of PEs; the two chips are labeled 0 and 180]

Left side PEs have uplinks

Right side PEs have downlinks

MuCCRA-Cube: Application mapping

[Figure: an example data-flow graph mapped onto Chip0 and Chip1 (labeled 0 and 180). Inputs Data1 to Data4, DataA and DataB feed ADD, SUB, MULT, SHIFT and OR operations placed on the PEs.]

Cube-1 (2012)

• Wireless links are used as a packet switching network rather than static links

• A ring-based packet switching network is formed

• The number of accelerators can be changed

[Figure: a Geyser-Cube chip and CMA-Cube chips stacked and connected by inductive coupling]

Implementing JPEG decoder

• DMA transfer between accelerators

The flow of JPEG decoder: JPEG image → Header analysis → Huffman decode → Inverse quantization → Inverse DCT → YUV to RGB convert → RGB image

Mapping on Cube-1: Huffman decoder and task control, inverse quantization, inverse DCT, YUV to RGB convert

[Figure: decoder flow and its mapping on Cube-1; legend entries: MCU, image data]

Evaluation Results

• Running the JPEG decoder on Cube-1 Quad-Core

• Using three accelerators

• The target image: 128 x 96 pixels

• 3.15 times speedup

[Figure: stacked bar chart of execution cycles (axis from 0 to 800,000). Breakdown: image data trans, processing, dma trans, collect result, huffman decode, inverse quantization, inverse dct, store intermediate, convert yuv to rgb, other. Bar labels as extracted: Cube-1 Quad-Core, No]

Chip stacking method: Slide & stack

• Inductor has TX/RX/Idle modes (1-cycle switch)

[Figure: slide & stack arrangement showing inductors in TX and RX modes and a bonding wire on each chip]

Wireless 3D NoC

• Arbitrary chips are stacked to form a single system

– Each chip has vertical links at pre-specified locations, but we do not know the number and types of chips.

• Required chips are stacked for given applications

[Figure: a CPU chip from a CPU maker, a memory chip from a memory maker, and a GPU chip from a GPU maker stacked together]

An example (4 chips)

• Ring is the simplest approach to add, remove, and swap the nodes

[Figure: ring network formed across four stacked chips]
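The ring through a chip stack can be sketched as follows. This is only an illustration under stated assumptions: each chip is assumed to carry two routers (one on an "up" column and one on a "down" column, which matches the 8 routers for 4 chips listed in the evaluation), and the names `ring_order` and `next_hop` are hypothetical.

```python
def ring_order(num_chips):
    """Return the router visit order of a unidirectional ring.

    Assumption: every chip has one router on the 'up' column and one on the
    'down' column; the top and bottom chips close the ring.
    """
    up = [(chip, "up") for chip in range(num_chips)]
    down = [(chip, "down") for chip in reversed(range(num_chips))]
    return up + down


def next_hop(node, num_chips):
    """Return the router that follows `node` on the ring."""
    order = ring_order(num_chips)
    return order[(order.index(node) + 1) % len(order)]


if __name__ == "__main__":
    # Adding or removing a chip only changes num_chips; the rule is unchanged.
    for chips in (4, 5):
        print(chips, "chips:", ring_order(chips))
```

Because the order is derived only from the number of chips, inserting, removing, or swapping a chip does not require any change to the routing rule itself.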

Ring network: Deadlock problems

Ring is the simplest approach to add, remove, and swap the chips in a package without any modifications. But…

• Structure deadlock

– Ring network inherently includes a cycle

– Cyclic dependency causes packet deadlocks

• Protocol deadlock

– Coherence protocol has multiple message classes

– Request-reply deadlocks

Deadlock-free packet transfer is mandatory for NoCs

Ring network: VC-based approach

• VC-based approach

– Two VCs for each message class

– Packets move from one VC to the other at the dateline

• Merit

– Conventional VC router

• Demerit

– The number of VCs grows with the number of message classes

– 6 VCs for 3 classes

[Figure: ring with a dateline and 2 VCs for each message class]

The cyclic dependency is cut by the VC transition, which separates traffic before and after the dateline
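A minimal sketch of the dateline rule, assuming a unidirectional ring with routers numbered 0 to ring_size - 1 and the dateline placed on the link leaving router `dateline`. The function names and the VC numbering (two consecutive indices per message class) are illustrative assumptions, not the routers' actual implementation.

```python
def total_vcs(num_message_classes, vcs_per_class=2):
    """Number of VCs needed when every message class gets its own VC pair."""
    return num_message_classes * vcs_per_class


def vc_index(message_class, crossed_dateline):
    """VC used by a packet: first VC of its class before the dateline,
    second VC of its class after crossing it."""
    return 2 * message_class + (1 if crossed_dateline else 0)


def route_on_ring(src, dst, ring_size, dateline, message_class):
    """Walk the ring from src to dst and record (router, vc) at each step.

    The packet switches VC when it leaves the router named `dateline`.
    """
    crossed = False
    node = src
    trace = [(node, vc_index(message_class, crossed))]
    while node != dst:
        if node == dateline:
            crossed = True  # the link out of this router is the dateline
        node = (node + 1) % ring_size
        trace.append((node, vc_index(message_class, crossed)))
    return trace


if __name__ == "__main__":
    print(total_vcs(3))                    # VCs for 3 message classes
    print(route_on_ring(6, 1, 8, 7, 0))    # packet that crosses the dateline
```

Since a packet can cross the dateline at most once on its way around the ring, no chain of buffer dependencies can close into a cycle within a single VC.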

Ring network: Bubble flow approach

• Bubble flow approach

– A single buffer can store more than 2 packets

– Buffer space for a single packet is always reserved in each router

• Merit

– No VC; simple flow control

• Demerit

– Misrouting when packets cannot exit the ring

– Scalability problem

[Figure: ring with a single VC that can buffer more than 2 packets]

Deadlock does not occur because the flow control never allows all buffers to be fully occupied

[Puente,ICPP’99] [Abad,ISCA’07]
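The bubble condition can be sketched as an admission check. This is an interpretation rather than the routers' actual logic: it assumes, in line with bubble flow control as commonly described, that a packet already travelling on the ring needs room for one packet in the next buffer, while a packet entering the ring needs room for two, so that one free packet slot (the bubble) always remains. `PacketBuffer` and `can_accept` are hypothetical names.

```python
class PacketBuffer:
    """Input buffer of a ring router, sized in whole packets."""

    def __init__(self, capacity_packets):
        if capacity_packets < 2:
            raise ValueError("bubble flow needs room for at least 2 packets")
        self.capacity = capacity_packets
        self.occupied = 0

    def free_slots(self):
        return self.capacity - self.occupied


def can_accept(buffer, entering_ring):
    """Return True if a packet may advance into `buffer`.

    Assumption: in-ring packets need 1 free slot; packets entering the ring
    need 2, which keeps one slot per buffer free.
    """
    required = 2 if entering_ring else 1
    return buffer.free_slots() >= required


if __name__ == "__main__":
    buf = PacketBuffer(capacity_packets=3)
    buf.occupied = 2
    print(can_accept(buf, entering_ring=False))  # in-ring traffic may advance
    print(can_accept(buf, entering_ring=True))   # injection must wait
```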

Evaluations: Simulation environments

• Two network sizes are simulated by GEMS/Simics

[Figure: 4-chip (4-CPU) and 8-chip (8-CPU) configurations with CPUs and L2$ banks]

Table 1: Architectural parameters

Parameter        4 chips / 8 chips
# of chips       4 / 8
# of CPUs        4 / 8
# of routers     8 / 16
# of L2$ banks   16 / 32
Packet sizes     1 or 5 flits

Table 2: Software environments

OS            Sun Solaris 9
Compiler      Sun Studio 12
Application   NAS Parallel Bench (OpenMP ver)

Benchmarks: BT, CG, DC, EP, FT, IS, LU, MG, SP, UA (10 in total). For more detail, refer to the paper.

Evaluations: Simulation environments

• Two network sizes are simulated by GEMS/Simics

• Three communication schemes are compared

– Ring + VC flow (2 VCs with a dateline)

– Ring + Bubble flow

– Vertical bus

[Figure: 4-chip (4-CPU) and 8-chip (8-CPU) configurations with CPUs and L2$ banks]

Results: Network throughput @ 4 chips

• RTL simulations of wireless 3D NoC model (8 routers)

• Bubble outperforms 2VC (15-flit) and is comparable to 2VC (30-flit)

[Figure: throughput plots for Vertical bus, Ring + VC flow (2VC, 15-flit), and Ring + Bubble (15-flit)]

Results: Network throughput @ 8 chips

• RTL simulations of wireless 3D NoC model (16 routers)

• Bubble outperforms 2VC (15-flit) and is comparable to 2VC (30-flit)

[Figure: throughput plots for Vertical bus, Ring + VC flow (2VC, 15-flit), and Ring + Bubble (15-flit)]

Results: Application performance @ 4 chips

• Execution times of NAS parallel bench (4 CPUs)

• Bubble approach outperforms the VC-based one by 12.5% @ 4 chips

[Figure: execution times for Vertical bus, Ring + VC flow with 6VC (18-flit), Ring + VC flow with 6VC (30-flit), and Ring + Bubble (15-flit); the annotated difference is -12.5%]

The limitation of stacking

[Figure: slide & stack arrangement showing inductors in TX and RX modes and the bonding wires of each chip]

Castle of Chips (CoC)

• Chips with multiple wireless ports are used as bridges of stacking.

• The stacking can be extended to the horizontal direction.

→ A large number of chips can be connected only with wireless inductive coupling links.

• However, with the current state of the art, the power supply still requires bonding wires.

Examples of Wireless Coupling Links

[Figure: a) uni-directional links with a transmitter and a receiver; b) bi-directional links]

Linear Stacking: The simplest CoC

[Figure: chips in Layer 0 and Layer 1 connected by up links and down links]

Stacking using bi-directional links

[Figure: the case of using bi-directional links, with chips in Layer 0 to Layer 3]

Circular Stacking

The central space is used for power supply bonding wires.

The network consisting of CoC

• Tightly coupled interconnection is assumed between links in the same chip.

– Bus, crossbar, direct links, etc.

– Here, a chip with 4 links = a node with 4 links

[Figure: a) chip stacking with up links and down links; b) the corresponding network]

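[Figure: network formed by linear stacking, drawn on x-y axes. Boundary nodes carry (x, y) labels: (0,0), (1,1), (2,2), (3,3), (4,4), (5,5), (6,4), (7,3), (6,2), (5,1), (4,0), (3,-1), (2,-2), (1,-1). Chips lie on Level 0 to Level 4. Width: n*2+1; height: m*2+1.]

[Figure: interconnection network formed with the circular stacking; labeled nodes (0,0), (10,0), (5,-5), (5,5)]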

Stairway Boundary Mesh (SBM)

[Figure: three SBM examples, labeled with their corner nodes]

m=1, n=5: (0,0), (1,-1), (5,5), (6,4)
m=2, n=5: (0,0), (2,-2), (5,5), (7,3)
m=3, n=5: (0,0), (3,-3), (5,5), (8,2)

Extension of Dimension Order Routing

• Go to X direction

• On the boundary, go around it.

• When X is the same as the destination, go to Y direction

[Figure: example routes on SBMs compared with the original DOR. Labeled nodes include (0,0), (3,-3), (8,2), (5,5), (2,-1), (6,2), (5,3), (1,0).]

DOR vs. Extended DOR

• The hop count is the same as that of the original DOR.

• If n > m, diameter = 2n (independent of m)
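The diameter claim can be checked numerically with a small breadth-first search. The sketch rests on two assumptions that are inferred from the corner coordinates in the SBM figures rather than stated on the slides: the nodes are the integer points (x, y) with 0 <= x+y <= 2n and 0 <= x-y <= 2m, and each node is linked to its neighbors at unit distance in x or y. The function names are hypothetical.

```python
from collections import deque


def sbm_nodes(m, n):
    """Assumed SBM node set: integer points inside the stairway boundary."""
    return {
        (x, y)
        for x in range(0, n + m + 1)
        for y in range(-m, n + 1)
        if 0 <= x + y <= 2 * n and 0 <= x - y <= 2 * m
    }


def hops_from(src, nodes):
    """Breadth-first search: hop count from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x, y = queue.popleft()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in nodes and nxt not in dist:
                dist[nxt] = dist[(x, y)] + 1
                queue.append(nxt)
    return dist


def diameter(m, n):
    """Largest shortest-path hop count between any two nodes."""
    nodes = sbm_nodes(m, n)
    return max(max(hops_from(src, nodes).values()) for src in nodes)


if __name__ == "__main__":
    for m in (1, 2, 3):
        print("m =", m, "n = 5, diameter =", diameter(m, 5))
```

Under these assumptions the search gives the same diameter for m = 1, 2 and 3 at n = 5, which is what "independent of m" predicts for n > m.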

Explanation of deadlock avoidance by Turn model

[Figure: possible turns and forbidden turns (marked X) on the x-y axes]

The number of stacked chips and diameter

m    Height of stacking    n=3    n=4    n=5    n=6    n=7
2    5                     18     23     28     33     38
3    7                     25     32     39     46     53
4    9                     32     41     50     59     68
Diameter                   6      8      10     12     14
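The chip counts in the table follow a regular pattern. The closed form below, (n+1)(m+1) + n*m, is not given on the slides; it is a fit inferred from the tabulated values and is shown only as a convenient way to reproduce them. The height m*2+1 is the expression from the linear stacking figure, and the function names are hypothetical.

```python
def chip_count(m, n):
    """Number of stacked chips; inferred fit to the table, not from the slides."""
    return (n + 1) * (m + 1) + n * m


def stacking_height(m):
    """Height of stacking, m*2+1 as given for linear stacking."""
    return 2 * m + 1


if __name__ == "__main__":
    n_values = range(3, 8)
    print("m  height  " + "  ".join(f"n={n}" for n in n_values))
    for m in (2, 3, 4):
        counts = "  ".join(f"{chip_count(m, n):3d}" for n in n_values)
        print(f"{m}  {stacking_height(m):6d}  {counts}")
```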

Average distance vs. the number of chips

• Better than that of a rectangular mesh

• Circular stacking is not so good because of the central space.

Summary

• Wireless 3D interconnect techniques will expand the possibilities of system integration.

• Wireless power supply is coming into sight.

• Research on CoC (Castle of Chips) has just started.

– There are many possible structures, especially as extensions of circular stacking.