Raw Fabrics for PCA Status and Plans

Anant Agarwal, Saman Amarasinghe


Page 1: Raw Fabrics for PCA Status and Plans

Anant Agarwal, Saman Amarasinghe

Raw Fabrics for PCA
Status and Plans

Page 2: Raw Fabrics for PCA Status and Plans

Agenda

• 09:00 – 10:00 Raw Fabrics Status and Plans (Agarwal)
• 10:00 – 10:30 Streams and software systems (Amarasinghe)
• 10:30 – 11:00 Morphware update (Thies)
• 11:00 – 11:20 Operating system update (Strumpen)
• 11:20 – 11:50 Lab visit and demos (All)
• 11:50 – 12:20 Applications (Crago)
• 12:20 – 12:35 Stream Algorithms (Hoffman)
• 12:35 – 12:50 x86 on Raw (Wentzlaff)
• 12:50 – 1:30 Discussion (All)

• 12:00 Lunch

Page 3: Raw Fabrics for PCA Status and Plans

The Raw Chip

Raw Architecture

[Figure: the Raw chip as an array of Raw tiles. Off-chip labels: Disk stream, Video1, SDRAM, Packet stream. Tile labels: SMEM, SWITCH, PC, DMEM, IMEM, REG, PC, FPU, ALU.]

• A 16-tile 2-D fabric (1K tiles in 2010)
• Memory is distributed
• RISC-like core in each tile, with FP
• Fast, programmable interconnect (r-r 3 cycles)
• ~1100 off-chip data I/O

Taylor et al., IEEE Micro ‘02, ISSCC ‘03

Page 4: Raw Fabrics for PCA Status and Plans

Raw Handheld

Page 5: Raw Fabrics for PCA Status and Plans

Well… Raw Handheld

• First program, 80 MHz: Jan 03
• “Thorough testing”, 300 MHz: May 03

Page 6: Raw Fabrics for PCA Status and Plans

1024 Channel Audio Beam Forming

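[Figure: a microphone array feeds a bank of audio A-to-D converters (each labeled 16), which connect through an ADAT optical audio interface for the RAW FPGA to an FPGA and the RAW chip. Other labels in the figure: 128, 190, 2, 1 2 …, 64...]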

One PCA chip beats current 640 channel custom hardware beamformer!

First proposed at review in Nov 02

Page 7: Raw Fabrics for PCA Status and Plans

A 1024-Node Acoustic Beamformer

[Figure: data path from 1024 microphones to RAW. Block labels: A-to-D, CPLD, FPGA, RAW, FPGA. Link labels: 16 kHz, 24 bits; 768 Kbits/sec; 16; 12 Mbits/sec; 32; 384 Mbits/sec.]
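The link rates in the figure can be reproduced from the sampling parameters. The sketch below is a reading of the figure rather than a statement of the actual design: it assumes the labels 16 and 32 mean 16 two-microphone cards per column and 32 columns, which matches the "2-Microphone Card" and "32-Microphone Column" slides that follow. All constant and function names are hypothetical.

```python
# Values read from the slide; the grouping into cards and columns is an assumption.
SAMPLE_RATE_HZ = 16_000      # 16 kHz sampling
BITS_PER_SAMPLE = 24         # 24-bit samples
MICS_PER_CARD = 2            # "2-Microphone Card"
CARDS_PER_COLUMN = 16        # assumed meaning of the "16" label
COLUMNS = 32                 # assumed meaning of the "32" label


def beamformer_rates():
    """Return the aggregate bit rate (bits/s) at each level of the array."""
    per_mic = SAMPLE_RATE_HZ * BITS_PER_SAMPLE
    per_card = per_mic * MICS_PER_CARD
    per_column = per_card * CARDS_PER_COLUMN
    return {
        "microphone": per_mic,
        "card": per_card,
        "column": per_column,
        "array": per_column * COLUMNS,
        "microphones": MICS_PER_CARD * CARDS_PER_COLUMN * COLUMNS,
    }


if __name__ == "__main__":
    for level, value in beamformer_rates().items():
        print(f"{level:12s} {value:>12,d}")
```

Under these assumptions the per-card rate is exactly the 768 Kbits/sec on the slide, the per-column rate is 12.288 Mbits/sec (shown as 12 Mbits/sec), and the array total is about 393 Mbits/sec, which the slide gives as 384 Mbits/sec (12 x 32).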

Page 8: Raw Fabrics for PCA Status and Plans

2-Microphone Card

Page 9: Raw Fabrics for PCA Status and Plans

32-Microphone Column

Page 10: Raw Fabrics for PCA Status and Plans

Raw Chip Specifications

• IBM SA27E Process
– 0.15 µm, 6-metal copper ASIC process

• 16 Tile RAW Processor
– 18.17mm x 18.17mm
– 1657 pin CCGA package
– 1152 signal pins

• Clock and Power
– 420MHz (actual)
– 10 watts (power save turned on)
– 18 watts typical
– 35 watts if everything is used!

Page 11: Raw Fabrics for PCA Status and Plans

“PowerPoint” Performance

• Raw Chip (@420MHz)
– ~7 GOPS/GFLOPS (SP)
– ~100 GBytes/s of on-chip memory bandwidth
– ~90 GBytes/s of on-chip “bisection bandwidth”
– ~40 GBytes/s I/O bandwidth

No bugs so far!
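The ~7 GOPS figure is consistent with a simple peak-rate estimate. The sketch below assumes each of the 16 tiles completes one operation per cycle at 420 MHz; the one-operation-per-cycle rate is an assumption made for illustration, and the function name is hypothetical.

```python
TILES = 16            # from the chip specification slide
CLOCK_HZ = 420e6      # 420 MHz (actual)


def peak_ops_per_second(tiles=TILES, clock_hz=CLOCK_HZ, ops_per_tile_per_cycle=1):
    """Peak operation rate, assuming every tile issues every cycle."""
    return tiles * clock_hz * ops_per_tile_per_cycle


if __name__ == "__main__":
    print(f"{peak_ops_per_second() / 1e9:.2f} GOPS peak")
```

This is an upper bound on issue rate, not a measured result; it says nothing about the bandwidth figures on the same slide.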

Page 12: Raw Fabrics for PCA Status and Plans

Progress on the Raw Chip

• Complete Spec Feb ‘00
• IBM Initial Design Review Mar ‘00
• Feature complete Netlist May ‘00
• Arch. Timing optimization Feb ‘01
• Floorplanning Mar ‘01
• Prelim Placement/Timing opt Jun ‘01
• Raw H21 system board (ISI) Jun ‘01
• Raw in Emulation Jun ‘01
• Detailed Placement/Timing opt Dec ‘01
• Release to IBM for initial layout Dec ‘01
• Timing closure after layout Mar ‘02
• All backend checks pass May ‘02
• Release to IBM for production layout May ‘02
• Final function and timing validation Jul ‘02
• Final manuf. release to IBM Aug ‘02
• Chip prototypes back Oct ‘02

Page 13: Raw Fabrics for PCA Status and Plans

PCA Phase 2 Effort

The Raw Processor

Rawcc Compiler

Stream Compiler

Resource Management: The Raw OS

Applications and Evaluation


Embedded systems e.g., network router

libStream

Raw Fabric Testbed

Page 14: Raw Fabrics for PCA Status and Plans

PCA Raw Fabrics, Systems, Apps

• Raw Chips Oct 02
• Handheld (H) board arrives from ISI Dec 02
• H Board bringup – Small program 80 MHz Jan 03
• H Board testing, speed gasket 300 MHz May 03
• USB Interface, 500 Mbits/s xface July 03
• H Board refab (in fab now), to partners Sep 03
• Fabric-Array and Fabric-IO board design Jun 03
• Fabric-Array and Fabric-IO board fab Sep 03
• 16 and 64-chip PCA fabric bringup
• Applications and experiments
• PCA demonstrations
– Embedded networking board
– Audio beamformer system
– 802.11b,g,a wireless system
– Graphics system
– Virtual x86

Page 15: Raw Fabrics for PCA Status and Plans

Partner Support Activity

• Handheld boards Sep 03
• USB xface
• PCI xface
• “Raw User Day” videos and documentation
• Expansion interface testing and documentation (used in beamformer)
• Software distribution
– Simulator (useful to debug small assembly programs)
– C compiler
– rGDB debugger
– Streamit language and compiler
– Lots of other goodies
• 1024-tile (64 chip) fabric simulator (since Dec 02)
• 16, 64 node Fabrics

Page 16: Raw Fabrics for PCA Status and Plans

64-Node Raw Fabric

[Figure: the 64-node Raw fabric, with DRAM and Network I/O blocks along its edges.]

Page 17: Raw Fabrics for PCA Status and Plans

Fabric System Architecture

• Design: two distinct board designs; HOW???

• replicate and connect

• Board 1: Quad Raw Board

• Board 2: I/O & Memory Board

Page 18: Raw Fabrics for PCA Status and Plans

The Challenge

• How do we use the same board designs for every position in the fabric? Fabric board is easy enough.

Page 19: Raw Fabrics for PCA Status and Plans

The Challenge

• How do we use the same board designs for every position in the fabric? E.g., I/O board

Page 20: Raw Fabrics for PCA Status and Plans

The Saman Flip

• How do we use the same board designs for every position in the fabric?
– IO Board
• symmetric about x-axis
• compensate for board flip in firmware
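The slide does not say how the firmware compensates for a flipped board. One plausible form, sketched here purely as an illustration, is an index remap: if a board that is symmetric about the x-axis is mounted flipped, its edge connectors appear in mirrored order, so the firmware can translate physical port numbers back to logical ones. The port count of 8 is an assumption based on the PI0 to PI7 and PO0 to PO7 signal names on the I/O board schematic, and `logical_port` is a hypothetical name.

```python
PORTS_PER_BOARD = 8  # assumption: PI0..PI7 / PO0..PO7 on the I/O board schematic


def logical_port(physical_port, flipped, ports=PORTS_PER_BOARD):
    """Map a physical port index to its logical index.

    On a flipped board the connector order is mirrored, so index i
    becomes ports - 1 - i. An unflipped board maps straight through.
    """
    if not 0 <= physical_port < ports:
        raise ValueError(f"port {physical_port} out of range 0..{ports - 1}")
    return ports - 1 - physical_port if flipped else physical_port


if __name__ == "__main__":
    for p in range(PORTS_PER_BOARD):
        print(p, logical_port(p, flipped=False), logical_port(p, flipped=True))
```

The remap is its own inverse, so the same table serves in both directions; the actual firmware may handle the flip differently.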

Page 21: Raw Fabrics for PCA Status and Plans

Quad Board

•4 RAW chips per board

•16 152-pin MICTOR connectors total (4 per side)

•Power distributed over separate cables from other signals

•MICTOR connectors are stacked to save space
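The board counts for the planned fabrics follow from the figures above. This is a small sizing sketch using only numbers stated in the slides (4 chips per quad board, 16 tiles per chip, 16 MICTOR connectors per board); the function name is hypothetical, and the count of I/O & Memory boards is not derived here because the slides do not state it.

```python
CHIPS_PER_QUAD_BOARD = 4
TILES_PER_CHIP = 16
MICTORS_PER_QUAD_BOARD = 16


def fabric_size(chips):
    """Return quad-board, tile, and MICTOR connector counts for a fabric."""
    if chips <= 0 or chips % CHIPS_PER_QUAD_BOARD:
        raise ValueError("chip count must be a positive multiple of 4")
    boards = chips // CHIPS_PER_QUAD_BOARD
    return {
        "quad_boards": boards,
        "tiles": chips * TILES_PER_CHIP,
        "mictor_connectors": boards * MICTORS_PER_QUAD_BOARD,
    }


if __name__ == "__main__":
    for chips in (16, 64):
        print(chips, fabric_size(chips))
```

For the 64-chip fabric this gives 16 quad boards and 1024 tiles, matching the "1024-tile (64 chip)" simulator mentioned under partner support.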

Page 22: Raw Fabrics for PCA Status and Plans

Quad Board Layout

[Figure: quad board layout, 11” x 11”]

Page 23: Raw Fabrics for PCA Status and Plans

I/O & Memory Board

•4 FPGAs

•2 64-bit PCI slots

•2 Expansion Ports (same as on Raw Handheld board)

•4 SDRAM banks

•symmetric design

[Figure: I/O & Memory board layout, dimension label 11”]

Page 24: Raw Fabrics for PCA Status and Plans

IO/Memory Board schematic

[Schematic: QUAD RAW IO BOARD, schematic name QUAD_RAW_IO, rev 1.0, dated 7-10-2003. Title block: Information Sciences Institute, University of Southern California. Blocks and signal groups shown:]

• FPGAs: FPGA PCI 0, FPGA PCI 1, FPGA EXP 0, FPGA EXP 1
• SDRAM blocks: SDRAM PCI 0, SDRAM PCI 1, SDRAM EXP 0, SDRAM EXP 1, each with two interfaces (M0 and M1) carrying DQ[63:0], CB[15:0], DQMB[7:0], S_N[3:0], A[13:0]
• PCI connectors: AD[63:0]
• Expansion connectors: IO0_[189:0], IO1_[189:0]
• Configuration controller and configuration memory: MPA[6:0], MPD[15:0], MEM_A[20:0], MEM_D[15:0]
• FPGA port groups: PIE_[36:0], POE_[36:0], PID_[36:0], POD_[36:0]
• Connectors: PI0_[36:0] to PI7_[36:0], PO0_[36:0] to PO7_[36:0]
• Utility, clocks, power

Page 25: Raw Fabrics for PCA Status and Plans

Power Distribution

• 48V distributed to all boards, then down-converted

• DC-DC converters on each board
– 1.8V Raw core
– 1.5V Raw I/O
– 3V other logic
– 1.5V is also further down converted to 0.75V supply for HSTL termination
• System-wide power supply can be up to 3kW

At 1.8V, 64 Raw chips can draw 1280 amps!!!!!!!!!!!
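The arithmetic behind the 1280 A figure also shows why power is distributed at 48V. The sketch below works backward from the slide's numbers: 1280 A across 64 chips is 20 A per chip, or about 36 W at 1.8V, in line with the 35 W worst case quoted in the chip specifications. The same core power carried at 48V needs far less current. Converter losses and the other supply rails are ignored, and all names are hypothetical.

```python
CHIPS = 64
CORE_VOLTS = 1.8          # Raw core supply
BUS_VOLTS = 48.0          # distribution voltage
TOTAL_CORE_AMPS = 1280.0  # figure quoted on the slide


def per_chip_watts():
    """Core power per chip implied by the slide's total current."""
    return TOTAL_CORE_AMPS / CHIPS * CORE_VOLTS


def bus_amps():
    """Current needed to carry the same core power at the bus voltage.

    Ignores DC-DC converter losses and the non-core rails.
    """
    return TOTAL_CORE_AMPS * CORE_VOLTS / BUS_VOLTS


if __name__ == "__main__":
    print(f"per chip: {per_chip_watts():.1f} W")
    print(f"core power: {TOTAL_CORE_AMPS * CORE_VOLTS:.0f} W")
    print(f"current at 48 V: {bus_amps():.1f} A")
```

Under these assumptions the fabric's core power is about 2.3 kW, which fits within the 3kW system supply, and the 48V distribution carries roughly 48 A instead of 1280 A.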

Page 26: Raw Fabrics for PCA Status and Plans

Power Distribution

• Distributed over special connectors, separately from signals

• external power supply feeds top and bottom rows of I/O Boards

power supply

Page 27: Raw Fabrics for PCA Status and Plans

Clock Distribution

• signal generated and distributed from a center board over MICTOR connectors

• uses DLLs to deskew the clock at each connection

• every quad board sends a copy of the clock to its neighbors and receives one from each of them; DIP switches select which of the input clocks to use

clock generator

Page 28: Raw Fabrics for PCA Status and Plans

Clock Distribution

[Figure: clock from external input passing through a DLL]

• Synchronized clocks for all Raw chips in fabric
• Delay-Locked Loop uses feedback to tune delay line for clock synchronization
• DIP switches keep clock distribution general: no custom firmware
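The feedback idea behind a delay-locked loop can be illustrated with a toy model. This is a behavioral sketch only, not the circuit used on the boards: it steps a delay line until the delayed clock edge lands on a reference edge, meaning the total delay is a whole number of clock periods. The function name, the 10 ps step, and the example insertion delay are all assumptions.

```python
def lock_delay_line(insertion_delay_ps, period_ps, step_ps=10):
    """Step a delay line until the delayed edge lines up with a reference edge.

    Returns the delay-line setting in picoseconds (integer inputs expected).
    """
    if period_ps <= 0 or step_ps <= 0:
        raise ValueError("period and step must be positive")
    delay_ps = 0
    # The added delay never needs to exceed one period, so bound the loop by it.
    for _ in range(period_ps // step_ps + 1):
        # Phase detector: time still missing before the next reference edge.
        remaining_ps = -(insertion_delay_ps + delay_ps) % period_ps
        if remaining_ps < step_ps:
            return delay_ps        # locked to within one step
        delay_ps += step_ps        # feedback: lengthen the delay line
    raise RuntimeError("failed to lock")


if __name__ == "__main__":
    # 420 MHz gives a period of about 2381 ps; 900 ps is a made-up insertion delay.
    print(lock_delay_line(insertion_delay_ps=900, period_ps=2381))
```

A real DLL adjusts continuously and in both directions; the one-directional stepping here is chosen only to keep the loop easy to follow.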

Page 29: Raw Fabrics for PCA Status and Plans

Reset Distribution

• signal generated by one of the I/O boards and distributed over MICTOR connectors

[Figure label: reset originates here]

Page 30: Raw Fabrics for PCA Status and Plans

PCA Raw Fabrics, Systems, Apps

• Raw Chips Oct 02
• Handheld (H) board arrives from ISI Dec 02
• H Board bringup – Small program 80 MHz Jan 03
• H Board testing, speed gasket 300 MHz May 03
• USB Interface, 500 Mbits/s xface July 03
• H Board refab (in fab now), to partners Sep 03
• Fabric-Array and Fabric-IO board design Jun 03
• Fabric-Array and Fabric-IO board fab Sep 03
• 16 and 64-chip PCA fabric bringup
• Applications and experiments
• PCA demonstrations
– Embedded networking board
– Audio beamformer system
– 802.11b,g,a wireless system
– Graphics system
– Virtual x86