1 Digital Space Anant Agarwal MIT and Tilera Corporation.
-
date post
20-Dec-2015 -
Category
Documents
-
view
219 -
download
1
Transcript of 1 Digital Space Anant Agarwal MIT and Tilera Corporation.
![Page 1: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/1.jpg)
1
Digital Space
Anant Agarwal
MIT and Tilera Corporation
![Page 2: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/2.jpg)
2
Arecibo
![Page 3: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/3.jpg)
3
Stages of Reality
memmem
mem
mem
mem
1996
CPUMem
19972002
2007
pms
pms
pms
pms
pms
pms
pms
pms
pm
pms
pms
pms
pms
pms
pms
pms
pms
pms
pm
pms
pms
pms
pms
pms
pms
pms
pms
pms
pm
pms
pms
pms
pms
pms
pms
pms
pms
pms
pm
pms
pms
pms
pms
pms
pms
pms
pms
pms
pm
pms
pms
pms
pms
pms
pms
pms
pms
pms
pm
pms
pms
pms
pms
pms
pms
pms
pms
pms
pm
pms
pms
pms
pms
pms
pms
pms
pms
pms
pm
pms
pm
pm
pm
pm
pm
pm
pm
pm
pm
pm
2014
100B transistors
2018
1B transistors
2007
![Page 4: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/4.jpg)
4
Virtual reality
Simulator realityPrototype reality
Product reality
Virtual reality
Simulator realityPrototype reality
Simulator reality
Product realityPrototype reality
Simulator reality
Virtual Reality
![Page 5: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/5.jpg)
5
The Opportunity
20MIPS cpuin 1987
1996…
Few thousand gates
![Page 6: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/6.jpg)
6
The Opportunity
The billion transistor chip of 2007
![Page 7: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/7.jpg)
7
How to Fritter Away Opportunity
the x1786? does not scale
100 ported RegFil and RR
Caches
Control
More resolution buffers, control
“1/10 ns”
![Page 8: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/8.jpg)
8
• Lots of ALUs, lots of registers, lots of local memories – huge on-chip parallelism – but with a slower clock
• Custom-routed, short wires optimized for specific applications
Fast, low power, area efficientBut not programmable
memmem
mem
mem
mem
Take Inspiration from ASICs
![Page 9: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/9.jpg)
9
Our Early Raw Proposal
CPU
Mem E.g., 100-way unrolled loop,running on 100 ALUs, 1000 regs,100 memory banks
But how to build programmable, yet custom, wires?
Got parallelism?
![Page 10: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/10.jpg)
10
A digital wire
Ctrl
Ctrl
Ctrl
CtrlCtrl
Software orchestrate it!
• Customize to application and maximize utilization
Multiplex it!
• Improve utilization
Pipeline it!
• Fast clock (10GHz in 2010)
Uh! What were we smoking!
A dynamic router!
Replace custom wires with routed on-chip networks
![Page 11: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/11.jpg)
11
Static Router
ApplicationCompiler
SwitchCode
SwitchCode
SwitchCode
SwitchCode
SwitchCode
A static router!
![Page 12: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/12.jpg)
12
Replace Wires with Routed Networks
Ctrl
![Page 13: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/13.jpg)
13
50-Ported Register File Distributed Registers
Gigantic 50
ported register
file
![Page 14: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/14.jpg)
14
Gigantic 50
ported register
file
50-Ported Register File Distributed Registers
![Page 15: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/15.jpg)
15
Distributed Registers + Routed Network
Distributed register file
R
Called NURA [ASPLOS 1998]
![Page 16: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/16.jpg)
16
16-Way ALU Clump Distributed ALUs
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALUBypass Net
RF
![Page 17: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/17.jpg)
17
Distributed ALUs, Routed Bypass Network
AL
UA
LU A
LU A
LU
AL
U
AL
UR
Scalar Operand Network (SON) [TPDS 2005]
![Page 18: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/18.jpg)
18
Mongo Cache Distributed Cache
Gigantic 10
ported cache
![Page 19: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/19.jpg)
19
Distributing the Cache
![Page 20: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/20.jpg)
20
Distributed Shared Cache
AL
UA
LU A
LU A
LU
AL
U
AL
U
Like DSM (distributed shared memory), cache is distributed; But, unlike NUCA, caches are local to processors, not far away
R
$
[ISCA 1999]
![Page 21: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/21.jpg)
21
Tiled Multicore Architecture
AL
UA
LU A
LU A
LU
AL
U
AL
U
R
$
![Page 22: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/22.jpg)
22
E.g., Operand Routing in 16-way Superscalar
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALUBypass Net
RF
>>
+
Source: [Taylor ISCA 2004]
![Page 23: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/23.jpg)
23
Operand Routing in a Tiled Architecture
>>
AL
UA
LU A
LU A
LU
AL
U
AL
U
>>
+R
$
![Page 24: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/24.jpg)
24
Tiled Multicore
• Scales to large numbers of cores• Modular – design, layout and verify 1 tile• Power efficient [MIT-CSAIL-TR-2008-066]
– Short wires CV2f– Chandrakasan effect CV2f
– Dynamic and compiler scheduled routing
ProcessorCore
= TileCore + Switch
S
![Page 25: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/25.jpg)
25
A Prototype Tiled Architecture: The Raw Microprocessor
The Raw ChipTile
Disk stream
Video1
DRAM
Packet stream
A Raw Tile
SMEM
SWITCHPC
DMEMIMEM
REGPC
FPU
ALU
Raw Switch
PC
SMEM[Billion transistor IEEE Computer Issue ’97]www.cag.csail.mit.edu/raw
Scalar operand network (SON): Capable of low latency transport of small (or large) packets
[IEEE TPDS 2005]
![Page 26: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/26.jpg)
26
Virtual reality
Simulator reality
Prototype realityProduct reality
![Page 27: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/27.jpg)
27
Scalar Operand Transport in Raw
fmul r24, r3, r4
softwarecontrolledcrossbar
softwarecontrolledcrossbar
fadd r5, r3, r24
route P->E, N->S route W->P, S->N
Goal: flow controlled, in order delivery of operands
![Page 28: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/28.jpg)
28
RawCC: Distributed ILP Compilation (DILP)
tmp0 = (seed*3+2)/2tmp1 = seed*v1+2tmp2 = seed*v2 + 2tmp3 = (seed*6+2)/3v2 = (tmp1 - tmp3)*5v1 = (tmp1 + tmp2)*3v0 = tmp0 - v1v3 = tmp3 - v2
pval5=seed.0*6.0
pval4=pval5+2.0
tmp3.6=pval4/3.0
tmp3=tmp3.6
v3.10=tmp3.6-v2.7
v3=v3.10
v2.4=v2
pval3=seed.o*v2.4
tmp2.5=pval3+2.0
tmp2=tmp2.5
pval6=tmp1.3-tmp2.5
v2.7=pval6*5.0
v2=v2.7
seed.0=seed
pval1=seed.0*3.0
pval0=pval1+2.0
tmp0.1=pval0/2.0
tmp0=tmp0.1
v1.2=v1
pval2=seed.0*v1.2
tmp1.3=pval2+2.0
tmp1=tmp1.3
pval7=tmp1.3+tmp2.5
v1.8=pval7*3.0
v1=v1.8
v0.9=tmp0.1-v1.8
v0=v0.9
pval5=seed.0*6.0
pval4=pval5+2.0
tmp3.6=pval4/3.0
tmp3=tmp3.6
v3.10=tmp3.6-v2.7
v3=v3.10
v2.4=v2
pval3=seed.o*v2.4
tmp2.5=pval3+2.0
tmp2=tmp2.5
pval6=tmp1.3-tmp2.5
v2.7=pval6*5.0
v2=v2.7
seed.0=seed
pval1=seed.0*3.0
pval0=pval1+2.0
tmp0.1=pval0/2.0
tmp0=tmp0.1
v1.2=v1
pval2=seed.0*v1.2
tmp1.3=pval2+2.0
tmp1=tmp1.3
pval7=tmp1.3+tmp2.5
v1.8=pval7*3.0
v1=v1.8v0.9=tmp0.1-v1.8
v0=v0.9
Black arrows = Operand Communication over SON
[ASPLOS 1998]
Partitioning
Place, Route, ScheduleC
![Page 29: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/29.jpg)
29
Virtual realitySimulator reality
Prototype reality
Product reality
![Page 30: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/30.jpg)
30
A Tiled Processor Architecture Prototype: the Raw Microprocessor
October 02
Michael TaylorWalter LeeJason MillerDavid WentzlaffIan BrattBen GreenwaldHenry HoffmannPaul JohnsonJason KimJames PsotaArvind SarafNathan ShnidmanVolker StrumpenMatt FrankRajeev BaruaElliot WaingoldJonathan BabbSri DevabhaktuniSaman AmarasingheAnant Agarwal
![Page 31: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/31.jpg)
31
Raw Die Photo
IBM .18 micron process, 16 tiles, 425MHz, 18 Watts (vpenta)
[ISCA 2004]
![Page 32: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/32.jpg)
32
Raw Motherboard
![Page 33: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/33.jpg)
33
Raw Ideas and Decisions: What Worked, What Did Not
• Build a complete prototype system• Simple processor with single issue cores• FPGA logic block in each tile• Distributed ILP and static network• Static network for streaming• Multiple types of computation – ILP, streams, TLP, server• PC in every tile
![Page 34: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/34.jpg)
34
Why Build?
• Compiler (Amarasinghe), OS and runtimes (ISI), apps (ISI, Lincoln Labs, Durand) folks will not work with you unless you are serious about building hardware
• Need motivaion to build software tools -- compilers, runtimes, debugging, visualization – many challenges here
• Run large data sets (simulation takes forever even with 100 servers!)• Many hard problems show up or are better understood after you begin building (how to
maintain ordering for distributed ILP, slack for streaming codes)• Have to solve hard problems – no magic!• The more radical the idea, the more important it is to build
– World will only trust end-to-end results since it is too hard to dive into details and understand all assumptions
– Would you believe this: “Prof. John Bull has demonstrated a simulation prototype of a 64-way issue out-of-order superscalar”
• Cycle simulator became cycle accurate simulator only after HW got precisely defined• Don’t bother to commercialize unless you have a working prototype• Total network power few percent for real apps [Aug 2003 ISLPED, Kim et al. Energy
characterization of a tiled architecture processor with on-chip networks] [MIT-CSAIL-TR-2008-066 Energy scalability of on-chip interconnection networks in multicore architecures ]
– Network power is few percent in Raw for real apps; however, it is 36% only for a highly contrived synthetic sequence meant to toggle every network wire
![Page 35: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/35.jpg)
35
Raw Ideas and Decisions: What Worked, What Did Not
• Build a complete prototype system• Simple processor, single issue• FPGA logic block in each tile• Distributed ILP• Static network for streaming• Multiple types of computation – ILP, streams, TLP, server• PC in every tile
Yes1GHz, 2-way, inorder in 2016
NoYes ‘02, No ‘06, Yes ‘14
YesYes
![Page 36: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/36.jpg)
36
softwarecontrolledcrossbar
softwarecontrolledcrossbar
route P->E, N->S route W->P, S->N
Raw Ideas and Decisions: Streaming – Interconnect Support
Forced synchronization in
static network
![Page 37: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/37.jpg)
37
sub r5, r3, r55
DynamicSwitch
add r55, r3, r4
DynamicSwitch
Catch a
ll
Streaming in Tilera’s Tile Processor
TA
G
• Streaming done over dynamic interconnect with stream demuxing (AsTrO SDS)
• Automatic demultiplexing of streams into registers• Number of streams is virtualized
![Page 38: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/38.jpg)
38
Virtual realitySimulator reality
Prototype reality
Product reality
![Page 39: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/39.jpg)
39
Why Do We Care?Markets Demanding More Performance
Wireless Networks- Demand for high thruput – more channels- Fast moving standards LTE, services
Networking market - Demand for high performance – 10Gbps- Demand for more services, intelligence
Digital Multimedia market - Demand for high performance – H.264 HD - Demand for more services – VoD, transcode
Cable & BroadcastCable & BroadcastVideo ConferencingVideo Conferencing
SwitchesSwitches
Security AppliancesSecurity Appliances
RoutersRouters
… and with power efficiency and programming ease
GGSNGGSN
Base StationBase Station
39
![Page 40: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/40.jpg)
40
Tilera’s TILEPro64™ Processor
Power per tile (depending on app) 170 – 300 mW
Core power for h.264 encode (64 tiles) 12W
Clock speed Up to 866 MHz
I/O bandwidth 40 Gbps
Main Memory bandwidth 200 Gbps
Multicore Performance (90nm)
Number of tiles 64
Cache-coherent distributed cache 5 MB
Operations @ 750MHz (32, 16, 8 bit) 144-192-384 BOPS
Bisection bandwidth 2 Terabits per second
Power Efficiency
I/O and Memory Bandwidth
ProgrammingANSI standard CSMP Linux programming
Stream programming
Product reality
[Tile64, Hotchips 2007][Tile64, Microprocessor Report Nov 2007]
![Page 41: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/41.jpg)
41
PCIe 1MACPHY
PCIe 1MACPHY
PCIe 0MACPHY
PCIe 0MACPHY
SerdesSerdes
SerdesSerdes
Flexible IOFlexible IO
GbE 0GbE 0
GbE 1GbE 1Flexible IOFlexible IO
UART, HPIJTAG, I2C,
SPI
UART, HPIJTAG, I2C,
SPI
DDR2 Memory Controller 3DDR2 Memory Controller 3
DDR2 Memory Controller 0DDR2 Memory Controller 0
DDR2 Memory Controller 2DDR2 Memory Controller 2
DDR2 Memory Controller 1DDR2 Memory Controller 1
XAUIMAC
PHY 0
XAUIMAC
PHY 0
SerdesSerdes
XAUIMAC
PHY 1
XAUIMAC
PHY 1
SerdesSerdes
Tile Processor Block DiagramA Complete System on a Chip
PROCESSOR
P2
Reg File
P1 P0
CACHE
L2 CACHE
L1I L1D
ITLB DTLB
2D DMA
STN
MDN TDN
UDN IDN
SWITCH
![Page 42: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/42.jpg)
42
Tile Processor NoC
• 5 independent non-blocking networks– 64 switches per network– 1 Terabit/sec per Tile
• Each network switch directly and independently connected to tiles
• One hop per clock on all networks
• I/O write example
• Memory write example
• Tile to Tile access example
• All accesses can be performed simultaneously on non-blocking networks
UDN
STN
IDN
MDN
Tiles
TDN
VDN[IEEE Micro Sep 2007]
![Page 43: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/43.jpg)
43
Multicore Hardwall ImplementationOr Protection and Interconnects
OS1/APP1
OS1/APP3
OS2/APP2
datavalidSwitch
datavalidSwitch
HARDWALL_ENABLE
![Page 44: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/44.jpg)
44
Product Reality Differences• Market forces
– Need crisper answer to “who cares”– SMP Linux programming with pthreads – fully cache coherent – C + API approach to streaming vs new language Streamit in Raw– Special instructions for video, networking– Floating point needed in research project, but not in product for embedded market
• Lessons from Raw– E.g., Dynamic network for streams– HW instruction cache– Protected interconnects
• More substantial engineering – 3-way VLIW CPU, subword arithmetic– Engineering for clock speed and power efficiency– Completeness – I/O interfaces on chip – complete system chip. Just add DRAM for system– Support for virtual memory, 2D DMA– Runs SMP Linux (can run multiple OSes simultaneously)
![Page 45: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/45.jpg)
45
Virtual reality
Simulator realityPrototype reality
Product reality
![Page 46: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/46.jpg)
46
What Does the Future Look Like?
Corollary of Moore’s law: Number of cores will double every 18 months
‘05 ‘08 ‘11 ‘14
64 256 1024 4096
‘02
16Research
Industry 16 64 256 10244
(Cores minimally big enough to run a self respecting OS!)
1K cores by 2014! Are we ready?
![Page 47: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/47.jpg)
47
Vision for the Future
• The ‘core’ is the logic gate of the 21st century
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
spm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
spm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
spm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
spm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
spm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
spm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
spm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
spm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
spm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
spm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
spm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
spm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
spm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
spm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
spm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
spm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
spm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
spm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
spm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
spm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
pm
s
![Page 48: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/48.jpg)
48
Research Challenges for 1K Cores
• 4-16 cores not interesting. Industry is there. University must focus on “1K cores”; Everything will change!
• Can we use 4 cores to get 2X through DILP? Remember cores will be 1GHz and simple! What is the interconnect?
• How should we program 1K cores? Can interconnect help with programming?• Locality and reliability WILL matter for 1K cores. Spatial view of multicore? • Can we add architectural support for programming ease? E.g., suppose I told you
cores are free. Can you discover mechanisms to make programming easier?• What is the right grain size for a core?• How must our computational models change in the face of small memories per core?• How to “feed the beast”? I/O and external memory bandwidth• Can we assume perfect reliability any longer?
![Page 49: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/49.jpg)
49
ATAC Architecture
p
switch
m
p
switch
m
p
switch
m
p
switch
m
p
switch
m
p
switch
m
p
switch
m
p
switch
m
p
switch
m
p
switch
m
p
switch
m
p
switch
m
p
switch
m
p
switch
m
p
switch
m
p
switch
m
Optical Broadcast WDM Interconnect
Electrical Mesh Interconnect (EMesh)
[Proc. BARC Jan 2007, MIT-CSAIL-TR-2009-018 ]
![Page 50: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/50.jpg)
50
Research Challenges for 1K Cores
• 4-16 cores not interesting. Industry is there. University must focus on “1K cores”; Everything will change!
• Can we use 4 cores to get 2X through DILP? What is the interconnect?• How should we program 1K cores? Can interconnect help with programming?• Locality and reliability WILL matter for 1K cores. Spatial view of multicore? • Can we add architectural support for programming ease? E.g., suppose I told you
cores are free. Can you discover mechanisms to make programming easier?• What is the right grain size for a core?• How must our computational models change in the face of small memories per core?• How to “feed the beast”? I/O and external memory bandwidth• Can we assume perfect reliability any longer?
![Page 51: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/51.jpg)
51
FOS – Factored Operating System
•Today: User app and OS kernel thrash each other in a core’s cache• User/OS time sharing is inefficient
•Angstrom: OS assumes abstracted space model. OS services bound to distinct cores, separate from user cores. OS service cores collaborate to achieve best resource management
• User/OS space sharing is efficient
The key idea: space sharing replaces time sharing
OS OS OS
OS OS OS
File System
FS
User
AppFS
OS cores collaborate, inspired by distributed internet services model
Need new page
I/O
FS
[OS Review 2008]
![Page 52: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/52.jpg)
52
Research Challenges for 1K Cores
• 4-16 cores not interesting. Industry is there. University must focus on “1K cores”; Everything will change!
• Can we use 4 cores to get 2X through DILP? What is the interconnect?• How should we program 1K cores? Can interconnect help with programming?• Locality and reliability WILL matter for 1K cores. Spatial view of multicore? • Can we add architectural support for programming ease? E.g., suppose I told you
cores are free. Can you discover mechanisms to make programming easier?• What is the right grain size for a core?• How must our computational models change in the face of small memories per core?• How to “feed the beast”? I/O and external memory bandwidth• Can we assume perfect reliability any longer?
![Page 53: 1 Digital Space Anant Agarwal MIT and Tilera Corporation.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649d4c5503460f94a29e14/html5/thumbnails/53.jpg)
53
The following are trademarks of Tilera Corporation: Tilera, the Tilera Logo, Tile Processor, TILE64, Embedding Multicore, Multicore Development Environment, Gentle Slope Programming, iLib, iMesh and Multicore Hardwall. All other trademarks and/or registered trademarks are the property of their respective owners.