Aman Occ Final
8/8/2019 Aman Occ Final
ON-CHIP OPTICAL COMMUNICATION
Presented by Aman Chitransh
Moore's Gap
[Figure: transistors and performance (GOPS) plotted against time, 1992 to 2010, on a log scale. Curve labels: Pipelining; Superscalar; SMT, FGMT, CGMT; OOO; Multicore; Tiled Multicore. One region is labeled "The GOPS Gap".]
Diminishing returns from single-CPU mechanisms (pipelining, caching, etc.)
Wire delays
Power envelopes
The Future of Multicore
Number of cores doubles every 18 months
Parallelism replaces clock frequency scaling and core complexity
Resulting challenges: scalability, programming, power
Examples: MIT RAW, Sun UltraSPARC T2, IBM XCell 8i, Tilera TILE64
Multicore Challenges
Scalability
  How do we turn additional cores into additional performance?
  Must accelerate single apps, not just run more apps in parallel
  Efficient core-to-core communication is crucial
  Architectures that grow easily with each new technology generation
Programming
  Traditional parallel programming techniques are hard
  Parallel machines were rare and used only by rocket scientists
  Multicores are ubiquitous and must be programmable by anyone
Power
  Already a first-order design constraint
  More cores and more communication mean more power
  Previous tricks (e.g. lower Vdd) are running out of steam
Multicore Communication Today
Bus-based interconnect
Single shared resource
Uniform communication cost
Communication through memory
Doesn't scale to many cores due to contention and long wires
Scalable up to about 8 cores
[Figure: bus-based interconnect. Processors (p) with caches (c) share a bus to an L2 cache and DRAM.]
Optical Broadcast Network
Waveguide passes through every core
Multiple wavelengths (WDM) eliminate contention
Signal reaches all cores in
Optical Broadcast Network
Electronic-photonic integration using standard CMOS process
Cores communicate via optical WDM broadcast-and-select network
Each core sends on its own dedicated wavelength using modulators
Cores can receive from some set of senders using optical filters
[Figure label: N cores]
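The broadcast-and-select scheme on this slide can be modeled in a few lines of Python. This is a toy sketch, not the ATAC hardware design: the class names `Waveguide` and `Core`, the use of the core index as its wavelength index, and the one-word-per-cycle model are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Waveguide:
    """Shared waveguide carrying one independent channel per wavelength."""
    channels: dict[int, int] = field(default_factory=dict)  # wavelength -> word

    def modulate(self, wavelength: int, word: int) -> None:
        # Each wavelength has exactly one sender, so writes never collide.
        self.channels[wavelength] = word


@dataclass
class Core:
    core_id: int         # assumed to double as the core's dedicated wavelength
    listen_to: set[int]  # wavelengths selected by this core's optical filters

    def send(self, guide: Waveguide, word: int) -> None:
        guide.modulate(self.core_id, word)

    def receive(self, guide: Waveguide) -> dict[int, int]:
        # "Select": keep only the wavelengths this core filters for.
        return {w: v for w, v in guide.channels.items() if w in self.listen_to}


def demo(n_cores: int = 4) -> list[dict[int, int]]:
    guide = Waveguide()
    # Every core listens to every other core's wavelength.
    cores = [Core(i, set(range(n_cores)) - {i}) for i in range(n_cores)]
    # All cores transmit in the same cycle without arbitration.
    for core in cores:
        core.send(guide, word=100 + core.core_id)
    return [core.receive(guide) for core in cores]
```

Calling `demo(4)` has all four cores send at once; each then reads the three words sent by the others, which is the contention-free behavior the slide attributes to WDM.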
Optical bit transmission
[Figure: optical bit transmission. Sending core: flip-flop, modulator driver, modulator. Receiving core: filter, photodetector, transimpedance amplifier, flip-flop. Also labeled: data waveguide, multi-wavelength source waveguide.]
Each core sends data using a different wavelength, so there is no contention
Data is sent once, and any or all cores can receive it: efficient broadcast
ATAC Bandwidth
64 cores, 32 lines, 1 Gb/s
Transmit BW: 64 cores x 1 Gb/s x 32 lines = 2 Tb/s
Receive-Weighted BW: 2 Tb/s x 63 receivers = 126 Tb/s
Good metric for broadcast networks; reflects WDM
ATAC allows better utilization of computational resources because less time is spent performing communication
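The two bandwidth figures can be reproduced with a short Python calculation. The function names are illustrative, and the sketch assumes that "receive-weighted" means the transmit bandwidth multiplied by the number of cores other than the sender, as the slide's formula suggests.

```python
def transmit_bw_gbps(cores: int, lines: int, rate_gbps: int) -> int:
    """Aggregate transmit bandwidth in Gb/s: every core drives every line."""
    return cores * lines * rate_gbps


def receive_weighted_bw_tbps(transmit_tbps: int, cores: int) -> int:
    """Transmit bandwidth weighted by the number of possible receivers."""
    return transmit_tbps * (cores - 1)


transmit = transmit_bw_gbps(cores=64, lines=32, rate_gbps=1)  # 2048 Gb/s
transmit_tbps = round(transmit / 1000)                        # slide rounds to 2 Tb/s
receive_weighted = receive_weighted_bw_tbps(transmit_tbps, cores=64)
print(transmit, transmit_tbps, receive_weighted)
```

The exact product is 2048 Gb/s; the slide's 126 Tb/s follows from rounding that to 2 Tb/s before multiplying by 63.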
System Capabilities and Performance
Baseline: Raw Multicore Chip
Leading-edge tiled multicore
64-core system (65nm process)
Peak performance: 64 GOPS
Chip power: 24 W
Theoretical power eff.: 2.7 GOPS/W
Effective performance: […].3 GOPS
Effective power eff.: […].3 GOPS/W
Total system power: 150 W

ATAC Multicore Chip
Future optical-interconnect multicore
64-core system (65nm process)
Peak performance: 64 GOPS
Chip power: 25.5 W
Theoretical power eff.: 2.5 GOPS/W
Effective performance: […]8.0 GOPS
Effective power eff.: […].5 GOPS/W
Total system power: 153 W

Optical communications require a small amount of additional system power but allow for much better utilization of computational resources.
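The theoretical power-efficiency figures on this slide are consistent with dividing peak performance by chip power. The Python check below assumes that definition, which the slide does not state explicitly; the function name is illustrative.

```python
def power_efficiency(gops: float, chip_power_w: float) -> float:
    """Power efficiency in GOPS/W, assumed to be performance / chip power."""
    return gops / chip_power_w


raw_eff = power_efficiency(64, 24)     # baseline Raw chip
atac_eff = power_efficiency(64, 25.5)  # ATAC chip
print(round(raw_eff, 1), round(atac_eff, 1))
```

Under this assumption the extra 1.5 W of chip power lowers the theoretical efficiency only slightly, from about 2.7 to about 2.5 GOPS/W.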
Programming ATAC
Cores can directly communicate with any other core in one hop (
Communication-centric Computing
Operation               Energy    Latency
Network transfer        3 pJ      3 cycles
ALU add operation       2 pJ      1 cycle
32 KB cache read        50 pJ     1 cycle
Off-chip memory read    500 pJ    250 cycles

ATAC reduces off-chip memory calls, and hence energy and latency
View of extended global memory can be enabled cheaply with on-chip distributed cache memory and ATAC network
[Figure: bus-based multicore (processors p, caches c, bus, L2 cache, memory) beside ATAC, with paths labeled 3 pJ and 500 pJ.]
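To see why replacing off-chip reads with on-chip communication pays off, the per-operation costs from this slide can be summed in Python. The comparison is a sketch under one assumption that is not in the source: a read served on chip is modeled as one network transfer plus one 32 KB cache read. The dictionary keys are illustrative names.

```python
# (energy in pJ, latency in cycles) per operation, taken from the slide.
COSTS = {
    "network_transfer": (3, 3),
    "alu_add": (2, 1),
    "cache_read_32kb": (50, 1),
    "offchip_memory_read": (500, 250),
}


def total_cost(ops: list[str]) -> tuple[int, int]:
    """Sum energy (pJ) and latency (cycles) over a sequence of operations."""
    energy = sum(COSTS[op][0] for op in ops)
    latency = sum(COSTS[op][1] for op in ops)
    return energy, latency


# Assumed model: remote on-chip read = network transfer + remote cache read.
remote_cache = total_cost(["network_transfer", "cache_read_32kb"])
offchip = total_cost(["offchip_memory_read"])
print(remote_cache, offchip)
```

Under this model the on-chip path costs 53 pJ and 4 cycles against 500 pJ and 250 cycles for an off-chip read.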
ATAC is an Efficient Network
Modulators are primary source of power consumption
Receive power: require only ~2 fJ/bit even with -5 dB link loss
Modulator power: Ge-Si EA design ~75 fJ/bit (assume 50 fJ/bit for modulator driver)

Example: 64-core communication
(i.e. N = 64 cores = 64 wavelengths; for 32-bit word: 2048 drops/core and 32 adds/core)
Receive power: 2 fJ/bit x 1 Gbit/s x 32 bits x N^2 = 262 mW
Modulator power: 75 fJ/bit x 1 Gbit/s x 32 bits x N = 153 mW
Total energy/bit = 75 fJ/bit + 2 fJ/bit x (N - 1) = 201 fJ/bit

Comparison: electrical broadcast across 64 cores
Require 64 x 150 fJ/bit ≈ 10 pJ/bit (~50X more power)
(Assumes 150 fJ/mm/bit, 1-mm spaced tiles)
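The power and energy products on this slide can be evaluated directly. The Python sketch below uses integer arithmetic with the identity 1 fJ/bit x 1 Gbit/s = 1 µW; the variable names are illustrative, and the per-bit figures are the ones quoted on the slide.

```python
N = 64              # cores, one wavelength each
WORD_BITS = 32      # bits per word, one line per bit
RATE_GBPS = 1       # line rate in Gbit/s

RX_FJ_PER_BIT = 2           # receiver energy per bit
MOD_FJ_PER_BIT = 75         # modulator plus driver energy per bit
ELEC_FJ_PER_MM_BIT = 150    # electrical wire energy, 1 mm tile spacing

# 1 fJ/bit at 1 Gbit/s is 1 microwatt, so these products are in microwatts.
receive_uw = RX_FJ_PER_BIT * RATE_GBPS * WORD_BITS * N * N
modulator_uw = MOD_FJ_PER_BIT * RATE_GBPS * WORD_BITS * N

# Energy to broadcast one bit to the other N - 1 cores, in fJ.
optical_fj_per_bit = MOD_FJ_PER_BIT + RX_FJ_PER_BIT * (N - 1)
electrical_fj_per_bit = N * ELEC_FJ_PER_MM_BIT

print(receive_uw / 1000, "mW receive")
print(modulator_uw / 1000, "mW modulator")
print(optical_fj_per_bit, "fJ/bit optical")
print(electrical_fj_per_bit / 1000, "pJ/bit electrical")
print(electrical_fj_per_bit / optical_fj_per_bit, "x ratio")
```

The products come to about 262 mW for the receivers and 153.6 mW for the modulators. The electrical broadcast costs 9.6 pJ/bit, roughly 48 times the 201 fJ/bit optical figure, which the slide rounds to about 50X.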
Summary
ATAC uses optical networks to enable multicore programming and performance scaling
ATAC encourages communication-centric architecture, which helps multicore performance and power scalability
ATAC simplifies programming with a contention-free all-to-all broadcast network
ATAC is enabled by recent advances in CMOS integration of optical components
What Does the Future Look Like?
Corollary of Moore's law: number of cores will double every 18 months
Years: '02 '05 '08 '11 '14
Research: 16 64 256 1024 4096
Industry: 16 64 256 1024
(Cores minimally big enough to run a self-respecting
[…]K cores by 2014! Are we ready?
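The doubling rule can be checked against the core counts on this slide with a short Python projection. The function name is illustrative, and the base point (64 research cores in 2005) is read from the slide's figures.

```python
def projected_cores(base_cores: int, base_year: int, year: int,
                    doubling_months: int = 18) -> int:
    """Project a core count assuming one doubling every `doubling_months`."""
    months = (year - base_year) * 12
    return round(base_cores * 2 ** (months / doubling_months))


research = [projected_cores(64, 2005, y) for y in (2005, 2008, 2011, 2014)]
print(research)
```

Doubling every 18 months is a factor of four every three years, which reproduces the 64, 256, 1024, 4096 sequence.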
Scaling to 1000 Cores
Purely optical design scales to about 64 cores
After that, clusters of cores share optical hubs
ENet and BNet move data to/from optical hub
Dedicated, special-purpose electrical networks
64 optically connected clusters
Electrical networks connect 16 cores to optical hub
[Figure: cluster diagram with labels Proc, Dir $, $, memory, ONet, BNet, ENet, HUB, NET.]
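The clustered organization can be sketched as an address mapping in Python. This is an illustration only: the slide says ENet and BNet move data to and from the hub but does not say which direction each network serves, so the sketch assumes ENet carries data toward the hub and BNet delivers it from the hub to cores. Consecutive core numbering within a cluster and the handling of same-cluster traffic are also assumptions.

```python
CLUSTERS = 64           # optically connected clusters
CORES_PER_CLUSTER = 16  # cores sharing one optical hub
TOTAL_CORES = CLUSTERS * CORES_PER_CLUSTER


def locate(core_id: int) -> tuple[int, int]:
    """Return (cluster index, index within cluster) for a global core id."""
    if not 0 <= core_id < TOTAL_CORES:
        raise ValueError(f"core id {core_id} out of range")
    return divmod(core_id, CORES_PER_CLUSTER)


def route(src: int, dst: int) -> list[str]:
    """List the networks a message crosses under the assumed directions."""
    src_cluster, _ = locate(src)
    dst_cluster, _ = locate(dst)
    if src_cluster == dst_cluster:
        return ["ENet"]  # assumption: local traffic stays on the electrical net
    # Assumed path: ENet to the local hub, ONet between hubs, BNet to the core.
    return ["ENet", "ONet", "BNet"]


print(TOTAL_CORES, locate(17), route(0, 1023))
```

With 64 clusters of 16 cores the design reaches 1024 cores, matching the slide's 1000-core target.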