Uploaded by johnny-tesh
CONFIDENTIAL
Cell Technology & ‘Many-Core’
I’ve got more than you…
Cell/B.E. Recap
What is the Cell Broadband Engine?
• Pioneered by Sony, Toshiba, and IBM
• ‘Many-Core’ processor
• 1 PPE processor
• 8 SPE ‘SIMD Monsters’
PPE (PPU) is a scalar processor.
SPEs (SPUs) are vector or array processors.
Cell/B.E. Recap
230 GFLOPS from 9 processing elements
• SPE: Synergistic Processor Element
• Jointly developed by Sony, Toshiba, and IBM
• FlexIO: 20 GB/s
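The 230 GFLOPS figure can be reproduced with back-of-the-envelope arithmetic. This is a sketch under assumptions that do not appear on the slide: a 3.2 GHz clock, and each of the 9 elements completing one 4-wide single-precision fused multiply-add per cycle, with each FMA counted as 2 operations per lane.

```python
# Back-of-the-envelope peak single-precision throughput for one Cell/B.E. chip.
# Assumptions (not from the slide): 3.2 GHz clock, one 4-wide SIMD
# fused multiply-add per element per cycle.
CLOCK_GHZ = 3.2
SIMD_LANES = 4          # 128-bit registers hold four 32-bit floats
FLOPS_PER_FMA = 2       # multiply + add counted as two operations
ELEMENTS = 1 + 8        # 1 PPE + 8 SPEs, per the recap slide

per_element_gflops = CLOCK_GHZ * SIMD_LANES * FLOPS_PER_FMA
peak_gflops = ELEMENTS * per_element_gflops

print(f"{per_element_gflops:.1f} GFLOPS per element")  # 25.6 GFLOPS per element
print(f"{peak_gflops:.1f} GFLOPS peak")                # 230.4 GFLOPS peak
```

Under these assumptions the 8 SPEs account for about 205 of the roughly 230 GFLOPS. This is why the slide calls them the ‘SIMD Monsters’.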
Where is the Cell?
Look in your kids’ rooms
…or next to your HDTV!
PS3 clusters are also in high demand for their compelling price/performance: Folding@home, black hole research, ray tracing, modelling, etc.
The Fastest Supercomputer
Los Alamos National Laboratory: Roadrunner
Hybrid many-core architecture: Cell and AMD
116,640 Cell cores
12,960 AMD cores
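The core counts can be turned back into a chip count. This sketch assumes that every Cell chip in Roadrunner has the 1 PPE + 8 SPE layout described in the recap. It shows that 116,640 Cell cores divide evenly into 12,960 chips.

```python
# Convert Roadrunner's Cell core count back into a chip count.
# Assumption: every Cell chip has the 1 PPE + 8 SPE layout from the recap slide.
CELL_CORES = 116_640
AMD_CORES = 12_960
CORES_PER_CELL_CHIP = 1 + 8

cell_chips, leftover = divmod(CELL_CORES, CORES_PER_CELL_CHIP)
print(cell_chips, leftover)      # 12960 0
print(cell_chips == AMD_CORES)   # True
```

Under that assumption, the system has as many Cell chips as it has AMD cores.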
Roadrunner: 116,640 Cell cores
1st supercomputer to sustain 1 petaflop/s
“The Los Alamos system, nicknamed Roadrunner… fended off a challenge by the Cray XT5 supercomputer at Oak Ridge National Laboratory called Jaguar.
The system, only the second to break the petaflop/s barrier, posted a top performance of 1.059 petaflop/s in running the Linpack benchmark application. One petaflop/s represents one quadrillion floating point operations per second.”
1 petaFLOPS = 10^15 FLOPS
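The deck mixes giga-, tera-, and peta-FLOPS. A small conversion helper makes the SI prefixes explicit. The helper names `to_flops` and `convert` are hypothetical, chosen for illustration only.

```python
# SI prefixes used for FLOPS figures in this deck.
PREFIX = {"giga": 1e9, "tera": 1e12, "peta": 1e15}

def to_flops(value: float, prefix: str) -> float:
    """Convert e.g. (1.059, "peta") to raw floating-point operations per second."""
    return value * PREFIX[prefix]

def convert(value: float, src: str, dst: str) -> float:
    """Re-express a FLOPS figure using a different SI prefix."""
    return to_flops(value, src) / PREFIX[dst]

linpack = 1.059  # petaflop/s, the Linpack figure in the quote
print(f"{convert(linpack, 'peta', 'tera'):.0f} teraFLOPS")
print(f"{to_flops(linpack, 'peta'):.3e} FLOPS")
```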
Sony Cell+RSX Appliance
• RSX graphics coprocessor for hardware OpenGL operations: scaling, CSC (color space conversion), etc.
• 20 GB/s FlexIO interconnect between Cell and RSX
• Low power, 1RU, Cell+RSX on the motherboard
• Example application: proprietary mathematically lossless codec, ideal for DPX WAN transfers
230 GigaFLOPS
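"Mathematically lossless" means the decoder reproduces the input bit for bit. The codec on the slide is proprietary, so this sketch substitutes a generic technique: delta coding followed by zlib's DEFLATE. It only illustrates the round-trip property. The functions `encode` and `decode` and the sample scanline are hypothetical.

```python
import zlib

def encode(samples: list[int]) -> bytes:
    """Delta-code 16-bit samples, then DEFLATE them (stand-in, not the proprietary codec)."""
    prev = 0
    deltas = bytearray()
    for s in samples:
        deltas += ((s - prev) & 0xFFFF).to_bytes(2, "little")  # wrap into 16 bits
        prev = s
    return zlib.compress(bytes(deltas), 9)

def decode(blob: bytes) -> list[int]:
    """Invert encode(): inflate, then undo the delta coding."""
    raw = zlib.decompress(blob)
    samples, prev = [], 0
    for i in range(0, len(raw), 2):
        prev = (prev + int.from_bytes(raw[i:i + 2], "little")) & 0xFFFF
        samples.append(prev)
    return samples

# Hypothetical 10-bit code values (0..1023), e.g. a smooth gradient in one scanline.
line = [(x * 3) % 1024 for x in range(2048)]
blob = encode(line)
assert decode(blob) == line  # lossless: bit-exact reconstruction
print(len(line) * 2, "bytes raw ->", len(blob), "bytes compressed")
```

For WAN transfers the key property is the `assert`: the frame that arrives is identical to the one sent, so the compression ratio comes at no cost to image data.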
New Platform & Potential Apps
Ingest, transcode, processing, etc.
• Fast: 3.7 TeraFLOPS
• Flexible: Cell + Intel
• Small: 3U rack mount
• 8 SPUs per Cell chip, up to 16 Cell chips
• Intel motherboard
• 8x HD-SDI
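The 3.7 TeraFLOPS headline is consistent with the per-chip figure from the recap. This sketch assumes each of the 16 Cell chips contributes the 230 GFLOPS peak and ignores whatever the Intel host CPU adds.

```python
# Assumption: each Cell chip contributes the 230 GFLOPS peak from the recap slide;
# the Intel host CPU is not counted.
GFLOPS_PER_CELL = 230
MAX_CELL_CHIPS = 16

total_gflops = GFLOPS_PER_CELL * MAX_CELL_CHIPS
total_tflops = total_gflops / 1000
print(f"{total_gflops} GFLOPS = {total_tflops:.2f} TFLOPS (~{total_tflops:.1f})")
# 3680 GFLOPS = 3.68 TFLOPS (~3.7)
```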
Core i7: Nehalem & X58 (Tylersburg)
• Dual, quad, or octo-core on one die
• QPI interconnect (no more FSB)
• Integrated memory controller
• IOH replaces MCH functions
• Triple-channel DDR3 memory
• Hyper-Threading
• Built-in power management
• Expanded PCIe support (8 GB/s on x16)
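The "8 GB/s on x16" figure follows from the PCIe 2.0 signalling rate supported by X58. This sketch assumes PCIe 2.0's 5 GT/s per lane and 8b/10b line coding. The result is per direction; a full-duplex x16 link moves that much each way.

```python
# PCIe 2.0 lane: 5 GT/s with 8b/10b line coding (8 data bits per 10 bits sent).
GT_PER_S = 5.0
CODING_EFFICIENCY = 8 / 10
LANES = 16

per_lane_gb_s = GT_PER_S * CODING_EFFICIENCY / 8  # Gbit/s -> GB/s, one direction
x16_gb_s = per_lane_gb_s * LANES
print(f"{per_lane_gb_s:.1f} GB/s per lane, {x16_gb_s:.1f} GB/s on x16 (each direction)")
# 0.5 GB/s per lane, 8.0 GB/s on x16 (each direction)
```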
[Diagram: Legacy w/ FSB vs. New (Nehalem) w/ QPI; blocks labeled Core, Northbridge, RAM, IOH]
For a single-core application…
[Diagram: Legacy w/ FSB vs. New (Nehalem) w/ QPI; blocks labeled Core, Northbridge, RAM, IOH]
Legacy w/ FSB, RAM:
• Far from the cores (poor latency)
• Goes through the Northbridge (low bandwidth)
New (Nehalem) w/ QPI, RAM:
• Next to the cores (better latency)
• Connected directly to the CPU (better bandwidth)
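The bandwidth side of this comparison can be estimated from bus widths and transfer rates. The speeds below are illustrative assumptions, not figures from the slide. They compare a 64-bit FSB at 1333 MT/s, which every core shares through the Northbridge, with triple-channel DDR3-1066 on Nehalem's integrated controller. The sketch covers theoretical peak bandwidth only, not latency.

```python
def peak_gb_s(mega_transfers: float, bus_bytes: int, channels: int = 1) -> float:
    """Theoretical peak = transfers/s x bytes per transfer x channels (decimal GB/s)."""
    return mega_transfers * bus_bytes * channels / 1000

# Illustrative speeds (assumed): a 1333 MT/s, 64-bit FSB shared by every core,
# versus triple-channel DDR3-1066 on Nehalem's integrated memory controller.
legacy_fsb = peak_gb_s(1333, 8)
nehalem_ddr3 = peak_gb_s(1066, 8, channels=3)
print(f"FSB: {legacy_fsb:.1f} GB/s, triple-channel DDR3: {nehalem_ddr3:.1f} GB/s")
# FSB: 10.7 GB/s, triple-channel DDR3: 25.6 GB/s
```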
AMD
Nehalem/QPI is almost the same as AMD's HT (HyperTransport) architecture:
• Wider bandwidth
• Lower latency
This was the reason many people chose AMD Opteron.
[Diagram: Legacy w/ FSB vs. New (Nehalem) w/ QPI; blocks labeled Core, Northbridge, RAM, IOH]
• QPI is much wider than the legacy FSB: >24 GB/s
• Latency inside the CPU is much lower than through the Northbridge
Much better solution
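The ">24 GB/s" figure matches a QPI link's combined bandwidth in both directions. This sketch assumes a 6.4 GT/s link, the top QPI rate of the first Nehalem parts, and 16 data bits (2 bytes) per transfer in each direction.

```python
# Assumed link speed: 6.4 GT/s, the fastest QPI rate of the first Nehalem parts.
GT_PER_S = 6.4
DATA_BYTES_PER_TRANSFER = 2   # 16 data bits per direction per transfer

per_direction = GT_PER_S * DATA_BYTES_PER_TRANSFER   # GB/s
bidirectional = 2 * per_direction
print(f"{per_direction:.1f} GB/s each way, {bidirectional:.1f} GB/s total")
# 12.8 GB/s each way, 25.6 GB/s total
```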
Multi-Port Ingest
[Diagram: VTR 1 to VTR 4, each connected via HD-SDI and RS-422 (9-pin) to its own ingest device, Ingest 1 to Ingest 4]
Multi-Port Ingest
[Diagram: VTR 1 to VTR 4, each connected via HD-SDI and RS-422 (9-pin) to a single multi-ingest system]
Multi-ingest eliminates separate ingest devices.
Future support is planned for 2x ingest via a single 3G (3 Gb/s) HD-SDI connection.
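The planned "2x ingest via a single 3G link" follows from the nominal serial rates. This sketch assumes the SMPTE 292M HD-SDI rate of 1.485 Gb/s and the SMPTE 424M 3G-SDI rate of 2.97 Gb/s. It also shows the aggregate rate of the four VTR ports in the diagram.

```python
HD_SDI_GBPS = 1.485   # SMPTE 292M nominal serial rate
SDI_3G_GBPS = 2.97    # SMPTE 424M nominal serial rate
VTR_PORTS = 4         # VTR 1..4 in the ingest diagram

aggregate = VTR_PORTS * HD_SDI_GBPS
streams_per_3g = SDI_3G_GBPS / HD_SDI_GBPS
print(f"{aggregate:.2f} Gb/s across {VTR_PORTS} ports; "
      f"{streams_per_3g:.0f} HD streams per 3G link")
# 5.94 Gb/s across 4 ports; 2 HD streams per 3G link
```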
Distributed Processing
[Diagram: Tasks distributed across Many-Core Engine #1 to #4; each Many-Core Engine contains 16 Cell processors]
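The diagram shows tasks fanned out to four Many-Core Engines. One common way to do this is a shared work queue that idle engines pull from. This sketch assumes that scheme and uses Python threads to stand in for the engines. `process`, `run_farm`, and the squaring workload are hypothetical placeholders, not the product's scheduler.

```python
import queue
import threading

NUM_ENGINES = 4  # Many-Core Engine #1..#4, as in the diagram

def process(task: int) -> int:
    """Hypothetical placeholder for real work such as transcoding one clip."""
    return task * task

def run_farm(tasks, num_engines: int = NUM_ENGINES) -> dict:
    """Distribute tasks over worker 'engines' that pull from a shared queue."""
    work = queue.Queue()
    for t in tasks:
        work.put(t)          # fill the queue before any engine starts

    results = {}
    lock = threading.Lock()

    def engine(engine_id: int) -> None:
        while True:
            try:
                task = work.get_nowait()
            except queue.Empty:
                return       # queue drained: this engine is done
            value = process(task)
            with lock:
                results[task] = (engine_id, value)

    threads = [threading.Thread(target=engine, args=(i + 1,)) for i in range(num_engines)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results

results = run_farm(range(20))
for task in sorted(results)[:3]:
    engine_id, value = results[task]
    print(f"task {task} -> engine #{engine_id}: {value}")
```

A pull-based queue balances load automatically when tasks take different amounts of time, because a fast engine simply takes the next task sooner. The same idea could be applied again inside each engine to spread work across its 16 Cells.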
Hybrid ‘Many-Core’ Architecture: A Compelling Future
• Best of both worlds: Cell + Intel
• Latest Nehalem / X58 design
• Dramatic scalability and flexibility
• Open to new technologies
• Ultimate in throughput and performance
• Attention to power and efficiency concerns