ARM + DSP Supercomputer Modular HPEC Architectures - nCore … · 2017. 8. 14. · Y-Class AMC Node...
ARM + DSP Supercomputer Modular HPEC Architectures
nCore
System Overview
The nCore BrownDwarf Y-Class system unifies COTS technologies, high-performance SoCs, advanced low-latency interconnects, and optimized software to create a supercomputer delivering exceptional performance, reliability, power telemetry, reconfigurability, and programmability at significantly reduced power levels.
More information can be found here: http://ncorehpc.com/browndwarf/
The modular architecture lends itself well to the design, development, and deployment of HPEC systems for military and aerospace applications, medical imaging, biomedical and genomic research, oil and gas exploration, physics simulations, and power vs. performance research.
Y-Class AMC Block Diagram
• TI 66AK2H12 “Keystone 2”
• 4 x ARM A15 @ 1.4GHz
• 24 x C66 DSP @ 1.2GHz
• 51.2GB/s Total Memory Bandwidth
• 26GB ECC Memory
• 2TB/s Internal Bus
• 100Gb/s Hyperlink
• 20Gb/s SRIO Compute Fabric
• 10Gb Ethernet System Fabric
• 3 x 1Gb Ethernet OBM Fabric
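The aggregate figures in this list can be cross-checked against the per-link labels on the block diagram. The sketch below is only an illustration: the constant and function names are hypothetical, and it assumes the diagram shows four 12.8GB/s memory links, three 8GB banks plus one 2GB bank, two 50Gb/s Hyperlinks, and one 4 x 5Gbaud SRIO port.

```python
# Per-link figures as labeled on the Y-Class AMC block diagram.
# All names here are illustrative, not part of any nCore or TI API.
DDR3_LINK_GBYTES_PER_S = 12.8   # GB/s per DDR3 memory link
DDR3_BANKS_GB = [8, 8, 8, 2]    # three 8GB DDR3-1600 banks plus one 2GB bank
HYPERLINK_GBITS_PER_S = 50      # Gb/s per Hyperlink
HYPERLINK_COUNT = 2
SRIO_LANES = 4
SRIO_LANE_GBAUD = 5             # 4 x 5Gbaud SRIO port


def node_totals():
    """Return the aggregate node figures derived from the per-link labels."""
    return {
        "memory_bandwidth_GBps": round(DDR3_LINK_GBYTES_PER_S * len(DDR3_BANKS_GB), 1),
        "ecc_memory_GB": sum(DDR3_BANKS_GB),
        "hyperlink_Gbps": HYPERLINK_GBITS_PER_S * HYPERLINK_COUNT,
        # Raw line rate (lanes x baud), which is how the list quotes SRIO.
        "srio_Gbps": SRIO_LANES * SRIO_LANE_GBAUD,
    }


if __name__ == "__main__":
    for name, value in node_totals().items():
        print(f"{name}: {value}")
```

Under these assumptions the totals come to 51.2GB/s, 26GB, 100Gb/s, and 20Gb/s, which agree with the specification list.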
[Block diagram: four ARM A15 cores (4M L2) and 24 C66 DSP cores (each labeled 1M/32/32), attached to three MSMC blocks with 4M, 6M, and 4M shared caches. Memory: 3 x 8GB DDR3-1600 and 1 x 2GB DDR3, on 12.8GB/s links. Interconnect: 2TB/s internal buses and two 50Gb/s Hyperlinks. External fabrics: 20Gb SRIO (4 x 5Gbaud) for the Compute Fabric - MPI, 10GbE (10Gb/s) for the System Fabric, and 1GbE (1Gb/s) for the OBM Fabric.]
BrownDwarf Y-Class System Cabinet
[Figure: Y-Class AMC Node x 4 + Carrier Blade x 12 = system cabinet. Labeled components: Y-Class AMC Node, Carrier Blade (2 TFLOPS SP / 2 TOPS Integer, 104GB ECC Memory), Switch Blade.]
Applications can use a single AMC node or scale to hundreds of nodes while interfacing with any ATCA or uTCA component
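The scaling from node to blade to chassis is plain multiplication, and a short sketch makes it easy to size other configurations. The per-node figures come from the Y-Class AMC specification list; reading "x 4" as four AMC nodes per carrier blade is an assumption, and the function names are hypothetical.

```python
# Node figures from the Y-Class AMC specification list.
NODE_ECC_GB = 26
NODE_DSP_CORES = 24
NODE_ARM_CORES = 4
NODES_PER_BLADE = 4   # ASSUMPTION: "x 4" means four AMC nodes per carrier blade


def blade_totals(nodes_per_blade=NODES_PER_BLADE):
    """Aggregate memory and core counts for one carrier blade."""
    return {
        "ecc_memory_GB": nodes_per_blade * NODE_ECC_GB,
        "dsp_cores": nodes_per_blade * NODE_DSP_CORES,
        "arm_cores": nodes_per_blade * NODE_ARM_CORES,
    }


def system_totals(blades, nodes_per_blade=NODES_PER_BLADE):
    """Scale the per-blade totals to a chassis with the given blade count."""
    per_blade = blade_totals(nodes_per_blade)
    return {name: blades * value for name, value in per_blade.items()}


if __name__ == "__main__":
    print(blade_totals())     # one carrier blade
    print(system_totals(6))   # six-blade chassis
```

With these assumptions one blade comes to 104GB of ECC memory, matching the carrier blade label, and six blades come to 624GB and 576 DSP cores, matching the totals quoted on the six-blade architecture slides.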
Military/Aerospace/Data Acquisition Architecture
[Architecture diagram. Elements: sensor inputs (video, radar, sFPDP, 1553, etc.), eight A/D + FPGA front ends, six BrownDwarf Y-Class blades, an SRIO Switch (120Gbps SRIO), a 10GbE Switch, and a Storage Blade (SAS backplane, 6TB disk, SSD). Link legend: SRIO - 80Gbps, 10GbE, 1GbE.]
System totals: 11 TFLOPS, 624GB ECC, 1.2TB/s MBW, 380Gbps SRIO, 576 DSP Cores
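The 11 TFLOPS and 1.2TB/s totals can be roughly reproduced from the node specification. This is a sketch under stated assumptions: the deck does not say how the peak was derived, the figure of 16 single-precision FLOPs per cycle per C66x core is an assumption brought in from outside the deck, and the node count assumes six blades of four nodes each.

```python
DSP_CORES = 576              # from the slide
DSP_CLOCK_GHZ = 1.2          # from the node specification list
SP_FLOPS_PER_CYCLE = 16      # ASSUMPTION: per-core figure, not stated in the deck
NODE_MEM_BW_GBPS = 51.2      # GB/s per node, from the node specification list
NODES = 6 * 4                # ASSUMPTION: six blades x four nodes per blade


def peak_sp_tflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak: cores x clock (GHz) x FLOPs per cycle, in TFLOPS."""
    return cores * clock_ghz * flops_per_cycle / 1000.0


def aggregate_mem_bw_tbps(nodes, node_bw_gbps):
    """Sum of per-node memory bandwidth, in TB/s."""
    return nodes * node_bw_gbps / 1000.0


if __name__ == "__main__":
    print(f"peak SP: {peak_sp_tflops(DSP_CORES, DSP_CLOCK_GHZ, SP_FLOPS_PER_CYCLE):.2f} TFLOPS")
    print(f"memory bandwidth: {aggregate_mem_bw_tbps(NODES, NODE_MEM_BW_GBPS):.2f} TB/s")
```

These are theoretical peaks from the DSP cores alone; sustained throughput depends on the workload and is not something the slides quantify.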
DPI and Big Data Architecture
[Architecture diagram. Elements: two 40GbE Packet Processor Blades (2 x Cavium Octeon II CN6880), six BrownDwarf Y-Class blades, two Storage Blades (SAS backplane, 6TB disk, SSD), an SRIO Switch (120Gb SRIO), and a 40GbE Switch with n x 10GbE and 10GbE span/inline/tap ports. Link legend: SRIO - 80Gbps, 10GbE, 1GbE.]
System totals: 11 TFLOPS, 624GB ECC, 1.2TB/s MBW, 480Gbps SRIO
H.264 Transcoding Architecture - 6 Slot Variant
H.264 BP
Hardware              8 x C66 DSP Cores       1 x BrownDwarf Blade    3 x BrownDwarf Blade
Resolution            Encoding    Decoding    Encoding    Decoding    Encoding    Decoding
CIF/30                48          104         576         1248        1728        3744
D1/30                 12          24          144         288         432         864
720p30                4           8           48          96          144         288
720p60 or 1080p30     2           4           24          48          72          144
1080p60               1           2           12          24          36          72

H.264 HP
Hardware              8 x C66 DSP Cores       1 x BrownDwarf Blade    3 x BrownDwarf Blade
Resolution            Encoding    Decoding    Encoding    Decoding    Encoding    Decoding
D1/30                 4           8           48          96          144         288
720p60 or 1080p30     1           2           12          24          36          72
1080p60               0.5         1           6           12          18          36
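The H.264 blade columns appear to be the 8-core figures multiplied by the number of 8-core DSP groups available. The sketch below reproduces that scaling. Treating a blade as 12 such groups (4 nodes x 24 cores / 8) is an inference from the numbers rather than something the slide states, and the helper name is hypothetical.

```python
GROUPS_PER_BLADE = 12   # ASSUMPTION: 4 nodes x 24 C66 cores = 96 cores = 12 groups of 8

# H.264 BP figures per 8 x C66 DSP cores: resolution -> (encoding, decoding)
H264_BP_PER_8_CORES = {
    "CIF/30": (48, 104),
    "D1/30": (12, 24),
    "720p30": (4, 8),
    "720p60 or 1080p30": (2, 4),
    "1080p60": (1, 2),
}


def scale_channels(per_group, blades=1, groups_per_blade=GROUPS_PER_BLADE):
    """Scale (encoding, decoding) figures from one 8-core group to whole blades."""
    encoding, decoding = per_group
    factor = blades * groups_per_blade
    return encoding * factor, decoding * factor


if __name__ == "__main__":
    for resolution, per_group in H264_BP_PER_8_CORES.items():
        print(resolution, scale_channels(per_group, blades=1), scale_channels(per_group, blades=3))
```

The same helper applies to the H.264 HP rows, for example `scale_channels((0.5, 1))` for 1080p60 on one blade.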
[Architecture diagram. Elements: three BrownDwarf Y-Class blades, a Storage Blade (SAS backplane, 6TB disk, SSD), an SRIO Switch (120Gb SRIO), and a 10/40GbE Switch with n x 10GbE ports. Link legend: SRIO - 80Gbps, 10GbE, 1GbE.]
System totals: 4.6 TFLOPS, 312GB ECC, 1.2TB/s MBW, 240Gbps SRIO
H.265 Transcoding Architecture - 6 Slot Variant
H.265 (Standard Quality)
Hardware              8 x C66 DSP Cores       1 x BrownDwarf Blade    3 x BrownDwarf Blade
Resolution            Encoding    Decoding    Encoding    Decoding    Encoding    Decoding
720p30                1           0.25        12          48          36          144
1080p30               2           0.5         6           24          18          72
1080p60               4           1           3           12          9           36

H.265 (High Quality)
Hardware              8 x C66 DSP Cores       1 x BrownDwarf Blade    3 x BrownDwarf Blade
Resolution            Encoding    Decoding    Encoding    Decoding    Encoding    Decoding
720p30                2           0.25        6           48          18          144
1080p30               4           0.5         3           24          9           72
4kp30                 16          2           0.75        6           2.25        18
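The H.265 figures seem to follow a different convention from the H.264 ones: the two "8 x C66 DSP Cores" columns look like the number of 8-core groups needed per channel, and the blade columns like the resulting channel counts. That reading is an inference from the numbers, not something the slide states. A minimal sketch, with a hypothetical helper name and the assumption of 12 groups per blade:

```python
def h265_channels(groups_per_channel, blades=1, groups_per_blade=12):
    """Channels that fit on the given blades when each channel needs
    `groups_per_channel` groups of 8 C66 DSP cores.

    ASSUMPTION: a blade provides 12 such groups (4 nodes x 24 cores / 8).
    """
    return blades * groups_per_blade / groups_per_channel


if __name__ == "__main__":
    # High Quality 4kp30: 16 groups per encode channel, 2 per decode channel.
    print(h265_channels(16), h265_channels(2))                      # one blade
    print(h265_channels(16, blades=3), h265_channels(2, blades=3))  # three blades
```

Under this reading the helper reproduces the blade columns of both H.265 tables, including the fractional entries such as 0.75 and 2.25.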
[Architecture diagram. Elements: three BrownDwarf Y-Class blades, a Storage Blade (SAS backplane, 6TB disk, SSD), an SRIO Switch (120Gb SRIO), and a 10/40GbE Switch with n x 10GbE ports. Link legend: SRIO - 80Gbps, 10GbE, 1GbE.]
System totals: 4.6 TFLOPS, 312GB ECC, 1.2TB/s MBW, 240Gbps SRIO
High Performance Computing Architecture
[Architecture diagram. Elements: multiple chassis of BrownDwarf Y-Class blades on backplanes, SRIO/1GbE Switches with 120Gbps SRIO links, two Storage Blades, and an SRIO Rack Switch. Link legend: SRIO - 80Gbps, 1GbE.]
System totals: 65 TFLOPS, 3.3TB ECC, 9.7kW / 110V
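Since the deck stresses performance at reduced power, the rack-level figures can be turned into a rough efficiency number. This sketch assumes that the 9.7kW label covers the whole 65 TFLOPS configuration and that 65 TFLOPS is a theoretical peak; neither is confirmed by the slide, and the function name is hypothetical.

```python
def gflops_per_watt(tflops, kilowatts):
    """Convert a (TFLOPS, kW) pair into GFLOPS per watt."""
    gflops = tflops * 1000.0
    watts = kilowatts * 1000.0
    return gflops / watts


if __name__ == "__main__":
    # Figures from the High Performance Computing Architecture slide.
    print(f"{gflops_per_watt(65, 9.7):.1f} GFLOPS/W")
```

The result is only as meaningful as the two inputs; a measured figure would need sustained performance and wall power for the same workload.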
nCore Lithium Suite
• nCore Lithium Suite is the fastest way to performance and productivity on TI Keystone II and BrownDwarf
• Ubuntu ARM HPC-centric server distribution enables access to 6.5k Linux packages
• Native development environment on Keystone II for ARM & DSP using optimizing compilers
• Offload computations to C66x DSP cores using OpenMP 4.0 with accelerator model and OpenCL
• OpenMPI over SRIO, optimized IPP replacement library for C66x, advanced DMA library for C66x
• Performance optimization tool layers (PAPI for A15 and C66x DSP), BLAS (ATLAS)
• Industry-leading commercial support
Currently Supported Platforms:
- nCore BrownDwarf YCNODE
- nCore BrownDwarf MBLADE
- TI’s XTCIEVMK2X EVM
- Others to follow
Li-HPC
nCore is the worldwide leader in TI Keystone software technologies
[Software stack diagram. Software layers: Application, Accelerated Code, Numerical Libraries, Linux OpenMP/OpenMPI, OpenMP Acc Model/OpenCL Runtime, Code Dispatch. Hardware: four ARM Cortex A15 cores, eight C66x DSP cores (each with L1 and L2), MSMC + Shared Memory, TeraNet, SRIO 20Gbps (4 x 5Gbps Gen 2.1 RapidIO), 2 x DDR3 72-bit (2GB, 8GB).]