AltiumLive 2017: PCBs for Computing Density From Big Bang to the Automobile
Andreas Doering, IBM Research – Zurich Laboratory
Agenda
• Motivation for Microservers
• The DOME project
• Boards
• Insights
• Outlook
* IDC HPC technology excellence award, ISC17
DOME PPP (public-private partnership): ASTRON, IBM, Dutch government
Ronald P. Luijten / July 2017
SKA (Square Kilometer Array) to measure Big Bang
Picture source: NZZ march 2014
Timeline (time after the Big Bang):
0: Big Bang
10^-32 s: Inflation
10^-6 s: Protons created
0.01 s: Start of nucleosynthesis through fusion
3 min: End of nucleosynthesis
380’000 years
13.8 billion years: Modern Universe
SKA: What is it?
Top 500: Sum = 123 PFlops. 2 GFlops/watt. 100x Flops of Sum! ~7 GWh
~3000 dishes, 3 GHz-10 GHz.
~0.5M antennae, 0.5 GHz-1.7 GHz.
~0.5M antennae, 0.07 GHz-0.45 GHz.
1. 10^9 samples/second * 0.5M antennae: 0.5 * 10^15 samples/sec.
2. 3.5 * 10^9 samples/second * 0.5M antennae: 1.7 * 10^15 samples/sec.
3. 2 * 10^10 samples/second * 3K antennae: 6 * 10^13 samples/sec.
Sum = 2 * 10^15 samples/second @ 86400 seconds/day:
170 * 10^18 (Exa) samples/day. Assume 10-12x reduction @antenna:
14 Exabytes/day (minimum).
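The estimate above can be reproduced in a few lines of Python. This is a minimal sketch that uses only the slide's rounded figures. The dictionary keys and variable names are illustrative, and treating each sample that survives the reduction as one byte is an assumption implied by the slide's step from samples to bytes.

```python
SECONDS_PER_DAY = 86_400

# (samples per second per antenna, number of antennae) for the three
# numbered items on the slide; the keys are illustrative labels only.
arrays = {
    "item1": (1e9, 0.5e6),
    "item2": (3.5e9, 0.5e6),
    "item3": (2e10, 3e3),
}

# Aggregate sample rate per item and in total.
rates = {name: sps * count for name, (sps, count) in arrays.items()}
total_rate = sum(rates.values())  # about 2.3e15; the slide rounds to 2e15

# Continue with the slide's rounded sum.
rounded_rate = 2e15
samples_per_day = rounded_rate * SECONDS_PER_DAY  # about 1.7e20 samples/day

# Assumption: 12x reduction at the antenna, one byte per remaining sample.
reduction = 12
exabytes_per_day = samples_per_day / reduction / 1e18

print(f"total rate: {total_rate:.2e} samples/s")
print(f"data volume: {exabytes_per_day:.1f} EB/day")
```

With a 10x instead of a 12x reduction the same calculation gives roughly 17 EB/day, which is why the slide labels 14 EB/day as a minimum.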
© 2016 IBM Corporation
~ 10 Pb/s
86’400 sec/day
14 ExaByte/day
?
~ 1 PB/Day.
330 disks/day
120’000 disks/yr?
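The disk counts on this slide follow from simple division. The sketch below assumes that the 330 disks/day figure comes from writing about 1 PB/day to disks of roughly 3 TB each; the disk capacity is not stated on the slide and is derived here only as an implied value.

```python
PETABYTE = 1e15
TERABYTE = 1e12

daily_volume_bytes = 1 * PETABYTE  # ~1 PB/day, from the slide
disks_per_day = 330                # from the slide

# Implied capacity per disk (an inference, not a slide value).
implied_disk_capacity_tb = daily_volume_bytes / disks_per_day / TERABYTE

# Yearly disk consumption at that rate.
disks_per_year = disks_per_day * 365

print(f"implied capacity: {implied_disk_capacity_tb:.2f} TB per disk")
print(f"disks per year: {disks_per_year}")
```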
Top-500 Supercomputing (11/2013): 0.3 Watt/Gflop/s
Today’s industry focus is 1 Eflop @ 20 MW (2018) (0.02 Watt/Gflop/s)
Most recent data from SKA:
CSP: max. power 7.5 MW
SDP: max. power 1 MW
Latest need for SKA: 4 Exaflop (SKA1-Mid), 1.2 GW … 80 MW
Too easy (for us)
Too hard
Moore’s law
Factor 80-1200
SDP
CSP
multiple breakthroughs needed
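The power figures on this slide appear to be linked by straightforward arithmetic. The sketch below is one reading of the numbers, not a statement from the talk: it assumes the 1.2 GW and 80 MW values are the 4 Exaflop need evaluated at the 2013 Top-500 efficiency and at the 1 Eflop @ 20 MW target, and that the factor 80-1200 is the ratio of those values to the 1 MW SDP budget.

```python
def power_watts(flops: float, watt_per_gflops: float) -> float:
    """Power needed to sustain `flops` at a given efficiency (W per Gflop/s)."""
    return flops / 1e9 * watt_per_gflops

need_flops = 4e18                # 4 Exaflop (SKA1-Mid), from the slide

eff_2013 = 0.3                   # W per Gflop/s, Top-500 (11/2013)
eff_target = 20e6 / (1e18 / 1e9) # 1 Eflop @ 20 MW gives 0.02 W per Gflop/s

p_2013 = power_watts(need_flops, eff_2013)      # about 1.2 GW
p_target = power_watts(need_flops, eff_target)  # about 80 MW

sdp_budget_watts = 1e6           # SDP max. power 1 MW, from the slide
gap_low = p_target / sdp_budget_watts
gap_high = p_2013 / sdp_budget_watts

print(f"{p_2013 / 1e9:.1f} GW ... {p_target / 1e6:.0f} MW")
print(f"gap factor: {gap_low:.0f} to {gap_high:.0f}")
```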
Dome Project:
Research Streams…
System Analysis
Data & Streaming
Sustainable (Green) Computing
Nanophotonics
Algorithms & Machines
Computing, Transport, Storage
…are mapped to research projects:
- Nanophotonics
- Real-Time Communications
- New Algorithms
- Microservers
- Accelerators
- Access Patterns
…plus an open user platform:
- Student projects
- Events
- Research Collaboration
33M€ 5-year Research Project: 76 IBM PY (32 in NL); 50 ASTRON PY
Definitions
• “Microserver” = The server class of the mobile era
• “Microserver” = SoC + DRAM + Flash + Power
• “Microserver” = Backplane + not-enclosed modules
Motivation
• Silicon scaling limits; energy for computation vs. on-chip communication vs. off-chip communication
• Large SMP servers are used via partitioning, Docker, etc.: cache coherency is not fully used
• Emergence of powerful embedded processor cores, in particular ARM
• Premise given through the Aquasar cooling work enabled DOME funding
Table of PCBs
A = Altium Designer, C = Cadence
Module Name | Iterations | Length [mm] | Width [mm] | Thickness [mm] | Layers | Holes | Components | Nets | Backdrilling | Material | Tool
P5020/P5040 processor | 3 | 139.7 | 55.5 | 1.28 | 10 | 3242 | 1007 | 539 | no | ISOLA-400 | A
Big Baseboard | 1 | 220 | 160 | 1.28 | 10 | 491 | 175 | 154 | no | ISOLA-400 | A
Power Converter | 2 | 139 | 56.5 | 1.63 | 8 | 737 | 440 | 231 | no | FR-4 | A
mSATA on DIMM | 2 | 139.7 | 55.5 | 1.24 | 4 | 341 | 69 | 67 | no | FR-4 | A
8p1 backplane | 2 | 300 | 200 | 2.7 | 18 | 3582 | 565 | 1326 | no | FR-4+ | A
Testboard for switch power converter | 1 | 160 | 220 | 1.6 | 8 | 888 | 259 | 134 | no | FR-4 | A
Switch Mothercard | >1 | 139.7 | 57.8 | 3.6 | 28 | 3311 | 837 | 730 | yes | - | A
Switch Daughtercard | 1 | 139.7 | 57.8 | 1.8 | 10 | 423 | 213 | 160 | no | FR-4 | A
Mini baseboard | 2 | 160 | 100 | 1.2 | 6 | 851 | 376 | 241 | no | FR-4 | A
Bracket for DIMM connector on Minibaseboard | 2 | 154 | 32 | 1.2 | 6 | 116 | 2 | 98 | no | FR-4 | A
Bracket for SPD08 connector on Minibaseboard | 1 | - | - | - | - | - | - | - | - | - | C
T4240 processor | 3+1 | 139.7 | 63 | 1.6 | 16 | - | 1316 | 820 | - | Panasonic | C
mSATA on SPD08 | 3 | 139.7 | 62.5 | 1.6 | 6 | 1014 | 79 | 105 | no | FR-4 | A
M2 carrier | 2 | 139.7 | 61.6 | 1.6 | 6 | 1091 | 130 | 131 | - | - | A
Auxiliary power converter | 1 | 61 | 56 | 1.6 | 4 | 478 | 74 | 30 | no | FR-4 | A
PCIe Extender | - | - | - | - | - | - | - | - | no | FR-4 | A
LS2088 Processor module | 1? | 139.7 | 62.5 | 1.6 | 14 | - | 1037 | 714 | no | Panasonic R1577/1570 | C
USB HUB Module | 2 | 139.7 | 61.5 | 1.57 | 8 | 1162 | 557 | 387 | no | FR-4 | A
BB2 backplane | 2 | 520 | 200 | 3.15 | 22 | 12598 | 1076 | 3820 | 7 Runs | Panasonic Megtron 6 | A
Interposer card | 1 | 139.7 | 80 | 1.57 | 8 | 897 | 76 | 132 | 4 Runs | FR-4, Panasonic Megtron 6N | A
FMKU2595 FPGA | 2 | 139.7 | 63 | 1.57 | 14 | 7442 | 881 | 914 | no | Panasonic Megtron 6 | A
System Overview
8/32/128 compute nodes
10G Ethernet switch
Storage node
Power converter
P5020/P5040: 2/4 cores, 16 GByte DDR3, 2x XAUI, 4x 1GbE, 2x SATA v1
T4240: 24 cores, 24 GByte DDR3, 4x 10GbE, 2x 1Gb, 2x SATA v2, PCIe 2.0 x8
LS2088: 8x ARMv8 @ 2 GHz, 32 GByte DDR4, 6x 10GbE, PCIe, 2x SATA
FMKU2595 FPGA: 330K LUTs, 4x 10GbE, 4x GbE, 2x SATA
8 x mSATA or 2 x M.2
8 x 40 Gb Ethernet
Backplane connectors
• DIMM socket with removed latches for generation 1
• 3M’s SPD08 in various lengths for generation 2
• Xtreme Poweredge for power converter (both)
• 3 segments of Molex Impact, 210 contacts (70 diff pairs)
System today
Backplane for • 32 compute nodes,
• 8 populated
• 1 Switch node,
• 1 Management node
• 2 Storage nodes
• Water cooled
View from above
Server nodes
Power node
Storage node
10 GbE Switch
QSFP cages
Water In/Out
Cooling Rails
System Q4 2017
Two backplanes, total 64 compute nodes, e.g.
• 1536 cores
• 1536 GB DRAM
• 64 SSDs
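The example totals are consistent with every slot holding a T4240 module. The sketch below assumes 32 compute nodes per backplane, 24 cores and 24 GByte per T4240 node (the values listed in the System Overview), and one SSD per compute node; the one-SSD-per-node ratio is an assumption, since the slide gives only the total.

```python
backplanes = 2
nodes_per_backplane = 32   # 32-way carrier
cores_per_node = 24        # T4240, from the System Overview
dram_gb_per_node = 24      # T4240, from the System Overview
ssds_per_node = 1          # assumption: one SSD per compute node

total_nodes = backplanes * nodes_per_backplane
total_cores = total_nodes * cores_per_node
total_dram_gb = total_nodes * dram_gb_per_node
total_ssds = total_nodes * ssds_per_node

print(total_nodes, total_cores, total_dram_gb, total_ssds)
```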
Gallery of (some) Boards
Power Converter
• Master thesis project:
• Student did high-level design (e.g. selection of backplane connector), component selection, and schematic entry. Layout was completed by a regular engineer. First version worked.
• 1 iteration to improve stability, protection
Challenges: high current on top/bottom and SMD packages, location of connectors, tight IC/L/C converter triangle, and the conflict between high-profile inductors (L) and hot ICs that must be covered by the cool plate
40 A per contact finger, allowing different types of C/L
Switch Module
Left: Main switch PCB, 130 mm x 55 mm
Right: Switch with mounted daughtercard
Pin Assignment
• Pin assignment has to suit both the backplane and the switch module design
• Both are challenging (the backplane has more space, but many more wires)
• Reduce crossings on both boards
• XAUI has low requirements on length balancing
• 1st iteration:
• Let the CAD tool choose the pinout on both boards independently
• Find out the critical spots
• Use a Python script to build a systematic pinout that circumvents these
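The script itself is not shown in the talk. As a rough illustration of what a systematic pinout can mean, the sketch below uses a deliberately simplified, hypothetical one-dimensional model: each link has a position on the backplane side and a pin position on the switch side, two links cross when their order differs on the two sides, and critical spots are modelled as blocked pin positions. All function names and data are invented for this example.

```python
from itertools import combinations

def count_crossings(assignment):
    """Count crossing pairs in a list of (backplane_pos, switch_pin_pos).

    Two links cross when their left-to-right order is different on the
    two sides (an inversion).
    """
    return sum(
        1
        for (a1, b1), (a2, b2) in combinations(assignment, 2)
        if (a1 - a2) * (b1 - b2) < 0
    )

def systematic_pinout(node_slots, switch_pins, blocked=()):
    """Pair node slots with switch pins in the same order on both sides,
    skipping pins flagged as critical spots."""
    blocked = set(blocked)
    usable = [p for p in sorted(switch_pins) if p not in blocked]
    slots = sorted(node_slots)
    if len(usable) < len(slots):
        raise ValueError("not enough usable pins for all node slots")
    return list(zip(slots, usable))

# A pinout chosen independently on both boards may cross heavily ...
independent = [(0, 3), (1, 0), (2, 2), (3, 1)]
print(count_crossings(independent))

# ... while an order-preserving pinout avoids crossings and the blocked pin.
systematic = systematic_pinout([0, 1, 2, 3], range(6), blocked={2})
print(systematic, count_crossings(systematic))
```

A real backplane is two-dimensional and multi-layer, so a production script would need a richer geometric model; the order-preserving idea is only the simplest case.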
PCB Layer Stack
6 inner signal layers, impedance controlled, with shielding ground layers in between
4 high-current power supply lanes
Total PCB thickness: 3.6 mm
Length of connector pins: 1.2 mm
Original assumption, that board space across “through-hole” connector cannot be used, was wrong.
Need backdrilling
Press-fit connector on this side
ASIC on this side
PCB routing
This narrow strip (1 cm wide) is one critical part.
Routing between connector pins with 1 signal pair
FPGA Node
PCI- and/or Network-Attached
2 Channels DDR4 (e.g. 16 GByte)
Xilinx® Kintex® UltraScale
6 x 10 GBE, PCIe3 x8, 2 x SATA3
Status: In bringup
FPGA Node – Layout Concept
Flyby control signals on 3 layers, P2P data signals mainly on 1 layer
High-speed IO on 2 inner layers
Cooling
Combination of passive cooling on decapped chip, using vapor chambers and hot-water
Insights
• Main source of error: transfer from data sheet into tool
• Second source of error: Harness interface (swapping P/N on diff pairs, clock/data on I2C)
• Third source of error: voltage levels of pins (e.g. enable of power converter)
• Why is there no electronic transfer of component data to designers?
Exception: TI (e.g. https://webench.ti.com/cad/)
Why is there no standard format? There was an initiative XMLEDA, etc.
• DRC could do more, if symbols provided the information (e.g. P/N property, clock, etc.)
• Conversion from one tool to another is a nightmare
Hired Elgris, and still 5 working days turned into 2 months
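To illustrate the point about DRC and symbol properties: if every symbol pin carried a machine-readable role, a simple rule could flag swapped P/N pairs or exchanged I2C clock/data lines. The sketch below is hypothetical; the `Pin` class, the `role` property, and the role names are invented for this example and do not correspond to any existing tool's data model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pin:
    component: str
    name: str
    role: str  # hypothetical property: "diff_p", "diff_n", "i2c_scl", "i2c_sda"

def check_net_roles(nets):
    """Return a message for every net whose pins carry conflicting roles.

    `nets` maps a net name to the list of pins connected to it.
    """
    violations = []
    for net, pins in nets.items():
        roles = {pin.role for pin in pins}
        if len(roles) > 1:
            violations.append(f"{net}: mixed roles {sorted(roles)}")
    return violations

nets = {
    # P pin of the CPU wired to an N pin of the connector: should be flagged.
    "XAUI0_TX_P": [Pin("CPU", "TX0_P", "diff_p"), Pin("J1", "A1", "diff_n")],
    # Clock to clock: fine.
    "I2C_SCL": [Pin("CPU", "SCL", "i2c_scl"), Pin("PSOC", "SCL", "i2c_scl")],
}
violations = check_net_roles(nets)
print(violations)
```

Some links tolerate an intentional polarity swap, so a practical rule would need a way to waive individual nets.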
Acknowledgements
This work is the result of many people:
• Ronald Luijten (Lead Architect/Technical Lead), Francois Abel (Switch, FPGA, and BB2 lead), Beat Weiss (Core Engineering), Matteo Cossale (Cooling), Stephan Paredes (Cooling), and others: IBM ZRL/CH
• Peter v. Ackeren, Ed Swarthout, Dac Pham: Freescale/NXP
• Yvonne Chan, IBM Toronto
• Gijs Schonderbeek, Sieds Damstra, Albert-Jan Boobstra: ASTRON/NL
• Several students and interns
• And many more remain unnamed….
Companies: NXP; IBM; TransferDSW/NL, Strukton/NL, Roneda/BE, AT&S/AT, Supercomputing Systems/CH, Miromico/CH
Dutch government for DOME grant
Outlook
• Still work to be done: HW testing, SW, redesign of some boards for bugs or low production yield, cost reduction of some components
• Commercially available through startup ILA Microservers
• First customer bought 15 T4240 modules
• Buildup of two systems for ASTRON and ZRL (with enclosure, etc.)
• GPU node
• Target markets:
• Data center
• Scientific computing (SKA)
• Embedded (vehicles, robots, IoT Edge server)
Backup
System Management
• Every node is a USB device
• Cypress PSoC controller implements module-level management
• Serial console
• Power Sequencing
• Current and Temperature Monitoring
• JTAG
• etc.
• Python process on host allows access of all hosts
• Implements IPMI
• Interacts with Switch, FPGA tools, etc.
QorIQ T4240 Communication Processor
32-way carrier network topology
T4240module
32 way carrier
FM6000 switch
32 x 10 GbE internal connectivity from switch
8 x 40 GbE external connectivity (QSFP+)
Green links optionally connect to other 32-way carrier
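The internal and external port counts balance exactly. The short check below compares raw line rates only; it ignores encoding and protocol overhead and assumes the external links carry no traffic other than that of the 32 nodes.

```python
internal_ports, internal_gbps_each = 32, 10   # node-facing 10 GbE links
external_ports, external_gbps_each = 8, 40    # QSFP+ 40 GbE uplinks

internal_gbps = internal_ports * internal_gbps_each
external_gbps = external_ports * external_gbps_each

# Ratio of node-facing to external bandwidth; 1.0 means no oversubscription.
oversubscription = internal_gbps / external_gbps

print(internal_gbps, external_gbps, oversubscription)
```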
Thanks for your attention! Questions?