ANTON: D. E. Shaw Research
Force Fields: Typical Energy Functions
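$$
U = \sum_{\text{bonds}} \frac{1}{2} k_r (r - r_0)^2
  + \sum_{\text{angles}} \frac{1}{2} k_\theta (\theta - \theta_0)^2
  + \sum_{\text{torsions}} \frac{V_n}{2} \left[ 1 + \cos(n\phi - \delta) \right]
  + \sum_{\text{improper}} V(\text{improper torsion})
  + \sum_{\text{elec}} \frac{q_i q_j}{r_{ij}}
  + \sum_{\text{LJ}} \left[ \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right]
$$
The terms, in order:
• Bond stretches
• Angle bending
• Torsional rotation
• Improper torsion (sp2)
• Electrostatic interaction
• Lennard-Jones interaction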
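The individual terms of the energy function can be sketched in a few lines of Python. This is only an illustration: the function and parameter names are hypothetical, the improper-torsion term is left out because the slide does not give its form, and the electrostatic term assumes units in which the Coulomb constant is 1.

```python
import math

# Illustrative sketch of the per-term energy functions.
# All names here are hypothetical; they are not taken from any MD package.

def bond_energy(k_r, r, r0):
    """Harmonic bond stretch: 1/2 * k_r * (r - r0)^2."""
    return 0.5 * k_r * (r - r0) ** 2

def angle_energy(k_theta, theta, theta0):
    """Harmonic angle bend: 1/2 * k_theta * (theta - theta0)^2 (radians)."""
    return 0.5 * k_theta * (theta - theta0) ** 2

def torsion_energy(v_n, n, phi, delta):
    """Torsional rotation: V_n/2 * [1 + cos(n*phi - delta)]."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - delta))

def coulomb_energy(q_i, q_j, r_ij):
    """Electrostatic pair term q_i*q_j / r_ij (assumes Coulomb constant = 1)."""
    return q_i * q_j / r_ij

def lennard_jones_energy(a_ij, b_ij, r_ij):
    """Lennard-Jones pair term: A/r^12 - B/r^6."""
    return a_ij / r_ij ** 12 - b_ij / r_ij ** 6
```

The total energy U is the sum of these terms over all bonds, angles, torsions, and non-bonded atom pairs; the pair terms dominate the cost because there are far more pairs than bonds.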
Parallelization (getting an idea of the level of computation needed)
• At every time step, every atom must interact with every other atom within its cutoff radius.
• This requires a lot of inter-processor communication, and that communication must scale well.
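To get a feel for the cutoff-radius requirement, the following minimal sketch lists every atom pair that lies within the cutoff. It is a naive all-pairs loop, not how a production code or Anton does it, and it assumes a cubic periodic box with the minimum-image convention, which the slides do not state. The function name and arguments are hypothetical.

```python
import itertools

def pairs_within_cutoff(positions, box, cutoff):
    """Return index pairs (i, j), i < j, whose separation is <= cutoff.

    positions: list of (x, y, z) tuples
    box: edge length of a cubic periodic box (an assumption of this sketch)
    """
    cutoff2 = cutoff * cutoff
    pairs = []
    for (i, p), (j, q) in itertools.combinations(enumerate(positions), 2):
        d2 = 0.0
        for a, b in zip(p, q):
            d = a - b
            d -= box * round(d / box)  # minimum-image wrap
            d2 += d * d
        if d2 <= cutoff2:
            pairs.append((i, j))
    return pairs
```

When atoms are distributed over many nodes, each pair found this way may straddle two nodes, which is where the inter-processor communication comes from.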
MD Simulator Requirements
• 1) A huge number of arithmetic processing elements
• 2) A lot of inter-processor communication that scales well
• 3) Memory is not an issue: with 25,000 atoms (64 bytes each), the total is 1.6 MB; spread over 512 nodes, that is about 3.1 KB/node, which is smaller than most L1 caches
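The memory estimate in the last requirement is simple arithmetic and can be checked directly. The constants below are the figures from the slide; the variable names are only illustrative.

```python
# Back-of-the-envelope check of the per-node memory footprint.
ATOMS = 25_000          # atoms in the example system
BYTES_PER_ATOM = 64     # state stored per atom
NODES = 512             # nodes the atoms are spread over

total_bytes = ATOMS * BYTES_PER_ATOM   # 1,600,000 bytes = 1.6 MB
per_node_bytes = total_bytes / NODES   # 3125.0 bytes, about 3.1 KB

print(f"total: {total_bytes / 1e6:.1f} MB, per node: {per_node_bytes / 1e3:.1f} KB")
```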
Why Specialized Hardware?
Needs: memory, communication, computation
Anton System-Level Organization
• Multiple segments (probably 8 in the first machine)
• 512 nodes per segment (each node consists of one ASIC plus DRAM)
  - Organized in an 8 x 8 x 8 toroidal mesh
• Each ASIC delivers performance equivalent to roughly 500 general-purpose microprocessors
  - ASIC power is similar to that of a single microprocessor
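In an 8 x 8 x 8 toroidal mesh, every node has six direct neighbors, and the coordinates wrap around at the edges. A minimal sketch of that addressing follows; the function name and the (x, y, z) coordinate scheme are assumptions for illustration, not Anton's actual node numbering.

```python
def torus_neighbors(x, y, z, size=8):
    """Return the six neighbors of node (x, y, z) in a size^3 torus.

    Coordinates wrap modulo `size`, so edge nodes connect to the
    opposite face. The labeling scheme is hypothetical.
    """
    return {
        "+X": ((x + 1) % size, y, z),
        "-X": ((x - 1) % size, y, z),
        "+Y": (x, (y + 1) % size, z),
        "-Y": (x, (y - 1) % size, z),
        "+Z": (x, y, (z + 1) % size),
        "-Z": (x, y, (z - 1) % size),
    }
```

For example, the node at (0, 0, 7) has (7, 0, 7) as its -X neighbor and (0, 0, 0) as its +Z neighbor, so no node sits on a boundary.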
Anton
• 33M-gate ASIC
• Two computational subsystems connected by a communication ring
• Hardware datapaths compute over 25 billion interactions/s
• Full machine has 512 ASICs in a 3D torus
• 13 embedded processors
[Figure: Anton ASIC block diagram. Labels: Flexible Subsystem, High-Throughput Interaction Subsystem, Communication Ring, Routers, Memory Controllers, DRAM, Host Interface, Host Processor, and six Channels (+X, -X, +Y, -Y, +Z, -Z)]
Example: Particle Interaction Pipeline (one of 32)
Where We Use Flexible Hardware
– Use programmable hardware where:
  • Algorithm is less regular
  • Smaller % of total computation
    - E.g., local interactions (fewer of them)
  • More likely to change
– Examples:
  • Bonded interactions
  • Bond length constraints
  • Experimentation with
    - New, short-range force field terms
    - Alternative integration techniques
Overview of the Flexible Subsystem
GC = Geometry Core
(each a VLIW processor)
Anton in Action
• Speedup relative to other MD codes: 500X over NAMD, 80-100X over Desmond, 100X over Blue Matter
Simulation Evaluations
GPU+FPGA ???
[Figure: GPU + FPGA block diagram. Labels: GPU, 6 x GDDR5, 16 x PCIe, LVDS links, FPGA, high-speed serial I/O (up to 2 Tbit/s), FFT and LJ]