ANTON D.E Shaw Research


Page 1: ANTON D.E Shaw Research

ANTON

D.E Shaw Research

Page 2: ANTON D.E Shaw Research

Force Fields: Typical Energy Functions

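$$
U = \sum_{\text{bonds}} \frac{1}{2} k_r (r - r_0)^2
  + \sum_{\text{angles}} \frac{1}{2} k_\theta (\theta - \theta_0)^2
  + \sum_{\text{torsions}} \frac{V_n}{2} \left[ 1 + \cos(n\phi - \delta) \right]
  + \sum_{\text{improper}} V(\text{improper torsion})
  + \sum_{\text{elec}} \frac{q_i q_j}{r_{ij}}
  + \sum_{\text{LJ}} \left[ \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right]
$$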

Bond stretches

Angle bending

Torsional rotation

Improper torsion (sp2)

Electrostatic interaction

Lennard-Jones interaction
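
As a minimal sketch of the electrostatic and Lennard-Jones terms listed above, the energy of a single atom pair can be evaluated as q_i q_j / r_ij plus A_ij / r_ij^12 - B_ij / r_ij^6. The function name, the Coulomb prefactor of 1 (reduced units), and the sample coefficients are assumptions for illustration, not values from the slides.

```python
def pair_energy(q_i, q_j, a_ij, b_ij, r_ij, coulomb_k=1.0):
    """Nonbonded energy of one atom pair: Coulomb plus 12-6 Lennard-Jones.

    coulomb_k is a hypothetical unit-conversion prefactor (1.0 = reduced units).
    """
    if r_ij <= 0.0:
        raise ValueError("r_ij must be positive")
    u_elec = coulomb_k * q_i * q_j / r_ij
    inv_r6 = 1.0 / r_ij**6
    # A / r^12 - B / r^6, reusing 1 / r^6 to avoid a second power evaluation
    u_lj = a_ij * inv_r6 * inv_r6 - b_ij * inv_r6
    return u_elec + u_lj


# Illustrative values only: opposite unit charges, A = 1, B = 2, at r = 1
energy = pair_energy(1.0, -1.0, 1.0, 2.0, 1.0)
print(energy)
```

A full simulation sums this over every pair inside the cutoff radius, which is why this term dominates the computation.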

Page 3: ANTON D.E Shaw Research

MD Simulator Requirements

• Parallelization (getting an idea of the level of computation needed)

• For every time step, every atom must interact with every other atom within its cutoff radius.

• This requires a lot of inter-processor communication that scales well.
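
To get a rough idea of the level of computation per time step, one can count the atom pairs that fall within the cutoff radius. The sketch below assumes a uniform atom density; the density (about 0.1 atoms per cubic angstrom, typical of water) and the 10 angstrom cutoff are assumed inputs, not figures from the slides, and `pairs_per_step` is a hypothetical helper name.

```python
import math


def pairs_per_step(n_atoms, density, cutoff):
    """Estimate distinct atom pairs within the cutoff for one time step.

    density is in atoms per cubic length unit; cutoff is in the same length unit.
    """
    # Average number of atoms inside one atom's cutoff sphere
    neighbors = density * (4.0 / 3.0) * math.pi * cutoff**3
    # Each pair is shared by two atoms, so halve the total
    return n_atoms * neighbors / 2.0


# Assumed inputs: 25,000 atoms, 0.1 atoms per cubic angstrom, 10 angstrom cutoff
estimate = pairs_per_step(25_000, 0.1, 10.0)
print(f"{estimate:.2e} pair interactions per time step")
```

Under these assumptions the count is in the millions per time step, and it must be repeated for every one of the many time steps in a simulation.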

Page 4: ANTON D.E Shaw Research

Why Specialized Hardware?

Needs:

• 1) Computation: need a huge number of arithmetic processing elements

• 2) Communication: need a lot of inter-processor communication that scales well

• 3) Memory: not an issue
– With 25,000 atoms (64 bytes each), total = 1.6 MB; spread over 512 nodes, that is about 3.1 KB/node, which is smaller than most L1 caches
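
The memory estimate on this slide is simple arithmetic, sketched below with the slide's own numbers (25,000 atoms, 64 bytes per atom, 512 nodes). The helper name `bytes_per_node` is made up for illustration.

```python
def bytes_per_node(n_atoms, bytes_per_atom, n_nodes):
    """Return (total bytes, bytes per node) for evenly distributed atom data."""
    total = n_atoms * bytes_per_atom
    return total, total / n_nodes


total, per_node = bytes_per_node(25_000, 64, 512)
print(f"total = {total / 1e6:.1f} MB, per node = {per_node / 1e3:.2f} KB")
```

The per-node share comes to roughly 3 KB, far below the size of a typical L1 cache, which is the slide's point that memory capacity is not the bottleneck.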

Page 5: ANTON D.E Shaw Research

Anton System-Level Organization

• Multiple segments (probably 8 in first machine)

• 512 nodes (each consists of one ASIC plus DRAM) per segment
– Organized in an 8 x 8 x 8 toroidal mesh

• Each ASIC has performance equivalent to roughly 500 general-purpose microprocessors
– ASIC power similar to a single microprocessor
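
To illustrate the 8 x 8 x 8 toroidal mesh, the sketch below computes the six nearest neighbors of a node, with wraparound at the edges. The coordinate scheme and the `node_id` numbering are assumptions made for illustration; the slides do not say how Anton numbers its nodes.

```python
DIM = 8  # nodes per axis, from the 8 x 8 x 8 mesh


def torus_neighbors(x, y, z, dim=DIM):
    """Return the six nearest neighbors of node (x, y, z) in a 3D torus."""
    return {
        "+X": ((x + 1) % dim, y, z),
        "-X": ((x - 1) % dim, y, z),
        "+Y": (x, (y + 1) % dim, z),
        "-Y": (x, (y - 1) % dim, z),
        "+Z": (x, y, (z + 1) % dim),
        "-Z": (x, y, (z - 1) % dim),
    }


def node_id(x, y, z, dim=DIM):
    """Hypothetical linear numbering of nodes, 0 .. dim**3 - 1."""
    return x + dim * (y + dim * z)


# A corner node still has six neighbors because every axis wraps around
for direction, coords in torus_neighbors(0, 0, 7).items():
    print(direction, coords, node_id(*coords))
```

Because of the wraparound, no node sits on a boundary: each of the 512 nodes has exactly six direct links, and no two nodes are more than 4 hops apart along any axis.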

Page 6: ANTON D.E Shaw Research


Anton

• 33M gate ASIC

• Two computational subsystems connected by communication ring

• Hardware datapaths compute over 25 billion interactions/s

• Full machine has 512 ASICs in a 3D torus

• 13 embedded processors

[Figure: Anton ASIC block diagram. Labeled blocks: Flexible Subsystem, High-Throughput Interaction Subsystem, Communication Ring, Host Interface (to Host Processor), two Memory Controllers with DRAM, six Routers, and six Channels (+X, -X, +Y, -Y, +Z, -Z).]

Page 7: ANTON D.E Shaw Research

Example: Particle Interaction Pipeline (one of 32)
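
The throughput figures from the slides can be combined into a back-of-the-envelope rate per pipeline and for the whole machine. This sketch assumes the "over 25 billion interactions/s" figure applies to one ASIC and is shared evenly by its 32 pipelines; the slides do not state either point explicitly.

```python
INTERACTIONS_PER_SEC = 25e9  # lower bound from the slides, assumed to be per ASIC
PIPELINES = 32               # particle interaction pipelines (assumed per ASIC)
ASICS = 512                  # ASICs in the full machine

# Even split across pipelines (an assumption)
per_pipeline = INTERACTIONS_PER_SEC / PIPELINES
# Aggregate rate if every ASIC runs at the quoted figure
machine_wide = INTERACTIONS_PER_SEC * ASICS

print(f"per pipeline: {per_pipeline:.3e} interactions/s")
print(f"full machine: {machine_wide:.3e} interactions/s")
```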

Page 8: ANTON D.E Shaw Research

Where We Use Flexible Hardware

– Use programmable hardware where:
  • Algorithm less regular
  • Smaller % of total computation
    - E.g., local interactions (fewer of them)
  • More likely to change

– Examples:
  • Bonded interactions
  • Bond length constraints
  • Experimentation with
    - New, short-range force field terms
    - Alternative integration techniques
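
Bonded interactions, one of the examples listed above, involve only a handful of atoms each, which is why they suit programmable cores. As a software illustration only (not Anton's implementation), the sketch below evaluates the harmonic bond-stretch and angle-bending terms of a typical force field from atom coordinates. The function names, force constants, and coordinates are assumptions.

```python
import math


def bond_energy(p1, p2, k_r, r0):
    """Harmonic bond stretch: 0.5 * k_r * (r - r0)^2."""
    r = math.dist(p1, p2)
    return 0.5 * k_r * (r - r0) ** 2


def angle_energy(p1, p2, p3, k_theta, theta0):
    """Harmonic angle bend with p2 at the vertex; angles in radians."""
    v1 = [a - b for a, b in zip(p1, p2)]
    v2 = [a - b for a, b in zip(p3, p2)]
    dot = sum(a * b for a, b in zip(v1, v2))
    cos_t = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against rounding slightly outside [-1, 1]
    theta = math.acos(max(-1.0, min(1.0, cos_t)))
    return 0.5 * k_theta * (theta - theta0) ** 2


# Illustrative values: a bond stretched 0.5 past equilibrium, a 90 degree angle
e_bond = bond_energy((0.0, 0.0, 0.0), (1.5, 0.0, 0.0), k_r=2.0, r0=1.0)
e_angle = angle_energy((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0),
                       k_theta=4.0, theta0=math.pi / 2)
print(e_bond, e_angle)
```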

Page 9: ANTON D.E Shaw Research

Overview of the Flexible Subsystem

GC = Geometry Core (each a VLIW processor)

Page 10: ANTON D.E Shaw Research


Anton in Action

Page 11: ANTON D.E Shaw Research

Simulation Evaluations

• 500X NAMD

• 80-100X Desmond

• 100X Blue Matter

Page 12: ANTON D.E Shaw Research

GPU+FPGA ???

[Figure: block diagram of a GPU + FPGA design. Labels: GPU, 6*GDDR5, 16*PCIe, FPGA, LVDS, high-speed serial I/O (up to 2 Tbit/s), FFT and LJ.]