First Summit results Lattice QCD at non-zero baryon density April 26, 2019 Patrick Steinbrecher, HotQCD

Quantum Chromodynamics (QCD)

Quantum Chromodynamics

Proton

Quarks

Gluons

Color Charge

Strong Force

Confinement

1% of proton mass = quark mass

99% of proton mass = kinetic energy of gluons & quarks

What happens if you make things hotter and hotter?

What happens if you keep squeezing and squeezing?

Phases of Water

The QCD phase diagram

Relativistic Heavy Ion Collider

Beam Energy Scan Theory

HotQCD is part of BEST, and its mission is to provide critical input for the Beam Energy Scan II, which is being performed now

Simulating Quantum Chromodynamics from first principles

discretize Dirac equation

$$(i\gamma^{\mu}\partial_{\mu} - m)\,\Psi = 0$$

calculate path integral using Monte Carlo methods

$$\langle O \rangle = \frac{1}{Z} \int \mathcal{D}U \; O \, \exp(-S)$$

Lattice QCD

[Figure: sketch of the space-time lattice with quarks Q]

e.g. 4-dimensional lattice $N_\sigma^3 \times N_\tau$

$$V = (aN_\sigma)^3$$

$$T = \frac{1}{aN_\tau}$$
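The relations for V and T are written in natural units. As a minimal sketch, the conversion to physical units could look as follows; the function names and the numerical lattice parameters are hypothetical and chosen only for illustration, they are not values from the talk.

```python
HBAR_C_MEV_FM = 197.3269804  # conversion constant hbar*c in MeV*fm


def temperature_mev(a_fm: float, n_tau: int) -> float:
    """Temperature T = 1/(a * N_tau), converted from fm^-1 to MeV."""
    return HBAR_C_MEV_FM / (a_fm * n_tau)


def volume_fm3(a_fm: float, n_sigma: int) -> float:
    """Spatial volume V = (a * N_sigma)^3 in fm^3."""
    return (a_fm * n_sigma) ** 3


# Hypothetical example: lattice spacing 0.1 fm on a 32^3 x 8 lattice
example_temperature = temperature_mev(0.1, 8)
example_volume = volume_fm3(0.1, 32)
```

With these illustrative numbers the temperature comes out near 247 MeV. At fixed N_tau, a lower temperature corresponds to a larger lattice spacing a.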

The Lattice QCD Kernel

$$\mathrm{Tr}\, M^{-1} = \lim_{N\to\infty} \frac{1}{N} \sum_{k=1}^{N} \eta_k^{\dagger} M^{-1} \eta_k$$

evaluate traces using N random noise vectors η

solve $M^{-1}\eta_k$ using Conjugate Gradient

condition:

$$\lim_{N\to\infty} \frac{1}{N} \sum_{k=1}^{N} \eta_{ki}^{*}\, \eta_{kj} = \delta_{ij}$$

up to 2000 η per gauge field configuration

99% of runtime
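The noise-vector estimate of the trace can be tried on a small dense matrix. The sketch below is an illustration under stated assumptions, not the HotQCD code: the test matrix is a hypothetical tridiagonal matrix, the noise is Z2 noise (entries ±1, which satisfies the condition above), and `np.linalg.solve` stands in for the Conjugate Gradient solve.

```python
import numpy as np


def estimate_trace_of_inverse(matrix, n_vectors, rng):
    """Estimate Tr M^-1 as the average of eta^T M^-1 eta over random eta."""
    dim = matrix.shape[0]
    total = 0.0
    for _ in range(n_vectors):
        # Z2 noise: entries are +1 or -1, so eta_i * eta_j averages to delta_ij
        eta = rng.choice([-1.0, 1.0], size=dim)
        # Direct solve used here as a stand-in for the Conjugate Gradient solve
        x = np.linalg.solve(matrix, eta)
        total += eta @ x
    return total / n_vectors


# Hypothetical test matrix: symmetric, positive definite, tridiagonal
dim = 50
M = 4.0 * np.eye(dim) - np.eye(dim, k=1) - np.eye(dim, k=-1)

rng = np.random.default_rng(0)
estimate = estimate_trace_of_inverse(M, n_vectors=2000, rng=rng)
exact = np.trace(np.linalg.inv(M))
print(f"estimate = {estimate:.3f}, exact = {exact:.3f}")
```

The statistical error of the estimate falls like 1/sqrt(N), which is why many noise vectors per gauge field configuration are needed and why the linear solves dominate the runtime.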

Conjugate Gradient

1. calculate stencil operator (66% of runtime)

2. call multiple STREAM kernels (33% of runtime)

3. repeat

no optimizations possible for step 2

performance determined by memory bandwidth
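The loop structure can be sketched with a textbook Conjugate Gradient. This is a minimal illustration, not the production solver: the operator is a hypothetical 1D periodic nearest-neighbour stencil standing in for the lattice operator, and reading the "STREAM kernels" as the dot products and vector updates of CG is an interpretation.

```python
import numpy as np


def apply_stencil(v, mass_sq=0.5):
    """Toy 1D periodic nearest-neighbour stencil (symmetric, positive definite)."""
    return (2.0 + mass_sq) * v - np.roll(v, 1) - np.roll(v, -1)


def conjugate_gradient(apply_op, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite operator A."""
    x = np.zeros_like(b)
    r = b.copy()                      # residual b - A x for the start vector x = 0
    p = r.copy()                      # search direction
    rr = np.dot(r, r)
    b_norm_sq = np.dot(b, b)
    for iteration in range(1, max_iter + 1):
        ap = apply_op(p)              # step 1: stencil operator
        alpha = rr / np.dot(p, ap)    # step 2: bandwidth-bound reduction ...
        x += alpha * p                # ... and vector updates
        r -= alpha * ap
        rr_new = np.dot(r, r)
        if rr_new <= tol**2 * b_norm_sq:
            return x, iteration
        p = r + (rr_new / rr) * p
        rr = rr_new                   # step 3: repeat
    return x, max_iter


rng = np.random.default_rng(0)
b = rng.normal(size=64)
x, n_iter = conjugate_gradient(apply_stencil, b)
print("iterations:", n_iter)
```

Every line inside the loop streams full vectors through memory while doing only a few floating-point operations per element, which is why the attainable speed is set by memory bandwidth.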

Stencil Operator

$$w_n = \sum_{\mu=0}^{4} \left[ \left( U_{n,\mu}\, v_{n+\mu} - U^{\dagger}_{n-\mu,\mu}\, v_{n-\mu} \right) + \left( N_{n,\mu}\, v_{n+3\mu} - N^{\dagger}_{n-3\mu,\mu}\, v_{n-3\mu} \right) \right]$$

complex 3-dim vector

complex 3×3 matrix

U(3) matrix, reconstructed from 14 floats

[Figure: stencil in the µ-ν plane, legend: matrix, vector]

w = standard + naik

• precalculated

• three-link term

1146 Flop/site

0.8 Flop/byte (single precision)
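The two numbers can be reproduced by simple counting. The accounting below is an assumption, not taken from the talk: it counts four lattice directions (as for the 4-dimensional lattice), reads the one-link matrices as general complex 3×3 matrices stored with 18 floats, the three-link matrices as the U(3) matrices stored with 14 floats, and ignores any cache reuse of neighbouring vectors.

```python
FLOAT_BYTES = 4                 # single precision
N_DIRECTIONS = 4                # assumption: four lattice directions
# Forward and backward hop for both the one-link and the three-link term
N_MATVEC = N_DIRECTIONS * 2 * 2

# Complex 3x3 matrix times complex 3-vector:
# 9 complex multiplications (6 flops each) + 6 complex additions (2 flops each)
MATVEC_FLOPS = 9 * 6 + 6 * 2
VECTOR_ADD_FLOPS = 3 * 2        # adding two complex 3-vectors


def stencil_flops_per_site():
    """Flops to form w_n: all matrix-vector products plus summing the results."""
    return N_MATVEC * MATVEC_FLOPS + (N_MATVEC - 1) * VECTOR_ADD_FLOPS


def stencil_bytes_per_site():
    """Bytes moved per site, assuming every operand is loaded exactly once."""
    one_link = 2 * N_DIRECTIONS * 18 * FLOAT_BYTES    # full complex 3x3 matrices
    three_link = 2 * N_DIRECTIONS * 14 * FLOAT_BYTES  # compressed U(3) matrices
    vectors_in = N_MATVEC * 6 * FLOAT_BYTES           # complex 3-vectors
    vector_out = 6 * FLOAT_BYTES
    return one_link + three_link + vectors_in + vector_out


flops = stencil_flops_per_site()
traffic = stencil_bytes_per_site()
print(flops, traffic, round(flops / traffic, 2))
```

Under these assumptions the count gives 1146 Flop/site and 1432 bytes/site, that is about 0.8 Flop/byte, which agrees with the figures quoted above.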

Multiple right-hand sides

[Figure: memory holding the constant matrices and the random vectors η0 η1 η2 η3 η4 η5 η6 ···, comparing SO( ) applied to one vector with SO multi3( ) applied to three vectors at once]

pro: much better arithmetic intensity

con: higher register pressure
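The idea of reusing one loaded matrix for several vectors can be illustrated in NumPy. This only shows the data-access pattern; NumPy itself does not expose registers or memory traffic, and the array names and the choice of three right-hand sides are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_rhs = 3  # three right-hand sides, as in "multi3"

# One constant complex 3x3 matrix and n_rhs complex 3-vectors (random test data)
link = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
vectors = rng.normal(size=(n_rhs, 3)) + 1j * rng.normal(size=(n_rhs, 3))

# One right-hand side at a time: conceptually the matrix is fetched per vector
one_by_one = np.stack([link @ v for v in vectors])

# Multiple right-hand sides: the matrix is fetched once and applied to all vectors
batched = np.einsum("ab,kb->ka", link, vectors)

print(np.allclose(one_by_one, batched))
```

In a GPU kernel the batched form keeps the matrix in registers while looping over the vectors, so the matrix traffic is shared among all right-hand sides; the price is that the extra vectors and accumulators also have to live in registers.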

Multiple right-hand sides

[Figure: Stencil Operator, Flop/byte (axis from 0.6 to 2.2) versus #right-hand sides (1 to 8)]
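A simple counting model shows why the arithmetic intensity grows with the number of right-hand sides. It is a sketch, not the measured curve: it assumes 1146 Flop per site and vector, 1024 bytes of matrix data per site that are loaded once (eight matrices with 18 floats and eight with 14 floats in single precision), and 408 bytes of vector data per site and right-hand side (16 inputs and one output of 6 floats each).

```python
def arithmetic_intensity(n_rhs, flops_per_site=1146, matrix_bytes=1024, vector_bytes=408):
    """Flop/byte when the matrices are loaded once and shared by n_rhs vectors."""
    return n_rhs * flops_per_site / (matrix_bytes + n_rhs * vector_bytes)


for n_rhs in range(1, 9):
    print(n_rhs, round(arithmetic_intensity(n_rhs), 2))
```

In this model the intensity rises from about 0.80 Flop/byte for one right-hand side to about 2.14 Flop/byte for eight, and it saturates at 1146/408 ≈ 2.8 Flop/byte once the vector traffic dominates.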

Single-node Performance

[Figure: Conjugate Gradient, fp32, single node; GFlop/s (axis from 0 to 2000) versus #right-hand sides (0 to 20) for Volta V100, Knights Landing, Skylake, Haswell, K40, K20X, Ivy Bridge]

Multi-node Performance

short setup phase assigns local problems to each GPU

e.g. inversions of different matrices

performance scales linearly with number of nodes

maximize number of nodes to reduce time to solution

largest Summit job used 2k nodes

achieved 23 PFlop/s

largest Titan job used 14k nodes

achieved 5 PFlop/s using both GPU and CPU
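The quoted totals can be broken down per node with a few lines of arithmetic. This is a rough sketch: "2k" and "14k" are taken as exactly 2000 and 14000 nodes, which is an assumption, and the Summit figure is divided by the six V100 GPUs of a Summit node.

```python
SUMMIT_GPUS_PER_NODE = 6  # each Summit node hosts six V100 GPUs


def per_node_tflops(total_pflops, n_nodes):
    """Average sustained TFlop/s per node for a job of the given size."""
    return total_pflops * 1000.0 / n_nodes


summit_node = per_node_tflops(23.0, 2000)    # "2k nodes" taken as 2000
summit_gpu = summit_node / SUMMIT_GPUS_PER_NODE
titan_node = per_node_tflops(5.0, 14000)     # "14k nodes" taken as 14000

print(f"Summit: {summit_node:.1f} TFlop/s per node, {summit_gpu:.2f} TFlop/s per GPU")
print(f"Titan:  {titan_node:.2f} TFlop/s per node")
```

With these roundings Summit comes out at roughly 11.5 TFlop/s per node, or about 1.9 TFlop/s per GPU, against roughly 0.36 TFlop/s per Titan node.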

The QCD phase diagram

Critical point from Taylor expansions

expansion of QCD pressure

$$\frac{P}{T^4} = \sum_{n=0}^{\infty} \frac{1}{n!}\, \chi_n^B\, \hat{\mu}_B^{\,n}, \qquad \chi_n^B = \frac{1}{VT^3} \left. \frac{\partial^n \ln Z}{\partial \hat{\mu}_B^{\,n}} \right|_{\mu_B = 0}$$

analysis of convergence radius can determine bound on the location of a critical point:

$$r_{2n}^{P} = \left| \frac{(2n+2)(2n+1)\, \chi_{2n}^{B}}{\chi_{2n+2}^{B}} \right|^{1/2}$$

only if coefficients are positive for all n ≥ n0

if not → no critical point on real axis
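The ratio estimator can be checked on a toy model with a known singularity. The sketch below is an illustration only: the model pressure 1/(1 - (µ/r0)^2), the value of r0 and the function names are hypothetical and not lattice data.

```python
import math


def radius_estimator(chi_2n, chi_2n_plus_2, n):
    """Ratio estimator r_2n built from two consecutive even-order coefficients."""
    return math.sqrt(abs((2 * n + 2) * (2 * n + 1) * chi_2n / chi_2n_plus_2))


def model_chi(order, r0):
    """chi_order for the toy pressure 1/(1 - (mu/r0)^2) = sum over n of (mu/r0)^(2n)."""
    return math.factorial(order) / r0**order


r0 = 2.5  # location of the singularity on the real axis in the toy model
estimates = [
    radius_estimator(model_chi(2 * n, r0), model_chi(2 * n + 2, r0), n)
    for n in (1, 2, 3)
]
print(estimates)
```

In this toy model all coefficients are positive, the singularity sits on the real axis, and every estimator returns r0. A negative coefficient at some order would instead signal that the nearest singularity lies off the real axis.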

Radius of convergence

complex function

$$f(z) = \frac{1}{1+z^2}$$

series expansion

$$f(z) = \sum_{n=0}^{\infty} (-1)^n z^{2n}$$

convergence determined by nearest singularity from the origin
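For this function the singularities are the poles at z = ±i, so the series converges only for |z| < 1 even though f itself is finite everywhere on the real axis. A short numerical check of the partial sums, with illustrative evaluation points and function names, makes this visible:

```python
def partial_sum(z, n_terms):
    """Sum of the first n_terms terms of the series for 1/(1 + z^2)."""
    return sum((-1) ** n * z ** (2 * n) for n in range(n_terms))


def exact(z):
    return 1.0 / (1.0 + z * z)


inside = partial_sum(0.5, 50)    # |z| < 1: the partial sums converge to f(z)
outside = partial_sum(1.5, 50)   # |z| > 1: the partial sums grow without bound

print(inside, exact(0.5))
print(outside, exact(1.5))
```

At z = 0.5 the partial sum agrees with f(0.5) = 0.8, while at z = 1.5 it is astronomically large although f(1.5) is perfectly finite. The same logic applies to the pressure series: a singularity anywhere in the complex µB plane limits the expansion on the real axis.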

Comparison to experiment

Moment   | Symbol       | Experiment                                                    | Lattice
mean     | $M_X$        | $\langle N_X \rangle$                                         | $VT^3 \chi_1^X$
variance | $\sigma_X^2$ | $\langle (\delta N_X)^2 \rangle$                              | $VT^3 \chi_2^X$
skewness | $S_X$        | $\langle (\delta N_X)^3 \rangle / \sigma_X^3$                 | $VT^3 \chi_3^X / (VT^3 \chi_2^X)^{3/2}$
kurtosis | $\kappa_X$   | $\langle (\delta N_X)^4 \rangle / \sigma_X^4 - 3$             | $VT^3 \chi_4^X / (VT^3 \chi_2^X)^{2}$

volume independent ratios

$$\frac{\sigma_X^2}{M_X} = \frac{\chi_2^X}{\chi_1^X}, \qquad S_X \sigma_X = \frac{\chi_3^X}{\chi_2^X}, \qquad \kappa_X \sigma_X^2 = \frac{\chi_4^X}{\chi_2^X}$$

only at freeze-out $(\mu_f, T_f)$
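That the volume drops out of these ratios can be verified directly from the lattice column of the table. The susceptibility values in this sketch are made-up numbers for illustration, not results from the talk.

```python
def cumulant_ratios(chi1, chi2, chi3, chi4, vt3):
    """Build the moments from V*T^3*chi_n and form the three ratios."""
    mean = vt3 * chi1
    variance = vt3 * chi2
    skewness = vt3 * chi3 / variance**1.5
    kurtosis = vt3 * chi4 / variance**2
    sigma = variance**0.5
    return {
        "variance/mean": variance / mean,
        "S*sigma": skewness * sigma,
        "kappa*sigma^2": kurtosis * variance,
    }


# Hypothetical susceptibilities, evaluated for two very different volumes
chis = (0.1, 0.2, 0.05, 0.02)
small = cumulant_ratios(*chis, vt3=10.0)
large = cumulant_ratios(*chis, vt3=1000.0)
print(small)
print(large)
```

Both volumes give the same three ratios, equal to χ2/χ1, χ3/χ2 and χ4/χ2, which is what allows a comparison with measured fluctuations without knowing the volume of the fireball.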

First results from Summit

The QCD crossover line

[Figure: Tc [MeV] (axis from 135 to 175) versus µB [MeV] (0 to 400) for nS = 0, nQ/nB = 0.4; shown are the crossover line at O(µB^4), lines of constant ε and s, and freeze-out points from STAR and ALICE]

No signs for a QCD critical point along Tc(µB)

[Figure panels, evaluated along the crossover line Tc(µB) as a function of µB [MeV]:

c_s^2(Tc(µB), µB), µQ = µS = 0: O(µB^4), O(µB^2), HRG

Tc(µB)^3/c_P(Tc(µB), µB), µQ = µS = 0: O(µB^6), O(µB^4), HRG

c_V(Tc(µB), µB)/Tc(µB)^3, µQ = µS = 0: O(µB^4), O(µB^2), HRG

1/(Tc(µB)^4 κ_T(Tc(µB), µB)), µQ = µS = 0: O(µB^6), O(µB^4), HRG

χ_disc(Tc(µB), µB)/χ_disc(T0, 0) − 1, nS = 0, nQ/nB = 0.4: O(µB^4), O(µB^2) (HotQCD preliminary)

σ_B^2(Tc(µB), µB)/σ_B^2(T0, 0) − 1, nS = 0, nQ/nB = 0.4: O(µB^4), O(µB^2), HRG (HotQCD preliminary)]

[Figure: estimator r_n^χ for µB^crit/T (axis from 0 to 9) versus T [MeV] (135 to 155); 2017: lower bound for r_4^χ, estimator r_2^χ; D'Elia et al., 2016, r_4^χ; Datta et al., 2016; Fodor, Katz, 2004; HRG values r_2^χ,HRG, r_4^χ,HRG, r_6^χ,HRG; disfavored region for the location of a critical point]

[Figure: χ_disc/f_K^4 (axis from 0 to 100) versus T [MeV] (135 to 195) for nS = 0, nQ/nB = 0.4, Nτ = 8, O(µB^6), at µB = 0.0 MeV, 125.0 MeV, 200.0 MeV]

no deviations from HRG

no increased fluctuations

no narrowing crossover

Summary

constraints on location of QCD critical point

upper bound from crossover line and radius of convergence

negative 6th and 8th order Taylor coefficients of P

no increased fluctuations along crossover

a possibly existing critical point may be found only for µB > 400 MeV and T < (130 − 140) MeV

provides important input for Beam Energy Scan II currently performed at the Relativistic Heavy Ion Collider

CG sustained 23 PFlop/s on Summit using 2k nodes

Thank you for your attention!