Download - HLT and CA Tracker - GSI Indico (Indico) · S. Gorbunov et al. Real Time Conference (RT), 2010 GORBUNOV et al.: ALICE HLT HIGH SPEED TRACKING ON GPU 1847 Fig. 7. a) Neighbors ﬁnder.

STAR High-Level Trigger & STAR Tracking Focus Group

HLT and CA Tracker

Sept 23rd, 2017 CBM-STAR Joint Workshop, CCNU Wuhan 1

Aihong Tang

Outline

q  STAR High Level Trigger l  History

l  CA tracker (online tracking & offline as seeding)

l  Infrastructure

q  HLT Spring/Summer 2017 activities

q  Plan for BES-II


STAR HLT History


•  STAR Level-3 trigger system (Developed by Frankfurt group, conformal mapping based tracker, phased out ~2002)

•  HLT 1.0 (2010 – 2012) – Borrowed CPU share from TPX machines. Fragmented event reconstruction. Not scalable.

•  HLT 2.0 (2013 – 2016) – Independent farm sponsored by NSFC. Integrate all tasks in one process. Use CA tracker to replace the L3 tracker, including experimental usage of KFParticle. Developed new DAQ infrastructure. Scale up effortlessly.

•  HLT 3.0 (2017 – ) – Balance the total throughput of ~1k CPU

threads and ~10k Phi threads. Decompose the HLT task into independently schedulable sub-tasks and hide the details of Phi from DAQ side.

Year # of Nodes # of CPU 2012 4 96

2014 9 296

2016 27 1192 + 45 Xeon Phi

Integration with DAQ


HLT

•  STAR HLT use high performance computers to do real time event reconstruction and analysis

•  Provide additional event selection capability based on physics analysis on top of hardware trigger layers

CellularAutomatonTracker


•  local data access •  intrinsically parallel •  extremely simple algorithms •  suitable for SIMD

S. Gorbunov et al. Real Time Conference (RT), 2010

GORBUNOV et al.: ALICE HLT HIGH SPEED TRACKING ON GPU 1847

Fig. 6. Reconstruction time on CPU for events with different track multiplicity.

Fig. 7. a) Neighbors finder. b) Evolution step of the Cellular Automaton.

The tracking algorithm starts with a combinatorial search fortrack candidates (tracklets), which is based on the Cellular Au-tomaton method [3]. Local parts of trajectories are created fromgeometrically nearby hits, thus eliminating unphysical hit com-binations at the local level. The combinatorial processing com-poses the following two steps:

• 1. Neighbor finder: For each hit at a row k the best pairof neighboring hits from rows k 1 and k 1 is found,as it is shown in Fig. 7(a). The neighbor selection criteriarequires the hit and its two best neighbors to form a straightline. The links to the best two neighbors are stored. Oncethe best pair of neighbors is found for each hit, the step iscompleted.

• 2. Evolution step: Reciprocal links are determined andsaved, all the other links are removed (see Fig. 7(b)).

Every saved one-to-one link defines a part of the trajec-tory between the two neighboring hits. Chains of consecutiveone-to-one links define the tracklets. One can see from Fig. 7(b)that each hit can belong to only one tracklet because of thestrong evolution criteria. This uncommon approach is possibledue to the abundance of hits on every TPC track. Such a strongselection of tracklets results in a linear dependence of theprocessing time on the number of track candidates. When thetracklets are created, the sequential part of the reconstructionstarts, implementing the following two steps:

Fig. 8. Reconstruction performance for proton-proton collisions at 14 TeV.

Fig. 9. Reconstruction performance for central heavy ion collisions at 5 TeV.

• 3. Tracklet construction: The tracklets are created by fol-lowing the hit-to-hit links as it is described above. The ge-ometrical trajectories are fit using a Kalman Filter, with a

quality check. Each tracklet is extended in order to col-lect hits being close to its trajectory.

• 4. Tracklet selection: Some of the track candidates can haveintersected parts. In this case the longest track is saved,the shortest removed. A final quality check is applied tothe reconstructed tracks, including a cut on the minimalnumber of hits and a cut for low momentum.

IV. TRACKER EFFICIENCY

The performance of the HLT track finder of 99.9% for proton-proton events and 98.5% for central Pb-Pb collisions has beenverified on simulated events. Corresponding efficiency plots areshown on Figs. 8 and 9. In addition to the high efficiency, thereal-time reconstruction is an order of magnitude faster than theoff-line algorithm used as reference.

The described algorithm has the advantage of a high degree oflocality and parallelism. Step one only searches for local neigh-bors to each hit. It can be done in parallel for all the hits as theresult does not depend on the order of processing. Step three

NHitsInTrack0 10 20 30 40 50 60 70 80 90 100

Entr

ies

0

500

1000

1500

2000

2500

3000

3500

4000

Entries 33666

Mean 12.2

RMS 8.63

Underflow 0

Overflow 0

of 806 Aug 2013 Igor Kulakov, BNL

HLT Based CA (offline calibration)

eval_Sti-CA/daq_sl53.ittf/year_2011/AuAu200_production

STAR AuAu 200GeV Run11


Ivan Kisel, GSI STAR Tracking Review Meeting, Berkeley, November 14, 2011

Cellular Automaton versus Track Following

The CA track finder is more stable w.r.t. track multiplicity and is ~10 times faster than the TF based Sti track finder.8

The Track Following method is well suitable for simple event topologies, but suffers from large combinatorics in case of high track densities. In addition, the final efficiency of the TF method is limited by the seeding efficiency.

The Cellular Automaton method is based on local analysis of data. The CA algorithm has staged structure, therefore it accumulates continuously the tracking information while working with hits, neighbors, track segments, track candidates and tracks. In addition, CA can apply global competition at each stage of data analysis. Locality of the algorithm makes CA intrinsically parallel.

Improve the STAR tracking by integrating the CA track finder as a seed finder for Sti.

Sti trackerCA+Sti tracker

CPU time per event

global CA tracker sector CA tracker

CA+Sti is ~50% faster than Sti alone Efficiency for global tracks has been increased on 7-9%

Sti trackerCA+Sti tracker

Real Au-Au 200 GeV/n data

Y. Fisyak, I. Kisel, I. Kulakov, J. Lauret, M. Zyzak, „Track Reconstruction in the STAR TPC with a Cellular Automaton Based Approach“,CHEP-2010, Taipei, October 18-22, 2010


•  StiCA gives 6-12% higher tracking efficiency, depend on luminosity, than Sti in Au+Au 200GeV collisions

•  StiCA gives ~8% higher tracking efficiency than Sti in p+p 510GeV collisions

•  The pT difference between Sti and StiCA is less than 3% for global tracks and no obvious difference for primary tracks

•  StiCA is more stable when bad TPC sectors exist

•  StiCA is now the default tracker for Run16/17 offline data production

M.Mustafa & X. Chen

Particle Identification


Pt (GeV/c)0 2 4 6 8 10 12 141

10

210

310

410

510

610

Prim_PtPrim_Pt

Entries 4679274

Mean 0.6214

RMS 0.4307

Prim_Pt

φ

0 1 2 3 4 5 60

2000

4000

6000

8000

10000

12000

14000

16000

18000

Prim_PhiPrim_Phi

Entries 4679274

Mean 2.965

RMS 1.805

Prim_Phi

η-3 -2 -1 0 1 2 30

20406080

100120140160180200220240

310×Prim_Eta

Prim_EtaEntries 1.057004e+07

Mean 0.04002

RMS 0.7795

Prim_Eta

1

10

210

310

Primary Momentum (GeV/c)-5 -4 -3 -2 -1 0 1 2 3 4 5

dEdx

(GeV

/cm

)

0

5

10

15

20

25

30-610×

Prim_dEdxPrim_dEdx

Entries 4386294Mean x 0.01939Mean y 2.771e-06RMS x 0.9262RMS y 1.208e-06

Prim_dEdx

VertexX (cm)-2 -1.5 -1 -0.5 0 0.5 1 1.5 20

200

400

600

800

1000

1200

1400

1600

1800

2000

VertexXVertexX

Entries 28742

Mean -0.2138RMS 0.1614

VertexX

VertexY (cm)-2 -1.5 -1 -0.5 0 0.5 1 1.5 20

200

400

600

800

1000

1200

1400

1600

1800

2000

VertexYVertexY

Entries 28742

Mean -0.1627RMS 0.1637

VertexY

•  PID using TPC, BEMC and TOF •  online self-calibration of TPC dE/dx gain

Multiplicity0 1000 2000 3000 4000 5000 6000 70001

10

210

globalMultglobalMultEntries 28742Mean 2243

RMS 1422

globalMult

Multiplicity0 200 400 600 800 1000 1200 1400 16001

10

210

primaryMultprimaryMultEntries 28742Mean 367.8

RMS 257.4

primaryMult

matchPhiDiff0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10

10000

20000

30000

40000

50000

60000

70000

Emc_matchPhiDiffEmc_matchPhiDiff

Entries 1564783

Mean 0.02688

RMS 0.01907

Emc_matchPhiDiff

TowerEnergy (GeV)0 2 4 6 8 10 12 14 16 18 200

500

1000

1500

2000

2500

310×Emc_towerEnergy

Emc_towerEnergy

Entries 8730685

Mean 0.8227

RMS 0.3683

Emc_towerEnergy

TowerDaqId0 500 1000 1500 2000 2500 3000 3500 4000 4500 50000

5000

10000

15000

20000

25000

30000

Emc_towerDaqIdEmc_towerDaqId

Entries 8730685

Mean 2674

RMS 1373

Emc_towerDaqId

TowerSoftId0 500 1000 1500 2000 2500 3000 3500 4000 4500 50000

5000

10000

15000

20000

25000

30000

Emc_towerSoftIdEmc_towerSoftId

Entries 8730685

Mean 2455

RMS 1372

Emc_towerSoftId

zEdge0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 50

1000

2000

3000

4000

5000

6000

Emc_zEdgeEmc_zEdgeEntries 225170

Mean 0.9535

RMS 0.579

Emc_zEdge

0

5000

10000

15000

20000

25000

φ

-3 -2 -1 0 1 2 3

η

-1

-0.8

-0.6

-0.4

-0.2

0

0.2

0.4

0.6

0.8

1Emc_towerEtaPhi

Emc_towerEtaPhiEntries 8730685Mean x 0.009471Mean y -0.02656RMS x 1.837RMS y 0.6257

Emc_towerEtaPhi

LocalZ-5 -4 -3 -2 -1 0 1 2 3 4 50

10000

20000

30000

40000

50000

60000

Tof_LocalZTof_LocalZEntries 4211508

Mean -0.7225

RMS 1.934

Tof_LocalZ

LocalY-15 -10 -5 0 5 10 150

20

40

60

80

100

120

310×Tof_LocalY

Tof_LocalYEntries 4211508

Mean 0.005957

RMS 1.445

Tof_LocalY

0

50

100

150

200

250

300

350

400

450

Momentum (GeV/c)0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5

β1/

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5Tof_InverseBeta

Tof_InverseBetaEntries 4211508Mean x 0.7962Mean y 1.085RMS x 0.4976RMS y 0.2175

Tof_InverseBeta

0

2000

4000

6000

8000

10000

12000

14000

16000

18000

matchId0 20 40 60 80 100 120 140 160 180 200

fiber

Id

0

20

40

60

80

100

120

140

160

180

200Tof_matchId_fireId

Tof_matchId_fireIdEntries 5.626722e+07

Mean x 101.5

Mean y 99.58

RMS x 54.3

RMS y 55.63

Tof_matchId_fireId

Selected HLT Trigger Algorithms


[GeV/c]T,mc

p0 1 2 3 4 5 6 7 8 9 10

Eff

0

0.2

0.4

0.6

0.8

1

1.2

1.4

HLT: MTD matching efficiency for J/psi

DefaultClosest match

Heavy fragments, e.g. anti-4He e+ + e-

µ+ + µ-

Primary Vertex Section for HFT

BeamLineMonitoring


•  Online 3D primary vertex reconstruction •  Real time beam position monitoring in RHIC low energy runs •  Reject background •  Live feedback to CA for accelerator monitoring and

performance tuning •  Benefit all BES-I physics analyses •  Likely to be used again in BES-II

HLT Offline

KFParticle and Xeon Phi Coprocessor


Reconstruction of decay chains

• The state vector does not depend on the number of daughter particles

• All particles are described with the same state vector

• All particles are equivalent

• All functionality is the same for mother and daughter particles

07.07.2011 Maksym Zyzak, 2nd Tracking Workshop 4/22

r = { x, y, z, px, py, pz, E }

Position, direction and momentum State vector

Reconstruction of decay chains

• The state vector does not depend on the number of daughter particles

• All particles are described with the same state vector

• All particles are equivalent

• All functionality is the same for mother and daughter particles

07.07.2011 Maksym Zyzak, 2nd Tracking Workshop 4/22

r = { x, y, z, px, py, pz, E }

Position, direction and momentum State vector

•  KFParticle on Intel Xeon Phi (pioneered by FIAS group) •  Intel Xeon Phi 7120P

o  61 core x 4 hardware threads per core o  512-bit vector register o  16G RAM o  on board Linux

•  Experience with Xeon Phi benefits more the KFParticle

15 January 2014 Maksym Zyzak, BNL /05

KF Particle Finder on Xeon Phi

!3

• Standalone KF Particle Finder is run on the Xeon Phi card.

• Currently search of Ks0, Λ, Ξ- and Ω- is turned on. The parallelism between cores is implemented on the event level.

• The algorithm is significantly modified for higher level of vectorization, including data structure modification.

• The program scales up to 240 logical cores on the Xeon Phi.

Number of logical cores0 50 100 150 200

Eve

nts/

s

0

2000

4000

6000

8000

10000

Intel Xeon E5-2680 2.7 GHzIntel Xeon Phi 1 GHz 1 thread/coreIntel Xeon Phi 1 GHz 2 threads/coreIntel Xeon Phi 1 GHz 3 threads/coreIntel Xeon Phi 1 GHz 4 threads/core

Number of logical cores0 50 100 150 200

Spee

d up

fact

or

0

20

40

60

80

100

Intel Xeon E5-2680 2.7 GHzIntel Xeon Phi 1 GHz 1 thread/coreIntel Xeon Phi 1 GHz 2 threads/coreIntel Xeon Phi 1 GHz 3 threads/coreIntel Xeon Phi 1 GHz 4 threads/core

See M. Zyzak’s talk

Infrastructure Reshape in 2017


Event Builder Event #4 Event #0

Tracking Vertex Finding(a thin wrapper)

Trigger Decision Making

Shared Memory

CPU × 40

Xeon Phi × 480 Offloaded Task

One HLT Node

Event #3 Event #2 Event #1

•  Fully synchronized system. All working threads are independently scheduled and workload balanced between different stages/devices. DAQ infrastructure provided by J. Landgraf from BNL.

•  Expose parallelism at difference levels, event level, within event J. Landgraf and H. Ke

HLT Data Taking in Year 2016


•  HLT process almost all of the data taking by DAQ at 1700+Hz, 1.7+GB/s input data

•  ~1000Hz Mini-Bias •  ~700Hz MTD (tracking 6-8

sectors) •  < 40% CPU load

•  MTD di-muon quarkonium selection •  Partial TPC tracking, < 300Hz •  TPC-MTD matching •  Selection rate ~1%

hMTDQmInvMassUSEntries 10Mean 2.161RMS 1.146

2) GeV/cµµ(invM0 2 4 6 8 10 120

2000

4000

6000

8000

10000

12000

MTD quarkonium InvMass hMTDQmInvMassUSEntries 213505

Mean 2.604RMS 2.082

USLS

MTD quarkonium InvMass hMTDQmUpsilonMassUS

Entries 10Mean 0RMS 0

2) GeV/cµµ(invM8 8.5 9 9.5 10 10.5 11 11.5 12 12.5 13

100

150

200

250

300

350

400

450

MTD quarkonium InvMass(Upsilon) hMTDQmUpsilonMassUS

Entries 213505

Mean 9.815RMS 1.342

USLS

MTD quarkonium InvMass(Upsilon)

hMTDQmJpsiMass_ptcut0_US

Entries 10Mean 3.258RMS 0.3336

2) GeV/cµµ(invM2 2.2 2.4 2.6 2.8 3 3.2 3.4 3.6 3.8 4

2000

3000

4000

5000

6000

7000

8000

MTD quarkonium InvMass(Jpsi) hMTDQmJpsiMass_ptcut0_US


USLS

MTD quarkonium InvMass(Jpsi) hMTDQmJpsiMass_ptcut2_US


2) GeV/cµµ(invM2 2.2 2.4 2.6 2.8 3 3.2 3.4 3.6 3.8 4

1500

2000

2500

3000

3500

MTD quarkonium InvMass(Jpsi_ptcut2) hMTDQmJpsiMass_ptcut2_US


USLS

MTD quarkonium InvMass(Jpsi_ptcut2)

hMTDQmJpsiMass_ptcut4_US

Entries 5Mean 3.559RMS 08− 1.314e

2) GeV/cµµ(invM2 2.2 2.4 2.6 2.8 3 3.2 3.4 3.6 3.8 4

300

400

500

600

700

800

900

1000

1100

MTD quarkonium InvMass(Jpsi_ptcut4) hMTDQmJpsiMass_ptcut4_US


USLS

MTD quarkonium InvMass(Jpsi_ptcut4)

HLT Run 17 p+p 510GeV

HLT Data Taking in Year 2017


Spring/Summer activities :TPC Space Charge Calibration (AuAu)

Space Charge Used

<sD

CA

> [c

m] •  TPC SC calibration needs to be done once at

the very begging of each run for HLT and fast off-line production

•  It needs to be done multiple times during the run if things get bad

•  <sDCA> as a function of SC offset has a almost universal relationship, which can be used for self-calibration.


Making the SCC automatic hDCA

xy

i = a · SC + b

Assumption:•  True SC stays the same at tk and tk+1

SCk+1 = SC

k

� hDCAxy

ik

a

a ⇡ �100 for Au+Au 200 GeV

Reality:•  At Au+Au 54 GeV, ZDCx changes in a

fraction of a second•  Need to fit multiple times in a second•  HLT run at ~1500 Hz•  Fit once every 2000, 500, and 200 events

Tue Jun 27 00:15:48 2017

hDcaXYEntries 12116Mean 0.06804− RMS 0.8948

/ ndf 2χ 202 / 86Constant 2.4± 159.7 Mean 0.0058±0.0901 − Sigma 0.0067± 0.4693

DCAxy [cm]10− 8− 6− 4− 2− 0 2 4 6 8 100

20

40

60

80

100

120

140

160

180

200

hDcaXYEntries 12116Mean 0.06804− RMS 0.8948

/ ndf 2χ 202 / 86Constant 2.4± 159.7 Mean 0.0058±0.0901 − Sigma 0.0067± 0.4693

DCA XY

Y. Ye, H. Ke and G. Van Buren Sept 23rd, 2017 CBM-STAR Joint Workshop, CCNU Wuhan 16

<DC

Axy

>

Tue Jun 27 00:27:16 2017

0 200 400 600 800 1000 1200

0.28−

0.26−

0.24−

0.22−

0.2−

0.18−

0.16−

0.14−

0.12−

0.1−

Tue Jun 27 00:44:21 2017

0 5 10 15 20 25 30

0.28−

0.26−

0.24−

0.22−

0.2−

0.18−

0.16−

0.14−

0.12−

0.1−

•  Auto calibration has effect. Converge in no more than a few seconds. •  Long time, ~10 min, trend is dominated by the luminosity •  The yet deviating from zero and the luminosity dependence is due to the

inaccuracy of the slope a in the previous page

Time for a typical run (a.u.)

Test for Auto-SCC in 2017

~ 10 min


Test for Auto-SCC in 2017 18

1680

05

1816

8010

1816

8017

1816

8023

1816

8029

1816

8038

1816

8044

1816

8050

1816

8055

1816

9008

1816

9013

1817

0015

1817

0020

1817

0034

1817

0039

1817

1005

1817

1012

1817

1018

1817

2005

1817

2010

0.16−

0.14−

0.12−

0.1−

0.08−

0.06−

0.04−

0.02−

0beam position

beam positionEntries 100Mean 50.92RMS 27.3

BeamX BeamY

beam position

1816

8005

1816

8010

1816

8017

1816

8023

1816

8029

1816

8038

1816

8044

1816

8050

1816

8055

1816

9008

1816

9013

1817

0015

1817

0020

1817

0034

1817

0039

1817

1005

1817

1012

1817

1018

1817

2005

1817

2010

0

10

20

30

40

50

9−10×innerGain

gain parametersEntries 100Mean 49.93RMS 28.88

innerGainSector outerGainSector

innerGain

1816

8005

1816

8010

1816

8017

1816

8023

1816

8029

1816

8038

1816

8044

1816

8050

1816

8055

1816

9008

1816

9013

1817

0015

1817

0020

1817

0034

1817

0039

1817

1005

1817

1012

1817

1018

1817

2005

1817

2010

0.35−

0.3−

0.25−

0.2−

0.15−

0.1−

0.05−

0

dcaXy mean valuedcaXy mean value Auto Correction ON


•  HLT Cluster Sharing Resources for Offline Production •  ~500 job slots available for use during experiment downtime •  Intend to share the resources with anyone •  Wayne Betts is experimenting migrating production jobs submitted to

STAR online pool to HLT cluster •  Both two clusters are in DAQ room •  HLT cluster has no AFS access, in general, has no network

access (easy to solve, need additional cable and switch) •  Two ways of doing this and both has been successful

•  Merge to online linux pool •  Flock jobs, i.e. grid production

•  CephFS (massive storage) and Xeon Phi cards both require specific version of Linux kernel version. Potential conflict is under investigation.

Spring/Summer activities : HLT Job Management/Sharing

W. Betts & H. Ke


HLT Cluster Summer Activities

Max. # of Slots

•  Users mainly from BNL, GSI and Czech for TFG •  ~67000 CPU hours in two weeks (as of 09/17/2017, only

limited historical data available)


Plans for BES-II


•  Produce picoDST online (express online production) for fast diagnostics with physics quantities, which is critical for decision making and fast turning around.

Take advantage of otherwise mostly idling resource. Speed up BFC with new tracker and Xeon Phi cards. Gain experience for future online parallel computing.

•  Prototype next year

BES-II •  Will run express online production for HLT-good events only. •  Total ~1200 slots. For HLT PV filtering : ~400 slots. For express

production : 500 – 800 with all Xeon phi cards. •  Projected production CPU time per event is ~3s at 39GeV without

additional optimization, i.e. 0.3Hz per slot1

•  Total processing rate is 150-240Hz •  Targeting event rate is up to 450Hz for HLT-good events2

•  Looking for a 3x speed-up

•  25-30TB buffer needed for buffering one days’ data, assuming DAQ runs 16hr per day at 450Hz “HLT good events”

•  Require a copy of all “HLT Good Event”, which is additional pressure to the DAQ network and CPU power, especially the later is already stretching

1Energies for BES-II are below 20 GeV, here we use 39 as baseline is to take into account the ~50% event size increase due to iTPC. 2From D. Cebra’s estimation.


Potential Improvements aka. Action Items

Sept 23rd, 2017 23

•  Full CA tracking including speeding up fitting (Yuri and Grigory ) •  Use AVX instructions in CA (Grigory) •  Try to run dN/dx on Phi cards (Hongwei/Biao) •  Additional possible gain from trying out new compiler and 64 bit.

•  Job management system, needed for resource sharing anyway. In place. (Wayne/Levente/Hongwei)

•  Disk space requirement and data transfer IO bandwidth – needs to be tested and be sorted out (Hongwei/Jeff)

CBM-STAR Joint Workshop, CCNU Wuhan

BFC timing profile

Sept 23rd, 2017 24

AuAu 200 GeV

AuAu 39 GeV

Sti and dN/dx take most of the time. Inside Sti, track (re)fitting takes most of the time. Solution : 1.  Speed up fitting. 2.  Run one or both on Xeon Phi

cards.


than 1% of the total run time of the job. The total input for this production is 397.3 TB, and the total output is 213.7 TB. For the movement of data in both directions the methods employed with KISTI worked very well with little measurable loss.

3. Stages of production Figure 1 shows the stages of STAR’s grid production framework. Many of the components are used for other functions and have been reused here, because they are well debugged, actively maintained, proven to work over a decade, and cut down on development time. These components include the STAR file catalog [4], STAR data carousel [5] [6], STAR Unified Meta Scheduler [7], HTCondor [8] and, Globus [9].

Figure 1. Stages of STAR’s production system – each box and stages are explained in details as sub-

sections of section 3.

3.1. Input File Staging The first task illustrated in box one, is staging input files. The input files are initially stored on tape. They have the extension *.daq which stands for data acquisition, meaning it is data read out of STAR’s detectors. The STAR file catalog holds the list of files belonging to the dataset that needs to be reconstructed. It holds LFNs (logical file names), PFNs (physical file names) as well as a breadth of MetaData (MD) users may key on. The file list from the STAR file catalog is passed via a script which is part of the grid production framework to the STAR data carousel and an entry stub for each file is made in the production DB. The STAR data carousel is a queueing tool which optimizes requests to restore files from tape (HPSS) by reordering the requests in the queue to minimize the number of mount and dismount operations needed to restore the requests. Once the files are restored from tape to central disk the entry in the production database is marked as restored and input staging is completed.

3.2. Job Submission Once the input is staged we move on to submission. With the stub in the production database and the input file restored, a script will check the production database for restored files and compose a list of

3

L Hajdu et al, Automated Finite State Workflow for Distributed Data Production, Journal of Physics: Conference Series 762 (2016) 012006

HLT Job Management/Sharing for BES II express production

Considered PANDA and Levente’s scheme. Decided to use the latter. In place already.


Summary


•  Collaboration between STAR and CBM has been very fruitful within STAR High-Level Trigger and Tracking Focus Group. We expect it will be even more fruitful in BES-II era.

•  It is demonstrated that we can deliver important physics fast with the STAR HLT. We continue to make improvements to the system, and expand its functionalities for multi-purpose usages.

Backup Slides


RHIC Beam Use Request for Run17


43

Figure 4–3: The projected uncertainties for transverse single spin asymmetries of W-bosons as a function of rapidity for a delivered integrated luminosity of 400 pb-1 and an average beam polarisation of 55%. The solid light gray and pink bands represent the uncertainty on the KQ [18] and EIKV [52] results, respectively, due to the unknown sea-quark Sivers function. The crosshatched dark grey region indicates the current uncertainty in the theoretical predictions due to TMD evolution.

W± boson production is also the ideal tool to study the spin-flavor structure of sea

quarks inside the proton. Such a measurement of the transverse single-spin asymmetry will provide the very first constraint on the sea quark Sivers function in an x-range where the measured asymmetry in the and unpolarized sea quark distribution functions, as measured by E866 [55], can only be explained by strong non-pQCD contributions. At the same time, this measurement is also able to access the sign change of the Sivers function, if the effect due to TMD evolution on the asymmetries is in the order of a factor of 5 reduction. Figure 4–3 shows the projected uncertainties for transverse single-spin asymmetries of W± and Z0 bosons as a function of rapidity for a delivered integrated luminosity of 400pb-1 and an average beam polarization of 55%. The solid light gray and pink bands represent the uncertainty on the KQ [18] and EIKV [52] results, respectively, due to the unknown sea-quark Sivers function. The crosshatched dark grey region indicates the current uncertainty in the theoretical predictions due to TMD evolution.

Figure 4–4: The FoM to reconstruct W-bosons in STAR as function of the ZDC raw rate. The W-boson reconstruction efficiency was obtained from the data measured in 2011 to 2013. The highest FoM is reached at a ZDC rate of 330 kHz corresponding to a luminosity of 1.5 x 1032 cm-2s-1.

W y-0.5 0 0.5

N A

-0.2

-0.15

-0.1

-0.05

0

0.05

0.1

0.15

0.2

W y-0.5 0 0.5

N A

-0.2

-0.15

-0.1

-0.05

0

0.05

0.1

0.15

0.2STAR projections

-1L(del.) = 400 pb

i + lA +W

no TMD evolutionarXiv:0903.3629

due to TMD evolutionUncertainty

Uncertainty on sea quarks

W y-0.5 0 0.5

N A

-0.3

-0.2

-0.1

0

0.1

0.2

0.3

W y-0.5 0 0.5

N A

-0.3

-0.2

-0.1

0

0.1

0.2

0.3 STAR projections-1L(del.) = 400 pbi - lA -W

TMD evolvedarXiv:1401.5078

Uncertainty on sea quarks

Z y0.5< 0 0.5

N A

0.4<

0.3<

0.2<

0.1<

0

0.1

0.2

0.3

0.4STAR projections

-1L(del.) = 400 pb- l+ lA 0Z

arXiv:0903.3629 (no TMD evolution)Uncertainty due to TMD evolutionUncertainty on sea quarks

arXiv:1401.5078 (TMD evolved)Uncertainty on sea quarks

u d

•  Transversely polarized p+p @ 510GeV •  Requested luminosity increase 10% in terms of ZDC rate

StiCA Evaluation 2015


•  StiCA vs. Sti: Use CA as track finder and Sti do the track fitting as usual.

5

Run14 Au + Au 200GeV Primary Tracks

StiCA

Sti

Production Mid (w/ HFT)

Production High (w/o HFT)

Production Mid (w/ HFT)

Production High (w/o HFT)

8

Run 9 and 13 pp 500GeV Primary Tracks

Run 13 Run 9 Run 13 Run 9AuAu 200GeV

High L w/ HFT Mid L w/o HFT pp 510GeV

High L Low L

D0 Production in Run14 Au+Au 200GeV


)2(GeV/cπKm1.7 1.75 1.8 1.85 1.9 1.95 2 2.05 2.1

)2C

ount

s (p

er 1

0 M

eV/c

0

20

40

60

80

100

120

140 = 200 GeVNNsAu+Au

RHIC Run 2014 < 12.00

T1.50 < p

Sti26M MinBias Events

= 6S+BS/Counts = 204

)2(GeV/cπKm1.7 1.75 1.8 1.85 1.9 1.95 2 2.05 2.1

)2C

ount

s (p

er 1

0 M

eV/c

020406080

100120140160180200220

= 200 GeVNNsAu+Au RHIC Run 2014

< 12.00T

1.50 < p

StiCA25M MinBias Events

= 7S+BS/Counts = 256

•  Run14 Au+Au @ 200 GeV, 25M events production sample •  StiCA gives ~25% more D0 count with better S/B ratio

M. Mustafa

Single-spin Asymmetries AL for W±


•  Run13 longitudinally polarized p+p @ 510GeV

•  StiCA provides 20+% W •  With StiCA the “eta-dip” at high

luminosity is not observed

Devika Gunarathne

Di-jet observables


Daniel Olvitt

More Than HLT


Standalone event reconstruction

@FIAS

Development

HLT event reconstruction

@BNL

Application

StRoot event reconstruction

@BNL

Implementation

Ivan Kisel, STAR Collaboration Meeting, Berkeley, 18.10.2013

RHIC BES-II


26

2. Quantum Chromodynamics: The Fundamental Description of the Heart of Visible Matter

The trends and features in BES-I data provide compelling

motivation for a strong and concerted theoretical

response, as well as for the experimental measurements

with higher statistical precision from BES-II. The goal

of BES-II is to turn trends and features into definitive

conclusions and new understanding. This theoretical

research program will require a quantitative framework

for modeling the salient features of these lower energy

heavy-ion collisions and will require knitting together

components from different groups with experience

in varied techniques, including LQCD, hydrodynamic

modeling of doped QGP, incorporating critical

fluctuations in a dynamically evolving medium, and more.

Experimental discovery of a critical point on the QCD

phase diagram would be a landmark achievement. The

goals of the BES program also focus on obtaining a

quantitative understanding of the properties of matter

in the crossover region of the phase diagram, where it

is neither QGP nor hadrons nor a mixture of the two, as

these properties change with doping.

Additional questions that will be addressed in this

regime include the quantitative study of the onset

of various signatures of the presence of QGP. For

example, the chiral symmetry that defines distinct

left- and right-handed quarks is broken in hadronic

matter but restored in QGP. One way to access the

onset of chiral symmetry restoration comes via BES-II

measurements of electron-positron pair production in

collisions at and below 20 GeV. Another way to access

this, while simultaneously seeing quantum properties

of QGP that are activated by magnetic fields present

early in heavy collisions, may be provided by the slight

observed preference for like-sign particles to emerge

in the same direction with respect to the magnetic field.

Such an effect was predicted to arise in matter where

chiral symmetry is restored. Understanding the origin

of this effect, for example by confirming indications that

it goes away at the lowest BES-I energies, requires the

substantially increased statistics of BES-II.

NEW MICROSCOPES ON THE INNER WORKINGS OF QGPTo understand the workings of QGP, there is no

substitute for microscopy. We know that if we had a

sufficiently powerful microscope that could resolve the

structure of QGP on length scales, say a thousand times

smaller than the size of a proton, what we would see

Figure 2.10: The top panel shows the increased statistics anticipated at BES-II; all three lower panels show the anticipated reduction in the uncertainty of key measurements. RHIC BES-I results indicate nonmonotonic behavior of a number of observables; two are shown in the middle panels. The second panel shows a directed flow observable that can encode information about a reduction in pressure, as occurs near a transition. The third panel shows the fluctuation observable understood to be the most sensitive among those measured to date to the fluctuations near a critical point. The fourth panel shows, as expected, the measured fluctuations growing in magnitude as more particles in each event are added into the analysis.

are quarks and gluons interacting only weakly with each

other. The grand challenge for this field in the decade

to come is to understand how these quarks and gluons

conspire to form a nearly perfect liquid.

Microscopy requires suitable messengers that reveal

what is happening deep within QGP, playing a role

analogous to light in an ordinary microscope. The

The Beam Energy Scan at the Relativistic Heavy Ion Collider 32

Table 2. Event statistics (in millions) needed for Beam Energy Scan Phase-II for various observables.Collision Energy (GeV) 7.7 9.1 11.5 14.5 19.6µB (MeV) in 0-5% central collisions 420 370 315 260 205

Observables

RCP up to pT = 5 GeV/c – 160 125 92Elliptic Flow (f mesons) 100 150 200 200 400Chiral Magnetic Effect 50 50 50 50 50Directed Flow (protons) 50 75 100 100 200Azimuthal Femtoscopy (protons) 35 40 50 65 80Net-Proton Kurtosis 80 100 120 200 400Dileptons 100 160 230 300 400Required Number of Events 100 160 230 300 400

3.1.1. RCP of identified hadrons up to pT = 5 GeV/c High-pT suppression is seen as anindication of the energy loss of leading partons in a colored medium. Therefore, the RAA

measurements are one of the clearest signatures for the formation of the quark-gluon plasma.Because there was not a comparable p+ p energy scan, the BES analysis has had to resortto RCP measurements as a proxy. Still the study of the shape of RCP(pT ) will allow usto quantitatively address the evolution of the phenomenon of jet-quenching to lower beamenergies. A very clear change in behavior as a function of beam energy is seen in thesedata (see Figs. 12 and 13); at the lowest energies (7.7 and 11.5 GeV) there is no evidence ofsuppression for the highest pT values that are reached. However, it should be noted that forthese energies the BES-I measurements are only able to reach 3-4 GeV/c for inclusive hadronsand 2-3 GeV/c for identified hadrons. Typically, one considers pT of 5 GeV/c and above to bedominated by partonic behavior. Therefore, although the BES-I RCP results are suggestive ofa disappearance of this QGP signature, they are not conclusive. The pT reach expected in theproposed BES Phase-II measurements will be crucial in drawing definitive conclusions aboutevidence for the creation of QGP at a given collision energy.

Although the BES-I spectra do not reach high enough pT to extend into the purely hard-scattering regime, they do allow us to make detailed projections of how many events would beneeded to reach a given pT for a given beam energy. We propose to acquire about 400 tracksin the pT range of 4-5 GeV/c for the 11.5, 14.5, and 19.6 GeV energies. At the lower energiesof 7.7 and 9.1 GeV, there is simply not enough kinematic reach to get out to 4-5 GeV/c. Theserequired numbers of events are listed in Table 2

We have used the yields of identified particles measured in BES-I to make projections ofthe expected errors for the RCP measurements with increased statistics expected to be availablein BES Phase-II. For each particle species, energy, and centrality, we have used a exponentialextrapolation (note that this is a more conservative estimate than the power law extrapolation)to estimate the expected number of particles to be measured in each pT bin based on theexpected number of events at each energy shown in Table 2. From this expected number ofparticles per bin, we can estimate the statistical error for the central and peripheral bins andpropagate these to estimate the expected error on the RCP measurements. These projected

The Beam Energy Scan at the Relativistic Heavy Ion Collider 32

Table 2. Event statistics (in millions) needed for Beam Energy Scan Phase-II for various observables.Collision Energy (GeV) 7.7 9.1 11.5 14.5 19.6µB (MeV) in 0-5% central collisions 420 370 315 260 205

Observables

RCP up to pT = 5 GeV/c – 160 125 92Elliptic Flow (f mesons) 100 150 200 200 400Chiral Magnetic Effect 50 50 50 50 50Directed Flow (protons) 50 75 100 100 200Azimuthal Femtoscopy (protons) 35 40 50 65 80Net-Proton Kurtosis 80 100 120 200 400Dileptons 100 160 230 300 400Required Number of Events 100 160 230 300 400

3.1.1. RCP of identified hadrons up to pT = 5 GeV/c High-pT suppression is seen as anindication of the energy loss of leading partons in a colored medium. Therefore, the RAA

measurements are one of the clearest signatures for the formation of the quark-gluon plasma.Because there was not a comparable p+ p energy scan, the BES analysis has had to resortto RCP measurements as a proxy. Still the study of the shape of RCP(pT ) will allow usto quantitatively address the evolution of the phenomenon of jet-quenching to lower beamenergies. A very clear change in behavior as a function of beam energy is seen in thesedata (see Figs. 12 and 13); at the lowest energies (7.7 and 11.5 GeV) there is no evidence ofsuppression for the highest pT values that are reached. However, it should be noted that forthese energies the BES-I measurements are only able to reach 3-4 GeV/c for inclusive hadronsand 2-3 GeV/c for identified hadrons. Typically, one considers pT of 5 GeV/c and above to bedominated by partonic behavior. Therefore, although the BES-I RCP results are suggestive ofa disappearance of this QGP signature, they are not conclusive. The pT reach expected in theproposed BES Phase-II measurements will be crucial in drawing definitive conclusions aboutevidence for the creation of QGP at a given collision energy.

Although the BES-I spectra do not reach high enough pT to extend into the purely hard-scattering regime, they do allow us to make detailed projections of how many events would beneeded to reach a given pT for a given beam energy. We propose to acquire about 400 tracksin the pT range of 4-5 GeV/c for the 11.5, 14.5, and 19.6 GeV energies. At the lower energiesof 7.7 and 9.1 GeV, there is simply not enough kinematic reach to get out to 4-5 GeV/c. Theserequired numbers of events are listed in Table 2

We have used the yields of identified particles measured in BES-I to make projections ofthe expected errors for the RCP measurements with increased statistics expected to be availablein BES Phase-II. For each particle species, energy, and centrality, we have used a exponentialextrapolation (note that this is a more conservative estimate than the power law extrapolation)to estimate the expected number of particles to be measured in each pT bin based on theexpected number of events at each energy shown in Table 2. From this expected number ofparticles per bin, we can estimate the statistical error for the central and peripheral bins andpropagate these to estimate the expected error on the RCP measurements. These projected

-  The 2015 Long Range Plan for Nuclear Science -  Studying the Phase Diagram of QCD Matter at RHIC, BES-II Whitepaper

•  iTPC extent eta coverage •  45 pad rows to 72 pad rows

Current BFC Performance


•  Au+Au @ 62GeV •  ~7s per event •  ~39GeV during BES-II ?

•  Au+Au @ 200GeV •  ~40s per event

•  70+% time spend on tracking, CA tracking finding + Sti track fiting

STAR BFC Performance @ 39GeV

Sept 23rd, 2017 36

Memory [MB] CPU Time [s/event]

•  Consider the iTPC will increase the TPC data volume by ~50%, use 39GeV as estimation baseline.


Expected Rates during BES-II

Sept 23rd, 2017 37

•  DAQ hours per day during Run10 39GeV

•  ~70TB per day on average •  Need to be prepared for the worst

case scenario

D. Cebra


Available HLT Resources

Sept 23rd, 2017 38

²  Existing Computing Resources •  1192 CPU logical cores •  45 Xeon Phi 7110P Coprocessors (2 per node, each has

240 hardware threads, 16GB RAM) •  Up to 48T disk storage for online calibration, QA and etc.


dN/dx Status and Plans


Status •  Year ago new dN/dx method to use ionization measurement from TPC has

been proposed (https://drupal.star.bnl.gov/STAR/system/files/dNdx_rev3.pptx.pdf). The main goal is to improved particle identification (PiD) in iTPC era.

•  The main idea on the dN/dx method is to use model for shape of energy deposited in a given TPC cluster versus number of primary clusters created during passage of charged particle in TPC. The shape has been calculated using Photo absorption Ionization (PAI) Model (http://www.annualreviews.org/doi/abs/10.1146/annurev.ns.30.120180.001345).

•  dN/dx method has demonstrated improvement in PiD by ~15% using 2016 AuAu200 calibration sample.

•  It was proposed to use dN/dx for whole 2016 AuAu200 sample and to check its performance.

•  The production has been done and PWG have tested the dN/dx performance (https://drupal.star.bnl.gov/STAR/system/files/dNdxdEdx_Run16.pdf, https://drupal.star.bnl.gov/STAR/system/files/20170726_ComparisionStiSticaStihr.pdf )

Sept 23rd, 2017 40 CBM-STAR Joint Workshop, CCNU Wuhan

PWG test results

•  dN/dx PiD resolution is ~10% better than dE/dx one. •  dN/dx distribution has longer tails wrt dE/dx. •  dN/dx measured for kaons is significantly shifted wrt the model

prediction. • Usage dN/dx does NOT give any improvement in significance

of D0 signal wrt dE/dx. • Signal of ϕèK+K- is ~5% narrower for dN/dx wrt dE/dx (?).


Conclusions and plans for dN/dx NO obvious improvement with dN/dx wrt dE/dx is seen. The plan to improve it includes : 1.  It is necessary to improve dN/dx fit for tracks in order to

avoid long tails. 2.  It is necessary to correct model predictions of dN/dx for

kaons, protons, … 3.  It is necessary to revist dN/dx model in order to

understand the reason why do we need these corrected predictions.