Page 1

Boosted Top Tagging with Deep Neural Networks

Jannicke Pearkes, University of British Columbia, Engineering Physics

Wojtek Fedorko, Alison Lister, Colin Gay
Inter-Experimental Machine Learning Workshop

March 22nd, 2017

Page 2

Overview

• Introduction
• Method
  – Monte Carlo samples
  – Network architecture & training
• Results
  – Preprocessing
  – pT dependence
  – Pileup dependence
  – Learning what is being learnt
• Next Steps

Page 3

Introduction

• Train a deep neural network to discriminate between jets originating from top quarks and those originating from QCD background

[Figure: sketch of the top quark decay topology at low top pT (well-separated W and b) versus high top pT (boosted, collimated decay products). Image: Emily Thompson]

Page 4

Monte Carlo Samples

• Signal: Z′ → tt̄
• Background: dijet
• Generated with PYTHIA v8.219, NNPDF23 LO AS 0130 QED PDF
• DELPHES v3.4.0 using the default CMS card
• Jets clustered using DELPHES energy-flow objects
• Anti-kT jets selected with R = 1.0
• Trimming performed with the kT algorithm, R = 0.2 and pT frac = 5% (sketched below)
• Signal jets are selected where a truth top decays hadronically within ΔR = 0.75 of a large-radius jet
• Jets are required to have |η| ≤ 2.0
• Jets are subsampled to be flat in pT and signal-matched in η
• Looking at jets with pT between 600 and 2500 GeV
• ~4 million signal jets and ~4 million background jets
• Sample divided 80% / 10% / 10% for training, validation and testing
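For concreteness, here is a minimal sketch of the jet selection chain described above, using the pyjet bindings to FastJet as a stand-in for the DELPHES-based workflow. The input array, field names and the 600 GeV threshold are illustrative assumptions; the actual analysis clusters DELPHES energy-flow objects.

```python
import numpy as np
from pyjet import cluster  # FastJet bindings, used here only as an illustrative stand-in

def select_trimmed_jets(constituents, ptmin=600.0):
    """Cluster anti-kT R = 1.0 jets and trim them with kT R = 0.2 subjets (pT frac = 5%).

    `constituents` is a structured array with fields ('pT', 'eta', 'phi', 'mass'),
    e.g. the energy-flow objects of one event.
    """
    large_r_jets = cluster(constituents, R=1.0, p=-1).inclusive_jets(ptmin=ptmin)  # p = -1: anti-kT
    trimmed = []
    for jet in large_r_jets:
        # Recluster this jet's own constituents with the kT algorithm (p = +1) and R = 0.2
        subjets = cluster(jet.constituents_array(), R=0.2, p=1).inclusive_jets()
        # Keep only subjets carrying at least 5% of the original jet pT
        trimmed.append([sj for sj in subjets if sj.pt > 0.05 * jet.pt])
    return trimmed  # one list of kept subjets per selected large-R jet
```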

Page 5

Examples of Jet Images

[Figure: six example jet images in the translated pseudorapidity η vs. translated azimuthal angle plane, colour scale showing jet pT per pixel in GeV (10⁻⁴ to 10⁰). Panels: background jets with pT = 702, 1370 and 2376 GeV; signal jets with pT = 781, 1480 and 2358 GeV.]

Jet images are typically very sparse: roughly 5-10% pixel activation on average when using a 0.1 × 0.1 grid [1].

[1] L. de Oliveira, M. Kagan, L. Mackey, B. Nachman, and A. Schwartzman, "Jet-images -- deep learning edition", JHEP 07 (2016) 069, arXiv:1511.05190 [hep-ph].

Page 6

Neural Network Inputs

• Use a sequence of jet constituents rather than an image

• Advantages:
  – No loss of information due to pixelization into an image
  – Inputs are more information dense

• Using 120 constituents, the average activation is 30-50% (see the input sketch below)
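As an illustration of what a constituent sequence looks like as a network input, here is a minimal sketch that pads or truncates each jet to 120 constituents. Representing each constituent by (pT, η, φ) is an assumption consistent with the preprocessing slides, not a detail stated on this slide.

```python
import numpy as np

N_CONST = 120  # number of highest-pT constituents kept per jet

def jet_to_input(constituents):
    """Turn one jet into a fixed-length, zero-padded (pT, eta, phi) sequence.

    `constituents` is an iterable of (pT, eta, phi) tuples, already ordered by
    subjet pT and by constituent pT within each subjet (see the preprocessing slides).
    """
    arr = np.zeros((N_CONST, 3), dtype=np.float32)
    for i, (pt, eta, phi) in enumerate(list(constituents)[:N_CONST]):
        arr[i] = (pt, eta, phi)
    return arr.ravel()  # flat vector fed to the fully connected network
```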

Page 7

Training and Network Architecture

• Implemented with Keras
• Initially planned on using an LSTM, but ended up using a fully connected network
• We found that the performance of the LSTM and the fully connected network was very similar, but the fully connected networks were much faster to train (~10 times), which allowed for faster experimentation with preprocessing techniques and network architectures

Network type: Fully connected
Number of layers: 5, [300, 150, 50, 10, 5, 1]
Number of free parameters: 41,323
Activation function: Rectified linear units, sigmoid on output
Optimizer: Adam
Loss: Binary cross-entropy
Early stopping: Patience of 5
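A minimal Keras sketch consistent with the table above (layer widths 300/150/50/10/5/1, ReLU hidden units, sigmoid output, Adam, binary cross-entropy, early stopping with patience 5). The input dimensionality, batch size and epoch count are assumptions, so the parameter count of this sketch will not necessarily reproduce the quoted 41,323; the original work used the 2017 Keras API, while this sketch uses tf.keras.

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.callbacks import EarlyStopping

# Hidden layers of 300/150/50/10/5 ReLU units and a sigmoid output, as in the table.
# The 360-dimensional input (120 constituents x (pT, eta, phi)) is an assumption.
model = Sequential([
    Input(shape=(360,)),
    Dense(300, activation="relu"),
    Dense(150, activation="relu"),
    Dense(50, activation="relu"),
    Dense(10, activation="relu"),
    Dense(5, activation="relu"),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Early stopping on the validation loss with a patience of 5 epochs, as on the slide.
early_stop = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=100, batch_size=256, callbacks=[early_stop])
```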

Page 8

Preprocessing

Page 9

Preprocessing

• Large-radius (R = 1.0) jets are trimmed using R = 0.2 subjets found with the kT algorithm, with pT frac = 5%
• Order subjets by subjet pT, and jet constituents by pT within each subjet (sketched below)
• We use only the 120 highest-pT jet constituents
• Perform preprocessing using domain knowledge about the physics at hand
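A sketch of the ordering step, assuming a pyjet-style subjet interface (objects exposing `.pt` and `.constituents()`); these names are illustrative, not taken from the slides.

```python
def order_constituents(subjets, n_keep=120):
    """Order constituents by parent subjet pT, then by constituent pT within each subjet.

    `subjets` is the list of kept (trimmed) subjets of one jet, each exposing .pt and
    a .constituents() iterable of objects with a .pt attribute.
    """
    ordered = []
    for subjet in sorted(subjets, key=lambda sj: sj.pt, reverse=True):
        ordered.extend(sorted(subjet.constituents(), key=lambda c: c.pt, reverse=True))
    # Keep the first 120 constituents in this ordering (the slides quote 120 constituents)
    return ordered[:n_keep]
```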

Page 10

No Preprocessing

[Figure: ROC curve of background rejection vs. top tagging efficiency, jet pT = 600-2500 GeV, trimming only.]

Trimming only: AUC = 0.83, R(ε = 50%) = 8.85, R(ε = 80%) = 3.36
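The quoted figures of merit can be computed from the network scores as the ROC area and the background rejection (one over the background efficiency) at a fixed signal efficiency. A sketch using scikit-learn; the interpolation choice is mine.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def rejection_at_efficiency(y_true, y_score, eff):
    """Background rejection 1/eps_bkg at a chosen signal efficiency eps_sig."""
    fpr, tpr, _ = roc_curve(y_true, y_score)   # tpr: signal efficiency, fpr: background efficiency
    fpr_at_eff = np.interp(eff, tpr, fpr)      # background efficiency at the requested signal efficiency
    return 1.0 / fpr_at_eff if fpr_at_eff > 0 else np.inf

# auc = roc_auc_score(y_test, dnn_scores)
# r50 = rejection_at_efficiency(y_test, dnn_scores, 0.50)   # e.g. 8.85 for trimming only
# r80 = rejection_at_efficiency(y_test, dnn_scores, 0.80)   # e.g. 3.36 for trimming only
```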

Page 11

Scale

• Scale pT of all jet constituents by a common factor to ensure that the constituent pT is approximately between 0 and 1

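A one-line sketch of the scaling step described above. Dividing by the scalar pT sum of the jet is one possible choice of common factor; the slide only states that constituent pT should end up roughly between 0 and 1, so the exact factor is an assumption.

```python
import numpy as np

def scale_constituent_pt(pt):
    """Scale all constituent pT values by a common factor so they lie roughly in [0, 1].

    Dividing by the jet's scalar pT sum is an assumed choice of the common factor.
    """
    pt = np.asarray(pt, dtype=np.float64)
    return pt / pt.sum()
```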

Page 12

Scale

[Figure: ROC curves of background rejection vs. top tagging efficiency, jet pT = 600-2500 GeV: trimming only, scale.]

Scaling: AUC = 0.900, R(ε = 50%) = 21.3, R(ε = 80%) = 6.02

Page 13

Translate

• Center the jet about the highest-pT subjet in the η, φ plane (sketched below)
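A sketch of the translation, assuming the jet is shifted so the highest-pT subjet sits at the origin of the (η, φ) plane, with azimuthal differences wrapped back into the principal range.

```python
import numpy as np

def translate(eta, phi, eta_lead, phi_lead):
    """Center the jet on its highest-pT subjet in the (eta, phi) plane.

    eta, phi: arrays of constituent coordinates; eta_lead, phi_lead: leading-subjet axis.
    """
    deta = np.asarray(eta) - eta_lead
    dphi = np.asarray(phi) - phi_lead
    dphi = (dphi + np.pi) % (2.0 * np.pi) - np.pi  # wrap azimuthal differences into [-pi, pi)
    return deta, dphi
```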

Page 14

Translate

[Figure: ROC curves of background rejection vs. top tagging efficiency, jet pT = 600-2500 GeV: trimming only, scale, translation.]

Translation: AUC = 0.924, R(ε = 50%) = 33.2, R(ε = 80%) = 8.48

Page 15

Rotate

• Designed the method of rotation to preserve jet mass
• Transform (pT, η, φ) into (px, py, pz)
• Rotate so that the second-highest-pT subjet is aligned with the negative y-axis (sketched below)
• Transform (px, py, pz) back to (pT, η, φ)
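A sketch of the mass-preserving rotation as described: go to Cartesian momenta, rotate all constituents about the axis of the already-centred leading subjet (taken here as the x-axis, since after translation the leading subjet sits at η = φ = 0) so that the second-highest-pT subjet points along the negative y-axis, then transform back. Treating the leading-subjet direction as the x-axis is an assumption of this sketch; a rigid 3D rotation of all momenta leaves the jet mass unchanged.

```python
import numpy as np

def rotate_jet(pt, eta, phi, subjet2):
    """Rotate constituents about the x-axis so the 2nd subjet points along -y.

    pt, eta, phi: constituent arrays after translation (leading subjet at eta = phi = 0);
    subjet2: (pt, eta, phi) of the second-highest-pT subjet.
    """
    def to_cartesian(pt, eta, phi):
        return pt * np.cos(phi), pt * np.sin(phi), pt * np.sinh(eta)  # px, py, pz

    px, py, pz = to_cartesian(np.asarray(pt), np.asarray(eta), np.asarray(phi))
    _, py2, pz2 = to_cartesian(*subjet2)

    # Rotation angle that takes the second subjet's (py, pz) direction onto the negative y-axis
    theta = np.pi - np.arctan2(pz2, py2)
    py_r = py * np.cos(theta) - pz * np.sin(theta)
    pz_r = py * np.sin(theta) + pz * np.cos(theta)

    # Back to (pT, eta, phi); px is untouched by a rotation about the x-axis
    pt_r = np.hypot(px, py_r)
    return pt_r, np.arcsinh(pz_r / pt_r), np.arctan2(py_r, px)
```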

Page 16

Rotate

[Figure: ROC curves of background rejection vs. top tagging efficiency, jet pT = 600-2500 GeV: trimming only, scale, translation, rotation.]

Rotation: AUC = 0.932, R(ε = 50%) = 42.3, R(ε = 80%) = 9.57

Page 17

Flip

• The third subjet is not constrained, but can be moved to the right half of the plane
• Flip the jet if the average pT is in the left half of the plane (sketched below)
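A sketch of the flip, assuming "left half of the plane" refers to negative translated η and "average pT" means the pT-weighted mean position of the constituents; neither convention is spelled out on the slide.

```python
import numpy as np

def flip_if_needed(pt, eta, phi):
    """Reflect the jet in eta so the pT-weighted activity sits in the right half-plane."""
    pt, eta, phi = map(np.asarray, (pt, eta, phi))
    if np.average(eta, weights=pt) < 0.0:  # pT-weighted mean falls in the left half of the plane
        eta = -eta                          # flip about the eta = 0 axis
    return pt, eta, phi
```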

Page 18

Flip

[Figure: ROC curves of background rejection vs. top tagging efficiency, jet pT = 600-2500 GeV: trimming only, scale, translation, rotation, flip.]

Flip: AUC = 0.933, R(ε = 50%) = 44.3, R(ε = 80%) = 9.75

Page 19

Performance on Truth vs Reconstructed Jets

Page 20

Performance after preprocessing

[Figure: ROC curves of background rejection vs. top tagging efficiency, jet pT = 600-2500 GeV, comparing the DNN and τ32 on truth and reconstructed jets.]

Page 21

Performance at 50% overall Signal Efficiency

[Figure: signal efficiency and background rejection vs. jet pT at a fixed 50% overall signal efficiency, for truth jets and reconstructed jets.]

Truth jets: AUC = 0.947, R(ε = 50%) = 66, R(ε = 80%) = 13
Reconstructed jets: AUC = 0.933, R(ε = 50%) = 44, R(ε = 80%) = 9.7

Page 22

Pileup

Page 23

Performance at different levels of pileup

[Figure: ROC curves of background rejection vs. top tagging efficiency, jet pT = 600-2500 GeV, for no pileup, pileup = 23 and pileup = 50.]

Extremely stable performance with respect to pileup

Page 24

Performance at different levels of pileup

[Figure: signal efficiency and background rejection vs. jet pT for no pileup, pileup = 23 and pileup = 50.]

pT dependence also stable with respect to pileup

Page 25

Learning what is being learnt

Page 26

Jet Mass

[Figure: jet mass distributions for signal and background (flat pT distribution, 600 < jet pT < 2500 GeV), and DNN output vs. jet mass for background jets with the colour scale showing P(jet mass | DNN output).]
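To probe what the network has learnt, the second plot shows the jet mass distribution conditioned on the DNN output. A sketch of how such a column-normalised 2D histogram can be built; the binning and ranges are my own choices, not taken from the slides.

```python
import numpy as np

def mass_given_output(jet_mass, dnn_output, mass_bins=50, out_bins=20):
    """Histogram P(jet mass | DNN output) for a set of (background) jets.

    Returns the bin edges and a 2D array whose columns (fixed DNN-output bin)
    are normalised to unit sum, i.e. a conditional probability per mass bin.
    """
    hist, mass_edges, out_edges = np.histogram2d(
        jet_mass, dnn_output,
        bins=[mass_bins, out_bins],
        range=[[0.0, 500.0], [0.0, 1.0]],
    )
    col_sums = hist.sum(axis=0, keepdims=True)
    cond = np.divide(hist, col_sums, out=np.zeros_like(hist), where=col_sums > 0)
    return mass_edges, out_edges, cond
```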

Page 27

Jet Mass

[Figure: same plots as the previous page: jet mass distributions for signal and background, and DNN output vs. jet mass for background jets.]

Page 28

Next Steps

Short term:
• We plan to revisit LSTMs
• Thorough Bayesian hyper-parameter optimization

Longer term:
• Both top and W tagging with deep neural networks are now reasonably well established on Monte Carlo
• "But does it work on data?"
• Start working towards evaluating the performance of these techniques on data
• Investigate the effects of systematics and strategies for mitigating their impact

Page 29

Thank you!


Page 30

W-tagging performance on truth

QCD-Aware Recursive Neural Networks for Jet Physics. Louppe, Cho, Becot, Cranmer, https://arxiv.org/abs/1702.00748

Page 31

Zooming

Parton Shower Uncertainties in Jet Substructure Analyses with Deep Neural Networks. Barnard, Dawe, Dolan, Rajcic, https://arxiv.org/pdf/1609.00607v2.pdf

Page 32

Performance when trained and tested on different levels of pileup

[Figure: signal efficiency and background rejection vs. jet pT for networks trained on µ = 0, µ = 23 and µ = 50, each tested on µ = 0, µ = 23 and µ = 50.]

- Examined how a neural network trained at one pileup level performs on another level of pileup

- NN seems relatively robust to changes in pileup expected at the LHC in the next few years

Page 33

Jet Mass

[Figure: same jet mass plots as page 26: jet mass distributions for signal and background, and DNN output vs. jet mass for background jets.]

Page 34

[Figure: τ32 distributions for signal and background (flat pT distribution, 600 < jet pT < 2500 GeV), and DNN output vs. τ32 (winner-take-all axes) for background jets with the colour scale showing P(τ32^wta | DNN output).]