Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP:...
Transcript of Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP:...
![Page 1: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/1.jpg)
Feature Selection Topics
Sergei V. Gleyzer
Data Science at the LHC Workshop
Nov. 9, 2015
![Page 2: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/2.jpg)
Outline
• Motivation
• What is Feature Selection
• Feature Selection Methods
• Recent work and ideas
• Caveats
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 2
![Page 3: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/3.jpg)
Motivation
• Common Machine Learning (ML)
problems in HEP:
– Classification or class discrimination
• Higgs event or background?
– Regression or function estimation
• How to best model particle energy based on
detector measurements
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 3
![Page 4: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/4.jpg)
Motivation continued
• While performing data analysis one of
the most crucial decisions is which
features to use
– Garbage In = Garbage Out
– Ingredients:
• Relevance to the problem
• Level of understanding of the feature
• Power of the feature and its relationship with
others
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 4
![Page 5: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/5.jpg)
Goal
• How to:
Select
Assess
Improve
Feature set
used to solve the problem
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 5
![Page 6: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/6.jpg)
Example
• Build a classifier to discriminate events of
different classes based on event kinematics
• Typical initial feature set:
– Functions of object four-vectors in event
– Basic kinematics: transverse momenta,
invariant masses, angular separations
• More complex features relating objects in the event
topology using physics knowledge to help
discriminate among classes (thrusts, helicity e.t.c.)
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 6
![Page 7: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/7.jpg)
Initial Selection
• Features initially chosen due to their
individual performance
– How well does X discriminate between
signal and background?
• Vetos: Is X well-understood?
• Theoretical and other uncertainties
• Monte-Carlo and data agreement
– Arrive at order of 10-30 features (95% use
cases)
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 7
![Page 8: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/8.jpg)
Feature Engineering
• By combining features with each other, boosting into other frames of reference* this set can grow quickly from tens to hundreds of features
– That’s ok if you have enough of computational power
• Still small compared to 100k features of cancer/image recognition datasets
– Balance between Occam’s razor and need for additional performance/power
* JHEP 1104:069,2011 K. Black, et. al.
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 8
![Page 9: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/9.jpg)
Practicum
Training Set
Feature Selection
Model Building
f()
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 9
Training/Test Set
Feature Selection*
Model Building
f()
Test Set f()Estimate
Performance
*Feature Selection Bias
✓
✗
✓
![Page 10: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/10.jpg)
Methods
Filters
Wrappers
Embedded-Hybrid
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 10
Feature Selection
Model Building
Model Building
Feature Selection
Feature Selection during Model
Building
![Page 11: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/11.jpg)
• Filters: usually fast
– No feedback from the Classifier
– Use correlations/mutual information gain
“quick and dirty” and less accurate
Useful in pre-processing
Example algorithms: information gain, Relief, rank-sum test, e.t.c.
Feature Selection
Model Building
Filter Methods
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 11
![Page 12: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/12.jpg)
Wrapper Methods
• Wrappers: typically slower and relatively more accurate (due to model-building)– Tied to a chosen model:
• Use it to evaluate features
• Assess feature interactions
• Search for optimal subset of features
– Different types: • Methodical
• Probabilistic (random hill-climbing)
• Heuristic (forward backward elimination)
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 12
Model Building
Feature Selection
![Page 13: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/13.jpg)
• Feature Importance proportional to classifier
performance in which
feature participates
• Full feature set {V}
• Feature subsets {S}
• Classifier performance F(S)
• Fast stochastic version
uses random subset seeds
Ex: Feature Importance
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 13
FI(Xi ) = F(S)´WXi(S)
SÍV :XiÎS
å
)(
}){(1)(
SF
XSFSW i
X i
![Page 14: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/14.jpg)
Example: RuleFit
• Rulefit: rule-based binary classification and regression (J. Friedman)– Transforms decision trees into rule ensembles– A powerful classifier even if some rules are
poor
• Feature Importance:– Proportional to performance of classifiers in
which features participate (similar)– Difference: no Wi(S)
• Individual Classifier Performance evenly divided among participating features
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 14
![Page 15: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/15.jpg)
Selection Caveats
• Feature Selection Bias
– Common mistake leading to over-optimistic evaluation of classifiers from “usage” of the testing dataset
– Solution:
• M-fold cross-validation/Bootstrap
• Second testing sample for evaluation
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 15
![Page 16: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/16.jpg)
Feature Interactions
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 16
![Page 17: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/17.jpg)
Feature Interactions
• Features often interact strongly in the classification process.
– Their removal affects the performance of remaining interacting partners
• Strength of interaction quantifiedby some wrapper methods
– In some classifiers features can be overlooked (or shadowed) by their interacting partners
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 17
Beware of hidden reefs
![Page 18: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/18.jpg)
Selection Caveats
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 18
After
Remove
Before
Importance Landscape Has Changed
Holds for any criterion that doesn’t incorporate interactions
![Page 19: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/19.jpg)
Global Loss Function
• GLoss Function Global measure of loss
– Selects feature subsets for global removal
• Shows the amount of predictive power loss
relative to the upper bound of performance of
remaining classifiers
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 19
S’ is the subset
to be removed'||
)'(
2
)(
1)'(SV
SVS
SF
SGF
'||
)'(
max 2)( SV
SVS
SF
![Page 20: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/20.jpg)
Global Loss and Classifier
Performance
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 20
GLoss Function minimization NOT EQUIVALENT to
Maximization of F(S) – i.e. finding the highest performing
classifier and its constituent features
![Page 21: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/21.jpg)
Recent Work
• Probabilistic Wrapper Methods:
– Stochastic approach
• Hybrid Algorithms:
– combine Filters and Wrappers
• Embedded Methods
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 21
Feature Selection during Model
Building
Feature Selection
Model Building
Feature Selection
![Page 22: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/22.jpg)
Embedded Methods
• At model-building stage assess feature importance and incorporate it in the process
– Way to penalize/remove features in the classification or regression process
• Regularization
• Examples: LASSO, RegTrees
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 22
![Page 23: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/23.jpg)
Regularized Trees
• Inspired by Rules Regularization in
Friedman and Popescu 2008
• Decision Tree Reminder:
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 23
Pt_Jet1Jet2
Ht_AllJets QTimesEta
Shat DeltaRJet1Jet2
BGND
BGND
BGND
BGNDSIGNAL SIGNAL
< 80.46
< 140.1
< 349.3
< 1.12
> 80.46
> 140.1
> 349.3 > 3.05 < 3.05
“Votes” taken at decision
junctions on possible
splits among the features
RegTrees penalize during
voting features similar to
those used in previous
decisions
End up with a high
quality feature set
classifier
![Page 24: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/24.jpg)
Feature Amplification
• Another example: feedback feature
importance into classifier building
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 24
Pt_Jet1Jet2
Ht_AllJets QTimesEta
Shat DeltaRJet1Jet2
BGND
BGND
BGND
BGNDSIGNAL SIGNAL
< 80.46
< 140.1
< 349.3
< 1.12
> 80.46
> 140.1
> 349.3 > 3.05 < 3.05
Weight votes by
log(feature importance)
![Page 25: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/25.jpg)
The Decade Ahead
Data
New Theory 1
New
Theory 2
p(Data | Theory)
SM me, mµ, mτ
mu, md, ms, mc, mb, mt
θ12, θ23, θ13, δ
g1, g2, g3
θQCD
µ, λ
Basic statistical questions:
1. Which theories are preferred, given the data?
2. And which parameter sub-spaces within these theories?
All interesting theories are multi-parameter models
p(Theory | Data)
Decade Ahead
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 25
H. Prosper
![Page 26: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/26.jpg)
Minimal SUSY
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 26
33
![Page 27: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/27.jpg)
In HEP
• Often in HEP one searches for new phenomena and applies classifiers trained on MC for at least one of the classes (signal) or sometimes both to real data– Flexibility is KEY to any search
– It is more beneficial to choose a reduced parameter space that consistently produces strong performing classifiers at actual analysis time • Useful for general SUSY and other new phenomena
searches
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 27
![Page 28: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/28.jpg)
Feature Selection Tools
• R (CRAN): Boruta, RFE, CFS, Fselector, caret
• TMVA: FAST algo (stochastic wrapper), Global Loss function
• Scikit-Learn
• Bioconductor
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 28
![Page 29: Feature Selection Topics · Motivation •Common Machine Learning (ML) problems in HEP: –Classification or class discrimination •Higgs event or background? –Regression or function](https://reader033.fdocuments.in/reader033/viewer/2022042109/5e89a3e3c339af18f14ce996/html5/thumbnails/29.jpg)
Summary
• Feature selection is important part of robust HEP Machine Learning applications
• Many methods available
• Watch out for caveats
• Happy ANALYZING
Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 29