
Support Vector Machines:

Get more Higgs out of your data

Daniel Whiteson
UC Berkeley
July 11, 2001

Multivariate algorithms

Square cuts may work well for simpler tasks, but when the data are multivariate, the algorithms must be multivariate too.

Multivariate Algorithms

• HEP overlaps with Computer Science, Mathematics, and Statistics in this area: how can we construct an algorithm that can be taught by example and generalize effectively?

• We can use solutions from those fields:
– Neural Networks
– Probability Density Estimators
– Support Vector Machines

Neural Networks

• Decision function learned using the freedom in the hidden layers.
– Used very effectively as signal discriminators, particle identifiers, and parameter estimators
– Fast evaluation makes them well suited to triggers

• Constructed from very simple objects, they can learn complex patterns.
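A minimal sketch of this idea, assuming scikit-learn's MLPClassifier and toy signal/background samples (both assumptions, not the tooling of the original studies):

```python
# Minimal sketch: a small neural network as a signal/background
# discriminator, trained by example. Toy Gaussian samples stand in
# for real events; the library choice (scikit-learn) is an assumption.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
signal = rng.normal(loc=[1.0, 1.0], scale=0.5, size=(500, 2))
background = rng.normal(loc=[-1.0, -1.0], scale=1.0, size=(500, 2))
X = np.vstack([signal, background])
y = np.array([1] * 500 + [0] * 500)  # 1 = signal, 0 = background

# One hidden layer of 5 units: the freedom in the hidden layer lets
# a network built from very simple objects learn a curved boundary.
net = MLPClassifier(hidden_layer_sizes=(5,), max_iter=2000, random_state=0)
net.fit(X, y)

# Fast evaluation (one forward pass per event) is what makes
# networks suited to triggers.
print(net.predict_proba([[0.8, 1.2]])[0, 1])  # P(signal) for one event
```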

Probability Density Estimation

If we knew the distributions of the signal, $f_s(\mathbf{x})$, and the background, $f_b(\mathbf{x})$, then we could calculate

$$P_s(\mathbf{x}) = \frac{f_s(\mathbf{x})}{f_s(\mathbf{x}) + f_b(\mathbf{x})}$$

and use it to discriminate.

[Figure: example discriminant surface]

Probability Density Estimation

Of course we do not know the analytical distributions.

• Given a set of points drawn from a distribution, put down a kernel centered at each point.

• With high statistics, this approximates a smooth probability density.

[Figure: surface built from many kernels]
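A minimal sketch of this construction, assuming scipy's gaussian_kde and toy samples: estimate $f_s$ and $f_b$ by placing a kernel at each training point, then form the discriminant from the previous slide:

```python
# Sketch of the PDE discriminant P_s(x) = f_s(x) / (f_s(x) + f_b(x)),
# with a Gaussian kernel centered at every training point.
# scipy's gaussian_kde is an assumed stand-in for the actual PDE code.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
signal = rng.normal([1.0, 1.0], 0.5, size=(1000, 2))
background = rng.normal([-1.0, -1.0], 1.0, size=(1000, 2))

# gaussian_kde expects data of shape (n_dims, n_points).
f_s = gaussian_kde(signal.T)
f_b = gaussian_kde(background.T)

def discriminant(x):
    """P_s(x): closer to 1 means more signal-like."""
    s, b = f_s(x), f_b(x)
    return s / (s + b)

print(discriminant(np.array([[0.5], [0.5]])))  # one test point
```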

Probability Density Estimation

• Simple techniques have advanced to more sophisticated approaches:
– Adaptive PDE: varies the width of the kernel for smoothness
– Generalization to regression analysis: measures the value of a continuous parameter
– GEM: measures the local covariance and adjusts the individual kernels to give a more accurate estimate
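The toy sketch below illustrates only the adaptive-width idea, not the published Adaptive PDE or GEM algorithms: each training point gets its own bandwidth, set by the distance to its k-th nearest neighbour, so kernels are wide where statistics are low:

```python
# Toy one-dimensional illustration of adaptive kernel widths.
# This is a simplification for illustration, not the actual
# Adaptive PDE or GEM methods referenced above.
import numpy as np

def adaptive_pde(train, h0=0.3, k=10):
    """Density estimate with a per-point bandwidth."""
    train = np.sort(train)
    # Bandwidth grows with the distance to the k-th nearest
    # neighbour: wide in sparse regions, narrow in dense ones.
    dists = np.abs(train[:, None] - train[None, :])
    h = h0 + np.sort(dists, axis=1)[:, k]

    def f(x):
        x = np.atleast_1d(x)
        kernels = np.exp(-0.5 * ((x[:, None] - train[None, :]) / h) ** 2)
        kernels /= h * np.sqrt(2 * np.pi)
        return kernels.mean(axis=1)  # average of the per-point kernels

    return f

rng = np.random.default_rng(2)
f_hat = adaptive_pde(rng.normal(0.0, 1.0, 500))
print(f_hat([0.0, 2.0]))  # estimated density at two points
```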

Support Vector Machines

• PDEs must evaluate a kernel at every training point for every classification of a data point.

• Can we build a decision surface that uses only the relevant bits of information, the points in the training set that are near the signal-background boundary?

For a linear, separable case this is not too difficult: we simply need to find the hyperplane that maximizes the separation.

Support Vector Machines

• To find the hyperplane that gives the highest separation (lowest "energy"), we maximize the Lagrangian with respect to the $\alpha_i$:

$$L = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \,\mathbf{x}_i \cdot \mathbf{x}_j$$

where the $(\mathbf{x}_i, y_i)$ are training data and the $\alpha_i$ are positive Lagrange multipliers.

The solution is

$$\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$$

where $\alpha_i = 0$ for non-support vectors.

(images from the applet at http://svm.research.bell-labs.com/)
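A minimal sketch of the separable linear case, assuming scikit-learn's SVC (a large C approximates the hard margin). It checks that $\mathbf{w}$ rebuilt from the dual coefficients, which are nonzero only at the support vectors, matches the fitted hyperplane:

```python
# Linear, separable case: fit a maximum-margin hyperplane and
# rebuild w = sum_i alpha_i y_i x_i from the support vectors alone.
# scikit-learn's SVC is an assumed stand-in for the original code.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = np.vstack([rng.normal([2.0, 2.0], 0.4, size=(50, 2)),
               rng.normal([-2.0, -2.0], 0.4, size=(50, 2))])
y = np.array([1] * 50 + [-1] * 50)

svm = SVC(kernel="linear", C=1e6)  # large C ~ hard margin
svm.fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors only;
# all other training points have alpha_i = 0 and drop out.
w = svm.dual_coef_ @ svm.support_vectors_
print("number of support vectors:", len(svm.support_vectors_))
print("w from dual:", w, " w from fit:", svm.coef_)  # should agree
```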

Support Vector Machines

But not many problems of interest are linear. Map the data to a higher-dimensional space where the separation can be made by hyperplanes:

$$\Phi: \mathbb{R}^d \to \mathcal{H}, \qquad \mathbf{x}_i \cdot \mathbf{x}_j \rightarrow \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}_j)$$

We want to work in our original space, so we replace the dot product with a kernel function:

$$K(\mathbf{x}_i, \mathbf{x}_j) = \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}_j)$$

For these data, we need

$$K(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i \cdot \mathbf{x}_j + 1)^3$$
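In code, assuming scikit-learn, this kernel is the "poly" option with gamma=1 and coef0=1, so that K(x_i, x_j) = (x_i · x_j + 1)^3 exactly as above:

```python
# The kernel trick: swap the dot product for (x_i . x_j + 1)^3 and
# the XOR-like pattern below, not separable by any hyperplane in the
# original space, becomes separable. scikit-learn is an assumption.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.uniform(-2.0, 2.0, size=(200, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)  # opposite-quadrant classes

# gamma=1, coef0=1, degree=3 gives K(x_i, x_j) = (x_i . x_j + 1)^3.
svm = SVC(kernel="poly", degree=3, gamma=1.0, coef0=1.0)
svm.fit(X, y)
print("training accuracy:", svm.score(X, y))
```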

Support Vector Machines

Problems that are not entirely separable are not very difficult either.

• Allow an imperfect decision boundary, but add a penalty.

• Training errors, points on the wrong side of the boundary, are indicated by crosses.
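A minimal sketch of the penalty at work, assuming scikit-learn's C parameter as the penalty: a small C accepts more training errors in exchange for a wider margin, while a large C insists on classifying the training set:

```python
# Soft-margin sketch: the penalty C trades margin width against
# training errors (points on the wrong side of the boundary).
# Overlapping toy classes make a perfect boundary impossible.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X = np.vstack([rng.normal([1.0, 1.0], 1.0, size=(100, 2)),
               rng.normal([-1.0, -1.0], 1.0, size=(100, 2))])
y = np.array([1] * 100 + [-1] * 100)

for C in (0.1, 10.0):
    svm = SVC(kernel="linear", C=C).fit(X, y)
    errors = int((svm.predict(X) != y).sum())
    print(f"C={C}: {errors} training errors, "
          f"{len(svm.support_vectors_)} support vectors")
```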

Support Vector Machines

We are not limited to linear or polynomial kernels. A Gaussian kernel,

$$K(\mathbf{x}_i, \mathbf{x}_j) = e^{-|\mathbf{x}_i - \mathbf{x}_j|^2 / 2\sigma^2},$$

gives a highly flexible SVM.

Gaussian-kernel SVMs outperformed PDEs in recognizing handwritten numbers from the USPS database.
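A sketch of the Gaussian kernel on data no hyperplane or low-degree polynomial handles cleanly, assuming scikit-learn's "rbf" kernel with gamma = 1/(2σ²) to match the formula above:

```python
# Gaussian-kernel SVM on a ring of signal around a blob of
# background: a highly nonlinear boundary the flexible kernel
# can draw. scikit-learn's "rbf" kernel is the assumed stand-in.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(6)
r = np.hstack([rng.uniform(0.0, 1.0, 200),   # inner blob
               rng.uniform(2.0, 3.0, 200)])  # outer ring
phi = rng.uniform(0.0, 2.0 * np.pi, 400)
X = np.column_stack([r * np.cos(phi), r * np.sin(phi)])
y = np.array([-1] * 200 + [1] * 200)

sigma = 1.0
svm = SVC(kernel="rbf", gamma=1.0 / (2.0 * sigma**2)).fit(X, y)
print("training accuracy:", svm.score(X, y))
```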

Comparative study for HEP

[Figure: discriminator value distributions for a 2-dimensional discriminant built from Mjj and Ht, comparing the Neural Net, PDE, and SVM. Signal: Wh → bb. Backgrounds: Wbb, tt, WZ.]

Comparative study for HEP: Signal-to-Noise Enhancement

[Figure: signal-to-noise enhancement vs. discriminator threshold for the three methods, with efficiencies of 49%, 50%, and 43%.]

All of these methods provide powerful signal enhancement.
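A sketch of what such a threshold scan computes, on toy discriminator outputs rather than the study's data: for each cut, count the surviving signal and background and report the efficiency and a simple S/√B enhancement:

```python
# Threshold scan over a discriminator output. The toy beta-distributed
# outputs below stand in for any of the NN, PDE, or SVM discriminators;
# S/sqrt(B) is one common enhancement figure of merit.
import numpy as np

def scan(disc_signal, disc_background, thresholds):
    for t in thresholds:
        s = (disc_signal > t).sum()
        b = (disc_background > t).sum()
        eff = s / len(disc_signal)
        enhancement = s / np.sqrt(b) if b > 0 else float("inf")
        print(f"threshold {t:.1f}: efficiency {eff:.0%}, "
              f"S/sqrt(B) = {enhancement:.1f}")

rng = np.random.default_rng(7)
scan(rng.beta(5, 2, 1000),    # signal peaks near 1
     rng.beta(2, 5, 10000),   # background peaks near 0
     thresholds=np.linspace(0.1, 0.9, 5))
```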

Algorithm Comparisons

Neural Nets
• Advantages: very fast evaluation
• Disadvantages: structure built by hand; black box; local optimization

PDE
• Advantages: transparent operation
• Disadvantages: slow evaluation; requires high statistics

SVM
• Advantages: fast evaluation; kernel positions chosen automatically; global optimization
• Disadvantages: complex; training can be time-intensive; kernel selection by hand

Conclusions

• Difficult problems in HEP overlap with those in other fields. We can take advantage of our colleagues’ years of thought and effort.

• There are many areas of HEP analysis where intelligent multivariate algorithms like NNs, PDEs and SVMs can help us conduct more powerful searches and make more precise measurements.