Identifying the Higgs Boson - Machine Learning
cs229.stanford.edu/proj2015/350_poster.pdf

Transcript

Identifying the Higgs Boson
Guy Amdur, Anton Apostolatos, Leonard Bronner

LHC and the Higgs Boson

Data

Results

Optimizations and Experiments

Vision

[Figure: Higgs colorflow energy image, plotted in 𝜂-𝜙 coordinates]

[Figure: Gluon colorflow energy image, plotted in 𝜂-𝜙 coordinates]

The ATLAS detector at the Large Hadron Collider, recording energy snapshots of 40 million proton-proton collisions a second, is tasked with finding the elusive Higgs boson subatomic particle. In a super-charged state, the particle decays in observable ways. By measuring and analyzing the energy patterns of this decay we can identify the subatomic particle itself.

In collaboration with ATLAS we were given access to energy images for both Higgs boson and gluon decay. We had 50,000 25x25 and 100x100 pixel images in 𝜂 and 𝜙 cylindrical coordinates. These were pre-processed so that the jet center is at the center of an image, with resonance kept constant in every sampled data point. The nature of the electromagnetic mechanism of jet-pull energy observation makes the larger images less accurate at detecting charged particles, so there is an inherent trade-off between higher resolution and lower quality of information.
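The poster does not show its preprocessing code; a minimal sketch of the vectorization step for the 25x25 images, using a random array as a stand-in for the (non-public) ATLAS energy images:

```python
import numpy as np

# Hypothetical stand-in for the ATLAS energy images: N jets, each a
# 25x25 grid of calorimeter energy deposits in (eta, phi) coordinates.
rng = np.random.default_rng(0)
images = rng.random((100, 25, 25))

# Flatten each 2D image into a 625-dimensional feature vector so that
# standard classifiers (AdaBoost, random forests, ...) can consume it.
X = images.reshape(len(images), -1)
print(X.shape)  # (100, 625)
```

The same reshape applies to the 100x100 images, yielding 10,000-dimensional vectors.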

The current state of the art runs Fisher Discriminant Analysis on pull data, a feature class which characterizes the superstructure of the particle collision events. This classification method has an AUC of 0.667.
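The pull features themselves are not public, but the baseline can be sketched with scikit-learn's LinearDiscriminantAnalysis (Fisher's discriminant for two classes) on synthetic stand-in data:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the pull-based features: two slightly
# shifted Gaussians mimic signal (Higgs) vs background (gluon).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (500, 5)),
               rng.normal(0.5, 1.0, (500, 5))])
y = np.array([0] * 500 + [1] * 500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fisher discriminant: project onto the direction that best
# separates the two class means relative to within-class scatter.
lda = LinearDiscriminantAnalysis()
lda.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, lda.decision_function(X_te))
print(round(auc, 3))
```

The AUC printed here reflects only the synthetic data, not the 0.667 reported on real pull features.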

We achieved an AUC of 0.867 by running an AdaBoost tree algorithm, with a Random Forest classifier as its base estimator, on the vectorized images. AdaBoost is an adaptive model that repeatedly fits weak learners, reweighting the training data and combining the learners into a weighted-sum classifier; this makes it sensitive to outliers and noise. This method of classification significantly outperforms the current state of the art.
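A minimal sketch of this ensemble in scikit-learn; the hyperparameters and the synthetic data are illustrative stand-ins, not the poster's tuned configuration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the vectorized 25x25 energy images
# (625 features per jet); the real ATLAS images are not public.
X, y = make_classification(n_samples=400, n_features=625,
                           n_informative=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# AdaBoost over a small random forest base estimator, as in the poster.
clf = AdaBoostClassifier(
    RandomForestClassifier(n_estimators=10, max_depth=3, random_state=0),
    n_estimators=20,
    random_state=0,
)
clf.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(round(auc, 3))
```

The base estimator is passed positionally because the keyword was renamed (`base_estimator` to `estimator`) across scikit-learn versions.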

The finer-granularity dataset did not lead to better classification, indicating that the higher resolution did not offset the loss of information that came with the larger images.

[Figure: Colorflow image with FAST keypoints]

We experimented with various image descriptors for the higher-granularity images, including ORB, BRIEF, and SIFT, with unsatisfactory results: the best achieved an AUC of 0.550. These descriptors were unable to capture many of the nuances of the sparse pixel matrices. Image histograms of pixel density also proved unsuccessful, with an AUC of 0.660.
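The histogram representation can be sketched with numpy on a hypothetical sparse energy image; it illustrates why this feature discards so much information:

```python
import numpy as np

# Hypothetical sparse energy image: most calorimeter cells are empty,
# which is one reason generic keypoint descriptors struggled here.
rng = np.random.default_rng(0)
image = np.zeros((25, 25))
hot = rng.integers(0, 25, size=(30, 2))
image[hot[:, 0], hot[:, 1]] = rng.random(30)

# A pixel-intensity histogram collapses the image into a short feature
# vector, discarding all spatial (eta, phi) structure.
hist, _ = np.histogram(image, bins=16, range=(0.0, 1.0))
print(hist.sum())  # 625, the total number of pixels
```

The dominant first bin (empty cells) swamps the handful of energetic pixels, so two very different jet shapes can produce near-identical histograms.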

We evaluated many different supervised learning classification methods for this project and found decision tree models to be the most successful. Interestingly, the final model excelled even with very little training data, indicating that even a handful of samples carried enough information for these decision tree classifiers to reach their full potential.
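This kind of claim is typically checked with a learning curve: train the same tree ensemble on progressively larger slices of the training set and watch how quickly the test AUC saturates. A sketch on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real Higgs/gluon images are not public.
X, y = make_classification(n_samples=2000, n_features=50,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=500,
                                          random_state=0)

# Refit the same model on growing training slices; if AUC is already
# high at small n, the model needs little data, as the poster observed.
aucs = {}
for n in (50, 200, 1500):
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X_tr[:n], y_tr[:n])
    aucs[n] = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(aucs)
```

How flat this curve is depends entirely on the data; the poster's observation is specific to the ATLAS images.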