
Identifying the Higgs Boson
Guy Amdur, Anton Apostolatos, Leonard Bronner

LHC and the Higgs Boson

[Figures: Higgs Colorflow Energy Image and Gluon Colorflow Energy Image, plotted in 𝜂–𝜙 coordinates]

The ATLAS detector at the Large Hadron Collider, recording energy snapshots of 40 million proton-proton collisions a second, is tasked with finding the elusive Higgs boson subatomic particle. In a supercharged state, the particle decays in observable ways. By measuring and analyzing the energy patterns of this decay we can identify the subatomic particle itself.

Data

In collaboration with ATLAS we were given access to energy images for both Higgs boson and gluon decay. We had 50,000 25x25 and 100x100 pixel images in 𝜂 and 𝜙 cylindrical coordinates. These were pre-processed so that the jet center is at the center of each image, with resonance kept constant in every sampled data point. The nature of the electromagnetic mechanism of jet pull energy observation means the larger images detect charged particles less accurately. Thus, there is an inherent trade-off between higher resolution and lower quality of information.
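As a rough illustration of what "vectorized images" means in the results below, here is a minimal loading sketch; the file names, array layout, and 0/1 label convention are assumptions for illustration, not the project's actual pipeline.

```python
import numpy as np

# Hypothetical loader: assumes the 25x25 jet images are stored as an
# (n_samples, 25, 25) array with a matching 0/1 label vector
# (1 = Higgs decay, 0 = gluon decay).
def load_jet_images(image_path="jet_images_25x25.npy",
                    label_path="jet_labels.npy"):
    images = np.load(image_path)   # shape (n, 25, 25), jet centered
    labels = np.load(label_path)   # shape (n,)
    # Flatten each 25x25 energy image into a 625-dimensional feature
    # vector; these flattened images are the "vectorized images" that
    # the classifiers below are trained on.
    X = images.reshape(images.shape[0], -1)
    return X, labels
```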


Results

The current state of the art runs Fisher Discriminant Analysis on pull data, a feature class which characterizes the superstructure of the particle collision events. This classification method has an AUC of 0.667.
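For context, a two-class Fisher discriminant baseline can be sketched with scikit-learn's LinearDiscriminantAnalysis; the pull-data features below are random placeholders, since the poster does not spell out that feature set, so only the shape of the pipeline is meaningful.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Placeholder stand-ins for the pull-data features and labels; the real
# features characterize the superstructure of the collision events.
rng = np.random.default_rng(0)
X_pull = rng.normal(size=(1000, 6))
y = rng.integers(0, 2, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    X_pull, y, test_size=0.3, random_state=0)

# Two-class LDA is Fisher's linear discriminant, the baseline reported above.
fisher = LinearDiscriminantAnalysis()
fisher.fit(X_train, y_train)

# AUC of the discriminant scores (the poster reports ~0.667 on real pull data).
scores = fisher.decision_function(X_test)
print("Fisher discriminant AUC:", roc_auc_score(y_test, scores))
```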

We achieved an AUC of 0.867 by running an AdaBoost tree algorithm, with a Random Forest classifier as its base estimator, on the vectorized images. The AdaBoost classifier is an adaptive model that repeatedly fits weak learners, reweighting the training data and combining the learners into a weighted-sum classifier; the method is sensitive to outliers and noise. This method of classification is significantly better than the current state of the art.
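A minimal sketch of this configuration, assuming a scikit-learn implementation; the hyperparameters are illustrative guesses rather than the tuned values behind the 0.867 AUC, and it reuses the hypothetical load_jet_images loader from the Data section.

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# X: vectorized 25x25 energy images (see the loader sketch above),
# y: 0/1 Higgs-vs-gluon labels.
X, y = load_jet_images()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# AdaBoost with a Random Forest as its base estimator: each boosting round
# fits a forest on reweighted data, and the rounds are combined into a
# weighted sum.
base = RandomForestClassifier(n_estimators=20, max_depth=5, random_state=0)
model = AdaBoostClassifier(base_estimator=base,  # `estimator=` in scikit-learn >= 1.2
                           n_estimators=50,
                           learning_rate=1.0,
                           random_state=0)
model.fit(X_train, y_train)

probs = model.predict_proba(X_test)[:, 1]
print("AdaBoost + Random Forest AUC:", roc_auc_score(y_test, probs))
```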

The finer-granularity dataset did not lead to better classification, indicating that the higher resolution did not offset the loss of information that comes with the larger images.

Optimizations and Experiments

[Figure: Colorflow image FAST keypoints]

We experimented with various image descriptors, including ORB, BRIEF, and SIFT, on the higher-granularity images, with unsatisfactory results: the best achieved an AUC of 0.550. These descriptors were not able to capture many of the nuances of the sparse pixel matrices. Image histograms of pixel density also proved unsuccessful, with an AUC of 0.660.
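A sketch of the kind of descriptor and histogram features we mean, assuming OpenCV's ORB implementation; the 8-bit rescaling and the histogram bin count are arbitrary choices for illustration.

```python
import cv2
import numpy as np

def orb_keypoint_features(energy_image):
    """Detect ORB (FAST-based) keypoints on a single energy image.

    energy_image: 2D float array (e.g. 100x100). ORB expects an 8-bit
    image, so the energies are rescaled first. On very sparse jet images
    ORB tends to find few or no keypoints, which is consistent with the
    weak AUC reported above.
    """
    img = energy_image / (energy_image.max() + 1e-12)
    img8 = (img * 255).astype(np.uint8)
    orb = cv2.ORB_create()
    keypoints, descriptors = orb.detectAndCompute(img8, None)
    return keypoints, descriptors

def intensity_histogram(energy_image, bins=32):
    """Histogram of normalized pixel energies, used as a fixed-length feature."""
    img = energy_image / (energy_image.max() + 1e-12)
    hist, _ = np.histogram(img.ravel(), bins=bins, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)
```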

Vision

We evaluated many different supervised learning classification methods for this project and found decision tree models to be the most successful. Interestingly, the final model excelled even with very little training data, indicating that even a handful of samples gave these decision tree classifiers enough information to reach their full potential.
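One way to probe this small-training-set behaviour is a learning curve over increasing training fractions; the sketch below assumes the model and data objects from the earlier snippets and is not the evaluation protocol used on the poster.

```python
import numpy as np
from sklearn.model_selection import learning_curve

# Reuse the AdaBoost + Random Forest model and the (X, y) data from the
# sketches above; the training fractions here are arbitrary.
train_sizes, _, test_scores = learning_curve(
    model, X, y,
    train_sizes=np.linspace(0.01, 0.5, 5),
    scoring="roc_auc",
    cv=5)

for size, scores in zip(train_sizes, test_scores):
    print(f"{int(size):6d} training samples -> mean AUC {scores.mean():.3f}")
```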