Armando Benitez -- Data x Design
Machine Learning Applications
Armando Benitez BMO Capital Markets
Jul 18, 2016
Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016
• Located on the outskirts of Geneva, straddling the France-Switzerland border
• 27 km in circumference
• The tunnel is buried around 50 to 175 m underground
LHC - CERN
ATLAS Detector
[Figure: the detector data pipeline. A particle produces a signal in the detector, which is amplified and digitized; the trigger selects which events to keep as digital data (e.g. 010010) on storage computers, and rejected events are discarded. Diagram: Shabnam Jabeen (Kansas), 5/6/03.]
Multiple Algorithms in Parallel
Credit: Reinhard Schwienhorst, Michigan State University
Classifiers run in parallel: Boosted Decision Trees, Bayesian Neural Networks, Matrix Elements
Combine: use another ML algorithm to combine the results of the individual classifiers.
Purpose: extract all possible information from the dataset. The combination produces a single output, from which all measurements are obtained.
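The combination step can be sketched as follows. This is a simplified stand-in: the slide describes using another ML algorithm as the combiner, which is replaced here by a validation-weighted average; the classifier names match the slide, but all numbers are illustrative.

```python
def combine_scores(scores, weights):
    """Weighted average of per-classifier scores for one event."""
    total = sum(w * s for w, s in zip(weights, scores))
    return total / sum(weights)

# Validation accuracies of the three classifiers (made-up numbers):
# Boosted Decision Trees, Bayesian Neural Networks, Matrix Elements.
accuracies = [0.82, 0.79, 0.75]

# Scores each classifier assigns to a single candidate event.
event_scores = [0.91, 0.88, 0.70]

combined = combine_scores(event_scores, accuracies)
print(round(combined, 3))  # -> 0.833
```

A real stacked combiner would instead train a second-level model on the classifier outputs, but the interface is the same: several per-event scores in, one discriminant out.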
Mobile Market Place
Data Processing and Modelling
Transaction-grade APIs + MQs
Data Lake (HBase, Cassandra, etc.)
Real-Time Layer: Stream Processing (context, event, data)
Batch Processing Layer: Batch Processing (event)(data)
Model Generator: Feature Selection → Model Training → Model Evaluation → Model Assembly
Decision Engine
Data Science applications:
1. Fraud Detection
2. Search
3. Recommendations
4. Notifications
5. Ratings
6. Merchant Intelligence
7. Engagement Optimization
8. Marketing Optimization
9. App Personalization
10. Ad Network Support
11. Image / Speech Recognition
Theory (Math, Algorithms) → Proof-of-Concept (R, Python, Scala, C++) → Spark Implementation (Scalability, Robustness) → Platform Integration
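The Decision Engine above consumes (context, event, data) tuples from the real-time layer and scores them against a model assembled by the batch layer. A rough, hypothetical sketch of that interface; the function name, model format, and threshold rule are invented for illustration:

```python
def decision_engine(context, event, data, model):
    """Score an event with the current model and apply a threshold rule."""
    # Look up the model's feature values in the event data (default 0.0).
    features = [data.get(name, 0.0) for name in model["features"]]
    # A linear score stands in for whatever model the generator assembled.
    score = sum(w * x for w, x in zip(model["weights"], features))
    action = "flag" if score > model["threshold"] else "allow"
    return {"user": context["user_id"], "event": event, "action": action}

# Illustrative model and transaction.
model = {"features": ["amount", "velocity"], "weights": [0.01, 0.5], "threshold": 5.0}
decision = decision_engine({"user_id": 42}, "purchase",
                           {"amount": 900, "velocity": 2}, model)
print(decision["action"])  # -> flag
```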
Fraud Detection
• Very small number of fraud cases
• Large number of good transactions
• Many different “types” of anomalies, making it hard for algorithms to learn from positive examples what the anomalies look like
• Future anomalies may look nothing like any of the anomalous examples we’ve seen so far
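One common response to this setting is density estimation on the normal class alone, so the model never needs examples of fraud at all. A minimal sketch, with made-up transaction amounts and a simple k-sigma rule standing in for a full density model:

```python
import statistics

# Fit a one-dimensional Gaussian to (overwhelmingly normal) transactions.
normal_amounts = [20.0, 25.0, 22.0, 30.0, 27.0, 24.0, 26.0, 23.0]
mu = statistics.mean(normal_amounts)
sigma = statistics.stdev(normal_amounts)

def is_anomalous(x, k=3.0):
    """Flag values more than k standard deviations from the mean."""
    return abs(x - mu) > k * sigma

print(is_anomalous(25.0))   # typical transaction -> False
print(is_anomalous(500.0))  # far outside the normal range -> True
```

Because the threshold is set on normal data only, this also catches future anomalies that look nothing like any fraud seen so far, which is exactly the failure mode listed above.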
Personalization
• Offers targeted for each user
• Use browsing history and shopping habits to determine products the user is most likely to buy
• Similarity among users
• Similarity among items
• Catalog search results
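"Similarity among users" is often computed as cosine similarity over item-interaction vectors. A minimal sketch with illustrative ratings (the user names and numbers are made up):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Each vector holds one user's ratings over the same four catalog items.
alice = [5, 3, 0, 1]
bob   = [4, 2, 0, 1]
carol = [0, 0, 5, 4]

# Alice's taste is closer to Bob's than to Carol's.
print(cosine(alice, bob) > cosine(alice, carol))  # -> True
```

Item-item similarity works the same way with the matrix transposed: compare the columns (items) instead of the rows (users).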
Incorporating ML into Design
Visual Inputs
Aural Inputs
Corporal Inputs
Environmental Inputs
• Machine Learning algorithms are capable of discovering patterns in the data presented to them. How can we make use of this?
• Find discovery opportunities that are only possible with the help of Machine Learning
• Designers and programmers should establish a strong collaboration to find ground-breaking applications
• Understand the rules in order to know which ones to bend or break
Creating Dialogue
Extra Slides
Search Strategy
Initial objects → Found it!
[Figure: four panels of the invariant mass distribution (GeV/c²), events / 0.05 GeV/c², over the range 5.2 to 7. FIG. 16: Ξ−b mass distribution of background events from J/ψ sideband events after all selection cuts have been applied (top), and these events (red squares) on top of the signal observed in right-sign combination events (open circles) (bottom).]
3. Ξ−b reconstruction on Λb → J/ψΛ(pπ−) MC events. We applied our Ξ−b selection to 30K generated Λb → J/ψΛ(pπ−) MC events. This is p17 MC with the same cuts at generation level as those applied to our Ξ−b MC, and reprocessed with the same extended configuration as used on data. No events survived after selection.
VI. CONCLUSIONS
By using a simple set of cuts we observe a signal peak with a mass of 5.774 ± 0.011 (stat) ± 0.022 (sys) GeV/c² and a width of 0.037 ± 0.008 GeV/c², with a significance of 5.53 and S/√B = 7.80. This peak is shown in Fig. 12 and the results of the fit are in Table II. This supports the previous report of the observation using Boosted Decision Trees [6]. We measure the relative production ratio to be f(b → Ξ−b)·Br(Ξ−b → J/ψΞ−(Λπ−)) / f(b → Λb)·Br(Λb → J/ψΛ) = 0.376 ± 0.119 (stat) ± 0.188 (syst).
[1] D. Buskalic et al., PL B384, 449.
[2] P. Abreu et al., ZPHY C68, 541.
[3] Common Samples Group, http://wwwd0.fnal.gov/Run2Physics/cs/.
[4] See description of "J/psi & dimuon mass continuum" at http://d0server1.fnal.gov/users/nomerot/Run2A/BANA/Dskim.html.
[5] Reconstruction of B hadron signals at DØ, DØ Note 4481.
[6] DØ Note 5401.
DØ Note 5403, Version 4.1, June 5, 2007
Observation of the heavy baryon Ξ−b
E. De La Cruz Burelo, H.A. Neal, and J. Qian (University of Michigan)
B. Abbott (University of Oklahoma)
G.D. Alexeev, Yu.P. Merekov, G.A. Panov, A.M. Rozhdestvensky, L.S. Vertogradov, Yu.L. Vertogradova (Joint Institute for Nuclear Research, Russia)
Using approximately 1.3 fb−1 of data collected by the upgraded DØ detector in Run II of the Tevatron, the Ξ−b state has been observed in the decay mode J/ψ(→ μ+μ−)Ξ−(Ξ− → Λπ±, Λ → pπ). A tracking algorithm which allows a more efficient method of reconstructing tracks with large impact parameters was used in order to increase the efficiency of reconstructing the Λ and Ξ−. We observe the Ξ−b with a significance of √(−2Δ ln L) = 5.53 and S/√B = 7.80, with a mass of 5.774 ± 0.011 (stat) ± 0.022 (sys) GeV/c². We measure the relative production ratio to be f(b → Ξ−b)·Br(Ξ−b → J/ψΞ−(Λπ−)) / f(b → Λb)·Br(Λb → J/ψΛ) = 0.376 ± 0.119 (stat) ± 0.188 (syst).
Data Cleaning: Signal to Bkg 20:1
Initial objects → Data Cleaning → Machine Learning → Found it!
9.4.2 Observed Results
[Figure 9.4: tb+tqb DT output (0 to 1) vs. event yield; D0 RunII Prelim., 2.3 fb−1, p17+p20 e+μ channel, 1-2 b-tags, 2-4 jets. Decision tree discriminant outputs for all 24 channels combined. The histograms are obtained by stacking each one of the 24 DT outputs on top of each other. The Single Top contribution in this plot is normalized to the measured cross section. The three plots correspond to the same distribution: linear scale (top left), log scale (top right) and a zoom in the signal region (bottom). The color key is shown in the bottom right-hand corner.]
This section contains the decision tree observed Single Top cross section results using the 2.3 fb−1 dataset. The decision tree discriminant distributions used for this measurement are shown in Appendix A. The decision tree output for all 24 channels combined¹ is shown in Figure 9.4.
¹ The histograms are combined by stacking the individual decision tree outputs.
Traditional searches: Small Signal Analysis
Signal to Bkg 1:20
Decision Trees
[Figure 8.1: 2D plane of a simple classification problem, and a Decision Tree solving the classification problem of signal and background. The plane is divided into Signal and Bkg regions by the cuts x < x1, x < x2, y < y1, and y < y2; the tree nodes are labelled L1/R1 through L4/R4.]
8.1 Overview
Consider the following two-dimensional classification problem: in the plane (X, Y) there are five regions that need to be separated; two of these regions are considered signal and three are background (see Figure 8.1). The solution to this problem can be seen as a simple binary tree which, by performing a series of disjoint cuts, separates the plane into five non-overlapping regions. Starting with a cut on the x-axis for values of X < x1, the full sample is split into two subsets: right (R1) and left (L1). Next, the L1 subset is divided into L2 and R2 by the cut Y < y1; similarly, the R1 subset is split twice more until all five regions are defined. This procedure can be scaled to more complex problems where the number of dimensions is much greater than two and the cut criteria are more difficult to determine.
More generally, a Decision Tree is built by repeatedly splitting an initial set of signal and background into two nodes. The criterion for a split is known as a cut, where each cut is selected to maximize the signal-to-background ratio in the node to be split. The final nodes are known as leaves. A node is determined to be a leaf if the
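The tree of Figure 8.1 can be written directly as nested cuts. A minimal sketch: the cut values below are invented (the figure gives none), and the leaf labels are a simplified version of the five regions:

```python
# Illustrative cut values for the splits X < x1, X < x2, Y < y1, Y < y2.
X1, X2, Y1, Y2 = 0.5, 0.8, 0.3, 0.6

def classify(x, y):
    """Follow the splits of the toy tree down to a leaf label."""
    if x < X1:                      # first split, on the x-axis (L1 side)
        return "signal" if y < Y1 else "bkg"
    if x < X2:                      # R1 is split again on x
        return "bkg"
    return "signal" if y < Y2 else "bkg"

print(classify(0.2, 0.1))  # lands in a signal leaf -> signal
print(classify(0.9, 0.9))  # lands in a background leaf -> bkg
```

Each `if` is one cut; each `return` is a leaf, exactly as in the binary tree described in the overview.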
Task: separate signal from background. Issue: a single split on X or Y is not enough! Solution: use a series of consecutive splits, generating a tree structure.
Decision Trees
Split 1: on the X variable (events either pass or fail cut C1)
Decision Trees
Split 2: a second cut C2 recovers events that failed Split 1
Repeat and continue the splitting process until all events are classified
Decision Trees
After 4 splits: Signal and Background regions are separated! Done!
Toy model: with only 2 variables, the cut values are easy to determine
A/B Testing
Anomaly detection
• Fit model on training set
• On a cross-validation/test example, predict
• Possible evaluation metrics:
• True positive, false positive, false negative, true negative
• Precision/Recall
• F1-score
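The metrics listed above follow directly from the four confusion counts. A minimal sketch with made-up counts for an imbalanced problem like fraud detection:

```python
# Confusion counts from a cross-validation set (illustrative numbers;
# note how few positives there are relative to true negatives).
tp, fp, fn, tn = 8, 2, 4, 986

precision = tp / (tp + fp)                          # of flagged, how many real
recall = tp / (tp + fn)                             # of real, how many caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(round(precision, 3), round(recall, 3), round(f1, 3))
# -> 0.8 0.667 0.727
```

Plain accuracy would be about 99.4% here while missing a third of the fraud, which is why precision/recall and F1 are the metrics of choice for skewed classes.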
• The SM describes the world around us
• Components:
• 24 particles of matter
• 4 mediators
• Interactions of the particles explained by the mediators
• Does not include gravity, dark matter, or dark energy
Standard Model (SM)
Identity Resolution
• What? Identify products having similar properties (name, colour, size) as a single unique product
• Why? Recommender systems trained on these deduplicated products produce better, non-repetitive recommendations
• How?
• Classifying pairs as match or non-match, based on how similar they are
• Making use of known catalog features
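The pair-classification step can be sketched as follows. Hand-crafted similarity features and a fixed linear score stand in for a trained classifier; the feature names, weights, and threshold are all illustrative:

```python
def pair_features(a, b):
    """Similarity features between two catalog entries."""
    wa = set(a["name"].lower().split())
    wb = set(b["name"].lower().split())
    union = wa | wb
    return {
        "name_jaccard": len(wa & wb) / len(union) if union else 0.0,
        "same_colour": 1.0 if a["colour"] == b["colour"] else 0.0,
        "same_size": 1.0 if a["size"] == b["size"] else 0.0,
    }

def is_match(a, b, threshold=0.7):
    """Classify a pair as match / non-match from a weighted feature score."""
    f = pair_features(a, b)
    score = 0.6 * f["name_jaccard"] + 0.2 * f["same_colour"] + 0.2 * f["same_size"]
    return score >= threshold

p1 = {"name": "Acme Running Shoe", "colour": "red", "size": "9"}
p2 = {"name": "acme running shoe", "colour": "red", "size": "9"}
p3 = {"name": "Trail Boot", "colour": "black", "size": "9"}

print(is_match(p1, p2))  # same product listed twice -> True
print(is_match(p1, p3))  # different products -> False
```

In practice the weights would be learned from labelled match/non-match pairs rather than fixed by hand, but the feature extraction over known catalog fields is the same.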