Machine Learning: Chenhao Tan, University of Colorado Boulder, Lecture 6
Slides adapted from Jordan Boyd-Graber, Chris Ketelsen
Machine Learning: Chenhao Tan | Boulder | 1 of 39
• HW1 turned in
• HW2 released
• Office hour
• Group formation signup
Overview
Feature engineering
Revisiting Logistic Regression
Feed Forward Networks
Layers for Structured Data
Feature Engineering
Republican nominee George Bush said he felt nervous as he voted today in his adopted home state of Texas, where he ended...
(From Chris Harrison's WikiViz)
Brainstorming
What are features useful for sentiment analysis?
• Unigram
• Bigram
• Normalizing options
• Part-of-speech tagging
• Parse-tree related features
• Negation related features
• Additional resources
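A few of these features could be sketched as a small extractor. The sketch below is hypothetical: the `NOT_` prefix scheme for negation marking and the feature-name format are illustrative assumptions, not from the slides.

```python
from collections import Counter

def extract_features(text):
    """Unigram, bigram, and simple negation features for sentiment analysis.
    A hypothetical sketch; a real pipeline would add POS tags, parse-tree
    features, normalization options, and external resources."""
    tokens = text.lower().split()
    feats = Counter()
    negated = False
    for tok in tokens:
        prefix = "NOT_" if negated else ""
        feats["uni=" + prefix + tok] += 1
        if tok in {"not", "no", "never"}:
            negated = True          # mark following words as negated
        elif tok.endswith((".", "!", "?")):
            negated = False         # negation scope ends at punctuation
    for a, b in zip(tokens, tokens[1:]):
        feats["bi=" + a + "_" + b] += 1
    return feats

f = extract_features("this movie is not good at all")
```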
Sarcasm detection
“Trees died for this book?” (book)
• find high-frequency words and content words
• replace content words with “CW”
• extract patterns, e.g., “does not CW much about CW”
[Tsur et al., 2010]
More examples: Which one will be retweeted more?
[Tan et al., 2014] https://chenhaot.com/papers/wording-for-propagation.html
Revisiting Logistic Regression
P(Y = 0 \mid x, \beta) = \frac{1}{1 + \exp\left[\beta_0 + \sum_i \beta_i x_i\right]}

P(Y = 1 \mid x, \beta) = \frac{\exp\left[\beta_0 + \sum_i \beta_i x_i\right]}{1 + \exp\left[\beta_0 + \sum_i \beta_i x_i\right]}

\mathcal{L} = -\sum_j \log P(y^{(j)} \mid x^{(j)}, \beta)
• Transformation on x (we map class labels from {0, 1} to {1, 2}):

l_i = \beta_i^T x, \quad i = 1, 2

o_i = \frac{\exp l_i}{\sum_{c \in \{1,2\}} \exp l_c}, \quad i = 1, 2

• Objective function (using cross entropy -\sum_i p_i \log q_i):

\mathcal{L}(\hat{Y}, Y) = -\sum_j \left[ P(y^{(j)} = 1) \log P(\hat{y}^{(j)} = 1 \mid x^{(j)}, \beta) + P(y^{(j)} = 0) \log P(\hat{y}^{(j)} = 0 \mid x^{(j)}, \beta) \right]
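The transformation and objective above can be sketched in NumPy. This is a minimal sketch: the bias is assumed folded into beta via a constant feature, and the class label y directly indexes the softmax output.

```python
import numpy as np

def softmax(l):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(l - l.max())
    return e / e.sum()

def logistic_regression_loss(beta, x, y):
    """Cross-entropy loss of the two-class softmax formulation.
    beta: (2, d) per-class weights, x: (d,) features, y: label in {0, 1}."""
    l = beta @ x            # l_i = beta_i^T x
    o = softmax(l)          # o_i = exp(l_i) / sum_c exp(l_c)
    return -np.log(o[y])    # -log P(y | x, beta)

beta = np.array([[1.0, -1.0], [-1.0, 1.0]])
x = np.array([0.5, 2.0])
loss1 = logistic_regression_loss(beta, x, 1)
loss0 = logistic_regression_loss(beta, x, 0)
```

Since the logits here favor class 1, the loss for y = 1 is smaller than for y = 0.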
Logistic Regression as a Single-layer Neural Network
[Diagram: input layer x_1, ..., x_d → linear layer (l_1, l_2) → softmax (o_1, o_2)]
The linear and softmax steps can equivalently be drawn as one single layer:

[Diagram: input layer x_1, ..., x_d → single layer → (o_1, o_2)]
Feed Forward Networks
Deep Neural Networks
A two-layer example (one hidden layer)
[Diagram: input layer x_1, ..., x_d → hidden layer → output (o_1, o_2)]
More layers:
[Diagram: input layer x_1, ..., x_d → Hidden 1 → Hidden 2 → Hidden 3 → output (o_1, o_2)]
Forward propagation algorithm
How do we make predictions based on a multi-layer neural network? Store the biases for layer l in b_l and the weight matrix in W_l.
[Diagram: input x_1, ..., x_d passing through four layers with parameters (W_1, b_1), (W_2, b_2), (W_3, b_3), (W_4, b_4) → output (o_1, o_2)]
Suppose your network has L layers. To make a prediction on a test point x:

1: Initialize a_0 = x
2: for l = 1 to L do
3:   z_l = W_l a_{l-1} + b_l
4:   a_l = g(z_l)
5: end for
6: The prediction \hat{y} is simply a_L
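The algorithm above can be sketched in NumPy. The choice of ReLU for g and the layer sizes are illustrative assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, weights, biases, g=relu):
    """Forward propagation: a_l = g(W_l a_{l-1} + b_l) for each layer l."""
    a = x                               # a_0 = x
    for W, b in zip(weights, biases):
        a = g(W @ a + b)                # z_l = W_l a_{l-1} + b_l; a_l = g(z_l)
    return a                            # prediction is a_L

# Two-layer example: 3 inputs -> 4 hidden units -> 2 outputs
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
bs = [np.zeros(4), np.zeros(2)]
y_hat = forward(np.array([1.0, -2.0, 0.5]), Ws, bs)
```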
Nonlinearity
What happens if there is no nonlinearity?
Linear combinations of linear combinations are still linear combinations.
Neural networks in a nutshell
• Training data: S_train = {(x, y)}
• Network architecture (model): \hat{y} = f_w(x)
• Loss function (objective function): \mathcal{L}(\hat{y}, y)
• Learning (next lecture)
Nonlinearity Options
• Sigmoid: f(x) = \frac{1}{1 + \exp(-x)}
• tanh: f(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}
• ReLU (rectified linear unit): f(x) = \max(0, x)
• softmax: \mathrm{softmax}(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}
https://keras.io/activations/
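The four options mirror a few lines of NumPy; a minimal sketch of the formulas above:

```python
import numpy as np

def sigmoid(x):
    # squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # like sigmoid but centered at 0, range (-1, 1)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def relu(x):
    # zero for negative inputs, identity for positive inputs
    return np.maximum(0.0, x)

def softmax(x):
    # shift by the max for numerical stability; output sums to 1
    e = np.exp(x - np.max(x))
    return e / e.sum()
```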
Loss Function Options
• ℓ2 loss: \sum_i (y_i - \hat{y}_i)^2
• ℓ1 loss: \sum_i |y_i - \hat{y}_i|
• Cross entropy: -\sum_i y_i \log \hat{y}_i
• Hinge loss (more on this during SVM): \max(0, 1 - y\hat{y})
https://keras.io/losses/
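A minimal NumPy sketch of these losses. The argument conventions are assumptions of this example: one-hot targets for cross entropy, and y ∈ {−1, +1} with a raw score for hinge.

```python
import numpy as np

def l2_loss(y_hat, y):
    # sum of squared errors
    return np.sum((y - y_hat) ** 2)

def l1_loss(y_hat, y):
    # sum of absolute errors
    return np.sum(np.abs(y - y_hat))

def cross_entropy(y_hat, y):
    # y: one-hot targets, y_hat: predicted probabilities
    return -np.sum(y * np.log(y_hat))

def hinge_loss(y_hat, y):
    # y in {-1, +1}, y_hat: raw (unsquashed) score
    return max(0.0, 1.0 - y * y_hat)
```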
A Perceptron Example
x = (x_1, x_2), \quad y = f(x_1, x_2)

[Diagram: inputs x_1, x_2 and bias b → output o_1]

We consider a simple activation function:

f(z) = \begin{cases} 1 & z \ge 0 \\ 0 & z < 0 \end{cases}
Simple Example: Can we learn OR?

x_1:           0 1 0 1
x_2:           0 0 1 1
y = x_1 ∨ x_2: 0 1 1 1

w = (1, 1), b = −0.5

[Diagram: inputs x_1, x_2 and bias b → output o_1]
Simple Example: Can we learn AND?

x_1:           0 1 0 1
x_2:           0 0 1 1
y = x_1 ∧ x_2: 0 0 0 1

w = (1, 1), b = −1.5

[Diagram: inputs x_1, x_2 and bias b → output o_1]
Simple Example: Can we learn NAND?

x_1:              0 1 0 1
x_2:              0 0 1 1
y = ¬(x_1 ∧ x_2): 1 1 1 0

w = (−1, −1), b = 1.5

[Diagram: inputs x_1, x_2 and bias b → output o_1]
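The three gates can be checked with a few lines of Python, a minimal sketch of the threshold unit f(z) above. Note that NAND needs b = 1.5: with b = 0.5 and w = (−1, −1) the unit computes NOR instead.

```python
def perceptron(w, b, x):
    """Threshold unit: f(z) = 1 if z >= 0 else 0, with z = w . x + b."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z >= 0 else 0

inputs = [(0, 0), (1, 0), (0, 1), (1, 1)]
or_out   = [perceptron((1, 1), -0.5, x) for x in inputs]   # OR:   0 1 1 1
and_out  = [perceptron((1, 1), -1.5, x) for x in inputs]   # AND:  0 0 0 1
nand_out = [perceptron((-1, -1), 1.5, x) for x in inputs]  # NAND: 1 1 1 0
```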
Simple Example: Can we learn XOR?

x_1:         0 1 0 1
x_2:         0 0 1 1
x_1 XOR x_2: 0 1 1 0

NOPE! But why?
The single-layer perceptron is just a linear classifier, and can only learn things that are linearly separable. How can we fix this?
Increase the number of layers.

x_1:         0 1 0 1
x_2:         0 0 1 1
x_1 XOR x_2: 0 1 1 0

[Diagram: inputs x_1, x_2, bias → hidden units h_1, h_2, bias → output o_1]

W_1 = \begin{bmatrix} 1 & 1 \\ -1 & -1 \end{bmatrix}, \quad b_1 = \begin{bmatrix} -0.5 \\ 1.5 \end{bmatrix}, \quad W_2 = \begin{bmatrix} 1 & 1 \end{bmatrix}, \quad b_2 = -1.5
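These weights can be verified in NumPy; a minimal sketch where `step` implements the threshold activation from the perceptron slides. The hidden units compute h_1 = OR and h_2 = NAND, and the output unit ANDs them, which is exactly XOR.

```python
import numpy as np

def step(z):
    # threshold activation: 1 if z >= 0 else 0, elementwise
    return (z >= 0).astype(float)

W1 = np.array([[1.0, 1.0], [-1.0, -1.0]])
b1 = np.array([-0.5, 1.5])
W2 = np.array([1.0, 1.0])
b2 = -1.5

def xor_net(x):
    h = step(W1 @ x + b1)                   # h1 = OR(x1, x2), h2 = NAND(x1, x2)
    return 1.0 if W2 @ h + b2 >= 0 else 0.0 # AND(h1, h2)

outs = [xor_net(np.array(x)) for x in [(0, 0), (1, 0), (0, 1), (1, 1)]]
```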
General Expressiveness of Neural Networks
Neural networks with a single hidden layer can approximate any measurable function [Hornik et al., 1989, Cybenko, 1989].
Layers for Structured Data
Structured data
Spatial information
https://www.reddit.com/r/aww/comments/6ip2la/before_and_after_she_was_told_she_was_a_good_girl/
Convolutional Layers
Sharing parameters across patches

[Diagram: input image or input feature map → output feature maps]
https://github.com/davidstutz/latex-resources/blob/master/tikz-convolutional-layer/convolutional-layer.tex
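Parameter sharing across patches can be sketched as a naive valid convolution in NumPy. This is illustrative only: real convolutional layers add channels, strides, padding, and learned kernels.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most DL libraries):
    the same kernel weights are applied to every patch of the input."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A [-1, 1] kernel responds to vertical edges in a simple binary image
edge = conv2d(np.array([[0., 0., 1., 1.],
                        [0., 0., 1., 1.],
                        [0., 0., 1., 1.]]),
              np.array([[-1., 1.]]))
```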
Sequential information

“My words fly up, my thoughts remain below: Words without thoughts never to heaven go.” —Hamlet

• language
• activity history

x = (x_1, ..., x_T)
Recurrent Layers
Sharing parameters along a sequence
h_t = f(x_t, h_{t-1})
Long short-term memory
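The recurrence h_t = f(x_t, h_{t−1}) with shared parameters can be sketched as a vanilla RNN cell. The tanh form and the sizes are illustrative assumptions; LSTM cells replace this with a gated update.

```python
import numpy as np

def rnn_step(x_t, h_prev, Wx, Wh, b):
    """One recurrent step h_t = f(x_t, h_{t-1}): tanh of a shared linear map,
    the classic 'vanilla' RNN cell."""
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)

rng = np.random.default_rng(1)
Wx, Wh, b = rng.normal(size=(3, 2)), rng.normal(size=(3, 3)), np.zeros(3)

h = np.zeros(3)  # initial hidden state
for x_t in [np.array([1.0, 0.0]), np.array([0.0, 1.0])]:
    h = rnn_step(x_t, h, Wx, Wh, b)  # same Wx, Wh, b at every position
```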
What is missing?
• How to find good weights?
• How to make the model work (regularization, architecture, etc.)?
References
George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems (MCSS), 2(4):303–314, 1989.
Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.
Chenhao Tan, Lillian Lee, and Bo Pang. The effect of wording on message propagation: Topic- and author-controlled natural experiments on Twitter. In Proceedings of ACL, 2014.
Oren Tsur, Dmitry Davidov, and Ari Rappoport. ICWSM - A Great Catchy Name: Semi-Supervised Recognition of Sarcastic Sentences in Online Product Reviews. In Proceedings of ICWSM, 2010.