Applied Machine Learning Lecture 12: Time series and ...richajo/dit866/lectures/l12/l12.pdf ·...
Applied Machine Learning Lecture 12: Time series and sequential models
Selpi (selpi@chalmers.se)
The slides are a further development of Richard Johansson's slides
March 3, 2020
Example of tasks related to sequences
- sequence (sentiment/review) classification
- machine translation
- forecasting electricity load
- recognising activities from videos
- ...
overview
- how can we represent and predict sequential data using machine learning models?
  - use classical approaches
  - use tailored neural network approaches
sequence (review/sentiment) classification
I RECOMMEND this movie!
↓
positive
sequence prediction (1)
G O T H E N B
↓
U
sequence prediction (2)
sequence tagging
United Nations official Ekeus heads for Baghdad .
↓
B-ORG I-ORG O B-PER O O B-LOC O
A C A T G G T C T G A A
↓
N N C C C C C C C C C N
Machine translation
This is a very nice school building.
↓
Detta är en mycket trevlig skolbyggnad.
Overview
sequence prediction: classical approaches
neural networks for sequential problems
Different types of RNNs
Use "reduce problem" concept
- a reduction in machine learning means that we convert a complicated problem into (one or more) simpler problems
  - example: multiclass classification → several binary classification problems
- let's see how we attack sequential problems as a series of classification or regression tasks
step-by-step prediction: general ideas
- break down the problem into a sequence of smaller decisions
  - think of it as a system that gradually consumes input and generates output
- use standard classifiers or regressors to guess the next step
  - use features from the input and from the history
  - might need to constrain predictions to keep output consistent
example: named entity tagging
- input: a sentence
- output: names in the sentence bracketed and labeled
United Nations official Ekeus heads for Baghdad.
[     ORG    ]          [PER]           [ LOC ]
converting brackets into tags
- typical solution: Beginning/Inside/Outside coding
United Nations official Ekeus heads for Baghdad .
B-ORG I-ORG O B-PER O O B-LOC O
- see Ratinov and Roth: Design challenges and misconceptions in named entity recognition. CoNLL 2009.
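The bracket-to-tag conversion can be sketched in a few lines of Python. This is only an illustration: the function name `spans_to_bio` and the `(start, end, label)` span format with an exclusive end index are assumptions made for this sketch, not something taken from the slides.

```python
def spans_to_bio(tokens, spans):
    """Convert labeled spans into Beginning/Inside/Outside tags.

    spans: list of (start, end, label) with `end` exclusive
    (a hypothetical input format chosen for this sketch).
    """
    tags = ["O"] * len(tokens)           # everything is Outside by default
    for start, end, label in spans:
        tags[start] = "B-" + label       # first token of the name
        for i in range(start + 1, end):
            tags[i] = "I-" + label       # remaining tokens of the name
    return tags


tokens = "United Nations official Ekeus heads for Baghdad .".split()
spans = [(0, 2, "ORG"), (3, 4, "PER"), (6, 7, "LOC")]
tags = spans_to_bio(tokens, spans)
print(list(zip(tokens, tags)))
```

Each token receives exactly one tag, so the tagging task becomes one classification decision per token.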
training step-by-step systems: the basic approach
- how can we train the decision classifier in a step-by-step system?
- imitation learning: learn to imitate an "expert"
- naive approach to imitation learning:
  - force the system to generate the correct output
  - observe the states we pass along the way
  - use them as examples and train the classifier
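A minimal sketch of the naive approach for the tagging example: we walk along the gold-standard tag sequence and record one (features, correct tag) pair per step. The feature set and the helper names `extract_features` and `make_training_examples` are hypothetical choices for illustration; any standard classifier could then be trained on the resulting pairs.

```python
def extract_features(tokens, i, prev_tag):
    """Features for position i: the word, simple context, and the previous tag (history)."""
    word = tokens[i]
    return {
        "word": word.lower(),
        "is_capitalized": word[0].isupper(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<START>",
        "prev_tag": prev_tag,
    }


def make_training_examples(tokens, gold_tags):
    """Follow the correct output and collect the states we pass along the way."""
    examples = []
    prev_tag = "<START>"
    for i, gold in enumerate(gold_tags):
        examples.append((extract_features(tokens, i, prev_tag), gold))
        prev_tag = gold  # force the system onto the correct path
    return examples


tokens = "United Nations official Ekeus heads for Baghdad .".split()
gold_tags = ["B-ORG", "I-ORG", "O", "B-PER", "O", "O", "B-LOC", "O"]
examples = make_training_examples(tokens, gold_tags)
print(examples[1])
```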
example of features: named entity tagging
what happens at prediction time?
- how will we use this system for prediction?
- there is a difference between training and prediction:
  - when training, "history features" are based on the gold standard
  - when predicting, they are based on automatic predictions
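The difference can be made concrete with a greedy prediction loop, sketched below. The rule-based `toy_classifier` is a hypothetical stand-in for a trained model, and the feature names are assumptions; the point is only that `prev_tag` is now filled in from the system's own previous prediction, so an early mistake can propagate to later steps.

```python
def toy_classifier(features):
    """Hypothetical stand-in for a trained classifier (hand-written rules, illustration only)."""
    if not features["is_capitalized"]:
        return "O"
    if features["prev_tag"] in ("B-ORG", "I-ORG"):
        return "I-ORG"
    return "B-ORG"


def greedy_tag(tokens, classify):
    """Tag left to right, feeding each prediction back in as a history feature."""
    tags = []
    prev_tag = "<START>"
    for word in tokens:
        features = {
            "word": word.lower(),
            "is_capitalized": word[0].isupper(),
            "prev_tag": prev_tag,
        }
        prev_tag = classify(features)  # history comes from our own prediction
        tags.append(prev_tag)
    return tags


predicted = greedy_tag("United Nations official".split(), toy_classifier)
print(predicted)
```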
code example: named entity tagging
see https://www.clips.uantwerpen.be/conll2003/ner/
discussion
- how would you build a model to predict the electricity load an hour from now?
[figure: electricity load plotted over about 200 time steps, with values between roughly 7000 and 12000]
see e.g. Chang et al. (2001) EUNITE Network Competition: Electricity Load Forecasting
time series prediction
- a time series is a set of observations collected at specified times, normally at equal intervals (every 30 minutes, every month, etc.)
- one use is to predict future values given the historical observed values (predictive perspective); this is the focus in ML
- note that the temporal aspect is important
  - useful information comes not only from the input features, but also from the changes in input/output over time
  - this makes prediction harder
Components of time series, in the classic statistics approach
- trend (an increase or decrease; something that happens for some time and then disappears)
- seasonality (a repeating pattern within a certain fixed period; e.g., sales are higher in December than in other months)
- irregularity (noise, cannot be predicted); e.g., an outbreak happens, sales of masks go up, and afterwards they come down again
- cyclic (up/down movements that keep on repeating, but are harder to predict)
time series analysis software in Python
- StatsModels: https://www.statsmodels.org/stable/tsa.html
autoregressive models: AR(p)
- an autoregressive model AR(p) is defined by the equation
$$Y_t = c + \sum_{i=1}^{p} a_i Y_{t-i} + \varepsilon_t$$
where $Y_t$ is the time series, $c$ and $a_i$ are constants, and $\varepsilon_t$ is Gaussian noise
[figure: two simulated series over 200 time steps; panel titles: "AR(2) a=[0.99, 0.9]" and "AR(1) a=[-0.7]"]
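To get a feeling for the AR(p) equation, one can simulate it directly. The sketch below uses only the standard library; the function name `simulate_ar`, the zero initial values, the unit noise variance and the seed are assumptions made for illustration. The coefficient a = [-0.7] is the one from the AR(1) panel title.

```python
import random


def simulate_ar(coeffs, c=0.0, n=200, noise_std=1.0, seed=0):
    """Simulate Y_t = c + sum_i a_i * Y_{t-i} + eps_t with Gaussian noise eps_t."""
    rng = random.Random(seed)
    p = len(coeffs)
    y = [0.0] * p  # initial values (assumption: the series starts at zero)
    for _ in range(n):
        # y[-1] is Y_{t-1}, y[-2] is Y_{t-2}, and so on
        mean = c + sum(a * y[-i] for i, a in enumerate(coeffs, start=1))
        y.append(mean + rng.gauss(0.0, noise_std))
    return y[p:]  # drop the initial values


series = simulate_ar([-0.7], n=200)
print(series[:5])
```

With a negative coefficient such as -0.7, consecutive values tend to alternate in sign, which gives the series a jagged look.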
using an AR model for prediction
$$Y_t \approx c + \sum_{i=1}^{p} a_i Y_{t-i}$$
- for our purposes, this type of model can be seen as a type of linear regression model
- the next step can be predicted from some previous steps
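The link to linear regression can be illustrated for the simplest case, p = 1, where ordinary least squares has a closed form. This is a sketch under stated assumptions: the helper name `fit_ar1` and the noise-free toy series are made up for illustration, and a real application would rather use a library such as StatsModels.

```python
def fit_ar1(y):
    """Least-squares fit of Y_t ≈ c + a * Y_{t-1}; returns (c, a)."""
    x = y[:-1]       # inputs: previous values Y_{t-1}
    target = y[1:]   # outputs: next values Y_t
    n = len(x)
    mean_x = sum(x) / n
    mean_t = sum(target) / n
    cov = sum((xi - mean_x) * (ti - mean_t) for xi, ti in zip(x, target))
    var = sum((xi - mean_x) ** 2 for xi in x)
    a = cov / var
    c = mean_t - a * mean_x
    return c, a


# noise-free toy series generated by Y_t = 2 + 0.5 * Y_{t-1}, starting at 0
y = [0.0]
for _ in range(20):
    y.append(2.0 + 0.5 * y[-1])

c, a = fit_ar1(y)
next_value = c + a * y[-1]  # one-step-ahead prediction
print(c, a, next_value)
```

Because the toy series contains no noise, the fit recovers the generating constants up to rounding error.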
extensions to the basic autoregressive model
- AR models with exogenous variables (ARX)
$$Y_t = c + \sum_{i=1}^{p} a_i Y_{t-i} + \sum_{i=1}^{q} b_i X_{t-i} + \varepsilon_t$$
allow us to add more features, e.g. to model seasonality
- nonlinear models (NARX):
$$Y_t = F(Y_{t-1}, \ldots, X_{t-i}, \ldots) + \varepsilon_t$$
where $F$ is any predictive model
Converting time series prediction/forecasting to supervised learning
- Convert data using sliding window method
  - one step forecast
  - multi-step forecast
- If using
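The sliding window conversion can be sketched as follows. The function name `sliding_window` and its parameters `n_lags` (window length) and `horizon` (number of future steps to predict) are hypothetical names chosen for this illustration.

```python
def sliding_window(series, n_lags, horizon=1):
    """Turn a series into supervised pairs: n_lags past values -> next `horizon` values."""
    X, Y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])   # the window of past observations
        Y.append(series[t:t + horizon])  # the value(s) to forecast
    return X, Y


series = [10, 20, 30, 40, 50, 60]
X1, Y1 = sliding_window(series, n_lags=3)             # one-step forecast
X2, Y2 = sliding_window(series, n_lags=3, horizon=2)  # multi-step forecast
print(X1, Y1)
print(X2, Y2)
```

Each row of `X` with its row of `Y` is an ordinary training example, so any regressor can be applied afterwards.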
Overview
sequence prediction: classical approaches
neural networks for sequential problems
Different types of RNNs
overview
- previously, we had an introduction to neural network classifiers
  - a key attraction of this type of model is the reduced need for feature engineering
- today: recurrent neural networks, which we apply to sequential data
  - classifying sentences or documents (e.g., predict positive/negative review)
  - outputting sequences, such as time series prediction
  - "sequence-to-sequence", such as machine translation
recap: feedforward NN
- a feedforward neural network or multilayer perceptron consists of connected layers of "classifiers"
  - the intermediate classifiers are called hidden units
  - the final classifier is called the output unit
- each hidden unit $h_i$ computes its output based on its own weight vector $w_{h_i}$:
$$h_i = f(w_{h_i} \cdot x)$$
- and then the output is computed from the hidden units:
$$y = f(w_o \cdot h)$$
- the function $f$ is called the activation
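The two equations translate almost literally into code. The sketch below assumes the logistic sigmoid as the activation f and uses made-up toy weights; both are illustrative choices rather than anything specified on the slide.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def feedforward(x, hidden_weights, output_weights, f=sigmoid):
    """One hidden layer: h_i = f(w_hi . x), then y = f(w_o . h)."""
    h = [f(dot(w_hi, x)) for w_hi in hidden_weights]  # hidden units
    y = f(dot(output_weights, h))                     # output unit
    return h, y


x = [1.0, 2.0]
hidden_weights = [[0.5, -0.25], [0.0, 0.0]]  # one weight vector per hidden unit
output_weights = [1.0, -1.0]
h, y = feedforward(x, hidden_weights, output_weights)
print(h, y)
```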
recap: representing words in NNs
- NN implementations tend to prefer short, dense vectors
- recall the way we code word features as one-hot sparse vectors:
tomato → [0, 0, 1, 0, 0, ..., 0, 0, 0]
carrot → [0, 0, 0, 0, 0, ..., 0, 1, 0]
- the solution: represent words with low-dimensional vectors (word embeddings), in a way so that words with similar meaning have similar vectors
tomato → [0.10, −0.20, 0.45, 1.2, −0.92, 0.71, 0.05]
carrot → [0.08, −0.21, 0.38, 1.3, −0.91, 0.82, 0.09]
- libraries such as word2vec help us build the vectors
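One way to see why dense vectors are more useful is to compare similarities. The sketch below uses cosine similarity, which is a common choice but an assumption here since the slide does not name a similarity measure. The dense vectors are the example values from the slide; the one-hot vectors use an arbitrary vocabulary size of 8.

```python
import math


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# one-hot vectors: any two different words are equally dissimilar
tomato_onehot = [0, 0, 1, 0, 0, 0, 0, 0]
carrot_onehot = [0, 0, 0, 0, 0, 0, 1, 0]

# dense embedding vectors (example values from the slide)
tomato = [0.10, -0.20, 0.45, 1.2, -0.92, 0.71, 0.05]
carrot = [0.08, -0.21, 0.38, 1.3, -0.91, 0.82, 0.09]

print(cosine(tomato_onehot, carrot_onehot))
print(cosine(tomato, carrot))
```

The one-hot vectors are orthogonal, so their similarity is zero, whereas the two dense vectors point in nearly the same direction.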
word embeddings
- word embeddings: instead of high-dimensional sparse vectors, low-dimensional dense vectors
Gothenburg → [0.010, −0.20, ..., 0.15, 0.07, −0.23, ...]
Stockholm → [0.015, −0.05, ..., 0.11, 0.14, −0.12, ...]
- created as a by-product of training a neural network, or by separate pre-training using raw text
- this solution addresses the previously mentioned limitations:
  - low dimensionality, easy to plug into NNs
  - can encode some sort of "similarity"
  - to some extent, they mitigate the problem of rarity, if pre-trained
Using "contexts" to determine similarity
- the distributional hypothesis: the distribution of contexts in which a word appears is a good proxy of the meaning of that word. Assumption: two words probably have a similar "meaning" if they tend to appear in similar contexts
- example of a word-word co-occurrence matrix:
  - "I like NLP"
  - "I like deep learning"
  - "I enjoy flying"
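A co-occurrence matrix for the three example sentences can be computed with a few lines of Python. The window size of one word on each side and the function name `cooccurrence_counts` are assumptions for this sketch; the counts are stored sparsely as a dictionary keyed by (word, context word).

```python
from collections import Counter


def cooccurrence_counts(sentences, window=1):
    """Count how often each word appears within `window` positions of another word."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(word, tokens[j])] += 1
    return counts


sentences = ["I like NLP", "I like deep learning", "I enjoy flying"]
counts = cooccurrence_counts(sentences)
print(counts[("I", "like")], counts[("I", "enjoy")], counts[("like", "deep")])
```

Each row of the resulting matrix describes the contexts of one word, which is the information the distributional hypothesis relies on.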
word2vec: basic assumptions
- each word w comes from a word vocabulary W
- from a large collection of text, we collect contexts such as
they drink coffee with milk
word2vec: "skip-gram with negative sampling"
- collect word-context pairs occurring in the text data
coffee: drink
- for each such pair, randomly generate a number of synthetic pairs:
coffee: mechanic
coffee: contrary
coffee: persuade
- then fit the following model w.r.t. $(V, V')$
$$P(\text{true pair} \mid (w, c)) = \frac{1}{1 + \exp(-V(w) \cdot V'(c))}$$
$$P(\text{synthetic pair} \mid (w, c)) = 1 - \frac{1}{1 + \exp(-V(w) \cdot V'(c))}$$
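The model above can be written down directly. In the sketch below, the function names are hypothetical, and drawing the synthetic context words uniformly from the vocabulary is a simplifying assumption. The loss is the negative log-likelihood of one true pair together with its synthetic pairs, which is one natural reading of "fit the model".

```python
import math
import random


def p_true_pair(v_w, v_c):
    """P(true pair | (w, c)) = 1 / (1 + exp(-V(w) . V'(c)))."""
    score = sum(a * b for a, b in zip(v_w, v_c))
    return 1.0 / (1.0 + math.exp(-score))


def synthetic_pairs(word, vocabulary, k, rng):
    """Pair `word` with k randomly drawn context words (uniform sampling is an assumption)."""
    return [(word, rng.choice(vocabulary)) for _ in range(k)]


def pair_loss(v_w, v_c_true, v_c_synthetic):
    """Negative log-likelihood of one true pair and its synthetic pairs."""
    loss = -math.log(p_true_pair(v_w, v_c_true))
    for v_c in v_c_synthetic:
        loss -= math.log(1.0 - p_true_pair(v_w, v_c))
    return loss


rng = random.Random(0)
pairs = synthetic_pairs("coffee", ["mechanic", "contrary", "persuade"], k=3, rng=rng)
print(pairs)
print(p_true_pair([0.5, 1.0], [1.0, 0.5]))
```

Minimising this loss over all collected pairs pushes V(w) and V'(c) together for true pairs and apart for synthetic ones.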
inspection of the model
- after training the embedding model, we can inspect the result using nearest-neighbor lists
- for illustration, vectors can be projected to two dimensions using methods such as t-SNE or PCA
[figure: 2D projection of word vectors, showing the words pizza, sushi, falafel, spaghetti, rock, techno, funk, soul, punk, jazz, router, touchpad, laptop, monitor]
gensim example: training a model
import gensim

text_file_name = 'data/wikipedia.txt'
# read one sentence per line, using at most the first 100000 lines
sentences = gensim.models.word2vec.LineSentence(text_file_name,
                                                limit=100000)
# in gensim 4.x the `size` argument is called `vector_size`
model = gensim.models.Word2Vec(sentences, vector_size=20, window=5,
                               min_count=5, workers=2)
...
# the trained word vectors are accessed through model.wv
model.wv.most_similar('europe')
gensim example: using a trained model
from gensim.models import KeyedVectors
filename = 'GoogleNews-vectors-negative300.bin'
model = KeyedVectors.load_word2vec_format(filename, binary=True)
...
recap: sequential prediction
United Nations official Ekeus heads for Baghdad .
B-ORG I-ORG O B-PER O O B-LOC O
- a normal classifier is applied in each step: SVM, LR, decision tree, neural network, ...
- limitation: features are typically extracted from a small window!
recurrent NN for processing sequences
- recurrent NNs (RNNs) are applied in a step-by-step fashion to a sequential input such as a sentence
- RNNs use a state vector that represents what has happened previously
- after each step, a new state is computed
RNN example
using the RNN output
- the RNN output isn't that interesting on its own ...
- we can use it
  - in sequence classifiers
  - in sequence predictors
  - in sequence taggers
  - in translation
example: using the RNN output in a document classifier
example: using the RNN output in a part-of-speech tagger
what's in the box? simple RNN implementation
- the simplest type of RNN looks similar to a feedforward NN
  - the next state is computed like a hidden layer in a feedforward NN
$$\mathrm{state}_t = \sigma(W_h \cdot (\mathrm{input} \oplus \mathrm{state}_{t-1}))$$
  - ... and the output from the next state
$$\mathrm{output} = \mathrm{softmax}(W_o \cdot \mathrm{state}_t)$$
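The two equations can be sketched in plain Python as follows. This is an illustration under assumptions: σ is taken to be the logistic sigmoid, ⊕ is implemented as list concatenation, the initial state is all zeros, and the weight matrices and dimensions are made-up toy values.

```python
import math


def matvec(W, v):
    """Matrix-vector product, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


def rnn_step(W_h, W_o, x, prev_state):
    """One step: new state from input and previous state, output from the new state."""
    combined = x + prev_state  # list concatenation plays the role of ⊕
    state = [1.0 / (1.0 + math.exp(-a)) for a in matvec(W_h, combined)]
    output = softmax(matvec(W_o, state))
    return state, output


def run_rnn(W_h, W_o, inputs, state_size):
    """Apply the RNN step by step to a sequence of input vectors."""
    state = [0.0] * state_size  # assumption: start from a zero state
    outputs = []
    for x in inputs:
        state, output = rnn_step(W_h, W_o, x, state)
        outputs.append(output)
    return state, outputs


# toy dimensions: input size 2, state size 2, 3 output classes
W_h = [[0.1, -0.2, 0.3, 0.0], [0.0, 0.5, -0.1, 0.2]]
W_o = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
final_state, outputs = run_rnn(W_h, W_o, inputs, state_size=2)
print(final_state)
print(outputs[-1])
```

A sequence classifier would typically use only the final state or output, while a tagger would use the output at every step.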
simple RNNs in Keras
from keras.models import Sequential
from keras.layers import SimpleRNN

model = Sequential()
model.add(SimpleRNN(32))  # recurrent layer with a 32-dimensional state
training RNNs: backpropagation through time
![Page 62: Applied Machine Learning Lecture 12: Time series and ...richajo/dit866/lectures/l12/l12.pdf · Applied Machine Learning Lecture 12: Time series and sequential models Selpi (selpi@chalmers.se)](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0dc9b97e708231d43c1684/html5/thumbnails/62.jpg)
Overview
sequence prediction: classical approaches
neural networks for sequential problems
Different types of RNNs
![Page 63: Applied Machine Learning Lecture 12: Time series and ...richajo/dit866/lectures/l12/l12.pdf · Applied Machine Learning Lecture 12: Time series and sequential models Selpi (selpi@chalmers.se)](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0dc9b97e708231d43c1684/html5/thumbnails/63.jpg)
Many-to-many architectures (1)
I For input and output sequences of same length
![Page 64: Applied Machine Learning Lecture 12: Time series and ...richajo/dit866/lectures/l12/l12.pdf · Applied Machine Learning Lecture 12: Time series and sequential models Selpi (selpi@chalmers.se)](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0dc9b97e708231d43c1684/html5/thumbnails/64.jpg)
Many-to-many architectures (2)
I For input and output sequences of different lengths
I For example, machine translation
![Page 65: Applied Machine Learning Lecture 12: Time series and ...richajo/dit866/lectures/l12/l12.pdf · Applied Machine Learning Lecture 12: Time series and sequential models Selpi (selpi@chalmers.se)](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0dc9b97e708231d43c1684/html5/thumbnails/65.jpg)
Many-to-one
I For example, for sentiment classification, input: sequence and output: positive/negative. For video-based activity recognition, input: sequence of video frames, output: type of activity
![Page 66: Applied Machine Learning Lecture 12: Time series and ...richajo/dit866/lectures/l12/l12.pdf · Applied Machine Learning Lecture 12: Time series and sequential models Selpi (selpi@chalmers.se)](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0dc9b97e708231d43c1684/html5/thumbnails/66.jpg)
One-to-many
I For example, for music generation, input: genre, output: sequence of notes
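The difference between these architectures is mostly a matter of which inputs are fed in and which outputs are read off. The sketch below illustrates this with a toy scalar recurrence standing in for an RNN cell; the function names and the weights `w_x` and `w_h` are hypothetical, and the many-to-many case with different lengths (an encoder followed by a decoder) is not covered here.

```python
import math

def step(x, state, w_x=0.5, w_h=0.8):
    """Toy scalar recurrence standing in for an RNN cell (weights are made up)."""
    return math.tanh(w_x * x + w_h * state)

def many_to_many(xs):
    """Same-length case: one output per input step, e.g. one tag per word."""
    state, outputs = 0.0, []
    for x in xs:
        state = step(x, state)
        outputs.append(state)
    return outputs

def many_to_one(xs):
    """Only the final state is used, e.g. one label for a whole review."""
    state = 0.0
    for x in xs:
        state = step(x, state)
    return state

def one_to_many(x0, n_steps):
    """A single input starts the sequence; each output is fed back as the next input."""
    state, x, outputs = 0.0, x0, []
    for _ in range(n_steps):
        state = step(x, state)
        outputs.append(state)
        x = state
    return outputs

xs = [1.0, -0.5, 0.25]
per_step = many_to_many(xs)      # three values, one per input
final = many_to_one(xs)          # a single value, identical to per_step[-1]
generated = one_to_many(1.0, 4)  # four values generated from one input
```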
![Page 67: Applied Machine Learning Lecture 12: Time series and ...richajo/dit866/lectures/l12/l12.pdf · Applied Machine Learning Lecture 12: Time series and sequential models Selpi (selpi@chalmers.se)](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0dc9b97e708231d43c1684/html5/thumbnails/67.jpg)
simple RNNs have a drawback
image borrowed from C. Olah's blog
![Page 68: Applied Machine Learning Lecture 12: Time series and ...richajo/dit866/lectures/l12/l12.pdf · Applied Machine Learning Lecture 12: Time series and sequential models Selpi (selpi@chalmers.se)](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0dc9b97e708231d43c1684/html5/thumbnails/68.jpg)
long short-term memory: LSTM
I the long short-term memory is a type of RNN that is designed to handle long-term dependencies in a better way
image borrowed from C. Olah's blog
I its state consists of two parts: a short-term and a long-term part (also known as the cell state)
I note that the exact implementation can differ slightly in the literature!
![Page 69: Applied Machine Learning Lecture 12: Time series and ...richajo/dit866/lectures/l12/l12.pdf · Applied Machine Learning Lecture 12: Time series and sequential models Selpi (selpi@chalmers.se)](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0dc9b97e708231d43c1684/html5/thumbnails/69.jpg)
LSTM: short-term part and output
image borrowed from C. Olah's blog
I short-term state is a bit similar to what we had in the simple RNN
![Page 70: Applied Machine Learning Lecture 12: Time series and ...richajo/dit866/lectures/l12/l12.pdf · Applied Machine Learning Lecture 12: Time series and sequential models Selpi (selpi@chalmers.se)](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0dc9b97e708231d43c1684/html5/thumbnails/70.jpg)
LSTM: long-term part
image borrowed from C. Olah's blog
I the long-term state is controlled by a forget gate that decides if information should be passed along or discarded
I parts of the input and short-term state can be added to the long-term state
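One common way to write a single LSTM step is sketched below in plain Python. It follows the formulation usually shown alongside the borrowed diagrams (forget, input and output gates plus a candidate update); as noted earlier, exact implementations differ, so treat this as an assumption rather than the definition. The parameter layout, the helper `toy_gate` and all weight values are hypothetical.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def affine(W, b, v):
    """Compute W.v + b for a matrix given as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) + b_k for row, b_k in zip(W, b)]

def lstm_step(params, x, h_prev, c_prev):
    """One LSTM step; h is the short-term state, c the long-term (cell) state."""
    z = h_prev + x  # concatenate short-term state and current input
    f = [sigmoid(a) for a in affine(*params["forget"], z)]       # keep or discard old c
    i = [sigmoid(a) for a in affine(*params["input"], z)]        # how much new info to add
    g = [math.tanh(a) for a in affine(*params["candidate"], z)]  # candidate new info
    o = [sigmoid(a) for a in affine(*params["output"], z)]       # what to expose as h
    # Long-term state: forget part of the old state, then add gated new information.
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    # Short-term state (also the output): a gated view of the long-term state.
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c

def toy_gate(scale):
    """Made-up weights for 2 state units and 1 input feature (z has length 3)."""
    W = [[scale, -scale, scale], [scale, scale, -scale]]
    b = [0.0, 0.0]
    return W, b

params = {name: toy_gate(s) for name, s in
          [("forget", 0.5), ("input", 0.3), ("candidate", 0.7), ("output", 0.2)]}

h, c = [0.0, 0.0], [0.0, 0.0]
for x in [[1.0], [0.5], [-1.0]]:
    h, c = lstm_step(params, x, h, c)
```

When the forget gate is close to 1 and the input gate close to 0, `c` is carried along almost unchanged, which is how information can survive over many steps.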
![Page 71: Applied Machine Learning Lecture 12: Time series and ...richajo/dit866/lectures/l12/l12.pdf · Applied Machine Learning Lecture 12: Time series and sequential models Selpi (selpi@chalmers.se)](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0dc9b97e708231d43c1684/html5/thumbnails/71.jpg)
LSTMs in Keras
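from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import Embedding
from keras.layers import LSTM
# max_features (vocabulary size) and the x/y arrays are assumed to be defined elsewhere
model = Sequential()
# map each token id to a 256-dimensional vector
model.add(Embedding(max_features, output_dim=256))
# many-to-one: the LSTM returns only its final 128-dimensional state
model.add(LSTM(128))
model.add(Dropout(0.5))
# single sigmoid unit for binary classification
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam')
model.fit(x_train, y_train, batch_size=16, epochs=10)
score = model.evaluate(x_test, y_test, batch_size=16)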
![Page 72: Applied Machine Learning Lecture 12: Time series and ...richajo/dit866/lectures/l12/l12.pdf · Applied Machine Learning Lecture 12: Time series and sequential models Selpi (selpi@chalmers.se)](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0dc9b97e708231d43c1684/html5/thumbnails/72.jpg)
stacked LSTMs
![Page 73: Applied Machine Learning Lecture 12: Time series and ...richajo/dit866/lectures/l12/l12.pdf · Applied Machine Learning Lecture 12: Time series and sequential models Selpi (selpi@chalmers.se)](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0dc9b97e708231d43c1684/html5/thumbnails/73.jpg)
stacked LSTMs in Keras
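from keras.models import Sequential
from keras.layers import LSTM, Dense
# timesteps and data_dim (sequence length and features per step) are assumed to be defined elsewhere
model = Sequential()
# returns a sequence of vectors of dimension 32
model.add(LSTM(32, return_sequences=True,
               input_shape=(timesteps, data_dim)))
# returns a sequence of vectors of dimension 32
model.add(LSTM(32, return_sequences=True))
# returns a single vector of dimension 32
model.add(LSTM(32))
model.add(Dense(10, activation='softmax'))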
![Page 74: Applied Machine Learning Lecture 12: Time series and ...richajo/dit866/lectures/l12/l12.pdf · Applied Machine Learning Lecture 12: Time series and sequential models Selpi (selpi@chalmers.se)](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0dc9b97e708231d43c1684/html5/thumbnails/74.jpg)
sequence tagging based on LSTMs
[source]
![Page 75: Applied Machine Learning Lecture 12: Time series and ...richajo/dit866/lectures/l12/l12.pdf · Applied Machine Learning Lecture 12: Time series and sequential models Selpi (selpi@chalmers.se)](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0dc9b97e708231d43c1684/html5/thumbnails/75.jpg)
example: NER using LSTMs
![Page 76: Applied Machine Learning Lecture 12: Time series and ...richajo/dit866/lectures/l12/l12.pdf · Applied Machine Learning Lecture 12: Time series and sequential models Selpi (selpi@chalmers.se)](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0dc9b97e708231d43c1684/html5/thumbnails/76.jpg)
recent developments in machine translation
[image by Chris Manning]
![Page 77: Applied Machine Learning Lecture 12: Time series and ...richajo/dit866/lectures/l12/l12.pdf · Applied Machine Learning Lecture 12: Time series and sequential models Selpi (selpi@chalmers.se)](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0dc9b97e708231d43c1684/html5/thumbnails/77.jpg)
sequence-to-sequence learning for translation
Sutskever et al. (2014) Sequence to Sequence Learning with Neural Networks.
![Page 78: Applied Machine Learning Lecture 12: Time series and ...richajo/dit866/lectures/l12/l12.pdf · Applied Machine Learning Lecture 12: Time series and sequential models Selpi (selpi@chalmers.se)](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0dc9b97e708231d43c1684/html5/thumbnails/78.jpg)
training data for machine translation systems
![Page 79: Applied Machine Learning Lecture 12: Time series and ...richajo/dit866/lectures/l12/l12.pdf · Applied Machine Learning Lecture 12: Time series and sequential models Selpi (selpi@chalmers.se)](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0dc9b97e708231d43c1684/html5/thumbnails/79.jpg)
a final twist on RNNs: neural attention
![Page 80: Applied Machine Learning Lecture 12: Time series and ...richajo/dit866/lectures/l12/l12.pdf · Applied Machine Learning Lecture 12: Time series and sequential models Selpi (selpi@chalmers.se)](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0dc9b97e708231d43c1684/html5/thumbnails/80.jpg)
translation with attention
[source]
![Page 81: Applied Machine Learning Lecture 12: Time series and ...richajo/dit866/lectures/l12/l12.pdf · Applied Machine Learning Lecture 12: Time series and sequential models Selpi (selpi@chalmers.se)](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0dc9b97e708231d43c1684/html5/thumbnails/81.jpg)
example: visualizing attention
[source]
![Page 82: Applied Machine Learning Lecture 12: Time series and ...richajo/dit866/lectures/l12/l12.pdf · Applied Machine Learning Lecture 12: Time series and sequential models Selpi (selpi@chalmers.se)](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0dc9b97e708231d43c1684/html5/thumbnails/82.jpg)
recent developments (2018)
I We can use a pre-trained CNN for our own image recognition task to get better performance than when building our own model using only our own training data
I can something similar be done for text?
I the ELMo model (by Allen AI) is a pre-trained LSTM model that can be "plugged" into other applications
  I https://allennlp.org/elmo
I the BERT model (by Google) uses a new architecture called the transformer
  I https://github.com/google-research/bert
I Many other pre-trained models derived from BERT
I XLNet
![Page 84: Applied Machine Learning Lecture 12: Time series and ...richajo/dit866/lectures/l12/l12.pdf · Applied Machine Learning Lecture 12: Time series and sequential models Selpi (selpi@chalmers.se)](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0dc9b97e708231d43c1684/html5/thumbnails/84.jpg)
Next lectures
I 6 March: Guest lectures from two companies giving industrial aspects of machine learning
  I Solita
  I Ericsson
I 10 March: Semi-supervised learning, co-training/multi-view learning, self-learning, recommender systems, and more on LSTM