Chapter 9: Machine Learning in Time Series
Time series processing
• Given: time-dependent observables $x_t,\ t = 0, 1, \ldots$
• Scalar: univariate; vector: multivariate
• Typical tasks:
- Forecasting
- Noise modeling
- Pattern recognition
- Modeling
- Filtering
- Source separation
• Time series (minutes to days) vs. signals (milliseconds to seconds)
Examples
• Standard & Poor's — preprocessed to returns: $r_t = x_t - x_{t-1}$
• Sunspots — preprocessed (de-seasoned): $s_t = x_t - x_{t-11}$
Autoregressive models
• Forecasting: making use of past information to predict (estimate) the future
• AR: past information = past observations $X_{t,p} = (x_{t-1}, x_{t-2}, \ldots, x_{t-p})$

  $x_t = F(x_{t-1}, x_{t-2}, \ldots, x_{t-p}) + \epsilon_t$

  with expected value $\hat{x}_t$ and noise ("random shock") $\epsilon_t$
• Best forecast: the expected value
Linear AR models
• Most common case:

  $x_t = \sum_{i=1}^{p} a_i x_{t-i} + \epsilon_t$

• Simplest form: random walk

  $x_t = x_{t-1} + \epsilon_t; \quad \epsilon_t \sim N(0, 1)$

• Nontrivial forecast impossible
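A minimal simulation (assuming NumPy; the seed and series length are arbitrary) illustrating the random-walk case: the naive forecast $\hat{x}_t = x_{t-1}$ leaves exactly the i.i.d. shock as its error, so no nontrivial forecast can improve on it.

```python
import numpy as np

# Random walk x_t = x_{t-1} + eps_t with standard normal shocks.
rng = np.random.default_rng(0)
eps = rng.standard_normal(1000)
x = np.cumsum(eps)

# The naive forecast x_hat_t = x_{t-1} leaves exactly the shock as error.
naive_error = x[1:] - x[:-1]
assert np.allclose(naive_error, eps[1:])
```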
MLP as NAR
• A neural network can approximate a nonlinear AR model
• Inputs formed by a "time window" (or "time delay") over past observations
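A sketch of the time-window construction that feeds such a model (assuming NumPy; the AR(2) coefficients 0.6 and −0.2 are arbitrary test values). A plain least-squares fit stands in here for the network; an MLP would replace the linear map with a nonlinear F of the same windowed inputs.

```python
import numpy as np

def time_windows(x, p):
    """Rows are time windows (x_{t-1}, ..., x_{t-p}); targets are x_t."""
    X = np.column_stack([x[p - i - 1 : len(x) - i - 1] for i in range(p)])
    y = x[p:]
    return X, y

# Generate an AR(2) series and recover its coefficients from the windows.
rng = np.random.default_rng(1)
x = np.zeros(5000)
for t in range(2, 5000):
    x[t] = 0.6 * x[t - 1] - 0.2 * x[t - 2] + rng.standard_normal()

X, y = time_windows(x, p=2)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # coef ≈ [0.6, -0.2]
```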
Complex noise models
• Assumption: noise follows an arbitrary distribution $\epsilon_t \sim D$
• Parameters are time-dependent (dependent on the past): $g(X_{t,p})$
• Likelihood:

  $L = \prod_{i=1}^{N} d\big(x_{t(i)};\, g(X_{t(i),p})\big)$

  where $d$ is the probability density function of $D$
Heteroskedastic time series
• Assumption: noise is Gaussian with time-dependent variance
• Likelihood:

  $L = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\,\sigma^2(X_{t(i),p})}} \exp\!\left(-\frac{\big(x_{t(i)} - \mu(X_{t(i),p})\big)^2}{2\,\sigma^2(X_{t(i),p})}\right)$

• ARCH model:

  $\sigma_t^2 = \sum_{i=1}^{p} a_i r_{t-i}^2$

• MLP is a nonlinear ARCH (when applied to returns/residuals):

  $r_t = F(r_{t-1}, r_{t-2}, \ldots, r_{t-p}), \qquad \sigma_t^2 = F'(r_{t-1}^2, r_{t-2}^2, \ldots, r_{t-p}^2)$
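A minimal sketch of the ARCH variance recursion (assuming NumPy). A constant term `a0` is added so the variance is defined and positive before enough past returns exist; the example returns and coefficients are arbitrary illustrative values.

```python
import numpy as np

def arch_variance(r, a0, a):
    """sigma_t^2 = a0 + sum_{i=1}^p a_i * r_{t-i}^2 for t >= p (ARCH(p))."""
    p = len(a)
    sigma2 = np.full(len(r), a0, dtype=float)
    for t in range(p, len(r)):
        sigma2[t] = a0 + sum(a[i] * r[t - 1 - i] ** 2 for i in range(p))
    return sigma2

# ARCH(1): large returns raise the next step's conditional variance.
sigma2_example = arch_variance(np.array([0.0, 2.0, -1.0, 0.5]), a0=0.1, a=[0.4])
# → [0.1, 0.1, 1.7, 0.5]
```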
Non-Gaussian noise
• Other parametric pdfs (e.g. t-distribution)
• Mixture of Gaussians (mixture density network, Bishop 1994):

  $d\big(x_t \mid X_{t,p}\big) = \sum_{i=1}^{k} \pi_i(X_{t,p})\, \frac{1}{\sqrt{2\pi\,\sigma_i^2(X_{t,p})}} \exp\!\left(-\frac{\big(x_t - \mu_i(X_{t,p})\big)^2}{2\,\sigma_i^2(X_{t,p})}\right)$

• Network with 3k outputs (or separate networks) for the mixing coefficients, means, and variances
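A sketch of evaluating such a mixture density (standard library only). The two equal-weight, unit-variance components at ±1 are arbitrary illustrative values; in a mixture density network all 3k parameters would be network outputs conditioned on the past window $X_{t,p}$.

```python
import math

def mixture_pdf(x, weights, means, sigmas):
    """Density of a k-component Gaussian mixture (3k parameters in total)."""
    return sum(
        w / (math.sqrt(2 * math.pi) * s) * math.exp(-((x - m) ** 2) / (2 * s * s))
        for w, m, s in zip(weights, means, sigmas)
    )

# At the midpoint of two symmetric components, both contribute equally.
p_mid = mixture_pdf(0.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```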
Identifiability problem
• Mixture models (like neural networks) are not identifiable (parameters cannot be interpreted)
• No distinction between model and noise, e.g. sunspot data
• Models have to be treated with care
Recurrent Perceptrons
• Recurrent connection = feedback loop
• From hidden layer ("Elman") or output layer ("Jordan")
• Learning: "backpropagation through time"
• Architecture: input layer plus a state (context) layer holding a copy of the previous activations
Recurrent networks: Moving Average
• Second model class: Moving Average models
• Past information: random shocks

  $x_t = \sum_{i=0}^{q} b_i \epsilon_{t-i}; \qquad \epsilon_t = x_t - \hat{x}_t$

• Recurrent (Jordan) network: nonlinear MA
• However, convergence not guaranteed
State space models
• Observables depend on a (hidden) time-variant state:

  $s_t = \mathbf{A} s_{t-1} + \mathbf{B} \epsilon_t; \qquad x_t = \mathbf{C} s_t + \eta_t$

• Strong relationship to recurrent (Elman) networks
• Nonlinear version only with additional hidden layers
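A sketch of simulating a linear state-space model (assuming NumPy). The matrices A, B, C and the noise scales below are arbitrary illustrative choices, not values from the slides; A is chosen stable so the series stays bounded.

```python
import numpy as np

# Hidden state: s_t = A s_{t-1} + B eps_t; observation: x_t = C s_t + eta_t.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])   # stable: eigenvalues 0.9 and 0.8
B = np.eye(2)
C = np.array([[1.0, 0.0]])   # observe a noisy copy of the first state only

rng = np.random.default_rng(3)
s = np.zeros(2)
xs = []
for _ in range(200):
    s = A @ s + B @ rng.standard_normal(2)           # hidden state update
    xs.append((C @ s)[0] + 0.1 * rng.standard_normal())  # noisy observation
```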
Practical considerations
• Stationarity is an important issue
• Preprocessing (trends, seasonalities)
• N-fold cross-validation done time-wise (each validation set must come after its training set)
• Mean and standard deviation across folds used for model selection
• Data split in temporal order: train, validation, test
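A minimal sketch of such time-wise (rolling-origin) splits in plain Python; the fold count and validation size below are arbitrary illustrative parameters.

```python
def timewise_folds(n, n_folds, val_size):
    """Rolling-origin splits: each fold's validation block lies strictly
    after its training block, so no future data leaks into training."""
    folds = []
    for k in range(n_folds):
        train_end = n - (n_folds - k) * val_size
        folds.append((range(0, train_end), range(train_end, train_end + val_size)))
    return folds

for train, val in timewise_folds(n=100, n_folds=3, val_size=10):
    assert max(train) < min(val)  # validation always lies after training
```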
Unfolding recurrent networks
• Events long in the past can influence the present
https://adventuresinmachinelearning.com/recurrent-neural-networks-lstm-tutorial-tensorflow/
Vanishing (or exploding) gradients
• Gradient quickly goes to 0 (or infinity)
• Long-term dependencies cannot be learned
https://adventuresinmachinelearning.com/recurrent-neural-networks-lstm-tutorial-tensorflow/
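A numeric sketch of the effect for a scalar recurrent unit $h_t = \tanh(w\,h_{t-1})$ (standard library only; the weight, initial state, and horizon are arbitrary): backpropagation multiplies one Jacobian factor $w\,(1 - h_t^2)$ per step, and for $|w| < 1$ the product shrinks geometrically.

```python
import math

def gradient_magnitude(w, T, h0=0.5):
    """|d h_T / d h_0| for h_t = tanh(w * h_{t-1}), unrolled over T steps."""
    h, grad = h0, 1.0
    for _ in range(T):
        h = math.tanh(w * h)
        grad *= w * (1.0 - h * h)  # chain-rule factor for this time step
    return abs(grad)

vanished = gradient_magnitude(w=0.5, T=50)  # far below machine noise
```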
Long short-term memory (LSTM)
• No vanishing gradient if the forget gate f ≫ 0
https://adventuresinmachinelearning.com/recurrent-neural-networks-lstm-tutorial-tensorflow/
Medical example
• Sleep staging
Iber et al. 2007 (AASM scoring manual)
Stephansen et al., Nature Communications, 2018
Medical example 2
• Predicting mortality in ICU
Xia et al., Comp Math Meth Med, 2018
Text/Speech processing
• Text/language is sequential in nature
• Long-term dependencies:
- The girl, who played in the team last week, took her sister to school and congratulated her
- The intervening relative clause can be "forgotten" while the dependency on "girl" is kept
• Medical applications:
- Text mining in abstracts
- Physician reports
Symbolic time series
• Examples:
- DNA
- Text
- Quantised time series (e.g. "up" and "down")
• Observations are symbols from an alphabet: $x_t = s_i$
• Past information: the past p symbols determine a probability distribution

  $p(x_t \mid x_{t-1}, x_{t-2}, \ldots, x_{t-p})$

• Markov chains
• Problem: long substrings are rare
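A sketch of estimating such a Markov chain from a quantised symbol sequence (standard library only; the toy "up"/"down" string is an arbitrary example). With p = 1 this reduces to an ordinary first-order transition matrix; the rarity of long contexts for larger p is exactly the problem noted above.

```python
from collections import Counter

def markov_probs(seq, p=1):
    """Estimate p(x_t | previous p symbols) by counting (context, symbol) pairs."""
    ctx_counts, pair_counts = Counter(), Counter()
    for t in range(p, len(seq)):
        ctx = seq[t - p:t]
        ctx_counts[ctx] += 1
        pair_counts[(ctx, seq[t])] += 1
    return {key: cnt / ctx_counts[key[0]] for key, cnt in pair_counts.items()}

# Quantised "up"/"down" toy sequence.
probs = markov_probs("uudduudd", p=1)
```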
Fractal prediction machines
• Similar subsequences are mapped to points close in space
• Clustering = extraction of a stochastic automaton
• Variable-length Markov model
Relationship to recurrent network
• Network of 2nd order
Distinguishing coding/noncoding DNA
Tino P., Dorffner G., Machine Learning 2001
- DNA: sequence with alphabet size 4
Summary
• Neural networks are powerful semi-parametric models for nonlinear dependencies
• Can be considered nonlinear extensions of classical time series and signal processing techniques
• Applying semi-parametric models to noise modeling adds another interesting facet
• Models must be treated with care; much data is necessary
• Recurrent networks capture past information through feedback loops
• Latest development (deep learning): LSTM