Chapter 9: Machine Learning in Time Series
Time series processing
• Given: time-dependent observables $x_t,\ t = 0, 1, \ldots$
• Scalar: univariate; vector: multivariate
• Typical tasks:
- Forecasting
- Noise modeling
- Pattern recognition
- Modeling
- Filtering
- Source separation
• Time series (minutes to days) vs. signals (milliseconds to seconds)
Examples
• Standard & Poor's index, preprocessed to returns: $r_t = x_t - x_{t-1}$
• Sunspots, preprocessed (de-seasoned): $s_t = x_t - x_{t-11}$
Autoregressive models
• Forecasting: making use of past information to predict (estimate) the future
• AR: Past information = past observations
$$x_t = F(x_{t-1}, x_{t-2}, \ldots, x_{t-p}) + \epsilon_t = \hat{x}_t + \epsilon_t$$
where $X_{t,p} = (x_{t-1}, \ldots, x_{t-p})$ are the past observations, $\hat{x}_t$ is the expected value, and $\epsilon_t$ is the noise ("random shock")
• Best forecast: the expected value $\hat{x}_t$
Linear AR models
• Most common case: $x_t = \sum_{i=1}^{p} a_i\, x_{t-i} + \epsilon_t$ (simulated in the sketch below)
• Simplest form: random walk, $x_t = x_{t-1} + \epsilon_t,\quad \epsilon_t \sim N(0,1)$
• Nontrivial forecast impossible for the random walk: the best forecast is just the last observation
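As a minimal illustration (plain NumPy; the AR(2) coefficients are arbitrary, chosen only to keep the process stable), simulating a linear AR process and forming its one-step forecast:

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.array([0.6, 0.3])            # arbitrary stable AR(2) coefficients
x = np.zeros(200)
for t in range(2, 200):
    # x_t = a_1 x_{t-1} + a_2 x_{t-2} + eps_t
    x[t] = a @ x[t - 2:t][::-1] + rng.standard_normal()

x_hat = a @ x[-2:][::-1]            # best forecast: the expected value
```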
MLP as NAR
• A neural network can approximate a nonlinear AR model (see the sketch below)
• "Time window" or "time delay" approach: the past p observations form the input vector
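A minimal sketch of the time-window idea, assuming scikit-learn is available (the window length, network size, and toy series are our arbitrary choices):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def make_windows(x, p):
    """Stack sliding windows of the past p values; target is the next value."""
    X = np.array([x[i:i + p] for i in range(len(x) - p)])
    return X, x[p:]

rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(1, 500):             # toy nonlinear AR(1) series
    x[t] = 0.8 * np.sin(2.0 * x[t - 1]) + 0.1 * rng.standard_normal()

p = 5                               # time-window length (model order)
X, y = make_windows(x, p)
nar = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000).fit(X, y)
x_hat = nar.predict(x[-p:][None, :])   # one-step-ahead forecast
```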
Complex noise models
• Assumption: arbitrary noise distribution, $\epsilon_t \sim D$
• Parameters are time dependent (dependent on the past): $\theta_t = g(X_{t,p})$
• Likelihood: $L = \prod_{i=1}^{N} d\big(x_i;\, g(X_{i,p})\big)$, where $d$ is the probability density function of $D$
Heteroskedastic time series
• Assumption: noise is Gaussian with time-dependent variance; likelihood:
$$L = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\,\sigma^2(X_{i,p})}}\, \exp\!\left(-\frac{\big(x_i - \mu(X_{i,p})\big)^2}{2\,\sigma^2(X_{i,p})}\right)$$
• ARCH model: $\sigma_t^2 = \sum_{i=1}^{p} a_i\, r_{t-i}^2$
• MLP is a nonlinear ARCH (when applied to returns/residuals): $\sigma_t^2 = F\big(r_{t-1}^2, \ldots, r_{t-p}^2\big) = F'\big(r_{t-1}, \ldots, r_{t-p}\big)$ (see the sketch below)
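A NumPy sketch of the ARCH(p) variance recursion and the corresponding Gaussian negative log-likelihood for zero-mean returns (the coefficients and initialisation are our arbitrary choices):

```python
import numpy as np

def arch_variance(r, a):
    """sigma_t^2 = sum_{i=1..p} a_i * r_{t-i}^2, initialised with the
    sample variance for the first p steps."""
    p = len(a)
    sigma2 = np.full(len(r), np.var(r))
    for t in range(p, len(r)):
        sigma2[t] = a @ (r[t - p:t][::-1] ** 2)
    return sigma2

def gaussian_nll(r, sigma2):
    """Negative log of the product likelihood above, for mu = 0."""
    return 0.5 * np.sum(np.log(2 * np.pi * sigma2) + r ** 2 / sigma2)

rng = np.random.default_rng(0)
r = rng.standard_normal(500)                    # toy returns
nll = gaussian_nll(r, arch_variance(r, np.array([0.5, 0.3])))
```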
Non-Gaussian noise
• Other parametric pdfs (e.g. t-distribution)
• Mixture of Gaussians (Mixture density network, Bishop 1994)
• Network with 3k outputs: k means, k variances, k mixing coefficients (or separate networks)
$$d\big(x \mid X_{t,p}\big) = \sum_{i=1}^{k} \alpha_i(X_{t,p})\, \frac{1}{\sqrt{2\pi\,\sigma_i^2(X_{t,p})}}\, \exp\!\left(-\frac{\big(x - \mu_i(X_{t,p})\big)^2}{2\,\sigma_i^2(X_{t,p})}\right)$$
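For illustration, evaluating this mixture density for one input window, given the 3k network outputs (a plain NumPy sketch; the variable names are ours):

```python
import numpy as np

def mixture_pdf(x, alpha, mu, sigma2):
    """Mixture-of-Gaussians density for a scalar x, given the k mixing
    coefficients, means, and variances the network produces for one
    input window X_{t,p}. alpha must be non-negative and sum to 1."""
    comp = np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return float(np.sum(alpha * comp))

density = mixture_pdf(0.3, alpha=np.array([0.7, 0.3]),
                      mu=np.array([0.0, 1.0]), sigma2=np.array([0.5, 0.1]))
```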
Identifiability problem
• Mixture models (like neural networks) are not identifiable (parameters cannot be interpreted)
• No distinction between model and noise, e.g. sunspot data
• Models have to be treated with care
Recurrent Perceptrons
• Recurrent connection = feedback loop
• From hidden layer („Elman“) or output layer („Jordan“)
• Learning: "backpropagation through time"
[Diagram: the input layer is extended by a state/context layer, which holds a copy of the previous hidden (Elman) or output (Jordan) activations]
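A minimal forward-pass sketch of an Elman-style recurrent step in NumPy (training by backpropagation through time is omitted; the names and sizes are ours):

```python
import numpy as np

def elman_step(x_t, h_prev, W_in, W_rec, W_out):
    """One step of an Elman network: the context layer is a copy of the
    previous hidden state, fed back together with the new input."""
    h_t = np.tanh(W_in @ x_t + W_rec @ h_prev)   # new hidden state
    y_t = W_out @ h_t                            # output
    return y_t, h_t

rng = np.random.default_rng(0)
W_in = rng.normal(size=(4, 1))
W_rec = rng.normal(size=(4, 4))
W_out = rng.normal(size=(1, 4))
h = np.zeros(4)
for x_t in [0.1, 0.5, -0.2]:                     # run over a short sequence
    y, h = elman_step(np.array([x_t]), h, W_in, W_rec, W_out)
```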
Recurrent networks: Moving Average
• Second model class: Moving Average models
• Past information: random shocks, $x_t = \sum_{i=0}^{q} b_i\, \epsilon_{t-i}$ (simulated in the sketch below)
• Recurrent (Jordan) network: nonlinear MA, feeding back the estimated shock $\epsilon_t = x_t - \hat{x}_t$
• However, convergence is not guaranteed
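A minimal NumPy sketch simulating a linear MA(q) process (q = 2; the coefficients are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
b = np.array([1.0, 0.5, 0.25])      # MA(2) coefficients b_0, b_1, b_2
eps = rng.standard_normal(200)      # the random shocks
# x_t = b_0 eps_t + b_1 eps_{t-1} + b_2 eps_{t-2}
x = np.array([b @ eps[t - 2:t + 1][::-1] for t in range(2, len(eps))])
```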
State space models
• Observables depend on (hidden) time-variant state
• Strong relationship to recurrent (Elman) networks
• Nonlinear version only with additional hidden layers
$$s_t = A\, s_{t-1} + B\, \epsilon_t, \qquad x_t = C\, s_t + \eta_t$$
with hidden state $s_t$, observable $x_t$, and noise terms $\epsilon_t$, $\eta_t$ (a simulation is sketched below)
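A NumPy sketch of the linear state-space recursion (the matrices and noise scales are our arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])          # state transition
B = 0.1 * np.eye(2)                 # state-noise loading
C = np.array([[1.0, 0.5]])          # observation matrix

s, xs = np.zeros(2), []
for t in range(100):
    s = A @ s + B @ rng.standard_normal(2)              # hidden state s_t
    xs.append((C @ s)[0] + 0.1 * rng.standard_normal()) # observable x_t
```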
Practical considerations
• Stationarity is an important issue
• Preprocessing (trends, seasonalities)
• N-fold cross-validation done time-wise (each validation set must lie after its training set); see the sketch below
• Mean and standard deviation of validation performance across folds guide model selection
[Diagram: time-ordered split into train | validation | test]
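One way to implement this, assuming scikit-learn is available, is TimeSeriesSplit, which guarantees every validation block lies after its training block (the model-fitting step is left as a placeholder):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(100).reshape(-1, 1)           # time-ordered toy features
scores = []
for train_idx, val_idx in TimeSeriesSplit(n_splits=4).split(X):
    assert train_idx.max() < val_idx.min()  # validation strictly after training
    scores.append(0.0)                      # placeholder: fit and score a model
print(np.mean(scores), np.std(scores))      # mean/std across folds for model selection
```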
Unfolding recurrent networks
• An event long in the past can influence the present
https://adventuresinmachinelearning.com/recurrent-neural-networks-lstm-tutorial-tensorflow/
Vanishing (or exploding) gradients
• Gradient quickly goes to 0 (or infinity)
• Long-term dependencies cannot be learned
https://adventuresinmachinelearning.com/recurrent-neural-networks-lstm-tutorial-tensorflow/
Long short-term memory (LSTM)
17
No vanishing gradient if the forget gate $f \gg 0$ (see the sketch below)
https://adventuresinmachinelearning.com/recurrent-neural-networks-lstm-tutorial-tensorflow/
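A toy sketch (our notation) of the LSTM cell-state update, which is why the gradient survives:

```python
def lstm_cell_state(c_prev, f, i, g):
    """c_t = f * c_{t-1} + i * g.  Since dc_t/dc_{t-1} = f, a gradient
    flowing back T steps scales as the product of the forget gates:
    for f near 1 it neither vanishes nor explodes."""
    return f * c_prev + i * g

c = 0.0
for _ in range(100):                 # the "constant error carousel"
    c = lstm_cell_state(c, f=0.99, i=0.1, g=0.5)
```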
Medical example
• Sleep staging
Iber et al. 2007 (AASM scoring manual)
Stephansen et al., Nature Communications, 2018
Medical example 2
• Predicting mortality in ICU
Xia et al., Comp Math Meth Med, 2018
Text/Speech processing
• Text/language is sequential in nature
• Long-term dependencies:
– "The girl, who played in the team last week, took her sister to school and congratulated her" (resolving the final "her" means not "forgetting" the girl across the intervening clause)
• Medical applications:
– Text mining in abstracts
– Physician reports
Symbolic time series
• Examples:
– DNA
– Text
– Quantised time series (e.g. "up" and "down")
• Symbols from an alphabet: $x_t = s_i$, $s_i \in S$ (the alphabet)
• Past information: the past p symbols determine a probability distribution $p(x_t \mid x_{t-1}, x_{t-2}, \ldots, x_{t-p})$
• Markov chains (a counting sketch follows below)
• Problem: long substrings are rare
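A counting sketch of a p-th order Markov chain over symbols (plain Python; the function name and toy sequence are ours). Note how quickly contexts become rare as p grows:

```python
from collections import Counter, defaultdict

def markov_probs(seq, p):
    """Estimate p-th order transition probabilities by counting
    (length-p context, next symbol) pairs."""
    counts = defaultdict(Counter)
    for t in range(p, len(seq)):
        counts[tuple(seq[t - p:t])][seq[t]] += 1
    return {ctx: {s: n / sum(c.values()) for s, n in c.items()}
            for ctx, c in counts.items()}

probs = markov_probs("ududduuduudd", p=2)   # quantised "up"/"down" series
```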
Fractal prediction machines
• Similar subsequences are mapped to points close in space
• Clustering = extraction of a stochastic automaton
• Variable-length Markov model
Relationship to recurrent network
• Network of 2nd order
Distinguishing coding/noncoding DNA
Tino P., Dorffner G., Machine Learning 2001
- DNA: sequence with alphabet size 4
Summary
• Neural networks are powerful semi-parametric models for nonlinear dependencies
• They can be considered nonlinear extensions of classical time series and signal processing techniques
• Applying semi-parametric models to noise modeling adds another interesting facet
• Models must be treated with care; much data is necessary
• Recurrent networks provide nonlinear counterparts of MA and state-space models
• Latest development (deep learning): LSTM