Chapter 7-1: Sequential Data - Max Planck...
IRDM '15/16
Jilles Vreeken
24 Nov 2015
Revision 1, November 26th: definition of smoothing clarified
IRDM Chapter 7, overview

Time Series
1. Basic Ideas
2. Prediction
3. Motif Discovery

Discrete Sequences
4. Basic Ideas
5. Pattern Discovery
6. Hidden Markov Models

You'll find this covered in Aggarwal Ch. 3.4, 14, 15
Chapter 7.1: Basic Ideas
Aggarwal Ch. 14.1-14.2
Temperature Data

Temp (°C): 28.2, 25.4, 30.5, 15.7, 33.4, 29.4, 28.6, 16.1, 28.5, 27.9, 15.5, 31.4
Temperature Data

Time     Temp (°C)
June-15  28.2
June-16  25.4
June-17  30.5
June-18  15.7
June-19  33.4
June-20  29.4
June-22  28.6
June-23  16.1
June-24  28.5
June-25  27.9
June-26  15.5
June-27  31.4

[Figure: Daily Temperature, y-axis 0 to 40 °C]
Temperature Data

Time     Temp (°C)
June-15  28.2
June-16  25.4
June-17  30.5
Sept-18  15.7
June-19  33.4
June-20  29.4
June-22  28.6
Sept-23  16.1
Sept-24  28.5
June-25  27.9
Sept-26  15.5
June-27  31.4

[Figure: Daily Temperature, y-axis 0 to 40 °C]
Applications

Stock Analysis, Weather Forecasting, Health Monitoring, Social Network Analysis
Definition
A time series of length n consists of n tuples (t_1, y_1), (t_2, y_2), ..., (t_n, y_n), where for a tuple (t_i, y_i), t_i is the time stamp and y_i is the data at time t_i, and we have a total order on the time stamps: t_1 < t_2 < ... < t_n.

Length may be either finite or infinite.
Time stamps may be contiguous; in practice integers are easier.
Data: when talking about time series, usually numeric, continuous real-valued; may be univariate (one attribute) or multivariate (multiple attributes).
Probabilistic Model of Time Series
Consider the data y_i at time t_i as a random variable Y_i; the actual data we observe at t_i is a realization of Y_i.

Some probabilistic properties can be stable over time:
e.g. the mean μ_i of Y_i does not change (much);
the covariance between a pair (Y_i, Y_{i+h}) is (almost) the same as for (Y_1, Y_{1+h}), i.e., the autocovariance of Y does not change (much).

A time series is stationary if the process behind it does not change:
μ_t = μ_s = μ for all t, s, and Cov(t, s) = Cov(s - t) = Cov(h), where h = |s - t| is the amount of time by which the signal is shifted.

Stationary time series are easy to model and predict; most real-world time series, however, are anything but stationary.
(Recall: if Y_t has mean μ_t = E[Y_t], then Cov(t, s) = Cov(Y_t, Y_s) = E[Y_t Y_s] - μ_t μ_s.)
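To make the two stability conditions concrete, here is a minimal numpy sketch (the function names and the white-noise test series are my own, not from the slides) that estimates the rolling mean and the lag-h sample autocovariance of a series:

```python
import numpy as np

def rolling_mean(y, w):
    """Mean over a sliding window of length w; roughly constant values hint at a stable mean."""
    return np.convolve(y, np.ones(w) / w, mode="valid")

def autocovariance(y, h):
    """Sample autocovariance at lag h: Cov(h) = E[Y_t * Y_{t+h}] - mu_t * mu_{t+h}."""
    mu = y.mean()
    return y.var() if h == 0 else np.mean((y[:-h] - mu) * (y[h:] - mu))

rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, 400)        # white noise is stationary by construction
print(rolling_mean(y, 50)[:3])       # all close to 0
print(autocovariance(y, 1))          # close to 0: no correlation between neighbours
```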
Stationarity of Time Series

[Figure: Daily Temperature and Monthly Temperature, both on a 0 to 40 °C scale]
Seasonality & trend

[Figure: Monthly Temperature, 2011 to 2013, y-axis 0 to 40 °C]
Formulation

Classically, we assume a time series Y is composed of

y_i = seasonality_i + trend_i + noise_i

where noise_i is stationary. To make Y stationary, we simply have to remove seasonality and trend.
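A small synthetic example of this decomposition (all numbers invented for illustration), building a series from an explicit seasonal part, a linear trend, and stationary noise:

```python
import numpy as np

rng = np.random.default_rng(1)
i = np.arange(120)                               # ten "years" of monthly values
seasonality = 10 * np.sin(2 * np.pi * i / 12)    # periodic component with period d = 12
trend = 0.05 * i                                 # slow linear drift
noise = rng.normal(0.0, 1.0, i.size)             # the stationary remainder
y = seasonality + trend + noise                  # y_i = seasonality_i + trend_i + noise_i
```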
Seasonality

Seasonality is essentially periodicity: seasonality is a periodic function of time with period d,

seasonality_i = seasonality_{i-d}

How to find the seasonality function?
1. by fitting a sine or cosine function
   difficult, as the signal may be only sine-ish
2. by differencing:
   y_i = seasonality_i + trend_i + noise_i
   y_{i-d} = seasonality_{i-d} + trend_{i-d} + noise_{i-d}
   y'_i = y_i - y_{i-d}
Example: Removing Seasonality

[Figure: Monthly Temperature, 2011 to 2013; seasonal differencing y'_i = y_i - y_{i-d} with d = 12 illustrated]
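A sketch of seasonal differencing in code (names are mine; the pure sine below stands in for a real seasonal signal):

```python
import numpy as np

def seasonal_difference(y, d):
    """y'_i = y_i - y_{i-d}: cancels any component with exact period d."""
    return y[d:] - y[:-d]

i = np.arange(36)
season = np.sin(2 * np.pi * i / 12)                       # period d = 12
print(np.allclose(seasonal_difference(season, 12), 0.0))  # True: seasonality removed
```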
[Figure: Monthly Temperature after removing seasonality, y-axis 0 to 40 °C]
This is the time series we obtained by removing seasonality.
Trend

Trend is a polynomial function of time (assumption).

How to find the trend function?
1. by fitting functions
   difficult to do: up to what order, when to stop?
2. by differencing (see the sketch below):
   y'_i = y_i - y_{i-1}
   y''_i = y'_i - y'_{i-1}
   usually differencing twice is enough
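A sketch of differencing for trend removal (np.diff applies y'_i = y_i - y_{i-1}, and n=2 differences twice):

```python
import numpy as np

def difference(y, times=1):
    """Apply y'_i = y_i - y_{i-1} repeatedly; differencing twice removes a quadratic trend."""
    return np.diff(y, n=times)

trend = 0.5 * np.arange(10) ** 2       # a purely quadratic trend
print(difference(trend, times=2))      # constant ones: only a flat series remains
```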
Example: Removing Trend

y'_i = y_i - y_{i-1}

[Figure: Monthly Temperature after removing seasonality and trend, y-axis -5 to 40]
This is the time series we obtained by removing seasonality and trend.
The left-over fluctuations are either noise or non-trivial patterns.
Pre-processing

We can infer missing values by interpolation:

y_j = y_i + (t_j - t_i) / (t_k - t_i) × (y_k - y_i)

where t_i < t_j < t_k.

   Time     Temp (°C)
1  June-19  33.4
2  June-20  29.4
4  June-22  ?
5  June-23  16.1

Temperature on June-22:
y_4 = y_2 + (t_4 - t_2) / (t_5 - t_2) × (y_5 - y_2)
    = 29.4 + (4 - 2) / (5 - 2) × (16.1 - 29.4)
    = 20.5
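The worked example as code (a sketch; np.interp implements exactly this linear rule for interior points):

```python
import numpy as np

t_known = np.array([1.0, 2.0, 5.0])      # time stamps with observations
y_known = np.array([33.4, 29.4, 16.1])   # June-19, June-20, June-23
y4 = np.interp(4.0, t_known, y_known)    # y_4 = 29.4 + (4-2)/(5-2) * (16.1-29.4)
print(round(y4, 1))                      # 20.5
```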
Smoothing

We can remove noise by smoothing.

Standard options include averaging:

y'_i = avg(y_{i-w}, ..., y_i)

where the window length w is a user-specified parameter.

We can give more weight to recent values by exponential smoothing:

y'_i = (1 - α)^i · y'_0 + α · Σ_{j=1}^{i} y_j · (1 - α)^{i-j}

where the user chooses the decay factor α.

(updated on Nov 26th: we now average explicitly over past values)
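Both smoothers as a sketch (function names are mine; the exponential smoother uses the recursive form y'_i = (1 - α)·y'_{i-1} + α·y_i, which unrolls to the closed formula above when y'_0 is the chosen starting value):

```python
import numpy as np

def moving_average(y, w):
    """y'_i = avg(y_{i-w}, ..., y_i): uniform average over the current and w past values."""
    return np.convolve(y, np.ones(w + 1) / (w + 1), mode="valid")

def exponential_smoothing(y, alpha):
    """Recursive exponential smoothing; recent values get weight alpha, older ones decay."""
    out = np.empty(len(y))
    out[0] = y[0]                        # y'_0: initialised with the first observation
    for i in range(1, len(y)):
        out[i] = (1 - alpha) * out[i - 1] + alpha * y[i]
    return out

y = np.array([28.2, 25.4, 30.5, 15.7, 33.4, 29.4])
print(moving_average(y, 2).round(1))          # window of length 3
print(exponential_smoothing(y, 0.5).round(1))
```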
Chapter 7.2: Forecasting
Aggarwal Ch. 14.3
Principle of Forecasting
If we wish to make predictions, then clearly we must assume that something is stable over time.
Autoregressive (AR) model

Future values depend on past values + random noise.
Assumption: the time series depends on autocorrelation.

Which past values? The w immediately previous values.
What relation between past and future? A linear combination.
What kind of noise? Gaussian.
AR, formally

A future value is a linear combination of past values + white noise:

y_t = Σ_{i=1}^{w} a_i · y_{t-i} + c + ε_t

where ε_t ~ N(0, σ²). The sum is the linear combination of past values; c + ε_t is noise with a shifted mean.
Least-squares regression

ε_t = y_t - (a_1 · y_{t-1} + a_2 · y_{t-2} + ... + a_w · y_{t-w} + c)

Here y_t is the actual value and the bracketed term is the predicted value: the prediction error is simply the Gaussian noise in the AR model, and the smaller we can get this value, the better!

Given data D of n training instances, we want to find a_1, ..., a_w and c that minimise the mean squared error

(1 / (n - w)) · Σ_{t=w+1}^{n} ε_t²
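A least-squares AR(w) fit in numpy (a sketch under my own names: ordinary least squares on a design matrix of lagged values). The demo recovers the coefficient of a simulated AR(1) process:

```python
import numpy as np

def fit_ar(y, w):
    """Find a_1, ..., a_w and c minimising the mean squared prediction error."""
    n = len(y)
    # row for time t holds (y_{t-1}, ..., y_{t-w}, 1), for t = w, ..., n-1
    X = np.column_stack([y[w - i: n - i] for i in range(1, w + 1)] + [np.ones(n - w)])
    coef, *_ = np.linalg.lstsq(X, y[w:], rcond=None)
    return coef[:-1], coef[-1]                     # (a_1, ..., a_w) and c

rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(1, 500):                            # simulate y_t = 0.8 y_{t-1} + noise
    y[t] = 0.8 * y[t - 1] + rng.normal()
a, c = fit_ar(y, w=1)
print(a.round(2), round(c, 2))                     # roughly [0.8] and 0.0
```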
Solving AR

Find a_1, ..., a_w and c that minimize (1 / (n - w)) · Σ_{t=w+1}^{n} ε_t².

There are different solving strategies available:
ordinary least squares assumes y_t and ε_t are uncorrelated;
generalized least squares assumes correlation exists but is known;
iteratively reweighted least squares assumes correlation is unknown.

Many standard tools are available to do AR:
MATLAB: the ar function; R: the arima function.
Example: AR

Monthly temperature measured above the ground in a province of Vietnam from 1971 to 2001.

[Figure: Original Data, Jan-71 to Oct-01, y-axis -3 to 4]
[Figure: three panels, Original Data (Jan-71 to Oct-01), Season Removed (Jan-72 to Apr-01), and Differencing 1 (Feb-72 to May-01), each paired with a plot of Mean Squared Error vs. w for w = 1, ..., 20]

These plots show how the MSE behaves with respect to w, i.e., they help to choose w.
Moving Average (MA) model

Future values depend on a deterministic factor + noise.
Assumption: the time series depends on a history of shocks.

What deterministic factor? The mean of the time series.
Noise over what past values? The current value and the q immediately previous values.
What kind of noise? Gaussian.
MA, formally

The MA(q) model is defined as

y_t = μ + ε_t + Σ_{i=1}^{q} b_i · ε_{t-i}

where ε_i ~ N(0, σ_ε²): μ is the mean, ε_t the current noise, and the sum ranges over past noise.

Recall, for the AR(w) model we had

y_t = c + ε_t + Σ_{i=1}^{w} a_i · y_{t-i}
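The definition is easiest to see by simulating it (a sketch; μ and the b_i below are invented for illustration):

```python
import numpy as np

def simulate_ma(mu, b, n, sigma=1.0, seed=0):
    """Simulate y_t = mu + e_t + sum_{i=1}^{q} b_i * e_{t-i} with Gaussian shocks e."""
    rng = np.random.default_rng(seed)
    q = len(b)
    e = rng.normal(0.0, sigma, n + q)    # the history of shocks, q extra as warm-up
    y = np.empty(n)
    for t in range(n):
        s = t + q                        # position of shock e_t in the padded array
        y[t] = mu + e[s] + sum(b[i - 1] * e[s - i] for i in range(1, q + 1))
    return y

y = simulate_ma(mu=20.0, b=[0.6, 0.3], n=200)   # an MA(2) series around mean 20
```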
Solving MA

Find those b_1, ..., b_q that minimize the error. Unlike for AR, this problem is not linear:
to identify the noise terms, we need to know b_1, ..., b_q;
to identify b_1, ..., b_q, we need to know the noise terms.
Typically we therefore use an iterative non-linear fitting approach instead of linear least squares.
The ARMA model

ARMA combines the AR model with the MA model.

Future values depend on past values + a history of noise: the time series depends on both autocorrelation and a history of shocks.

The ARMA model has two parameters, w and q:
window length w for autocorrelation;
history length q for noise.
What kind of noise? Gaussian.
ARMA, formally

ARMA combines the AR model with the MA model.

Autoregressive model, AR(w):
y_t = c + ε_t + Σ_{i=1}^{w} a_i · y_{t-i}

Moving Average model, MA(q):
y_t = μ + ε_t + Σ_{i=1}^{q} b_i · ε_{t-i}

Autoregressive Moving Average model, ARMA(w, q):
y_t = c + ε_t + Σ_{i=1}^{w} a_i · y_{t-i} + Σ_{i=1}^{q} b_i · ε_{t-i}
Solving ARMA

Find those a_i and b_i and c that minimize the error. We need non-linear least-squares regression; many standard tools do this. MATLAB and R implement ARMA as arma resp. arima. (See the usage sketch below.)

How to set w and q? As small as possible, so that the model still fits the data well. (Aka: good luck.)
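In Python, one option is the statsmodels package (my assumption; it is not mentioned on the slides). Its ARIMA class fits an ARMA(w, q) model when the middle differencing order is set to 0:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = rng.normal(20.0, 2.0, 300)       # placeholder series; substitute your own data

model = ARIMA(y, order=(2, 0, 1))    # ARMA with w = 2 and q = 1
fitted = model.fit()                 # iterative maximum-likelihood fitting
print(fitted.params)                 # estimated constant, a_i, b_i, noise variance
print(fitted.forecast(steps=3))      # one- to three-step-ahead predictions
```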
Chapter 7.3: Motif Discovery
Aggarwal Ch. 14.4, 3.4
Motifs

A motif is a shape that frequently repeats in a time series; a shape can also be called a "pattern".

Many variations of motif discovery exist:
contiguous versus non-contiguous shapes;
low versus high granularities;
a single time series versus databases of time series.
What is a motif?

When does a motif belong to a time series? There are two main methods for deciding.

1. Distance-based support:
a segment S[i, j] of a sequence S is said to support a motif m when the distance d(S[i, j], m) between the segment and the motif is below some threshold ε.

2. Discrete-matching-based support:
first we discretise the time series Y into a discrete sequence S. A motif is now a (frequent) subsequence of S.
Distance-based motifs, formally

A motif, a sequence m = m_1, ..., m_w of real values, is said to approximately match a contiguous subsequence of length w in time series Y if the distance between (m_1, ..., m_w) and (y_i, ..., y_{i+w-1}) is at most ε. Commonly, Euclidean distance or Dynamic Time Warping is used.

The frequency of a motif is its number of occurrences: the number of matches of a motif m = m_1, ..., m_w to the time series y_1, ..., y_n at threshold ε is equal to the number of windows of length w in Y for which the distance is at most ε.
Top-k motifs

Nobody wants all motifs: even a single true occurrence yields lots of ε-similar matches. Instead, we aim for the top-k best motifs.

As with frequent itemset mining, redundancy is an issue: we need to keep the top-k diverse, so the distance between any pair of motifs must be at least 2 · ε.
FINDBESTMOTIF(Y, w, ε)
begin
  for i = 1 to n - w + 1 do begin
    Candidate = (y_i, ..., y_{i+w-1})
    for j = 1 to n - w + 1 do begin
      Comparison = (y_j, ..., y_{j+w-1})
      d = distance(Candidate, Comparison)
      if d < ε and (non-trivial match) then
        increment support count of Candidate
    endfor
    if Candidate has the highest count found so far then
      update BestCandidate
  endfor
  return BestCandidate
end

(trivially expanded to top-k)
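The same algorithm in Python (a sketch: y is a 1-D numpy array, the distance is Euclidean, and "non-trivial match" is read here as "the windows do not overlap", which is one common interpretation):

```python
import numpy as np

def find_best_motif(y, w, eps):
    """Return the length-w window with the most non-trivial eps-matches, and its count."""
    n = len(y)
    best, best_count = None, -1
    for i in range(n - w + 1):
        candidate = y[i: i + w]
        count = 0
        for j in range(n - w + 1):
            if abs(i - j) < w:                    # overlapping window: trivial match
                continue
            if np.linalg.norm(candidate - y[j: j + w]) < eps:
                count += 1
        if count > best_count:
            best, best_count = candidate, count
    return best, best_count
```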
Computational Complexity

Finding the best motif takes O(n²) distance computations.

Practical complexity largely depends on the distance function: Euclidean distance is fast; Dynamic Time Warping is often better, but much slower.

Lower bounds are our friend: if a lower bound on the distance between a motif and a window is greater than ε, the window will never support the motif. Piecewise aggregate approximations (PAA) allow fast computation of lower bounds by considering simplified (compressed) time series.
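A minimal PAA sketch (assuming the series length is divisible by the number of segments): each segment is replaced by its mean, and distances between these short representations, suitably scaled, lower-bound the Euclidean distance on the full series:

```python
import numpy as np

def paa(y, segments):
    """Piecewise aggregate approximation: the mean of each of `segments` equal parts."""
    return y.reshape(segments, -1).mean(axis=1)

y = np.array([1.0, 2.0, 3.0, 4.0, 10.0, 12.0, 11.0, 9.0])
print(paa(y, 2))    # [ 2.5 10.5]: eight points compressed to two
```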
Conclusions

Prediction over time is one of the most important and most used data analysis problems: predictive analytics.

There exist two main types of sequential data, continuous real-valued time series and discrete event sequences; for both, specialised algorithms exist.

In practice, despite its many assumptions, ARMA is powerful and often used in industry: learn how to use it, and learn when to use it.

Patterns in time series are called motifs; by choosing a distance function, they can be mined directly from the time series.
Thank you!