1
Data Mining: Concepts and Techniques
— Chapter 8 —
8.2 Mining time-series data
Jiawei Han and Micheline Kamber
Department of Computer Science
University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj©2010 Jiawei Han and Micheline Kamber. All rights reserved.
April 18, 2023Data Mining: Concepts and
Techniques 2
3
Mining Time-Series Data A time series is a sequence of data points, measured
typically at successive times, spaced at (often uniform) time intervals
Time series analysis: A subfield of statistics, comprises methods that attempt to understand such time series, often either to understand the underlying context of the data points or to make forecasts (or predictions)
Methods for time series analyses Frequency-domain methods: Model-free analyses, well-
suited to exploratory investigations spectral analysis vs. wavelet analysis
Time-domain methods: Auto-correlation and cross-correlation analysis
Motif-based time-series analysis Applications
Financial: stock price, inflation Industry: power consumption
Scientific: experiment results Meteorological: precipitation
4
Mining Time-Series Data
Regression Analysis
Trend Analysis
Similarity Search in Time Series Data
Motif-Based Search and Mining in
Time Series Data
Summary
5
Time-Series Data Analysis: Prediction & Regression Analysis
(Numerical) prediction is similar to classification construct a model use model to predict continuous or ordered value for a given
input Prediction is different from classification
Classification refers to predict categorical class label Prediction models continuous-valued functions
Major method for prediction: regression model the relationship between one or more independent or
predictor variables and a dependent or response variable Regression analysis
Linear and multiple regression Non-linear regression Other regression methods: generalized linear model, Poisson
regression, log-linear models, regression trees
6
What is Regression?
Modeling the relationship between one response variable and one or more predictor variables
Analyzing the confidence of the model E.g, height v.s weight
7
Regression Yields Analytical Model
Discrete data points →Analytical model General relationship Easy calculation Further analysis
Application - Prediction
8
Application - Detrending
Obtain the trend for irregular data series Subtract trend Reveal oscillations
trend
9
Linear Regression - Single Predictor
Model is linear
y = w0 + w1 x
where w0 (y-intercept) and w1
(slope) are regression coefficients
Method of least squares:
y: response variable
x: predictor variable
w1
w0
| |
1| |
2
1
( )( )
1( )
D
i ii
D
ii
x x y y
x xw
xwyw
10
10
Training data is of the form (X1, y1), (X2, y2),…, (X|
D|, y|D|) E.g., for 2-D data or
y = w0 + w1 x1+ w2 x2
Solvable by Extension of least square method
(XTX ) W=Y →W = (XTX ) -1Y Commercial software (SAS, S-Plus)
x1
x2
y
Linear Regression – Multiple Predictor
11
Nonlinear Regression with Linear Method
Polynomial regression model E.g., y = w0 + w1 x + w2 x2 + w3 x3
Let x2 = x2, x3= x3
y = w0 + w1 x + w2 x2 + w3 x3
Log-linear regression model E. g., y = exp(w0 + w1 x + w2 x2 + w3 x3
)
Let y’=log(y) y’= w0 + w1 x + w2 x2 + w3 x3
12
Generalized Linear Regression
Response y Distribution function in the exponential family Variance of y depends on E( y), not a constant
E( y) = g-1( w0 + w1 x + w2 x2 + w3 x3 ) Examples
Logistic regression (binomial regression): probability of some event occurring
Poisson regression: number of customers …
References: Nelder and Wedderburn, 1972; McCullagh and Nelder, 1989
13
Regression Tree (Breiman et al., 1984)
Figure source: http://www.stat.cmu.edu/~cshalizi/350-2006/lecture-10.pdf
Partition the domain space
Leaf: (1) a continuous-valued
prediction; (2) average value
14
Model Tree (Quinlan, 1992)
Leaf – a linear equation More general than regression tree
Figure source: http://datamining.ihe.nl/research/model-trees.htm
15
Regression Trees and Model Trees
Regression tree: proposed in CART system (Breiman et al.
1984) CART: Classification And Regression Trees
Each leaf stores a continuous-valued prediction
It is the average value of the predicted attribute for the training
tuples that reach the leaf
Model tree: proposed by Quinlan (1992) Each leaf holds a regression model—a multivariate linear
equation for the predicted attribute
A more general case than regression tree
Regression and model trees tend to be more accurate than
linear regression when the data cannot be represented well
by a simple linear model
16
Predictive Modeling in Multidimensional Databases
Predictive modeling: Predict data values or construct generalized linear models based on the database data
One can only predict value ranges or category distributions
Method outline Minimal generalization Attribute relevance analysis Generalized linear model construction Prediction
Determine the major factors which influence the prediction Data relevance analysis: uncertainty measurement,
entropy analysis, expert judgment, etc. Multi-level prediction: drill-down and roll-up analysis
17
Predictive modeling: Predict data values or construct generalized linear models based on the database data
One can only predict value ranges or category distributions
Method outline: Minimal generalization Attribute relevance analysis Generalized linear model construction Prediction
Determine the major factors which influence the prediction Data relevance analysis: uncertainty measurement,
entropy analysis, expert judgment, etc. Multi-level prediction: drill-down and roll-up analysis
Predictive Modeling in Multidimensional Databases
18
Prediction: Numerical Data
19
References
Nelder, J.A. and Wedderburn, R.W.M. (1972). Generalized linear models. Journal of the Royal Statistical Society A, 135, 370-384.
C. Chatfield. The Analysis of Time Series: An Introduction, 3rd ed. Chapman & Hall, 1984.
McCullagh, P. and Nelder, J.A. (1989). Generalized linear models, 2nd ed. Chapman and Hall, London.
Breiman L, Friedman JH, Olshen RA, Stone CJ. (1984). Classification and Regression Trees. Chapman &Hall (Wadsworth, Inc.): New York.
Quinlan, J. R. (1992). Learning with continuous classes. In: Adams, , Sterling, (Eds.), Proceedings of artificial intelligence'92, World Scientific, Singapore. pp. 343-348.
Acknowledgment This presentation integrates Xiaopeng Li’s slides in his CS 512
class presentation
20
Mining Time-Series Data
Regression Analysis
Trend Analysis
Similarity Search in Time Series Data
Motif-Based Search and Mining in
Time Series Data
Summary
21
A time series can be illustrated as a time-series graph which describes a point moving with the passage of time
22
Categories of Time-Series Movements
Categories of Time-Series Movements Long-term or trend movements (trend curve): general
direction in which a time series is moving over a long interval of time
Cyclic movements or cycle variations: long term oscillations about a trend line or curve
e.g., business cycles, may or may not be periodic Seasonal movements or seasonal variations
i.e, almost identical patterns that a time series appears to follow during corresponding months of successive years.
Irregular or random movements Time series analysis: decomposition of a time series into these
four basic movements Additive Modal: TS = T + C + S + I Multiplicative Modal: TS = T C S I
23
Estimation of Trend Curve
The freehand method
Fit the curve by looking at the graph
Costly and barely reliable for large-scaled data
mining
The least-square method
Find the curve minimizing the sum of the
squares of the deviation of points on the curve
from the corresponding data points
The moving-average method
24
Moving Average
Moving average of order n
Smoothes the data
Eliminates cyclic, seasonal and irregular
movements
Loses the data at the beginning or end of a series
Sensitive to outliers (can be reduced by weighted
moving average)
25
Trend Discovery in Time-Series (1): Estimation of Seasonal
Variations
Seasonal index Set of numbers showing the relative values of a variable
during the months of the year E.g., if the sales during October, November, and
December are 80%, 120%, and 140% of the average monthly sales for the whole year, respectively, then 80, 120, and 140 are seasonal index numbers for these months
Deseasonalized data Data adjusted for seasonal variations for better trend and
cyclic analysis Divide the original monthly data by the seasonal index
numbers for the corresponding months
April 18, 2023Data Mining: Concepts and
Techniques 26
Seasonal Index
0
20
40
60
80
100
120
140
160
1 2 3 4 5 6 7 8 9 10 11 12
Month
Seasonal Index
Raw data from http://www.bbk.ac.uk/manop/man/docs/QII_2_2003%20Time%20series.pdf
27
Trend Discovery in Time-Series (2)
Estimation of cyclic variations If (approximate) periodicity of cycles occurs,
cyclic index can be constructed in much the same manner as seasonal indexes
Estimation of irregular variations By adjusting the data for trend, seasonal and
cyclic variations With the systematic analysis of the trend, cyclic,
seasonal, and irregular components, it is possible to make long- or short-term predictions with reasonable quality
28
Mining Time-Series Data
Regression Analysis
Trend Analysis
Similarity Search in Time Series Data
Motif-Based Search and Mining in
Time Series Data
Summary
29
Similarity Search in Time-Series Analysis
Normal database query finds exact match Similarity search finds data sequences that differ
only slightly from the given query sequence Two categories of similarity queries
Whole matching: find a sequence that is similar to the query sequence
Subsequence matching: find all pairs of similar sequences
Typical Applications Financial market Market basket data analysis Scientific databases Medical diagnosis
30
Data Transformation
Many techniques for signal analysis require the data to be in the frequency domain
Usually data-independent transformations are used The transformation matrix is determined a priori
discrete Fourier transform (DFT) discrete wavelet transform (DWT)
The distance between two signals in the time domain is the same as their Euclidean distance in the frequency domain
31
Discrete Fourier Transform
DFT does a good job of concentrating energy in the first few coefficients
If we keep only first a few coefficients in DFT, we can compute the lower bounds of the actual distance
Feature extraction: keep the first few coefficients (F-index) as representative of the sequence
32
DFT (continued)
Parseval’s Theorem
The Euclidean distance between two signals in the time domain is the same as their distance in the frequency domain
Keep the first few (say, 3) coefficients underestimates the distance and there will be no false dismissals!
1
0
21
0
2 ||||n
ff
n
tt Xx
|])[(])[(||][][|3
0
2
0
2
f
n
t
fQFfSFtQtS
33
Multidimensional Indexing in Time-Series
Multidimensional index construction Constructed for efficient accessing using the first
few Fourier coefficients Similarity search
Use the index to retrieve the sequences that are at most a certain small distance away from the query sequence
Perform post-processing by computing the actual distance between sequences in the time domain and discard any false matches
34
Subsequence Matching
Break each sequence into a set of pieces of window with length w
Extract the features of the subsequence inside the window
Map each sequence to a “trail” in the feature space
Divide the trail of each sequence into “subtrails” and represent each of them with minimum bounding rectangle
Use a multi-piece assembly algorithm to search for longer sequence matches
35
Analysis of Similar Time Series
36
Enhanced Similarity Search Methods
Allow for gaps within a sequence or differences in offsets or amplitudes
Normalize sequences with amplitude scaling and offset translation
Two subsequences are considered similar if one lies within an envelope of width around the other, ignoring outliers
Two sequences are said to be similar if they have enough non-overlapping time-ordered pairs of similar subsequences
Parameters specified by a user or expert: sliding window size, width of an envelope for similarity, maximum gap, and matching fraction
37
Steps for Performing a Similarity Search
Atomic matching Find all pairs of gap-free windows of a small
length that are similar Window stitching
Stitch similar windows to form pairs of large similar subsequences allowing gaps between atomic matches
Subsequence Ordering Linearly order the subsequence matches to
determine whether enough similar pieces exist
38
Similar Time Series Analysis
VanEck International Fund Fidelity Selective Precious Metal and Mineral Fund
Two similar mutual funds in the different fund group
39
Query Languages for Time Sequences
Time-sequence query language Should be able to specify sophisticated queries like
Find all of the sequences that are similar to some sequence in class A, but not similar to any sequence in class B
Should be able to support various kinds of queries: range queries, all-pair queries, and nearest neighbor queries
Shape definition language Allows users to define and query the overall shape of time
sequences Uses human readable series of sequence transitions or
macros Ignores the specific details
E.g., the pattern up, Up, UP can be used to describe increasing degrees of rising slopes
Macros: spike, valley, etc.
40
Mining Time-Series Data
Regression Analysis
Trend Analysis
Similarity Search in Time Series Data
Motif-Based Search and Mining in
Time Series Data
Summary
41
Sequence Distance
A function that measures the differentness of two sequences (of possibly unequal length)
Example: Euclidean Distance between TS Q,C
n
i ii cqCQD1
2)(),(
42
Motif: Basic Concepts
What is a motif? A previously unknown, frequently
occurring sequential pattern
Match: Given subsequences Q,C ⊆ T,
C is a match for Q iff for some R
Non-Trivial Match: C = T[p..*], Q = T[q..*] and C match
Q. If p = q or non-match N = T[s..*]∄ such that s between
p,q then match is non-trivial.
(i.e. C,Q must be separated by a non-match)
1-Motif: the subsequence with most non-trivial matches
(least variance decides ties)
k-Motif: Ck such that D(Ck,Ci) > 2R i [1,k)∀ ∈
RCQD ),(
43
SAX: Symbolic Aggregate approXimation
Dim. Reduction/Compression
“Symbolic Aggregate approXimation”
SAX : ℝ → ∑
SAX : ccbaabbbabcbcb↦
Essentially an alphabet over the Piecewise Aggregate
Approximation (PAA) rank
Faster, simpler, more compression, yet on par with DFT,
DWT and other dim. reductions
44
SAX Illustration
45
SAX Algorithm
Parameters: alphabet size, word (segment) length (or output rate)
1. Select probability distribution for TS
2. z-Normalize TS
3. PAA: Within each time interval, calculate aggregated value (mean) of the segment
4. Partition TS range by equal-area partitioning the PDF into n partitions (eq. freq. binning)
5. Label each segment with arank ∑∈ for aggregate’s
corresponding partition rank
46
Finding Motifs in a Time Series
EMMA Algorithm: Finds 1-(k-)motif of fixed length n SAX Compression (Dim. Reduction)
Possible to store D(i,j) ∀(i,j) ∈ ∑∑ Allows use of various distance measures (Minkowski,
Dynamic Time Warping) Multiple Tiers
Tier 1: Uses sliding window to hash length-w SAX subsequences (aw addresses, total size O(m)).Bucket B with most collisions & buckets with MINDIST(B) < R form neighborhood of B.
Tier 2: Neighborhood is pruned using more precise ADM algorithm. Ni with max. matches is 1-motif. Early stop if |ADM matches| > maxk>i(|neighborhoodk|)
47
Hashing
c e c a b b c b a c c e c a b b c b a cc c c c b b c c d c
w
n
2 4 2 0 1 1 2 1 0 25
2 2 2 2 1 1 2 2 3 25
2 4 2 0 1 1 2 1 0 25
… …
… …
…… …
…
…
…
…
Classification in Time Series
Application: Finance, Medicine
1-Nearest Neighbor Pros: accurate, robust, simple Cons: time and space complexity (lazy
learning); results are not interpretable
0 200 400 600 800 1000 1200
false nettles
stinging nettles
false nettles
Shapelet
stinging nettles
false nettles stinging nettles
Leaf Decision Tree
Shapelet Dictionary
5.1
yes
no
I
I
0 1
Testing the utility of a candidate shapelet
Arrange the time series objects
based on the distance from candidate
Find the optimal split point (maximal information gain)
Pick the candidate achieving best utility as the shapelet
Split Point
0
candidate
Information gain
false nettles stinging nettles
Leaf Decision Tree
Shapelet Dictionary
5.1
yes
no
I
I
0 1
false nettles
stinging nettles
false nettles
false nettles
Shapelet
stinging nettles
ClassificationClassification
52
Mining Time-Series Data
Regression Analysis
Trend Analysis
Similarity Search in Time Series Data
Motif-Based Search and Mining in
Time Series Data
Summary
53
Summary
Time series analysis is an important
research field in data mining
Regression Analysis
Trend Analysis
Similarity Search in Time Series Data
Motif-Based Search and Mining in Time
Series Data
54
References on Time-Series Similarity Search
R. Agrawal, C. Faloutsos, and A. Swami. Efficient similarity search in sequence databases. FODO’93 (Foundations of Data Organization and Algorithms).
R. Agrawal, K.-I. Lin, H.S. Sawhney, and K. Shim. Fast similarity search in the presence of noise, scaling, and translation in time-series databases. VLDB'95.
R. Agrawal, G. Psaila, E. L. Wimmers, and M. Zait. Querying shapes of histories. VLDB'95. C. Faloutsos, M. Ranganathan, and Y. Manolopoulos. Fast subsequence matching in time-series
databases. SIGMOD'94.
J. Lin, E. Keogh, S. Lonardi, and B. Chiu, “A Symbolic Representation of Time Series, with
Implications for Streaming Algorithms”, Data Mining and Knowledge discovery, 2003
P. Patel, E. Keogh, J. Lin, and S. Lonardi, “Mining Motifs in Massive Time Series Databases”,
ICDM’02 D. Rafiei and A. Mendelzon. Similarity-based queries for time series data. SIGMOD'97. Y. Moon, K. Whang, W. Loh. Duality Based Subsequence Matching in Time-Series Databases,
ICDE’02 B.-K. Yi, H. V. Jagadish, and C. Faloutsos. Efficient retrieval of similar time sequences under time
warping. ICDE'98. B.-K. Yi, N. Sidiropoulos, T. Johnson, H. V. Jagadish, C. Faloutsos, and A. Biliris. Online data mining
for co-evolving time sequences. ICDE'00. D. Shasha and Y. Zhu. High Performance Discovery in Time Series: Techniques and Case Studies,
SPRINGER, 2004
L. Ye and E. Keogh, “Time Series Shapelets: A New Primitive for Data Mining”, KDD’09
April 18, 2023Data Mining: Concepts and
Techniques 55
Top Related