Streaming Pattern Discovery in Multiple Time-Series
Jimeng Sun
Spiros Papadimitriou
Christos Faloutsos
PARALLEL DATA LABORATORY
Carnegie Mellon University
<your name here> © Apr 21, 2023 http://www.pdl.cmu.edu/ 2
Motivation
• Co-evolving time series (data streams) appear in many different applications, e.g.:
  • Disk access traffic in network clusters
  • Internet flow traffic in a network
  • Temperatures in a large building
  • Chlorine concentration in a water distribution network
Values are typically correlated
Would be very useful if we could summarize them on the fly
Example
[Figure: chlorine concentrations over time in a water distribution network during normal operation, across Phase 1, Phase 2 and Phase 3; curves are labeled as sensors near the leak and sensors away from the leak]
Goals
• Discover “hidden” (latent) variables for:
  • Summarization of main trends for users
  • Efficient forecasting, spotting outliers/anomalies
• Incremental, real-time computation
• Limited memory requirements
Example: chlorine measurements
[Figure: chlorine concentrations in a water distribution network across Phase 1, Phase 2 and Phase 3, with the labels “normal operation” and “major leak”; curves are labeled as sensors near the leak and sensors away from the leak]
Example: hidden variable
We would like to discover a few “hidden (latent) variables” that summarize the key trends
[Figure: Phase 1, k = 1. Actual measurements (n streams) of chlorine concentrations, shown alongside the k hidden variable(s)]
Example: hidden variable tracking
[Figure: Phase 1 and Phase 2, k = 2. Actual measurements (n streams) of chlorine concentrations, shown alongside the k hidden variable(s)]
Example: hidden variable tracking
[Figure: Phase 1, Phase 2 and Phase 3, k = 1. Actual measurements (n streams) of chlorine concentrations, shown alongside the k hidden variable(s)]
Method outline
• Step 1: How to capture correlations?
• Step 2: How to do it incrementally, when we have a very large number of points?
• Step 3: How to dynamically adjust the number of hidden variables?
1. How to capture correlations?
• First sensor
[Figure: Temperature T1 over time, with the temperature axis marked at 20°C and 30°C]
1. How to capture correlations?
• First sensor
• Second sensor
[Figure: Temperature T2 over time, with the temperature axis marked at 20°C and 30°C]
1. How to capture correlations
• Correlations:
• Let’s take a closer look at the first three value-pairs…
[Figure: scatter plot of Temperature T2 against Temperature T1, both axes marked at 20°C and 30°C]
1. How to capture correlations
• First three lie (almost) on a line in the space of value-pairs…
• O(n) numbers for the slope, and one number for each value-pair (offset on line)
• offset = “hidden variable”
[Figure: Temperature T2 against Temperature T1, both axes marked at 20°C and 30°C; the value-pairs for time=1, time=2 and time=3 lie along a line]
1. How to capture correlations
• Other pairs also follow the same pattern: they lie (approximately) on this line
[Figure: Temperature T2 against Temperature T1, both axes marked at 20°C and 30°C]
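A minimal sketch of the idea above, assuming made-up readings for two temperature sensors and an assumed line direction `w` (neither comes from the slides). Each value-pair is reduced to one number, its offset along the line, and both sensors are then approximated from that number alone.

```python
import numpy as np

# Hypothetical readings (in °C) from two temperature sensors; illustrative values only.
t1 = np.array([20.0, 22.0, 24.0, 27.0, 30.0])
t2 = np.array([20.5, 21.8, 24.1, 27.2, 29.9])
points = np.column_stack([t1, t2])        # one value-pair per time tick

# Direction of the line: n numbers for n sensors (here n = 2).
# Assumed direction for this sketch: both sensors move together.
w = np.array([1.0, 1.0]) / np.sqrt(2.0)

# Hidden variable: one number per value-pair, the offset along the line.
hidden = points @ w

# Approximate both sensors from the single hidden variable.
reconstruction = np.outer(hidden, w)
relative_error = np.linalg.norm(points - reconstruction) / np.linalg.norm(points)

print("hidden variable:", np.round(hidden, 2))
print("relative reconstruction error:", relative_error)
```

The line here passes through the origin, which is a simplification of this sketch; storage drops from two numbers per time tick to one, plus the fixed direction.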
Method outline
• Step 1: How to capture correlations?
• Step 2: How to do it incrementally, when we have a very large number of points?
• Step 3: How to dynamically adjust the number of hidden variables?
Experiments: chlorine concentration
• 166 streams
• 2 hidden variables (~4% error)
[Figure: measurements from sensor, and reconstruction from hidden variables; data from CMU Civil Engineering]
Experiments: chlorine concentration
[Figure: hidden variables; data from CMU Civil Engineering]
• Both capture global, periodic pattern
• Second: ~ first, but “phase-shifted”
• Can express any “phase-shift”…
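One way to read the last bullet, under the idealized assumption that the two hidden variables are pure sinusoids a quarter period apart (the symbols $a$, $b$, $\omega$, $R$ and $\varphi$ are introduced here for illustration and do not appear in the slides): any weighted sum of the two is again a sinusoid with the same period, and its phase is set by the weights.

```latex
a\,\sin(\omega t) + b\,\cos(\omega t) = R\,\sin(\omega t + \varphi),
\qquad R = \sqrt{a^{2} + b^{2}},
\qquad \varphi = \operatorname{atan2}(b, a)
```

Under that assumption, a stream whose periodic pattern lags or leads the others can still be approximated by choosing its own pair of weights $(a, b)$ on the same two hidden variables.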
Conclusion
• Many settings with hundreds of streams, but
  • Stream values are, by nature, related
• We proposed a method that
  • discovers hidden variables as a summarization of main trends for users
  • requires only incremental computation, without buffering any past data
• Future work:
  • Apply to more applications, e.g., performance monitoring for storage systems and network systems
Related work
• Stream SVD [Guha, Gunopulos, Koudas / KDD03]
• StatStream [Zhu, Shasha / VLDB02]
• Clustering
  • [Aggarwal, Han, Yu / VLDB03], [Guha, Meyerson, et al / TKDE]
  • [Lin, Vlachos, Keogh, Gunopulos / EDBT04]
• Classification
  • [Wang, Fan, et al / KDD03], [Hulten, Spencer, Domingos / KDD01]
• Piecewise approximations
  • [Palpanas, Vlachos, Keogh, et al / ICDE 2004]
Experiments: Light measurements
• 54 sensors
• 2-4 hidden variables (~6% error)
[Figure: measurement and reconstruction]
Experiments: Light measurements
• 1 & 2: main trend (as before)
• 3 & 4: potential anomalies and outliers
[Figure: hidden variables, with two “intermittent” annotations]
Stream correlations
• Step 1: How to capture correlations?
• Step 2: How to do it incrementally, when we have a very large number of points?
• Step 3: How to dynamically adjust the number of hidden variables?
2. Incremental update
• For each new point
  • Project onto current line
  • Estimate error
[Figure: Temperature T2 against Temperature T1, both axes marked at 20°C and 30°C; a “New value” point and its “error” relative to the current line]
2. Incremental update
• For each new point
  • Project onto current line
  • Estimate error
  • Rotate line in the direction of the error and in proportion to its magnitude
• O(n) time
[Figure: Temperature T2 against Temperature T1, both axes marked at 20°C and 30°C; a “New value” point and its “error” relative to the current line]
2. Incremental update
• For each new point
  • Project onto current line
  • Estimate error
  • Rotate line in the direction of the error and in proportion to its magnitude
[Figure: Temperature T2 against Temperature T1, both axes marked at 20°C and 30°C]
Stream correlations
Principal Component Analysis (PCA)
• The “line” is the first principal component (PC) vector
• This line is optimal: it minimizes the sum of squared projection errors
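The optimality claim can be checked numerically. The sketch below assumes synthetic data for two correlated sensors and uses NumPy's SVD to obtain the first principal direction; the helper `squared_projection_error` is a name made up for this example. The data are not mean-centered, so all candidate lines pass through the origin, which is an assumption of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 value-pairs from two correlated sensors (illustrative only).
base = rng.uniform(20.0, 30.0, size=200)
points = np.column_stack([base, base + rng.normal(0.0, 0.5, size=200)])

def squared_projection_error(data, direction):
    """Sum of squared distances from each row of `data` to the line spanned by `direction`."""
    w = direction / np.linalg.norm(direction)
    residual = data - np.outer(data @ w, w)
    return float(np.sum(residual ** 2))

# First principal direction: the top right-singular vector of the data matrix.
_, _, vt = np.linalg.svd(points, full_matrices=False)
first_pc = vt[0]
best = squared_projection_error(points, first_pc)

# Compare against a sweep of other candidate directions, one per degree.
angles = np.linspace(0.0, np.pi, 181)
others = [squared_projection_error(points, np.array([np.cos(a), np.sin(a)]))
          for a in angles]

print("error along first PC:", best)
print("smallest error among swept directions:", min(others))
```

No swept direction should do better than the first principal direction, up to floating-point rounding.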
2. Incremental update
Given number of hidden variables k
• Assuming k is known
• We know how to update the slope
  • (detailed equations in paper)
• For each new point $x$ and for $i = 1, \dots, k$:
  • $y_i := w_i^T x$ (proj. onto $w_i$)
  • $d_i \leftarrow d_i + y_i^2$ (energy $\propto$ $i$-th eigenval.)
  • $e_i := x - y_i w_i$ (error)
  • $w_i \leftarrow w_i + (1/d_i)\, y_i e_i$ (update estimate)
  • $x \leftarrow x - y_i w_i$ (repeat with remainder)
[Figure: the point $x$, the direction $w_1$, the projection $y_1$, the error $e_1$, and the updated $w_1$]
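The update steps above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: the function name `update`, the starting values (unit vectors for the $w_i$, a small positive $d_i$ so that $1/d_i$ is defined) and the synthetic two-sensor stream are all choices made for this example.

```python
import numpy as np

def update(W, d, x):
    """One streaming update for a new point x.

    W: (k, n) array, row i holds the current estimate w_i (modified in place)
    d: (k,) array, running energy per direction (modified in place)
    Returns the k hidden-variable values y_1..y_k for this point.
    """
    x = np.asarray(x, dtype=float).copy()
    y = np.zeros(len(W))
    for i in range(len(W)):
        y[i] = W[i] @ x                 # project onto w_i
        d[i] += y[i] ** 2               # accumulate energy
        e = x - y[i] * W[i]             # error
        W[i] += (y[i] / d[i]) * e       # rotate w_i toward the error
        x -= y[i] * W[i]                # repeat with remainder
    return y

# Synthetic stream: two sensors reporting nearly the same value (assumed data).
rng = np.random.default_rng(0)
n, k = 2, 1
W = np.eye(k, n)            # assumed starting guess: unit vectors
d = np.full(k, 1e-3)        # assumed small positive start

for _ in range(500):
    base = rng.uniform(20.0, 30.0)
    update(W, d, np.array([base, base]) + rng.normal(0.0, 0.3, size=2))

# Compare the tracked direction with the direction the stream was built around.
target = np.array([1.0, 1.0]) / np.sqrt(2.0)
cosine = abs(W[0] @ target) / np.linalg.norm(W[0])
print("tracked direction:", W[0], "cosine with (1,1)/sqrt(2):", cosine)
```

Each point is seen once and then discarded; the only state kept between points is `W` and `d`, and the work per point is proportional to n for each of the k directions.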
Stream correlations
• Step 1: How to capture correlations?
• Step 2: How to do it incrementally, when we have a very large number of points?
• Step 3: How to dynamically adjust k, the number of hidden variables?
3. Number of hidden variables
• If we had three sensors with similar measurements
• Again: points would lie on a line (i.e., one hidden variable, k=1), but in 3-D space
[Figure: value-tuple space with axes T1, T2 and T3]
3. Number of hidden variables
• Assume one sensor intermittently gets stuck
• Now, no line can give a good approximation
[Figure: value-tuple space with axes T1, T2 and T3]
3. Number of hidden variables
• Assume one sensor intermittently gets stuck
• Now, no line can give a good approximation
• But a plane will do (two hidden variables, k = 2)
[Figure: value-tuple space with axes T1, T2 and T3]
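A small numerical sketch of the line-versus-plane argument, under assumptions made for this example only: the three sensors report deviations from a common baseline (so lines and planes through the origin are the natural fit), and the stuck sensor reads zero deviation for an arbitrary stretch of the stream. The helper `energy_fractions` is a made-up name.

```python
import numpy as np

rng = np.random.default_rng(1)
ticks = 300

# Hypothetical common signal, as a deviation from a shared baseline temperature.
base = rng.normal(0.0, 5.0, size=ticks)

# Three sensors with similar measurements: points lie near a line in 3-D.
healthy = np.column_stack([base, base, base]) + rng.normal(0.0, 0.1, size=(ticks, 3))

# Same stream, but the third sensor is stuck for part of the time.
stuck = healthy.copy()
stuck[100:200, 2] = 0.0     # assumed stuck reading and interval, for illustration

def energy_fractions(data):
    """Fraction of the total squared magnitude captured by each principal direction."""
    s = np.linalg.svd(data, compute_uv=False)
    return s ** 2 / np.sum(s ** 2)

healthy_energy = energy_fractions(healthy)
stuck_energy = energy_fractions(stuck)

print("healthy:", np.round(healthy_energy, 4))
print("stuck:  ", np.round(stuck_energy, 4))
```

For the healthy data nearly all of the energy sits in one direction; for the stuck data the second direction is no longer negligible, while the third stays close to zero, which is the "a plane will do" case.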
Number of hidden variables (PCs)
• Keep track of energy maintained by approximation with k variables (PCs):
  • Reconstruction accuracy, w.r.t. total squared error
• Increment (or decrement) k if fraction of energy maintained goes below (or above) a threshold
  • If below 95%, $k \leftarrow k + 1$
  • If above 98%, $k \leftarrow k - 1$
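The threshold rule can be written as a small helper. This is a sketch: the function name `adjust_k`, its parameters, and the two guards (never dropping below one hidden variable, and an optional upper bound `k_max`) are assumptions added for the example; only the 95% and 98% thresholds come from the slide.

```python
def adjust_k(k, total_energy, retained_energy, low=0.95, high=0.98, k_max=None):
    """Return the new number of hidden variables given the energy retained by k of them."""
    fraction = retained_energy / total_energy
    if fraction < low and (k_max is None or k < k_max):
        return k + 1        # approximation too poor: add a hidden variable
    if fraction > high and k > 1:
        return k - 1        # more than enough energy kept: drop one
    return k                # between the thresholds: leave k unchanged

print(adjust_k(2, total_energy=100.0, retained_energy=90.0))
print(adjust_k(2, total_energy=100.0, retained_energy=96.0))
print(adjust_k(2, total_energy=100.0, retained_energy=99.0))
```

Keeping a gap between the two thresholds means k does not flip back and forth when the retained fraction hovers around a single cutoff.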