Efficient Elastic Burst Detection in Data Streams
Yunyue Zhu and Dennis Shasha
Department of Computer Science
Courant Institute of Mathematical Sciences
New York University
SIGKDD 2003
Abstract
Burst detection: find abnormal aggregates in data streams, using a sliding window.
In some applications, we want to monitor many sliding window sizes simultaneously.
Brute force: O(n^2)
Shifted Wavelet Tree: near-linear time
Problem Statement
Given a time series x1, x2, …, xn, a set of window sizes w1, w2, …, wm, an aggregate function F, and a threshold f(wj) associated with each window size wj, j = 1, 2, …, m:
Monitoring elastic window aggregates of the time series means finding all subsequences, over all window sizes, whose aggregate crosses the threshold for their window size, i.e., all i ∈ 1…n and j ∈ 1…m such that F(x[i … i + wj - 1]) ≥ f(wj).
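As a point of reference, the problem statement can be turned directly into a brute-force check. The sketch below is an illustration only: it assumes F is the sum, uses 0-based start positions, and the name `brute_force_bursts` and the dictionary of thresholds are hypothetical, not taken from the talk.

```python
def brute_force_bursts(x, thresholds):
    """Return every (start, w) whose window sum reaches the threshold f(w).

    x          -- the time series (a list of numbers)
    thresholds -- hypothetical mapping: window size w -> threshold f(w)
    Assumes the aggregate F is the sum; start positions are 0-based.
    """
    alarms = []
    for w, f_w in sorted(thresholds.items()):
        # Slide a window of size w over every valid start position.
        for start in range(len(x) - w + 1):
            if sum(x[start:start + w]) >= f_w:
                alarms.append((start, w))
    return alarms
```

Every window size is checked at every position, which is the cost the shifted wavelet tree is meant to avoid.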
Wavelet Tree
Haar Wavelet Tree
Level 0: the original time series
Level 1: pairwise averages and differences of the adjacent data items at level 0
Level i: pairwise averages and differences of the averages at level i - 1
The wavelet coefficients can represent the trend of the time series.
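The level-by-level description can be sketched as follows. This is a minimal illustration, assuming the series length is a power of two and using the common convention of (a + b) / 2 for the average and (a - b) / 2 for the difference; the slides do not fix a normalization, and the name `haar_levels` is hypothetical.

```python
def haar_levels(x):
    """Return a list of (averages, differences) pairs, one per level.

    Level 0 holds the original series and no differences.
    Assumes len(x) is a power of two; an unpaired trailing item is dropped.
    """
    levels = [(list(x), [])]
    avg = list(x)
    while len(avg) > 1:
        # Pair up adjacent, non-overlapping items of the previous level.
        pairs = list(zip(avg[0::2], avg[1::2]))
        diffs = [(a - b) / 2 for a, b in pairs]
        avg = [(a + b) / 2 for a, b in pairs]
        levels.append((avg, diffs))
    return levels
```

Multiplying an average at level i by the window size 2^i recovers the sum over that window, which is how the averages relate to the sum aggregate.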
Wavelet Tree (cont.)
Wavelet coefficient → Aggregate
Average and difference → Sum
Problem: the windows at the same level are non-overlapping.
Shifted Wavelet Tree
Add an additional “line” of windows.
They can be maintained explicitly or implicitly.
Shifted Wavelet Tree (cont.)
Any subsequence of length w, w ≤ 2^i, is included in one of the windows at level i + 1 of the SWT.
We say that windows with size w, 2^(i-1) < w ≤ 2^i, are monitored by level i + 1 of the SWT.
[Figure: example windows at Level 3 and Level 4 of the SWT]
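The containment property can be checked with a few lines of arithmetic. The sketch below assumes that level i + 1 windows span 2^(i+1) items and start at every multiple of 2^i (half-overlapping), which is what pairwise summing followed by downsampling produces; positions are 0-based and the name `covering_window` is hypothetical.

```python
def covering_window(start, w, i):
    """Return (win_start, win_end) of a level i + 1 window containing
    the subsequence x[start:start + w], for any w <= 2**i.

    Assumption: level i + 1 windows have size 2**(i + 1) and begin at
    multiples of 2**i. win_end is exclusive.
    """
    if not 1 <= w <= 2 ** i:
        raise ValueError("w must satisfy 1 <= w <= 2**i")
    shift = 2 ** i            # distance between consecutive window starts
    size = 2 ** (i + 1)       # window length at level i + 1
    win_start = (start // shift) * shift
    return win_start, win_start + size
```

Because the subsequence starts less than 2^i past `win_start` and is at most 2^i long, it always ends inside the window.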
SWT Construction
For each level i (i ≥ 1):
Compute the pairwise aggregate (sum) of each two consecutive data items at level i - 1.
Downsampling: sampling every second item in the series of aggregates gives the input for the next higher level in the SWT.
Construction time: O(n), where n is the length of the time series.
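The two construction steps can be sketched as below. This is an illustrative reading of the slide, not the authors' code: the name `build_swt` and the `max_level` parameter are hypothetical, the aggregate is assumed to be the sum, and incomplete trailing windows are simply dropped.

```python
def build_swt(x, max_level):
    """Build the shifted wavelet tree levels 0..max_level for series x.

    levels[0] is the original series. levels[i][k] is the sum of the
    window of size 2**i that starts at position k * 2**(i - 1).
    """
    levels = [list(x)]
    current = list(x)                  # input to the next level
    for _ in range(max_level):
        if len(current) < 2:
            break
        # Pairwise sums of consecutive items: half-overlapping windows.
        sums = [current[k] + current[k + 1] for k in range(len(current) - 1)]
        levels.append(sums)
        # Downsample: every second sum feeds the next higher level.
        current = sums[::2]
    return levels
```

Each level does work proportional to its input, and the input halves at every level, so the total work stays linear in the series length.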
Search for a Burst
Given a window size w ≤ 2^i and a threshold f(w), search in two stages:
1. The potential burst is detected at level i + 1 in the SWT.
2. Detailed search in those subsequences of size 2^i with sum ≥ f(w).
O(k), where k is the number of alarms (output size)
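The two-stage search can be sketched for a single window size as follows. Several things here are assumptions made for illustration: the data are non-negative (so a window's sum bounds the sum of anything inside it), the covering windows are taken to span 2^(i+1) items starting at multiples of 2^i, windows are truncated at the end of the series, and window sums come from prefix sums rather than a stored tree. The name `search_bursts` is hypothetical.

```python
from itertools import accumulate


def search_bursts(x, w, f_w):
    """Return sorted 0-based start positions s with sum(x[s:s + w]) >= f_w.

    Stage 1 filters on the covering windows; stage 2 searches in detail
    only inside windows that passed the filter. Requires x >= 0.
    """
    i = (w - 1).bit_length()           # smallest i with w <= 2**i
    shift, size = 2 ** i, 2 ** (i + 1)
    prefix = [0] + list(accumulate(x))  # prefix[k] = sum(x[:k])
    n = len(x)
    alarms = set()
    for win_start in range(0, n, shift):
        win_end = min(win_start + size, n)
        # Stage 1: a window below the threshold cannot contain a burst.
        if prefix[win_end] - prefix[win_start] < f_w:
            continue
        # Stage 2: detailed search over subsequences of size w.
        for s in range(win_start, win_end - w + 1):
            if prefix[s + w] - prefix[s] >= f_w:
                alarms.add(s)
    return sorted(alarms)
```

When few windows pass stage 1, almost all of the detailed work is skipped, which is why the running time depends on the number of alarms.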
Streaming Algorithm
Assume that new data becomes available at every time unit.
The window sizes satisfy 2^L < w1 < w2 < … < wm < 2^U.
Maintain the levels from L+2 to U+1 of the SWT that monitor those windows.
Two methods: an online algorithm and a batch algorithm.
Streaming Algorithm: Online Algorithm
Whenever a new data item becomes available:
Update those 2(U - L) aggregates of the windows in the SWT.
If the aggregate at level i exceeds δi, perform a detailed search on those windows monitored by level i.
For level i, the threshold is δi = min f(wj) over all wj with 2^(i-2) < wj ≤ 2^(i-1).
Response time = one time unit
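The per-level thresholds δi follow mechanically from the window thresholds. A minimal sketch, assuming the thresholds are given as a dictionary from window size to f(w); the name `level_thresholds` is hypothetical.

```python
def level_thresholds(f):
    """Map each SWT level i to delta_i = min f(w) over the window sizes
    w monitored by that level, i.e. 2**(i - 2) < w <= 2**(i - 1).

    f -- hypothetical mapping: window size w -> threshold f(w)
    """
    deltas = {}
    for w, f_w in f.items():
        # (w - 1).bit_length() is the smallest k with w <= 2**k,
        # so the monitoring level is k + 1.
        level = (w - 1).bit_length() + 1
        deltas[level] = min(f_w, deltas.get(level, float("inf")))
    return deltas
```

Taking the minimum means a level raises a potential alarm whenever any of its monitored window sizes could be over threshold; the detailed search then decides which ones actually are.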
Streaming Algorithm: Batch Algorithm
Maintain the aggregates at level L+1.
The aggregate in the most recently completed window of level L+1 is updated every time unit.
An aggregate of a window at the upper levels will not be computed until all the data in that window are available.
Once an aggregate at a certain upper level is updated, we also check alarms for time intervals monitored by that level.
Higher throughput, longer response time.
Other Aggregates
The monitoring of many other aggregates based on elastic windows could benefit from our data structure, as long as the following conditions hold.
1. The aggregate F is monotonically increasing or decreasing with respect to the window, e.g., Max and Count are monotonically increasing, while Min is monotonically decreasing.
2. The alarm domain is one-sided: [threshold, ∞) for a monotonically increasing aggregate, and (-∞, threshold] for a monotonically decreasing one.
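To illustrate why the two conditions are sufficient, the sketch below runs the same filter-then-search idea with an arbitrary aggregate. It is an assumption-laden illustration rather than the authors' implementation: aggregates are recomputed from slices instead of being kept in a tree, covering windows are taken to span 2^(i+1) items starting at multiples of 2^i, and the name `elastic_alarms` and its parameters are hypothetical.

```python
def elastic_alarms(x, w, threshold, agg, increasing=True):
    """Return sorted 0-based starts s where agg(x[s:s + w]) is in the
    alarm domain: [threshold, inf) if the aggregate is monotonically
    increasing in the window, (-inf, threshold] if decreasing.
    """
    def crosses(value):
        return value >= threshold if increasing else value <= threshold

    i = (w - 1).bit_length()           # smallest i with w <= 2**i
    shift, size = 2 ** i, 2 ** (i + 1)
    alarms = set()
    for win_start in range(0, len(x), shift):
        window = x[win_start:win_start + size]
        # Monotonicity: if the covering window is outside the alarm
        # domain, every subsequence inside it is outside too.
        if len(window) < w or not crosses(agg(window)):
            continue
        for s in range(len(window) - w + 1):
            if crosses(agg(window[s:s + w])):
                alarms.add(win_start + s)
    return sorted(alarms)
```

For example, `elastic_alarms(x, 2, 9, max)` looks for pairs whose maximum is at least 9, while `elastic_alarms(x, 2, 1, min, increasing=False)` looks for pairs whose minimum is at most 1.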
Extension to Two Dimensions
The problem is to report the positions of spatial sliding windows (rectangle regions) having different sizes, within which the density exceeds some predefined threshold.
The same techniques as in the one-dimensional SWT apply.
[Figures: Wavelet Tree 2D; Shifted Wavelet Tree 2D]
Effectiveness Study
Bursts in the number of times that countries were mentioned in the presidential State of the Union addresses.
Effectiveness Study (cont.)
A predefined sliding window size is insufficient.
Bursts at large time scales are not necessarily reflected at smaller time scales; they may be composed of many consecutive “bumps”.
Effectiveness Study (cont.)
Bursts in population distribution data (1990)
Window sizes: 1°x1°, 2°x2° and 5°x5° in latitude/longitude
Performance Study
Experiments on a 1.5GHz Pentium 4 PC with 512 MB of main memory running Windows 2000.
Datasets:
The Gamma Ray data set
12 hours of data from a small region of the sky, where Gamma Ray bursts were actually reported.
The data are time series of the number of photons observed (events) every 0.1 second.
In total, 19,015 events in this time series.
The NYSE TAQ Stock data set
Tick-by-tick trading activities of the IBM stock between July 1st, 1998 and July 1st, 2002.
5,331,145 trading records (ticks).
Each record contains trading time, trading price and trading volume.
Performance Study (cont.)
Training the thresholds:
Use the first few hours of the Gamma Ray data and the first year of the Stock data as training data.
For a window of size w, we compute the aggregates on the training data with a sliding window of size w, giving a vector y.
f(w) = avg(y) + ξ std(y)
Window sizes: 5, 10, …, 5 * Nw time units
Nw: number of windows, varies from 5 to 50
Time units: 0.1 sec for the Gamma Ray data, and 1 min for the stock data.
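The threshold training rule f(w) = avg(y) + ξ std(y) can be sketched as below. This assumes the aggregate is the sum and uses the population standard deviation (`statistics.pstdev`); the slides do not say whether the population or sample form was used, and the name `train_threshold` is hypothetical.

```python
from statistics import mean, pstdev


def train_threshold(train, w, xi):
    """Return f(w) = avg(y) + xi * std(y), where y holds the sums of all
    sliding windows of size w over the training series.

    Assumption: std is the population standard deviation.
    """
    y = [sum(train[s:s + w]) for s in range(len(train) - w + 1)]
    return mean(y) + xi * pstdev(y)
```

A larger ξ raises the threshold and so yields fewer alarms.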
Performance Study (cont.)
The processing time of our algorithm is output-dependent.
Performance Study (cont.)
Experiments on stock data
Performance Study (cont.)
Use spread as the aggregate function
Conclusion and Future Work
This paper introduces the elastic window model and demonstrates the desirability of the new model.
A novel data structure for efficient detection of elastic bursts and other aggregates.
Experiments show that our algorithm is faster than a brute force algorithm by several orders of magnitude.
Future work: a robust way of setting the thresholds; non-monotonic aggregates.