BRAID: Stream Mining through Group Lag Correlations

Yasushi Sakurai, Spiros Papadimitriou, Christos Faloutsos. SIGMOD 2005


Page 1: BRAID: Stream Mining through Group Lag Correlations

BRAID: Stream Mining through Group Lag Correlations

Yasushi Sakurai Spiros Papadimitriou Christos Faloutsos

SIGMOD 2005

Page 2: BRAID: Stream Mining through Group Lag Correlations

Introduction

Lag correlations. For example:

Higher amounts of fluoride in water → fewer dental cavities some years later

Goal: monitor multiple numerical streams, determine which pairs are correlated with a lag, and report the value of that lag

Page 3: BRAID: Stream Mining through Group Lag Correlations

Introduction

Given k numerical sequences $X_1, \dots, X_k$, report all pairs $X_i$ and $X_j$ such that $X_i$ follows $X_j$ with lag $l$

Page 4: BRAID: Stream Mining through Group Lag Correlations

Introduction

Page 5: BRAID: Stream Mining through Group Lag Correlations

Introduction

This paper proposes BRAID, which handles data streams of semi-infinite length:
Any-time processing, and fast
Nimble
Accurate
Small resource consumption

Page 6: BRAID: Stream Mining through Group Lag Correlations

Proposed method

Data stream $X$: $\{x_1, \dots, x_t, \dots, x_n\}$, where $x_n$ is the most recent value

$R(0)$: correlation of $X$ and $Y$, which have the same length $n$, at zero lag

Correlation coefficient $\rho$:
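A sketch of the coefficient, assuming the slide uses the standard Pearson definition; the means $\bar{x}$ and $\bar{y}$ are symbols introduced here for illustration:

```latex
\rho = \frac{\sum_{t=1}^{n} (x_t - \bar{x})(y_t - \bar{y})}
            {\sqrt{\sum_{t=1}^{n} (x_t - \bar{x})^2}\;\sqrt{\sum_{t=1}^{n} (y_t - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{t=1}^{n} x_t,\quad
\bar{y} = \frac{1}{n}\sum_{t=1}^{n} y_t
```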

Page 7: BRAID: Stream Mining through Group Lag Correlations

Proposed method

For lag $l$, consider only the common part of $X$ and the shifted $Y$, which spans $n - l$ time ticks
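The overlap idea can be sketched directly in Python. This is a minimal illustration, not the authors' code: the function name `lag_correlation` is hypothetical, and it assumes that "shifted" means pairing $x_t$ with $y_{t-l}$, so that $X$ trails $Y$ by $l$ ticks.

```python
import math


def lag_correlation(x, y, lag):
    """Pearson correlation over the n - lag ticks where x (delayed) and y overlap."""
    n = len(x)
    if len(y) != n or not 0 <= lag < n - 1:
        raise ValueError("need equal lengths and 0 <= lag < n - 1")
    xs = x[lag:]        # x_{l+1}, ..., x_n
    ys = y[:n - lag]    # y_1, ..., y_{n-l}
    m = n - lag
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant overlap counts as uncorrelated
    return cov / math.sqrt(var_x * var_y)


y = [1, 5, 2, 8, 3, 9, 4, 7, 6, 0]
x = [0, 0] + y[:-2]          # x is y delayed by two ticks
r_at_2 = lag_correlation(x, y, 2)
r_at_0 = lag_correlation(x, y, 0)
```

In this toy input the correlation at lag 2 is essentially 1, while at lag 0 it is much weaker.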

Page 8: BRAID: Stream Mining through Group Lag Correlations

Proposed method

Page 9: BRAID: Stream Mining through Group Lag Correlations

Proposed method

$R(l)$: correlation coefficient when $X$ is delayed by $l$

Score at lag $l$:

Page 10: BRAID: Stream Mining through Group Lag Correlations

Proposed method

For large lags $l \approx n$, the original and shifted time sequences overlap in too few ticks for $R(l)$ to be reliable
Restrict the maximum lag $m$ to $n/2$

Page 11: BRAID: Stream Mining through Group Lag Correlations

Proposed method

Naive solution:
At time $n$, access all values of $X$ and $Y$ and compute $R(l)$ for every lag $l$ (= 0, 1, …)
Choose the earliest maximum of the score above $r$, or report no lag

The proposed solution is based on three major steps
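The naive solution can be sketched as a full scan over all lags. The names `pearson`, `naive_lag` and `threshold` are illustrative, and two assumptions are made that the slide text does not spell out: the score is taken to be $|R(l)|$, and "earliest max" is read as the earliest local maximum of the score that exceeds the threshold.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length lists."""
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: constant input counts as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def naive_lag(x, y, threshold):
    """Scan every lag up to n // 2; return the earliest qualifying lag or None."""
    n = len(x)
    max_lag = n // 2
    # Assumed score: absolute value of the lagged correlation.
    scores = [abs(pearson(x[l:], y[:n - l])) for l in range(max_lag + 1)]
    for l, s in enumerate(scores):
        left = scores[l - 1] if l > 0 else -1.0
        right = scores[l + 1] if l < max_lag else -1.0
        if s > threshold and s >= left and s >= right:
            return l
    return None


# y has a single spike at index 3; x repeats it three ticks later.
y = [0.0] * 16
y[3] = 1.0
x = [0.0] * 16
x[6] = 1.0
detected = naive_lag(x, y, threshold=0.9)
```

Every call touches all $n$ values for each of the $n/2$ candidate lags, which is the cost the three steps of the proposed solution are meant to avoid.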

Page 12: BRAID: Stream Mining through Group Lag Correlations

Proposed method

Sufficient statistics are needed so that $R$ can be computed easily:
$S_x(1, n) = \sum_{t=1}^{n} x_t$: sum of $X$ over its $n$ values
$S_{xx}(1, n) = \sum_{t=1}^{n} x_t^2$: sum of squares of $X$ over its $n$ values
$S_{xy}(l) = \sum_{t=l+1}^{n} x_t y_{t-l}$: inner product of $X$ and $Y$ shifted by lag $l$

Page 13: BRAID: Stream Mining through Group Lag Correlations

Proposed method

$R(l)$ is obtained:
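A sketch of the expression, assuming the window notation $S_x(a, b) = \sum_{t=a}^{b} x_t$ (likewise for $S_{xx}$, $S_y$, $S_{yy}$) and $S_{xy}(l) = \sum_{t=l+1}^{n} x_t y_{t-l}$. This is the standard Pearson coefficient rewritten in terms of sums over the $n - l$ overlapping ticks; the exact layout on the slide may differ:

```latex
R(l) = \frac{S_{xy}(l) - \dfrac{S_x(l+1, n)\, S_y(1, n-l)}{n-l}}
            {\sqrt{S_{xx}(l+1, n) - \dfrac{\left(S_x(l+1, n)\right)^2}{n-l}}\;
             \sqrt{S_{yy}(1, n-l) - \dfrac{\left(S_y(1, n-l)\right)^2}{n-l}}}
```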

Page 14: BRAID: Stream Mining through Group Lag Correlations

Proposed method

$R(l)$ can be estimated at any point in time by keeping track of only five sufficient statistics

Computing the full cross-correlation function between two sequences still takes linear time
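For a single fixed lag, the five running sums can be maintained incrementally as each pair of values arrives. The class below is a hypothetical helper written for illustration (the name `LagStats` and the buffering scheme are assumptions, not taken from the slides); it pairs $x_t$ with $y_{t-l}$.

```python
import math
from collections import deque


class LagStats:
    """Running sums Sx, Sy, Sxx, Syy, Sxy for one fixed lag."""

    def __init__(self, lag):
        self.lag = lag
        self.y_buffer = deque(maxlen=lag + 1)  # holds y_{t-lag}, ..., y_t
        self.count = 0                         # number of overlapping ticks
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def update(self, x_t, y_t):
        """Consume one new tick from each stream in O(1) time."""
        self.y_buffer.append(y_t)
        if len(self.y_buffer) <= self.lag:
            return  # no y value that is `lag` ticks old yet
        y_old = self.y_buffer[0]  # y_{t - lag}
        self.count += 1
        self.sx += x_t
        self.sxx += x_t * x_t
        self.sy += y_old
        self.syy += y_old * y_old
        self.sxy += x_t * y_old

    def correlation(self):
        """Correlation at this lag, computed from the five sums only."""
        m = self.count
        if m < 2:
            return 0.0
        cov = self.sxy - self.sx * self.sy / m
        var_x = self.sxx - self.sx ** 2 / m
        var_y = self.syy - self.sy ** 2 / m
        if var_x <= 0 or var_y <= 0:
            return 0.0  # assumption: constant input counts as uncorrelated
        return cov / math.sqrt(var_x * var_y)


y = [1, 5, 2, 8, 3, 9, 4, 7, 6, 0]
x = [0, 0] + y[:-2]            # x is y delayed by two ticks
stats = LagStats(lag=2)
for x_t, y_t in zip(x, y):
    stats.update(x_t, y_t)
r = stats.correlation()
```

Each update is constant time for one lag, but tracking every lag this way would still cost time linear in the number of lags per tick, and the buffer for lag $l$ holds $l + 1$ values.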

Page 15: BRAID: Stream Mining through Group Lag Correlations

Proposed method

Keep track of only a geometric progression of lag values: $l = 0, 1, 2, \dots, 2^i, \dots$

Only $O(\log n)$ values to keep track of, instead of the $O(n)$ that the naive solution requires

The space required still grows linearly with the length $n$
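A small sketch of the geometric probing schedule; `geometric_lags` is an illustrative name, and capping the lags at $n/2$ follows the maximum-lag restriction stated earlier in the slides.

```python
def geometric_lags(n):
    """Lags 0, 1, 2, 4, ..., 2**i that do not exceed the maximum lag n // 2."""
    max_lag = n // 2
    lags = [0]
    lag = 1
    while lag <= max_lag:
        lags.append(lag)
        lag *= 2
    return lags


lags = geometric_lags(64)
```

For a stream of length 64 this gives seven lags (0, 1, 2, 4, 8, 16, 32) instead of the 33 that a full scan would probe.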

Page 16: BRAID: Stream Mining through Group Lag Correlations

Proposed method

To compute $R(l)$ at any time, a sliding window of size $l$ must be kept; with $m = n/2$ this needs $O(n)$ space

Instead of operating only on the original time sequences, also compute smoothed versions of them by averaging over non-overlapping windows

Page 17: BRAID: Stream Mining through Group Lag Correlations

Proposed method

Window size: a power of $g = 2$
$X$: original time sequence
$A_x^h$: smoothed version with windows of length $2^h$
$A_x^0$: the original sequence; $A_x^1$: consists of $n/2$ ticks; etc.

The sufficient statistics of $A_x^h$ need to be computed only every $2^h$ time ticks

At time $n$, $O(\log n)$ levels are needed; sufficient statistics are computed for each level
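The smoothing hierarchy can be sketched in batch form as repeated pairwise averaging. The function name `smoothed_levels` is hypothetical, and a streaming implementation would instead extend level $h$ once every $2^h$ ticks rather than rebuild all levels.

```python
def smoothed_levels(x):
    """Return [A^0, A^1, ...]; A^h averages non-overlapping windows of 2**h ticks."""
    levels = [[float(v) for v in x]]
    while len(levels[-1]) >= 2:
        prev = levels[-1]
        # Average consecutive pairs; a trailing unpaired value is dropped.
        nxt = [(prev[i] + prev[i + 1]) / 2 for i in range(0, len(prev) - 1, 2)]
        levels.append(nxt)
    return levels


levels = smoothed_levels([1, 3, 5, 7, 2, 4, 6, 8])
```

Here level 1 is `[2.0, 6.0, 3.0, 7.0]`, level 2 is `[4.0, 5.0]`, and level 3 is `[4.5]`, so a sequence of 8 ticks yields 4 levels, in line with the $O(\log n)$ bound.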

Page 18: BRAID: Stream Mining through Group Lag Correlations
Page 19: BRAID: Stream Mining through Group Lag Correlations

Proposed method

In contrast with small lags, the larger ones are probed sparsely
Use a cubic spline to interpolate the missing correlation coefficients
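A minimal sketch of the interpolation step, assuming a natural cubic spline (zero second derivative at both ends); the slides do not say which boundary conditions are used. The function name and the sample correlation values are made up for illustration.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1  # number of intervals
    if n < 1 or any(xs[i] >= xs[i + 1] for i in range(n)):
        raise ValueError("need at least two strictly increasing knots")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward sweep of the tridiagonal system for the quadratic coefficients.
    diag = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]
    # Back substitution for the per-interval coefficients b, c, d.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(x):
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)  # interval containing x
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate


probed_lags = [0, 1, 2, 4, 8, 16]
probed_r = [0.10, 0.25, 0.60, 0.90, 0.40, 0.05]  # made-up values for illustration
spline = natural_cubic_spline(probed_lags, probed_r)
estimate_at_6 = spline(6)  # correlation estimate at a lag that was not probed
```

The spline reproduces the probed values exactly at the probed lags and gives smooth estimates in between, which is where the sparse large lags need filling in.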

Page 20: BRAID: Stream Mining through Group Lag Correlations

Proposed method

$A_x^h(t)$: window average at time tick $t$ for level $h$

$A_x^0(t) \equiv x_t$

Page 21: BRAID: Stream Mining through Group Lag Correlations

Proposed method

Sufficient statistics:

Page 22: BRAID: Stream Mining through Group Lag Correlations
Page 23: BRAID: Stream Mining through Group Lag Correlations

Enhanced BRAID

For two sequences of size $\approx 2^{20}$, BRAID requires about $5 \cdot \log_2 2^{20} = 5 \cdot 20 = 100$ floating-point numbers, about 800 bytes

With more memory available, propose a solution that probes more lags but still uses $O(\log n)$ space

Use a mix of arithmetic and geometric probing
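The memory estimate can be checked with a few lines of arithmetic. The function name is illustrative, and 8 bytes per number (double precision) is an assumption that matches the slide's figure of 800 bytes for 100 numbers.

```python
def braid_memory(n, stats_per_level=5, bytes_per_float=8):
    """Return (float count, byte count) for one pair of streams of length n."""
    levels = n.bit_length() - 1          # floor(log2(n)) smoothing levels
    floats = stats_per_level * levels    # five sufficient statistics per level
    return floats, floats * bytes_per_float


floats, size_bytes = braid_memory(2 ** 20)
```

For $n = 2^{20}$ this yields 100 numbers and 800 bytes, the figures quoted on the slide.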

Page 24: BRAID: Stream Mining through Group Lag Correlations

Enhanced BRAID

BRAID uses only one window at each smoothing level

Propose to use $b > 1$ windows instead, e.g. $b = 4$
The algorithm is the same as before ($b = 1$), with the exception that the bottom level has $2b$ coefficients
While computing $R(l)$, use a mixture of geometric and arithmetic progressions:

Page 25: BRAID: Stream Mining through Group Lag Correlations

Enhanced BRAID

Example of enhanced BRAID with $b = 4$

With $b = 1$, the enhanced algorithm is equal to the algorithm before
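One probing schedule consistent with these statements can be sketched as follows. It is an assumption, not a transcription of the slide's formula: the bottom level probes the $2b$ lags $0, \dots, 2b - 1$, and each level $h \ge 1$ probes the $b$ lags $(b + j) \cdot 2^h$ for $j = 0, \dots, b - 1$. The name `enhanced_lags` is hypothetical.

```python
def enhanced_lags(max_lag, b=4):
    """Mixed arithmetic/geometric lag schedule (assumed form, see lead-in)."""
    lags = list(range(min(2 * b, max_lag + 1)))  # bottom level: 0 .. 2b-1
    h = 1
    while True:
        step = 2 ** h                            # spacing doubles at each level
        level = [(b + j) * step for j in range(b)]
        level = [lag for lag in level if lag <= max_lag]
        if not level:
            break
        lags.extend(level)
        h += 1
    return lags


with_b4 = enhanced_lags(32, b=4)
with_b1 = enhanced_lags(32, b=1)
```

With $b = 4$ and a maximum lag of 32 the schedule is 0 to 7, then 8, 10, 12, 14, then 16, 20, 24, 28, then 32. With $b = 1$ it collapses to 0, 1, 2, 4, 8, 16, 32, the purely geometric progression of basic BRAID.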

Page 26: BRAID: Stream Mining through Group Lag Correlations
Page 27: BRAID: Stream Mining through Group Lag Correlations
Page 28: BRAID: Stream Mining through Group Lag Correlations
Page 29: BRAID: Stream Mining through Group Lag Correlations
Page 30: BRAID: Stream Mining through Group Lag Correlations
Page 31: BRAID: Stream Mining through Group Lag Correlations

Conclusion

Proposed BRAID to detect lag correlations on streaming data:
At any time
Low resource consumption
High accuracy

Page 32: BRAID: Stream Mining through Group Lag Correlations

Thank you very much