The Automatic Explanation of Multivariate Time Series (MTS)


Page 1: The Automatic Explanation of Multivariate Time Series (MTS)

The Automatic Explanation of Multivariate Time Series (MTS)

Allan Tucker

Page 2: The Automatic Explanation of Multivariate Time Series (MTS)

The Problem - Data

• Datasets which are Characteristically:
– High Dimensional MTS
– Large Time Lags
– Changing Dependencies
– Little or No Available Expert Knowledge

Page 3: The Automatic Explanation of Multivariate Time Series (MTS)

The Problem - Requirement

• Lack of Algorithms to Assist Users in Explaining Events where:
– Model Complex MTS Data
– Learnable from Data with Little or No User Intervention
– Transparency Throughout the Learning and Explaining Process is Vital

Page 4: The Automatic Explanation of Multivariate Time Series (MTS)

Contribution to Knowledge

• Using a Combination of Evolutionary Programming (EP) and Bayesian Networks (BNs) to Overcome Issues Outlined

• Extending Learning Algorithms for BNs to Dynamic Bayesian Networks (DBNs) with Comparison of Efficiency

• Introduction of an Algorithm for Decomposing High Dimensional MTS into Several Lower Dimensional MTS

Page 5: The Automatic Explanation of Multivariate Time Series (MTS)

Contribution to Knowledge (Continued)

• Introduction of New EP-Seeded GA Algorithm

• Incorporating Changing Dependencies

• Application to Synthetic and Real-World Chemical Process Data

• Transparency Retained Throughout Each Stage

Page 6: The Automatic Explanation of Multivariate Time Series (MTS)

Framework

[Diagram: framework components labelled Real Data, Synthetic Data, Pre-processing, Data Preparation, Variable Groupings, Model Building, Search Methods, Changing Dependencies, Explanation, Evaluation.]

Page 7: The Automatic Explanation of Multivariate Time Series (MTS)

Key Technical Points 1: Comparing Adapted Algorithms

• New Representation
• K2/K3 [Cooper and Herskovits]
• Genetic Algorithm [Larranaga]
• Evolutionary Algorithm [Wong]
• Branch and Bound [Bouckaert]
• Log Likelihood / Description Length
• Publications:
– International Journal of Intelligent Systems, 2001

Page 8: The Automatic Explanation of Multivariate Time Series (MTS)

Key Technical Points 2: Grouping

• A Number of Correlation Searches
• A Number of Grouping Algorithms
• Designed Metrics
• Comparison of All Combinations
• Synthetic and Real Data
• Publications:
– IDA99
– IEEE Trans System Man and Cybernetics 2001
– Expert Systems 2000

Page 9: The Automatic Explanation of Multivariate Time Series (MTS)

Key Technical Points 3: EP-Seeded GA

• Approximate Correlation Search Based on the One Used in Grouping Strategy
• Results Used to Seed Initial Population of GA
• Uniform Crossover
• Specific Lag Mutation
• Publications:
– Genetic Algorithms and Evolutionary Computation Conference 1999 (GECCO99)
– International Journal of Intelligent Systems, 2001
– IDA2001
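The two GA operators named on this slide can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes an individual is a fixed-length list of (a, b, lag) triples, reads "Specific Lag Mutation" as redrawing only the lag of a triple, and assumes lags run from 1 to a hypothetical `max_lag`. The function names are invented for the example.

```python
import random

def uniform_crossover(parent_a, parent_b, rng):
    """Swap each aligned gene between two equal-length parents with probability 0.5."""
    if len(parent_a) != len(parent_b):
        raise ValueError("this sketch assumes equal-length individuals")
    child_a, child_b = [], []
    for gene_a, gene_b in zip(parent_a, parent_b):
        if rng.random() < 0.5:
            gene_a, gene_b = gene_b, gene_a
        child_a.append(gene_a)
        child_b.append(gene_b)
    return child_a, child_b

def lag_mutation(individual, rate, max_lag, rng):
    """With probability `rate` per triple, redraw the lag only; the pair (a, b) is kept."""
    mutated = []
    for a, b, lag in individual:
        if rng.random() < rate:
            lag = rng.randint(1, max_lag)  # assumed lag range 1..max_lag
        mutated.append((a, b, lag))
    return mutated
```

Passing an explicit `random.Random` instance keeps runs repeatable, which helps when comparing search methods by number of function calls.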

Page 10: The Automatic Explanation of Multivariate Time Series (MTS)

Key Technical Points 4: Changing Dependencies

• Dynamic Cross Correlation Function for Analysing MTS
• Extend Representation
• Introduce a Heuristic Search - Hidden Controller Hill Climb (HCHC)
– Hidden Variables to Model State of the System
– Search for Structure and Hidden States Iteratively
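One way to picture a dynamic cross-correlation function is as an ordinary cross-correlation recomputed inside a sliding window, giving one row of lag correlations per window position. The sketch below assumes Pearson correlation and two equal-length series. The parameter names `win_len` and `win_jump` echo the Winlen and Winjump settings in the parameter slides, but the function itself is illustrative rather than the author's definition.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def dynamic_ccf(x, y, win_len, win_jump, max_lag):
    """One row per window position; row[lag] = corr(x[t - lag], y[t]) inside the window."""
    rows = []
    # Start at max_lag so that the lagged x window never runs off the front.
    for start in range(max_lag, len(x) - win_len + 1, win_jump):
        y_win = y[start:start + win_len]
        rows.append([
            pearson(x[start - lag:start - lag + win_len], y_win)
            for lag in range(max_lag + 1)
        ])
    return rows
```

A change in which lag carries the strongest correlation from one row to the next is the kind of signal that would suggest a changing dependency.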

Page 11: The Automatic Explanation of Multivariate Time Series (MTS)

Future Work

• Parameter Estimation

• Discretisation

• Changing Dependencies

• Efficiency

• New Datasets
– Gene Expression Data
– Visual Field Data

Page 12: The Automatic Explanation of Multivariate Time Series (MTS)

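DBN Representation

[Diagram: DBN over time slices t-4, t-3, t-2, t-1, t, with nodes a0(t), a1(t), a2(t), a3(t), a4(t) at time t and lagged nodes a2(t-2), a3(t-2), a4(t-3), a3(t-4).]

List of triples: (3,1,4) (4,2,3) (2,3,2) (3,0,2) (3,4,2)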
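The list of triples can be unpacked into explicit links. The sketch below assumes each triple reads (parent, child, lag), so that (3,1,4) means a3(t-4) influences a1(t). That reading matches the lagged nodes drawn on the slide, but it is an interpretation, and the function names are invented for the example.

```python
from collections import defaultdict

# Triples copied from the slide; the (parent, child, lag) reading is an assumption.
TRIPLES = [(3, 1, 4), (4, 2, 3), (2, 3, 2), (3, 0, 2), (3, 4, 2)]

def parents_by_child(triples):
    """Map each child variable index to its lagged parents [(parent, lag), ...]."""
    parents = defaultdict(list)
    for parent, child, lag in triples:
        parents[child].append((parent, lag))
    return dict(parents)

def describe(triples):
    """Render each link as 'a<parent>(t-<lag>) -> a<child>(t)'."""
    return [f"a{p}(t-{lag}) -> a{c}(t)" for p, c, lag in triples]
```

A flat list like this is convenient for evolutionary search because adding, removing or changing a link is a single edit to one triple.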

Page 13: The Automatic Explanation of Multivariate Time Series (MTS)

Sample DBN Search Results

[Two plots of Description Length against Function Calls for K3, EP and GA; panels: N = 5, MaxT = 10 and N = 10, MaxT = 60.]

Page 14: The Automatic Explanation of Multivariate Time Series (MTS)

Grouping

[Diagram: One High Dimensional MTS (A) is passed to 1. Correlation Search (EP), which produces a List of R triples (a, b, lag). The list is passed to 2. Grouping Algorithm (GGA), which produces groups G, for example {0,3} {1,4,5} {2}, giving Several Lower Dimensional MTS.]
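To show how a list of (a, b, lag) correlations can induce groups like {0,3} {1,4,5} {2}, the sketch below uses plain connected components (union-find) in place of the grouping genetic algorithm (GGA) named on the slide. It is a deliberately simpler stand-in: any two variables joined by a chain of correlations fall into the same group, and the lag is ignored. The function name and the example triples are hypothetical.

```python
def group_variables(n_vars, correlations):
    """Group variable indices 0..n_vars-1 by connectivity over (a, b, lag) triples."""
    parent = list(range(n_vars))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b, _lag in correlations:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    groups = {}
    for v in range(n_vars):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())
```

With six variables and the correlations (0,3,2), (1,4,5) and (4,5,1), this returns [[0, 3], [1, 4, 5], [2]]. Unlike a GGA, this stand-in cannot split a large connected cluster into smaller groups or trade off group size against correlation strength.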

Page 15: The Automatic Explanation of Multivariate Time Series (MTS)

Sample Grouping Results

[Figure: Original Synthetic MTS Groupings (variables 0 to 60) shown alongside Groupings Discovered from Synthetic Data.]

[Plot: Sample of Variables from a Discovered Oil Refinery Data Group; Magnitude (Temp etc) against Time, 1 to 3001.]

Page 16: The Automatic Explanation of Multivariate Time Series (MTS)

Parameter Estimation

• Simulate Random Bag (Vary R, s and c, e)
• Calculate Mean and SD for Each Distribution (the Probability of Selecting e from s)
• Test for Normality (Lilliefors’ Test)
• Symbolic Regression (GP) to Determine the Function for Mean and SD from R, s and c (e will be Unknown)
• Place Confidence Limits on the P(Number of Correlations Found e)
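The first two bullets can be illustrated with a small Monte Carlo sketch. The slide does not define its symbols, so the reading used here is an assumption: s is the number of candidate correlations in the bag, e the number of genuine ones, c the number of candidates drawn, and R the size of the list that keeps what is found. The function name is invented for the example.

```python
import random
import statistics

def simulate_random_bag(s, e, c, R, trials, seed=0):
    """Estimate the mean and SD of genuine correlations found by random drawing."""
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        drawn = rng.sample(range(s), c)              # c distinct candidates out of s
        hits = sum(1 for item in drawn if item < e)  # items 0..e-1 stand for genuine ones
        counts.append(min(hits, R))                  # the list holds at most R of them
    return statistics.mean(counts), statistics.stdev(counts)
```

Repeating this over a grid of R, s, c and e values would give the table of means and SDs that the later bullets test for normality and fit with symbolic regression.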

Page 17: The Automatic Explanation of Multivariate Time Series (MTS)

EP-Seeded GA

[Diagram: EP produces the Final EPList, with entries 0, 1, 2 up to EPListSize, each a triple (a,b,l). The list seeds the Initial GAPopulation, with individuals 0, 1, 2 up to GAPopsize, each a list of triples ((a,b,l),(a,b,l)…(a,b,l)). The GA then searches from this population to produce the DBN.]
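The seeding step can be sketched as building each GA individual from triples already found by EP. The slides do not say how individuals are assembled from the EPList, so the random sampling below, the fixed `links_per_individual` and the function name are all assumptions made for illustration.

```python
import random

def seed_population(ep_list, pop_size, links_per_individual, rng):
    """Build each GA individual by sampling distinct (a, b, l) triples from the EP list."""
    if links_per_individual > len(ep_list):
        raise ValueError("cannot sample more distinct triples than the EP list holds")
    return [rng.sample(ep_list, links_per_individual) for _ in range(pop_size)]
```

Starting from triples that already show some correlation, rather than from random links, is what would let the GA begin its search from a better region.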

Page 18: The Automatic Explanation of Multivariate Time Series (MTS)

EP-Seeded GA Results

[Two plots of Description Length against Function Calls comparing EP, EP Seeded GA (c=20%) and EP Seeded GA (c=100%); panels: N = 10, MaxT = 60 and N = 20, MaxT = 60.]

Page 19: The Automatic Explanation of Multivariate Time Series (MTS)

Varying the value of c

[Plot of Log Likelihood against Function Calls (0 to 5000) for EP and for c = 10%, 20%, 30%, 50%, 70% and 100%.]

Page 20: The Automatic Explanation of Multivariate Time Series (MTS)

Explanation

Time points: t, t-1, t-11, t-13, t-16, t-20, t-60

Probabilities:
P(TGF in state_0) = 1.0
P(TT in state_0) = 1.0
P(BPF in state_3) = 1.0
P(TT in state_1) = 0.446
P(TGF in state_3) = 1.0
P(SOT in state_0) = 0.314
P(C2% in state_0) = 0.279
P(T6T in state_0) = 0.347
P(RinT in state_0) = 0.565

Page 21: The Automatic Explanation of Multivariate Time Series (MTS)

Changing Dependencies

[Two plots of Variable Magnitude against Time (Minutes) for A/M_GB and TGF.]

Page 22: The Automatic Explanation of Multivariate Time Series (MTS)

Dynamic Cross-Correlation Function

[Surface plot of correlation (bands from -0.3 to 0.4) against Time Lag (1 to 61) and Window Position (S1 to S85).]

Page 23: The Automatic Explanation of Multivariate Time Series (MTS)

Hidden Variable - OpState

[Diagram: DBN over time slices t-4, t-3, t-2, t-1, t, with nodes a2(t), a2(t-1), a3(t-2), a0(t-4) and the hidden variable OpState2.]

Page 24: The Automatic Explanation of Multivariate Time Series (MTS)

Hidden Controller Hill Climb

[Diagram: loop over < DBN_List > and < Segment_Lists >]
• Update Segment_Lists through Op_State Parameter Estimation
• Update DBN_List through DBN Structure Search
• Score

Page 25: The Automatic Explanation of Multivariate Time Series (MTS)

HCHC Results - Oil Refinery Data

[Plot of Most Significant Correlation (-1 to 1) against Window Position (0 to 80) for A/M_GB, SOT and T6T, with Segments marked.]

Page 26: The Automatic Explanation of Multivariate Time Series (MTS)

HCHC Results - Synthetic Data

                               MTS 1    MTS 2    MTS 3
Number of Original Links       12       26       16
Spurious Links                 2.3      2.9      4.0
Implicit Links                 2.3      1.0      0.4
Missed Links                   1.0      2.8      1.4
Total SD                       5.6      6.7      5.8
Original Segmentation Length   1000     1000     500
Segmentation Error             15.89    16.08    14.157
Missed Segmentations           0.6      0.0      1.2
Spurious Segmentation          0.9      0.5      0.8

• Generate Data from Several DBNs
• Append each Section of Data Together to Form One MTS with Changing Dependencies
• Run HCHC

Page 27: The Automatic Explanation of Multivariate Time Series (MTS)

Explanation

Time points: t, t-1, t-3, t-5, t-6, t-9

Probabilities:
P(OpState1 is 0) = 1.0, P(a1 is 0) = 1.0, P(a0 is 0) = 1.0, P(a2 is 1) = 1.0
P(OpState1 is 0) = 1.0, P(a1 is 1) = 1.0, P(a0 is 0) = 1.0, P(a2 is 1) = 1.0
P(a2 is 0) = 0.758
P(a2 is 0) = 0.545
P(a0 is 0) = 0.968
P(a0 is 1) = 0.517
P(OpState0 is 0) = 0.519
P(a0 is 1) = 0.778
P(OpState0 is 0) = 0.720

Page 28: The Automatic Explanation of Multivariate Time Series (MTS)

Explanation

Time points: t, t-1, t-3, t-5, t-6, t-7, t-9

Probabilities:
P(OpState1 is 4) = 1.0, P(a1 is 0) = 1.0, P(a0 is 0) = 1.0, P(a2 is 1) = 1.0
P(OpState1 is 4) = 1.0, P(a1 is 1) = 1.0, P(a0 is 0) = 1.0, P(a2 is 1) = 1.0
P(a2 is 1) = 0.570
P(a2 is 1) = 0.974
P(a0 is 0) = 0.506
P(a0 is 1) = 0.549
P(OpState2 is 3) = 0.210
P(a2 is 0) = 0.882
P(OpState2 is 4) = 0.222

Page 29: The Automatic Explanation of Multivariate Time Series (MTS)

Process Diagram

[Diagram of the process with variables labelled: TT, T6T, T36T, RBT, SOTT11, SOFT13, TGF, BPF, %C3, %C2, RINT, FF, PGM, PGB, AFT, C11/3T.]

Page 30: The Automatic Explanation of Multivariate Time Series (MTS)

Typical Discovered Relationships

[Diagram of discovered links between the process variables: TT, T6T, T36T, RBT, SOTT11, SOFT13, AFT, TGF, BPF, %C3, %C2, RINT, FF, C11/3T, PGM, PGB.]

Page 31: The Automatic Explanation of Multivariate Time Series (MTS)

Parameters

DBN Search    GA             EP
PopSize       100            10
MR            0.1            0.8
CR            0.8            ---
Gen           Based on FC    Based on FC

Correlation Search
c - Approx. 20% of s
R - Approx. 2.5% of s

Grouping GA   Synth. 1    Synth. 2-6            Oil
PopSize       150         100                   150
CR            0.8         0.8                   0.8
MR            0.1         0.1                   0.1
Gen           150         100 (1000 for GPV)    150

Page 32: The Automatic Explanation of Multivariate Time Series (MTS)

Parameters

EP-Seeded GA
c - Approx. 20% of s
EPListSize - Approx. 2.5% of s
GAPopSize - 10
MR - 0.1
CR - 0.8
LMR - 0.1
Gen - Based on FC

HCHC              Oil        Synthetic
DBN_Iterations    1×10^6     5000
Winlen            1000       200
Winjump           500        50