Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi...

23
Discovering Interesting Sub- paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K. Snyder [email protected]

Transcript of Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi...

Page 1: Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K.

Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of

Results

Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K. Snyder

[email protected]

Page 2: Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K.

2

Outline

• Introduction• Problem Formulation & Challenges • Computational Solutions • Experimental Evaluation • Case Study • Conclusion and Future Work

Page 3: Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K.

3

Interesting ST Sub-path• Interesting subsets of ST paths

৹ Climate Change

৹ Transport Science

৹ Environmental Monitoring

Speed profile along a trajectory on I-95

Mississippi river

Source: http://ops.fhwa.dot.gov/tolling_pricing/value_pricing/pubs_reports/projectreports/i95managedlanes/index.htm

Sea level rise along coastal areas

Source: http://scienceblogs.com/intersection/2009/01/federal_report_warns_of_rising.php

Source: http://blog.seattlepi.com/environment/

Page 4: Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K.

4

Sub-path of Abrupt Change• Spatial sub-path of Abrupt Change

৹ Sharp change in vegetation cover

৹ Transition between ecological zones (ecotones)

৹ Moves in response to climate change

The change is enduringly abrupt

W1=[12N, 17N]W2W3

A plot of vegetation cover along 18.5E longitude (the red line) from GIMMS vegetation dataset [1]

Vegetation Cover in Africa in NDVI (normalized difference vegetation index)

Page 5: Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K.

5

Related WorkInteresting Sub-path Discovery

Interesting point/unit sub-path

Interesting sub-path with arbitrary length (our work)

2-D: Edge detection[4]

1-D: Change point detection, e.g., CUSUM[3]

Page 6: Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K.

6

Our contribution

• Formalize the Interesting Sub-path Discovery problem

• A novel computational solution : SEP

• Cost model and analysis on its performance

• Case study in real application

Page 7: Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K.

7

Problem Formulation: Basic Concepts

• Interesting Sub-path (ISP):(1). Interest Measure: Function Fspi(i, j) R, R is a real value. Fspi is an

algebraic function[5] (e.g., mean=sum/count)

(2). Interestingness test T: Fspi {True, False}

(3). Example: “average increase is at least 3.5”

8 2 3 2 7 12 16 13 18 23 121

1 2 3 4 5 6 7 8 9 10 11 12

Attribute value

Location

Difference value: 7 -6 1 -1 5 5 4 -3 5 5 -11

Unit Sub-path : 1-2 2-3 3-4 4-5 5-6 6-7 7-8 8-9 9-10 10-11 11-12

•Sub-path: A contiguous subset of a path

•Unit sub-path: two neighboring locations, length = 1. ৹A value is associated with each unit sub-path.

•Dominant ISP (DISP):৹An ISP that is not a subset of any other ISP.

Slope of (5, 11) = 3.5 !

Aggregate Functions

Distributive: SUM, COUNT. SUM(1, 5)= SUM(SUM(1,3), SUM(3,5))

Algebraic: AVG. AVG = SUM/COUNT.

Holistic: MEDIAN

Page 8: Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K.

8

Problem Statement

• Given৹ A path S in a ST framework with n unit sub-paths

৹ A function f of values associated with each sub-path in S

৹ A interestingness measure (algebraic function) Fspi: Rn R

৹ A test function T: R {True, False}

• Find৹ All the dominant interesting sub-paths (DISP) in S

• Objective৹ Reduce computational cost

• Constraints৹ Correctness & CompletenessDifference value: 7 -6 1 -1 5 5 4 -3 5 5 -11

Unit Sub-path : 1-2 2-3 3-4 4-5 5-6 6-7 7-8 8-9 9-10 10-11 11-12

Page 9: Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K.

9

Challenges• No pre-defined maximum length for DISPs

৹ E.g., the length can range from 1 to then |S|

• Pattern interestingness is lack of monotonicity৹ Interest measures are usually algebraic functions

৹ E.g., sub-path (8, 9) in sub-path (5, 11).

• The data volume can be very large.৹ Long time series/Fine resolution images.

৹ GPS Trajectories.

Page 10: Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K.

10

Step 1: ISP identification• Exhaustively enumerate all the sub-paths• Scan each sub-path to compute and test the interestingness

Step 2: Dominated ISP elimination • For each ISP in the candidate set, eliminate all the ISPs it dominates

Computational Solutions: Naive Approach

• Bottleneck 1: Repetitive scans of sub-paths to computer Fspi.• Bottleneck 2: Many dominated sub-paths are generated.

Page 11: Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K.

11

Computational Solution: SEP Approach• Solution 1: Build lookup tables for distributive functions

৹ E.g., SUM(3,5)=SUM(1,5)-SUM(1,3)

৹ Built in linear time, lookup in constant time

৹ Reversible Aggregate Function[6] : sum, count, etc.

• Solution 2: Design efficient enumeration strategies ৹ Traverse the sub-path space in certain order

৹ Following the dominance relationship

• The Sub-path Enumeration and Pruning (SEP) Approach

Sub-path (1,2) (1,3) (1,4) (1,5) (1,6) (1,7) (1,8) (1,9) (1,10)

(1,11)

(1,12)

SUM 7 1 2 1 6 11 15 12 17 22 11

Difference value: 7 -6 1 -1 5 5 4 -3 5 5 -11

Unit Sub-path : 1-2 2-3 3-4 4-5 5-6 6-7 7-8 8-9 9-10 10-11 11-12

Page 12: Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K.

12

• Step 0: Build the lookup table by scanning the entire path

• Step 1: Sub-path enumeration

• Step 2: Dominated sub-path elimination

{Identical to that of Naive Approach}

SEP with Row-wise Traversal

1-2

1-3

1-4

1-5

1-6

1-7

1-8

1-9

1-10

1-11

1-12

2-3

2-4

2-5

2-6

2-7

2-8

2-9

2-10

2-11

2-12

3-4

3-5

3-6

3-7

3-8

3-9

3-10

3-11

3-12

4-5

4-6

4-7

4-8

4-9

4-10

4-11

4-12

5-6

5-7

5-8

5-9

5-10

5-11

5-12

6-7

6-8

6-9

6-10

6-11

6-12

7-8

7-9

7-10

7-11

7-12

8-9

8-10

8-11

8-12

9-10

9-11

9-12

10-11

10-12 11-12

Page 13: Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K.

13

SEP with Top-down Traversal

1-2

1-3

1-4

1-5

1-6

1-7

1-8

1-9

1-10

1-11

1-12

2-3

2-4

2-5

2-6

2-7

2-8

2-9

2-10

2-11

2-12

3-4

3-5

3-6

3-7

3-8

3-9

3-10

3-11

3-12

4-5

4-6

4-7

4-8

4-9

4-10

4-11

4-12

5-6

5-7

5-8

5-9

5-10

5-11

5-12

6-7

6-8

6-9

6-10

6-11

6-12

7-8

7-9

7-10

7-11

7-12

8-9

8-10

8-11

8-12

9-10

9-11

9-12

10-11

10-12 11-12

1-2 2-3 3-4 4-5 5-6 6-7 7-8 8-9 9-10 10-11 11-12

• Traversal space Grid-based DAG• A breadth-first traversal on the G-DAG

৹ A node can be visited only if none of its

predecessors is pruned.

৹ Determine the number of predecessors

1

1 2

1 2 2

1 2 2 2

1 2 2 2 2

1 2 2 2 2 2

1 2 2 2 2 2 2

1 2 2 2 2 2 2 2

1 2 2 2 2 2 2 2 2

1 2 2 2 2 2 2 2 2 2

0 1 1 1 1 1 1 1 1 1 1

1 2 3 4 5 6 7 8 9 10 11 12

1

2

3

4

5

6

7

8

9

10

11

12

Page 14: Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K.

14

Experimental Evaluation(1)

(1) PLR = 0.1 (worst case for SEP ) (2) PLR = 1 (best case for SEP top-down)

• Pattern Length Ratio (PLR)• Longest DISP’s length against total number of unit sub-paths

Run time: Naive vs. SEP two designs.

Page 15: Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K.

15

Experimental Evaluation(2) Performance of the two traversal designs with PLR: 0.1 1

Summary:

(1)SEP is scalable & efficient compared to the Naive approach.

(2) Top-down outperforms row-wise when data has longer DISPs.

Page 16: Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K.

16

Case Study: Results on Spatial Paths

Input : GIMMS vegetation cover in NDVI, Aug. 1-15, 1981, Africa.

Output : Sub-paths with vegetation cover change in above data.

• Interest Measure: “Sameness Degree (SD)”

• “Average value change” against “average value change that >=Θa”

Page 17: Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K.

17

Conclusion and Future Work

• Conclusion৹ SEP is a novel computational solution to the Interesting

Sub-path Discovery problem

৹ It is effective, efficient and scalable.

৹ A cost model is studied to analyze the performance tradeoff.

• Future Work৹ Improve algorithmic design and evaluation metric

৹ Interesting Spatial-Temporal Regions.

৹ Application on other domains (transport science, etc).

Page 18: Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K.

18

Acknowledgements and References

• We would like to thank ৹ ACMGIS reviewers৹ Sponsor of this work: NSF, USDOD৹ Spatial Database and Data Mining Group @ UMN৹ Kim Koffolt

References[1] Tucker, C.J., J.E. Pinzon, M.E. Brown. Global inventory modeling and mapping studies. Global Land Cover Facility, University of Maryland, College Park, Maryland, 1981-2006.[2] Joint Institute for the Study of the Atmosphere and Ocean(JISAO). Sahel rainfall index. http://jisao.washington.edu/data/sahel/.[3] E. Page. Continuous inspection schemes. Biometrika, 41(1/2):100-115, 1954.[4] J. Canny. A computational approach to edge detection. Readings in computer vision: issues, problems, principles, and paradigms, 184(87-116):86, 1987.[5] S. Shekhar and S. Chawla. Spatial Ddatabases: A Tour. Prentice Hall, 2003 (ISBN 013-017480-7).[6] S. Cluet and G. Moerkotte. Efficient evaluation of aggregates on bulk types. In In Proc. Int. Workshop on Dat

abase Programming Languages, 1995

Page 19: Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K.

19

Sub-path of Abrupt Change• Spatial sub-path of Abrupt Change

৹ Sharp change in vegetation cover

৹ Transition between ecological zones (ecotones)

৹ Moves in response to climate change

The change is enduringly abrupt

W1=[12N, 17N]W2W3

A plot of vegetation cover along 18.5E longitude (the red line) from GIMMS vegetation dataset [1]

Vegetation Cover in Africa in NDVI (normalized difference vegetation index)

• Temporal sub-path of Abrupt Change৹ Abrupt shift in precipitation, temperature, etc.

৹ Climate change detection.

Smoothed Sahel precipitation anomaly (JJASO)

Raw Sahel precipitation anomaly (JJASO)[2]

Page 20: Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K.

21

SEP with Top-down Traversal(2)

• Determine the number of predecessors

• Use an array to record the number

of predecessors visited

1-2 2-3 3-4 4-5 5-6 6-7 7-8 8-9 9-10 10-11 11-12

Case # Predecessors

Root node 0

Boundary node 1

Inner node 2

5

11 2 10

Page 21: Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K.

22

• Step 0: Build the lookup table by scanning the entire path

• Step 1: ISP Identification

• Step 2: Not Needed

ptv[][] : predecessors to visit; Q[]: queue for breadth-first traversal;

Q.Enqueue (S)

While Q is not empty W = Q.pop() Compute Fspi(W) using the lookup tables If T(Fspi) == TRUE Then Output W Next Loop End IF For each successor (i, j) of W update ptv[i][j] If ptv[i][j]==0 Then Q.enqueue([i,j]) End For

End While

SEP with Top-down Traversal(3)

Page 22: Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K.

23

Theoretical Analysis• n: Number of unit sub-paths

Approach Naive SEP Row-wise SEP Top-down

Best case time complexity

O(n3) O(n2) O(n)

Worst case time complexity

O(n4) O(n2) O(n2)

Space complexity O(n) O(n) O(n2)

Page 23: Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K.

24

Case Study: Results on Temporal Dimension

Temporal Sub-paths of abrupt precipitation change in the Sahel region, Africa.