Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

Hierarchy Navigation Framework: Supporting Scalable Interactive Exploration over Large Databases. Nishant Mehta, Elke A. Rundensteiner and Matt Ward, Computer Science Department, Worcester Polytechnic Institute. IDEAS’05. Thank you to NSF for several IDM grants for XMDV project.


Page 1: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

1

Hierarchy Navigation Framework: Supporting Scalable Interactive Exploration over Large Databases

Nishant Mehta, Elke A. Rundensteiner and Matt Ward
Computer Science Department
Worcester Polytechnic Institute

IDEAS’05

Thank you to NSF for several IDM grants for the XMDV project.

Page 2: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

3

XmdvTool: Multivariate Data Visualization

Example: Cars Data Set

MPG   Cyl.  HP    Wt.
18    8     130   3504
17    8     132   3700
...   ...   ...   ...
40    2     100   2500

[Figure: parallel coordinate display of a dataset with 4096 points in XmdvTool 6.0; values labeled on the axes: 18, 8, 130, 3504]

Page 3: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

4

Hierarchical Displays [Fua:99]

Cars Data Set

Base Data Points

      MPG    Cyl.  HP      Wt.
C     6      8     130     3504
D     4      8     132     3000
G     10     6     110     2800
H     12     2     70      2100
I     15     2     80      2200
J     15     2     100     2500

Clusters

      MPG    Cyl.  HP      Wt.
A     10.33  4.66  103.66  2684
B     5      8     131     3252
E     13     3     90      2400
F     12.3   3.33  86.66   2366.6

Structure-based brush components:
• b: level of detail
• d: focus area
• e: focus extents

[Figure: cluster tree over nodes A to J with a structure-based brush overlaid]
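The cluster rows in the Cars table appear to be per-column means of the base points under each cluster. The sketch below checks that reading. The grouping of leaves under clusters (`MEMBERS`) is an assumption inferred from the example, not something the slide states in text.

```python
# Base data points from the Cars example: (MPG, cylinders, HP, weight).
BASE = {
    "C": (6, 8, 130, 3504),
    "D": (4, 8, 132, 3000),
    "G": (10, 6, 110, 2800),
    "H": (12, 2, 70, 2100),
    "I": (15, 2, 80, 2200),
    "J": (15, 2, 100, 2500),
}

# Assumed grouping: which leaves sit under each cluster node.
MEMBERS = {"A": "CDGHIJ", "B": "CD", "E": "GHIJ", "F": "GHI"}


def cluster_mean(cluster):
    """Return the per-column mean of the base points under a cluster."""
    rows = [BASE[leaf] for leaf in MEMBERS[cluster]]
    return tuple(sum(col) / len(rows) for col in zip(*rows))


for name in sorted(MEMBERS):
    print(name, [round(v, 2) for v in cluster_mean(name)])
```

Under this assumption the computed rows agree with the table up to rounding, which suggests each cluster node stores an aggregate of its subtree.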

Page 4: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

5

Hierarchical Displays

Page 5: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

6

Problems: Hierarchical Display

Achieved: screen-space solution to the clutter problem.

But the data handling problem remains:
• Cluster tree size is greater than the initial tree
• Cluster tree may not fit into main memory
• Structure-based brush semantics involve recursive searches over the cluster tree

Page 6: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

7

Goal

Overall goal:
• Scale hierarchical displays to support navigation over large hierarchies

Subgoals:
• Support navigation over large-scale persistent data
    • Store hierarchies on disk
    • Map navigation operations to efficient queries
• Meet interactive response requirements

Page 7: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

8

Overview of Approach

• Support navigation operations over large-scale persistent data
• Meet interactive response requirements

Techniques: Hierarchy Encoding, Spatial Indexing, Caching, Prefetching

Page 8: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

9

Hierarchy Encoding

Problem:
• Structure-based brush selection semantics involve recursive search
• Recursive search over secondary storage is slow

Solution: hierarchy encoding
• Push recursive processing into a precomputation step
• Precompute a label for each node in the hierarchy
• Map the recursive search to an equivalent non-recursive one

[Figure: hierarchical data passes through a labeling step (hierarchy encoding) before being stored in the database]
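The precomputation step can be sketched as a single recursive pass that labels every node with a horizontal interval, so that later searches need no recursion. This is a minimal illustration, not the authors' implementation. The tree shape in `CHILDREN` is an assumption inferred from the running example, and giving each leaf an equal-width slot in left-to-right order is also an assumption.

```python
from fractions import Fraction

# Assumed cluster tree for the running example: parent -> ordered children.
CHILDREN = {
    "A": ["B", "E"],
    "B": ["C", "D"],
    "E": ["F", "J"],
    "F": ["G", "H", "I"],
}


def count_leaves(node):
    """Number of leaves in the subtree rooted at node."""
    kids = CHILDREN.get(node, [])
    return 1 if not kids else sum(count_leaves(k) for k in kids)


def label(node, lo, width, out):
    """Assign (hmin, hmax) to node and its descendants; return the next free position."""
    kids = CHILDREN.get(node, [])
    if not kids:
        out[node] = (lo, lo + width)
        return lo + width
    start = lo
    for kid in kids:
        lo = label(kid, lo, width, out)
    out[node] = (start, lo)  # a parent's interval covers all of its children
    return lo


def precompute_intervals(root):
    """Run the one-off recursive pass and return {node: (hmin, hmax)}."""
    out = {}
    label(root, Fraction(0), Fraction(1, count_leaves(root)), out)
    return out


INTERVALS = precompute_intervals("A")
for name in sorted(INTERVALS):
    print(name, INTERVALS[name])
```

Once the labels are stored with the nodes, the recursion never has to be repeated at query time.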

Page 9: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

10

Structure-Based Brush Semantics [Fua:99]

Node selection is based on 2 steps:
• Horizontal selection: subtree (e1, e2)
• Vertical selection: level of detail (lod)

[Figure: cluster tree over nodes A to J annotated with level-of-detail values (0.6, 0.4, 0.2, 0)]

Page 10: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

11

Horizontal Selection

Aim: Select the subtree that the user is interested in viewing.

Approach:
• Using the brush focus extents (e1, e2), select a set of base points.
• Propagate the selection: select parent(n) if n is selected.

[Figure: cluster tree over nodes A to J with the selected leaves and selected clusters highlighted; (e1,e2) = (2/6, 11/12), lod=0.4]

Page 11: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

12

Non-Recursive Horizontal Selection

Offline:
• Precompute an interval (hmin, hmax) for each node
• The interval of a parent includes the interval of each child

Online:
• Search for nodes whose interval intersects the brush interval (e1, e2)

[Figure: cluster tree over nodes A to J annotated with horizontal intervals. Leaves: (0,1/6), (1/6,2/6), (2/6,3/6), (3/6,4/6), (4/6,5/6), (5/6,1). Clusters: (0,2/6), (2/6,5/6), (2/6,1). Root: (0,1). Brush: (e1,e2) = (2/6, 11/12), lod=0.4]
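The online step reduces to a flat interval-intersection test over the precomputed labels. The sketch below is illustrative only. The assignment of intervals to node names is an assumption read off the example figure, and the `closed` flag is a hypothetical addition that contrasts closed comparisons with a stricter overlap test.

```python
from fractions import Fraction as Fr

# Assumed (hmin, hmax) label per node, in sixths, for the running example.
SIXTHS = {
    "A": (0, 6), "B": (0, 2), "E": (2, 6), "F": (2, 5),
    "C": (0, 1), "D": (1, 2), "G": (2, 3), "H": (3, 4),
    "I": (4, 5), "J": (5, 6),
}
NODES = {n: (Fr(lo, 6), Fr(hi, 6)) for n, (lo, hi) in SIXTHS.items()}


def horizontal_select(nodes, e1, e2, closed=True):
    """Return the nodes whose interval intersects the brush interval (e1, e2).

    closed=True also accepts intervals that only touch the brush boundary;
    closed=False requires an overlap of positive length.
    """
    if closed:
        def hit(lo, hi):
            return lo <= e2 and hi >= e1
    else:
        def hit(lo, hi):
            return lo < e2 and hi > e1
    return sorted(n for n, (lo, hi) in nodes.items() if hit(lo, hi))


brush = (Fr(2, 6), Fr(11, 12))
print(horizontal_select(NODES, *brush))                # touching nodes included
print(horizontal_select(NODES, *brush, closed=False))  # proper overlaps only
```

With the example brush, B and D only touch the brush at 2/6, so they are returned by the closed test but not by the strict one. The strict result (G, H, I, J and their ancestors F, E, A) matches what propagating the selection from the base points upward would give.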

Page 12: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

13

Vertical Selection

Aim: Select points at the desired lod (lod handle of the SBB).

Approach: Explore each branch starting at the root to find the first node n with lod(n) <= lod(brush).

[Figure: cluster tree over nodes A to J annotated with level-of-detail values (0.6, 0.5, 0.3, 0.2, 0) and a cut at lod=0.4. SBB: (e1,e2) = (2/6, 11/12), lod=0.4]

Page 13: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

14

Non-Recursive Vertical Selection

Node n satisfies the vertical selection criteria iff:

lod(n) <= lod(brush) < lod(parent(n))

Each node n has vertical extents (vmin, vmax), so the test becomes:

vmin <= lod(brush) < vmax

[Figure: cluster tree over nodes A to J annotated with vertical extents (vmin, vmax), including (0.2, 0.6), (0.5, 0.6), (0.3, 0.5), (0, 0.5), (0, 0.3) and (0, 0.2). SBB: (e1,e2) = (2/6, 11/12), lod=0.4]
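A small sketch of the vertical test: each node's extents are derived once from its own lod and its parent's lod, after which selection is a flat filter. The `PARENT` map and the lod values per node are assumptions read off the example figure, and `ROOT_VMAX = 1.0` is an assumed upper bound for the root, which has no parent.

```python
# Assumed tree and level-of-detail values for the running example.
PARENT = {
    "B": "A", "E": "A", "C": "B", "D": "B",
    "F": "E", "J": "E", "G": "F", "H": "F", "I": "F",
}
LOD = {
    "A": 0.6, "B": 0.2, "E": 0.5, "F": 0.3,
    "C": 0.0, "D": 0.0, "G": 0.0, "H": 0.0, "I": 0.0, "J": 0.0,
}
ROOT_VMAX = 1.0  # assumption: top of the lod range


def vertical_extents():
    """Precompute (vmin, vmax) = (lod(n), lod(parent(n))) for every node."""
    return {
        n: (LOD[n], LOD[PARENT[n]] if n in PARENT else ROOT_VMAX)
        for n in LOD
    }


def vertical_select(extents, lod_brush):
    """Return the nodes with vmin <= lod(brush) < vmax, without any recursion."""
    return sorted(
        n for n, (vmin, vmax) in extents.items() if vmin <= lod_brush < vmax
    )


EXTENTS = vertical_extents()
print(vertical_select(EXTENTS, 0.4))
```

For lod(brush) = 0.4 this picks exactly one node on every root-to-leaf path, which is the cut through the tree that the recursive descent would find.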

Page 14: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

15

Non-Recursive Selection

[Figure: cluster tree over nodes A to J annotated with both horizontal intervals (hmin, hmax) and vertical extents (vmin, vmax)]

Selects all nodes that satisfy:
• hmin <= e2 and hmax >= e1
• vmin <= lod(brush) < vmax

SBB: (e1,e2) = (2/6, 11/12), lod=0.4
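Because both conditions are plain comparisons on precomputed labels, the whole selection can be written as one non-recursive query. The sketch below uses an in-memory SQLite table purely for illustration (the presentation itself reports using Oracle). The table name `nodes`, its columns, and the per-node label values are assumptions based on the running example.

```python
import sqlite3

# Assumed labels per node: (name, hmin, hmax, vmin, vmax).
ROWS = [
    ("A", 0 / 6, 6 / 6, 0.6, 1.0),
    ("B", 0 / 6, 2 / 6, 0.2, 0.6),
    ("C", 0 / 6, 1 / 6, 0.0, 0.2),
    ("D", 1 / 6, 2 / 6, 0.0, 0.2),
    ("E", 2 / 6, 6 / 6, 0.5, 0.6),
    ("F", 2 / 6, 5 / 6, 0.3, 0.5),
    ("G", 2 / 6, 3 / 6, 0.0, 0.3),
    ("H", 3 / 6, 4 / 6, 0.0, 0.3),
    ("I", 4 / 6, 5 / 6, 0.0, 0.3),
    ("J", 5 / 6, 6 / 6, 0.0, 0.5),
]


def select_nodes(conn, e1, e2, lod):
    """Run the combined horizontal and vertical test as a single range query."""
    cur = conn.execute(
        "SELECT name FROM nodes "
        "WHERE hmin <= ? AND hmax >= ? AND vmin <= ? AND ? < vmax "
        "ORDER BY name",
        (e2, e1, lod, lod),
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes ("
    "name TEXT PRIMARY KEY, hmin REAL, hmax REAL, vmin REAL, vmax REAL)"
)
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?, ?)", ROWS)

print(select_nodes(conn, 2 / 6, 11 / 12, 0.4))
```

With the comparisons exactly as written, B is returned alongside F and J because its hmax equals e1; moving e1 slightly to the right drops it.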

Page 15: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

16

2D Hierarchy Map

[Figure: 2D hierarchy map shown next to the labeled cluster tree. Horizontal axis: focus extent from 0 to 1 in steps of 1/6. Vertical axis: level of detail (0, 0.2, 0.3, 0.5, 0.6, 1.0). Each node A to J is drawn as a rectangle spanning its (hmin, hmax) and (vmin, vmax). The brush is drawn at height lod between e1 and e2. SBB: (e1,e2) = (2/6, 11/12), lod=0.4]

Page 16: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

17

Properties of 2D Hierarchy Map

• Progressive Tree Structure
• Space Filling
• Non-Overlapping

[Figure: 2D hierarchy map with nodes A to J; horizontal axis from 0 to 1 in steps of 1/6, vertical axis with levels 0.2, 0.3, 0.5, 0.6 and 1.0]

Page 17: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

18

Navigation operations in 2D Hierarchy Map

[Figure: 2D hierarchy map with the brush overlaid; the nodes intersected by the brush are marked as selected]

Page 18: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

20

Spatial Index

• Q searches for nodes intersecting the structure-based brush
• Q is a spatial range query over spatial objects
• A spatial index (R-Tree index) can make these searches faster

[Figure: 2D hierarchy map with the brush overlaid]

Page 19: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

26

Next

Caching and Prefetching

Page 20: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

27

User Trace Characteristics [Doshi:2003]

Motivating prefetching:
• Presence of idle time
• Predictability of user movements (user inertia)

Motivating caching:
• Locality of exploration
• Contiguous queries have similar answers

[Figure: 2D hierarchy map with the brush overlaid]

Page 21: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

28

Cache Design

Purpose:
• Minimize system latency

Design issues:
• Cache Organization
• Cache Lookup Policy
• Cache Replacement Policy
• Computation of Remainder Queries

Page 22: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

29

Cache Organization

• Contiguous chunk of main memory that stores recently fetched nodes
• Each node has a descriptor: horizontal and vertical extents

[Figure: 2D hierarchy map in the database next to the 2D hierarchy map of the cache contents, with regions marked empty or occupied]

Page 23: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

30

Cache Lookup

Aim: Find nodes in the cache that lie in the current brush.

Cache lookup options:
• Sequential scan, or
• Main-memory spatial index
    • Advantage: faster cache lookup
    • Disadvantage: frequent index updates

[Figure: 2D hierarchy map of the cache contents with the brush overlaid; regions marked empty, occupied or selected]

Page 24: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

31

Cache Replacement Policy

Aim: Make room for new nodes by replacing the node with the least probability of being referenced.

Approach: Exploit general user trace characteristics.
• Locality of exploration: spatial locality, exploited by the Distance policy
• Contiguous queries have similar answers: temporal locality, exploited by the LRU policy

Page 25: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

32

Distance Replacement Policy

Idea: Replace the object furthest away (in 2D space) from the current brush.

Distance: Length of the line segment that joins the centers of 2 brushes.

Realization:
• Maintain a brush store
• Select the victim brush with maximum distance from the current brush
• Replace individual cached nodes in the victim brush
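Victim selection under the distance policy can be sketched as follows. The `Brush` class and `pick_victim` are hypothetical names. Treating a brush's center as the midpoint of its focus extents at its lod height is an assumption about how the brush is placed in the 2D hierarchy map.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Brush:
    e1: float   # left focus extent
    e2: float   # right focus extent
    lod: float  # level of detail

    def center(self):
        """Assumed center of the brush in the 2D hierarchy map."""
        return ((self.e1 + self.e2) / 2, self.lod)


def distance(a, b):
    """Length of the line segment joining the centers of two brushes."""
    return math.dist(a.center(), b.center())


def pick_victim(brush_store, current):
    """Return the stored brush furthest from the current brush, or None if empty."""
    if not brush_store:
        return None
    return max(brush_store, key=lambda b: distance(b, current))


store = [Brush(0.0, 0.2, 0.1), Brush(0.4, 0.6, 0.4), Brush(0.8, 1.0, 0.9)]
current = Brush(0.1, 0.3, 0.2)
print(pick_victim(store, current))
```

The cached nodes fetched for the victim brush are then the candidates for eviction; how ties and nodes shared between brushes are handled is left open here.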

Page 26: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

33

Distance Replacement Policy

[Figure: example of distance replacement. Panels show the database contents, the brush store (b1, b2, b3, b4), and the cache contents with the current brush before and after replacement; regions are marked empty, occupied or selected]

Page 27: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

34

Computation of Remainder Queries

For each user request, the cache may contain:
• All nodes requested
• A subset of the nodes requested
• None of the nodes requested

[Figure: cache contents with the brush and the remainder brush; regions marked empty, occupied or selected]

Page 28: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

35

Computation of Remainder Queries

• The focus extents (e1, e2) of the brush define an interval
• The horizontal extents of cached nodes also form an interval
• A remainder query consists of a set of remainder brushes
• Remainder brush: part of the brush interval not occupied by cache nodes

[Figure: cache contents with the current brush between e1 and e2 and the resulting remainder brush; regions marked empty, occupied or selected]
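Computing remainder brushes amounts to subtracting the cached horizontal extents from the brush interval. The sketch below is a minimal version under one assumption: the cached intervals passed in belong to nodes that already satisfy the brush's vertical (lod) test. The function name is hypothetical.

```python
from fractions import Fraction as Fr


def remainder_brushes(e1, e2, cached):
    """Return the parts of (e1, e2) not covered by any cached (hmin, hmax) interval."""
    gaps = []
    cursor = e1  # everything left of cursor is already covered or reported
    for lo, hi in sorted(cached):
        if hi <= cursor:
            continue          # interval lies entirely left of the uncovered part
        if lo >= e2:
            break             # interval lies entirely right of the brush
        if lo > cursor:
            gaps.append((cursor, lo))
        cursor = max(cursor, hi)
        if cursor >= e2:
            break
    if cursor < e2:
        gaps.append((cursor, e2))
    return gaps


# Example: the brush (2/6, 11/12) with only the interval (2/6, 5/6) cached.
print(remainder_brushes(Fr(2, 6), Fr(11, 12), [(Fr(2, 6), Fr(5, 6))]))
```

An empty result corresponds to a full cache hit, a single gap equal to the whole brush corresponds to a full miss, and anything in between is a partial hit that needs one query per gap.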

Page 29: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

36

Prefetcher [Doshi:03]

Motivation:
• Presence of idle time
• Predictable user movements

Aim:
• Predict and prefetch future user requests into the cache
• Increase hit ratio or minimize latency

[Figure: working model. Components: User, GUI Front End, User Requests, User Log, Prediction Model, Prefetcher, Cache Manager]

Page 30: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

37

Directional Prefetcher

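Prediction model:
• Uses recent history of user requests
• Prefetches in the direction of the last user movement

[Figure: direction strategy. The brush extent e2 moves from its position at time t to its position at time t+1, and the prefetch region continues in the same direction]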
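One way to picture the direction strategy is to extrapolate the last brush movement by one more step. This is only a sketch: the step size (equal to the last movement), the clamping to the range [0, 1], and the function names are assumptions, since the slide states only that prefetching follows the direction of the last user movement.

```python
from fractions import Fraction


def clamp(x, lo=0, hi=1):
    """Keep a focus extent inside the [lo, hi] range of the hierarchy map."""
    return max(lo, min(hi, x))


def predict_next_brush(history):
    """Predict the next (e1, e2) from the last two requests, or None if unknown.

    history is a list of (e1, e2) focus extents, oldest first.
    """
    if len(history) < 2:
        return None
    (p1, p2), (c1, c2) = history[-2], history[-1]
    d1, d2 = c1 - p1, c2 - p2
    if d1 == 0 and d2 == 0:
        return None  # the user did not move, so there is no direction to follow
    return (clamp(c1 + d1), clamp(c2 + d2))


trace = [
    (Fraction(1, 10), Fraction(3, 10)),
    (Fraction(2, 10), Fraction(4, 10)),
]
print(predict_next_brush(trace))
```

During idle time, the predicted extents would be issued as a prefetch request so that the corresponding nodes are already cached if the user keeps moving the same way.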

Page 31: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

38

System Architecture

[Figure: system architecture.
Front end: User, GUI.
Backend: Backend Controller, Cache Manager (Cache Memory, Cache Index, Cache Lookup by Spatial Index or Seq. Scan, Rep. Policy LRU or Distance), Delta Calculator, Loader, Prefetch Controller, Direction Prefetcher.
Database: Hierarchical Data with Spatial Index, produced from Flat Data by an offline Labeling process.
Edges: Request, Answer, query, Delta query, data, Cached Nodes, Prefetch Request, Start/Stop]

Page 32: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

39

System Implementation

• Implemented as backend to XmdvTool 6.0
• Language: C++
• Database: Oracle with Oracle Spatial Extension
• Libraries:
    • Spatial Index Library (UC Riverside)
    • OTL (Oracle.. Template library)
    • ZThread

Page 33: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

40

Evaluation

Goal: Effectiveness of the proposed techniques in isolation and in combination.

Workloads:
• Real datasets
    • D1: out5d, size = 20,000, dimensions = 5
    • D2: uvw, flow simulation data, size = 200,000, dimensions = 6
• Input
    • A set of 4 half-hour real user traces collected in [Doshi:2003apr] for dataset D1
    • A set of 4 half-hour synthetic user traces for dataset D2
• User trace
    • Sequence of user requests
    • Each user request: (position of SBB, time)

Page 34: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

41

Evaluation Metrics

Latency for User Trace:

$$\text{Latency} = \frac{\sum_{i=1}^{N} L_i}{\sum_{i=1}^{N} T_i}$$

• L_i = latency for request i
• T_i = number of nodes in request i

Latency Reduction Ratio (lrr):

$$\text{lrr} = \frac{\text{Latency}_{\text{base}} - \text{Latency}}{\text{Latency}_{\text{base}}}$$

Base Configuration:
• No index at the database
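The two metrics can be computed directly from a trace. This sketch assumes the latency of a trace is the total request latency divided by the total number of nodes requested, and that lrr is the relative reduction with respect to the base configuration. The function names and the sample numbers are illustrative, not measurements from the study.

```python
def trace_latency(latencies, node_counts):
    """Latency for a user trace: sum of L_i divided by sum of T_i."""
    total_nodes = sum(node_counts)
    if total_nodes == 0:
        raise ValueError("trace requests no nodes")
    return sum(latencies) / total_nodes


def latency_reduction_ratio(base_latency, latency):
    """lrr = (Latency_base - Latency) / Latency_base."""
    if base_latency == 0:
        raise ValueError("base latency must be non-zero")
    return (base_latency - latency) / base_latency


# Made-up numbers for two requests, in seconds and node counts.
base = trace_latency([4.0, 8.0], [10, 20])
tuned = trace_latency([2.0, 4.0], [10, 20])
print(latency_reduction_ratio(base, tuned))
```

Normalizing by the number of nodes keeps traces with different request sizes comparable.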

Page 35: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

47

Experimental Results: Brief Summary

• Spatial index on the database, used alone
    • lrr 33% for Data Set D1
    • lrr 72% for Data Set D2
• Cache
    • lrr 58% for Data Set D1 (cache size = 10%)
    • lrr 94% for Data Set D2 (cache size = 2%)
• Comparison of replacement policies
    • Distance replacement policy performs as well as or better than LRU
    • Increase in hit ratio 7%, increase in lrr 2% for Data Set D2
• Main memory index
    • We need spatial index structures that support high update rates (e.g. LR-Tree [Bozanis:2003])
• Prefetcher and cache
    • lrr 63% for Data Set D1
    • lrr 96% for Data Set D2

Page 36: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

48

Related Work

• Visualization-database integrated systems
    • ADR [Kurc:2001]
    • Tioga [Stonebaker:1993]
    • USD [Johnson:1992]
• Caching
    • Semantic Caching [dar:1996] or Predicate Caching [keller:1996]
• Hierarchy encoding
    • Nested Interval Method [Celko:2004]
    • Dietz’s numbering scheme [dietz:1982]
    • Dewey Order Encoding [tatxmlorder:2002]

Page 37: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

49

Conclusions

• Hierarchy encoding technique
    • Maps tree structures to 2-dimensional spaces
    • Maps visual exploration operations to spatial range queries
• Designed cache to reduce response time
    • Replacement policy: Distance or LRU
    • Cache lookup: sequential or spatial index
• Integrated direction-based prefetcher
• Implemented in freeware XMDV Tool
• Conducted a performance study

Page 38: Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

50

References

[Doshi:2003] P. Doshi et al. Prefetching for Visual Data Exploration
[Doshi:2003apr] P. Doshi et al. A strategy selection framework for adaptive prefetching in data visualization
[Bozanis:2003] P. Bozanis et al. LR-Tree: a logarithmic decomposable spatial index method
[Celko:2004] J. Celko. Joe Celko’s Trees and Hierarchies in SQL for Smarties
[Teuhola:1996] J. Teuhola. Path signatures to speed up recursion in relational databases
[Stonebaker:1993] M. Stonebraker et al. Providing data management support for scientific visualization applications
[dar:1996] S. Dar et al. Semantic Data Caching and Replacement
[keller:1996] A.M. Keller et al. A predicate-based caching scheme for client-server database architectures
[Kurc:2001] T. Kurc et al. Exploration and visualization of large datasets with the active data repository
[Johnson:1992] M. Goldner et al. USD - a database management system for scientific research
[Fua:1999] Y.H. Fua et al. Navigating hierarchies with structure-based brushes
[dietz:1982] P.F. Dietz. Maintaining order in a linked list
[tatxmlorder:2002] I. Tatarinov et al. Storing and Querying Ordered XML Using a Relational Database System
[Stroe:2000] I. Stroe. Scalable Visual Hierarchy Exploration