VLDB 2006, Seoul

Indexing For Function Approximation

Biswanath Panda, Mirek Riedewald, Stephen B. Pope, Johannes Gehrke, L. Paul Chew

Cornell University

Motivation

• Simulations are important in science

• Large simulations computationally infeasible
  – Driven by complex mathematical models
  – Require solution to complex differential equations
• Approximation techniques speed up simulations
  – Bounded error in the simulation
  – Approximate simulation steps using information from previous steps

Outline

• Example scientific application
  – Combustion simulation
• Function approximation problem
  – Formulation
  – Hardness
  – Algorithm

• Indexing problem

Combustion Simulation

[Figure labels: High Dimensional Composition Vector; Inflow (Air, Methane); Mixing & Reaction (Air + Methane); Outflow]

Properties Of Simulation

• Composition dimensionality
  – 9 for simple hydrogen simulations
  – >50 for complex methane simulations
• Cost of reaction function evaluation: 30 ms
• Number of function evaluations: 10^8 to 10^10
• Total simulation time
  – 10^8 function evaluations ≈ 35 days
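The 35-day figure follows directly from the per-evaluation cost. The short check below is a sketch that assumes evaluations run one after another with no other overhead; the names `EVAL_COST_S` and `simulation_days` are illustrative, not from the slides.

```python
# Back-of-the-envelope check of the slide's numbers.
# Assumption: evaluations run serially and nothing else costs time.
EVAL_COST_S = 0.030               # 30 ms per reaction function evaluation
SECONDS_PER_DAY = 24 * 60 * 60


def simulation_days(num_evaluations: float) -> float:
    """Total time in days if every evaluation costs EVAL_COST_S seconds."""
    return num_evaluations * EVAL_COST_S / SECONDS_PER_DAY


for n in (1e8, 1e10):
    print(f"{n:.0e} evaluations: {simulation_days(n):,.1f} days")
```

At 10^8 evaluations this gives a little under 35 days; at 10^10 it is one hundred times longer, which is why approximating the function pays off.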

Function Approximation

• Approximate the reaction function
• Approach
  – Use previous function evaluations to approximate future function evaluations
  – ISAT (In Situ Adaptive Tabulation) [Pope '97]
• Definition: ε-approximation of f(x)
  – Let f: R^m → R^n be a function, let x ∈ R^m and ε ∈ R. f*(x) is an ε-approximation of f(x) if ||f*(x) - f(x)|| < ε
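The definition can be checked numerically at any single point. The sketch below assumes the Euclidean norm (the slide does not say which norm is meant); `is_eps_approximation`, `f`, and `f_star` are illustrative names and toy functions, not part of ISAT.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def euclidean_norm(v: Vector) -> float:
    """Euclidean length of a vector (assumed norm; the slide leaves it open)."""
    return math.sqrt(sum(c * c for c in v))


def is_eps_approximation(
    f: Callable[[Vector], List[float]],
    f_star: Callable[[Vector], List[float]],
    x: Vector,
    eps: float,
) -> bool:
    """True if ||f*(x) - f(x)|| < eps at the point x."""
    diff = [a - b for a, b in zip(f_star(x), f(x))]
    return euclidean_norm(diff) < eps


# Toy example: f maps R^2 to R^2; f_star is its linearization around the origin.
def f(x: Vector) -> List[float]:
    return [math.sin(x[0]), math.cos(x[1])]


def f_star(x: Vector) -> List[float]:
    return [x[0], 1.0]


print(is_eps_approximation(f, f_star, (0.1, 0.1), eps=0.01))
print(is_eps_approximation(f, f_star, (0.5, 0.5), eps=0.01))
```

Close to the origin the linearization is within the tolerance; farther away it is not, which is what motivates bounding each approximation by a region.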

Example

[Figure sequence: plot of f with points x1 and x2 and ε markers around the stored point ( x, f(x) ); labels: Original Cost, Cost]

f*(x2) = f(x) + s * (x2 - x)

An ε-Local Region R_{f,f*}(x, ε) ⊆ R^m
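The linear approximation and its ε-local region can be illustrated in one dimension. This is a minimal sketch, assuming the slope s is the derivative of f at x and that the region is found by scanning outward in small steps; `linear_approximation` and `local_region_1d` are hypothetical helper names.

```python
import math
from typing import Callable, Tuple


def linear_approximation(
    f: Callable[[float], float], df: Callable[[float], float], x: float
) -> Callable[[float], float]:
    """Return f*(q) = f(x) + s * (q - x), with s assumed to be f'(x)."""
    fx, s = f(x), df(x)
    return lambda q: fx + s * (q - x)


def local_region_1d(
    f: Callable[[float], float],
    f_star: Callable[[float], float],
    x: float,
    eps: float,
    step: float = 1e-3,
    max_radius: float = 10.0,
) -> Tuple[float, float]:
    """Scan outward from x and return the interval where |f*(q) - f(q)| < eps."""

    def extent(direction: int) -> float:
        r = 0.0
        while r + step <= max_radius:
            q = x + direction * (r + step)
            if abs(f_star(q) - f(q)) >= eps:
                break
            r += step
        return r

    return x - extent(-1), x + extent(+1)


f_star = linear_approximation(math.sin, math.cos, 0.0)
lo, hi = local_region_1d(math.sin, f_star, 0.0, eps=0.01)
print(f"eps-local region around x=0: [{lo:.3f}, {hi:.3f}]")
```

Every query point inside the returned interval can reuse the stored f(x) and s instead of paying for a new evaluation of f.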

Example

[Figure: plot of f with query points x1 to x6 and local approximations f1*, f2*, f3*; labels: Original Cost, Cost]

When should a local region be added?

Example

Each query point can be covered by several Local Regions

[Figure: plot of f with query points x1 to x8 and local approximations f1*, f2*, f3*, f4*]

Challenges

• Finding good f*s and corresponding Local Regions
• Computing a set of Local Regions
• Data management: storing Local Regions for future use
• Problem: Minimize total simulation time by computing and storing a set of Local Regions

Finding The Optimal Set Of Local Regions

• Simplified cost model
  – Both the function value and Local Region at a point can be obtained at some constant cost equal across all regions
  – Approximations have zero cost
• Offline Problem
  – Given a set X = { x1, x2, …, xn } of query points, find the smallest set L = { l1, l2, …, lk } of Local Regions, such that for each xi ∈ X there is an lj ∈ L which contains xi
  – NP-Complete: Reduction from Geometric Covering By Discs
• Online Problem
  – No online algorithm is competitive
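Because the offline problem is NP-complete, a heuristic is the practical option. The sketch below is a standard greedy set-cover heuristic, not a method from the slides. It assumes Local Regions are discs of one fixed radius centered at query points, which is a simplification of the real regions; `greedy_disc_cover` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def greedy_disc_cover(points: Sequence[Point], radius: float) -> List[Point]:
    """Greedy heuristic: repeatedly pick the disc covering most uncovered points.

    Assumption: candidate discs are centered at the query points themselves.
    The result is a valid cover but not necessarily the smallest one.
    """
    uncovered = set(range(len(points)))
    centers: List[Point] = []
    while uncovered:
        # Candidate whose disc contains the largest number of uncovered points.
        best = max(
            range(len(points)),
            key=lambda i: sum(
                1 for j in uncovered if math.dist(points[i], points[j]) <= radius
            ),
        )
        centers.append(points[best])
        uncovered = {
            j for j in uncovered if math.dist(points[best], points[j]) > radius
        }
    return centers


queries = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.4, 5.0)]
cover = greedy_disc_cover(queries, radius=0.6)
print(len(cover), "discs:", cover)
```

The loop always terminates because every uncovered point covers at least itself, so each round removes at least one point.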

Algorithm Illustration

[Figure: plot of f with query points x1 to x8 and local approximations f1*, f2*, f3*, f4*]

Algorithm

[Flowchart:]

• Initialize S
• Simulation issues query point x
• Retrieve: Lookup x in S
• Local Region Found?
  – Y: Return Approximation
  – N: Evaluate function at x
• Add: Add new region containing x to S
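The flowchart can be sketched as a small lookup table. This is a one-dimensional illustration under stated assumptions: each region is an interval of fixed half-width around its center, and the approximation is linear with the derivative as slope. The class `ApproximationTable` and its parameters are hypothetical, not the ISAT implementation.

```python
import math
from typing import Callable, List, Tuple


class ApproximationTable:
    """Retrieve-or-evaluate loop over a set S of local regions (1-D sketch)."""

    def __init__(
        self,
        f: Callable[[float], float],
        df: Callable[[float], float],
        region_radius: float,
    ) -> None:
        self.f, self.df = f, df
        self.radius = region_radius      # assumed fixed region size
        self.regions: List[Tuple[float, float, float]] = []  # S: (center, f, slope)
        self.evaluations = 0

    def query(self, x: float) -> float:
        # Retrieve: look up x in S.
        for center, value, slope in self.regions:
            if abs(x - center) <= self.radius:
                return value + slope * (x - center)   # return approximation
        # Miss: evaluate the function at x ...
        self.evaluations += 1
        value, slope = self.f(x), self.df(x)
        # ... and add a new region containing x to S.
        self.regions.append((x, value, slope))
        return value


table = ApproximationTable(math.sin, math.cos, region_radius=0.1)
points = [0.0, 0.05, 0.08, 1.0, 1.02]
answers = [table.query(x) for x in points]
print("evaluations:", table.evaluations, "regions:", len(table.regions))
```

Only two of the five queries trigger a real evaluation; the others are answered from stored regions.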

Possible Instantiation Of Local Regions

• Local Regions can be approximated using high dimensional ellipsoids [Pope '97]
  – Based on Taylor Expansion of function
• Two step approach
  – Initial conservative approximation
  – Grow

[Figure: points x and x1]

Example

[Figure sequence (3 panels): point x with neighboring points x1 and x2, then x'1 and x'2; error labels ε' < ε and ε]

Updating Existing Regions

[Flowchart:]

• N (no Local Region found): Evaluate function at x
• Can existing region contain x?
  – Y: Grow: Update existing regions to contain x
  – N: Add new region containing x to S
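The extended miss path (grow before add) can be sketched in one dimension. Assumptions: regions are intervals, a region is "growable" when its linear approximation at x is already within ε of the freshly evaluated f(x), and growing simply stretches the interval to reach x. `Region` and `query` are illustrative names.

```python
import math
from typing import Callable, Dict, List


class Region:
    """Interval region with a stored linear approximation (1-D sketch)."""

    def __init__(self, center: float, value: float, slope: float, radius: float):
        self.center, self.value, self.slope = center, value, slope
        self.lo, self.hi = center - radius, center + radius

    def contains(self, x: float) -> bool:
        return self.lo <= x <= self.hi

    def approximate(self, x: float) -> float:
        return self.value + self.slope * (x - self.center)

    def grow_to(self, x: float) -> None:
        # Heuristic: assumes the error stays below eps between the old
        # boundary and x; this is not verified here.
        self.lo, self.hi = min(self.lo, x), max(self.hi, x)


def query(regions: List[Region], f: Callable[[float], float],
          df: Callable[[float], float], x: float, eps: float,
          init_radius: float, stats: Dict[str, int]) -> float:
    for r in regions:                       # Retrieve
        if r.contains(x):
            stats["retrieve"] += 1
            return r.approximate(x)
    fx = f(x)                               # Miss: evaluate function at x
    growable = [r for r in regions if abs(r.approximate(x) - fx) < eps]
    if growable:                            # Grow existing regions to contain x
        for r in growable:
            r.grow_to(x)
        stats["grow"] += 1
    else:                                   # Add a new, conservative region
        regions.append(Region(x, fx, df(x), init_radius))
        stats["add"] += 1
    return fx


regions: List[Region] = []
stats = {"retrieve": 0, "grow": 0, "add": 0}
for x in (0.0, 0.2, 0.1, 2.0):
    query(regions, math.sin, math.cos, x, eps=0.01, init_radius=0.05, stats=stats)
print(stats)
```

The second query grows the first region instead of adding a new one, so the third query is answered by a retrieve.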

Outline

• Example scientific application
  – Combustion simulation
• Function approximation problem
  – Formulation
  – Hardness
  – Algorithm

• Indexing problem

Indexing Problem

• Workload
  – Retrieve: Find ellipsoid containing query point
  – Grow
    • Find ellipsoids to be grown
    • Update grown ellipsoids
  – Add: Insert a new ellipsoid
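The Retrieve and Add operations can be sketched with the simplest possible index, a linear scan. The sketch assumes an ellipsoid is stored as a center c and a symmetric positive definite matrix A, with q inside when (q - c)^T A (q - c) ≤ 1; this representation and the names `Ellipsoid` and `LinearScanIndex` are assumptions for illustration, not one of the structures evaluated in the talk.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Ellipsoid:
    center: Sequence[float]
    shape: Sequence[Sequence[float]]   # symmetric positive definite matrix A

    def contains(self, q: Sequence[float]) -> bool:
        """True if (q - c)^T A (q - c) <= 1."""
        d = [qi - ci for qi, ci in zip(q, self.center)]
        n = len(d)
        quad = sum(d[i] * self.shape[i][j] * d[j]
                   for i in range(n) for j in range(n))
        return quad <= 1.0


class LinearScanIndex:
    """Baseline index: checks every stored ellipsoid in insertion order."""

    def __init__(self) -> None:
        self.ellipsoids: List[Ellipsoid] = []

    def add(self, e: Ellipsoid) -> None:
        self.ellipsoids.append(e)

    def retrieve(self, q: Sequence[float]) -> Optional[Ellipsoid]:
        for e in self.ellipsoids:
            if e.contains(q):
                return e
        return None


# Axis-aligned ellipse with semi-axes 2 (x direction) and 1 (y direction).
ellipse = Ellipsoid(center=(0.0, 0.0), shape=((0.25, 0.0), (0.0, 1.0)))
index = LinearScanIndex()
index.add(ellipse)
print(index.retrieve((1.5, 0.0)) is ellipse, index.retrieve((0.0, 1.5)))
```

Each containment test costs O(m^2) in m dimensions, so with dimensionality above 50 the number of ellipsoids examined per query matters a great deal.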

New Indexing Problem

• Shape of regions
• Updates and queries interleaved
• Additional costs: ellipsoid maintenance costs
• Overall aim: Reduce total simulation time
• Retrieve/grow/add are all optional
  – Tuning parameters at each step

Operation       Cost
Evaluation      2000
Addition        1200
Grow            10
Approximation   1
Search          1

Outline

• Example scientific application
  – Combustion simulation
• Function approximation problem
  – Formulation
  – Hardness
  – Algorithm
• Indexing problem
  – Cost structure, tuning parameters and effects
  – Index structures and experiments

Grow Effects

C_miss = t_f + t_growsearch + I_grow * C_grow + (1 - I_grow) * C_add

• Tuning Parameter: Ell_g
  – Limit on number of ellipsoids examined for growing
  – No pruning criteria
  – Affects
    • t_growsearch
    • Chance of finding a growable ellipsoid
• Tuning Parameter: N_grown
  – Number of ellipsoids grown per step
  – Affects
    • C_grow
    • Structure of the index (overlapping ellipsoids)

Retrieve Effects

C_tot = t_search + I_ret * t_la + (1 - I_ret) * C_miss

• Tuning Parameter: Ell_r
  – Limit on number of ellipsoids examined during retrieve
  – Limits how much of the index is searched
  – Affects
    • t_search
    • Chances of success of the current retrieve and also of future retrieves
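The two cost formulas can be turned into an expected per-query cost by replacing the indicators I_ret and I_grow with probabilities. The sketch below does that; mapping the rows of the operation cost table onto the symbols (Evaluation to t_f, Addition to C_add, Grow to C_grow, Approximation to t_la, Search to both t_search and t_growsearch) is an assumption, as are the example probabilities.

```python
def miss_cost(t_f: float, t_growsearch: float, p_grow: float,
              c_grow: float, c_add: float) -> float:
    """Expected C_miss = t_f + t_growsearch + p_grow*C_grow + (1-p_grow)*C_add."""
    return t_f + t_growsearch + p_grow * c_grow + (1.0 - p_grow) * c_add


def total_cost(t_search: float, p_ret: float, t_la: float, c_miss: float) -> float:
    """Expected C_tot = t_search + p_ret*t_la + (1-p_ret)*C_miss."""
    return t_search + p_ret * t_la + (1.0 - p_ret) * c_miss


# Costs taken from the operation cost table (symbol mapping is assumed).
T_F, C_ADD, C_GROW, T_LA, T_SEARCH = 2000.0, 1200.0, 10.0, 1.0, 1.0

c_miss = miss_cost(T_F, T_SEARCH, p_grow=0.5, c_grow=C_GROW, c_add=C_ADD)
for p_ret in (0.90, 0.99, 0.999):
    print(f"p_ret={p_ret}: expected cost per query = "
          f"{total_cost(T_SEARCH, p_ret, T_LA, c_miss):.2f}")
```

Because a miss costs thousands of units while a hit costs about two, even a small change in the retrieve success rate dominates the total.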

Add Effects

C_miss = t_f + t_growsearch + I_grow * C_grow + (1 - I_grow) * C_add

• Tuning parameter: Indirectly controlled by retrieves and grows
  – Affects
    • Should query point be covered by an add or grow?
      (-) Computing new ellipsoids is expensive
      (-) New ellipsoids cover smaller part of the domain
      (+) May lead to better ellipsoid distribution

Candidate Index Structures

• Bounding Box Rtree
• Point Rtree
• Ellipsoid Rtree
• Random Projection Rtree
• Binary Tree
• MRU List + Rtree

Binary Tree

[Figure sequence: ellipsoids A, B, C, a binary tree with nodes 1 and 2, and query point q. Panel titles: Primary Retrieve; Secondary Retrieve; Secondary Retrieve now Primary Retrieve (tree extended with node 3 and ellipsoid D)]

Effects In Action: Binary Tree

• 32 dimensional Methane simulation
• 6 x 10^6 queries
• Windows XP machine (2.4 GHz, 2 GB)

MRU List + Rtree

• MRU List for retrieving
  – High locality
• Rtree for searching growable ellipsoids

[Figure labels: MRU List; Rtree]
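The retrieve side of this design can be sketched as a move-to-front list. This is a minimal illustration, not the authors' implementation: regions are plain intervals, the containment test is passed in as a function, and `max_examined` plays the role of a limit on the number of regions examined per retrieve. The Rtree used for finding growable ellipsoids is not shown.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

R = TypeVar("R")


class MRUList:
    """Most-recently-used list: hits move to the front, scans start there."""

    def __init__(self) -> None:
        self.items: List[R] = []

    def add(self, region: R) -> None:
        self.items.insert(0, region)          # new regions count as recent

    def retrieve(self, q, contains: Callable[[R, object], bool],
                 max_examined: Optional[int] = None) -> Optional[R]:
        limit = len(self.items)
        if max_examined is not None:
            limit = min(limit, max_examined)
        for i in range(limit):
            region = self.items[i]
            if contains(region, q):
                # Move the hit to the front to exploit query locality.
                self.items.insert(0, self.items.pop(i))
                return region
        return None


def in_interval(region: Tuple[float, float], q: float) -> bool:
    return region[0] <= q <= region[1]


mru = MRUList()
for interval in [(0.0, 1.0), (2.0, 3.0), (4.0, 5.0)]:
    mru.add(interval)

first = mru.retrieve(0.5, in_interval)                    # found at the tail
limited = mru.retrieve(2.5, in_interval, max_examined=2)  # gives up early
full = mru.retrieve(2.5, in_interval)                     # found without limit
print(first, limited, full, mru.items)
```

When consecutive queries fall near each other, the matching region sits at or near the front, so most retrieves examine only a handful of entries.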

Effects In Action: MRU List + Rtree

• Effects very different from Binary Tree

Total Simulation Times

Index Type                 Error Tolerance
                           0.005     0.00005   0.00004
Binary Tree (tuned)        1073      10181     13100
MRU List + Rtree           1125      14000     19920
Bbox Rtree                 1201      14700     20850
Random Projection Rtree    1378      15800     22051
Binary Tree (default)      1344      29186     31200
FIFO List + Rtree          2154      33770     42900
Point Rtree                10431     >44000    -
Ellipsoidal Rtree          14328     >44000    -

Conclusion & Future Work

• Formulated the function approximation problem
• New class of applications for high dimensional indexing
• Understand index selection for function approximation
• Future work
  – Dynamic parameter settings
  – New benchmark for index structures
  – Evaluation of other index structures
  – Comparison with other function approximation techniques

Questions?