IBM Research
Non-Confidential | 7-12-2005 | Volker Markl © 2005 IBM Corporation
Consistently Estimating the Selectivity of Conjuncts of Predicates
Volker Markl, Nimrod Megiddo, Marcel Kutsch, Tam Minh Tran, Peter Haas, Utkarsh Srivastava
IBM Research
Agenda
Consistency and Bias Problems in Cardinality Estimation
The Maximum Entropy Solution
Iterative Scaling
Performance Analysis
Related Work
Conclusions
What is the problem?
Consider the following three correlated attributes: Make, Color, and Model.
[Figure: each attribute shown with a value and its cardinality: Make = 'Mazda' (100,000), Color = 'red' (200,000), Model = '323' (200,000)]
How to estimate the cardinality of the predicate
… Make = 'Mazda' AND Model = '323' AND Color = 'red' ?
(real cardinality: 49,000)
Without any additional knowledge
Legend: s(?) denotes the selectivity of ?
Selectivity( Make = 'Mazda' AND Model = '323' AND Color = 'red' )
Independence assumption:
s( Make = 'Mazda' ) * s( Model = '323' ) * s( Color = 'red' )
= (100,000 / 1,000,000) * (200,000 / 1,000,000) * (200,000 / 1,000,000) = 0.004
Base cardinality: 1,000,000
Estimated cardinality: 0.004 * 1,000,000 = 4,000
Additional knowledge given (1):
Selectivity( Make = 'Mazda' AND Model = '323' AND Color = 'red' )
Additional knowledge:
Make AND Model: card( 'Mazda' AND '323' ) = 50,000
case 1: s( Make AND Model ) * s( Color )
= (50,000 / 1,000,000) * (200,000 / 1,000,000) = 0.01
Estimated cardinality: 10,000
Additional knowledge given (2):
Selectivity( Make = 'Mazda' AND Model = '323' AND Color = 'red' )
Additional knowledge:
Make AND Model: card( 'Mazda' AND '323' ) = 50,000
Make AND Color: card( 'Mazda' AND 'red' ) = 90,000
case 1: s( Make AND Model ) * s( Color ) = 0.01, estimated cardinality: 10,000
case 2: s( Make AND Color ) * s( Model )
= (90,000 / 1,000,000) * (200,000 / 1,000,000) = 0.018
Estimated cardinality: 18,000
Additional knowledge given (3):
Selectivity( Make = 'Mazda' AND Model = '323' AND Color = 'red' )
Additional knowledge:
Make AND Model: card( 'Mazda' AND '323' ) = 50,000
Make AND Color: card( 'Mazda' AND 'red' ) = 90,000
Model AND Color: card( '323' AND 'red' ) = 150,000
case 1: s( Make AND Model ) * s( Color ) = 0.01, estimated cardinality: 10,000
case 2: s( Make AND Color ) * s( Model ) = 0.018, estimated cardinality: 18,000
case 3: s( Model AND Color ) * s( Make )
= (150,000 / 1,000,000) * (100,000 / 1,000,000) = 0.015
Estimated cardinality: 15,000
Why is this a problem?
case 0: s( Make ) * s( Model ) * s( Color ) = 0.004, estimated cardinality: 4,000
case 1: s( Make AND Model ) * s( Color ) = 0.010, estimated cardinality: 10,000
case 2: s( Make AND Color ) * s( Model ) = 0.018, estimated cardinality: 18,000
case 3: s( Model AND Color ) * s( Make ) = 0.015, estimated cardinality: 15,000
[Figure: three plans for the same query carry different estimates: an index scan on (Make, Color) (90,000 rows) followed by a FETCH of Model, estimate 18,000; an index scan on (Model, Color) (150,000 rows) followed by a FETCH of Make, estimate 15,000; and an index intersection of Make, Color, and Model, estimate 4,000]
Cardinality bias: fleeing from knowledge to ignorance
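The four conflicting estimates above are plain arithmetic on the same statistics. A minimal sketch (our own code; the variable names are not from the talk, the numbers are from the slides) reproduces them:

```python
# Base-table cardinality and the statistics from the running example.
N = 1_000_000
s_make, s_model, s_color = 100_000 / N, 200_000 / N, 200_000 / N
s_make_model = 50_000 / N    # card('Mazda' AND '323')
s_make_color = 90_000 / N    # card('Mazda' AND 'red')
s_model_color = 150_000 / N  # card('323' AND 'red')

# Four "legal" estimates for the very same conjunct, depending on which
# multivariate statistic the optimizer happens to use first.
cases = {
    "case 0": s_make * s_model * s_color,   # independence only
    "case 1": s_make_model * s_color,
    "case 2": s_make_color * s_model,
    "case 3": s_model_color * s_make,
}
for name, sel in cases.items():
    print(name, sel, round(sel * N))        # selectivity, estimated cardinality
```

All four disagree with each other and with the real cardinality of 49,000, which is the inconsistency the next slides address.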
What has happened?
Inconsistent model: different estimates for the same intermediate result, due to multivariate statistics with overlapping information.
Bias during plan selection results in the selection of sub-optimal plans.
Bias avoidance means keeping the model consistent. The state of the art is to do bookkeeping of the first multivariate statistic used, and to ignore further overlapping multivariate statistics.
This does not solve the problem, as ignoring knowledge also means bias.
The bias is arbitrary; it depends on which statistics are used first during optimization.
The only possible solution is to exploit all knowledge consistently.
Problem: Only partial knowledge of the DNF atoms
[Figure: Venn diagram over the predicates Mazda, 323, and red; its eight regions are the DNF atoms Mazda & 323 & red, Mazda & 323 & ¬red, Mazda & ¬323 & red, ¬Mazda & 323 & red, Mazda & ¬323 & ¬red, ¬Mazda & 323 & ¬red, ¬Mazda & ¬323 & red, and ¬Mazda & ¬323 & ¬red; single-predicate cardinalities: Make = 'Mazda' (100,000), Model = '323' (200,000), Color = 'red' (200,000)]
Additional knowledge:
Make AND Model: card( 'Mazda' AND '323' ) = 50,000
Model AND Color: card( '323' AND 'red' ) = 150,000
Make AND Color: card( 'Mazda' AND 'red' ) = 90,000
Legend:
DNF = disjunctive normal form
¬X denotes not X
How to compute the missing values of the distribution?
Probability( Make = 'Mazda' AND Model = '323' AND Color = 'red' )
[Figure: the same Venn diagram of DNF atoms, with the atom Mazda & 323 & red as the sought, unknown value]
Solution: Information Entropy H( X ) = -∑ xi log( xi )
Entropy is a measure of the "uninformedness" of a probability distribution X = (x1, …, xm) with x1 + … + xm = 1.
Maximizing the information entropy for the unknown selectivities, using the known selectivities as constraints, avoids bias.
The less is known about a probability distribution, the larger its entropy:
Nothing known → uniformity: s(X = ?) = 1/m
Marginals known → independence: s(X = ? and Y = ?) = s(X = ?) * s(Y = ?)
Thus: the principle of maximum entropy generalizes the uniformity and independence assumptions used in today's query optimizers.
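As a quick illustration of "uninformedness" (our own sketch, not from the talk), the uniform distribution attains the maximum entropy log(m) among m-point distributions, while any more informative, skewed distribution scores lower:

```python
import math

def entropy(xs):
    """H(X) = -sum x_i * log(x_i), skipping zero-probability points."""
    return -sum(x * math.log(x) for x in xs if x > 0)

m = 8
uniform = [1 / m] * m                                    # "nothing known"
skewed = [0.5, 0.3, 0.1, 0.05, 0.02, 0.01, 0.01, 0.01]   # more informative

print(entropy(uniform))  # log(8), the maximum for m = 8
print(entropy(skewed))   # strictly smaller
```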
Entropy Maximization for Cardinality Estimation
Given some selectivities (single and conjunctive) over a space of n predicates p1, …, pn,
choose a model which is consistent with this knowledge but otherwise as uniform as possible:
maximize the entropy of the probability distribution X = (x_b | b ∈ {0,1}^n):
max H( X ) = -∑_{b ∈ {0,1}^n} x_b log x_b
x_b is the selectivity of the DNF atom ∧_{i ∈ {1,…,n}} p_i^{b_i}
b_i = 0 means that predicate p_i is negated in the DNF atom
b_i = 1 means that predicate p_i is a positive term in the DNF atom
Legend:
{0,1}^n denotes the n-fold cross product of the set {0,1}, i.e., {0,1} × … × {0,1} (n times)
Also, for a predicate p: p^1 = p, p^0 = not p
Maximum Entropy Principle – Example:
Knowledge s_Y, Y ∈ T, with T = { {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, ∅ }:
s1 = s(Mazda) = 0.1
s2 = s(323) = 0.2
s3 = s(red) = 0.2
s1,2 = s(Mazda & 323) = 0.05
s1,3 = s(Mazda & red) = 0.09
s2,3 = s(red & 323) = 0.15
[Figure: Venn diagram over Mazda, 323, and red with the eight DNF atoms labeled by their bit vectors 000, 001, 010, 011, 100, 101, 110, 111; the atoms 100, 101, 110, 111 together make up s1]
Constraint, e.g., for s1:
s1 = x100 + x101 + x110 + x111
Maximum Entropy Principle – Example:
Knowledge: s1 = 0.1, s2 = 0.2, s3 = 0.2, s1,2 = 0.05, s1,3 = 0.09, s2,3 = 0.15
Objective function:
max H( X ) = -∑_{b ∈ {0,1}^3} x_b log x_b
Constraints:
0.10 = s1 = x100 + x101 + x110 + x111
0.20 = s2 = x010 + x011 + x110 + x111
0.20 = s3 = x001 + x011 + x101 + x111
0.05 = s1,2 = x110 + x111
0.09 = s1,3 = x101 + x111
0.15 = s2,3 = x011 + x111
1.00 = s∅ = x000 + x001 + x010 + x011 + x100 + x101 + x110 + x111
General solution: Iterative Scaling
Solving the constrained optimization problem:
Minimize the objective function
∑_{b ∈ {0,1}^n} x_b log x_b
satisfying the |T| constraints, T ⊆ 2^{{1,…,n}}:
for all Y ∈ T: ∑_{b ∈ C(Y)} x_b = s_Y
[Figure: Venn diagram with the eight DNF atoms p1^b1 ∧ p2^b2 ∧ p3^b3 labeled 000 … 111]
Legend:
2^{{1,…,n}} denotes the powerset of {1,…,n}
C(Y) denotes all DNF atoms that contribute to Y, i.e., formally,
C(Y) := { b ∈ {0,1}^n | ∀ i ∈ Y: b_i = 1 } and C(∅) := {0,1}^n
Maximum Entropy and Lagrange Multipliers
The objective function ∑_{b ∈ {0,1}^n} x_b log x_b is convex.
We can build a Lagrangian function by associating a multiplier λ_Y with each constraint and subtracting the constraints from the objective function:
L( X, λ ) = ∑_{b ∈ {0,1}^n} x_b log x_b - ∑_{Y ∈ T} λ_Y ( ∑_{b ∈ C(Y)} x_b - s_Y )
Differentiation with respect to x_b and equating to zero yields conditions for the minimum:
for each b ∈ {0,1}^n: ln x_b + 1 - ∑_{Y ∈ P(b,T)} λ_Y = 0
Exponentiation of the Lagrange multipliers in the derivatives (z_Y := e^{λ_Y}) yields the product form:
x_b = e^{-1} ∏_{Y ∈ P(b,T)} z_Y
Replacing x_b in each constraint yields a condition in the exponentiated Lagrange multipliers z_Y:
for each Y ∈ T: ∑_{b ∈ C(Y)} e^{-1} ∏_{W ∈ P(b,T)} z_W = s_Y
Legend:
P(b,T) ⊆ T denotes the indexes Y of all known selectivities s_Y to which DNF atom b contributes its value x_b:
P(b,T) = { Y ∈ T | ∀ i ∈ Y: b_i = 1 } ∪ {∅}
Iterative Scaling
We can now isolate z_Y for a particular Y ∈ T:
z_Y* = e · s_Y / ∑_{b ∈ C(Y)} ∏_{W ∈ P(b,T)\{Y}} z_W
and thus iteratively compute z_Y from all z_W, W ∈ T\{Y}.
This algorithm is called Iterative Scaling (Darroch and Ratcliff, 1972) and converges to a stable set of Lagrangian multipliers z_Y, Y ∈ T.
This stable point minimizes the objective function and satisfies all constraints.
We can compute all DNF atoms x_b from these stable multipliers using
x_b = e^{-1} ∏_{Y ∈ P(b,T)} z_Y
and can in turn compute all missing selectivities
s_Y = ∑_{b ∈ C(Y)} x_b
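The whole procedure fits in a few lines. Below is a sketch of iterative scaling for the running example (our own code, not the paper's implementation; `P`, `C`, and the product form follow the definitions above). It reproduces the maximum-entropy estimate x111 ≈ 0.049918 shown on the later slides:

```python
import math
from itertools import product

# Known selectivities s_Y from the running example; Y is a frozenset of
# predicate indexes (1 = Mazda, 2 = 323, 3 = red).
s = {
    frozenset(): 1.0,
    frozenset({1}): 0.1,
    frozenset({2}): 0.2,
    frozenset({3}): 0.2,
    frozenset({1, 2}): 0.05,
    frozenset({1, 3}): 0.09,
    frozenset({2, 3}): 0.15,
}
n = 3
atoms = list(product((0, 1), repeat=n))   # all b in {0,1}^n
T = list(s)

def P(b):
    """P(b,T): indexes Y of known selectivities to which atom b contributes."""
    return [Y for Y in T if all(b[i - 1] == 1 for i in Y)]

def C(Y):
    """C(Y): all DNF atoms that contribute to s_Y."""
    return [b for b in atoms if all(b[i - 1] == 1 for i in Y)]

def x(b, z):
    """Product form x_b = e^-1 * prod_{Y in P(b,T)} z_Y."""
    v = math.exp(-1)
    for Y in P(b):
        v *= z[Y]
    return v

z = {Y: 1.0 for Y in T}   # exponentiated Lagrange multipliers
for _ in range(300):      # sweeps over all constraints
    for Y in T:
        # Isolate z_Y: rescale it so that constraint s_Y holds exactly.
        z[Y] = s[Y] / sum(x(b, z) / z[Y] for b in C(Y))

x111 = x((1, 1, 1), z)    # maximum-entropy estimate for Mazda & 323 & red
print(round(x111, 6))
```

The first update matches the trace on the next slides: with all z = 1, z_1 becomes e · 0.1 / 4 = 0.067957.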
Maximum Entropy Solution of the Example
Selectivity( Make = 'Mazda' AND Model = '323' AND Color = 'red' )
s1,2,3 = x111 = ???
Knowledge:
s(Mazda) = s1 = 0.1, s(323) = s2 = 0.2, s(red) = s3 = 0.2
s(Mazda & 323) = s1,2 = 0.05, s(Mazda & red) = s1,3 = 0.09, s(red & 323) = s2,3 = 0.15
[Figure: Venn diagram with the eight DNF atoms labeled 000 … 111]
Iterative Scaling
Knowledge: s1 = 0.1, s2 = 0.2, s3 = 0.2, s1,2 = 0.05, s1,3 = 0.09, s2,3 = 0.15, s∅ = 1
[Figure: the eight DNF atoms 000 … 111, each shown as the product of the multipliers z_Y, Y ∈ P(b,T), that make up its value x_b; here the atoms x100, x101, x110, x111 contributing to s1 are highlighted]
1st iteration, updating z1:
z1* = e · s1 / ∑_{b ∈ C({1})} ∏_{W ∈ P(b,T)\{1}} z_W
New multipliers: z1 = 0.067957, z2 = 1, z1,2 = 1, z3 = 1, z1,3 = 1, z2,3 = 1, z∅ = 1
Implied selectivities: s1 = 0.1, s2 = 0.785759, s1,2 = 0.05, s3 = 0.785759, s1,3 = 0.05, s2,3 = 0.392879, s∅ = 1.571518
Iterative Scaling
Knowledge: s1 = 0.1, s2 = 0.2, s3 = 0.2, s1,2 = 0.05, s1,3 = 0.09, s2,3 = 0.15, s∅ = 1
[Figure: multiplier diagram; the atoms x010, x011, x110, x111 contributing to s2 are highlighted]
1st iteration, updating z2:
z2* = e · s2 / ∑_{b ∈ C({2})} ∏_{W ∈ P(b,T)\{2}} z_W
New multipliers: z1 = 0.067957, z2 = 0.254531, z1,2 = 1, z3 = 1, z1,3 = 1, z2,3 = 1, z∅ = 1
Implied selectivities: s1 = 0.062727, s2 = 0.2, s1,2 = 0.012727, s3 = 0.492879, s1,3 = 0.031363, s2,3 = 0.1, s∅ = 0.985759
Iterative Scaling
Knowledge: s1 = 0.1, s2 = 0.2, s3 = 0.2, s1,2 = 0.05, s1,3 = 0.09, s2,3 = 0.15, s∅ = 1
[Figure: multiplier diagram; the atoms x110, x111 contributing to s1,2 are highlighted]
1st iteration, updating z1,2:
z1,2* = e · s1,2 / ∑_{b ∈ C({1,2})} ∏_{W ∈ P(b,T)\{1,2}} z_W
New multipliers: z1 = 0.067957, z2 = 0.254531, z1,2 = 3.928794, z3 = 1, z1,3 = 1, z2,3 = 1, z∅ = 1
Implied selectivities: s1 = 0.1, s2 = 0.237273, s1,2 = 0.05, s3 = 0.511516, s1,3 = 0.05, s2,3 = 0.118637, s∅ = 1.023032
Iterative Scaling
Knowledge: s1 = 0.1, s2 = 0.2, s3 = 0.2, s1,2 = 0.05, s1,3 = 0.09, s2,3 = 0.15, s∅ = 1
[Figure: multiplier diagram; the atoms x001, x011, x101, x111 contributing to s3 are highlighted]
1st iteration, updating z3:
z3* = e · s3 / ∑_{b ∈ C({3})} ∏_{W ∈ P(b,T)\{3}} z_W
New multipliers: z1 = 0.067957, z2 = 0.254531, z1,2 = 3.928794, z3 = 0.390994, z1,3 = 1, z2,3 = 1, z∅ = 1
Implied selectivities: s1 = 0.069550, s2 = 0.165023, s1,2 = 0.034775, s3 = 0.2, s1,3 = 0.019550, s2,3 = 0.046386, s∅ = 0.711516
Iterative Scaling
Knowledge: s1 = 0.1, s2 = 0.2, s3 = 0.2, s1,2 = 0.05, s1,3 = 0.09, s2,3 = 0.15, s∅ = 1
[Figure: multiplier diagram; the atoms x101, x111 contributing to s1,3 are highlighted]
1st iteration, updating z1,3:
z1,3* = e · s1,3 / ∑_{b ∈ C({1,3})} ∏_{W ∈ P(b,T)\{1,3}} z_W
New multipliers: z1 = 0.067957, z2 = 0.254531, z1,2 = 3.928794, z3 = 0.390994, z1,3 = 4.603645, z2,3 = 1, z∅ = 1
Implied selectivities: s1 = 0.14, s2 = 0.200248, s1,2 = 0.07, s3 = 0.27045, s1,3 = 0.09, s2,3 = 0.081611, s∅ = 0.781966
Iterative Scaling
Knowledge: s1 = 0.1, s2 = 0.2, s3 = 0.2, s1,2 = 0.05, s1,3 = 0.09, s2,3 = 0.15, s∅ = 1
[Figure: multiplier diagram; the atoms x011, x111 contributing to s2,3 are highlighted]
1st iteration, updating z2,3:
z2,3* = e · s2,3 / ∑_{b ∈ C({2,3})} ∏_{W ∈ P(b,T)\{2,3}} z_W
New multipliers: z1 = 0.067957, z2 = 0.254531, z1,2 = 3.928794, z3 = 0.390994, z1,3 = 4.603645, z2,3 = 1.837978, z∅ = 1
Implied selectivities: s1 = 0.177709, s2 = 0.268637, s1,2 = 0.107709, s3 = 0.338839, s1,3 = 0.127709, s2,3 = 0.15, s∅ = 0.850355
Iterative Scaling
Knowledge: s1 = 0.1, s2 = 0.2, s3 = 0.2, s1,2 = 0.05, s1,3 = 0.09, s2,3 = 0.15, s∅ = 1
[Figure: multiplier diagram; all eight atoms contribute to s∅]
1st iteration, updating z∅:
z∅* = e · s∅ / ∑_{b ∈ C(∅)} ∏_{W ∈ P(b,T)\{∅}} z_W
New multipliers: z1 = 0.067957, z2 = 0.254531, z1,2 = 3.928794, z3 = 0.390994, z1,3 = 4.603645, z2,3 = 1.837978, z∅ = 1.175979
Implied selectivities: s1 = 0.208982, s2 = 0.315911, s1,2 = 0.126664, s3 = 0.398468, s1,3 = 0.150183, s2,3 = 0.176397, s∅ = 1
DNF atoms after the 1st iteration: x000 = 0.432619, x001 = 0.169152, x010 = 0.110115, x011 = 0.079133, x100 = 0.029399, x101 = 0.052919, x110 = 0.029399, x111 = 0.097264
Maximum Entropy Solution of the Example
Selectivity( Make = 'Mazda' AND Model = '323' AND Color = 'red' )
s1,2,3 = x111 = 0.049918
Iterations: 241
Knowledge:
s(Mazda) = s1 = 0.1, s(323) = s2 = 0.2, s(red) = s3 = 0.2
s(Mazda & 323) = s1,2 = 0.05, s(Mazda & red) = s1,3 = 0.09, s(red & 323) = s2,3 = 0.15
Converged DNF atoms:
x000 = 0.740082, x001 = 0.009918, x010 = 0.049918, x011 = 0.100082, x100 = 0.009918, x101 = 0.040082, x110 = 0.000082, x111 = 0.049918
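It is easy to check that these converged atoms satisfy every known selectivity simultaneously, which no single independence-based estimate did. A sketch (our own code; the dictionary keys are the bit vectors b1 b2 b3, with the atom-to-value mapping reconstructed from the constraints):

```python
# DNF-atom selectivities of the converged solution, keyed by the bit
# vector b1 b2 b3 (1 = Mazda, 2 = 323, 3 = red).
x = {
    "000": 0.740082, "001": 0.009918, "010": 0.049918, "011": 0.100082,
    "100": 0.009918, "101": 0.040082, "110": 0.000082, "111": 0.049918,
}

def s(Y):
    """Sum over C(Y): all atoms b with b_i = 1 for every i in Y."""
    return sum(v for b, v in x.items() if all(b[i - 1] == "1" for i in Y))

# Every known selectivity is met at once -- the model is consistent.
for Y, want in [({1}, 0.1), ({2}, 0.2), ({3}, 0.2),
                ({1, 2}, 0.05), ({1, 3}, 0.09), ({2, 3}, 0.15), (set(), 1.0)]:
    print(sorted(Y), round(s(Y), 6), want)
```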
Let’s compare:
case 2: s( Make AND Color ) * s( Model ) =0.018 estimated card: 18,000
case 3: s( Model AND Color ) * s( Make ) =0.015 estimated card: 15,000
Selectivity( Make = ‘Mazda’ AND Model = ‘323’AND Color = ‘red’ )
case 1: s( Make AND Model ) * s(Color) =0.010 estimated card: 10,000
case 0: s( Make) * s( Model ) * s(Color) =0.004 estimated card: 4,000
Real : s( Model AND Color AND Make ) =0.049 actual card: 49,000
ME: s( Model AND Color ) * s( Make ) =0.049918 estimated card: 49,918
Error: 10x
Error: 5x
Error: 2.5x
Error: 3x
Almost no error
Forward Estimation: Predicting s1,2,3, given …
[Figure: box plots (1st to 4th quartile plus median) of the absolute estimation error over 200 queries, for nine knowledge sets: (1) s1, s2, s3 only; (2.1a/b/c) additionally one of s1,2, s1,3, s2,3; (2.2a/b/c) additionally two of them; (2.3) all three; (3) s1,2,3 itself. The annotated 75th-percentile errors fall from 2,138 and 788 through 79, 79, 42, 65, 11, 9, and 6 down to 0 as more knowledge is used; the largest error (100th percentile) is 9,583]
Legend: 200 queries
Comparing DB2 and ME: Predicting s1,2,3, given …
[Figure: box plots (quartiles and median) of the absolute estimation error over 200 queries, DB2's state-of-the-art estimate vs. the maximum-entropy (ME) estimate, for the knowledge sets 2.2a (s1,3, s2,3), 2.2b (s1,2, s1,3), 2.2c (s1,2, s2,3), and 2.3 (s1,2, s1,3, s2,3); the ME error is consistently lower]
Legend: 200 queries
Backward Estimation: Given s1,2,3, predicting …
[Figure: box plots (quartiles and median) of the absolute estimation error over 200 queries, DB2 vs. ME, when predicting s1,2 (MAKE = ? AND MODEL = ?), s1,3 (MAKE = ? AND COLOR = ?), and s2,3 (MODEL = ? AND COLOR = ?) from knowledge of s1,2,3]
Legend: 200 queries
Computation Cost
[Figure: time until convergence of iterative scaling, plotted against the number of predicates |P| (5 to 18) for different numbers of known selectivities |T| (0 to 10)]
Related Work
Selectivity Estimation
SAC+79 P. G. Selinger et al.: Access Path Selection in a Relational DBMS. SIGMOD 1979
Chr83 S. Christodoulakis: Estimating record selectivities. Inf. Syst. 8(2): 105-115 (1983)
Lyn88 C. A. Lynch: Selectivity Estimation and Query Optimization in Large Databases with Highly Skewed Distribution of Column Values. VLDB 1988: 240-251
PC84 G. Piatetsky-Shapiro, C. Connell: Accurate Estimation of the Number of Tuples Satisfying a Condition. SIGMOD Conference 1984: 256-276
PIH+96 V. Poosala et al.: Improved histograms for selectivity estimation of range predicates. SIGMOD 1996
Recommending, Constructing, and Maintaining Multivariate Statistics
AC99 A. Aboulnaga, S. Chaudhuri: Self-tuning Histograms: Building Histograms Without Looking at Data. SIGMOD 1999: 181-192
BCG01 N. Bruno, S. Chaudhuri, L. Gravano: STHoles: A Multidimensional Workload-Aware Histogram. SIGMOD 2001
BC02 N. Bruno, S. Chaudhuri: Exploiting Statistics on Query Expressions for Optimization. SIGMOD 2002
BC03 N. Bruno, S. Chaudhuri: Efficient Creation of Statistics over Query Expressions. ICDE 2003
BC04 N. Bruno, S. Chaudhuri: Conditional Selectivity for Statistics on Query Expressions. SIGMOD 2004: 311-322
SLM+01 M. Stillger, G. Lohman, V. Markl, M. Kandil: LEO – DB2's Learning Optimizer. VLDB 2001
IMH+04 I. F. Ilyas, V. Markl, P. J. Haas, P. G. Brown, A. Aboulnaga: CORDS: Automatic discovery of correlations and soft functional dependencies. SIGMOD 2004
CN00 S. Chaudhuri, V. Narasayya: Automating Statistics Management for Query Optimizers. ICDE 2000: 339-348
DGR01 A. Deshpande, M. Garofalakis, R. Rastogi: Independence is Good: Dependency-Based Histogram Synopses for High-Dimensional Data. SIGMOD 2001
GJW+03 C. Galindo-Legaria, M. Joshi, F. Waas, et al.: Statistics on Views. VLDB 2003: 952-962
GTK01 L. Getoor, B. Taskar, D. Koller: Selectivity Estimation using Probabilistic Models. SIGMOD 2001
PI97 V. Poosala, Y. Ioannidis: Selectivity Estimation without value independence. VLDB 1997
Entropy and Maximum Entropy
Sha48 C. E. Shannon: A mathematical theory of communication. Bell System Technical Journal 27: 379-423 and 623-656, July and October 1948
DR72 J. N. Darroch, D. Ratcliff: Generalized iterative scaling for log-linear models. The Annals of Mathematical Statistics 43, 1972: 1470-1480
GP00 W. Greiff, J. Ponte: The maximum-entropy approach and probabilistic IR models. ACM TOIS 18(3): 246-287, 2000
GS85 S. Guiasu, A. Shenitzer: The principle of maximum entropy. The Mathematical Intelligencer 7(1), 1985
Conclusions
Problem: inconsistent cardinality model and bias in today's query optimizers, due to overlapping multivariate statistics (MD histograms, etc.)
To reduce bias, today's optimizers only use a consistent subset of the available multivariate statistics.
Cardinality estimates stay suboptimal despite better information.
Bias towards plans without proper statistics ("fleeing from knowledge to ignorance").
Solution: maximizing information entropy
Generalizes the concepts of uniformity and independence used in today's query optimizers.
All statistics are utilized: cardinality estimates improve, some by orders of magnitude.
The cardinality model is consistent: no bias towards particular plans.
Consistent estimates are computed in subsecond time for up to 10 predicates per table; however, the algorithm is exponential in the number of predicates.
Not covered in the talk (see paper): reducing algorithm complexity through pre-processing.
Impact on query performance: speedup, sometimes by orders of magnitude.
Future work: extension to join estimates.