Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline...

99
V. Megalooikonomou Fractals and Databases (based on notes by C. Faloutsos at CMU) Principles of Database Systems

Transcript of Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline...

Page 1: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

V. Megalooikonomou

Fractals and Databases

(based on notes by C. Faloutsos at CMU)

Principles of Database Systems

Page 2: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

2

Indexing - Detailed outline

fractals intro applications

Page 3: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

3

Intro to fractals - outline Motivation – 3 problems / case studies Definition of fractals and power laws Solutions to posed problems More examples and tools Discussion - putting fractals to work! Conclusions – practitioner’s guide Appendix: gory details - boxcounting

plots

Page 4: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

4

Road end-points of Montgomery county:

•Q1: how many d.a. for an R-tree?

•Q2 : distribution?

•not uniform

•not Gaussian

•no rules??

Problem #1: GIS - points

Page 5: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

5

Problem #2 - spatial d.m.Galaxies (Sloan Digital Sky Survey -B. Nichol)

- ‘spiral’ and ‘elliptical’ galaxies

(stores and households ...)

- patterns?

- attraction/repulsion?

- how many ‘spi’ within r from an ‘ell’?

Page 6: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

6

Problem #3: traffic disk trace (from HP - J. Wilkes); Web

traffic - fit a model

time

#bytes

Poisson

- how many explosions to expect?

- queue length distr.?

Page 7: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

7

Common answer: Fractals / self-similarities / power laws Seminal works from Hilbert, Minkowski,

Cantor, Mandelbrot, (Hausdorff, Lyapunov, Ken Wilson, …)

Page 8: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

8

Road map Motivation – 3 problems / case studies Definition of fractals and power laws Solutions to posed problems More examples and tools Discussion - putting fractals to work! Conclusions – practitioner’s guide Appendix: gory details - boxcounting

plots

Page 9: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

9

What is a fractal?

= self-similar point set, e.g., Sierpinski triangle:

...zero area;

infinite perimeter!

Page 10: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

10

Definitions (cont’d) Paradox: Infinite perimeter ; Zero area! ‘dimensionality’: between 1 and 2 actually: Log(3)/Log(2) = 1.58...

Page 11: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

11

Dfn of fd:

ONLY for a perfectly self-similar point set:

=log(n)/log(f) = log(3)/log(2) = 1.58a perfectly self-similar object with n similar pieces each scaled down by a factor f

...zero area;

infinite length!

Page 12: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

12

Intrinsic (‘fractal’) dimension Q: fractal dimension of

a line? A: 1 (= log(2)/log(2)!)

Page 13: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

13

Intrinsic (‘fractal’) dimension Q: dfn for a given

set of points?

42

33

24

15

yx

Page 14: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

14

Intrinsic (‘fractal’) dimension Q: fractal dimension of

a line? A: nn ( <= r ) ~ r^1(‘power law’: y=x^a)

Q: fd of a plane? A: nn ( <= r ) ~ r^2fd== slope of (log(nn) vs

log(r) )

Page 15: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

15

Intrinsic (‘fractal’) dimension Algorithm, to estimate it?Notice avg nn(<=r) is exactly

tot#pairs(<=r) / (2*N)

including ‘mirror’ pairs

Page 16: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

16

Sierpinsky triangle

log( r )

log(#pairs within <=r )

1.58

== ‘correlation integral’

Page 17: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

17

Observations: Euclidean objects have integer fractal

dimensions point: 0 lines and smooth curves: 1 smooth surfaces: 2

fractal dimension -> roughness of the periphery

Page 18: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

18

Important properties fd = embedding dimension -> uniform

pointset a point set may have several fd,

depending on scale

Page 19: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

19

Road map Motivation – 3 problems / case studies Definition of fractals and power laws Solutions to posed problems More examples and tools Discussion - putting fractals to work! Conclusions – practitioner’s guide Appendix: gory details - boxcounting

plots

Page 20: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

20

Cross-roads of Montgomery county:

•any rules?

Problem #1: GIS points

Page 21: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

21

Solution #1A: self-similarity -> <=> fractals <=> scale-free <=> power-laws

(y=x^a, F=C*r^(-2))

avg#neighbors(<= r ) = r^D

log( r )

log(#pairs(within <= r))

1.51

Page 22: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

22

Solution #1A: self-similarity avg#neighbors(<= r

) ~ r^(1.51)

log( r )

log(#pairs(within <= r))

1.51

Page 23: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

23

Examples:MG county Montgomery County of MD (road end-

points)

Page 24: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

24

Examples:LB county Long Beach county of CA (road end-

points)

Page 25: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

25

Solution#2: spatial d.m.Galaxies ( ‘BOPS’ plot - [sigmod2000])

log(#pairs)

log(r)

Page 26: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

26

Solution#2: spatial d.m.

log(r)

log(#pairs within <=r )

spi-spi

spi-ell

ell-ell

- 1.8 slope

- plateau!

-repulsion!

Page 27: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

27

spatial d.m.

log(r)

log(#pairs within <=r )

spi-spi

spi-ell

ell-ell

- 1.8 slope

- plateau!

-repulsion!

Page 28: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

28

spatial d.m.

r1r2

r1

r2

Heuristic on choosing # of clusters

Page 29: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

29

spatial d.m.

log(r)

log(#pairs within <=r )

spi-spi

spi-ell

ell-ell

- 1.8 slope

- plateau!

-repulsion!

Page 30: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

30

spatial d.m.

log(r)

log(#pairs within <=r )

spi-spi

spi-ell

ell-ell

- 1.8 slope

- plateau!

-repulsion!!-duplicates

Page 31: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

31

Solution #3: traffic disk traces: self-similar:

time

#bytes

Page 32: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

32

Solution #3: traffic disk traces (80-20 ‘law’ = ‘multifractal’)

time

#bytes

20% 80%

Page 33: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

33

Solution#3: trafficClarification: fractal: a set of points that is self-similar multifractal: a probability density function

that is self-similar

Many other time-sequences are bursty/clustered: (such as?)

Page 34: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

34

Tape accesses

time

Tape#1 Tape# N

# tapes needed, to retrieve n records?

(# days down, due to failures / hurricanes / communication noise...)

Page 35: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

35

Tape accesses

time

Tape#1 Tape# N

# tapes retrieved

# qual. records

50-50 = Poisson

real

Page 36: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

36

Road map Motivation – 3 problems / case studies Definition of fractals and power laws Solutions to posed problems More tools and examples Discussion - putting fractals to work! Conclusions – practitioner’s guide Appendix: gory details - boxcounting

plots

Page 37: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

37

More tools Zipf’s law Korcak’s law / “fat fractals”

Page 38: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

38

A famous power law: Zipf’s law

• Q: vocabulary word frequency in a document - any pattern?

aaron zoo

freq.

Page 39: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

39

A famous power law: Zipf’s law

• Bible - rank vs frequency (log-log)

log(rank)

log(freq)

“a”

“the”

Page 40: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

40

A famous power law: Zipf’s law

• Bible - rank vs frequency (log-log)

• similarly, in many otherlanguages; for customers and sales volume; city populations etc etc

log(rank)

log(freq)

Page 41: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

41

A famous power law: Zipf’s law

•Zipf distr:

freq = 1/ rank

•generalized Zipf:

freq = 1 / (rank)^a

log(rank)

log(freq)

Page 42: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

42

Olympic medals (Sidney):

y = -0.9676x + 2.3054R2 = 0.9458

0

0.5

1

1.5

2

2.5

0 0.5 1 1.5 2

Series1Linear (Series1)

rank

log(#medals)

Page 43: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

43

More power laws: areas –Korcak’s law

Scandinavian lakes

Any pattern?

Page 44: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

44

More power laws: areas –Korcak’s law

Scandinavian lakes area vs complementary cumulative count (log-log axes)

log(count( >= area))

log(area)

Page 45: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

45

More power laws: Korcak

Japan islands

Page 46: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

46

More power laws: Korcak

Japan islands;

area vs cumulative count (log-log axes) log(area)

log(count( >= area))

Page 47: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

47

(Korcak’s law: Aegean islands)

Page 48: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

48

Korcak’s law & “fat fractals”

How to generate such regions?

Page 49: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

49

Korcak’s law & “fat fractals”Q: How to generate such regions?A: recursively, from a single region

Page 50: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

50

so far we’ve seen: concepts:

fractals, multifractals and fat fractals

tools: correlation integral (= pair-count plot) rank/frequency plot (Zipf’s law) CCDF (Korcak’s law)

Page 51: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

51

Road map Motivation – 3 problems / case studies Definition of fractals and power laws Solutions to posed problems More tools and examples Discussion - putting fractals to work! Conclusions – practitioner’s guide Appendix: gory details - boxcounting

plots

Page 52: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

52

Other applications: Internet How does the internet look like?

CMU

Page 53: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

53

Other applications: Internet How does the internet look like? Internet routers: how many neighbors

within h hops?

CMU

Page 54: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

54

(reminder: our tool-box:) concepts:

fractals, multifractals and fat fractals

tools: correlation integral (= pair-count plot) rank/frequency plot (Zipf’s law) CCDF (Korcak’s law)

Page 55: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

55

Internet topology Internet routers: how many neighbors

within h hops?

Reachability function: number of neighbors within r hops, vs r (log-log).

Mbone routers, 1995log(hops)

log(#pairs)

2.8

Page 56: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

56

More power laws on the Internet

degree vs rank, for Internet domains (log-log) [sigcomm99]

log(rank)

log(degree)

-0.82

Page 57: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

57

More power laws - internet pdf of degrees: (slope: 2.2 )

Log(count)

Log(degree)

-2.2

Page 58: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

58

Even more power laws on the Internet

Scree plot for Internet domains (log-log) [sigcomm99]

log(i)

log( i-th eigenvalue)

0.47

Page 59: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

59

More apps: Brain scans

Oct-trees; brain-scans

octree levels

Log(#octants)

2.63 = fd

Page 60: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

60

More apps: Medical images

[Burdett et al, SPIE ‘93]: benign tumors: fd ~ 2.37 malignant: fd ~ 2.56

Page 61: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

61

More fractals: cardiovascular system: 3 (!) stock prices (LYCOS) - random walks: 1.5

Coastlines: 1.2-1.58 (Norway!)

1 year 2 years

Page 62: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

62

Page 63: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

63

More power laws duration of UNIX jobs [Harchol-Balter] Energy of earthquakes (Gutenberg-

Richter law) [simscience.org]

log(freq)

magnitudeday

amplitude

Page 64: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

64

Even more power laws: publication counts (Lotka’s law) Distribution of UNIX file sizes Income distribution (Pareto’s law) web hit counts [Huberman]

Page 65: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

65

Power laws, cont’ed In- and out-degree distribution of web

sites [Barabasi], [IBM-CLEVER] length of file transfers [Bestavros+] Click-stream data (w/ A. Montgomery

(CMU-GSIA) + MediaMetrix)

Page 66: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

66

Road map Motivation – 3 problems / case studies Definition of fractals and power laws Solutions to posed problems More examples and tools Discussion - putting fractals to work! Conclusions – practitioner’s guide Appendix: gory details - boxcounting

plots

Page 67: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

67

Settings for fractals:Points; areas (-> fat fractals), eg:

Page 68: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

68

Settings for fractals:Points; areas, eg: cities/stores/hospitals, over earth’s

surface time-stamps of events (customer

arrivals, packet losses, criminal actions) over time

regions (sales areas, islands, patches of habitats) over space

Page 69: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

69

Settings for fractals: customer feature vectors (age, income,

frequency of visits, amount of sales per visit)

‘good’ customers

‘bad’ customers

Page 70: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

70

Some uses of fractals: Detect non-existence of rules (if points

are uniform) Detect non-homogeneous regions (eg.,

legal login time-stamps may have different fd than intruders’)

Estimate number of neighbors / customers / competitors within a radius

Page 71: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

71

Multi-FractalsSetting: points or objects, w/ some value,

eg: cities w/ populations positions on earth and amount of

gold/water/oil underneath product ids and sales per product people and their salaries months and count of accidents

Page 72: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

72

Use of multifractals: Estimate tape/disk accesses

how many of the 100 tapes contain my 50 phonecall records?

how many days without an accident?

time

Tape#1 Tape# N

Page 73: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

73

Use of multifractals how often do we exceed the threshold?

time

#bytes

Poisson

Page 74: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

74

Use of multifractals cont’d Extrapolations for/from samples

time

#bytes

Page 75: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

75

Use of multifractals cont’d How many distinct products account for

90% of the sales?20% 80%

Page 76: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

76

Road map Motivation – 3 problems / case studies Definition of fractals and power laws Solutions to posed problems More examples and tools Discussion - putting fractals to work! Conclusions – practitioner’s guide Appendix: gory details - boxcounting

plots

Page 77: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

77

Conclusions Real data often disobey textbook

assumptions (Gaussian, Poisson, uniformity, independence) avoid ‘mean’ - use median, or even better,

use:

fractals, self-similarity, and power laws, to find patterns - specifically:

Page 78: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

78

Conclusions tool#1: (for points) ‘correlation

integral’: (#pairs within <= r) vs (distance r)

tool#2: (for categorical values) rank-frequency plot (a’la Zipf)

tool#3: (for numerical values) CCDF: Complementary cumulative distr. function (#of elements with value >= a )

Page 79: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

79

Practitioner’s guide: tool#1: #pairs vs distance, for a set of objects,

with a distance function (slope = intrinsic dimensionality)

log(hops)

log(#pairs)

2.8

log( r )

log(#pairs(within <= r))

1.51internet

MGcounty

Page 80: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

80

Practitioner’s guide: tool#2: rank-frequency plot (for categorical

attributes)

log(rank)

log(degree)

-0.82

internet domains Biblelog(freq)

log(rank)

Page 81: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

81

Practitioner’s guide: tool#3: CCDF, for (skewed) numerical

attributes, eg. areas of islands/lakes, UNIX jobs...)

log(count( >= area))

log(area)

scandinavian lakes

Page 82: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

82

Books Strongly recommended intro book:

Manfred Schroeder Fractals, Chaos, Power Laws: Minutes from an Infinite ParadiseW.H. Freeman and Company, 1991

Classic book on fractals: B. Mandelbrot Fractal Geometry of Nature,

W.H. Freeman, 1977

Page 83: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

83

References [ieeeTN94] W. E. Leland, M.S. Taqqu, W.

Willinger, D.V. Wilson, On the Self-Similar Nature of Ethernet Traffic, IEEE Transactions on Networking, 2, 1, pp 1-15, Feb. 1994.

[pods94] Christos Faloutsos and Ibrahim Kamel, Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension,PODS, Minneapolis, MN, May 24-26, 1994, pp. 4-13

Page 84: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

84

References [vldb95] Alberto Belussi and Christos Faloutsos,

Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension Proc. of VLDB, p. 299-310, 1995

[vldb96] Christos Faloutsos, Yossi Matias and Avi Silberschatz, Modeling Skewed Distributions Using Multifractals and the `80-20 Law’ Conf. on Very Large Data Bases (VLDB), Bombay, India, Sept. 1996.

Page 85: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

85

References [vldb96] Christos Faloutsos and Volker Gaede

Analysis of the Z-Ordering Method Using the Hausdorff Fractal Dimension VLD, Bombay, India, Sept. 1996

[sigcomm99] Michalis Faloutsos, Petros Faloutsos and Christos Faloutsos, What does the Internet look like? Empirical Laws of the Internet Topology, SIGCOMM 1999

Page 86: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

86

References [icde99] Guido Proietti and Christos Faloutsos,

I/O complexity for range queries on region data stored using an R-tree International Conference on Data Engineering (ICDE), Sydney, Australia, March 23-26, 1999

[sigmod2000] Christos Faloutsos, Bernhard Seeger, Agma J. M. Traina and Caetano Traina Jr., Spatial Join Selectivity Using Power Laws, SIGMOD 2000

Page 87: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

87

Appendix - Gory details Bad news: There are more than one

fractal dimensions Minkowski fd; Hausdorff fd; Correlation fd;

Information fd

Great news: they can all be computed fast! they usually have nearby values

Page 88: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

88

Fast estimation of fd(s): How, for the (correlation) fractal

dimension? A: Box-counting plot:

log( r )

rpi

log(sum(pi ^2))

Page 89: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

89

Definitions pi : the percentage (or count) of points

in the i-th cell r: the side of the grid

Page 90: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

90

Fast estimation of fd(s): compute sum(pi^2) for another grid

side, r’

log( r )

r’

pi’

log(sum(pi ^2))

Page 91: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

91

Fast estimation of fd(s): etc; if the resulting plot has a linear part,

its slope is the correlation fractal dimension D2

log( r )

log(sum(pi ^2))

Page 92: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

92

Definitions (cont’d) Many more fractal dimensions Dq (related

to Renyi entropies):

)log()log(

1)log(

)log(1

1

1 rpp

D

qrp

qD

ii

qi

q

∂∂

=

≠∂

∂−

=

Page 93: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

93

Hausdorff or box-counting fd: Box counting plot: Log( N ( r ) ) vs Log (

r) r: grid side N (r ): count of non-empty cells (Hausdorff) fractal dimension D0:

)log())(log(

0 rrND

∂∂

−=

Page 94: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

94

Definitions (cont’d) Hausdorff fd:

r

log(r)

log(#non-empty cells)

D0

Page 95: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

95

Observations q=0: Hausdorff fractal dimension q=2: Correlation fractal dimension

(identical to the exponent of the number of neighbors vs radius)

q=1: Information fractal dimension

Page 96: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

96

Observations, cont’d in general, the Dq’s take similar, but not

identical, values. except for perfectly self-similar point-

sets, where Dq=Dq’ for any q, q’

Page 97: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

97

Examples:MG county Montgomery County of MD (road end-

points)

Page 98: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

98

Examples:LB county Long Beach county of CA (road end-

points)

Page 99: Principles of Database Systems · Principles of Database Systems. 2. Indexing - Detailed outline fractals intro applications. 3. Intro to fractals - outline Motivation – 3 problems

99

Conclusions many fractal dimensions, with nearby

values can be computed quickly

(O(N) or O(N log(N))

(code: on the web)