150 fractals intro - Carnegie Mellon School of Computer Science · 2019-09-16 · •Definition of...
15-826 C. Faloutsos
1
15-826: Multimedia Databases and Data Mining
Lecture #8: Fractals - introduction
C. Faloutsos
15-826 Copyright: C. Faloutsos (2019) 2
Must-read Material
• Christos Faloutsos and Ibrahim Kamel, Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension, Proc. ACM SIGACT-SIGMOD-SIGART PODS, May 1994, pp. 4-13, Minneapolis, MN.
Recommended Material
optional, but very useful:
• Manfred Schroeder, Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise, W.H. Freeman and Company, 1991
– Chapter 10: boxcounting method
– Chapter 1: Sierpinski triangle
Outline
Goal: ‘Find similar / interesting things’
• Intro to DB
• Indexing - similarity search
• Data Mining
Indexing - Detailed outline
• primary key indexing
• secondary key / multi-key indexing
• spatial access methods
– z-ordering
– R-trees
– misc
• fractals
– intro
– applications
• text
Intro to fractals - outline
• Motivation – 3 problems / case studies
• Definition of fractals and power laws
• Solutions to posed problems
• More examples and tools
• Discussion - putting fractals to work!
• Conclusions – practitioner’s guide
• Appendix: gory details - boxcounting plots
Problem
• What patterns are in real k-dim points?
Conclusions
• What patterns are in real k-dim points?
• Self-similarity ( = fractals -> power laws)
Problem #1: GIS - points
Road end-points of Montgomery county:
• Q1: how many d.a. (disk accesses) for an R-tree?
• Q2: distribution?
• not uniform
• not Gaussian
• no rules??
Problem #2 - spatial d.m. (data mining)
Galaxies (Sloan Digital Sky Survey w/ B. Nichol)
- ‘spiral’ and ‘elliptical’ galaxies
(stores and households ...)
- patterns?
- attraction/repulsion?
- how many ‘spi’ within r from an ‘ell’?
Problem #3: traffic
• disk trace (from HP - J. Wilkes); Web traffic - fit a model
• how many explosions to expect?
• queue length distr.?
[Plot: # bytes vs. time; Poisson (indep., ident. distr.) traffic shown for comparison]
Q: Then, how to generate such bursty traffic?
Common answer:
• Fractals / self-similarities / power laws
• Seminal works from Hilbert, Minkowski, Cantor, Mandelbrot, (Hausdorff, Lyapunov, Ken Wilson, …)
Road map
• Motivation – 3 problems / case studies
• Definition of fractals and power laws
• Solutions to posed problems
• More examples and tools
• Discussion - putting fractals to work!
• Conclusions – practitioner’s guide
• Appendix: gory details - boxcounting plots
What is a fractal?
= self-similar point set, e.g., Sierpinski triangle:
... zero area;
infinite length!
Dimensionality??
Definitions (cont’d)
• Paradox: Infinite perimeter; Zero area!
• ‘dimensionality’: between 1 and 2
• actually: Log(3)/Log(2) = 1.58...
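Dfn of fd:
ONLY for a perfectly self-similar point set:
fd = log(n)/log(f) = log(3)/log(2) = 1.58
(n = number of self-similar copies, f = factor by which each copy is scaled down; for the Sierpinski triangle, n = 3 and f = 2)
... zero area;
infinite length!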
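The log(n)/log(f) formula can be checked numerically. The sketch below is illustrative only: the function name `similarity_dimension` and the choice of test shapes (segment, filled square) are assumptions, not part of the lecture.

```python
import math

def similarity_dimension(n_copies: int, scale_factor: float) -> float:
    """log(n)/log(f) for a perfectly self-similar set built from
    n_copies pieces, each shrunk by scale_factor."""
    return math.log(n_copies) / math.log(scale_factor)

# Sierpinski triangle: 3 copies, each half the size.
sierpinski_fd = similarity_dimension(3, 2)
# Sanity checks on Euclidean shapes (assumed examples):
line_fd = similarity_dimension(2, 2)    # a segment splits into 2 halves
square_fd = similarity_dimension(4, 2)  # a filled square splits into 4 quarter-squares

print(f"Sierpinski: {sierpinski_fd:.2f}, line: {line_fd:.2f}, square: {square_fd:.2f}")
```

The Euclidean shapes come out with integer dimensions, while the Sierpinski triangle lands between 1 and 2.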
Intrinsic (‘fractal’) dimension
• Q: fractal dimension of a line?
• A: 1 (= log(2)/log(2)!)
Intrinsic (‘fractal’) dimension
• Q: dfn for a given set of points?
[Table: sample points, with columns x and y]
Intrinsic (‘fractal’) dimension
• Q: fractal dimension of a line?
• A: nn ( <= r ) ~ r^1 (‘power law’: y = x^a), where nn ( <= r ) is the number of neighbors within distance r
• Q: fd of a plane?
• A: nn ( <= r ) ~ r^2
fd == slope of ( log(nn) vs log(r) )
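A minimal sketch of the slope idea, assuming synthetic data: uniform random points on a diagonal line and in the unit square (both invented here for illustration), with the slope fitted by ordinary least squares on the log-log values. The helper names `avg_neighbors` and `loglog_slope` are hypothetical.

```python
import numpy as np

def avg_neighbors(points, radii):
    """Average number of neighbors within each radius (the point itself excluded)."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    n = len(points)
    return np.array([((dists <= r).sum() - n) / n for r in radii])

def loglog_slope(radii, nn):
    """Least-squares slope of log(nn) vs log(r)."""
    slope, _intercept = np.polyfit(np.log(radii), np.log(nn), 1)
    return slope

rng = np.random.default_rng(0)
n = 1000
t = rng.random(n)
line = np.column_stack([t, t])   # points on a line, embedded in 2-d
plane = rng.random((n, 2))       # points filling the unit square
radii = np.logspace(-2, -1, 8)   # assumed range of radii, well below the data extent

fd_line = loglog_slope(radii, avg_neighbors(line, radii))
fd_plane = loglog_slope(radii, avg_neighbors(plane, radii))
print(f"line: {fd_line:.2f}, plane: {fd_plane:.2f}")
```

The estimates should come out near 1 and near 2; they fall slightly short because points close to the border have fewer neighbors.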
Intrinsic (‘fractal’) dimension
• Local fractal dimension of point ‘P’?
• A: nnP ( <= r ) ~ r^1
• If this equation holds for several values of r,
• Then, the local fractal dimension of point P:
• Local fd = exponent = 1
Intrinsic (‘fractal’) dimension
• Local fractal dimension of point ‘P’?
• A: nnP ( <= r ) ~ r^1
• If this is true for all points of the cloud
• Then the exponent is the global f.d.
• Or simply the f.d.
Intrinsic (‘fractal’) dimension
• Global fractal dimension?
• A: if sum over all P of [ nnP ( <= r ) ] ~ r^1
• Then: exponent = global f.d.
Intrinsic (‘fractal’) dimension
• Local fractal dimension for Sierpinski triangle?
• 2x radius, 3x points
• n ~ r ^ (log3/log2)
Intrinsic (‘fractal’) dimension
• Algorithm, to estimate it?
Notice
• sum over all P of [ nnP (<=r) ] is exactly tot#pairs(<=r), including ‘mirror’ pairs
Sierpinski triangle
[Plot: log(#pairs within <= r) vs. log( r ); slope 1.58]
== ‘correlation integral’
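The pair-count estimate can be tried on a synthetic Sierpinski triangle. This is a sketch under stated assumptions: the points are generated with the "chaos game" (an assumed generator, not taken from the slides), the range of radii is picked by hand, and a brute-force distance matrix stands in for any faster method.

```python
import numpy as np

rng = np.random.default_rng(42)
vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])

# Chaos game: jump halfway toward a randomly chosen vertex at each step.
n_points, burn_in = 1500, 20
p = np.array([0.1, 0.1])
pts = []
for i in range(n_points + burn_in):
    p = (p + vertices[rng.integers(3)]) / 2
    if i >= burn_in:
        pts.append(p)
pts = np.array(pts)

# All pairwise distances (brute force, O(n^2) memory).
diffs = pts[:, None, :] - pts[None, :, :]
dists = np.sqrt((diffs ** 2).sum(axis=-1))

radii = np.logspace(np.log10(0.02), np.log10(0.2), 10)
# Sum over all points P of nn_P(<= r): ordered ('mirror') pairs, self-pairs removed.
pair_counts = np.array([(dists <= r).sum() - n_points for r in radii])

slope, _intercept = np.polyfit(np.log(radii), np.log(pair_counts), 1)
print(f"estimated fd = {slope:.2f}, log(3)/log(2) = {np.log(3) / np.log(2):.2f}")
```

Every count is even because each pair is counted twice, once from each endpoint; halving the counts shifts the line down but leaves the slope unchanged.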
Observations:
• Euclidean objects have integer fractal dimensions
– point: 0
– lines and smooth curves: 1
– smooth surfaces: 2
• fractal dimension -> roughness of the periphery
Important properties
• fd = embedding dimension -> uniform pointset
• a point set may have several fd, depending on scale
[Figure: a point set viewed at different scales, labeled 2-d, 1-d, 0-d]
Road map
• Motivation – 3 problems / case studies
• Definition of fractals and power laws
• Solutions to posed problems
• More examples and tools
• Discussion - putting fractals to work!
• Conclusions – practitioner’s guide
• Appendix: gory details - boxcounting plots
Problem #1: GIS points
Cross-roads of Montgomery county:
• any rules?
Solution #1
A: self-similarity ->
• <=> fractals
• <=> scale-free
• <=> power-laws (y=x^a, F=C*r^(-2))
• avg#neighbors(<= r ) ~ r^D, here ~ r^(1.51)
[Plot: log(#pairs(within <= r)) vs. log( r ); slope 1.51]
Examples: MG county
• Montgomery County of MD (road end-points)
Examples: LB county
• Long Beach county of CA (road end-points)
Solution #2: spatial d.m.
Galaxies (‘BOPS’ plot - [sigmod2000])
[Plot: log(#pairs within <= r) vs. log(r); curves: spi-spi, spi-ell, ell-ell]
- 1.8 slope
- plateau!
- repulsion!!
- duplicates
Spatial d.m.
[Figure: radii r1 and r2]
Heuristic on choosing # of clusters
Solution #3: traffic
• disk traces: self-similar:
[Plot: #bytes vs. time]
Solution #3: traffic
• disk traces (80-20 ‘law’ = ‘multifractal’)
[Plot: #bytes vs. time; labels: 20%, 80%]
80-20 / multifractals
• p ; (1-p) in general
• yes, there are dependencies
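One way to generate such a trace is sketched below. It assumes the usual recursive construction (often called the b-model): split the time interval in half, give a fraction p of the bytes to one half and 1 - p to the other, and repeat inside each half. The function name `b_model`, the random choice of which half gets the larger share, and all parameter values are assumptions for illustration.

```python
import numpy as np

def b_model(total, p, levels, rng):
    """Spread `total` bytes over 2**levels time slots. At every split a
    fraction p goes to one half and (1 - p) to the other; which half gets
    p is chosen at random (an assumption of this sketch)."""
    traffic = np.array([float(total)])
    for _ in range(levels):
        left_share = np.where(rng.random(traffic.size) < 0.5, p, 1.0 - p)
        left = traffic * left_share
        right = traffic - left
        # Interleave so each parent slot is replaced by its (left, right) children.
        traffic = np.column_stack([left, right]).ravel()
    return traffic

rng = np.random.default_rng(1)
trace = b_model(total=1_000_000, p=0.8, levels=10, rng=rng)

# Share of all bytes carried by the busiest 20% of the time slots.
busiest = np.sort(trace)[::-1]
share_top_20pct = busiest[: len(trace) // 5].sum() / trace.sum()
print(f"{len(trace)} slots, busiest 20% carry {share_top_20pct:.0%} of the bytes")
```

With p = 0.5 every slot gets the same amount, so the bursts disappear; the further p is from 0.5, the burstier the trace.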
More on 80/20: PQRS
• Part of ‘self-* storage’ project [Wang+’02]
[Plot: cylinder# vs. time]
[Figure: 2x2 grid of probabilities: p q / r s]
Solution #3: traffic
Clarification:
• fractal: a set of points that is self-similar
• multifractal: a probability density function that is self-similar
Many other time-sequences are bursty/clustered: (such as?)
Example:
• network traffic
http://repository.cs.vt.edu/lbl-conn-7.tar.Z
Web traffic
• [Crovella Bestavros, SIGMETRICS’96]
[Plots at four time scales: 1000 sec; 100 sec; 10 sec; 1 sec]
Tape accesses
[Figure: accesses over time, Tape#1 ... Tape#N]
# tapes needed, to retrieve n records?
(# days down, due to failures / hurricanes / communication noise...)
[Plot: # tapes retrieved vs. # qual. records; curves: 50-50 = Poisson, real]
Road map
• Motivation – 3 problems / case studies
• Definition of fractals and power laws
• Solutions to posed problems
• More tools and examples
• Discussion - putting fractals to work!
• Conclusions – practitioner’s guide
• Appendix: gory details - boxcounting plots
A counter-intuitive example
• avg degree is, say 3.3
• pick a node at random
– guess its degree, exactly (-> “mode”)
– what is the degree you expect it to have?
• A: 1!!
• A’: very skewed distr.
• Corollary: the mean is meaningless!
• (and std -> infinity (!))
[Plot: count vs. degree; avg: 3.3]
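The gap between mean and mode is easy to reproduce on synthetic data. The sketch below assumes degrees drawn from a zeta (Zipf) distribution with exponent 2.2; that choice is an illustration only, picked because it is heavily skewed and has infinite variance, and it is not a fit to any real graph.

```python
import numpy as np

rng = np.random.default_rng(3)
# Assumed degree distribution: P(degree = k) proportional to k^(-2.2), k = 1, 2, ...
degrees = rng.zipf(2.2, size=100_000)

values, counts = np.unique(degrees, return_counts=True)
mode = int(values[counts.argmax()])   # the single most likely degree
mean = degrees.mean()
median = float(np.median(degrees))
std = degrees.std()                   # keeps growing with sample size for this exponent

print(f"mode = {mode}, median = {median}, mean = {mean:.2f}, sample std = {std:.1f}")
```

The best exact guess for a random node's degree is the mode, which is 1 here, even though the average is several times larger; a handful of very high-degree nodes pull the mean up.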
Rank exponent R
• Power law in the degree distribution [SIGCOMM99]
internet domains
[Plot: log(degree) vs. log(rank); slope -0.82; labeled points: att.com, ibm.com]
More tools
• Zipf’s law
• Korcak’s law / “fat fractals”
A famous power law: Zipf’s law
• Q: vocabulary word frequency in a document - any pattern?
[Plot: freq. per vocabulary word, from “aaron” to “zoo”]
• Bible - rank vs frequency (log-log)
[Plot: log(freq) vs. log(rank); labeled points: “the”, “a”]
• similarly, in many other languages; for customers and sales volume; city populations etc etc
A famous power law: Zipf’s law
• Zipf distr: freq = 1 / rank
• generalized Zipf: freq = 1 / (rank)^a
[Plot: log(freq) vs. log(rank)]
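A minimal sketch of how the exponent a could be read off a rank/frequency plot. The "document" is synthetic: 50 made-up words whose counts are built to follow freq = 1000 / rank, so the fitted exponent should come out close to 1. The helper names are hypothetical, and a least-squares line on the log-log values is only one of several ways to estimate a.

```python
from collections import Counter

import numpy as np

def rank_frequency(tokens):
    """Word counts sorted from most to least frequent, with ranks 1, 2, 3, ..."""
    freqs = np.array(sorted(Counter(tokens).values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(freqs) + 1)
    return ranks, freqs

def zipf_exponent(ranks, freqs):
    """Exponent a in freq ~ 1 / rank^a, i.e. minus the log-log slope."""
    slope, _intercept = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return -slope

# Synthetic document (assumption): word w1 appears 1000 times, w2 500 times, ...
tokens = []
for rank in range(1, 51):
    tokens.extend([f"w{rank}"] * round(1000 / rank))

ranks, freqs = rank_frequency(tokens)
a = zipf_exponent(ranks, freqs)
print(f"fitted exponent a = {a:.2f}")
```

On real text the same two functions apply after tokenizing; the low-frequency tail is usually noisy, so the fit is often restricted to the top ranks.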
Olympic medals (Sydney):
[Plot: log(#medals) vs. rank; linear fit y = -0.9676x + 2.3054, R^2 = 0.9458]
Olympic medals (Sydney’00, Athens’04):
[Plot: log(#medals) vs. log( rank); series: Athens, Sydney]
TELCO data
[Plot: count of customers vs. # of service units; ‘best customer’ marked]
SALES data – store#96
[Plot: count of products vs. # units sold; “aspirin” marked]
More power laws: areas – Korcak’s law
Scandinavian lakes
Any pattern?
Scandinavian lakes: area vs complementary cumulative count (log-log axes)
[Plot: log(count( >= area)) vs. log(area)]
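The complementary cumulative count can be computed directly. The sketch below uses synthetic "areas" drawn from a Pareto distribution with exponent 1.0; the data and the exponent are assumptions for illustration, not measurements of any lakes. It also checks that the rank of the k-th largest item equals count( >= its area), which is why a rank plot and a CCDF plot show the same thing.

```python
import numpy as np

rng = np.random.default_rng(7)
exponent = 1.0                                     # assumed tail exponent
areas = rng.pareto(exponent, size=5000) + 1.0      # classical Pareto, minimum area 1

# Complementary cumulative count at log-spaced area thresholds.
thresholds = np.logspace(0, 2, 15)
ccdf_counts = np.array([(areas >= x).sum() for x in thresholds])

slope, _intercept = np.polyfit(np.log(thresholds), np.log(ccdf_counts), 1)
print(f"log-log slope of count(>= area) = {slope:.2f}")

# Rank of the k-th largest area equals the number of areas >= it.
largest_first = np.sort(areas)[::-1]
rank_matches = all((areas >= largest_first[k]).sum() == k + 1 for k in range(5))
```

A straight line on the log-log CCDF plot, with slope near minus the exponent, is the pattern the slide attributes to Korcak's law.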
More power laws: Korcak
Japan islands; area vs cumulative count (log-log axes)
[Plot: log(count( >= area)) vs. log(area)]
(Korcak’s law: Aegean islands)
Korcak’s law & “fat fractals”
Q: How to generate such regions?
A: recursively, from a single region
...
so far we’ve seen:
• concepts:
– fractals, multifractals and fat fractals
• tools:
– correlation integral (= pair-count plot)
– rank/frequency plot (Zipf’s law)
– CCDF (Korcak’s law)
(rank/frequency plot and CCDF: same info)
Next:
• More examples / applications
• Practitioner’s guide
• Box-counting: fast estimation of correlation integral
Conclusions
• What patterns are in real k-dim points?
• Self-similarity ( = fractals -> power laws)