Mehlhorn-Tsakalidis Revisited · Mehlhorn-Tsakalidis Revisited Christos Zaroliagis [Joint work with...

Post on 09-Oct-2020

4 views 0 download

Transcript of Mehlhorn-Tsakalidis Revisited · Mehlhorn-Tsakalidis Revisited Christos Zaroliagis [Joint work with...

Mehlhorn-Tsakalidis Revisited

Christos Zaroliagis

[Joint work with A. Kaporis, C. Makris, S. Sioutas, A. Tsakalidis, & K. Tsichlas]

1 / 24

Mehlhorn-Tsakalidis - once upon a time ...

2 / 24

Mehlhorn-Tsakalidis - once upon a time ...

2 / 24

Dynamic Dictionary Search - Problem

Given set of keys S = {X1, . . . , Xn}, Xi ∈ [a, b] ⊂ IR

Maintain (under key insertions and deletions) a non-decreasing ordering

P = {X(1), . . . , X(n)} of S such that:

given a query element y

find largest X(j) ∈ P : X(j) ≤ y

3 / 24

Dynamic Dictionary Search - Classical methods

Use arbitrary rule to select splitting element X(k) ∈ P that splits P into two

subsets, and recurse

e.g., in binary search, splitting element = middle element

Balanced search trees (AVL-trees, red-black trees, (a, b)-trees, etc):

search & update time: O(log n)

Bounds

• optimal for Pointer Machine

• RAM: Θ(√

log n

log log n

)[Andersson & Thorup, 2001]

4 / 24

Dynamic Dictionary Search - Classical methods

Use arbitrary rule to select splitting element X(k) ∈ P that splits P into two

subsets, and recurse

e.g., in binary search, splitting element = middle element

Balanced search trees (AVL-trees, red-black trees, (a, b)-trees, etc):

search & update time: O(log n)

Bounds

• optimal for Pointer Machine

• RAM: Θ(√

log n

log log n

)[Andersson & Thorup, 2001]

4 / 24

Dynamic Dictionary Search - Classical methods

Use arbitrary rule to select splitting element X(k) ∈ P that splits P into two

subsets, and recurse

e.g., in binary search, splitting element = middle element

Balanced search trees (AVL-trees, red-black trees, (a, b)-trees, etc):

search & update time: O(log n)

Bounds

• optimal for Pointer Machine

• RAM: Θ(√

log n

log log n

)[Andersson & Thorup, 2001]

4 / 24

Surpassing the Bounds - Expected Complexities

Interpolation Search [Peterson,57]

Select splitting elements by taking advantage of the statistical properties of

the keys

Splitting elements are spread closer to query key y

Expected search time (static problem):

Θ(log log n), uniform distribution

[Yao & Yao,76], [Gonnet,77], [Perl & Reingold,77], [Perl, Itai & Avni,78],

[Gonnet, Rogers & George, 80]

Θ(log log n), regular (non-uniform) distribution

[Willard,85]

5 / 24

Surpassing the Bounds - Expected Complexities

Interpolation Search [Peterson,57]

Select splitting elements by taking advantage of the statistical properties of

the keys

Splitting elements are spread closer to query key y

Expected search time (static problem):

Θ(log log n), uniform distribution

[Yao & Yao,76], [Gonnet,77], [Perl & Reingold,77], [Perl, Itai & Avni,78],

[Gonnet, Rogers & George, 80]

Θ(log log n), regular (non-uniform) distribution

[Willard,85]

5 / 24

Surpassing the Bounds - Expected Complexities

Interpolation Search [Peterson,57]

Select splitting elements by taking advantage of the statistical properties of

the keys

Splitting elements are spread closer to query key y

Expected search time (static problem):

Θ(log log n), uniform distribution

[Yao & Yao,76], [Gonnet,77], [Perl & Reingold,77], [Perl, Itai & Avni,78],

[Gonnet, Rogers & George, 80]

Θ(log log n), regular (non-uniform) distribution

[Willard,85]

5 / 24

Surpassing the Bounds - Expected Complexities

Interpolation Search [Peterson,57]

Select splitting elements by taking advantage of the statistical properties of

the keys

Splitting elements are spread closer to query key y

Expected search time (static problem):

Θ(log log n), uniform distribution

[Yao & Yao,76], [Gonnet,77], [Perl & Reingold,77], [Perl, Itai & Avni,78],

[Gonnet, Rogers & George, 80]

Θ(log log n), regular (non-uniform) distribution

[Willard,85]

5 / 24

Dynamic Interpolation Search

Scenario: µ-random insertions, random deletions

µ uniform [Frederickson,83], [Itai,Konheim & Rodeh, 81]

search: O(log log n) expected time

update: O(nε) time, 0 < ε < 1

6 / 24

Dynamic Interpolation Search

µ (f1, f2)-smooth (peak-less or peak-adjusting) [Mehlhorn & Tsakalidis,85]

Pr

[c2 −

c3 − c1

f1(n)≤ X ≤ c2 | c1 ≤ X ≤ c3

]≤ βf2(n)

n

smooth ⊃ {uniform, regular, bounded, other non-uniform}→ any probability distribution is (f1,Θ(n))-smooth

search: O(log log n) expected time

update:

{O(log log n) expected [Mehlhorn & Tsakalidis,85]

O(1) time (position given) [Andersson & Mattson,93]

7 / 24

Dynamic Interpolation Search

µ (f1, f2)-smooth (peak-less or peak-adjusting) [Mehlhorn & Tsakalidis,85]

Pr

[c2 −

c3 − c1

f1(n)≤ X ≤ c2 | c1 ≤ X ≤ c3

]≤ βf2(n)

n

smooth ⊃ {uniform, regular, bounded, other non-uniform}→ any probability distribution is (f1,Θ(n))-smooth

search: O(log log n) expected time

update:

{O(log log n) expected [Mehlhorn & Tsakalidis,85]

O(1) time (position given) [Andersson & Mattson,93]

7 / 24

Dynamic Interpolation Search

µ (f1, f2)-smooth (peak-less or peak-adjusting) [Mehlhorn & Tsakalidis,85]

Pr

[c2 −

c3 − c1

f1(n)≤ X ≤ c2 | c1 ≤ X ≤ c3

]≤ βf2(n)

n

smooth ⊃ {uniform, regular, bounded, other non-uniform}→ any probability distribution is (f1,Θ(n))-smooth

search: O(log log n) expected time

update:

{O(log log n) expected [Mehlhorn & Tsakalidis,85]

O(1) time (position given) [Andersson & Mattson,93]

7 / 24

Static/Dynamic Interpolation Search

Key AssumptionConditional distribution on the subinterval dictated by an arbitrary interpolation

step remains unaffected (i.e., µ-random)

8 / 24

This Work (Revisiting Mehl-Tsak DIS) - Result 1

Key assumption is valid

• only when elements are distinct (indeed assumed in all previous work)

V produced under some continuous (or discrete) distribution with zero probability

of collision

• otherwise, it fails

Probabilistic analyses of previous Interpolation Search structures are

inapplicable to sequences of non-distinct elements

V produced by discrete probability distributions with measurable (non-zero)

probability of key collision

9 / 24

This Work (Revisiting Mehl-Tsak DIS) - Result 1

Key assumption is valid

• only when elements are distinct (indeed assumed in all previous work)

V produced under some continuous (or discrete) distribution with zero probability

of collision

• otherwise, it fails

Probabilistic analyses of previous Interpolation Search structures are

inapplicable to sequences of non-distinct elements

V produced by discrete probability distributions with measurable (non-zero)

probability of key collision

9 / 24

Interpolation Search - Lack of Generalization

∃ applications where we must store duplicates

Secondary indices in databases

Searching tables with alphabetic keys (names, dictionary entries, etc)

B keys follow a non-uniform, discrete distribution and collisions do occur

B Empirical results showed that Interpolation Search has a very poor

performance on such data

[Perl & Reingold,77], [Burton & Lewis,80], [Santorno & Sidney,85], [Perl & Gabriel,92]

B Heuristics; no rigorous performance analysis

Storing duplicates once: statistical properties are destroyed

10 / 24

This Work (Revisiting Mehl-Tsak DIS) - Result 2

New dynamic interpolation search data structure

search: O(log log n) expected time

update: O(1) time (position given)

Bounds hold with high probability

Analysis

− always valid irrespectively of the distinctness or not of the elements

− applies to

(n

(log log n)1+ε , nδ)

-smooth input distributions, δ ∈ (0, 1), ε > 0

Robust (no apriori knowledge of the input distribution is required)

11 / 24

This Work (Revisiting Mehl-Tsak DIS) - Other Results

Modifications to our data structure yield

O(1) expected search time w.h.p. for a (largest so far) subclass of smooth

distributions

O(log log n) expected search time w.h.p. for Power-law and Binomialdistributions

B Power-law and Binomial: @ proper choices of f1, f2 to achieve O(log log n)expected search time

12 / 24

Interpolation Search [Mehlhorn & Tsakalidis,85], [Andersson & Mattson,93]

eFirst Interpolation Step

Children of the root node:

Root���������

""""

XXXXXXXXXe1

e2

. . . e

. . . v . . .

. . . eR(n)

Root’s ID array

ID[1] ID[2] ID[j] ID[I(n)]

Root’s REP array

a

a

b

b

pppppppppppp

pppppppppppp

pppppppppppp

ppppppppp

pppppppppppp

pppppppppppp pppppppp

ppppa+ 1 b−aI(n)

rREP[1]

rREP[2]

rREP[3]

r. . .

. . .

. . .REP[t] . . .

. . .

rREP[R(n)]

rREP[t+ l]

a+ 2 b−aI(n) a+ (j − 1) b−a

I(n) a+ j b−aI(n) a+ (f1(n)− 1) b−a

I(n)

µ is (f1(n), f2(n)) = (I(n), n/R(n))-smooth

root has R(n) children; a root-child has R(n/R(n)) children, etc.

ID array: partitions [a, b] into I(n) equal-length parts

REP array: partitions P into R(n) equal-sized subsets, each of size n/R(n)REP[i]↔ Pi = {X ∈ P | X((i−1) n

R(n)) < X ≤ X(i n

R(n))}, i = 1, . . . , R(n)

13 / 24

Interpolation Search [Mehlhorn & Tsakalidis,85], [Andersson & Mattson,93]

eFirst Interpolation Step

Children of the root node:

Root���������

""""

XXXXXXXXXe1

e2

. . . e

. . . v . . .

. . . eR(n)

Root’s ID array

ID[1] ID[2] ID[j] ID[I(n)]

Root’s REP array

a

a

b

b

pppppppppppp

pppppppppppp

pppppppppppp

ppppppppp

pppppppppppp

pppppppppppp pppppppp

ppppa+ 1 b−aI(n)

rREP[1]

rREP[2]

rREP[3]

r. . .

. . .

. . .REP[t] . . .

. . .

rREP[R(n)]

rREP[t+ l]

a+ 2 b−aI(n) a+ (j − 1) b−a

I(n) a+ j b−aI(n) a+ (f1(n)− 1) b−a

I(n)

Query element y V

j =⌊

y−a

b−aI(n)⌋

+ 1 V Ij =[a + (j − 1) b−a

I(n) , a + jb−a

I(n)

]

Search for appropriate REP within Ij

− O(1) expected number of REPs

− distribution of elements between consecutive REPs: µ-random (smooth)

14 / 24

Analysis and the Subtle Case

[a, b] µ-random and smooth, (a′, b′] ⊆ [a, b]

Pr[X = λ | a′ < X ≤ b′] =

Pr[X = λ]

Pr[a′ < X < b′], a

′ < λ < b′ (1)

Consider X1, X2, X3 ∈ [a, b], and their ordering X(1) ≤ X(2) ≤ X(3)

Pr[X = λ | X(1) = a′∩X(3) = b

′] =Pr[X = λ ∩ X(1) = a′ ∩ X(3) = b′]

Pr[X(1) = a′ ∩ X(3) = b′](2)

Is (2) µ-random and smooth ??

15 / 24

Analysis and the Subtle Case

[a, b] µ-random and smooth, (a′, b′] ⊆ [a, b]

Pr[X = λ | a′ < X ≤ b′] =

Pr[X = λ]

Pr[a′ < X < b′], a

′ < λ < b′ (1)

Consider X1, X2, X3 ∈ [a, b], and their ordering X(1) ≤ X(2) ≤ X(3)

Pr[X = λ | X(1) = a′∩X(3) = b

′] =Pr[X = λ ∩ X(1) = a′ ∩ X(3) = b′]

Pr[X(1) = a′ ∩ X(3) = b′](2)

Is (2) µ-random and smooth ??

15 / 24

Analysis and the Subtle Case

[a, b] µ-random and smooth, (a′, b′] ⊆ [a, b]

Pr[X = λ | a′ < X ≤ b′] =

Pr[X = λ]

Pr[a′ < X < b′], a

′ < λ < b′ (1)

Consider X1, X2, X3 ∈ [a, b], and their ordering X(1) ≤ X(2) ≤ X(3)

Pr[X = λ | X(1) = a′∩X(3) = b

′] =Pr[X = λ ∩ X(1) = a′ ∩ X(3) = b′]

Pr[X(1) = a′ ∩ X(3) = b′](2)

Is (2) µ-random and smooth ??

15 / 24

Analysis and the Subtle Case

Distinct keys (Pr[key collision] = 0)

Event E1 = {X = λ ∩ X(1) = a′ ∩ X(3) = b′} occurs if ≥ 1 occurs

{X2 = λ, X3 = a′, X1 = b

′}, {X1 = λ, X2 = a′, X3 = b

′},{X3 = λ, X1 = a

′, X2 = b′}, {X1 = λ, X3 = a

′, X2 = b′},

{X3 = λ, X2 = a′, X1 = b

′}, {X2 = λ, X1 = a′, X3 = b

′}

Event E2 = {X(1) = a′ ∩ X(3) = b′} occurs if ≥ 1 occurs

{X1 = a′, X2 = b

′, a′ < X3 < b

′}, {X2 = a′, X1 = b

′, a′ < X3 < b

′},{X1 = a

′, X3 = b′, a

′ < X2 < b′}, {X3 = a

′, X1 = b′, a

′ < X2 < b′},

{X2 = a′, X3 = b

′, a′ < X1 < b′}, {X3 = a

′, X2 = b′, a

′ < X1 < b′}

(2) = Pr[E1]Pr[E2]

= 6Pr[X=λ] Pr[X=a′] Pr[X=b′]6Pr[X=a′] Pr[X=b′] Pr[a′<X<b′] = Pr[X=λ]

Pr[a′<X<b′] = (1)

16 / 24

Analysis and the Subtle Case

Non-distinct keys (Pr[key collision] 6= 0)

Event E1 = {X = λ ∩ X(1) = a′ ∩ X(3) = b′} occurs if ≥ 1 occurs

{X2 = λ, X3 = a′, X1 = b

′}, {X1 = λ, X2 = a′, X3 = b

′},{X3 = λ, X1 = a

′, X2 = b′}, {X1 = λ, X3 = a

′, X2 = b′},

{X3 = λ, X2 = a′, X1 = b

′}, {X2 = λ, X1 = a′, X3 = b

′}

Event E2 = {X(1) = a′ ∩ X(3) = b′} occurs if ≥ 1 occurs

{X1 = a′, X2 = b

′, a′ < X3 < b

′}, {X2 = a′, X1 = b

′, a′ < X3 < b

′},{X1 = a

′, X3 = b′, a

′ < X2 < b′}, {X3 = a

′, X1 = b′, a

′ < X2 < b′},

{X2 = a′, X3 = b

′, a′ < X1 < b′}, {X3 = a

′, X2 = b′, a

′ < X1 < b′}

{X1,2 = a′, X3 = b

′}, {X1,2 = b′, X3 = a

′}, {X1,3 = a′, X2 = b

′},{X1,3 = b

′, X2 = a′}, {X2,3 = a

′, X1 = b′}, {X2,3 = b

′, X1 = a′}

(2) = Pr[E1]Pr[E2]

= Pr[X=λ]Pr[X=a′]+Pr[X=b′]

2+Pr[a′<X<b′]

6= (1) !!

17 / 24

The New Data Structure

n elements drawn µ-randomly from [a, b]µ is (f1, f2) = (nα, nδ)-smooth, α, δ ∈ (0, 1)

Key idea

Forget about REPs

Partition [a, b] into f1(n) intervals, each of size (b − a)/f1(n)

Recurse on each interval (BIN) until |BIN| = O(poly log n)

18 / 24

The New Data Structure

1st Layer of f1(n) bins.

Each bin receives a ball with probability O(f2(n)n

)= O

(nδ

n

).

BIN(1) BIN(2) BIN(j1) BIN(f1(n))

a b

r r r . . .

n1 balls r r . . .

n2 balls r r r . . .

nj1 balls r r r r . . .

nf1(n) balls r. . .

. . .

. . .

. . .

a+ 1 b−af1(n)

a+ 2 b−af1(n)

a+ (j1 − 1) b−af1(n)

a+ j1b−af1(n)

a+ (f1(n)− 1) b−af1(n)

✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥

✦✦✦✦✦✦✦✦✦✦✦✦✦✦✦✦✦

✄✄✄✄✄

PPPPPPPPPPPPPPPP

BIN(j1)’s 2nd Layer of f1(nj1) bins.

Each bin receives a ball with probability O(f2(nj1

)

nj1

)= O

(nδ2

).

BIN(j1, 1) BIN(j1, 2) BIN(j1, j2) BIN(j1, f1(nj1))

aj1 bj1

r . . .

nj1,1 balls r r r r . . .

nj1,2 balls r r r . . .

nj1,j2 balls r r . . .

nj1,f1(nj1) ballsr

. . .

. . .

. . .aj1 + 1bj1−aj1f1(nj1

) aj1 + 2bj1−aj1f1(nj1

) aj1 + (j2 − 1)bj1−aj1f1(nj1

) aj1 + j2bj1−aj1f1(nj1

)

Leaf BIN: q∗-heap [Willard,93]⇒ O(1) search & update time

19 / 24

The New Data Structure

Lem. 1 The elements of each BIN (subinterval) are µ-randomly distributed

Lem. 2 |BIN| = f2(|parent-BIN|) w.h.p.

⇓− 1st layer: |BIN| = O(nδ)

.

.

.

− k-th layer: |BIN| = O(nδk

)⇓

I Number of layers: O(log log n)

20 / 24

O(1) Search Time

First layer of f1(n) BINS.

Each BIN receives a ball with probability O(f2(n)n

)= O

(lnO(1) n

n

).

BIN(11) BIN(21) BIN(j1) BIN(f1(n)1)

a b

r r r . . .

n11 balls r r r r . . .

n21 balls r r r r . . .

nj1 balls r r r r . . .

nf1(n)1 balls r. . .

. . .

. . .

. . .

a+ 1 b−af1(n)

a+ 2 b−af1(n)

a+ (j1 − 1) b−af1(n)

a+ j1b−af1(n)

a+ (f1(n)− 1) b−af1(n)

µ: (f1(n), f2(n)) =(

n

g, lnO(1)

n

)-smooth, g: constant

|BIN| = O(lnO(1)n) w.h.p.

Implement each bin as q∗-heap V O(1) search time

21 / 24

Power-law Distribution

I1 I21 bnα

I1 is dense V van Emde Boas structure, O(log log |I1|) search time

I2 is sparse V distribution is (f1(n), poly log n)-smooth

Similar idea for Binomial

22 / 24

Revisiting Mehl-Tsak DIS - Conclusions

New dynamic interpolation search data structure

Supports non-distinct (duplicate) elements

Expected search time O(log log n) w.h.p. for smooth and other

distributions (e.g., power law)

O(1) expected search time for a subclass (largest so far) of smooth

distributions

23 / 24

Thanks

Thank you Thanassi for

introducing me (and others) to the intricacies of elementary data structures

your contributions to CS&E and for establishing it within Greece

your huge contribution in elevating the Dept of Computer Engineering &

Informatics in all respects (infrastructure, procedures, etc) within and out of

our University

the artistic touch you gave to CEID

... your support and friendship

Wish you all the best for the future

24 / 24

Thanks

Thank you Thanassi for

introducing me (and others) to the intricacies of elementary data structures

your contributions to CS&E and for establishing it within Greece

your huge contribution in elevating the Dept of Computer Engineering &

Informatics in all respects (infrastructure, procedures, etc) within and out of

our University

the artistic touch you gave to CEID

... your support and friendship

Wish you all the best for the future

24 / 24

Thanks

Thank you Thanassi for

introducing me (and others) to the intricacies of elementary data structures

your contributions to CS&E and for establishing it within Greece

your huge contribution in elevating the Dept of Computer Engineering &

Informatics in all respects (infrastructure, procedures, etc) within and out of

our University

the artistic touch you gave to CEID

... your support and friendship

Wish you all the best for the future

24 / 24

Thanks

Thank you Thanassi for

introducing me (and others) to the intricacies of elementary data structures

your contributions to CS&E and for establishing it within Greece

your huge contribution in elevating the Dept of Computer Engineering &

Informatics in all respects (infrastructure, procedures, etc) within and out of

our University

the artistic touch you gave to CEID

... your support and friendship

Wish you all the best for the future

24 / 24

Thanks

Thank you Thanassi for

introducing me (and others) to the intricacies of elementary data structures

your contributions to CS&E and for establishing it within Greece

your huge contribution in elevating the Dept of Computer Engineering &

Informatics in all respects (infrastructure, procedures, etc) within and out of

our University

the artistic touch you gave to CEID

... your support and friendship

Wish you all the best for the future

24 / 24

Thanks

Thank you Thanassi for

introducing me (and others) to the intricacies of elementary data structures

your contributions to CS&E and for establishing it within Greece

your huge contribution in elevating the Dept of Computer Engineering &

Informatics in all respects (infrastructure, procedures, etc) within and out of

our University

the artistic touch you gave to CEID

... your support and friendship

Wish you all the best for the future

24 / 24

Thanks

Thank you Thanassi for

introducing me (and others) to the intricacies of elementary data structures

your contributions to CS&E and for establishing it within Greece

your huge contribution in elevating the Dept of Computer Engineering &

Informatics in all respects (infrastructure, procedures, etc) within and out of

our University

the artistic touch you gave to CEID

... your support and friendship

Wish you all the best for the future

24 / 24

Thanks

Thank you Thanassi for

introducing me (and others) to the intricacies of elementary data structures

your contributions to CS&E and for establishing it within Greece

your huge contribution in elevating the Dept of Computer Engineering &

Informatics in all respects (infrastructure, procedures, etc) within and out of

our University

the artistic touch you gave to CEID

... your support and friendship

Wish you all the best for the future

24 / 24

Thanks

Thank you Thanassi for

introducing me (and others) to the intricacies of elementary data structures

your contributions to CS&E and for establishing it within Greece

your huge contribution in elevating the Dept of Computer Engineering &

Informatics in all respects (infrastructure, procedures, etc) within and out of

our University

the artistic touch you gave to CEID

... your support and friendship

Wish you all the best for the future

24 / 24