Cache Oblivious Searching and Sorting

Gerth Stølting Brodal

BRICS, University of Aarhus

Joint work with Rolf Fagerberg (Århus), Riko Jacob (Munich), Michael A. Bender, Dongdong Ge, Simai He, Haodong Hu (SUNY Stony Brook), John Iacono (Polytechnic, NY), Alejandro López-Ortiz (Waterloo)

IT University of Copenhagen, April 30, 2003


Outline of Talk

• Hardware

• Computational models

– RAM model (Random Access Machine)

– IO model

– Cache oblivious model

• Binary searching and dictionaries

• Sorting

• Priority queues

• Concluding remarks

G. S. Brodal: Cache Oblivious Searching and Sorting 2

Hardware

• Dell Latitude L400, 700 MHz (January 2002)

• Mobile Intel Pentium III

• Primary 16 KB instruction cache and 16 KB write-back data cache

• 256 KB Level 2 cache

• 256 MB SDRAM

• 10 GB disk

[Figure: Memory hierarchy (CPU, L1 cache, L2 cache, RAM, disk)]

Trends in Implementation Technology

[Figure: Memory hierarchy (CPU, L1 cache, L2 cache, RAM, disk)]

                        L1 cache        L2 cache          Virtual memory
Block size              4 – 32 bytes    32 – 256 bytes    4 – 16 KB
Hit time (cycles)       1 – 2           6 – 15            10 – 100
Miss penalty (cycles)   8 – 66          30 – 200          700,000 – 6,000,000
Size                    1 – 128 KB      256 KB – 16 MB    16 – 8192 MB

Source: Computer Architecture – A Quantitative Approach, Hennessy & Patterson, 2nd ed., 1996

The Unknown Machine

Algorithm → C program → (gcc) → Object code → (linux) → Execution

Can be executed on machines with a specific class of CPUs

Algorithm → Java program → (javac) → Java bytecode → (java) → Interpretation

Can be executed on any machine with a Java interpreter

Goal: Develop algorithms that are optimized w.r.t. memory hierarchies without knowing the parameters


RAM Model (Random Access Machine)

[Figure: CPU connected to a single memory]

+ − ∗ / ∨ ∧ ≠ ...: O(1) time

Memory access: O(1) time

Ignores the presence of memory hierarchies

I/O Model (Aggarwal and Vitter 1988)

[Figure: CPU with internal memory, connected by I/O to external memory]

N = problem size

M = memory size

B = I/O block size

• One I/O moves B consecutive records from/to disk

• Cost: number of I/Os

$\mathrm{Scan}(N) = O(N/B)$

$\mathrm{Sort}(N) = O\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right)$
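To get a feel for the I/O-model bounds Scan(N) and Sort(N), the following sketch evaluates them for concrete parameters. It is an illustration only: the function names `scan_ios` and `sort_ios` are hypothetical, the constant hidden by the O-notation is taken to be 1, the logarithm is rounded up to a whole number of merge levels, and M ≥ 2B is assumed so that the fan-out M/B is at least 2.

```python
def scan_ios(n, b):
    """Blocks touched when scanning n consecutive records: ceil(n / b)."""
    return -(-n // b)


def sort_ios(n, m, b):
    """Evaluate (N/B) * log_{M/B}(N/B) using integer arithmetic only.

    The logarithm is rounded up to the number of (M/B)-way merge levels;
    the constant hidden by the O-notation is assumed to be 1.
    """
    blocks, fanout = scan_ios(n, b), m // b
    if fanout < 2:
        raise ValueError("this sketch assumes M >= 2B")
    levels, reach = 0, 1
    while reach < blocks:  # smallest levels with fanout**levels >= blocks
        reach *= fanout
        levels += 1
    return blocks * max(1, levels)
```

For example, with N = 2^30, M = 2^20 and B = 2^10 a scan touches 2^20 blocks, and `sort_ios` reports two such passes, i.e. 2^21 I/Os.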

Cache Oblivious Model (Frigo, Leiserson, Prokop, Ramachandran 1999)

• Program in the RAM model

• Analyze in the I/O model (for arbitrary B and M)

• Optimal off-line cache replacement strategy

Advantages

• Optimal on arbitrary level ⇒ optimal on all levels

• B and M not hard-wired into algorithm


RAM model: Binary Searching

• Sorted array of N elements = static dictionary

• Binary search requires $O(\log_2 N)$ time

2 3 5 6 10 12 15 16 18 19 23 24 28 29 31

Search(28)

A binary search is cache oblivious and uses $O\left(\log_2 \frac{N}{B}\right)$ I/Os
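The I/O bound for binary search on a sorted array can be observed directly by counting the distinct blocks a search probes. The sketch below is a hypothetical helper, not code from the talk; it assumes the array is block-aligned, so that element i lives in block i // b. Once the search interval has shrunk to at most B elements it spans at most two blocks, which is why roughly log2(N/B) blocks are touched rather than log2 N.

```python
def binary_search_blocks(arr, key, b):
    """Binary search in sorted arr; return (index or -1, distinct blocks probed).

    Assumes element i is stored in block i // b (block-aligned array).
    """
    lo, hi = 0, len(arr) - 1
    blocks = set()
    while lo <= hi:
        mid = (lo + hi) // 2
        blocks.add(mid // b)  # record which block this probe touches
        if arr[mid] == key:
            return mid, len(blocks)
        if arr[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, len(blocks)
```

On the 15-element example with B = 4, Search(28) probes the keys 16, 24, 29, 28 and touches three blocks.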

I/O model: B-trees

[Figure: B-tree with nodes of B keys; the search path has length $O(\log_B N)$]

• Each node stores B keys and has degree B + 1

• Searches use $O(\log_B N)$ I/Os

Static Cache Oblivious Dictionary

[Figure: a binary tree of height h is split into a top tree A and bottom trees B1, ..., Bk, with the heights ⌈h/2⌉ and ⌊h/2⌋ marked; memory layout: A B1 ··· Bk, applied recursively]

Recursive layout of binary tree ≡ van Emde Boas layout

Searches use $O(\log_B N)$ I/Os

• Each green tree has height between $(\log_2 B)/2$ and $\log_2 B$

• Searches visit between $\log_B N$ and $2 \log_B N$ green trees, i.e. perform at most $4 \log_B N$ I/Os (misalignment)

Example: Recursive Layout

[Figure: binary search tree with root 16; second level 6, 24; third level 3, 12, 19, 29; leaves 2, 5, 10, 15, 18, 23, 28, 31]

16 6 24 3 2 5 12 10 15 19 18 23 29 28 31

Search(28)
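The recursive (van Emde Boas) layout can be generated mechanically. The sketch below is one possible way to do it, under stated assumptions: the tree is a complete binary search tree on 2^h − 1 keys, nodes are identified by heap-style indices (root 1, children 2i and 2i + 1), and the top tree gets height ⌈h/2⌉ and the bottom trees height ⌊h/2⌋. The function name `veb_layout` is hypothetical.

```python
def veb_layout(sorted_keys):
    """Return the keys of a complete BST over sorted_keys in van Emde Boas order."""
    n = len(sorted_keys)
    height = (n + 1).bit_length() - 1
    if n == 0 or n != 2**height - 1:
        raise ValueError("this sketch expects 2^h - 1 keys")

    def key_of(node):
        # In-order rank of a heap-indexed node in a complete tree.
        depth = node.bit_length() - 1
        pos = node - (1 << depth)
        return sorted_keys[(2 * pos + 1) * (1 << (height - depth - 1)) - 1]

    def layout(root, h):
        # Lay out the top tree first, then each bottom tree from left to right.
        if h == 1:
            return [root]
        top = (h + 1) // 2   # ceil(h / 2)
        bottom = h - top     # floor(h / 2)
        order = layout(root, top)
        first = root << top  # leftmost descendant at relative depth `top`
        for child in range(first, first + (1 << top)):
            order += layout(child, bottom)
        return order

    return [key_of(node) for node in layout(1, height)]
```

For the 15 keys 2, 3, 5, 6, 10, 12, 15, 16, 18, 19, 23, 24, 28, 29, 31 this yields 16 6 24 3 2 5 12 10 15 19 18 23 29 28 31: the top tree of height 2 first, followed by the four bottom trees rooted at 3, 12, 19 and 29.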

Dynamic Dictionaries

RAM model: Balanced binary search trees, e.g. AVL-trees and red-black trees

I/O model: B-trees

Cache oblivious model: ?

Dynamic Cache Oblivious Dictionaries (Brodal and Fagerberg 2002)

• Embed a dynamic tree of height $\log_2 N + O(1)$ in a complete tree

• Static van Emde Boas layout

[Figure: binary search tree on the keys 1, 3, 4, 5, 6, 7, 8, 10, 11, 13 with root 6, embedded in a complete tree of height 4]

⇓

6 4 8 1 − 3 5 − − 7 − − 11 10 13

Search(10)

Dynamic Binary Trees of Small Height (Itai, Konheim and Rodeh 1981; Andersson and Lai 1990)

[Figure: the example tree with root 6 before and after inserting the new key 2; the subtree holding the keys 1 to 5 is rebuilt, and its root changes from 4 to 3]

• If an insertion makes the height too large, rebuild the subtree at the nearest ancestor with sufficiently few descendants

• Insertions require amortized $O(\log^2 N)$ time

Dynamic Cache Oblivious Dictionaries (Brodal and Fagerberg 2002)

[Figure: the example tree with root 6]

Search: $O(\log_B N)$

Updates: $O\left(\log_B N + \frac{\log^2 N}{B}\right)$

• Updates can be improved to $O(\log_B N)$ I/Os by buckets of size $\Theta(\log_2 N)$ and one level of indirection

Lower bounds

(Comparison) RAM model: $\log N$ comparisons (decision tree argument)

I/O model: $\log_{B+1} N$ I/Os (reduction to RAM model)

Cache oblivious model: $\log_{B+1} N$ I/Os (follows from I/O model)

$\log_2 e \cdot \log_B N \approx 1.443 \log_B N$ I/Os (Bender et al. 2003)


Sorting

RAM model: Binary MergeSort takes $O(N \log_2 N)$ time

I/O model: $\Theta\left(\frac{M}{B}\right)$-way MergeSort achieves optimal $O(\mathrm{Sort}(N)) = O\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right)$ I/Os (Aggarwal and Vitter 1988)

[Figure: multiway MergeSort. The unsorted input of size N is partitioned into runs 1, 2, ..., N/M of size M; each run is sorted; merge passes I, II, ... combine the sorted runs into the sorted output]

Cache oblivious: FunnelSort achieves $O(\mathrm{Sort}(N))$ I/Os (Frigo, Leiserson, Prokop and Ramachandran 1999; Brodal and Fagerberg 2002)
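The I/O-model MergeSort (partition into runs of size M, sort each run, then merge in passes with fan-out M/B) can be sketched as follows. This is an in-memory illustration rather than an external-memory implementation: Python lists stand in for disk-resident runs, the standard-library `heapq.merge` performs each multiway merge, and the function name `multiway_merge_sort` is hypothetical. The returned pass count corresponds to the $\log_{M/B}$ factor in Sort(N).

```python
import heapq


def multiway_merge_sort(data, m, b):
    """Sort data using runs of size m and (m // b)-way merge passes.

    Returns (sorted list, number of merge passes).
    """
    k = max(2, m // b)  # merge fan-out, Theta(M/B)
    # Run formation: each run fits in memory and is sorted there.
    runs = [sorted(data[i:i + m]) for i in range(0, len(data), m)]
    passes = 0
    while len(runs) > 1:
        # One merge pass: every group of k runs becomes a single run.
        runs = [list(heapq.merge(*runs[i:i + k])) for i in range(0, len(runs), k)]
        passes += 1
    return (runs[0] if runs else []), passes
```

With N = 4096, M = 64 and B = 8 there are 64 initial runs and a fan-out of 8, so two merge passes suffice (64 → 8 → 1).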

k-merger (Frigo et al., FOCS’99)

[Figure: a k-merger M reads k sorted input streams and produces one sorted output stream]

Recursive definition: a k-merger is built from $k^{1/2}$-mergers $M_0, M_1, \ldots, M_{\sqrt{k}}$ and buffers $B_1, \ldots, B_{\sqrt{k}}$ of size $k^{3/2}$

Recursive layout: $M_0 \; B_1 \; M_1 \; B_2 \; M_2 \; \cdots \; B_{\sqrt{k}} \; M_{\sqrt{k}}$

Lazy k-merger (Brodal and Fagerberg 2002)

[Figure: k-merger built from the mergers $M_0, M_1, \ldots, M_{\sqrt{k}}$ and the buffers $B_1, \ldots, B_{\sqrt{k}}$]

Procedure Fill(v)
    while out-buffer not full
        if left in-buffer empty
            Fill(left child)
        if right in-buffer empty
            Fill(right child)
        perform one merge step

Lemma: If $M \ge B^2$ and the output buffer has size $k^3$, then $O\left(\frac{k^3}{B}\log_M(k^3) + k\right)$ I/Os are done during an invocation of Fill(root).
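The Fill procedure of the lazy k-merger can be turned into a small runnable sketch. The version below is a simplification under stated assumptions, not the authors' implementation: every buffer has the same capacity (a real funnel sizes its buffers according to the recursive definition), the tree is a plain balanced binary merge tree, and explicit `exhausted` flags are added because the slide's pseudocode does not cover input streams that run dry. The names `Node`, `fill`, `build` and `merge_all` are hypothetical.

```python
from collections import deque


class Node:
    """Binary merger with a bounded output buffer; leaves wrap a sorted input stream."""

    def __init__(self, capacity, left=None, right=None, stream=None):
        self.capacity = capacity
        self.buffer = deque()
        self.left, self.right = left, right
        self.stream = iter(stream) if stream is not None else None
        self.exhausted = False  # True once no further element can ever be produced


def fill(v):
    """Fill v's output buffer until it is full or all of v's inputs are exhausted."""
    if v.stream is not None:  # leaf: copy from the sorted input stream
        while len(v.buffer) < v.capacity:
            try:
                v.buffer.append(next(v.stream))
            except StopIteration:
                v.exhausted = True
                return
        return
    while len(v.buffer) < v.capacity:
        for child in (v.left, v.right):
            if not child.buffer and not child.exhausted:
                fill(child)  # lazily refill an empty in-buffer
        l, r = v.left.buffer, v.right.buffer
        if not l and not r:
            v.exhausted = True
            return
        # One merge step: move the smaller head element to the out-buffer.
        if l and (not r or l[0] <= r[0]):
            v.buffer.append(l.popleft())
        else:
            v.buffer.append(r.popleft())


def build(streams, capacity):
    """Build a balanced binary merge tree over one or more sorted streams."""
    if len(streams) == 1:
        return Node(capacity, stream=streams[0])
    mid = len(streams) // 2
    return Node(capacity, build(streams[:mid], capacity), build(streams[mid:], capacity))


def merge_all(streams, capacity=4):
    """Merge sorted streams by repeatedly calling fill on the root and draining it."""
    root = build(streams, capacity)
    out = []
    while not root.exhausted:
        fill(root)
        out.extend(root.buffer)
        root.buffer.clear()
    return out
```

Calling `merge_all([[1, 4, 9], [2, 3, 10], [5], [0, 6, 7, 8]], capacity=2)` returns the integers 0 to 10 in order; each call of `fill` on the root produces at most `capacity` elements and only descends into a child when that child's buffer is empty.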

FunnelSort (Frigo, Leiserson, Prokop and Ramachandran 1999; Brodal and Fagerberg 2002)

• Divide input in $N^{1/3}$ segments of size $N^{2/3}$

• Recursively MergeSort each segment

• Merge sorted segments by an $N^{1/3}$-merger

[Figure: merger size k at successive recursion levels: $N^{1/3}$, $N^{2/9}$, $N^{4/27}$, ..., 2]

Theorem: Provided $M \ge B^2$ (tall cache assumption), FunnelSort performs optimal $O(\mathrm{Sort}(N))$ I/Os
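The recursion structure of FunnelSort can be sketched in a few lines. Note the substitution: the standard-library `heapq.merge` stands in for the $N^{1/3}$-merger, so this sketch reproduces only the divide-and-merge pattern, not the cache-oblivious I/O behaviour that the funnel's recursive buffer layout provides. The function name `funnel_sort` and the base-case threshold of 8 are assumptions.

```python
import heapq


def funnel_sort(a):
    """Sort a by splitting into about N^(1/3) segments of size about N^(2/3)."""
    n = len(a)
    if n <= 8:  # small base case (threshold chosen arbitrarily for this sketch)
        return sorted(a)
    seg = max(1, round(n ** (2 / 3)))  # segment size ~ N^(2/3)
    # Recursively sort each segment.
    segments = [funnel_sort(a[i:i + seg]) for i in range(0, n, seg)]
    # heapq.merge is a stand-in for the N^(1/3)-merger of the talk.
    return list(heapq.merge(*segments))
```

For N = 1000 the top level produces 10 segments of size 100, each of which is split again on the next level.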

Computational Geometry (Brodal and Fagerberg 2002)

Cache oblivious $O(\mathrm{Sort}(N))$ distribution sweeping algorithms for

• Maxima for point set (3D)

• Measure of a set of axis-parallel rectangles (2D)

• Visibility of non-intersecting line segments from a point (2D)

• All nearest neighbors for point set (2D)

Cache oblivious $O\left(\mathrm{Sort}(N) + \frac{\text{output}}{B}\right)$ algorithms for

• Orthogonal line segment intersection reporting (2D)

• Batched orthogonal range queries on point set (2D)

• Pairwise intersections of axis-parallel rectangles (2D)


Priority Queues

Operations: Insert(e), DeleteMin

Classic RAM:

• Heap: $O(\log_2 N)$ time, $O\left(\log_2 \frac{N}{M}\right)$ I/Os (Williams 1964)

I/O model:

• Buffer tree: $O\left(\frac{1}{B}\log_{M/B}\frac{N}{B}\right) = O\left(\frac{\mathrm{Sort}(N)}{N}\right)$ I/Os (Arge 1995)

Cache-Oblivious Priority Queues

• $O\left(\frac{1}{B}\log_{M/B}\frac{N}{B}\right)$ I/Os (Arge, Bender, Demaine, Holland-Minkley and Munro 2002)

– Uses sorting and selection as subroutines

– Requires tall cache assumption, $M \ge B^2$

• Funnel heap (Brodal and Fagerberg 2002)

– Uses only binary merging

– Profile adaptive, i.e. $O\left(\frac{1}{B}\log_{M/B}\frac{N_i}{B}\right)$ I/Os

$N_i$ is either the size profile, max depth profile, or #insertions during the lifetime of the ith inserted element

The Priority Queue

[Figure: funnel heap consisting of an insertion buffer I and a sequence of links; link i has a binary merger $v_i$, buffers $A_i$ and $B_i$, and a $k_i$-merger whose input buffers have size $s_i$]

$k_{i+1} \approx k_i^{4/3}$

$s_{i+1} \approx s_i^{4/3}$

$k_i \approx s_i^{1/3}$

In total: A single binary merge tree

Operations: DeleteMin

[Figure: funnel heap with buffer $A_1$, merger $v_1$ and insertion buffer I marked]

• If $A_1$ is empty, call Fill($v_1$)

• Search I and $A_1$ for minimum element

Operations: Insert

[Figure: funnel heap with insertion buffer I and the counters $c_1, c_2, \ldots, c_i$ of the links marked]

• Insert in I

• If I overflows, call Sweep(i) for the first i where $c_i \le k_i$

Sweep ≈ addition of one to the number $c_1 c_2 \ldots c_i \ldots c_{\max}$

$s_i = s_1 + \sum_{j=1}^{i-1} k_j s_j$
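The recurrence $s_i = s_1 + \sum_{j=1}^{i-1} k_j s_j$ together with $k_i \approx s_i^{1/3}$ can be tabulated to see how quickly the links grow. The sketch below rests on two assumptions that the slides do not state: $s_1 = 8$, and $k_i$ is taken to be the smallest power of two with $k_i^3 \ge s_i$. The function name `link_sizes` is hypothetical.

```python
def link_sizes(links, s1=8):
    """Return [(k_1, s_1), (k_2, s_2), ...] for the first `links` links.

    Assumptions of this sketch: s_1 = 8 and k_i is the smallest power of
    two whose cube is at least s_i (one reading of k_i ~ s_i^(1/3)).
    """
    sizes = []
    s = s1
    for _ in range(links):
        k = 1
        while k**3 < s:
            k *= 2
        sizes.append((k, s))
        # s_{i+1} = s_1 + sum_{j<=i} k_j s_j = s_i + k_i * s_i
        s += k * s
    return sizes
```

Under these assumptions `link_sizes(4)` gives (2, 8), (4, 24), (8, 120), (16, 1080). Since $s_{i+1} = s_i (1 + k_i)$ and $k_i \approx s_i^{1/3}$, each step is roughly $s_{i+1} \approx s_i^{4/3}$.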

Analysis

We can prove:

• Number N of insertions performed: $s_{i_{\max}} \le N$

• Number of I/Os per Insert for link i: $O\left(\frac{1}{B}\log_{M/B} s_i\right)$

• By the doubly-exponential growth of $s_i$, the total number of I/Os per Insert is $O\left(\sum_{k=0}^{\infty}\frac{1}{B}\log_{M/B} N^{(3/4)^k}\right) = O\left(\frac{\mathrm{Sort}(N)}{N}\right)$

• DeleteMin is amortized for free
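The sum in the analysis of the priority queue collapses to a geometric series, since the exponent $(3/4)^k$ can be pulled out of the logarithm. A short worked version of that step (a sketch of the arithmetic, not the full proof):

```latex
\begin{align*}
\sum_{k=0}^{\infty} \frac{1}{B}\log_{M/B} N^{(3/4)^k}
  &= \frac{1}{B}\log_{M/B} N \cdot \sum_{k=0}^{\infty}\left(\frac{3}{4}\right)^{k} \\
  &= \frac{1}{B}\log_{M/B} N \cdot \frac{1}{1-\frac{3}{4}}
   = \frac{4}{B}\log_{M/B} N .
\end{align*}
```

Under the tall cache assumption $M \ge B^2$ we have $M/B \ge B$, hence $\log_{M/B} N = \log_{M/B}\frac{N}{B} + \log_{M/B} B \le \log_{M/B}\frac{N}{B} + 1$, so the total agrees with $\frac{\mathrm{Sort}(N)}{N} = \frac{1}{B}\log_{M/B}\frac{N}{B}$ up to a constant factor and an additive $O(1/B)$ term.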


Some Cache-Oblivious Results

• Scanning ⇒ stack, queue, median finding, ...

• Sorting, matrix multiplication, FFT (Frigo, Leiserson, Prokop, Ramachandran, FOCS’99)

• Cache oblivious search trees (Prokop 99; Bender, Demaine, Farach-Colton, FOCS’00; Rahman, Cole, Raman, WAE’01; Bender, Duan, Iacono, Wu and Brodal, Fagerberg, Jacob, SODA’02)

• Priority queue and graph algorithms (Arge, Bender, Demaine, Holland-Minkley, Munro, STOC’02; Brodal, Fagerberg, ISAAC’02)

• Computational geometry (Brodal, Fagerberg, ICALP’02; Bender, Cole, Raman, ICALP’02)

• Scanning dynamic sets (Bender, Cole, Demaine, Farach-Colton, ESA’02)

Cache Oblivious Techniques

• Scanning

• Sorting

• Recursion

• Recursive layout (van Emde Boas layout)

• Merging (FunnelSort, distribution sweeping, FunnelHeap)

Conclusions

• Cache oblivious model: Simple and general

• Algorithms exist for many problems

– stacks, queues, dictionaries, priority queues, sorting, selection, permuting, matrix multiplication, FFT, graph algorithms, computational geometry, ...

• Limitations

– searching costs a factor $\log_2 e$

– sorting and priority queues require a tall cache (Brodal and Fagerberg 2003)

Open problems

• Other algorithms ...

• Cache obliviousness vs parallel disks ?

• Implementations and experiments ?

• Libraries ?

• ...


References

• The Cost of Cache-Oblivious Searching, Michael A. Bender, Gerth Stølting Brodal, Rolf Fagerberg, Dongdong Ge, Simai He, Haodong Hu, John Iacono, and Alejandro López-Ortiz. Submitted.

• On the Limits of Cache-Obliviousness, Gerth Stølting Brodal and Rolf Fagerberg. To appear in Proc. 35th Annual ACM Symposium on Theory of Computing, 2003.

• Funnel Heap - A Cache Oblivious Priority Queue, Gerth Stølting Brodal and Rolf Fagerberg. In Proc. 13th Annual International Symposium on Algorithms and Computation, volume 2518 of Lecture Notes in Computer Science, pages 219-228. Springer Verlag, Berlin, 2002.

• Cache Oblivious Distribution Sweeping, Gerth Stølting Brodal and Rolf Fagerberg. In Proc. 29th International Colloquium on Automata, Languages, and Programming, volume 2380 of Lecture Notes in Computer Science, pages 426-438. Springer Verlag, Berlin, 2002.

• Cache-Oblivious Search Trees via Trees of Small Height, Gerth Stølting Brodal, Rolf Fagerberg, and Riko Jacob. In Proc. 13th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 39-48, 2002.