2001-05-02 Ideas on Treaps

  • 7/30/2019 2001-05-02 Ideas on Treaps

    1/45

    Ideas on Treaps

    Maverick Woo

    [Figure: example treap. Nodes as key,priority: N,33; H,20; T,17; M,14; S,12; E,9; P,8; K,7; W,6; Z,4; U,2]

    May 2, 2001 2

    Disclaimer

    Articles of interest

    Raimund Seidel and Cecilia R. Aragon. Randomized search trees. Algorithmica 16 (1996), 464-497.

    Guy E. Blelloch and Margaret Reid-Miller. Fast Set Operations Using Treaps. In Proc. 10th Annual ACM SPAA, 1998.

    Of course this is joint work with Guy.

    Hopefully Daniel will also show up.

    Background

    Very high level talk

    No analysis

    To make this a technical talk

    Some background

    Splay Trees (zig, zig zig, zig zig zig)

    Treaps, if you still remember

    Agenda

    Data structure research overview

    Treaps refresher

    Some current issues on Treaps

    Data Structure Research

    I am not qualified to say yet, but I do have some feelings about it.

    Not that many high-level problems.

    Representing a set/ordering

    Support some operations

    Some say it's all about applications.

    Applications don't have to be very specific.

    But they need to be specific enough that we can make assumptions.

    What Operations?

    Basic

    Insert, Membership

    Intermediate

    Delete (e.g. Binomial vs. Fibonacci Heaps)

    Disjoint-Union (e.g. Union-Find)

    Higher Level

    Union, Intersection, Difference

    Finger Search

    Behavior Restrictions

    Persistence

    Functional

    More later

    Architecture Independence

    Relatively new, a.k.a. Cache-oblivious

    Runs efficiently on hierarchical memory

    Avoid memory-specific parameterization

    Forget data block size, cache line width etc.

    Not my theme today

    Why Persistence?

    Many reasons for persistence

    It's practical with good garbage collectors.

    Functional programming makes everyone's life easier.

    For the theoretician

    You don't need to worry about side effects.

    Better analysis possible: NESL

    For the programmer

    You don't need to worry about side effects.

    Fewer memory leaks, fewer dangling pointers

    Real-life example 1

    You have operations working on multiple instances.

    You index the web.

    You build your indices with your cool data structures.

    Conjunction query (AND) is intersection.

    You do the intersection on two indices.

    Now one of the indices can get corrupted.

    Real-life example 2

    You are rich.

    Once upon a time, in a dot-com far away

    You run a multi-processor machine.

    You learned that Splay Trees are cool.

    You even learned how to write multi-threaded programs.

    Thread1 searches for x on SplayInstance42.

    Thread2 searches for y on SplayInstance42.

    Real-world situation: search engines

    Data Structure vs. Hacking

    Examples

    To learn more about Splay Trees

    Dial (412)-HACKERS.

    Ask for Danny Sleator

    OK, real example

    (Persistent) FIFO Queues

    Operations: IsEmpty(Q), Enqueue(Q,x), Dequeue(Q)

    Need to grow, so let's use a linked list

    FIFO Queues

    Linked List is bad though

    Traversing to the tail takes linear time.

    Either Enqueue or Dequeue is going to be linear time.

    How about double-ended queues (deques)?

    With that much extra space, it may be faster with a tree.

    If one is not good enough, use two.

    Suppose the queue is x_1, x_2, ..., x_i, y_{i+1}, y_{i+2}, ..., y_n.

    Represent it as [x_1, x_2, ..., x_i], [y_n, y_{n-1}, ..., y_{i+1}].

    You can figure out the details yourself.

    In the end, isn't this just a hack?
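
The two-list representation can be sketched as follows. This is a minimal Python illustration, not the speaker's code; the names `Cons`, `Queue`, `enqueue` and `dequeue` are hypothetical. The front list holds the x's in queue order and the back list holds the y's reversed, and no cell is ever mutated, so old versions of the queue stay valid.

```python
from typing import Any, NamedTuple, Optional


class Cons(NamedTuple):
    """One immutable cell of a singly linked list."""
    head: Any
    tail: Optional["Cons"]


class Queue(NamedTuple):
    front: Optional[Cons]  # x_1, x_2, ..., x_i in queue order
    back: Optional[Cons]   # y_n, y_{n-1}, ..., y_{i+1} (newest first)


EMPTY = Queue(None, None)


def is_empty(q: Queue) -> bool:
    return q.front is None and q.back is None


def enqueue(q: Queue, x: Any) -> Queue:
    """Push onto the reversed back list: one new cell, nothing mutated."""
    return Queue(q.front, Cons(x, q.back))


def _reverse(lst: Optional[Cons]) -> Optional[Cons]:
    out = None
    while lst is not None:
        out = Cons(lst.head, out)
        lst = lst.tail
    return out


def dequeue(q: Queue):
    """Return (oldest element, remaining queue); the input queue is unchanged."""
    front, back = q.front, q.back
    if front is None:
        # Front ran dry: reverse the back list once and continue from there.
        front, back = _reverse(back), None
    if front is None:
        raise IndexError("dequeue from empty queue")
    return front.head, Queue(front.tail, back)
```

Each element is reversed at most once, so a sequence of operations on the latest version costs amortized constant time per operation. Repeatedly dequeuing from the same old version can repeat the reversal, which is one reason the persistent case needs more care.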

    Agenda

    Data structure research overview

    Treaps refresher

    Some current issues on Treaps

    Treaps Refresher

    A Treap is a recursive data structure.

    datatype 'a Treap =
        E
      | T of priority * 'a Treap * 'a * 'a Treap

    Each node has a key and a priority.

    Assume all are unique.

    Keys are arranged in in-order, priorities in heap order.

    Priority is chosen uniformly at random.

    8-way independence suffices for the analysis.

    Can be computed with hash functions

    Don't need to store the priority

    A key's priority can be made consistent across runs
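
As a rough illustration of the points above, here is a Python mirror of the datatype together with a priority derived from the key alone. The names `Node`, `priority` and `is_treap` are hypothetical, and SHA-256 is only a convenient stand-in: it is deterministic across runs, but it is not the 8-wise independent hash family the analysis asks for. The sketch stores the priority in the node for clarity, even though it could be recomputed from the key.

```python
import hashlib
from typing import Any, NamedTuple, Optional


class Node(NamedTuple):
    """Mirrors T of priority * 'a Treap * 'a * 'a Treap; None plays the role of E."""
    p: int
    left: Optional["Node"]
    k: Any
    right: Optional["Node"]


def priority(key: Any) -> int:
    """64-bit priority computed from the key, identical on every run."""
    digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def is_treap(t: Optional[Node], lo=None, hi=None, cap=None) -> bool:
    """Check in-order on keys (lo < k < hi) and heap order on priorities (p < cap)."""
    if t is None:
        return True
    if lo is not None and not lo < t.k:
        return False
    if hi is not None and not t.k < hi:
        return False
    if cap is not None and not t.p < cap:
        return False
    return is_treap(t.left, lo, t.k, t.p) and is_treap(t.right, t.k, hi, t.p)
```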

    Treap Operations

    Membership

    As in binary search trees

    Insert

    Add as leaf by key (in-order)

    Rotate up by priority (heap-order)

    Delete

    Reverse what insert does

    Find-min, etc.

    Walk on the left spine, etc.
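
Membership and insert as described above might look like this in Python. It is a sketch under assumptions: `Node`, `member` and `insert` are hypothetical names, the caller supplies the priority explicitly, and the insert is written persistently (it copies the search path instead of mutating it), which the slide does not require.

```python
from typing import Any, NamedTuple, Optional


class Node(NamedTuple):
    p: int
    left: Optional["Node"]
    k: Any
    right: Optional["Node"]


def member(t: Optional[Node], k: Any) -> bool:
    """Plain binary-search-tree lookup; priorities play no role."""
    while t is not None:
        if k == t.k:
            return True
        t = t.left if k < t.k else t.right
    return False


def insert(t: Optional[Node], k: Any, p: int) -> Node:
    """Add (k, p) as a leaf by key, then rotate it up while it beats its parent."""
    if t is None:
        return Node(p, None, k, None)
    if k == t.k:
        return t  # keys are assumed unique; nothing to do
    if k < t.k:
        left = insert(t.left, k, p)
        if left.p > t.p:  # heap order violated: rotate right
            return Node(left.p, left.left, left.k,
                        Node(t.p, left.right, t.k, t.right))
        return Node(t.p, left, t.k, t.right)
    right = insert(t.right, k, p)
    if right.p > t.p:  # heap order violated: rotate left
        return Node(right.p, Node(t.p, t.left, t.k, right.left),
                    right.k, right.right)
    return Node(t.p, t.left, t.k, right)
```

Because keys and priorities are unique, the resulting shape does not depend on insertion order: the highest-priority key always ends up at the root.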

    Treap Split

    Want top-down split (it's faster)

    (less, x, gtr) = Split(root, k)

    If (root is empty) // nothing to split
        (E, null, E)
    Else If (root.k > k) // want to split left subtree
        Let (l1, m, r1) = Split(root.left, k)
        (l1, m, T(root.p, r1, root.k, root.right))
    Else If (root.k < k) // want to split right subtree
        Let (l1, m, r1) = Split(root.right, k)
        (T(root.p, root.left, root.k, l1), m, r1)
    Else // root.k = k
        (root.left, root.k, root.right)
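
A direct Python transliteration of the split above, as a sketch: `Node` and `split` are hypothetical names, `None` stands for both the empty treap and the "key not found" result, and the empty-tree base case is an assumption added so the recursion terminates.

```python
from typing import Any, NamedTuple, Optional


class Node(NamedTuple):
    p: int
    left: Optional["Node"]
    k: Any
    right: Optional["Node"]


def split(t: Optional[Node], k: Any):
    """Return (less, found, gtr): keys below k, k itself if present, keys above k."""
    if t is None:
        return None, None, None
    if k < t.k:
        # k falls in the left subtree; the root and its right subtree go to gtr.
        l1, m, r1 = split(t.left, k)
        return l1, m, Node(t.p, r1, t.k, t.right)
    if k > t.k:
        # k falls in the right subtree; the root and its left subtree go to less.
        l1, m, r1 = split(t.right, k)
        return Node(t.p, t.left, t.k, l1), m, r1
    return t.left, t.k, t.right
```

Only nodes on the search path are rebuilt; every subtree hanging off that path is shared between the input and the two results.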

    Treap Split Example

    [Figure: the example treap Tr before Split(Tr, V), and the resulting treaps less and gtr after.]

    Treap Split Persistence

    These figures are deceptive.

    Only 4 new nodes created

    All on the search path to V

    [Figure: the same Split(Tr, V) example, showing less and gtr.]

    Treap Join

    Join(less, gtr) // every key in less < every key in gtr

    Handle empty less or gtr

    If (less.p > gtr.p)
        T(less.p, less.left, less.k, Join(less.right, gtr))
    Else
        T(gtr.p, Join(less, gtr.left), gtr.k, gtr.right)
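
The join above in Python, as a sketch with hypothetical names `Node` and `join`. It assumes, as on the slide, that every key in `less` is smaller than every key in `gtr`, and it resolves the "handle empty" step by returning the other argument.

```python
from typing import Any, NamedTuple, Optional


class Node(NamedTuple):
    p: int
    left: Optional["Node"]
    k: Any
    right: Optional["Node"]


def join(less: Optional[Node], gtr: Optional[Node]) -> Optional[Node]:
    """Merge two treaps whose key ranges do not overlap (less below gtr)."""
    if less is None:
        return gtr
    if gtr is None:
        return less
    if less.p > gtr.p:
        # less's root wins; gtr can only go somewhere in its right subtree.
        return Node(less.p, less.left, less.k, join(less.right, gtr))
    # gtr's root wins; less can only go somewhere in its left subtree.
    return Node(gtr.p, join(less, gtr.left), gtr.k, gtr.right)
```

The recursion walks down the right spine of `less` and the left spine of `gtr`, so only those spine nodes are rebuilt.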

    Treap Join Example

    [Figure: the treaps less and gtr before Join(less, gtr), and the example treap after.]

    Treap Running Time

    All expected O(lg n)

    Also of note is Finger Search

    Given a finger in a treap

    Find the key that is d away in sorted order

    Expected O(lg d) time

    Requires parent pointers

    Evil: wastes so much space

    See Seidel and Aragon for details.

    Treap Union

    Treaps really shine in set operations.

    Union(a,b)

    Suppose roots are (k1,p1), (k2,p2)

    WLOG assume p1 > p2.

    Let (less,x,gtr) = Split(b,k1).

    T(p1, Union(a.left, less), k1, Union(a.right, gtr))
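
A self-contained Python sketch of the union above. All names are hypothetical, and several pieces are assumptions for the sake of a runnable example: keys are small non-negative integers, `priority` is a multiplicative hash standing in for the hash-based priorities (it gives each key below 2^32 a distinct, repeatable priority), `build` is a slow reference constructor, and the empty cases return the other argument.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    p: int
    left: Optional["Node"]
    k: int
    right: Optional["Node"]


def priority(k: int) -> int:
    # Stand-in hash: odd multiplier mod 2^32, so distinct keys < 2^32 never collide.
    return (k * 2654435761) % 2 ** 32


def build(keys) -> Optional[Node]:
    """Reference construction: the highest-priority key becomes the root."""
    ks = sorted(set(keys))
    if not ks:
        return None
    i = max(range(len(ks)), key=lambda j: priority(ks[j]))
    return Node(priority(ks[i]), build(ks[:i]), ks[i], build(ks[i + 1:]))


def split(t, k):
    if t is None:
        return None, None, None
    if k < t.k:
        l1, m, r1 = split(t.left, k)
        return l1, m, Node(t.p, r1, t.k, t.right)
    if k > t.k:
        l1, m, r1 = split(t.right, k)
        return Node(t.p, t.left, t.k, l1), m, r1
    return t.left, t.k, t.right


def union(a, b):
    if a is None:
        return b
    if b is None:
        return a
    if a.p < b.p:
        a, b = b, a  # WLOG the root of a has the higher priority
    less, _, gtr = split(b, a.k)  # a duplicate of a.k in b is simply dropped
    return Node(a.p, union(a.left, less), a.k, union(a.right, gtr))


def keys(t):
    """In-order key list, for inspection."""
    return [] if t is None else keys(t.left) + [t.k] + keys(t.right)
```

The two recursive calls touch disjoint data and have no side effects, which is what makes running them on different processors straightforward. The sketch relies on a key having the same priority in both inputs.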

    Treap Intersection

    Inter(a,b)

    Suppose roots are (k1,p1), (k2,p2); p1 > p2

    Let (less,x,gtr) = Split(b,k1)

    If x is null // k1 is not in b, sorry dude
        Join(Inter(a.left, less), Inter(a.right, gtr))
    Else
        T(p1, Inter(a.left, less), k1, Inter(a.right, gtr))
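
The intersection above can be sketched in Python in the same way. The names, the integer keys, the stand-in `priority` hash and the reference constructor `build` are all assumptions made for illustration; an empty input is taken to give an empty result.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    p: int
    left: Optional["Node"]
    k: int
    right: Optional["Node"]


def priority(k: int) -> int:
    # Stand-in hash giving each key below 2^32 a distinct, repeatable priority.
    return (k * 2654435761) % 2 ** 32


def build(keys) -> Optional[Node]:
    """Reference construction: the highest-priority key becomes the root."""
    ks = sorted(set(keys))
    if not ks:
        return None
    i = max(range(len(ks)), key=lambda j: priority(ks[j]))
    return Node(priority(ks[i]), build(ks[:i]), ks[i], build(ks[i + 1:]))


def split(t, k):
    if t is None:
        return None, None, None
    if k < t.k:
        l1, m, r1 = split(t.left, k)
        return l1, m, Node(t.p, r1, t.k, t.right)
    if k > t.k:
        l1, m, r1 = split(t.right, k)
        return Node(t.p, t.left, t.k, l1), m, r1
    return t.left, t.k, t.right


def join(less, gtr):
    if less is None:
        return gtr
    if gtr is None:
        return less
    if less.p > gtr.p:
        return Node(less.p, less.left, less.k, join(less.right, gtr))
    return Node(gtr.p, join(less, gtr.left), gtr.k, gtr.right)


def inter(a, b):
    if a is None or b is None:
        return None
    if a.p < b.p:
        a, b = b, a  # intersection is symmetric, so swapping is harmless
    less, x, gtr = split(b, a.k)
    left = inter(a.left, less)
    right = inter(a.right, gtr)
    if x is None:
        return join(left, right)  # a.k is not in b: drop the root
    return Node(a.p, left, a.k, right)
```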

    Treap Difference

    Similar to intersection

    Change the logic a bit

    Messier because it is not symmetric

    Left as an exercise to the reader.
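
One possible answer to the exercise, sketched in Python. This is a deliberately simple variant and not necessarily the one the speaker or the Blelloch and Reid-Miller paper has in mind: it always splits b by the root of a instead of choosing the higher-priority root, which keeps the code short but is not claimed to meet the optimal bound. Names, integer keys, the stand-in `priority` hash and `build` are assumptions.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    p: int
    left: Optional["Node"]
    k: int
    right: Optional["Node"]


def priority(k: int) -> int:
    # Stand-in hash giving each key below 2^32 a distinct, repeatable priority.
    return (k * 2654435761) % 2 ** 32


def build(keys) -> Optional[Node]:
    """Reference construction: the highest-priority key becomes the root."""
    ks = sorted(set(keys))
    if not ks:
        return None
    i = max(range(len(ks)), key=lambda j: priority(ks[j]))
    return Node(priority(ks[i]), build(ks[:i]), ks[i], build(ks[i + 1:]))


def split(t, k):
    if t is None:
        return None, None, None
    if k < t.k:
        l1, m, r1 = split(t.left, k)
        return l1, m, Node(t.p, r1, t.k, t.right)
    if k > t.k:
        l1, m, r1 = split(t.right, k)
        return Node(t.p, t.left, t.k, l1), m, r1
    return t.left, t.k, t.right


def join(less, gtr):
    if less is None:
        return gtr
    if gtr is None:
        return less
    if less.p > gtr.p:
        return Node(less.p, less.left, less.k, join(less.right, gtr))
    return Node(gtr.p, join(less, gtr.left), gtr.k, gtr.right)


def diff(a, b):
    """Keys of a that are not in b. The roles of a and b cannot be swapped."""
    if a is None or b is None:
        return a
    less, x, gtr = split(b, a.k)
    left = diff(a.left, less)
    right = diff(a.right, gtr)
    if x is None:
        return Node(a.p, left, a.k, right)  # a.k survives
    return join(left, right)  # a.k is in b: drop the root
```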

    Points of Note

    Persistence

    Did you see a side effect? (assignments?)

    Parallelization

    Parallelizing without persistence is a pain.

    Very natural divide-and-conquer

    Run the two recursive calls on different CPUs

    Running times

    Set Operation Running Time

    For two sets of size m and n (m ≤ n)

    Optimal is Θ(m lg(n/m))

    What was known before this work

    With AVL Trees, O(m lg(n/m))

    Rather complicated algorithms

    For the sake of your smooth digestion

    Compare this to O(m+n) or O(m lg n)

    With Treaps

    Can use Finger Search if we have parent pointers

    Does not parallelize: multiple fingers???

    Set Operation Running Time

    What is known after this work

    No parent pointers

    Parallelize naturally

    Optimal expected running time O(m lg (n/m))

    Analysis available in Blelloch and Reid-Miller

    Relatively simple algorithm

    Experimental results

    6.3-6.8 speedup on 8-processor SGI machine

    4.1-4.4 speedup on 5-processor Sun machine

    Agenda

    Data structure research overview

    Treaps refresher

    Some current issues on Treaps

    A Word on Splay Trees

    Splay Trees are slow in practice!

    Even a single simple search would require O(lg n) pointer updates!

    Skip Lists are way simpler and faster.

    Let's switch all Splay Trees to Skip Lists.

    Danny???

    Bruce said

    First find Danny.

    Ditch Splay Trees: say they are slow.

    Then praise Skip Lists.

    Danny will refute by quoting experimental studies.

    Splay Trees are not much slower than Skip Lists in practice.

    ask who's my advisor.

    I wonder if that works. So I tried.

    Current Issues on Treaps

    Treaps are simpler than Splay Trees

    No famous conjecture for my back pocket

    Neat idea from Adam Kalai

    Not self-adjusting

    Access introduces more explicit changes

    Adding data compression to Treaps

    Finger search on Treaps

    Work by Guy + Daniel Blandford

    Adding Compression to Treaps

    Search engines

    Infrequent offline update (once a month)

    Frequent online query and set operations

    Keys are unique.

    Keys can be huge and occur sparsely.

    Let's compress the keys!

    Assume they are 64-bit integers.

    We've got a problem!

    I don't know how to deploy data compression to general data structures.

    Begin with the simplest: Array

    The naive approach

    Compress the whole array

    When we need to access an element

    decompress the whole array

    do the access

    compress the whole array again

    Isn't that dumb?

    Any suggestions?

    Use chunking

    Divide the array into blocks of size C.

    Compress each block individually.

    Now we are back to constant time!

    Shh!!! That could be a trade secret.

    Of course they use something better than a vanilla array.
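
Chunking an array of 64-bit keys could be sketched like this in Python. The class name `ChunkedArray` and its methods are hypothetical, and `zlib` is only a convenient general-purpose stand-in for whatever compressor would really be used. Each access decompresses a single block of at most C elements, so its cost depends on C and not on the array length.

```python
import struct
import zlib


class ChunkedArray:
    """Array of unsigned 64-bit integers stored as independently compressed blocks."""

    def __init__(self, values, chunk_size=64):
        self.chunk_size = chunk_size
        self.length = len(values)
        self.blocks = [
            self._pack(values[i:i + chunk_size])
            for i in range(0, len(values), chunk_size)
        ]

    @staticmethod
    def _pack(chunk):
        # Little-endian unsigned 64-bit integers, then compress the raw bytes.
        return zlib.compress(struct.pack(f"<{len(chunk)}Q", *chunk))

    @staticmethod
    def _unpack(block):
        raw = zlib.decompress(block)
        return list(struct.unpack(f"<{len(raw) // 8}Q", raw))

    def get(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        b, off = divmod(i, self.chunk_size)
        return self._unpack(self.blocks[b])[off]  # touches one block only

    def set(self, i, value):
        if not 0 <= i < self.length:
            raise IndexError(i)
        b, off = divmod(i, self.chunk_size)
        chunk = self._unpack(self.blocks[b])
        chunk[off] = value
        self.blocks[b] = self._pack(chunk)  # recompress that one block
```

A larger C usually compresses better but makes every access more expensive, which is the same tension that shows up when chunking a Treap.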

    Chunking a Treap

    A sub-tree is a chunk.

    Desire consistent chunk size

    But Treaps are usually not full.

    Need better chunking rules

    Chunks

    Can't be too big: hurts running time

    Can't be too small: hurts compression (space)

    Vocab

    Internal node and Leaf block

    More precisely

    datatype tblock =
        Packed of int * key * key * key vector
      | UnPacked of int * int * key vector

    datatype trearray =
        TE
      | TB of tblock
      | TN of trearray * key * trearray

    All running times are expected-case.

    Idea 1 Thresholds

    Priority is in the range 1 to maxP

    Invent a threshold Pth

    e.g. maxP - log(maxP)

    For n = (p,k)

    If p > Pth, then n is an internal node.

    Otherwise, n is in some leaf block.

    Trick done when a key is inserted.

    Also maintained by various operations.
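
The threshold rule is simple arithmetic, sketched here in Python. The function names are hypothetical and the base-2 logarithm is an assumption, since the slide only says log. With priorities uniform on 1..maxP, exactly log(maxP) priority values lie above the threshold, so the expected number of internal nodes among n keys is n * log(maxP) / maxP.

```python
import math


def threshold(max_p: int) -> int:
    """Pth = maxP - log(maxP), using a base-2 log (an assumption)."""
    return max_p - int(math.log2(max_p))


def is_internal(p: int, max_p: int) -> bool:
    """A node with priority p is internal iff p exceeds the threshold."""
    return p > threshold(max_p)


def expected_internal(n: int, max_p: int) -> float:
    """Expected internal-node count for n keys with uniform priorities in 1..max_p."""
    return n * (max_p - threshold(max_p)) / max_p
```

For example, with maxP = 2^16 and the same number of keys, the expected count is 16, that is log N internal nodes, with every other key landing in a leaf block.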

    Idea 1 Features

    On average, constant ratio between internal keys and keys in a block.

    With Pth = maxP - log(maxP), N keys yield log N internal nodes

    Height is log log N.

    O(log N) bottom nodes, each w/ a block

    Expect (N - log N) / O(log N) keys / block

    Binary search in block takes O(log N)

    Idea 1 Running Time

    Query is still O(log n).

    Insert is also O(log n).

    Join, Split both take O(log n).

    Set operations rely on Join and Split's O(log n) running time.

    Looking good

    Idea 1 Problems

    Asymptotic bound

    Need to work out the constants

    Exact analysis in progress

    I now think even more highly of Knuth

    SML implementation

    Make the idea as concrete as code

    Can now do more experiments

    Idea 1 Questions

    Do we really need to maintain consistent priority across runs?

    Make things simpler

    But Union looks suspicious

    What compression algorithm to use?

    No general data compression

    Take advantage of index distribution

    Idea 2 Small Blocks

    Want a more-or-less constant block size

    Small blocks are more realistic

    Say 20

    Processor specific: fit cache line size

    How well can we compress 20 integers?

    Leave for second stage investigation

    Perhaps I can share

    Writing down algorithm as code helps

    Pseudocode is good for short algorithms

    Real code is more concrete.

    Good for sloppy people like me.

    Actual SML code

    You can figure out you missed some cases.

    Now if SML has a debugger

    Space time tradeoff is very real

    http://www.cs.cmu.edu/~maverick/Trearray.html

    Treap Finger Search

    Daniel is working on it.

    No parent pointers needed

    Can mimic parent pointers by reversing the root-to-(last accessed leaf) path

    Should probably leave this to him

    Q&A / Suggestions

    Work in progress, welcome suggestions

    Danny, don't kick me too hard