LOUDS: Succinct Data Structure


Page 1: LOUDS: Succinct Data Structure

LOUDS: Succinct Tree Data Structure

Yoh Okuno

Page 2: LOUDS: Succinct Data Structure

LOUDS (Level Order Unary Degree Sequence) [Jacobson, 1989]

● LOUDS is a succinct representation of an unlabeled static tree
● Succinct means the size is close to the information-theoretic minimum
● Can be extended to labeled trees easily
● Cannot add or delete nodes once built
● Used in Japanese IMEs [Kudo+ 2011]

Page 3: LOUDS: Succinct Data Structure

The most straightforward way to express a tree

Space usage is 2 N log2 N bits (N: number of nodes)

struct Node {
    Node *first_child;
    Node *next_sibling;
};

Pointer representation: 448 bits = 2 * 7 * 32 bits (2 pointers per node, 7 nodes, 32-bit pointers)
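The 448-bit figure can be reproduced with a small sketch. The tree shape below is an assumption inferred from the 15-bit vector used on the later slides, and `link_example_tree`, `count_nodes` and `pointer_bits` are hypothetical helper names, not part of the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    Node *first_child = nullptr;
    Node *next_sibling = nullptr;
};

// Bits spent on the two pointers of every node, for a given pointer width.
std::size_t pointer_bits(std::size_t nodes, std::size_t pointer_width_bits) {
    assert(pointer_width_bits > 0);
    return 2 * nodes * pointer_width_bits;
}

// Counts the nodes reachable through first_child / next_sibling links.
std::size_t count_nodes(const Node *node) {
    std::size_t total = 0;
    for (; node != nullptr; node = node->next_sibling) {
        total += 1 + count_nodes(node->first_child);
    }
    return total;
}

// Assumed 7-node shape: n[0] -> n[1] -> {n[2], n[3], n[4]}, n[2] -> n[5], n[3] -> n[6].
void link_example_tree(std::vector<Node> &n) {
    n.assign(7, Node{});
    n[0].first_child = &n[1];
    n[1].first_child = &n[2];
    n[2].next_sibling = &n[3];
    n[3].next_sibling = &n[4];
    n[2].first_child = &n[5];
    n[3].first_child = &n[6];
}
```

With 32-bit pointers, `pointer_bits(7, 32)` gives the 448 bits quoted on the slide; on a 64-bit machine the same tree would need twice that.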

Page 4: LOUDS: Succinct Data Structure

Problem

Page 5: LOUDS: Succinct Data Structure

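LOUDS: Succinct Tree Representation

Use a bit vector instead of pointers

Space consumption is 2 N + 1 bits

101011101010000

only 15 bits! (about 30x smaller than the 448-bit pointer representation)

[Figure: the example tree annotated with the bit groups that make up the vector, such as 10 and 1110]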
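As a minimal sketch of how such a bit vector can be produced, the function below walks a tree in level order and writes one 1 per child followed by a closing 0. The child-list input format and the names `build_louds` and `example_tree` are assumptions made for illustration; the tree shape is inferred from the 15-bit vector on this slide.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// children[v] lists the children of node v from left to right; node 0 is the root.
std::string build_louds(const std::vector<std::vector<int>> &children) {
    assert(!children.empty());
    std::string bits = "10";  // the root, written as a one-element sibling list
    std::queue<int> order;    // level-order (breadth-first) traversal
    order.push(0);
    while (!order.empty()) {
        const int v = order.front();
        order.pop();
        for (int c : children[v]) {
            bits += '1';      // one 1 per child
            order.push(c);
        }
        bits += '0';          // a 0 closes this node's child list
    }
    return bits;
}

// Assumed 7-node example: 0 -> 1 -> {2, 3, 4}, 2 -> 5, 3 -> 6.
std::vector<std::vector<int>> example_tree() {
    return {{1}, {2, 3, 4}, {5}, {6}, {}, {}, {}};
}
```

Every node contributes one 1 (as a child of its parent) and one 0 (closing its own child list), plus the extra leading 0 for the root's list, which is where 2 N + 1 comes from. A `std::string` of '0' and '1' characters is used only for readability; a real implementation would pack the bits.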

Page 6: LOUDS: Succinct Data Structure

Problem

Page 7: LOUDS: Succinct Data Structure

Navigational operations

Use the index in the bit vector to represent a node.

index  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
bit    1  0  1  0  1  1  1  0  1  0  1  0  0  0  0

● getRoot() = 0
● isNull(index) = (bit[index] == 0)
● nextSibling(index) = index + 1
● firstChild(index): covered in the rest of the slides

[Figure: the example tree with its bit groups; a red background marks the values that are actually stored]
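The operations that need no extra machinery can be sketched directly. This assumes that a 0 bit marks a null position (the end of a sibling list); the snake_case names and the helper `count_siblings` are illustrative, not from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The 15-bit example vector, one character per bit.
const std::string kBits = "101011101010000";

std::size_t get_root() { return 0; }

// A 0 bit is a null position: it ends a sibling list.
bool is_null(std::size_t index) {
    assert(index < kBits.size());
    return kBits[index] == '0';
}

// Siblings sit next to each other in the bit vector.
std::size_t next_sibling(std::size_t index) { return index + 1; }

// Counts the nodes in the sibling list that starts at `first`.
std::size_t count_siblings(std::size_t first) {
    std::size_t n = 0;
    for (std::size_t i = first; !is_null(i); i = next_sibling(i)) {
        ++n;
    }
    return n;
}
```

For example, the sibling list starting at index 4 contains three nodes (indices 4, 5 and 6) and ends at the 0 in position 7.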

Page 8: LOUDS: Succinct Data Structure

Problem

Page 9: LOUDS: Succinct Data Structure

Introducing External Nodes

Add virtual nodes to represent 0s in bit vector

101011101010000

[Figure: the tree with external nodes added; every node is labeled with its bit-vector index, 0 to 14]

Page 10: LOUDS: Succinct Data Structure

Convert Index to Internal ID

rank1: convert index to internal node ID

[Figure: the tree labeled with bit-vector indices 0 to 14; rank1 maps the index of each internal node to its internal node ID, 0 to 6]

Page 11: LOUDS: Succinct Data Structure

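Getting Child Node

Fact: N-th internal node’s last child is (N+1)-th external node

[Figure: the tree labeled with internal node IDs 0 to 6 and external node IDs 0 to 7, with a "+1" step from internal ID to external ID]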

Page 12: LOUDS: Succinct Data Structure

Convert External ID to Index

select0: convert external node ID to index

[Figure: the tree labeled with bit-vector indices 0 to 14 and external node IDs 0 to 7; select0 maps an external node ID back to its index]

Page 13: LOUDS: Succinct Data Structure

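How to find first child?

1. rank1(index): number of 1s before index (the internal node ID, counting from 0)
2. select0(n): index of the n-th 0 in the bit vector (n is the external node ID, counting from 0)
3. firstChild(index) = select0(rank1(index)) + 1

Counting the 1s up to and including index, as the rank table on the later slide does, and counting the n-th 0 from 1 gives the same firstChild.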
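The three steps can be sketched with linear scans standing in for the real rank and select structures. This is a minimal illustration that counts 1s strictly before the index and numbers 0s from zero; the function names and the `children` helper are assumptions, not the slides' code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// The 15-bit example vector, one character per bit.
const std::string kBits = "101011101010000";

// Number of 1s strictly before `index`: the 0-based internal node ID.
std::size_t rank1(std::size_t index) {
    assert(index < kBits.size());
    std::size_t ones = 0;
    for (std::size_t i = 0; i < index; ++i) {
        if (kBits[i] == '1') ++ones;
    }
    return ones;
}

// Index of the 0 whose 0-based external node ID is `n`.
std::size_t select0(std::size_t n) {
    for (std::size_t i = 0; i < kBits.size(); ++i) {
        if (kBits[i] != '0') continue;
        if (n == 0) return i;
        --n;
    }
    assert(false && "n exceeds the number of 0s");
    return kBits.size();
}

// The child list of a node starts right after the 0 that closes the previous list.
std::size_t first_child(std::size_t index) {
    assert(index < kBits.size() && kBits[index] == '1');
    return select0(rank1(index)) + 1;
}

// Collects the indices of all children of the node at `index`.
std::vector<std::size_t> children(std::size_t index) {
    std::vector<std::size_t> result;
    for (std::size_t i = first_child(index); kBits[i] == '1'; ++i) {
        result.push_back(i);
    }
    return result;
}
```

On the example vector, the node at index 2 has its first child at index 4 and its children are the indices 4, 5 and 6; the node at index 6 gets index 12, which holds a 0, so it has no children.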

Page 14: LOUDS: Succinct Data Structure

Implementing rank1(index)

rank1(index) = block[index / 4] + table[index % 4]

● block: rank of the first element of each fixed-size block (4 bits here)
● table: count of 1s relative to the block's first element, precomputed for each of the 16 possible 4-bit patterns

index  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
bit    1  0  1  0  1  1  1  0  1  0  1  0  0  0  0
rank   1  1  2  2  3  4  5  5  6  6  7  7  7  7  7

block  1  3  6  7

index  0  1  2  3  4  5 ... 15
table  0  1  1  2  1  2 ...  4
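One way to read the block and table arrays is sketched below. It assumes 4-bit blocks, a rank that counts 1s up to and including the index (as in the rank row), and a table indexed by a 4-bit pattern; the packing order and the names `RankIndex`, `build_rank_index` and `kPopcount` are choices made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// table: number of 1s in each 4-bit pattern (0 1 1 2 1 2 ... 4).
constexpr std::uint8_t kPopcount[16] = {0, 1, 1, 2, 1, 2, 2, 3,
                                        1, 2, 2, 3, 2, 3, 3, 4};

struct RankIndex {
    std::vector<std::uint8_t> nibbles;  // 4 bits per block, first bit in the high position
    std::vector<std::size_t> block;     // rank1 of the first bit of each block
};

RankIndex build_rank_index(const std::string &bits) {
    RankIndex r;
    std::size_t ones = 0;  // 1s seen in all earlier blocks
    for (std::size_t start = 0; start < bits.size(); start += 4) {
        std::uint8_t nib = 0;
        for (std::size_t k = 0; k < 4; ++k) {
            const bool one = start + k < bits.size() && bits[start + k] == '1';
            nib = static_cast<std::uint8_t>((nib << 1) | (one ? 1 : 0));
        }
        r.block.push_back(ones + static_cast<std::size_t>(nib >> 3));
        r.nibbles.push_back(nib);
        ones += kPopcount[nib];
    }
    return r;
}

// Number of 1s in positions 0..index, inclusive.
std::size_t rank1(const RankIndex &r, std::size_t index) {
    const std::size_t b = index / 4;
    const unsigned offset = static_cast<unsigned>(index % 4);
    assert(b < r.nibbles.size());
    const unsigned nib = r.nibbles[b];
    const unsigned upto = nib >> (3 - offset);               // bits of the block up to `index`
    const unsigned after_first = upto & ((1u << offset) - 1); // drop the block's first bit
    return r.block[b] + kPopcount[after_first];
}
```

For the example vector, `build_rank_index` yields the block values 1, 3, 6, 7 shown above, and `rank1` reproduces the rank row with one block lookup and one table lookup per query.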

Page 15: LOUDS: Succinct Data Structure

Implementing select0(n)

Apply binary search to rank0

● select0(n): index of n-th 0 = inverse of rank0
● rank0(index) = index - rank1(index) + 1
● speed up by block-aware binary search

index  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
bit    1  0  1  0  1  1  1  0  1  0  1  0  0  0  0
rank   1  1  2  2  3  4  5  5  6  6  7  7  7  7  7
rank0  0  1  1  2  2  2  2  3  3  4  4  5  6  7  8

n        1  2  3  4  5  6  7  8
select0  1  3  7  9 11 12 13 14
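The binary search can be sketched as follows, assuming n counts from 1 and rank0 counts 0s up to and including the index, as in the rows above. A linear-scan `rank1` stands in for the block-based version, and the block-aware speedup is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of 1s in bits[0..index], inclusive (linear scan, for illustration only).
std::size_t rank1(const std::string &bits, std::size_t index) {
    assert(index < bits.size());
    std::size_t ones = 0;
    for (std::size_t i = 0; i <= index; ++i) {
        if (bits[i] == '1') ++ones;
    }
    return ones;
}

// Number of 0s in bits[0..index], inclusive: index - rank1(index) + 1.
std::size_t rank0(const std::string &bits, std::size_t index) {
    return index + 1 - rank1(bits, index);
}

// Index of the n-th 0 (n counts from 1): the smallest index whose rank0 reaches n.
std::size_t select0(const std::string &bits, std::size_t n) {
    assert(!bits.empty() && n >= 1 && n <= rank0(bits, bits.size() - 1));
    std::size_t lo = 0;
    std::size_t hi = bits.size() - 1;
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (rank0(bits, mid) < n) {
            lo = mid + 1;  // the n-th 0 lies to the right of mid
        } else {
            hi = mid;      // mid already has at least n zeros at or before it
        }
    }
    return lo;
}
```

Because rank0 never decreases, the smallest index where it reaches n is exactly the position of the n-th 0, so the search needs only a logarithmic number of rank queries.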

Page 16: LOUDS: Succinct Data Structure

Performance comparison

Data Structure   Size    Time
text             917KB   N/A
Pointer-based    2.9MB   10s
LOUDS (char)     390KB   15s
LOUDS (marisa)   255KB   6s

Data source: /usr/share/dict/words in Ubuntu Linux
Data size: 99,171 words
Query: shuffled words * 100

Page 17: LOUDS: Succinct Data Structure

Conclusion

● Pointers are space consuming!
● LOUDS is a succinct tree representation
● Rank and select are key components
● LOUDS is about 1/7 the size of the pointer-based tree but takes 150% of its time
● How can marisa be so fast and small?

Page 18: LOUDS: Succinct Data Structure

References

● Space efficient static trees and graphs, Jacobson, FOCS 1989.
● Practical implementation of rank and select queries, Gonzalez, WEA 2005.
● Efficient dictionary and language model compression for input method editors, Kudo, WTIM 2011.