BBM 201 – DATA STRUCTURES

TRIES

DEPT. OF COMPUTER ENGINEERING

Acknowledgement: The course slides are adapted from the slides prepared by R. Sedgewick and K. Wayne of Princeton University.

TODAY

‣ Tries

‣ R-way tries

TRIES

Tries. [from retrieval, but pronounced "try"]

• Store characters in nodes (not keys).

• Each node has R children, one for each possible character.

• Store values in nodes corresponding to last characters in keys.

[Figure: trie for the key-value pairs listed below. The root has a link to the trie for all keys that start with s, which in turn has a link to the trie for all keys that start with she. The value for she is stored in the node corresponding to its last key character. For now, null links are not drawn.]

key      value
by       4
sea      6
sells    1
she      0
shells   3
shore    7
the      5

Search in a trie

Follow links corresponding to each character in the key.

• Search hit: node where search ends has a non-null value.

• Search miss: reach a null link or node where search ends has null value.

[Figure: get("shells") follows the links s, h, e, l, l, s and returns the value associated with the last key character (return 3).]

[Figure: get("she") follows the links s, h, e. The search may terminate at an intermediate node (return 0).]

[Figure: get("shell") follows the links s, h, e, l, l. No value is associated with the last key character (return null).]

[Figure: get("shelter") follows the links s, h, e, l and then finds no link to 't' (return null).]
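The four lookups above can be reproduced with a small C sketch. It is an illustration only, not the course's code: the names `new_node`, `trie_put`, `trie_get`, `build_example`, and the sentinel `NO_VALUE` (standing in for a "null value") are assumptions introduced here, and the search is written as a loop rather than recursively.

```c
#include <assert.h>
#include <stdlib.h>

#define R 256          /* alphabet size: extended ASCII */
#define NO_VALUE (-1)  /* hypothetical sentinel standing in for "null value" */

typedef struct Node {
    int value;             /* NO_VALUE unless a key ends at this node */
    struct Node *next[R];  /* one link per possible character */
} Node;

/* Allocate a node with no value and all links null. */
Node *new_node(void)
{
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->value = NO_VALUE;
    for (int i = 0; i < R; i++)
        n->next[i] = NULL;
    return n;
}

/* Insert key with value val, creating missing nodes along the path. */
void trie_put(Node *root, const char *key, int val)
{
    Node *x = root;
    for (; *key != '\0'; key++) {
        unsigned char c = (unsigned char)*key;
        if (x->next[c] == NULL)
            x->next[c] = new_node();
        x = x->next[c];
    }
    x->value = val;
}

/* Follow one link per key character; return NO_VALUE on a search miss. */
int trie_get(const Node *root, const char *key)
{
    const Node *x = root;
    for (; *key != '\0'; key++) {
        x = x->next[(unsigned char)*key];
        if (x == NULL)
            return NO_VALUE;  /* miss: reached a null link */
    }
    return x->value;          /* hit, or NO_VALUE if no key ends here */
}

/* Build the seven-key example trie (by, sea, sells, she, shells, shore, the). */
Node *build_example(void)
{
    Node *root = new_node();
    trie_put(root, "by", 4);
    trie_put(root, "sea", 6);
    trie_put(root, "sells", 1);
    trie_put(root, "she", 0);
    trie_put(root, "shells", 3);
    trie_put(root, "shore", 7);
    trie_put(root, "the", 5);
    return root;
}
```

With `Node *t = build_example();`, the call `trie_get(t, "shells")` returns 3 and `trie_get(t, "she")` returns 0, while `trie_get(t, "shell")` and `trie_get(t, "shelter")` both return `NO_VALUE`: the first ends at a node without a value, the second runs into a null link.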

Insertion into a trie

Follow links corresponding to each character in the key.

• Encounter a null link: create new node.

• Encounter the last character of the key: set value in that node.

[Figure: put("shore", 7) follows the existing links s and h, creates new nodes for o, r, e, and sets the value 7 in the node for the last character.]

Trie construction demo

Starting from an empty trie, the demo inserts the following keys in order:

• put("she", 0): the value is in the node corresponding to the last character; the key is the sequence of characters from the root to the value.

• put("sells", 1)

• put("sea", 2)

• put("shells", 3)

• put("by", 4)

• put("the", 5)

• put("sea", 6): overwrite old value with new value.

• put("shore", 7)
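As a rough check on the construction sequence above, the following C sketch replays the eight insertions and records how many new nodes each one creates. The helper names `make_node`, `put_counting`, and `replay_demo` are hypothetical and exist only for this illustration; the insertion is iterative, and -1 is assumed as the marker for "no value".

```c
#include <assert.h>
#include <stdlib.h>

#define R 256  /* alphabet size: extended ASCII */

typedef struct Node {
    int value;             /* -1 (assumed sentinel) unless a key ends here */
    struct Node *next[R];  /* one link per possible character */
} Node;

/* Allocate a node with no value and all links null. */
Node *make_node(void)
{
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->value = -1;
    for (int i = 0; i < R; i++)
        n->next[i] = NULL;
    return n;
}

/* Insert key -> val below root; return how many new nodes were created. */
int put_counting(Node *root, const char *key, int val)
{
    int created = 0;
    Node *x = root;
    for (; *key != '\0'; key++) {
        unsigned char c = (unsigned char)*key;
        if (x->next[c] == NULL) {   /* null link: create a new node */
            x->next[c] = make_node();
            created++;
        }
        x = x->next[c];
    }
    x->value = val;                 /* last character: set (or overwrite) the value */
    return created;
}

/* Replay the demo's eight put() calls in order. The i-th call stores value i,
   matching the demo. Fills created[i] and returns the total node count
   (not counting the root). */
int replay_demo(Node *root, int created[8])
{
    static const char *keys[8] = {
        "she", "sells", "sea", "shells", "by", "the", "sea", "shore"
    };
    int total = 0;
    for (int i = 0; i < 8; i++) {
        created[i] = put_counting(root, keys[i], i);
        total += created[i];
    }
    return total;
}
```

Calling `replay_demo(make_node(), created)` fills `created` with 3, 4, 1, 3, 2, 3, 0, 3 and returns 19. The second put("sea", ...) creates no nodes at all: it only overwrites the value 2 with 6, and shared prefixes such as "sh" and "se" are stored once.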

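Trie representation: implementation

Node. A value, plus references to R nodes.

typedef struct Node Node;

struct Node
{
    int value;       /* value of the key that ends at this node */
    Node *next[R];   /* one link per character; R is the alphabet size */
};

A child link for each character in the alphabet. There is no need to search for a character, but a pointer is reserved in memory for each character.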
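To make the memory cost of "a pointer reserved for each character" concrete, here is a small C sketch that computes it from the node layout. The function names `link_bytes_per_node` and `trie_bytes` are assumptions made for this example, and R is fixed at 256 as in the slides.

```c
#include <assert.h>
#include <stddef.h>

#define R 256  /* alphabet size: extended ASCII */

typedef struct Node {
    int value;
    struct Node *next[R];
} Node;

/* Compile-time check: a node is at least as large as its R links. */
static_assert(sizeof(Node) >= R * sizeof(Node *),
              "a node holds at least R links");

/* Bytes taken by the link array of a single node. */
size_t link_bytes_per_node(void)
{
    return R * sizeof(Node *);
}

/* Bytes taken by n nodes, including the value field and any padding. */
size_t trie_bytes(size_t n)
{
    return n * sizeof(Node);
}
```

On a platform with 8-byte pointers, `link_bytes_per_node()` evaluates to 256 × 8 = 2048 bytes, regardless of how many of those links are actually used.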

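R-way trie: implementation

#include <stdlib.h>   /* malloc */
#include <string.h>   /* strlen */

#define R 256         /* alphabet size: extended ASCII */

Node *getNode(void);  /* allocates an empty node */

Node *root = NULL;    /* the empty trie */

/* Insert key -> val. Call from the root as: put(&root, key, val, 0); */
void put(Node **x, char *key, int val, int d)
{
    if (*x == NULL) *x = getNode();                                /* null link: create a new node */
    if ((size_t)d == strlen(key)) { (*x)->value = val; return; }   /* last character: set the value */
    unsigned char c = key[d];                                      /* keeps the index within 0..R-1 */
    put(&((*x)->next[c]), key, val, d+1);
}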

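R-way trie: implementation (continued)

/* Allocate a node with no value and all R links null. */
Node *getNode(void)
{
    Node *pNode = (Node *)malloc(sizeof(Node));
    if (pNode) {
        pNode->value = -1;            /* -1 marks "no value stored here" */
        for (int i = 0; i < R; i++)
            pNode->next[i] = NULL;
    }
    return pNode;
}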

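R-way trie: implementation (continued)

/* Return the value stored for key, or -1 for no match. Call as: get(root, key, 0); */
int get(Node *x, char *key, int d)
{
    if (x == NULL) return -1;                        /* reached a null link: no match */
    if ((size_t)d == strlen(key)) return x->value;   /* value at the node where the search ends */
    unsigned char c = key[d];                        /* keeps the index within 0..R-1 */
    return get(x->next[c], key, d+1);
}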

Trie performance

Search hit. Need to examine all L characters for equality.

Search miss.

• Could have mismatch on first character.

• Typical case: examine only a few characters (sublinear).

Space. R null links at each leaf (but sublinear space possible if many short strings share common prefixes).

Bottom line. Fast search hit and even faster search miss, but wastes space.
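The space remark can be illustrated on the seven-key example trie (by, sea, sells, she, shells, shore, the). The C sketch below builds that trie and tallies nodes, leaves, and null links. `TrieStats`, `alloc_node`, `insert`, `collect`, and `example_stats` are hypothetical names used only here, and -1 is assumed as the "no value" marker.

```c
#include <assert.h>
#include <stdlib.h>

#define R 256  /* alphabet size: extended ASCII */

typedef struct Node {
    int value;
    struct Node *next[R];
} Node;

typedef struct {
    long nodes;       /* all nodes, including the root */
    long leaves;      /* nodes with no children */
    long null_links;  /* unused entries in all next[] arrays */
} TrieStats;

/* Allocate a node with no value (-1) and all links null. */
Node *alloc_node(void)
{
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->value = -1;
    for (int i = 0; i < R; i++)
        n->next[i] = NULL;
    return n;
}

/* Iterative insertion: create missing nodes, then set the value. */
void insert(Node *root, const char *key, int val)
{
    Node *x = root;
    for (; *key != '\0'; key++) {
        unsigned char c = (unsigned char)*key;
        if (x->next[c] == NULL)
            x->next[c] = alloc_node();
        x = x->next[c];
    }
    x->value = val;
}

/* Walk the trie below x and accumulate the counts into s. */
void collect(const Node *x, TrieStats *s)
{
    int children = 0;
    s->nodes++;
    for (int c = 0; c < R; c++) {
        if (x->next[c] != NULL) {
            children++;
            collect(x->next[c], s);
        } else {
            s->null_links++;
        }
    }
    if (children == 0)
        s->leaves++;
}

/* Statistics for the seven-key example trie. */
TrieStats example_stats(void)
{
    static const char *keys[7] = {
        "by", "sea", "sells", "she", "shells", "shore", "the"
    };
    static const int vals[7] = { 4, 6, 1, 0, 3, 7, 5 };
    Node *root = alloc_node();
    for (int i = 0; i < 7; i++)
        insert(root, keys[i], vals[i]);
    TrieStats s = { 0, 0, 0 };
    collect(root, &s);
    return s;
}
```

For this example, `example_stats()` reports 20 nodes and 6 leaves. Of the 20 × 256 = 5120 links, only 19 are in use, so 5101 are null, which is the wasted space the bottom line refers to.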

String symbol table implementations cost summary

N = number of entries, L = key length, R = alphabet size, w = average key length

                                 character accesses (typical case)
implementation                   search hit   search miss   insert   space (references)
hashing (separate chaining)      N            N             1        N
R-way trie                       L            log_R N       L        RNw

R-way trie.

• Method of choice for small R.

• Too much memory for large R.

Challenge. Use less memory, e.g., 65,536-way trie for Unicode!
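To get a feel for the R-way trie row, the following C sketch evaluates the two formulas from the table with integer arithmetic. The function names `ceil_log` and `trie_refs` are made up for this illustration, and the input values used in the note after the code are hypothetical, not measurements.

```c
#include <assert.h>

/* Smallest d with r^d >= n, i.e. log_R N rounded up (for r >= 2, n >= 1). */
int ceil_log(unsigned long long r, unsigned long long n)
{
    assert(r >= 2);
    int d = 0;
    unsigned long long reach = 1;  /* r^d: keys distinguishable after d characters */
    while (reach < n) {
        reach *= r;
        d++;
    }
    return d;
}

/* Space estimate from the table: R * N * w references. */
unsigned long long trie_refs(unsigned long long r,
                             unsigned long long n,
                             unsigned long long w)
{
    return r * n * w;
}
```

Assuming N = 1,000,000 keys, `ceil_log(256, 1000000)` is 3 and `ceil_log(26, 1000000)` is 5, so a typical search miss inspects only a handful of characters. With an assumed average key length of w = 10, `trie_refs(256, 1000000, 10)` is 2,560,000,000 references, which shows why the table calls for less memory at large R.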