CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by...

34
Hash Tables CSE 373: Data Structures and Algorithms Thanks to Kasey Champion, Ben Jones, Adam Blank, Michael Lee, Evan McCarty, Robbie Weber, Whitaker Brand, Zora Fung, Stuart Reges, Justin Hsia, Ruth Anderson, and many others for sample slides and materials ... Autumn 2018 Shrirang (Shri) Mare [email protected]

Transcript of CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by...

Page 1: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

Hash Tables

CSE 373: Data Structures and Algorithms

Thanks to Kasey Champion, Ben Jones, Adam Blank, Michael Lee, Evan McCarty, Robbie Weber, Whitaker Brand, Zora Fung, Stuart Reges, Justin Hsia, Ruth Anderson, and many others for sample slides and materials ...

Autumn 2018

Shrirang (Shri) [email protected]

Page 2: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

- Wrap up AVL Trees- Problem: Can we make get(k) operation on dictionaries fast: O(1)- Motivation- Hashing- Separate Chaining

Today

CSE 373 AU 18 – SHRI MARE 2

Page 3: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

AVL Trees: Four cases to considerInsert location Case Solution

Left subtree of left child of y (A) Left line case Single right rotation

Right subtree of left child of y (B) Left kink case Double (left-right) rotation

Left subtree of right child of y (C) Right kink case Double (right-left) rotation

Right subtree of right child of y (D) Right line case Single left rotation

x

y

z

A B C D CSE 373 AU 18 – SHRI MARE

Page 4: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

AVL Trees: Four cases to considerInsert location Case (also called as) Solution

Left subtree of left child of y (A) Left line case (case 1) Single right rotation

Right subtree of left child of y (B) Left kink case (case 2) Double (left-right) rotation

Left subtree of right child of y (C) Right kink case (case 3) Double (right-left) rotation

Right subtree of right child of y (D) Right line case (case 4) Single left rotation

x

y

z

A B C D CSE 373 AU 18 – SHRI MARE

Page 5: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

AVL Tree: Practice. Insert(6)

CSE 373 AU 18 – SHRI MARE 5

15

5

10

8 11

23

71

75

2011

9

Page 6: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

AVL Tree: Practice

CSE 373 AU 18 – SHRI MARE 6

15

5

10

8 11

23

71

75

2011

9

6

Unbalanced

Page 7: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

AVL Trees: Four cases to considerInsert location Case (also called as) Solution

Left subtree of left child of y (A) Left line case (case 1) Single right rotation

Right subtree of left child of y (B) Left kink case (case 2) Double (left-right) rotation

Left subtree of right child of y (C) Right kink case (case 3) Double (right-left) rotation

Right subtree of right child of y (D) Right line case (case 4) Single left rotation

x

y

z

A B C D CSE 373 AU 18 – SHRI MARE

Page 8: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

AVL Tree: Practice

CSE 373 AU 18 – SHRI MARE 8

15

5

10

8 11

23

71

75

2011

9

6

Unbalanced

xyz

A B C D

Page 9: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

AVL Tree: Practice

CSE 373 AU 18 – SHRI MARE 9

15

5

10

8 11

23

71

75

2011

9

6

15

510

823

71

75

20

1196

Unbalanced

xyz

A B C D

Page 10: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

6

4

52

8

Worksheet Q1

CSE 373 AU 18 – SHRI MARE 10

insert(1)

Case: Line or Kink?

Page 11: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

Worksheet Q1

CSE 373 AU 18 – SHRI MARE 11

insert(1)6

4

52

8

1Case: Line or Kink?

Page 12: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

Worksheet Q1

CSE 373 AU 18 – SHRI MARE 12

insert(1)6

4

52

8

insert(3)6

4

52

8

1Case: Line or Kink?

Case: Line or Kink?

Page 13: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

Worksheet Q1

CSE 373 AU 18 – SHRI MARE 13

insert(1) insert(3)6

4

52

8

1

6

4

52

8

6

4

52

8

3Case: Line or Kink?

Case: Line or Kink?

Page 14: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

1. Do a BST insert – insert a node as you would in a BST.2. Check balance condition at each node in the path from the inserted node to the root.3. If balance condition is not true at a node, identify the case4. Do the corresponding rotation for the case

AVL Tree insertions

CSE 373 AU 18 – SHRI MARE 14

Page 15: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

Draw the AVL tree that results from inserting the keys 1, 3, 7, 5, 6, 9 in that order into an initially empty AVL tree. (Hint: Drawing intermediate trees as you insert each key can help.)

Worksheet Q2

CSE 373 AU 18 – SHRI MARE 15

Page 16: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

How long does AVL insert take?AVL insert time = BST insert time + time it takes to rebalance the tree

= O(log n) + time it takes to rebalance the tree

How long does rebalancing take?-Assume we store in each node the height of its subtree.-How long to find an unbalanced node:

- Just go back up the tree from where we inserted.-How many rotations might we have to do?

- Just a single or double rotation on the lowest unbalanced node.

AVL insert time = O(log n)+ O(log n) + O(1) = O(log n)

Page 17: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

How long does AVL insert take?AVL insert time = BST insert time + time it takes to rebalance the tree

= O(log n) + time it takes to rebalance the tree

How long does rebalancing take?-Assume we store in each node the height of its subtree.-How long to find an unbalanced node:

- Just go back up the tree from where we inserted. ß O(log n)-How many rotations might we have to do?

- Just a single or double rotation on the lowest unbalanced node. ß O(1)

AVL insert time = O(log n)+ O(log n) + O(1) = O(log n)

Page 18: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

Pros:- O(logn) worst case for find, insert, and delete operations.- Reliable running times than regular BSTs (because trees are balanced)

Cons:- Difficult to program & debug [but done once in a library!]- (Slightly) more space than BSTs to store node heights.

AVL wrap up

CSE 373 AU 18 – SHRI MARE 18

Page 19: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

Lots of cool Self-Balancing BSTs out there!

Popular self-balancing BSTs include:AVL treeSplay tree2-3 treeAA treeRed-black treeScapegoat treeTreap

(From https://en.wikipedia.org/wiki/Self-balancing_binary_search_tree#Implementations)

(Not covered in this class, but several are in the textbook and all of them are online!)

CSE 373 SU 17 – LILIAN DE GREEF

Page 20: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

- HW3 out. Due this Friday (10/26) at Noon (not at the usual time 11:59pm)- HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)

- Midterm coming up – Nov 2, 2:30-3:20pm, here in the class- If you can’t take the midterm on Nov 2, let me know ASAP.- Midterm practice material will be posted on the website tomorrow- Midterm review next Wednesday

Announcements

CSE 373 AU 18 – SHRI MARE 20

Page 21: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

Hash tables

21CSE 373 AU 18 – SHRI MARE

Page 22: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

- data = (key, value)- operations: put(key, value); get(key); remove(key)

- O(n) with Arrays and Linked List- O(log n) with BST and AVL trees.

- Can we do better? Can we do this in O(1) ?

Revisiting Dictionaries

CSE 373 AU 18 – SHRI MARE 22

Page 23: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

Why we are so obsessed with making dictionaries fast?

Dictionaries are extremely most common data structures.- Databases- Network router tables- Compilers and Interpreters- Faster than O(log n) search in certain cases- Data type in most high level programming languages

Motivation

CSE 373 AU 18 – SHRI MARE 23

Page 24: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

How would you implement at dictionary such that dictionary operations are O(1)? (Assume all keys are non-zero integers)

Question

CSE 373 AU 18 – SHRI MARE 24

Page 25: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

How would you implement at dictionary such that dictionary operations are O(1)? (Assume all keys are non-zero integers)

Idea: Create a giant array and use keys as indices

Question

CSE 373 AU 18 – SHRI MARE 25

Page 26: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

How would you implement at dictionary such that dictionary operations are O(1)? (Assume all keys are non-zero integers)

Idea: Create a giant array and use keys as indices

Problems?1. ?2. ?

Question

CSE 373 AU 18 – SHRI MARE 26

Page 27: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

How would you implement at dictionary such that dictionary operations are O(1)? (Assume all keys are non-zero integers)

Idea: Create a giant array and use keys as indices

Problems?1. Can only work with integer keys?2. Too much wasted space

Idea 2: Can we convert the key space into a smaller set that would take much less memory

Question

CSE 373 AU 18 – SHRI MARE 27

Page 28: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

Solve problem: Too much wasted space

CSE 373 AU 18 – SHRI MARE 28

Page 29: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

Review: Integer remainder with %The % operator computes the remainder from integer division.14 % 4 is 2

3 434 ) 14 5 ) 218

12 202 18

153

Applications of % operator:- Obtain last digit of a number: 230857 % 10 is 7- See whether a number is odd: 7 % 2 is 1, 42 % 2 is 0- Limit integers to specific range: 8 % 12 is 8, 18 % 12 is 6

CSE 142 SP 18 – BRETT WORTZMAN 29

218 % 5 is 3

Page 30: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

Implement Direct Access Mappublic V get(int key) {

// input validationreturn this.array[key].value;

}

public void put(int key, V value) {this.array[key] = value;

}

public void remove(int key) {// input validationthis.array[key] = null;

}

CSE 373 WI 18 – MICHAEL LEE 30

Page 31: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

Implement First Hash Functionpublic V get(int key) {

// input validationint newKey = key % this.array.length;return this.array[newkey].value;

}

public void put(int key, V value) {this.array[key % this.array.length] = value;

}public void remove(int key) {

// input validationint newKey = key % this.array.length;this.array[newKey] = null;

}

CSE 373 SP 18 - KASEY CHAMPION 31

Page 32: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

First Hash Function: % table sizeindices 0 1 2 3 4 5 6 7 8 9

elements

CSE 373 SP 18 - KASEY CHAMPION 32

put(0, “foo”);put(5, “bar”);put(11, “biz”)put(18, “bop”);put(20, “poo”); Collision!

“foo”

0 % 10 = 05 % 10 = 511 % 10 = 118 % 10 = 820 % 10 = 0

“bop”“bar”“biz”

“poo”

Page 33: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

√ Wrap up AVL Trees√ Problem: Can we make get(k) operation on dictionaries fast: O(1)√ Motivation√ Hashing- Separate Chaining

Today

CSE 373 AU 18 – SHRI MARE 33

Page 34: CSE 373: Data Structures and Algorithms · 2018. 10. 25. · -HW4 out later today (or latest by tomorrow morning). Due next Tuesday (10/30)-Midterm coming up –Nov 2, 2:30-3:20pm,

Hashing: Separate Chaining

CSE 373 AU 18 – SHRI MARE 34