Design & Analysis of Algorithms COMP 482 / ELEC 420 John Greiner

24
Design & Analysis of Algorithms COMP 482 / ELEC 420 John Greiner

description

Design & Analysis of Algorithms COMP 482 / ELEC 420 John Greiner. Python Dictionaries. Same idea as Java/C++ hash map, C# dictionary, Perl hash, … H ow is this “magic” implemented?. d = { “snow” : 7 , “apple” : 3 , “white” : 20 } d[ “white” ] = 30 d[ “prince” ] = 16 - PowerPoint PPT Presentation

Transcript of Design & Analysis of Algorithms COMP 482 / ELEC 420 John Greiner

Page 1: Design & Analysis of Algorithms  COMP 482 / ELEC 420 John Greiner

Design & Analysis of Algorithms COMP 482 / ELEC 420

John Greiner

Page 2: Design & Analysis of Algorithms  COMP 482 / ELEC 420 John Greiner

2

Python Dictionaries

Same idea as Java/C++ hash map, C# dictionary, Perl hash, …

How is this “magic” implemented?

d = {“snow” : 7, “apple” : 3, “white” : 20}d[“white”] = 30d[“prince”] = 16d[“apple”]

Page 3: Design & Analysis of Algorithms  COMP 482 / ELEC 420 John Greiner

3

Combination of Ideas

• Hash table & hash function

• Dynamic table & amortization

To do:[CLRS] 11,17#4

Page 4: Design & Analysis of Algorithms  COMP 482 / ELEC 420 John Greiner

Hash Tables & Hash Functions

4

“snow” hash 3

“apple” hash 7

“prince” hash 3

“white” hash 2

d = {“snow” : 7, “apple” : 3, “white” : 20}d[“white”] = 29d[“prince”] = 16d[“apple”]

0

9

“white”:29

“apple”:3

“snow”:7, “prince”:16

1

3

2

4

5

6

7

8

Chains:

Page 5: Design & Analysis of Algorithms  COMP 482 / ELEC 420 John Greiner

5

Access Time Depends on Chain Length

? What property do we want of our hash function? ?

? How long is each chain? ?

? How much time per access? ?

Page 6: Design & Analysis of Algorithms  COMP 482 / ELEC 420 John Greiner

6

Creating a Good Hash Function is Difficult

Generally, just use those in libraries.

key.__hash__() % table_size

class string: def __hash__(self): if not self: return 0 # empty value = ord(self[0]) << 7 for char in self: value = c_mul(1000003, value) ^ ord(char) value = value ^ len(self) if value == -1: # reserved error code value = -2 return value

Page 7: Design & Analysis of Algorithms  COMP 482 / ELEC 420 John Greiner

7

Dynamic Table Motivation

Typically, don’t know how much data we’ll have.– Want underlying hash table to grow, so average chain size

is bounded.– Want to retain constant-time indexing.

Focus on the latter goal first.– We’ll have to do a little extra to combine hash tables &

dynamic tables.

Page 8: Design & Analysis of Algorithms  COMP 482 / ELEC 420 John Greiner

8

Adding Data when Dynamic Table is Full

Must be contiguous for constant-time indexing.

datadatadatadatadatadata

datadatadatadatadatadata

Page 9: Design & Analysis of Algorithms  COMP 482 / ELEC 420 John Greiner

9

Adding Data when Dynamic Table is Full

What’s wrong with just using space at end of array?

datadatadatadatadatadata

datadatadatadatadatadata

other dataother data

That memory might already be in use.

Page 10: Design & Analysis of Algorithms  COMP 482 / ELEC 420 John Greiner

10

Adding Data when Dynamic Table is Full

So, grab needed space elsewhere & copy everything.

datadatadatadatadatadata

datadatadatadatadatadata

copy

Double the space.

Page 11: Design & Analysis of Algorithms  COMP 482 / ELEC 420 John Greiner

11

Cost of a Series of Operations

Initially: Table size = 5, table empty

Page 12: Design & Analysis of Algorithms  COMP 482 / ELEC 420 John Greiner

12

Cost of a Series of Operations

Add 5 data items, cost = 512345

Page 13: Design & Analysis of Algorithms  COMP 482 / ELEC 420 John Greiner

13

Cost of a Series of Operations

Add 5 more data items, cost = 1012345

copy

123456789

10

Page 14: Design & Analysis of Algorithms  COMP 482 / ELEC 420 John Greiner

14

Cost of a Series of Operations

Add 5 more data items, cost = 15

copy

123456789

10

123456789101112131415

Page 15: Design & Analysis of Algorithms  COMP 482 / ELEC 420 John Greiner

15

Cost of a Series of Operations

Add 5 more data items, cost = 5123456789

1011121314151617181920

Total costs:

Copying = 15Adding = 20

Total = 35

Page 16: Design & Analysis of Algorithms  COMP 482 / ELEC 420 John Greiner

16

Cost of Copying is Proportional to Adding

Each copy costs2 × #adds since previous copy.

copy

123456789

10

123456789

10

Use expensive ops sufficiently infrequently.

Buy Add 1Get Copy 2 free!

Page 17: Design & Analysis of Algorithms  COMP 482 / ELEC 420 John Greiner

17

Amortized Cost

Dynamic tables: O(1) amortized time to add data

Cost of series of n operations = m

Each operation has amortized cost = m/n

Page 18: Design & Analysis of Algorithms  COMP 482 / ELEC 420 John Greiner

18

Dynamic Hash Tablesd = {“snow” : 7, “apple” : 3, “white” : 20}d[“white”] = 29d[“prince”] = 16d[“apple”]

“snow” hash 3

“apple” hash 7

“prince” hash 3

“white” hash 2

“white”:29

“apple”:3

“snow”:7, “prince”:16

Must rehash all data.

Hash table “full enough”.

Load factor = #items/size

“snow” hash 3

“apple” hash 2

“prince” hash 1

“white” hash 2“apple”:3,“white”:29“snow”:7

“prince”:16

Page 19: Design & Analysis of Algorithms  COMP 482 / ELEC 420 John Greiner

19

Hash Table & Dynamic Table Odds & Ends

Hash table size: prime or power-of-2?Chaining vs. open addressingDynamic table expansion factorDynamic table contractionPython’s dictionary/hash table implementation

Hash tables: Dynamic tables:

Pythondict. & set

Python list…

Memory heaps for GC

When want static memory size.

Page 20: Design & Analysis of Algorithms  COMP 482 / ELEC 420 John Greiner

20

Multipop

3 ops:

Push(S,x) Pop(S) Multi-pop(S,k)

Worst-case cost: O(1) O(1)

O(min(|S|,k)= O(n)

Amortized cost: ?

Page 21: Design & Analysis of Algorithms  COMP 482 / ELEC 420 John Greiner

21

Incrementing Binary CounterCounter Bits changed

in increment0 11 2

10 111 3

100 1101 2110 1111 4

1000 11001 21010 11011 31100 11101 21110 11111 5

Bit position

# times bit changes

0 161 82 43 2

≤ ∑𝑖=0

⌈ lg𝑛 ⌉

⌈ 𝑛2 𝑖⌉=𝑂 (? )

Another way to sum the costs?

Page 22: Design & Analysis of Algorithms  COMP 482 / ELEC 420 John Greiner

22

Amortized Analysis Approaches

Aggregate:1. For all sequences of m ops, find maximum sum of actual

costs.Potentially difficult to know actual costs or bound well.

2. Result is sum/m.

Sufficient for many commonly-used examples.

Page 23: Design & Analysis of Algorithms  COMP 482 / ELEC 420 John Greiner

23

Amortized Analysis Approaches

Accounting:1. Compute actual costs ci of each kind of op.2. Define accounting costs ĉi of each kind of op, such that

can assign credits ĉi-ci consistently to data elts.Can “overpay” on some ops, and use credits to “underpay” on other ops later.Poor definitions lead to loose bounds.

3. O(max acct. cost.)

Page 24: Design & Analysis of Algorithms  COMP 482 / ELEC 420 John Greiner

24

Amortized Analysis ApproachesPotential:1. Define potential function Φ(D), such that Φ(Di)≥ Φ(D0).

Essentially assigns “credits” to data structure, rather than operations.Poor definition leads to loose bounds.

2. Calculate accounting costs ĉi=ci+ΔΦ of each kind of op.3. O(max acct. cost.)

Complicated approach necessary for more complicated data structures.