Lecture 17 - University of California, San...

Post on 29-Jun-2020

Page 1 of 31

CSE 100, UCSD: LEC 17

Lecture 17

✔ Separate chaining

✔ Dictionary data types

✔ Hashtables vs. balanced search trees

✔ A hashtable implementation: java.util.Hashtable

✔ Object serialization in Java

Reading: Weiss, Ch 5; and JDK source code

Page 2 of 31

Final exam

✔ Final exam time: Tue Mar 17 11:30am-2:00pm

✔ Location: CSB 002

✔ Closed book, closed notes, no calculators...

✔ Bring something to write with, and picture ID

✔ Practice final is on line

✔ Final exam discussion topic on Webboard

✔ Exam review sessions will be:

✗ last lecture

✔ Coverage: All lectures, all assignments, corresponding sections of the textbook, and handouts

Page 3 of 31

Open addressing vs. separate chaining

✔ Linear probing, double and random hashing are appropriate if the keys are kept as entries in the hashtable itself...

✗ doing that is called "open addressing"

✗ it is also called "closed hashing"

✔ Another idea: Entries in the hashtable are just pointers to the head of a linked list (“chain”); elements of the linked list contain the keys...

✗ this is called "separate chaining"

✗ it is also called "open hashing"

✔ Collision resolution becomes easy with separate chaining: no need to probe other table locations; just insert a key in its linked list if it is not already there.

✔ (It is possible to use fancier data structures than linked lists for this; but linked lists work very well in the average case, as we will see)

Page 4 of 31

Separate chaining: basic algorithms

✔ When inserting a key K in a table with hash function H(K)

1. Set indx = H(K)

2. Insert key in linked list headed at indx. (Search the list first to avoid duplicates.)

✔ When searching for a key K in a table with hash function H(K)

1. Set indx = H(K)

2. Search for key in linked list headed at indx, using linear search.

✔ When deleting a key K in a table with hash function H(K)

1. Set indx = H(K)

2. Delete key in linked list headed at indx

✔ Advantages: average case performance stays good as number of entries approaches and even exceeds M; delete is easier to implement than with open addressing

✔ Disadvantages: requires dynamic data, requires storage for pointers in addition to data, can have poor locality which causes poor caching performance
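✔ The insert, search, and delete steps for separate chaining can be sketched in Java as follows. This is only a minimal illustration for int keys: the class name ChainedIntSet, its method names, and the choice H(K) = K mod M are assumptions made for the example, not code from the lecture.

```java
import java.util.LinkedList;

// Minimal separate-chaining set of int keys (illustrative names).
class ChainedIntSet {
    private final LinkedList<Integer>[] table;   // one chain per table slot

    @SuppressWarnings("unchecked")
    ChainedIntSet(int m) {
        table = new LinkedList[m];
        for (int i = 0; i < m; i++) {
            table[i] = new LinkedList<>();
        }
    }

    // Assumed hash function: H(K) = K mod M, kept non-negative.
    private int hash(int key) {
        return Math.floorMod(key, table.length);
    }

    // Step 1: indx = H(K). Step 2: search the chain first, then insert.
    // Returns false if the key was already present.
    boolean insert(int key) {
        LinkedList<Integer> chain = table[hash(key)];
        if (chain.contains(key)) {
            return false;
        }
        chain.addFirst(key);
        return true;
    }

    // Linear search of the chain headed at H(K).
    boolean find(int key) {
        return table[hash(key)].contains(key);
    }

    // Removes the key from its chain; returns false if it was absent.
    boolean delete(int key) {
        return table[hash(key)].remove(Integer.valueOf(key));
    }
}
```

✗ Note that no other table location is ever probed: every operation touches exactly one chain.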

Page 5 of 31

Separate chaining, an example

M = 7, H(K) = K mod M

insert these keys 701, 145, 217, 19, 13, 749 in this table, using separate chaining:

index: 0 1 2 3 4 5 6
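✔ One way to check the exercise is to compute the chains mechanically. The sketch below assumes new keys are appended at the tail of each chain (inserting at the head would reverse the order within a chain); the class and method names are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ChainingExample {
    // Distributes non-negative keys over m chains using H(K) = K mod m,
    // appending each key at the end of its chain.
    static List<List<Integer>> buckets(int[] keys, int m) {
        List<List<Integer>> table = new ArrayList<>();
        for (int i = 0; i < m; i++) {
            table.add(new ArrayList<>());
        }
        for (int k : keys) {
            table.get(k % m).add(k);
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {701, 145, 217, 19, 13, 749};
        List<List<Integer>> table = buckets(keys, 7);
        for (int i = 0; i < table.size(); i++) {
            System.out.println(i + ": " + table.get(i));
        }
    }
}
```

✗ With these keys, 217 and 749 collide at index 0, 145 and 19 collide at index 5, 701 lands at index 1, and 13 at index 6; indexes 2, 3, and 4 stay empty.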

Page 6 of 31

Analysis of separate-chaining hashing

✔ Keep in mind the load factor measure of how full the table is:

α = N/M

where M is the size of the table, and N is the number of keys that have been inserted in the table

✔ With separate chaining, it is possible to have α > 1

✔ Given a load factor α, we would like to know the time costs, in the best, average, and worst case of

✗ new-key insert and unsuccessful find (these are the same)

✗ successful find

✔ The best case is O(1) and worst case is O(N) for all of these... let’s analyze the average case

Page 7 of 31

Average case costs with separate chaining

✔ Assume a table with load factor α = N/M

✔ There are N items total distributed over M linked lists (some of which may be empty), so the average number of items per linked list is: N/M = α

✔ In any unsuccessful find/insert, the hash table entry for the key is accessed; then the linked list headed there is exhaustively searched

✔ Therefore, assuming all table entries are equally likely to be hit by the hash function, the average number of steps for insert or unsuccessful find with separate chaining is

$U_\alpha = 1 + \alpha$

✔ In successful find, the hash table entry for the key is accessed; then the linked list headed there is linearly searched. Therefore, (with the same probabilistic assumption) the average number of steps for successful find with separate chaining is

$S_\alpha = 1 + \frac{\alpha}{2}$

✔ These are less than 2 and 1.5 respectively, when α < 1

✔ And these remain O(1), independent of M, even when α exceeds 1.
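✔ To get a feel for the numbers, the two average-case formulas can be evaluated directly. The helper names below are invented for this sketch; the formulas are the ones stated on this slide.

```java
class ChainingCosts {
    // Load factor: alpha = N / M.
    static double loadFactor(int n, int m) {
        return (double) n / m;
    }

    // Average steps for insert or unsuccessful find: 1 + alpha.
    static double unsuccessful(double alpha) {
        return 1 + alpha;
    }

    // Average steps for successful find: 1 + alpha / 2.
    static double successful(double alpha) {
        return 1 + alpha / 2;
    }

    public static void main(String[] args) {
        double alpha = loadFactor(3, 4);   // e.g. 3 keys in a table of size 4
        System.out.println("alpha = " + alpha);
        System.out.println("unsuccessful find/insert: " + unsuccessful(alpha));
        System.out.println("successful find: " + successful(alpha));
    }
}
```

✗ For N = 3 and M = 4, α = 0.75, so the averages are 1.75 and 1.375 steps; for α = 2 they are 3 and 2 steps, still constants that do not depend on the table size.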

Page 8 of 31

Dictionary data types

✔ A data structure is intended to hold data

✗ An insert operation inserts a data item into the structure; a find operation says whether a data item is in the structure; delete removes a data item; etc.

✔ A Dictionary is a specialized kind of data structure:

✗ A Dictionary structure is intended to hold pairs: each pair consists of a key, together with some related data

✗ An insert operation inserts a key-data pair in the table; a find operation takes a key and returns the data in the key-data pair with that key; delete takes a key and removes the key-data pair with that key; etc.

✔ Dictionaries are sometimes called “Table” or “Map” abstract data types, or “associative memories”

Page 9 of 31

Dictionary as ADT

✔ Domain:

✗ a collection of pairs; each pair consists of a key, and some additional data

✔ Operations (typical):

✗ Create a table (initially empty)

✗ Insert a new key-data pair in the table; if a key-data pair with the same key is already there, update the data part of the pair

✗ Find the key-data pair in the table corresponding to a given key; return the data

✗ Delete the key-data pair corresponding to a given key

✗ Enumerate (traverse) all key-data pairs in the table
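✔ As an illustration, each of these operations has a direct counterpart in the java.util.Map interface. The sketch uses java.util.HashMap; the class name DictionaryOps and the sample keys are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

class DictionaryOps {
    static Map<String, Integer> demo() {
        Map<String, Integer> table = new HashMap<>();   // create (initially empty)

        table.put("apple", 3);                          // insert a new pair
        table.put("pear", 5);
        table.put("apple", 4);                          // same key: update the data

        Integer found = table.get("pear");              // find: returns the data
        System.out.println("pear -> " + found);

        table.remove("pear");                           // delete by key

        // enumerate all remaining key-data pairs
        for (Map.Entry<String, Integer> e : table.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
        return table;
    }

    public static void main(String[] args) {
        demo();
    }
}
```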

Page 10 of 31

Implementing the Dictionary ADT

✔ A Dictionary can be implemented in various ways:

✗ using a list, binary search tree, hashtable, etc., etc.

✔ In each case:

✗ the implementing data structure has to be able to hold key-data pairs

✗ the implementing data structure has to be able to do insert, find, and delete operations paying attention to the key

✔ This could be done in a generic data structure, where the user can specify the comparison function to be used by the insert, find, and delete functions
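✔ One existing example of a generic structure with a user-specified comparison function is java.util.TreeMap, whose constructor accepts a Comparator. The sketch below (class name and keys are illustrative) passes a case-insensitive ordering, so keys that differ only in case are treated as the same key.

```java
import java.util.Comparator;
import java.util.TreeMap;

class ComparatorDictionary {
    static TreeMap<String, Integer> caseInsensitive() {
        // The comparison function supplied by the user of the structure.
        Comparator<String> cmp = String.CASE_INSENSITIVE_ORDER;
        TreeMap<String, Integer> table = new TreeMap<>(cmp);

        table.put("Hash", 1);
        table.put("HASH", 2);   // equal to "Hash" under cmp: updates the data
        return table;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> table = caseInsensitive();
        System.out.println(table.size());        // one pair
        System.out.println(table.get("hash"));   // found via the comparator
    }
}
```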

Page 11 of 31

The Dictionary ADT and search engine indexes

✔ The Dictionary ADT is useful in any situation where you want to store, retrieve, and manipulate data based on associated keys

✔ One important application is a document search engine index

✔ An index associates words (keys) with information (data) such as what documents a word occurs in, how many times it occurs, what its position is within the document, etc.

✗ When a word is read for the first time, an "insert" operation is done in the index to associate that word with the document in which it occurs (and possibly other information)

✗ When a word is encountered again, an "insert" or "update" operation is done to add or modify associations with that word (additional document in which it occurs, increment the number of times it occurs, etc.)

✗ If a document is no longer available, words contained in it have their associations changed, and the "delete" operation may be necessary

✗ By doing a "find" operation in the index using a word as key, a user can find the documents that contain that word
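✔ A toy version of such an index can be sketched with a Map from words to sets of document ids. Everything here (the class TinyIndex, whitespace tokenization, lower-casing, string document ids) is an assumption made for the illustration; a real search engine index stores much more per word.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TinyIndex {
    // word (key) -> ids of the documents containing it (data)
    private final Map<String, Set<String>> index = new HashMap<>();

    // "insert" the first time a word is seen, "update" afterwards.
    void addDocument(String docId, String text) {
        for (String word : text.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            index.computeIfAbsent(word, w -> new TreeSet<>()).add(docId);
        }
    }

    // Change the associations of the document's words, and "delete"
    // any word that no longer occurs in any document.
    void removeDocument(String docId) {
        index.values().forEach(docs -> docs.remove(docId));
        index.values().removeIf(Set::isEmpty);
    }

    // "find": the documents that contain the word (empty set if none).
    Set<String> find(String word) {
        return index.getOrDefault(word.toLowerCase(), new TreeSet<>());
    }
}
```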

Page 12 of 31

Hashtables vs. balanced search trees

✔ Hashtables and balanced search trees can both be used in applications that need fast insert and find

✔ What are advantages and disadvantages of each?

✗ Balanced search trees guarantee worst-case performance O(log N), which is quite good

✗ A well-designed hash table has typical performance O(1), which is excellent; but worst-case is O(N), which is bad

✗ Search trees require that keys be well-ordered: For any keys K1, K2, either K1 < K2, K1 == K2, or K1 > K2

✗ Hashtables only require that keys be testable for equality, and that you can compute a hash function for them

Page 13 of 31

Hashtables vs. balanced search trees, cont’d

✗ A search tree can easily be used to return keys close in value to a given key, or to return the smallest key in the tree, or to output the keys in sorted order

✗ A hashtable does not normally deal with ordering information efficiently

✗ In a balanced search tree, delete is as efficient as insert

✗ In a hashtable that uses open addressing, delete can be inefficient, and somewhat tricky to implement (easy with separate chaining though)

✗ Overall, balanced search trees are rather difficult to implement correctly

✗ Hash tables are relatively easy to implement
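✔ The ordering queries mentioned above are available in java.util.TreeMap (a red-black tree), while java.util.HashMap offers no equivalent. A small sketch, with an illustrative class name and the keys from the earlier chaining example:

```java
import java.util.TreeMap;

class OrderedQueries {
    static TreeMap<Integer, String> sample() {
        TreeMap<Integer, String> tree = new TreeMap<>();
        for (int k : new int[] {701, 145, 217, 19, 13, 749}) {
            tree.put(k, "data" + k);
        }
        return tree;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> tree = sample();
        System.out.println(tree.firstKey());       // smallest key
        System.out.println(tree.floorKey(200));    // closest key <= 200
        System.out.println(tree.ceilingKey(200));  // closest key >= 200
        System.out.println(tree.keySet());         // keys in sorted order
    }
}
```

✗ With a hashtable, each of these queries would require examining every key.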

Page 14 of 31

A look at Java’s Hashtable

✔ The java.util.Hashtable class has existed in the Java standard library since JDK1.0

✔ In JDK 1.2, Hashtable was incorporated into the “Collections Framework”, and declared to implement Map

✔ java.util.Hashtable is similar to java.util.HashMap

✗ They both implement Map, so they have the same public interface, but the implementation is slightly different

✗ One difference is Hashtable has synchronized methods (this makes them slightly slower; if you don’t need synchronization for multithreaded programming, use HashMap)

✔ In JDK 1.5, Hashtable and HashMap were made generic, with type parameters for keys and values

Page 15 of 31

Hashtable.java

    package java.util;
    import java.io.*;

    /**
     * This class implements a hashtable, which maps keys to values.
     * Any non-null object can be used as a key or as a value.
     * <p>
     * To successfully store and retrieve objects from a hashtable, the
     * objects used as keys must implement the <code>hashCode</code>
     * method and the <code>equals</code> method.
     */
    public class Hashtable<K,V>
        extends Dictionary<K,V>
        implements Map<K,V>, Cloneable, java.io.Serializable {

Page 16 of 31

Dictionary abstract class

✔ Dictionary is an abstract class that specifies some abstract methods. It acts like an interface specification, and probably should have been an interface instead of a class. Very similar to the interface java.util.Map. Methods shown here, without comments:

    public abstract class Dictionary<K,V> {
        abstract public int size();
        abstract public boolean isEmpty();
        abstract public Enumeration<K> keys();
        abstract public Enumeration<V> elements();
        abstract public V get(Object key);
        abstract public V put(K key, V value);
        abstract public V remove(Object key);
    }

Page 17 of 31

Instance variables

✔ Here are the instance variables declared in the Hashtable class:

    /**
     * The hash table data.
     */
    private transient Entry table[];

    /**
     * The total number of entries in the hash table.
     */
    private transient int count;

    /**
     * Rehashes the table when count exceeds this threshold.
     */
    private int threshold;

✔ What is the type of elements of the array implementing the hashtable?

Page 18 of 31

Entry

✔ The Hashtable.java file also defines this inner class:

    private static class Entry<K,V> implements Map.Entry<K,V> {
        int hash;
        K key;
        V value;
        Entry<K,V> next;
    }

✔ Entries in a Hashtable object’s table[] array are pointers to objects of this class.

✔ From these declarations so far, can you tell what collision resolution strategy is used?

Page 19 of 31

Hashtable methods

✔ We will look at these instance methods in the Hashtable class:

✗ constructors

✗ get()

✗ put()

✗ keySet()

Page 20 of 31

Hashtable constructors

    /**
     * Constructs a new, empty hashtable with the specified initial
     * capacity and the specified load factor.
     *
     * @param initialCapacity the initial capacity of the table
     * @param loadFactor a number between 0.0 and 1.0.
     * @exception IllegalArgumentException if the initial capacity is
     *            less than zero, or if the load factor
     *            is less than or equal to zero.
     * @since JDK1.0
     */
    public Hashtable(int initialCapacity, float loadFactor) {
        if ((initialCapacity < 0) || (loadFactor <= 0.0)) {
            throw new IllegalArgumentException();
        }
        this.loadFactor = loadFactor;
        table = new Entry[initialCapacity];
        threshold = (int) (initialCapacity * loadFactor);
    }

Page 21 of 31

Hashtable default constructor

    /**
     * Constructs a new, empty hashtable with a default capacity and
     * load factor.
     *
     * @since JDK1.0
     */
    public Hashtable() {
        this(11, 0.75f);
    }

✔ How do the default values for size and load factor compare to the hash table design principles we talked about?...

Page 22 of 31

get()

    /**
     * Returns the value to which the specified key is mapped in this
     * hashtable.
     *
     * @param key a key in the hashtable.
     * @return the value to which the key is mapped in this hashtable;
     *         null if the key is not mapped to any value in
     *         this hashtable.
     */
    public synchronized V get(Object key) {
        int hash = key.hashCode();
        int index = (hash & 0x7FFFFFFF) % table.length;
        for (Entry<K,V> e = table[index] ; e != null ; e = e.next) {
            if ( e.hash == hash && e.key.equals(key) ) {
                return e.value;
            }
        }
        return null;
    }
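✔ A note on the index computation in get(): hashCode() may return a negative int, and in Java the % operator then yields a negative remainder, which is not a valid array index. Masking with 0x7FFFFFFF clears the sign bit first. The helper below isolates that expression; the class and method names are made up for this sketch.

```java
class BucketIndex {
    // Same expression as in get(): clear the sign bit, then reduce
    // modulo the table length to obtain a valid array index.
    static int indexFor(int hash, int tableLength) {
        return (hash & 0x7FFFFFFF) % tableLength;
    }

    public static void main(String[] args) {
        int hash = -7;
        System.out.println(hash % 11);            // negative: unusable as an index
        System.out.println(indexFor(hash, 11));   // always in the range 0..10
    }
}
```

✗ The mask also handles Integer.MIN_VALUE, for which Math.abs() would still return a negative number.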

Page 23 of 31

put()

✔ Here are the javadoc comments:

    /**
     * Maps the specified <code>key</code> to the specified
     * <code>value</code> in this hashtable. Neither the key nor the
     * value can be <code>null</code>.
     * <p>
     * The value can be retrieved by calling the <code>get</code>
     * method with a key that is equal to the original key.
     *
     * @param key the hashtable key.
     * @param value the value.
     * @return the previous value of the specified key in this
     *         hashtable, or <code>null</code> if it did not have one.
     * @exception NullPointerException if the key or value is
     *            <code>null</code>.
     * @since JDK1.0
     */

✔ ... and the code follows.

Page 24 of 31

    public synchronized V put(K key, V value) {
        // Make sure the value is not null
        if (value == null) {
            throw new NullPointerException();
        }

        // If the key is already in the hashtable, update its value
        int hash = key.hashCode();
        int index = (hash & 0x7FFFFFFF) % table.length;
        for (Entry<K,V> e = table[index] ; e != null ; e = e.next) {
            if ( e.hash == hash && e.key.equals(key) ) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }

        if (count >= threshold) {
            // Rehash the table if the threshold is exceeded
            rehash(); // this enlarges the capacity of the table
            index = (hash & 0x7FFFFFFF) % table.length;
        }

Page 25 of 31

        // Create and add the new entry.
        Entry<K,V> e = new Entry<K,V>();
        e.hash = hash;
        e.key = key;
        e.value = value;
        e.next = table[index];
        table[index] = e;
        count++;
        return null;
    }
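✔ The observable behavior of put() described in the javadoc can be checked against the real java.util.Hashtable. This is a small usage sketch; the class name PutDemo and the sample strings are arbitrary.

```java
import java.util.Hashtable;

class PutDemo {
    // Returns {first put result, second put result, final value, "yes"/"no"}.
    static String[] demo() {
        Hashtable<String, String> t = new Hashtable<>();

        String first = t.put("k", "v1");    // no previous mapping: returns null
        String second = t.put("k", "v2");   // key present: returns the old value

        String rejected;
        try {
            t.put("k", null);               // null values are not allowed
            rejected = "no";
        } catch (NullPointerException e) {
            rejected = "yes";
        }
        return new String[] {first, second, t.get("k"), rejected};
    }

    public static void main(String[] args) {
        for (String s : demo()) {
            System.out.println(s);
        }
    }
}
```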

Page 26 of 31

Rehashing

    /** Increases the capacity of and internally reorganizes this
     * hashtable, in order to accommodate and access its entries more
     * efficiently.
     */
    protected void rehash() {
        int oldCapacity = table.length;
        Entry oldMap[] = table;

        int newCapacity = oldCapacity * 2 + 1;
        Entry newMap[] = new Entry[newCapacity];

        threshold = (int)(newCapacity * loadFactor);
        table = newMap;

        for (int i = oldCapacity ; i-- > 0 ;) {
            for (Entry<K,V> old = oldMap[i] ; old != null ; ) {
                Entry<K,V> e = old;
                old = old.next;

                int index = (e.hash & 0x7FFFFFFF) % newCapacity;
                e.next = newMap[index];
                newMap[index] = e;
            }
        }
    }
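✔ To see how the table grows under this policy, the capacity and threshold updates from rehash() can be replayed on their own. The sketch below copies only the two arithmetic rules (newCapacity = oldCapacity * 2 + 1 and threshold = (int)(capacity * loadFactor)); the class and method names are invented.

```java
class GrowthSequence {
    // Returns {capacity, threshold} for the initial table and after
    // each of the given number of rehash steps.
    static int[][] grow(int initialCapacity, float loadFactor, int rehashes) {
        int[][] out = new int[rehashes + 1][2];
        int capacity = initialCapacity;
        for (int i = 0; i <= rehashes; i++) {
            out[i][0] = capacity;
            out[i][1] = (int) (capacity * loadFactor);
            capacity = capacity * 2 + 1;   // same rule as in rehash()
        }
        return out;
    }

    public static void main(String[] args) {
        for (int[] row : grow(11, 0.75f, 3)) {
            System.out.println("capacity " + row[0] + ", threshold " + row[1]);
        }
    }
}
```

✗ Starting from the defaults (11, 0.75), the capacities are 11, 23, 47, 95 with thresholds 8, 17, 35, 71. The capacity stays odd, but it is not guaranteed to be prime (95 = 5 × 19).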

Page 27 of 31

keySet()

✔ For any key value, you can find out if that key is in the table or not: just use get()

✔ But how can you get a listing of all the keys in the table? There are many possible keys, and only a few of them will be in the table; it’s not feasible to check them all with get()

✔ The keySet() method returns a Set object that contains only the keys in the table:

    /* Returns a Set view of the keys contained in this Hashtable.
     * The Set supports element removal (which removes the
     * corresponding entry from the Hashtable), but not element
     * addition.
     * @return a Set view of the keys contained in this Map.
     * @since 1.2
     */
    public Set<K> keySet() {
        //...
    }

✔ An Iterator for the Set can then be used to iterate efficiently over the keys in the table
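✔ A short usage sketch of keySet() with java.util.Hashtable, showing both iteration and the "removal through the view" behavior described in the comment. The class name and sample keys are arbitrary.

```java
import java.util.Hashtable;
import java.util.Iterator;
import java.util.Set;

class KeySetDemo {
    static Hashtable<String, Integer> demo() {
        Hashtable<String, Integer> t = new Hashtable<>();
        t.put("a", 1);
        t.put("bb", 2);
        t.put("ccc", 3);

        Set<String> keys = t.keySet();   // a view of the table, not a copy
        Iterator<String> it = keys.iterator();
        while (it.hasNext()) {
            String key = it.next();
            if (key.length() > 1) {
                it.remove();             // also removes the entry from t
            }
        }
        return t;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

✗ The iteration order of the keys is not specified, so code should not depend on it.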

Page 28 of 31

Serializable objects

✔ Since JDK1.1, Java has had the ability to “serialize” objects

✔ Serialization is the process of converting an existing object to a sequence of bytes, in order to be sent over a stream (e.g. saved to a file, or transmitted over a network connection, etc.)

✗ serializing an object is also sometimes called ‘persisting’ or ‘pickling’ the object

✔ This is done in such a way that the object can be deserialized, i.e. reconstituted, later (e.g. by reading from the file, or when the serialized object is received at the other end of the network connection, etc.)

✔ In order for an object to be serialized, its class must be declared to implement the java.io.Serializable interface

✔ This interface does not specify any methods: a class that declares itself to implement it is just indicating that instances of it can be serialized

✔ Many Java library classes are serializable; user-defined classes can also be serializable

Page 29 of 31

Serializing a serializable class

✔ If a class is Serializable, objects that are instances of that class or a subclass can be serialized

✔ To serialize an object, pass it to the writeObject() method of an appropriately created java.io.ObjectOutputStream object

✔ The object can be deserialized by creating a corresponding java.io.ObjectInputStream object and calling its readObject() method (you will want to downcast the returned Object reference to be of the appropriate type)
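✔ The two steps can be sketched as a round trip through an in-memory byte array (a file or socket stream would be used the same way). The class RoundTrip and its method names are invented for the example; the stream classes and writeObject()/readObject() are the standard java.io ones.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Hashtable;

class RoundTrip {
    // Serialize: pass the object to writeObject() of an ObjectOutputStream.
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Deserialize: call readObject() on a corresponding ObjectInputStream.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Copies a Hashtable by serializing and deserializing it.
    @SuppressWarnings("unchecked")
    static Hashtable<String, Integer> copy(Hashtable<String, Integer> t) {
        try {
            // readObject() returns Object, so a downcast is needed.
            return (Hashtable<String, Integer>) deserialize(serialize(t));
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```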

Page 30 of 31

Designing a serializable class

✔ If all the instance variables of a user-defined class are of primitive types or Serializable class types, then the class can be declared to implement the Serializable interface and instances of the class can be serialized

✔ If an instance variable is not of a Serializable class type, or you do not want it to be part of the serialized representation, the instance variable must be marked transient

✔ transient instance variables are not written out; when the object is deserialized they get their default values (null for class types, “zero” for primitive types)

✗ to change this you can write your own serialization and deserialization methods, which can call the default methods; see online documentation for how to do this

✔ Classes themselves are not serialized, only objects! So, to get everything to work, the same class definition must be available in both serialization and deserialization contexts

✗ As a corollary, static variables are never serialized: they are created and initialized when the class is loaded into the Java virtual machine, not when an instance of the class is deserialized
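✔ The effect of transient can be seen with a small user-defined class. The Account class, its fields, and the roundTrip() helper are hypothetical, chosen only to show that transient fields come back with their default values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Account implements Serializable {
    private static final long serialVersionUID = 1L;

    String owner;                // serialized normally
    transient String password;   // left out of the serialized form
    transient int loginCount;    // left out of the serialized form

    Account(String owner, String password, int loginCount) {
        this.owner = owner;
        this.password = password;
        this.loginCount = loginCount;
    }

    // Serializes the account to bytes and reads it back.
    static Account roundTrip(Account a) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(a);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Account) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

✗ After the round trip, owner is preserved, password is null, and loginCount is 0.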

Page 31 of 31

Next time

✔ Self-organizing data structures

✔ Self-organizing lists

✔ Splay trees

✔ Spatial data structures

✔ K-D trees

✔ The C++ Standard Template Library