Topic 6 Hashing

download Topic 6 Hashing

of 31

Transcript of Topic 6 Hashing

  • 8/14/2019 Topic 6 Hashing

    1/31

    CSC508

    DATA STRUCTURES

    TOPIC 6

    Hashing1

    Zulaile

    Mabni

  • 8/14/2019 Topic 6 Hashing

    2/31

    CHAPTER OBJECTIVES

    Learn about hashing

    Learn about hash methods

    Mid-square

    Folding

    Division

    Learn about collision

    Open addressing

    Chaining

    2

    Malik,DataStructuresUsingJava

  • 8/14/2019 Topic 6 Hashing

    3/31

    WHYHASHING?

    Need a data structure in which finds/searches are

    very fast

    Insert and Delete process should be fast too

    Objects have unique keys A key may be a single property/attribute value

    Or may be created from multiple properties/values

  • 8/14/2019 Topic 6 Hashing

    4/31

    HASH TABLES VS. OTHER DATA

    STRUCTURES

    Maximize efficiency: implement the operations Insert(),

    Delete() and Search()/Find() efficiently.

    Arrays:

    not space efficient (assumes we leave empty space for

    keys not currently in the structure) Linked List

    space efficient

    Insert(), Delete() and Search()/Find() not too efficient

    Hash Tables: Better than the above in terms of space and efficiency

  • 8/14/2019 Topic 6 Hashing

    5/31

    HASH TABLES

    Very useful data structure

    Good for storing and retrieving key-value pairs

    Not good for iterating through a list of items

    Example applications: Storing objects according to ID numbers

    When the ID numbers are widely spread out

    When you dont need to access items in ID order

  • 8/14/2019 Topic 6 Hashing

    6/31

    HASH INDEX/VALUE

    A hash value or hash index is used to

    index the hash table (array)

    A hash function takes akey and returns a

    hash value/index The hash index is a integer (to index an array)

    The key is specific value associated with a

    specific object being stored in the hash

    table

    It is important that the key remain constant

    for the lifetime of the object

  • 8/14/2019 Topic 6 Hashing

    7/31

    Hash TablesConceptual View

    Obj5

    key=1

    obj1

    key=15

    Obj4

    key=2

    Obj2

    key=36

    7

    6

    5

    4

    3

    21

    0

    table

    Obj3

    key=4

    buckets

    hash

    value/inde

    x

    b1

    b2

    b3

    b4

  • 8/14/2019 Topic 6 Hashing

    8/31

    Hash Tables solve these problems by using a much smaller array and

    mapping keys with a hash function.

    Let universe of keys U and an array of size m. A hash function h is a functionfrom U to 0m, that is:

    h U 0m

    HASH TABLES

    U

    (universe of keys)

    k1 k2

    k3 k4

    k6

    0

    1

    2

    3

    4

    5

    6

    7

    h (k2)=2

    h (k1)=h (k3)=3

    h (k6)=5

    h (k4)=7

  • 8/14/2019 Topic 6 Hashing

    9/31

    HASH FUNCTION

    You want a hash function/algorithm that is:

    Fast

    Easy to compute

    Minimize the number of collisions

    Creates a good distribution of hash values so that the

    items (based on their keys) are distributed evenly

    through the array

    Hash functions can use as input

    Integer key values String key values

    Multipart key values

    Multipart fields, and/or

    Multiple fields

  • 8/14/2019 Topic 6 Hashing

    10/31

    CHOOSING AHASH FUNCTION

    The performance of the hash table depends onhaving a hash function that evenly distributesthe keys:uniform hashingis the ideal target

    Choosing a good hash function requires taking

    into account the kind of data that will be used. E.g., Choosing the first letter of a last name will likely

    cause lots of collisions depending on the nationality ofthe population.

    Most programming languages (including java)

    have hash functions built in.

  • 8/14/2019 Topic 6 Hashing

    11/31

    COMMONLYUSED HASH METHODS

    Mid-Square

    Hash method, h, computed by squaring the

    identifier

    Using appropriate number of bits from themiddle of the square to obtain the bucket

    address

    Middle bits of a square usually depend on all

    the characters, it is expected that differentkeys will yield different hash addresses with

    high probability, even if some of the characters

    are the same11

    Malik,DataStructuresUsingJava

  • 8/14/2019 Topic 6 Hashing

    12/31

    COMMONLYUSED HASH METHODS

    Folding

    KeyX is partitioned into parts such that all the parts,

    except possibly the last parts, are of equal length

    Parts then added, in convenient way, to obtain hash

    address

    Division (Modular arithmetic)

    key mod m

    m is the array size; in general, it should be prime

    number

    KeyX is converted into an integer iX

    This integer divided by size of hash table to get

    remainder, giving address ofX in HT 12

    Malik,DataStructuresUsingJava

  • 8/14/2019 Topic 6 Hashing

    13/31

    THE MOD FUNCTION

    Stands for modulo

    When you divide x by y, you get a result and a remainder

    Mod is the remainder

    8 mod 5 = 3

    9 mod 5 = 4

    10 mod 5 = 0

    15 mod 5 = 0

    Thus for key-value mod M, multiples of M give the sameresult, 0

    But multiples of other numbers do not give the same result

  • 8/14/2019 Topic 6 Hashing

    14/31

    COMMONLYUSED HASH METHODS

    Multiplication method

    Floor ((key*someFraction mod 1)*arraySize)

    Where some fraction is typically 0.618

    Java Hash Map method

    Create a hash by performing a series of shifts, adds,

    and xors on the key

    index = hash mod arraySize

  • 8/14/2019 Topic 6 Hashing

    15/31

    COMMONLYUSED HASH METHODS

    Suppose that each key is a string. Thefollowing Java method uses the divisionmethod to compute the address of the key:

    int hashmethod(String insertKey)

    {

    int sum = 0;

    for(int j = 0; j

  • 8/14/2019 Topic 6 Hashing

    16/31

    HASH FUNCTIONS & INSERT()

    Usage summary:

    int hashValue = hashFunction (int key);

    Or hashValue = hashFunction (String key);

    Or hashValue = hashFunction (itemType item);

    Insert method:

    public void insert (int key, itemType item)

    {

    hashValue = hashFunction (key);table[hashValue] = item;

    }

  • 8/14/2019 Topic 6 Hashing

    17/31

    HASH TABLES: INSERT EXAMPLE

    For example, if we hash keys 01000 into a hash table with 5

    entries and use h(key) = key mod 5 , we get the followingsequence of events:

    0

    1

    2

    3

    4

    key dataInsert 2

    2

    0

    1

    2

    3

    4

    key dataInsert 21

    2

    21

    0

    1

    2

    3

    4

    key dataInsert 34

    2

    21

    34

    Insert 54

    There is a

    collisionatarray entry #4

    ???

  • 8/14/2019 Topic 6 Hashing

    18/31

    COLLISION RESOLUTION

    Algorithms to handle collisions

    Two categories of collision resolution techniques

    Open addressing (closed hashing)

    Chaining (open hashing)

    18

    Malik,Data

    StructuresUsingJava

  • 8/14/2019 Topic 6 Hashing

    19/31

    A problem arises when we have two keys thathash in the same array entrythis is called acollision.

    There are two ways to resolve collision:

    Hashing with Chaining (a.k.a. SeparateChaining): every hash table entry contains a pointerto a linked list of keys that hash in the same entry

    Hashing with Open Addressing: every hash tableentry contains only one key. If a new key hashes to atable entry which is filled, systematically examineother table entries until you find one empty entry toplace the new key

    DEALING WITH COLLISIONS

  • 8/14/2019 Topic 6 Hashing

    20/31

    HASHING WITH CHAINING

    The problem is that keys 34 and 54 hash in the same entry (4). We

    solve this collisionby placing all keys that hash in the same hash table

    entry in a chain (linked list) or bucket (array) pointed by this entry:

    0

    1

    2

    3

    4

    other

    key key data

    Insert 54

    2

    21

    54 34

    CHAIN

    0

    1

    2

    3

    4

    Insert 101

    2

    101

    54 34

    21

  • 8/14/2019 Topic 6 Hashing

    21/31

    OPENADDRESSING

    Collisions are resolved by systematically examining other tableindexes, i0 , i1 , i2 , until an empty slot is located.

    The key is first mapped to an array cell using the hash function (e.g.

    key % array-size)

    If there is a collision find an available array cell

    There are different algorithms to find (to probe for) the next array

    cell

    Linear probing

    Quadratic probing

    Random probing Double Hashing

  • 8/14/2019 Topic 6 Hashing

    22/31

    OPENADDRESSING: LINEAR PROBING

    Suppose that an item with keyX is to be inserted in HT

    Use hash function to compute index h(X) of item in HT

    Suppose h(X) = t.

    If HT[t] is empty, store item into array slot.

    Suppose HT[t] already occupied by another item; collisionoccurs

    Linear probing: starting at location t, search array sequentiallyto find next available array slot:

    (t + 1) % HTSize, (t + 2) % HTSize,,(t + j) % HTSize

    Be sure to wrap around the end of the array! Stop when you have tried all possible array indices

    If the array is full, you need to throw an exception or, betteryet, resize the array

    22

    Malik,Data

    StructuresUsingJava

  • 8/14/2019 Topic 6 Hashing

    23/31

    COLLISION RESOLUTION:

    OPENADDRESSING

    Pseudocode implementing linear probing:

    hIndex = hashmethod(insertKey);

    found = false;

    while(HT[hIndex] != emptyKey && !found)

    if(HT[hIndex].key == key)

    found = true;

    else

    hIndex = (hIndex + 1) % HTSize;

    if(found)

    System.out.println(Duplicate items not allowed);else

    HT[hIndex] = newItem;

    23

    Malik,Data

    StructuresUsingJava

  • 8/14/2019 Topic 6 Hashing

    24/31

    HASH TABLESOPENADDRESSING

    (LINEAR PROBING)

    Obj5

    key=1

    obj1

    key=15

    Obj4

    key=2

    Obj2

    key=28

    7

    6

    5

    4

    3

    2

    1

    0

    table

    Obj3key=4Index=5

    hashv

    alue/index

    Index=4

    h(key) = key mod 8

  • 8/14/2019 Topic 6 Hashing

    25/31

    RANDOM PROBING

    Uses a random number generator to findthe next available slot

    ith slot in the probe sequence is: (h(X) +ri) %HTSize whereri is theith value in arandom permutation of the numbers 1 toHTSize1

    Suppose HTSize = 101, for h(X) = 26, and r1= 2, r2 = 5, r3 = 8.

    The probe sequence of X has the elements26, 28,31,34

    All insertions and searches use the samesequence of random numbers

    25

    Malik,Data

    StructuresUsingJava

  • 8/14/2019 Topic 6 Hashing

    26/31

    QUADRATIC PROBING

    In Quadratic probing, starting at position

    t, check the array locations ( t + 1) %

    HTSize, (t + 2) % HTSize,, (t + i) %

    HTSize.We do not know if it probes all the

    positions in the table

    WhenHTSize is prime, quadratic probing

    probes about half the table before

    repeating the probe sequence

    26

    Malik,Data

    StructuresUsingJava

  • 8/14/2019 Topic 6 Hashing

    27/31

    DOUBLE HASHING

    Apply a second hash function after the first

    The second hash function, like the first, is

    dependent on the key

    Secondary hash function must

    Be different than the first

    And, obviously, not generate a zero

    Good algorithm:

    arrayIndex = (arrayIndex + stepSize) % arraySize;

    Where stepSize = constant(key % constant)

    And constant is a prime less than the array size

  • 8/14/2019 Topic 6 Hashing

    28/31

  • 8/14/2019 Topic 6 Hashing

    29/31

    HASH TABLES IN JAVA

    Java supports a number of hash table classes

    Hashtable, HashMap, LinkedHashMap, HashSet,

    See Sun Java API Documentation

    http://java.sun.com/j2se/1.4.1/docs/api/

    As a programmer, you dont see the collision detection,chaining, etc.

    You can set

    The initial table size The load factor (Default is .75)

    hashCode()hash function

  • 8/14/2019 Topic 6 Hashing

    30/31

    HASHINGANIMATION

    http://www.cs.auckland.ac.nz/software/AlgAnim/hash_tables.html

  • 8/14/2019 Topic 6 Hashing

    31/31

    REFERENCES

    Malik D.S., Nair P.S., Data Structures Using

    Java, Course Technology, 2003.

    Weiss Mark Allen,Data Structures & Algorithm

    Analysis in C++, Pearson Education

    International Inc, 2003.

    31

    Malik,Data

    StructuresUsingJava