AADS_14_Hash Tables & Hash Functions

Hash Tables & Hash Functions

AADS-14

Significance of Search Complexity

In an unordered list, the time required to find a value is O(n).

In an ordered list the search time can be improved, and there is still room for improvement in the modification operations.

In a Binary Search Tree the search time can improve to O(log n) when the tree is balanced.

The same bound holds for AVL trees.

Dictionary Data Structure

A dictionary is a general data structure for storing keys and values.

It can be implemented using array or linked-list structures.

For a dictionary, each element can be addressed directly by using the value of the element as the index, provided the dictionary is large enough to cover that range of values.

Key Search / Dictionary Storage

But in any complex application, memory is used simultaneously by many processes.

Also, the keys may be accessed frequently at run time.

So there is a need to reduce both the space used and the search time.

Example 1

A 4-digit number as key may need 9999 locations.

If the key stands for the Employee ID in a company with 500 employees, then only 500 of those locations are used when all the keys are arranged in memory.
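To make the space argument concrete, here is a minimal Python sketch of direct addressing for this example. The 500 employee IDs are generated at random purely for illustration (they are an assumption, not data from the slides), and the helper name `direct_address_usage` is hypothetical.

```python
import random

def direct_address_usage(keys, table_size):
    """Store each key at the index equal to its own value and report usage."""
    table = [None] * (table_size + 1)   # indices 0..table_size; index 0 unused
    for k in keys:
        table[k] = k                    # direct addressing: the key is the index
    used = sum(1 for slot in table[1:] if slot is not None)
    return used, table_size

# Hypothetical data: 500 distinct employee IDs in the range 1..9999.
random.seed(0)
employee_ids = random.sample(range(1, 10000), 500)

used, total = direct_address_usage(employee_ids, 9999)
print(f"{used} of {total} locations used ({used / total:.1%})")
# prints "500 of 9999 locations used (5.0%)"
```

About 95% of the array stays empty, which is the waste that hashing is meant to avoid.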

Example 2

A hospital might have a large number of patients, both inpatients and outpatients.

The database system can be modeled to group the patients and then index them so that retrieval of the records is fast.

Another way is not to group, but to assign only one number to each case.

Search Time of O(1)

In both cases, large or small amounts of data, an average search time of O(1), or close to it, can be achieved if we know the location of the data or key we are looking for.

This location can be obtained by mapping the key to a new hashed key using a suitable function.

Hashing

Hashing can provide either a unique location for each key, or a reference to a shorter list of keys from which we can easily get the data pertaining to one key.

Also, this may use less space in memory: instead of a large array, we can use a short array or linked list.

Hash Table

A hash table is a data structure.

Hash tables provide O(1) average time for search, insert, and delete on any value in the set held in the hash table.

Hash Table?

A hash table is an array, say T[1..m], where m is a positive integer called the table size.

When we try to put an item into a spot in the hash table that is already occupied, the situation is called a collision.

It is resolved using a collision resolution policy.

Hashing: Mathematical Definition

Hashing is a mapping operation.

Consider a set K of keys.

Let H be a function that maps the keys to a new set L, such that

H: K → L

Hash Function & Hash Address

The function H is called the hash function.

The mapping done by the function H is called hashing.

The object L is the hash table.

Each cell/location in L is identified by its hash address.

Hash Address

Let k be a key in K, that is, k ∈ K.

Then k has a mapped address in L given by H(k), known as the hash address.

The hash address d is the mapped address/location that the hashing operation gives for a key k:

d = H(k)

Indexing on the Hash Table

The hash address d points directly to a location in L.

This address d is also called the hash code for the key k.

The process of hashing is also called compression.

Notes

There is no meaningful relationship between the actual data value k and the hash address d.

So there is no practical way to traverse a hash table, except a direct search using d.

Hash table items are not in any order.

There is no mapping function from d back to k, except the hash table itself.

The purpose of hash tables is to provide fast lookups.

Illustration: Bucket Array Structure for Hash Table

Location   Key
1          k1
2          k2
3          k3
...
L-1        kN-1
L          kN

Uses of Hash Tables

Compilers use hash tables for symbol storage.

The Linux Kernel uses hash tables to manage memory pages and buffers.

High speed routing tables use hash tables.

Database systems use hash tables.

Operations on Hash Tables

Initialize

Insert(k)

Search(k)

Remove(k)

Sizeof

Isempty

Types of Hashing

There are two types:

1. Open hashing, also called open chaining, closed addressing, or separate chaining

2. Closed hashing, also called open addressing

Open Hashing (Open Chaining)

Used when the amount of data to be stored is high.

Uses a hash function to obtain the hash address.

All data with the same hash address are stored as a shorter list, which is referenced from the location given by that hash address.

Bucket in Open Hashing

Each hash location in the hash table is said to be a bucket for the data with that index.

Data within the bucket is best organized as a linked list.

Location   Key
1          k1
2          k2
3          k3
...
L-1        kN-1
L          kN
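The bucket idea and the operations listed earlier can be sketched together in Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the class name `ChainedHashTable` is hypothetical, keys are assumed to be non-negative integers, the hash function is assumed to be k mod m, and Python lists stand in for the linked lists inside each bucket.

```python
class ChainedHashTable:
    """Open hashing (separate chaining): each bucket holds a list of keys."""

    def __init__(self, m):
        # Initialize: m empty buckets.
        self.m = m
        self.buckets = [[] for _ in range(m)]
        self.count = 0

    def _address(self, k):
        # Assumed hash function: remainder of k divided by the table size.
        return k % self.m

    def insert(self, k):
        bucket = self.buckets[self._address(k)]
        if k not in bucket:            # ignore duplicate keys
            bucket.append(k)
            self.count += 1

    def search(self, k):
        # Only the one bucket addressed by the hash is scanned.
        return k in self.buckets[self._address(k)]

    def remove(self, k):
        bucket = self.buckets[self._address(k)]
        if k in bucket:
            bucket.remove(k)
            self.count -= 1
            return True
        return False

    def size(self):
        return self.count

    def is_empty(self):
        return self.count == 0


t = ChainedHashTable(7)
for key in (15, 22, 8):               # all three keys have remainder 1 mod 7
    t.insert(key)
print(t.buckets[1])                   # [15, 22, 8]
print(t.search(22), t.size())         # True 3
t.remove(22)
print(t.buckets[1], t.size())         # [15, 8] 2
```

Keys with the same hash address simply share one bucket, so search and remove only ever scan that short list.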

Closed Hashing (Open Addressing)

Closed hashing uses a fixed space.

Hashing maps a key into one of the locations in the earmarked space.

If multiple keys hash to the same address (a collision), the tie must be resolved.

A bucket may be small enough to hold only one value at a time.

Topics in Hashing

Basically there are two subareas under “Hashing”:

1. Hash Functions

2. Collision Resolutions

Hash Functions

1. The Hash Function H should be easy to compute

2. The function H should, as far as possible, uniformly distribute the hash addresses throughout the set L so that there are a minimum number of collisions

Hash Functions

Requirement of Hash Functions

The main idea of using a hash function H is that, for a key k, H yields a value H(k) that serves as an index to a hash table cell/bucket, so that we can easily locate the key k in the hash table for search/insert.

Hash Functions

Division Method

Mid Square Method

Multiplication Method

Division Method

Choose a prime number that is not close to a power of 2.

Let m be the selected number.

Then m also indicates the size of the hash table in the ideal case, with one cell in each bucket.

The hash address/bucket address is given by

H(k) = k mod m

Example

The given keys are

4845, 5679, 6381, 3636, 7180, 8126, 1127

Use table size m = 7, that is, hash to a table with 7 cells.

Also repeat the exercise with m = 11 and m = 8.

Answer (m = 7)

HASH ADDRESS   KEY
0              1127
1              4845
2              5679
3              3636
4              6381
5              7180
6              8126
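The division method can be checked with a few lines of Python. This is a small sketch; the function name `division_hash` is a hypothetical choice, and the loop also covers the two other table sizes the exercise asks for.

```python
def division_hash(k, m):
    """Division method: the hash address is the remainder of k divided by m."""
    return k % m

keys = [4845, 5679, 6381, 3636, 7180, 8126, 1127]

for m in (7, 11, 8):
    print(f"m = {m}")
    for k in keys:
        print(f"  H({k}) = {division_hash(k, m)}")
```

For m = 7 every key gets its own address; comparing the output for m = 11 and m = 8 shows how a different table size changes which keys share an address.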

Choosing Table Size in Division Method

When using the division method, ample consideration must be given to the size of the table.

The best choice for table size is usually a prime number not too close to a power of 2.
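One way to see why the table size matters is to hash keys that follow a regular pattern. The sketch below uses a hypothetical key set (multiples of 8) and a hypothetical helper `bucket_counts`; it compares a power-of-2 table size with a prime one under H(k) = k mod m.

```python
from collections import Counter

def bucket_counts(keys, m):
    """Count how many keys land on each hash address under H(k) = k mod m."""
    return Counter(k % m for k in keys)

# Hypothetical keys with a regular pattern: 8, 16, ..., 112.
keys = [8 * i for i in range(1, 15)]

for m in (8, 7):
    counts = bucket_counts(keys, m)
    print(f"m = {m}: {len(counts)} addresses used, "
          f"largest bucket holds {max(counts.values())} keys")
# m = 8: 1 addresses used, largest bucket holds 14 keys
# m = 7: 7 addresses used, largest bucket holds 2 keys
```

With m = 8 every key collides at address 0, while the prime m = 7 spreads the same keys evenly over the table.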

Division Method for Chaining

Here, the hash table will have many cells.

Hash addresses map multiple keys to a single location, so there can be multiple entries in one location.

These multiple entries under a single hash code are held as a linked list.

Illustration

Take the table size m as 11 to map a set of keys.

Keys: 122, 221, 661, 90, 167, 57, 69

Divide each key modulo 11 to get the hash addresses.

Answer: we get the following table

Hash address   Keys
0
1              122, 221, 661
2              90, 167, 57
3              69
4
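The illustration can be reproduced with a short Python sketch. The helper name `build_chains` is hypothetical, and Python lists stand in for the linked lists; the keys are the seven keys given in the illustration.

```python
def build_chains(keys, m):
    """Place each key in the chain at address k mod m."""
    table = [[] for _ in range(m)]
    for k in keys:
        table[k % m].append(k)
    return table

keys = [122, 221, 661, 90, 167, 57, 69]
table = build_chains(keys, 11)

for address, chain in enumerate(table):
    print(address, chain)
```

Only addresses 1, 2 and 3 receive keys; the other eight chains stay empty.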

Load Factor

Let there be m slots in a hash table.

At the instant of observation, the number of elements is n.

Therefore the load factor = n/m.

This is the average number of elements stored per slot of the hash table. It can be less than, equal to, or greater than 1.

Find the Load Factor

Slot   Keys
0      110
1      89, 45
2      68, 167, 57
3
4      225, 554
5      82
9      108
10     109

Solution

There are 11 slots and 11 elements, so the load factor = 11/11 = 1.

So the load factor indicates the average number of elements per position.

Also, we get a load factor of 1 even though there are vacant slots, because it only shows the average.
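The calculation can be sketched in Python for a chained table stored as a list of buckets. The function name `load_factor` and the three small 4-slot tables are hypothetical examples chosen for illustration, not data from the slides.

```python
def load_factor(table):
    """Return n/m, where n is the number of stored keys and m the number of slots."""
    n = sum(len(bucket) for bucket in table)
    m = len(table)
    return n / m

# Hypothetical 4-slot chained tables.
sparse = [[10], [], [], []]                   # n = 1
uneven = [[1, 5, 9, 13], [], [], []]          # n = 4, three vacant slots
crowded = [[4, 8], [1, 5, 9], [2], [3, 7]]    # n = 8

print(load_factor(sparse))    # 0.25
print(load_factor(uneven))    # 1.0
print(load_factor(crowded))   # 2.0
```

The second table has a load factor of 1 even though three of its four slots are vacant, because the value is only an average.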

Notes on the Load Factor

The load factor takes various values as the number of keys in the hash table changes.

Accordingly, it can be less than, equal to, or greater than one in a hash table formed using separate chaining (open hashing).

In a hash table formed using open addressing (closed hashing), it can never exceed one, because each slot holds at most one key.

The load factor determines the complexity of the operations on the hash table, such as insert, search, and delete.

Hashing the Strings

Exercise

Map the following keys using the hash function defined as follows:

1. Find the ASCII values of the first and last characters.

2. If there is only one character, it serves as both the first and the last.

3. Multiply the ASCII value of the first character by 256 and add the ASCII value of the last character.

4. Apply mod m division to the resulting number.

Keys: A, BABU, CHOWHAN, SUMAN, DILIP

The 5 symbols are: AA, BU, CN, SN, DP

These 5 symbols are then converted to numerical codes using the rule given previously, employing the ASCII values of the characters in the symbols.

ASCII Values

A-65  B-66  C-67  D-68  E-69  F-70  G-71  H-72  I-73

J-74  K-75  L-76  M-77  N-78  O-79  P-80  Q-81  R-82

S-83  T-84  U-85  V-86  W-87  X-88  Y-89  Z-90

Example: Answer

AA  256*65 + 65 = 16705

BU  256*66 + 85 = 16981

CN  256*67 + 78 = 17230

SN  256*83 + 78 = 21326

DP  256*68 + 80 = 17488

Solution

Take m = 7 and obtain the hash addresses:

AA  16705 mod 7 = 3

BU  16981 mod 7 = 6

CN  17230 mod 7 = 3

SN  21326 mod 7 = 4

DP  17488 mod 7 = 2

Resulting table:

Hash address   Keys
0
1
2              DILIP
3              A, CHOWHAN
4              SUMAN
5
6              BABU
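The rule from the exercise can be sketched in Python, where the built-in `ord` returns the ASCII value of a character. The function names `first_last_code` and `string_hash` are hypothetical.

```python
def first_last_code(s):
    """256 * ASCII(first character) + ASCII(last character)."""
    first, last = s[0], s[-1]     # for a one-character string both are the same
    return 256 * ord(first) + ord(last)

def string_hash(s, m):
    """Apply mod m division to the numerical code of the string."""
    return first_last_code(s) % m

keys = ["A", "BABU", "CHOWHAN", "SUMAN", "DILIP"]
for key in keys:
    print(key, first_last_code(key), string_hash(key, 7))
```

Because only the first and last characters are used, any two strings that share both characters get the same code, whatever lies between them.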

Symbol Table

Compilers use a method similar to the previous one to form a symbol table for parsing during compilation.

Hash Functions for String Hashing

Hash functions perform two separate tasks:

1. Convert the string to a key.

2. Constrain the key to a non-negative value less than the size of the table.

The best strategy is to keep the two functions separate so that there is only one part to change if the size of the table changes.
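A minimal Python sketch of this separation follows. The conversion used here, reading all characters of an ASCII string as one base-256 number via `int.from_bytes`, is an assumption chosen for illustration rather than the rule from the slides; the names `string_to_key`, `compress`, and `string_hash` are hypothetical.

```python
def string_to_key(s):
    """Step 1: convert the string to an integer key (independent of table size)."""
    return int.from_bytes(s.encode("ascii"), "big")

def compress(key, m):
    """Step 2: constrain the key to the range 0..m-1."""
    return key % m

def string_hash(s, m):
    return compress(string_to_key(s), m)

names = ["A", "BABU", "CHOWHAN", "SUMAN", "DILIP"]
for m in (7, 11):                       # resizing the table only affects step 2
    print(m, [string_hash(name, m) for name in names])
```

When the table grows from 7 to 11 cells, `string_to_key` is untouched; only the value of m passed to `compress` changes.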

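Notes: Chaining Method

In principle, the chaining method gives unlimited space in the hash table, since each chain can grow as needed.

But in practical applications, only limited memory is allotted to one hash table.

In chaining, keys that hash to the same address simply share a chain, so a collision never forces a search for another location.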

Collisions

Collision

In the case of closed hashing (open addressing), even though H ideally gives distinct addresses in L for each member of K, in real situations two or more keys may lead to a single hash address when a given hash function is used.

This situation is called a collision.

We need some method to resolve collisions.

The method is called a “Collision Resolution Policy”.

Collision Resolution Policy

Linear Probing

Quadratic Probing

Double Hashing

Linear Probing

For an insert operation, if a collision occurs, look for the next free location and store the key there.

For a search operation, if the key is not found at its home position, look for it in the following cells in a linear manner.

Example

Let H be mod 11.

Let the keys 56, 78, 100 appear in this order for hashing.

All of these have position 1 as their home.

The table is considered a circular array.

Position   Key
0
1          56
2          78
3          100
4
...
8
9
10
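Linear probing for insert and search can be sketched in Python as follows. The function names `lp_insert` and `lp_search` are hypothetical, empty cells are represented by `None`, and the search assumes that keys are never deleted, so an empty cell ends the probe sequence.

```python
def lp_insert(table, k):
    """Insert k using H(k) = k mod m with linear probing; return the slot used."""
    m = len(table)
    home = k % m
    for step in range(m):
        pos = (home + step) % m       # wrap around: the table is a circular array
        if table[pos] is None:
            table[pos] = k
            return pos
    raise OverflowError("hash table is full")

def lp_search(table, k):
    """Return the position of k, or None if k is not in the table."""
    m = len(table)
    home = k % m
    for step in range(m):
        pos = (home + step) % m
        if table[pos] is None:        # empty cell: k cannot be further along
            return None
        if table[pos] == k:
            return pos
    return None

table = [None] * 11
for k in (56, 78, 100):
    print(k, "->", lp_insert(table, k))
# 56 -> 1
# 78 -> 2
# 100 -> 3
print(lp_search(table, 100))          # 3
print(lp_search(table, 12))           # None
```

All three keys have home position 1, so the second and third are pushed to the next free cells, and a search for 100 has to step through the same cells to find it.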

Exercise

Hash 45, 39, 66, 74 in that order with table size m = 7.

45 mod 7 = 3

39 mod 7 = 4

66 mod 7 = 3

74 mod 7 = 4

Position   Key
0
1
2
3          45
4          39
5          66
6          74

Exercise

Let H be mod 11.

Let the keys 46, 122, 222, 441 appear in this order for hashing.

46 mod 11 = 2

122 mod 11 = 1

222 mod 11 = 2

441 mod 11 = 1

Solution

Position   Key
0
1          122
2          46
3          222
4          441
...
8
9
10
