Hashing 22


8/18/2019 Hashing 22

http://slidepdf.com/reader/full/hashing-22 1/13

Hashing 

Hashing can be used to build, search, or delete from a table.

The basic idea behind hashing is to take a field in a record, known as the key, and convert it through some fixed process to a numeric value, known as the hash key, which represents the position to either store or find an item in the table. The numeric value will be in the range of 0 to n-1, where n is the maximum number of slots (or buckets) in the table.

The fixed process to convert a key to a hash key is known as a hash function. This function will be used whenever access to the table is needed.

One common method of determining a hash key is the division method of hashing. The formula that will be used is:

hash key = key % number of slots in the table

The division method is generally a reasonable strategy, unless the keys happen to have some undesirable properties. For example, if the table size is 10 and all of the keys end in zero, every key hashes to slot 0. In this case, the choice of hash function and table size needs to be carefully considered. The best table sizes are prime numbers.

One problem, though, is that keys are not always numeric. In fact, it's common for them to be strings.


One possible solution: add up the ASCII values of the characters in the string to get a numeric value, and then apply the division method.

    // Sum the character codes, then apply the division method
    int hashValue = 0;
    for (std::size_t j = 0; j < stringKey.length(); j++)
        hashValue += static_cast<unsigned char>(stringKey[j]);
    int hashKey = hashValue % tableSize;

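The previous method is simple, but it is flawed if the table size is large. For example, assume a table size of 10007 and that all keys are eight or fewer characters long. Since an ASCII character has a value of at most 127, the hash value can be at most 8 * 127 = 1016, so only slots 0 to 1016 can ever be used and the keys crowd into roughly the first 10% of the table.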

No matter what hash function is used, two different keys may resolve to the same hash key. This situation is known as a collision.

When this occurs, there are two simple solutions:

1. Chaining
2. Linear probe (aka linear open addressing)

And two slightly more difficult solutions:

3. Quadratic probe
4. Double hashing

Hashing with Chains

When a collision occurs, elements with the same hash key are chained together. A chain is simply a linked list of all the elements with the same hash key.

The hash table slots will no longer hold a table element. Instead, each slot holds the address of the first element in its chain (or null if no key has hashed to that slot yet).


Searching a hash table with chains:

    Compute the hash key
    If slot at hash key is null
        Key not found
    Else
        Search the chain at hash key for the desired key
    Endif

Inserting into a hash table with chains:

    Compute the hash key
    If slot at hash key is null
        Insert as first node of chain
    Else
        Search the chain for a duplicate key
        If duplicate key
            Don't insert
        Else
            Insert into chain
        Endif
    Endif

Deleting from a hash table with chains:

    Compute the hash key
    If slot at hash key is null
        Nothing to delete
    Else
        Search the chain for the desired key
        If key is not found
            Nothing to delete
        Else
            Remove node from the chain
        Endif
    Endif

Hashing with Linear Probe

When using a linear probe, the item will be stored in the next available slot in the table, assuming that the table is not already full.

This is implemented via a linear search for an empty slot, starting from the point of collision. If the physical end of the table is reached during the linear search, the search wraps around to the beginning of the table and continues from there.

If no empty slot is found before the search returns to the point of collision, the table is full.

A problem with the linear probe method is that blocks of occupied slots can form as collisions are resolved. This is known as primary clustering. It means that any key that hashes into a cluster will require several attempts to resolve the collision, and the new key then makes the cluster even larger.

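For example, insert the keys 89, 18, 49, 58, and 69 into a hash table that holds 10 items, using the division method: 89 goes to slot 9 and 18 to slot 8; 49 collides at slot 9 and wraps around to slot 0; 58 collides at slot 8, finds slots 9 and 0 occupied, and goes to slot 1; and 69 collides at slot 9 and goes to slot 2.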


Hashing with Quadratic Probe

To resolve the primary clustering problem, quadratic probing can be used. With quadratic probing, rather than always moving one spot, move i² spots from the point of collision, where i is the number of attempts to resolve the collision (so the first three attempts look 1, 4, and 9 spots away from the point of collision).


Limitation: at most half of the table can be used as alternative locations to resolve collisions. This means that once the table is more than half full, an insertion may fail to find an empty spot even though empty slots remain.

A second problem is known as secondary clustering: elements that hash to the same hash key will always probe the same alternative cells.

Hashing with Double Hashing

Double hashing applies a second hash function to the key when a collision occurs. The result of the second hash function is the number of positions to move from the point of collision on each probe: the first probe is that many positions away, the second twice as many, and so on.

There are a couple of requirements for the second function:

- it must never evaluate to 0
- it must make sure that all cells can be probed

A popular second hash function is:

Hash2(key) = R - (key % R)

where R is a prime number that is smaller than the size of the table.


Hashing with Rehashing

Once the hash table gets too full, operations will start to take too long, and insertions may fail. To solve this problem, a table at least twice the size of the original will be built, and the elements will be transferred to the new table.

The new size of the hash table:

- should also be prime
- will be used to calculate the new insertion spot (hence the name rehashing)

This is a very expensive operation: O(N), since there are N elements to rehash and the table size is roughly 2N. This is acceptable, though, since it doesn't happen often.

The question then becomes: when should rehashing be applied?

Some possible answers:

- once the table becomes half full
- once an insertion fails
- once a specific load factor has been reached, where the load factor is the ratio of the number of elements in the hash table to the table size


Deletion from a Hash Table

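The method of deletion depends on the method of insertion. In all cases, the same hash function(s) will be used to find the location of the element in the hash table.

A major problem can arise if a collision occurred when inserting: it's possible to "lose" an element. With open addressing (linear probe, quadratic probe, or double hashing), a colliding element is stored further along its probe sequence, and a search for it stops at the first empty slot it meets. If deletion simply empties a slot on that sequence, later searches stop too early and the element can no longer be found, even though it is still in the table.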

Operating system

What is preemptive and non-preemptive scheduling?

Tasks are usually assigned priorities. At times it is necessary to run a task that has a higher priority before another task, even though that task is already running. The running task is therefore interrupted for some time and resumed later, when the higher-priority task has finished its execution. This is called preemptive scheduling.

E.g., round robin

In non-preemptive scheduling, a running task is executed until it completes (or voluntarily gives up the CPU, for example to wait for I/O). The scheduler cannot interrupt it.

E.g., First In First Out (FIFO)

What is preemptive and non-preemptive scheduling?

Preemptive scheduling: Preemptive scheduling is priority-based. The process currently using the CPU should always be the highest-priority process that is ready to run.

Non-preemptive scheduling: Once a process enters the running state, the scheduler does not remove it from the CPU until it finishes its service time.


What is a page fault and when does it occur?

When the page (data) requested by a program is not available in main memory, it is called a page fault. The operating system normally handles it by loading the missing page from secondary storage and then resuming the program; the program is shut down only if the access is invalid, for example to an address that is not mapped at all.

What is a page fault and when does it occur?

A page is a fixed-length memory block used as the unit of transfer between physical memory and external storage. A page fault occurs when a program accesses a page that has been mapped in its address space but has not been loaded into physical memory.

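What is a dirty bit?

A dirty bit is a flag kept for each page (or cache block). It is set when the CPU modifies that page and the change has not yet been written back to storage. This bit is present in the memory cache or the virtual memory system; when a page is evicted, only pages whose dirty bit is set need to be written back.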

Define compaction.

Compaction is a process in which allocated memory is moved together so that the scattered free space is collected into one large contiguous chunk, making space available for processes.


Best-Fit, First-Fit and Worst-Fit Memory Allocation Methods for Fixed Partitions

The following jobs are loaded into memory using fixed partitions, following a certain memory allocation method (best-fit, first-fit, or worst-fit).

Memory Block   Size
Block 1        50k
Block 2        200k
Block 3        70k
Block 4        115k
Block 5        15k

BEST-FIT

Best-fit memory allocation makes the best use of memory space but is slower at making allocations. In the illustration below, jobs 1 to 5 are submitted and processed in the first cycle. After the first cycle, jobs 2 and 4, located on blocks 5 and 3 respectively and each with a turnaround of 1, are replaced by jobs 6 and 7, while jobs 1, 3, and 5 remain on their designated blocks. In the third cycle, job 1 remains on block 4, while jobs 8 and 9 replace jobs 7 and 5 respectively, which have completed their turnarounds. In the next cycle, jobs 9 and 8 remain on their blocks, while job 10 replaces job 1 (which had a turnaround of 3). In the fifth cycle, only jobs 9 and 10 remain to be processed, and there are 3 free memory blocks for incoming jobs; since there are only 10 jobs, these blocks remain free. In the sixth cycle, job 10 is the only remaining job to be processed, and finally, in the seventh cycle, all jobs have been successfully processed and executed, and all the memory blocks are free.

List of Jobs

Job      Size   Turnaround
Job 1    100k   3
Job 2    10k    1
Job 3    35k    2
Job 4    15k    1
Job 5    23k    2
Job 6    6k     1
Job 7    25k    1
Job 8    55k    2
Job 9    88k    3
Job 10   100k   3


FIRST-FIT

First-fit memory allocation is faster at making allocations but leads to wasted memory. The illustration below shows that in the first cycle, jobs 1 to 4 are submitted first, while job 6 occupies block 5 because that block is large enough for job 6's required memory size. Job 5 is placed in the waiting queue because block 5 is not large enough for it. In the next cycle, after jobs 2 and 4 finish processing, job 5 replaces job 2 on block 1 and job 7 replaces job 4 on block 4. Job 8 is placed in the waiting queue because the remaining block is not large enough to accommodate it. In the third cycle, job 8 replaces job 3, and job 9 occupies block 4 after job 7 is processed, while jobs 1 and 5 remain on their designated blocks. After the third cycle, blocks 1 and 5 are free to serve incoming jobs, but since there are only 10 jobs, they remain free. In the fourth cycle, job 10 occupies block 2 after job 1 finishes its turns, while jobs 8 and 9 remain on their blocks. In the fifth cycle, only jobs 9 and 10 remain to be processed, while 3 memory blocks are free. In the sixth cycle, job 10 is the only remaining job to be processed, and lastly, in the seventh cycle, all jobs have been successfully processed and executed, and all the memory blocks are free.


WORST-FIT

Worst-fit memory allocation is the opposite of best-fit. It allocates the largest available free block to the new job, and it is not the best choice for an actual system. In the illustration, job 5 is in the waiting queue in the first cycle, while jobs 1 to 4 and job 6 are the first jobs to be processed. After that, job 5 occupies the block freed by job 2. Block 5 is now free to accommodate the next job, job 8, but since block 5 is not large enough for job 8, job 8 is placed in the waiting queue. In the next cycle, block 3 accommodates job 8, while jobs 1 and 5 remain on their memory blocks. In this cycle, there are 2 free memory blocks. In the fourth cycle, only job 8 on block 3 remains, while jobs 1 and 5 are replaced by jobs 9 and 10, respectively. Just as in the previous cycle, there are still two free memory blocks. In the fifth cycle, job 8 finishes, while jobs 9 and 10 are still on blocks 2 and 4, respectively, and one more memory block becomes free. The same scenario happens in the sixth cycle. Lastly, in the seventh cycle, jobs 9 and 10 have both finished their processing, and all jobs have been successfully processed and executed. All the memory blocks are now free.
