
1. Explain the operations of linear list in detail.

Linear Data Structures or list

Linear data structures are those data structures in which data elements are accessed (read and written) in sequential fashion (one by one).

E.g.: Stacks, Queues, Lists, Arrays

Stack:

Stack is a Linear Data Structure which follows the Last In First Out (LIFO) mechanism.

It means: the first element inserted is the last one to be removed.

Stack uses a variable called top which points to the topmost element in the stack. top is incremented while pushing (inserting) an element into the stack and decremented while popping (deleting) an element from the stack.

Example: Push(A), Push(B), Push(C), Push(D), Pop() — after this sequence the stack contains C, B, A, with top pointing to C.

Valid Operations on Stack:

· Inserting an element into the stack (Push)

· Deleting an element from the stack (Pop)

· Displaying the elements in the stack (Display)

Note:

While pushing an element onto the stack, the stack-full condition should be checked.

While deleting an element from the stack, the stack-empty condition should be checked.
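A minimal array-based sketch in C of these checks (the names MAX, stack, and top are illustrative choices, not from the notes):

#include <stdio.h>

#define MAX 100                     /* illustrative capacity */

int stack[MAX];
int top = -1;                       /* -1 means the stack is empty */

/* Push: check the stack-full condition first. */
void push(int x) {
    if (top == MAX - 1) { printf("Stack overflow\n"); return; }
    stack[++top] = x;               /* top is incremented on push */
}

/* Pop: check the stack-empty condition first. */
int pop(void) {
    if (top == -1) { printf("Stack underflow\n"); return -1; }
    return stack[top--];            /* top is decremented on pop */
}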

Applications of Stack:

· Stacks are used in recursion programs

· Stacks are used in function calls

· Stacks are used in interrupt implementation

Queue:

Queue is a Linear Data Structure which follows the First In First Out (FIFO) mechanism.

It means: the first element inserted is the first one to be removed

Queue uses two variables, rear and front. rear is incremented while inserting an element into the queue, and front is incremented while deleting an element from the queue.

Example: Insert(A), Insert(B), Insert(C), Insert(D), Delete() — after this sequence the queue contains B, C, D, with front at B and rear at D.

Valid Operations on Queue:

· Inserting an element into the queue

· Deleting an element from the queue

· Displaying the elements in the queue

Note:

While inserting an element into the queue, the queue-full condition should be checked.

While deleting an element from the queue, the queue-empty condition should be checked.
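A minimal circular-queue sketch in C of these checks (SIZE, count, and the function names are illustrative assumptions):

#include <stdio.h>

#define SIZE 100                    /* illustrative capacity */

int queue[SIZE];
int front = 0, rear = -1, count = 0;

/* Insert: check the queue-full condition first. */
void insert(int x) {
    if (count == SIZE) { printf("Queue is full\n"); return; }
    rear = (rear + 1) % SIZE;       /* rear is incremented on insert */
    queue[rear] = x;
    count++;
}

/* Delete: check the queue-empty condition first. */
int delete_front(void) {
    if (count == 0) { printf("Queue is empty\n"); return -1; }
    int x = queue[front];
    front = (front + 1) % SIZE;     /* front is incremented on delete */
    count--;
    return x;
}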

Applications of Queues:

· Real life examples

· Waiting in line

· Waiting on hold for tech support

· Applications related to Computer Science

· Threads

· Job scheduling (e.g. Round-Robin algorithm for CPU allocation)

2. Explain B-trees and their operations

B-tree: definition

A B-tree T is a rooted tree (with root root[T]) with the following properties:

Every node x has four fields:

1. The number of keys currently stored in node x, n[x].

2. The n[x] keys themselves, stored in nondecreasing order:

key1[x] <= key2[x] <= ... <= keyn[x][x].

3. A boolean value leaf[x]: true if x is a leaf, false if x is an internal node.

4. If x is an internal node, n[x]+1 pointers c1[x], c2[x], ..., cn[x]+1[x] to its children.

Basic operations on B-trees

Details of the following operations:

· B-Tree-Search

· B-Tree-Create

· B-Tree-Insert

· B-Tree-Delete

Searching a B-tree (I)

2 inputs: x, a pointer to the root node of a subtree, and k, a key to be searched for in that subtree.

function B-Tree-Search(x, k) // returns (y, i) such that keyi[y] = k, or nil

i ← 1

while i ≤ n[x] and k > keyi[x] do i ← i + 1

if i ≤ n[x] and k = keyi[x] then return (x, i)

if leaf[x] then return nil

else Disk-Read(ci[x])

return B-Tree-Search(ci[x], k)

At each internal node x we make an (n[x] + 1)-way branching decision.

Searching a B-tree (II)

Number of disk pages accessed by B-Tree-Search:

Θ(h) = Θ(log_t n)

• The time of the while loop within each node is O(t); therefore the total CPU time is

O(th) = O(t log_t n)
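Assuming an in-memory node layout (the struct fields and T_MIN below are illustrative; a disk-based tree would call Disk-Read where noted), the search above might be sketched in C as:

#include <stdbool.h>
#include <stddef.h>

#define T_MIN 3                         /* minimum degree t (illustrative) */

typedef struct BTreeNode {
    int n;                              /* number of keys, n[x] */
    int key[2 * T_MIN - 1];             /* keys in nondecreasing order */
    struct BTreeNode *child[2 * T_MIN]; /* n+1 child pointers */
    bool leaf;                          /* true if the node is a leaf */
} BTreeNode;

/* Returns the node holding k and stores its index in *pos, or NULL. */
BTreeNode *btree_search(BTreeNode *x, int k, int *pos) {
    int i = 0;
    while (i < x->n && k > x->key[i])   /* (n+1)-way branching decision */
        i++;
    if (i < x->n && k == x->key[i]) { *pos = i; return x; }
    if (x->leaf)
        return NULL;
    /* a disk-based tree would Disk-Read(c_i[x]) here */
    return btree_search(x->child[i], k, pos);
}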

Splitting a node in a B-tree (I)

· Inserting a key into a B-tree is more complicated than in a binary search tree.

· Splitting a full node y (one with 2t − 1 keys) is the fundamental operation during insertion.

· The node is split around its median key keyt[y] into 2 nodes.

· The median key moves up into y's parent (which has to be nonfull).

If y is the root node, the tree height grows by 1.

Inserting a key into a B-tree (I)

· The key is always inserted in a leaf node

· Inserting is done in a single pass down the tree

· Requires O(h) = O(log_t n) disk accesses

· Requires O(th) = O(t log_t n) CPU time

Uses B-Tree-Split-Child to guarantee that recursion never descends to a full node

Inserting a key - Examples (I)

Example (t = 3): the figures show the initial tree, the tree after 2 is inserted, and the tree after 17 is inserted.

Deleting a Key from a B-tree

· Similar to insertion, with the addition of a couple of special cases

· Key can be deleted from any node.

· More complicated procedure, but similar performance figures: O(h) disk accesses, O(th) = O(t log_t n) CPU time

· Deleting is done in a single pass down the tree, but needs to return to the node with the deleted key if it is an internal node

In the latter case, the key is first moved down to a leaf. Final deletion always takes place on a leaf

Deleting a Key — Cases I

· Considering 3 distinct cases for deletion

· Let k be the key to be deleted, x the node containing the key. Then the cases are:

1. If key k is in node x and x is a leaf, simply delete k from x

2. If key k is in node x and x is an internal node, there are three cases to consider:

(a) If the child y that precedes k in node x has at least t keys (more than the minimum), then find the predecessor key k′ of k in the subtree rooted at y. Recursively delete k′ and replace k with k′ in x.

(b) Symmetrically, if the child z that follows k in node x has at least t keys, find the successor k′ and delete and replace as before. Note that finding k′ and deleting it can be performed in a single downward pass.

(c) Otherwise, if both y and z have only t − 1 (the minimum number of) keys, merge k and all of z into y, so that both k and the pointer to z are removed from x. y now contains 2t − 1 keys, and subsequently k is deleted from it.

Deleting a Key — Cases II

3. If key k is not present in internal node x, determine the root of the appropriate subtree that must contain k. If that root has only t − 1 keys, execute one of the following two cases to ensure that we descend to a node containing at least t keys. Finally, recurse to the appropriate child of x.

(a) If the subtree root has only t − 1 keys but has an immediate sibling with at least t keys, give it an extra key by moving a key from x down into it, moving a key from its immediate left or right sibling up into x, and moving the appropriate child pointer from that sibling into it.

(b) If the subtree root and all of its siblings have t − 1 keys, merge it with one sibling. This involves moving a key down from x into the new merged node to become the median key for that node.

Deleting a Key — Case 1

Example: the figure shows the tree after key 6 is deleted from a leaf.

3. Explain linear probing and chaining

Linear Probing

· Also called linear open addressing.

· Search one slot at a time until an empty slot is found.

· Procedure: suppose b denotes the number of buckets.

· Compute h(k).

· Examine the hash table buckets in the order ht[h(k)], ht[(h(k)+1)%b],…, ht[(h(k)+j)%b] until one of the following happens:

· ht[(h(k)+j)%b] has a pair whose key is k; k is found.

· ht[(h(k)+j)%b] is empty; k is not in the table.

· Return to ht[h(k)]; the table is full.

· Example: divisor = b (number of buckets) = 17.

· Bucket address = key % 17.

· Insert pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45.

Quadratic Probing

· Suppose i is used as the increment.

· When overflow occurs, the search is carried out by examining h(x), (h(x)+i²)%b, and (h(x)−i²)%b,

· for 1 ≤ i ≤ (b−1)/2, where b is a prime number of the form 4j+3.

· For example, b = 3, 7, 11, ..., 43, 59, ...

Chaining

· Disadvantage of linear probing: comparison of identifiers with different hash values.

· Chaining uses a linked list to connect the identifiers with the same hash value and to increase the capacity of a bucket.
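A minimal C sketch of chaining with h(k) = k % b (the Node type and function names are ours, not from the notes):

#include <stdlib.h>

#define B 17                        /* number of buckets (illustrative) */

typedef struct Node {
    int key;
    struct Node *next;
} Node;

Node *ht[B];                        /* bucket heads, all NULL initially */

/* Insert key k at the front of the chain for bucket k % B. */
void chain_insert(int k) {
    Node *p = malloc(sizeof *p);
    p->key = k;
    p->next = ht[k % B];
    ht[k % B] = p;
}

/* Search compares only identifiers with the same hash value. */
Node *chain_search(int k) {
    for (Node *p = ht[k % B]; p != NULL; p = p->next)
        if (p->key == k)
            return p;
    return NULL;
}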

4. Explain Breadth first search tree

Breadth First Traversal:

1. The breadth first traversal of a graph is similar to traversing a binary tree level by level (the nodes at each level are visited from left to right).

2. All the nodes at any level, i, are visited before visiting the nodes at level i + 1.

3. To implement the breadth first search algorithm, we use a queue (a sketch follows this list).

4. BFS follows these rules:

(1) Select an unvisited node x, visit it, and let it be the root of the BFS tree being formed. Its level is called the current level.

(2) From each node x in the current level, visit all the unvisited neighbors of x. The newly visited nodes from this level form a new level that becomes the next current level.

(3) Repeat step (2) until no more nodes can be visited.

(4) If there are still unvisited nodes, repeat from step (1).
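A minimal C sketch of these rules using an adjacency matrix and an array-based queue (N, adj, visited, and the example edges are illustrative assumptions):

#include <stdio.h>

#define N 6                         /* number of vertices (illustrative) */

int adj[N][N];                      /* adjacency matrix */
int visited[N];

/* Visit all nodes reachable from start, level by level, using a queue. */
void bfs(int start) {
    int queue[N], front = 0, rear = 0;
    visited[start] = 1;
    queue[rear++] = start;          /* rule 1: root of the BFS tree */
    while (front < rear) {
        int x = queue[front++];
        printf("%d ", x);
        for (int y = 0; y < N; y++) /* rule 2: unvisited neighbors of x */
            if (adj[x][y] && !visited[y]) {
                visited[y] = 1;
                queue[rear++] = y;  /* they form the next current level */
            }
    }
}

int main(void) {
    adj[0][1] = adj[1][0] = 1;      /* a tiny example graph */
    adj[0][2] = adj[2][0] = 1;
    adj[1][3] = adj[3][1] = 1;
    for (int v = 0; v < N; v++)     /* rule 4: restart if nodes remain */
        if (!visited[v]) bfs(v);
    return 0;
}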

5. Knuth-Morris-Pratt Algorithm

Input: Pattern P with m characters

Output: Failure function f for P

i ← 1, j ← 0

f(0) ← 0

while i < m do

if P[j] = P[i] then f(i) ← j + 1; i ← i + 1; j ← j + 1

else if j > 0 then j ← f(j − 1)

else f(i) ← 0; i ← i + 1

Note that the failure function f for P, which maps j to the length of the longest proper prefix of P that is also a suffix of P[0..j], encodes repeated substrings inside the pattern itself.

As an example, consider the pattern P = a b a c a b. The failure function f(j) computed by the above algorithm is:

j      0   1   2   3   4   5
P[j]   a   b   a   c   a   b
f(j)   0   0   1   0   1   2

By observing the above mapping we can see that the longest prefix of P that is also a suffix of P is "a b" (of length f(5) = 2).

Consider an attempt to match at position i, that is, when the pattern P[0..m−1] is aligned with the text T[i..i+m−1].

T: a b a c a a b a c c

P: a b a c a b

Assume that the first mismatch occurs between characters T[i + j] and P[j] for 0 < j < m.

Then T[i..i+j−1] = P[0..j−1] = u.

That is, T[0..4] = P[0..4] = u (in the example, u = a b a c a), and

T[i + j] ≠ P[j], i.e., T[5] ≠ P[5] (in the example, T[5] = a ≠ b = P[5]).

When shifting, it is reasonable to expect that a prefix v of the pattern matches some suffix of the portion u of the text. In our example, u = a b a c a, and the prefix 'a' of the pattern matches the suffix 'a' of u. Let l(j) be the length of the longest prefix of P[0..j−1] that is also a suffix of P[0..j−1] (followed in P by a character different from P[j]). Then after a shift, the comparisons can resume between characters T[i + j] and P[l(j)], i.e., T[5] and P[1]:

T: a b a c a a b a c c

P:         a b a c a b

Note that no comparison at T[4] is needed; it is already known to match P[0].
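A C version of the failure-function computation above, checked against the example table (the function and variable names are ours):

#include <stdio.h>
#include <string.h>

/* f[i] = length of the longest proper prefix of P that is also a
   suffix of P[0..i], as in the table above. */
void failure(const char *P, int f[]) {
    int m = (int)strlen(P);
    int i = 1, j = 0;
    f[0] = 0;
    while (i < m) {
        if (P[i] == P[j]) { f[i] = j + 1; i++; j++; }
        else if (j > 0)   { j = f[j - 1]; }
        else              { f[i] = 0; i++; }
    }
}

int main(void) {
    int f[6];
    failure("abacab", f);
    for (int j = 0; j < 6; j++)
        printf("f(%d) = %d\n", j, f[j]);  /* prints 0 0 1 0 1 2 */
    return 0;
}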

6. Define Time complexity? Explain asymptotic notations

The time complexity of an algorithm is the amount of computing time it requires, expressed as a function of the input size n. Asymptotic notation gives us a method for classifying such functions according to their rate of growth.

Big-O Notation

Definition:

· f(n) = O(g(n)) iff there are two positive constants c and n0 such that |f(n)| <= c|g(n)| for all n >= n0.

· If f(n) is nonnegative, we can simplify the last condition to 0 <= f(n) <= c g(n) for all n >= n0.

· We say that "f(n) is big-O of g(n)."

· As n increases, f(n) grows no faster than g(n).

In other words, g(n) is an asymptotic upper bound on f(n).

Example: n² + n = O(n³); for instance, with c = 1 and n0 = 2, n² + n <= n³ for all n >= 2.

Big Ω notation

Definition:

f(n) = Ω(g(n)) iff there are two positive constants c and n0 such that |f(n)| >= c|g(n)| for all n >= n0.

• If f(n) is nonnegative, we can simplify the last condition to 0 <= c g(n) <= f(n) for all n >= n0.

• We say that "f(n) is omega of g(n)."

• As n increases, f(n) grows no slower than g(n). In other words, g(n) is an asymptotic lower bound on f(n).

Example: n³ + 4n² = Ω(n²)

Big-Θ notation

Definition:

f(n) = Θ(g(n)) iff there are three positive constants c1, c2 and n0 such that

c1|g(n)| <= |f(n)| <= c2|g(n)| for all n >= n0

• If f(n) is nonnegative, we can simplify the last condition to

0 <=c1 g(n) <= f(n) <= c2 g(n) for all n >= n0

• We say that “f(n) is theta of g(n).”

• As n increases, f(n) grows at the same rate as g(n). In other words, g(n) is an asymptotically tight bound on f(n).

7. Explain hash functions?

Hash Functions

• A hash function, h, is a function which transforms a key from a set, K, into an index in a table of size n:

h: K -> {0, 1, ..., n-2, n-1}

• A key can be a number, a string, a record, etc.

• The size of the set of keys, |K|, is relatively very large compared to n.

• It is therefore possible for different keys to hash to the same array location.

• This situation is called a collision, and the colliding keys are called synonyms.

A good hash function should:

Minimize collisions.

Be easy and quick to compute.

Distribute key values evenly in the hash table.

Use all the information provided in the key.

Common Hashing Functions

Division Remainder (using the table size as the divisor)

•Computes hash value from key using the % operator.

•Table size that is a power of 2 like 32 and 1024 should be avoided, for it leads to more collisions.

•Also, powers of 10 are not good for table sizes when the keys rely on decimal integers.

•Prime numbers not close to powers of 2 are better table size values.

Truncation / Digit or Character Extraction

•Works based on the distribution of digits or characters in the key.

•More evenly distributed digit positions are extracted and used for hashing purposes.

•For instance, student IDs or ISBN codes may contain common subsequences which may increase the likelihood of collision.

•Very fast, but the distribution of digits/characters in keys may not be very even.

Folding

•It involves splitting keys into two or more parts and then combining the parts to form the hash addresses.

•To map the key 25936715 to a range between 0 and 9999, we can:

·split the number into two as 2593 and 6715 and

·add these two to obtain 9308 as the hash value.

•Very useful if we have keys that are very large.

•Fast and simple especially with bit patterns.

•A great advantage is ability to transform non-integer keys into integer values.

Radix Conversion

•Transforms a key into another number base to obtain the hash value.

•Typically use number base other than base 10 and base 2 to calculate the hash addresses.

•To map the key 55354 into the range 0 to 9999 using base 11: 55354 (base 10) = 38652 (base 11).

•We may truncate the high-order digit 3 to yield 8652 as our hash address within 0 to 9999.

Mid-Square

•The key is squared and the middle part of the result is taken as the hash value.

•To map the key 3121 into a hash table of size 1000, we square it: 3121² = 9740641, and extract the middle three digits, 406, as the hash value.

•Works well if the keys do not contain a lot of leading or trailing zeros.

•Non-integer keys have to be preprocessed to obtain corresponding integer values
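Illustrative C sketches of three of the functions above, reproducing the worked examples (the function names are ours):

#include <stdio.h>

/* Division remainder: prime table sizes work best. */
int h_division(int key, int b) { return key % b; }

/* Folding: split 25936715 into 2593 and 6715 and add them. */
int h_folding(int key) { return (key / 10000 + key % 10000) % 10000; }

/* Mid-square: square the key and extract the middle three digits. */
int h_midsquare(int key) {
    long sq = (long)key * key;        /* 3121 * 3121 = 9740641 */
    return (int)((sq / 100) % 1000);  /* middle digits: 406 */
}

int main(void) {
    printf("%d\n", h_division(29, 17));  /* 12 */
    printf("%d\n", h_folding(25936715)); /* 9308 */
    printf("%d\n", h_midsquare(3121));   /* 406 */
    return 0;
}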

Use of a Random Number Generator

•Given a seed as parameter, the method generates a random number.

•The algorithm must ensure that:

•It always generates the same random value for a given key.

•It is unlikely for two keys to yield the same random value.

•The random number produced can be transformed to produce a valid hash value.

8. Define an AVL Tree? Give an example for AVL Tree?

An AVL tree is another balanced binary search tree. Named after their inventors, Adelson-Velskii and Landis, they were the first dynamically balanced trees to be proposed. Like red-black trees, they are not perfectly balanced, but pairs of sub-trees differ in height by at most 1, maintaining an O(log n) search time. Addition and deletion operations also take O(log n) time.

Definition of an AVL tree

An AVL tree is a binary search tree which has the following properties:

1. The sub-trees of every node differ in height by at most one.

2. Every sub-tree is an AVL tree.

Balance requirement for an AVL tree: the left and right sub-trees differ by at most 1 in height.

Example (AVL: Yes): examination shows that each left sub-tree has a height 1 greater than each right sub-tree.

Example (not AVL: No): the sub-tree with root 8 has height 4 and the sub-tree with root 18 has height 2, violating the balance requirement.

Insertion

As with the red-black tree, insertion is somewhat complex and involves a number of cases. Implementations of AVL tree insertion may be found in many textbooks: they rely on adding an extra attribute, the balance factor, to each node. This factor indicates whether the tree is left-heavy (the height of the left sub-tree is 1 greater than that of the right sub-tree), balanced (both sub-trees have the same height), or right-heavy (the height of the right sub-tree is 1 greater than that of the left sub-tree). If the balance would be destroyed by an insertion, a rotation is performed to correct the balance.

A new item has been added to the left subtree of node 1, causing its height to become 2 greater than 2's right sub-tree (shown in green). A right-rotation is performed to correct the imbalance.
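A C sketch of the right rotation mentioned above, one textbook-style way to implement it; the node layout (storing heights to derive balance factors) is an assumption:

#include <stddef.h>

typedef struct AVLNode {
    int key;
    int height;                     /* height of this sub-tree */
    struct AVLNode *left, *right;
} AVLNode;

static int height(AVLNode *n) { return n ? n->height : 0; }
static int imax(int a, int b) { return a > b ? a : b; }

/* Right rotation: corrects a left-heavy imbalance rooted at y. */
AVLNode *rotate_right(AVLNode *y) {
    AVLNode *x = y->left;           /* x becomes the new sub-tree root */
    y->left = x->right;             /* x's right sub-tree moves under y */
    x->right = y;
    y->height = 1 + imax(height(y->left), height(y->right));
    x->height = 1 + imax(height(x->left), height(x->right));
    return x;                       /* caller re-links x into the tree */
}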

9. Define B-tree of order m

A B-tree of order m is an m-ary search tree with the following properties:

· The root is either a leaf or has at least two children

· Each node, except for the root and the leaves, has between m/2 and m children

· Each path from the root to a leaf has the same length.

· The root, each internal node, and each leaf is typically a disk block.

· Each internal node has up to (m - 1) key values and up to m pointers (to children)

· The records are typically stored in leaves (in some organizations, they are also stored in internal nodes)

· The figure shows a B-tree of order 5 in which at most 3 records fit into a leaf block.

· A B-tree can be viewed as a hierarchical index in which the root is the first level index.

· Each non-leaf node is of the form 

                                        (p1, k1, p2, k2, ..., k(m−1), pm)

where

pi is a pointer to the i-th child, 1 ≤ i ≤ m, and

ki are key values, 1 ≤ i ≤ m − 1, in sorted order, k1 < k2 < ... < k(m−1), such that

· all keys in the subtree pointed to by p1 are less than k1;

· for 2 ≤ i ≤ m − 1, all keys in the subtree pointed to by pi are greater than or equal to k(i−1) and less than ki;

· all keys in the subtree pointed to by pm are greater than (or equal to) k(m−1).

Figure 5.27: Insertion and deletion in B-trees: An example

10. Explain Depth First search tree?

The depth first traversal is similar to the pre-order traversal of a binary tree.

An initial or source vertex is identified to start traversing; then, from that vertex, any one vertex adjacent to the current vertex is traversed.

To implement the depth first search algorithm, we use a stack.

DFS follows the following rules (a sketch follows this list):

1. Select an unvisited node x, visit it, and treat it as the current node.

2. Find an unvisited neighbor of the current node, visit it, and make it the new current node.

3. If the current node has no unvisited neighbors, backtrack to its parent and make that parent the new current node.

4. Repeat steps 2 and 3 until no more nodes can be visited.

5. If there are still unvisited nodes, repeat from step 1.
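A minimal iterative C sketch of these rules using an explicit stack (a common variant in which backtracking is implicit in the stack pops; N, adj, and visited are illustrative assumptions):

#include <stdio.h>

#define N 6                         /* number of vertices (illustrative) */

int adj[N][N];                      /* adjacency matrix */
int visited[N];

/* Iterative DFS: the explicit stack replaces recursion/backtracking. */
void dfs(int start) {
    int stack[N * N], top = -1;     /* generous bound: a node may be pushed more than once */
    stack[++top] = start;
    while (top >= 0) {
        int x = stack[top--];       /* popping backtracks automatically */
        if (visited[x]) continue;
        visited[x] = 1;
        printf("%d ", x);
        for (int y = N - 1; y >= 0; y--)   /* push unvisited neighbors */
            if (adj[x][y] && !visited[y])
                stack[++top] = y;
    }
}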

11. Explain Heap sort?

Heap Sort

You may have noticed that in a max heap, the largest value can easily be removed from the top of the heap and in the ensuing siftDown() a 'free' slot is made available at what was the end of the heap. The removed value could then be stored in the vacated slot. If this process was repeated until the heap shrank to size 1, we would have a sorted array.

This suggests the basis of a new sorting algorithm, namely Heap Sort.

For sorting general purpose arrays, a heap going from array positions 0 to (n-1) is more appropriate than what we used in our Heap class. In this situation:

parent(i) = (i-1)/2, lchild(i) = 2i+1, rchild(i) = 2i+2.

But first of all it is necessary to convert the array into a heap array. We can do this in two ways:

1. A top-down approach whereby we insert array values one at a time into a heap initially consisting of the first array value. This approach can be summarised with:

for k = 1 to (n-1) // array of size n, a[0] to a[n-1]

siftUp(k, a)

The complexity of this loop is O(n log₂ n).

2. A bottom up approach can also be used to do this whereby 2 smaller heaps and a value are combined to give a bigger heap. Consider the diagram below, which shows two small heaps h1 and h2 and a value x. Because y is the biggest value in h1 and z is the biggest in h2, we can simply merge h1, h2 and x into a new heap by a siftDown() starting at x's position.

The smallest heaps are of size 1 and are the leaf nodes of the binary tree. The last tree node with a child is at position (n-2)/2, where n is the array size. So by iterating the above described procedure starting from the node at (n-2)/2, we gradually construct a heap.

This process can be expressed as:

for k = (n-1-1)/2 to 0

siftDown(k, a, n)

It can be shown that constructing a heap in this way from an array takes O(n) steps, where n is the array size; the loop complexity is O(n). For this reason we use this approach.

Example

For example, consider the array below which is also shown as a binary tree.

Here n = 8. The last node with a child is the one containing 9, at position (8-2)/2 = 3.

The next node to be sifted down is at position 2. It has the value 1. Then positions 1 and 0 are processed.

Initial array: 5 3 1 9 8 2 4 7

After heap construction: 9 8 4 7 5 2 1 3

Using the heap for sorting

Now that a heap has been obtained by rearranging a[ ], we begin to sort it by removing the highest priority value from the heap and placing it at the end of the array.

So the largest value is removed from the heap and the heap is resized. This requires a siftDown() so that the smaller heap is still a heap, and this operation takes at most log₂ n steps. The removed value is then placed at the array position which has been freed by the removal. This is repeated with the next largest value being placed in the second last position, and so on. This process is repeated n − 1 times.

We express this algorithm with:

for k = n-1 to 1

v = a[0] // largest value on heap

a[0] = a[k] // a[k] is last value on heap

siftDown( 0, a, k) // k is now heap size

a[k] = v
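Putting the two phases together, a runnable C sketch of heap sort as described (siftDown uses the value-moving form; the names follow the notes' pseudocode):

#include <stdio.h>

/* Restore the max-heap property for the subtree rooted at k,
   where the heap occupies a[0..n-1]. */
void siftDown(int k, int a[], int n) {
    int v = a[k];
    while (2 * k + 1 < n) {
        int c = 2 * k + 1;                 /* left child */
        if (c + 1 < n && a[c + 1] > a[c])  /* pick the larger child */
            c++;
        if (a[c] <= v) break;
        a[k] = a[c];                       /* move the child up */
        k = c;
    }
    a[k] = v;
}

void heapSort(int a[], int n) {
    /* bottom-up heap construction: O(n) */
    for (int k = (n - 2) / 2; k >= 0; k--)
        siftDown(k, a, n);
    /* repeatedly move the largest value to the end of the array */
    for (int k = n - 1; k >= 1; k--) {
        int v = a[0];            /* largest value on heap */
        a[0] = a[k];             /* a[k] is last value on heap */
        siftDown(0, a, k);       /* k is now the heap size */
        a[k] = v;
    }
}

int main(void) {
    int a[] = {5, 3, 1, 9, 8, 2, 4, 7};   /* the example above */
    heapSort(a, 8);
    for (int i = 0; i < 8; i++)
        printf("%d ", a[i]);              /* 1 2 3 4 5 7 8 9 */
    return 0;
}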

12. Explain representation of doubly linked list?

Topic: Doubly Linked Lists - Operations - Insertion, Deletion.

1. In a Doubly Linked List, each node contains two address fields.

2. One address field stores the address of the next node to be followed, and the second address field contains the address of the previous node linked to it.

3. So two-way access is possible, i.e., we can start accessing nodes from the start as well as from the end.

4. Like the Singly Linked List it is only linear, but bidirectional sequential movement is possible.

5. Elements are accessed sequentially; no direct access is allowed.

Explanation :

1. The Doubly Linked List is the most useful type of Linked List.

2. In a DLL we have the facility to access both the next as well as the previous data using the "next link" and "previous link".

3. In the above diagram, suppose we are currently on the 2nd node, i.e., at 4; then we can access the next and previous data using:

To access next data: curr_node->next->data

To access previous data: curr_node->previous->data

4. We have to always maintain the start node, because it is the only node that is used to keep track of the complete linked list (a structure sketch follows).
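A possible C declaration of such a node and the list header used by the pseudocode below (the field names follow the notes; the exact layout is an assumption):

/* Each node carries a data field and two address fields. */
typedef struct DNode {
    int data;
    struct DNode *next;             /* address of the next node */
    struct DNode *previous;         /* address of the previous node */
} DNode;

/* The list header (HEAD in the pseudocode below) tracks the first
   node and the node count. */
typedef struct {
    DNode *FIRST;
    int LENGTH;
} Head;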

Adding a Node at the start of the Non-Empty List

As shown in Figure-3 we have a list of length 3. Let us call the current first node curFirstNode.

In newNode - NEXT points to curFirstNode (newNode->NEXT = HEAD->FIRST) and PREV points to NULL, as the newNode itself is now the first in the List.

In curFirstNode - PREV points to newNode (HEAD->FIRST->PREV = newNode) and NEXT remains unchanged. (We have to update curFirstNode before HEAD, since HEAD holds the pointer to curFirstNode.)

In HEAD - FIRST points to newNode (HEAD->FIRST = newNode).

Increment the LENGTH variable in HEAD to maintain the count of number of Nodes in the List.

Pseudocode:

newNode->NEXT = HEAD->FIRST

HEAD->FIRST->PREV = newNode

HEAD->FIRST = newNode

newNode->PREV = NULL

increment(HEAD->LENGTH)

Output:

Figure 4: After adding newNode at location 1. (Changes in BLUE)

· Adding a Node at the middle of the list. 

Now we will see how to add a newNode at the arbitrary location. 

Figure 5: Current Preview of the Doubly Linked List.

Let us add a newNode at Location 3. 

Traverse the Doubly Linked List until the Location 2. So our curPtr is pointing to Node 2.

In newNode - PREV points to NODE 2 (newNode->PREV = curPtr) and NEXT points to NODE 3 (newNode->NEXT = curPtr->NEXT).

In NODE3 - PREV points to newNode (newNode->NEXT->PREV = newNode) and NEXT remains unchanged.

In NODE2 - NEXT points to newNode (curPtr->NEXT = newNode) and PREV remains unchanged.

Increment the LENGTH variable in HEAD to maintain the count of number of Nodes in the List.

Pseudocode:


newNode->PREV = curPtr

newNode->NEXT = curPtr->NEXT

newNode->NEXT->PREV = newNode

curPtr->NEXT = newNode

increment(HEAD->LENGTH)

Output:

Figure 6: After adding newNode at Location 3 (Changes in BLUE)

· Adding a Node at the End of the List

Traverse until the end of the List so that the curPtr points to Last Node in the List.

In Last Node - NEXT points to newNode (curPtr->NEXT = newNode) and PREV remains unchanged.

In newNode - PREV points to the Last Node (newNode->PREV = curPtr) and NEXT points to NULL, since newNode is now the last Node in the List.

Increment the LENGTH variable in HEAD to maintain the count of number of Nodes in the List.

Figures: (a) original doubly linked list; (b) after deletion of the head node; (c) after deletion of a middle node; (d) after deletion of the last node.
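A C sketch of deletion covering the head, middle, and last-node cases shown in the figures (it assumes the DNode/Head layout sketched earlier):

#include <stdlib.h>

/* Unlink target from the list; handles head, middle, and last nodes. */
void delete_node(Head *HEAD, DNode *target) {
    if (target->previous != NULL)
        target->previous->next = target->next;     /* middle/last node */
    else
        HEAD->FIRST = target->next;                /* deleting the head */
    if (target->next != NULL)
        target->next->previous = target->previous; /* not the last node */
    free(target);
    HEAD->LENGTH--;
}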

13. Explain the quick sort with example?

Quick Sort:

This technique is based on the divide and conquer strategy. In this technique, at every step the chosen element (the pivot) is placed in its proper position. It performs very well on longer lists. It works recursively, by first selecting a pivot value from the list (here, the first element). It then partitions the list into elements that are less than the pivot and elements that are greater than the pivot. The problem of sorting the given list is thus reduced to the problem of sorting two sublists, and the process continues until the whole list is sorted. This sorting technique is considered in-place, since it uses no extra array storage.

Example:

Consider the list 50,40,20,60,80,100,45,70,105,30,90,75.

pivot = 50: 50 40 20 60 80 100 45 70 105 30 90 75

Scanning from right to left, the first number with value less than 50 is 30; thus exchange the two:

30 40 20 60 80 100 45 70 105 50 90 75

Scanning from left to right, the first number with value greater than 50 is 60; thus exchange the two:

30 40 20 50 80 100 45 70 105 60 90 75

Scanning from right to left, the first number with value less than 50 is 45; thus exchange the two:

30 40 20 45 80 100 50 70 105 60 90 75

Scanning from left to right, the first number with value greater than 50 is 80; thus exchange the two:

30 40 20 45 50 100 80 70 105 60 90 75

After scanning, the number 50 is placed in its proper position and we get two sublists:

30 40 20 45 | 50 | 100 80 70 105 60 90 75

Sublist-1: 30 40 20 45 (all are <= 50)        Sublist-2: 100 80 70 105 60 90 75 (all are >= 50)

After applying the same method again to the new sublists, until we reach sublists that cannot be divided further, the final list we get is the sorted list below:

20 30 40 45 50 60 70 75 80 90 100 105
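A C sketch of this two-ended scanning scheme, using the first element as pivot as in the example (a common "hole-filling" variant; the names are ours):

#include <stdio.h>

/* Place the pivot (first element) in its proper position by scanning
   from both ends, then sort the two sublists recursively. */
void quickSort(int a[], int lo, int hi) {
    if (lo >= hi) return;
    int pivot = a[lo], i = lo, j = hi;
    while (i < j) {
        while (i < j && a[j] >= pivot) j--;  /* scan right to left */
        a[i] = a[j];
        while (i < j && a[i] <= pivot) i++;  /* scan left to right */
        a[j] = a[i];
    }
    a[i] = pivot;                 /* pivot is now in its final place */
    quickSort(a, lo, i - 1);      /* sublist of values <= pivot */
    quickSort(a, i + 1, hi);      /* sublist of values >= pivot */
}

int main(void) {
    int a[] = {50, 40, 20, 60, 80, 100, 45, 70, 105, 30, 90, 75};
    quickSort(a, 0, 11);
    for (int i = 0; i < 12; i++)
        printf("%d ", a[i]);      /* 20 30 40 45 50 60 70 75 80 90 100 105 */
    return 0;
}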

14. Explain linear probing with hash table?

Linear Probing Hash Tables

The linear probing hash table is a fairly simple structure where data items are stored directly inside the hash element array. It gets its name by the way it handles the unique problems that exist when storing the data items in the table directly.

Adding Data

Data can be added to the table by calculating the hash code from the key of the data that is being stored, and then placing the data in the corresponding hash element.

However, there are many more possible search keys than there are hash elements. It is inevitable that some keys will resolve to the same hash code. This is known as a hash collision, and it must be handled.


Hash collisions cannot be avoided entirely, but they can be made less likely. This is the reason why a good hash function is so important: by spreading the data evenly across the entire table, on the whole there is a smaller chance of collisions occurring.

Resolving Collisions

If faced with a collision situation, the linear probing table will look at subsequent hash elements until the first free space is found.

This traversal is known as probing the table; and as it goes by one element at a time, it is linear probing.

There are other kinds of probing; for example, quadratic probing is where the traversal examines elements at offsets 1, 4, 9, etc. from the original position until a free space is found.

Consider the situation mentioned above where data 'F' has the same hash code as data 'D'. In order to resolve the collision, the add algorithm will need to probe the table in order to find the first free space (after 'C').


If the probe loops back, and finally reaches the same element that it started at, it means that the hash table is full, and can no longer hold any more data. The addition operation will fail.

Algorithm

We can summarise the addition algorithm as follows:

· Calculate a hash code from the key

· Access that hash element

· If the hash element is empty, add straight away

· If not, probe through subsequent elements (looping back if necessary), trying to find a free place

· If a free place is found, add the data at that position

· If no free place is found, the add will fail.

Retrieving Data

In order to retrieve data from the hash table, we need a key to search for. From this key, we can calculate the hash code. This tells us where in the data array we need to start searching.

Because of the collision resolution of the add operation, the target data might reside at a location other than the element referred to by the hash code.

Therefore, it is necessary to probe the hash table, checking each data item for an exact match with the given key, until an empty hash element is found. (The probing stops at an empty element, since it signals the end of where potential data might have been stored.)

Consider a situation where 'G' maps to the same hash code as 'B', and a search is undertaken. The retrieval algorithm will start looking at data items starting at that hash code, and continue comparing each hash item's contents for a match with 'G', until either the blank element is found, or (if the array is full) the probing loops back and ends up where the traversal started.

Algorithm

We can summarise the retrieval algorithm as follows:

· Calculate the hash code for the given search key

· Access the hash element

· If the hash element is empty, the search has immediately failed.

· Otherwise, check for a match between the search and data key

· If there is a match, return the data.

· If there is no match, probe the table until either:

· A match is found between the search and data key

· A completely empty hash element is found.

Deleting Data

The basic deletion operation is merely a retrieval followed by data removal (clearing the hash element, once the target has been found.)

Unfortunately, this has a negative side-effect on the way the retrieval works. Since the data retrieval operation relies on blank hash elements as the signal to stop probing, there is the possibility that a deletion operation will render some data items unfindable. Consider where a search for 'R' (which has the same hash code as 'A') is attempted, after 'D' has been deleted:

The data 'R' will never be found, as the probing had terminated too early; this is due to the hash element that stored 'D' (and kept the probing going) being deleted.

The solution to this problem is to define two different kinds of blank hash elements:

· a purely empty element, which has never stored data; and

· an "empty but deleted" element, which stored data that has since been deleted.

These can be used to differentiate the situations in how a clear hash element came to exist; something that will be necessary to make the hash retrieval work again.

When a data item is deleted, it is not completely cleared, but instead has the "empty but deleted" mark. The retrieval function must then be modified so that it will terminate probing only on a purely empty element, and continue probing if an "empty but deleted" element is encountered.

An add operation can store data in the "empty but deleted" element.

As the deleted flag is only necessary to continue searching, adding data to one of these elements makes it work like just another normal element again (as far as the probing algorithm is concerned.)
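A C sketch tying the three algorithms together with an explicit DELETED ("empty but deleted") state (the table size, names, and use of integer keys are illustrative assumptions):

#define B 17
enum { EMPTY, OCCUPIED, DELETED };   /* DELETED = "empty but deleted" */

int keys[B];
int state[B];                        /* all EMPTY initially */

int hash(int k) { return k % B; }

/* Probe until a free (EMPTY or DELETED) slot; fail if the table is full. */
int add(int k) {
    int start = hash(k);
    for (int j = 0; j < B; j++) {
        int i = (start + j) % B;
        if (state[i] != OCCUPIED) {  /* deleted slots are reused */
            keys[i] = k;
            state[i] = OCCUPIED;
            return i;
        }
    }
    return -1;                       /* looped back: table is full */
}

/* Probing terminates only on a purely EMPTY element. */
int find(int k) {
    int start = hash(k);
    for (int j = 0; j < B; j++) {
        int i = (start + j) % B;
        if (state[i] == EMPTY) return -1;              /* search failed */
        if (state[i] == OCCUPIED && keys[i] == k) return i;
        /* DELETED: keep probing past the tombstone */
    }
    return -1;
}

/* Delete marks the slot "empty but deleted", not purely empty. */
void remove_key(int k) {
    int i = find(k);
    if (i >= 0) state[i] = DELETED;
}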

Table Pollution

The 'deleted flag' approach is a relatively convenient way to resolve the problem of bridging together clusters of data to allow data retrieval; however, it has the side effect of causing those retrievals to take longer.

The reason for this is that over time, as the data structure is manipulated, the number of deleted flags increases; as data is added and then deleted, what was once a purely empty element becomes an empty but deleted element. Recall that for the 'deleted flag' technique to work, the retrieval can only terminate on a purely empty hash element.

These deleted flags may bridge together otherwise unrelated data (in terms of the search algorithm), and will gradually deteriorate the structure's efficiency. In the worst case, the hash table may prove no better than a basic linear search.

The only solution to the table pollution issue is to create a brand new hash table, add the existing data into the new table, and discard the old one.

Simply deleting the deleted flags won't work; they are an integral part of the probing process. It's just sheer coincidence that in the above diagram, none of the data items happened to move.

In a linear probing hash table, the place where data is actually stored depends a lot on what has happened in the table; past data collisions, earlier deletions and re-additions, etc.