DSC - Handout


Handout: Data Structures with C

Version: DSC/Handout/0307/2.1

Date: 05-03-07

Cognizant

500 Glen Pointe Center West

Teaneck, NJ 07666

Ph: 201-801-0233

www.cognizant.com



Data Structures with C

Page 2 Copyright 2007, Cognizant Technology Solutions, All Rights Reserved

C3: Protected

TABLE OF CONTENTS

Introduction
    About this Document
    Target Audience
    Objectives
    Pre-requisite

Session 1: Introduction to Data Structure
    Learning Objectives
    Overview
    Summary
    Test your Understanding

Session 2: Arrays
    Learning Objectives
    Overview
    Summary

Session 4: Linked Lists
    Linked lists
    Summary

Session 6: Sorting and Searching
    Sorting
    Summary

Session 8: Trees
    Overview
    Summary


Introduction

About this Document

This module provides the participants with the basic knowledge to understand data structures and to measure the performance of various algorithms used in different problems.

Target Audience

In-Campus Trainees

Objectives

Acquire the basic knowledge on data structures

Select the appropriate data structures for the application

Analyze the complexity of the algorithm

Apply data structures in C programs

Pre-requisite

The participants must have basic knowledge in writing programs using C.



Session 1: Introduction to Data Structure

Learning Objectives

After completing this chapter, you will be able to:

Define a data structure

List the types of data structures

Identify how to analyze and select data structure for a particular application

Overview

The study of computer science involves the study of the organization, manipulation and utilization of data in a computer in order to improve the efficiency of the processor and memory.

Data type and data structure

Data can be represented in the form of binary digits in memory. A binary digit is stored in the basic unit of data, called a bit. A bit can represent either a zero or a one.

Data type

A data type defines the specification of a set of data and the characteristics of that data. A data type is derived from the basic nature of the data stored for processing rather than from its implementation.

Data Structure

A data structure refers to the actual implementation of the data type and offers a way of storing data in an efficient manner. Any data structure is designed to organize data to suit a specific purpose so that it can be accessed and worked on both effectively and efficiently. In computer programming, a data structure may be selected or designed to store data so that various algorithms can work on it.

The choice of a data structure begins with the choice of an abstract data type. Data structures are implemented using the data types, references and operations on them that are provided by a programming language.

Example data structures include:

Arrays

Stacks

Queues

Linked Lists



Abstract Data Types (ADT)

An Abstract Data Type (ADT) defines data together with the operations on it. An ADT is specified independently of any particular implementation. It depicts the basic nature or concept of the data structure rather than the implementation details of the data. A stack or a queue is an example of an ADT. Both stacks and queues can be implemented using an array or using a linked list.

Types of Data Structures

The different types of data structures include linear data structures, hash tables and non-linear data structures. The structure of a data file defines how records, or rows of data, are related to fields, or columns of data.

Linear structures

A data structure is said to be linear if its elements form a sequence or a linear list.

Some of the linear structures are:

Array: Fixed-size

Linked-list: Variable-size

Stack: Add to top and remove from top

Queue: Add to back and remove from front

Priority queue: Add anywhere, remove the highest priority

Possible operations on these linear structures include:

Traversal: Travel through the data structure

Search: Traversal through the data structure for a given element

Insertion: Adding new elements to the data structure

Deletion: Removing an element from the data structure

Sorting: Arranging the elements in some type of order

Merging: Combining two similar data structures into one

Hash table

A hash table, or a hash map, is a data structure that associates keys with values. A function called a hash function is applied to the key to find the address of the record.
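As a minimal sketch of the idea, the function below maps an integer key to a slot in a table using the remainder of a division. The names hash_key and TABLE_SIZE, and the choice of the modulo method, are assumptions made for illustration; the handout does not prescribe a particular hash function.

```c
#define TABLE_SIZE 10   /* hypothetical number of slots in the table */

/* Maps an integer key to a slot index in the range 0 .. TABLE_SIZE - 1. */
unsigned int hash_key(unsigned int key)
{
    return key % TABLE_SIZE;
}
```

With this sketch, the record with key 27 would be looked up in slot 7. Two keys can map to the same slot, so a complete hash table also needs a way to handle such collisions.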

Non linear structures

A data structure is said to be non-linear if its elements are not in a sequence. The elements in the data structure are not arranged in a linear manner; rather, it has a branched structure.

Some of the non linear structures are:

Tree: Collection of nodes represented in hierarchical fashion

Graph: Collection of nodes connected together through edges



Selecting a Data Structure

A data structure that suits one application may not suit another. The choice of a data structure often begins with the choice of an abstract data structure: an abstract storage for data, defined in terms of the set of operations to be performed on the data and the computational complexity of those operations, regardless of how it is implemented in a concrete data structure.

Selection of an abstract data structure is crucial in the design of efficient algorithms and in estimating their computational complexity, while selection of concrete data structures is important for efficient implementation of algorithms. The names of many abstract data structures and abstract data types match the names of concrete data structures.

In the design of many types of programs, the choice of data structures is a primary design consideration, as experience in building large systems has shown that the difficulty of implementation and the quality and performance of the final result depend heavily on choosing the best data structure.

Performance Analysis and Measurements

Performance analysis is often made in terms of the best, worst and average cases of a given algorithm. These express the resource usage as minimum, maximum and average respectively. The resources include the running time, memory and any other resource. In real-time computing, the worst case execution time is often of particular concern, since it is important to know how much time might be needed in the worst case to guarantee that the algorithm always finishes on time.

Average performance and worst case performance are the most used in algorithm analysis. Best case performance is less widely used; it is usually measured to improve the accuracy of an overall worst case analysis. Computer scientists use probabilistic analysis techniques, especially expected value, to determine expected average running times.

Worst case performance analysis and average case performance analysis have similarities, but usually require different tools and approaches in practice.

Determining what an average input means is difficult. Because the result depends on the nature of the input, the equations for the average case are hard to analyze, and hence it is difficult to characterize the average case complexity mathematically.

Worst case analysis has similar problems. Typically it is difficult to determine the exact worst case scenario. Instead, a scenario is considered which is at least as bad as the worst case. For example, when analyzing an algorithm, it may be possible to find the longest possible path through the algorithm.

It is always important to find the efficiency of an algorithm with respect to the following:

CPU (time) usage

memory usage

disk usage

network usage



Here, either sequence of statements 1 or sequence of statements 2 will be executed. So, the worst case complexity of the entire selection statement depends on the complexity of sequence 1 and sequence 2. If sequence 1 has the complexity O(1) and sequence 2 has the complexity O(N), the worst case complexity is taken as O(N).

Looping statement (for)

for (condition)
    Sequence of simple statements;

Here, considering that the loop executes N times, the complexity is N * O(1), which is equivalent to O(N).

Nested loops

for (condition 1)
    for (condition 2)
        Sequence of simple statements;

Here, considering that the outer loop executes N times and the inner loop executes M times, the complexity is N * M * O(1), i.e., O(N*M).
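The loop rules above can be checked by counting how often the innermost statement runs. The following is an illustrative sketch only; the function names count_single_loop and count_nested_loops are assumptions and do not come from the handout.

```c
/* Single loop: the body runs n times, so the work is O(N). */
unsigned long count_single_loop(unsigned long n)
{
    unsigned long steps = 0;
    for (unsigned long i = 0; i < n; i++)
        steps++;                      /* one simple O(1) statement */
    return steps;
}

/* Nested loops: the body runs n * m times, so the work is O(N*M). */
unsigned long count_nested_loops(unsigned long n, unsigned long m)
{
    unsigned long steps = 0;
    for (unsigned long i = 0; i < n; i++)
        for (unsigned long j = 0; j < m; j++)
            steps++;                  /* one simple O(1) statement */
    return steps;
}
```

Calling count_nested_loops(4, 5) returns 20, which is 4 * 5, while count_single_loop(10) returns 10.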

Summary

The study of data structures deals with the actual implementation of the data type and offers a way of storing data in an efficient manner.

An Abstract Data Type (ADT) is a data type together with the operations, whose properties are specified independently of any particular implementation.

The different types of data structure available are:
    o Linear
    o Hash table
    o Trees
    o Graphs

A well-designed data structure allows a variety of critical operations to be performed using as few resources, both execution time and memory space, as possible.

Big O notation can be used to analyze the complexity of algorithms.



Test your Understanding

1. The complexity of an algorithm which finds the sum of n numbers will be
    a. O(n log n)
    b. O(n^2)
    c. O(n)
    d. O(2n)

2. A parent-child relationship can be considered as a linear data structure
    a. True
    b. False

Answers
    1. c
    2. b



Session 2: Arrays

Learning Objectives


Define arrays

Use arrays as data structures

Overview

An array is a collection of individual values of the same data type stored in consecutive memory locations.

An array index (the position in the array) usually starts from 0; in C it always does. Some languages allow the starting index to be specified.

Here is an array of integers, myArray:

Array positions/Index: 0 1 2 3 4

Declaring an array in C

int CArray[10];

Referring to elements of the array

The position of an element in an array is given by the index. The name of the array, followed by the index, is used to refer to a particular element:

myArray[1] = 5;

The above statement assigns the value 5 to the element at position 1 (the second element) of the array myArray.

Using elements of an array

Elements of the array can be used in the same way as variables of the same data type, i.e. an element of an array of integers can be used anywhere an integer variable can be used.

printf("The fifth element of the array is %d", myArray[4]);

Array values of myArray: 13 5 12 3 6



The above statement prints the fifth element in myArray, i.e. it will print as follows:

The fifth element of the array is 6

Example: Assigning values to each element of the array

for ( count = 0 ; count < 5 ; count++)

{

evens[count] = 2 * count;

}

The above piece of code will construct an array evens as given below

Array values: 0 2 4 6 8

Array index:  0 1 2 3 4

Multi Dimensional Arrays

These are arrays which have more than one dimension. For example, the following declaration in C creates a two-dimensional array of two rows and two columns:

int myArray1[2][2];

The following declaration creates a two-dimensional array of three rows and two columns:

int myArray2[3][2];

More dimensions are added with more pairs of brackets; for example, the dimensions 2, 2 and 3 are written as [2][2][3].

Initialization

In C, an array is initialized in its declaration. The following declarations initialize the arrays myArray1 and myArray2:

int myArray1[2][2] = {{1, 2}, {3, 4}};
int myArray2[3][2] = {{1, 2}, {3, 4}, {5, 6}};

In matrix form the above arrays can be represented as below:

myArray1
1 2
3 4

myArray2
1 2
3 4
5 6


Memory Organization in an array

Array elements occupy contiguous locations in memory. The array elements are accessed using their index. A function is needed to translate an array index to the address of the indexed element.

For a single dimensional array the address can be calculated as below:

Address = Base Address + (Index - Base Index) * Size

Where,

Base Index represents the value of the first index in the array

Size represents the size of a single element in bytes
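The address calculation can be tried out with a small sketch. The function names element_address and matches_c_indexing are assumptions made for illustration; the second function compares the formula (with a base index of 0, as in C) against the address the compiler itself computes.

```c
#include <stddef.h>

/* Address = Base Address + (Index - Base Index) * Size */
size_t element_address(size_t base_address, long index,
                       long base_index, size_t size)
{
    return base_address + (size_t)(index - base_index) * size;
}

/* Returns 1 if &a[3] equals the base address plus 3 * sizeof(int). */
int matches_c_indexing(void)
{
    int a[5] = {13, 5, 12, 3, 6};
    const char *base = (const char *)a;     /* address of a[0] in bytes */
    return (const char *)&a[3] == base + 3 * sizeof(int);
}
```

For example, with a base address of 1000, a base index of 0 and 4-byte elements, the element at index 3 is at address 1000 + (3 - 0) * 4 = 1012.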

Advantages and disadvantages of an array

Advantages

Array data structure is simple to use.

Elements in an array are stored in contiguous memory locations and hence each element can be accessed directly using its index.

Allocation and de-allocation of memory is done automatically by the computer.

Disadvantages

Elements in an array are stored in contiguous memory locations and hence an array cannot be stored if the available memory is non-contiguous, i.e. if the size of the array is n bytes, then there should be n contiguous bytes available in memory.

The array size is fixed and hence the size of the array cannot be reduced or increased at run time based on the requirement.

Stacks

A stack is a homogeneous collection of items of any one type, arranged linearly with access at one end only, known as the top. This means that data can be added or removed only at the top. Formally, this type of stack is called a Last In First Out (LIFO) stack. Data is added to the stack using the Push operation, and removed using the Pop operation.

To clarify the idea of a stack, think of a number of plates kept in a cafeteria. When the plates are being stacked, they are added one on top of another. It doesn't make much sense to put each plate on the bottom of the pile, as that would be far more work. Similarly, when a plate is taken, it is usually taken from the top of the stack.

A stack consists of two parts:

Storage space within the stack that contains the elements of the stack.

Top of stack, which refers to the most recently pushed element.



A stack can be implemented either using an array or a linked list.

Stack implementation using an array

Top is an integer value which contains the array index for the top of the stack. Each time data is pushed or popped, top is incremented or decremented accordingly, to keep track of the current top of the stack. By convention, an empty stack is indicated by setting top to be equal to -1.

Stacks implemented as arrays are useful if a fixed amount of data is to be used. However, if the amount of data is not of a fixed size, or if it fluctuates widely during the stack's lifetime, then an array is a poor choice for implementing a stack.

Any recursive call is implemented by the computer with the help of a stack. The size of the stack cannot be predicted in recursion, and implementing the stack using an array is a poor choice in this case.

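Algorithm to implement the operations using array

In the code below, top holds the number of items on the stack (the index of the next free slot), so an empty stack has top equal to 0.

Push:

if (top >= total_no_elements)
    return 1;                      // Error code: the stack is full
else
{
    printf("\n Enter the element \n");
    scanf("%d", &stack[top]);      // store the new item in the next free slot
    top++;
}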



Pop:

if (top == 0)
{
    printf("\n STACK EMPTY \n");
}
else
{
    top--;                         // step back to the most recently pushed item
    printf("\n\nPopped element = %d\n", stack[top]);
}

Display:

if (top == 0)
{
    printf("\n STACK IS EMPTY \n");
}
else
{
    printf("\n The elements inside the stack are :\n");
    for (j = top - 1; j >= 0; j--)     // from the top of the stack to the bottom
    {
        printf("\n%d", stack[j]);
    }
}

Stack operations:

Operation: Push
Description: This operation adds or pushes another item onto the stack.
Return type: Data type
Requirement: The number of items on the stack is less than n.

Operation: Pop
Description: This operation removes an item from the stack.
Return type: Data type
Requirement: The number of items on the stack must be greater than 0.

Operation: Top
Description: This operation returns the value of the item at the top of the stack.
Return type: Data type
Requirement: Note: It does not remove that item.

Operation: Is Empty
Description: This operation returns true if the stack is empty and false if it is not.
Return type: Boolean

Operation: Is Full
Description: This operation returns true if the stack is full and false if it is not.
Return type: Boolean
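The five operations listed above can be put together as a small array-based stack. This is a minimal sketch under stated assumptions: the type Stack, the capacity STACK_CAPACITY and the stack_* function names are hypothetical, and the sketch uses the convention that top is -1 for an empty stack. Each operation reports failure through its return value instead of printing.

```c
#include <stdbool.h>

#define STACK_CAPACITY 5   /* hypothetical fixed capacity n */

typedef struct {
    int items[STACK_CAPACITY];
    int top;               /* index of the top item; -1 when the stack is empty */
} Stack;

void stack_init(Stack *s) { s->top = -1; }

bool stack_is_empty(const Stack *s) { return s->top == -1; }

bool stack_is_full(const Stack *s) { return s->top == STACK_CAPACITY - 1; }

/* Push: returns false on overflow. */
bool stack_push(Stack *s, int value)
{
    if (stack_is_full(s))
        return false;
    s->items[++s->top] = value;
    return true;
}

/* Pop: stores the removed item in *out; returns false on underflow. */
bool stack_pop(Stack *s, int *out)
{
    if (stack_is_empty(s))
        return false;
    *out = s->items[s->top--];
    return true;
}

/* Top: stores the top item in *out without removing it. */
bool stack_top(const Stack *s, int *out)
{
    if (stack_is_empty(s))
        return false;
    *out = s->items[s->top];
    return true;
}
```

Pushing 1, 2 and 3 and then popping three times yields 3, 2 and 1, which is the Last In First Out order described earlier.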



Queues

A queue is a data structure in which elements are accessed from two different ends called Front and Rear. The elements are inserted into a queue through the Rear end and are removed from the Front end. The principle used in a queue is "First In First Out" or FIFO.

There are two basic operations associated with a queue: enqueue and dequeue.

Enqueue means adding a new item to the rear end of the queue. The rear end always points to the recently added element.

Dequeue refers to removing the item from the front end of the queue. The front end always points to the recently removed element.

Theoretically, a queue does not have a specific capacity. Regardless of how many elements are already contained, a new element can always be added. It can also be empty, at which point removing an element will be impossible until a new element has been added again.

A practical implementation of a queue using arrays does have a capacity limit. For any data structure, the executing computer will eventually run out of memory, thus limiting the queue size. Queue overflow results from trying to add an element to a full queue, and queue underflow happens when trying to remove an element from an empty queue.

A queue consists of two major variables, Front and Rear. Front refers to the first position of the queue and Rear refers to the last position of the queue.

Types of queues

Circular queue

A circular queue is one in which the insertion of a new element is done at the very first location of the queue if the last location of the queue is full, i.e. a circular queue is one in which the first element comes just after the last element.

A circular queue overcomes the problem of unutilized space in linear queues implemented as arrays. A circular queue also has a Front and a Rear to keep track of the elements to be deleted and inserted, and therefore to maintain the unique characteristic of the queue. The assumptions made are:

1. Front will always be pointing to the first element.
2. If Front = Rear, the queue is empty.
3. Each time a new element is inserted into the queue, the Rear is incremented by one.
4. Each time an element is deleted from the queue, the value of Front is incremented by one.



Example: Circular Queue

Inserting and deleting elements

Insertion and deletion of elements in a circular queue are the same as in a linear queue, except that whenever an element is deleted from the front of the queue, the rear pointer can be made to point to the vacant position, and a new element can be inserted there once the last location of the queue is full.

Figure: Before insertion. Q[0] = 5, Q[1] = 10 and Q[2] = 20; Q[3] and Q[4] are empty. Front and Rear are marked in the figure.

Figure: After inserting two elements, 30 and 40. The queue holds 5, 10, 20, 30 and 40 and is full.

Deletion in a circular queue

After the element at the front is deleted, Q[0] is available in the queue for another insertion.

Double Ended Queues

A double ended queue is a homogeneous list of elements in which insertion and deletion operations are performed at both ends. It is also called a deque.

There are two types of deques: input-restricted deques and output-restricted deques.

The major operations involved are:

Insertion of an element at the Rear end of the queue.

Deletion of an element from the Front end of the queue

Insertion of an element at the Front end of the queue

Deletion of an element from the Rear end of the queue

For an input-restricted deque, insertion is allowed at only one end, so all the above operations except the third are valid. For an output-restricted deque, deletion is allowed at only one end, so all the above operations except the fourth are valid.

Priority Queue

In priority queues, the items added to the queue have a priority associated with them which determines the order in which they exit the queue. Items with the highest priority are removed first.

A priority queue is an abstract data type supporting the following three operations:

add an element to the queue with an associated priority

remove the element from the queue that has the highest priority, and return it

(optionally) peek at the element with highest priority without removing it

The simplest way to implement a priority queue data type is to keep an associative array mapping each priority to a list of elements with that priority.
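The mapping from each priority to a list of elements can be sketched with one fixed-size bucket per priority. This is only an illustration under stated assumptions: priorities are small integers from 0 (lowest) to NUM_PRIORITIES - 1 (highest), and the names PriorityQueue, pq_init, pq_add and pq_remove_highest are hypothetical.

```c
#include <stdbool.h>

#define NUM_PRIORITIES  3   /* hypothetical: priorities 0 (lowest) to 2 (highest) */
#define BUCKET_CAPACITY 8   /* hypothetical limit of elements per priority */

typedef struct {
    int items[NUM_PRIORITIES][BUCKET_CAPACITY];  /* one list per priority */
    int count[NUM_PRIORITIES];                   /* elements in each list */
} PriorityQueue;

void pq_init(PriorityQueue *q)
{
    for (int p = 0; p < NUM_PRIORITIES; p++)
        q->count[p] = 0;
}

/* Adds value with the given priority; returns false if it cannot be stored. */
bool pq_add(PriorityQueue *q, int value, int priority)
{
    if (priority < 0 || priority >= NUM_PRIORITIES)
        return false;
    if (q->count[priority] == BUCKET_CAPACITY)
        return false;
    q->items[priority][q->count[priority]++] = value;
    return true;
}

/* Removes the oldest element of the highest non-empty priority. */
bool pq_remove_highest(PriorityQueue *q, int *out)
{
    for (int p = NUM_PRIORITIES - 1; p >= 0; p--) {
        if (q->count[p] > 0) {
            *out = q->items[p][0];
            for (int i = 1; i < q->count[p]; i++)   /* shift the rest forward */
                q->items[p][i - 1] = q->items[p][i];
            q->count[p]--;
            return true;
        }
    }
    return false;   /* the queue is empty */
}
```

Elements with equal priority leave in the order they were added, so each bucket behaves like an ordinary FIFO queue.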

Figure: Deletion in a circular queue. After the deletion the queue holds 10, 20, 30 and 40, and Q[0] is vacant.

Applications of queues

Round robin technique for processor scheduling uses the concept of queues

Railway ticket reservation center is designed using queues to store customer information

Printer server routines are designed using queues

Scheduling and buffering queues

A queue is a natural data structure for a system that serves incoming requests. Most of the process scheduling or disk scheduling algorithms in operating systems use queues. Computer hardware like a processor or a network card also maintains buffers in the form of queues for incoming resource requests. A stack-like data structure causes starvation of the first requests, and is not applicable in such cases. A mailbox or port that saves messages exchanged between two users or processes in a system is essentially a queue-like structure.

Search space exploration

Like stacks, queues can be used to remember the search space that needs to be explored at one point of time in traversing algorithms. Breadth first search of a graph uses a queue to remember the nodes yet to be visited.

Implementation of queue using array

Inserting an element into a queue

// Advance rear, wrapping around to slot 0 after the last slot
if (rear == max_no_of_elements - 1)
    rear = 0;
else
    rear = rear + 1;
if (rear == front)
{
    printf("QUEUE OVERFLOW \n");
    // The queue is full: move rear back to its previous position
    if (rear == 0)
        rear = max_no_of_elements - 1;
    else
        rear = rear - 1;
}
else
{
    printf("\n Enter the elements which you want to insert:\n");
    scanf("%d", &x);
    queue[rear] = x;
}



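Deletion of an element from a queue

if (front == rear)
    printf(" QUEUE UNDERFLOW \n ");
else
{
    // Advance front, wrapping around to slot 0 after the last slot
    if (front == (max_no_of_elements - 1))
        front = 0;
    else
        front = front + 1;
    x = queue[front];              // x now holds the removed element
}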
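The insertion and deletion fragments above can be combined into a self-contained circular queue. This is a sketch under stated assumptions: the names CircularQueue, QUEUE_SIZE, queue_init, queue_enqueue and queue_dequeue are hypothetical, the remainder operator replaces the explicit wrap-around tests, and, as in the fragments, front equal to rear means the queue is empty, so one slot of the array always stays unused.

```c
#include <stdbool.h>

#define QUEUE_SIZE 5   /* hypothetical array size; holds QUEUE_SIZE - 1 items */

typedef struct {
    int items[QUEUE_SIZE];
    int front;   /* slot just before the first element */
    int rear;    /* slot of the most recently added element */
} CircularQueue;

void queue_init(CircularQueue *q)
{
    q->front = 0;
    q->rear = 0;
}

/* Enqueue: returns false on overflow and leaves the queue unchanged. */
bool queue_enqueue(CircularQueue *q, int value)
{
    int next = (q->rear + 1) % QUEUE_SIZE;   /* wrap around after the last slot */
    if (next == q->front)
        return false;
    q->rear = next;
    q->items[q->rear] = value;
    return true;
}

/* Dequeue: stores the removed element in *out; returns false on underflow. */
bool queue_dequeue(CircularQueue *q, int *out)
{
    if (q->front == q->rear)
        return false;
    q->front = (q->front + 1) % QUEUE_SIZE;
    *out = q->items[q->front];
    return true;
}
```

After four insertions the queue reports overflow; removing one element frees a slot, and the next insertion wraps around to reuse slot 0.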

In a stack, each new data item is stored at the top of the stack. Top points to the top of the stack in the figure. When a new data item is added, it is stored in the Top position and the Top pointer is increased.

Summary

An array is a collection of individual values of the same data type stored in adjacent memory locations.

A stack is a homogeneous collection of items of any one type, arranged linearly with access at one end only, known as the top. The two major operations available for a stack are push (adding an element) and pop (deleting an element).

A queue is a collection of items in which only the earliest added item may be accessed. The basic operations are add (to the tail), or enqueue, and delete (from the head), or dequeue.

The major variations of queues are the double ended queue, circular queue and priority queue.


Test your Understanding

1. The elements inserted in order A, B, C, D are traversed in a stack as
    a. ABCD
    b. DCBA
    c. ADCB
    d. None of the above

2. The size of an array can be
    a. Extended
    b. Reduced
    c. Either a or b
    d. Neither a nor b

Answers
    1. b
    2. d



Session 4: Linked Lists

Learning Objectives


Define linked list

Implement linked list operations in your program

Linked lists

A linked list can be viewed as a group of items, each of which points to the item in its neighbourhood. An item in a linked list is known as a node. A node contains a data part and one or two pointer parts, which contain the addresses of the neighbouring nodes in the list. A linked list is a data structure that supports dynamic memory allocation, and hence it solves the fixed-size and contiguous-memory problems of using an array.

Types of linked lists

The different types of linked lists include:

Singly linked lists

Circular linked lists

Doubly linked lists

Simple/Singly Linked Lists

In singly linked lists, each node contains a data part and an address part. The address part of the node points to the next node in the list.

Node structure of a linked list

Data part | Link part

An example of a singly linked list can be pictured as shown below. Note that each node is pictured as a box, while each pointer is drawn as an arrow. A NULL pointer is used to mark the end of the list.



The head pointer points to the first node in a linked list.

If head is NULL, the linked list is empty.

A head pointer to a list

Possible Operations on a singly linked list

Insertion: Elements are added at any position in a linked list by linking nodes.

Deletion: Elements are deleted at any position in a linked list by altering the links of theadjacent nodes.

Searching or Iterating through the list to display items.

To insert or delete items at any position of the list, we need to traverse the list starting from its root until we reach the item that we are looking for.

Implementation of a singly linked list

Creating a linked list

A node in a linked list is usually a structure in C and can be declared as

struct Node
{
    int info;
    struct Node *next;
}; //end struct

A node is dynamically allocated as follows:

struct Node *p;
p = malloc(sizeof(struct Node));

For creating the list, the following code can be used (input_value holds the first value, which is read before the loop):

do
{
    Current_node = malloc(sizeof(struct Node));
    Current_node->info = input_value;
    Current_node->next = NULL;
    if (root_node == NULL)          // the first node in the list
        root_node = Current_node;
    else
        previous_node->next = Current_node;
    previous_node = Current_node;
    scanf("%d", &input_value);
} while (input_value != -999);



The above given code will create the list by taking values until the user inputs -999.

Inserting an element

After getting the position and the element to be inserted, the following code can be used to insert the element into the list:

if (position == 1 || root_node == NULL)
{
    Current_node->next = root_node;
    root_node = Current_node;
}
else
{
    counter = 2;
    temp_node = root_node;
    // Assumed condition: stop at the node just before the requested position
    while ((counter < position) && (temp_node->next != NULL))
    {
        temp_node = temp_node->next;
        counter++;
    }
    Current_node->next = temp_node->next;
    temp_node->next = Current_node;
}

The following figure illustrates how a node is inserted at an intermediate position in the list.

The following figure illustrates how a node is inserted at the beginning of the list.

To insert a node between two nodes



Deleting an element

After getting the element to be removed, the following code can be used to remove that element.

temp_node = root_node;
if (root_node == NULL)
    return;                        // empty list: nothing to delete
if (temp_node->info == input_element)
{
    root_node = root_node->next;   // the first node is the one to delete
    free(temp_node);
    return;
}
// Stop at the node just before the one that holds input_element
while (temp_node->next != NULL && temp_node->next->info != input_element)
    temp_node = temp_node->next;
if (temp_node->next != NULL)
{
    delete_node = temp_node->next;
    temp_node->next = delete_node->next;
    free(delete_node);
}

The following figures illustrate the deletion of an intermediate node and the deletion of the first node from the list.

To insert a node at the beginning of a linked list



To display the elements of the list

temp_node = root_node;

while(temp_node != NULL)

{

printf("%d\t", temp_node->info);

temp_node = temp_node->next;

}
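The insertion and deletion fragments for a singly linked list can be collected into functions that compile on their own. This is a sketch under stated assumptions: the function names list_insert, list_delete and list_free are hypothetical, positions are 1-based as in the fragments, and list_delete uses a pointer to the link being examined (a different technique from the handout's temp_node walk) so that the first node needs no special case.

```c
#include <stdbool.h>
#include <stdlib.h>

struct Node {
    int info;
    struct Node *next;
};

/* Inserts value at the 1-based position; a position past the end appends. */
bool list_insert(struct Node **root, int position, int value)
{
    struct Node *node = malloc(sizeof *node);
    if (node == NULL)
        return false;
    node->info = value;
    if (position <= 1 || *root == NULL) {
        node->next = *root;            /* new first node */
        *root = node;
        return true;
    }
    struct Node *temp = *root;
    for (int counter = 2; counter < position && temp->next != NULL; counter++)
        temp = temp->next;             /* walk to the node before the position */
    node->next = temp->next;
    temp->next = node;
    return true;
}

/* Removes the first node holding value; returns false if it is not found. */
bool list_delete(struct Node **root, int value)
{
    struct Node **link = root;         /* the link that may point at the node */
    while (*link != NULL && (*link)->info != value)
        link = &(*link)->next;
    if (*link == NULL)
        return false;
    struct Node *victim = *link;
    *link = victim->next;              /* bypass the node, then release it */
    free(victim);
    return true;
}

/* Frees every node and leaves the list empty. */
void list_free(struct Node **root)
{
    while (*root != NULL) {
        struct Node *next = (*root)->next;
        free(*root);
        *root = next;
    }
}
```

Inserting 10 at position 1, 30 at position 2 and then 20 at position 2 produces the list 10, 20, 30.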

The following figure illustrates the above piece of code.

Deleting an intermediate node from a linked list

Deleting the first node

The effect of the assignment temp_node = temp_node->next



Efficiency and advantages of Linked Lists

Although searching a linked list requires the same number of comparisons as searching an array, the advantage of a linked list is that no items need to be moved after insertion or deletion.

As opposed to the fixed size of arrays, linked lists use exactly as much memory as is needed.

Individual nodes need not be contiguous in memory.

Doubly Linked List

A more sophisticated kind of linked list is a doubly-linked list or a two-way linked list. In a doubly linked list, each node has two links: one pointing to the previous node and one pointing to the next node.

Node structure

Previous Link Data Next Link

An example of a doubly linked list

Implementation of a doubly linked list

Adding an element to the list

To add the first node

first_node->next = NULL;
first_node->data = input_element;
first_node->prev = NULL;

To add a node at the position specified

temp_node = first_node;
// Assumed bound: stop at the node just before the requested 1-based position
for (counter = 0; counter < position - 2; counter++)
{
    temp_node = temp_node->next;
}
new_node->next = temp_node->next;
new_node->prev = temp_node;
if (temp_node->next != NULL)
    temp_node->next->prev = new_node;
temp_node->next = new_node;



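Deleting a particular element from the list

temp_node = first_node;
if (temp_node->data == input_element)
{
    first_node = first_node->next;     // the first node is the one to delete
    if (first_node != NULL)
        first_node->prev = NULL;
    free(temp_node);
}
else
{
    // Stop at the node just before the one that holds input_element
    while (temp_node->next != NULL && temp_node->next->data != input_element)
        temp_node = temp_node->next;
    if (temp_node->next != NULL)       // element found
    {
        delete_node = temp_node->next;
        temp_node->next = delete_node->next;
        if (delete_node->next != NULL)
            delete_node->next->prev = temp_node;
        free(delete_node);
    }
}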
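The doubly linked list operations can also be written as self-contained functions. This is a minimal sketch under stated assumptions: the names DNode, dlist_insert_after and dlist_delete are hypothetical, and insertion takes a pointer to the node after which the new node goes (NULL meaning the front of the list) instead of a numeric position.

```c
#include <stdbool.h>
#include <stdlib.h>

struct DNode {
    int data;
    struct DNode *prev;
    struct DNode *next;
};

/* Inserts value after the node `after`; if after is NULL, inserts at the front. */
bool dlist_insert_after(struct DNode **first, struct DNode *after, int value)
{
    struct DNode *node = malloc(sizeof *node);
    if (node == NULL)
        return false;
    node->data = value;
    if (after == NULL) {
        node->prev = NULL;
        node->next = *first;
        if (*first != NULL)
            (*first)->prev = node;
        *first = node;
    } else {
        node->prev = after;
        node->next = after->next;
        if (after->next != NULL)
            after->next->prev = node;   /* fix the back link of the old successor */
        after->next = node;
    }
    return true;
}

/* Removes the first node holding value; returns false if it is not found. */
bool dlist_delete(struct DNode **first, int value)
{
    struct DNode *node = *first;
    while (node != NULL && node->data != value)
        node = node->next;
    if (node == NULL)
        return false;
    if (node->prev != NULL)
        node->prev->next = node->next;
    else
        *first = node->next;            /* the first node is being removed */
    if (node->next != NULL)
        node->next->prev = node->prev;
    free(node);
    return true;
}
```

Because every node knows its predecessor, deletion does not need to keep a trailing pointer to the previous node while searching.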

Circular Linked Lists

In a circularly-linked list, the first and final nodes are linked together. In another words, circularly-linked lists can be seen as having no beginning or end. To traverse a circular linked list, begin atany node and follow the list in either direction until you return to the original node. This type of listis most useful in cases where you have one object in a list and wish to see all other objects in thelist.

The pointer pointing to the whole list is usually called the end pointer .

Singly-circularly-linked listIn a singly-circularly-linked list, each node has one link, similar to an ordinary singly-linked list,except that the link of the last node points back to the first node. As in a singly-linked list, newnodes can only be efficiently inserted after a node we already have a reference to. For this reason,it's usual to retain a reference to only the last element in a singly-circularly-linked list, as this allowsquick insertion at the beginning, and also allows access to the first node through the last node'snext pointer. The following figure shows a singly circularly linked list.

Doubly-circularly-linked list

In a doubly-circularly-linked list, each node has two links, similar to a doubly-linked list, except that the previous link of the first node points to the last node and the next link of the last node points to the first node. As in doubly-linked lists, insertions and removals can be done at any point with access to any nearby node.

[Figure: a singly circularly linked list holding 10, 20, 30, 40]



The following figure illustrates a doubly circularly linked list

Circularly-linked list vs. linearly-linked list

Circularly linked lists are useful to traverse an entire list starting at any point. In a linear linked list, it is required to know the head pointer to traverse the entire list. The linear linked list cannot be traversed completely with the help of an intermediate pointer.

Access to any element in a doubly circularly linked list is much easier than in a linearly linked list since the particular element can be approached in two directions. For example, to access an element present in the fourth node of a circularly linked list having five elements, it is enough to start from the last node and traverse the list in the reverse direction to get the value in the fourth node.

Implementation of a circular linked list

Creating the list

    while (input_element != -999)
    {
        new_node = (struct node *) malloc(sizeof(struct node));
        new_node->info = input_element;
        if (root_node == NULL)
            root_node = new_node;               /* first node becomes the root */
        else
            (*last_node)->next = new_node;      /* append after the current last node */
        (*last_node) = new_node;
        scanf("%d", &input_element);
    }
    if (root_node != NULL)
        (*last_node)->next = root_node;         /* close the circle */
    return root_node;

[Figure: a doubly circularly linked list holding 10, 20, 30, 40]



Inserting elements into the list

After getting the position and value to be inserted, the following code can be followed:

    new_node = (struct node *) malloc(sizeof(struct node));
    new_node->info = input_element;
    if ((position == 1) || ((*root_node) == NULL))
    {
        /* insert at the front, or into an empty list */
        new_node->next = *root_node;
        *root_node = new_node;
        if ((*last_node) == NULL)
            *last_node = new_node;          /* single node: it is also the last */
        (*last_node)->next = *root_node;    /* keep the list circular */
    }
    else
    {
        temp_node = *root_node;
        counter = 2;
        /* stop at the node before the requested position, or at the last node */
        while ((counter < position) && (temp_node->next != (*root_node)))
        {
            temp_node = temp_node->next;
            ++counter;
        }
        if (temp_node->next == (*root_node))
            *last_node = new_node;          /* appended at the end */
        new_node->next = temp_node->next;
        temp_node->next = new_node;
    }

Deleting an element from the list

The following code deletes the element at the front of the list:

    if (*front_node != NULL)
    {
        temp_node = *front_node;
        printf("The item deleted is %d", temp_node->info);
        if (*front_node == *rear_node)
        {
            /* only one node: the list becomes empty */
            *front_node = *rear_node = NULL;
        }
        else
        {
            *front_node = (*front_node)->next;
            (*rear_node)->next = *front_node;   /* keep the list circular */
        }
        free(temp_node);
    }

Stacks and queues using pointers

One disadvantage of using an array to implement a stack or queue is the wasted space: most of the time, most of the space in the array is unused. A more elegant and economical implementation of a stack or queue uses a linked list.

Here is a sketch of a linked-list-based stack that holds 1, then 5, and then 20 at the bottom:

The list consists of three cells, each of which holds a data object and a link to another cell. A variable, top, holds the address of the first cell in the list.

An empty stack looks like this:

Top -> NULL

Implementing stacks as linked lists provides flexibility in the number of nodes, since a linked list is a dynamic data structure. The stack can grow or shrink as the program demands.

Algorithm to implement stack operations using pointers:

Push

    node = (struct stack *) malloc(sizeof(struct stack));
    printf("\n\n Enter the data ");
    scanf("%d", &node->data);
    node->link = top;
    top = node;

Pop

    if (top == NULL)
        return(1);      /* error code: the stack is empty */
    else
    {
        node = top;
        printf("\n \n Item deleted is %d ", top->data);
        top = top->link;
        free(node);     /* release the removed cell */
    }

[Figure: Top -> 1 -> 5 -> 20 -> NULL]

Display

    i = top;
    if (top == NULL)
        return(1);      /* error code: the stack is empty */
    else
    {
        printf(" \n\n ELEMENTS ARE : \n");
        while (i != NULL)
        {
            printf("%d\n\n", i->data);
            i = i->link;
        }
    }
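As a rough illustration, the push and pop fragments above can be wrapped into functions that take the value as a parameter instead of reading it with scanf. The function signatures are assumptions made for this sketch; `struct stack`, `data` and `link` follow the fragments.

```c
#include <assert.h>
#include <stdlib.h>

struct stack {
    int data;
    struct stack *link;
};

/* Push value on the stack; returns the new top. */
struct stack *push(struct stack *top, int value)
{
    struct stack *node = malloc(sizeof *node);
    assert(node != NULL);
    node->data = value;
    node->link = top;       /* the old top becomes the second cell */
    return node;
}

/* Pop into *value; returns 0 on success and 1 if the stack is empty. */
int pop(struct stack **top, int *value)
{
    struct stack *node = *top;
    if (node == NULL)
        return 1;           /* error code, as in the fragment above */
    *value = node->data;
    *top = node->link;
    free(node);
    return 0;
}
```

Pushing 20, then 5, then 1 builds the stack described earlier (1 on top, 20 at the bottom); three pops return 1, 5 and 20 in that order.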

Implementation of queues using lists is very similar to the implementation of stacks, except that in this case items join the queue at the back and leave at the front. If the queue is represented by the list [5, 2], adding a new item 3 will give the list [5, 2, 3]. In other words, new items are added to the end of the list. Removing an item from the queue will be done from the front.

A pictorial representation of a queue being implemented as a linked list is given below. The variable rear points to the last item in the queue.

[Figure: Front -> 5 -> 2 -> 3 -> NULL, with Rear pointing to 3]

Algorithm to represent queue operations using pointers

Inserting an element

    new_element->link = NULL;
    if (front == NULL)
        front = new_element;        /* the queue was empty */
    else
        rear->link = new_element;
    rear = new_element;

Deleting an element

    if (front != NULL)
    {
        temp = front;
        front = front->link;
        if (front == NULL)
            rear = NULL;            /* the queue is now empty */
        free(temp);
    }
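The two queue operations can be sketched as functions over a small record that holds both pointers. The names `struct qnode`, `struct queue`, `enqueue` and `dequeue` are hypothetical; `front`, `rear` and `link` follow the fragments above.

```c
#include <assert.h>
#include <stdlib.h>

struct qnode {
    int data;
    struct qnode *link;
};

/* Hypothetical wrapper holding both ends of the queue. */
struct queue {
    struct qnode *front;
    struct qnode *rear;
};

/* Add value at the rear of the queue. */
void enqueue(struct queue *q, int value)
{
    struct qnode *n = malloc(sizeof *n);
    assert(n != NULL);
    n->data = value;
    n->link = NULL;
    if (q->front == NULL)
        q->front = n;           /* the queue was empty */
    else
        q->rear->link = n;
    q->rear = n;
}

/* Remove from the front into *value; returns 0 on success, 1 if empty. */
int dequeue(struct queue *q, int *value)
{
    struct qnode *temp = q->front;
    if (temp == NULL)
        return 1;
    *value = temp->data;
    q->front = temp->link;
    if (q->front == NULL)
        q->rear = NULL;         /* the queue is now empty */
    free(temp);
    return 0;
}
```

Starting from an empty queue, enqueueing 5, 2 and 3 gives the list [5, 2, 3] used in the text, and the items leave in the same order.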


Summary

A linked list is a collection of elements called nodes, each of which contains a data portion and a pointer to the node following that one in the linear ordering of the list.

A singly linked list is a dynamic data structure which can grow and shrink depending upon the operations made. It has a single pointer which points to the successive node in the list.

A doubly linked list is one in which each node carries two links, which help in accessing both the successor node and the predecessor node from a given node position. It provides bi-directional traversing.

A circular linked list is one which has no end, i.e. the link field of the last node does not point to NULL; rather, it points back to the beginning of the linked list.

Stacks and queues can be implemented with less wasted space using pointers rather than arrays.


Test your Understanding

1. The last node of a linear linked list ______.
a. Has the value null
b. Has a next reference whose value is null
c. Has a next reference which references the first node of the list
d. Cannot store any data

2. To delete a node N from a linear linked list, you will need to ______.
a. Set the link in the node that precedes N to the link in the node that follows N
b. Set the link in the node that precedes N to the link in N
c. Set the link in the node that follows N to the link in the node that precedes N
d. Set the link in N to the link in the node that follows N

3. Write a function that removes all duplicate elements from a linear linked list.

4. Write a function to print the elements in reverse order of a singly linked list.

5. Write a function to find the largest element in a circular linked list.

Answers
1. b
2. b



Session 6: Sorting and Searching

Learning Objectives


Explain the concepts of sorting and searching

List the advantages of each technique

List the limitations of each technique

Sorting

Sorting refers to ordering data in an increasing or decreasing fashion according to some linear relationship among the data items.

Sorting can be done on names, numbers and records. For example, it is relatively easy to look up the phone number of a friend from a telephone directory because the names in the phone book have been sorted into alphabetical order. This example clearly illustrates one of the main reasons that sorting large quantities of information is desirable. That is, sorting greatly improves the efficiency of searching. If we were to open a phone book and find that the names were not presented in any logical order, it would take an incredibly long time to look up someone's phone number.

Sorting can be performed using several methods:

Selection Sort.
In this method, the successive elements are selected in order and are placed in their proper sorted positions.

Insertion Sort.
In this method, sorting is done by inserting elements into an existing sorted list. Initially, the sorted list has only one element. Other elements are gradually added into the list in the proper position.

Bubble Sort.
In this method, the entire file is passed through several times. Each pass compares each element with its successor and swaps them if they are out of order.

Merge Sort.
In this method, the elements are divided into partitions until each partition has sorted elements. Then, these partitions are merged and the elements are properly positioned to get a fully sorted list.



Quick Sort.
In this method, an element called the pivot is identified and fixed in its place by moving all the elements less than it to its left and all the elements greater than it to its right.

Radix Sort.
In this method, sorting is done based on the place values of the numbers. Sorting is done on the less-significant digits first. When all the numbers are sorted on a more significant digit, numbers that have the same digit in that position but different digits in a less-significant position are already sorted on the less-significant position.

Heap Sort.
In this method, the file to be sorted is interpreted as a binary tree. An array, which is a sequential representation of a binary tree, is used to implement the heap sort.

In this chapter, focus is given to bubble sort, quick sort and heap sort.

The basic premise behind sorting an array is that its elements start out in some random order and need to be arranged from lowest to highest.

It is easy to see that the list

1, 5, 6, 19, 23, 45, 67, 98, 124, 401

is sorted, whereas the list

4, 1, 90, 34, 100, 45, 23, 82, 11, 0, 600, 345

is not. The property that makes the second one "not sorted" is that there are adjacent elements that are out of order. The first item is greater than the second instead of less, and likewise the third is greater than the fourth and so on. Once this observation is made, it is not very hard to devise a sort that proceeds by examining adjacent elements to see if they are in order, and swapping them if they are not.

Bubble Sort

This sorting technique is so named because its logic resembles bubbles in water. When a bubble is formed it is small at the bottom, and as it moves up it becomes bigger and bigger, i.e. bubbles are in ascending order of their size from the bottom to the top. This sorting method proceeds by scanning through the elements one pair at a time, and swapping any adjacent pairs it finds to be out of order.



Example 6.1

Input sequence: 34 8 64 51 32 21

Iteration   Sequence after the iteration   Number of swaps
----------------------------------------------------------
1           8 34 51 32 21 64               4
2           8 34 32 21 51 64               2
3           8 32 21 34 51 64               2
4           8 21 32 34 51 64               1
5           8 21 32 34 51 64               0
6           8 21 32 34 51 64               0

Each pass consists of comparing each element in the file with its successor (x[i] > x[i+1]) and swapping the two elements if they are not in proper order. After each pass i, the largest remaining element x[n-(i-1)] (counting positions from 1) is in its proper position within the sorted array.

Bubble Sort - Algorithm

    void bubble(int x[], int n)
    {
        int hold, j, pass;
        int switched = TRUE;
        for (pass = 0; pass < n - 1 && switched == TRUE; pass++)
        {
            switched = FALSE;
            for (j = 0; j < n - pass - 1; j++)
                if (x[j] > x[j+1])
                {
                    switched = TRUE;    /* swap x[j], x[j+1] */
                    hold = x[j];
                    x[j] = x[j+1];
                    x[j+1] = hold;
                }
        }   /* it stops if there is no swap in the pass */
    }

In the first pass, n-1 adjacent pairs have to be scanned, and the largest item moves to the last position. On the second pass, the second largest item will move to its correct position, and on the third pass (stopping at item n-3) the third largest will be in place. It is this gradual filtration, or bubbling of the larger items to the top end, that gives this sorting technique its name.



There are two ways in which the sort can terminate with everything in the right order. It could complete by reaching the n-1st pass and placing the second smallest item in its correct position.

Alternatively, it could find on some earlier pass that nothing needs to be swapped. That is, all adjacent pairs are already in the correct order. In this case, there is no need to go on to subsequent passes, for the sort is complete already. If the list started in sorted order, this would happen on the very first pass. If it started in reverse order, it would not happen until the last one.
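The two ways of terminating can be observed by counting passes. The sketch below repeats the bubble sort logic with a hypothetical function `bubble_passes` that returns how many passes were actually made; the name and the return value are assumptions added for illustration.

```c
#include <assert.h>

/* Sorts x[0..n-1] in ascending order and returns the number of passes made. */
int bubble_passes(int x[], int n)
{
    int pass, j, hold;
    int switched = 1;
    int passes = 0;
    assert(n >= 0);
    for (pass = 0; pass < n - 1 && switched; pass++) {
        switched = 0;
        passes++;
        for (j = 0; j < n - pass - 1; j++) {
            if (x[j] > x[j + 1]) {
                hold = x[j];            /* swap the out-of-order pair */
                x[j] = x[j + 1];
                x[j + 1] = hold;
                switched = 1;
            }
        }
    }
    return passes;
}
```

For a list that is already sorted the function makes a single pass, while a list of n items in reverse order needs all n-1 passes.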

Quick Sort

In this sort an element called the pivot is identified and fixed in its place by moving all the elements less than it to its left and all the elements greater than it to its right. Since it partitions the element sequence into left, pivot and right, it is referred to as sorting by partitioning. Instead of moving a single element towards its place, a pair of elements is moved in a single swap. This makes the sorting quick. After the partitioning, each of the sub-lists is sorted, which will cause the entire array to be sorted.

    void quickSort(int first, int last)
    {
        int mid;
        if (first < last)   /* if the part being sorted has more than one element */
        {
            mid = quickPartition(first, last);
            if (mid - 1 > first)
                quickSort(first, mid - 1);
            if (mid + 1 < last)
                quickSort(mid + 1, last);
        }
        return;
    }

The hardest part of quick sort is the partitioning of elements. The algorithm looks at the first element of the array (called the "pivot"). It will put all of the elements which are less than the pivot in the lower portion of the array and the elements higher than the pivot in the upper portion of the array. When that is complete, it can put the pivot between those two sections and quick sort will be able to sort the two sections separately.

The details of the partitioning algorithm depend on counters which are moving from the ends of the array toward the center. Each will move until it finds a value which is in the wrong section of the array (larger than the pivot and in the lower portion, or less than the pivot and in the upper portion). Those entries will be swapped to put them into their appropriate sections and the counters will continue searching for out of place values. When the two counters cross, partitioning is complete and the pivot can be swapped to its proper place between the two sections.



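    int quickPartition(int first, int last)
    {
        int mid_val = data[first];      /* This is the pivot value */
        int i = first + 1;
        int j = last;
        while (i <= j)
        {
            /* move i right past values that belong in the lower portion */
            while ((i <= j) && (data[i] <= mid_val))
                i++;
            /* move j left past values that belong in the upper portion */
            while ((i <= j) && (data[j] > mid_val))
                j--;
            if (i < j)
                swap(i, j);
            else
                i++;
        }
        if (j != first)
            swap(j, first);
        return j;
    }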

Example 6.2

Input sequence: 34, 8, 64, 51, 32, 21

Square brackets are used to demarcate sub files yet to be sorted.

R1  R2  R3  R4  R5  R6          m   n
[34  8  64  51  32  21]         1   6
[32  8  21] 34 [51  64]         1   3
[21  8] 32  34 [51  64]         1   2
[8] 21  32  34 [51  64]         1   1
 8  21  32  34 [51  64]         5   6
 8  21  32  34  51 [64]         6   6
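The first rows of the example can be reproduced with a self-contained sketch. Unlike the listings above, which work on a global array, this version passes the array as a parameter; the names `swap_at`, `partition` and `quick_sort` are assumptions made for the illustration, and indices count from 0.

```c
#include <assert.h>

/* Exchange data[i] and data[j]. */
static void swap_at(int data[], int i, int j)
{
    int t = data[i];
    data[i] = data[j];
    data[j] = t;
}

/* Partition data[first..last] around the pivot data[first];
   returns the final index of the pivot. */
static int partition(int data[], int first, int last)
{
    int pivot = data[first];
    int i = first + 1;
    int j = last;
    assert(first <= last);
    while (i <= j) {
        while (i <= j && data[i] <= pivot)
            i++;                    /* skip values already in the lower part */
        while (i <= j && data[j] > pivot)
            j--;                    /* skip values already in the upper part */
        if (i < j)
            swap_at(data, i, j);    /* move a misplaced pair in one swap */
    }
    swap_at(data, first, j);        /* place the pivot between the two parts */
    return j;
}

/* Sort data[first..last] in ascending order. */
void quick_sort(int data[], int first, int last)
{
    if (first < last) {
        int mid = partition(data, first, last);
        quick_sort(data, first, mid - 1);
        quick_sort(data, mid + 1, last);
    }
}
```

Partitioning 34 8 64 51 32 21 with this sketch yields 32 8 21 34 51 64 and places the pivot 34 at index 3, which corresponds to the second row of the table.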

Heap Sort

In heap sort the file to be sorted is interpreted as a binary tree. The sorting technique is implemented using an array, which is a sequential representation of a binary tree. The positioning of a node is given as follows.

For a node at position i the parent is at position i/2, the left child is at position 2i and the right child is at position 2i+1 (provided 2i and 2i+1 do not exceed n).



Example 6.3

The list of numbers 34, 8, 64, 51, 32, 21 is arranged in an array initially as in the Input File of the example given below. Here the value of n is 6, hence the last parent is at index 6/2 = 3. Parent 64 (index 3) is compared with its largest child 21; since 64 > 21 it is retained in its position. Parent 8 (index 2) is compared with its largest child 51 and the two are interchanged since 8 < 51. Now the root 34 (index 1) is compared with its largest child 64 and the two are interchanged since 34 < 64; the result is shown in the Initial Heap.

[Figures: Input File; Initial Heap]

In fig 6.3(a) given below, the largest number 64, which was brought to the root, is interchanged with the last element 21 (index 6) in the tree. For easy identification of the arranged elements, the edge to its parent is removed. In fig 6.3(b) given below, the same procedure is followed to bring 51 to the root, and it is interchanged with the element at index 5. The same step is followed in fig 6.3(c) and fig 6.3(d) to get a sorted file as given in fig 6.3(e).

[Figures: 6.3 (a); 6.3 (b)]

[Figures: 6.3 (c); 6.3 (d); 6.3 (e) Sorted File]

Algorithm 6.3.1: Heap Sort implementation

heap is an algorithm which sorts the given set of numbers using the heap sort technique, where n is the number of elements and a is the array representation of the elements in the input binary tree. The heap algorithm 6.3.1 calls the adjust algorithm 6.3.2 each time heaping is needed.

    void heap(int a[], int n)
    {
        int i, t;
        /* build the initial heap, starting from the last parent */
        for (i = n/2; i >= 1; i--)
        {
            adjust(a, i, n);
        }
        for (i = n; i >= 2; i--)
        {
            /* exchange the root (largest) with the last element of the heap */
            t = a[i];
            a[i] = a[1];
            a[1] = t;
            adjust(a, 1, i-1);
        }
    }


Algorithm 6.3.2

    adjust(int x[10], int i, int n)
    {
        int item, j;
        j = 2 * i;
        item = x[i];

while (j
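The listing of adjust is cut off at this point. The sketch below shows one common way such a sift-down routine is completed; it is an assumption based on the standard technique, not a recovery of the original listing. It keeps the 1-based indexing used in the text (element 0 of the array is unused), and `heap_sort` is a hypothetical name for the driver.

```c
#include <assert.h>

/* Sift x[i] down within the 1-based heap x[1..n] until both children
   are no larger than it. */
void adjust(int x[], int i, int n)
{
    int item = x[i];
    int j = 2 * i;                  /* left child of i */
    assert(i >= 1);
    while (j <= n) {
        if (j < n && x[j] < x[j + 1])
            j++;                    /* pick the larger of the two children */
        if (item >= x[j])
            break;                  /* item is in its place */
        x[j / 2] = x[j];            /* move the larger child up */
        j = 2 * j;
    }
    x[j / 2] = item;
}

/* Sort x[1..n] in ascending order. */
void heap_sort(int a[], int n)
{
    int i, t;
    for (i = n / 2; i >= 1; i--)    /* build the initial heap */
        adjust(a, i, n);
    for (i = n; i >= 2; i--) {
        t = a[i];                   /* move the current largest to the end */
        a[i] = a[1];
        a[1] = t;
        adjust(a, 1, i - 1);        /* restore the heap on the rest */
    }
}
```

Applied to 34, 8, 64, 51, 32, 21, the three initial calls to adjust produce the heap 64, 51, 34, 8, 32, 21, matching the steps described in example 6.3.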



Algorithm : Linear search implementation

    #include <stdbool.h>

    bool linear_search(int *list, int size, int key, int *rec)
    {
        /* Basic linear search; on success the matching element is copied to *rec */
        bool found = false;
        int i;
        for (i = 0; i < size; i++)
        {
            if (key == list[i])
                break;
        }
        if (i < size)
        {
            found = true;
            *rec = list[i];
        }
        return found;
    }

The code searches for the element through a loop running from index 0 to size - 1. The loop can terminate in one of two ways. If the index variable i reaches the end of the list, the loop condition fails. If the current item in the list matches the key, the loop is terminated early with a break statement. The algorithm then tests the index variable to see whether it is less than size (the loop was terminated early and the item was found) or not (the item was not found).

Example 6.4

Assume the element 45 is searched from a sequence of sorted elements 12, 18, 25, 36, 45, 48, 50. The linear search starts from the first element 12. Since this is not the value to be searched (45), the next element 18 is compared, and it is also not 45. In this way all the elements before 45 are compared. When the position is 5, the element 45 is compared with the search value and is equal; hence the element is found and its position is 5.

List                    i   Result of comparison
12 18 25 36 45 48 50    1   12 != 45 : false
12 18 25 36 45 48 50    2   18 != 45 : false
12 18 25 36 45 48 50    3   25 != 45 : false
12 18 25 36 45 48 50    4   36 != 45 : false
12 18 25 36 45 48 50    5   45 = 45 : true



Binary Search

In a linear search the search is done over the entire list even if the element to be searched is not available. Some of our improvements work to minimize the cost of traversing the whole data set, but those improvements only cover up what is really a problem with the algorithm. By thinking of the data in a different way, we can make speed improvements that are much better than anything linear search can guarantee. Consider a list in sorted order. It would work to search from the beginning until an item is found or the end is reached, but it makes more sense to remove as much of the working data set as possible so that the item is found more quickly. If we started at the middle of the list we could determine which half the item is in (because the list is sorted). This effectively divides the working range in half with a single test. This in turn reduces the time complexity.

Algorithm:

    #include <stdbool.h>

    bool Binary_Search(int *list, int size, int key, int *rec)
    {
        /* list must be sorted in ascending order;
           on success the matching element is copied to *rec */
        bool found = false;
        int low = 0, high = size - 1;
        while (high >= low)
        {
            int mid = (low + high) / 2;
            if (key < list[mid])
                high = mid - 1;
            else if (key > list[mid])
                low = mid + 1;
            else
            {
                found = true;
                *rec = list[mid];
                break;
            }
        }
        return found;
    }



Example 6.5

Binary search is applied to the data in example 6.4.

The active part of the search is the range from position i to position j.

List                    i   j   mid   Result of comparison
12 18 25 36 45 48 50    1   7   4     45 > 36 : Right part
12 18 25 36 45 48 50    5   7   6     45 < 48 : Left part
12 18 25 36 45 48 50    5   6   5     45 = 45 : Found

Method of search | Advantages | Disadvantages
Linear | Simple. Elements need not be in order. | Less efficient, since its time complexity O(n) is higher than that of binary search.
Binary | More efficient, since its time complexity O(log n) is lower than that of linear search. | Not as simple as linear search. Elements must be in order.

Summary

Sorting is the process of arranging elements in either ascending or descending order. This makes searching faster.

Bubble sorting is a sorting in which each element is compared with its adjacent element and the largest value is moved to the last position.

Quick sorting is a sorting by partitioning. Instead of a single element, a pair of elements is arranged in one swap.

Heap sorting is a sorting by heaping the elements in a tree. It works with the same complexity in its worst, best and average cases.

In Linear search all the elements preceding the search element must be searched.

In Binary search the middle element is compared and only the left or the right part is checked instead of all elements.




Test your Understanding

1. Which of the following sorts works with the same complexity in all cases?
a. Heap sort
b. Quick sort
c. Merge sort
d. Bubble sort

2. Quick sort works better if the input elements are in
a. Sorted order
b. Jumbled order
c. Reverse order
d. All the above

Answers
1. a
2. b



Session 8: Trees

Learning Objectives

After completing this chapter, you will be able to

Describe a tree

Explain how a tree can be represented internally

Describe how a tree can be traversed

Overview:

The data structures discussed in the previous sessions, like lists, stacks and queues, are all linear data structures. A tree is one of the several types of non-linear data structure.

A tree is a collection of nodes represented in a hierarchical fashion, with a specially designated node called the root. Except for the root, all other nodes have a parent at the next higher level of the hierarchy.

The parent node of a particular node is the node directly above it in the hierarchy. A node can have exactly one parent, i.e. a node can be attached to exactly one node in its higher hierarchy.

Example 8.1

[Figure: a tree with root A; B, C and D are the children of A; E, F, G and H are the leaves]

The following table depicts some of the important terminologies related to a general tree structure.

Term | Description | Example
Node | An item or single element represented in a tree | A, B, C, ..., H
Root | Node that does not have any ancestors (parent or grandparent) | A
Sub tree | Internal nodes in a tree which have both an ancestor (parent) and a descendant (child) | B, C, D
Leaf | External nodes that do not have any descendant (child) | E, F, G, H
Edge | The line that depicts the connectivity between two nodes | (A-B), (A-C)
Path | Sequence of connected nodes | A-B-E for E from root
Length | Number of nodes involved in the path | 2 for E from B
Height | Length of the longest path from the root | 3
Depth | Length of the path to that node from the root | 2 for D
Degree of a node | Number of children connected from that node | 3 for A, 1 for B and D, 2 for C and 0 for leaves
Degree of a tree | Degree of the node which has maximum degree | 3 (since A has maximum degree)

Some applications of trees are:

representing family genealogy

as the underlying structure in decision-making algorithms

to represent priority queues (a special kind of tree called a heap)

to provide fast access to information in a database (a special kind of tree called a b-tree)

Binary Tree

A binary tree is a finite set of nodes which is either empty, or consists of a root and two disjoint binary trees, called the left and right sub-trees. In other words, it can be defined as a tree in which every node has a maximum degree of 2, i.e. a node can have at most two children.

A binary tree differs from a general tree in the following aspects:

A tree must have at least one node but a binary tree may be empty.

A tree may have any number of sub-trees but a binary tree can have at most two.



Example 8.2

[Figure: binary tree with root A at level 1, nodes B and C at level 2, and nodes D, F and G at level 3]

Full Binary tree: A binary tree in which all its leaf nodes are in the same level is called a full binary tree.

Example 8.3

[Figure: full binary tree with root A, nodes B and C at level 2, and leaves D, E, F and G at level 3]

Complete Binary tree: A binary tree whose array representation is contiguous, without any null pointers in between, is a complete binary tree.

Example 8.4

[Figure: complete binary tree with root A, children B and C, and D and E as the children of B]

Array representation of the above tree is:

Index:  0  1  2  3  4
Node:   A  B  C  D  E

In a binary tree the maximum number of nodes at level i (the level of the root node is 1) is equal to 2^(i-1), and the maximum number of nodes up to level i is equal to 2^i - 1.

Example 8.5

In example 8.2:
Maximum number of nodes at level 2 is 2^(2-1) = 2
Maximum number of nodes at level 3 is 2^(3-1) = 4
Maximum number of nodes till level 2 is 2^2 - 1 = 3

Skewed binary tree

A binary tree is a skewed binary tree if all its internal nodes have only a left child (skewed left) or only a right child (skewed right).

Example 8.6

[Figure: a skewed-left tree and a skewed-right tree, each drawn with the nodes A, B and D]

Tree Representation

A binary tree can be represented in two ways:
1. Array representation
2. Linked list representation

Array representation

The binary tree can be represented in an array, as discussed in the heap sort.

Linked list representation

Since a binary-tree node never has more than two children, a node can be represented with 3 fields: one field for the data in the node and the remaining two fields for the two child pointers.

Left child | Data | Right child

Programming representation of a node is as follows.

    struct BinaryTreenode
    {
        struct BinaryTreenode *leftChild;
        char data;
        struct BinaryTreenode *rightChild;
    };

Many algorithms pertaining to tree structures involve a process in which each node of the tree is visited, or processed, exactly once. Such a process is called a traversal.


Tree Traversals

A tree can be traversed in three different ways:

Inorder traversal

Preorder traversal

Postorder traversal

In all the traversal types the order of the left and right sub trees is not changed, i.e. the left sub tree is always traversed before the right sub tree. The type of traversal is decided based on the position of the data.

In preorder traversal the data is traversed before its sub trees are traversed.

In postorder traversal the data is traversed after its sub trees are traversed.

In inorder traversal the data is traversed between its sub trees.

Simple steps in traversals

Preorder traversal
o Visit the root
o Traverse the left sub-tree in preorder
o Traverse the right sub-tree in preorder

Inorder traversal
o Traverse the left sub-tree in inorder
o Visit the root
o Traverse the right sub-tree in inorder

Postorder traversal
o Traverse the left sub-tree in postorder
o Traverse the right sub-tree in postorder
o Visit the root



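Example 8.7

[Figure: binary tree with root A; B and C are the children of A; D and E are the children of B; F and G are the children of C; H is the left child of F; I and J are the children of H]

Inorder traversal : D B E A I H J F C G
Preorder traversal : A B D E C F H I J G
Postorder traversal : D E B I J H F G C A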
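The three sequences can be checked with a short sketch. The node type `struct tnode`, the output-buffer convention and the helper `traversal_is` are assumptions made for this illustration; each traversal writes the visited labels into a character buffer instead of printing them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node type with a character label, as in the example tree. */
struct tnode {
    char data;
    struct tnode *left;
    struct tnode *right;
};

/* Each traversal appends the visited labels to out and returns the next free slot. */
char *inorder(const struct tnode *t, char *out)
{
    assert(out != NULL);
    if (t != NULL) {
        out = inorder(t->left, out);
        *out++ = t->data;               /* visit between the two sub-trees */
        out = inorder(t->right, out);
    }
    return out;
}

char *preorder(const struct tnode *t, char *out)
{
    assert(out != NULL);
    if (t != NULL) {
        *out++ = t->data;               /* visit before the sub-trees */
        out = preorder(t->left, out);
        out = preorder(t->right, out);
    }
    return out;
}

char *postorder(const struct tnode *t, char *out)
{
    assert(out != NULL);
    if (t != NULL) {
        out = postorder(t->left, out);
        out = postorder(t->right, out);
        *out++ = t->data;               /* visit after the sub-trees */
    }
    return out;
}

/* Runs one traversal and reports whether it produced the expected sequence.
   The buffer size limits this helper to trees of fewer than 64 nodes. */
int traversal_is(char *(*walk)(const struct tnode *, char *),
                 const struct tnode *root, const char *expected)
{
    char buf[64];
    *walk(root, buf) = '\0';
    return strcmp(buf, expected) == 0;
}
```

Only the position of the visit step differs between the three functions; the left sub-tree is always handled before the right one.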

Algorithms for the tree traversals

Inorder traversal

    void inorder(struct btreenode *sr)
    {
        if (sr != NULL)
        {
            inorder(sr->left);
            printf("%d\n", sr->data);
            inorder(sr->right);
        }
    }


Preorder traversal

    void preorder(struct btreenode *sr)
    {
        if (sr != NULL)
        {
            printf("%d\n", sr->data);
            preorder(sr->left);
            preorder(sr->right);
        }
    }

Postorder traversal

    void postorder(struct btreenode *sr)
    {
        if (sr != NULL)
        {
            postorder(sr->left);
            postorder(sr->right);
            printf("%d\n", sr->data);
        }
    }

Binary Search Tree (BST)

A BST is a binary tree which has the following properties.

All elements stored in the left subtree of a node whose value is K have values less than K.

All elements stored in the right subtree of a node whose value is K have values greater than or equal to K.

That is, a node's left child must have a key less than its parent, and a node's right child must have a key greater than or equal to its parent.

The left and right sub trees of a node are also binary search trees.



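Example 8.8

[Figure: BST with root 63; 47 and 71 are the children of 63; 6 and 54 are the children of 47; 67 and 84 are the children of 71; 79 and 91 are the children of 84]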

Operations that can be performed on a BST are:

Creation

Insertion

Deletion

Searching

Creation

The first element in the list is made the root of the tree. The elements following the first are placed in its left sub tree if they are less than the root and in its right sub tree if they are greater than the root. In other words, creation is a combination of search and insertion after the creation of the root node.

Searching

The search always starts from the root node. If the value to be searched is less than the node value, the left sub tree is searched. If the search value is greater than the node value, the right sub tree is searched. The search continues till the node is found or till it ends without any branch to proceed.

Insertion

Steps involved in inserting a node are:

Search for the node that has to be inserted (though it is not available) in the tree.

If the search ended at a node X, insert the new node as its left child if the new node is less than X; otherwise insert it as its right child.
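Creation, searching and insertion can be sketched together as follows. The names `struct bst`, `bst_insert` and `bst_search` are hypothetical; keys equal to a node's key go to the right, following the BST properties stated earlier.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical BST node holding an integer key. */
struct bst {
    int key;
    struct bst *left;
    struct bst *right;
};

/* Insert key below root; returns the root of the (possibly new) tree. */
struct bst *bst_insert(struct bst *root, int key)
{
    if (root == NULL) {                 /* the search ended: attach a new leaf */
        struct bst *n = malloc(sizeof *n);
        assert(n != NULL);
        n->key = key;
        n->left = NULL;
        n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = bst_insert(root->left, key);
    else
        root->right = bst_insert(root->right, key);
    return root;
}

/* Return the node holding key, or NULL if the search runs out of branches. */
const struct bst *bst_search(const struct bst *root, int key)
{
    while (root != NULL && root->key != key)
        root = (key < root->key) ? root->left : root->right;
    return root;
}
```

Creating a tree is then a sequence of insertions starting from an empty (NULL) root, so the first value inserted becomes the root.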


Example 8.9: Inserting 15 in BST

The dotted line represents the search and the dotted circle represents the newly added node.

[Figure: the BST of example 8.8 with 15 added as the right child of 6]

15 is greater than 6, hence it is joined as its right child.

Deletion

The node which has to be deleted is first searched from the root to find its position. The deletion operation is easier if the node which has to be deleted is a leaf node. The link from its parent is disconnected in order to delete that node.

If the node is a non leaf node, the deletion is carried out as below.

If the non leaf node has a single sub tree, then its child node takes its place.

If the non leaf node has both left and right sub trees, then either its inorder predecessor or its inorder successor takes its place (i.e. the greatest left descendant or the smallest right descendant).

Example 8.10: Deleting 71 from example 8.9

The dotted line represents the search and the dotted circle represents the node to be deleted.


The node 71 is replaced either by its greatest left descendant (67) or by its smallest right descendant (79).

[Figure: left, the tree with 71 replaced by its left descendant 67; right, the tree with 71 replaced by its right descendant 79]

Advantage of a BST

Searching for a node in a BST is faster, since at each node only the left or the right sub tree is searched, from the root till the node is found, instead of comparing all the nodes preceding it.

Disadvantage of a BST

The tree may become a skewed binary tree if the elements arrive in ascending order (skewed right) or in descending order (skewed left), which leads to more levels.


Summary

A tree is a collection of nodes arranged in hierarchical fashion.

A binary tree is a tree with 2 as its maximum degree.

A tree can be represented using either an array or a linked list.

A tree can be traversed in 3 ways.

A binary search tree is a binary tree in which all the left descendants of a node are less than the node and all its right descendants are greater than or equal to it.

Test your Understanding

1. A complete binary tree is a tree in which ----
a. All the leaf nodes are in the same level
b. All the parent nodes have exactly two children
c. The representation is contiguous without any null branch in between
d. None of the above

2. Binary search tree must be a ----
a. Complete binary tree
b. Full binary tree
c. Either a or b
d. Need not be a or b

Answers
1. c
2. d



Session 10: Balanced trees and hashing

Learning Objectives

After completing this chapter you will be able to

Define a balanced tree

Identify how a balanced tree can be constructed from a Binary tree

Define hashing

List the advantages and disadvantages of Hashing

Overview:

Balanced trees are classified into two categories

Height Balanced tree

Weight Balanced tree

AVL Tree

An AVL tree is a height balanced Binary Search Tree. The number of null branches is larger in a normal BST if the elements are almost in order; this leads to more levels and in turn needs more space. This problem is solved by balancing the height whenever a node is inserted into an AVL tree. Re-balancing is decided based on the balancing factor.

Balancing factor

The balancing factor of each node is calculated by finding the difference in height between its left and right sub trees.

Balancing factor of X = height of left sub tree of X - height of right sub tree of X

If the balancing factor of all the nodes in the tree is within the range of -1 and 1, then the tree is already in balanced form; otherwise balancing is needed.
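The definition can be sketched directly in code. The names `struct avl`, `height`, `balance_factor` and `is_balanced` are assumptions made for this illustration, and the height of an empty sub tree is taken to be 0. Recomputing heights on every call keeps the sketch short; practical AVL implementations usually store a height or balance field in each node instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical AVL node; only the links matter for the balance computation. */
struct avl {
    int key;
    struct avl *left;
    struct avl *right;
};

/* Height in levels: 0 for an empty tree, 1 for a single node. */
int height(const struct avl *t)
{
    int hl, hr;
    if (t == NULL)
        return 0;
    hl = height(t->left);
    hr = height(t->right);
    return 1 + (hl > hr ? hl : hr);
}

/* Balancing factor = height of left sub tree - height of right sub tree. */
int balance_factor(const struct avl *t)
{
    assert(t != NULL);
    return height(t->left) - height(t->right);
}

/* A tree is balanced when every node has a factor of -1, 0 or 1. */
int is_balanced(const struct avl *t)
{
    int bf;
    if (t == NULL)
        return 1;
    bf = balance_factor(t);
    return bf >= -1 && bf <= 1 && is_balanced(t->left) && is_balanced(t->right);
}
```

For a chain 50, 45, 30 linked through left children, the factor of 50 is 2, so the tree needs balancing; with 45 as the root and 30 and 50 as its children, every factor is 0.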

AVL Tree Rotations

As mentioned previously, an AVL Tree and the nodes it contains must meet strict balance requirements to maintain its O(log n) search capabilities. These balance restrictions are maintained using various rotation functions. Below is a diagrammatic overview of the four possible rotations that can be performed on an unbalanced AVL Tree, illustrating the before and after states of an AVL Tree requiring the rotation.



Example 10.1: LL Rotation

Example 10.2: RR Rotation



Example 10.3: LR Rotation

Example 10.4: RL Rotation
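The four rotations can be sketched as pointer manipulations in C. This is an illustrative sketch, not the handout's own code: the type avl_node and the function names are assumptions, and the naming follows the case being repaired (for example, LL means the left sub tree of the left child is too tall).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type for this sketch. */
struct avl_node {
    int data;
    struct avl_node *left, *right;
};

/* LL case: the left child of x becomes the new root of this sub tree. */
struct avl_node *rotate_ll(struct avl_node *x)
{
    assert(x != NULL && x->left != NULL);
    struct avl_node *y = x->left;
    x->left = y->right;     /* y's right sub tree moves under x */
    y->right = x;
    return y;
}

/* RR case: the right child of x becomes the new root of this sub tree. */
struct avl_node *rotate_rr(struct avl_node *x)
{
    assert(x != NULL && x->right != NULL);
    struct avl_node *y = x->right;
    x->right = y->left;     /* y's left sub tree moves under x */
    y->left = x;
    return y;
}

/* LR case: RR rotation on the left child, then LL rotation on x. */
struct avl_node *rotate_lr(struct avl_node *x)
{
    assert(x != NULL && x->left != NULL);
    x->left = rotate_rr(x->left);
    return rotate_ll(x);
}

/* RL case: LL rotation on the right child, then RR rotation on x. */
struct avl_node *rotate_rl(struct avl_node *x)
{
    assert(x != NULL && x->right != NULL);
    x->right = rotate_ll(x->right);
    return rotate_rr(x);
}
```

Each function returns the new root of the rotated sub tree, so the caller is expected to store the result back into the parent's link (or into the root pointer).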



Inserting in an AVL Tree

Nodes are initially inserted into AVL Trees in the same manner as an ordinary binary search tree (that is, they are always inserted as leaf nodes). After insertion, however, the insertion algorithm for an AVL Tree travels back along the path it took to find the point of insertion, and checks the balance at each node on the path. If a node is found that is unbalanced (that is, it has a balance factor of either -2 or +2), then a rotation is performed based on the inserted node's position relative to the node being examined (the unbalanced node).

NB. At most one rotation (single or double) will ever be required after an insert operation.
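The insertion procedure described above can be sketched recursively: the return from each recursive call is the "travel back along the path" step. This is a minimal sketch under assumptions, not the handout's implementation; the struct avl layout (with a cached height), the function names, and the choice to send duplicate keys to the right are all illustrative.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type for this sketch; height is cached in each node. */
struct avl {
    int data, height;
    struct avl *left, *right;
};

int avl_h(const struct avl *n) { return n ? n->height : 0; }
int avl_bf(const struct avl *n) { return avl_h(n->left) - avl_h(n->right); }

void avl_fix(struct avl *n)
{
    int l = avl_h(n->left), r = avl_h(n->right);
    n->height = 1 + (l > r ? l : r);
}

struct avl *avl_rot_right(struct avl *x)    /* repairs the LL case */
{
    struct avl *y = x->left;
    x->left = y->right;
    y->right = x;
    avl_fix(x);
    avl_fix(y);
    return y;
}

struct avl *avl_rot_left(struct avl *x)     /* repairs the RR case */
{
    struct avl *y = x->right;
    x->right = y->left;
    y->left = x;
    avl_fix(x);
    avl_fix(y);
    return y;
}

struct avl *avl_insert(struct avl *root, int data)
{
    if (root == NULL) {                     /* insert as a leaf, as in a BST */
        struct avl *n = malloc(sizeof *n);
        assert(n != NULL);                  /* sketch: abort on allocation failure */
        n->data = data;
        n->height = 1;
        n->left = n->right = NULL;
        return n;
    }
    if (data < root->data)
        root->left = avl_insert(root->left, data);
    else
        root->right = avl_insert(root->right, data);

    /* On the way back up: update the height and check the balance. */
    avl_fix(root);
    if (avl_bf(root) > 1) {
        if (data >= root->left->data)       /* LR case: first rotate the child */
            root->left = avl_rot_left(root->left);
        return avl_rot_right(root);         /* LL case (or second half of LR) */
    }
    if (avl_bf(root) < -1) {
        if (data < root->right->data)       /* RL case: first rotate the child */
            root->right = avl_rot_right(root->right);
        return avl_rot_left(root);          /* RR case (or second half of RL) */
    }
    return root;
}
```

Inserting 50, 45, 30, 55, 63, 53 in that order with this sketch ends with 50 at the root, 45 and 55 as its children, and 30, 53 and 63 as leaves.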

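Example 10.5: Constructing an AVL tree for the list of elements 50, 45, 30, 55, 63, 53

Each node is shown as data(balancing factor).

Insert 50, 45, 30. Node 50 has a balancing factor of 2, so an LL rotation is needed.

        50(2)
        /
     45(1)
      /
   30(0)

After the LL rotation, insert 55.

      45(-1)
      /    \
   30(0)  50(-1)
              \
             55(0)

Insert 63. Node 50 has a balancing factor of -2, so an RR rotation is needed.

      45(-2)
      /    \
   30(0)  50(-2)
              \
             55(-1)
                 \
                63(0)

After the RR rotation.

      45(-1)
      /    \
   30(0)  55(0)
          /   \
      50(0)  63(0)

Insert 53. Node 45 has a balancing factor of -2, so an RL rotation is needed.

      45(-2)
      /    \
   30(0)  55(1)
          /   \
     50(-1)  63(0)
         \
        53(0)

After the RL rotation.

          50(0)
         /     \
     45(1)     55(0)
      /        /   \
   30(0)   53(0)  63(0)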

Deletion in AVL tree

The deletion algorithm for AVL Trees is a little more complex, as there are several extra steps involved in the deletion of a node. If the node is not a leaf node (that is, it has at least one child), then the node must be swapped with either its in-order successor or predecessor (based on availability). Once the node has been swapped we can delete it (and have its parent pick up any children it may have; bear in mind that it will only ever have at most one child). If a deletion node was originally a leaf node, then it can simply be removed.

Now, as with the insertion algorithm, we traverse back up the path to the root node, checking the balance of all nodes along the path. If we encounter an unbalanced node we perform an appropriate rotation to balance the node.

NB. Unlike the insertion algorithm, more than one rotation may be required after a delete operation, so in some cases we will have to continue back up the tree after a rotation.
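The deletion steps above can be sketched in C as follows. This is a hedged illustration rather than the handout's algorithm verbatim: all names are hypothetical, keys are assumed to be distinct, a node with two children is replaced by its in-order successor, and a node with a single child is simply replaced by that child (in an AVL tree such a child is always a leaf, so the result matches the swap described above). Rebalancing happens at every node on the way back up, which allows more than one rotation per deletion.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type for this sketch; height is cached in each node. */
struct avl {
    int data, height;
    struct avl *left, *right;
};

int avl_h(const struct avl *n) { return n ? n->height : 0; }
int avl_bf(const struct avl *n) { return avl_h(n->left) - avl_h(n->right); }

void avl_fix(struct avl *n)
{
    int l = avl_h(n->left), r = avl_h(n->right);
    n->height = 1 + (l > r ? l : r);
}

struct avl *avl_rot_right(struct avl *x)
{
    struct avl *y = x->left;
    x->left = y->right;
    y->right = x;
    avl_fix(x);
    avl_fix(y);
    return y;
}

struct avl *avl_rot_left(struct avl *x)
{
    struct avl *y = x->right;
    x->right = y->left;
    y->left = x;
    avl_fix(x);
    avl_fix(y);
    return y;
}

/* Recompute the height of n and rotate if its balancing factor is +2 or -2. */
struct avl *avl_rebalance(struct avl *n)
{
    avl_fix(n);
    if (avl_bf(n) > 1) {
        if (avl_bf(n->left) < 0)            /* LR case */
            n->left = avl_rot_left(n->left);
        return avl_rot_right(n);
    }
    if (avl_bf(n) < -1) {
        if (avl_bf(n->right) > 0)           /* RL case */
            n->right = avl_rot_right(n->right);
        return avl_rot_left(n);
    }
    return n;
}

/* Insert is included only so that the sketch can build a tree to delete from. */
struct avl *avl_insert(struct avl *n, int data)
{
    if (n == NULL) {
        n = malloc(sizeof *n);
        assert(n != NULL);                  /* sketch: abort on allocation failure */
        n->data = data;
        n->height = 1;
        n->left = n->right = NULL;
        return n;
    }
    if (data < n->data)
        n->left = avl_insert(n->left, data);
    else
        n->right = avl_insert(n->right, data);
    return avl_rebalance(n);
}

struct avl *avl_delete(struct avl *n, int data)
{
    if (n == NULL)
        return NULL;                        /* key not present */
    if (data < n->data) {
        n->left = avl_delete(n->left, data);
    } else if (data > n->data) {
        n->right = avl_delete(n->right, data);
    } else if (n->left != NULL && n->right != NULL) {
        struct avl *s = n->right;           /* in-order successor */
        while (s->left != NULL)
            s = s->left;
        n->data = s->data;                  /* copy it up, then delete it below */
        n->right = avl_delete(n->right, s->data);
    } else {
        struct avl *child = n->left ? n->left : n->right;
        free(n);                            /* leaf or single-child node */
        return child;
    }
    return avl_rebalance(n);                /* check balance on the way back up */
}
```

Starting from the final tree of Example 10.5, deleting 30 and then 45 with this sketch forces a rotation at the root and leaves 55 at the top.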

Weight Balanced Trees

Tree structures support various basic dynamic set operations including Search, Predecessor, Successor, Minimum, Maximum, Insert, and Delete in time proportional to the height of the tree. Ideally, a tree will be balanced and the height will be log n where n is the number of nodes in the tree. To ensure that the height of the tree is as small as possible and therefore provide the best running time, a balanced tree structure like a red-black tree, AVL tree, or b-tree must be used.

When working with large sets of data, it is often not possible or desirable to maintain the entire structure in primary storage (RAM). Instead, a relatively small portion of the data structure is maintained in primary storage, and additional data is read from secondary storage as needed. Unfortunately, a magnetic disk, the most common form of secondary storage, is significantly slower than random access memory (RAM). In fact, the system often spends more time in retrieving data than actually processing data.

B-trees are weight balanced trees that are optimized for situations when part or all of the tree must be maintained in secondary storage such as a magnetic disk. Since disk accesses are expensive (time consuming) operations, a b-tree tries to minimize the number of disk accesses. For example, a b-tree with a height of 2 and a branching factor of 1001 can store over one billion keys but requires at most two disk accesses to search for any node, provided the root node is kept in primary storage.
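The "over one billion keys" figure can be verified with a short calculation. The sketch below assumes that every node is full, that is, it has `branching` children and `branching - 1` keys, and that the root is at height 0; the function name btree_max_keys is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Maximum number of keys in a b-tree of the given height (root at height 0)
   when every node has `branching` children and `branching - 1` keys.
   Overflow is not checked in this sketch. */
uint64_t btree_max_keys(uint64_t branching, unsigned height)
{
    assert(branching >= 2);
    uint64_t nodes_on_level = 1;    /* the root */
    uint64_t total_nodes = 0;
    for (unsigned level = 0; level <= height; level++) {
        total_nodes += nodes_on_level;
        nodes_on_level *= branching;
    }
    return total_nodes * (branching - 1);
}
```

For a branching factor of 1001 and a height of 2 there are 1 + 1001 + 1,002,001 = 1,003,003 nodes with 1000 keys each, so btree_max_keys(1001, 2) evaluates to 1,003,003,000.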

B-Trees

The Structure of B-Trees

Unlike a binary tree, each node of a b-tree may have a variable number of keys and children. The keys are stored in non-decreasing order. Each key has an associated child that is the root of a subtree containing all nodes with keys less than or equal to the key but greater than the preceding key. A node also has an additional rightmost child that is the root for a subtree containing all keys greater than any keys in the node.

A b-tree has a minimum number of allowable children for each node known as the minimization factor. If t is this minimization factor, every node must have at least t - 1 keys. Under certain circumstances, the root node is allowed to violate this property by having fewer than t - 1 keys. Every node may have at most 2t - 1 keys or, equivalently, 2t children.

Since each node tends to have a large branching factor (a large number of children), it is typically necessary to traverse relatively few nodes before locating the desired key. If access to each node requires a disk access, then a b-tree will minimize the number of disk accesses required. The minimization factor is usually chosen so that the total size of each node corresponds to a multiple of the block size of the underlying storage device. This choice simplifies and optimizes disk access. Consequently, a b-tree is an ideal data structure for situations where all data cannot reside in primary storage and accesses to secondary storage are comparatively expensive (or time consuming).
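The node layout described in this section can be sketched as a C structure. This is an illustrative sketch under assumptions: the names are hypothetical, the minimization factor is fixed at compile time as BT_T = 3 purely for the example, and the root is allowed to hold any number of keys from 0 up to the maximum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BT_T 3  /* minimization factor t; the value 3 is only an example */

/* Hypothetical in-memory b-tree node: at most 2t - 1 keys and 2t children. */
struct btree_node {
    int n;                              /* number of keys currently stored */
    bool leaf;                          /* true if the node has no children */
    int keys[2 * BT_T - 1];             /* keys[0..n-1], in non-decreasing order */
    struct btree_node *child[2 * BT_T]; /* child[0..n], used only if !leaf */
};

/* Checks the key-count limits: t - 1 <= n <= 2t - 1, relaxed for the root. */
bool btree_node_key_count_ok(const struct btree_node *x, bool is_root)
{
    assert(x != NULL);
    int min_keys = is_root ? 0 : BT_T - 1;  /* assumption: root may be smaller */
    return x->n >= min_keys && x->n <= 2 * BT_T - 1;
}

/* Checks that the keys stored in the node are in non-decreasing order. */
bool btree_node_keys_sorted(const struct btree_node *x)
{
    assert(x != NULL);
    for (int i = 1; i < x->n; i++)
        if (x->keys[i - 1] > x->keys[i])
            return false;
    return true;
}
```

In a disk-based implementation the child pointers would typically be block addresses rather than memory pointers, and BT_T would be chosen so that one node fills a whole number of disk blocks.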

Height of B-Trees

For n greater than or equal to one, the height h of an n-key b-tree T with a minimum degree t greater than or equal to 2 satisfies

h <= log_t((n + 1) / 2)

The worst case height is O(log n). Since the "branchiness" of a b-tree can be large compared to many other balanced tree structures, the base of the logarithm tends to be large; therefore, the number of nodes visited during a search tends to be smaller than required by other tree structures.

Although this does not affect the asymptotic worst case height, b-trees tend to have smaller heights than other trees with the same asymptotic height.

Operations on B-Trees

The algorithms for the search, create, and insert operations are shown below. Note that these algorithms are single pass; in other words, they do not traverse back up the tree. Since b-trees strive to minimize disk accesses and the nodes are usually stored on disk, this single-pass approach will reduce the number of node visits and thus the number of disk accesses. Simpler double-pass approaches that move back up the tree to fix violations are possible.

Since all nodes are assumed to be stored in secondary storage (disk) rather than primary storage (memory), all references to a given node must be preceded by a read operation denoted by Disk-Read.

Similarly, once a node is modified and it is no longer needed, it must be written out to secondary storage with a write operation denoted by Disk-Write. The algorithms below assume that all nodes referenced in parameters have already had a corresponding Disk-Read operation. New nodes are created and assigned storage with the Allocate-Node call. The implementation details of the Disk-Read, Disk-Write, and Allocate-Node functions are operating system and implementation dependent.

B-Tree-Search(x, k)

[Pseudocode listings for the search, create, and insert operations: not recoverable]
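A single-pass search of the kind described in this section can be sketched in C as follows. This is an in-memory illustration under assumptions, not a reproduction of the handout's B-Tree-Search listing: the node type, the constant BT_MAX_KEYS, and the disk_read stand-in are hypothetical, and array indexes start at 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BT_MAX_KEYS 5   /* 2t - 1 with t = 3; value chosen only for the example */

/* Hypothetical in-memory b-tree node. */
struct btree_node {
    int n;                                      /* number of keys in the node */
    bool leaf;                                  /* true if there are no children */
    int keys[BT_MAX_KEYS];                      /* keys[0..n-1], sorted */
    struct btree_node *child[BT_MAX_KEYS + 1];  /* child[0..n] */
};

/* Stand-in for Disk-Read: in this sketch every node is already in memory. */
struct btree_node *disk_read(struct btree_node *x)
{
    return x;
}

/* Searches the subtree rooted at x for key k without moving back up the tree.
   Returns the node that holds k and stores the key's position in *index,
   or returns NULL if k is not present. */
struct btree_node *btree_search(struct btree_node *x, int k, int *index)
{
    assert(index != NULL);
    while (x != NULL) {
        int i = 0;
        while (i < x->n && k > x->keys[i])  /* find the first key >= k */
            i++;
        if (i < x->n && k == x->keys[i]) {
            *index = i;
            return x;
        }
        if (x->leaf)
            return NULL;                    /* nowhere left to descend */
        x = disk_read(x->child[i]);         /* one node read per level */
    }
    return NULL;
}
```

The loop visits exactly one node per level, so the number of disk_read calls is bounded by the height of the tree.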

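B-Tree Insertion

Keys are inserted in the order 10, 17, 25, 9, 13, 16, 8, 5, 15, 22. Each step shows the tree, level by level, after the named key has been added. In this example a node holds at most two keys.

Insert 10

    [10]

Insert 17

    [10 17]

Insert 25

    [17]
    [10] [25]

Insert 9

    [17]
    [9 10] [25]

Insert 13

    [10 17]
    [9] [13] [25]

Insert 16

    [10 17]
    [9] [13 16] [25]

Insert 8

    [10 17]
    [8 9] [13 16] [25]

Insert 5

    [10]
    [8] [17]
    [5] [9] [13 16] [25]

Insert 15

    [10]
    [8] [15 17]
    [5] [9] [13] [16] [25]

Insert 22

    [10]
    [8] [15 17]
    [5] [9] [13] [16] [22 25]

After deleting 16 from the above B-Tree

    [10]
    [8] [15 22]
    [5] [9] [13] [17] [25]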



Hashing

Hashing is a technique which improves the speed of search by calculating the address of the search element directly using a mathematical formula instead of searching for it.
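As an illustration of calculating an address with a formula, the sketch below uses the division method (key modulo table size). The formula, the table size of 11, and all names are assumptions chosen for this example; collision handling is deliberately left out.

```c
#include <assert.h>

#define TABLE_SIZE 11   /* illustrative table size */
#define EMPTY_SLOT (-1) /* marks an unused slot; keys are assumed non-negative */

/* Division-method hash: the address is computed directly from the key. */
int hash_address(int key)
{
    assert(key >= 0);
    return key % TABLE_SIZE;
}

/* Marks every slot of the table as unused. */
void table_init(int table[TABLE_SIZE])
{
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY_SLOT;
}

/* Stores key at its computed address. Returns 0 on success, or -1 if the
   slot is already occupied (resolving collisions is outside this sketch). */
int table_put(int table[TABLE_SIZE], int key)
{
    int addr = hash_address(key);
    if (table[addr] != EMPTY_SLOT)
        return -1;
    table[addr] = key;
    return 0;
}

/* Looks only at the computed address; no scan of the table is needed. */
int table_contains(const int table[TABLE_SIZE], int key)
{
    return table[hash_address(key)] == key;
}
```

Here the keys 47 and 58 both map to address 3, so the second of them cannot be stored until some collision-handling scheme is added.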

Symbol Table

Symbol table is a dictionary of ADT used in a program. It is a set of names and
