DSA Full Material


  • 7/31/2019 DSA Full Material

    1/49

    ESD 611 Data Structures and Algorithms

Unit 1: INTRODUCTION TO ALGORITHMS (3 hrs)

1.1 Notion of Algorithm
1.2 Fundamentals of Algorithmic Problem Solving
1.3 Important Problem Types
1.4 Analysis Framework
1.5 Asymptotic Notations & Basic Efficiency Classes

    Fundamentals of Algorithmic Problem Solving

    1. Understanding the Problem

    2. Ascertaining the Capabilities of a computational device

    3. Choosing between Exact and Approximate Problem Solving

    4. Deciding on Appropriate Data Structures

5. Algorithm Design Techniques

6. Methods of Specifying an Algorithm

7. Proving an Algorithm's Correctness

    8. Analyzing an Algorithm

    9. Coding an Algorithm

    An Input to an algorithm specifies an instance of the problem the algorithm solves

    Ascertaining the Capabilities of a computational device:

If the machine executes the instructions one after the other, one operation at a time, then the algorithm is said to be a sequential algorithm.

If the machine executes the instructions in parallel, then the algorithms are said to be parallel algorithms.

    Choosing between Exact and Approximate Problem Solving :

The next principal decision is to choose between solving the problem exactly or solving it approximately.

Why would one opt for an approximation algorithm?

1. There are important problems that simply cannot be solved exactly, such as extracting square roots, solving nonlinear equations and evaluating definite integrals.

2. Available algorithms for solving a problem exactly can be unacceptably slow because of the problem's intrinsic complexity. The most well known of them is the Traveling Salesman Problem of finding the shortest tour through n cities.

3. Deciding on appropriate data structures such as stacks, queues, sets, graphs, etc.

4. An algorithm design technique is a general approach to solving problems algorithmically that is applicable to a variety of problems from different areas of computing.


5. Methods of specifying an algorithm: an algorithm can be specified in pseudocode or by a flowchart.

6. A pseudocode is a mixture of a natural language and programming-language-like constructs.

7. A flowchart is a method of expressing an algorithm by a collection of connected geometric shapes containing descriptions of the algorithm's steps.

Proving an Algorithm's Correctness:

You have to prove that the algorithm yields a required result for every legitimate input in a finite amount of time.

A common technique for proving correctness is to use mathematical induction, because an algorithm's iterations provide a natural sequence of steps needed for such proofs.

To show that an algorithm is incorrect, you need just one instance of its input for which the algorithm fails.

If the algorithm is found to be incorrect, you need either to redesign it under the same decisions regarding the data structures, the design technique and so on, or to reconsider those decisions altogether.

For an approximation algorithm, we usually would like to be able to show that the error produced by the algorithm does not exceed a predefined limit.

    Analyzing an algorithm :

    There are two kinds of algorithm efficiency : Time efficiency and Space efficiency.

    Time efficiency indicates how fast the algorithm runs.

    Space efficiency indicates how much extra memory the algorithm needs.

Another desirable characteristic of an algorithm is simplicity.

Simpler algorithms are easier to understand and easier to program; consequently, the resulting programs usually contain fewer bugs.

Another desirable characteristic of an algorithm is generality.

There are two issues: generality of the problem the algorithm solves and the range of inputs it accepts.

It is sometimes easier to design an algorithm for a problem posed in general terms.

    Algorithm Specification :

    1. Comments begin with // and continue until the end of line.

    2. Blocks are indicated with matching braces { }

3. An identifier begins with a letter. The data types of variables are not explicitly declared.


    4. Assignment of values to variables is done using the assignment statement

<variable> := <expression>;

5. There are two boolean values, true and false. The logical operators and, or, not and the relational operators <, <=, =, !=, >=, > are provided.

    6. Elements of multidimensional arrays are accessed using [ and ] .

    7. The following loop statements are employed.

    While loop

    While < condition > do

    {

    < Statement 1 >

    .

    < Statement n >

    }

    For loop

for variable := value1 to value2 step step do

    {

    < Statement 1 >

    .

    < Statement n >

    }

    Repeat Until loop

    repeat < Statement 1 >

    < Statement n >

    until < condition >

8. A conditional statement has the following forms:

if <condition> then <statement>

if <condition> then <statement 1> else <statement 2>

Case statement:


case
{
    : <condition 1> : <statement 1>
    .
    .
    : <condition n> : <statement n>
    : else : <statement n+1>
}

9. Input and output are done using the instructions read and write.

10. There is only one type of procedure: Algorithm.

    An Algorithm consists of a heading and a body . The heading takes the form

Algorithm Name(<parameter list>)

where Name is the name of the procedure and <parameter list> is a listing of the procedure parameters.

Algorithm to find the maximum of n elements:

1. Algorithm Max(A, n)

2. // A is an array of size n.

3. {

4. Result := A[1];

5. for i := 2 to n do

6. if A[i] > Result then Result := A[i];

    7. return Result ;

    8. }

    A and n are procedure parameters, Result and i are local variables.
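The Max pseudocode above can be rendered directly in Python (a sketch, not part of the text; Python lists are 0-indexed, so the scan starts at the second element):

```python
def find_max(a):
    """Return the maximum of the elements in list a, as in Algorithm Max."""
    result = a[0]                 # Result := A[1]
    for i in range(1, len(a)):    # for i := 2 to n do
        if a[i] > result:         # if A[i] > Result then Result := A[i]
            result = a[i]
    return result

print(find_max([3, 9, 2, 7]))  # prints 9
```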

    Space Complexity:

    The Space needed by an algorithm is seen to be the sum of the following components:

A fixed part that is independent of the characteristics (e.g., number, size) of the inputs and outputs. This part includes the instruction space (space for the code) and space for constants.

A variable part that consists of the space needed by component variables whose size is dependent on the particular problem instance being solved, the space needed by the referenced variables, and the recursion stack space.

The space requirement S(P) of any algorithm P may be written as S(P) = c + S_P(instance characteristics), where c is a constant.

    Example 1:

    1. Algorithm abc(a,b,c)


    2. {

3. return a + b + b*c + (a + b - c)/(a + b) + 4.0;

4. }

If we assume that one word is adequate to store the values of each of a, b, c and the result, then the space needed by the abc algorithm is 4 words.

Since the space needed by abc is independent of the instance characteristics, S_P = 0.

    Example 2:

    1. Algorithm Sum(a,n)

    2. {

3. s := 0.0;

4. for i := 1 to n do

5. s := s + a[i];

    6. return s ;

    7. }

The space needed by n is one word, since it is of type integer. The space needed by a is the space needed by variables of type array of floating point numbers: at least n words to store the n elements of the array. Together with one word each for n, i and s, the space required by the Sum algorithm is S(Sum) >= n + 3.

    Example 3 :

1. Algorithm RSum(a, n)

2. {

3. if (n <= 0) then return 0.0;

4. else return RSum(a, n-1) + a[n];

5. }

The recursion stack space includes space for the formal parameters, the local variables, and the return address. Each call to RSum requires at least 3 words (space for the value of n, the return address, and a pointer to a[]). Since the depth of recursion is n + 1, the recursion stack space needed is S(RSum) >= 3(n+1).

    Time Complexity :

    The Time T (P) taken by a program P is the sum of the Compile time and Run time.


The compile time does not depend on the instance characteristics.

We may assume that a compiled program will be run several times without recompilation.

The run time is denoted by t_P(instance characteristics).

t_P(n) = c_a ADD(n) + c_s SUB(n) + c_m MUL(n) + c_d DIV(n) + ..., where c_a, c_s, c_m, c_d respectively denote the time needed for an addition, subtraction, multiplication and division, and ADD, SUB, MUL, DIV denote the number of additions, subtractions, multiplications and divisions performed.

The time complexity depends on the number of program steps.

A program step is defined as a syntactically or semantically meaningful segment of a program that has an execution time that is independent of the instance characteristics.

Ex: the entire statement return a + b + b*c + (a + b - c)/(a + b) + 4.0; could be regarded as a step, since its execution time is independent of the instance characteristics.

We can determine the number of steps needed by a program to solve a particular problem instance in one of two ways:

1. Introduce a new variable, count, into the program. This is a global variable with initial value 0. Each time a statement in the original program is executed, count is incremented by the step count of that statement.

2. Build a table in which we list the total number of steps contributed by each statement. Determine the number of steps per execution (s/e) of each statement and the total number of times each statement is executed. The sum of the contributions of all statements gives the step count of the program.

    Example 1 :

    1. Algorithm Sum(a,n)

    2. {

3. s := 0.0;

4. count := count + 1;

5. for i := 1 to n do

6. {

7. count := count + 1; // for the for statement

8. s := s + a[i]; count := count + 1; // for the assignment

    9. }

10. count := count + 1; // for the last execution of the for statement

    11. count := count + 1; // for return

    12. return s;


    13. }

The above algorithm can be simplified for the count variable as follows.

    1. Algorithm Sum(a,n)

    2. {

3. for i := 1 to n do count := count + 2;

4. count := count + 3;

    5. }

From the for loop, the value of count will increase by a total of 2n. If count is zero to start with, it will be 2n + 3 on termination. So each invocation of the above algorithm executes a total of 2n + 3 steps.
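The step-counting scheme above can be checked with a small instrumented Python version of Sum (a sketch; the function name and the (sum, count) return pair are illustrative):

```python
def sum_with_count(a):
    """Iterative Sum instrumented with a step counter, as in the text."""
    count = 0
    s = 0.0
    count += 1                # for the assignment s := 0.0
    for x in a:
        count += 1            # for the for-loop test
        s += x
        count += 1            # for the assignment inside the loop
    count += 1                # for the last (failing) loop test
    count += 1                # for the return
    return s, count

s, steps = sum_with_count([1.0] * 5)
print(steps)  # 2*5 + 3 = 13
```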

    Example 2:

1. Algorithm RSum(a,n)

2. {

3. count := count + 1; // for the if conditional

4. if (n <= 0) then

5. {

6. count := count + 1; // for the return

7. return 0.0;

8. }

9. else

10. {

11. count := count + 1; // for the addition, function invocation and return

12. return RSum(a, n-1) + a[n];

13. }

14. }


t_RSum(n) = 2                  if n = 0
t_RSum(n) = 2 + t_RSum(n-1)    if n > 0

The above recurrence relation can be solved by the substitution method:

t_RSum(n) = 2 + t_RSum(n-1)
          = 2 + 2 + t_RSum(n-2) = 2(2) + t_RSum(n-2)
          = 2 + 2 + 2 + t_RSum(n-3) = 2(3) + t_RSum(n-3)
          ...
          = 2(n) + t_RSum(0)
          = 2n + 2, for n >= 0.

So, the step count for the RSum algorithm is 2n + 2.
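The 2n + 2 step count can likewise be checked with an instrumented recursive version (a sketch; names are illustrative):

```python
def rsum_with_count(a, n):
    """Recursive RSum instrumented with a step counter, returning (sum, count)."""
    count = 1                   # for the if conditional
    if n <= 0:
        count += 1              # for the return of 0.0
        return 0.0, count
    s, sub = rsum_with_count(a, n - 1)
    count += 1 + sub            # addition/invocation/return, plus callee steps
    return s + a[n - 1], count  # 0-based indexing for a[n]

_, steps = rsum_with_count([1.0] * 4, 4)
print(steps)  # 2*4 + 2 = 10
```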

The second method to determine the step count of an algorithm is to build a table in which we list the total number of steps contributed by each statement.

The s/e (steps per execution) of a statement is the amount by which the count changes as a result of the execution of that statement. The total number of times a statement is executed is known as its frequency. By combining these two quantities, the total contribution of each statement is obtained. By adding the contributions of all statements, the step count for the entire algorithm is obtained.

Asymptotic Notation (O, Ω, Θ)

Big O notation: The function f(n) = O(g(n)) iff there exist positive constants c and n0 such that f(n) <= c * g(n) for all n, n >= n0.

    Examples:

1. The function 3n + 2 = O(n), as 3n + 2 <= 4n for all n >= 2.

2. The function 3n + 3 = O(n), as 3n + 3 <= 4n for all n >= 3.

3. The function 100n + 6 = O(n), as 100n + 6 <= 101n for all n >= 6.

4. The function 10n^2 + 4n + 2 = O(n^2), as 10n^2 + 4n + 2 <= 11n^2 for all n >= 5.

5. The function 6*2^n + n^2 = O(2^n), as 6*2^n + n^2 <= 7*2^n for all n >= 4.

6. The function 10n^2 + 4n + 2 = O(n^4), as 10n^2 + 4n + 2 <= 10n^4 for all n >= 2.

7. The function 3n + 2 ≠ O(1), as 3n + 2 is not less than or equal to c for any constant c and all n >= n0.

8. The function 10n^2 + 4n + 2 ≠ O(n).

The statement f(n) = O(g(n)) states only that g(n) is an upper bound on the value of f(n) for all n, n >= n0.
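The constants c and n0 used in the examples above can be verified numerically (a sketch; the chosen cases mirror examples 1, 4 and 5):

```python
# For each (f, g, c, n0) we confirm f(n) <= c*g(n) on a range of n >= n0.
cases = [
    (lambda n: 3*n + 2,           lambda n: n,      4, 2),   # 3n+2 = O(n)
    (lambda n: 10*n**2 + 4*n + 2, lambda n: n**2,  11, 5),   # = O(n^2)
    (lambda n: 6*2**n + n**2,     lambda n: 2**n,   7, 4),   # = O(2^n)
]
for f, g, c, n0 in cases:
    assert all(f(n) <= c * g(n) for n in range(n0, n0 + 50))
print("all bounds hold")
```

Note that a finite check like this only illustrates the bound; the inequalities themselves hold for all n >= n0 by simple algebra.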

If f(n) = a_m n^m + ... + a_1 n + a_0, then f(n) = O(n^m).


Omega notation (Ω): The function f(n) = Ω(g(n)) iff there exist positive constants c and n0 such that f(n) >= c * g(n) for all n, n >= n0.

    Examples:

1. The function 3n + 2 = Ω(n), as 3n + 2 >= 3n for all n >= 2.

2. The function 3n + 3 = Ω(n), as 3n + 3 >= 3n for all n >= 1.

3. The function 100n + 6 = Ω(n), as 100n + 6 >= 100n for all n >= 1.

4. The function 10n^2 + 4n + 2 = Ω(n^2), as 10n^2 + 4n + 2 >= n^2 for all n >= 1.

5. The function 6*2^n + n^2 = Ω(2^n), as 6*2^n + n^2 >= 2^n for all n >= 1.

For the statement f(n) = Ω(g(n)) to be informative, g(n) should be as large a function of n as possible for which the statement f(n) = Ω(g(n)) is true.

If f(n) = a_m n^m + ... + a_1 n + a_0 and a_m > 0, then f(n) = Ω(n^m).

Theta notation (Θ): The function f(n) = Θ(g(n)) iff there exist positive constants c1, c2 and n0 such that c1 * g(n) <= f(n) <= c2 * g(n) for all n, n >= n0.

    Examples:

The function 3n + 2 = Θ(n), as 3n + 2 >= 3n for all n >= 2 and 3n + 2 <= 4n for all n >= 2, so c1 = 3, c2 = 4 and n0 = 2.

The function 10n^2 + 4n + 2 = Θ(n^2).

The function 6*2^n + n^2 = Θ(2^n).

The function 3n + 2 ≠ Θ(1).

The function f(n) = Θ(g(n)) iff g(n) is both an upper and a lower bound on f(n).

If f(n) = a_m n^m + ... + a_1 n + a_0 and a_m > 0, then f(n) = Θ(n^m).

Little oh notation (o): The function f(n) = o(g(n)) iff lim (n → ∞) f(n)/g(n) = 0.

    Examples:

1. The function 3n + 2 = o(n^2), since lim (n → ∞) (3n + 2)/n^2 = 0.

2. The function 3n + 2 = o(n log n).

3. The function 3n + 2 = o(n log log n).

4. The function 6*2^n + n^2 = o(3^n).

5. The function 3n + 2 ≠ o(n).

6. The function 6*2^n + n^2 ≠ o(2^n).

Little omega notation (ω): The function f(n) = ω(g(n)) iff lim (n → ∞) g(n)/f(n) = 0.


Unit 2: INTRODUCTION TO DATA STRUCTURES (1 hr)

2.1 Information & Meaning
2.2 Arrays
2.3 Structures


Unit 3: STACKS, RECURSION & QUEUES (5 hrs)

3.1 Definition & Examples
3.2 Representing (Operations on) Stacks
3.3 Applications
3.4 Recursive Definition & Processes
3.5 Applications
3.6 Queues & Their Representation
3.7 Different Types of Queues

    Stacks and Queues

Two of the more common data objects found in computer algorithms are stacks and queues. Both of these objects are special cases of the more general data object, an ordered list.

A stack is an ordered list in which all insertions and deletions are made at one end, called the top.

A queue is an ordered list in which all insertions take place at one end, the rear, while all deletions take place at the other end, the front.

Given a stack S = (a[1], a[2], ..., a[n]), we say that a[1] is the bottommost element and that element a[i] is on top of element a[i-1], 1 < i <= n.


    Deletion in stack

procedure delete(var item : items);
{remove the top element from the stack and put it in item}
begin
    if top = 0 then stackempty;
    item := stack[top];
    top := top - 1;
end; {of delete}

Procedure delete actually combines the functions TOP and DELETE.

stackfull and stackempty are procedures which are left unspecified, since they will depend upon the particular application.

Often a stackfull condition will signal that more storage needs to be allocated and the program re-run. Stackempty is often a meaningful condition.
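The Pascal procedures above translate naturally into an array-based stack in Python (a sketch; 0-based indexing replaces the text's 1-based top, and exceptions stand in for stackfull/stackempty):

```python
class Stack:
    def __init__(self, n):
        self.stack = [None] * n   # fixed-capacity array, as in the text
        self.n = n
        self.top = -1             # empty stack (the text uses top = 0)

    def add(self, item):
        if self.top == self.n - 1:
            raise OverflowError("stackfull")
        self.top += 1
        self.stack[self.top] = item

    def delete(self):
        if self.top == -1:
            raise IndexError("stackempty")
        item = self.stack[self.top]
        self.top -= 1
        return item

s = Stack(3)
s.add('A'); s.add('B')
print(s.delete())  # B
```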

    Addition into a queue

procedure addq(item : items);
{add item to the queue q}
begin
    if rear = n then queuefull
    else begin
        rear := rear + 1;
        q[rear] := item;
    end;
end; {of addq}

    Deletion in a queue

procedure deleteq(var item : items);
{delete from the front of q and put into item}
begin
    if front = rear then queueempty
    else begin
        front := front + 1;
        item := q[front];
    end;
end; {of deleteq}
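The addq/deleteq pair above can be mirrored in Python (a sketch; as in the text, front and rear only move forward, so array slots are not reused circularly):

```python
class Queue:
    def __init__(self, n):
        self.q = [None] * n
        self.n = n
        self.front = self.rear = -1   # the text uses 0 with 1-based arrays

    def addq(self, item):
        if self.rear == self.n - 1:
            raise OverflowError("queuefull")
        self.rear += 1
        self.q[self.rear] = item

    def deleteq(self):
        if self.front == self.rear:
            raise IndexError("queueempty")
        self.front += 1
        return self.q[self.front]

q = Queue(3)
q.addq('X'); q.addq('Y')
print(q.deleteq())  # X
```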


Unit 4: LINKED LISTS (3 hrs)

4.1 Introduction
4.2 Different Types of Lists & Their Implementation

    Linked Lists

Simple data structures such as arrays, using sequential mappings, have the property that successive nodes of the data object are stored a fixed distance apart.

These sequential storage schemes proved adequate for the functions one wished to perform (access to an arbitrary node in a table, insertion or deletion of nodes within a stack or queue).

However, when a sequential mapping is used for ordered lists, operations such as insertion and deletion of arbitrary elements become expensive.

For example, consider the following list of all of the three letter English words ending in AT:

    (BAT, CAT, EAT, FAT, HAT, JAT, LAT, MAT, OAT, PAT, RAT, SAT, TAT, VAT,WAT)

    To make this list complete we naturally want to add the word GAT.

If we are using an array to keep this list, then the insertion of GAT will require us to move elements already in the list either one location higher or lower.

We must either move HAT, JAT, LAT, ..., WAT or else move BAT, CAT, EAT, FAT.

If we have to do many such insertions into the middle, then neither alternative is attractive because of the amount of data movement.

Or suppose we decided to remove the word LAT. Then again, we have to move many elements so as to maintain the sequential representation of the list.

When our problem called for several ordered lists of varying sizes, sequential representation again proved to be inadequate.

By storing each list in a different array of maximum size, storage may be wasted.

By maintaining the lists in a single array, a potentially large amount of data movement is needed.

    "Ordered lists" reduce the time needed for arbitrary insertion and deletion which areexplained in this section.

    Sequential representation is achieved by using linked representations. Unlike asequential representation where successive items of a list are located a fixed distanceapart, in a linked representation these items may be placed anywhere in memory.

    Another way of saying this is that in a sequential representation the order of elements isthe same as in the ordered list, while in a linked representation these two sequencesneed not be the same.


To access the elements of the list in the correct order, with each element we store the address, or location, of the next element in that list.

Thus, associated with each data item in a linked representation is a pointer to the next item. This pointer is often referred to as a link. In general, a node is a collection of data, data(1), ..., data(n), and links, link(1), ..., link(m).

    Each item in a node is called a field. A field contains either a data item or a link.

    The elements of the list are stored in a one dimensional array called data.

But the elements of the list no longer occur in sequential order, BAT before CAT before EAT, etc.

Instead we relax this restriction and allow them to appear anywhere in the array and in any order.

In order to remind us of the real order, a second array, link, is added.

    The values in this array are pointers to elements in the data array.

    Since the list starts at data[8] = BAT, let us set a variable f=8.

    link[8] has the value 3, which means it points to data[3] which contains CAT.

    The third element of the list is pointed at by link[3] which is EAT.

By continuing in this way we can list all the words in the proper order. We recognize that we have come to the end when link has a value of zero.

It is customary to draw linked lists as an ordered sequence of nodes with links being represented by arrows.

We shall use the name of the pointer variable that points to the list as the name of the entire list.

    Thus the list we consider is the list f.

Notice that we do not explicitly put in the values of the pointers but simply draw arrows to indicate they are there.

This is so that we reinforce in our own minds the facts that

(i) the nodes do not actually reside in sequential locations, and that

(ii) the locations of nodes may change on different runs.

Therefore, when we write a program which works with lists, we almost never look for a specific address except when we test for zero.

It is much easier to make an arbitrary insertion or deletion using a linked list than using a sequential list.

    To insert the data item GAT between FAT and HAT the following steps are adequate:

    get a node which is currently unused;

    let its address be x;

    set the data field of this node to GAT;

set the link field of x to point to the node after FAT, which contains HAT;


set the link field of the node containing FAT to x.

The important thing is that when we insert GAT we do not have to move any other elements which are already in the list.

We have overcome the need to move data at the expense of the storage needed for the second field, link.

Now suppose we want to delete GAT from the list.

All we need to do is find the element which immediately precedes GAT, which is FAT, and set link[9] to the position of HAT, which is 1.

Again, there is no need to move the data around.

Even though the link field of GAT still contains a pointer to HAT, GAT is no longer in the list.
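The GAT insertion and deletion steps above can be traced with a minimal singly linked list in Python (a sketch; the Node class and variable names are illustrative):

```python
class Node:
    def __init__(self, data, link=None):
        self.data = data
        self.link = link

# Build FAT -> HAT
hat = Node("HAT")
fat = Node("FAT", hat)

# Insert GAT between FAT and HAT: no existing element moves.
gat = Node("GAT")       # get a currently unused node; its address is x
gat.link = fat.link     # link field of x points to the node after FAT
fat.link = gat          # link field of the node containing FAT points to x

# Delete GAT: make its predecessor FAT point past it.
fat.link = gat.link

node, out = fat, []
while node is not None:     # follow links until zero (None here)
    out.append(node.data)
    node = node.link
print(out)  # ['FAT', 'HAT']
```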


Unit 5: TREES & GRAPHS (7 hrs)

5.1 Binary Trees
5.2 Binary Tree Representation
5.3 The Huffman Algorithm
5.4 Representing Lists as Trees
5.5 Balanced Search Trees
5.6 Expression Trees
5.7 Tree Traversal Techniques
5.8 Introduction to Graphs and Their Representations
5.9 DFS & BFS Search
5.10 Topological Sorting

    Binary Trees

A binary tree is an important type of structure which occurs very often. It is characterized by the fact that any node can have at most two branches, i.e., there is no node with degree greater than two. For binary trees we distinguish between the subtree on the left and the subtree on the right, whereas for trees the order of the subtrees was irrelevant. Also, a binary tree may have zero nodes. Thus a binary tree is really a different object than a tree.

Definition: A binary tree is a finite set of nodes which is either empty or consists of a root and two disjoint binary trees called the left subtree and the right subtree.

We can define the data structure binary tree as follows:

structure BTREE
    declare CREATE( ) --> btree
        ISMTBT(btree) --> boolean
        MAKEBT(btree, item, btree) --> btree
        LCHILD(btree) --> btree
        DATA(btree) --> item
        RCHILD(btree) --> btree
    for all p, r in btree, d in item let
        ISMTBT(CREATE) ::= true
        ISMTBT(MAKEBT(p,d,r)) ::= false
        LCHILD(MAKEBT(p,d,r)) ::= p; LCHILD(CREATE) ::= error
        DATA(MAKEBT(p,d,r)) ::= d; DATA(CREATE) ::= error
        RCHILD(MAKEBT(p,d,r)) ::= r; RCHILD(CREATE) ::= error
    end
end BTREE

This set of axioms defines only a minimal set of operations on binary trees; other operations can usually be built in terms of these. The distinctions between a binary tree and a tree should be noted. First of all, there is no tree having zero nodes, but there is an empty binary tree. Further, the two binary trees below are different: the first one has an empty right subtree while the second has an empty left subtree. If these are regarded as trees, then they are the same, despite the fact that they are drawn slightly differently.


    Binary Tree Representations

A full binary tree of depth k is a binary tree of depth k having pow(2,k)-1 nodes. This is the maximum number of nodes such a binary tree can have. A very elegant sequential representation for such binary trees results from sequentially numbering the nodes, starting with the nodes on level 1, then those on level 2 and so on. Nodes on any level are numbered from left to right. This numbering scheme gives us the definition of a complete binary tree. A binary tree with n nodes and depth k is complete iff its nodes correspond to the nodes numbered one to n in the full binary tree of depth k. The nodes may now be stored in a one dimensional array, tree, with the node numbered i being stored in tree[i].

Lemma 5.3: If a complete binary tree with n nodes (i.e., depth = floor(log2 n) + 1) is represented sequentially as above, then for any node with index i, 1 <= i <= n, we have:

(i) parent(i) is at floor(i/2) if i is not equal to 1. When i = 1, i is the root and has no parent.

(ii) lchild(i) is at 2i if 2i <= n. If 2i > n, then i has no left child.

(iii) rchild(i) is at 2i+1 if 2i+1 <= n. If 2i+1 > n, then i has no right child.

Proof: We prove (ii). (iii) is an immediate consequence of (ii) and the numbering of nodes on the same level from left to right. (i) follows from (ii) and (iii). We prove (ii) by induction on i. For i = 1, clearly the left child is at 2 unless 2 > n, in which case 1 has no left child. Now assume that for all j, 1 <= j <= i, lchild(j) is at 2j. Then the two nodes immediately preceding lchild(i+1) in the numbering are the right and left children of i. The left child of i is at 2i, hence the left child of i+1 is at 2i + 2 = 2(i+1), unless 2(i+1) > n, in which case i+1 has no left child.

This representation can clearly be used for all binary trees, though in most cases there will be a lot of unutilized space. For complete binary trees the representation is ideal, as no space is wasted. In the worst case, a skewed tree of depth k will require pow(2,k)-1 spaces, of which only k will be occupied.
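The index arithmetic of the lemma can be written as three small Python functions over the 1-based numbering (a sketch; None marks a missing parent or child):

```python
def parent(i):
    """Index of the parent of node i; None for the root (i = 1)."""
    return i // 2 if i > 1 else None

def lchild(i, n):
    """Index of the left child of node i in an n-node complete tree."""
    return 2 * i if 2 * i <= n else None

def rchild(i, n):
    """Index of the right child of node i in an n-node complete tree."""
    return 2 * i + 1 if 2 * i + 1 <= n else None

n = 10
print(parent(5), lchild(3, n), rchild(5, n))  # 2 6 None
```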

While the above representation appears to be good for complete binary trees, it is wasteful for many other binary trees. In addition, the representation suffers from the general inadequacies of sequential representations: insertion or deletion of nodes from the middle of a tree requires the movement of potentially many nodes to reflect the change in level number of these nodes. These problems can be easily overcome through the use of a linked representation. Each node will have three fields, leftchild, data and rightchild, and is defined in Pascal as

type treepointer = ^treerecord;
     treerecord = record
                    leftchild : treepointer;
                    data : char;
                    rightchild : treepointer;
                  end;

    Binary Tree Traversal

There are many operations that we often want to perform on trees. One notion that arises frequently is the idea of traversing a tree, or visiting each node in the tree exactly once. A full traversal produces a linear order for the information in a tree. This linear order may be familiar and useful. When traversing a binary tree we want to treat each node and its subtrees in the same fashion. If we let L, D, R stand for moving left, printing the data, and moving right when at a node, then there are six possible combinations of traversal: LDR, LRD, DLR, DRL, RDL, and RLD. If we adopt the convention that we traverse left before right, then only three traversals remain: LDR, LRD, and DLR. To these we assign the names inorder, postorder and preorder, because there is a natural correspondence between these traversals and producing the infix, postfix and prefix forms of an expression.

Inorder traversal: informally, this calls for moving down the tree towards the left until you can go no farther. Then you "visit" the node, move one node to the right and continue again. If you cannot move to the right,


go back one more node. A precise way of describing this traversal is to write it as a recursive procedure.

procedure inorder(currentnode : treepointer);
{currentnode is a pointer to a node in a binary tree. For full
tree traversal, pass inorder the pointer to the top of the tree}
begin {inorder}
    if currentnode <> nil
    then begin
        inorder(currentnode^.leftchild);
        write(currentnode^.data);
        inorder(currentnode^.rightchild);
    end
end; {of inorder}

Recursion is an elegant device for describing this traversal. A second form of traversal is preorder:

procedure preorder(currentnode : treepointer);
{currentnode is a pointer to a node in a binary tree. For full
tree traversal, pass preorder the pointer to the top of the tree}
begin {preorder}
    if currentnode <> nil
    then begin
        write(currentnode^.data);
        preorder(currentnode^.leftchild);
        preorder(currentnode^.rightchild);
    end {of if}
end; {of preorder}

In words we would say "visit a node, traverse left and continue. When you cannot continue, move right and begin again, or move back until you can move right and resume." At this point it should be easy to guess the next traversal method, which is called postorder:

procedure postorder(currentnode : treepointer);
{currentnode is a pointer to a node in a binary tree. For full
tree traversal, pass postorder the pointer to the top of the tree}
begin {postorder}
    if currentnode <> nil
    then begin
        postorder(currentnode^.leftchild);
        postorder(currentnode^.rightchild);
        write(currentnode^.data);
    end {of if}
end; {of postorder}
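The three Pascal procedures above can be mirrored in Python (a sketch; the Node class stands in for treerecord, and an output list replaces write):

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def inorder(t, out):          # LDR
    if t is not None:
        inorder(t.left, out); out.append(t.data); inorder(t.right, out)

def preorder(t, out):         # DLR
    if t is not None:
        out.append(t.data); preorder(t.left, out); preorder(t.right, out)

def postorder(t, out):        # LRD
    if t is not None:
        postorder(t.left, out); postorder(t.right, out); out.append(t.data)

# Expression tree for A + B: inorder gives infix, preorder gives prefix,
# postorder gives postfix, matching the correspondence noted above.
root = Node('+', Node('A'), Node('B'))
for traverse in (inorder, preorder, postorder):
    out = []
    traverse(root, out)
    print(''.join(out))
# prints A+B, then +AB, then AB+
```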

    Depth First Search Algorithm

    Algorithm DFS(G)

// Implements a depth first search traversal of a given graph
// Input : Graph G = (V, E)
// Output : Graph G with its vertices marked with consecutive
//          integers in the order they have been first
//          encountered by the DFS traversal
mark each vertex in V with 0 as a mark of being unvisited
count := 0
for each vertex v in V do
    if v is marked with 0
        dfs(v)

dfs(v)
// visits recursively all the unvisited vertices connected to vertex
// v and assigns them the numbers in the order they are
// encountered via the global variable count
count := count + 1; mark v with count
for each vertex w in V adjacent to v do
    if w is marked with 0
        dfs(w)

With the adjacency matrix representation of the graph, the traversal's time efficiency is in Θ(|V|^2); for the adjacency linked list representation, it is in Θ(|V| + |E|), where |V| and |E| are the number of the graph's vertices and edges respectively.

Important elementary applications of DFS include checking connectivity and checking acyclicity of a graph.

Checking a graph's connectivity can be done as follows: start a DFS traversal at an arbitrary vertex and check, after the algorithm halts, whether all the graph's vertices have been visited. If they have, the graph is connected; otherwise it is not.

If there is a back edge from a vertex to one of its ancestors, then the graph has a cycle.

A vertex of a connected graph is said to be an articulation point if its removal, with all edges incident to it, breaks the graph into disjoint pieces.
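The DFS pseudocode and the connectivity check above can be sketched in Python over an adjacency-list dict (an assumed input format):

```python
def dfs_all(graph):
    """Mark vertices with consecutive integers in DFS visit order.

    Returns (mark, components); the graph is connected iff components == 1.
    """
    mark = {v: 0 for v in graph}      # 0 = unvisited
    count = 0
    components = 0

    def dfs(v):
        nonlocal count
        count += 1
        mark[v] = count
        for w in graph[v]:
            if mark[w] == 0:
                dfs(w)

    for v in graph:                   # restart for each component
        if mark[v] == 0:
            components += 1
            dfs(v)
    return mark, components

g = {'a': ['b', 'c'], 'b': ['a'], 'c': ['a'], 'd': []}
mark, comps = dfs_all(g)
print(comps)  # 2, so this graph is not connected
```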

    Breadth First Search Algorithm

It proceeds in a concentric manner by visiting first all the vertices that are adjacent to a starting vertex, then all unvisited vertices two edges apart from it, and so on, until all the vertices in the same connected component as the starting vertex are visited.

If there still remain unvisited vertices, the algorithm has to be restarted at an arbitrary vertex of another connected component of the graph.

It is convenient to use a queue to trace the operation of breadth first search.

The queue is initialized with the traversal's starting vertex, which is marked as visited.

On each iteration, the algorithm identifies all unvisited vertices that are adjacent to the front vertex, marks them as visited, and adds them to the queue; after that, the front vertex is removed from the queue.

The starting vertex serves as the root of the BFS tree.

Whenever a new unvisited vertex is reached for the first time, the vertex is attached as a child to the vertex it is being reached from, with an edge called a tree edge.

If an edge leading to a previously visited vertex other than its immediate predecessor is encountered, the edge is noted as a cross edge.

Algorithm BFS(G)

// Implements a breadth-first search traversal of a given graph

// Input : Graph G = (V, E)

// Output : Graph G with its vertices marked with consecutive

// integers in the order they have been first

// encountered by the BFS traversal

// mark each vertex in V with 0 as a mark of being unvisited

    count = 0

    for each vertex v in V do

    if v is marked with 0

    bfs(v)

    bfs(v)

// visits all the unvisited vertices connected to vertex

// v and assigns them the numbers in the order they are

// encountered via global variable count

  • 7/31/2019 DSA Full Material

    21/49

count = count + 1; mark v with count and initialize a queue with v

    while the queue is not empty do

for each vertex w in V adjacent to the front vertex do

    if w is marked with 0

count = count + 1; mark w with count; add w to the queue

    remove vertex v from the front of the queue

With the adjacency matrix representation of the graph, the traversal's time efficiency is in Θ(|V|²), and for the adjacency linked list representation it is in Θ(|V| + |E|), where |V| and |E| are the number of the graph's vertices and edges, respectively.
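The numbering pseudocode above can be sketched as compilable C++ (illustrative; the adjacency-list form and function name are assumptions):

```cpp
#include <vector>
#include <queue>

// BFS numbering: mark[v] holds the order in which v was first encountered
// (0 = unvisited), following the pseudocode above; the outer loop restarts
// the traversal in each connected component.
std::vector<int> bfsNumbering(const std::vector<std::vector<int>>& adj) {
    int n = adj.size(), count = 0;
    std::vector<int> mark(n, 0);
    for (int s = 0; s < n; ++s) {
        if (mark[s] != 0) continue;
        mark[s] = ++count;
        std::queue<int> q;
        q.push(s);
        while (!q.empty()) {
            int v = q.front(); q.pop();
            for (int w : adj[v])
                if (mark[w] == 0) {        // unvisited neighbour of the front vertex
                    mark[w] = ++count;
                    q.push(w);
                }
        }
    }
    return mark;
}
```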

Important elementary applications of BFS include checking the connectivity and the acyclicity of a graph.

    Directed Graph Basic Concepts

    A directed Graph or Digraph is a graph with directions specified for all its edges.

    A Digraph can be represented by Adjacency Matrix and Adjacency Linked list.

There are two basic differences between a directed graph and an undirected graph:

    1. The Adjacency matrix of a Directed graph does not have to be Symmetric

2. An edge in a digraph has just one (not two) corresponding node in the digraph's adjacency linked lists.

A DFS forest of a directed graph can exhibit all four types of edges possible in such a forest: tree, back, forward, and cross edges.

A directed cycle in a digraph is a sequence of its vertices that starts and ends at the same vertex, in which every vertex is connected to its immediate predecessor by an edge directed from the predecessor to the successor.

    If a DFS forest of a directed graph has no back edges , the digraph is a dag ( directedacyclic graph).

    Topological Sorting

    Consider a set of five required courses {C1,C2,C3,C4,C5} a part-time student has totake in some degree program.

    The courses can be taken in any order as long as the following course prerequisites aremet :

  • 7/31/2019 DSA Full Material

    22/49

C1 and C2 have no prerequisites. C3 requires C1 and C2, C4 requires C3, and C5 requires C3 and C4.

The student can take only one course per term. In which order should the student take the courses?

    The situation can be modeled by a graph in which vertices represent courses anddirected edges indicate prerequisite requirements.

In terms of this digraph, the question is whether we can list its vertices in such an order that, for every edge in the graph, the vertex where the edge starts is listed before the vertex where the edge ends. Can you find such an ordering of this digraph's vertices? This problem is called Topological Sorting.

It can be posed for an arbitrary digraph, but it is easy to see that the problem cannot have a solution if the digraph has a directed cycle.

    Thus, for topological sorting to be possible, a digraph must be dag.

    If a digraph has no cycles , the topological sorting problem for it has a solution.

There are two efficient algorithms that both verify whether a digraph is a dag and, if it is, produce an ordering of vertices that solves the topological sorting problem.

The first algorithm is a simple application of DFS: perform a DFS traversal and note the order in which vertices become dead ends. Reversing this order yields a solution to the topological sorting problem, provided no back edge has been encountered during the traversal. If a back edge has been encountered, the digraph is not a dag, and topological sorting of its vertices is impossible.

The second algorithm is based on a direct implementation of the decrease-by-one technique: repeatedly identify in the remaining digraph a source, which is a vertex with no incoming edges, and delete it along with all edges outgoing from it.

If there are several sources, break the tie arbitrarily. If there are none, stop because the problem cannot be solved.

The order in which the vertices are deleted yields a solution to the topological sorting problem.

  • 7/31/2019 DSA Full Material

    23/49

Imagine a large project, e.g., in construction or research, that involves thousands of interrelated tasks with known prerequisites.

    The first thing is to make sure that the set of given prerequisites is not contradictory.

The convenient way of doing this is to solve the topological sorting problem for the project's digraph.

Only then can we start scheduling the tasks to minimize the total completion time of the project.

Unit 6: DIVIDE & CONQUER (3 hrs)
6.1 Merge Sort

  • 7/31/2019 DSA Full Material

    24/49

6.2 Quick Sort
6.3 Binary Search
6.4 Strassen's Matrix Multiplication

DIVIDE & CONQUER:

Divide-and-conquer is a top-down technique for designing algorithms that consists of dividing the problem into smaller sub-problems, hoping that the solutions of the sub-problems are easier to find, and then composing the partial solutions into the solution of the original problem.

A little more formally, the divide-and-conquer paradigm consists of the following major phases:

Breaking the problem into several sub-problems that are similar to the original problem but smaller in size,

Solving the sub-problems recursively (successively and independently), and then combining these solutions to the sub-problems to create a solution to the original problem.

    (OR)

    Divide-and-Conquer

    The most-well known algorithm design strategy:

    1. Divide instance of problem into two or more smaller instances

    2. Solve smaller instances recursively

    3. Obtain solution to original (larger) instance by combining these solutions

    Mergesort

  • 7/31/2019 DSA Full Material

    25/49

1. Split array A[0..n-1] into about equal halves and make copies of each half in arrays B and C

    2. Sort arrays B and C recursively

    3. Merge sorted arrays B and C into array A as follows:

    Repeat the following until no elements remain in one of the arrays:

compare the first elements in the remaining unprocessed portions of the arrays

copy the smaller of the two into A, while incrementing the index indicating the unprocessed portion of that array

Once all elements in one of the arrays are processed, copy the remaining unprocessed elements from the other array into A.

    Pseudocode of Mergesort
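The pseudocode figure did not survive extraction; a C++ sketch of the three steps above (copies into B and C, recursive sorts, merge) might look like this, with illustrative names:

```cpp
#include <vector>

// Merge sorted halves B and C back into A (step 3 above).
static void merge(const std::vector<int>& B, const std::vector<int>& C, std::vector<int>& A) {
    std::size_t i = 0, j = 0, k = 0;
    while (i < B.size() && j < C.size())
        A[k++] = (B[i] <= C[j]) ? B[i++] : C[j++];   // copy the smaller element
    while (i < B.size()) A[k++] = B[i++];            // copy the leftovers
    while (j < C.size()) A[k++] = C[j++];
}

void mergesort(std::vector<int>& A) {
    if (A.size() < 2) return;
    std::size_t mid = A.size() / 2;
    std::vector<int> B(A.begin(), A.begin() + mid);  // copy of the first half
    std::vector<int> C(A.begin() + mid, A.end());    // copy of the second half
    mergesort(B);
    mergesort(C);
    merge(B, C, A);
}
```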

  • 7/31/2019 DSA Full Material

    26/49

    Merge sort Example

  • 7/31/2019 DSA Full Material

    27/49

Splitting:

8 3 2 9 7 1 5 4
8 3 2 9 | 7 1 5 4
8 3 | 2 9 | 7 1 | 5 4
8 | 3 | 2 | 9 | 7 | 1 | 5 | 4

Merging:

3 8 | 2 9 | 1 7 | 4 5
2 3 8 9 | 1 4 5 7
1 2 3 4 5 7 8 9

    Analysis of Mergesort

1. All cases have the same efficiency: Θ(n log n)

2. The number of comparisons in the worst case is close to the theoretical minimum for comparison-based sorting:

i. ⌈log2 n!⌉ ≈ n log2 n - 1.44n

3. Space requirement: Θ(n) (not in-place)

    4. Can be implemented without recursion (bottom-up)

    Quicksort

Select a pivot (partitioning element); here, the first element of the list.

Rearrange the list so that all the elements in the first s positions are smaller than or equal to the pivot and all the elements in the remaining n-s positions are larger than or equal to the pivot (see the partitioning algorithm below).

Exchange the pivot with the last element in the first subarray; the pivot is now in its final position.

    Sort the two subarrays recursively

    Partitioning Algorithm

Time complexity: Θ(r-l) comparisons, where l and r are the bounds of the subarray being partitioned.
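The partitioning figure did not survive extraction; a Hoare-style partition consistent with the description above might look like this (a sketch with illustrative names, not the exact slide code):

```cpp
#include <vector>
#include <utility>

// Hoare-style partition of A[l..r] around the pivot A[l].
// Returns the pivot's final position s.
int partition(std::vector<int>& A, int l, int r) {
    int p = A[l];
    int i = l, j = r + 1;
    while (true) {
        do { ++i; } while (i <= r && A[i] < p);   // scan right for an element >= p
        do { --j; } while (A[j] > p);             // scan left for an element <= p
        if (i >= j) break;
        std::swap(A[i], A[j]);
    }
    std::swap(A[l], A[j]);   // place the pivot into its final position
    return j;
}

void quicksort(std::vector<int>& A, int l, int r) {
    if (l < r) {
        int s = partition(A, l, r);
        quicksort(A, l, s - 1);
        quicksort(A, s + 1, r);
    }
}
```

On the list 5 3 1 9 8 2 4 7 the first partition produces 2 3 1 4 5 8 9 7 with the pivot 5 in its final position, matching the example below.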

Quicksort Example: sort the list 5 3 1 9 8 2 4 7

    Solution:

    2 3 1 4 5 8 9 7

    1 2 3 4 5 7 8 9

    1 2 3 4 5 7 8 9

    1 2 3 4 5 7 8 9

    1 2 3 4 5 7 8 9

    Analysis of Quicksort

1. Best case (split in the middle): Θ(n log n)

2. Worst case (sorted array!): Θ(n²)

  • 7/31/2019 DSA Full Material

    29/49

3. Average case (random arrays): Θ(n log n)

    4. Improvements:

    a. better pivot selection: median of three partitioning

    b. switch to insertion sort on small subfiles

    c. elimination of recursion

    5. These combine to 20-25% improvement

6. Considered the method of choice for internal sorting of large files (n ≥ 10000)

    Binary Search (simplest application of divide-and-conquer)

Binary Search is an extremely well-known instance of the divide-and-conquer paradigm. Given an ordered array of n elements, the basic idea of binary search is that for a given element we "probe" the middle element of the array. We continue in either the lower or upper segment of the array, depending on the outcome of the probe, until we reach the required (given) element.

Problem: Let A[1 . . . n] be an array sorted in non-decreasing order; that is, A[i] ≤ A[j] whenever 1 ≤ i ≤ j ≤ n. Let q be the query point. The problem consists of finding q in the array A. If q is not in A, then find the position where q might be inserted.

Formally, find the index i such that 1 ≤ i ≤ n+1 and A[i-1] < q ≤ A[i].
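A sketch of this search (using 0-based indexing rather than the 1-based statement above; it returns the insertion position described):

```cpp
#include <vector>

// Returns the smallest index i with q <= A[i]; if q is larger than every
// element, returns A.size(). This is the insertion position described above.
int binarySearch(const std::vector<int>& A, int q) {
    int lo = 0, hi = (int)A.size();          // search the half-open range [lo, hi)
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;        // probe the middle element
        if (A[mid] < q)
            lo = mid + 1;                    // continue in the upper segment
        else
            hi = mid;                        // continue in the lower segment
    }
    return lo;
}
```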

    Strassen's Matrix Multiplication

    Basic Matrix Multiplication

Suppose we want to multiply two matrices of size N x N: for example, A x B = C.

    C11 = a11b11 + a12b21

    C12 = a11b12 + a12b22

    C21 = a21b11 + a22b21

    C22 = a21b12 + a22b22

2x2 matrix multiplication can be accomplished in 8 multiplications, so the straightforward divide-and-conquer algorithm runs in Θ(n^(log2 8)) = Θ(n³) time.

  • 7/31/2019 DSA Full Material

    30/49

algorithm (brute-force matrix multiplication; the loop body was cut off in the source and is completed here in the standard way)

void matrix_mult() {
    for (i = 1; i <= N; i++)
        for (j = 1; j <= N; j++) {
            C[i][j] = 0;
            for (k = 1; k <= N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
}


Strassen's seven products:

P1 = (A11 + A22) * (B11 + B22)
P2 = (A21 + A22) * B11
P3 = A11 * (B12 - B22)
P4 = A22 * (B21 - B11)
P5 = (A11 + A12) * B22
P6 = (A21 - A11) * (B11 + B12)
P7 = (A12 - A22) * (B21 + B22)

C11 = P1 + P4 - P5 + P7
C12 = P3 + P5
C21 = P2 + P4
C22 = P1 + P3 - P2 + P6

    Comparison

C11 = P1 + P4 - P5 + P7
    = (A11 + A22)(B11 + B22) + A22(B21 - B11) - (A11 + A12)B22 + (A12 - A22)(B21 + B22)
    = A11 B11 + A11 B22 + A22 B11 + A22 B22 + A22 B21 - A22 B11
      - A11 B22 - A12 B22 + A12 B21 + A12 B22 - A22 B21 - A22 B22
    = A11 B11 + A12 B21

    Strassen Algorithm

// n is the total number of elements; the quadrants of each matrix are stored
// contiguously (block-recursive layout), so each occupies n/4 consecutive
// elements: Q11 at offset 0, Q12 at n/4, Q21 at 2*(n/4), Q22 at 3*(n/4).
// Note: this skeleton performs 8 recursive multiplications; Strassen's
// algorithm replaces them with the 7 products P1..P7 above.
void matmul(int *A, int *B, int *R, int n) {
    if (n == 1) {
        (*R) += (*A) * (*B);
    } else {
        matmul(A, B, R, n/4);                            // R11 += A11*B11
        matmul(A, B+(n/4), R+(n/4), n/4);                // R12 += A11*B12
        matmul(A+2*(n/4), B, R+2*(n/4), n/4);            // R21 += A21*B11
        matmul(A+2*(n/4), B+(n/4), R+3*(n/4), n/4);      // R22 += A21*B12
        matmul(A+(n/4), B+2*(n/4), R, n/4);              // R11 += A12*B21
        matmul(A+(n/4), B+3*(n/4), R+(n/4), n/4);        // R12 += A12*B22
        matmul(A+3*(n/4), B+2*(n/4), R+2*(n/4), n/4);    // R21 += A22*B21
        matmul(A+3*(n/4), B+3*(n/4), R+3*(n/4), n/4);    // R22 += A22*B22
    }
}

    Divide matrices in sub-matrices and recursively multiply sub-matrices

Time Analysis

With 8 recursive multiplications of half-size matrices, T(n) = 8T(n/2) + Θ(n²) = Θ(n³), which is no better than the brute-force algorithm. Strassen's 7 multiplications give T(n) = 7T(n/2) + Θ(n²) = Θ(n^(log2 7)) ≈ Θ(n^2.807).

  • 7/31/2019 DSA Full Material

    33/49

Unit 7: TRANSFORM & CONQUER (3 hrs)
7.1 Balanced Search Trees, AVL Trees, 2-3 Trees, Splay Trees
7.2 Heaps and Heap Sort

    Heaps and Heapsort

    Definition A heap is a binary tree with keys at its nodes (one key per node) such that:

It is essentially complete, i.e., all its levels are full except possibly the last level, where only some rightmost keys may be missing

The key at each node is ≥ the keys at its children

Examples (levels listed top to bottom):

10 / 5, 7 / 4, 2, 1    - a heap
10 / 5, 7 / 2, _, 1    - not a heap (a missing key is not rightmost)
10 / 5, 7 / 6, 2, 1    - not a heap (child 6 exceeds its parent 5)

Note: A heap's elements are ordered top down (along any path down from its root), but they are not ordered left to right.

    Some Important Properties of a Heap

Given n, there exists a unique binary tree with n nodes that is essentially complete, with height h = ⌊log2 n⌋

    The root contains the largest key

    The subtree rooted at any node of a heap is also a heap

    A heap can be represented as an array

Heap's Array Representation

Store the heap's elements in an array (whose elements are indexed, for convenience, from 1 to n) in top-down, left-to-right order:

Left child of node j is at 2j

Right child of node j is at 2j+1

Parent of node j is at ⌊j/2⌋

Parental nodes are represented in the first ⌊n/2⌋ locations

    Heap Construction (bottom-up)

  • 7/31/2019 DSA Full Material

    34/49

    Step 0: Initialize the structure with keys in the order given

Step 1: Starting with the last (rightmost) parental node, fix the heap rooted at it if it doesn't satisfy the heap condition: keep exchanging it with its largest child until the heap condition holds

    Step 2: Repeat Step 1 for the preceding parental node

    Example of Heap Construction

    Construct a heap for the list 2, 9, 7, 6, 5, 8

(arrays shown in level order; the parental nodes are fixed right to left)

2 9 7 6 5 8  >  2 9 8 6 5 7   (heapify the subtree rooted at 7: exchange 7 with 8)
2 9 8 6 5 7  >  9 2 8 6 5 7   (fix the root: exchange 2 with 9)
9 2 8 6 5 7  >  9 6 8 2 5 7   (continue sifting 2 down: exchange 2 with 6)

The resulting heap, in level order, is 9 6 8 2 5 7.

    Heapsort

Stage 1: Construct a heap for a given list of n keys

    Stage 2: Repeat operation of root removal n-1 times:

    Exchange keys in the root and in the last (rightmost) leaf

    Decrease heap size by 1

If necessary, swap the new root with its larger child until the heap condition holds

    Sort the list 2, 9, 7, 6, 5, 8 by heapsort

Stage 1 (heap construction) is followed by Stage 2 (repeated root/max removal).

Both worst-case and average-case efficiency: Θ(n log n)
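The two stages above can be sketched as follows (illustrative names; 0-based array indexing instead of the 1-based convention used earlier):

```cpp
#include <vector>
#include <utility>

// Sift the key at index i down until the heap condition holds in A[0..n-1].
static void siftDown(std::vector<int>& A, int i, int n) {
    while (2 * i + 1 < n) {
        int child = 2 * i + 1;                       // left child (0-based)
        if (child + 1 < n && A[child + 1] > A[child])
            ++child;                                 // pick the larger child
        if (A[i] >= A[child]) break;                 // heap condition holds
        std::swap(A[i], A[child]);
        i = child;
    }
}

void heapsort(std::vector<int>& A) {
    int n = A.size();
    for (int i = n / 2 - 1; i >= 0; --i)   // Stage 1: bottom-up heap construction
        siftDown(A, i, n);
    for (int last = n - 1; last > 0; --last) {
        std::swap(A[0], A[last]);          // Stage 2: move the max to the end
        siftDown(A, 0, last);              // restore the heap on the shrunk prefix
    }
}
```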

Unit 8: DYNAMIC PROGRAMMING (3 hrs)
8.1 Warshall's and Floyd's Algorithms
8.2 Knapsack and Memory Functions

  • 7/31/2019 DSA Full Material

    35/49

Warshall's Algorithm

Main idea: a path exists between two vertices i, j iff

there is an edge from i to j; or

there is a path from i to j going through vertex 1; or

there is a path from i to j going through vertex 1 and/or 2; or

...

there is a path from i to j going through vertices 1, 2, ..., and/or k; or

...

there is a path from i to j going through any of the other vertices

    Idea: dynamic programming

Let V = {1, ..., n} and, for k ≤ n, Vk = {1, ..., k}

For any pair of vertices i, j ∈ V, consider all paths from i to j whose intermediate vertices are all drawn from Vk: Pijk = {p1, p2, ...}; if Pijk ≠ ∅, then Rk[i, j] = 1

For any pair of vertices i, j the answer is Rn[i, j], that is, the matrix Rn

Starting with R0 = A, the adjacency matrix, compute R1, ..., Rk-1, Rk, ..., Rn

    Idea: dynamic programming

If p ∈ Pijk, then p is a path from i to j with all intermediate vertices in Vk.

If k is not on p, then p is also a path from i to j with all intermediate vertices in Vk-1: p ∈ Pijk-1.

In the kth stage, determine whether a path exists between vertices i and j using just vertices among 1, ..., k:

R(k)[i,j] = R(k-1)[i,j]                          (a path using just 1, ..., k-1)
            or
            (R(k-1)[i,k] and R(k-1)[k,j])        (a path from i to k and from k to j,
                                                  each using just 1, ..., k-1)

    FLOYD WARSHALL ALGORITHM

    PROBLEM STATEMENT

    Find the shortest path between all pairs of vertices and determine the cost of each path.

  • 7/31/2019 DSA Full Material

    36/49

    AIM

To implement the Floyd-Warshall algorithm to find the shortest paths between all pairs of vertices and to determine the cost of each path.

    ALGORITHM

FLOYD-WARSHALL (W)

1. n = rows[W]
2. D(0) = W
3. for k = 1 to n
4.     for i = 1 to n
5.         for j = 1 to n
6.             d(k)[i, j] = min( d(k-1)[i, j], d(k-1)[i, k] + d(k-1)[k, j] )
7. return D(n)

  • 7/31/2019 DSA Full Material

    37/49

Unit 9: GREEDY TECHNIQUE (3 hrs)
9.1 Prim's Algorithm
9.2 Kruskal's Algorithm
9.3 Dijkstra's Algorithm

    Minimum Spanning Trees

    Spanning trees

A spanning tree of a graph is just a subgraph that contains all the vertices and is a tree. A graph may have many spanning trees; for instance, the complete graph on four vertices has sixteen spanning trees:

    Minimum spanning trees

Now suppose the edges of the graph have weights or lengths. The weight of a tree is just the sum of the weights of its edges. Obviously, different trees have different weights. The problem: how do we find the minimum weight spanning tree?

    Why minimum spanning trees?

The standard application is to a problem like phone network design. You have a business with several offices; you want to lease phone lines to connect them up with each other; and the phone company charges different amounts of money to connect different pairs of cities. You want a set of lines that connects all your offices with a minimum total cost. It should be a spanning tree, since if a network isn't a tree you can always remove some edges and save money.

A less obvious application is that the minimum spanning tree can be used to approximately solve the traveling salesman problem. A convenient formal way of defining this problem is to find the shortest path that visits each point at least once.

    How to find minimum spanning tree?

A better idea is to find some key property of the MST that lets us be sure that some edge is part of it, and use this property to build up the MST one edge at a time.

    Kruskal's algorithm

sort the edges of G in increasing order by length
keep a subgraph S of G, initially empty
for each edge e in sorted order
    if the endpoints of e are disconnected in S
        add e to S
return S

    Note that, whenever you add an edge (u,v), it's always the smallest connecting the part of Sreachable from u with the rest of G, so by the lemma it must be part of the MST.

  • 7/31/2019 DSA Full Material

    38/49

This algorithm is known as a greedy algorithm, because it chooses at each step the cheapest edge to add to S. In general, if you want to find a shortest path from a to b, it might be a bad idea to keep taking the shortest edges; the greedy idea only works in Kruskal's algorithm because of the key property we proved.

Analysis: The line testing whether two endpoints are disconnected looks like it should be slow (linear time per iteration, or O(mn) total). But actually there are some complicated data structures that let us perform each test in close to constant time; this is known as the union-find problem and is discussed in Baase section 8.5 (I won't get to it in this class, though). The slowest part turns out to be the sorting step, which takes O(m log n) time.

    Prim's algorithm

    Rather than build a subgraph one edge at a time, Prim's algorithm builds a tree one vertex at atime.

Prim's algorithm:

let T be a single vertex x
while (T has fewer than n vertices)
{
    find the smallest edge connecting T to G-T
    add it to T
}

Since each edge added is the smallest connecting T to G-T, the lemma we proved shows that we only add edges that should be part of the MST.

Again, it looks like the loop has a slow step in it. But again, some data structures can be used to speed this up. The idea is to use a heap to remember, for each vertex, the smallest edge connecting T with that vertex.

Prim with heaps:

make a heap of values (vertex, edge, weight(edge))
    initially (v, -, infinity) for each vertex
let tree T be empty
while (T has fewer than n vertices)
{
    let (v, e, weight(e)) have the smallest weight in the heap
    remove (v, e, weight(e)) from the heap
    add v and e to T
    for each edge f = (u,v)
        if u is not already in T
            find value (u, g, weight(g)) in the heap
            if weight(f) < weight(g)
                replace (u, g, weight(g)) with (u, f, weight(f))
}

Analysis: We perform n steps in which we remove the smallest element in the heap, and at most 2m steps in which we examine an edge f=(u,v). For each of those steps, we might replace a value on the heap, reducing its weight. (You also have to find the right value on the heap, but that can be done easily enough by keeping a pointer from the vertices to the corresponding values.) I haven't described how to reduce the weight of an element of a binary heap, but it's easy to do in O(log n) time. Alternately, by using a more complicated data structure known as a Fibonacci heap, you can reduce the weight of an element in constant time. The result is a total time bound of O(m + n log n).
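A common C++ shortcut replaces the decrease-key operation with lazy deletion from a priority_queue (a sketch; this is not the exact heap scheme described above, and the names are illustrative):

```cpp
#include <vector>
#include <queue>
#include <utility>
#include <functional>

// Prim's algorithm with a priority_queue and lazy deletion: instead of
// decreasing a vertex's key, push a fresh (weight, vertex) pair and skip
// stale entries when they surface. Returns the MST's total weight.
int primMST(const std::vector<std::vector<std::pair<int,int>>>& adj) { // adj[u] = {(v, w)}
    int n = adj.size(), total = 0, taken = 0;
    std::vector<bool> inTree(n, false);
    using Entry = std::pair<int,int>;                       // (weight, vertex)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
    pq.push({0, 0});                                        // start from vertex 0
    while (!pq.empty() && taken < n) {
        auto [w, u] = pq.top(); pq.pop();
        if (inTree[u]) continue;                            // stale entry: skip it
        inTree[u] = true; total += w; ++taken;
        for (auto [v, wv] : adj[u])
            if (!inTree[v]) pq.push({wv, v});
    }
    return total;
}
```

The queue may hold up to one entry per edge, giving O(m log m) = O(m log n) time, slightly worse than the decrease-key bound quoted above but much simpler to code.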


    The shortest path problem

Consider the problem of finding the shortest path between nodes s and t in a graph (directed or undirected). We already know an algorithm that will solve it for unweighted graphs: BFS.

Now, what if the edges have weights? Consider the dist[] array that we used in BFS to store the current shortest known distance from the source to all other vertices. BFS can be thought of as repeatedly taking the closest known vertex, u, and applying the following relaxation procedure to all of its neighbours, v:

bool relax( int u, int v ) {
    if( dist[v] > dist[u] + graph[u][v] ) {
        dist[v] = dist[u] + graph[u][v];
        return true;
    }
    return false;
}


There are several ways to implement Dijkstra's algorithm. The main challenge is maintaining a priority queue of vertices that provides 3 operations: inserting new vertices into the queue, removing the vertex with smallest dist[], and decreasing the dist[] value of some vertex during relaxation. We can use a set to represent the queue. This way, the implementation looks remarkably similar to BFS. In the following example, assume that graph[i][j] contains the weight of the edge (i, j).

Example 1: O(n² + (m+n) log(n)) Dijkstra's

int graph[128][128]; // -1 means "no edge"
int n;               // number of vertices (at most 128)
int dist[128];

// Compares 2 vertices first by distance and then by vertex number
struct ltDist {
    bool operator()( int u, int v ) const {
        return make_pair( dist[u], u ) < make_pair( dist[v], v );
    }
};

void dijkstra( int s ) {
    for( int i = 0; i < n; i++ ) dist[i] = INT_MAX;
    dist[s] = 0;
    set< int, ltDist > q;
    q.insert( s );
    while( !q.empty() ) {
        int u = *q.begin();    // like u = q.front()
        q.erase( q.begin() );  // like q.pop()
        for( int v = 0; v < n; v++ ) if( graph[u][v] != -1 ) {
            int newDist = dist[u] + graph[u][v];
            if( newDist < dist[v] ) { // relaxation
                if( q.count( v ) ) q.erase( v );
                dist[v] = newDist;
                q.insert( v );
            }
        }
    }
}

First, we define a comparator that compares vertices by their dist[] value. Note that we can't simply do "return dist[u] < dist[v];" because a set keeps only one copy of each unique element, and so using this simpler comparison would disallow vertices with the same dist[] value. Instead, we exploit the built-in lexicographic comparison for pairs. The dijkstra() function takes a source vertex and fills in the dist[] array with shortest path distances from s. First, all distances are initialized to infinity, except for dist[s], which is set to 0. Then s is added to the queue and we proceed like in BFS: remove the first vertex, u, and scan all of its neighbours, v. Compute the new distance to v, and if it's better than our current known distance, update it. The order of the 3 lines inside the innermost 'if' statement is crucial. Note that the set q is sorted by dist[] values, so we can't simply change dist[v] to a new value while v is in q. This is why we first need to remove v from the set, then change dist[v], and after that add it.


    The running time is n*log(n) for removing n vertices from the queue, plus m*log(n) forinserting into and updating the queue for each edge, plus n*n for running the 'for(v)' loop foreach vertex u. We can avoid the quadratic cost by using an adjacency list, for a total ofO((m+n)log(n)). Another way to implement the priority queue is to scan the dist[] array everytime to find the closest vertex, u.

    Example 2: O(n^2) Dijkstra's

int graph[128][128], n; // -1 means "no edge"
int dist[128];
bool done[128];

void dijkstra( int s ) {
    for( int i = 0; i < n; i++ ) {
        dist[i] = INT_MAX;
        done[i] = false;
    }
    dist[s] = 0;
    while( true ) {
        // find the vertex with the smallest dist[] value
        int u = -1, bestDist = INT_MAX;
        for( int i = 0; i < n; i++ ) if( !done[i] && dist[i] < bestDist ) {
            u = i;
            bestDist = dist[i];
        }
        if( bestDist == INT_MAX ) break;
        // relax neighbouring edges
        for( int v = 0; v < n; v++ ) if( !done[v] && graph[u][v] != -1 ) {
            if( dist[v] > dist[u] + graph[u][v] )
                dist[v] = dist[u] + graph[u][v];
        }
        done[u] = true;
    }
}

We have to introduce a new array, done[]. We could also call it "black[]" because it is true for those vertices that have left the queue. First, we initialize done[] to false and dist[] to infinity. Inside the main loop, we scan the dist[] array to find the vertex, u, with minimal dist[] value that is not black yet. If we can't find one, we break from the loop. Otherwise, we relax all of u's neighbouring edges. This seemingly low-tech method is actually pretty clever in terms of running time. The main while() loop executes at most n times because at the end we always set done[u] to true for some u, and we can only do that n times before they are all true. Inside the loop, we do O(n) work in two simple loops. The total is O(n²), which is faster than the first implementation as long as the graph is fairly dense (m > n²/log(n)). (This is if we do use an adjacency list in the first implementation; otherwise, the second one will almost always be faster.) Dijkstra's algorithm is very fast, but it suffers from its inability to deal with negative edge weights. Having negative edges in a graph may also introduce negative weight cycles that make us rethink the very definition of "shortest path".


Unit 9: GREEDY TECHNIQUE (3 hrs)
9.1 Prim's Algorithm
9.2 Kruskal's Algorithm
9.3 Dijkstra's Algorithm

Minimum Spanning Tree (MST)

A minimum spanning tree is a subgraph of an undirected weighted graph G such that:

it is a tree (i.e., it is acyclic)

it covers all the vertices V and contains |V| - 1 edges

the total cost associated with the tree edges is the minimum among all possible spanning trees

it is not necessarily unique

    Applications of MST

Any time you want to visit all vertices in a graph at minimum cost (e.g., wire routing on printed circuit boards, sewer pipe layout, road planning)

    Internet content distribution

    $$$, also a hot research topic

Idea: a publisher produces web pages; a content distribution network replicates the web pages to many locations so consumers can access them at higher speed

    MST may not be good enough!

    content distribution on minimum cost tree may take a long time!

Prim's Algorithm

Let V = {1, 2, ..., n}, let U be the set of vertices that makes up the MST, and let T be the MST.

Initially: U = {1} and T = ∅

while (U ≠ V)
    let (u,v) be the lowest cost edge such that u ∈ U and v ∈ V-U
    T = T ∪ {(u,v)}
    U = U ∪ {v}

Prim's Algorithm implementation

Initialization

a. Pick a vertex r to be the root
b. Set D(r) = 0, parent(r) = null
c. For all vertices v ∈ V, v ≠ r, set D(v) = ∞
d. Insert all vertices into priority queue P, using the distances as the keys

  • 7/31/2019 DSA Full Material

    43/49

    Vertex Parent

    e -

Prim's Algorithm

    1. Select the next vertex u to add to the tree

    u = P.deleteMin()

2. Update the weight of each vertex w adjacent to u which is not in the tree (i.e., w ∈ P)

If weight(u,w) < D(w),

    a.parent(w) = u

    b.D(w) = weight(u,w)

c. Update the priority queue to reflect the new distance for w

    Vertex Parent

    e -

    b e

    c e

    d e

The MST initially consists of the vertex e, and we update the distances and parents for its adjacent vertices

    The final minimum spanning tree

  • 7/31/2019 DSA Full Material

    44/49

    Vertex Parent

    e -

    b e

    c d

    d e

    a d

Running time of Prim's algorithm

    Initialization of priority queue (array): O(|V|)

    Update loop: |V| calls

Choosing the vertex with minimum cost edge: O(|V|)

Updating distance values of unconnected vertices: each edge is considered only once during the entire execution, for a total of O(|E|) updates

Overall cost: O(|E| + |V|²)

Another Approach: Kruskal's

    Create a forest of trees from the vertices

    Repeatedly merge trees by adding safe edges until only one tree remains

    A safe edge is an edge of minimum weight which does not create a cycle

    forest: {a}, {b}, {c}, {d}, {e}

  • 7/31/2019 DSA Full Material

    45/49

    Initialization

a. Create a set for each vertex v ∈ V

b. Initialize the set of safe edges A, comprising the MST, to the empty set.

    c. Sort edges by increasing weight

    {a}, {b}, {c}, {d}, {e}

A = ∅

    E= {(a,d), (c,d), (d,e), (a,c), (b,e), (c,e), (b,d), (a,b)}

For each edge (u,v) ∈ E, in increasing order, while more than one set remains:

If u and v belong to different sets

a. A = A ∪ {(u,v)}

b. merge the sets containing u and v

Return A

Use the Union-Find algorithm to efficiently determine if u and v belong to different sets

    Forest

    {a}, {b}, {c}, {d}, {e}

    {a,d}, {b}, {c}, {e}

    {a,d,c}, {b}, {e}

    {a,d,c,e}, {b}

    {a,d,c,e,b}

    A

    {(a,d)}

    {(a,d), (c,d)}

    {(a,d), (c,d), (d,e)}

    {(a,d), (c,d), (d,e), (b,e)}

    After each iteration, every tree in the forest is a MST of the vertices it connects

    Algorithm terminates when all vertices are connected into one tree

Like Dijkstra's algorithm, both Prim's and Kruskal's algorithms are greedy algorithms

The greedy approach works for the MST problem; however, it does not work for many other problems!

    Dijkstra's Algorithm

Dijkstra's algorithm (named after its discoverer, E.W. Dijkstra) solves the problem of finding the shortest path from a point in a graph (the source) to a destination. It turns out that one can find the shortest paths from a given source to all points in a graph in the same time; hence this problem is sometimes called the single-source shortest paths problem.

The somewhat unexpected result that all the paths can be found as easily as one further demonstrates the value of reading the literature on algorithms!

This problem is related to the spanning tree one. The graph representing all the paths from one vertex to all the others must be a spanning tree - it must include all vertices. There will also be


no cycles, as a cycle would define more than one path from the selected vertex to at least one other vertex. For a graph,

    G = (V,E) where V is a set of vertices and

    E is a set of edges.

    Dijkstra's algorithm keeps two sets of vertices:

S: the set of vertices whose shortest paths from the source have already been determined, and

V-S: the remaining vertices.

    The other data structures needed are:

d: an array of best estimates of the shortest path to each vertex

pi: an array of predecessors for each vertex

    The basic mode of operation is:

1. Initialise d and pi,
2. Set S to empty,
3. While there are still vertices in V-S,
   i. Sort the vertices in V-S according to the current best estimate of their distance from the source,
   ii. Add u, the closest vertex in V-S, to S,
   iii. Relax all the vertices still in V-S connected to u

    Relaxation

    The relaxation process updates the costs of all the vertices, v, connected to a vertex, u, if wecould improve the best estimate of the shortest path to v by including (u,v) in the path to v.

    The relaxation procedure proceeds as follows:

initialise_single_source( Graph g, Node s )
    for each vertex v in Vertices( g )
        g.d[v] := infinity
        g.pi[v] := nil
    g.d[s] := 0

This sets up the graph so that each node has no predecessor (pi[v] = nil) and the estimates of the cost (distance) of each node from the source (d[v]) are infinite, except for the source node itself (d[s] = 0).

Note that we have also introduced a further way to store a graph (or part of a graph, as this structure can only store a spanning tree): the predecessor sub-graph, the list of predecessors of each node, pi[j], 1 ≤ j ≤ |V|.

  • 7/31/2019 DSA Full Material

    47/49

    The algorithm itself is now:

    shortest_paths( Graph g, Node s )
        initialise_single_source( g, s )
        S := { }               /* Make S empty */
        Q := Vertices( g )     /* Put the vertices in a PQ */
        while not Empty( Q )
            u := ExtractCheapest( Q )
            AddNode( S, u )    /* Add u to S */
            for each vertex v in Adjacent( u )
                relax( u, v, w )
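
    The pseudocode above can be sketched as runnable Python. Using a binary heap (`heapq`) in place of the "sort V-S / ExtractCheapest" step is a common implementation choice rather than part of the pseudocode, and the adjacency-dictionary graph format is likewise an assumption of this sketch:

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths for non-negative edge weights.
    graph: {u: {v: weight, ...}, ...} (adjacency dictionaries)."""
    d = {v: float('inf') for v in graph}   # best estimates of distance
    pi = {v: None for v in graph}          # predecessor of each vertex
    d[source] = 0
    S = set()                              # vertices already settled
    pq = [(0, source)]                     # min-heap stands in for sorting V-S
    while pq:
        du, u = heapq.heappop(pq)          # ExtractCheapest
        if u in S:
            continue                       # skip stale heap entries
        S.add(u)                           # AddNode(S, u)
        for v, w in graph[u].items():      # relax all vertices adjacent to u
            if d[u] + w < d[v]:
                d[v] = d[u] + w
                pi[v] = u
                heapq.heappush(pq, (d[v], v))
    return d, pi
```

    On a small example graph, `dijkstra(graph, 's')` returns both the distance array d and the predecessor array pi, from which each shortest path can be read back by following predecessors to the source.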

    Operation of Dijkstra's Algorithm

    This sequence of diagrams illustrates the operation of Dijkstra's Algorithm:

    1. Initial graph: all nodes have infinite cost except the source.

    2. Choose the closest node to s. As we initialised d[s] to 0, it's s. Add it to S.

    3. Relax all nodes adjacent to s. Update the predecessor (red arrows) for all nodes updated.

    4. Choose the closest node, x. Relax all nodes adjacent to x. Update the predecessors for u, v and y.

    5. Now y is the closest; add it to S. Relax v and adjust its predecessor.

    6. u is now closest; choose it and adjust its neighbour, v.

    7. Finally, add v. The predecessor list now defines the shortest path from each node to s.

    Unit 10: BACKTRACKING, BRANCH & BOUND (5 hrs)
    10.1 n-queens problem
    10.2 Subset-sum problem
    10.3 Assignment problem
    10.4 Knapsack problem
    10.5 Travelling-salesman problem

    Sum of Subsets Problem.

    Given n distinct positive numbers (usually called weights), we desire to find all combinations of these numbers whose sums equal M. This is called the Sum of Subsets Problem.

    Ex : w=(5,7,10,12,16,18,20)

    M=35

    No. of elements in the set: n = 7

    The solution space therefore has 2^n = 2^7 = 128 subsets.


    We have to search the solution space to determine the solution of the problem instance.

    This searching is facilitated by using a tree organization for the solution space, which is called a state space tree. If a depth-first node-generation strategy is used for the generation of problem states, together with bounding functions, the method is called Backtracking.

    Many problems which deal with searching for a set of solutions satisfying some constraints can be solved using Backtracking.

    Conditions

    1) The weights are in non-decreasing order.
    2) w[1] <= m and w[1] + ... + w[n] >= m (otherwise no solution can exist).

    Algorithm SumOfSub(s, k, r)
    // s = sum of the weights included so far, r = w[k] + ... + w[n]
    // Initial call: SumOfSub(0, 1, w[1] + ... + w[n])
    {
        // Generate the left child: include w[k]
        x[k] := 1;
        if (s + w[k] = m) then
            write (x[1:k]);
        else if (s + w[k] + w[k+1] <= m) then
            SumOfSub(s + w[k], k+1, r - w[k]);

        // Generate the right child: exclude w[k]
        if ((s + r - w[k] >= m) and (s + w[k+1] <= m)) then
        {
            x[k] := 0;
            SumOfSub(s, k+1, r - w[k]);
        }
    }

    Comparison between Backtracking and Branch & Bound

    Both are used to generate problem states in a tree organization.

    A bounding function is used in both techniques to kill nodes.

    Backtracking is used for constraint-satisfaction problems, whereas Branch & Bound is used for optimization problems.

    The strategy used in Backtracking is depth-first search.
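
    Returning to the SumOfSub algorithm above, here is a minimal runnable Python sketch. It mirrors the recursive left-child/right-child structure, but uses 0-based indexing and adds the bounds checks that the 1-indexed pseudocode leaves implicit for w[k+1]:

```python
def sum_of_subsets(w, m):
    """Backtracking search for all subsets of w summing to m.
    Mirrors SumOfSub(s, k, r): s = sum chosen so far, k = current
    index, r = sum of the remaining weights w[k:]."""
    w = sorted(w)              # condition 1: non-decreasing order
    solutions = []
    x = [0] * len(w)           # x[i] = 1 iff w[i] is included

    def backtrack(s, k, r):
        # Left child: include w[k]
        x[k] = 1
        if s + w[k] == m:
            solutions.append([w[i] for i in range(k + 1) if x[i]])
        elif k + 1 < len(w) and s + w[k] + w[k + 1] <= m:
            backtrack(s + w[k], k + 1, r - w[k])
        # Right child: exclude w[k], only if m is still reachable
        if k + 1 < len(w) and s + r - w[k] >= m and s + w[k + 1] <= m:
            x[k] = 0
            backtrack(s, k + 1, r - w[k])

    # Condition 2: smallest weight <= m and total weight >= m
    if w and w[0] <= m and sum(w) >= m:
        backtrack(0, 0, sum(w))
    return solutions
```

    For the example above, `sum_of_subsets([5, 7, 10, 12, 16, 18, 20], 35)` enumerates every subset of the seven weights whose sum is exactly 35, pruning subtrees that can no longer reach m.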