KMP Algorithm - DSA

Transcript of KMP Algorithm - DSA

  • KMP Algorithm: Knuth-Morris-Pratt

  • String Searching Problem

    Input: A word string W and a text string S

    Check if W exists as a substring of S, and if it does then return its location.

    Output: The position in S at which W is found
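    As a quick illustration of the problem statement only (not of KMP itself), Python's built-in `str.find` returns the lowest index at which a substring starts, or -1 if it is absent. The strings are the ones used on the brute-force slides; the variable name `position` is just for this sketch.

```python
# Problem statement only: the built-in str.find already answers the question.
S = "ABCABCABABAC"    # text string (taken from the brute-force slides)
W = "CABA"            # word string to look for

position = S.find(W)  # lowest index where W starts in S, or -1 if absent
print(position)       # 5
```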

  • Brute Force

    S: A B C A B C A B A B A C

    W: C A B A

  • Worst Case of Brute Force

    S: A A A A A A A A A A A A

    W: A A A C

    If |S|=n, |W|=m then the algorithm runs in O(mn) time.
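    A minimal sketch of the brute-force search, assuming 0-based positions and counting character comparisons so the O(mn) behaviour is visible. The function name `brute_force_search` and the comparison counter are illustrative additions, not part of the slides.

```python
def brute_force_search(S, W):
    """Return (first match position or -1, number of character comparisons)."""
    n, m = len(S), len(W)
    comparisons = 0
    for start in range(n - m + 1):       # every possible alignment of W in S
        j = 0
        while j < m:
            comparisons += 1
            if S[start + j] != W[j]:
                break                    # mismatch: slide W one step to the right
            j += 1
        if j == m:
            return start, comparisons    # all m characters matched
    return -1, comparisons


print(brute_force_search("ABCABCABABAC", "CABA"))  # (5, 12)
print(brute_force_search("A" * 12, "AAAC"))        # (-1, 36)
```

    In the worst-case input every one of the n-m+1 = 9 alignments costs m = 4 comparisons, which is where the 36 comes from.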

  • Better Algorithms

    Backward Algorithm
    Boyer and Moore Algorithm
    Colussi Algorithm
    Crochemore and Perrin Algorithm
    Galil-Giancarlo Algorithm
    Galil and Seiferas Algorithm
    Horspool Algorithm
    Knuth, Morris and Pratt Algorithm
    KMP Skip Algorithm
    Max-Suffix Matching Algorithm
    Morris and Pratt Algorithm
    Quick Search Algorithm
    Raita Algorithm
    Reverse Factor Algorithm
    Reverse Colussi Algorithm
    Self Max-Suffix Algorithm
    Simon Algorithm
    Skip Search Algorithm
    Smith Algorithm
    Tuned Boyer and Moore Algorithm
    Two Way Algorithm
    Uniqueness Algorithm
    Wide Window Algorithm
    Zhu and Takaoka Algorithm

  • KMP

    Runs in linear time

    Never re-examines characters of S that have already been matched, i.e. backtracking in S never occurs

    Time: O(m+n) Space: O(m) for the table T, in addition to the input strings

  • KMP

    Differs from brute force by always keeping track of the information that it gains from previous comparisons

    A failure function or partial match table (T) is computed, which tells us how much of the last comparison can be reused if it fails

    T[i] = the length of the longest prefix of W that is also a proper suffix of W[0..i]
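    To make the definition of T concrete, the sketch below computes the table directly from the definition by trying every candidate length. It is deliberately naive and is not the efficient prefix function used by KMP; the name `table_by_definition` is made up for this illustration.

```python
def table_by_definition(W):
    """T[i] = length of the longest prefix of W that is also a proper suffix of W[0..i]."""
    T = []
    for i in range(len(W)):
        piece = W[: i + 1]                 # W[0..i]
        best = 0
        for k in range(1, len(piece)):     # k < len(piece), so the suffix is proper
            if piece[:k] == piece[-k:]:    # prefix of length k equals suffix of length k
                best = k
        T.append(best)
    return T


print(table_by_definition("ABCABA"))    # [0, 0, 0, 1, 2, 1]
print(table_by_definition("ACAACACB"))  # [0, 0, 1, 1, 2, 3, 2, 0]
```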

  • KMP

    T shows how much of the beginning of W matches up to the portion of S immediately preceding the failed comparison.

    S:           . . A B C A B C A B A .
    W:               A B C A B A
    W (shifted):           A B C A B A

    No need to repeat the comparisons for the leading "A B" of the shifted W

    Resume comparing at the character of S where the mismatch occurred
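    For the alignment on this slide, the amount of reuse and the resulting shift can be read off the table. The sketch below assumes the table for W = "ABCABA" has been worked out by hand as [0, 0, 0, 1, 2, 1]; the names `reuse` and `shift` are illustrative.

```python
W = "ABCABA"
T = [0, 0, 0, 1, 2, 1]   # failure table for W, worked out by hand (assumption of this sketch)

q = 5                    # mismatch at W[5]; W[0..4] = "ABCAB" had matched
reuse = T[q - 1]         # 2: the prefix "AB" is already known to match
shift = q - reuse        # 3: slide W three positions to the right

print(reuse, shift)      # 2 3
print(W[:reuse])         # AB
```

    After the shift, comparison resumes at W[reuse] against the same character of S that caused the mismatch.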

  • Sliding Window Approach

    Nearly all exact string matching algorithms use the sliding window approach

    Whenever a mismatch is found, slide the window to the right

  • Suffix to Prefix Rule

    For a window to have any chance to match a pattern, there must be a suffix of the window which is equal to a prefix of the pattern.

  • KMP example

    m: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9

    S: A B C A B C D A A B C D A B C D A D

    W: A B C D A D

    i: 0 1 2 3 4 5

  • KMP

    Calculating the longest valid suffix at every mismatch during the search would be very inefficient

    Pre-processing eliminates the problem: the matched part of S is a copy of a prefix of W, so the suffix can be found by examining W alone

  • KMP

    The algorithm preprocesses the word W to produce the prefix function, which determines how far the pattern can be shifted for every possible location of a mismatch. After a mismatch at W[q] with q > 0, the pattern shifts by q - T[q-1] positions.

  • Components of KMP

    Compute Prefix Function: For a given W, compute a table T of equal length where T[i] gives the length of the longest prefix of W that is also a proper suffix of W[0..i].

    KMP Matcher Function: Actual searching.

  • Example of a prefix function

    T is filled in one entry at a time, from left to right.

    W: A C A A C A C B

    T: 0 0 1 1 2 3 2 0

  • Example

    S: A C B A C A A C A A C A C A A C A B

    W: A C A A C A B

    T: 0 0 1 1 2 3 0

  • Matcher Function

    KMP(String S, String W):
        set n to length of S, m to length of W
        set T to prefixFunc(W)          //Compute the partial match table
        set q to 0                      //Candidate character of W, initially 0
        for every i in range 0 to n-1
            while q>0 and W[q] is not equal to S[i]
                set q to T[q-1]         //Mismatch, backtrack if you can
            if W[q] is equal to S[i]
                increment q             //Match, move to next character
            if q is equal to m
                print i-m+1             //Entire W has been found
                set q to T[q-1]         //Find others
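    The matcher pseudocode translates almost line for line into Python. This sketch assumes the table T is passed in already computed, collects positions in a list instead of printing them, and returns an empty list for an empty W; those three choices, and the name `kmp_search`, are not from the slides.

```python
def kmp_search(S, W, T):
    """Return every index in S where W starts, given W's failure table T."""
    n, m = len(S), len(W)
    if m == 0:
        return []                        # assumption: an empty word has no matches
    matches = []
    q = 0                                # number of characters of W matched so far
    for i in range(n):
        while q > 0 and W[q] != S[i]:
            q = T[q - 1]                 # mismatch: fall back in W, i never moves back
        if W[q] == S[i]:
            q += 1                       # match: move to the next character of W
        if q == m:
            matches.append(i - m + 1)    # entire W found, ending at position i
            q = T[q - 1]                 # keep going to find further matches
    return matches


S = "ACBACAACAACACAACAB"
W = "ACAACAB"
T = [0, 0, 1, 1, 2, 3, 0]                # table from the example slide
print(kmp_search(S, W, T))               # [11]
```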

  • Prefix Function

    prefixFunc(List W):
        set m to length of W
        set T[0] to 0                   //Set first element of table to 0
        set k to 0                      //Candidate character, initially 0
        for every q in range 1 to m-1
            while k>0 and W[k] is not equal to W[q]
                set k to T[k-1]         //Mismatch, backtrack if possible
            if W[k] is equal to W[q]
                increment k             //Match, move to next character
            set T[q] to k               //Store result
        return T
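    A direct Python rendering of the prefix function pseudocode might look as follows. The name `prefix_function` is illustrative, and returning an empty table for an empty W is an assumption of this sketch.

```python
def prefix_function(W):
    """Build the failure table T for W in O(m) time."""
    m = len(W)
    T = [0] * m                      # T[0] is 0; an empty W yields an empty table
    k = 0                            # length of the prefix currently being extended
    for q in range(1, m):
        while k > 0 and W[k] != W[q]:
            k = T[k - 1]             # mismatch: fall back to a shorter prefix
        if W[k] == W[q]:
            k += 1                   # match: the prefix grows by one character
        T[q] = k                     # store result for W[0..q]
    return T


print(prefix_function("ACAACACB"))   # [0, 0, 1, 1, 2, 3, 2, 0]
print(prefix_function("ACAACAB"))    # [0, 0, 1, 1, 2, 3, 0]
```

    Both outputs agree with the tables shown on the example slides.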

  • Runtime Analysis

    Although the algorithm as implemented here contains a loop within a loop, it runs in linear time. The backtracking statement, which in effect shifts the sliding window to the right, can execute at most n times over the entire run of the for loop: q increases by at most 1 per iteration, every backtracking step decreases it by at least 1, and q never drops below 0. The rest of the for-loop body executes exactly n times, giving a runtime of O(n) for the matching function.

    Similar reasoning applies to the prefix function, which runs in O(m), so the whole algorithm takes O(m+n).
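    The bound on the backtracking statement can be checked empirically. The sketch below runs the matcher loop on the brute-force worst-case input and counts how often the while body executes. It assumes a non-empty W whose table has been worked out by hand; `count_backtracks` is a hypothetical helper written for this illustration.

```python
def count_backtracks(S, W, T):
    """Run the matcher loop and count executions of the backtracking statement."""
    q = 0
    backtracks = 0
    for ch in S:
        while q > 0 and W[q] != ch:
            q = T[q - 1]             # each execution lowers q by at least 1
            backtracks += 1
        if W[q] == ch:
            q += 1                   # q rises by at most 1 per character of S
        if q == len(W):
            q = T[q - 1]             # full match: continue searching
    return backtracks


S = "A" * 12
W = "AAAC"
T = [0, 1, 2, 0]                     # failure table for W, worked out by hand
print(count_backtracks(S, W, T), len(S))   # 9 12
```

    The count stays below n = 12 even on the input that forces brute force into its worst case.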