KMP by Rajeev K Krishna

Transcript of KMP by Rajeev K Krishna

  • 8/14/2019 KMP by Rajeev K Krishna

    Introduction Knuth-Morris-Pratt algorithm Final remarks

    The Knuth-Morris-Pratt algorithm

    Kshitij Bansal

    Chennai Mathematical Institute

    Undergraduate Student

    August 25, 2008

    Introduction
        The Problem
        Naive algorithm

    Knuth-Morris-Pratt algorithm
        Can we do better?
        Improved algorithm
        Computing f

    Final remarks
        Conclusion
        Think/read about
        References
        Thank you

    The Problem

    Given a piece of text, find if a smaller string occurs in it.

    Let T[1..n] be an array that holds the text. Call the smaller string to be searched for the pattern: P[1..m].

    Naive Algorithm

    Naive-String-Matcher(T[1..n],P[1..m])

    for i from 0 to n - m:
        if P[1 . . . m] = T[i + 1 . . . i + m]:
            print "Pattern found starting at position", i + 1

    Time complexity: O(mn). Space complexity: O(1).
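    As a rough illustration, the naive matcher might be written in Python as below. This is only a sketch: the function name naive_string_matcher is made up for this example, Python strings are 0-indexed, and the function collects 1-based start positions (shift i corresponds to position i + 1) instead of printing them.

```python
def naive_string_matcher(text, pattern):
    """Return the 1-based start positions of pattern in text (illustrative sketch)."""
    n, m = len(text), len(pattern)
    positions = []
    for i in range(n - m + 1):  # shift i runs from 0 to n - m
        # Compare P[1..m] with T[i+1..i+m] character by character: O(m) per shift.
        if all(text[i + j] == pattern[j] for j in range(m)):
            positions.append(i + 1)  # shift i means the match starts at position i + 1
    return positions


print(naive_string_matcher("abaabaabac", "abaabac"))  # prints [4]
```

    Each of the n - m + 1 shifts can cost up to m comparisons, which is where the O(mn) bound comes from.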

    Can we do better?

    Text: abaabaabac

    Pattern: abaabac

    The naive algorithm fails when it tries to match the seventh character of the text against the pattern. It then discards everything it learned from the first seven characters of the text and starts afresh. Can we somehow use this information to do better?

    Improved algorithm

    Improved-String-Matcher(T[1..n],P[1..m],f[1..m])

    1 q = 0 (number of characters matched)
    2 for i from 1 to n:
    3     if P[q + 1] = T[i]:
    4         q = q + 1
    5     else if q > 0:
    6         q = f[q]
    7         goto line 3
    8     if q = m:
    9         print "Pattern found starting at position", i - m + 1
    10        q = f[m]

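    Does this really help? What happens when the text is abaabadaba . . .?

    Note that q can increase by at most 1 at each step. Every execution of line 6 strictly decreases q, so line 6 runs at most n times in total and the whole loop takes O(n) time.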
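    The improved matcher could be sketched in Python roughly as follows. Several things here are assumptions made for illustration: the function name, the while loop standing in for the goto, the explicit match report when q reaches m, and the convention that f carries an unused entry f[0] so that its indices line up with the 1-based pseudocode. The table used in the example was worked out by hand for the pattern abaabac, and the pattern is assumed to be non-empty.

```python
def improved_string_matcher(text, pattern, f):
    """Return the 1-based start positions of a non-empty pattern in text.

    f[q] is the fallback value for q matched characters; f[0] is unused
    padding so that indices agree with the 1-based pseudocode.
    """
    n, m = len(text), len(pattern)
    matches = []
    q = 0  # number of pattern characters currently matched
    for i in range(1, n + 1):
        # Fall back while P[q+1] disagrees with T[i] (plays the role of the goto).
        while q > 0 and pattern[q] != text[i - 1]:
            q = f[q]
        if pattern[q] == text[i - 1]:
            q += 1
        if q == m:
            matches.append(i - m + 1)  # full match ending at position i
            q = f[m]  # keep going to find later, possibly overlapping, matches
    return matches


# Hand-computed table for the pattern "abaabac" (index 0 is padding).
f = [0, 0, 0, 1, 1, 2, 3, 0]
print(improved_string_matcher("abaabaabac", "abaabac", f))  # prints [4]
print(improved_string_matcher("abaabadaba", "abaabac", f))  # prints []
```

    On the text abaabadaba the mismatch at the seventh character makes q fall from 6 to 3, then 1, then 0, but the text index i never moves backwards.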

    Computing f

    f[i] denotes the length of the longest proper suffix of P[1 . . . i] that is also a prefix of P[1 . . . m].

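    Just match the pattern against itself: run the improved matcher with P[2 . . . m] as the text and P as the pattern, and record the value of q after each step as f[i].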
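    One possible way to compute f along these lines is sketched below. The function name compute_f and the unused padding entry f[0] are assumptions made for this example; the loop is the same fallback loop as in the matcher, applied to the pattern itself.

```python
def compute_f(pattern):
    """Return f as a list where f[i] follows the 1-based definition.

    f[0] is unused padding; f[i] is the length of the longest proper
    suffix of P[1..i] that is also a prefix of P.
    """
    m = len(pattern)
    f = [0] * (m + 1)
    q = 0  # length of the prefix currently matched against the pattern itself
    for i in range(2, m + 1):
        # Fall back using the part of the table that is already filled in.
        while q > 0 and pattern[q] != pattern[i - 1]:
            q = f[q]
        if pattern[q] == pattern[i - 1]:
            q += 1
        f[i] = q
    return f


print(compute_f("abaabac")[1:])  # prints [0, 0, 1, 1, 2, 3, 0]
```

    For example, f[6] = 3 because aba is the longest proper suffix of abaaba that is also a prefix of the pattern.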

    Conclusion

    KMP algorithm

    Time complexity: O(n + m)

    Space complexity: O(m)

    Think/read about

    Number of distinct substrings in a string

    Multiple patterns and single text

    Matching regular expressions

    References

    Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein (2001). Section 32.4: The Knuth-Morris-Pratt algorithm. Introduction to Algorithms, Second edition, MIT Press and McGraw-Hill, 923-931. ISBN 978-0-262-03293-3.

    The original paper: Donald Knuth, James H. Morris, Jr., Vaughan Pratt (1977). Fast pattern matching in strings. SIAM Journal on Computing 6 (2): 323-350. doi:10.1137/0206024.

    An explanation of the algorithm and sample C++ code by David Eppstein: http://www.ics.uci.edu/~eppstein/161/960227.html

    Thank you

    Questions?

    The End
