String Matching Algorithms
• Introduction (32)
• Naïve String Matching Algorithm (32.1)
Introduction
• Definitions
- Formal Definition of String Matching Problem
- Assume the text is an array T[1..n] of length n and the pattern is an array P[1..m] of length m ≤ n
Explanation:
The text T is an array of n characters, and the pattern P is an array of m characters that is no longer than T. P is called the pattern because it holds the sequence of characters to be searched for in the (usually larger) array T.
• Definitions
- Alphabet
- It is assumed that the elements of P and T are drawn from a finite alphabet Σ.
- Example
Σ = {a, b, …, z}
Σ = {0, 1}
Explanation:
Σ defines which characters are allowed in both the text being searched and the pattern being searched for.
• Definitions
- Strings
- Σ* denotes the set of all finite-length strings formed using characters from the alphabet Σ
- The zero-length empty string, denoted ε, is a member of Σ*
- The length of a string x is denoted |x|
- The concatenation of two strings x and y, denoted xy, has length |x| + |y| and consists of the characters of x followed by the characters of y
• Definitions
- Shift
- P occurs with shift s in T if 0 ≤ s ≤ n − m and T[s+1..s+m] = P[1..m]
- If P occurs with shift s in T, then we call s a valid shift
- If P does not occur with shift s in T, we call s an invalid shift
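The shift definition can be checked directly in code. The following is a minimal Python sketch, not part of the original slides; the function name `is_valid_shift` and the sample strings are hypothetical, and Python's 0-based slicing stands in for the slides' 1-based T[s+1..s+m].

```python
def is_valid_shift(text: str, pattern: str, s: int) -> bool:
    """Return True if pattern occurs in text with shift s."""
    n, m = len(text), len(pattern)
    # A shift must satisfy 0 <= s <= n - m so the pattern fits inside the text.
    if s < 0 or s > n - m:
        return False
    # T[s+1..s+m] in 1-based notation corresponds to text[s:s + m] in Python.
    return text[s:s + m] == pattern


# Hypothetical sample data chosen for illustration.
text = "abcabaabcabac"
pattern = "abaa"
valid_shifts = [s for s in range(len(text) - len(pattern) + 1)
                if is_valid_shift(text, pattern, s)]
```

Every shift in the range 0..n − m that is not in `valid_shifts` is an invalid shift in the sense defined above.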
• Definitions
- String Concatenation Example
Σ = {A, B, C, D, E, H, 1, 2, 5, 6, 9}
String X = A125, |X| = 4
String Y = HE69D, |Y| = 5
The Concatenator
String Z = XY = A125HE69D, |Z| = 9
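The concatenation example can be reproduced in a few lines. This is only an illustrative Python sketch; the variable names mirror the strings X, Y and Z from the slide, and the empty string `""` is assumed to play the role of ε.

```python
x = "A125"
y = "HE69D"

# Concatenation xy: the characters of x followed by the characters of y.
z = x + y

# |xy| = |x| + |y|
length_rule_holds = len(z) == len(x) + len(y)

# The empty string has length 0 and leaves a string unchanged when concatenated.
empty = ""
identity_holds = (empty + x == x) and (x + empty == x)
```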
• Definitions
- Prefix
- String w is a prefix of a string x if x = wy for some string y ∈ Σ*
- w ⊏ x means that string w is a prefix of string x
Explanation:
If a string w is a prefix of string x, then there exists some string y that, when appended to the back of w, gives x, that is, wy = x.
[Diagram: String W followed by String Y = String X]
• Definitions
- Prefix Examples
To Prefix Or Not To Prefix
Σ = {A, B}   Σ* = {ε, A, B, AA, AB, BA, BB, …}
Examples:
String x = AABBAABBABAB
String w = AABBAA
Is w ⊏ x? Why?
• Definitions
- Prefix Examples
To Prefix Or Not To Prefix
Σ = {A, B}   Σ* = {ε, A, B, AA, AB, BA, BB, …}
Examples:
String x = AABBAABBABAB
String w = AABABAA
Is w ⊏ x? Why?
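One way to check the two prefix examples is to test the definition directly: w is a prefix of x exactly when the first |w| characters of x equal w. The sketch below is an illustration in Python, and the helper name `is_prefix` is hypothetical; Python's built-in `str.startswith` performs the same test.

```python
def is_prefix(w: str, x: str) -> bool:
    """Return True if x = wy for some (possibly empty) string y."""
    # The first |w| characters of x must be exactly w.
    return len(w) <= len(x) and x[:len(w)] == w


x = "AABBAABBABAB"
first_example = is_prefix("AABBAA", x)    # w from the first example
second_example = is_prefix("AABABAA", x)  # w from the second example
```

The first w matches the opening six characters of x, while the second w differs from x at its fourth character, so only the first is a prefix.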
• Definitions
- Suffix
- String w is a suffix of a string x if x = yw for some string y ∈ Σ*
- w ⊐ x means that string w is a suffix of string x
Explanation:
If a string w is a suffix of string x, then there exists some string y that, when added onto the front of w, gives x, that is, yw = x.
[Diagram: String Y followed by String W = String X]
• Definitions
- Suffix Examples
Et Tu Suffix?
Σ = {A, B}   Σ* = {ε, A, B, AA, AB, BA, BB, …}
Examples:
String x = AABBAABBABAB
String w = BABBA
Is w ⊐ x? Why?
• Definitions
- Suffix Examples
Et Tu Suffix?
Σ = {A, B}   Σ* = {ε, A, B, AA, AB, BA, BB, …}
Examples:
String x = AABBAABBABAB
String w = BABAB
Is w ⊐ x? Why?
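The suffix examples can be checked the same way, by comparing w with the last |w| characters of x. This is a small Python sketch for illustration; the helper name `is_suffix` is hypothetical, and Python's built-in `str.endswith` performs the same test.

```python
def is_suffix(w: str, x: str) -> bool:
    """Return True if x = yw for some (possibly empty) string y."""
    # The last |w| characters of x must be exactly w.
    return len(w) <= len(x) and x[len(x) - len(w):] == w


x = "AABBAABBABAB"
first_example = is_suffix("BABBA", x)   # w from the first example
second_example = is_suffix("BABAB", x)  # w from the second example
```

The first w ends in A while x ends in B, so it cannot be a suffix; the second w equals the last five characters of x.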
Naïve String Matching Algorithm
• Definitions
- Formal Definition of String Matching Problem
- Assume the text is an array T[1..n] of length n and the pattern is an array P[1..m] of length m ≤ n
Explanation:
The text T is an array of n characters, and the pattern P is an array of m characters that is no longer than T. P is called the pattern because it holds the sequence of characters to be searched for in the (usually larger) array T.
• Basic Explanation
The Naïve String Matching Algorithm takes the pattern being searched for and slides it across the “base” string (the text), looking for a match at each position. It keeps track of how many positions the pattern has been shifted in the variable s, and whenever a match is found it prints the statement “Pattern occurs with shift s”. This algorithm is also sometimes known as the Brute Force algorithm.
• Algorithm Pseudo Code
NAÏVE-STRING-MATCHER(T, P)
    n ← length[T]
    m ← length[P]
    for s ← 0 to n − m
        do if P[1..m] = T[s+1..s+m]
            then print “Pattern occurs with shift” s
- Examples
  - Demonstration 1
  - Demonstration 2
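The pseudocode above translates almost line for line into Python. The following is a minimal sketch under two assumptions that are not in the slides: strings are 0-indexed, so T[s+1..s+m] becomes `text[s:s + m]`, and the valid shifts are collected in a list and returned instead of being printed. The function name `naive_string_matcher` and the sample strings are chosen for illustration.

```python
from typing import List


def naive_string_matcher(text: str, pattern: str) -> List[int]:
    """Return every valid shift s at which pattern occurs in text."""
    n = len(text)
    m = len(pattern)
    shifts = []
    # Try every shift s = 0, 1, ..., n - m (an empty range if m > n).
    for s in range(n - m + 1):
        # Compare the whole pattern against the window text[s..s+m-1].
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


# Hypothetical sample data chosen for illustration.
for s in naive_string_matcher("abcabaabcabac", "abaa"):
    print("Pattern occurs with shift", s)
```

Returning a list rather than printing makes the result easy to check; overlapping occurrences are reported too, since the pattern is shifted by one position at a time.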
• Algorithm Time Analysis
- The worst case occurs when the pattern matches, or almost matches, at every shift, so that all m characters must be compared each time. An example is searching for the pattern a^m (m copies of the character a) in the text a^n (n copies of a).
• Algorithm Time Analysis
The algorithm is O((n − m + 1) · m)
n = length of the string being searched
m = length of the substring being compared
The + 1 comes from inclusive subtraction: the shifts s = 0, 1, …, n − m make n − m + 1 candidate positions, and each one may need up to m character comparisons.
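The (n − m + 1) · m bound can be observed by counting character comparisons. The Python sketch below is an illustration only; it assumes the window comparison is done character by character and stops at the first mismatch, which the slides do not spell out, and the name `count_comparisons` is hypothetical.

```python
def count_comparisons(text: str, pattern: str) -> int:
    """Count character comparisons made by a naive matcher."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for s in range(n - m + 1):       # n - m + 1 candidate shifts
        for j in range(m):           # up to m comparisons per shift
            comparisons += 1
            if text[s + j] != pattern[j]:
                break                # stop at the first mismatch
    return comparisons


n, m = 10, 3
worst = count_comparisons("a" * n, "a" * m)       # pattern a^m in text a^n
best = count_comparisons("a" * n, "b" + "a" * 2)  # mismatch on the first character
```

With n = 10 and m = 3, the worst-case input performs (10 − 3 + 1) · 3 = 24 comparisons, while an input that fails on the first character of every window performs only n − m + 1 = 8.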
Comments:
- The Naïve String Matcher is not an optimal solution
- It is inefficient because information gained about the text for one value of s is entirely ignored when considering other values of s.
• Questions