String Matching Chapter 32 Highlights Charles Tappert Seidenberg School of CSIS, Pace University.

String Matching Problem in this chapter

Problem: Find all valid shifts s with which a given pattern P occurs in a given text T

This problem occurs in text editing, DNA sequence searches, and Internet search engines

Example:

String Matching Algorithms: Preprocessing & Matching Times

Notation and Terminology

Σ* (Sigma-star) = set of all finite-length strings over the alphabet Σ (ε, epsilon, is the empty string)

String w is a prefix of string x, denoted w ⊏ x, if x = wy for some string y

String w is a suffix of string x, denoted w ⊐ x, if x = yw for some string y

Example: ab ⊏ abcca and cca ⊐ abcca
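
The prefix and suffix relations can be checked directly in code. This is a minimal sketch; the helper names is_prefix and is_suffix are illustrative and do not come from the slides.

```python
def is_prefix(w: str, x: str) -> bool:
    """True if w is a prefix of x, i.e. x = w + y for some string y."""
    return x.startswith(w)


def is_suffix(w: str, x: str) -> bool:
    """True if w is a suffix of x, i.e. x = y + w for some string y."""
    return x.endswith(w)


# The example from the slide.
print(is_prefix("ab", "abcca"), is_suffix("cca", "abcca"))
```

Note that the empty string is both a prefix and a suffix of every string.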

Problem Re-statement in notation/terminology

Denote the k-character prefix P[1..k] of pattern P by P_k

Similarly, denote the k-character prefix of text T by T_k

Matching problem: Given n = T.length and m = P.length, find all shifts s in the range 0 <= s <= n-m such that P ⊐ T_{s+m}
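
The re-statement translates almost literally into code: shift s is valid exactly when P is a suffix of the prefix T_{s+m}. The sketch below assumes 0-based Python slicing, so T[:s + m] plays the role of T_{s+m}; the function name valid_shifts and the sample text are illustrative assumptions.

```python
def valid_shifts(T: str, P: str) -> list[int]:
    """Return every shift s in 0..n-m for which P is a suffix of T[:s+m]."""
    n, m = len(T), len(P)
    return [s for s in range(n - m + 1) if T[:s + m].endswith(P)]


print(valid_shifts("abcabaabcabac", "abaa"))  # [3]
```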

Naïve String Match Algorithm

sliding “template” pattern match
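
A minimal sketch of the sliding-template idea, assuming 0-based shifts: the pattern is laid over the text at each shift s and compared character by character until a mismatch or a full match. The function name is an assumption, not taken from the slides.

```python
def naive_string_matcher(T: str, P: str) -> list[int]:
    """Try every shift s and compare P against T[s..s+m-1] left to right."""
    n, m = len(T), len(P)
    shifts = []
    for s in range(n - m + 1):
        j = 0
        # Advance while the template and the text agree.
        while j < m and T[s + j] == P[j]:
            j += 1
        if j == m:  # all m characters matched at this shift
            shifts.append(s)
    return shifts


print(naive_string_matcher("abcabaabcabac", "abaa"))  # [3]
```

In the worst case every shift compares all m characters, giving O((n-m+1)m) matching time with no preprocessing.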

Problem 1-1 How many template comparisons are made? How many were matches and how many non-matches? How many computation units are used?

Problem 1-2 How many computation units are used?

Finite Automata Algorithm

Efficient: examines each text character exactly once, so matching takes Θ(n) time after the automaton is built

Finite Automata Algorithm

Example: simple two-state finite automaton:

Transition function (delta)

State transition diagram

Finite Automata Algorithm

Final-state function (phi): φ(w) is the state the automaton is in after scanning the string w

Finite Automata Algorithm

Construct the automaton

Suffix function (small sigma): σ(x) is the length of the longest prefix of P that is also a suffix of x
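
The suffix function can be sketched by brute force: try prefix lengths from longest to shortest and return the first one that is a suffix of x. The function name and the test pattern "ab" are illustrative assumptions.

```python
def suffix_function(P: str, x: str) -> int:
    """sigma(x): length of the longest prefix of P that is a suffix of x."""
    k = min(len(P), len(x))
    # k = 0 always succeeds, because the empty string is a suffix of x.
    while not x.endswith(P[:k]):
        k -= 1
    return k


# For P = "ab": sigma("") = 0, sigma("ccaca") = 1, sigma("ccab") = 2.
print([suffix_function("ab", x) for x in ("", "ccaca", "ccab")])
```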

Finite Automata Algorithm

Construct the automaton

Example: automaton for P = ababaca; state m = 7 is the only accepting state

Finite Automata Algorithm: Critical transition function (delta)

Transition function (delta) obtained from the suffix function (small sigma): δ(q, a) = σ(P_q a)

Finite Automata Algorithm: Matching operation

Transition function (delta)
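
The matching operation can be sketched as a single pass over the text that follows the transition function and reports a shift whenever the accepting state m is reached. The sketch assumes delta is a list of dicts indexed by state, that characters outside the alphabet send the automaton back to state 0, and that shifts are 0-based. The hand-built table DELTA_AB for the pattern "ab" is an illustrative assumption.

```python
def finite_automaton_matcher(T: str, delta: list[dict[str, int]], m: int) -> list[int]:
    """Scan T once; delta[q][a] is the next state from state q on character a."""
    q = 0
    shifts = []
    for i, ch in enumerate(T):
        # Characters not in the pattern's alphabet cannot extend any match.
        q = delta[q].get(ch, 0)
        if q == m:  # accepting state: P ends at position i
            shifts.append(i + 1 - m)
    return shifts


# Hand-built transition table for P = "ab" over the alphabet {a, b}.
DELTA_AB = [
    {"a": 1, "b": 0},  # state 0: nothing matched yet
    {"a": 1, "b": 2},  # state 1: "a" matched
    {"a": 1, "b": 0},  # state 2: "ab" matched (accepting)
]

print(finite_automaton_matcher("abcabab", DELTA_AB, 2))  # [0, 3, 5]
```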

Finite Automata Algorithm: Compute transition function

Transition function (delta)
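
A straightforward way to build the table is to apply δ(q, a) = σ(P_q a) for every state q and every character a. The sketch below is the simple O(m^3 |Σ|) construction, not an optimized one; the function name and the list-of-dicts layout are assumptions.

```python
def compute_transition_function(P: str, alphabet: str) -> list[dict[str, int]]:
    """Return delta where delta[q][a] = sigma(P[:q] + a) for q = 0..m."""
    m = len(P)
    delta = []
    for q in range(m + 1):
        row = {}
        for a in alphabet:
            # Largest k such that P[:k] is a suffix of P[:q] + a.
            k = min(m, q + 1)
            while not (P[:q] + a).endswith(P[:k]):
                k -= 1
            row[a] = k
        delta.append(row)
    return delta


delta = compute_transition_function("ababaca", "abc")
print(delta[5])  # {'a': 1, 'b': 4, 'c': 6}
```

For P = ababaca the table has m + 1 = 8 rows, one per state 0 through 7.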

Finite Automata Algorithm

Problem 3-1

Problem 3-2