A simple fast hybrid pattern-matching algorithm

Department of Computer Science and Information Engineering National Cheng Kung University, Taiwan R.O.C.

Authors: Frantisek Franek, Christopher G. Jennings, W. F. Smyth

Publisher: Journal of Discrete Algorithms, 2007

Presenter: Chung-Chan Wu

Date: December 11, 2007

Outline

Introduction

Algorithm Description

• KMP (Knuth-Morris-Pratt)

• Boyer-Moore

• Sunday shift

The Hybrid Algorithm (FJS)

Extension

Experimental Results

Conclusions

Introduction

This contribution can be summarized as follows:

• To reduce processing time, we propose a mixture of Sunday’s variant of BM with KMP.

• Our goal is to combine the best/average-case advantages of Sunday’s algorithm (BMS) with the worst-case guarantees of KMP.

According to the experiments we have conducted, our new algorithm (FJS) is among the fastest in practice for the computation of all occurrences of a pattern p = p[1..m] in a text string x = x[1..n] on an alphabet Σ of size k.

KMP (Knuth-Morris-Pratt)

Main Features

• Performs the comparisons from left to right

• Preprocessing phase: O(m) space and time complexity

• Searching phase: O(m+n) time complexity

• A precomputed table, the π-table, tells the search where to resume in the pattern after a mismatch.

• The π value avoids another immediate mismatch:

• the character following the prefix in the pattern must differ from the character currently being compared.

• One of the best worst-case running times among software algorithms.

KMP (Knuth-Morris-Pratt)

index        0  1  2  3  4  5  6  7
pattern[i]   G  C  A  G  A  G  A  G
π-value     -1  0  0 -1  1 -1  1 -1

Input string: GCATCGCAGAGAGTATACAGTACG

Pattern: GCAGAGAG
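The π-table and the left-to-right search described on the KMP slides can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the presenter's code: the function names `kmp_next` and `kmp_search` are hypothetical, indexing is 0-based, and the table has m + 1 entries, one more than the eight shown on the slide.

```python
def kmp_next(p):
    """Build the pi-table for pattern p (hypothetical helper name).

    Entry 0 is -1. An entry is copied from a shorter border whenever the
    next pattern character would repeat the character that just mismatched,
    which is what avoids another immediate mismatch.
    """
    m = len(p)
    nxt = [-1] * (m + 1)
    i, j = 0, -1
    while i < m:
        # Fall back through shorter borders until p[i] can extend one.
        while j > -1 and p[i] != p[j]:
            j = nxt[j]
        i += 1
        j += 1
        if i < m and p[i] == p[j]:
            nxt[i] = nxt[j]   # same character: reuse the earlier fallback
        else:
            nxt[i] = j
    return nxt


def kmp_search(p, x):
    """Return the 0-based start positions of every occurrence of p in x."""
    m = len(p)
    if m == 0:
        return []
    nxt = kmp_next(p)
    found = []
    j = 0                     # current position in the pattern
    for i, c in enumerate(x): # the text is scanned strictly left to right
        while j > -1 and p[j] != c:
            j = nxt[j]
        j += 1
        if j == m:
            found.append(i - m + 1)
            j = nxt[j]
    return found


if __name__ == "__main__":
    print(kmp_next("GCAGAGAG")[:8])
    print(kmp_search("GCAGAGAG", "GCATCGCAGAGAGTATACAGTACG"))
```

For the slide's pattern GCAGAGAG, the first eight table entries reproduce the π-value row above, and the search reports a single occurrence starting at 0-based position 5 of the input string.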

Boyer-Moore

Main Features

• Performs the comparisons from right to left

• Preprocessing phase: O(m + σ) space and time complexity, where σ is the alphabet size

• Searching phase: O(mn) time complexity in the worst case

• Two precomputed tables, delta_1 (bad-character shift) and delta_2 (good-suffix shift).

• Performs well in the best and average cases.

BM - Observation 1

If char is known not to occur in the pattern, then we need not consider any occurrence of the pattern that overlaps char; the pattern can be shifted entirely past it.

bad-character shift (figure: the shifted pattern contains no b)

b does not occur in the pattern, use δ1

BM - Observation 2

If the rightmost occurrence of char in pattern is δ1 characters from the right end of pattern, then we know we can slide pattern down δ1 positions without checking for matches.

bad-character shift (figure: the part of the pattern to the right of its rightmost b contains no b)

b occurs in the pattern, use δ1

BM - Observation 3(a)

The good-suffix shift consists in aligning the segment y[i+j+1 … j+m-1] = x[i+1 … m-1] with its rightmost occurrence in x that is preceded by a character different from x[i]. (In Observations 3(a) and 3(b), y denotes the text and x the pattern.)

good-suffix shift

u reoccurs in pattern preceded by c ≠ a, use δ2

BM - Observation 3(b)

If no such segment exists, the shift consists in aligning the longest suffix v of y[i+j+1 … j+m-1] with a matching prefix of x.

good-suffix shift

Only a suffix v of u reoccurs in pattern, use δ2

Boyer-Moore Example

δ1      A   E   L   M   P   X   rest
shift   4   6   1   3   2   5   7

δ2      E   X   A   M   P   L   E
shift   12  11  10  9   8   7   1

Text: HERE IS A SIMPLE EXAMPLE

Pattern: EXAMPLE

Sunday Shift

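Pattern: example

character      A  E  L  M  P  X  rest
Boyer-Moore    4  6  1  3  2  5  7
Sunday shift   5  1  2  4  3  6  8

(figure: text shown as a prefix followed by the character b, aligned with the pattern "example", once for the Boyer-Moore shift and once for the Sunday shift)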
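The two shift tables compared on this slide differ only in which text character selects the shift: Boyer-Moore uses the character aligned with the last pattern position, Sunday uses the character just past the window. The sketch below is an assumption-laden illustration rather than the authors' code; the names `bm_bad_char`, `sunday_shift`, and `sunday_search` are hypothetical, and dictionaries stand in for the alphabet-indexed arrays.

```python
def bm_bad_char(p):
    """Boyer-Moore bad-character table: distance from the rightmost
    occurrence of each character (last position excluded) to the end.
    Characters missing from the dict shift by len(p)."""
    m = len(p)
    return {c: m - 1 - k for k, c in enumerate(p[:-1])}


def sunday_shift(p):
    """Sunday table: one more than the distance from the rightmost
    occurrence of each character to the end of the pattern.
    Characters missing from the dict shift by len(p) + 1."""
    m = len(p)
    return {c: m - k for k, c in enumerate(p)}


def sunday_search(p, x):
    """Return 0-based start positions of p in x using Sunday shifts only."""
    m, n = len(p), len(x)
    if m == 0:
        return []
    shift = sunday_shift(p)
    found = []
    i = 0
    while i + m <= n:
        if x[i:i + m] == p:
            found.append(i)
        if i + m >= n:        # no character follows the window
            break
        # Shift according to the text character just past the window.
        i += shift.get(x[i + m], m + 1)
    return found


if __name__ == "__main__":
    print(bm_bad_char("EXAMPLE"))
    print(sunday_shift("EXAMPLE"))
    print(sunday_search("EXAMPLE", "HERE IS A SIMPLE EXAMPLE"))
```

For the pattern EXAMPLE these functions reproduce both rows of the table above, with the "rest" column corresponding to the default values 7 and 8.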

FJS Algorithm

Definitions

• Search for p = p[1..m] in x = x[1..n]

• by shifting p from left to right along x.

• Position j = 1 of p is aligned with a position i ∈ 1..n − m + 1 in x.

• Partial match: if a mismatch occurs at j > 1, we say that a partial match has been determined with p[1..j − 1].

• i’ = i + m − j

(figure: positions i and i’ marked in x)

FJS Algorithm

Strategy

• Whenever no partial match of p with x[i..i + m − 1] has been found, Sunday shifts are performed to determine the next position i’ at which x[i’] = p[m]. When such an i’ has been found, KMP matching is then performed on p[1..m − 1] and x[i’ − m + 1..i’ − 1].

• If a partial match of p with x has been found, KMP matching is continued on p[1..m].

• Once a suitable i’ has been found, the first half of FJS just performs KMP matching in a different order:

• position m of p is compared first, followed by 1, 2, . . . , m − 1

FJS Algorithm

Pre-processing

• Sunday’s array Δ = Δ[1..k], computable in O(m + k) time.

• KMP array β’ = β’[1..m+1], computable in O(m) time.

FJS Algorithm

index        0  1  2  3  4  5  6  7
pattern[i]   G  C  A  G  A  G  A  G
π-value     -1  0  0 -1  1 -1  1 -1

δ1      A  C  G  rest
shift   2  7  1  9

Input string: GCATCGCAGAGAGTATACAGTACG

Pattern: GCAGAGAG
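The strategy from the preceding slides (Sunday shifts until the last pattern character matches, then KMP matching on the remaining positions) can be sketched in Python as below. This is a hedged reconstruction under stated assumptions, not the published implementation: the names `fjs_search` and `beta_prime` are hypothetical, indexing is 0-based, a dictionary replaces the alphabet-indexed array Δ, and explicit bounds checks are added where needed.

```python
def beta_prime(p):
    """KMP array with m + 1 entries (entry 0 is -1); hypothetical name."""
    m = len(p)
    b = [-1] * (m + 1)
    i, j = 0, -1
    while i < m:
        while j > -1 and p[i] != p[j]:
            j = b[j]
        i += 1
        j += 1
        b[i] = b[j] if i < m and p[i] == p[j] else j
    return b


def fjs_search(p, x):
    """Return 0-based start positions of every occurrence of p in x."""
    m, n = len(p), len(x)
    if m < 1 or m > n:
        return []
    b = beta_prime(p)
    delta = {c: m - k for k, c in enumerate(p)}  # Sunday shifts
    found = []
    last = m - 1          # index of the last pattern character
    i = j = 0             # text position i is compared with p[j]
    ip = last             # text position aligned with p[last]
    while ip < n:
        if j <= 0:
            # No partial match: Sunday shifts until x[ip] == p[last].
            while p[last] != x[ip]:
                if ip + 1 >= n:
                    return found
                ip += delta.get(x[ip + 1], m + 1)
                if ip >= n:
                    return found
            # Last character matches; now compare p[0..m-2] left to right.
            j = 0
            i = ip - last
            while j < last and x[i] == p[j]:
                i += 1
                j += 1
            if j == last:
                found.append(i - last)
                i += 1
                j += 1
            if j <= 0:
                i += 1
            else:
                j = b[j]
        else:
            # A partial match exists: continue plain KMP on the full pattern.
            while j < m and x[i] == p[j]:
                i += 1
                j += 1
            if j == m:
                found.append(i - m)
            j = b[j]
        ip = i + last - j
    return found


if __name__ == "__main__":
    print(fjs_search("GCAGAGAG", "GCATCGCAGAGAGTATACAGTACG"))
```

On the slide's example the sketch reports the single occurrence at 0-based position 5. The Sunday loop skips alignments whose last character cannot match, while the KMP array bounds the work once a partial match has been found.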

Extension

The alphabet-based preprocessing arrays of BM-type algorithms are their most useful feature, but they can be a source of trouble as well.

The ASCII alphabet:

• Characters were usually 8 bits or fewer.

• The preprocessing time is negligible.

Natural-language text:

• Wide characters

DNA data:

• {A, T, C, G} is mapped to {00, 01, 10, 11}, giving alphabets whose sizes vary by powers of 2 from 2 to 64.

• An example DNA string: ACTG, (2²)⁴

• The preprocessing time becomes a bottleneck.

Environment

Experiment – Frequency

These patterns occur 3,366,899 times

Experiment – Pattern Length

Experiment – Alphabet Size

Experiment – Pathological Cases

Conclusion

We have tested FJS against four high-profile competitors (BMH, BMS, RC, TBM) over a range of contexts:

• pattern frequency (C1), pattern length (C2), alphabet size (C3), and pathological cases (C4).

FJS was uniformly superior to its competitors, with an advantage of up to 10% over BMS and RC.

For FJS, the pathological cases (C4) are those in which the KMP part is forced to execute on prefixes of patterns where KMP provides no advantage.

We presented a hybrid exact pattern-matching algorithm, FJS, which combines the benefits of KMP and BMS. It requires O(m + k) time and space for preprocessing.
