cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans

Page 1:

cs3102: Theory of Computation

Class 10: DFAs in Practice

Spring 2010
University of Virginia
David Evans

Page 2:

Menu

• Today:
  – Preparing for Exam 1
  – Language class for Deterministic PDAs
  – Applications of DFAs

• Thursday:
  – Exam Review (if you send questions and/or topics)
  – Applications of probabilistic DFAs and Grammars

Page 3:

Exam 1

• In class, next Tuesday, 2 March
• Covers:

[Figure: diagram of sets labeled "Classes 1-9 (10 and 11)", "Sipser Ch 0-2", "Problem Sets 1-3 + Comments", and "Exam 1"]

Note: unlike nearly all other sets we draw in this class, all of these sets are finite, and the size (roughly) represents the relative size.

Page 4:

What’s on the Exam?

• Definitions
  – Language, problem, sets
• Constructing and understanding computing models
  – Finite automata (DFA, NFA)
  – Pushdown automata (DPDA, NPDA)
  – Grammars (Context-Free Grammar)
• Language Classes: Regular and Context Free
  – Show a language is in the class
  – Show a language is not in the class
  – Prove or disprove a closure property
• Proof Methods
  – Proof by Induction
  – Proof by Construction
  – Understand and use the pumping lemmas for RL and CFL

The sample exam on the website should give you a good idea of what to expect.

Your exam will probably also have “what’s wrong with this proof” questions.

Page 5:

Exam 1 Notesheet

For Exam 1, you may use only:
– Your own brain and body
– A low-tech writing instrument (pen or pencil)
– A single page (both sides) of notes that you create

You may work with others to create your notes page.

Page 6:

Admiral Grace Hopper

John von Neumann

Albert Einstein

Page 7:

Exam Help Available

• Office Hours:
  – Thursdays, 8:30-9:30am
  – Thursdays, after class
  – Fridays, 10-11:30am (Sonali in Stacks)
  – Mondays, 1:15-3pm

• TA’s Exam Review Session
  – This Sunday, 5-6:30pm, Olsson 228E

Page 8:

[Figure: nested language classes. All Languages contains Context-Free (CFG or NPDA), which contains Regular Languages (DFA, NFA, RE, RG), which contains Finite Languages. Example languages marked: w, a^n, a^n b^n c^n, ww]

Where are the languages recognized by a Deterministic PDA?

Page 9:

Proving Set Equivalence

A = B ⟺ A ⊆ B and B ⊆ A

Sets A and B are equivalent if A is a subset of B and B is a subset of A.

[Figure: diagrams illustrating A ⊆ B and B ⊆ A]

Page 10:

Proving Formalism Equivalence

Page 11:

Proving Formalism Equivalence

Page 12:

Proving Formalism Non-Equivalence

Page 13:

[Figure: All Languages, containing Context-Free (CFG or NPDA), containing Regular Languages (DFA, NFA, RE, RG); the language a^n b^n is marked]

Which of these could be true?

Page 14:

[Figure: two candidate diagrams, each placing DPDA relative to Regular Languages (DFA, NFA, RE, RG) and Context-Free (NPDA)]

How can we distinguish these two plausible possibilities?

Page 15:

[Figure: two candidate diagrams, each placing DPDA relative to Regular Languages (DFA, NFA, RE, RG) and Context-Free (NPDA); a language A is marked in one of them]

How can we distinguish these two plausible possibilities?

• To show the classes differ: find some language A that can be recognized by some NPDA but not by any DPDA.
• To show the classes are the same: prove by construction that for any NPDA, there is a DPDA that recognizes the same language.

Page 16:

Page 17:

[Figure: NPDA state diagram. Transition labels, in extraction order: ε, ε → $; a, ε → +; ε, ε → ε; b, + → ε; ε, $ → ε; ε, ε → ε; b, + → ε; b, ε → ε; ε, $ → ε]

Page 18:

Proof by contradiction: Assume there is a DPDA that recognizes A. Show how to construct an NPDA that recognizes some language we know is not context free.

Proved by construction: We showed an NPDA that recognizes A.
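The construction half of the argument can be checked mechanically. The following is a minimal Python sketch, assuming A = {a^i b^i} ∪ {a^i b^{2i}} (read off the proof slides that follow) and assuming a particular wiring of the transition labels from the NPDA diagram. The state names q0 to q4 and accept are hypothetical and do not appear in the source. The simulator explores all reachable configurations (state, input position, stack), which is how the nondeterministic guess between "one b per +" and "two b's per +" is handled.

```python
from collections import deque

EPS = ""  # stands for ε (read nothing / pop nothing / push nothing)

# (state, input symbol, symbol to pop) -> list of (next state, symbol to push)
# State names are hypothetical; the labels follow the slide's transitions.
TRANSITIONS = {
    ("q0", EPS, EPS): [("q1", "$")],               # ε, ε → $
    ("q1", "a", EPS): [("q1", "+")],               # a, ε → +
    ("q1", EPS, EPS): [("q2", EPS), ("q3", EPS)],  # ε, ε → ε (the guess)
    ("q2", "b", "+"): [("q2", EPS)],               # b, + → ε (one b per +)
    ("q2", EPS, "$"): [("accept", EPS)],           # ε, $ → ε
    ("q3", "b", "+"): [("q4", EPS)],               # b, + → ε (first of two b's)
    ("q4", "b", EPS): [("q3", EPS)],               # b, ε → ε (second of two b's)
    ("q3", EPS, "$"): [("accept", EPS)],           # ε, $ → ε
}


def npda_accepts(word):
    """Breadth-first search over configurations of the NPDA above."""
    start = ("q0", 0, ())
    seen = {start}
    queue = deque([start])
    while queue:
        state, pos, stack = queue.popleft()
        if state == "accept" and pos == len(word):
            return True
        for (src, read, pop), targets in TRANSITIONS.items():
            if src != state:
                continue
            if read != EPS and (pos >= len(word) or word[pos] != read):
                continue
            if pop != EPS and (not stack or stack[-1] != pop):
                continue
            new_pos = pos + (1 if read != EPS else 0)
            rest = stack[:-1] if pop != EPS else stack
            for nxt, push in targets:
                new_stack = rest + ((push,) if push != EPS else ())
                config = (nxt, new_pos, new_stack)
                if config not in seen:
                    seen.add(config)
                    queue.append(config)
    return False


for w in ["aabb", "aabbbb", "aabbb"]:
    print(w, npda_accepts(w))
```

The search terminates because the input position is bounded and the only ε-move that pushes a symbol leaves q0, which is never re-entered. Note that this sketch also accepts the empty string (the case i = 0); whether the course's definition of A includes it is an assumption here.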

Page 19:

Proof by contradiction. Suppose there is a DPDA M that recognizes A. It must be in an accept state only after processing a^i b^i and a^i b^{2i}.

… a, α → β   b, α → β
(2i transitions, consuming a^i b^i)

… b, α → β   b, α → β
(i transitions, consuming b^i)

Construct M’: copy all the states on the second half, replacing b with c:

… a, α → β   b, α → β   … c, α → β   c, α → β

What is the language of M’?

Page 20:

Proof by contradiction. Suppose there is a DPDA M that recognizes A. It must be in an accept state only after processing a^i b^i and a^i b^{2i}.

… a, α → β   b, α → β   … b, α → β   b, α → β

Construct M’: copy all the states on the second half, replacing b with c:

… a, α → β   b, α → β   … c, α → β   c, α → β

Not a Context-Free Language!

We have a contradiction: if A is in L(DPDA), we could use the DPDA that recognizes A to construct a DPDA that recognizes a non-context-free language! Hence, A must not be in L(DPDA).

Page 21:

[Figure: All Languages, containing Context-Free (CFG or NPDA), containing the Deterministic Context-Free Languages, containing Regular Languages (DFA, NFA, RE, RG). The languages a^n b^n and A are marked]

Deterministic Context-Free Languages: recognized by a DPDA (or DCFG)

Context-Free Languages ⊃ Deterministic Context-Free Languages ⊃ Regular Languages

Page 22:

DFAs in Practice

Page 23:

Malware Scanner

W32.Bolzano.Gen: 576a222bd2c20400558b4c240cd9ffff07fbffffff{0-2}5c4e544c445200{0-2}5c57494e4e545c73797374656d33325c6e746f736b726e6c2e65786500{0-29}3b4658

W32.MyLife.E: 7a6172793230*40656d61696c2e636f6d

Note: These are the signatures from ClamAV, an open source virus scanner.

[Figure: diagram of the malware scanner, with inputs labeled Files and Network Traffic]
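To make the two signatures above concrete, here is a toy Python sketch that translates such a hex signature into a regular expression over bytes. It assumes only the two wildcard forms that appear on the slide: `*` (any number of bytes) and `{n-m}` (between n and m bytes). The function name `signature_to_regex` is illustrative, and this is not how ClamAV itself matches signatures.

```python
import re


def signature_to_regex(sig):
    """Translate a hex signature with `*` and `{n-m}` wildcards into a bytes regex."""
    parts = []
    i = 0
    while i < len(sig):
        ch = sig[i]
        if ch == "*":
            # Any number of arbitrary bytes.
            parts.append(b".*")
            i += 1
        elif ch == "{":
            # Bounded gap such as {0-2}: between 0 and 2 arbitrary bytes.
            end = sig.index("}", i)
            low, high = sig[i + 1:end].split("-")
            parts.append(b".{%d,%d}" % (int(low), int(high)))
            i = end + 1
        else:
            # Two hex digits encode one literal byte.
            parts.append(re.escape(bytes.fromhex(sig[i:i + 2])))
            i += 2
    # DOTALL so that "." also matches the newline byte.
    return re.compile(b"".join(parts), re.DOTALL)


MYLIFE_E = "7a6172793230*40656d61696c2e636f6d"
pattern = signature_to_regex(MYLIFE_E)
print(bytes.fromhex("7a6172793230"), bytes.fromhex("40656d61696c2e636f6d"))
print(pattern.search(b"From: zary20 <someone@email.com>") is not None)
```

Decoded, the W32.MyLife.E signature is the byte string "zary20", followed by anything, followed by "@email.com". Since each signature denotes a regular language, a scanner can in principle compile it to a finite automaton.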

Page 24:

String Matching

q0 --t--> q1 --r--> q2 --u--> q3 --t--> q4 --h--> q5

We hold these truths to be self-evident, that …

How much work is it to scan a string of length N for a signature?
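One way to see the answer is to build the matching DFA explicitly. The sketch below is an illustration with made-up function names, not code from the course. It constructs a DFA with one state per number of matched pattern characters (six states for "truth", mirroring q0 to q5) and fills in the transitions the diagram omits: on a mismatch, the machine falls back to the longest pattern prefix that is still a suffix of what it has read. Scanning then takes exactly one transition per input character, so the work is linear in N once the table is built.

```python
def build_match_dfa(pattern, alphabet):
    """delta[state][ch] = next state, where state = number of pattern chars matched."""
    m = len(pattern)
    delta = [dict() for _ in range(m + 1)]
    for state in range(m + 1):
        for ch in alphabet:
            # Longest prefix of the pattern that is a suffix of (matched part + ch).
            seen = pattern[:state] + ch
            k = min(m, len(seen))
            while k > 0 and not seen.endswith(pattern[:k]):
                k -= 1
            delta[state][ch] = k
    return delta


def find_first(text, pattern):
    """Return the index of the first match, or -1, using one DFA step per character."""
    if not pattern:
        return 0
    alphabet = sorted(set(text) | set(pattern))
    delta = build_match_dfa(pattern, alphabet)
    state = 0
    for i, ch in enumerate(text):
        state = delta[state][ch]
        if state == len(pattern):  # reached the accepting state
            return i - len(pattern) + 1
    return -1


text = "We hold these truths to be self-evident, that"
print(find_first(text, "truth"))
```

The table construction here is deliberately naive; it is meant to show the shape of the DFA, not to be the fastest way to build it.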

Page 25:

Faster String Matching

q0 --t--> q1 --r--> q2 --u--> q3 --t--> q4 --h--> q5

We hold these truths to be self-evident, that …

s[4] = h?   s[10] = h?

s[9] = t?   s[8] = u?

[Figure: the pattern "truth" shown aligned at successive positions under the text]

Skip table for "truth":
  a, b, c, d, e, f, g, i, j, k, l, m, n, o, p, q, s, v, w, x, y, z: 6
  h: 0
  r: 4
  t: 1
  u: 2
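The skipping idea can be sketched in Python using the Boyer-Moore-Horspool simplification of Boyer-Moore: compare the window from its right end, and on a mismatch shift the window by a precomputed distance keyed on the text character under the last pattern position. This is a swapped-in variant chosen for brevity, and its shift values are computed by the standard Horspool rule rather than copied from the slide's table. The function names are illustrative.

```python
def build_skip_table(pattern):
    """Map each char (except the pattern's final position) to its distance from the end."""
    m = len(pattern)
    # Later occurrences overwrite earlier ones, so each char keeps its smallest shift.
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}


def horspool_find(text, pattern):
    """Return (index of first match or -1, number of character comparisons)."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0, 0
    skip = build_skip_table(pattern)
    comparisons = 0
    pos = 0
    while pos + m <= n:
        j = m - 1
        while j >= 0:  # compare right to left within the window
            comparisons += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            return pos, comparisons
        # Characters absent from the table allow a full-length shift of m.
        pos += skip.get(text[pos + m - 1], m)
    return -1, comparisons


text = "We hold these truths to be self-evident, that"
print(build_skip_table("truth"))
print(horspool_find(text, "truth"))
```

Because most windows are rejected after a single comparison, the search inspects far fewer characters than the text contains, which is the point of the skip table.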

Page 26:

DFA / Skipping DFA

Is a “Skipping DFA” still a DFA?

(That is, does it still only accept the Regular Languages?)

Page 27:

J. Strother Moore (UT Austin)

Boyer-Moore Fast String Searching Algorithm (1977)

Best case: N/(w+1) comparisons where N is the length of the text and w is the length of the search string

Is this fast enough for a malware scanner?

Page 28:

Virus Detection

Total number of signatures: 720,033

[Figure: signature database Size (MB), axis from 2 to 12, plotted against Date, from 11/01 to 03/06, for Symantec and RAV AV. Source: Nate Paul’s study]

Can we scan one input for many possible malware signatures quickly?

Page 29:

Combining DFAs?

Regular languages are closed under union:

[Figure: union construction. A new start state q0 has ε-transitions to qA0 and qB0; states qA1 and qB1 also appear, with transitions labeled a]

How many states are there now?
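The state-count question can be explored with a small experiment. The sketch below does not build the ε-NFA from the slide; instead it uses the product construction, which is what determinizing that NFA amounts to when both machines are DFAs: track one state from each machine at once. The dictionary layout and the two example DFAs (an even number of a's; ends with b) are assumptions made up for illustration.

```python
def union_accepts(dfa_a, dfa_b, word):
    """Run both DFAs in lockstep and accept if either one accepts."""
    state_a, state_b = dfa_a["start"], dfa_b["start"]
    for ch in word:
        state_a = dfa_a["delta"][(state_a, ch)]
        state_b = dfa_b["delta"][(state_b, ch)]
    return state_a in dfa_a["accept"] or state_b in dfa_b["accept"]


def reachable_product_states(dfa_a, dfa_b, alphabet):
    """Collect every (state of A, state of B) pair reachable from the start pair."""
    start = (dfa_a["start"], dfa_b["start"])
    seen, frontier = {start}, [start]
    while frontier:
        a, b = frontier.pop()
        for ch in alphabet:
            nxt = (dfa_a["delta"][(a, ch)], dfa_b["delta"][(b, ch)])
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


# Hypothetical example machines over the alphabet {a, b}.
EVEN_AS = {
    "start": "even",
    "accept": {"even"},
    "delta": {("even", "a"): "odd", ("even", "b"): "even",
              ("odd", "a"): "even", ("odd", "b"): "odd"},
}
ENDS_WITH_B = {
    "start": "no",
    "accept": {"yes"},
    "delta": {("no", "a"): "no", ("no", "b"): "yes",
              ("yes", "a"): "no", ("yes", "b"): "yes"},
}

print(union_accepts(EVEN_AS, ENDS_WITH_B, "ab"))
print(len(reachable_product_states(EVEN_AS, ENDS_WITH_B, "ab")))
```

The number of product states is bounded by the product of the two machines' state counts, and here every pair is reachable. Repeating this across many signature DFAs is what makes naive combination expensive.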

Page 30:

Signatures

First byte    Set of signatures
00000000      ~720000/256
00000001      ~720000/256
00000010      ~720000/256
…
11111111      ~720000/256

Page 31:

Try a Trie

[Figure: trie over signature bytes. The root q0 has transitions labeled 0x00, 0x01, 0x02, …, 0xFF to states q00, q01, q02, …, qFF. The next level has states q0000, q0001, q0002, …, q01FF, again reached on 0x00, 0x01, 0x02, …, 0xFF. One node is labeled "q0000 Alureona", with an edge labeled 0x02]

720000/(256*256) ~ 11

Alfred V. Aho and Margaret J. Corasick, 1975
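The trie on this slide is the starting point of the Aho-Corasick algorithm, which adds failure links so that a single pass over the input reports every signature that occurs in it. Below is a compact Python sketch of that algorithm over bytes. The signature names and contents are invented for illustration, and a production scanner would use a far more memory-efficient representation.

```python
from collections import deque


def build_automaton(signatures):
    """signatures: dict name -> bytes. Returns (goto, fail, output), indexed by state."""
    goto, fail, output = [{}], [0], [[]]
    # Phase 1: insert every signature into a trie.
    for name, pattern in signatures.items():
        state = 0
        for byte in pattern:
            if byte not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        output[state].append(name)
    # Phase 2: breadth-first pass to set failure links (depth-1 states fail to the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for byte, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and byte not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(byte, 0)
            # Inherit matches that end at the failure target.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def scan(data, automaton):
    """Return (end position, signature name) for every match in data."""
    goto, fail, output = automaton
    state, hits = 0, []
    for pos, byte in enumerate(data):
        while state and byte not in goto[state]:
            state = fail[state]
        state = goto[state].get(byte, 0)
        for name in output[state]:
            hits.append((pos, name))
    return hits


# Hypothetical signatures, chosen so that one is a suffix of another.
SIGS = {"sig_truth": b"truth", "sig_ruth": b"ruth", "sig_hold": b"hold"}
print(scan(b"We hold these truths", build_automaton(SIGS)))
```

The scan performs a bounded amount of work per input byte regardless of how many signatures are loaded, which is what makes this approach attractive when the signature set is large.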

Page 32:

Scanner Demo

http://www.virustotal.com

Page 33:

Evasive Malware

Metamorphic Code: as virus propagates, each new copy is different

How hard is it to automatically modify code without changing its behavior?

Page 34:

Detecting Evasive Malware

• Less exact signatures (e.g., W32.MyLife.E: 7a6172793230*40656d61696c2e636f6d)
  – Dangerous: you start matching benign programs if you’re not careful!
• Behavioral signatures: match the behavior, not the program text
  – Undecidable in general (we’ll see in a few weeks)
  – Expensive and difficult in practice (but done by all decent scanners)

Page 35:

Faster String Scanning

Page 36:

Charge

• We focus on DFAs, NFAs, PDAs, CFGs, etc. as abstract models: Number of states, time to process, etc. don’t matter

• Lots of real applications of these models: but in practice, what matters is different

If you have topics you want me to review, post comments (on today’s class announcement) by 5pm tomorrow.