Formal Language Theory


Page 1: Formal Language Theory

Formal Language Theory

Page 2: Formal Language Theory

Homework

• Read documentation on Graphviz
– http://graphviz.org/
– http://www.graphviz.org/pdf/dotguide.pdf

• Use graphviz to generate figures like these (more or less):

Page 3: Formal Language Theory

Back to Regular Expressions

import re

myString = "I have red shoes and blue pants and a green shirt. My phone number is 8005551234 and my friend's phone number is (800)-565-7568 and my cell number is 1-800-123-4567. You could also call me at 18005551234 if you'd like."

# optional leading 1, optional hyphens, optional parentheses around the area code
phoneNumbersRegEx = re.compile(r'1?-?\(?\d{3}\)?-?\d{3}-?\d{4}')
print(phoneNumbersRegEx.findall(myString))

10. A more interesting example

Answer is here, but let’s derive it together
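One way the derivation might go, as a sketch rather than the lecture's own steps: start from ten bare digits and add one optional piece at a time until the pattern above is reached. The names `text`, `steps` and `results` are illustrative and do not come from the slide.

```python
import re

# Same sample text as on the slide.
text = ("I have red shoes and blue pants and a green shirt. "
        "My phone number is 8005551234 and my friend's phone number is "
        "(800)-565-7568 and my cell number is 1-800-123-4567. "
        "You could also call me at 18005551234 if you'd like.")

# Each step adds one optional piece to the previous pattern.
steps = [
    ("ten digits in a row",      r"\d{10}"),
    ("optional hyphens",         r"\d{3}-?\d{3}-?\d{4}"),
    ("optional parentheses",     r"\(?\d{3}\)?-?\d{3}-?\d{4}"),
    ("optional leading 1 and -", r"1?-?\(?\d{3}\)?-?\d{3}-?\d{4}"),
]

results = {}
for label, pattern in steps:
    results[label] = re.findall(pattern, text)
    print(label, "->", results[label])
```

The first step finds only the two numbers written without punctuation, and it cuts the 11-digit one short; the last step finds all four numbers in full.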

Page 4: Formal Language Theory

Formal Definition of Regular Expressions

• <expr> → character
• <expr> → ( <expr> )
• Concatenation: <expr> → <expr> <expr>
• Union: <expr> → <expr> + <expr>
• Kleene Star: <expr> → ( <expr> ) *

• Characters:
– lower case: a-z
– upper case: A-Z
– digits: 0-9
– special cases: \t \n
– octal codes: \000
– any single character: .
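The productions and character classes above map onto Python's `re` syntax roughly as follows. This is a sketch with made-up example patterns; note that Python writes union as `|`, because `+` in Python means "one or more". The helper `matches` is a hypothetical name introduced here.

```python
import re

def matches(pattern, s):
    """True if the whole string s is in the language of pattern."""
    return re.fullmatch(pattern, s) is not None

print(matches(r"ab", "ab"))         # concatenation
print(matches(r"a|b", "b"))         # union: "a + b" in the formal notation
print(matches(r"(ab)*", ""))        # Kleene star allows zero repetitions
print(matches(r"(ab)*", "ababab"))
print(matches(r"(ab)*", "aba"))     # False: incomplete repetition
print(matches(r"[a-z][A-Z][0-9]", "aB3"))
print(matches(r"\101", "A"))        # octal code 101 is the letter A
print(matches(r".", "x"))           # any single character (except newline)
```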

Page 5: Formal Language Theory

An Equivalence Relation (=R)

• A Partition of S ≡ Set of Subsets of S
– Mutually Exclusive & Exhaustive
• Equivalence Classes ≡ A Partition such that
– All the elements in a class are equivalent (with respect to =R)
– No element from one class is equivalent to an element from another
• Example: Partition integers into evens & odds
• Even integers: 2,4,6…
• Odd integers: 1,3,5…
– x =R y ⇔ x has the same parity as y
• Three Properties
– Reflexive: a =R a
– Symmetric: a =R b ⇒ b =R a
– Transitive: a =R b & b =R c ⇒ a =R c
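The parity example can be checked by brute force on a small set. The sketch below assumes the integers 1 to 10 as the set S; the names `same_parity`, `S` and `classes` are illustrative.

```python
from itertools import product

def same_parity(x, y):
    """x =R y  iff  x and y have the same parity."""
    return x % 2 == y % 2

S = range(1, 11)

reflexive = all(same_parity(a, a) for a in S)
symmetric = all(same_parity(b, a)
                for a, b in product(S, repeat=2)
                if same_parity(a, b))
transitive = all(same_parity(a, c)
                 for a, b, c in product(S, repeat=3)
                 if same_parity(a, b) and same_parity(b, c))

# Group the elements into equivalence classes, keyed by x % 2.
classes = {}
for x in S:
    classes.setdefault(x % 2, []).append(x)

print(reflexive, symmetric, transitive)
print(list(classes.values()))
```

The two lists in `classes` are mutually exclusive and together cover S, which is what makes them a partition.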

Page 6: Formal Language Theory

Word Net (Ch2): An Equivalence Relation

>>> from nltk.corpus import wordnet as wn
>>> for s in wn.synsets('car'): print(s.lemma_names())
['car', 'auto', 'automobile', 'machine', 'motorcar']
['car', 'railcar', 'railway_car', 'railroad_car']
['car', 'gondola']
['car', 'elevator_car']
['cable_car', 'car']

>>> # join the lemma names with spaces, then append the definition
>>> for s in wn.synsets('car'): print(' '.join(s.lemma_names()) + ': ' + s.definition())
car auto automobile machine motorcar: a motor vehicle with four wheels; usually propelled by an internal combustion engine
car railcar railway_car railroad_car: a wheeled vehicle adapted to the rails of railroad
car gondola: the compartment that is suspended from an airship and that carries personnel and the cargo and the power plant
car elevator_car: where passengers ride up and down
cable_car car: a conveyance for passengers or freight on a cable railway

Page 7: Formal Language Theory

Synonymy: An Equivalence Relation?
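One way to probe the question, as a sketch under a stated assumption: treat two words as synonymous when some synset contains both, and test transitivity on the lemma lists for 'car' printed on the previous slide. The definition of `synonymous` is an assumption made here, not one given in the lecture.

```python
# Lemma lists for the five senses of "car", copied from the WordNet output above.
synsets = [
    {"car", "auto", "automobile", "machine", "motorcar"},
    {"car", "railcar", "railway_car", "railroad_car"},
    {"car", "gondola"},
    {"car", "elevator_car"},
    {"cable_car", "car"},
]

def synonymous(a, b):
    """Assumed definition: a and b are synonyms if some synset contains both."""
    return any(a in s and b in s for s in synsets)

print(synonymous("auto", "car"))      # True
print(synonymous("car", "gondola"))   # True
print(synonymous("auto", "gondola"))  # False
```

Under this definition the relation is reflexive and symmetric, but the three calls show a failure of transitivity, so it would not be an equivalence relation on words.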

Page 8: Formal Language Theory

Comments

Page 9: Formal Language Theory

A Partial Order (≤R)

• Powerset({x,y,z})
– Subsets ordered by inclusion
– a ≤R b ⇔ a ⊆ b
• Three properties
– Reflexive:
• a ≤ a
– Antisymmetric:
• a ≤ b & b ≤ a ⇒ a = b
– Transitivity:
• a ≤ b & b ≤ c ⇒ a ≤ c
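The powerset example is small enough to verify exhaustively. A minimal sketch, assuming Python's `frozenset` subset test `<=` as the ordering; the names `powerset`, `leq` and `P` are illustrative.

```python
from itertools import combinations, product

def powerset(items):
    """All subsets of items, as frozensets."""
    items = list(items)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def leq(a, b):
    """a <=R b  iff  a is a subset of b."""
    return a <= b

P = powerset("xyz")   # the 8 subsets of {x, y, z}

reflexive = all(leq(a, a) for a in P)
antisymmetric = all(a == b
                    for a, b in product(P, repeat=2)
                    if leq(a, b) and leq(b, a))
transitive = all(leq(a, c)
                 for a, b, c in product(P, repeat=3)
                 if leq(a, b) and leq(b, c))

# Unlike a total order, some pairs are incomparable.
x, y = frozenset("x"), frozenset("y")
incomparable = not leq(x, y) and not leq(y, x)

print(len(P), reflexive, antisymmetric, transitive, incomparable)
```

The incomparable pair {x} and {y} is what makes the order partial: neither set is contained in the other.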

Page 10: Formal Language Theory

Wordnet: A Partial Order

>>> for h in wn.synsets('car')[0].hypernym_paths()[0]:
...     print(h.lemma_names())
...
['entity']
['physical_entity']
['object', 'physical_object']
['whole', 'unit']
['artifact', 'artefact']
['instrumentality', 'instrumentation']
['container']
['wheeled_vehicle']
['self-propelled_vehicle']
['motor_vehicle', 'automotive_vehicle']
['car', 'auto', 'automobile', 'machine', 'motorcar']

Page 11: Formal Language Theory

Help

>>> s = wn.synsets('car')[0]
>>> s.name()
'car.n.01'
>>> s.pos()
'n'
>>> s.lemmas()
[Lemma('car.n.01.car'), Lemma('car.n.01.auto'), Lemma('car.n.01.automobile'), Lemma('car.n.01.machine'), Lemma('car.n.01.motorcar')]
>>> s.examples()
['he needs a car to get to work']
>>> s.definition()
'a motor vehicle with four wheels; usually propelled by an internal combustion engine'
>>> s.hyponyms()[0:3]
[Synset('stanley_steamer.n.01'), Synset('hardtop.n.01'), Synset('loaner.n.02')]
>>> s.hypernyms()
[Synset('motor_vehicle.n.01')]

Page 12: Formal Language Theory

CFGs: Context Free Grammars

(Ch8)

Page 13: Formal Language Theory

Ambiguity

Page 14: Formal Language Theory

• The Chomsky Hierarchy
– Type 0 > Type 1 > Type 2 > Type 3
– Recursively Enumerable > CS > CF > Regular
• Examples
– Type 3: Regular (Finite State):
• Grep & Regular Expressions
• Right-Branching: A → a A
• Left-Branching: B → B b
– Type 2: Context-Free (CF):
• Center-Embedding: C → … x C y
• Parenthesis Grammars: <expr> → ( <expr> )
• w w^R (a string followed by its reverse)
– Type 1: Context-Sensitive (CS): w w (a string followed by a copy of itself)
– Type 0: Recursively Enumerable
– Beyond Type 0: Halting Problem
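To make the example languages concrete, here are three small recognizers. They are a sketch only: a general-purpose language like Python can recognize all of these, so the code illustrates which strings belong to each language, not which class of grammar is required. The function names are illustrative.

```python
def balanced(s):
    """Balanced parentheses: needs a counter that can grow without bound,
    which a finite-state machine does not have."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        else:
            return False
    return depth == 0

def is_w_wR(s):
    """w followed by its reverse (even-length palindromes): context-free."""
    return len(s) % 2 == 0 and s == s[::-1]

def is_w_w(s):
    """w followed by a copy of itself: not context-free."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s[:half] == s[half:]

print(balanced("(())"), balanced("(()"))
print(is_w_wR("abba"), is_w_wR("abab"))
print(is_w_w("abab"), is_w_w("abba"))
```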

Page 15: Formal Language Theory

Syntax & Semantics

• Syntax: Symbol pushing / Parsing
– Parsing: use context-free grammar to map string → tree
• Semantics: Meaning (making sense of trees)
– Is synonymy an equivalence relation?
• Dichotomy is important both for
– Natural Languages (English, FIGS, CJK, etc.)
• FIGS: French, Italian, German & Spanish
• CJK: Chinese, Japanese & Korean
– as well as Artificial Languages
• Python, HTML, Javascript, SQL, C
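The parsing step, mapping a string to a tree, can be sketched with a tiny recursive-descent parser. The grammar used here, <expr> → ( <expr> ) | character, is a simplified assumption modeled on the parenthesis grammar from the earlier slides; `parse_expr` and `parse` are hypothetical names, and trees are represented as nested tuples.

```python
def parse_expr(s, i=0):
    """Parse <expr> -> ( <expr> ) | character, starting at s[i].

    Returns (tree, next_index); raises ValueError on a syntax error.
    """
    if i >= len(s):
        raise ValueError("unexpected end of input")
    if s[i] == "(":
        inner, j = parse_expr(s, i + 1)
        if j >= len(s) or s[j] != ")":
            raise ValueError("expected ')' at position %d" % j)
        return ("expr", "(", inner, ")"), j + 1
    if s[i] == ")":
        raise ValueError("unexpected ')' at position %d" % i)
    return ("expr", s[i]), i + 1

def parse(s):
    """Parse the whole string and return its tree."""
    tree, end = parse_expr(s)
    if end != len(s):
        raise ValueError("trailing input at position %d" % end)
    return tree

print(parse("((a))"))
```

Syntax ends once the tree is built; deciding what the tree means is the job of semantics.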

Page 16: Formal Language Theory

Summary

Chapter 1
• NLTK (Natural Lang Toolkit)
– Unix for Poets without Unix
– Unix → Python
• Object-Oriented
– Polymorphism:
• “len” applies to lists, sets, etc.
• Ditto for: +, help, print, etc.
• Types & Tokens
– “to be or not to be”
– 4 types & 6 tokens
• FreqDist: sort | uniq -c
• Concordances

Chapters 2-8
• Chapter 3: URLs
• Chapter 2
– Equivalence Relations:
• Parity
• Synonymy (?)
– Partial Orders:
• Wordnet Ontology
• Chapter 8: CF Parsing
– Chomsky Hierarchy
• CS > CF > Regular
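The types-and-tokens count from Chapter 1 can be checked in a few lines. This sketch uses `collections.Counter` from the standard library as a stand-in for NLTK's FreqDist so that it runs without NLTK installed; it also shows `len` applied to both a list and a set.

```python
from collections import Counter

tokens = "to be or not to be".split()   # every word occurrence
types = set(tokens)                     # distinct words
freq = Counter(tokens)                  # plays the role of `sort | uniq -c`

print(len(tokens), "tokens")
print(len(types), "types")
print(freq.most_common())
```

The phrase has 6 tokens but only 4 types, because "to" and "be" each occur twice.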