Lecture 21: Languages and Grammars. Natural Language vs. Formal Language.


Transcript of Lecture 21: Languages and Grammars. Natural Language vs. Formal Language.

Page 1: Lecture 21: Languages and Grammars. Natural Language vs. Formal Language.

Lecture 21:

Languages and Grammars

Page 2:

Natural Language vs. Formal Language

Page 3:

Rules of English Grammar: verifying a sentence

Page 4:

Phrase Structure Grammar

Page 5:

An Example

Page 6:

Another Example

Page 7:

Example

Page 8:

Example

Page 9:

Example

Page 10:

Example

Page 11:

Example

Page 12:

Types of Phrase Structure Grammars

Page 13:

A grammar is used to generate or evaluate members of a formal language.  The context-free grammar G1 below generates the language {0^n 1^n | n >= 0}.

A -> 0A1

A -> ε (the empty string)

A grammar is a set of production rules as shown above.  A production rule consists of a variable, a right arrow, and a string of variables and terminals.

The variables are used to control the arrangement of possible substitutions and the terminals are symbols from the alphabet of the language being generated or recognized. 

By convention, variables are represented by uppercase letters.  One of the variables is designated as the start variable, usually the left-hand symbol of the first rule in the grammar.
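As a minimal sketch, G1 can be written down as plain data. The dictionary layout, the names G1 and START, and the use of "" for the empty string are assumptions made for illustration; they are not part of the lecture.

```python
# G1 as plain data: each variable maps to the right-hand sides of its rules.
# "" stands for the empty string (an assumed encoding).
G1 = {"A": ["0A1", ""]}
START = "A"  # left-hand symbol of the first rule


def variables(grammar):
    """Variables are exactly the symbols that have rules of their own."""
    return set(grammar)


def terminals(grammar):
    """Every right-hand-side symbol that is not a variable is a terminal."""
    symbols = {s for rhss in grammar.values() for rhs in rhss for s in rhs}
    return symbols - variables(grammar)


print(sorted(variables(G1)))
print(sorted(terminals(G1)))
```

Here the variables come out as A and the terminals as 0 and 1, matching the alphabet of the language being generated.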

Introduction to Grammars

Page 14:

There are usually choices for applying production rules, which means that in order to create a particular member of a language we need to specify an order of application of the rules of the grammar, called a derivation. 

For example, the following is a derivation to show that 00001111 is a member of the language defined by G1.

start     A

rule 1    0A1

rule 1    00A11

rule 1    000A111

rule 1    0000A1111

rule 2    00001111
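The derivation above can be replayed mechanically. The sketch below assumes the rules are numbered in the order G1 lists them; the helper names apply_rule and derive are hypothetical.

```python
# Rule numbers follow the order in which G1 is written: 1 is A -> 0A1, 2 is A -> ε.
RULES = {1: ("A", "0A1"), 2: ("A", "")}


def apply_rule(form, rule_no):
    """Rewrite the leftmost occurrence of the rule's variable in form."""
    lhs, rhs = RULES[rule_no]
    if lhs not in form:
        raise ValueError(f"rule {rule_no} does not apply to {form!r}")
    return form.replace(lhs, rhs, 1)


def derive(start, rule_sequence):
    """Return every intermediate string of the derivation, start included."""
    forms = [start]
    for rule_no in rule_sequence:
        forms.append(apply_rule(forms[-1], rule_no))
    return forms


for form in derive("A", [1, 1, 1, 1, 2]):
    print(form)
```

Applying rule 1 four times and then rule 2 reproduces the six lines of the derivation, ending in 00001111.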

Applying Production Rules

Page 15:

All the member strings that can be generated through derivations using G1 are collectively referred to as the language of the grammar G1 or L(G1). 

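We say that a grammar is context-free when every rule has a single variable on its left-hand side, so that variable can be replaced wherever it appears, regardless of the symbols around it.  Candidate strings are therefore accepted or rejected as members of the language based strictly on their structure and not on the context in which they appear.  The syntax of a programming language is (mostly) definable with a context-free grammar.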

For example, the reserved words of a programming language are parsed and recognized as reserved words regardless of their context. 

(Can you think of an example in a programming language in which context is important?)

The Language of the Grammar

Page 16:

Formal Definition of a Context-Free Grammar

A context-free grammar is a 4-tuple (V, Σ, R, S), where

1. V is a finite set of variables

2. Σ is a finite set, disjoint from V, called the terminals

3. R is a finite set of rules, with each rule being a variable and a string of variables and terminals, and

4. S is an element of V called the start variable.

If u, v, and w are strings of variables and terminals, and A -> w is a rule of the grammar, we say that uAv yields uwv, written uAv -> uwv. 

Write u ->* v if u = v or if a sequence u1, u2, ..., uk exists for k >= 0 and

u -> u1 -> u2 -> ... -> uk -> v.
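The two relations can be sketched in a few lines. The function names successors and derives_star are hypothetical, symbols are assumed to be single characters, and the max_steps bound is an assumption added only to keep the search finite.

```python
def successors(u, rules):
    """All strings v with u -> v in one step (u = xAy, v = xwy, rule A -> w)."""
    result = set()
    for lhs, rhs in rules:
        for i, symbol in enumerate(u):
            if symbol == lhs:
                result.add(u[:i] + rhs + u[i + 1:])
    return result


def derives_star(u, v, rules, max_steps):
    """True if u ->* v using at most max_steps rule applications."""
    frontier = {u}
    for _ in range(max_steps + 1):
        if v in frontier:  # the first pass covers the u = v case
            return True
        frontier = set().union(*(successors(f, rules) for f in frontier))
    return False


G1 = [("A", "0A1"), ("A", "")]
print(successors("0A1", G1))
print(derives_star("A", "0011", G1, max_steps=3))
```

With G1, the string 0A1 yields exactly 00A11 and 01, and A ->* 0011 holds in three steps but not in two.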

Page 17:

A Practical Example - The following context-free grammar can be used to derive (or recognize) in-fix arithmetic expressions in x, y, and z.  The start variable is S and the terminals are +, -, *, /, (, ), x, y, and z.

S -> T+S | T-S | T

T -> T*T | T/T | (S) | x | y | z

Use the grammar above to show that (x+y)/(x-y)*z is a valid in-fix expression.

S -> T -> T/T -> T/T*T -> ... -> (S)/(S)*T -> ... -> (T+S)/(T-S)*T -> ... -> (x+y)/(x-y)*z
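One way to check a derivation like this is to spell out every intermediate string and confirm that each step applies exactly one rule to one variable. The chain below is one possible way of expanding the elided steps, chosen for illustration; the lecture may have applied the rules in a different order. The name one_step is hypothetical.

```python
RULES = [
    ("S", "T+S"), ("S", "T-S"), ("S", "T"),
    ("T", "T*T"), ("T", "T/T"), ("T", "(S)"),
    ("T", "x"), ("T", "y"), ("T", "z"),
]


def one_step(u, v):
    """True if v results from u by applying one rule to one variable."""
    return any(
        u[i] == lhs and u[:i] + rhs + u[i + 1:] == v
        for lhs, rhs in RULES
        for i in range(len(u))
    )


# An assumed step-by-step expansion; only the outline comes from the lecture.
CHAIN = [
    "S", "T", "T/T", "T/T*T", "(S)/T*T", "(S)/(S)*T",
    "(T+S)/(S)*T", "(T+S)/(T-S)*T",
    "(x+S)/(T-S)*T", "(x+T)/(T-S)*T", "(x+y)/(T-S)*T",
    "(x+y)/(x-S)*T", "(x+y)/(x-T)*T", "(x+y)/(x-y)*T",
    "(x+y)/(x-y)*z",
]

valid = all(one_step(a, b) for a, b in zip(CHAIN, CHAIN[1:]))
print(valid, len(CHAIN) - 1)
```

Every consecutive pair passes the check, so this chain is a valid derivation of 14 steps.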

An Example

Page 18:

How do we go about demonstrating that a particular string is not derivable with a given context-free grammar?

For example how would we use the grammar above to show that (x+y)*/z is not a valid in-fix expression? 

The problem we are dealing with is that we have no means to choose which rules to apply or in which order to apply them. 

If a grammar can generate the same string in more than one way (that is, with more than one parse tree or, equivalently, more than one leftmost derivation), we say that the grammar is ambiguous.

Using a grammar to generate arbitrary parts of a candidate string in a random order can lead to an exponential number of alternate paths to test. 

What we need is a systematic method for applying the rules to show that a particular string is or is not derivable.
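As a hedged illustration of what a systematic method could look like for the in-fix grammar, the sketch below always expands the leftmost variable and abandons any intermediate string that can no longer become the target. The pruning is safe for this particular grammar because none of its rules shortens a string or removes a terminal. This is an exhaustive search written for illustration, not the method the lecture goes on to develop; the function name is hypothetical.

```python
from collections import Counter

RULES = {
    "S": ["T+S", "T-S", "T"],
    "T": ["T*T", "T/T", "(S)", "x", "y", "z"],
}


def count_leftmost_derivations(target, form="S"):
    """Count the leftmost derivations of target that start from form."""
    if form == target:
        return 1
    pos = next((i for i, s in enumerate(form) if s in RULES), None)
    if pos is None:
        return 0  # only terminals left, but not the target
    # No rule shrinks a string, and the terminals to the left of the
    # leftmost variable are final, so both must already fit the target.
    if len(form) > len(target) or form[:pos] != target[:pos]:
        return 0
    # Terminals never disappear, so none may occur more often than in target.
    used = Counter(s for s in form if s not in RULES)
    if any(used[t] > target.count(t) for t in used):
        return 0
    return sum(
        count_leftmost_derivations(target, form[:pos] + rhs + form[pos + 1:])
        for rhs in RULES[form[pos]]
    )


print(count_leftmost_derivations("(x+y)*/z"))
print(count_leftmost_derivations("x*y*z"))
```

A count of zero shows that (x+y)*/z is not derivable. A count above one exposes ambiguity: x*y*z has two leftmost derivations, one grouping it as (x*y)*z and one as x*(y*z).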

The Problem with Grammars

Page 19:

Chomsky Normal Form

A context-free grammar is said to be in Chomsky normal form (CNF) when every rule has the form

A -> BC

A -> a

where a is any terminal and A, B, and C are any variables, except that B and C may not be the start variable. In addition, a rule that replaces the start variable with the empty string (i.e. S -> ε) is permitted.
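The definition translates directly into a checker. In this sketch a rule is a pair (variable, tuple of symbols); that representation, the name is_cnf, and the hand-written grammar G1_CNF (with made-up variables S0, C, Z, and O, intended to generate the same language as G1) are all assumptions for illustration.

```python
def is_cnf(rules, variables, start):
    """rules: list of (lhs, rhs) pairs, where rhs is a tuple of symbols."""
    for lhs, rhs in rules:
        if len(rhs) == 2 and all(s in variables and s != start for s in rhs):
            continue  # A -> BC, with neither B nor C the start variable
        if len(rhs) == 1 and rhs[0] not in variables:
            continue  # A -> a
        if len(rhs) == 0 and lhs == start:
            continue  # S -> ε, allowed for the start variable only
        return False
    return True


# G1 as given in the lecture: A -> 0A1 | ε
G1 = [("A", ("0", "A", "1")), ("A", ())]

# A hypothetical CNF grammar for the same language.
G1_CNF = [
    ("S0", ()), ("S0", ("Z", "C")), ("S0", ("Z", "O")),
    ("A", ("Z", "C")), ("A", ("Z", "O")),
    ("C", ("A", "O")), ("Z", ("0",)), ("O", ("1",)),
]

print(is_cnf(G1, {"A"}, "A"))
print(is_cnf(G1_CNF, {"S0", "A", "C", "Z", "O"}, "S0"))
```

G1 fails the check because A -> 0A1 has three symbols on its right-hand side, while every rule of G1_CNF fits one of the permitted forms.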

We can show that "Any context-free language is generated by a context-free grammar in Chomsky normal form."  Perhaps more important to us than the proof itself is the proof idea which we can use to convert a context-free grammar into a CNF grammar.

Page 20:

CNF grammars are very useful since they simplify derivations making them more easily implemented as computer algorithms.  The method of conversion of a context-free grammar into CNF is detailed below:

1. Create a new start variable.

2. Eliminate all rules of the form A -> ε.

3. Eliminate all unit rules of the form A -> B.

4. Convert the remaining rules into the form A -> BC or A -> a

Converting a Grammar into CNF

Page 21:

Example:  Let's work through an example of this conversion process. In the following, rules being eliminated will be shown in blue, rules being added will be shown in red.  Given the grammar,

S -> ASA | aB

A -> B | S

B -> b | ε

we apply Step 1, creating a new start variable S0,

S0 -> S

S -> ASA | aB

A -> B | S

B -> b | ε

Now we need to eliminate rules of the form A -> ε.  In order to do this without changing the language of the grammar, we must also create replacement rules to permit the same derivations that were possible when the epsilon rules were present.

For example, removing B -> ε requires that we add a rule S -> a to cover the derivation S -> aB -> a that would have been possible with the rule B -> ε.  Similarly we need to include the rule A -> ε to account for the A -> B -> ε derivation.
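The effect of this step can be sketched in code. Note that the sketch swaps in a common all-at-once formulation (find every variable that can derive ε, then add each rule variant with those occurrences left out) in place of the one-rule-at-a-time procedure described above. The grammar literal is the example after Step 1, reconstructed from the rules named in the text; the function name and the tuple representation are assumptions.

```python
from itertools import product


def remove_epsilon_rules(rules, start):
    """rules: dict mapping each variable to a set of right-hand-side tuples."""
    # A variable is nullable if some right-hand side consists of nullable symbols.
    nullable, changed = set(), True
    while changed:
        changed = False
        for lhs, rhss in rules.items():
            if lhs not in nullable and any(
                all(s in nullable for s in rhs) for rhs in rhss
            ):
                nullable.add(lhs)
                changed = True
    result = {}
    for lhs, rhss in rules.items():
        result[lhs] = set()
        for rhs in rhss:
            # Each nullable symbol may be kept or dropped.
            options = [[(s,), ()] if s in nullable else [(s,)] for s in rhs]
            for pick in product(*options):
                variant = tuple(s for part in pick for s in part)
                if variant:  # never add an ε-rule back
                    result[lhs].add(variant)
    if start in nullable:
        result[start].add(())  # only the start variable may keep ε
    return result


GRAMMAR = {
    "S0": {("S",)},
    "S": {("A", "S", "A"), ("a", "B")},
    "A": {("B",), ("S",)},
    "B": {("b",), ()},
}

for lhs, rhss in remove_epsilon_rules(GRAMMAR, "S0").items():
    print(lhs, "->", sorted(rhss))
```

For this grammar the nullable variables are A and B, so S gains the rules S -> SA | AS | S | a, and both B -> ε and the temporary A -> ε are gone from the result.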

An Example

Page 22:

We still need to remove the A -> ε rule.

Next we remove all unit rules.  Removing  S0 -> S we must add S0 -> SA | AS | ASA | aB | a. We can remove S -> S without adding any other rules. 

Page 23:

When we remove A -> B we must add the rule A -> b.  Removing A -> S we must add A -> SA | AS | ASA | aB | a.

This gives us,

S0 -> SA | AS | ASA | aB | a

S -> SA | AS | ASA | aB | a

A -> b | SA | AS | ASA | aB | a

B -> b
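The unit-rule removal can be sketched as follows: for each variable, collect every variable reachable through unit rules alone, then copy over the non-unit rules of all of them. The input literal is the example grammar as it stands once the ε-rules are gone, reconstructed from the rules named in the text; the function name is hypothetical.

```python
def remove_unit_rules(rules):
    """rules: dict mapping each variable to a set of right-hand-side tuples."""
    variables = set(rules)

    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in variables

    result = {}
    for var in rules:
        # Variables reachable from var through unit rules only.
        reachable, todo = {var}, [var]
        while todo:
            current = todo.pop()
            for rhs in rules[current]:
                if is_unit(rhs) and rhs[0] not in reachable:
                    reachable.add(rhs[0])
                    todo.append(rhs[0])
        # var inherits every non-unit rule of every reachable variable.
        result[var] = {
            rhs for v in reachable for rhs in rules[v] if not is_unit(rhs)
        }
    return result


AFTER_EPSILON = {
    "S0": {("S",)},
    "S": {("A", "S", "A"), ("S", "A"), ("A", "S"), ("S",), ("a", "B"), ("a",)},
    "A": {("B",), ("S",)},
    "B": {("b",)},
}

for lhs, rhss in remove_unit_rules(AFTER_EPSILON).items():
    print(lhs, "->", sorted(rhss))
```

S0 and S end up with the same five rules, and A has those five plus A -> b, in agreement with the additions listed above.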

Finally we must make sure that each rule is in one of the two forms allowed in CNF.  Specifically we can have only rules that replace a single variable with a pair of variables or rules that replace a single variable with a single terminal.

Page 24:

Therefore we must modify the rules S0 -> ASA, S -> ASA, S -> aB, A -> ASA, and A -> aB.  We create new variables and new rules in such a way that the language generated by the grammar is not changed.

The original grammar is shown on the left and the CNF of our grammar is on the right.
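A sketch of this last step follows. The new variable names it produces (U_a for the terminal a, X1 for the tail SA) are made up for this example and are assumed not to clash with existing variables; the names used on the original slide are not recoverable from the transcript. The input is assumed to contain no ε-rules or unit rules.

```python
def to_cnf_shape(rules):
    """Rewrite long or mixed right-hand sides into the forms A -> BC and A -> a."""
    variables = set(rules)
    result = {v: set() for v in rules}
    terminal_var = {}  # terminal -> new variable standing for it
    tail_var = {}      # tuple of variables -> new variable deriving that tuple

    def var_for_terminal(t):
        if t not in terminal_var:
            terminal_var[t] = "U_" + t
            result[terminal_var[t]] = {(t,)}
        return terminal_var[t]

    def binarize(rhs):
        # rhs holds variables only and has length >= 2.
        if len(rhs) == 2:
            return rhs
        return (rhs[0], var_for_tail(rhs[1:]))

    def var_for_tail(tail):
        if tail not in tail_var:
            name = "X" + str(len(tail_var) + 1)
            tail_var[tail] = name
            result[name] = {binarize(tail)}
        return tail_var[tail]

    for lhs, rhss in rules.items():
        for rhs in rhss:
            if len(rhs) >= 2:
                # Replace terminals by their stand-in variables, then split.
                rhs = tuple(
                    s if s in variables else var_for_terminal(s) for s in rhs
                )
                rhs = binarize(rhs)
            result[lhs].add(rhs)
    return result


FIVE = {("A", "S", "A"), ("S", "A"), ("A", "S"), ("a", "B"), ("a",)}
AFTER_UNIT = {"S0": set(FIVE), "S": set(FIVE), "A": FIVE | {("b",)}, "B": {("b",)}}

for lhs, rhss in to_cnf_shape(AFTER_UNIT).items():
    print(lhs, "->", sorted(rhss))
```

Sharing one new variable per distinct tail keeps the result small: the three ASA rules all reuse X1 -> SA, and the aB rules all reuse U_a -> a.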

Page 25:

The CNF grammar is larger and less readable than our original grammar, so what is the advantage of the CNF grammar? We may learn more if we attempt to derive candidate strings using each form.

Derivation using the Original Grammar

Page 26:

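The value of the CNF grammar begins to be revealed in these sample derivations.  Although the CNF grammar has a larger rule set, its derivations are significantly shorter.  They are also predictable in length: in a CNF grammar, every derivation of a string of length n >= 1 uses exactly 2n - 1 rule applications, which bounds the search for a derivation.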

Derivation using the CNF Grammar
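One well-known algorithm that depends on the CNF shape is CYK (Cocke-Younger-Kasami). The lecture does not name it, so the sketch below is offered only as an illustration of how CNF makes membership testing systematic. The grammar literal is a CNF form of the example grammar in which the variable names X1 and U_a are assumptions; the slide's own names are not recoverable from the transcript.

```python
CNF = {
    "S0": {("A", "X1"), ("S", "A"), ("A", "S"), ("U_a", "B"), ("a",)},
    "S": {("A", "X1"), ("S", "A"), ("A", "S"), ("U_a", "B"), ("a",)},
    "A": {("b",), ("A", "X1"), ("S", "A"), ("A", "S"), ("U_a", "B"), ("a",)},
    "X1": {("S", "A")},
    "U_a": {("a",)},
    "B": {("b",)},
}


def cyk(word, rules, start):
    """Decide whether the CNF grammar `rules` generates `word`."""
    n = len(word)
    if n == 0:
        return () in rules[start]
    # table[i][length] holds the variables that derive word[i:i + length].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][1] = {v for v, rhss in rules.items() if (ch,) in rhss}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for v, rhss in rules.items():
                    if any(
                        len(r) == 2 and r[0] in left and r[1] in right
                        for r in rhss
                    ):
                        table[i][length].add(v)
    return start in table[0][n]


for w in ["a", "ab", "bab", "b", "bb"]:
    print(w, cyk(w, CNF, "S0"))
```

Because every rule produces either two variables or one terminal, the table can be filled substring by substring, with no guessing about which rule to try next.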