Prolog for Linguists Symbolic Systems 139P/239P

83
Prolog for Linguists Symbolic Systems 139P/239P John Dowding Week 5, Novembver 5, 2001 [email protected]

description

Prolog for Linguists Symbolic Systems 139P/239P. John Dowding Week 5, Novembver 5, 2001 [email protected]. Office Hours. We have reserved 4 workstations in the Unix Cluster in Meyer library, fables 1-4 4:30-5:30 on Thursday this week Or, contact me and we can make other arrangements. - PowerPoint PPT Presentation

Transcript of Prolog for Linguists Symbolic Systems 139P/239P

Page 1: Prolog for Linguists Symbolic Systems 139P/239P

Prolog for LinguistsSymbolic Systems 139P/239P

John Dowding

Week 5, Novembver 5, 2001

[email protected]

Page 2: Prolog for Linguists Symbolic Systems 139P/239P

Office Hours

We have reserved 4 workstations in the Unix Cluster in Meyer library, fables 1-4

4:30-5:30 on Thursday this week

Or, contact me and we can make other arrangements

Page 3: Prolog for Linguists Symbolic Systems 139P/239P

Course Schedule

1. Oct. 82. Oct. 153. Oct. 224. Oct. 295. Nov. 5 (double up)6. Nov. 127. Nov. 26 (double up)8. Dec. 3

No class on Nov. 19

Page 4: Prolog for Linguists Symbolic Systems 139P/239P

More about cut!

Common to distinguish between red cuts and green cuts Red cuts change the solutions of a predicate Green cuts do not change the solutions, but effect the efficiency

Most of the cuts we have used so far are all red cuts

%delete_all(+Element, +List, -NewList)delete_all(_Element, [], []).delete_all(Element, [Element|List], NewList) :- !, delete_all(Element, List, NewList).delete_all(Element, [Head|List], [Head|NewList]) :- delete_all(Element, List, NewList).

Page 5: Prolog for Linguists Symbolic Systems 139P/239P

Green cuts

Green cuts can be used to avoid unproductive backtracking% identical(?Term1, ?Term2)identical(Var1, Var2):-

var(Var1), var(Var2),!, Var1 == Var2.

identical(Atomic1,Atomic2):- atomic(Atomic1), atomic(Atomic2),!, Atomic1 == Atomic2.

identical(Term1, Term2):-compound(Term1),compound(Term2),functor(Term1, Functor, Arity),functor(Term2, Functor, Arity),identical_helper(Arity, Term1, Term2).

Page 6: Prolog for Linguists Symbolic Systems 139P/239P

Input/Output of Terms

Input and Output in Prolog takes place on StreamsBy default, input comes from the keyboard, and output goes to the screen.Three special streams: user_input user_output user_error

read(-Term)write(+Term)nl

Page 7: Prolog for Linguists Symbolic Systems 139P/239P

Example: Input/Output

repeat/0 is a built-in predicate that will always resucceed

% classifing terms

classify_term :-

repeat,

write('What term should I classify? '),

nl,

read(Term),

process_term(Term),

Term == end_of_file.

Page 8: Prolog for Linguists Symbolic Systems 139P/239P

Streams

You can create streams with open/3

open(+FileName, +Mode, -Stream)

Mode is one of read, write, or append.

When finished reading or writing from a Stream, it should be closed with close(+Stream)

There are Stream-versions of other Input/Output predicates read(+Stream, -Term) write(+Stream, +Term) nl(+Stream)

Page 9: Prolog for Linguists Symbolic Systems 139P/239P

Characters and character I/O

Prolog represents characters in two ways: Single character atoms ‘a’, ‘b’, ‘c’ Character codes

Numbers that represent the character in some character encoding scheme (like ASCII)

By default, the character encoding scheme is ASCII, but others are possible for handling international character sets.Input and Output predicates for characters follow a naming convention: If the predicate deals with single character atoms, it’s name ends in _char. If the predicate deals with character codes, it’s name ends in _code.

Characters are character codes is traditional “Edinburgh” Prolog, but single character atoms were introduced in the ISO Prolog Standard.

Page 10: Prolog for Linguists Symbolic Systems 139P/239P

Special Syntax I

Prolog has a special syntax for typing character codes: 0’a is a expression that means the character codc that represents

the character a in the current character encoding scheme.

Page 11: Prolog for Linguists Symbolic Systems 139P/239P

Special Syntax II

A sequence of characters enclosed in double quote marks is a shorthand for a list containing those character codes.

“abc” = [97, 98, 99]

It is possible to change this default behavior to one in which uses single character atoms instead of character codes, but we won’t do that here.

Page 12: Prolog for Linguists Symbolic Systems 139P/239P

Built-in Predicates:

atom_chars(Atom, CharacterCodes) Converts an Atom to it’s corresponding list of character codes, Or, converts a list of CharacterCodes to an Atom.

put_code(Code) and put_code(Stream, Code) Write the character represented by Code

get_code(Code) and get_code(Stream, Code) Read a character, and return it’s corresponding Code

Checking the status of a Stream: at_end_of_file(Stream) at_end_of_line(Stream)

Page 13: Prolog for Linguists Symbolic Systems 139P/239P

Review homework problems: last/2

% last(?Element, ?List)

last(Element, [Element]).

last(Element, [_Head|Tail]):-

last(Element, Tail).

Or

last(Element, List):-

append(_EverthingElse, [Element], List).

Page 14: Prolog for Linguists Symbolic Systems 139P/239P

evenlist/1 and oddlist/1

%evenlist(?List).

evenlist([]).

evenlist([_Head|Tail]):-

oddlist(Tail).

%oddlist(+List)

oddlist([_Head|Tail]):-

evenlist(Tail).

Page 15: Prolog for Linguists Symbolic Systems 139P/239P

palindrome/1

%palindrome1(+List).

palindrome1([]).

palindrome1([_OneElement]).

palindrome1([Head|Tail]):-

append(Rest, [Head], Tail),

palindrome1(Rest).

Page 16: Prolog for Linguists Symbolic Systems 139P/239P

Or, palindrome/1

%palindrome(+List)palindrome(List):- reverse(List, List).

%reverse(+List, -ReversedList)reverse(List, ReversedList):- reverse(List, [], ReversedList).

%reverse(List, Partial, ReversedList)reverse([], Result, Result).reverse([Head|Tail], Partial, Result):- reverse(Tail, [Head|Partial], Result).

Page 17: Prolog for Linguists Symbolic Systems 139P/239P

subset/2

%subset(?Set, ?SubSet)

subset([], []).

subset([Element|RestSet], [Element|RestSubSet]):-

subset(RestSet, RestSubSet).

subset([_Element|RestSet], SubSet):-

subset(RestSet, SubSet).

Page 18: Prolog for Linguists Symbolic Systems 139P/239P

union/3

%union(+Set1, +Set2, -SetUnion)

union([], Set2, Set2).

union([Element|RestSet1], Set2, [Element|SetUnion]):-

union(RestSet1, Set2, SetUnion),

\+ member(Element, SetUnion),

!.

union([_Element|RestSet1], Set2, SetUnion):-

union(RestSet1, Set2, SetUnion).

Page 19: Prolog for Linguists Symbolic Systems 139P/239P

intersect/3

%intersect(+Set1, +Set2, ?Intersection)

intersect([], _Set2, []).

intersect([Element|RestSet1], Set2, [Element|Intersection]):-

member(Element, Set2),

!,

intersect(RestSet1, Set2, Intersection).

intersect([_Element|RestSet1], Set2, Intersection):-

intersect(RestSet1, Set2, Intersection).

Page 20: Prolog for Linguists Symbolic Systems 139P/239P

split/4

%split(+List, +SplitPoint, -Smaller, -Bigger).

split([], _SplitPoint, [], []).

split([Head|Tail], SplitPoint, [Head|Smaller], Bigger):-

Head =< SplitPoint,

!, % green cut

split(Tail, SplitPoint, Smaller, Bigger).

split([Head|Tail], SplitPoint, Smaller, [Head|Bigger]):-

Head > SplitPoint,

split(Tail, SplitPoint, Smaller, Bigger).

Page 21: Prolog for Linguists Symbolic Systems 139P/239P

merge/3

%merge(+List1, +List2, -MergedList)

merge([], List2, List2).

merge(List1, [], List1).

merge([Element1|List1], [Element2|List2], [Element1|MergedList]):-

Element1 =< Element2,

!,

merge(List1, [Element2|List2], MergedList).

merge(List1, [Element2|List2], [Element2|MergedList]):-

merge(List1, List2, MergedList).

Page 22: Prolog for Linguists Symbolic Systems 139P/239P

Sorting: quicksort/2

% quicksort(+List, -SortedList)

quicksort([], []).

quicksort([Head|UnsortedList], SortedList):-

split(UnsortedList, Head, Smaller, Bigger),

quicksort(Smaller, SortedSmaller),

quicksort(Bigger, SortedBigger),

append(SortedSmaller, [Head|SortedBigger], SortedList).

Page 23: Prolog for Linguists Symbolic Systems 139P/239P

Sorting: mergesort/2

% mergesort(+List, -SortedList).

mergesort([], []).

mergesort([_One], [_One]):-

!.

mergesort(List, SortedList):-

break_list_in_half(List, FirstHalf, SecondHalf),

mergesort(FirstHalf, SortedFirstHalf),

mergesort(SecondHalf, SortedSecondHalf),

merge(SortedFirstHalf, SortedSecondHalf, SortedList).

Page 24: Prolog for Linguists Symbolic Systems 139P/239P

Merge sort helper predicates

% break_list_in_half(+List, -FirstHalf, -SecondHalf)break_list_in_half(List, FirstHalf, SecondHalf):-

length(List, L),HalfL is L /2,first_n(List, HalfL, FirstHalf, SecondHalf).

% first_n(+List, +N, -FirstN, -Remainder)first_n([Head|Rest], L, [Head|Front], Back):-

L > 0,!,NextL is L - 1,first_n(Rest, NextL, Front, Back).

first_n(Rest, _L, [], Rest).

Page 25: Prolog for Linguists Symbolic Systems 139P/239P

Lexigraphic Ordering

We can extending sorting predicates to sort all Prolog terms using a lexigraphic ordering on terms.

Defined recursively: Variables @< Numbers @< Atoms @< CompoundTerms Var1 @< Var2 if Var1 is older than Var2 Atom1 @< Atom2 if Atom1 is alphabetically earlier than Atom2. Functor1(Arg11, … Arg1N) @< Functor2(Arg21,…, Arg2M) if

Functor1 @< Functor2, or Functor1 = Functor2 and N @< M, or Functor1=Functor2, N=M, and Arg11 @< Arg21, or Arg11 @= Arg21 and Arg12 @< Arg22, or …

Page 26: Prolog for Linguists Symbolic Systems 139P/239P

Built-in Relations:

Less-than @<

Greater than @>

Less than or equal @=<

Greater than or equal @>=

Built-in predicate sort/2 sorts Prolog terms on a lexigraphic ordering.

Page 27: Prolog for Linguists Symbolic Systems 139P/239P

Tokenizer

A token is a sequence of characters that constitute a single unit

What counts as a token will vary A token for a programming language may be different from a

token for, say, English.

We will start to write a tokenizer for English, and build on it in further classes

Page 28: Prolog for Linguists Symbolic Systems 139P/239P

Homework

Read section in SICTus Prolog manual on Input/OutputThis material corresponds to Ch. 5 in Clocksin and Mellish, but the Prolog manual is more up to date and consistent with the ISO Prolog Standard

Improve the tokenizer by adding support for contractions can’t., won’t haven’t, etc. would’ve, should’ve I’ll, she’ll, he’ll He’s, She’s, (contracted is and contracted has, and possessive)

Don’t hand this in, but hold on to it, you’ll need it later.

Page 29: Prolog for Linguists Symbolic Systems 139P/239P

My tokenizer

First, I modified to turn all tokens into lower case

Then, added support for integer tokens

Then, added support for contraction tokens

Page 30: Prolog for Linguists Symbolic Systems 139P/239P

Converting character codes to lower case

% occurs_in_word(+Code, -LowerCaseCode)

occurs_in_word(Code, Code):-

Code >= 0'a,

Code =< 0'z.

occurs_in_word(Code, LowerCaseWordCode):-

Code >= 0'A,

Code =< 0'Z,

LowerCaseWordCode is Code + (0'a - 0'A).

Page 31: Prolog for Linguists Symbolic Systems 139P/239P

Converting to lower case

% case for regular word tokensfind_one_token([WordCode|CharacterCodes], Token, RestCharacterCodes):-

occurs_in_word(WordCode, LowerCaseWordCode),find_rest_word_codes(CharacterCodes, RestWordCodes, RestCharacterCodes),atom_chars(Token, [LowerCaseWordCode|RestWordCodes]).

find_rest_word_codes(+CharacterCodes, -RestWordCodes, -RestCharacterCodes)find_rest_word_codes([WordCode|CharacterCodes], [LowerCaseWordCode|RestWordCodes],

RestCharacterCodes):-occurs_in_word(WordCode, LowerCaseWordCode),!, % red cutfind_rest_word_codes(CharacterCodes, RestWordCodes, RestCharacterCodes).

find_rest_word_codes(CharacterCodes, [], CharacterCodes).

Page 32: Prolog for Linguists Symbolic Systems 139P/239P

Adding integer tokens

% case for integer tokensfind_one_token([DigitCode|CharacterCodes], Token, RestCharacterCodes):- digit(DigitCode), find_rest_digit_codes(CharacterCodes, RestDigitCodes, RestCharacterCodes), atom_chars(Token, [DigitCode|RestDigitCodes]).

% find_rest_digit_codes(+CharacterCodes, -RestDigitCodes, -RestCharacterCodes)find_rest_digit_codes([DigitCode|CharacterCodes], [DigitCode|RestDigitCodes],

RestCharacterCodes):-digit(DigitCode),!, % red cutfind_rest_digit_codes(CharacterCodes, RestDigitCodes, RestCharacterCodes).

find_rest_digit_codes(CharacterCodes, [], CharacterCodes).

Page 33: Prolog for Linguists Symbolic Systems 139P/239P

Digits

%digit(+Code)

digit(Code):-

Code >= 0'0,

Code =< 0'9.

Page 34: Prolog for Linguists Symbolic Systems 139P/239P

Contactions

Turned unambiguous contractions into the corresponding English wordLeft ambiguous contractions contracted.Handled 2 cases Simple contractions:

He’s => He + ‘sHe’ll => He + willThey’ve => They + have

Exceptionscan’t => can + notwon’t => will + not

Page 35: Prolog for Linguists Symbolic Systems 139P/239P

Simple Contractions

simple_contraction("'re", "are").

simple_contraction("'m", "am").

simple_contraction("'ll", "will").

simple_contraction("'ve", "have").

simple_contraction("'d", "'d"). % had, would

simple_contraction("'s", "'s"). % is, has, possessive

simple_contraction("n't", "not").

Page 36: Prolog for Linguists Symbolic Systems 139P/239P

handle_contractions/2

% handle_contractions(+TokenChars, -FrontTokenChars, RestTokenChars)

handle_contractions("can't", "can", "not"):-

!.

handle_contractions("won't", "will", "not"):-

!.

handle_contractions(FoundCodes, Front, NewCodes):-

simple_contraction(Contraction, NewCodes),

append(Front, Contraction, FoundCodes),

Front \== [],

!.

Page 37: Prolog for Linguists Symbolic Systems 139P/239P

Modify find_one_token/3

% case for regular word tokensfind_one_token([WordCode|CharacterCodes], Token, RestCharacterCodes):-

occurs_in_word(WordCode, LowerCaseWordCode),

find_rest_word_codes(CharacterCodes, RestWordCodes, TempCharacterCodes),

handle_contractions([LowerCaseWordCode|RestWordCodes], FirstTokenCodes, CodesToAppend),

append(CodesToAppend, TempCharacterCodes, RestCharacterCodes),

atom_chars(Token, FirstTokenCodes).

Page 38: Prolog for Linguists Symbolic Systems 139P/239P

Dynamic predicates and assert

Add or remove clauses from a dynamic predicate at run time.To specify that a predicate is dynamic, add

:- dynamic predicate/Arity.to your program.assert/1 adds a new clauseretract/1 removes one or more clausesretractall/1 removes all clauses for the predicateCan’t modify compiled predicates at run timeModifying a program while it is running is dangerous

Page 39: Prolog for Linguists Symbolic Systems 139P/239P

assert/1, asserta/1, and assertz/1

Asserting facts (most common)assert(Fact)

Asserting rulesassert( (Head :- Body) ).

asserta/1 adds the new clause at the front of the predicate

assertz/1 adds the new clause at the end of the predicate

assert/1 leaves the order unspecified

Page 40: Prolog for Linguists Symbolic Systems 139P/239P

Built-In: retract/1

retract(Goal) removes the first clause that matches Goal.

On REDO, it will remove the next matching clause, if any.

Retract facts:retract(Fact)

Retract rules:retract( (Head :- Body) ).

Page 41: Prolog for Linguists Symbolic Systems 139P/239P

Built-in: retractall/1

retractall(Head) removes all facts and rules whose head matches.

Could be implemented with retract/1 as:

retractall(Head) :-

retract(Head),

fail.

retract(Head):-

retract( (Head :- _Body) ),

fail.

retractall(_Head).

Page 42: Prolog for Linguists Symbolic Systems 139P/239P

Built-In: abolish(Predicate/Arity)

abolish(Predicate/Arity) is almost the same as

retract(Predicate(Arg1, …, ArgN))

except that abolish/1 removes all knowledge about the predicate, where retractall/1 only removes the clauses of the predicate.

That is, if a predicate is declared dynamic, that is remembered after retractall/1, but not after abolish/1.

Page 43: Prolog for Linguists Symbolic Systems 139P/239P

Example: Stacks & Queues

:- dynamic stack_element/1.empty_stack :- retractall(stack_selement(_Element)).

% push_on_stack(+Element)push_on_stack(Element):- asserta(stack_element(Element)).

% pop_from_stack(-Element)pop_from_stack(Element):- var(Element), retract(stack_element(Element)), !.

Page 44: Prolog for Linguists Symbolic Systems 139P/239P

Queues

% dynamic queue_element/1.empty_queue :- retractall(queue_element(_Element)).

%put_on_queue(+Element)put_on_queue(Element):- assertz(queue_element(Element)).

%remove_from_queue(-Element)remove_from_queue(Element):- var(Element), retract(queue_element(Element)), !.

Page 45: Prolog for Linguists Symbolic Systems 139P/239P

Example: prime_number.

:- dynamic known_prime/1.

find_primes(Prime):-

retractall(known_prime(_Prime)),

find_primes(2, Prime).

find_primes(Integer, Integer):-

\+ composite(Integer),

assertz(known_prime(Integer)).

find_primes(Integer, Prime):-

NextInteger is Integer + 1,

find_primes(NextInteger, Prime).

Page 46: Prolog for Linguists Symbolic Systems 139P/239P

Example: prime_number (cont)

%composite(+Integer)

composite(Integer):-

known_prime(Prime),

0 is Integer mod Prime,

!.

Page 47: Prolog for Linguists Symbolic Systems 139P/239P

Aggregation: findall/3.

findall/3 is a meta-predicate that collects values from multiple solutions to a Goal:

findall(Value, Goal, Values)

findall(Child, parent(james, Child), Children)

Prolog has other aggregation predicates setof/3 and bagof/3, but we’ll ignore them for now.

Page 48: Prolog for Linguists Symbolic Systems 139P/239P

findall/3 and assert/1

findall/3 and assert/1 both let you preserve information across failure.:- dynamic solutions/1.findall(Value, Goal, Solutions):-

retractall(solutions/1),assert(solutions([])),call(Goal),retract(solutions(S)),append(S, [Value], NextSolutions),assert(solutions(NextSolutions)),fail.

findall(_Value, Goal, Solutions):-solutions(Solutions).

Page 49: Prolog for Linguists Symbolic Systems 139P/239P

Special Syntax III: Operators

Convenience in writing terms

We’ve seem them all over already:union([Element|RestSet1], Set2, [Element|SetUnion]):-

union(RestSet1, Set2, SetUnion),

\+ member(Element, SetUnion),

!.

This is just an easier way to write the term:‘:-’(union([Element|RestSet],Set2,[Element|SetUnion]),

‘,’(union(RestSet1,Set2,SetUnion),

‘,’(‘\+’(member(Element, SetUnion),

!)))

Page 50: Prolog for Linguists Symbolic Systems 139P/239P

Operators (cont)

Operators can come before their arguments (prefix) \+, dynamic

Or between their arguments (infix) , + is <

Of after their arguments (postfix) Prolog doesn’t use any of these (yet)

The same Operator can be more than one type :-

Page 51: Prolog for Linguists Symbolic Systems 139P/239P

Precedence and Associativity

Operators also have precedence 5 * 2 + 3 = (5 * 2) + 3

Operators can be associative, or not,

Left associative or right associative

Explicit parenthesization can override defaults for associatiativity and precendence

Page 52: Prolog for Linguists Symbolic Systems 139P/239P

Built-in: current_op/3

current_op/3 gives the precedence and associativity of all current operators.

current_op(Precedence, Associativity, Operator)

where Precedence in an integer 1-1200

and Associativity is of fx or fy for prefix operators xf or yf for postfix operators xfx, xfy, yfx, yfy for infix operators

Page 53: Prolog for Linguists Symbolic Systems 139P/239P

Associativity

These atoms: fx, fy, xf, yf, xfx, xfy, yfx, yfy draw a “picture” of the associativity of the operator: The location of the f tells if the operator is prefix, infix,

or postfix. x means that the argument must be of lower precedence y means that the argument must be of equal or lower

precedence. A y on the left means the operator is left associative A y on the right means the operator is right associative

Page 54: Prolog for Linguists Symbolic Systems 139P/239P

Operator Examples

Precedence Associativity Operator

1200 xfx :-

1150 fx dynamic

1000 xfy ,

900 fy \+

700 xfx =

700 xfx is

700 xfx <

500 yfx +

500 fx +

400 yfx *

300 xfx mod

Page 55: Prolog for Linguists Symbolic Systems 139P/239P

Creating new operators

Built-in op/3 creates new operators

op(+Precedence, +Associativity, +Operator)

:- op(700, xfx, equals).

:- op(650, fx, $).

:- op(650, xf, cents).

$Dollars equals Cents cents :-

Cents is 100 * Dollars.

Page 56: Prolog for Linguists Symbolic Systems 139P/239P

Consult

The operation for reading in a file of Prolog clauses and treating them as a program is traditional known as “consulting” the file.

We will write a simple consult/1 predicate, and build on it over time.

We will write similar

Page 57: Prolog for Linguists Symbolic Systems 139P/239P

Consult (cont)

consult_file(File):-

open(File, read, Stream),

consult_stream(Stream),

close(Stream).

consult_stream(Strea):-

repeat,

read(Stream, Term),

consult_term(Term),

at_end_of_stream(Stream),

!.

Page 58: Prolog for Linguists Symbolic Systems 139P/239P

Consult (cont)

consult_term((:- Goal)):-

!,

call(Goal).

consult_term((Goal :- Body)):-

!,

assertz((Goal :- Body)).

consult_term(Fact):-

assertz(Fact).

Page 59: Prolog for Linguists Symbolic Systems 139P/239P

Parsing, grammars, and language theory

The development of Prolog (by Colmeraur at Marseilles) was motivated in part by a desire to study logic and language.

Grammars are formal specifications of languages

Prolog takes these specifications and treats them as logical theories about language, and as computations

Grammar Proof Computation

Pereira and Warren, Parsing as Deduction, 1984.

Ideas from Prolog/Logic Programming, particularly unification, are found in modern Linguistics.

Page 60: Prolog for Linguists Symbolic Systems 139P/239P

Overview of formal language theory

An Alphabet is a set of symbols

A Sentence is a finite sequence of symbols from some alphabet

A Language L is a (potentially infinite) set of sentences from some alphabet

A Grammar is a finite description of a language

L(G) is the language described by the grammar G

We will be interested in several problems: Is a given sentence a member of L(G)? What structure does G assign to the sentence?

Page 61: Prolog for Linguists Symbolic Systems 139P/239P

Context-Free Grammars

A Context-Free Grammar consists of: An alphabet A set of nonterminal symbols N (N=) A distinguished start symbol SN A set of production rules of the form:

A B1 … BN, where A N and B1 … BN (N )

Page 62: Prolog for Linguists Symbolic Systems 139P/239P

CFG: example

S NP VPNP DET NVP V VP V NPDET theDET aN manN menN womanN women

N catN catsN dogN dogsV likeV likesV sleepV sleeps

Page 63: Prolog for Linguists Symbolic Systems 139P/239P

Derivations

S => NP VP=> DET N VP=> the N VP=> the man VP=> the man V NP=> the man likes NP=> the man likes DET N=> the man likes the N=> the man likes the woman

Page 64: Prolog for Linguists Symbolic Systems 139P/239P

A Prolog Program for that CFG

s(S) :- np(NP), vp(VP), append(NP, VP, S).

np(NP) :- det(DET), n(N), append(DET, N, NP).

vp(VP) :- v(V), V=VP.

vp(VP) :- v(V), np(NP),

append(V, NP, VP).

det([the]).

det([a]).

n([man]).

n([men]).

n([woman]).

n([women]).

n([cat]).

n([cats]).

n([dog]).

n([dogs]).

v([like]).

v([likes]).

v([sleep]).

v([sleeps]).

Page 65: Prolog for Linguists Symbolic Systems 139P/239P

Automatically generating that grammar

We can define an operator to define grammar rules,

And update consult_file/1 to translate them into Prolog clauses automatically

These facilities are already built into the built-in consult/1, but we will build them ourselves

Page 66: Prolog for Linguists Symbolic Systems 139P/239P

Updates to consult_file

:- op(1200, xfx, '-->').

% Add a new clause to consult_term/1

consult_term((NT --> Rule)):-

!,

grammar_rule_body(Rule, Body, Phrase),

functor(Goal, NT, 1),

arg(1, Goal, Phrase),

assertz((Goal :- Body))

Page 67: Prolog for Linguists Symbolic Systems 139P/239P

grammar_rule_body/3

grammar_rule_body((Rule1, Rule2),(Body1, Body2, append(Phrase1, Phrase2, Phrase)), Phrase):-!,grammar_rule_body(Rule1, Body1, Phrase1),grammar_rule_body(Rule2, Body2, Phrase2).

grammar_rule_body(List, true, List):-is_list(List),!.

grammar_rule_body(NT, Goal, Phrase):-atom(NT),functor(Goal, NT, 1),arg(1, Goal, Phrase).

Page 68: Prolog for Linguists Symbolic Systems 139P/239P

The grammar can now look like this:

s --> np, vp.

np --> det, n.

vp --> v.

vp --> v, np.

det --> [the].

det --> [a].

n --> [man].

n --> [men].

n --> [woman].

n --> [women].

n --> [dog].

n --> [dogs].

v --> [like].

v --> [likes].

v --> [sleep].

v --> [sleeps].

Page 69: Prolog for Linguists Symbolic Systems 139P/239P

A better way to do the translation

So, we can transform the grammar into a program automatically,

But, it’s not a very good program

We could try to move the assert/3 around, but that would not be very reversible.

Instead, use difference lists Use two variables, one to keep track of the start of each

phrase, and one to keep track of it’s end.

Page 70: Prolog for Linguists Symbolic Systems 139P/239P

Difference lists as indicies

Traditional parsing uses indicies to keep track of phrase boundaries

the man likes the dog 0 1 2 3 4 5

“the man” is an NP spanning 0-2“likes the dog” is a VP spanning 2-5We’ll use difference lists to indicate spans,“the dog” is an NP spanning [the,dog]-[]“the man” is an NP spanning [the,man,likes,the,dog]-[likes,the,dog]

Page 71: Prolog for Linguists Symbolic Systems 139P/239P

Difference list grammar rule translation

s np, vp.

Translates to:

s(S0, SN) :- np(S0, S1), vp(S1, SN).

Instead of one variable, we have two, for the start and end points of the phrase,And the phrases are linked so that the end of one phrase is the same as the start of the adjacent phrase.

Page 72: Prolog for Linguists Symbolic Systems 139P/239P

Ruling out ungrammatical phrases

We’ve got a little grammar, but it accepts a lot of ungrammatical sentences

First, let’s deal with number agreement between subject NP and the verb:

Conventional to indicate ungrammatical sentences with a *

The man sleeps.

*The man sleep.

Page 73: Prolog for Linguists Symbolic Systems 139P/239P

We *could* just add more rules…

s np_sing, vp_sings np_plural, vp_plural.np_sing det, n_sing.np_plural det, n_plural.vp_sing v_sing.vp_plural v_plural.vp_sing v_sing np_sing.vp_sing v_sing np_plural.vp_plural v_plural, np_sing.vp_plural v_plural, np_plural.det [the].det [a].

n_sing [man].n_sing [woman].n_sing [cat].n_sing [dog].n_plural [men].n_plural [women].n_plural [cats].n_plural [dogs].v_sing [likes].v_sing [sleeps].v_plural [like].v_plural [likes].

Page 74: Prolog for Linguists Symbolic Systems 139P/239P

Features

But, this leads to duplicating a lot of rulesWhat if we want to eliminate other ungrammatical sentences: Number agreement between determiner and noun Transitive and Intransitive verbs

A man sleeps.*A men sleep.The men like the cat.*The men like.The men sleep.*The men sleep the cat.

Page 75: Prolog for Linguists Symbolic Systems 139P/239P

Features

We can add features on rules to express these constraints concisely.

s(Number) np(Number), vp(Number).np(Number) det(Number), n(Number).vp(Number) v(Number, intranitive).vp(Number) v(Number, transitive), np(_).det(singular) [a].det(_) [the].n(singular) [man].n(plural) [men].v(singular, transitive) [likes].v(singular, intransitive) [sleeps].

Page 76: Prolog for Linguists Symbolic Systems 139P/239P

Improved Consult

consult_term((NT --> Rule)):-!,grammar_rule_body(Rule, Body, Start, End),make_nonterminal(NT, Start, End, Goal),assertz((Goal :- Body)).

make_nonterminal(NT, Start, End, Goal):-NT =.. List,append(List, [Start,End], FullList),Goal =.. FullList.

Page 77: Prolog for Linguists Symbolic Systems 139P/239P

Improved Consult (cont)

grammar_rule_body((Rule1, Rule2),(Body1, Body2), Start, End):-

!,

grammar_rule_body(Rule1, Body1, Start, Next),

grammar_rule_body(Rule2, Body2, Next, End).

grammar_rule_body(List, true, Start, End):-

is_list(List),

!,

append(List, End, Start).

grammar_rule_body(NT, Goal, Start, End):-

make_nonterminal(NT, Start, End, Goal).

Page 78: Prolog for Linguists Symbolic Systems 139P/239P

Possible Class Projects

Should demonstrate competence in Prolog programming

Expect problems with solutions in 5-20 pages of code range.

Talk/email with me about your project

Page 79: Prolog for Linguists Symbolic Systems 139P/239P

Information extraction from a web page

Pick a web page with content that might be well represented in a Prolog database: Sports statistics TV listings

Write a program to parse the HTML, extract the relevant information, and turn it into a Prolog database.

Page 80: Prolog for Linguists Symbolic Systems 139P/239P

Question-Answering

Write a program to accept user’s questions typed at the keyboard, parse them, and generate answers from a known database.

Page 81: Prolog for Linguists Symbolic Systems 139P/239P

Breadth-first Prolog interpreter

Write a breadth-first Prolog interpreter

Test it with some simple programs, and compare it with depth-first Prolog, and iterative deepening.

Page 82: Prolog for Linguists Symbolic Systems 139P/239P

Compare/contrast with LP language

Select another logical programming language Mercury, Eclipse, etc.

Test a variety of the kinds of programs we have written in this class (generate-and-test, DCGs, etc.), and see how they would be written.

Only consider this if you are confident that you have already demonstrated Prolog competence.

Page 83: Prolog for Linguists Symbolic Systems 139P/239P

What to cover in remaining weeks

We’ve got 4 more “sessions”, I have these plans: Another session on DCGs A session on iterative deepening Some time on logical foundations/theorem proving

Any thoughts on other things you’ld like to cover?

More review?

Help with class projects?