LING 438/538 Computational Linguistics Sandiway Fong Lecture 16: 10/19.
Administrivia
• review homework #3
• new homework #4
  – out today
  – usual rules apply
  – due next Thursday
Last Time
• Spelling errors and correction
• Error Correction
  – correct
    • Bayesian Probability
  – Minimum Edit Distance Computation
    • Dynamic Programming
Minimum Edit Distance
• example
  – assuming
    • insert = 1
    • delete = 1
    • substitution = 2
    • (or 0 for substituting the same character)
• recursive formula
  – incrementally computed from minimum edit distances of shorter strings
  – e.g. (intent, execut) is computed from the pairs one edit operation away: (intent, execu), (inten, execut), (inten, execu)
[Figure: a table cell computed from its neighbouring cells L, D and B as min(L+1, D+0, B+1)]
• cost: 1+2+2+1+2 = 8
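The recursive formula above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own implementation: the function name and the memoisation via `lru_cache` are assumptions, and the full word pair intention/execution is inferred from the prefixes shown on the slide. The costs are the ones stated above (insert 1, delete 1, substitution 2, or 0 for the same character).

```python
from functools import lru_cache

INSERT_COST = 1
DELETE_COST = 1
SUBSTITUTE_COST = 2  # 0 is used instead when the two characters are the same


def min_edit_distance(source: str, target: str) -> int:
    """Minimum edit distance, computed from the distances of shorter prefixes."""

    @lru_cache(maxsize=None)
    def dist(i: int, j: int) -> int:
        # dist(i, j) is the distance between source[:i] and target[:j].
        if i == 0:
            return j * INSERT_COST   # build target[:j] from the empty string
        if j == 0:
            return i * DELETE_COST   # delete all of source[:i]
        same = source[i - 1] == target[j - 1]
        return min(
            dist(i - 1, j) + DELETE_COST,
            dist(i, j - 1) + INSERT_COST,
            dist(i - 1, j - 1) + (0 if same else SUBSTITUTE_COST),
        )

    return dist(len(source), len(target))


print(min_edit_distance("intention", "execution"))  # 8
```

Each call looks only at the three prefix pairs that are one edit operation away, which is the same pattern as the min(L+1, D+0, B+1) cell on the slide.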
Minimum Edit Distance Computation
• one formula Microsoft Excel implementation
  – $ in a cell reference means don’t change when copied from cell to cell
  – e.g. in C$1, 1 stays the same; in $A3, A stays the same (not 3)
• formula: min(C2+1,B3+1,B2+if(C$1=$A3,0,2))
  – inc col: min(D2+1,C3+1,C2+if(D$1=$A3,0,2))
  – inc row: min(C3+1,B4+1,B3+if(C$1=$A4,0,2))
  – C$1: row protected
  – $A3: column protected
Minimum Edit Distance Computation
• demo example pairs and their min edit distance (assuming substitution cost 2)
  – intention, intent: 3
  – intention, intentional: 2
  – intention, ten: 6
  – intention, ton: 6
  – intention, teen: 7
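The spreadsheet computation can be mirrored with an explicit table, one cell per pair of prefixes. The sketch below is an assumption about how one might lay this out in Python; the function names are hypothetical, and only the costs and the demo pairs come from the slides.

```python
def edit_distance_table(source: str, target: str, sub_cost: int = 2):
    """Fill the whole table, as the spreadsheet does one cell at a time."""
    rows, cols = len(source) + 1, len(target) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = i          # delete every character of source[:i]
    for j in range(1, cols):
        table[0][j] = j          # insert every character of target[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            # Same shape as the spreadsheet's if(...,0,2) term.
            diagonal = 0 if source[i - 1] == target[j - 1] else sub_cost
            table[i][j] = min(
                table[i - 1][j] + 1,
                table[i][j - 1] + 1,
                table[i - 1][j - 1] + diagonal,
            )
    return table


def table_distance(source: str, target: str) -> int:
    # The answer is the last cell filled in.
    return edit_distance_table(source, target)[-1][-1]


for other in ["intent", "intentional", "ten", "ton", "teen"]:
    print("intention", other, table_distance("intention", other))
```

Filling the table row by row means every cell's three neighbours are already available, which is what makes a single copied formula sufficient in the spreadsheet.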
Question 1
• 438/538 (4pts)
• Give the minimum size regular expression for the FSA below (2pt)
• Minimum size regular expression for the FSA:
  – a+b*
• not minimum size in terms of number of symbols:
  – aa*b*
  – (aa*)|(aa*b*)
[FSA diagram: states s, x, y; arc labels a, a, b, ε]
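One way to convince oneself that the three expressions describe the same language is to compare them on every short string over {a, b}. The check below is only a bounded sanity test, not a proof, and the helper name `language` and the length limit are assumptions made for this sketch.

```python
import re
from itertools import product

CANDIDATES = ["a+b*", "aa*b*", "(aa*)|(aa*b*)"]


def language(pattern: str, alphabet: str = "ab", max_len: int = 6) -> set:
    """All strings up to max_len over the alphabet that the pattern fully matches."""
    compiled = re.compile(pattern)
    accepted = set()
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if compiled.fullmatch(candidate):
                accepted.add(candidate)
    return accepted


languages = [language(pattern) for pattern in CANDIDATES]
print(all(lang == languages[0] for lang in languages))  # True
```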
Question 1
• 438/538 (4pts)
• Give an equivalent FSA without the ε-transition (2pts)
  – answers in the form of a diagram, a formal definition, or a Prolog definition are all ok
[FSA diagram: states s, x, y; arc labels a, a, b, ε]
• Equivalent ε-free FSA
[FSA diagram: states s, a, b; arcs labelled a and b]
• How to arrive at this answer?
  – by inspection, or
  – by consideration of a+b*, where b* = ε | b+
[FSA diagrams: one with states s, a; one with states s, a, b]
Question 1
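• 438/538 (4pts)
• Give an equivalent FSA without the ε-transition (2pts)
  – answers in the form of a diagram, a formal definition, or a Prolog definition are all ok
• Set-of-States Construction method:
[FSA diagram: states s, x, y; arc labels a, a, b, ε]
[Set-of-states diagram: states {s}, {x,y}, {y}; arcs labelled a and b]
[Resulting FSA diagram: states s, a, b; arcs labelled a and b]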
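The set-of-states construction can be sketched in a few lines of Python. The transition table below is an assumption reconstructed from the regular expression a+b* and the state sets {s}, {x,y}, {y} (s to x on a, x to x on a, x to y on ε, y to y on b, with y final), since the diagram itself did not survive. All function names are hypothetical.

```python
EPSILON = ""  # marker for the ε-transition

# Assumed transitions of the FSA with the ε-arc (reconstructed, see lead-in).
TRANSITIONS = {
    ("s", "a"): {"x"},
    ("x", "a"): {"x"},
    ("x", EPSILON): {"y"},
    ("y", "b"): {"y"},
}
START, FINAL, ALPHABET = "s", {"y"}, "ab"


def epsilon_closure(states):
    """Every state reachable from `states` using ε-transitions only."""
    closure, stack = set(states), list(states)
    while stack:
        state = stack.pop()
        for nxt in TRANSITIONS.get((state, EPSILON), ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return frozenset(closure)


def set_of_states_construction():
    """Build the ε-free deterministic machine whose states are sets of states."""
    start = epsilon_closure({START})
    table, seen, agenda = {}, {start}, [start]
    while agenda:
        current = agenda.pop()
        for symbol in ALPHABET:
            moved = set()
            for state in current:
                moved |= TRANSITIONS.get((state, symbol), set())
            if not moved:
                continue  # no arc for this symbol from this set of states
            target = epsilon_closure(moved)
            table[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                agenda.append(target)
    finals = {states for states in seen if states & FINAL}
    return start, table, finals


def accepts(string: str) -> bool:
    state, table, finals = set_of_states_construction()
    for symbol in string:
        state = table.get((state, symbol))
        if state is None:
            return False
    return state in finals


start, table, finals = set_of_states_construction()
for (source, symbol), target in sorted(table.items(), key=str):
    print(sorted(source), symbol, sorted(target))
```

Under the assumed transitions, the reachable sets are {s}, {x,y} and {y}, matching the three states of the ε-free answer.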
Question 2
• 438/538 (8pts)
• convert the NDFSA (figure 2.27 in the textbook) into a deterministic FSA (3pts)
• set-of-states construction:
[Diagram: {1} -a-> {2}, {2} -b-> {3,4}, {3,4} -a-> {2,3}, {2,3} -b-> {3,4}, {2,3} -a-> {2}]
Question 2
• 438/538 (8pts)
• implement both the NDFSA and the equivalent FSA in Prolog using the “one predicate per state” encoding
• Prolog code (NDFSA):
% start state: one; final state: three
one([a|L]) :- two(L).
% state two is non-deterministic on b
two([b|L]) :- three(L).
two([b|L]) :- four(L).
three([]).
three([a|L]) :- two(L).
four([a|L]) :- three(L).
• strings abab and abaaba, how many steps (transitions + final stop)?
Question 2
• 438/538 (8pts)
• implement both the NDFSA and the equivalent FSA in Prolog using the “one predicate per state” encoding
• Prolog code (deterministic FSA):
% start state: s1; final states: s34 and s23
s1([a|L]) :- s2(L).
s2([b|L]) :- s34(L).
s34([]).
s34([a|L]) :- s23(L).
s23([]).
s23([b|L]) :- s34(L).
s23([a|L]) :- s2(L).
[Diagram: {1} -a-> {2}, {2} -b-> {3,4}, {3,4} -a-> {2,3}, {2,3} -b-> {3,4}, {2,3} -a-> {2}]
• strings abab and abaaba, how many steps (transitions + final stop)?
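The step-counting question can be explored with a small Python simulation of both machines. This is a sketch under a stated assumption about what counts as a step: every successful clause match is counted (each transition taken, including ones later abandoned on backtracking, plus the final stop), and clauses are tried in the order they are written. A different counting convention would give different totals for the NDFSA. All names here are hypothetical.

```python
# Transitions listed in the same order as the Prolog clauses.
NDFSA = {
    "one": [("a", "two")],
    "two": [("b", "three"), ("b", "four")],
    "three": [("a", "two")],
    "four": [("a", "three")],
}
NDFSA_FINAL = {"three"}

DFSA = {
    "s1": {"a": "s2"},
    "s2": {"b": "s34"},
    "s34": {"a": "s23"},
    "s23": {"b": "s34", "a": "s2"},
}
DFSA_FINAL = {"s34", "s23"}


def ndfsa_steps(string: str, state: str = "one"):
    """Depth-first search with backtracking; returns (accepted, steps)."""
    if not string:
        # The final stop counts as one step when the state is final.
        return (True, 1) if state in NDFSA_FINAL else (False, 0)
    steps = 0
    for symbol, nxt in NDFSA[state]:
        if string[0] == symbol:
            accepted, used = ndfsa_steps(string[1:], nxt)
            steps += 1 + used      # this transition plus everything below it
            if accepted:
                return True, steps
    return False, steps


def dfsa_steps(string: str):
    """No backtracking is needed: one transition per input symbol."""
    state, steps = "s1", 0
    for symbol in string:
        state = DFSA[state].get(symbol)
        if state is None:
            return False, steps
        steps += 1
    return (True, steps + 1) if state in DFSA_FINAL else (False, steps)


for word in ["abab", "abaaba"]:
    print(word, ndfsa_steps(word), dfsa_steps(word))
```

Under this convention the deterministic machine always needs the string length plus one, while the NDFSA can need more whenever its first choice on b leads to a dead end.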
Question 3
• 438/538 (8pts)
• (5pts) Give a FSA in Prolog that accepts a binary string (made up of 0’s and 1’s) if and only if it begins with a 1 and contains exactly one 0
  – examples:
    – 1111011
    – 10
    – *111011101
• FSA:
[FSA diagram: states 1, 2, 3; arc labels 1, 1, 0, 1]
Question 3
• 438/538 (8pts)
• (5pts) Give a FSA in Prolog that accepts a binary string (made up of 0’s and 1’s) if and only if it begins with a 1 and contains exactly one 0
• (3pts) Give the regular expression equivalent of the FSA
• Regular Expression:
  – 11*01*
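As a cross-check between the FSA and the regular expression, the sketch below simulates a three-state machine in Python and compares it with 11*01* on all short binary strings. The transition table is an assumption read off the diagram labels (state 1 to 2 on 1, a 1-loop on 2, state 2 to 3 on 0, a 1-loop on 3, with 3 final). The assignment itself asks for Prolog, so this is only an illustration of the intended behaviour.

```python
import re
from itertools import product

# Assumed transitions, reconstructed from the diagram labels.
TRANSITIONS = {(1, "1"): 2, (2, "1"): 2, (2, "0"): 3, (3, "1"): 3}
START, FINAL = 1, {3}
REGEX = re.compile(r"11*01*")


def accepts(string: str) -> bool:
    state = START
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:      # no arc for this symbol: reject
            return False
    return state in FINAL


def agrees_with_regex(max_len: int = 8) -> bool:
    """Compare the FSA and the regular expression on all strings up to max_len."""
    for length in range(max_len + 1):
        for bits in product("01", repeat=length):
            word = "".join(bits)
            if accepts(word) != bool(REGEX.fullmatch(word)):
                return False
    return True


for word in ["1111011", "10", "111011101"]:
    print(word, accepts(word))
print(agrees_with_regex())
```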
Question 1
• 438/538 (8pts)
• Implement the e-insertion rule
• (Context-Sensitive) Spelling Rule: (3.5) ε → e / {x,s,z} ^ __ s #
  – as a FST in Prolog
• Goals:
  – pass through non-matching cases unchanged
  – implement rule exactly
  – no deletion of boundaries ^ and #
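To make the expected input/output behaviour concrete, here is a small Python sketch of the mapping the rule describes. It is not an FST and not the Prolog solution the question asks for: it uses a regular-expression rewrite as a stand-in, and the test words (fox, cat, buzz) are hypothetical examples chosen for illustration.

```python
import re

# An e is inserted after the morpheme boundary ^ when the boundary is
# preceded by x, s or z and followed by s#. Both boundaries are kept.
E_INSERTION = re.compile(r"(?<=[xsz])\^(?=s#)")


def e_insertion(intermediate: str) -> str:
    """Apply the e-insertion rule; non-matching input passes through unchanged."""
    return E_INSERTION.sub("^e", intermediate)


for form in ["fox^s#", "buzz^s#", "cat^s#", "fox^ing#"]:
    print(form, "->", e_insertion(form))
```

The lookbehind and lookahead test the context without consuming it, which is how the sketch meets the goals of leaving ^ and # in place and passing other strings through untouched.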
Question 2
• 438/538 (6pts)
• What does the Porter Stemmer output for the following words:
  – (2 pts) availability
  – (2 pts) shipping
  – (2 pts) unbelievable
• Show the steps (stages) in your answer
Question 2
• 438/538 (6pts)
  – the Porter Stemmer handles -ement for cases like
    • replacement → replac(e)
  – it doesn’t handle statement → stat(e)
    • i.e. it outputs statement
  – Why? Explain (2pts)
  – Modify the Porter rule responsible to allow for statement → stat(e)
    • Submit your rule (2pts)
    • Give 2 examples where the modified rule would be too liberal, i.e. it overstems (2pts)
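The contrast between replacement and statement can be explored by computing Porter's measure m, the number of vowel-consonant sequences in a stem, which the -ement rule tests with the condition m > 1. The sketch below covers only that measure and that single rule; it is not a full Porter stemmer, and the function names are hypothetical.

```python
def is_consonant(word: str, i: int) -> bool:
    """Porter's definition: y counts as a consonant unless it follows a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Number of vowel-to-consonant switches, i.e. m in [C](VC)^m[V]."""
    m, previous_was_vowel = 0, False
    for i in range(len(stem)):
        consonant = is_consonant(stem, i)
        if consonant and previous_was_vowel:
            m += 1
        previous_was_vowel = not consonant
    return m


def strip_ement(word: str) -> str:
    """Only the rule (m>1) EMENT -> (nothing), applied in isolation."""
    if word.endswith("ement"):
        stem = word[: -len("ement")]
        if measure(stem) > 1:
            return stem
    return word


for word in ["replacement", "statement"]:
    stem = word[: -len("ement")]
    print(word, stem, measure(stem), strip_ement(word))
```

Printing the measure for each stem shows which side of the m > 1 condition it falls on.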