LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 1: 8/21.
LING/C SC/PSYC 438/538 Computational Linguistics
Sandiway Fong
Lecture 1: 8/21
Part 1
• Administrivia
Administrivia
• Where
– S SCI 224 (Computer Lab)
• When
– TR 12:30–1:45PM
• No Class Scheduled For
– Thursday October 18th
– Thursday November 22nd (Thanksgiving)
• Office Hours
– catch me after class, or
– by appointment
– Location: Douglass 311
Administrivia
• Map
– Office (Douglass)
– Classroom (S SCI)
Administrivia
• Email
– [email protected]
• Homepage
– http://dingo.sbs.arizona.edu/~sandiway
• Lecture slides
– available on homepage after each class
– in both PowerPoint (.ppt) and Adobe PDF formats
• animation: in PowerPoint
Administrivia
• Course Objectives
– Theoretical
• Introduction to a broad selection of natural language processing techniques
• Survey course
– Practical
• Acquire some expertise
– Use of tools
– Parsing algorithms
– Write grammars and machines
Administrivia
Reference Textbook
• Speech and Language Processing, Jurafsky & Martin, Prentice-Hall 2000
– 21 chapters (900 pages)
– Concepts, algorithms, heuristics
– This course concentrates on the sentence level stuff
• Sound/speech side
• Prof. Y. Lin Speech Tech LING 578 (this semester)
• Prof. Y. Lin Statistical NLP LING 539 (Spring 2008)
• More advanced course
– LING 581: Advanced Computational Linguistics
– required for HLT Master’s Program students
Administrivia
• Laboratory Exercises
– To run tools and write grammars
– you need access to computational facilities
• use your PC or Mac
• run Windows, Linux or MacOSX
– Homework exercises
Administrivia
• Grading
– 3 homeworks
– Exams
• a mid-term
• a final
• mix of theoretical and practical exercises
Grading Summary
Homeworks: 30%
Midterm: 30%
Final: 40%
Administrivia
• Homeworks
– Homeworks will be presented/explained in class
• (good chance to ask questions)
– Please attempt homeworks early
• (then you can ask questions before the deadline)
– you have one week to do the homework
• (midnight deadline)
• (email submission to me)
• e.g. homework comes out on Thursday,
• it is due in my mailbox by next Thursday midnight
Administrivia
• Homework Policy
– You may discuss your homework with others
– You must write up your homework by yourself
– You must cite sources and references
• Code of Academic Integrity
• http://dos.web.arizona.edu/uapolicies/cai1.html
– Late homeworks are subject to points deduction
– Really late homeworks, e.g. a week late, will not be accepted
– Emergencies and scheduled absences: inform instructor to make alternative arrangements
Administrivia
• Requirements: 438 vs. 538
538 = 438
+ 1 classroom presentation of a selected chapter from the textbook
+ 438 extra credit homework and exam questions are obligatory
Administrivia
• Requirements: 538
Percentage
Homeworks: 25%
Midterm: 25%
Final: 35%
Class Presentation: 15%
Class Questionnaire
• I’ll pass my laptop around ...
– Use PhotoBooth
• Fill in Excel spreadsheet
– Name
– PhotoBooth #
– Major
– Any programming expertise?
– Have a laptop?
– Knowledge of Linguistics?
click on red button to take a picture of yourself
Part 2
• Introduction
Human Language Technology (HLT)
• ... is everywhere
• information is organized and accessed using language
Human Language Technology (HLT)
Beginnings
• c. 1950 (just after WWII)
– Electronic computers invented for
• numerical analysis
• code breaking
Grand Challenges for Computers... Killer Apps:
– Language comprehension tasks and Machine Translation (MT)
References
– Readings in Machine Translation
– Eds. Nirenburg, S. et al. MIT Press 2003.
– (Part 1: Historical Perspective)
• Read Chapter 1 of the textbook
• www.cs.colorado.edu/~martin/SLP/slp-ch1.pdf
Human Language Technology (HLT)
• Cryptanalysis Basis
– early optimism
[Translation. Weaver, W.]
• Citing Shannon’s work, he asks:
• “If we have useful methods for solving almost any cryptographic problem, may it not be that with proper interpretation we already have useful methods for translation?”
Human Language Technology (HLT)
• Statistical approaches: popular in the early days, and have undergone a modern revival
The Present Status of Automatic Translation of Languages (Bar-Hillel, 1951)
– “I believe this overestimation is a remnant of the time, seven or eight years ago, when many people thought that the statistical theory of communication would solve many, if not all, of the problems of communication”
– Much valuable time spent on gathering statistics
Human Language Technology (HLT)
• uneasy relationship between linguistics and statistical analysis
Statistical Methods and Linguistics (Abney, 1996)
– Chomsky vs. Shannon
• Statistics and low (zero) frequency items
– Smoothing
• No relation between order of approximation and grammaticality
• Parameter estimation problem is intractable (for humans)
– IBM (17 million parameters)
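The zero-frequency problem and the smoothing remedy listed above can be made concrete with a small sketch. It uses add-one (Laplace) smoothing over bigram counts, which is only one of several smoothing schemes and is chosen here for brevity. The toy corpus and the function names are assumptions made for illustration; they do not come from Abney's paper.

```python
from collections import Counter

def bigram_model(tokens):
    """Return (unigram_counts, bigram_counts, vocabulary) for a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams, set(tokens)

def mle(prev, word, unigrams, bigrams):
    """Unsmoothed estimate: count(prev, word) / count(prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

def add_one(prev, word, unigrams, bigrams, vocab):
    """Add-one (Laplace) estimate: every possible bigram gets one extra count."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

# Hypothetical toy corpus, for illustration only.
tokens = "the girls read the paper and the girls file the report".split()
unigrams, bigrams, vocab = bigram_model(tokens)

seen = mle("the", "girls", unigrams, bigrams)         # bigram occurs in the corpus
unseen = mle("the", "read", unigrams, bigrams)        # bigram never occurs
smoothed_unseen = add_one("the", "read", unigrams, bigrams, vocab)
print(seen, unseen, smoothed_unseen)
```

The unsmoothed estimate gives the unseen bigram a probability of exactly zero, which would rule out any sentence containing it; the smoothed estimate reserves a small amount of probability for it instead.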
Human Language Technology (HLT)
• recent exciting developments in HLT
– precipitated by progress in
• computers: stochastic machine learning methods
• storage: large amounts of training data
– general availability of corpora (Linguistic Data Consortium)
• University of Arizona Library System is a subscriber
• you can borrow many CDROMs of data
Human Language Technology (HLT)
• Killer Application?
Natural Language Processing (NLP)
Computational Linguistics
• Question:
– How to process natural languages on a computer
• Intersects with:
– Computer science (CS)
– Mathematics/Statistics
– Artificial intelligence (AI)
– Linguistic Theory
– Psychology: Psycholinguistics
• e.g. the human sentence processor
Natural Language Properties
which properties are going to be difficult for computers to deal with?
• Grammar (Rules for putting words together into sentences)
– How many rules are there?
• 100, 1000, 10000, more …
– Portions learnt or innate
– Do we have all the rules written down somewhere?
• Lexicon (Dictionary)
– How many words do we need to know?
• 1000, 10000, 100000 …
Computers vs. Humans
• Knowledge of language
– Computers are way faster than humans
• They kill us at arithmetic and chess
– But human beings are so good at language, we often take our ability for granted
• Processed without conscious thought
• Exhibit complex behavior
IBM’s Deep Blue
Examples
• Innate Knowledge?
– Which report did you file without reading?
– (Parasitic gap sentence)
– file(x,y)
– read(u,v)
x = you
y = report
u = x = you
v = y = report
and there are no other possible interpretations
*the report was filed without reading
Examples
• Changes in interpretation
• John is too stubborn to talk to
• John is too stubborn to talk to Bill
talk_to(x,y)
(1) x = arbitrary person, y = John
(2) x = John, y = Bill
Examples
• Ambiguity
– Where can I see the bus stop?
– stop: verb or part of the noun-noun compound bus stop
– Context (Discourse or situation)
– Where can I see [the [NN bus stop]]?
– Where can I see [[the bus] [V stop]]?
Examples
• Ungrammaticality
– *Which book did you file the report without reading?
– ?*Which book did you file it without reading?
– * = ungrammatical
– ungrammatical vs. incomprehensible
Example
• The human parser has quirks
• Ian told the man that he hired a secretary
• Ian told the man that he hired a story
• Garden-pathing: a temporary ambiguity
• tell: multiple syntactic frames for the verb
• Ian told [the man that he hired] [a story]
• Ian told [the man] [that he hired a secretary]
Ian told the agent that he unmasked a secret
Frequently Asked Questions from the Linguistic Society of America (LSA)
• http://www.lsadc.org/info/ling-faqs.cfm
LSA (Linguistic Society of America) pamphlet
• by Ray Jackendoff
• A Linguist’s Perspective on What’s Hard for Computers to Do …
– is he right?
If computers are so smart, why can't they use simple English?
• Consider, for instance, the four letters read; they can be pronounced as either reed or red. How does the machine know in each case which is the correct pronunciation? Suppose it comes across the following sentences:
• (1) The girls will read the paper. (reed)
• (2) The girls have read the paper. (red)
• We might program the machine to pronounce read as reed if it comes right after will, and red if it comes right after have. But then sentences (3) through (5) would cause trouble.
• (3) Will the girls read the paper? (reed)
• (4) Have any men of good will read the paper? (red)
• (5) Have the executors of the will read the paper? (red)
• How can we program the machine to make this come out right?
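The adjacency rule described in the pamphlet can be written down directly, which makes its failures easy to see. This is a minimal sketch: the function names and the regular-expression tokenizer are assumptions for illustration, and the sentences and target pronunciations are the ones quoted above.

```python
import re

def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())

def pronounce_read_naive(sentence):
    """Apply the adjacency rule: 'reed' right after 'will', 'red' right after 'have'.

    Returns None when 'read' does not directly follow either word.
    """
    words = tokenize(sentence)
    for prev, word in zip(words, words[1:]):
        if word == "read":
            if prev == "will":
                return "reed"
            if prev == "have":
                return "red"
            return None
    return None

# Sentences (1)-(5) with the pronunciations given in the pamphlet.
examples = [
    ("The girls will read the paper.", "reed"),
    ("The girls have read the paper.", "red"),
    ("Will the girls read the paper?", "reed"),
    ("Have any men of good will read the paper?", "red"),
    ("Have the executors of the will read the paper?", "red"),
]

results = [(s, expected, pronounce_read_naive(s)) for s, expected in examples]
for sentence, expected, predicted in results:
    status = "ok" if predicted == expected else "WRONG"
    print(f"{status:5} predicted={predicted} expected={expected}: {sentence}")
```

The rule gets (1) and (2) right, gives no answer for (3) because the auxiliary is not adjacent to read, and picks the wrong pronunciation for (4) and (5), where will is a noun.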
If computers are so smart, why can't they use simple English?
• (6) Have the girls who will be on vacation next week read the paper yet? (red)
• (7) Please have the girls read the paper. (reed)
• (8) Have the girls read the paper? (red)
• Sentence (6) contains both have and will before read, and both of them are auxiliary verbs. But will modifies be, and have modifies read. In order to match up the verbs with their auxiliaries, the machine needs to know that the girls who will be on vacation next week is a separate phrase inside the sentence.
• In sentence (7), have is not an auxiliary verb at all, but a main verb that means something like 'cause' or 'bring about'. To get the pronunciation right, the machine would have to be able to recognize the difference between a command like (7) and the very similar question in (8), which requires the pronunciation red.
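To see why word-level heuristics do not rescue sentences (6) and (7), consider a looser rule that takes the nearest will or have anywhere before read. This is a hypothetical heuristic written for illustration, not something proposed in the pamphlet; the function name is an assumption.

```python
import re

def pronounce_read_nearest(sentence):
    """Pick the pronunciation from the closest 'will' or 'have' before 'read'."""
    words = re.findall(r"[a-z]+", sentence.lower())
    if "read" not in words:
        return None
    position = words.index("read")
    # Scan leftwards from 'read' and stop at the first 'will' or 'have'.
    for word in reversed(words[:position]):
        if word == "will":
            return "reed"
        if word == "have":
            return "red"
    return None

# Sentences (6)-(8) with the pronunciations given in the pamphlet.
examples = [
    ("Have the girls who will be on vacation next week read the paper yet?", "red"),
    ("Please have the girls read the paper.", "reed"),
    ("Have the girls read the paper?", "red"),
]
predictions = [pronounce_read_nearest(s) for s, _ in examples]
for (sentence, expected), predicted in zip(examples, predictions):
    print(f"predicted={predicted} expected={expected}: {sentence}")
```

The heuristic is right only for (8). In (6) it latches onto will inside the relative clause, and in (7) it treats main-verb have as an auxiliary, which is the point of the passage: the decision depends on phrase structure, not on which words happen to be nearby.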
Berkeley Parser
• http://nlp.cs.berkeley.edu/Main.html#Parsing
The Berkeley Parser is the most accurate and one of the fastest parsers for a variety of languages.
Berkeley Parser
• (1) The girls will read the paper. (reed)
Verb Tags (Part of Speech Labels)
VB - Verb, base form
VBD - Verb, past tense
VBG - Verb, gerund or present participle
VBN - Verb, past participle
VBP - Verb, non-3rd person singular present
VBZ - Verb, 3rd person singular present
Berkeley Parser
• (2) The girls have read the paper. (red)
Berkeley Parser
• (3) Will the girls read the paper? (reed)
Berkeley Parser
• (4) Have any men of good will read the paper? (red)
Berkeley Parser
• (5) Have the executors of the will read the paper? (red)
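Once a parser has assigned a verb tag to read, choosing the pronunciation reduces to a table lookup. The sketch below assumes the Penn Treebank verb tags listed on these slides; the tag-to-pronunciation table and the two tagged sentences are written by hand for illustration and are not actual Berkeley Parser output.

```python
# Verb tags mapped to the pronunciation of "read".
# Base and present-tense forms are "reed"; past forms are "red".
# (VBG "reading" and VBZ "reads" are different word forms, so they are omitted.)
READ_PRONUNCIATION = {
    "VB": "reed",
    "VBP": "reed",
    "VBD": "red",
    "VBN": "red",
}

def pronounce_read(tagged_sentence):
    """Return the pronunciation of the first 'read' in a list of (word, tag) pairs."""
    for word, tag in tagged_sentence:
        if word.lower() == "read":
            return READ_PRONUNCIATION.get(tag)
    return None

# Hand-written taggings, assumed for illustration only.
sentence_1 = [("The", "DT"), ("girls", "NNS"), ("will", "MD"),
              ("read", "VB"), ("the", "DT"), ("paper", "NN"), (".", ".")]
sentence_2 = [("The", "DT"), ("girls", "NNS"), ("have", "VBP"),
              ("read", "VBN"), ("the", "DT"), ("paper", "NN"), (".", ".")]

print(pronounce_read(sentence_1))
print(pronounce_read(sentence_2))
```

The hard part of the problem moves into the tagger or parser: the lookup is only as good as the tag it is given.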
Part 3
• software already installed here
Your Homework for Today
• Download and Install Perl
– Active State Perl
• Install SWI-Prolog
– http://www.SWI-Prolog.org/
Perl Resources
• http://www.perl.com/
– tutorials etc.
• http://perldoc.perl.org/perlintro.html
Perl Resources
Google is your friend: many resources out there!
Prolog Resources
• Useful Online Tutorials
– An introduction to Prolog
• (Michel Loiseleur & Nicolas Vigier)
• http://invaders.mars-attacks.org/~boklm/prolog/
– Learn Prolog Now!
• (Patrick Blackburn, Johan Bos & Kristina Striegnitz)
• http://www.coli.uni-saarland.de/~kris/learn-prolog-now/lpnpage.php?pageid=online