Introduction & Information Theory Ling572 Advanced Statistical Methods in NLP January 3, 2012.
Introduction & Information Theory
Ling572: Advanced Statistical Methods in NLP
January 3, 2012
Roadmap
Course Overview
Information theory
Course Overview
Course Information
Course web page: http://courses.washington.edu/ling572
Syllabus: schedule and readings
Links to other readings, slides, and links to class recordings
Slides posted before class, but may be revised
Catalyst tools:
GoPost discussion board for class issues
CollectIt dropbox for homework submission and TA comments
Gradebook for viewing all grades
GoPost Discussion Board
Main venue for course-related questions and discussion
What not to post: personal or confidential questions; homework solutions
What to post: almost anything else course-related
“Can someone explain…?” “Is this really supposed to take this long to run?”
Key location for class participation: post questions or answers
Your discussion space: Michael & I will not jump in often
GoPost
Emily’s 5-minute rule: if you’ve been stuck on a problem for more than 5 minutes, post to the GoPost!
Mechanics:
Please use your UW NetID as your user ID
Please post early and often! Don’t wait until the last minute
Keep up with the GoPost; it is hard to use retrospectively
Notifications: decide how you want to receive GoPost postings
Email
Should be used only for personal or confidential issues: grading issues, extended absences, other problems
General questions/comments go on GoPost
Please send email from your UW account; include Ling572 in the subject
If you don’t receive a reply in 24 hours (48 on weekends), please follow up
Homework Submission
All homework should be submitted through CollectIt, packaged with tar:
tar cvf hw1.tar hw1_dir
Homework due 11:45 Thursdays
Late homework receives a 10%/day penalty (incremental)
Most major programming languages accepted: C/C++/C#, Java, Python, Perl, Ruby
If you want to use something else, please check first
Please follow the naming and organization guidelines in the HW
All programming assignments should run on the CL cluster under Condor
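The packaging step above can be sketched with Python’s standard-library tarfile module (the directory name and file contents are illustrative; hw1_dir follows the slide’s example):

```python
import pathlib
import tarfile

# Illustrative layout: homework files live in hw1_dir/
pathlib.Path("hw1_dir").mkdir(exist_ok=True)
pathlib.Path("hw1_dir/hw1.py").write_text('print("hw1")\n')

# Equivalent of: tar cvf hw1.tar hw1_dir
with tarfile.open("hw1.tar", "w") as tar:
    tar.add("hw1_dir")

# Equivalent of: tar tf hw1.tar -- verify the contents before uploading
with tarfile.open("hw1.tar") as tar:
    print(tar.getnames())
```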
Homework Assignments
(Mostly) implementation tasks designed to give hands-on understanding of ML approaches
Focus on core concepts, not minute optimizations: if the gold standard achieves 90.7%, 89.8% is okay
Not scored directly on efficiency, but if it’s too slow it is hard to debug, test, etc.
Not scored on optimal software design either: try to avoid hardcoding, but you don’t need a complex design
Grading
Homework assignments: 80%
Reading assignments: 10%
Class participation: 10%
No midterm or final exams
One homework assignment may be dropped
Grades
Grades in Catalyst Gradebook
TA feedback returned through CollectIt
Extensions: only for extreme circumstances (illness, family emergencies)
Incomplete: only if all work is completed up to the last two weeks (UW policy)
Workload
CLMS courses carry a heavy workload; Ling572 is no exception
Estimates (per week):
~3 hours: lecture
10-12 hours: homework assignments (highly variable, depending on prior programming experience)
1-3 hours: reading + reading assignments
Tracking: GoPost thread for each assignment: please post
Consider an automatic time tracker (e.g. ‘hamster’ for Linux)
Recordings
All classes will be recorded
Links to recordings appear in the syllabus; available to all students, DL and in-class
Please remind me to: record the meeting (look for the red dot); repeat in-class questions
Note: the instructor’s screen is projected in class, so assume the chat window is always public
Contact Info
Gina: Email: [email protected]
Office hour: Fridays 12:30-1:30 (after the Treehouse meeting); Location: Padelford B-201; or by arrangement
Available by Skype or Adobe Connect
TA: Michael Wayne Goodman: Email: [email protected]
Office hour: Time: TBD, see GoPost; Location: Treehouse
Online Option
Please check that you are registered for the correct section:
CLMS in-class: Section A; State-funded: Section B; CLMS online: Section C
Online attendance for in-class students: not more than 2 times per term (e.g. missed bus, ice)
Please enter the meeting room 5-10 minutes before the start of class; try to stay online throughout class
Online Tip
If you see: “You are not logged into Connect. The problem is one of the following: the permissions on the resource you are trying to access are incorrectly set. Please contact your instructor/Meeting Host/etc. You do not have a Connect account but need to have one. For UWEO students: If you have just created your UW NetID or just enrolled in a course…”
Clear your cache, then close and restart your browser
Course Description
Course Prerequisites
Programming languages: Java/C++/Python/Perl/…
Operating systems: basic Unix/Linux
CS 326 (Data Structures) or equivalent: lists, trees, queues, stacks, hash tables, …; sorting, searching, dynamic programming, …
Stat 391 (Probability and Statistics): random variables, conditional probability, Bayes’ rule, …
Ling 570 (or similar)
If you haven’t taken Ling570 or Ling472, please email me.
Textbook
No textbook; online readings
Reference / background:
Jurafsky and Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition, 2008. Available from the UW Bookstore, Amazon, etc.
Manning and Schütze, Foundations of Statistical Natural Language Processing. Early edition available online through the UW library.
Course Goals
Understand the basis of machine learning algorithms that achieve state-of-the-art results
Focus on classification and sequence labeling
Concentrate on basic concepts of machine learning techniques and their application to NLP tasks
Not a computational learning theory class; won’t focus on proofs
Model Questions
Machine learning algorithms: decision trees and naïve Bayes; MaxEnt and support vector machines; …
Key questions:
What is the model? What assumptions does the model make? How many parameters does the model have?
Model Questions
Training: How are the parameters learned?
Decoding: How does the model assign values?
Pros and cons: How does the model handle outliers? Missing data? Noisy data?
Is it scalable? How long does it take to train? To decode?
How much training data is needed? Labeled? Unlabeled?
Outline for Ling572
Unit #0 (0.5 weeks): Basics: Introduction, Information Theory, Classification review
Unit #1 (3 weeks): Classic Machine Learning: K Nearest Neighbors, Decision Trees, Naïve Bayes, Perceptrons (?)
Outline for Ling572
Unit #3 (4 weeks): Discriminative Classifiers: Feature Selection, Maximum Entropy Models, Support Vector Machines
Unit #4 (1.5 weeks): Sequence Learning: Conditional Random Fields, Transformation-Based Learning
Unit #5 (1 week): Other Topics: semi-supervised learning, …
Outline for Ling572
Topics:
Feature selection approaches
Beam search
Toolkits: Mallet, libSVM
Using binary classifiers for multiclass classification
Early NLP
Early approaches to Natural Language Processing were similar to classic approaches to Artificial Intelligence:
Reasoning, knowledge-intensive approaches
Largely manually constructed rule-based systems
Typically focused on specific, narrow domains
Early NLP: Issues
Rule-based systems:
Too narrow and brittle: couldn’t handle new domains (out of domain -> crash)
Hard to maintain and extend: large manual rule bases incorporate complex interactions and don’t scale
Slow
Reports of the Death of NLP…
ALPAC Report, 1966: Automatic Language Processing Advisory Committee
Failed systems efforts, especially MT, led to defunding
Example (probably apocryphal): English -> Russian -> English MT
“The spirit is willing but the flesh is weak.” -> “The vodka is good but the meat is rotten.”
…Were Greatly Exaggerated
Today:
Watson wins Jeopardy!
Siri speaks and understands
Google searches and translates
So What Happened?
Statistical approaches and machine learning:
Hidden Markov Models boosted speech recognition
The noisy channel model gave statistical MT
Unsupervised topic modeling
Etc.
So What Happened?
Many stochastic approaches developed in the 80s-90s
The rise of machine learning accelerated from 2000 to the present
Why?
Large-scale data resources: web data; training corpora (Treebank, TimeML, Discourse Treebank); Wikipedia, etc.
Large-scale computing resources: processors, storage, memory (local and cloud)
Improved learning algorithms: supervised, semi-supervised, unsupervised, structured, …
Information Theory
Entropy
Can be used as a measure of:
Match of model to data
How predictive an n-gram model is of the next word
Comparison between two models
Difficulty of a speech recognition task
Entropy
Information-theoretic measure
Measures information in a model
Conceptually, a lower bound on the number of bits needed to encode
Entropy H(X), for a random variable X with probability function p:
H(X) = -Σ_{x ∈ X} p(x) log₂ p(x)
E.g. with 8 things, numbering them as a code => 3 bits/transmission
Alternatively, a short code for high-probability items and a longer code for lower-probability ones can reduce the average
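The entropy definition on this slide can be sketched in a few lines of Python (the example distributions are illustrative):

```python
import math

def entropy(probs):
    """H(X) = -sum over x of p(x) * log2 p(x); zero-probability outcomes contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 8 equally likely outcomes need log2(8) = 3 bits each
print(entropy([1/8] * 8))   # 3.0

# A fair coin flip carries exactly 1 bit
print(entropy([0.5, 0.5]))  # 1.0
```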
Computing Entropy
Picking horses (Cover and Thomas)
Send a message identifying one horse out of 8.
If all horses are equally likely, p(i) = 1/8:
H(X) = -Σ_{i=1}^{8} (1/8) log₂(1/8) = 3 bits
Computing Entropy
Picking horses (Cover and Thomas), continued
Now some horses are more likely:
1: 1/2; 2: 1/4; 3: 1/8; 4: 1/16; 5, 6, 7, 8: 1/64 each
H(X) = -Σ_{i=1}^{8} p(i) log₂ p(i) = 2 bits
An optimal code uses the codewords 0, 10, 110, 1110, 111100, 111101, 111110,
and 111111, averaging 2 bits per message.
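As a quick check of the two results above, here is a small Python sketch (mine, not from the slides) that computes the skewed-distribution entropy and the expected length of the listed code:

```python
import math

# Horse probabilities from Cover and Thomas: 1/2, 1/4, 1/8, 1/16, 4 x 1/64.
p = [1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64]
H = -sum(pi * math.log2(pi) for pi in p)

# The prefix code from the slide; each length equals -log2 p(i), so the
# expected code length matches the entropy exactly.
codes = ["0", "10", "110", "1110", "111100", "111101", "111110", "111111"]
avg_len = sum(pi * len(c) for pi, c in zip(p, codes))
print(H, avg_len)  # 2.0 2.0
```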
Entropy of a Sequence
Basic sequence entropy rate:
(1/n) H(W₁ⁿ) = -(1/n) Σ_{W₁ⁿ ∈ L} p(W₁ⁿ) log p(W₁ⁿ)
Entropy of a language: take lengths to infinity.
Assume the language is stationary and ergodic.
H(L) = lim_{n→∞} -(1/n) Σ_{w₁...wₙ ∈ L} p(w₁,...,wₙ) log p(w₁,...,wₙ)
By the Shannon-McMillan-Breiman theorem, a single long sample suffices:
H(L) = lim_{n→∞} -(1/n) log p(w₁,...,wₙ)
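The Shannon-McMillan-Breiman estimate is easy to sketch in Python: score one long sample under a model and normalize by its length. The unigram character model here is only an illustrative assumption:

```python
import math
from collections import Counter

def per_symbol_bits(seq, model):
    """-(1/n) log2 p(w_1..w_n): the single-sample entropy-rate estimate,
    here under an independence (unigram) model over symbols."""
    return -sum(math.log2(model[s]) for s in seq) / len(seq)

text = "abracadabra"
# Maximum-likelihood unigram model estimated from the sample itself.
model = {c: n / len(text) for c, n in Counter(text).items()}
print(per_symbol_bits(text, model))
```

With an MLE model scored on its own training sample, this estimate equals the entropy of the empirical symbol distribution.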
Entropy of English
Shannon's experiment:
Subjects guess strings of letters; count the guesses.
Entropy of the guess sequence = entropy of the letter sequence:
about 1.3 bits (on restricted text).
Build a stochastic model on text and compute its entropy:
Brown et al. computed a trigram model on a varied corpus;
its per-character entropy was 1.75 bits.
Cross-Entropy
Comparing models:
The actual distribution p is unknown.
Use a simplified model m to estimate it.
A model that matches p more closely has lower cross-entropy.
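A minimal sketch of cross-entropy over a shared finite support (the distributions below are illustrative, not from the slides):

```python
import math

def cross_entropy(p, m):
    """H(p, m) = -sum over x of p(x) log2 m(x): the cost in bits of coding
    draws from the true distribution p with a code built from model m."""
    return -sum(px * math.log2(mx) for px, mx in zip(p, m) if px > 0)

p = [0.5, 0.25, 0.25]          # "true" distribution
m_good = [0.5, 0.25, 0.25]     # perfect model: H(p, m) = H(p) = 1.5 bits
m_flat = [1/3, 1/3, 1/3]       # cruder model: higher cross-entropy
print(cross_entropy(p, m_good), cross_entropy(p, m_flat))
```

The first value is the entropy of p itself (1.5 bits), a lower bound; the flat model pays extra bits for its mismatch.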
Relative Entropy
Commonly known as the Kullback-Leibler divergence
Expresses the difference between two probability distributions
Not a proper distance metric: it is asymmetric,
KL(p||q) ≠ KL(q||p)
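The asymmetry is easy to see numerically; a sketch with two made-up distributions:

```python
import math

def kl(p, q):
    """Relative entropy D(p || q) = sum over x of p(x) log2 (p(x) / q(x))."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.9, 0.1]
q = [0.5, 0.5]
# The two directions give different values, so KL is not a distance metric.
print(kl(p, q), kl(q, p))
```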
Joint & Conditional Entropy
Joint entropy: H(X,Y) = -Σ_{x,y} p(x,y) log₂ p(x,y)
Conditional entropy: H(Y|X) = H(X,Y) - H(X)
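A small worked example of the two quantities, using the standard definitions on a toy joint distribution (the distribution itself is illustrative):

```python
import math

def H(dist):
    """Entropy in bits of a distribution given as {outcome: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Toy joint distribution p(x, y).
pxy = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.5}

# Marginal p(x) by summing out y.
px = {}
for (x, _y), p in pxy.items():
    px[x] = px.get(x, 0.0) + p

joint = H(pxy)               # H(X,Y) = 1.5 bits
conditional = joint - H(px)  # H(Y|X) = H(X,Y) - H(X) = 0.5 bits
print(joint, conditional)
```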
Perplexity and Entropy
Given that H(L,P) = -(1/N) log₂ P(w₁,...,w_N),
consider the perplexity equation:
PP(W) = P(W)^(-1/N) = 2^(-(1/N) log₂ P(W)) = 2^(H(L,P))
where H(L,P) is the (cross-)entropy of the language L under the model P
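The identity PP(W) = 2^H can be checked numerically; a sketch on a toy three-word sequence (the probabilities are made up for illustration):

```python
import math

# Word probabilities the model assigns to a toy N = 3 word sequence W.
probs = [0.25, 0.5, 0.125]
N = len(probs)

H = -sum(math.log2(p) for p in probs) / N   # per-word cross-entropy (bits)
pp_from_entropy = 2 ** H                    # 2^H
pp_direct = math.prod(probs) ** (-1 / N)    # P(W)^(-1/N)
print(pp_from_entropy, pp_direct)           # both ~4.0: the two forms agree
```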
Mutual Information
A measure of the information shared between two distributions
Symmetric: I(X;Y) = I(Y;X)
I(X;Y) = KL(p(x,y) || p(x)p(y))
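A sketch computing mutual information both ways: as the KL divergence above, and as H(X) + H(Y) - H(X,Y), whose symmetry in X and Y is plain (the joint distribution is illustrative):

```python
import math

def H(dist):
    """Entropy in bits of a {outcome: prob} distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Toy joint distribution: X and Y agree 80% of the time.
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
px = {0: 0.5, 1: 0.5}
py = {0: 0.5, 1: 0.5}

# I(X;Y) as KL(p(x,y) || p(x)p(y)) ...
mi_kl = sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in pxy.items())
# ... equals H(X) + H(Y) - H(X,Y), which is visibly symmetric.
mi_h = H(px) + H(py) - H(pxy)
print(mi_kl, mi_h)
```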
Next Time
A little review
Decision trees
Applications of entropy