Introduction to NLP · 2020. 12. 23. · Introduction to NLP What is Natural Language processing...
-
Introduction to NLP
What is Natural Language
processing (NLP),
Motivation,
Stages of NLP,
- Morphological Analysis,
- Syntactic Analysis,
- Semantic Analysis,
- Pragmatic Analysis,
- Discourse Analysis,
Terms of NLP
- Parsing (Syntactic Analysis),
- Word Sense Resolution,
- Reference Resolution.
Stages of NLP (Examples),
- Morphological Analysis,
- Syntactic Analysis,
- Semantic Analysis,
- Pragmatic Analysis,
Ambiguity,
Lexicon,
Simple Applications,
Bigger Applications,
Spoken Dialogue System,
Language Technology,
The State of Art,
Explore: Topics based Research Areas
@Copyrights: Natural Language Processing (NLP) Organized by Dr. Ahmad Jalal (http://portals.au.edu.pk/imc/)
-
Course Grading Evaluation
Course Activities (Grading Criteria)
Class Participation:
Assignment/Quizzes:
Semester Project:
Research Article implementation/ demo:
Mid Term Exam:
Final Term Exam:
-
Course Books (for Reference)
Speech and Language Processing:
An Introduction to Natural Language Processing,
Computational Linguistics, and Speech Recognition
By: Daniel Jurafsky and James H. Martin
Published by Prentice Hall, 2000.
Handbook of Natural Language Processing
By: Nitin Indurkhya and Fred J. Damerau
Published by CRC Press.
-
Course contents
Objective Introduction,
Regular Expressions &
Automata,
Morphology & Finite State
Transducers,
N-Grams,
Parts of Speech,
Syntax & Context-free grammars
- Parsing,
Lexicalized and Probabilistic
Parsing,
Semantic Representation &
Representing Meaning
Semantic analysis & lexical
Semantics
Wrap up,
Machine Translation
Information Extraction
-
1. What is Natural Language processing (NLP)
NLP
The sub-domain of artificial intelligence concerned with developing programs that possess some capability of ‘understanding’ a natural language in order to achieve some specific goal.
• A transformation from one representation (the input text) to another (an internal representation).
-
1. What is Natural Language processing (Motivation)
Machine Translation:
- Translation of text or speech from one language to another.
Database Interface:
- Using natural language to query a database.
[Figure: Applications: Machine Translation, Database Interface, Report Abstraction, Story Understanding]
-
1. What is Natural Language processing (Motivation) (Cont…)
Report Abstraction:
- To extract the results/meaning of a report automatically.
Example: frequently used to abstract medical reports.
Story Understanding:
- Understanding natural language to determine the story.
- Using different timelines to make a story.
e.g., typing the word “Lecture” into a search engine gives results such as:
=> Lecture notes of computer science, “or”
=> Lecturer, “or”
=> Lecture of ….
-
2. Stages of NLP
Morphological Analysis
Individual words are analyzed into their components.
Understanding word structure.
Ex: browser > browse + er
Syntactic Analysis
Linear sequences of words are transformed into structures that show how the words relate to each other.
To see the structure of a sentence.
Ex: “I ate apple” and “I ate sky” (both have the same subject-verb-object structure)
Discourse Analysis
Resolving references between sentences.
A coherent, structured group of sentences.
Pragmatic Analysis
(between sentences)
To reinterpret what was said as what was actually meant.
Part of the process of extracting information from text.
Semantic Analysis
A transformation is made from the input text to an internal representation that reflects the meaning.
To understand the meanings of sentences using parsing.
-
2.1. The Steps in NLP
[Figure: the steps in NLP: Morphology, Syntax, Semantics, Pragmatics, Discourse]
** We can go up, down, and up and down, and combine steps too!!
** Every step is equally complex.
-
2.1. The Steps in NLP (Cont…)
Morphology: concerns the way words are built up from smaller meaning-bearing units.
Example: browser > browse + er
Syntax: concerns how words are put together to form correct sentences, and
- what structural role each word has.
Semantics: concerns what words mean,
- and how these meanings combine in a sentence (or sentences) to form sentence meanings.
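The morphology example (browser > browse + er) can be sketched as simple suffix stripping. This is only an illustration: the `STEMS` and `SUFFIXES` tables and the `analyze` function are hypothetical names with a handful of entries, not a real morphological analyzer.

```python
# Toy morphological analyzer: split a word into stem + suffix.
# STEMS and SUFFIXES are tiny hypothetical tables for illustration only.
STEMS = {"browse", "print", "walk", "file"}
SUFFIXES = ["er", "ing", "ed", "s"]


def analyze(word):
    """Return (stem, suffix) if the word decomposes, else (word, '')."""
    word = word.lower()
    if word in STEMS:
        return (word, "")
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            base = word[: -len(suffix)]
            # Try the bare base and the base with a restored final 'e'
            # (browser -> brows + er -> browse + er).
            for candidate in (base, base + "e"):
                if candidate in STEMS:
                    return (candidate, suffix)
    return (word, "")


print(analyze("browser"))   # ('browse', 'er')
print(analyze("printing"))  # ('print', 'ing')
```

Words that are not covered by the tables are returned unchanged with an empty suffix.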
-
2.1. The Steps in NLP (Cont…)
Pragmatics: concerns how sentences are used in different situations, and
- how the situation affects the interpretation of the sentence.
Example: “Can you pass the salt?” is normally used as a request, not as a question about ability.
Discourse: concerns how the immediately preceding
sentences affect the interpretation of the next sentence.
-
3. Terms of NLP (3.1 Parsing Syntactic Analysis)
Assigning a syntactic and logical form to an input sentence
– uses knowledge about words and word meanings (lexicon)
A lexicon is a knowledge base of words and their meanings.
– uses a set of rules defining legal structures (grammar)
Ahmad ate the apple.
(S (NP (NAME Ahmad))
   (VP (V ate)
       (NP (DET the)
           (N apple))))
Syntax : General variable (Actual variable definition);
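As a minimal sketch of how a lexicon and a grammar together produce the bracketed tree for “Ahmad ate the apple”, the following recursive-descent parser hard-codes three assumed rules (S → NP VP, NP → NAME | DET N, VP → V NP) and a four-word lexicon. The rule set and all function names are illustrative assumptions, not the grammar used in the course.

```python
# Toy lexicon (word -> category); an illustrative assumption.
LEXICON = {"ahmad": "NAME", "ate": "V", "the": "DET", "apple": "N"}


def category(tokens, i):
    """Category of token i, or None if out of range or unknown."""
    return LEXICON.get(tokens[i].lower()) if i < len(tokens) else None


def parse_np(tokens, i):
    """NP -> NAME | DET N. Return (tree, next_index) or None."""
    if category(tokens, i) == "NAME":
        return ("NP", ("NAME", tokens[i])), i + 1
    if category(tokens, i) == "DET" and category(tokens, i + 1) == "N":
        return ("NP", ("DET", tokens[i]), ("N", tokens[i + 1])), i + 2
    return None


def parse_vp(tokens, i):
    """VP -> V NP. Return (tree, next_index) or None."""
    if category(tokens, i) == "V":
        result = parse_np(tokens, i + 1)
        if result:
            np, j = result
            return ("VP", ("V", tokens[i]), np), j
    return None


def parse_sentence(text):
    """S -> NP VP. Return the tree, or None if the grammar does not cover it."""
    tokens = text.rstrip(".").split()
    subject = parse_np(tokens, 0)
    if not subject:
        return None
    np, i = subject
    predicate = parse_vp(tokens, i)
    if not predicate:
        return None
    vp, j = predicate
    return ("S", np, vp) if j == len(tokens) else None


def bracket(tree):
    """Render a tree as a bracketed string like (NP (DET the) (N apple))."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return f"({label} {children[0]})"
    return "(" + label + " " + " ".join(bracket(c) for c in children) + ")"


print(bracket(parse_sentence("Ahmad ate the apple.")))
```

A word sequence that the rules do not cover, such as “ate the apple Ahmad”, yields `None` instead of a tree.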
-
3. Terms of NLP (3.2 Word Sense Resolution)
Many words have several meanings or senses.
We need to resolve which of the senses of an ambiguous word is invoked in a particular use of the word.
I made her duck. (Made her a bird for lunch, or made her move her head quickly downwards?)
He left his mouse. (Determining whether the word “mouse” refers to a computer device or an animal.)
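One simple way to approach word sense resolution is to compare the words around the ambiguous word with a short description (gloss) of each sense and pick the sense with the largest overlap, in the spirit of the simplified Lesk method. The sketch below assumes a hand-written, hypothetical sense inventory for “mouse”; the glosses, the stopword list and the function names are made up for illustration.

```python
# Hypothetical sense inventory: each sense has a short hand-written gloss.
SENSES = {
    "mouse": {
        "device": "hand held computer pointing device with buttons and batteries",
        "animal": "small rodent animal with a long tail that eats cheese",
    }
}
STOPWORDS = {"a", "the", "for", "my", "his", "its", "with", "and", "that", "i", "he"}


def content_words(text):
    """Lower-case the text, drop simple punctuation and stopwords."""
    words = text.lower().replace(".", " ").replace(",", " ").split()
    return {w for w in words if w not in STOPWORDS}


def disambiguate(word, sentence):
    """Pick the sense whose gloss shares the most words with the context."""
    context = content_words(sentence) - {word}
    scores = {
        sense: len(context & content_words(gloss))
        for sense, gloss in SENSES[word].items()
    }
    return max(scores, key=scores.get)


print(disambiguate("mouse", "I need new batteries for my mouse."))
print(disambiguate("mouse", "The cat chased the mouse and its long tail."))
```

A sentence such as “He left his mouse.” gives no overlapping context words at all, so the scores tie and the choice is arbitrary; that is exactly the case where more context or world knowledge is needed.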
-
3. Terms of NLP (3.3 Reference Resolution)
Discourse Knowledge
Implicit and Explicit Knowledge
Domain Knowledge (Registration transaction)
World Knowledge = The non-linguistic information that helps a reader or listener interpret the meanings of words and sentences. Also called extra-linguistic knowledge.
• U: I would like to register in an IAS Course.
• S: Which number?
• U: Make it 333.
• S: Which section?
• U: Which section starts at 7:00 am?
• S: Section 5.
• U: Then make it that section.
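The dialogue above can be read as filling a registration record, where “it” and “that section” must be resolved against earlier turns. The sketch below is a rough illustration under strong assumptions: the regular expressions are hand-written for this one dialogue, and the `resolve` function and the frame keys are hypothetical.

```python
import re

DIALOGUE = [
    ("U", "I would like to register in an IAS Course."),
    ("S", "Which number?"),
    ("U", "Make it 333."),
    ("S", "Which section?"),
    ("U", "Which section starts at 7:00 am?"),
    ("S", "Section 5."),
    ("U", "Then make it that section."),
]


def resolve(dialogue):
    """Fill a registration frame from (speaker, utterance) turns."""
    frame = {"course": None, "number": None, "section": None}
    last_section_mentioned = None
    for _speaker, utterance in dialogue:
        text = utterance.lower()
        course = re.search(r"register in an? (\w+) course", text)
        if course:
            frame["course"] = course.group(1).upper()
        number = re.search(r"make it (\d+)", text)
        if number:
            # "it" refers to the course number the system just asked about.
            frame["number"] = number.group(1)
        section = re.search(r"section (\d+)", text)
        if section:
            # Remember the most recent explicitly named section.
            last_section_mentioned = section.group(1)
        if "that section" in text:
            # "that section" refers back to the most recent section mention.
            frame["section"] = last_section_mentioned
    return frame


print(resolve(DIALOGUE))
```

The point of the sketch is only that the referring expressions cannot be interpreted from the current sentence alone; the state carried over from previous turns supplies the referent.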
-
3. Steps + Terms of NLP (Class Participation)
Give at least 2 proper examples of each of the following cases:
Morphological Analysis,
Syntactic Analysis,
Semantic Analysis,
Pragmatic Analysis,
Discourse Analysis,
Parsing (Syntactic Analysis),
Word Sense Resolution,
Reference Resolution.
-
4. Stages of NLP (Examples), [Stage#1]
I want to print Ali’s .init file
I (pronoun) want (verb) to (prep) to (infinitive) print (verb) Ali (noun) ’s (possessive) .init (adj) file (noun) file (verb)
Surface form → stems
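Stage 1 can be sketched as tokenization followed by a lexicon lookup that returns every candidate category for each token, which is why “to” and “file” appear twice in the analysis above. The lexicon below copies the categories from the slide; the tokenizer regular expression and the function names are assumptions made for illustration.

```python
import re

# Categories copied from the slide; ambiguous words list every candidate.
LEXICON = {
    "i": ["pronoun"],
    "want": ["verb"],
    "to": ["prep", "infinitive"],
    "print": ["verb"],
    "ali": ["noun"],
    "'s": ["possessive"],
    ".init": ["adj"],
    "file": ["noun", "verb"],
}


def tokenize(sentence):
    """Split into words, keeping the clitic 's and a leading dot (.init)."""
    sentence = sentence.replace("’", "'")  # normalize the curly apostrophe
    return re.findall(r"'s|\.?\w+", sentence)


def candidate_tags(sentence):
    """Pair every token with all categories the lexicon allows for it."""
    return [(tok, LEXICON.get(tok.lower(), ["unknown"])) for tok in tokenize(sentence)]


for token, tags in candidate_tags("I want to print Ali’s .init file"):
    print(token, tags)
```

Choosing among the candidates (for example, “file” as noun rather than verb) is left to the later syntactic stage.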
-
4. Stages of NLP (Examples), [Stage#2]
I (pronoun) want (verb) to (prep) to (infinitive) print (verb) Ali (noun) ’s (possessive) .init (adj) file (noun) file (verb)
stems → parse tree
[Figure: parse tree of “I want to print Ali’s .init file” with node labels S, NP, VP, PRO, V, ADJ, N]
-
4. Stages of NLP (Examples), [Stage#3]
Parse tree → Semantic Net
[Figure: the parse tree of “I want to print Ali’s .init file” mapped to a semantic net with nodes I, want, print, Ali, .init, file and edge labels who, what, who’s, type]
-
4. Stages of NLP (Examples), [Stage#4 and#5]
[Figure: semantic net with nodes I, want, print, Ali, .init, file and edge labels who, what, who’s, type]
Semantic Net →
- To whom the pronoun ‘I’ refers
- To whom the proper noun ‘Ali’ refers
- What are the files to be printed
Execute the command
lpr /ali/stuff.init
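The last step, turning the resolved meaning into the command `lpr /ali/stuff.init`, might be sketched as follows. The `USERS` and `FILES` tables are hypothetical stand-ins for what discourse and pragmatic analysis would look up (who ‘Ali’ is, which files exist); the frame keys and the extra file `notes.txt` are also assumptions for illustration.

```python
# Hypothetical stand-ins for knowledge the later stages would resolve.
USERS = {"Ali": "/ali"}                        # proper noun -> directory (assumed)
FILES = {"/ali": ["stuff.init", "notes.txt"]}  # directory contents (assumed)


def to_commands(frame):
    """Turn a resolved 'print' frame into lpr commands, one per matching file."""
    if frame["action"] != "print":
        raise ValueError("only the 'print' action is covered by this sketch")
    home = USERS[frame["owner"]]
    matches = [name for name in FILES[home] if name.endswith(frame["type"])]
    return [f"lpr {home}/{name}" for name in matches]


# Frame read off the semantic net: print the file of type .init owned by Ali.
frame = {"action": "print", "owner": "Ali", "type": ".init"}
print(to_commands(frame))
```

The commands are only built as strings here; nothing is executed.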
-
4. Stages of NLP (Examples),
[Figure: processing pipeline from the user’s surface form through Morphological Analysis (stems), Syntactic Analysis (parse tree), Semantic Analysis (internal representation), Discourse Analysis (resolve references) and Pragmatic Analysis (perform action), with the lexicon as a resource]
-
5. Ambiguity
Time flies like an arrow.
More than one meaning for the same sentence:
- Time passes along in the same manner as an arrow gliding through space.
- I order you to take timing measurements on flies, in the same manner as you would time an arrow.
- Fruit flies like to feast on a banana; in contrast, the species of flies known as “time flies” like an arrow.
(other different meanings)
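The three readings of “Time flies like an arrow” can be reproduced by counting parse trees under a small ambiguous grammar with a CKY-style chart. The grammar and lexicon below are assumptions chosen so that “time” and “flies” can each be a noun or a verb and “like” a verb or a preposition; a different grammar would give a different count.

```python
from collections import defaultdict

# Assumed toy lexicon: several words have more than one category.
LEXICON = {
    "time": ["N", "V"],
    "flies": ["N", "V"],
    "like": ["V", "P"],
    "an": ["DET"],
    "arrow": ["N"],
}
# Assumed binary rules (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("NP", "DET", "N"),
    ("NP", "N", "N"),    # compound noun: "time flies"
    ("VP", "V", "NP"),
    ("VP", "V", "PP"),
    ("VP", "VP", "PP"),
    ("PP", "P", "NP"),
]
# Assumed unary rules: a bare noun is an NP; a bare VP is an imperative S.
UNARY = [("NP", "N"), ("S", "VP")]


def count_parses(sentence):
    """Count the distinct parse trees rooted in S for the whole sentence."""
    words = sentence.lower().split()
    n = len(words)
    chart = defaultdict(lambda: defaultdict(int))  # (i, j) -> category -> count
    for i, word in enumerate(words):
        for tag in LEXICON[word]:
            chart[i, i + 1][tag] += 1
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = chart[i, j]
            for k in range(i + 1, j):          # every split point
                left, right = chart[i, k], chart[k, j]
                for parent, a, b in BINARY:
                    cell[parent] += left[a] * right[b]
            for parent, child in UNARY:        # no unary cycles, one pass is enough
                cell[parent] += cell[child]
    return chart[0, n]["S"]


print(count_parses("time flies like an arrow"))
```

Under these rules the three trees correspond to: “time” as subject with “flies” as verb, “time flies” as a compound-noun subject with “like” as verb, and the imperative with “time” as verb.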
-
5. Ambiguity (Cont…)
The chicken is ready to eat. (Is the chicken about to eat, or about to be eaten?)
-
6. Lexicon
A lexicon is a vocabulary data bank that contains the words of a language and their linguistic information.
There are many online lexicons.
WordNet is a lexical database of English vocabulary words.
-
7. Simple Applications
Word counters (wc in UNIX)
Spell Checkers,
grammar checkers
Predictive Text on mobile handsets
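A word counter is the simplest of these applications. The sketch below imitates the line, word and character counts that `wc` reports; it counts characters rather than bytes, and the function name is just for illustration.

```python
def wc(text):
    """Return (lines, words, characters), roughly like the UNIX wc tool."""
    lines = text.count("\n")   # wc counts newline characters
    words = len(text.split())  # words are whitespace-separated
    chars = len(text)          # characters, not bytes
    return lines, words, chars


print(wc("I ate apple\nI ate sky\n"))  # (2, 6, 22)
```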
-
7.1 Bigger Applications
Intelligent systems (example: Siri)
NLU interfaces to databases (example: OpenNLP to convert English sentences to SQL queries)
Computer aided instruction
Information retrieval
Intelligent Web searching (example: Google search engine)
Data mining (example: NLP text mining)
Machine translation
Speech recognition
Natural language generation
Question answering
-
7.2 Spoken Dialogue System
[Figure: spoken dialogue system components around the User: Speech Recognition, Semantic Interpretation, Discourse Interpretation, Dialogue Management, Response Generation, Speech Synthesis]
-
7.2 Spoken Dialogue System (Cont…)
Signal Processing: Convert the audio wave into a sequence of feature vectors.
Speech Recognition: Decode the sequence of feature vectors into a sequence of words.
Semantic Interpretation: Determine the meaning of the words and the relations between them.
Discourse Interpretation: Understand what the user intends by interpreting utterances across sentences.
Dialogue Management: Determine system goals in response to user utterances, based on user intention (medium e.g., dialogue box).
Response Generation: Use discourse knowledge to produce a relevant response to the user’s request.
Speech Synthesis: Generate synthetic speech for the response.
-
7.2 Spoken Dialogue System (Cont…)
“Levels of Sophistication in a Dialogue System”
Touch-tone replacement:
System Prompt: "For checking information, press or say one." Caller Response: "One."
Directed dialogue:
System Prompt: "Would you like checking account information or rate information?" Caller Response: "Checking", or "checking account," or "rates."
Natural language:
System Prompt: "What transaction would you like to perform?" Caller Response: "Transfer Rs. 500 from checking to savings."
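The difference between the directed-dialogue level and the natural-language level can be illustrated with a matcher that only accepts a fixed set of phrases. The phrase table and function name below are hypothetical; they are built from the caller responses listed above.

```python
# Accepted phrases per choice, taken from the directed-dialogue example.
CHOICES = {
    "checking": ["checking", "checking account"],
    "rates": ["rates", "rate information"],
}


def directed_dialogue(response):
    """Map a caller response to one of the offered choices, or None."""
    text = response.lower().strip(' .,"')
    for choice, phrases in CHOICES.items():
        if text in phrases:
            return choice
    return None


print(directed_dialogue("Checking"))           # checking
print(directed_dialogue("checking account,"))  # checking
print(directed_dialogue("Transfer Rs. 500 from checking to savings."))  # None
```

The last call returns `None`: a free-form request falls outside the fixed phrase list, which is why the natural-language level needs the fuller analysis described in the earlier stages.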
-
7.3 Language Technology in NLP
mostly solved
- Spam detection: “Let’s go to Agra!” ✓ / “Buy 2DF&EC …” ✗
- Part-of-speech (POS) tagging: Colorless (ADJ) green (ADJ) ideas (NOUN) sleep (VERB) furiously (ADV).
- Named entity recognition (NER): Einstein (PERSON) met with UN (ORG) officials in Princeton (LOC)
making good progress
- Sentiment analysis: “Best roast chicken in San Francisco!” / “The waiter ignored us for 20 minutes.”
- Coreference resolution: Carter told Mubarak he shouldn’t run again.
- Word sense disambiguation (WSD): I need new batteries for my mouse.
- Parsing: I can see Alcatraz from the window!
- Machine translation (MT): 第13届上海国际电影节开幕… → The 13th Shanghai International Film Festival…
- Information extraction (IE): “You’re invited to our dinner party, Friday May 27 at 8:30” → Party, May 27, add
still really hard
- Question answering (QA): Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness?
- Paraphrase: “XYZ acquired ABC yesterday” / “ABC has been taken over by XYZ”
- Summarization: “The Dow Jones is up”, “The S&P500 jumped”, “Housing prices rose” → “Economy is good”
- Dialog: “Where is Citizen Kane playing in SF?” / “Castro Theatre at 7:30. Do you want a ticket?”
-
7.3 Language Technology in NLP (Cont…)
- non-standard English: Great job @justinbieber! Were SOO PROUD of what youve accomplished! U taught us 2 #neversaynever & you yourself should never give up either♥
- segmentation issues: the New York-New Haven Railroad (two possible segmentations of the same string)
- idioms: dark horse, get cold feet, lose face, throw in the towel
- neologisms: unfriend, Retweet, bromance
- world knowledge: Mary and Sue are sisters. / Mary and Sue are mothers.
- tricky entity names: Where is A Bug’s Life playing … / Let It Be was recorded … / … a mutation on the for gene …
But that’s what makes it fun!
-
7.3 Language Technology in NLP (Cont…)
Neologism
A relatively recent or isolated term, word, or phrase that may be in the process of entering common use, but that has not yet been fully accepted into mainstream language.
World Knowledge
The non-linguistic information that helps a reader or listener interpret the meanings of words and sentences. Also called extra-linguistic knowledge.
• Non-linguistic: not consisting of or related to language. Examples: whistles, yells, laughs, and cries.
Idiom
A phrase or an expression that has a figurative, or sometimes literal, meaning.
Example: “Once in a blue moon” means “happens very rarely”.
Non-standard English
Not conforming in pronunciation, grammar, vocabulary, etc., to the usage characteristic of and considered acceptable by most educated native speakers.
Example: Great job @justinbieber! Were SOO PROUD of what you’ve accomplished! U taught us 2 #neversaynever & you yourself should never give up either
-
8. The State of Art
Systems that use speech and language recognition:
1) Amtrak, United Airlines (users interact with conversational agents)
2) Car makers (automatic speech recognition and text-to-speech allow drivers to control vehicle navigation by voice), e.g., Tesla cars.
3) Video search companies (search services with speech recognition)
4) Google (provides cross-language information retrieval: translates the query and finds the most relevant pages)
5) Pearson and ETS (automated systems to analyze students’ essays)
6) Interactive virtual agents (serve as tutors for children learning to read)
7) Text analysis companies (automated measurement of user opinion)
-
9. Some Brief History
Foundational Insights: 1940s & 1950s
The Two Camps: 1957-1970
Four Paradigms: 1970-1983
Empiricism & Finite State Models Redux: 1983-1993
The Field Comes Together: 1994-1999
The Rise of Machine Learning: 2000-2008
On Multiple Discoveries
A Final Brief Note on Psychology
-
9.1 Foundational Insights: 1940s & 1950s
Two foundational paradigms
• Automaton (1950s)
• Probabilistic model (information-theoretic model)
The automaton is based on Turing’s model (1936) of algorithmic computation.
Turing’s work led to
• the McCulloch-Pitts neuron (1943)
• Kleene’s work (1951, 1956) on finite automata and regular expressions
Probabilistic models were applied by Shannon (1948).
• Drawing on Shannon’s work, Chomsky (1956) considered finite-state machines as a way to characterize a grammar.
These models led to the field of formal language theory.
-
9.2 The Two Camps: 1957-1970
Speech and language processing split into two paradigms:
• Symbolic: based on formal language theory and generative syntax, and on artificial intelligence work on reasoning and logic.
• Stochastic: having a random probability distribution or pattern that may be analyzed statistically but may not be predicted precisely.
Transformations and Discourse Analysis Project (TDAP) parsing system
• Implemented between June 1958 and July 1959.
The stochastic paradigm took hold mainly in
• Statistics
• Electrical Engineering
-
9.3 Four Paradigms: 1970-1983
Stochastic
(having a random probability distribution or pattern that may be analyzed statistically but may not be predicted precisely)
used for the development of speech recognition algorithms.
Logic-based
began with work on Q-systems and metamorphosis grammars.
Natural language understanding
began with Winograd’s SHRDLU system.
Discourse modeling
focused on four key areas.
-
9.4 Empiricism & Finite State Models Redux: 1983-1993
Return of two classes of models that had lost popularity:
• Finite state models
• Empiricism: the rise of probabilistic models throughout speech and language processing
Probabilistic, data-driven approaches spread from speech into
• part-of-speech tagging
• parsing
• attachment ambiguities
• semantics
This period saw considerable work on Natural Language Processing.
-
9.5 The Field Comes Together: 1994-1999
• By the last five years of the 1990s, the field was undergoing major changes.
First, probabilistic and data-driven models became standard.
Second, increased speed and memory of computers allowed commercial exploitation of subareas of speech and language processing.
Subareas include
• Speech recognition
• Spelling and grammar correction
Commercial Exploitation
A term that includes all activities used to benefit commercially from one’s property.
Example: making property, selling it, offering it for sale, or licensing its appropriation or use.
-
9.6 The Rise of Machine Learning: 2000-2008
Three synergistic trends:
• Large amounts of spoken and written material became widely available.
• The increased focus on learning led to a more serious interplay with the statistical machine learning community.
• The widespread availability of high-performance computing systems facilitated the training and deployment of systems that could not have been imagined a decade earlier.
Finally, largely unsupervised statistical approaches began to receive renewed attention.
-
10. Explore: Topics based Research Areas
(1) Named Entity Recognition:
[Figure: proposed architecture of the model]
[Table: list of named entity types with the kinds of entities they belong to]