Introduction to Natural Language Processing
Heshaam Faili
Session Agenda
- Artificial Intelligence
- Natural Language Processing
- History of NLP
- Statistical NLP
- Applications of NLP
AI Concepts and Definitions
- AI encompasses many definitions
- AI involves studying human thought processes
- AI involves representing thought processes on machines
- Artificial intelligence: behavior by a machine that, if performed by a human being, would be considered intelligent
- “…study of how to make computers do things at which, at the moment, people are better” (Rich and Knight [1991])
- Theory of how the human mind works (Mark Fox)
AI Objectives
- Make machines smarter (primary goal)
- Understand what intelligence is
- Make machines more useful (practical purpose)
(Winston and Prendergast [1984])
Turing Test for Intelligence
A computer can be considered smart only when a human interviewer, “conversing” with both an unseen human being and an unseen computer, cannot determine which is which.
Major AI Areas
- Expert Systems
- Natural Language Processing
- Speech Understanding
- Robotics and Sensory Systems
- Computer Vision and Scene Recognition
- Intelligent Computer-Aided Instruction
- Neural Computing
- Fuzzy Logic
- Genetic Algorithms
- Intelligent Software Agents
What is NLP?
- Natural language is one of the fundamental aspects of human behavior.
- It is one of the ultimate aims of human-computer communication.
- Provide easy interaction with computers.
- Make computers understand texts.
Major Disciplines Studying Language
Discipline: Typical Problem
- Linguists: How do words form phrases and sentences?
- Psycholinguists: How do people identify the structure of sentences?
- Philosophers: What is meaning, and how do words and sentences acquire it?
- Natural Language Processing: How is the structure of sentences identified?
Interaction Level
The level at which computer and human interact.
NL is used to bring the interaction level closer to the human.
Figure: interaction level between Human and Computer (labels: command-line, NL UI, graphical UI)
Other Titles
The most common titles, apart from Natural Language Processing, include:
- Automatic Language Processing
- Computational Linguistics
- Natural Language Understanding
Computational Linguistics
This is the application of computers to the scientific study of human language.
This definition suggests that there are connections with Cognitive Science, that is to say, the study of how humans produce and understand language.
Computational Linguistics
Historically, Computational Linguistics has been associated with work in Generative Linguistics and formerly included the study of formal languages (e.g. finite-state automata) and programming languages.
Natural Language Understanding
This title distinguishes a particular approach to Natural Language Processing. People who use it tend to place much emphasis on the meaning of the language being processed, in particular on getting the computer to respond to the input in an apparently intelligent fashion.
Natural Language Understanding
At one time, those who belonged to the Natural Language Understanding camp avoided any syntactic processing, but textbooks that bear this title now include significant sections on syntactic processing, which suggests that the edge of the title has been rather blunted (see, for instance, Allen (1987), part 1).
Motivation for NLP
- Understand language analysis & generation
- Communication
- Language is a window to the mind
- Data is in linguistic form
- Data can be structured (table form), semi-structured (XML form), or unstructured (sentence form)
Language Processing
- Level 1 – Speech sound (Phonetics & Phonology)
- Level 2 – Words & their forms (Morphology, Lexicon)
- Level 3 – Structure of sentences (Syntax, Parsing)
- Level 4 – Meaning of sentences (Semantics)
- Level 5 – Meaning in context & for a purpose (Pragmatics)
- Level 6 – Connected sentence processing in a larger body of text (Discourse)
Phonetics
Concerns processing or identifying:
- Languages
- Accents
- Pauses
- Word boundaries
- Amplitude, tone
Also includes background noise elimination.
E.g. “I got up late” and “I got a plate” sound similar.
Lexicon
- Deals with the vocabulary of words
- Uses dictionaries, WordNet, etc.
- Dictionaries have various levels of richness, e.g. tense, senses, usage, etc.
- Resources – Princeton WordNet, EuroWordNet, …
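To make the idea of dictionary richness concrete, here is a minimal sketch of a lexicon lookup, assuming a tiny hand-built lexicon. The `LexEntry` type, the `LEXICON` entries, and the `lookup` helper are hypothetical illustrations, not the WordNet data model or API. Each surface form maps to one or more entries carrying a lemma, a part of speech, and features such as tense.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexEntry:
    lemma: str
    pos: str
    features: tuple[str, ...]  # e.g. tense or number


# Hypothetical toy lexicon: surface form -> all possible entries.
LEXICON: dict[str, list[LexEntry]] = {
    "ate": [LexEntry("eat", "VERB", ("past",))],
    "eats": [LexEntry("eat", "VERB", ("present", "3sg"))],
    "saw": [LexEntry("see", "VERB", ("past",)),
            LexEntry("saw", "NOUN", ("singular",))],
    "spoons": [LexEntry("spoon", "NOUN", ("plural",))],
}


def lookup(word: str) -> list[LexEntry]:
    """Return all lexicon entries for a word form (case-insensitive)."""
    return LEXICON.get(word.lower(), [])


for entry in lookup("Saw"):
    print(entry.lemma, entry.pos, entry.features)
# see VERB ('past',)
# saw NOUN ('singular',)
```

A single form such as "saw" yields several entries, and choosing among them is left to the later levels (syntax, semantics).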
Syntax
Involves parsing and understanding the grammatical structure of sentences.
Challenges:
- Ungrammatical sentences
- Word order: fixed or free
- Word attachment and scope, e.g. “Old men and women were rescued.” Only old men, or old women too?
- Prepositional phrase attachment, e.g. “I saw the boy with a telescope.” Is “with a telescope” attached to “the boy” or to “saw”?
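Counting parse trees makes the prepositional-phrase ambiguity visible. The sketch below is a CKY chart parser over a small hypothetical grammar in Chomsky normal form. The rules and the lexicon are assumptions made for this example. Each chart cell records how many distinct trees derive each nonterminal over a given span.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
BINARY = [  # (parent, left child, right child)
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("PP", "P", "NP"),
]
LEXICAL = {
    "I": {"NP"}, "saw": {"V"}, "the": {"Det"}, "a": {"Det"},
    "boy": {"N"}, "telescope": {"N"}, "with": {"P"},
}


def count_parses(words: list[str], start: str = "S") -> int:
    """CKY chart where chart[i][j][X] = number of trees for X over words[i:j]."""
    n = len(words)
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for tag in LEXICAL.get(w, ()):
            chart[i][i + 1][tag] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY:
                    lc = chart[i][k].get(left, 0)
                    rc = chart[k][j].get(right, 0)
                    if lc and rc:
                        chart[i][j][parent] += lc * rc
    return chart[0][n].get(start, 0)


print(count_parses("I saw the boy with a telescope".split()))  # 2
```

The two trees correspond to the two readings: “with a telescope” attached to the verb phrase (VP -> VP PP) or to the noun phrase “the boy” (NP -> NP PP).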
Semantics
Concerned with “meaning”; creates a structure for a sentence.
The main verb is associated with its agent, object, instrument, etc.
E.g. I ate rice with a spoon.
eat: agent = I, object = rice, instrument = spoon
– Challenges
• Representation
• Domain (straddles into pragmatics)
• To construct meaning from individual meanings
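Here is a minimal sketch of the case-frame structure described above. It assumes the sentence follows the fixed pattern AGENT VERB OBJECT [with INSTRUMENT]. The `Frame` class, the `VERB_LEMMAS` table, and the `simple_frame` helper are hypothetical. This is a naive pattern matcher, not a semantic parser.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Case frame: a predicate (main verb) plus its role fillers."""
    predicate: str
    roles: dict[str, str] = field(default_factory=dict)


# Hypothetical mapping from past-tense forms to verb lemmas.
VERB_LEMMAS = {"ate": "eat", "cut": "cut"}


def simple_frame(sentence: str) -> Frame:
    """Naive extractor assuming the pattern: AGENT VERB OBJECT [with INSTRUMENT]."""
    words = sentence.rstrip(".").split()
    frame = Frame(VERB_LEMMAS[words[1]], {"agent": words[0], "object": words[2]})
    if "with" in words:
        frame.roles["instrument"] = words[-1]  # assume the last word is the instrument
    return frame


print(simple_frame("I ate rice with a spoon."))
# Frame(predicate='eat', roles={'agent': 'I', 'object': 'rice', 'instrument': 'spoon'})
```

Any change in word order (e.g. “With a spoon, I ate rice.”) breaks this matcher. That fragility shows why representation and composing meaning from individual meanings count as challenges.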
Pragmatics
- Use of the sentence in a situation
- Understanding the user's intention, e.g. the response to “Is that water?” differs at a dining table and in a chemistry lab
Applications: search engines tuned to user preferences
Discourse
Processing of connected text.
Co-reference: two expressions in the utterance both refer to the same thing. Examples:
- Pronoun to noun binding: “John is sleeping. He is lazy.” (He refers to John)
- In an article: George Bush, Mr. Bush, the President of the United States, the President
- General to specific: “Ferrari launched a new model. This car is much better than the previous one.” (car refers to the new model launched)
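A recency heuristic can approximate pronoun-to-noun binding: link each pronoun to the most recently mentioned name. The sketch below assumes the candidate names are already known (for example, from a named entity recognizer) and uses a hypothetical `PRONOUNS` list. It ignores gender, number, and syntax, so it serves only as a baseline.

```python
# Hypothetical pronoun list; a real system would also use gender/number agreement.
PRONOUNS = {"he", "she", "it", "him", "her", "they", "them"}


def resolve_pronouns(tokens: list[str], names: set[str]) -> list[tuple[str, str]]:
    """Bind each pronoun to the most recent preceding name (recency heuristic)."""
    links = []
    last_name = None
    for tok in tokens:
        if tok in names:
            last_name = tok
        elif tok.lower() in PRONOUNS and last_name is not None:
            links.append((tok, last_name))
    return links


tokens = "John is sleeping . He is lazy .".split()
print(resolve_pronouns(tokens, {"John"}))  # [('He', 'John')]
```

On “John met Bill . He was tired .” the heuristic links He to Bill, although the text does not settle which of the two is meant.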
NLP History (1)
The first recognizable NLP application was a dictionary look-up system developed at Birkbeck College, London in 1948.
NLP History (2)
NLP from 1966-1980
- Augmented Transition Networks
- Case Grammar
- Semantic representations: Conceptual Dependency, Semantic networks, Procedural semantics
NLP History (3)
The key systems were:
- LUNAR: a database interface system that used ATNs and Woods' Procedural Semantics.
- LIFER/LADDER: one of the most impressive NLP systems. It was designed as a natural language interface to a database of information about US Navy ships.
NLP from 1980-1990
- Grammar Formalisms
NLP from 1990-2000
- Multilinguality and Multimodality
NLP from 2000-now
- Statistical Approaches and Practical Uses
Why is NLP Hard?
Basics of statistical NLP
Consider NLP problems as sequence labeling tasks
Amenable to machine learning (training and generalization)
In classical NLP – rules are obtained from linguists
In statistical NLP – probabilities are learnt from data
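In the statistical setting, probabilities come from counting annotated data. Here is a minimal sketch that assumes a tiny hypothetical tagged corpus. It computes the maximum-likelihood estimate P(tag | word) = count(word, tag) / count(word). The corpus and the `estimate_tag_probs` name are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Tiny hand-made tagged corpus (hypothetical), as (word, tag) pairs.
CORPUS = [
    [("the", "DT"), ("dog", "NN"), ("barked", "VBD")],
    [("they", "PRP"), ("dog", "VB"), ("him", "PRP")],
    [("a", "DT"), ("dog", "NN"), ("slept", "VBD")],
]


def estimate_tag_probs(corpus):
    """Maximum-likelihood estimate P(tag | word) = count(word, tag) / count(word)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {
        word: {tag: c / sum(tags.values()) for tag, c in tags.items()}
        for word, tags in counts.items()
    }


probs = estimate_tag_probs(CORPUS)
print({t: round(p, 2) for t, p in probs["dog"].items()})  # {'NN': 0.67, 'VB': 0.33}
```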
Data-Driven Approach
The issues in this approach are:
- Corpus collection (coherent pieces of text)
- Corpus cleaning: spelling, grammar, removal of strange characters
- Annotation: named entity recognition, POS detection, parsing, meaning
Again: the biggest challenge is ambiguity.
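The Python standard library can handle the "strange characters" part of corpus cleaning. This sketch applies Unicode NFKC normalization, removes control and format characters, and collapses whitespace. The `clean_text` name is hypothetical. The sketch does not attempt spelling or grammar correction.

```python
import re
import unicodedata


def clean_text(raw: str) -> str:
    """Normalize Unicode, drop control/format characters, and collapse whitespace."""
    # NFKC folds compatibility characters, e.g. no-break space -> space, "ﬁ" -> "fi".
    text = unicodedata.normalize("NFKC", raw)
    # Keep ordinary whitespace; drop other characters in Unicode category "C*"
    # (control, format, unassigned, private use).
    text = "".join(
        ch for ch in text
        if ch.isspace() or not unicodedata.category(ch).startswith("C")
    )
    return re.sub(r"\s+", " ", text).strip()


print(clean_text("  The\u00a0dog\u200b  barked.\x07\n"))  # The dog barked.
```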
Sequence Labeling Tasks
In order of increasing complexity:
- Words: POS tagging, Named Entity Recognition (NER), sense disambiguation
- Phrases: chunking
- Sentences: bracketing
- Paragraphs: co-referencing
Examples of Levels
Example sentence – The dog Bill went near cat Jack. It bit it
POS tagging – The/DT dog/NN Bill/NNP went/VBD near/PP cat/NN Jack/NNP It/PN bit/VBD it/PN
NER – <person-name>Bill</person-name> <person-name>Jack</person-name>
Sense – using WordNet, a synset-id is assigned to each sense, e.g. {dog, animal} → synset-id
Chunking (B = Beginning, I = Intermediate, E = End)
(The dog Bill) went near (the cat Jack). It bit it
The/B dog/I Bill/E went/BIE near/BIE the/B cat/I Jack/E It/BIE bit/BIE it/BIE
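The B/I/E labels on this slide can be generated mechanically from chunk boundaries. Here is a minimal sketch that assumes the chunks arrive as lists of tokens. It follows the slide's convention of labeling a one-word chunk "BIE". The `bie_labels` helper is hypothetical.

```python
def bie_labels(chunks: list[list[str]]) -> list[tuple[str, str]]:
    """Label each token by its position in its chunk: B(eginning),
    I(ntermediate), E(nd); a one-word chunk is all three, written "BIE"."""
    labels = []
    for chunk in chunks:
        if len(chunk) == 1:
            labels.append((chunk[0], "BIE"))
            continue
        labels.append((chunk[0], "B"))
        labels.extend((tok, "I") for tok in chunk[1:-1])
        labels.append((chunk[-1], "E"))
    return labels


sentence = [["The", "dog", "Bill"], ["went"], ["near"], ["the", "cat", "Jack"],
            ["It"], ["bit"], ["it"]]
print(" ".join(f"{w}/{t}" for w, t in bie_labels(sentence)))
# The/B dog/I Bill/E went/BIE near/BIE the/B cat/I Jack/E It/BIE bit/BIE it/BIE
```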
Higher Order Structures
Bracketing –
[S [NP] [VP [V [PP [P [NP]]]]]]
[S [NP] [VP [V [NP]]]]
Co-referencing –
The(1) dog(2) Bill(3) went(4) near(5) the(6) cat(7) Jack(8). It(9) bit(10) it(11)
References – 2<-9, 7<-11, 2<-3, 7<-8
A sequence labeling task is a classification task
• POS: word->POS, e.g. cat{NN, VBD ...}
• NER: word->Name, e.g. cat{person, place}
• Sense: word->sense-id{001 ... N}
• Chunking: word->{B, I, E}
• Bracketing: sentence->{has_tree, no_tree}
Task Classification
Learning Algorithm
- Knowledge based: rules, decision trees, decision lists
- Statistical: graphical models (HMM), neural networks, Support Vector Machines (SVM)
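To illustrate the statistical branch, here is a minimal Viterbi decoder for a first-order HMM tagger. All probabilities are made-up toy values, not estimates from any corpus. The three-tag set is also an assumption. For each position and tag, the decoder keeps the best path probability and a backpointer.

```python
# Toy HMM parameters (hypothetical numbers, not learned from any corpus).
STATES = ("DT", "NN", "VB")
START = {"DT": 0.6, "NN": 0.3, "VB": 0.1}
TRANS = {
    "DT": {"NN": 0.9, "VB": 0.1},
    "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.5, "NN": 0.4, "VB": 0.1},
}
EMIT = {
    "DT": {"the": 0.7, "a": 0.3},
    "NN": {"dog": 0.5, "cat": 0.4, "bark": 0.1},
    "VB": {"dog": 0.2, "bark": 0.5, "bit": 0.3},
}


def viterbi(words, states=STATES, start_p=START, trans_p=TRANS, emit_p=EMIT):
    """Return the most probable tag sequence under a first-order HMM."""
    # best[t][s] = (prob. of best path ending in state s at position t, backpointer)
    best = [{s: (start_p.get(s, 0.0) * emit_p[s].get(words[0], 0.0), None)
             for s in states}]
    for t in range(1, len(words)):
        column = {}
        for s in states:
            prob, prev = max(
                (best[t - 1][r][0] * trans_p[r].get(s, 0.0), r) for r in states
            )
            column[s] = (prob * emit_p[s].get(words[t], 0.0), prev)
        best.append(column)
    # Follow backpointers from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(words) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return path[::-1]


print(viterbi("the dog bark".split()))  # ['DT', 'NN', 'VB']
```

For longer sentences, log probabilities would avoid numeric underflow. This sketch keeps plain products for readability.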
Applications
- Machine Translation: different strategies
  Systran: www.Systransoft.com
  Google: Translate.google.com
- Question Answering: MIT Q&A system (START): http://start.csail.mit.edu/
- Summarization
- Information Extraction
- Spell Checking: Microsoft Spell Checker
- Call centre
- MT for SMS
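One common way to suggest spelling corrections is to rank dictionary words by edit distance. Here is a minimal sketch that assumes a tiny hypothetical vocabulary. It computes Levenshtein distance (insertions, deletions, substitutions) row by row and returns candidates within two edits, closest first. It does not describe how Microsoft's checker works internally.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


# Hypothetical vocabulary; a real checker would use a large word list.
VOCABULARY = ["language", "natural", "processing", "translation", "summary"]


def suggest(word: str, max_distance: int = 2) -> list[str]:
    """Return vocabulary words within max_distance edits, closest first."""
    scored = sorted((edit_distance(word, w), w) for w in VOCABULARY)
    return [w for d, w in scored if d <= max_distance]


print(suggest("langauge"))  # ['language']
```

A transposition such as "langauge" costs two edits under plain Levenshtein distance. The Damerau variant counts it as one.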
NLP Laboratory
- The first aim is to establish a virtual center for NLP-related research
- Defining practical applications, especially for Persian: POS tagger, spell checker, n-gram model, machine translation, NER, document classification, search engine, summarization
- Defining several research projects
- Sharing resources and experiences
- Laying a foundation for an NLP suite, like TINA: MIT NLP-SUITE
Contact me with any request in the NLP domain.