Introduction to Natural Language Processing. Heshaam Faili, hfaili@ece.ut.ac.ir

Introduction to Natural Language Processing

Heshaam Faili
hfaili@ece.ut.ac.ir

Session Agenda

- Artificial Intelligence
- Natural Language Processing
- History of NLP
- Statistical NLP
- Applications of NLP

AI Concepts and Definitions

- Encompasses many definitions
- AI involves studying human thought processes
- Representing thought processes on machines

Artificial Intelligence

- Behavior by a machine that, if performed by a human being, would be considered intelligent
- “…study of how to make computers do things at which, at the moment, people are better” (Rich and Knight [1991])
- Theory of how the human mind works (Mark Fox)

AI Objectives

- Make machines smarter (primary goal)
- Understand what intelligence is
- Make machines more useful (practical purpose)

(Winston and Prendergast [1984])

Turing Test for Intelligence

A computer can be considered smart only when a human interviewer, “conversing” with both an unseen human being and an unseen computer, cannot determine which is which.

Major AI Areas

- Expert Systems
- Natural Language Processing
- Speech Understanding
- Robotics and Sensory Systems
- Computer Vision and Scene Recognition
- Intelligent Computer-Aided Instruction
- Neural Computing
- Fuzzy Logic
- Genetic Algorithms
- Intelligent Software Agents

What is NLP?

- Natural language is one of the fundamental aspects of human behavior.
- It is one of the ultimate aims of human-computer communication.
- It provides easy interaction with computers.
- It makes computers understand texts.

Major Disciplines Studying Language

Discipline: Typical Problem
- Linguists: How do words form phrases and sentences?
- Psycholinguists: How do people identify the structure of sentences?
- Philosophers: What is meaning, and how do words and sentences acquire it?
- Natural Language Processing: How is the structure of sentences identified?

Interaction Level

- The level at which computer and human interact.
- Natural language is used to bring the interaction level closer to the human.

[Figure: the interaction level on a scale between Human and Computer, with Command-line, NL UI, and Graphical UI marked]

Other Titles

The most common titles, apart from Natural Language Processing, include:

- Automatic Language Processing
- Computational Linguistics
- Natural Language Understanding

Computational Linguistics

This is the application of computers to the scientific study of human language.

This definition suggests that there are connections with Cognitive Science, that is to say, the study of how humans produce and understand language.

Computational Linguistics

Historically, Computational Linguistics has been associated with work in Generative Linguistics and formerly included the study of formal languages (e.g. finite state automata) and programming languages.

Natural Language Understanding

- This title distinguishes a particular approach to Natural Language Processing.
- The people using this title tend to lay much emphasis on the meaning of the language being processed, in particular getting the computer to respond to the input in an apparently intelligent fashion.

Natural Language Understanding

At one time, those who belonged to the Natural Language Understanding camp avoided the use of any syntactic processing, but textbooks that bear this title now include significant sections on syntactic processing, which suggests that the edge of the title has been rather blunted. (For instance, see Allen (1987; part 1).)

Motivation for NLP

- Understand language analysis & generation
- Communication
- Language is a window to the mind
- Data is in linguistic form
- Data can be structured (table form), semi-structured (XML form), or unstructured (sentence form)

Language Processing

- Level 1 – Speech sound (Phonetics & Phonology)
- Level 2 – Words & their forms (Morphology, Lexicon)
- Level 3 – Structure of sentences (Syntax, Parsing)
- Level 4 – Meaning of sentences (Semantics)
- Level 5 – Meaning in context & for a purpose (Pragmatics)
- Level 6 – Connected sentence processing in a larger body of text (Discourse)

Phonetics

- Concerns processing or identifying
  - Languages
  - Accents
  - Pauses
  - Word boundaries
  - Amplitude, Tone
- Also includes background noise elimination
- E.g. “I got up late” and “I got a plate” sound similar

Lexicon

- Deals with vocabulary of words
- Uses Dictionary, Wordnet etc.
- Various levels of richness in dictionary, e.g. tense, senses, usage, etc.
- Resources – Princeton, Euro-wordnet, …

Syntax

- Involves parsing and understanding structure of grammar
- Challenges
  - Ungrammatical sentences
  - Word order – fixed, free
  - Word attachment and scope, e.g. "Old men and women were rescued." Only old men, or old women too?
  - Prepositional phrase attachment, e.g. "I saw the boy with a telescope." Does "with a telescope" attach to "saw" or to "the boy"?
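
The prepositional phrase attachment ambiguity can be made concrete by writing out both candidate structures. The following is a minimal sketch, assuming trees are stored as nested tuples of the form (label, child, ...); the node labels and the helper name `leaves` are illustrative choices, not part of the slides.

```python
# Two candidate parses of "I saw the boy with a telescope".
# Trees are nested tuples: (label, child, child, ...); a leaf is a plain string.
# The representation and the helper name `leaves` are illustrative assumptions.

# Reading 1: the PP attaches to the verb phrase (the seeing was done with a telescope).
verb_attachment = (
    "S",
    ("NP", "I"),
    ("VP",
     ("V", "saw"),
     ("NP", "the", "boy"),
     ("PP", ("P", "with"), ("NP", "a", "telescope"))),
)

# Reading 2: the PP attaches to the noun phrase (the boy has the telescope).
noun_attachment = (
    "S",
    ("NP", "I"),
    ("VP",
     ("V", "saw"),
     ("NP", ("NP", "the", "boy"),
      ("PP", ("P", "with"), ("NP", "a", "telescope")))),
)


def leaves(tree):
    """Return the words of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:  # tree[0] is the node label
        words.extend(leaves(child))
    return words


# Both structures cover exactly the same word sequence.
print(" ".join(leaves(verb_attachment)))
print(leaves(verb_attachment) == leaves(noun_attachment))
```

The two trees differ only in where the PP node hangs, which is why the word string alone cannot settle the reading.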

Semantics

- Concerned with “meaning”
- Creates a structure for a sentence
- Main verb associated with agent, object, instrument, etc.
- E.g. I ate rice with spoon.

[Figure: semantic structure of the sentence: the verb "eat" linked to "I" (agent), "rice" (obj), and "spoon" (instrument)]

– Challenges

• Representation

• Domain (straddles into pragmatics)

• To construct meaning from individual meanings
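
One simple way to hold the role structure for "I ate rice with spoon" in a program is a small dictionary. This is only a sketch: the dictionary layout and the function name `describe` are assumptions made for illustration, and real semantic representations are considerably richer.

```python
# A minimal sketch of the role structure for "I ate rice with spoon".
# The dictionary layout and the function name `describe` are illustrative assumptions.

frame = {
    "predicate": "eat",
    "roles": {
        "agent": "I",
        "object": "rice",
        "instrument": "spoon",
    },
}


def describe(frame):
    """List each role filler of the frame as 'role(predicate, filler)'."""
    predicate = frame["predicate"]
    return [f"{role}({predicate}, {filler})" for role, filler in frame["roles"].items()]


for line in describe(frame):
    print(line)
```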

Pragmatics

- Use of the sentence in a situation
- Understanding user's intention
- E.g. "Is that water?": the response differs at a dining table and in a chemistry lab
- Applications: Search engine tuned to user preferences

Discourse

- Processing of connected text
- Co-reference – Two expressions in the utterance both refer to the same thing.
- Examples
  - Pronoun to noun binding – John is sleeping. He is lazy. ("He" refers to John.)
  - In an article – George Bush, Mr. Bush, The President of United States, The President
  - General to specific – Ferrari launched a new model. This car is much better than the previous one. ("This car" refers to the new model launched.)

NLP History (1)

The first recognizable NLP application was a dictionary look-up system developed at Birkbeck College, London in 1948.

NLP History (2)

NLP from 1966-1980
- Augmented Transition Networks
- Case Grammar
- Semantic representations
  - Conceptual Dependency
  - Semantic network
  - Procedural semantics

NLP History (3)

The key systems were:
- LUNAR: A database interface system that used ATNs and Woods' Procedural Semantics.
- LIFER/LADDER: One of the most impressive of NLP systems. It was designed as a natural language interface to a database of information about US Navy ships.

NLP from 1980 - 1990
- Grammar Formalisms

NLP from 1990 - 2000
- Multilinguality and Multimodality

NLP from 2000 - now
- Statistical Approaches and Practical Uses

Why is NLP Hard?

Basics of statistical NLP

Consider NLP problems as sequence labeling tasks

Amenable to machine learning (training and generalization)

In classical NLP – rules are obtained from linguists

In statistical NLP – probabilities are learnt from data
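
To illustrate what "probabilities are learnt from data" can mean in the simplest case, the sketch below estimates P(tag | word) by relative frequency from a tiny tagged corpus. The corpus is invented for illustration, and the function names `tag_probabilities` and `most_likely_tag` are hypothetical; both functions assume the word was seen in the corpus.

```python
from collections import Counter, defaultdict

# Tiny hand-made tagged corpus (an assumption for illustration only).
tagged_corpus = [
    ("the", "DT"), ("dog", "NN"), ("bit", "VBD"), ("the", "DT"), ("cat", "NN"),
    ("a", "DT"), ("bit", "NN"), ("of", "IN"), ("food", "NN"),
    ("it", "PN"), ("bit", "VBD"), ("it", "PN"),
]

# Count how often each word occurs with each tag.
counts = defaultdict(Counter)
for word, tag in tagged_corpus:
    counts[word][tag] += 1


def tag_probabilities(word):
    """Relative-frequency estimate of P(tag | word); assumes the word was seen."""
    total = sum(counts[word].values())
    return {tag: n / total for tag, n in counts[word].items()}


def most_likely_tag(word):
    """Pick the tag with the highest estimated probability; assumes the word was seen."""
    return counts[word].most_common(1)[0][0]


print(tag_probabilities("bit"))
print(most_likely_tag("bit"))
```

No linguist wrote a rule saying that "bit" is usually a verb; the preference falls out of the counts, and it would change if the corpus changed.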

Noisy Channel Metaphor

[Figure: noisy channel diagram relating text to a speech signal, with the example sentences "I want food." and "It is cold today."]
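
The noisy channel idea is commonly written as a Bayesian decoding problem. In the sketch below the symbols $T$ (a candidate text) and $S$ (the observed speech signal) are notation assumed here for illustration; they do not appear on the slide.

```latex
\hat{T} = \arg\max_{T} P(T \mid S)
        = \arg\max_{T} \frac{P(S \mid T)\, P(T)}{P(S)}
        = \arg\max_{T} P(S \mid T)\, P(T)
```

Here $P(T)$ plays the role of a language model and $P(S \mid T)$ the role of the channel model; $P(S)$ can be dropped because it does not depend on $T$.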

Data-Driven Approach

The issues in this approach are:
- Corpora collection (coherent piece of text)
- Corpora cleaning – spelling, grammar, strange characters’ removal
- Annotation
  - Named entity recognition
  - POS detection
  - Parsing
  - Meaning

Again: The biggest challenge is Ambiguity.
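
As a small illustration of the "strange characters' removal" part of corpus cleaning, the sketch below drops control and format characters and collapses whitespace using only the Python standard library. The function name `clean_text` and the sample string are assumptions; spelling and grammar correction are not covered.

```python
import re
import unicodedata


def clean_text(text):
    """Drop control/format characters and collapse runs of whitespace."""
    kept = []
    for ch in text:
        # Unicode categories starting with "C" are control, format, unassigned, etc.
        # Newline and tab are kept here so they can be collapsed to a space below.
        if unicodedata.category(ch).startswith("C") and ch not in "\n\t":
            continue
        kept.append(ch)
    return re.sub(r"\s+", " ", "".join(kept)).strip()


# Sample input with a zero-width space, a NUL byte, and irregular whitespace.
raw = "I  want\u200b food.\x00\n\nIt is\tcold today."
print(clean_text(raw))
```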

Sequence Labeling Tasks

In the order of complexity:
- Words – POS tagging, Named Entity Recognition (NER), Sense disambiguation
- Phrases – Chunking
- Sentences – Bracketing
- Paragraphs – Co-referencing

Examples of Levels

- Example Sentence – The dog Bill went near cat Jack. It bit it
- POS Tagging – The/DT dog/NN Bill/NNP went/VBD near/PP cat/NN Jack/NNP. It/PN bit/VBD it/PN
- NER – <person-name>Bill</person-name> <person-name>Jack</person-name>
- Sense – Using Wordnet: {dog, animal} – synset-id; a synset-id is assigned to each sense
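
The POS and NER layers of the example can be held as parallel annotations over the same token list. The sketch below pairs the words with the tags listed above; treating every NNP token as a person name is a deliberate shortcut assumed for this example only.

```python
# The example sentence paired with the tags listed above.
words = ["The", "dog", "Bill", "went", "near", "cat", "Jack", "It", "bit", "it"]
tags = ["DT", "NN", "NNP", "VBD", "PP", "NN", "NNP", "PN", "VBD", "PN"]

tagged = list(zip(words, tags))

# Hypothetical shortcut: treat every proper noun (NNP) as a person name.
# A real NER system would not rely on the POS tag alone.
person_names = [word for word, tag in tagged if tag == "NNP"]

print(" ".join(f"{w}/{t}" for w, t in tagged))
print(" ".join(f"<person-name>{name}</person-name>" for name in person_names))
```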

Chunking

Tags: B (Beginning), I (Intermediate), E (End)

(The dog Bill) went near (the cat Jack)
The/B dog/I Bill/E went/BIE near/BIE the/B cat/I Jack/E

It bit it
It/BIE bit/BIE it/BIE
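
Chunk tags of this kind can be turned back into chunks with a single pass over the sentence. The sketch below assumes that a word tagged BIE forms a one-word chunk on its own and that the tag sequence is well formed; the function name `decode_chunks` is hypothetical.

```python
def decode_chunks(words, tags):
    """Group words into chunks from B (begin), I (intermediate), E (end) tags.

    A word tagged "BIE" is read as a single-word chunk (an assumption about
    the notation). The tag sequence is assumed to be well formed.
    """
    chunks, current = [], []
    for word, tag in zip(words, tags):
        if tag == "BIE":
            chunks.append([word])
        elif tag == "B":
            current = [word]
        elif tag == "I":
            current.append(word)
        elif tag == "E":
            current.append(word)
            chunks.append(current)
            current = []
        else:
            raise ValueError(f"unknown tag: {tag}")
    return chunks


words = ["The", "dog", "Bill", "went", "near", "the", "cat", "Jack"]
tags = ["B", "I", "E", "BIE", "BIE", "B", "I", "E"]
print(decode_chunks(words, tags))
```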

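Parsing

[Figure: parse tree of "the dog Bill went near the cat Jack", in bracketed form:]
[S [NP [DT the] [NP [N dog] [N Bill]]] [VP [V went] [PP [P near] [NP the cat Jack]]]]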

Higher Order Structures

- Bracketing – [S [NP] [VP [V [PP [P [NP]]]]]] [S [NP] [VP [V [NP]]]]
- Co-referencing
  The(1) dog(2) Bill(3) went(4) near(5) the(6) cat(7) Jack(8). It(9) bit(10) it(11)
  References – 2<-9, 7<-11, 2<-3, 7<-8
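
The reference list can be read as a set of links from a mention back to its antecedent, with tokens numbered from 1. A minimal sketch of grouping these links into co-reference clusters follows; the function name `clusters` and the data layout are assumptions for illustration.

```python
# Tokens numbered from 1, as in the example above.
tokens = ["The", "dog", "Bill", "went", "near", "the", "cat", "Jack", "It", "bit", "it"]

# Each pair (antecedent, mention) encodes one link "antecedent <- mention".
links = [(2, 9), (7, 11), (2, 3), (7, 8)]


def clusters(links):
    """Group token positions that share the same antecedent."""
    groups = {}
    for antecedent, mention in links:
        groups.setdefault(antecedent, [antecedent]).append(mention)
    return {head: sorted(members) for head, members in groups.items()}


for head, members in clusters(links).items():
    print(tokens[head - 1], "<-", [tokens[i - 1] for i in members])
```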

Sequence labeling task is a classification task

Task: Classification
- POS: word->POS cat {NN, VBD ...}
- NER: word->Name cat {person, place}
- Sense: word->sense-id {001 ... N}
- Chunking: word->{B, I, E}
- Bracketing: sentence->{has_tree, no_tree}

Learning Algorithm

- Knowledge Based
  - Rules
  - Decision Trees
  - Decision Lists
- Statistical
  - Graphical Models – HMM
  - Neural Networks
  - Support Vector Machines (SVM)
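
As one concrete instance of the statistical family, an HMM tagger can be decoded with the Viterbi algorithm, a standard dynamic-programming technique that the slides do not name. The sketch below is a toy: the states, the probability tables, and the function name `viterbi` are all invented for illustration and are not estimated from any real corpus.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations under an HMM."""
    # best[t][s] = (probability of the best path ending in state s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s].get(observations[0], 0.0), None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prob, prev = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s].get(obs, 0.0), p) for p in states
            )
            column[s] = (prob, prev)
        best.append(column)
    # Trace back from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))


# Toy model (all numbers are made up for illustration).
states = ["PN", "VBD", "NN"]
start_p = {"PN": 0.6, "VBD": 0.1, "NN": 0.3}
trans_p = {
    "PN": {"PN": 0.1, "VBD": 0.8, "NN": 0.1},
    "VBD": {"PN": 0.6, "VBD": 0.1, "NN": 0.3},
    "NN": {"PN": 0.2, "VBD": 0.6, "NN": 0.2},
}
emit_p = {
    "PN": {"it": 1.0},
    "VBD": {"bit": 1.0},
    "NN": {"bit": 0.5, "dog": 0.5},
}

print(viterbi(["it", "bit", "it"], states, start_p, trans_p, emit_p))
```

In this toy model "bit" could be a noun or a verb, and the transition probabilities are what push the decoder toward the verb reading between two pronouns.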

Applications

- Machine Translation: different strategies
  - Systran: www.Systransoft.com
  - Google: Translate.google.com
- Question Answering
  - MIT Q&A system (START): http://start.csail.mit.edu/
- Summarization
- Information Extraction
- Spell Checking
  - Microsoft Spell Checker
- Call centre
- MT for SMS

NLP Laboratory

- The first aim is to establish a virtual center for NLP-related research
- Defining practical applications, especially for Persian
  - POS tagger, spell checker, n-gram model, machine translation, NER, document classification, search engine, summarization
- Defining several research projects
- Sharing different resources and experiences
- Building the foundation of an NLP suite, like TINA: MIT NLP-SUITE
- Contact me for any request in the NLP domain (hfaili@ece.ut.ac.ir)