Introduction to NLP · 2020. 12. 23. · Introduction to NLP What is Natural Language processing...

Introduction to NLP What is Natural Language processing (NLP), Motivation, Stages of NLP, - Morphological Analysis, - Syntactic Analysis, - Semantic Analysis, - Pragmatic Analysis, - Discourse Analysis, Terms of NLP - Parsing (Syntactic Analysis), - Word Sense Resolution, - Reference Resolution. Stages of NLP (Examples), - Morphological Analysis, - Syntactic Analysis, - Semantic Analysis, - Pragmatic Analysis, Ambiguity, Lexicon, Simple Applications, Bigger Applications, Spoken Dialogue System, Language Technology, The State of Art, Explore: Topics based Research Areas: @Copyrights: Natural Language Processing (NLP) Organized by Dr. Ahmad Jalal (http://portals.au.edu.pk/imc/)


  • Introduction to NLP

    What is Natural Language Processing (NLP),

    Motivation,

    Stages of NLP,

    - Morphological Analysis,

    - Syntactic Analysis,

    - Semantic Analysis,

    - Pragmatic Analysis,

    - Discourse Analysis,

    Terms of NLP

    - Parsing (Syntactic Analysis),

    - Word Sense Resolution,

    - Reference Resolution.

    Stages of NLP (Examples),

    - Morphological Analysis,

    - Syntactic Analysis,

    - Semantic Analysis,

    - Pragmatic Analysis,

    Ambiguity,

    Lexicon,

    Simple Applications,

    Bigger Applications,

    Spoken Dialogue System,

    Language Technology,

    The State of Art,

    Explore: Topics based Research Areas

  • Course Grading Evaluation

    Course Activities (Grading Criteria)

    Class Participation:

    Assignment/Quizzes:

    Semester Project:

    Research Article implementation/ demo:

    Mid Term Exam:

    Final Term Exam:


  • Course Books (for Reference)

    Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition

    By: Daniel Jurafsky and James H. Martin

    Published by Prentice Hall, 2000.

    Handbook of Natural Language Processing

    Edited by: Nitin Indurkhya and Fred J. Damerau

    Published by CRC Press.

  • Course contents

    Objective Introduction,

    Regular Expressions & Automata,

    Morphology & Finite State Transducers,

    N-Grams,

    Parts of Speech,

    Syntax & Context-free grammars - Parsing,

    Lexicalized and Probabilistic Parsing,

    Semantic Representation & Representing Meaning,

    Semantic analysis & lexical Semantics,

    Wrap up,

    Machine Translation,

    Information Extraction

  • 1. What is Natural Language processing (NLP)

    NLP

    The sub-domain of artificial intelligence concerned with the task of developing programs possessing some capability of ‘understanding’ a natural language in order to achieve some specific goal

    • A transformation from one representation (the input text) to another (internal representation)


  • 1. What is Natural Language processing (Motivation)

    Machine Translation:

    - Translation of text or speech from one language to another.

    Database Interface:

    - Using natural language to query a database.

    [Figure: Applications of NLP: Machine Translation, Database Interface, Report Abstraction, Story Understanding]

  • 1. What is Natural Language processing (Motivation)

    Report Abstraction:

    - To extract the results/meaning of a report automatically.

    Example: frequently used to abstract medical reports.

    Story Understanding:

    - Understanding natural language to determine the story.

    - Using different timelines to make a story.

    e.g., typing the word “Lecture” into a search engine gives results such as:

    => Lecture notes of computer science. “or”

    => Lecturer “or”

    => Lecture of ….

  • 2. Stages of NLP

    Morphological Analysis

    Individual words are analyzed into their components.

    Understanding word structure.

    Ex: browser > browse + er

    Syntactic Analysis

    Linear sequences of words are transformed into structures that show how the words relate to each other.

    To see the structure of the sentence.

    Ex: I ate apple, I ate sky

    Discourse Analysis

    Resolving references between sentences.

    Coherent, structured group of sentences.

    Pragmatic Analysis

    (between sentences)

    To reinterpret what was said as what was actually meant.

    Part of the process of extracting information from text.

    Semantic Analysis

    A transformation is made from the input text to an internal representation that reflects the meaning.

    To understand the meanings of sentences using parsing.

  • 2.1. The Steps in NLP

    [Figure: the steps in NLP: Morphology, Syntax, Semantics, Discourse, Pragmatics]

    ** We can go up, down, and up and down, and combine steps too!

    ** Every step is equally complex.

  • 2.1. The Steps in NLP (Cont…)

    Morphology: concerns the way words are built up from smaller meaning-bearing units.

    Example: browser > browse + er

    Syntax: concerns how words are put together to form correct sentences and

    - what structural role each word has.

    Semantics: concerns what words mean

    - and how these meanings combine in a sentence (or sentences) to form sentence meanings.
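
The morphology example above (browser > browse + er) can be sketched as a toy suffix-stripping analyzer. This is only an illustration: the stem list, the suffix list, and the function name `analyze` are assumptions made for this sketch, and real morphological analyzers (for example finite-state transducers) are considerably richer.

```python
# Toy morphological analyzer: splits a word into stem + suffix.
# KNOWN_STEMS and SUFFIXES are illustrative assumptions, not a real lexicon.
KNOWN_STEMS = {"browse", "print", "walk", "teach"}
SUFFIXES = ["er", "ing", "ed", "s"]


def analyze(word):
    """Return (stem, suffix); suffix is "" when no split is found."""
    word = word.lower()
    if word in KNOWN_STEMS:
        return (word, "")
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            base = word[: -len(suffix)]
            # Try the bare base, then the base with a restored final "e"
            # (browser -> brows + er -> browse + er).
            for candidate in (base, base + "e"):
                if candidate in KNOWN_STEMS:
                    return (candidate, suffix)
    return (word, "")  # unknown word: leave it unanalyzed


print(analyze("browser"))
```

Words outside the small stem list, such as "sky", are returned unanalyzed, which shows why a real system needs a full lexicon.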


  • 2.1. The Steps in NLP (Cont…)

    Pragmatics: concerns how sentences are used in different situations and

    - how the situation affects the interpretation of the sentence.

    Example: “Can you pass the salt?” is normally a request, not a question about ability.

    Discourse: concerns how the immediately preceding sentences affect the interpretation of the next sentence.

  • 3. Terms of NLP (3.1 Parsing Syntactic Analysis)

    Assigning a syntactic and logical form to an input sentence

    – uses knowledge about words and word meanings (lexicon)

    The lexicon is a knowledge base of words and their meanings.

    – uses a set of rules defining legal structures (grammar)

    Ahmad ate the apple.

    (S (NP (NAME Ahmad))
       (VP (V ate)
           (NP (DET the)
               (N apple))))

    Syntax: General variable (Actual variable definition);
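
The bracketed parse of "Ahmad ate the apple." can be reproduced with a very small top-down parser. The grammar rules and lexicon below are assumptions chosen to cover only this sentence, and the parser commits to the first rule that succeeds, so it is a sketch of the idea rather than a general parsing algorithm.

```python
# Lexicon: word -> lexical category. Grammar: category -> alternative expansions.
# Both are assumptions sized to the single example sentence.
LEXICON = {"Ahmad": "NAME", "ate": "V", "the": "DET", "apple": "N"}
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["NAME"], ["DET", "N"]],
    "VP": [["V", "NP"]],
}


def parse(symbol, words, pos):
    """Parse `symbol` starting at words[pos]; return (tree, next_pos) or None."""
    if symbol not in GRAMMAR:  # lexical category: consult the lexicon
        if pos < len(words) and LEXICON.get(words[pos]) == symbol:
            return f"({symbol} {words[pos]})", pos + 1
        return None
    for rule in GRAMMAR[symbol]:  # try each expansion in order
        children, cur = [], pos
        for child in rule:
            result = parse(child, words, cur)
            if result is None:
                break
            tree, cur = result
            children.append(tree)
        else:  # every child of this rule matched
            return f"({symbol} {' '.join(children)})", cur
    return None


def parse_sentence(sentence):
    """Return the bracketed tree, or None if the sentence is not covered."""
    words = sentence.rstrip(".").split()
    result = parse("S", words, 0)
    if result is None or result[1] != len(words):
        return None
    return result[0]


print(parse_sentence("Ahmad ate the apple."))
```

A word order the grammar does not license, such as "ate Ahmad the apple", returns `None`.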

  • 3. Terms of NLP (3.2 Word Sense Resolution)

    Many words have many meanings or senses.

    We need to resolve which of the senses of an ambiguous word is invoked in a particular use of the word.

    I made her duck. (Made her a bird for lunch, or made her move her head quickly downwards?)

    He left his mouse. (Determining whether the word “mouse” refers to a computer device or an animal.)
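
One simple way to approach the "mouse" example is to compare the words around the ambiguous word with a short description (gloss) of each sense and pick the sense with the largest overlap, in the spirit of the simplified Lesk method. The glosses, the stopword list, and the function names below are hand-written assumptions for illustration only.

```python
# Hand-written sense glosses (assumed for this sketch; a real system would
# take them from a lexicon such as WordNet).
SENSES = {
    "mouse": {
        "computer device": "hand-held device used to move the cursor on a computer screen",
        "animal": "small rodent with a long tail that eats cheese and grain",
    }
}
STOPWORDS = {"a", "an", "the", "on", "to", "of", "that", "with", "and", "his", "he", "is", "in"}


def content_words(text):
    """Lower-cased words of `text` without punctuation and stopwords."""
    return {w.strip(".,?!").lower() for w in text.split()} - STOPWORDS


def disambiguate(word, sentence):
    """Pick the sense whose gloss shares the most words with the sentence."""
    context = content_words(sentence)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:  # ties keep the first listed sense
            best_sense, best_overlap = sense, overlap
    return best_sense


print(disambiguate("mouse", "He left his mouse next to the computer keyboard."))
```

For the bare sentence "He left his mouse." neither gloss overlaps with the context, so the sketch falls back to the first listed sense. That is exactly the situation in which wider context or world knowledge is needed.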


  • 3. Terms of NLP (3.3 Reference Resolution)

    Discourse Knowledge Implicit and Explicit Knowledge

    Domain Knowledge (Registration transaction)

    World Knowledge = The non-linguistic information that helps a reader or listener interpret the meanings of words and sentences. Also called extra-linguistic knowledge.

    • U: I would like to register in an IAS Course.

    • S: Which number?

    • U: Make it 333.

    • S: Which section?

    • U: Which section starts at 7:00 am?

    • S: Section 5.

    • U: Then make it that section.
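
The dialogue above can be traced with a small state tracker that fills the registration slots and resolves "that section" to the most recently mentioned section number. The slot names, the regular expressions, and the function `resolve_dialogue` are assumptions made for this sketch; they only handle phrasings like the ones in this dialogue.

```python
import re


def resolve_dialogue(turns):
    """Fill registration slots, resolving 'that section' from earlier turns."""
    state = {"course": None, "number": None, "section": None}
    last_section_mentioned = None
    for speaker, text in turns:
        lowered = text.lower()
        if speaker == "U" and "register" in lowered:
            match = re.search(r"an? (\w+) course", lowered)
            if match:
                state["course"] = match.group(1).upper()
        elif speaker == "U" and lowered.startswith("make it"):
            number = re.search(r"\d+", text)
            if number:
                state["number"] = int(number.group())
        # Remember any explicit section number, whoever said it.
        section = re.search(r"section (\d+)", lowered)
        if section:
            last_section_mentioned = int(section.group(1))
        # "that section" refers back to the last section mentioned.
        if speaker == "U" and "that section" in lowered:
            state["section"] = last_section_mentioned
    return state


dialogue = [
    ("U", "I would like to register in an IAS Course."),
    ("S", "Which number?"),
    ("U", "Make it 333."),
    ("S", "Which section?"),
    ("U", "Which section starts at 7:00 am?"),
    ("S", "Section 5."),
    ("U", "Then make it that section."),
]
print(resolve_dialogue(dialogue))
```

The key step is the last one: the phrase "that section" carries no number of its own, so its value has to come from the discourse history.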


  • 3. Steps + Terms of NLP (Class Participation)

    Give at least two proper examples of each of the following cases:

    Morphological Analysis,

    Syntactic Analysis

    Semantic Analysis

    Pragmatic Analysis

    Discourse Analysis

    Parsing Syntactic Analysis,

    Word Sense Resolution,

    Reference Resolution.


  • 4. Stages of NLP (Examples), [Stage#1]

    I want to print Ali’s .init file

    I (pronoun) want (verb) to (prep) to (infinitive) print (verb) Ali (noun) ’s (possessive) .init (adj) file (noun) file (verb)

    Surface form → stems

  • 4. Stages of NLP (Examples), [Stage#2]

    I (pronoun) want (verb) to (prep) to (infinitive) print (verb) Ali (noun) ’s (possessive) .init (adj) file (noun) file (verb)

    Stems → parse tree

    [Figure: parse tree for “I want to print Ali’s .init file”; node labels: S, NP, VP, PRO, V, ADJ, N]

  • 4. Stages of NLP (Examples), [Stage#3]

    Parse tree → semantic net

    [Figure: the parse tree for “I want to print Ali’s .init file” is mapped to a semantic net with nodes I, want, print, Ali, .init, file and edges labelled who, what, who, who’s, what, type]

  • 4. Stages of NLP (Examples), [Stage#4 and#5]

    [Figure: semantic net with nodes I, want, print, Ali, .init, file and edges labelled who, what, who, who’s, what, type]

    To whom the pronoun ‘I’ refers

    To whom the proper noun ‘Ali’ refers

    What are the files to be printed

    Execute the command

    lpr /ali/stuff.init
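
The last two stages can be sketched as a lookup over the semantic net followed by building the command string. The edge assignments in `SEMANTIC_NET` are an assumed reading of the slide's diagram, and the `CONTEXT` dictionary (who Ali is, where his files live) is entirely hypothetical; the sketch only builds the `lpr` command text and does not run it.

```python
# Semantic net as (node, relation, node) triples.
# The edges are an assumed reading of the diagram, not taken from the slide.
SEMANTIC_NET = [
    ("want", "who", "I"),
    ("want", "what", "print"),
    ("print", "who", "I"),
    ("print", "what", "file"),
    ("file", "whose", "Ali"),
    ("file", "type", ".init"),
]

# Hypothetical context used to resolve the references.
CONTEXT = {
    "home_dirs": {"Ali": "/ali"},
    "files": {"/ali": ["stuff.init", "notes.txt"]},
}


def lookup(net, node, relation):
    """Return the target of the first edge `node --relation-->`, or None."""
    for source, rel, target in net:
        if source == node and rel == relation:
            return target
    return None


def to_commands(net, context):
    """Resolve the references in the net and build the print commands."""
    action = lookup(net, "want", "what")
    if action != "print":
        raise ValueError(f"unsupported action: {action}")
    target = lookup(net, action, "what")        # the thing to print
    owner = lookup(net, target, "whose")        # to whom 'Ali' refers
    extension = lookup(net, target, "type")     # which files are meant
    directory = context["home_dirs"][owner]
    matches = [name for name in context["files"][directory] if name.endswith(extension)]
    return [f"lpr {directory}/{name}" for name in matches]


print(to_commands(SEMANTIC_NET, CONTEXT))
```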


  • 4. Stages of NLP (Examples),

    [Figure: pipeline from user input to action. Stages: Morphological Analysis, Syntactic Analysis, Semantic Analysis, Discourse Analysis, Pragmatic Analysis. Other labels in the diagram: user, surface form, lexicon, stems, parse tree, internal representation, resolve references, perform action]

  • 5. Ambiguity

    Time flies like an arrow

    More than one meaning for the same sentence:

    - Time passes along in the same manner as an arrow gliding through space.

    - I order you to take timing measurements on flies, in the same manner as you would time an arrow.

    - Fruit flies like to feast on a banana; in contrast, the species of flies known as “time flies” like an arrow.

    (other meanings are also possible)
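
The three readings correspond to different part-of-speech assignments for the same five words. The sketch below enumerates every tag sequence allowed by a small ambiguous lexicon and keeps those that match a known sentence pattern. The lexicon, the tag names, and the pattern table are assumptions standing in for a real tagger and grammar.

```python
from itertools import product

# Each word lists its possible parts of speech (assumed for this sketch).
LEXICON = {
    "time": ["NOUN", "VERB"],
    "flies": ["NOUN", "VERB"],
    "like": ["VERB", "PREP"],
    "an": ["DET"],
    "arrow": ["NOUN"],
}

# Toy stand-in for a grammar: tag sequences that form a sentence,
# with the reading each one corresponds to.
SENTENCE_PATTERNS = {
    ("NOUN", "VERB", "PREP", "DET", "NOUN"): "time passes the way an arrow does",
    ("VERB", "NOUN", "PREP", "DET", "NOUN"): "measure the speed of flies as you would an arrow",
    ("NOUN", "NOUN", "VERB", "DET", "NOUN"): "the insects called 'time flies' are fond of an arrow",
}


def tag_sequences(sentence):
    """All tag sequences the lexicon allows for the sentence."""
    words = sentence.lower().split()
    return list(product(*(LEXICON[w] for w in words)))


def readings(sentence):
    """Tag sequences that match a known pattern, with their reading."""
    return [
        (tags, SENTENCE_PATTERNS[tags])
        for tags in tag_sequences(sentence)
        if tags in SENTENCE_PATTERNS
    ]


for tags, meaning in readings("Time flies like an arrow"):
    print(" ".join(tags), "->", meaning)
```

Of the eight candidate tag sequences, three survive, one per reading listed above.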


  • 5. Ambiguity (Cont…)

    The chicken is ready to eat. (The chicken is ready to be eaten, or the chicken is ready to eat something?)

  • 6. Lexicon

    A lexicon is a vocabulary data bank that contains the words of a language and their linguistic information.

    There are many online lexicons.

    WordNet is a lexical database that contains English vocabulary words.
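
As a data structure, a lexicon can be pictured as a mapping from each word to its linguistic information. The sketch below is a minimal in-memory version; the class names, fields, and the two sample entries are assumptions for illustration and do not reflect how WordNet is organised.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    """One word with its linguistic information (fields are assumed)."""
    word: str
    parts_of_speech: list
    senses: list = field(default_factory=list)


class Lexicon:
    """Minimal in-memory lexicon keyed by lower-cased word."""

    def __init__(self):
        self._entries = {}

    def add(self, entry):
        self._entries[entry.word.lower()] = entry

    def lookup(self, word):
        """Return the entry for `word`, or None if it is not listed."""
        return self._entries.get(word.lower())


lexicon = Lexicon()
lexicon.add(LexicalEntry("duck", ["noun", "verb"], ["a water bird", "to lower the head quickly"]))
lexicon.add(LexicalEntry("apple", ["noun"], ["a round fruit"]))

print(lexicon.lookup("Duck"))
```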


  • 7. Simple Applications

    Word counters (wc in UNIX)

    Spell Checkers,

    grammar checkers

    Predictive Text on mobile handsets
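
A word counter is about the simplest language-processing program there is. The sketch below mimics the three counts reported by the UNIX `wc` tool, with one stated difference: it counts characters of an in-memory string rather than bytes of a file. The function name `word_count` is an assumption.

```python
def word_count(text):
    """Return (lines, words, characters) for a string.

    Lines are counted as newline characters and words as runs of
    non-whitespace, which is how wc counts them.
    """
    lines = text.count("\n")
    words = len(text.split())
    chars = len(text)
    return lines, words, chars


print(word_count("I ate apple\nI ate sky\n"))
```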


  • 7.1 Bigger Applications

    Intelligent systems (example: Siri)

    NLU interfaces to databases (example: OpenNLP to convert English sentences to SQL queries)

    Computer aided instruction

    Information retrieval

    Intelligent Web searching (example: Google search engine)

    Data mining (example: NLP text mining)

    Machine translation

    Speech recognition

    Natural language generation

    Question answering


  • 7.2 Spoken Dialogue System

    Speech Recognition

    Speech Synthesis

    Semantic Interpretation

    Response Generation

    Dialogue Management

    Discourse Interpretation

    User

  • 7.2 Spoken Dialogue System (Cont…)

    Signal Processing: Convert the audio wave into a sequence of feature vectors.

    Speech Recognition: Decode the sequence of feature vectors into a sequence of words.

    Semantic Interpretation: Determine the meaning of the words and the relations between them.

    Discourse Interpretation: Understand what the user intends by interpreting utterances across sentences.

    Dialogue Management: Determine system goals in response to user utterances, based on user intention (medium: e.g., dialogue box).

    Response Generation: Use discourse knowledge to produce a relevant response to the user’s request.

    Speech Synthesis: Generate synthetic speech for the response, using a parsing technique.

  • 7.2 Spoken Dialogue System (Cont…)

    “Levels of Sophistication in a Dialogue System”

    Touch-tone replacement:

    System Prompt: "For checking information, press or say one." Caller Response: "One."

    Directed dialogue:

    System Prompt: "Would you like checking account information or rate information?" Caller Response: "Checking", or "checking account," or "rates."

    Natural language:

    System Prompt: "What transaction would you like to perform?" Caller Response: "Transfer Rs. 500 from checking to savings."
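
The three levels differ in how much of the caller's wording the system has to interpret. A rough sketch, assuming plain text input after speech recognition: the touch-tone replacement accepts one fixed word, the directed dialogue spots a keyword, and the natural-language level extracts slots from a free-form request. The function names, the returned labels, and the regular expression are assumptions and cover only phrasings like the examples above.

```python
import re

# Pattern for requests such as "Transfer Rs. 500 from checking to savings."
TRANSFER = re.compile(
    r"transfer\s+rs\.?\s*(?P<amount>\d+(?:\.\d+)?)\s+from\s+(?P<source>\w+)\s+to\s+(?P<target>\w+)",
    re.IGNORECASE,
)


def touch_tone(response):
    """Level 1: accept only the spoken equivalent of a key press."""
    return "checking" if response.strip(" .").lower() in {"1", "one"} else None


def directed(response):
    """Level 2: spot one of the keywords the prompt offered."""
    text = response.lower()
    if "rate" in text:
        return "rates"
    if "checking" in text:
        return "checking"
    return None


def natural(response):
    """Level 3: extract intent and slots from a free-form request."""
    match = TRANSFER.search(response)
    if match is None:
        return None
    return {
        "intent": "transfer",
        "amount": float(match.group("amount")),
        "source": match.group("source").lower(),
        "target": match.group("target").lower(),
    }


print(natural("Transfer Rs. 500 from checking to savings."))
```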


  • 7.3 Language Technology in NLP

    mostly solved

    - Spam detection: “Let’s go to Agra!” ✓ / “Buy 2DF&EC …” ✗

    - Part-of-speech (POS) tagging: “Colorless green ideas sleep furiously.” → ADJ ADJ NOUN VERB ADV

    - Named entity recognition (NER): “Einstein met with UN officials in Princeton” → PERSON (Einstein), ORG (UN), LOC (Princeton)

    making good progress

    - Sentiment analysis: “Best roast chicken in San Francisco!” / “The waiter ignored us for 20 minutes.”

    - Coreference resolution: “Carter told Mubarak he shouldn’t run again.”

    - Word sense disambiguation (WSD): “I need new batteries for my mouse.”

    - Parsing: “I can see Alcatraz from the window!”

    - Machine translation (MT): “第13届上海国际电影节开幕…” → “The 13th Shanghai International Film Festival…”

    - Information extraction (IE): “You’re invited to our dinner party, Friday May 27 at 8:30” → Party, May 27, add

    still really hard

    - Question answering (QA): “Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness?”

    - Paraphrase: “XYZ acquired ABC yesterday” / “ABC has been taken over by XYZ”

    - Summarization: “The Dow Jones is up”, “The S&P500 jumped”, “Housing prices rose” → “Economy is good”

    - Dialog: “Where is Citizen Kane playing in SF?” “Castro Theatre at 7:30. Do you want a ticket?”

  • 7.3 Language Technology in NLP (Cont…)

    non-standard English: Great job @justinbieber! Were SOO PROUD of what youve accomplished! U taught us 2 #neversaynever & you yourself should never give up either♥

    segmentation issues: the New York-New Haven Railroad (shown with two different segmentations)

    idioms: dark horse, get cold feet, lose face, throw in the towel

    neologisms: unfriend, Retweet, bromance

    world knowledge: Mary and Sue are sisters. Mary and Sue are mothers.

    tricky entity names: Where is A Bug’s Life playing …; Let It Be was recorded …; … a mutation on the for gene …

    But that’s what makes it fun!

  • 7.3 Language Technology in NLP (Cont…)

    Neologism

    A relatively recent or isolated term, word, or phrase that may be in the process of entering common use, but that has not yet been fully accepted into mainstream language.

    World Knowledge

    The non-linguistic information that helps a reader or listener interpret the meanings of words and sentences. Also called extra-linguistic knowledge.

    • Non-linguistic: not consisting of or related to language. Examples: whistles, yells, laughs, and cries.

    Idiom

    An idiom is a phrase or an expression that has a figurative, or sometimes literal, meaning.

    Example: “Once in a blue moon” means “happens very rarely”.

    Non-standard English

    Not conforming in pronunciation, grammar, vocabulary, etc., to the usage characteristic of and considered acceptable by most educated native speakers.

    Example: Great job @justinbieber! Were SOO PROUD of what you’ve accomplished! U taught us 2 #neversaynever & you yourself should never give up either

  • 8. The State of Art

    Systems that use speech and language processing:

    1) Amtrak, United Airlines (users interact with conversational agents)

    2) Car makers (automatic speech recognition and text-to-speech allow drivers to control vehicle navigation by voice), e.g., Tesla cars

    3) Video search companies (search services with speech recognition)

    4) Google (provides cross-language information retrieval: translates the query and finds the most relevant pages)

    5) Pearson and ETS (automated systems to analyze students’ essays)

    6) Interactive virtual agents (serve as tutors for children learning to read)

    7) Text analysis companies (automated measurement of user opinion)

  • 9. Some Brief History

    Foundational Insights: 1940s & 1950s

    The Two Camps: 1957-1970

    Four Paradigms: 1970-1983

    Empiricism & Finite State Models Redux: 1983-1993

    The Field Comes Together: 1994-1999

    The Rise of Machine Learning: 2000-2008

    On Multiple Discoveries

    A Final Brief Note on Psychology


  • 9.1 Foundational Insights: 1940s & 1950s

    Two foundational paradigms

    • Automaton (1950s)

    • Probabilistic model (information-theoretic model)

    The automaton is based on Turing’s model (1936) of algorithmic computation.

    Turing’s model led to

    • the McCulloch-Pitts neuron (1943)

    • Kleene’s work (1951)

    • finite automata and regular expressions (1956)

    The probabilistic model was applied by Shannon (1948).

    • Drawing on Shannon’s work, Chomsky (1956) considered finite-state machines as a way to characterize a grammar.

    These models led to the field of Formal Language Theory.

  • 9.2 The Two Camps: 1957-1970

    Speech and language processing split into two paradigms

    • Symbolic: rule-based approaches built on formal language theory and generative syntax (Chomsky and others), together with early artificial intelligence work on reasoning and logic.

    • Stochastic: approaches based on random processes whose probability distribution or pattern can be analyzed statistically but not predicted precisely.

    Transformations and Discourse Analysis Project (TDAP) parsing system.

    • Implemented between June 1958 and July 1959

    The stochastic paradigm took hold mainly in departments of

    • Statistics

    • Electrical Engineering

  • 9.3 Four Paradigms: 1970-1983

    Stochastic (having a random probability distribution or pattern that may be analyzed statistically but may not be predicted precisely)

    used in the development of speech recognition algorithms.

    Logic-based

    started with work on Q-systems and metamorphosis grammars.

    Natural language understanding

    began with Winograd’s SHRDLU system.

    Discourse Modeling

    focused on four key areas.

  • 9.4 Empiricism & Finite State Models Redux: 1983-1993

    Return of two classes of models that had lost popularity.

    • Finite State Models

    • Return of Empiricism

    • Rise of probabilistic models throughout speech and language processing

    Probabilistic, data-driven approaches spread from speech into

    part-of-speech tagging

    parsing

    attachment ambiguities

    semantics

    This period also saw considerable work on natural language generation.

  • 9.5 The Field Comes Together: 1994-1999

    • The last five years of the millennium saw major changes in the field.

    First, probabilistic and data-driven models became standard.

    Second, the increased speed and memory of computers allowed commercial exploitation of subareas of speech and language processing.

    Subareas include

    Speech recognition

    Spelling and grammar correction

    Commercial Exploitation

    A term that covers all activities used to benefit commercially from one’s property.

    Example: making property, selling it, offering it for sale, or licensing its appropriation or use.

  • 9.6 The Rise of Machine Learning: 2000-2008

    Three synergistic trends

    Large amounts of spoken and written material became widely available.

    The increased focus on learning led to a more serious interplay with the statistical machine learning community.

    The widespread availability of high-performance computing systems facilitated the training and deployment of systems that could not have been imagined a decade earlier.

    Finally, largely unsupervised statistical approaches began to receive renewed attention.

  • 10. Explore: Topics based Research Areas

    (1) Named Entity Recognition :-

    Proposed architecture of Model

    LIST OF NAMED ENTITY TYPES WITH THE KINDS OF ENTITIES THEY BELONG TO