Prosody and NLP


Page 1: Prosody and NLP

Prosody and NLP

Seminar by

Nikhil : 06005004

Adith : 06005005

Prachur : 06D05011

We have a presentation this Friday ?

We have a presentation this Friday ?

We have a presentation this Friday ?

Page 2: Prosody and NLP

Abstract

Speech Processing and Natural Language Processing share a common area of study: language. Over time, however, they have come to have little in common in their theoretical models or methods of analysis. NLP takes written text as the starting point for its analysis, but a lot of valuable information is lost when speech is encoded as mere text. It is commonly accepted that intonational features of spoken language can greatly aid NLP tasks (such as adjective scope resolution).

We explore the foundations of the study of Prosody and observe some approaches that use prosodic cues to aid NLP.

Page 3: Prosody and NLP

Motivation

• Language is not text driven but speech driven.

• NLP currently has written text as the starting point for its analysis (primarily due to the abundance of such data).

• A lot of information is lost by ignoring spoken features and looking only at the text.

Page 4: Prosody and NLP

A Way Out ?

• Utilize spoken features for NLP tasks.

• NLP needs all the help it can get.

– Working at the pragmatics or discourse level is extremely difficult.

– Prosodic cues carry useful pragmatic information.

Page 5: Prosody and NLP

What is Prosody, exactly?

• The term comes from poetry, where prosody refers to the study of poetic meters (rhythms) [1].

• Written text treats words as the basic building blocks of language.

• Spoken language treats syllables as the basic building blocks.

• Syllables : each consists of an onset, a nucleus and a coda.
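The onset/nucleus/coda split can be illustrated with a minimal sketch. It assumes the syllable is already given as a list of ARPAbet phoneme symbols with stress digits removed; the `Syllable` type and `split_syllable` function are hypothetical names introduced only for this illustration.

```python
from typing import List, NamedTuple

# ARPAbet vowel symbols (stress digits stripped); used only to locate the nucleus.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}


class Syllable(NamedTuple):
    onset: List[str]    # consonants before the vowel
    nucleus: List[str]  # the vowel itself
    coda: List[str]     # consonants after the vowel


def split_syllable(phonemes: List[str]) -> Syllable:
    """Split ONE syllable (hypothetical helper) into onset, nucleus and coda."""
    vowel_positions = [i for i, p in enumerate(phonemes) if p in VOWELS]
    if not vowel_positions:
        raise ValueError("a syllable needs a vowel nucleus")
    first, last = vowel_positions[0], vowel_positions[-1]
    if vowel_positions != list(range(first, last + 1)):
        # Vowels separated by consonants mean more than one syllable.
        raise ValueError("input looks like more than one syllable")
    return Syllable(phonemes[:first], phonemes[first:last + 1], phonemes[last + 1:])


cup = split_syllable(["K", "AH", "P"])
milk = split_syllable(["M", "IH", "L", "K"])
```

Here `cup` has the single-consonant coda `["P"]`, while `milk` has the two-consonant coda `["L", "K"]`.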

Page 6: Prosody and NLP

What is Prosody, exactly? (contd.)

• Wikipedia has this to say : Prosody is the rhythm, stress, and intonation of connected speech (as opposed to smaller elements like syllables or words).

• Prosody may reflect features of the speaker or of the utterance: the speaker's emotional state, whether the utterance is ironic or sarcastic, emphasis, contrast, and focus.

Page 7: Prosody and NLP

Intonation ?

• Conveys paralinguistic information, emphasis and contrast.

• Intonation on a particular word could differentiate between sentence moods.

– You are finISHED (interrogative)

– You are FINIshed (imperative)

Image courtesy Google Image Search

Page 8: Prosody and NLP

And Stress!

• Stress falls on content words in spoken utterances.

– cOntent : Noun. "I really liked their presentation's content."

– contEnt : Adjective. "I have done my best. I am content."

• Stress on a pair of words distinguishes between the syntactic roles played by each word in the pair.

– tight<pause>rope : A rope that is held taut.

– tight-rope : A circus-act uses this contraption :)

[2]

Page 9: Prosody and NLP

Courtesy Tom the Dancing Bug

Page 10: Prosody and NLP

Prosodic cues

• Prosodic functions important for linguistics are

– Marking of boundaries (syntactic, semantic or dialogue units)

– Relative duration of phonetic segments

– At syllable level : energy, intensity, duration and intonation of the syllable

We shall see two approaches to using these features in tasks central to NLP.

Page 11: Prosody and NLP

Prosody-Augmented Syntax Grammars[3]

• Cue Used : Relative duration of phonetic segments

• Aim : To improve the parsing of ambiguous sentences.

• Method : Augmenting the syntax grammar with a few non-terminals and rules.

Page 12: Prosody and NLP

• “Word Break Indices” are used to show the degree of prosodic decoupling between neighboring words (a higher index marks a stronger break).

• E.g.

– Andrea 1 moved 1 the 0 bottle 3 under 0 the 0 bridge.

– Andrea 1 moved 3 the 0 bottle 1 under 0 the 0 bridge.

• Break indices were generated by analysing codas that are followed by a pause. Here the coda is the final sounded consonant(s) of a word, e.g. cup, milk.
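As a rough illustration of the notation above, the sketch below reads a break-index annotated sentence into (word, break) pairs and reports where the strongest break falls. The function names are hypothetical, and the input format (words and integer indices separated by spaces) is assumed from the two example sentences only.

```python
from typing import List, Optional, Tuple

Pair = Tuple[str, Optional[int]]


def parse_break_indices(annotated: str) -> List[Pair]:
    """Turn 'Andrea 1 moved 1 ...' into [(word, break_after_word), ...]."""
    tokens = annotated.strip().rstrip(".").split()
    pairs: List[Pair] = []
    i = 0
    while i < len(tokens):
        word = tokens[i]
        if i + 1 < len(tokens) and tokens[i + 1].isdigit():
            pairs.append((word, int(tokens[i + 1])))
            i += 2
        else:
            # The last word has no following break index.
            pairs.append((word, None))
            i += 1
    return pairs


def strongest_break(pairs: List[Pair]) -> Optional[Tuple[str, str, int]]:
    """Return (left_word, right_word, index) for the first largest break."""
    best = None
    for (word, brk), (next_word, _) in zip(pairs, pairs[1:]):
        if brk is not None and (best is None or brk > best[2]):
            best = (word, next_word, brk)
    return best


first = parse_break_indices("Andrea 1 moved 1 the 0 bottle 3 under 0 the 0 bridge.")
second = parse_break_indices("Andrea 1 moved 3 the 0 bottle 1 under 0 the 0 bridge.")
```

In the first sentence the strongest break falls between "bottle" and "under"; in the second it falls between "moved" and "the". This difference is what the parser can exploit to choose between readings.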

Page 13: Prosody and NLP

Grammar Modification

• Original grammar rules like S -> NP VP etc. are changed to S -> NP link1 VP.

• “Link” non-terminals are used for the word-break indices.

• The Parse Trees for the aforementioned example become…
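The rule rewriting described on this slide can be sketched as follows. This is only an illustration under stated assumptions: rules are represented as (left-hand side, right-hand side list) pairs, the helper name `add_link_nonterminals` is hypothetical, and a single unnumbered `Link` symbol is inserted between adjacent right-hand-side symbols, whereas the slides number the links (link1, link2, ...).

```python
from typing import List, Tuple

Rule = Tuple[str, List[str]]


def add_link_nonterminals(rules: List[Rule]) -> List[Rule]:
    """Insert a 'Link' symbol between adjacent right-hand-side symbols."""
    augmented: List[Rule] = []
    for lhs, rhs in rules:
        new_rhs: List[str] = []
        for position, symbol in enumerate(rhs):
            if position > 0:
                # One link per boundary between two constituents.
                new_rhs.append("Link")
            new_rhs.append(symbol)
        augmented.append((lhs, new_rhs))
    return augmented


grammar = [
    ("S", ["NP", "VP"]),
    ("VP", ["V", "NP", "PP"]),
    ("NP", ["D", "N"]),
    ("N", ["Andrea"]),
]
augmented_grammar = add_link_nonterminals(grammar)
```

With this toy grammar, `S -> NP VP` becomes `S -> NP Link VP`, and rules with a single right-hand-side symbol are left unchanged. Constraining which break-index values each link may carry is the part that does the disambiguation, and it is not modelled here.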

Page 14: Prosody and NLP

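Example 1

• Andrea 1 moved 1 the 0 bottle 3 under 0 the 0 bridge.

[Parse tree figure. Nodes: S, NP, VP, PP, V (moved), N (Andrea, bottle, bridge), D (the), P (under), with a link node between each pair of adjacent words.]

Link1 = 1, Link2 = 1, Link3 = 0, Link4 = 3, Link5 = 0, Link6 = 0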

Page 15: Prosody and NLP

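Example 2

• Andrea 1 moved 3 the 0 bottle 1 under 0 the 0 bridge.

[Parse tree figure. Nodes: S, NP, VP, PP, V (moved), N (Andrea, bottle, bridge), D (the), P (under), with a link node between each pair of adjacent words.]

Link1 = 1, Link2 = 3, Link3 = 0, Link4 = 1, Link5 = 0, Link6 = 0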

Page 16: Prosody and NLP

Results

• The incorporation of prosody resulted in a reduction of about 25% in the number of parses found.

• Parse times increased by about 37%.

• Extremely common cases of syntactic ambiguity can be resolved with prosodic information, and grammars can be modified to take advantage of prosodic information for improved parsing.

Page 17: Prosody and NLP

Using Prosodic Features in Language Models[4]

• The outlined approach uses syllable-based prosodic cues, namely

– Duration of the syllable

– Average energy (intensity)

– The average F0 (fundamental frequency of the syllable) contour

– The slope of the F0 contour (visualised as intonation: rising or falling/flat)
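To make the four cues concrete, here is a minimal sketch that computes them for one syllable from frame-level measurements. It is not the feature extractor used in [4]: the frame shift, the convention that unvoiced frames have F0 = 0, the least-squares slope, and all names are assumptions made for illustration.

```python
from statistics import mean
from typing import Dict, Sequence


def syllable_features(energy: Sequence[float],
                      f0: Sequence[float],
                      frame_shift_s: float = 0.01) -> Dict[str, float]:
    """Compute duration, average energy, average F0 and F0 slope for one syllable."""
    if not energy or len(energy) != len(f0):
        raise ValueError("energy and f0 must be non-empty and of equal length")

    # Assumption: unvoiced frames are marked with F0 = 0 and are skipped.
    voiced = [(i * frame_shift_s, hz) for i, hz in enumerate(f0) if hz > 0]
    avg_f0 = mean(hz for _, hz in voiced) if voiced else 0.0

    # Least-squares slope of F0 over time (Hz per second); 0.0 if undefined.
    slope = 0.0
    if len(voiced) >= 2:
        t_mean = mean(t for t, _ in voiced)
        numerator = sum((t - t_mean) * (hz - avg_f0) for t, hz in voiced)
        denominator = sum((t - t_mean) ** 2 for t, _ in voiced)
        slope = numerator / denominator

    return {
        "duration_s": len(energy) * frame_shift_s,
        "avg_energy": mean(energy),
        "avg_f0_hz": avg_f0,
        "f0_slope_hz_per_s": slope,
    }


features = syllable_features([1.0, 2.0, 3.0], [100.0, 110.0, 120.0], frame_shift_s=1.0)
```

A positive slope corresponds to rising intonation over the syllable, and a slope near zero or below to flat or falling intonation.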

Page 18: Prosody and NLP

Recognition of Prosodic Features

Courtesy [4]

Page 19: Prosody and NLP

Prosody in Language Model

• We want to measure P(w_n | w_{n-1}, w_{n-2}, …, F)

• Naively modelled by linear interpolation :

– Assumption : prosody features independent of previous words (not true!!).

– P(w_n | w_{n-1}, w_{n-2}, …, F) = α P(w_n | w_{n-1}, w_{n-2}, …) + (1 − α) P(w_n | F)

• We want something better
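The naive linear interpolation from this slide can be written out in a few lines. The sketch assumes both component models are already available as dictionaries mapping each candidate word to a probability; the function name and the toy numbers are made up for illustration.

```python
from typing import Dict


def interpolate(p_given_history: Dict[str, float],
                p_given_prosody: Dict[str, float],
                alpha: float) -> Dict[str, float]:
    """alpha * P(w | history) + (1 - alpha) * P(w | F) for every candidate word."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    vocabulary = set(p_given_history) | set(p_given_prosody)
    return {
        word: alpha * p_given_history.get(word, 0.0)
        + (1.0 - alpha) * p_given_prosody.get(word, 0.0)
        for word in vocabulary
    }


# Toy distributions over the next word (hypothetical numbers).
p_history = {"friday": 0.5, "monday": 0.5}
p_prosody = {"friday": 0.9, "monday": 0.1}
combined = interpolate(p_history, p_prosody, alpha=0.5)
```

If both inputs are proper distributions, the result also sums to one. The weight α is fixed for every context, which is one reason this model is only a baseline.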

Page 20: Prosody and NLP

Factored Language Model

• Instead of a word W we will deal with a set of word-factors F = {f_1, f_2, …, f_k} (factors may include the word itself).

Page 21: Prosody and NLP

• Here, F is chosen as {W, prosodic features}

• The four prosodic features are encoded as binary numbers (s0 to s15).

• These numbers are assigned to each syllable of the word.

• For example, the prosodic representation of the word “Actually” can be either ‘s10s12s6’ or ‘s10s15s6’.
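One way to read the s0 to s15 encoding is as a 4-bit code with one bit per prosodic feature. The sketch below follows that reading. The slides do not specify how each feature is reduced to a bit or in which order the bits are packed, so the bit order, the high/low interpretation, and the function names here are all assumptions.

```python
from typing import Iterable, Tuple

Flags = Tuple[bool, bool, bool, bool]


def syllable_symbol(long_duration: bool, high_energy: bool,
                    high_f0: bool, rising_f0: bool) -> str:
    """Pack four binary features into a symbol 's0'..'s15' (assumed bit order)."""
    code = (int(long_duration) << 3) | (int(high_energy) << 2) \
        | (int(high_f0) << 1) | int(rising_f0)
    return f"s{code}"


def word_prosody(syllables: Iterable[Flags]) -> str:
    """Concatenate the per-syllable symbols of one word."""
    return "".join(syllable_symbol(*flags) for flags in syllables)


# Three syllables whose codes are 10 (binary 1010), 12 (1100) and 6 (0110).
actually = word_prosody([
    (True, False, True, False),
    (True, True, False, False),
    (False, True, True, False),
])
```

Under these assumptions `actually` comes out as `s10s12s6`, the first of the two representations quoted above. A different realisation of the middle syllable would give the second.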

Page 22: Prosody and NLP

Conclusion

• Prosodic Cues can play an important role as a heuristic for many NLP tasks.

• It is not one-way traffic, though. POS tagging (since it is relatively accurate) is used to aid speech synthesis tasks, which conventionally used only prosodic cues [5].

Page 23: Prosody and NLP

Future Work

• Handling prosodic information is a first step towards integration of Speech Processing and NLP

Courtesy ZITS

Page 24: Prosody and NLP

References

1. Fromkin, Rodman and Hyams, An Introduction to Language, 7th Ed, Thomson and Wadsworth, 2002

2. En.wikipedia.org : Prosody, Austrian School of Thought, Intonation

3. John Bear and Patti Price (1990), “Prosody, Syntax and Parsing”, in Proceedings of the 28th Annual Meeting of the ACL

4. Songfang Huang and Steve Renals (2007), “Using Prosodic Features in Language Models for Meetings”, IRTG annual meeting

5. http://speech.iiit.net/~raghavendra/Webpage/ppprts.pdf