Beyond the Phoneme
A Juncture-Accent Model of Spoken Language

Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang
International Computer Science Institute
1947 Center Street, Berkeley, CA 94704
{steveng, hmcarvey, leahh, shawnc}@icsi.berkeley.edu
Date posted: 19-Dec-2015


Acknowledgements and Thanks

Research Funding
U.S. Department of Defense
U.S. National Science Foundation


For Further Information

Consult the web site:

www.icsi.berkeley.edu/~steveng


OVERTURE

The Central Challenge for Models of Speech Recognition


The Serial Frame Perspective on Speech

Traditional models of speech recognition assume the identity of a phonetic segment is derived from a detailed spectral profile of the acoustic signal computed for each time interval (frame) of speech
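The frame-by-frame analysis described here can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the front end of any particular recognizer: the 25 ms frame length, 10 ms hop, Hamming window, test tone and all function names are hypothetical choices for the example, and a naive DFT stands in for an FFT to keep the sketch dependency-free.

```python
import cmath
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping fixed-length frames."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames

def log_magnitude_spectrum(frame):
    """Naive DFT of one Hamming-windowed frame; log magnitudes for bins 0..N/2."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(math.log(abs(acc) + 1e-10))
    return spectrum

# Hypothetical input: 100 ms of a 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
signal = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(800)]

FRAME_LEN = 200  # 25 ms at 8 kHz (assumed)
HOP = 80         # 10 ms at 8 kHz (assumed)
frames = frame_signal(signal, FRAME_LEN, HOP)

# One spectral profile per frame: the unit a serial-frame model classifies.
profiles = [log_magnitude_spectrum(f) for f in frames]

# Locate the strongest frequency bin in the first frame.
peak_bin = max(range(len(profiles[0])), key=lambda k: profiles[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
```

Each entry of `profiles` is computed from its own short window only, which is the sense in which the serial-frame view treats speech as a sequence of independent spectral snapshots.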


Phonemic Beads on a String Illustrated

In traditional models of speech recognition words are conceptualized as mere sequences of phonetic segments (“phones”) …

Strung together like “beads on a string”

No quarter is provided for stress accent or other syllabic properties


Language - The Traditional Perspective

The “classical” view of spoken language posits a quasi-arbitrary relation between the lower and higher tiers of linguistic organization

Cat = [k] + [ae] + [t]

Cat = /k/ + /ae/ + /t/

ASR systems focus on decoding words from sequences of phones
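A toy version of this "words as phone sequences" view might look like the sketch below. The mini-lexicon, its canonical entries and the function name are hypothetical, introduced only for illustration; a real recognizer scores many hypotheses probabilistically rather than doing an exact lookup.

```python
# Hypothetical mini-lexicon: one assumed canonical phone sequence per word.
LEXICON = {
    ("k", "ae", "t"): "cat",
    ("ae", "n", "d"): "and",
    ("dh", "ax"): "the",
}

def decode_word(phones):
    """Return the word whose canonical phone sequence matches exactly, else None."""
    return LEXICON.get(tuple(phones))

canonical = decode_word(["k", "ae", "t"])  # matches the single stored sequence
reduced = decode_word(["ae", "n"])         # a reduced form of "and" finds no entry
```

The second lookup fails because the lexicon stores only one sequence per word, which is the weakness that pronunciation variability exposes.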


A Challenge for the “Phonemic Beads on a String” Approach to Speech Recognition

Pronunciation Variability


Pronunciation Variability of Real Speech

Pronunciation patterns encountered in everyday life are extremely diverse

There are literally dozens of ways in which common words are pronounced (as the following two slides illustrate for the word “AND” based on manual phonetic annotation of a corpus comprising telephone dialogues)


How Many Pronunciations of “and”?

N    Pronunciation
82   ae n
63   eh n
45   ix n
35   ax n
34   en
30   n
20   ae n dcl d
17   ih n
17   q ae n
11   ae n d
7    q eh n
7    ae nx
6    ae ae n
6    ah n
5    eh nx
4    uh n
4    ix nx
4    q ae n dcl d
3    eh n d
3    q ae nx
3    eh
2    ae n dcl
2    ae
2    ax m
2    ax n d
2    ae eh n dcl d
2    eh n dcl d
2    ax nx
2    q ae ae n
2    q ix n
2    ix n dcl d
2    ih
2    eh eh n
2    q eh nx
2    ix d n
1    eh m
1    ax n dcl d
1    aw n
1    ae q
1    eh dcl
1    ah nx
1    ae n t
1    eh d
1    ah n dcl d
1    ey ih n dcl
1    ae ix n
1    ae nx ax
1    ax ng
1    ay n
1    ih ah n d
1    ae hh
1    ih ng
1    ix
1    ae n d dcl
1    ix dcl d
1    ae eh n
1    hh n
1    ix n t
1    ae ax n dcl d
1    iy eh n
1    m
1    ae ae n d
1    nx
1    q ae ae n
1    q ae ae n dcl d
1    q ae eh n dcl d
1    q ae ih n
1    aa n
1    q ae n d
1    ? nx
1    q ae n q
1    eh n m
1    q eh en dcl
1    eh ng
1    q eh n q
1    em
1    q eh ow m
1    q ih n
1    q ix en
1    er

[Slide callout: Canonical pronunciation]
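As a rough way to summarize how skewed this distribution is, the counts listed on these two slides can be tallied as in the sketch below. The variable and function names are hypothetical, and the counts are taken only from the rows shown here, so the totals describe this listing and it may not be exhaustive.

```python
# Token counts (N) per listed pronunciation of "and", transcribed from the two slides.
counts = [82, 63, 45, 35, 34, 30, 20, 17, 17, 11,
          7, 7, 6, 6, 5, 4, 4, 4, 3, 3, 3]
counts += [2] * 14   # fourteen rows with N = 2
counts += [1] * 45   # forty-five rows with N = 1

total_tokens = sum(counts)
listed_variants = len(counts)
top_share = counts[0] / total_tokens  # share of tokens taken by the top row

def variants_needed(sorted_counts, fraction):
    """Smallest number of most-frequent variants covering `fraction` of the tokens."""
    target = fraction * sum(sorted_counts)
    running = 0
    for i, c in enumerate(sorted_counts, start=1):
        running += c
        if running >= target:
            return i
    return len(sorted_counts)

half_cover = variants_needed(sorted(counts, reverse=True), 0.5)
print(f"{listed_variants} rows, {total_tokens} tokens, top row {top_share:.1%}")
```

A handful of forms cover half of the listed tokens, while dozens of forms occur only once or twice.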


Pronunciation Variability of Real Speech

There are literally dozens of ways in which common words are pronounced, as the following slide illustrates for the 20 most frequent words from the same corpus (Switchboard), which together account for 35% of the word tokens in the corpus


How Many Different Pronunciations?

Rank  Word     N    #Pron  MCP % of Total  Most Common Pronunciation (MCP)
1     I        649  53     53              ay
2     and      521  87     16              ae n
3     the      475  76     27              dh ax
4     you      406  68     20              y ix
5     that     328  117    11              dh ae
6     a        319  28     64              ax
7     to       288  66     14              tcl t uw
8     know     249  34     56              n ow
9     of       242  44     21              ax v
10    it       240  49     22              ih
11    yeah     203  48     43              y ae
12    in       178  22     45              ih n
13    they     152  28     60              dh ey
14    do       131  30     54              dcl d uw
15    so       130  14     74              s ow
16    but      123  45     12              bcl b ah tcl t
17    is       120  24     50              ih z
18    like     119  19     46              l ay kcl k
19    have     116  22     54              hh ae v
20    was      111  24     23              w ah z
21    we       108  13     83              w iy
22    it's     101  14     20              ih tcl s
23    just     101  34     17              jh ix s
24    on       98   18     49              aa n
25    or       94   23     36              er
26    not      92   24     24              m aa q
27    think    92   23     32              th ih ng kcl k
28    for      87   19     46              f er
29    well     84   49     23              w eh l
30    what     82   40     14              w ah dx
31    about    77   46     12              ax bcl b aw
32    all      74   27     24              ao l
33    that's   74   19     16              dh eh s
34    oh       74   17     61              ow
35    really   71   25     45              r ih l iy
36    one      69   8      78              w ah n
37    are      68   19     42              er
38    I'm      67   9      26              q aa m
39    right    61   21     28              r ay
40    uh       60   16     41              ah
41    them     60   18     23              ax m
42    at       59   36     8               ae dx
43    there    58   28     22              dh eh r
44    my       58   9      66              m ay
45    mean     56   10     58              m iy n
46    don't    56   21     14              dx ow
47    no       55   8      77              n ow
48    with     55   20     35              w ih th
49    if       55   18     41              ih f
50    when     54   18     31              w eh n
51    can      54   28     15              kcl k ae n
52    then     51   19     38              dh eh n
53    be       50   11     76              bcl b iy
54    as       49   16     18              ae z
55    out      47   19     22              ae dx
56    kind     47   17     21              kcl k ax nx
57    because  46   31     15              kcl k ax z
58    people   45   21     44              pcl p iy pcl l el
59    go       45   5      83              gcl g ow
60    got      45   32     15              gcl g aa
61    this     44   11     47              dh ih s
62    some     43   4      48              s ah m
63    would    41   16     29              w ih dcl
64    things   41   15     52              th ih ng z
65    now      39   11     69              n aw
66    lot      39   9      47              l aa dx
67    had      39   19     24              hh ae dcl
68    how      39   11     53              hh aw
69    good     38   13     27              gcl g uh dcl
70    get      38   20     13              gcl g eh dx
71    see      37   6      80              s iy
72    from     36   10     28              f r ah m
73    he       36   7      39              iy
74    me       35   5      87              m iy
75    don't    35   21     14              dx ow
76    their    33   19     25              dh eh r
77    more     32   11     56              m ao r
78    it's     31   14     20              ih tcl s
79    that's   31   20     16              dh eh s
80    too      31   6      60              tcl t uw
81    okay     31   17     45              ow kcl k ey
82    very     30   11     36              v eh r iy
83    up       30   11     34              ah pcl p
84    been     30   11     51              bcl b ih n
85    guess    29   8      42              gcl g eh s
86    time     29   8      62              tcl t ay m
87    going    29   21     13              gcl g ow ih ng
88    into     28   20     14              ih n tcl t uw
89    those    27   12     42              dh ow z
90    here     27   11     25              hh iy er
91    did      27   13     23              dcl d ih dx
92    work     25   8      66              w er kcl k
93    other    25   14     26              ah dh er
94    an       25   12     28              ax n
95    I've     25   7      46              ay v
96    thing    24   9      52              th ih ng
97    even     24   7      40              iy v ix n
98    our      23   9      33              aa r
99    any      23   11     23              ix n iy
100   we're    23   8      25              w ey r

The 20 most frequent words account for 35% of the tokens
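The figures for the 20 most frequent words can be cross-checked with a few lines of arithmetic. The sketch below uses the N, #Pron and MCP % values from the slide; the variable names are hypothetical, and the "implied corpus size" is only a back-of-the-envelope estimate that assumes the rounded 35% figure is exact.

```python
# (word, N tokens, number of distinct pronunciations, % of tokens with the most common one)
TOP20 = [
    ("I", 649, 53, 53), ("and", 521, 87, 16), ("the", 475, 76, 27),
    ("you", 406, 68, 20), ("that", 328, 117, 11), ("a", 319, 28, 64),
    ("to", 288, 66, 14), ("know", 249, 34, 56), ("of", 242, 44, 21),
    ("it", 240, 49, 22), ("yeah", 203, 48, 43), ("in", 178, 22, 45),
    ("they", 152, 28, 60), ("do", 131, 30, 54), ("so", 130, 14, 74),
    ("but", 123, 45, 12), ("is", 120, 24, 50), ("like", 119, 19, 46),
    ("have", 116, 22, 54), ("was", 111, 24, 23),
]

top20_tokens = sum(n for _, n, _, _ in TOP20)

# If these tokens are 35% of the annotated corpus, the implied size is roughly:
implied_corpus_tokens = round(top20_tokens / 0.35)

# Token-weighted share of tokens spoken with their word's most common pronunciation.
weighted_mcp = sum(n * pct for _, n, _, pct in TOP20) / top20_tokens

# Word with the largest number of distinct pronunciations.
most_variable = max(TOP20, key=lambda row: row[2])

print(f"top-20 tokens: {top20_tokens}, weighted MCP: {weighted_mcp:.1f}%")
```

The token-weighted figure gives a sense of how often even the single most common form of a frequent word is actually used.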


QUESTION

How do listeners decode the speech signal given the large amount of pronunciation variation?


PART ONE

Anatomy of a Syllable


Language - A Syllable-Centric Perspective

A more empirically grounded perspective of spoken language focuses on the SYLLABLE as the interface between “sound” and “meaning”

Within this framework the relationship between the syllable and the higher and lower tiers is non-arbitrary and statistically systematic


The Importance of the Syllable

The analyses to follow are all linked, in some fashion, to syllable structure

In order to highlight patterns germane to variation in segmental duration it is necessary to partition the data in terms of syllable position (as well as stress accent level)

As a consequence, we will examine the onsets, codas and nuclei of syllables separately in order to gain insight into the underlying patterns

What is an onset? What is a nucleus? What is a coda?

The following slides provide a brief (and gentle) introduction to syllable structure


Syllable and Phonetic Segment Illustrated

Syllables generally consist of three constituents - ONSET, NUCLEUS, CODA

Virtually all syllables contain a NUCLEUS, which is VOCALIC (by definition)

Most (but not all) syllables also contain an ONSET (usually a CONSONANT)

Many syllables contain a CODA (also typically a CONSONANT)

The most common syllable form in English is Onset + Nucleus + Coda (“Nine”)

Followed in popularity by Onset + Nucleus (“Two”)

Onset segments often differ in significant ways from coda segments

[Figure legend: “J” = JUNCTURE; OGI Numbers95 corpus]
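The three-way division described on this slide can be sketched as a simple split around the vocalic segment(s) of a syllable. The vocalic symbol set below is an assumed, incomplete list of ARPAbet-style labels, the function name is hypothetical, and the phone strings for “nine” and “two” are simplified (closure labels omitted).

```python
# Assumed (incomplete) set of vocalic ARPAbet-style symbols, for illustration only.
VOCALIC = {"aa", "ae", "ah", "ao", "aw", "ax", "ay", "eh", "er", "ey",
           "ih", "ix", "iy", "ow", "oy", "uh", "uw", "en", "el", "em"}

def split_syllable(phones):
    """Split one syllable's phones into (onset, nucleus, coda) around its first vocalic run."""
    start = next((i for i, p in enumerate(phones) if p in VOCALIC), None)
    if start is None:
        return phones, [], []  # no vocalic segment found: treat everything as onset
    end = start
    while end < len(phones) and phones[end] in VOCALIC:
        end += 1
    return phones[:start], phones[start:end], phones[end:]

nine = split_syllable(["n", "ay", "n"])  # onset + nucleus + coda
two = split_syllable(["t", "uw"])        # onset + nucleus, empty coda
```

Keeping the three constituents separate is what allows onset and coda segments to be analysed on their own in the sections that follow.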


PART TWO

Spectro-Temporal Profiles


The Spectro-Temporal Profile (STeP)

Certain specific (and important) properties of the syllable are not well represented in terms of the traditional 2.5-D spectrographic representation

Stress Accent and Juncture are two such properties

A different representation, based on the log, critical-band energy profile across frequency and time, can provide the requisite detail

As shown in “miniature” below ….

(and as shown in expanded form on the following slides)

STePs are derived from averages of hundreds of individual instances
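The slides do not spell out how a STeP is computed, so the following is only a sketch of the general idea of averaging log band-energy grids across many instances. It assumes that each instance has already been passed through a critical-band filterbank and time-aligned to a common number of frames, and that averaging is done in the log domain; the function names and the tiny data set are hypothetical.

```python
import math

def log_band_energies(band_energies):
    """Convert a time-by-band grid of linear energies to log (dB) values."""
    return [[10.0 * math.log10(e + 1e-12) for e in frame] for frame in band_energies]

def average_profiles(instances):
    """Average equally sized time-by-band grids element-wise across instances."""
    n_frames = len(instances[0])
    n_bands = len(instances[0][0])
    mean = [[0.0] * n_bands for _ in range(n_frames)]
    for grid in instances:
        for t in range(n_frames):
            for b in range(n_bands):
                mean[t][b] += grid[t][b] / len(instances)
    return mean

# Hypothetical data: three instances, each 2 frames x 3 bands of linear band energy.
instances = [
    [[1.0, 10.0, 100.0], [10.0, 10.0, 10.0]],
    [[1.0, 10.0, 100.0], [100.0, 100.0, 100.0]],
    [[1.0, 10.0, 100.0], [1000.0, 1000.0, 1000.0]],
]

step = average_profiles([log_band_energies(g) for g in instances])
```

Averaging over many instances smooths out token-specific detail and leaves the energy pattern that is typical of the word or syllable across frequency and time.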


Spectro-Temporal Profile - Disyllabic Word

[Figure: STeP of the word “Seven”, segments [s] [eh] [vx] [en], full-spectrum perspective, OGI Numbers95. Labels: juncture, accented syllable, unaccented syllable, mean duration.]

[Figure: STeP of the word “Seven”, segments [s] [eh] [vx] [en], high-frequency perspective, OGI Numbers95. Labels: juncture, accented syllable, unaccented syllable, mean duration.]


PART THREE

Scientific Approach to Speech Recognition


A Scientific Approach to Speech Recognition

Ascertain the contribution of
(1) phonetic segment (and feature) classification,
(2) phonetic segmentation,
(3) stress accent, and
(4) syllable position
to ASR performance

Using the OGI Numbers95 Corpus as a controlled (limited vocabulary) corpus

And a relatively transparent recognition engine utilizing the following variety of articulatory-based features: manner and place of articulation, voicing, vowel height, lip-rounding, spectral dynamics, segment length

That are explicitly tied to syllable position (i.e., onset, nucleus and coda) and stress-accent level

We will be comparing the “baseline” system (entirely automatic recognition) with an entirely “fabricated” set of input data (derived from hand-labeled phonetic annotation + autoSAL), as well as with a “half-way house” system that is partially automatic and partially manual (the phonetic segmentation, and whether each segment is vocalic or not, are manually derived)


Entirely Stress-Accent Dependent Results

System            Word Error Rate
Fabricated        1.3%
Half-way House    2.0%
Baseline          5.6%

The half-way house system is much closer in performance to the fabricated-data version than to the baseline system, suggesting that accurate phonetic segmentation is extremely important for enhanced ASR performance, as is knowledge of the location of the syllabic nucleus

Stress-accent information is most important for the vocalic nucleus; without it, WER increases by 10-20%

It is also important for the coda; without it, WER increases by 7-15%

Numbers95 Recognition – Stress Accent Impact
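As a rough check on the "much closer" observation, the following sketch computes how much of the gap between the baseline and the fabricated condition is closed by the half-way house system. It uses only the three word error rates listed above; the helper name `gap_closed` is illustrative and not part of the original system.

```python
# Word error rates (%) taken from the results listed above.
WER = {"Fabricated": 1.3, "Half-way House": 2.0, "Baseline": 5.6}


def gap_closed(system: str, worst: str = "Baseline", best: str = "Fabricated") -> float:
    """Fraction of the worst-to-best WER gap recovered by `system` (hypothetical helper)."""
    return (WER[worst] - WER[system]) / (WER[worst] - WER[best])


if __name__ == "__main__":
    share = gap_closed("Half-way House")
    print(f"Half-way House closes {share:.0%} of the Baseline-to-Fabricated gap")
```

By this measure the half-way house system recovers most of the gap, which is what the slide's qualitative statement describes.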

Page 65: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Effect of pronunciation variation as a function of syllable position, where the “canonical” pronunciation is potentially fixed for each syllable position separately (or “All” together)

“Standard” refers to the regular recognition system

Word Error Rate (%)

System            Standard   Onset   Nucleus   Coda   All
Fabricated        1.29       1.33    1.61      1.63   1.76
Half-way House    1.97       2.16    2.21      2.55   2.81
Baseline          5.59       5.91    5.91      6.70   7.03

Conclusions:

Onset segments are most canonical

Coda segments are least canonical

Therefore, it is important to provide for pronunciation variation in an ASR system

Numbers95 Recognition – Pronunciation Impact
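One way to read the table is to express each column as a percent increase over the "Standard" column. The sketch below does this with the figures from the table; the function name and the choice of relative (rather than absolute) increase are assumptions made for illustration.

```python
# WER (%) from the table above; columns are Standard, Onset, Nucleus, Coda, All.
COLUMNS = ["Standard", "Onset", "Nucleus", "Coda", "All"]
WER = {
    "Fabricated": [1.29, 1.33, 1.61, 1.63, 1.76],
    "Half-way House": [1.97, 2.16, 2.21, 2.55, 2.81],
    "Baseline": [5.59, 5.91, 5.91, 6.70, 7.03],
}


def relative_increase(system: str) -> dict:
    """Percent WER increase over the Standard column for each condition."""
    standard, *others = WER[system]
    return {
        name: 100.0 * (value - standard) / standard
        for name, value in zip(COLUMNS[1:], others)
    }


if __name__ == "__main__":
    for system in WER:
        row = relative_increase(system)
        print(system, {name: round(pct, 1) for name, pct in row.items()})
```

In all three systems the onset column never shows a larger increase than the nucleus or coda columns, and the coda shows the largest single-position increase, consistent with the conclusions above.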

Page 68: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Effect of pronunciation variation as a function of syllable position, where each syllabic constituent is “neutralized” with respect to lexical matching (i.e., each element is factored out of the decoding process separately)

“Standard” refers to the regular recognition system

Word Error Rate (%)

System            Standard   Onset   Nucleus   Coda
Fabricated        1.29       9.70    5.95      3.92
Half-way House    1.97       11.27   13.28     6.60
Baseline          5.59       15.70   20.22     10.13

Neutralization of the onset and nucleus elements exerts a greater impact on ASR performance than neutralization of the coda

Conclusion: Onsets and nuclei are most important for lexical access in an ASR system

(at least for the Numbers95 corpus)

Numbers95 – Syllable Position Importance
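The ordering behind this conclusion can be made explicit by ranking the three syllabic constituents by the WER observed when each is neutralized. The sketch below uses the values from the table above; `rank_constituents` is a hypothetical helper introduced only for this illustration.

```python
# WER (%) from the neutralization table above.
WER = {
    "Fabricated": {"Standard": 1.29, "Onset": 9.70, "Nucleus": 5.95, "Coda": 3.92},
    "Half-way House": {"Standard": 1.97, "Onset": 11.27, "Nucleus": 13.28, "Coda": 6.60},
    "Baseline": {"Standard": 5.59, "Onset": 15.70, "Nucleus": 20.22, "Coda": 10.13},
}


def rank_constituents(system: str) -> list:
    """Constituents ordered from most to least damaging when neutralized."""
    row = WER[system]
    return sorted(("Onset", "Nucleus", "Coda"), key=lambda c: row[c], reverse=True)


if __name__ == "__main__":
    for system in WER:
        print(system, rank_constituents(system))
```

The coda ranks last in all three systems, while the onset and nucleus swap places between the fabricated condition and the other two.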

Page 69: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

PART FOUR

Being Phonetically and Prosodically Annotated

Page 79: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Phonetic Transcription of Spontaneous English

Telephone dialogues of 5-10 minutes duration, from the SWITCHBOARD corpus, have been phonetically transcribed (labeled and segmented)

Most of this material has been annotated manually

4 hours labeled at the phone level and segmented at the syllabic level

1 hour labeled and segmented at the phonetic-segment level

The remaining material segmented at the phonetic-segment level using automatic methods

45 minutes of hand-labeled stress-accent material

An additional four hours of stress-accent material automatically labeled (though unused in the current analysis)

There is a Lot of Diversity in the Material Transcribed

Spans speech of both genders (ca. 50/50%), reflecting a wide range of American dialectal variation, speaking rate and voice quality

Transcription System

A variant of Arpabet, with phonetic diacritics such as: _gl, _cr, _fr, _n, _vl, _vd

Page 81: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Phonetic Transcription of Spontaneous English

The data are available at:

http://www.icsi.berkeley.edu/real/stp

Page 91: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Annotation of Stress Accent

Forty-five minutes of the phonetically annotated portion of the Switchboard corpus was manually labeled with respect to stress accent

Three levels of accent were distinguished:

Heavy Light None

(In actuality, labelers assigned a “1” to fully accented syllables, a “null” to completely unaccented syllables, and a “0.5” to all others)

An example of the annotation (attached to the vocalic nucleus) is shown below (where the accent levels could not be derived from a dictionary)

In this example most of the syllables are unaccented, with two labeled as lightly accented (0.5) and one other labeled as very lightly accented (0.25)
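The relation between the labelers' numeric values and the three named accent levels can be sketched as a small mapping. This is only an illustration: treating a "null" label as Python `None`, treating 0 as equivalent to null, and assigning every intermediate value (such as 0.5 or 0.25) to "Light" are assumptions, and `accent_category` is a hypothetical name.

```python
from typing import Optional


def accent_category(label: Optional[float]) -> str:
    """Map a labeler's numeric accent value onto Heavy / Light / None.

    Assumption: any value strictly between 0 and 1 (e.g. 0.5 or 0.25)
    is grouped under "Light"; a missing ("null") label means no accent.
    """
    if label is None or label == 0:
        return "None"
    if label >= 1:
        return "Heavy"
    return "Light"


if __name__ == "__main__":
    for value in (1, 0.5, 0.25, None):
        print(value, "->", accent_category(value))
```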

Page 93: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Annotation of Stress Accent

The data are available at:

http://www.icsi.berkeley.edu/~steveng/prosody

Page 94: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Automatic Labeling of Stress Accent

This forty-five minutes of hand-labeled phonetic and prosodic annotation from the Switchboard corpus was used as training data for development of an Automatic Stress Accent Labeling System (AutoSAL)

Page 97: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

How Good is AutoSAL?

There is a 79% concordance between human and machine accent labels when the tolerance level is a quarter-step

There is 97.5% concordance when the tolerance level is half a step

This degree of concordance is as high as that exhibited by two highly trained (human) transcribers
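A tolerance-based concordance of this kind can be sketched as the fraction of syllables whose human and machine labels differ by no more than the tolerance. The sketch assumes accent labels on a 0 to 1 scale (with "null" coded as 0.0), a quarter-step of 0.25 and a half-step of 0.5; the slides do not spell out the scoring procedure, and the label sequences below are made-up toy values, not corpus data.

```python
def concordance(human, machine, tolerance):
    """Fraction of syllables whose two labels differ by at most `tolerance`."""
    if len(human) != len(machine):
        raise ValueError("label sequences must be the same length")
    if not human:
        raise ValueError("no labels to compare")
    agree = sum(abs(h - m) <= tolerance for h, m in zip(human, machine))
    return agree / len(human)


if __name__ == "__main__":
    # Toy labels for five syllables (illustrative only).
    human = [1.0, 0.5, 0.0, 0.0, 0.5]
    machine = [0.75, 0.5, 0.5, 0.0, 1.0]
    print("quarter-step:", concordance(human, machine, 0.25))
    print("half-step:", concordance(human, machine, 0.5))
```

Widening the tolerance can only keep or raise the score, which is why the half-step figure is necessarily at least as high as the quarter-step figure.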

Page 98: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

PART FIVE

Stress Accent and Syllable Position

Page 100: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

The Importance of Syllable Structure

Before going into the details of durational variation at the segmental level, we briefly examine some general patterns of pronunciation variation that are conditioned by syllable position and stress accent

These data serve to illustrate the sort of variation observed that is conditioned by position within the syllable

Page 103: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Pronunciation Variation – Syllable and Accent

Pronunciation variation is systematic at the level of the syllable

Particularly when stress accent is also taken into account

BOTH syllable structure and accent level are required for a full accounting

[Figure: pronunciation variation by syllable position (onset, nucleus and coda territories); panels: All Segments, Deletions, Insertions, Substitutions]

Page 104: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

PART SIX

Durational Properties of

Pronunciation Variation

Page 108: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Analysis of Durational Properties of Speech

The following analyses are conditioned on stress accent level and (for the most part) syllable position

We’ll begin with analyses illustrating the patterns associated with three levels of stress accent (heavy, light and none) to show the graded nature of the durational properties pertaining to syllable and segment duration

However, for purposes of illustrative clarity, many of the slides will show only two levels of accent (heavy and none) in order to delineate the differences in duration associated with stress accent level

Under such conditions, the durational properties associated with light accent are generally intermediate between heavy accent and none

Page 112: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Syllable Duration - Across Syllable Forms

There is a broad range of syllable structures observed in spoken English

The CV and CVC forms cover ca. 60% of the syllables

Together, the V, VC, CV and CVC forms account for 85% of syllables

The CVCC and CCVC (complex syllable) forms account for another 10%

V = Vowel, C = Consonant
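Coverage figures like these come from reducing each syllable to its consonant/vowel skeleton and counting the forms. The sketch below shows one way this could be done for Arpabet-style phone labels. The vowel inventory is the standard Arpabet set (plus the reduced vowels ax and ix), the convention of stripping anything after an underscore as a diacritic is an assumption, and the toy syllables are illustrative rather than drawn from the corpus.

```python
from collections import Counter

# Arpabet vowel symbols (base set; the corpus uses a variant with diacritics).
ARPABET_VOWELS = {
    "aa", "ae", "ah", "ao", "aw", "ax", "ay", "eh", "er",
    "ey", "ih", "ix", "iy", "ow", "oy", "uh", "uw",
}


def syllable_form(phones):
    """Return the CV skeleton of a syllable, e.g. ["s", "ih", "k", "s"] -> "CVCC"."""
    skeleton = []
    for phone in phones:
        base = phone.lower().split("_")[0]  # assumed: diacritics follow an underscore
        skeleton.append("V" if base in ARPABET_VOWELS else "C")
    return "".join(skeleton)


def form_shares(syllables):
    """Proportion of syllables belonging to each CV form."""
    counts = Counter(syllable_form(s) for s in syllables)
    total = sum(counts.values())
    return {form: n / total for form, n in counts.items()}


if __name__ == "__main__":
    toy = [["t", "uw"], ["ay"], ["t", "uw"], ["s", "ih", "k", "s"]]
    print(form_shares(toy))
```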

Page 116: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Syllable Duration - Across Syllable Forms

It is unsurprising that syllable duration is largely a function of the number of segments within the syllable (as shown in the graph below)

Note the systematic lengthening of the syllable for each form as the accent level increases from “NONE” to “LIGHT” to “HEAVY”

This pattern is representative of accent’s impact on duration (as we’ll see)

[Graph: syllable duration by canonical syllable form and accent level; V = Vowel, C = Consonant]

Page 120: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Syllable Duration - Accent Level/Syllable Form

Canonical Syllable Forms

This graph shows the same data as the previous slides, but from the perspective of just two accent levels (“HEAVY” and “NONE”)

The heavily accented syllables are generally 60-100% longer than their unaccented counterparts

The disparity in duration is most pronounced for syllable forms with one or no consonants (i.e., V, VC, CV)

This pattern implies that accent has the greatest impact on vocalic duration

V = Vowel, C = Consonant
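For reference, "60-100% longer" corresponds to a heavy-to-none duration ratio between 1.6 and 2.0. The short sketch below makes the arithmetic explicit; the durations in the usage lines are placeholder values chosen to hit those two endpoints, not measurements from the corpus, and `percent_longer` is a hypothetical helper.

```python
def percent_longer(accented_ms: float, unaccented_ms: float) -> float:
    """Percent by which the accented duration exceeds the unaccented one."""
    if unaccented_ms <= 0:
        raise ValueError("unaccented duration must be positive")
    return 100.0 * (accented_ms - unaccented_ms) / unaccented_ms


if __name__ == "__main__":
    # Placeholder durations in milliseconds (illustrative only).
    print(percent_longer(160.0, 100.0))  # ratio 1.6
    print(percent_longer(200.0, 100.0))  # ratio 2.0
```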

Page 123: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Canonical Syllable Forms

Nucleus Duration - Accent Level/Syllable Form

The hypothesis delineated on the previous slide (that accent has the most profound impact on vocalic duration) is confirmed in the graph below

Vowels in accented syllables (of all forms) are at least twice as long as their unaccented counterparts

This pattern implies that the syllable nucleus absorbs a major component of accent’s impact (at least as far as duration is concerned)

Page 124: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

PART SEVEN

Stress Accent and the Vocalic Nucleus

Page 125: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Because the pattern of stress accent’s impact on vocalic duration is relatively uniform across syllable form, it is likely that the specific structure of the syllable has relatively little impact on vocalic duration

Stress Accent’s Impact on the Vocalic Nucleus

Page 126: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Because the pattern of stress accent’s impact on vocalic duration is relatively uniform across syllable form, it is likely that the specific structure of the syllable has relatively little impact on vocalic duration

As a consequence, the remaining analyses pertaining to accent’s impact on vocalic duration collapse the data across syllable form

Stress Accent’s Impact on the Vocalic Nucleus

Page 127: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Because the pattern of stress accent’s impact on vocalic duration is relatively uniform across syllable form, it is likely that the specific structure of the syllable has relatively little impact on vocalic duration

As a consequence, the remaining analyses pertaining to accent’s impact on vocalic duration collapse the data across syllable form

We now examine vocalic duration in somewhat greater detail and illustrate how duration, stress accent and vocalic identity interact

Stress Accent’s Impact on the Vocalic Nucleus

Page 128: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

The Spatial Patterning of Duration in Vocalic Nuclei

Page 129: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Vowel quality is generally thought to be a function primarily of two articulatory properties – both related to the motion of the tongue

A Brief Primer on Vocalic Acoustics

Page 130: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Vowel quality is generally thought to be a function primarily of two articulatory properties – both related to the motion of the tongue

• The front-back plane is most closely associated with the second formant frequency (or more precisely F2 - F1) and the volume of the front-cavity resonance

A Brief Primer on Vocalic Acoustics

Page 131: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Vowel quality is generally thought to be a function primarily of two articulatory properties – both related to the motion of the tongue

• The front-back plane is most closely associated with the second formant frequency (or more precisely F2 - F1) and the volume of the front-cavity resonance

• The height parameter is closely linked to the frequency of F1

A Brief Primer on Vocalic Acoustics

Page 132: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Vowel quality is generally thought to be a function primarily of two articulatory properties – both related to the motion of the tongue

• The front-back plane is most closely associated with the second formant frequency (or more precisely F2 - F1) and the volume of the front-cavity resonance

• The height parameter is closely linked to the frequency of F1

In the classic vowel “triangle,” segments are positioned in terms of the tongue positions associated with their production, as follows:

A Brief Primer on Vocalic Acoustics

Page 133: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Vowel quality is generally thought to be a function primarily of two articulatory properties – both related to the motion of the tongue

• The front-back plane is most closely associated with the second formant frequency (or more precisely F2 - F1) and the volume of the front-cavity resonance

• The height parameter is closely linked to the frequency of F1

In the classic vowel “triangle,” segments are positioned in terms of the tongue positions associated with their production, as follows:

A Brief Primer on Vocalic Acoustics
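The two relations in the primer can be sketched as a mapping from formant frequencies to rough positions in the vowel space. This is a minimal illustration, assuming F2 - F1 as the front-back proxy and the negated F1 as the height proxy (F1 falls as the vowel gets higher). The function name, the vowel labels and the formant values are illustrative round numbers, not measurements from this study.

```python
def vowel_coordinates(f1_hz, f2_hz):
    """Return (front_back, height) proxies for a vowel.

    front_back: F2 - F1 in Hz; larger values mean a more front vowel.
    height:     -F1 in Hz; larger (less negative) values mean a higher vowel.
    """
    return (f2_hz - f1_hz, -f1_hz)

# Illustrative (F1, F2) values in Hz for the corners of the vowel "triangle".
# These are assumptions chosen for the example only.
ILLUSTRATIVE_FORMANTS = {
    "iy": (280, 2250),  # high front
    "aa": (710, 1100),  # low
    "uw": (310, 870),   # high back
}

coords = {v: vowel_coordinates(*f) for v, f in ILLUSTRATIVE_FORMANTS.items()}
for vowel, (front_back, height) in coords.items():
    print(f"{vowel}: front-back proxy = {front_back} Hz, height proxy = {height} Hz")
```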

Page 134: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

In the following slides duration is plotted on a 2-D grid, where the x-axis represents the (hypothetical) front-back tongue position

Spatial Patterning of Duration et al.

Page 135: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

In the following slides duration is plotted on a 2-D grid, where the x-axis represents the (hypothetical) front-back tongue position (and hence remains a constant throughout the plots to follow)

Spatial Patterning of Duration et al.

Page 136: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

In the following slides duration is plotted on a 2-D grid, where the x-axis represents the (hypothetical) front-back tongue position (and hence remains a constant throughout the plots to follow)

The y-axis serves as the dependent measure, expressed in terms of either duration or the proportion of fully stressed (or unstressed) nuclei

Spatial Patterning of Duration et al.
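The dependent measures described above (mean duration and the proportion of fully stressed nuclei, per vowel) could be tabulated along the following lines. This is a sketch under stated assumptions: the token list is invented, and the names `TOKENS` and `summarize` are hypothetical rather than taken from the original analysis.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tokens: (vowel label, duration in ms, accent level).
# Invented for illustration only.
TOKENS = [
    ("iy", 80, "none"), ("iy", 120, "heavy"), ("iy", 70, "none"),
    ("aa", 150, "heavy"), ("aa", 170, "heavy"), ("aa", 100, "none"),
]

def summarize(tokens):
    """Group tokens by vowel and compute the two y-axis measures."""
    by_vowel = defaultdict(list)
    for vowel, duration_ms, accent in tokens:
        by_vowel[vowel].append((duration_ms, accent))

    summary = {}
    for vowel, items in by_vowel.items():
        durations = [d for d, _ in items]
        n_heavy = sum(1 for _, a in items if a == "heavy")
        summary[vowel] = {
            "mean_duration_ms": mean(durations),
            "prop_fully_stressed": n_heavy / len(items),
        }
    return summary

summary = summarize(TOKENS)
for vowel, stats in summary.items():
    print(vowel, stats)
```

Each vowel's summary value would then be drawn at that vowel's front-back position on the x-axis.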

Page 137: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Vocalic Duration and Vowel Height

The spatial patterning of vocalic segments is systematic with respect to duration

Page 138: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Vocalic Duration and Vowel Height

The spatial patterning of vocalic segments is systematic with respect to duration

Low vowels, be they diphthongs or monophthongs, are longer (on average) than high vowels

Page 139: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Vocalic Duration and Vowel Height

All nuclei Diphthongs Monophthongs

The spatial patterning of vocalic segments is systematic with respect to duration

Low vowels, be they diphthongs or monophthongs, are longer (on average) than high vowels

Page 140: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Vocalic Duration and Vowel Height

All nuclei Diphthongs Monophthongs

The spatial patterning of vocalic segments is systematic with respect to duration

Low vowels, be they diphthongs or monophthongs, are longer (on average) than high vowels

Thus, duration appears to be highly correlated with vowel height

Page 141: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Vocalic Duration and Vowel Height

All nuclei Diphthongs Monophthongs

The spatial patterning of vocalic segments is systematic with respect to duration

Low vowels, be they diphthongs or monophthongs, are longer (on average) than high vowels

Thus, duration appears to be highly correlated with vowel height

But … the situation is a little more complicated than first appearances would suggest

Page 142: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Durational Differences - Stressed/Unstressed

There is a large dynamic range in duration between accented and unaccented vocalic nuclei

Canonical Syllable Forms

Page 143: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Durational Differences - Stressed/Unstressed

There is a large dynamic range in duration between accented and unaccented vocalic nuclei

Moreover, diphthongs and tense, low monophthongs tend to exhibit a larger dynamic range than the lax monophthongs

Canonical Syllable Forms

Page 144: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Durational Differences - Stressed/Unstressed

There is a large dynamic range in duration between accented and unaccented vocalic nuclei

Moreover, diphthongs and tense, low monophthongs tend to exhibit a larger dynamic range than the lax monophthongs

Canonical Syllable Forms

Lax monophthongs

Page 145: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Vocalic Identity Among Unstressed Nuclei

The high, lax monophthongs are almost always unstressed

Page 146: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Vocalic Identity Among Unstressed Nuclei

The high, lax monophthongs are almost always unstressed

The low vowels, be they monophthongs or diphthongs, are rarely unstressed

Page 147: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Vocalic Identity Among Unstressed Nuclei

The high, lax monophthongs are almost always unstressed

The low vowels, be they monophthongs or diphthongs, are rarely unstressed

The high diphthongs and high/mid, tense monophthongs occupy an intermediate position

Page 148: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

The high vowels are rarely fully stressed

Vocalic Identity Among Fully Stressed Nuclei

Page 149: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

The high vowels are rarely fully stressed

The low vowels, be they monophthongs or diphthongs, are far more likely to be fully stressed

Vocalic Identity Among Fully Stressed Nuclei

Page 150: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

The high vowels are rarely fully stressed

The low vowels, be they monophthongs or diphthongs, are far more likely to be fully stressed

An intermediate degree of stress accounts for the other vocalic instances

Vocalic Identity Among Fully Stressed Nuclei

Page 151: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

The high vowels are rarely fully stressed

The low vowels, be they monophthongs or diphthongs, are far more likely to be fully stressed

An intermediate degree of stress accounts for the other vocalic instances (but will not be addressed here)

Vocalic Identity Among Fully Stressed Nuclei

Page 152: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

The vowels of heavily accented syllables are (mostly) pronounced canonically

Canonical Pronunciations Non-Canonical Pronunciations

Vocalic Variation – Importance of Stress Accent

Page 153: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

The vowels of heavily accented syllables are (mostly) pronounced canonically

Low vowels are largely the province of accented syllables

Canonical Pronunciations Non-Canonical Pronunciations

Vocalic Variation – Importance of Stress Accent

Page 154: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

The vowels of heavily accented syllables are (mostly) pronounced canonically

Low vowels are largely the province of accented syllables, and

High vowels the province of unaccented syllables

Vocalic Variation – Importance of Stress Accent

Canonical Pronunciations Non-Canonical Pronunciations

Page 155: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

The vowels of heavily accented syllables are (mostly) pronounced canonically

Low vowels are largely the province of accented syllables, and

High vowels the province of unaccented syllables

Moreover, there’s a lexical bias towards high vowels for unaccented forms

Canonical Pronunciations Non-Canonical Pronunciations

Vocalic Variation – Importance of Stress Accent

Page 156: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

The vowels of heavily accented syllables are (mostly) pronounced canonically

Low vowels are largely the province of accented syllables, and

High vowels the province of unaccented syllables

Moreover, there’s a lexical bias towards high vowels for unaccented forms

That’s reinforced in patterns of deviation from canonical pronunciation

Canonical Pronunciations Non-Canonical Pronunciations

Vocalic Variation – Importance of Stress Accent

Page 157: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Vocalic Height Deviation from Canonical

Amount of Change Direction of Change

Vowels are more likely to RISE in height than to descend when unaccented

Page 158: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Vocalic Height Deviation from Canonical

Amount of Change Direction of Change

Vowels are more likely to RISE in height than to descend when unaccented

Vocalic lowering of height is rare

Page 159: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Vocalic Height Deviation from Canonical

Amount of Change Direction of Change

Vowels are more likely to RISE in height than to descend when unaccented

Vocalic lowering of height is rare

Most deviations from the canonical maintain vowel height

Page 160: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Vocalic Height Deviation from Canonical

Amount of Change Direction of Change

Vowels are more likely to RISE in height than to descend when unaccented

Vocalic lowering of height is rare

Most deviations from the canonical maintain vowel height

More than a single height step deviation is uncommon

Page 161: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Vocalic Height Deviation from Canonical

Amount of Change Direction of Change

Vowels are more likely to RISE in height than to descend when unaccented

Vocalic lowering of height is rare

Most deviations from the canonical maintain vowel height

More than a single height step deviation is uncommon

Virtually all 2-step height deviations occur in unaccented syllables
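The notions of "direction of change" and "height step" used above can be sketched as a simple tally. The example assumes a three-level height scale (low, mid, high), which is consistent with the mention of 2-step deviations but is not spelled out in the slides. The height assigned to each vowel label and the canonical/realized pairs are likewise assumptions made for illustration.

```python
from collections import Counter

# Assumed three-level height scale.
HEIGHT = {"low": 0, "mid": 1, "high": 2}

# Assumed heights for a few vowel labels (illustrative classification).
VOWEL_HEIGHT = {
    "iy": "high", "ih": "high",
    "eh": "mid",
    "ae": "low", "aa": "low",
}

def height_step(canonical, realized):
    """Signed number of height steps from the canonical to the realized vowel."""
    return HEIGHT[VOWEL_HEIGHT[realized]] - HEIGHT[VOWEL_HEIGHT[canonical]]

def tally(pairs):
    """Count deviations by (direction, number of steps)."""
    counts = Counter()
    for canonical, realized in pairs:
        step = height_step(canonical, realized)
        if step > 0:
            direction = "raised"
        elif step < 0:
            direction = "lowered"
        else:
            direction = "same height"
        counts[(direction, abs(step))] += 1
    return counts

# Hypothetical (canonical, realized) pairs, invented for illustration.
pairs = [("ae", "eh"), ("eh", "ih"), ("iy", "ih"), ("aa", "ih"), ("ih", "eh")]
counts = tally(pairs)
for key, n in sorted(counts.items()):
    print(key, n)
```

Splitting the pairs by accent level before calling `tally` would give the accented versus unaccented comparison drawn in the slides.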

Page 162: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

The Vowel Space Under (Full) Stress (Accent)

In accented nuclei there is a relatively even distribution of segments across the vowel space, with a slight bias towards the front and central vowels

Canonical Vowels Only

Page 163: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

In unaccented syllables vowels are confined largely to the high-front and high-central sectors of the articulatory space

The Vowel Space Without (Stress) Accent

Canonical Vowels Only

Page 164: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

In unaccented syllables vowels are confined largely to the high-front and high-central sectors of the articulatory space

The low and mid vowels “get creamed”

The Vowel Space Without (Stress) Accent

Canonical Vowels Only

Page 165: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Stress accent exerts a profound effect on the character of the vowel space

The Vowel Spaces Compared

Heavily Accented Unaccented

Canonical Vowels Only

Page 166: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Stress accent exerts a profound effect on the character of the vowel space

High vowels are largely associated with unaccented syllables

The Vowel Spaces Compared

Heavily Accented Unaccented

Canonical Vowels Only

Page 167: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Stress accent exerts a profound effect on the character of the vowel space

High vowels are largely associated with unaccented syllables

Low vowels are mostly associated with accented forms

The Vowel Spaces Compared

Heavily Accented Unaccented

Canonical Vowels Only

Page 168: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Stress accent exerts a profound effect on the character of the vowel space

High vowels are largely associated with unaccented syllables

Low vowels are mostly associated with accented forms

This distinction between accented and unaccented syllables is of profound importance for understanding (and modeling) pronunciation variation

The Vowel Spaces Compared

Heavily Accented Unaccented

Canonical Vowels Only

Page 169: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse

Is It Stress? Vocalic Identity? Or What?

Page 170: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse

For any given vocalic class, stressed segments are longer (on average)

Is It Stress? Vocalic Identity? Or What?

Page 171: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse

For any given vocalic class, stressed segments are longer (on average)

The durational disparity is most pronounced among the low vowels and the diphthongs

Is It Stress? Vocalic Identity? Or What?

Page 172: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse

For any given vocalic class, stressed segments are longer (on average)

The durational disparity is most pronounced among the low vowels and the diphthongs

Low vowels tend to be much longer in duration than high vowels

Is It Stress? Vocalic Identity? Or What?

Page 173: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse

For any given vocalic class, stressed segments are longer (on average)

The durational disparity is most pronounced among the low vowels and the diphthongs

Low vowels tend to be much longer in duration than high vowels

This is the case even for diphthongs

Is It Stress? Vocalic Identity? Or What?

Page 174: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse

For any given vocalic class, stressed segments are longer (on average)

The durational disparity is most pronounced among the low vowels and the diphthongs

Low vowels tend to be much longer in duration than high vowels

This is the case even for diphthongs

Low vowels are rarely without some measure of stress accent

Is It Stress? Vocalic Identity? Or What?

Page 175: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse

For any given vocalic class, stressed segments are longer (on average)

The durational disparity is most pronounced among the low vowels and the diphthongs

Low vowels tend to be much longer in duration than high vowels

This is the case even for diphthongs

Low vowels are rarely without some measure of stress accent

This is true for monophthongs as well as diphthongs

Is It Stress? Vocalic Identity? Or What?

Page 176: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse

For any given vocalic class, stressed segments are longer (on average)

The durational disparity is most pronounced among the low vowels and the diphthongs

Low vowels tend to be much longer in duration than high vowels

This is the case even for diphthongs

Low vowels are rarely without some measure of stress accent

This is true for monophthongs as well as diphthongs

High vowels are RARELY fully stressed

Is It Stress? Vocalic Identity? Or What?

Page 177: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse

For any given vocalic class, stressed segments are longer (on average)

The durational disparity is most pronounced among the low vowels and the diphthongs

Low vowels tend to be much longer in duration than high vowels

This is the case even for diphthongs

Low vowels are rarely without some measure of stress accent

This is true for monophthongs as well as diphthongs

High vowels are RARELY fully stressed

This is particularly so for monophthongs, but also applies to diphthongs

Is It Stress? Vocalic Identity? Or What?

Page 178: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse

For any given vocalic class, stressed segments are longer (on average)

The durational disparity is most pronounced among the low vowels and the diphthongs

Low vowels tend to be much longer in duration than high vowels

This is the case even for diphthongs

Low vowels are rarely without some measure of stress accent

This is true for monophthongs as well as diphthongs

High vowels are RARELY fully stressed

This is particularly so for monophthongs, but also applies to diphthongs

Thus, stress accent appears to be intricately involved with vocalic identity

Is It Stress? Vocalic Identity? Or What?

Page 179: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

PART EIGHT

Stress Accent’s Impact on Syllable Onsets

Page 180: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Stress Accent and Syllable Onsets

The onset is often cited as the key syllabic constituent with respect to “lexical access”

Page 181: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Stress Accent and Syllable Onsets

The onset is often cited as the key syllabic constituent with respect to “lexical access”

It is therefore of interest to ascertain how the onset’s duration behaves as a function of accent level

Page 182: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Stress Accent and Syllable Onsets

The onset is often cited as the key syllabic constituent with respect to “lexical access”

It is therefore of interest to ascertain how the onset’s duration behaves as a function of accent level

Because of the onset’s key role in lexical access one might assume that its duration would be relatively stable across accent level

Page 183: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Stress Accent and Syllable Onsets

The onset is often cited as the key syllabic constituent with respect to “lexical access”

It is therefore of interest to ascertain how the onset’s duration behaves as a function of accent level

Because of the onset’s key role in lexical access one might assume that its duration would be relatively stable across accent level

The following slides suggest that this assumption is INCORRECT

Page 184: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Stress Accent and Syllable Onsets

The onset is often cited as the key syllabic constituent with respect to “lexical access”

It is therefore of interest to ascertain how the onset’s duration behaves as a function of accent level

Because of the onset’s key role in lexical access one might assume that its duration would be relatively stable across accent level

The following slides suggest that this assumption is INCORRECT, and that the structure of the onset is more complex (and more interesting) than initial intuition would suggest

Page 185: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Canonical Syllable Forms

Onset Duration - Accent Level/Syllable Form

The duration of the syllable onset varies significantly as a function of accent level (though not quite as much as exhibited by vocalic constituents)

Page 186: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Canonical Syllable Forms

Onset Duration - Accent Level/Syllable Form

The duration of the syllable onset varies significantly as a function of accent level (though not quite as much as exhibited by vocalic constituents)

Onset duration is similar across syllable form

Page 187: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Canonical Syllable Forms

Onset Duration - Accent Level/Syllable Form

The duration of the syllable onset varies significantly as a function of accent level (though not quite as much as exhibited by vocalic constituents)

Onset duration is similar across syllable form (except that segments comprising complex onsets [i.e., CCVC] are slightly shorter)

Page 188: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Canonical Syllable Forms

Onset Duration - Accent Level/Syllable Form

The duration of the syllable onset varies significantly as a function of accent level (though not quite as much as exhibited by vocalic constituents)

Onset duration is similar across syllable form (except that segments comprising complex onsets [i.e., CCVC] are slightly shorter)

The duration of unaccented onsets is similar across syllable forms

Page 189: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Canonical Syllable Forms

Onset Duration - Accent Level/Syllable Form

Onsets of accented syllables are generally 50-60% longer than their unaccented counterparts

Page 190: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Canonical Syllable Forms

Onset Duration - Accent Level/Syllable Form

Onsets of accented syllables are generally 50-60% longer than their unaccented counterparts

Although this durational difference is not quite as large as observed for vocalic nuclei, it is still substantial (and mostly consistent across forms)

Page 191: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Place of Articulation – A Brief Primer

The tongue contacts (or nearly so) the roof of the mouth in producing many of the consonantal sounds in English

Anterior
Labial [p] [b] [m]
Labio-dental [f] [v]
Inter-dental [th] [dh]

Central
Alveolar [t] [d] [n] [s] [z]

Posterior
Palatal [sh] [zh]
Velar [k] [g] [ng]

From Daniloff (1973)
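For readers who want to group segments by place when working through the tables that follow, the primer can be written down as a small lookup structure. This is only an illustrative sketch: the names `PLACE_CLASSES` and `place_of` are hypothetical, and the mapping contains just the segments listed in the primer, so segments that appear only in later tables return `None`.

```python
# Place-of-articulation classes exactly as listed in the primer above.
# The segment labels are the transcription symbols used on the slides.
PLACE_CLASSES = {
    "anterior": {
        "labial": ["p", "b", "m"],
        "labio-dental": ["f", "v"],
        "inter-dental": ["th", "dh"],
    },
    "central": {
        "alveolar": ["t", "d", "n", "s", "z"],
    },
    "posterior": {
        "palatal": ["sh", "zh"],
        "velar": ["k", "g", "ng"],
    },
}


def place_of(segment):
    """Return (broad class, specific place) for a segment, or None if unlisted."""
    for broad, places in PLACE_CLASSES.items():
        for place, segments in places.items():
            if segment in segments:
                return broad, place
    return None


print(place_of("dh"))
print(place_of("ng"))
```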


Segmental Identity and Stress Accent

It is of interest to compare accent’s impact on segmental duration with its impact on segmental realization (i.e., whether or not the segment is realized canonically)

Usually, non-canonical realizations are manifest as segmental deletions

The pattern of segmental realization bears some correspondence to durational variation as a function of accent level, but it also exhibits some interesting differences (which are potentially significant for models of phonetic organization)

Before we examine the segmental patterns in detail, a brief primer on the interpretation of these data is presented


Road Map - How to Interpret the Data

Accent        Heavy         Light         None          Total
Segment      Can  Trans    Can  Trans    Can  Trans    Can  Trans
p            203    205    153    153     94     94    450    452
b            126    127    227    225    214    190    567    542
m            137    137    211    211    116    110    464    458
f            136    136    104    104    113    103    353    343
v             35     33     58     58    108     93    201    184
th            62     61    102    100     28     26    192    187
dh            95     80    311    257    625    451   1031    788
y             63     72    135    136    193    145    391    353

Can = Canonical form
Trans = Transcribed (i.e., phonetically realized)

Compare the numbers in the YELLOW and ORANGE columns (the Can and Trans columns for each accent level)

Most numbers in the YELLOW / ORANGE columns will be similar, indicating that the phonetic realization of the segment is the canonical form

A large disparity between columns is marked with a blue box

READY? OK, let’s go!


Syllable Onset Statistics – ANTERIOR Place

Accent        Heavy         Light         None          Total
Segment      Can  Trans    Can  Trans    Can  Trans    Can  Trans
p            203    205    153    153     94     94    450    452
b            126    127    227    225    214    190    567    542
m            137    137    211    211    116    110    464    458
f            136    136    104    104    113    103    353    343
v             35     33     58     58    108     93    201    184
th            62     61    102    100     28     26    192    187
dh            95     80    311    257    625    451   1031    788
y             63     72    135    136    193    145    391    353

Can = Canonical form
Trans = Transcribed (i.e., phonetically realized)

Stress accent has relatively little impact on anterior onset segments, EXCEPT for [dh] and [y]

[dh] (as in “the” and “them”) tends to delete in unaccented syllables, as does [y] (although to a lesser extent)
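The comparison of Can and Trans counts described in the road map can be sketched in a few lines. The counts below are copied from the anterior-onset table; the 20% cutoff (`THRESHOLD`) and the function names are assumptions made for illustration only, since the slides do not state what size of disparity earned a blue box.

```python
# (Can, Trans) counts per accent level, copied from the anterior-onset table.
ANTERIOR_ONSETS = {
    "p":  {"heavy": (203, 205), "light": (153, 153), "none": (94, 94)},
    "b":  {"heavy": (126, 127), "light": (227, 225), "none": (214, 190)},
    "m":  {"heavy": (137, 137), "light": (211, 211), "none": (116, 110)},
    "f":  {"heavy": (136, 136), "light": (104, 104), "none": (113, 103)},
    "v":  {"heavy": (35, 33),   "light": (58, 58),   "none": (108, 93)},
    "th": {"heavy": (62, 61),   "light": (102, 100), "none": (28, 26)},
    "dh": {"heavy": (95, 80),   "light": (311, 257), "none": (625, 451)},
    "y":  {"heavy": (63, 72),   "light": (135, 136), "none": (193, 145)},
}

# Assumed cutoff for calling a disparity "large"; not taken from the slides.
THRESHOLD = 0.20


def disparity(can, trans):
    """Signed relative difference: positive when fewer tokens are realized than expected."""
    return (can - trans) / can


def flag_disparities(table, threshold=THRESHOLD):
    """List (segment, accent level) cells whose relative disparity meets the cutoff."""
    flagged = []
    for segment, levels in table.items():
        for level, (can, trans) in levels.items():
            # Skip cells with no canonical tokens to avoid dividing by zero.
            if can > 0 and abs(disparity(can, trans)) >= threshold:
                flagged.append((segment, level))
    return flagged


flagged = flag_disparities(ANTERIOR_ONSETS)
print(flagged)
```

With this particular cutoff, the only cells flagged are [dh] and [y] in unaccented syllables, in line with the exceptions noted above. A lower cutoff would flag more cells.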


Syllable Onset Statistics – CENTRAL Place

Accent        Heavy         Light         None          Total
Segment      Can  Trans    Can  Trans    Can  Trans    Can  Trans
t            241    245    276    230    513    276   1030    751
d            141    143    149    134    173    128    463    405
dx             0      3      0     62      0    179      0    244
n            133    135    237    196    194    130    564    461
nx             0      2      0     40      0     73      0    115
s            289    290    284    287    187    186    760    763
z             14     13     16     16     43     45     73     74

Can = Canonical form
Trans = Transcribed (i.e., phonetically realized)

Central segments tend to “disappear” in the absence of stress accent

There is also a tendency for flaps ([dx] and [nx]) to insert under similar conditions

In heavily accented syllables, central segments maintain their canonical identity
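One way to look at the flap insertions alongside the missing central segments is to set the two counts side by side for each accent level. The sketch below does only that arithmetic on the table's counts. The pairing in `FLAP_SOURCES` ([dx] with [t] and [d], [nx] with [n]) is an assumption introduced here for illustration, not something the slides state, and the function names are hypothetical.

```python
LEVELS = ("heavy", "light", "none")

# (Can, Trans) counts per accent level, copied from the central-onset table.
CENTRAL_ONSETS = {
    "t":  {"heavy": (241, 245), "light": (276, 230), "none": (513, 276)},
    "d":  {"heavy": (141, 143), "light": (149, 134), "none": (173, 128)},
    "dx": {"heavy": (0, 3),     "light": (0, 62),    "none": (0, 179)},
    "n":  {"heavy": (133, 135), "light": (237, 196), "none": (194, 130)},
    "nx": {"heavy": (0, 2),     "light": (0, 40),    "none": (0, 73)},
}

# Assumed pairing of each flap with the canonical segments it might replace.
FLAP_SOURCES = {"dx": ("t", "d"), "nx": ("n",)}


def shortfall(table, segments, level):
    """Canonical tokens minus transcribed tokens, summed over the given segments."""
    return sum(table[s][level][0] - table[s][level][1] for s in segments)


def flap_summary(table):
    """Map (flap, level) to (shortfall of assumed source segments, flaps transcribed)."""
    summary = {}
    for flap, sources in FLAP_SOURCES.items():
        for level in LEVELS:
            summary[(flap, level)] = (
                shortfall(table, sources, level),
                table[flap][level][1],
            )
    return summary


summary = flap_summary(CENTRAL_ONSETS)
for (flap, level), (missing, flaps) in summary.items():
    print(f"{flap:>2} {level:<5} source shortfall={missing:4d}  flaps transcribed={flaps:4d}")
```

A negative shortfall means more tokens were transcribed than the canonical forms predict. The counts alone cannot show which canonical segment a given flap replaced, so the comparison is suggestive rather than conclusive.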


Syllable Onset Statistics – Posterior Place

Accent        Heavy         Light         None          Total
Segment      Can  Trans    Can  Trans    Can  Trans    Can  Trans
k            185    186    189    187    170    168    544    541
g            115    116    138    137     54     51    307    304
ng             0      0      2      3      1      1      3      4
sh            26     26     40     40     73     80    139    146
zh             0      1      2      9     11     17     13     27
ch            32     34     19     27     22     23     73     84
jh            31     30     52     43     58     48    141    121
w            201    209    310    330    276    287    787    826
q              0     33      0     64      0     38      0    135

Can = Canonical form
Trans = Transcribed (i.e., phonetically realized)

Posterior segments are remarkably stable in onset position

The only significant “deviation” from canonical realization is the intrusion of the glottal stop [q], which lacks phonemic status in English


Syllable Onset Statistics – Place Chameleons

Accent        Heavy         Light         None          Total
Segment      Can  Trans    Can  Trans    Can  Trans    Can  Trans
r            272    269    233    215    233    162    738    646
l            184    180    226    212    220    162    630    554
hh           158    156    169    157     67     37    394    350
er             0      0      0      2      0      0      0      2
lg             0      2      0      8      0     21      0     31
el             0      1      0      0      0      0      0      1

Can = Canonical form
Trans = Transcribed (i.e., phonetically realized)

“Chameleons” assimilate their place of articulation to the following vowel

They are relatively stable at syllable onset, except in unaccented forms

The reduced form of [l] is [lg], a glide-like element – it tends to assume the functional status of [l] in unaccented syllables


PART NINE

Stress Accent’s Impact on Syllable Codas


Stress Accent and Syllable Codas

Stress accent’s impact on syllable codas differs from that of onsets

The disparity in duration between accented and unaccented forms tends to be significantly less for codas than for onsets (at least when deletions are omitted from consideration)

There is a far greater probability of segmental deletion in coda constituents

Accent level exerts a powerful influence on segmental deletion and on segmental duration

To a certain degree, segmental deletion and duration interact (or are flip sides of the same phonetic coin)


Coda Duration - Accent Level/Syllable Form

Coda duration (on average) is similar across syllable structure, both for accented and unaccented forms

There is a relatively small dynamic range in duration between accented and unaccented codas (relative to onsets and nuclei)

Moreover, the duration of certain coda constituents is virtually identical in accented and unaccented syllables

Canonical Syllable Forms


Syllable Coda Statistics – Anterior Place

Accent        Heavy         Light         None          Total
Segment      Can  Trans    Can  Trans    Can  Trans    Can  Trans
p             33     32     39     32     17     13     89     77
b              9      6      4      4      1      1     14     11
m            108     96    148    148    112     83    368    327
f             37     36     40     40     36     48    113    124
v             63     55    102     87    172     94    337    236
th            11     10     24     16     34     20     69     46
dh             0      0      0      4      0      5      0      9

Can = Canonical form
Trans = Transcribed (i.e., phonetically realized)

Anterior coda segments are relatively stable under stress (accent)

The segments [m] and [v] are exceptions – they often function as “flaps” in this context, and they tend to delete in unaccented syllables


Syllable Coda Statistics – Central Place

Accent        Heavy         Light         None          Total
Segment      Can  Trans    Can  Trans    Can  Trans    Can  Trans
t            322    126    575    191    562    172   1459    489
d            200    119    295    127    370     96    865    342
n            311    237    498    381    773    542   1582   1160
s            142    135    202    214    151    155    495    504
z            179    149    258    208    271    221    708    578

Can = Canonical form
Trans = Transcribed (i.e., phonetically realized)

Central coda segments are extremely unstable under stress (accent), except for the fricatives [s] and [z]

The segments [t], [d] and [n] tend to delete in coda position, even in heavily accented syllables

The major effect of stress accent is on the probability of segmental deletion (which is appreciably higher in unaccented forms)
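The deletion pattern for central codas can be approximated directly from the Can and Trans counts. The sketch below treats any shortfall of transcribed tokens relative to canonical tokens as non-canonical realization, which is an assumption: the slides say such realizations are usually deletions, but the counts do not separate deletions from substitutions. The function name `non_canonical_rate` is hypothetical.

```python
# (Can, Trans) counts per accent level, copied from the central-coda table.
CENTRAL_CODAS = {
    "t": {"heavy": (322, 126), "light": (575, 191), "none": (562, 172)},
    "d": {"heavy": (200, 119), "light": (295, 127), "none": (370, 96)},
    "n": {"heavy": (311, 237), "light": (498, 381), "none": (773, 542)},
    "s": {"heavy": (142, 135), "light": (202, 214), "none": (151, 155)},
    "z": {"heavy": (179, 149), "light": (258, 208), "none": (271, 221)},
}


def non_canonical_rate(can, trans):
    """Approximate fraction of canonical tokens not realized, floored at zero."""
    return max(0.0, 1.0 - trans / can)


# Rate for every segment and accent level, rounded for display.
rates = {
    segment: {
        level: round(non_canonical_rate(can, trans), 3)
        for level, (can, trans) in levels.items()
    }
    for segment, levels in CENTRAL_CODAS.items()
}

for segment, by_level in rates.items():
    print(segment, by_level)
```

On these counts the approximate rate for [t] is already about 0.61 in heavily accented syllables and rises to about 0.69 in unaccented ones, while [d] rises from about 0.41 to about 0.74.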


Syllable Coda Duration - CENTRAL Place

CANONICAL Syllable Forms

The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables – this affects durational properties

The durations of many of the coda segments do not differ as a function of accent level

Many of the unaccented central codas are short in duration, in contrast to: (1) central onsets, (2) anterior codas, (3) posterior codas, (4) chameleon codas


Syllable Coda Duration - CENTRAL Place

ALL Syllable Forms

Because of the high probability of deletions for central coda consonants, the mean durations are quite low relative to other conditions

In some sense the default duration for central codas is very short


Syllable Coda Statistics – POSTERIOR Place

Counts by accent level (Heavy, Light, None) and in total

Segment   Heavy         Light         None          Total
          Can   Trans   Can   Trans   Can   Trans   Can   Trans
k         170   150     196   162      51    39     417   351
g          10    10       8    10       4     5      22    25
q           0    42       0    71       0    54       0   167
ng         63    60     139   126     203   129     405   315
sh          9     9       2     2       4     6      15    17
zh          1     0       0     4       0     2       1     6
ch         26    25      27    25      12    12      65    62
jh         10    10      11    10      15    12      36    32
w           0     4       0     2       0     6       0    12

Posterior coda segments are relatively stable under stress (accent)

The primary exception is [ng], which tends to delete in unaccented syllables

The “infamous” glottal stop [q] tends to be inserted in this context

Can = Canonical form; Trans = Transcribed (i.e., phonetically realized)
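The instability of [ng] can be read directly off the counts in the posterior-place table. The following is a minimal sketch, not part of the original analysis: it divides the transcribed count by the canonical count for each accent level. The names CODA_COUNTS, totals and realization_ratio are hypothetical, and the ratio is only a rough proxy for segment retention, since transcribed counts also include insertions and substitutions (which is why some ratios exceed 1 and why [q] and [w], with no canonical codas, have no ratio at all).

```python
# Counts copied from the posterior-place coda table.
# Each entry maps accent level -> (canonical count, transcribed count).
CODA_COUNTS = {
    "k":  {"heavy": (170, 150), "light": (196, 162), "none": (51, 39)},
    "g":  {"heavy": (10, 10),   "light": (8, 10),    "none": (4, 5)},
    "q":  {"heavy": (0, 42),    "light": (0, 71),    "none": (0, 54)},
    "ng": {"heavy": (63, 60),   "light": (139, 126), "none": (203, 129)},
    "sh": {"heavy": (9, 9),     "light": (2, 2),     "none": (4, 6)},
    "zh": {"heavy": (1, 0),     "light": (0, 4),     "none": (0, 2)},
    "ch": {"heavy": (26, 25),   "light": (27, 25),   "none": (12, 12)},
    "jh": {"heavy": (10, 10),   "light": (11, 10),   "none": (15, 12)},
    "w":  {"heavy": (0, 4),     "light": (0, 2),     "none": (0, 6)},
}

ACCENT_LEVELS = ("heavy", "light", "none")


def totals(segment):
    """Sum canonical and transcribed counts over the three accent levels."""
    can = sum(CODA_COUNTS[segment][a][0] for a in ACCENT_LEVELS)
    trans = sum(CODA_COUNTS[segment][a][1] for a in ACCENT_LEVELS)
    return can, trans


def realization_ratio(segment, accent):
    """Transcribed / canonical count, or None when there are no canonical tokens."""
    can, trans = CODA_COUNTS[segment][accent]
    if can == 0:
        return None
    return trans / can


# Print one row per segment: ratio under heavy, light and no accent.
for seg in CODA_COUNTS:
    cells = []
    for accent in ACCENT_LEVELS:
        ratio = realization_ratio(seg, accent)
        cells.append("  n/a" if ratio is None else f"{ratio:5.2f}")
    print(f"{seg:<3}", *cells)
```

Under this measure [ng] drops from roughly 0.95 with heavy accent to roughly 0.64 with no accent, while most other posterior codas with a reasonable number of tokens stay close to their canonical counts.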

Page 247: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Syllable Coda Statistics – Place Chameleons

Chameleon segments are unstable under stress (accent)

This is especially true for [l] (at all levels of accent), where many canonical segments transmute into [lg], particularly in accented forms

The segment [r] tends to delete in unaccented syllables, but not otherwise

Can = Canonical form; Trans = Transcribed (i.e., phonetically realized)

Page 248: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

PART TEN

What’s Going on in Pronunciation?

Page 263: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

What’s Going On? (in pronunciation)

With respect to onset and coda segments, there are two basic forms: (1) those that are relatively stable across accent level, and (2) those that are not

Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulatory constriction is either anterior or posterior

The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables

The place chameleons (i.e., the approximants) are not very stable in either onset or coda position

The vowels form two basic groups: (1) accented and (2) unaccented

The accented vowels are generally canonically realized and quasi-evenly distributed across the vowel space

The unaccented forms tend to concentrate in the high-front and high-central regions of the vowel space

Certain segments are actually junctures (e.g., the flaps and the glottal stop)

Several other so-called segments are junctures as well (as they function like flaps); the most noteworthy examples are [dh] and [v]

None of these properties is consistent with a segmental model of language

Page 264: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Synopsis

The Rationale for a Juncture-Accent Model of Spoken Language

Page 270: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

Take Home Messages

Based on a detailed analysis of a manually annotated corpus of spontaneous American English (Switchboard), the following conclusions are drawn:

The pattern of pronunciation variation observed is inconsistent with a segmental model of spoken language

The pronunciation patterns observed cut across segment and articulatory-feature classes

The patterns observed display systematic variation when syllable structure and stress accent are taken into account

Therefore, future-generation speech recognition systems need to build syllable structure and stress-accent information into pronunciation models and lexical representations

A preliminary juncture-accent model provides a potential starting point for developing more realistic (and robust) lexical representations
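As a purely illustrative sketch of what building syllable structure and stress accent into a lexical representation might look like, the code below stores each word as a list of syllables with onset, nucleus, coda and accent level. Nothing here comes from the talk itself: the class names, the helper deletion_prone_codas, and the syllabification and accent labels chosen for the example word are all assumptions. The only facts taken from the slides are that [ng] and [r] tend to delete in unaccented codas.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Syllable:
    """One syllable of a lexical entry (hypothetical structure)."""
    onset: List[str]
    nucleus: str
    coda: List[str]
    accent: str  # one of "heavy", "light", "none"


@dataclass
class LexicalEntry:
    """A word represented as accent-tagged syllables rather than a flat phone string."""
    word: str
    syllables: List[Syllable] = field(default_factory=list)


# Coda segments that the slides describe as tending to delete when unaccented.
UNACCENTED_DELETION_PRONE = {"ng", "r"}


def deletion_prone_codas(entry: LexicalEntry) -> List[Tuple[int, str]]:
    """Return (syllable index, segment) pairs for codas likely to delete."""
    flagged = []
    for index, syllable in enumerate(entry.syllables):
        if syllable.accent != "none":
            continue  # only unaccented syllables are considered here
        for segment in syllable.coda:
            if segment in UNACCENTED_DELETION_PRONE:
                flagged.append((index, segment))
    return flagged


# Assumed syllabification and accent pattern, for illustration only.
thinking = LexicalEntry(
    "thinking",
    [
        Syllable(onset=["th"], nucleus="ih", coda=["ng"], accent="heavy"),
        Syllable(onset=["k"], nucleus="ih", coda=["ng"], accent="none"),
    ],
)

print(deletion_prone_codas(thinking))
```

With the same segment in both codas, only the one in the unaccented syllable is flagged, which is the kind of distinction a flat phone-string lexicon cannot express.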

Page 271: Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.

That’s All, Folks

Many Thanks for Your Time and Attention