Posted on 08-Jan-2018
Levels of Representation in Adult Speech Perception

The Big Questions
- What levels of acoustic/phonetic/phonological representation can we distinguish in the brain?
- How are these representations created or modified during development?
- What is the flow of information (in space and time) in the mapping from acoustics to the lexicon in the brain?
- How does knowledge of native-language categories and phonotactics constrain perception?
- How are phonological representations encoded?

[Figure: /kæt/ mapped onto "A Category" vs. "Another Category"]

Types of Category
- Phonetic categories
  - Islands of acoustic consistency
  - Graded internal structure matters
  - Not good for computation
- Phonological categories
  - Differences among category members are irrelevant
  - Good for computation
  - May correspond to a complex acoustic distribution

[Figure: /kæt/ - gradient category representations vs. discrete category representations]
Sensory Maps
Internal representations of the outside world. Cellular neuroscience has discovered a great deal in this area.
[Figure: vowel space]
Notions of sensory maps may be applicable to some aspects of human phonetic representations, but there's been little success in that regard, and we shouldn't expect this to yield much.

[Figure: /kæt/ - gradient category representations vs. discrete category representations]

Phonological Categories are Different
Decisions about how to categorize a sound may be fuzzy, etc., but phonological processes are blind to this: we don't find gradient application of phonological transformations - no partial epenthesis, no gradient stress, etc. There are also developmental dissociations between the two category types.
Some Established Results
- Search for phonetic maps in the brain: consistently uninformative.
- Electrophysiology of speech perception has been dominated by studies of the Mismatch Negativity (MMN), a response elicited in auditory cortex within a few hundred ms after the onset of an oddball sound.
- MMN amplitude tracks the perceptual distance between standard and deviant sounds; i.e., it is a measure of similarity along many dimensions.
- There are established effects and non-effects of linguistic category structure on the MMN: non-effects in comparisons of within- vs. across-category contrasts; real effects in comparisons of native vs. non-native contrasts.

Electroencephalography (EEG) / Event-Related Potentials (ERPs)
[Figure: ERP waveform for "John is laughing", segmented s1, s2, s3]

Magnetoencephalography: Brain Magnetic Fields (MEG)
[Figure: pickup coil & SQUID assembly; 160-SQUID whole-head array]
SQUID detectors measure brain magnetic fields around 100 billion times weaker than the earth's steady magnetic field.

Evoked Responses
M100
- Elicited by any well-defined onset
- Varies with tone frequency (Poeppel & Roberts 1996)
- Varies with F1 of vowels (Poeppel, Phillips et al. 1997)
- May vary non-linearly with VOT variation (Phillips et al. 1995; Sharma & Dorman 1999)
- Functional value of the time-code unclear
- No evidence of higher-level representations

Mismatch Response
X X X X X Y X X X X Y X X X X X X Y X X X Y X X X...
- Latency: ~150-250 msec
- Localization: supratemporal auditory cortex
- Many-to-one ratio between standards and deviants

Localization of Mismatch Response
(Phillips, Pellathy, Marantz et al., 2000)

Basic MMN elicitation (Risto Näätänen)

MMN Amplitude Variation
How do MMN latency and amplitude vary with the frequency difference? (1000 Hz tone standard; Sams et al. 1985, Tiitinen et al. 1994)

Different Dimensions of Sounds
Length, amplitude, pitch - you name it. The amplitude of the mismatch response can be used as a measure of perceptual distance.

Impetus for Language Studies
If MMN amplitude is a measure of perceptual distance, then perhaps it can be informative in domains where acoustic and perceptual distance diverge.

Place of Articulation
- Acoustic variation: F2 & F3 transitions
- [b] vs. [d]: within-category and between-category contrasts along the continuum

Categories in Infancy
High Amplitude Sucking, 2-month-olds (Eimas et al. 1971):
- 20 vs. 40 ms VOT - yes
- 40 vs. 60 ms VOT - no
Infants show the contrast, but this doesn't entail phonological knowledge.

Place of Articulation
No effect of the category boundary on MMN amplitude (Sharma et al. 1993). Similar findings in Sams et al. (1991) and Maiste et al. (1995); but cf. Näätänen et al. (1997), with vowel contrasts.

Phonetic Category Effects
- Measures of uneven discrimination profiles
- Findings are mixed (and techniques vary)
- Relies on the assumption that effects of contrasts at multiple levels are additive, plus the requirement that the additivity effect be strong enough to yield a statistical interaction

Logic of our studies: eliminate the contribution of lower levels by isolating the many-to-one ratio at an abstract level of representation. Do this by introducing non-orthogonal variation among the standards.

Auditory Cortex Accesses Phonological Categories: An MEG Mismatch Study
Colin Phillips, Tom Pellathy, Alec Marantz, Elron Yellin, et al. [Journal of Cognitive Neuroscience, 2000]

More Abstract Categories
At the level of phonological categories, within-category differences are irrelevant.

Aims
- Use the MMF to measure categorization rather than discrimination
- Focus on the failure to make category-internal distinctions

[Figure: Voice Onset Time (VOT), 60 msec example]

Design
- Fixed Design - Discrimination: 20 ms, 40 ms, 60 ms
- Grouped Design - Categorization: 0, 8, 16, 24 ms vs. 40, 48, 56, 64 ms
  (Non-orthogonal within-category variation excludes grouping via acoustic streaming.)
- Grouped Design - Acoustic Control: 20, 28, 36, 44 ms vs. 60, 68, 76, 84 ms (/d/ standard vs. /d/ deviant)
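The grouped (categorization) design can be sketched as a short simulation. This is an illustrative sketch, not the published protocol: the 7:1 category ratio, the trial count, and the assignment of the low-VOT set to the standard role are assumptions, and real oddball designs also constrain the spacing of deviants, which is omitted here.

```python
import random

random.seed(1)  # reproducible illustration

# VOT values (ms) from the grouped design; treating the low-VOT set as
# standards is an assumption (the roles can be reversed).
STANDARDS = (0, 8, 16, 24)
DEVIANTS = (40, 48, 56, 64)

def grouped_oddball(n_trials=800, p_deviant=0.125):
    """One token per trial; the oddball exists only at the category level."""
    return [random.choice(DEVIANTS) if random.random() < p_deviant
            else random.choice(STANDARDS)
            for _ in range(n_trials)]

seq = grouped_oddball()
standard_rate = sum(v in STANDARDS for v in seq) / len(seq)      # ~0.875
max_token_rate = max(seq.count(v) for v in set(seq)) / len(seq)  # ~0.22
```

Because the standards vary among four VOT values, no single token is frequent enough to act as an acoustic standard; only a listener who groups the 0-24 ms tokens into one category experiences a many-to-one ratio.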
Discrimination vs. Categorization: Vowels
Daniel Garcia-Pedrosa, Colin Phillips, Henny Yeung

Some Concerns
Are the category effects an artifact?
- It is very hard to discriminate different members of the same category on a VOT scale.
- Perhaps subjects are forming ad hoc groupings of sounds during the experiment, not using their phonological representations?
- Does the ~30 ms VOT boundary simply reflect a fundamental neurophysiological timing constraint?

Vowels
Vowels show categorical perception effects in identification tasks, but much better discriminability of within-category pairs.

Vowels & Tones
- Synthetic /u/-/o/ continuum: F1 varied, all else constant
- Amplitude envelope of F1 extracted for creation of tone controls
- Pure-tone continuum at the F1 center frequency, matched to the amplitude envelope of the vowel
[Figure: vowel with F1 = 310 Hz; pure tone at 310 Hz]

Design
Tones and vowels: the first formant (F1) varies along the same Hz continuum; F0, F2, voicing onset, etc. all remain constant.
300 Hz, 320 Hz, 340 Hz, 360 Hz vs. 400 Hz, 420 Hz, 440 Hz, 460 Hz

Results: Vowels
Results: Tones

Preliminary conclusions
- Clear MMN in the standard latency range in the vowel condition but not in the tone condition
- Both vowels and tones yield larger N100 responses
- Categorization effect for tones? A response to the rarity of individual deviant tones, without categorization? A response to larger frequency changes when moving from the standard to the deviant category?

Phonological Features
Colin Phillips, Tom Pellathy, Henny Yeung, Alec Marantz

Sound Groupings: Phonological Features
Phonological natural classes exist because phonemes are composed of features - the smallest building blocks of language. Phonemes that share a feature form a natural class. Effects of feature-based organization are observed in language development, language disorders, historical change, and synchronic processes. (Roman Jakobson)

Sound Groupings in the Brain
p, t, t, k, d, p, k, t, p, k, b, t...

Feature Mismatch: Stimuli
Feature Mismatch: Design
Feature Mismatch: Control Experiment - Acoustic Condition
- Identical acoustic variability
- No phonological many-to-one ratio
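The arithmetic behind a feature-level many-to-one ratio can be checked in a few lines. A sketch only: three frequent phonemes share one [voice] value and three rare phonemes share the other, as in the Features I design; the exact fractions 7/24 and 1/24 are assumptions chosen to reproduce the quoted 29% / 4% / 87.5% / 12.5% figures.

```python
from fractions import Fraction

# Assumed token probabilities: 7/24 ~ 29.2% and 1/24 ~ 4.2%
probs = {"b": Fraction(7, 24), "d": Fraction(7, 24), "g": Fraction(7, 24),
         "p": Fraction(1, 24), "t": Fraction(1, 24), "k": Fraction(1, 24)}

voiced = sum(probs[x] for x in "bdg")     # 7/8 = 87.5%
voiceless = sum(probs[x] for x in "ptk")  # 1/8 = 12.5%

# The 7:1 many-to-one ratio exists only at the [voice] feature level;
# at the phoneme level the most frequent single token is just ~29%.
feature_ratio = voiced / voiceless
```

This is the sense in which the control condition has "no phonological many-to-one ratio": the standard/deviant asymmetry only emerges if tokens are grouped by a shared feature.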
Phoneme Variation: Features I
Alternative account of the findings: no feature-based grouping; independent MMFs elicited by the 3 low-frequency phonemes.
/b/ 29%, /d/ 29%, /g/ 29%; /p/ 4%, /t/ 4%, /k/ 4% (feature level: 87.5% vs. 12.5%)

Phoneme Variation: Features II
Follow-up study distinguishes phoneme-level frequency from feature-level status:
/b/ 37.5%, /g/ 37.5%, /d/ 12.5%, /t/ 12.5%
- Phoneme-based classification: /d/ and /t/ are equally rare
- Feature-based grouping: /d/ patterns with the frequent [+voice] standards, so only /t/ is the oddball

Design
- N = 10
- Multiple exemplars, individually selected boundaries
- 2 versions recorded for all participants, reversing the [voice] value
- Acoustic control, with all VOT values in the [-voice] range

Results: left-anterior channels
Distinguishing Lexical and Surface Category Contrasts
Nina Kazanina (Univ. of Ottawa), Colin Phillips, Bill Idsardi

Allophonic Variation
All studies shown so far fail to distinguish surface and lexical-level (underlying) category representations.
[Figure: phonological category vs. acoustic distribution]

Russian vs. Korean
Three series of stops in Korean:
- plain (lenis): pa ta ka
- glottalized (tense, long): p'a t'a k'a
- aspirated: pʰa tʰa kʰa

Intervocalic Plain Stop Voicing: /papo/ → [pabo] 'fool'; /ku papo/ → [kɯbabo] 'the fool'

Plain stops show a bimodal distribution of +VOT and -VOT tokens:
- word-initially: always a positive VOT
- word-medially, intervocalically: a voicing lead (negative VOT)

[Figures: identification/rating; discrimination]
MEG Stimuli
Russian (basic Russian [ta]-token: 0 ms voicing lead, +13 ms vowel lag):
- DA (voicing leads): -40, -34, -28, -24 ms
- TA (voicing leads & lags): -08, -04, +02, +08 ms (relative); -08, -04, +15, +21 ms (absolute)

Korean (basic Korean [ta]-token: 0 ms voicing lead, +29 ms vowel lag):
- DA (voicing leads): -40, -36, -30, -24 ms
- TA (voicing lags): 00, +07, +11, +15 ms (relative); +29, +36, +40, +44 ms (absolute)

[Results figure legend - black: p < .05; white: n.s.]
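The relation between the relative and absolute VOT values in the stimulus lists can be made explicit. A sketch under one inferred assumption: positive (lag) values are shifted by the basic token's vowel lag, while voicing leads are left unchanged; this rule reproduces both the Russian and the Korean lists.

```python
def to_absolute(relative_ms, vowel_lag_ms):
    """Voicing lags ride on the basic token's vowel lag; leads do not."""
    return relative_ms + vowel_lag_ms if relative_ms >= 0 else relative_ms

# Korean TA continuum: basic [ta]-token has a +29 ms vowel lag
korean_abs = [to_absolute(v, 29) for v in (0, 7, 11, 15)]    # [29, 36, 40, 44]

# Russian TA continuum: basic [ta]-token has a +13 ms vowel lag
russian_abs = [to_absolute(v, 13) for v in (-8, -4, 2, 8)]   # [-8, -4, 15, 21]
```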
Russian vs. Korean
MEG responses indicate that Russian speakers immediately map sounds from the [d]-[t] continuum onto categories; Korean speakers do not, despite the fact that these sounds show a bimodal distribution in their language. Perceptual space reflects the functional status of sounds in encoding word meanings.

How Strong Is This?
Basic understanding: adults are prisoners of their native-language sound system. But structure-adding models predict residual sensitivity to non-native sounds, and there is a great deal of motivation in L2 research to find ways to free perception from the constraints of the L1.
Phonology - Syllables
Japanese versus French: pairs like "egma" and "eguma". The contrast is possible in French, but not in Japanese.

Behavioral Results
Japanese listeners have difficulty hearing the difference (Dupoux et al. 1999).
[Photo: sign reading "EXECTIVE SUITE"]

ERP Results
Sequences: egma, egma, egma, egma, eguma (Dehaene-Lambertz et al. 2000)
- French listeners show 3 mismatch responses: early, middle, and late
- Japanese listeners show only the late response
[Figures: early, middle, and late responses]

Implications
- The cross-language contrast in the MMN mirrors the behavioral contrast.
- The relative timing of the responses that are the same vs. different across French and Japanese is surprising from a bottom-up view of analysis - it suggests a dual route.
- Is this effect specific to comparison in an XXXXY task?
- Is the result robust; does it generalize to other phonotactic generalizations?

What drives Perceptual Epenthesis?
Illegal syllables? Or illegal sequences of consonants? (Kabak & Idsardi, 2004)

Korean syllables
- Only [p, t, k, m, n, ŋ, l] occur in coda position
- Other consonants neutralize in coda position: [c, c', cʰ] → [t] in coda
- Voiced stops occur only in CVC environments (allophones of voiceless stops)

Korean contact restrictions: *C + N
- Repair 1: nasalize the C: /patʰ/ + /ma/ → [panma]
- Repair 2: denasalize the N: /tal/ + /nala/ → [tallaɾa]
- Restrictions apply within the IntPh (intonational phrase)
(Kabak & Idsardi, 2004)
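The contact restriction and its two repairs can be sketched as a toy function over ASCII-ish transcriptions. This is an illustration of the generalizations above, not a serious phonological model: segments are single characters, aspiration and length diacritics are dropped, and the intervocalic [l]~[ɾ] detail is ignored (so /tal/ + /nala/ comes out as "tallala").

```python
NASALS = {"m", "n"}
STOP_TO_NASAL = {"p": "m", "t": "n", "k": "N"}  # "N" stands in for the velar nasal

def join(stem, suffix):
    """Repair an illegal C + N contact at a morpheme boundary (*C+N)."""
    c, n = stem[-1], suffix[0]
    if c in STOP_TO_NASAL and n in NASALS:
        # Repair 1: nasalize the stop, e.g. /pat/ + /ma/ -> [panma]
        return stem[:-1] + STOP_TO_NASAL[c] + suffix
    if c == "l" and n == "n":
        # Repair 2: denasalize the nasal, e.g. /tal/ + /nala/ -> [tallala]
        return stem + "l" + suffix[1:]
    return stem + suffix
```

For example, join("pat", "ma") gives "panma", and join("tal", "nala") gives "tallala"; a concatenation with no illegal contact is returned unchanged.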