Posted on 08-Jan-2018
Levels of Representation in Adult Speech Perception

The Big Questions
- What levels of acoustic/phonetic/phonological representation can we distinguish in the brain?
- How are these representations created or modified during development?
- What is the flow of information (in space and time) in the mapping from acoustics to the lexicon in the brain?
- How does knowledge of native-language categories and phonotactics constrain perception?
- How are phonological representations encoded?

[Figure: /kæt/ mapped onto "A Category" vs. "Another Category"]

Types of Category
- Phonetic categories
  - Islands of acoustic consistency
  - Graded internal structure matters
  - Not good for computation
- Phonological categories
  - Differences among category members are irrelevant
  - Good for computation
  - May correspond to a complex acoustic distribution

[Figure: /kæt/ - gradient category representations vs. discrete category representations]
Sensory Maps
Internal representations of the outside world. Cellular neuroscience has discovered a great deal in this area.
[Figure: vowel space]
Notions of sensory maps may be applicable to some aspects of human phonetic representations, but there's been little success in that regard, and we shouldn't expect this to yield much.

[Figure: /kæt/ - gradient category representations vs. discrete category representations]

Phonological Categories are Different
Decisions about how to categorize a sound may be fuzzy, etc., but phonological processes are blind to this: we don't find gradient application of phonological transformations - no partial epenthesis, no gradient stress, etc. There are also developmental dissociations between the two category types.
Some Established Results
- Search for phonetic maps in the brain: consistently uninformative.
- Electrophysiology of speech perception has been dominated by studies of the Mismatch Negativity (MMN), a response elicited in auditory cortex within a few hundred ms after the onset of an oddball sound.
- MMN amplitude tracks the perceptual distance between standard and deviant sounds; i.e., it is a measure of similarity along many dimensions.
- There are established effects and non-effects of linguistic category structure on the MMN: non-effects in comparisons of within- vs. across-category contrasts; real effects in comparisons of native vs. non-native contrasts.

Electroencephalography (EEG) / Event-Related Potentials (ERPs)
[Figure: ERP waveform for "John is laughing", segmented s1, s2, s3]

Magnetoencephalography: Brain Magnetic Fields (MEG)
[Figure: pickup coil & SQUID assembly; 160-SQUID whole-head array]
SQUID detectors measure brain magnetic fields around 100 billion times weaker than the earth's steady magnetic field.

Evoked Responses
M100
- Elicited by any well-defined onset
- Varies with tone frequency (Poeppel & Roberts 1996)
- Varies with F1 of vowels (Poeppel, Phillips et al. 1997)
- May vary non-linearly with VOT variation (Phillips et al. 1995; Sharma & Dorman 1999)
- Functional value of the time-code unclear
- No evidence of higher-level representations

Mismatch Response
X X X X X Y X X X X Y X X X X X X Y X X X Y X X X...
- Latency: ~150-250 msec
- Localization: supratemporal auditory cortex
- Many-to-one ratio between standards and deviants

Localization of Mismatch Response
(Phillips, Pellathy, Marantz et al., 2000)

Basic MMN elicitation (Risto Näätänen)

MMN Amplitude Variation
How do MMN latency and amplitude vary with the frequency difference? (1000 Hz tone standard; Sams et al. 1985, Tiitinen et al. 1994)

Different Dimensions of Sounds
Length, amplitude, pitch - you name it. The amplitude of the mismatch response can be used as a measure of perceptual distance.

Impetus for Language Studies
If MMN amplitude is a measure of perceptual distance, then perhaps it can be informative in domains where acoustic and perceptual distance diverge.

Place of Articulation
- Acoustic variation: F2 & F3 transitions
- [b] vs. [d]: within-category and between-category contrasts along the continuum

Categories in Infancy
High Amplitude Sucking, 2-month-olds (Eimas et al. 1971):
- 20 vs. 40 ms VOT - yes
- 40 vs. 60 ms VOT - no
Infants show the contrast, but this doesn't entail phonological knowledge.

Place of Articulation
No effect of the category boundary on MMN amplitude (Sharma et al. 1993). Similar findings in Sams et al. (1991) and Maiste et al. (1995); but cf. Näätänen et al. (1997), with vowel contrasts.

Phonetic Category Effects
- Measures of uneven discrimination profiles
- Findings are mixed (and techniques vary)
- Relies on the assumption that effects of contrasts at multiple levels are additive, plus the requirement that the additivity effect be strong enough to yield a statistical interaction

Logic of our studies: eliminate the contribution of lower levels by isolating the many-to-one ratio at an abstract level of representation. Do this by introducing non-orthogonal variation among the standards.

Auditory Cortex Accesses Phonological Categories: An MEG Mismatch Study
Colin Phillips, Tom Pellathy, Alec Marantz, Elron Yellin, et al. [Journal of Cognitive Neuroscience, 2000]

More Abstract Categories
At the level of phonological categories, within-category differences are irrelevant.

Aims
- Use the MMF to measure categorization rather than discrimination
- Focus on the failure to make category-internal distinctions

[Figure: Voice Onset Time (VOT), 60 msec example]

Design
- Fixed Design - Discrimination: 20 ms, 40 ms, 60 ms
- Grouped Design - Categorization: 0, 8, 16, 24 ms vs. 40, 48, 56, 64 ms
  (Non-orthogonal within-category variation excludes grouping via acoustic streaming.)
- Grouped Design - Acoustic Control: 20, 28, 36, 44 ms vs. 60, 68, 76, 84 ms (/d/ standard vs. /d/ deviant)
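The grouped (categorization) design can be sketched as a short simulation. This is an illustrative sketch, not the published protocol: the 7:1 category ratio, the trial count, and the assignment of the low-VOT set to the standard role are assumptions, and real oddball designs also constrain the spacing of deviants, which is omitted here.

```python
import random

random.seed(1)  # reproducible illustration

# VOT values (ms) from the grouped design; treating the low-VOT set as
# standards is an assumption (the roles can be reversed).
STANDARDS = (0, 8, 16, 24)
DEVIANTS = (40, 48, 56, 64)

def grouped_oddball(n_trials=800, p_deviant=0.125):
    """One token per trial; the oddball exists only at the category level."""
    return [random.choice(DEVIANTS) if random.random() < p_deviant
            else random.choice(STANDARDS)
            for _ in range(n_trials)]

seq = grouped_oddball()
standard_rate = sum(v in STANDARDS for v in seq) / len(seq)      # ~0.875
max_token_rate = max(seq.count(v) for v in set(seq)) / len(seq)  # ~0.22
```

Because the standards vary among four VOT values, no single token is frequent enough to act as an acoustic standard; only a listener who groups the 0-24 ms tokens into one category experiences a many-to-one ratio.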
Discrimination vs. Categorization: Vowels
Daniel Garcia-Pedrosa, Colin Phillips, Henny Yeung

Some Concerns
Are the category effects an artifact?
- It is very hard to discriminate different members of the same category on a VOT scale.
- Perhaps subjects are forming ad hoc groupings of sounds during the experiment, not using their phonological representations?
- Does the ~30 ms VOT boundary simply reflect a fundamental neurophysiological timing constraint?

Vowels
Vowels show categorical perception effects in identification tasks, but much better discriminability of within-category pairs.

Vowels & Tones
- Synthetic /u/-/o/ continuum: F1 varied, all else constant
- Amplitude envelope of F1 extracted for creation of tone controls
- Pure-tone continuum at the F1 center frequency, matched to the amplitude envelope of the vowel
[Figure: vowel with F1 = 310 Hz; pure tone at 310 Hz]

Design
Tones and vowels: the first formant (F1) varies along the same Hz continuum; F0, F2, voicing onset, etc. all remain constant.
300 Hz, 320 Hz, 340 Hz, 360 Hz vs. 400 Hz, 420 Hz, 440 Hz, 460 Hz

Results: Vowels
Results: Tones

Preliminary conclusions
- Clear MMN in the standard latency range in the vowel condition but not in the tone condition
- Both vowels and tones yield larger N100 responses
- Categorization effect for tones? A response to the rarity of individual deviant tones, without categorization? A response to larger frequency changes when moving from the standard to the deviant category?

Phonological Features
Colin Phillips, Tom Pellathy, Henny Yeung, Alec Marantz

Sound Groupings: Phonological Features
Phonological natural classes exist because phonemes are composed of features - the smallest building blocks of language. Phonemes that share a feature form a natural class. Effects of feature-based organization are observed in language development, language disorders, historical change, and synchronic processes. (Roman Jakobson)

Sound Groupings in the Brain
p, t, t, k, d, p, k, t, p, k, b, t...

Feature Mismatch: Stimuli
Feature Mismatch: Design
Feature Mismatch: Control Experiment - Acoustic Condition
- Identical acoustic variability
- No phonological many-to-one ratio
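The arithmetic behind a feature-level many-to-one ratio can be checked in a few lines. A sketch only: three frequent phonemes share one [voice] value and three rare phonemes share the other, as in the Features I design; the exact fractions 7/24 and 1/24 are assumptions chosen to reproduce the quoted 29% / 4% / 87.5% / 12.5% figures.

```python
from fractions import Fraction

# Assumed token probabilities: 7/24 ~ 29.2% and 1/24 ~ 4.2%
probs = {"b": Fraction(7, 24), "d": Fraction(7, 24), "g": Fraction(7, 24),
         "p": Fraction(1, 24), "t": Fraction(1, 24), "k": Fraction(1, 24)}

voiced = sum(probs[x] for x in "bdg")     # 7/8 = 87.5%
voiceless = sum(probs[x] for x in "ptk")  # 1/8 = 12.5%

# The 7:1 many-to-one ratio exists only at the [voice] feature level;
# at the phoneme level the most frequent single token is just ~29%.
feature_ratio = voiced / voiceless
```

This is the sense in which the control condition has "no phonological many-to-one ratio": the standard/deviant asymmetry only emerges if tokens are grouped by a shared feature.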
Phoneme Variation: Features I
Alternative account of the findings: no feature-based grouping; independent MMFs elicited by the 3 low-frequency phonemes.
/b/ 29%, /d/ 29%, /g/ 29%; /p/ 4%, /t/ 4%, /k/ 4% (feature level: 87.5% vs. 12.5%)

Phoneme Variation: Features II
Follow-up study distinguishes phoneme-level frequency from feature-level status:
/b/ 37.5%, /g/ 37.5%, /d/ 12.5%, /t/ 12.5%
- Phoneme-based classification: /d/ and /t/ are equally rare
- Feature-based grouping: /d/ patterns with the frequent [+voice] standards, so only /t/ is the oddball

Design
- N = 10
- Multiple exemplars, individually selected boundaries
- 2 versions recorded for all participants, reversing the [voice] value
- Acoustic control, with all VOT values in the [-voice] range

Results: left-anterior channels
Distinguishing Lexical and Surface Category Contrasts
Nina Kazanina (Univ. of Ottawa), Colin Phillips, Bill Idsardi

Allophonic Variation
All studies shown so far fail to distinguish surface and lexical-level (underlying) category representations.
[Figure: phonological category vs. acoustic distribution]

Russian vs. Korean
Three series of stops in Korean:
- plain (lenis): pa ta ka
- glottalized (tense, long): p'a t'a k'a
- aspirated: pʰa tʰa kʰa

Intervocalic Plain Stop Voicing: /papo/ → [pabo] 'fool'; /ku papo/ → [kɯbabo] 'the fool'

Plain stops show a bimodal distribution of +VOT and -VOT tokens:
- word-initially: always a positive VOT
- word-medially, intervocalically: a voicing lead (negative VOT)

[Figures: identification/rating; discrimination]
MEG Stimuli
Russian (basic Russian [ta]-token: 0 ms voicing lead, +13 ms vowel lag):
- DA (voicing leads): -40, -34, -28, -24 ms
- TA (voicing leads & lags): -08, -04, +02, +08 ms (relative); -08, -04, +15, +21 ms (absolute)

Korean (basic Korean [ta]-token: 0 ms voicing lead, +29 ms vowel lag):
- DA (voicing leads): -40, -36, -30, -24 ms
- TA (voicing lags): 00, +07, +11, +15 ms (relative); +29, +36, +40, +44 ms (absolute)

[Results figure legend - black: p < .05; white: n.s.]
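The relation between the relative and absolute VOT values in the stimulus lists can be made explicit. A sketch under one inferred assumption: positive (lag) values are shifted by the basic token's vowel lag, while voicing leads are left unchanged; this rule reproduces both the Russian and the Korean lists.

```python
def to_absolute(relative_ms, vowel_lag_ms):
    """Voicing lags ride on the basic token's vowel lag; leads do not."""
    return relative_ms + vowel_lag_ms if relative_ms >= 0 else relative_ms

# Korean TA continuum: basic [ta]-token has a +29 ms vowel lag
korean_abs = [to_absolute(v, 29) for v in (0, 7, 11, 15)]    # [29, 36, 40, 44]

# Russian TA continuum: basic [ta]-token has a +13 ms vowel lag
russian_abs = [to_absolute(v, 13) for v in (-8, -4, 2, 8)]   # [-8, -4, 15, 21]
```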
Russian vs. Korean
MEG responses indicate that Russian speakers immediately map sounds from the [d]-[t] continuum onto categories; Korean speakers do not, despite the fact that these sounds show a bimodal distribution in their language. Perceptual space reflects the functional status of sounds in encoding word meanings.

How Strong Is This?
Basic understanding: adults are prisoners of their native-language sound system. But structure-adding models predict residual sensitivity to non-native sounds, and there is a great deal of motivation in L2 research to find ways to free perception from the constraints of the L1.
Phonology - Syllables
Japanese versus French: pairs like "egma" and "eguma". The contrast is possible in French, but not in Japanese.

Behavioral Results
Japanese listeners have difficulty hearing the difference (Dupoux et al. 1999).
[Photo: sign reading "EXECTIVE SUITE"]

ERP Results
Sequences: egma, egma, egma, egma, eguma (Dehaene-Lambertz et al. 2000)
- French listeners show 3 mismatch responses: early, middle, and late
- Japanese listeners show only the late response
[Figures: early, middle, and late responses]

Implications
- The cross-language contrast in the MMN mirrors the behavioral contrast.
- The relative timing of the responses that are the same vs. different across French and Japanese is surprising from a bottom-up view of analysis - it suggests a dual route.
- Is this effect specific to comparison in an XXXXY task?
- Is the result robust; does it generalize to other phonotactic generalizations?

What drives Perceptual Epenthesis?
Illegal syllables? Or illegal sequences of consonants? (Kabak & Idsardi, 2004)

Korean syllables
- Only [p, t, k, m, n, ŋ, l] occur in coda position
- Other consonants neutralize in coda position: [c, c', cʰ] → [t] in coda
- Voiced stops occur only in CVC environments (allophones of voiceless stops)

Korean contact restrictions: *C + N
- Repair 1: nasalize the C: /patʰ/ + /ma/ → [panma]
- Repair 2: denasalize the N: /tal/ + /nala/ → [tallaɾa]
- Restrictions apply within the IntPh (intonational phrase)
(Kabak & Idsardi, 2004)
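The contact restriction and its two repairs can be sketched as a toy function over ASCII-ish transcriptions. This is an illustration of the generalizations above, not a serious phonological model: segments are single characters, aspiration and length diacritics are dropped, and the intervocalic [l]~[ɾ] detail is ignored (so /tal/ + /nala/ comes out as "tallala").

```python
NASALS = {"m", "n"}
STOP_TO_NASAL = {"p": "m", "t": "n", "k": "N"}  # "N" stands in for the velar nasal

def join(stem, suffix):
    """Repair an illegal C + N contact at a morpheme boundary (*C+N)."""
    c, n = stem[-1], suffix[0]
    if c in STOP_TO_NASAL and n in NASALS:
        # Repair 1: nasalize the stop, e.g. /pat/ + /ma/ -> [panma]
        return stem[:-1] + STOP_TO_NASAL[c] + suffix
    if c == "l" and n == "n":
        # Repair 2: denasalize the nasal, e.g. /tal/ + /nala/ -> [tallala]
        return stem + "l" + suffix[1:]
    return stem + suffix
```

For example, join("pat", "ma") gives "panma", and join("tal", "nala") gives "tallala"; a concatenation with no illegal contact is returned unchanged.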