Linguistic Structure in Identifying Segments in a Second Language

Kenneth de Jong, Indiana University. Colloquium at the Department of Linguistics, The Ohio State University, May 6, 2005


Page 1: Linguistic Structure in Identifying Segments in a Second Language

Linguistic Structure in Identifying Segments in a Second Language

Kenneth de Jong

Indiana University

Colloquium

at the Department of Linguistics

The Ohio State University

May 6, 2005

Page 2

Also with help & collaboration from:

NIH (R03 DC04095) & NSF (BC-9910701)

Kyoko Nagao, Byung-jin Lim

Hanyong Park, Noah Silbert

Minoru Fukuda of Miyazaki Municipal University

Jin-young Tak of Sejong University

Mi-hui Cho of Kyonggi University

Page 3

Second Language Phonetic and Phonological Research

• Very popular topic
• … especially lately
• … relative to a lot of what we linguists do
• Very large literature
• Very unsatisfying: difficult to really gain a coherent picture of what’s really known about the field

Page 4

Two Reasons for the Nature of the Literature

1) Segregated research threads: different questions, different data-treatment approaches
- Classroom-oriented research in such groups as TESOL
- Generative-style formal analyses done by linguists
- Cross-language perception studies by psychologists
- Motor learning studies done by (almost) no one

2) It’s hard
– Requires double the linguistic expertise, since we deal with two linguistic systems
– Requires a set of typological comparisons that will support a model of how the two systems map onto one another
– Requires sufficient detail in all of the above to make the models reasonable

Page 5

So … why do it?

• We have a hard time saying no for the 16th time

• Most people are multilingual, esp. today

• Theoretically useful
– The level of rigor, specifically with respect to typological claims, is useful for the discipline
– Rapid learning in the second language acquirer can show us how the linguistic cognitive system works

Page 6

Topic today: Segmental Identification

• Predominates in the phonological and psychological literature

• Relatively simple (given what we know)

• Largely abstracted from lexical access and syntactic parsing issues (until recently)

• Essentially alphabetic

Page 7

Previous, very very commonly cited models

• SLM (Jim Flege):
– Model of production (originally)
– Production problems depend on perceptual meta-classification of segments, where segments = allophones (more or less)
– New vs. Similar = whether a segment in L2 has a corresponding segment in L1
– Beats me how we know what counts as similar, but I’m sure the IPA has something to do with it
– In early learning, similar phones are stable and functional, while new phones are unstable and dysfunctional
– Learning of new phones progresses rapidly, while similar phones merge with L1 phones to form a stable and not-quite-accurate category

Page 8

Previous, very very commonly cited models

• PAM (Cathy Best):
– Model of perceptual discrimination
– Discrimination abilities depend on perceptual meta-classification of segments, where segments = gestural complexes (more or less)
– The degree to which two contrasting sounds fit into different categories, given L1 experience, determines the degree to which they are discriminable by an L1 perceiver
– Not a model of second language learning, but of cross-language perception; technically, subjects should be set free after the experiment, since the experiment breaks them by beginning the process of forming additional perceptual categories

Page 9

Model Architecture: Segmental Categories are Unitary Things

• Most of these experimentally oriented models treat segments as unitary, free-standing object categories

• At odds with the typical treatment in linguistic models, which generally assume that cross-segment properties are operative in determining how second language classification happens

Page 10

Model Architecture: Segmental Categories are Unitary Things

Questions to be pursued
• Parsing question: Segments are embedded in larger units of all different kinds
• Cross-segment question: Segments exist in a matrix with other segments
• Within-segment question: Segments have lots of internal structure

Page 11

Model Architecture: Segmental Categories are Unitary Things

Questions to be pursued
• Parsing question: Segments are embedded in larger units of all different kinds
• Cross-segment question: Segments exist in a matrix with other segments
• Within-segment question: Segments have lots of internal structure

Page 12

Parsing Question

• Analyses of the Korean -> English database used for the other studies below

• Park & de Jong (2005) show that prosodic parsing heavily affects segmental identification

• C’s in VC’s are neutralized, but C’s in VCV’s are not

• Korean listeners’ accuracy in voicing judgments for word-final obstruents depends on whether they hear (and count) a VC release as an additional syllable

Page 13

Model Architecture: Segmental Categories are Unitary Things

Questions to be pursued
• Parsing question: Segments are embedded in larger units of all different kinds
• Cross-segment question: Segments exist in a matrix with other segments
• Within-segment question: Segments have lots of internal structure

Page 14

Experiment: Cross-segment question

• Corpus
– English obstruents with /a/ to make non-words
– 8 target consonants contrasting in three binary features

              Coronal               Labial
              Voiced   Voiceless    Voiced   Voiceless
Stops         /d/      /t/          /b/      /p/
Fricatives    /ð/      /θ/          /v/      /f/

– 4 prosodic conditions

              Intervocalic          At Edge
Pre-stress    /əˈpa/ ‘apah’         /pa/ ‘pa’
Post-stress   /ˈapə/ ‘oppa’         /ap/ ‘op’

• Analysis: Look for generality across parallel segments
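The corpus crosses three binary features with four prosodic conditions, and "parallel segments" are pairs that differ in one feature only. A minimal sketch of that design, assuming only the eight targets and four conditions listed above; the names SEGMENTS, FEATURES, CONDITIONS, design and contrast_pairs are illustrative, not taken from the study:

```python
from itertools import product

# Feature bundles (manner, voicing, place) for the eight target consonants.
SEGMENTS = {
    ("stop", "voiced", "coronal"): "d",
    ("stop", "voiceless", "coronal"): "t",
    ("stop", "voiced", "labial"): "b",
    ("stop", "voiceless", "labial"): "p",
    ("fricative", "voiced", "coronal"): "ð",
    ("fricative", "voiceless", "coronal"): "θ",
    ("fricative", "voiced", "labial"): "v",
    ("fricative", "voiceless", "labial"): "f",
}

# The two values of each binary feature, in (manner, voicing, place) order.
FEATURES = (("stop", "fricative"), ("voiced", "voiceless"), ("coronal", "labial"))

# The four prosodic conditions, labelled as on the later slides.
CONDITIONS = ("Initial ('pa')", "Pre-stress ('apah')", "Post-stress ('oppa')", "Final ('op')")

def design():
    """Return every (consonant, condition) cell of the 8 x 4 design."""
    return list(product(SEGMENTS.values(), CONDITIONS))

def contrast_pairs(feature_index):
    """Segment pairs differing only in one feature (0=manner, 1=voicing, 2=place)."""
    first, second = FEATURES[feature_index]
    pairs = []
    for feats, seg in SEGMENTS.items():
        if feats[feature_index] == first:
            partner = feats[:feature_index] + (second,) + feats[feature_index + 1:]
            pairs.append((seg, SEGMENTS[partner]))
    return pairs

print(len(design()))        # 32 cells
print(contrast_pairs(0))    # [('d', 'ð'), ('t', 'θ'), ('b', 'v'), ('p', 'f')]
```

The manner pairs include /t/ vs. /θ/ and /p/ vs. /f/, the two contrasts compared across listeners in the next analysis.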

Page 15

Experiment: Cross-segment question

• Stimuli
– 4 Northern Midwestern English speakers in their late 20s
– Cued orthographically
– One consonant per non-word item; the consonants included others besides the 8 targets
– Produced in isolation

• Listeners
– 41 Korean undergrads at Kyonggi University in Seoul
– Very little exposure to native English speakers

• Procedure
– Stimuli presented over headphones in a listening lab
– Listeners asked to identify the consonants on a paper response sheet
– Given 14 response options + one (rarely used) for ‘other: ____’

Page 16

Analysis for Generalization 1: Cross-listener differences

• Question: Is segmental accuracy with one segment tied to accuracy with parallel segments?

• Here: contrasting non-sibilant fricatives are new for the Korean listeners. They need to be distinguished from stops, which are similar. (Cf. looking for copy machines in the kitchen.)

• Specific sub-question: is accuracy in distinguishing /t/ from /θ/ linked to accuracy in distinguishing /p/ from /f/?

• Regress accuracy for each listener in coronals against accuracy in labials
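Each listener contributes one (labial, coronal) manner-accuracy pair, and the regression asks how well one score predicts the other. A minimal plain-Python sketch using ordinary least squares and Pearson's r; the accuracy values are made up for illustration and are not the study's data:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ols_fit(xs, ys):
    """Least-squares line ys ~ intercept + slope * xs; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

# Hypothetical per-listener manner accuracy (proportion correct), one entry per listener.
labial = [0.52, 0.61, 0.70, 0.74, 0.83, 0.90]
coronal = [0.60, 0.66, 0.78, 0.80, 0.86, 0.93]

intercept, slope = ols_fit(labial, coronal)
r = pearson_r(labial, coronal)
print(f"coronal = {intercept:.2f} + {slope:.2f} * labial, r = {r:.2f}, R^2 = {r * r:.2f}")
```

A strong r across listeners is what the following slides report: listeners who separate /p/ from /f/ well also tend to separate /t/ from /θ/ well.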

Page 17

Manner accuracy: Labials vs. Coronals

• Error rates range from 10% to 50%

• Accuracy often better with coronals

• The two accuracy scores do correlate quite strongly

• But … what about, say, voiced and voiceless, where the contrast is quite different?

Page 18

Manner accuracy: Voiced vs. Voiceless

• Accuracy difference is larger.

• Voiced obstruents are poorly distinguished, with error rates never below 20%

• BUT again: the two accuracy scores do correlate

• Next: split by prosodic position

Page 19

Manner accuracy: Across prosodic positions

• Correlations are generally in the same ballpark as we just saw, with the exception of Final position

• Even here, the correlations are strongly significant

             Initial (‘pa’)   Pre-stress (‘apah’)   Post-stress (‘oppa’)   Final (‘op’)
Initial      1.000            0.283                 0.302                  0.196
Pre-stress                    1.000                 0.384                  0.241
Post-stress                                         1.000                  0.140
Final                                                                      1.000
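The table above is a symmetric matrix of correlations between per-listener manner accuracy in each pair of prosodic positions. A sketch of how such a matrix can be built, assuming a hypothetical dict mapping each position to a list of per-listener accuracies (listeners in the same order in every list). The slides do not say whether the cells report r or R², so using plain Pearson r here is an assumption:

```python
from itertools import combinations
from math import sqrt

POSITIONS = ["Initial", "Pre-stress", "Post-stress", "Final"]

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def correlation_table(acc):
    """Upper-triangular {(pos_a, pos_b): r}, with 1.0 on the diagonal."""
    table = {(p, p): 1.0 for p in POSITIONS}
    for a, b in combinations(POSITIONS, 2):
        table[(a, b)] = pearson_r(acc[a], acc[b])
    return table

# Hypothetical accuracies for five listeners (the study had 41).
acc = {
    "Initial":     [0.90, 0.75, 0.82, 0.60, 0.95],
    "Pre-stress":  [0.85, 0.70, 0.80, 0.55, 0.90],
    "Post-stress": [0.80, 0.72, 0.70, 0.58, 0.88],
    "Final":       [0.65, 0.60, 0.72, 0.50, 0.70],
}
for (a, b), r in correlation_table(acc).items():
    if a != b:
        print(f"{a:>11} x {b:<11} r = {r:.3f}")
```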

Page 20

Interim Summary

• Results suggest that distinguishing stops from fricatives is a single skill (or at least a set of closely related skills). Some listeners have acquired it better than others.

• Whoa. Um … how do we know this isn’t just an effect of overall proficiency differences? Some listeners are more experienced, and hence are better categorizers overall.

• Good question.

• However, the correlation patterns for the manner contrasts are not obtained for all pairs. Cf. the voicing contrast below.

Page 21

Voicing accuracy: Across prosodic positions

• Correlations only between
– Initial (‘pa’) and pre-stress (‘apah’)
– Pre-stress (‘apah’) and post-stress (‘oppa’)

• Suggests three skills: pre-vocalic, inter-vocalic, and post-vocalic

             Initial   Pre-stress   Post-stress   Final
Initial      1.000     0.488        0.074         0.027
Pre-stress             1.000        0.325         0.002
Post-stress                         1.000         0.034
Final                                             1.000

Page 22

Analysis for Generalization 2: Part-whole Analysis

• Boothroyd & Nittrouer (1988) point out the mathematical difference between unitary and generalized, factored models

• Factored models predict that the accuracy of the whole is the product of accuracy in each of the factors

• Here, e.g., accuracy in identifying /f/ = accuracy in manner × accuracy in voicing × accuracy in place
• ‘J-factor’:
$$p_{\text{segment}} = \left(\bar{p}_{\text{feature}}\right)^{J}$$
where $p_{\text{segment}}$ is segment accuracy and $\bar{p}_{\text{feature}}$ is average feature accuracy
• With a factored model, we expect J = number of factors, here 3
• With a largely unitary model, we expect J < 2 (or so; Nearey, 2003)
• Benki (2003) also finds familiarity biasing, in which more familiar items exhibit lower J-factors (between 2 & 3 in his study)
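Taking logarithms of both sides of the J-factor equation gives J = ln(segment accuracy) / ln(average feature accuracy). A minimal sketch, assuming the arithmetic mean of the three feature accuracies serves as "average feature accuracy"; the function name j_factor and all numbers are illustrative:

```python
from math import log, prod

def j_factor(segment_acc, feature_accs):
    """Solve segment_acc = mean(feature_accs) ** J for J."""
    mean_feature = sum(feature_accs) / len(feature_accs)
    return log(segment_acc) / log(mean_feature)

# A perfectly factored listener: the segment is right only if manner,
# voicing, and place are all right, and the three are independent.
features = [0.9, 0.9, 0.9]
print(j_factor(prod(features), features))   # ~3.0

# Hypothetical unequal feature accuracies: the product falls below the cube
# of the mean, so a strictly multiplicative model yields J slightly above 3.
features = [0.80, 0.90, 0.95]
print(j_factor(prod(features), features))

# A listener who does better than the product (e.g., a familiar, unitary
# category) gets J below 3.
print(j_factor(0.80, [0.9, 0.9, 0.9]))     # about 2.12
```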

Page 23

Part-whole Analysis

• J-factors split by prosodic position
• J-factors consistently near 3
• Lowest J-factors in initial position: familiarity biasing effect?
• Do similar analyses of different segments

            Initial   Pre-stress   Post-stress   Final   Overall
J-factor    2.627     3.034        2.816         2.750   2.710

Page 24

Part-whole Analysis

• Segmental accuracy is very close to the product of featural accuracies for each segment
• Fricatives lie almost exactly on the diagonal
• Stops are often slightly above the diagonal
• Since Korean has stops, this suggests a familiarity biasing effect
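The "diagonal" is where observed segment accuracy equals the product of the three featural accuracies. A minimal sketch of that comparison, with hypothetical per-segment numbers chosen only to show the sign convention (positive residual = above the diagonal, i.e., better than the factored prediction); the 0.01 tolerance is also an assumption:

```python
from math import prod

TOLERANCE = 0.01  # hypothetical band treated as "on the diagonal"

# Hypothetical (manner, voicing, place) accuracies and observed whole-segment
# accuracy; illustrative values, not the study's results.
SEGMENT_DATA = {
    "p": ((0.85, 0.80, 0.95), 0.69),
    "t": ((0.88, 0.82, 0.93), 0.70),
    "f": ((0.70, 0.85, 0.90), 0.535),
    "θ": ((0.65, 0.80, 0.88), 0.458),
}

def diagonal_residuals(data):
    """Observed minus predicted (product-of-features) accuracy per segment."""
    return {seg: observed - prod(feature_accs)
            for seg, (feature_accs, observed) in data.items()}

for seg, resid in diagonal_residuals(SEGMENT_DATA).items():
    if resid > TOLERANCE:
        side = "above"
    elif resid < -TOLERANCE:
        side = "below"
    else:
        side = "on"
    print(f"/{seg}/ residual {resid:+.3f} ({side} the diagonal)")
```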

Page 25

Summary

• Evidence against a strictly segmental model of segment identification
– Cross-subject correlations show a parallelism in accuracy rates that mirrors the featural structure of the consonants being acquired

• Evidence for a generalized model
– Overall accuracy in segmental identification is neatly a function of accuracy in the component features. This is particularly true for novel segments being acquired
– Related evidence below

Page 26

Model Architecture: Segmental Categories are Unitary Things

Questions to be pursued
• Parsing question: Segments are embedded in larger units of all different kinds
• Cross-segment question: Segments exist in a matrix with other segments
• Within-segment question: Segments have lots of internal structure

Page 27

Experiment: Internal structure question

• Corpus
– 4 Midwestern American speakers in their mid-30s
– /pi/ and /bi/
– Metronomically rate-varied corpus with extreme durational variability (de Jong, 2001a; 2001b)
• Repetition period varied continuously from 450 ms to 250 ms
• This range of rates comes from a study of physiological constraints (Nelson & Perkell, 19**)

• Procedure
– Present excised syllable trains for identification

• Subjects
– 23 native English-speaking undergraduates from Indiana University
– 14 native Japanese-speaking students from Indiana University
– 13 native Korean-speaking students from Indiana University
– All monolingual through early years

Page 28

Stimulus VOT Distribution

• Plots VOT for /p/ and /b/ against syllable duration

• VOTs shorten for /p/ at fast rates

Page 29

Stimulus VOT Distribution

• Zoom in on VOT dimension

• Get near merger at very fast rates

Page 30

Native Responses

• Logistic regression on identification responses

• Add 50% boundary between /p/ & /b/ for native listeners

• Slant shows normalization for rate
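One way to read the "slant" off a logistic fit: if P(/p/) = 1 / (1 + exp(-(b0 + b1·VOT + b2·duration))), the 50% boundary is where the linear predictor is zero, i.e., VOT = -(b0 + b2·duration) / b1, a straight line in the VOT by duration plane. A sketch, assuming a model with just these two predictors (the slides do not list the model terms) and made-up coefficients:

```python
from math import exp

def p_voiceless(vot_ms, dur_ms, b0, b1, b2):
    """Logistic probability of a /p/ response given VOT and syllable duration."""
    return 1.0 / (1.0 + exp(-(b0 + b1 * vot_ms + b2 * dur_ms)))

def boundary_vot(dur_ms, b0, b1, b2):
    """VOT at which P(/p/) = 0.5 for a syllable of the given duration."""
    return -(b0 + b2 * dur_ms) / b1

# Hypothetical coefficients: longer VOT favours /p/ (b1 > 0); longer syllables
# (slower rates) push the boundary toward longer VOTs (b2 < 0).
B0, B1, B2 = -2.0, 0.25, -0.02

for dur in (100, 200, 300):
    print(f"duration {dur} ms: 50% boundary at VOT {boundary_vot(dur, B0, B1, B2):.1f} ms")
```

The slant of the boundary is -b2/b1, here 0.08 ms of VOT per ms of syllable duration; a slant of zero would mean no rate normalization.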

Page 31

Question: how do Non-natives handle variability?

• Mismatch in VOT production boundary
– Japanese /p/ has shorter VOT
– Korean /pʰ/ has longer VOT

• Expect shifted identification responses
– Japanese: more /b/ -> /p/ errors
– Korean: more /p/ -> /b/ errors

Page 32

Cross-language

• Get shifts in expected directions

• Rate normalization function is the same as for native listeners

[Figure: VOT (msec) by syllable duration (msec), 75 to 325 ms; series: English, Japanese Adv. Learners, Korean Adv. Learners, Japanese Monolinguals, Korean Monolinguals]

Page 33

Question: how do Non-natives handle variability?

• Get expected shifted identification responses
– Japanese: more /b/ -> /p/ errors
– Korean: more /p/ -> /b/ errors

• Rate normalized as well.

• Question is: where?
– Segmental un-rate-differentiated prototype: mostly in middle of distribution
– Rate extracted model: persistent across distribution

Page 34

Undifferentiated Prototype Model

• Here’s the general distributional pattern

Page 35

Undifferentiated Prototype Model

• Here are prototypical categories with centers to which stimuli are compared

[Figure: prototype centers for native /p/, native /b/, non-native /p/, and non-native /b/]

Page 36

Undifferentiated Prototype Model

• Using native vs. non-native centers heavily affects portions between the centers

• Distance of extreme tokens from the two centers is little affected

[Figure: prototype centers for native /p/, native /b/, non-native /p/, and non-native /b/]
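A minimal nearest-prototype sketch on the VOT dimension alone (that is, un-rate-differentiated). The prototype centers are hypothetical, with the non-native set shifted toward longer VOTs as in the Korean case; only tokens whose VOT falls between the native and non-native decision points change category, while extreme tokens keep their label:

```python
def nearest_prototype(vot_ms, centers):
    """Return the category label whose prototype VOT is closest to the token."""
    return min(centers, key=lambda label: abs(vot_ms - centers[label]))

# Hypothetical prototype VOT centers in ms (not measured values).
NATIVE = {"b": 10.0, "p": 60.0}      # decision point at 35 ms
NON_NATIVE = {"b": 15.0, "p": 80.0}  # decision point at 47.5 ms

tokens = [5, 20, 36, 42, 47, 55, 90]
for vot in tokens:
    n, nn = nearest_prototype(vot, NATIVE), nearest_prototype(vot, NON_NATIVE)
    flag = "  <- changed" if n != nn else ""
    print(f"VOT {vot:>3} ms: native /{n}/, non-native /{nn}/{flag}")
```

Here the tokens at 36, 42 and 47 ms switch from /p/ to /b/; the tokens at 5, 20, 55 and 90 ms do not.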

Page 37

Extracted Model

• A generalized criterion model divides space

[Figure: native optimized criterion and non-native criterion dividing the /p/ and /b/ regions]

Page 38

Generalized Model

• A shifted criterion will affect identification throughout the region around the boundary

[Figure: native optimized criterion and non-native criterion dividing the /p/ and /b/ regions]
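A generalized criterion can be sketched as a rate-dependent boundary line, with the non-native criterion modelled (by assumption) as the native line shifted by a constant number of milliseconds. Because the shift applies at every duration, tokens that change category turn up at fast and slow rates alike. All coefficients and tokens below are hypothetical:

```python
# Hypothetical native boundary: 8 ms VOT plus 0.08 ms per ms of syllable duration.
INTERCEPT_MS, SLOPE = 8.0, 0.08
KOREAN_LIKE_SHIFT = 8.0  # hypothetical: L1-derived criterion sits at longer VOTs

def criterion(dur_ms, shift=0.0):
    """Boundary VOT (ms) at a given syllable duration, optionally shifted."""
    return INTERCEPT_MS + SLOPE * dur_ms + shift

def classify(vot_ms, dur_ms, shift=0.0):
    """Label a token /p/ if its VOT exceeds the (shifted) criterion, else /b/."""
    return "p" if vot_ms > criterion(dur_ms, shift) else "b"

# (syllable duration, VOT) tokens spanning fast to slow rates.
tokens = [(100, 14), (100, 20), (200, 22), (200, 28), (300, 30), (300, 36)]

changed = [
    (dur, vot) for dur, vot in tokens
    if classify(vot, dur) != classify(vot, dur, KOREAN_LIKE_SHIFT)
]
print(changed)  # [(100, 20), (200, 28), (300, 36)]
```

Each changed token is a native /p/ that becomes /b/ under the shifted criterion, i.e., the /p/ -> /b/ direction expected for Korean listeners, and one such token appears at each duration.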

Page 39

Non-native Differences

• Back to actual responses

• We compare native and non-native identification and highlight tokens which differ
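One way to operationalize "highlight tokens which differ" is to compare, token by token, the proportion of /p/ responses from each group and label each clear difference by error direction. A sketch, assuming per-token response proportions and a hypothetical difference threshold of 0.25 (the slides do not state the criterion actually used):

```python
THRESHOLD = 0.25  # hypothetical cut-off for a "different" token

def flag_differences(native_p, nonnative_p, threshold=THRESHOLD):
    """Label tokens whose /p/-response rate differs between groups.

    native_p, nonnative_p: dicts mapping token id -> proportion of /p/ responses.
    Returns token id -> "/b/ -> /p/" (non-natives hear /p/ more often) or
    "/p/ -> /b/" (non-natives hear /b/ more often).
    """
    flags = {}
    for token, p_native in native_p.items():
        diff = nonnative_p[token] - p_native
        if diff > threshold:
            flags[token] = "/b/ -> /p/"
        elif diff < -threshold:
            flags[token] = "/p/ -> /b/"
    return flags

# Hypothetical tokens: (syllable duration in ms, VOT in ms) -> P(/p/ response).
native = {(120, 12): 0.10, (120, 30): 0.95, (250, 20): 0.05, (250, 45): 0.98}
japanese_like = {(120, 12): 0.55, (120, 30): 0.97, (250, 20): 0.40, (250, 45): 0.60}
print(flag_differences(native, japanese_like))
```

Plotting the flagged tokens by duration and VOT, coloured by direction, gives the kind of display described on the next two slides.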

Page 40

Japanese Differences

• Expect /b/->/p/ errors

• Get more (red squares)

• Note distribution across rates

• Also get /p/ -> /b/ errors (black diamonds)

Page 41

Korean Differences

• Expect /p/->/b/ errors

• Get them (black diamonds)

• Note very odd distribution: across rates?

• Also get /b/ -> /p/ errors (red squares)

Page 42

Experiment 2 Summary

• Differences in typical L1 VOT show up as mismatch errors for both Japanese and Korean listeners

• Errors are distributed across the rates, suggesting a model in which generalized perceptual criteria are taken from L1

• Reverse-direction errors also indicate another aspect of non-native boundaries: uncertainty

Page 43

Model Architecture: Segmental Categories are Extracted Things

Questions to be pursued
• Parsing question: Segmental identification requires global identification of context
• Cross-segment question: Segmental identification is a function of other segments
• Within-segment question: Segmental identification is a function of the generalized situation

Page 44

Fine