The effects of syllable position on allophonic
variation in Québec French /ʀ/
Peter Milne Université d’Ottawa
NWAV40 October 29, 2011
Presentation Objectives
The value in working with large natural language corpora is the ability to obtain large amounts of data to test research hypotheses.
– Studies at the level of the utterance can provide a comprehensive and coherent account of phonological alternations.
– The use of a corpus is an important instrument to examine the behavior of words in connected speech.
– A corpus allows quantitative and qualitative analyses far superior to intuition alone.
The challenge is to quickly and systematically extract the data from the corpus.
– Accumulating the linguistic data required to test and evaluate hypotheses is a time-consuming and labour-intensive job.
– Assisted segmentation of speech data may take as much as 10 times real time (Goldman, 2011).
– Full manual segmentation may take as much as 800 times real time (Schiel & Draxler, 2003).
– Penn Phonetics Lab Forced Aligner works well for English, but what about French?
Describe an automatic aligner that works on a corpus of spoken French to help address this challenge.
– An accurate forced aligner can aid in the analysis of large volumes of natural language data.
– A forced aligner can produce speech segmentation at the word and phone levels in about 5 times real time (Goldman, 2011).
– This aligned speech data can provide contextual and acoustic information about the segment under investigation: /ʀ/
Research Questions
Demonstrate a small research study on the effects of syllable position on allophonic variation in Québec French /ʀ/
– Is syllable position related to allophonic variation in Québec French /ʀ/?
– Are there measurable differences between allophones of Québec French /ʀ/?
– Can differences in the allophones be explained with reference to the syllable?
Overview
/ʀ/
P2FA (in French, please)
Data
Future Directions
Allophones of /ʀ/
French rhotics include trills, taps, flaps, fricatives, and approximants
– no single physical property shared by all
– large numbers of rhotics may co-exist as allophonic and sociolinguistic variants of the same phoneme
/ʀ/ →
[ r ] apical trill
[ ʀ ] uvular trill
[ ʁ | χ ] uvular fricative
[ ɣ | x ] velar fricative
[ ʁ̞ ] uvular approximant
[ ɹ ] alveolar approximant
[ ∅ ] deleted
Allophones of /ʀ/: Historical evidence
/ʀ/ has had a variety of articulations for a very long time.
– More than one phonetic realization since at least the 14th century
– Confusion with alveolar place of articulation
  • Early Gaul or Brittany “n” written as “r” in early French
    – Ling(o)nes → Langres
    – Lund(i)num → Londres
  • Same change from Latin to French
    – ord(i)nem → ordre
    – diac(o)num → diacre
– Confusion with velar place of articulation
  • Early Brittany velars “c'h” also written as “r”
    – (k)nech → Ners
    – Pen-Nec'h → Pennère
Allophones of /ʀ/: Phonetic evidence
/ʀ/ has had a variety of articulations for a very long time
– [ ʀ ~ ʁ|χ ~ ʁ̞ ] all share similar place of articulation
– all varieties are relatively sonorous and vocalic in nature
Borel-Maisonny (1942): Paris
– not place of articulation, but size of aperture of the constriction, laryngeal voicing, presence of vibrations
Vinay (1950): Québec
– mostly [ʁ] with “appreciable frication noise”
Delattre (1969): Urban Paris
– pharyngeal consonant with articulation involving root of tongue
Santerre (1982): Montréal
– [ʀ] very sonorous with no high frequency noise
Allophones of /ʀ/: Sociolinguistic evidence
[r] is no longer a dominant variety
Most authors make reference to syllable position when describing /ʀ/ distributions
– Vinay (1950): [ʀ] as geographic
– Clermont & Cedergren (1979): [ʀ] in codas
– Tousignant (1983, 1987): [ʀ] pre-vocalic > post-consonantal; [ʁ] post-vocalic or end of syllable
– Sankoff, Blondeau & Charity (2001): [ʀ, ʁ] more frequent in codas.
/ʀ/ has had a variety of articulations for a very long time
– [ ʀ ~ ʁ|χ ~ ʁ̞ ] share similar place of articulation
– all varieties are relatively sonorous and vocalic in nature
[ ʁ̞ ] is becoming more frequent
– [ʀ] commonly replaced by fricatives or approximants
  • informal speech, approximants becoming dominant allophone
  • emphatic or careful speech, fricatives becoming dominant allophone
Allophones of /ʀ/: Summary
/ʀ/ has had a variety of articulations for a very long time
– [ ʀ ~ ʁ|χ ~ ʁ̞ ] share similar place of articulation
– all varieties are relatively sonorous and vocalic in nature
– [ ʁ̞ ] is becoming more frequent
Most authors make reference to syllable position when describing /ʀ/ distributions
The Corpus
– Hansard archives of political debate
  • Assemblée nationale du Québec
– One week of debates
  • June 12-16, 2007
  • > 6 hours of speech
  • 61 speakers (43 male, 18 female)
  • Age range from 24 to 67

N of Speakers  Sex     Ages (Avg)  Years in Office (Avg)
26             M = 19  24-67 (51)  1-31 (7)
               F = 7   37-51 (41)  4-13 (8)
The forced aligner: P2FA
P2FA: Penn Phonetics Lab Forced Aligner (Yuan and Liberman, 2008)
– is an automatic phonetic alignment toolkit based on HTK (Hidden Markov Model Toolkit),
– takes as input a .wav audio file and a .txt orthographic transcription file,
– is used in conjunction with a pronunciation dictionary,
– produces a Praat TextGrid (Boersma) with interval boundaries for both words and phones.
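As a rough illustration of how the word and phone intervals in such a TextGrid could be read back for analysis, the sketch below parses Praat's long text format with regular expressions. The sample grid, its labels, and the function name `read_intervals` are hypothetical; none of this is part of P2FA or Praat itself.

```python
import re

# A tiny hand-written TextGrid in Praat's long text format (illustrative only,
# not output from the aligner described in the talk).
SAMPLE_TEXTGRID = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 0.5
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "phone"
        xmin = 0
        xmax = 0.5
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.2
            text = "l"
        intervals [2]:
            xmin = 0.2
            xmax = 0.5
            text = "e"
    item [2]:
        class = "IntervalTier"
        name = "word"
        xmin = 0
        xmax = 0.5
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 0.5
            text = "les"
'''

TIER_NAME = re.compile(r'name = "([^"]*)"')
# An interval is an xmin/xmax pair immediately followed by a text label.
INTERVAL = re.compile(r'xmin = ([\d.]+)\s+xmax = ([\d.]+)\s+text = "([^"]*)"')


def read_intervals(textgrid_text):
    """Return {tier name: [(start, end, label), ...]} for each interval tier."""
    tiers = {}
    # Split on numbered tier headers; the first chunk is the file header.
    for block in re.split(r'item \[\d+\]:', textgrid_text)[1:]:
        name = TIER_NAME.search(block).group(1)
        tiers[name] = [
            (float(start), float(end), label)
            for start, end, label in INTERVAL.findall(block)
        ]
    return tiers


tiers = read_intervals(SAMPLE_TEXTGRID)
```

With both tiers in hand, each phone interval can be matched to the word interval that contains it, which is the kind of contextual information the study relies on.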
An aligned example
P2FA: Dictionary Modifications
Pronunciation dictionary:
– Lexique, version 3 (New et al, 2001, 2004)
  • >135,000 words
  • includes orthographic, phonemic, syllabification, part of speech, gender, number, frequency
– Word list generated from transcription
  • ortho [outsym] [probability] P1 P2 P3 etc
  • expanded with alternate pronunciations
    – ministre [] [] m i n i s t ʁ
    – ministre [] [] m i n i s t
    – ministre [] [] m i n i s
    – ministre [] [] m i n i s t ʁ ə
  • 235,401 entries
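The expansion step can be sketched as follows. The rules here are an assumption inferred only from the single “ministre” example (delete final ʁ after an obstruent, reduce the cluster further, or add a final schwa); the obstruent set and the names `expand_pronunciations` and `dictionary_line` are hypothetical and are not the author's actual procedure.

```python
SCHWA = "\u0259"     # IPA schwa
UVULAR_R = "\u0281"  # IPA uvular fricative, the phone used in the slide's example
# Assumed obstruent inventory, for illustration only.
OBSTRUENTS = {"p", "t", "k", "b", "d", "g", "f", "v", "s", "z"}


def expand_pronunciations(phones):
    """Return the base pronunciation plus hypothetical alternates."""
    variants = [list(phones)]
    ends_in_obstruent_r = (
        len(phones) >= 2
        and phones[-1] == UVULAR_R
        and phones[-2] in OBSTRUENTS
    )
    if ends_in_obstruent_r:
        variants.append(list(phones[:-1]))       # final rhotic deleted
        if len(phones) >= 3 and phones[-3] in OBSTRUENTS:
            variants.append(list(phones[:-2]))   # cluster reduced further
        variants.append(list(phones) + [SCHWA])  # final schwa added
    return variants


def dictionary_line(ortho, phones):
    """Format one entry as: ortho [outsym] [probability] P1 P2 ..."""
    return f"{ortho} [] [] {' '.join(phones)}"


base = ["m", "i", "n", "i", "s", "t", UVULAR_R]
entries = [dictionary_line("ministre", v) for v in expand_pronunciations(base)]
```

For “ministre” this yields the same four entries listed on the slide, in the same order.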
P2FA: Acoustic Model Mappings
Identical consonant mappings:
– [b d f g j k l m n p s t v w z ʃ ʒ]
Identical vowel mappings:
– [e a i o u ə ɛ ɔ]
Ad hoc mappings:
– œ̃ → UH2 “contain”
– ɛ̃ → EY0 “complain”
– ɑ̃ → AA1 “song”
– ø → AH0 “popular”
– œ → AH0 “foot”
– ɔ̃ → AA0 “pond”
– y → UW1 “dew”
– ʀ → HH “hot”
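The ad hoc mappings can be written down as a simple lookup table. This is only a sketch: the labels are copied from the slide, while the dictionary name `AD_HOC_MAP` and the helper `acoustic_model_for` are hypothetical. The identical mappings are left out because the slide does not give their target labels.

```python
# French phone -> English acoustic model label, as listed on the slide.
# Nasal vowels are written as base vowel + combining tilde (U+0303).
AD_HOC_MAP = {
    "\u0153\u0303": "UH2",  # nasal oe, example word "contain"
    "\u025b\u0303": "EY0",  # nasal open e, example word "complain"
    "\u0251\u0303": "AA1",  # nasal a, example word "song"
    "\u00f8": "AH0",        # close-mid front rounded vowel, "popular"
    "\u0153": "AH0",        # open-mid front rounded vowel, "foot"
    "\u0254\u0303": "AA0",  # nasal open o, example word "pond"
    "y": "UW1",             # close front rounded vowel, "dew"
    "\u0280": "HH",         # uvular rhotic, example word "hot"
}


def acoustic_model_for(phone):
    """Return the ad hoc model label for a phone, or None if it is not listed."""
    return AD_HOC_MAP.get(phone)
```

Returning `None` for unlisted phones keeps the sketch from guessing at labels the slide does not state.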
An aligned example
start phone = start word = word initial
“e” + “ə” = V_V
V_V + word initial: “les recommandations” (“the recommendations”) = SimpleOnset
Syllable position by Context
Context  SimpleOnset  SimpleCoda  ComplexOnset  ComplexCoda
V_V      ☺            ☺
C_V      ☺                        ☺             ☺
V_C                   ☺
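A minimal sketch of how the segmental context of an /ʀ/ token might be derived from its neighbouring phones. The vowel inventory is taken from the vowels listed in the acoustic model mappings, and the lookup table restates the context and syllable-position pairings that appear in the Q1 contingency tables later in the talk. The function names and the treatment of nasal vowels (stripping the combining tilde) are assumptions, not the author's implementation.

```python
import unicodedata

# Base vowel symbols: e a i o u y plus schwa, open e, open o, oe, o-slash, back a.
VOWEL_BASES = set("aeiouy") | {
    "\u0259", "\u025b", "\u0254", "\u0153", "\u00f8", "\u0251",
}


def is_vowel(phone):
    """True if the phone's base character is a vowel (diacritics ignored)."""
    base = unicodedata.normalize("NFD", phone)[0]
    return base in VOWEL_BASES


def segmental_context(prev_phone, next_phone):
    """Label the context of a segment as V_V, C_V, V_C or C_C."""
    left = "V" if is_vowel(prev_phone) else "C"
    right = "V" if is_vowel(next_phone) else "C"
    return f"{left}_{right}"


# Syllable positions reported for each context in the Q1 tables.
POSITIONS_BY_CONTEXT = {
    "V_V": {"SimpleOnset", "SimpleCoda"},
    "C_V": {"SimpleOnset", "ComplexOnset", "ComplexCoda"},
    "V_C": {"SimpleCoda"},
}
```

For example, `segmental_context("e", "\u0259")` gives `"V_V"`, matching the “les recommandations” case, where the word-initial /ʀ/ is then labelled SimpleOnset.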
Acoustic measurements
Acoustic measurements: Intensity
Minimum intensity
Energy in first formant
floor = f1 - 0.5(bandwidth)
ceiling = f1 + 0.5(bandwidth)
Energy in first formant: Get band energy... floor ceiling
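The band limits and the band energy can be sketched in a few lines. This is a simplified stand-in for Praat's “Get band energy...” query: it sums discrete power values inside the band rather than integrating a spectral density, and the function names and the toy spectrum are hypothetical.

```python
def first_formant_band(f1_hz, bandwidth_hz):
    """Return (floor, ceiling): F1 minus and plus half the formant bandwidth."""
    half = 0.5 * bandwidth_hz
    return f1_hz - half, f1_hz + half


def band_energy(freqs_hz, power, floor_hz, ceiling_hz):
    """Sum the power of all spectral bins whose frequency lies in the band."""
    return sum(p for f, p in zip(freqs_hz, power) if floor_hz <= f <= ceiling_hz)


# Toy spectrum (made-up values): F1 at 500 Hz with a 100 Hz bandwidth.
floor_hz, ceiling_hz = first_formant_band(500.0, 100.0)
energy = band_energy(
    [400.0, 450.0, 500.0, 550.0, 600.0],
    [1.0, 2.0, 4.0, 2.0, 1.0],
    floor_hz,
    ceiling_hz,
)
```

Here the band runs from 450 Hz to 550 Hz, so only the three middle bins contribute.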
Centre of gravity
Centre of gravity: stop filtered at 500Hz
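One way to read this measure is as the power-weighted mean frequency of the spectrum after energy below 500 Hz has been removed. That reading of “stop filtered at 500Hz” is an assumption, as are the function name and the toy spectrum below; Praat's own filtering and weighting options may differ.

```python
def centre_of_gravity(freqs_hz, power, cutoff_hz=500.0):
    """Power-weighted mean frequency of the bins at or above cutoff_hz."""
    kept = [(f, p) for f, p in zip(freqs_hz, power) if f >= cutoff_hz]
    total = sum(p for _, p in kept)
    if total == 0:
        raise ValueError("no spectral energy above the cutoff")
    return sum(f * p for f, p in kept) / total


# Toy spectrum (made-up values): the strong 250 Hz bin is filtered out.
cog = centre_of_gravity([250.0, 1000.0, 2000.0], [10.0, 3.0, 1.0])
```

In this example the result is 1250 Hz: (1000 × 3 + 2000 × 1) / 4. More high-frequency noise, as expected for fricatives, pulls the value upward.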
Variables explained
Dependent Variables
– Energy in first formant and Centre of Gravity
  • Both of these values relate to measurements of both energy and frequency.
  • [ ʁ χ ] ≠ [ ʀ ʁ̞ ]
Independent Variables
– Allophone (2 levels)
  • Approximants [ ʀ ʁ̞ ]
    – expect to have higher values for formant energy, lower values for centre of gravity
  • Fricatives [ ʁ χ ]
    – expect to have lower values for formant energy, higher values for centre of gravity
– Syllable (4 levels)
  • SimpleOnset
  • SimpleCoda
  • ComplexOnset
  • ComplexCoda
Control for Context
– V_V, C_V, V_C
Q1: Syllable position and allophonic variation
Is syllable position related to allophonic variation in Québec French /ʀ/?
The relationship between syllable position and allophonic variation was tested with contingency tables, using χ2 values to evaluate it.
– Isolate two conditions in order to control for either syllable position or context.
Q1 Similar contexts: V_V
Intervocalic context has two syllable positions: SimpleOnset and SimpleCoda
Observed counts, with expected counts in parentheses:
V_V                  SimpleOnset  SimpleCoda  Totals
Approximants [ʀ ʁ̞]   195 (196)    33 (32)     228
Fricatives [ʁ χ]     95 (94)      15 (16)     110
Totals               290          48          338
Intervocalic context (V_V)
– χ2(1) = 0.043, p = 0.836
– odds ratio = 0.93
– equally likely to be Approximant [ʀ ʁ̞] in onset or coda
Q1 Similar contexts: C_V
Post-consonantal context has three syllable positions: SimpleOnset, ComplexOnset, ComplexCoda
Observed counts, with expected counts in parentheses:
C_V                  SimpleOnset  ComplexOnset  ComplexCoda  Totals
Approximants [ʀ ʁ̞]   2 (3)        60 (56)       2 (5)        64
Fricatives [ʁ χ]     8 (7)        143 (147)     18 (15)      169
Totals               10           203           20           233
Post-consonantal context (C_V)
– χ2(2) = 3.788, p = 0.150
– odds ratio = 1.68
– slightly more likely to be [+son] in ComplexOnset than SimpleOnset
Q1 Similar syllables: SimpleOnset
SimpleOnset can occur in two contexts: intervocalic (V_V) or post-consonantal (C_V)
Observed counts, with expected counts in parentheses:
SimpleOnset          V_V        C_V    Totals
Approximants [ʀ ʁ̞]   195 (190)  2 (7)  197
Fricatives [ʁ χ]     95 (100)   8 (3)  103
Totals               290        10     300
– χ2(1) = 9.569, p = 0.002
– odds ratio = 8.2
– more likely to be [ʀ ʁ̞] in V_V than C_V
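The figures for this table can be checked with a short sketch of Pearson's χ2 test and the odds ratio, using only the observed SimpleOnset counts from the slide. No continuity correction is applied, which is an assumption that happens to reproduce the reported statistic; the function names are hypothetical.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


def p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def odds_ratio(table):
    """Cross-product odds ratio of a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


# Rows: approximants, fricatives. Columns: V_V, C_V (SimpleOnset only).
simple_onset = [[195, 2], [95, 8]]
stat, df = chi_square(simple_onset)
p = p_value_df1(stat)
ratio = odds_ratio(simple_onset)
```

With these counts the statistic comes out at about 9.57 on 1 degree of freedom, with p near 0.002 and an odds ratio of about 8.2, in line with the values on the slide.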
Q1 Similar syllables: SimpleCoda
SimpleCodas can occur in two contexts: intervocalic (V_V) or pre-consonantal (V_C).
Observed counts, with expected counts in parentheses:
SimpleCoda           V_V      V_C     Totals
Approximants [ʀ ʁ̞]   33 (26)  0 (7)   33
Fricatives [ʁ χ]     15 (22)  12 (5)  27
Totals               48       12      60
– χ2(1) = 18.333, p < 0.001
– odds ratio = 27.5
– much more likely to be [ʀ ʁ̞] in V_V than V_C
Q1: Syllable position and allophonic variation
Is syllable position related to allophonic variation in Québec French /ʀ/?
– Possibly not.
– In similar contexts, differences between Observed and Expected frequencies are not significant.
Is context related to allophonic variation in Québec French /ʀ/?
– Possibly.
– In similar syllable positions, approximants [ʀ ʁ̞] are more likely to occur intervocalically (V_V) than either post- or pre-consonantally (C_V, V_C).
  • More syllable positions need to be tested to be sure.
Q2: Differences between allophones
Are there measurable differences between allophones of Québec French /ʀ/?
Can differences in the allophones be explained with reference to the syllable?
Differences in related dependent variables (energy in first formant, centre of gravity) according to independent variables (syllable position, allophone) were tested through multivariate analysis of variance (MANOVA).
– Separate statistical models for each of two contexts:
  • Intervocalic (V_V)
  • Post-consonantal (C_V)
– 581 items drawn from corpus
Q2: Summary Statistics
EF1 = energy in first formant; COG = centre of gravity
Context  Syllable      Allophone              EF1            COG
V_V      SimpleOnset   Approximants [ ʀ ʁ̞ ]   1.248 (0.891)  1459 (432)
V_V      SimpleOnset   Fricatives [ ʁ χ ]     0.724 (0.682)  1690 (445)
V_V      SimpleCoda    Approximants [ ʀ ʁ̞ ]   1.067 (0.675)  1258 (303)
V_V      SimpleCoda    Fricatives [ ʁ χ ]     0.859 (0.860)  1350 (296)
C_V      SimpleOnset   Approximants [ ʀ ʁ̞ ]   0.807 (0.615)  1483 (336)
C_V      SimpleOnset   Fricatives [ ʁ χ ]     0.400 (0.403)  1739 (530)
C_V      ComplexOnset  Approximants [ ʀ ʁ̞ ]   0.975 (0.699)  1516 (502)
C_V      ComplexOnset  Fricatives [ ʁ χ ]     0.531 (0.546)  1760 (625)
C_V      ComplexCoda   Approximants [ ʀ ʁ̞ ]   0.876 (0.772)  1106 (338)
C_V      ComplexCoda   Fricatives [ ʁ χ ]     0.435 (0.700)  1751 (550)
Q2 Results: Intervocalic V_V
Q2 V_V: Main effect of syllable position
F(1,334) = 7.478, p < 0.001
Q2 V_V: Main effect of allophones
F(1,334) = 20.607, p < 0.001
Q2 V_V: Differences in centre of gravity by syllable position
Main effect of syllable due to different values for centre of gravity.
F(1,334) = 14.270, p < 0.001
Centre of gravity is lower in SimpleCodas than in SimpleOnsets.
Q2 V_V: Differences in energy in first formant by allophones
Effect of allophones due to different values for energy in first formant.
F(1,334) = 25.605, p < 0.001
Approximants [ʀ ʁ̞] had higher values for energy in first formant than Fricatives [ʁ χ].
Q2 V_V: Differences in centre of gravity by allophones
Effect of allophones due to different values for centre of gravity.
F(1,334) = 18.775, p < 0.001
Approximants [ʀ ʁ̞] had lower values for centre of gravity than Fricatives [ʁ χ].
Q2 Results: Post-consonantal C_V
Q2 C_V: Main effect of allophones
F(2,237) = 15.497, p < 0.001
Q2 C_V: Differences in energy in first formant by allophones
Effect of allophones due to different values for energy in first formant.
F(1,237) = 25.552, p < 0.001
Approximants [ʀ ʁ̞] had higher values for energy in first formant than Fricatives [ʁ χ].
Q2 C_V: Differences in centre of gravity by allophones
Effect of allophones due to different values for centre of gravity.
F(2,237) = 9.757, p = 0.002
Approximants [ʀ ʁ̞] had lower values for centre of gravity than Fricatives [ʁ χ].
Q2: Differences between allophones
Are there measurable differences between allophones of Québec French /ʀ/?
– Possibly yes.
– Approximants [ʀ ʁ̞] had significantly higher values for energy in first formant and lower values for centre of gravity than Fricatives [ʁ χ].
  • This is true for both V_V and C_V items.
Can differences in the allophones be explained with reference to the syllable?
– In this data set, no.
– Only intervocalic simple codas showed a significant difference in centre of gravity.
Future Directions
To continue with the current thread:
– More data from current corpus
  • 60+ hours currently available
– Add data from other natural language corpora
  • Assemblée nationale de France
– Extract more “difficult” syllable positions
  • V_C = ComplexCoda
– Systematically identify allophones
  • Discriminant analysis
To expand in different directions:
– Larger corpora
  • data from other corpora?
– Longitudinal studies of change
  • Assemblée nationale du Québec archives?
– Dialectal variation
  • data from Phonologie du Français Contemporain?
– Other examples
  • schwa omission / insertion?
  • consonant cluster simplification?