Computational Modelling of Sound Pattern Acquisition
February 13-14, 2010

PROGRAM & ABSTRACTS

Department of Linguistics
University of Alberta
Edmonton, AB, Canada


Workshop on Computational Modelling of Sound Pattern Acquisition

Department of Linguistics, University of Alberta
February 13-14, 2010

Saturday, February 13, 2010

8:00-9:00   Registration and check-in
9:00-9:30   Adam Albright (MIT) Well-formedness across the word: Modeling markedness interactions
9:30-10:00  Vsevolod Kapatsinski (U. Oregon) Humans and models learning palatalization patterns in miniature artificial languages: In support of particular salience of typical product characteristics
10:00-10:30 Jeff Heinz (U. Delaware) Learning gradient long-distance phonotactics by estimating strictly piecewise distributions
10:30-10:50 Break
10:50-11:20 Karen Jesney, Joe Pater & Robert Staubs (UMass) Restrictive learning with distributions over underlying representations
11:20-11:50 Bruce Hayes (UCLA) Accidentally-true constraints in phonotactic learning
11:50-1:30  Lunch (not provided)
1:30-2:00   Emily Cliff and Robert Kirchner (U. Alberta) Getting type frequency effects in an exemplar model
2:00-2:30   Andrew Wedel (U. Arizona) Functional load and feedback stabilization of phonemic category distinctions
2:30-3:00   Fred Mailhot (Carleton) Modelling the acquisition of vowel harmony with a lazy learner
3:00-4:30   Poster session I (light refreshments)
4:30-5:00   Jeff Mielke (U. Ottawa) Getting the features you want
5:00-5:30   Janet Pierrehumbert (Northwestern) Predicting variation in the order of acquisition of morphophonological patterns
6:30-?      Dinner/party

Poster Session I

1. Hagen Peukert (U. Kassel) Phonemic distribution of sounds as a basis for word boundary detection in 6- to 8-month-olds
2. Rebecca Morley (OSU) From Sound Change to Grammar Change: words, lexicons, and learners
3. Philip Dilts & Anne-Michelle Tessier (U. Alberta) A computational implementation of Error-Selective Learning
4. Ashley Farris-Trimble (U. Iowa) Modeling the acquisition of cumulative faithfulness effects
5. Keith S. Apfelbaum and Bob McMurray (U. Iowa) When discrimination is not enough: An associative model of the development of phonological cue weighting
6. Frans Adriaans, Natalie Boll-Avetisyan & René Kager (Utrecht) A Lexicon-Free Approach to the Induction of OCP-Place
7. Julian Bradfield (U. Edinburgh) Acquisition and the Complexity of Phonemes and Inventories
8. Bob McMurray (U. Iowa) and Allard Jongman (U. Kansas) Statistical and exemplar approaches to speech perception: What abilities must also develop?
9. Kathryn Pruitt (UMass), Brian Smith (UMass), Andy Martin (UCLA) & Joe Pater (UMass) A Darwinian account of underrepresentation of doubly marked forms

*Abstracts are listed in the program in alphabetical order by presenter.


Sunday, February 14, 2010

9:00-9:30   Elliott Moreton (UNC) Constraint induction and simplicity bias
9:30-10:00  Alan Yu (U. Chicago) The implications of analyzing channel bias rationally
10:00-10:30 Robert Daland (UCLA) Learning metrical segmentation: the problem of function words
10:30-11:45 Poster session II (light refreshments)
11:45-1:15  Lunch (not provided)
1:15-1:45   Benjamin Munson (U. Minnesota), Mary E. Beckman (OSU), Jan Edwards (U. Wisconsin-Madison), Jeff Holliday (OSU), Hannah Julien (U. Minnesota), and Fangfang Li (U. Lethbridge) Modeling the acquisition of anterior lingual sibilant fricatives in English: Integrating behavioral data with computational learning models
1:45-2:15   Paul Boersma & Katerina Chladkova (U. Amsterdam) Phonetic perception-production asymmetries reflect phonological feature structure
2:15-2:45   Michael Becker (Harvard) Target selection in error selective learning
2:45-3:05   Break
3:05-3:35   Andrew Martin (UCLA), Sharon Peperkamp and Emmanuel Dupoux (LSCP) Learning phonemes with a pseudo-lexicon
3:35-4:05   Giorgio Magri (Jean Nicod Institute) An online model of the 'early stage' of the acquisition of phonology

Poster Session II

1. Brian Dillon, Ewan Dunbar, Bill Idsardi (U. Maryland) A single-stage computational model of phoneme category acquisition: Results from Inuktitut
2. Jeffrey Heinz, Cesar Koirala (U. Delaware) Feature-based Generalization
3. Rebecca Colavin, R. Levy, Sharon Rose (UCSD) Modeling OCP-Place with the Maximum Entropy Phonotactic Learner
4. Kristine Yu (UCLA) Linear separability and feature selection in the acquisition of tones
5. Jan-Willem van Leussen (U. Amsterdam) The emergence of natural vowel patterns in a phonetic/phonological acquisition model
6. Jason Naradowsky, Joe Pater, David Smith, Robert Staubs (UMass) Learning Hidden Metrical Structure with a Log-Linear Model of Grammar
7. Tamás Bíró (U. Amsterdam) From Performance Errors to Optimal Competence: Learnability of OT and HG with Simulated Annealing
8. Robert Kirchner (U. Alberta) An exemplar-based speech production model
9. John Alderete, Paul Tupper (SFU) and Stefan Frisch (U. S. Fla.) Phonotactic learning without a priori constraints: A connectionist analysis of Arabic cooccurrence restrictions

*Abstracts are listed in the program in alphabetical order by presenter.


A Lexicon-Free Approach to the Induction of OCP-Place

OCP-Place (McCarthy, 1988) has the gradient effect that labial-labial pairs across a vowel are underrepresented in the lexicons of many languages (Arabic: Frisch, Pierrehumbert, & Broe, 2004; Dutch: Kager & Shatzman, 2007). Previous proposals for the learning of constraints such as OCP-Place assume that phonotactic constraints are the result of abstractions over statistical patterns in the lexicon (Frisch et al., 2004; Hayes & Wilson, 2008). Psycholinguistic studies, however, suggest that constraints on non-adjacent consonants may be learnable from continuous speech input. Specifically, human learners are able to track transitional probabilities of non-adjacent consonants in a stream of artificial continuous speech (Newport & Aslin, 2004; Bonatti, Pena, Nespor, & Mehler, 2005). Such a lexicon-free approach is supported by studies showing that pre-lexical infants already possess probabilistic knowledge of the native language phonotactics (Jusczyk, Luce, & Charles-Luce, 1994) and use phonotactics to learn words from continuous speech (Mattys & Jusczyk, 2001).

The current study extends the lexicon-free approach to the induction of abstract, feature-based phonotactic constraints on non-adjacent consonants. Specifically, we look at whether OCP-Place is learnable from a corpus of transcribed continuous speech. If OCP-Place can be induced from continuous speech, then it may act as a valuable cue during word learning, since the occurrence of consonants sharing place of articulation in the speech stream would signal the presence of a word boundary to the learner.

To simulate the induction of OCP-Place, we use StaGe, a computational model designed to learn phonotactic generalizations from unsegmented input (Adriaans & Kager, to appear). StaGe was trained on non-adjacent consonants in the Spoken Dutch Corpus (Goddijn & Binnenpoorte, 2003). Word boundaries within utterances were removed. The model detects statistical underrepresentations of specific C(V)C sequences. Underrepresentations in the statistical distribution cause the learner to induce a markedness constraint *c1Vc2 for the specific pair of consonants c1, c2. When phonologically similar constraints arise, the learner adds a phonotactic generalization, abstracting over the feature difference between constraints. For example, *mVf, *mVv → *mV{f, v}, etc. The model thus learns specific and abstract constraints. Interestingly, StaGe uses no explicit mechanism for similarity avoidance to learn these constraints. That is, the model does not assess the feature difference between c1 and c2. Rather, the model generalizes over specific underrepresentations in the data, considering only similarities between different consonants in c1-position, and between different consonants in c2-position.
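The induction-plus-generalization loop described above can be sketched roughly as follows. This is a toy illustration of the idea, not the actual StaGe implementation: the observed/expected threshold and the merging rule (collapsing all constraints that share a c1) are simplifications invented for exposition.

```python
from collections import Counter

def induce_constraints(cvc_pairs, threshold=0.5):
    """Induce a specific *c1Vc2 constraint for every statistically
    underrepresented consonant pair (observed/expected below threshold)."""
    pair_counts = Counter(cvc_pairs)
    total = sum(pair_counts.values())
    c1_counts = Counter(c1 for c1, _ in cvc_pairs)
    c2_counts = Counter(c2 for _, c2 in cvc_pairs)
    constraints = set()
    for (c1, c2), observed in pair_counts.items():
        expected = c1_counts[c1] * c2_counts[c2] / total
        if observed / expected < threshold:      # statistical underrepresentation
            constraints.add((c1, frozenset([c2])))
    return constraints

def generalize(constraints):
    """Merge constraints sharing c1 into a class: *mVf, *mVv -> *mV{f, v}.
    (StaGe's actual similarity criterion is feature-based; this merge-by-c1
    rule is a stand-in.)"""
    merged = {}
    for c1, c2s in constraints:
        merged.setdefault(c1, set()).update(c2s)
    return {(c1, frozenset(c2s)) for c1, c2s in merged.items()}

# Toy corpus where m..f is rarer than chance predicts:
data = [("m", "p")] * 10 + [("s", "f")] * 10 + [("s", "p")] * 10 + [("m", "f")]
print(induce_constraints(data))   # only *mVf is induced
```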

An artificial language learning experiment tested whether human learners use OCP-Place in segmentation. Dutch participants were exposed to sequences of CV syllables, where coronals (T) followed two labials (P). The stream contained no statistical cues to word boundaries, and thus had three possible segmentations: TPP, PPT, PTP. We found a significant preference for PTP words (which respect OCP-Place) over PPT and TPP words (which violate it; PTP>PPT: 58%**; PTP>TPP: 55%*), indicating that OCP-Place affects word segmentation.

In a post-hoc analysis we try to characterize the exact phonotactic knowledge that was used by the participants during segmentation. We test the outputs of different models on their ability to predict the human data. Crucially, we compare StaGe to a segment-based statistical learner (Newport & Aslin, 2004) and to a pre-defined OCP-Place. All three models are significant predictors of the human preferences (Statistical Learning: R2=0.3969***; OCP-Place: R2=0.2917**; StaGe: R2=0.5111***). Additional stepwise analyses confirmed that StaGe is the best predictor of the data. Inspection of the constraint set that is learned by StaGe (see Table 1) reveals that the model learns both more specific and more general versions of OCP-Place. To conclude, it seems that a lexicon-free approach to phonotactic learning is able to account for the learning of constraints that resemble (but do not exactly match) OCP-Place. This mix of specific and abstract constraints, learned from continuous speech, provides a better fit to human word segmentation data than models that rely on pure consonant distributions (i.e., without referring to features) or on a single, pre-defined OCP-Place.


Frans Adriaans, Natalie Boll-Avetisyan & René Kager


References

Adriaans, F., & Kager, R. (to appear). Adding generalization to statistical learning: The induction of phonotactics from continuous speech. Journal of Memory and Language.

Bonatti, L. L., Pena, M., Nespor, M., & Mehler, J. (2005). Linguistic constraints on statistical computations. Psychological Science, 16(6), 451-459.

Frisch, S. A., Pierrehumbert, J. B., & Broe, M. B. (2004). Similarity avoidance and the OCP. Natural Language & Linguistic Theory, 22, 179-228.

Goddijn, S., & Binnenpoorte, D. (2003). Assessing manually corrected broad phonetic transcriptions in the Spoken Dutch Corpus. In Proceedings of the 15th ICPhS (pp. 1361-1364). Barcelona.

Hayes, B., & Wilson, C. (2008). A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry, 39, 379-440.

Jusczyk, P. W., Luce, P. A., & Charles-Luce, J. (1994). Infants' sensitivity to phonotactic patterns in the native language. Journal of Memory and Language, 33, 630-645.

Kager, R., & Shatzman, K. (2007). Phonological constraints in speech processing. In B. Los & M. van Koppen (Eds.), Linguistics in the Netherlands 2007 (pp. 100-111). John Benjamins.

Mattys, S. L., & Jusczyk, P. W. (2001). Phonotactic cues for segmentation of fluent speech by infants. Cognition, 78, 91-121.

McCarthy, J. J. (1988). Feature geometry and dependency: A review. Phonetica, 43, 84-108.

Newport, E. L., & Aslin, R. N. (2004). Learning at a distance: I. Statistical learning of non-adjacent dependencies. Cognitive Psychology, 48, 127-162.

Table 1: The induced constraints that were used in the segmentation of the artificial language. Possible segmentations of consonant sequences in the language (e.g., P.PTP, PP.TP, PPT.P) are evaluated using strict domination.

ID   CONSTRAINT                        RANKING
01   *[b]V[m]                          1480.8816
02   *[m]V[p,f]                        1360.1801
03   *[m]V[p,b,f,v]                    1219.1565
04   *[C]V[p,t]                         376.2584
05   *[p,b,f,v]V[p,b,t,d,f,v,s,z]       337.7910
06   *[p,f]V[C]                         295.7494
07   *[C]V[t,s,S]                       288.4389
08   *[p,b,f,v]V[t,d,s,z,S,Z,Ã]         287.5739
09   *[C]V[p,b,t,d]                     229.1519

(V = vowel, C = obstruents = [p, b, t, d, k, g, f, v, s, z, S, Z, x, È, h, Ã])
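The strict-domination evaluation mentioned in the caption can be illustrated with a small sketch (our own toy rendering, not the authors' code; the mini-ranking and the candidate encoding are invented). Each candidate segmentation is scored with a violation profile ordered by constraint rank, and profiles are compared lexicographically.

```python
# Words are written as their consonant skeletons (each adjacent pair is
# separated by a vowel); a constraint is a (c1_class, c2_class) pair.

def violations(words, constraint):
    """Count word-internal c1-V-c2 spans matching the constraint."""
    c1_class, c2_class = constraint
    total = 0
    for w in words:
        for a, b in zip(w, w[1:]):          # consonants separated by a vowel
            if a in c1_class and b in c2_class:
                total += 1
    return total

def best_segmentation(candidates, ranked_constraints):
    """Strict domination: compare violation profiles lexicographically,
    highest-ranked constraint first."""
    return min(candidates,
               key=lambda words: [violations(words, c) for c in ranked_constraints])

# Hypothetical mini-ranking in the spirit of Table 1: a labial-labial
# (OCP-Place-like) constraint outranks a broader one.
ranked = [({"P"}, {"P"}), ({"P", "T"}, {"T"})]
candidates = [("P", "PTP"), ("PP", "TP"), ("PPT", "P")]
print(best_segmentation(candidates, ranked))   # the PTP parse wins
```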


Frans Adriaans, Natalie Boll-Avetisyan & René Kager


Well-formedness across the word: Modeling markedness interactions

Adam Albright, MIT

Many recent studies have attempted to predict the acceptability of phonological structures based on their frequency, or the frequency of similar structures. For example, initial and medial consonant clusters are unsurprisingly judged to be more acceptable if they have higher type frequency (Hay, Pierrehumbert, and Beckman 2004), or if they are featurally similar to high frequency clusters (Hayes & Wilson 2008; Albright 2009; Davidson & Wilson 2008). These studies have focused on modeling the acceptability of a cluster, while controlling for (or ignoring) structures elsewhere in the word. In this talk, I consider evidence that the acceptability of a cluster can change depending on what other structures are present in the word. Such interactions are interesting for computational models of phonological acceptability, because different models make different predictions about the degree of interaction between structures. In OT, the probability of a candidate surface form is determined by its worst violation, predicting no contribution from lower-ranked constraints, and no interaction between constraints. In weighted constraint models such as Harmonic Grammar, as well as in n-gram models, the cost of each violation is assessed independently, and then combined to determine a candidate's probability ('additive interaction'). A final possibility is that the cost of a violation depends on what other violations are present in the word, such that the probability of two structures co-occurring is less than the joint probability of each occurring independently ('super-additive interaction').
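The contrast between the first two interaction types can be made concrete with a toy calculation (invented constraint names, weights, and violation profiles, purely illustrative):

```python
weights = {"*sC": 3.0, "*OCP-noncor": 2.0}    # hypothetical weights
ranking = ("*sC", "*OCP-noncor")              # hypothetical strict ranking

def hg_cost(violation_counts):
    """Additive (Harmonic Grammar-style) cost: weighted sum of violations."""
    return sum(weights[c] * n for c, n in violation_counts.items())

def ot_worst(violation_counts):
    """Strict domination: only the highest-ranked violated constraint matters."""
    for c in ranking:
        if violation_counts.get(c, 0) > 0:
            return c
    return None

spap = {"*sC": 1, "*OCP-noncor": 1}   # doubly marked form (cf. *spap)
stip = {"*sC": 1}                     # violates only the cluster constraint

# Additively, the doubly marked form costs exactly the sum of its parts;
# a super-additive penalty would exceed this.
assert hg_cost(spap) == hg_cost(stip) + weights["*OCP-noncor"]
# Under strict ranking both forms share the same worst violation, so the
# lower-ranked constraint contributes nothing at all.
assert ot_worst(spap) == ot_worst(stip) == "*sC"
```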

Super-additive effects are not predicted by any standard model of constraint interaction. However, there are indications that they do, in fact, occur. Frisch (1996) points out that the lack of words such as *spap and *skick in English (Davis 1984) may be interpreted as a super-additive interaction of a *sC cluster constraint and a *[-cor]...[-cor] OCP constraint. I present data from lexical counts showing a wide range of cases in which combinations of certain clusters are underattested relative to what we would expect, given the joint probability of the two clusters independently. For example, s+sonorant clusters (#sw, #sm, #sn) have low type frequency in English, as do final lC clusters (lk#, lp#, lf#, etc.); therefore, words with combinations of the two (sNVlC) are expected to be quite uncommon. Counts from CELEX reveal that such words are even rarer than expected (Fisher's exact, p < .0001); in fact, the sole monomorphemic example is smelt. Frisch provides a diachronic account of such effects, according to which words with improbable combinations are less likely to be created or retained. If this is correct, then a simple additive model such as Harmonic Grammar would suffice to model the computation of acceptability. I present experimental evidence showing that this effect is not purely diachronic; in fact, speakers judge non-words like swilk, snelp, or smulf as less acceptable than would be predicted based on the acceptability of simpler forms such as swick, snell, and mulf. Crucially, however, not all combinations of clusters show super-additive effects. Relatively common clusters, such as initial voiceless stops + liquids, or final st#, co-occur approximately as often as expected, and the acceptability of non-words like crast or prust follows from the acceptability of their subparts.
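The notion of "rarer than expected" rests on a simple joint-probability calculation, sketched here with invented counts (not the CELEX figures from the talk):

```python
# If onsets and codas combined freely, the expected number of sNVlC words
# would be N * p(sN onset) * p(lC coda).

def expected_count(n_words, n_with_onset, n_with_coda):
    """Expected co-occurrence count under independence of the two clusters."""
    return n_words * (n_with_onset / n_words) * (n_with_coda / n_words)

# Hypothetical lexicon: 10,000 words, 200 with s+sonorant onsets,
# 300 with lC codas.
expected = expected_count(10_000, 200, 300)
observed = 1                          # cf. the sole monomorphemic "smelt"
assert abs(expected - 6.0) < 1e-9     # about 6 words expected under independence
assert observed < expected            # underattestation: a super-additive gap
```

A test such as Fisher's exact then asks whether the observed shortfall could plausibly arise by chance.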

The challenge, then, is to provide a computational model that penalizes certain constraint violations more in the presence of another violation. I propose a model in which acceptability judgments arise through a combination of two levels of evaluation: (1) a non-grammatical evaluation of phonotactic probability, which assesses the joint probability of the substrings in a word, and (2) evaluation by a grammar of weighted constraints, further penalizing sequences that violate high-weighted constraints. For grammatically licit combinations such as #kr and st#, acceptability is determined by simple joint probability. For grammatically penalized clusters such as #sw or lk#, phonotactic probability and grammatical probability combine to yield super-additive effects. I sketch a model in which learners factor out phonotactic probability in learning weights of grammatical constraints.

Adam Albright


When discrimination is not enough: An associative model of the development of phonological cue weighting

Typical approaches to phonological development emphasize discrimination abilities as indicators of phonological knowledge. Werker and Tees (1984) showed that infants tune their phonological abilities to their native language during this time, such that by 12 months infants can discriminate distinctions that cross phonetic boundaries used in their native language, but can no longer discriminate contrasts that are not used. As a result, it is widely assumed that by 12 months, infants have acquired the phonological categories of their native language. However, 14-month-old infants show deficits in applying phonological knowledge in word learning. Stager and Werker (1997) found that despite intact discrimination abilities, when learning to associate minimal pair words (bih and dih) with novel visual referents infants fail to treat the words as distinct. They interpreted this as evidence of task difficulty blocking the application of intact phonological knowledge. Numerous other accounts have maintained assumptions of fully developed phonological abilities, with the effects emerging from factors outside phonology (Pater, Stager & Werker, 2004; Swingley & Aslin, 2007; Werker, Cohen, Lloyd, Casasola & Stager, 1998).

Recent work calls these interpretations into question. Rost and McMurray (2009) found that using variable training exemplars improves word learning abilities. Specifically, when infants are exposed to words with variability in dimensions that are non-criterial for word identity (e.g. pitch and timbre), they are better able to learn minimal pair word referents. This suggests that failures to learn minimal pair words result from more than difficulty using intact phonological abilities, as the variable training set makes the learning task more difficult, not less. Perhaps, then, there is more to phonological development than discrimination can reveal.

We present an associative model examining these results (Figure 1). This model assumes infants are equally willing to associate criterial (e.g. VOT) cues and non-criterial (e.g. pitch) cues with word identity. The model employs Hebbian learning to learn auditory-visual pairings. During training, an auditory word is presented as a series of activations across an array of acoustic dimensions (VOT, pitch, timbre), while a visual object is represented as activation of a localist visual unit. The model strengthens connection weights between coactivated auditory and visual units, while anti-correlated units decrement connection weights. Through these basic associative mechanisms, the model comes to mirror empirical patterns of word learning and suggests that difficulties at 14 months may result from incomplete phonological development. When trained without variability, the single speaker's voice becomes associated with both referents, making it difficult for the model to discriminate them. However, when the speaker is variable during training, each speaker is only weakly associated with the visual referents, and thus contributes little to the ability to discriminate the cues. This emergent cue-weighting allows the model to simulate a number of results in the empirical literature. Using these same assumptions, the model also captures patterns of data consistent with mispronunciation effects in early word learning (Ballem & Plunkett, 2005; White & Morgan, 2008). It shows sensitivity to different degrees of phonological mismatch, and maps well onto the empirical data. Additionally, we present a dynamic version of the network that uses a more realistic architecture, to explore the generality of our associative account of incomplete phonological development.
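The Hebbian update described above can be sketched as follows. This is our reading of the abstract with an invented localist encoding and learning rate; the real model uses graded activations over VOT, pitch, and timbre rather than binary units.

```python
def hebbian_train(trials, n_visual, rate=0.1):
    """Hebbian learning of auditory-visual pairings: units coactive with a
    visual referent gain weight; units active while that referent is absent
    (anti-correlated) lose weight."""
    w = {}                                         # w[(auditory_unit, visual_unit)]
    for units, visual in trials:
        for u in units:
            for v in range(n_visual):
                sign = 1.0 if v == visual else -1.0
                w[(u, v)] = w.get((u, v), 0.0) + rate * sign
    return w

# One speaker (constant pitch), minimal pair buk/puk -> referents 0/1:
buk = {"VOT:short", "pitch:high"}
puk = {"VOT:long", "pitch:high"}
w = hebbian_train([(buk, 0), (puk, 1)] * 10, n_visual=2)

# The criterial cue (VOT) comes to separate the referents, while the cue
# shared by every trial ends up with no discriminative weight at all:
assert w[("VOT:short", 0)] > 0 > w[("VOT:short", 1)]
assert abs(w[("pitch:high", 0)]) < 1e-9
```

In this toy form the cue weighting emerges purely from the correlational statistics of training, which is the point the abstract makes about variability.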

The results from our simulations encourage a drastic rethinking of early phonological abilities. Despite strong discrimination abilities, we suggest that infants are unaware which dimensions are meaningful for word identification. Though 14-month-old infants may have learned how to categorize across phonological dimensions, they are not yet aware how to weight different cues, and this ability emerges over simple associations with specific cue values. Learning where to categorize boundaries is only part of the task of phonological development; our model suggests that the additional task of learning how to use this information is still developing early in the second year of life.

Keith S Apfelbaum and Bob McMurray


References

Ballem, K.D. & Plunkett, K. (2005). Phonological specificity in children at 1;2. Journal of Child Language, 32(1), 159-173.

Pater, J., Stager, C.L. & Werker, J.F. (2004). The lexical acquisition of phonological contrasts. Language, 80, 361-379.

Rost, G.C. & McMurray, B. (2009). Speaker variability augments phonological processing in early word learning. Developmental Science, 12(2), 339-349.

Stager, C.L. & Werker, J.F. (1997). Infants listen for more phonetic detail in speech perception than in word-learning tasks. Nature, 388(6640), 381-382.

Swingley, D. & Aslin, R.N. (2007). Lexical competition in young children's word learning. Cognitive Psychology, 54(2), 99-132.

Werker, J.F., Cohen, L.B., Lloyd, V.L., Casasola, M., & Stager, C.L. (1998). Acquisition of word-object associations by 14-month-old infants. Developmental Psychology, 34(6), 1289-1309.

Werker, J.F. & Tees, R.C. (1984). Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development, 7, 49-63.

White, K.S. & Morgan, J.L. (2008). Sub-segmental detail in early lexical representations. Journal of Memory and Language, 59(1), 114-132.

Figure 1: Schematic of the associative model used to model phonological cue weighting. (Two panels, each showing auditory units for VOT, pitch, and timbre connected to localist visual units for the referents buk and puk.)

Keith S Apfelbaum and Bob McMurray


Target selection in Error Selective Learning

Target selection is a rarely discussed phenomenon in phonological acquisition. In target selection, children who have a certain markedness pressure in their phonology (e.g. all words must be trochaic) use it not only to whittle down prosodically marked words, as is often observed, but also to target words that have these marked structures in the first place. This paper offers an analysis of target selection in terms of Error Selective Learning (Tessier 2007, 2009; Becker & Tessier 2010), in which a single markedness constraint drives first the avoidance and then the simplification of adult forms.

The phenomenon: Children don't only impose restrictions on words they produce, they also choose adult forms that happen to better conform to the same restrictions (Schwartz & Leonard 1982; Schwartz et al. 1987). This paper focuses on patterns of truncation in natural and elicited speech of a Hebrew-speaking child reported by Adam & Bat-El (2007). The child is observed to produce trochees (SW) faithfully more often than iambs (WS), as in (1). The same dispreference for iambs is also seen in his attempts: iambs are 44% of his attempts at 1;02, and the proportion of iambs rises gradually to 66% at 1;07.

(1) Productions and attempts of polysyllabic targets (Adam & Bat-El 2007)

    Age                  productions            attempts
                       WS targets  %[WS]     WS targets  %[WS]     % WS targets
    1;02.00-1;03.05         9       56%           7       86%          44%
    1;03.14-1;04.24        43       14%          29       48%          40%
    1;05.04-1;05.08        39       10%          29       28%          43%
    1;05.15-1;05.29        35       11%          57       28%          62%
    1;06.02-1;06.20        49        2%          55       29%          53%
    1;06.26                26        0%          53       11%          67%
    1;07.02-1;07.09        51        4%          99        4%          66%

Adult Hebrew is predominantly iambic (~76% of the vocabulary, Bolozky & Becker 2007). So this child is gradually attempting an increasingly representative sample of the adult vocabulary, as markedness pressures allow. In other words, a single markedness pressure against iambs causes iambic words either to be truncated or to be avoided altogether.

Analysis: Avoidance is modeled here by choosing the null candidate as the output, which violates MPARSE (Prince & Smolensky 1993/2004, et seq.). The child starts out with the initial-state grammar in (2), which means that the child is silent, as is indeed observed.

(2) M >> F >> MPARSE

Since the null output violates no markedness constraints, its comparison with the adult form is informative, as in (3).

(3)                        M     F     MPARSE
    a. adult form ~ ∅      L            W

The grammar is now as in (4), meaning that the same markedness constraint that earlier caused avoidance now causes simplification of the adult form.

(4) MPARSE >> M >> F


Michael Becker


Now the child can pair their simplified form with the adult form, and learn from the new error:

(5)                                      MPARSE     M     F
    a. adult form ~ simplified form                 L     W

At this point the child has the evidence they need to reach the adult grammar:

(6) F >> MPARSE >> M

While this analysis offers a complete path of learning, it proceeds in quantum leaps: the grammar changes abruptly, whereas the observed child productions change gradually. This is fixed in Error Selective Learning by storing the child's errors in a "Cache", where they gradually decay over time and allow productions to slowly approach the adult outputs.

This learning mechanism is implemented as a computer program that takes as input a representative adult lexicon, and an initial-state grammar. The learner gradually acquires this lexicon and attempts to produce it; the attempts become gradually more successful as Markedness is first demoted below MPARSE and later below Faithfulness.

Conclusions: This paper shows how a single Markedness pressure first causes avoidance of words that have a marked structure, and later unfaithful productions of these same words. The computational implementation of the model generates the observed pattern, in which the child's early attempts reflect phonologically-biased subsets of the lexicon, while later attempts become increasingly more representative, and the productions become increasingly more adult-like.

References

Adam, Galit & Outi Bat-El (2007). The trochaic bias is universal: Evidence from Hebrew. Handout from Generative Approaches to Language Acquisition, Barcelona.

Becker, Michael & Anne-Michelle Tessier (2010). Trajectories of faithfulness in child-specific phonology. Talk presented at the 84th Annual Meeting of the LSA, Baltimore, MD.

Prince, Alan & Paul Smolensky (1993/2004). Optimality Theory: Constraint Interaction in Generative Grammar. Oxford: Blackwell. [ROA-537]

Schwartz, Richard G. & L. Leonard (1982). Do children pick and choose? An examination of phonological selection and avoidance in early lexical acquisition. Journal of Child Language 9. 319-336.

Schwartz, Richard G., L. Leonard, D. Loeb & L. Swanson (1987). Attempted sounds are sometimes not: An expanded view of phonological selection and avoidance. Journal of Child Language 14. 411-418.

Tessier, Anne-Michelle (2007). Biases and stages in phonological acquisition. Ph.D. dissertation, University of Massachusetts, Amherst.

Tessier, Anne-Michelle (2009). Frequency of violation and constraint-based phonological learning. Lingua 119. 6-38.


Michael Becker


From Performance Errors to Optimal Competence: Learnability of OT and HG with Simulated Annealing

Tamás Bíró, ACLC, University of Amsterdam, [email protected]

A self-evident, and yet too often ignored fact about (child) language acquisition is that the learner acquiring her linguistic competence is exposed to the teacher's linguistic performance – hence, also to performance errors, fast speech forms, or other variations. The performance pattern, which may be more complex than "simple random noise", could in theory render the learning problem extremely difficult, but a clever learning algorithm could also make use of the errors, thereby enriching the allegedly poor stimulus.

The computational approach employed in this paper has a threefold structure. Linguistic competence (both of the teacher and of the learner) is modelled either by standard Optimality Theory (Prince and Smolensky 1993), or by a symbolic Harmonic Grammar with exponential weights (as discussed, for instance, in Bíró 2009a). Performance patterns are produced either by always taking the most harmonic form, or by symbolic simulated annealing (Bíró 2006), an algorithm introducing performance errors as a function of the "speech rate". Finally, online learning employs either Paul Boersma's update rule (Boersma 1997), or Giorgio Magri's (2009).
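The simulated-annealing production step can be sketched generically as follows (not Bíró's SA-OT implementation; the cost function, neighbourhood, and cooling schedule are placeholders): a random walk over neighbouring candidates that sometimes accepts worse candidates, with faster cooling (a higher "speech rate") leaving more performance errors.

```python
import math
import random

def anneal(start, neighbours, cost, t_start=10.0, t_step=0.1, rng=random):
    """Simulated annealing over a candidate set: accept a worse neighbour
    with probability exp(-delta/t); larger t_step = faster cooling = more errors."""
    current = start
    t = t_start
    while t > 0:
        cand = rng.choice(neighbours(current))
        delta = cost(cand) - cost(current)
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = cand              # sometimes a worse form is accepted
        t -= t_step                     # cooling schedule ("speech rate")
    return current

# With a single always-better neighbour the walk is deterministic and greedy:
assert anneal(3, lambda x: [x - 1], abs, t_start=1.0, t_step=0.5) == 1
```

With a rugged cost landscape and fast cooling, the walk can get stuck on locally optimal but globally suboptimal forms, which is how SA-OT models performance errors.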

The grammar ("phenomenon") studied is the abstract string grammar proposed by Bíró (2007), arguably mimicking a simple but typical phonological grammar. As several constraint rankings or weight families correspond to the same language, the learner is not expected to converge to the teacher's competence (grammar), but to his performance (distribution of forms). In particular, the learner's distance from the teacher is measured by the Jensen-Shannon divergence between a sample of the teacher's performance pattern and a sample of the learner's performance pattern. The learner is said to have learnt the target language if this distance is smaller than the divergence of two random samples of the same size produced by the teacher.
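The convergence criterion can be sketched as follows (a generic Jensen-Shannon implementation over empirical sample distributions, base-2 logs assumed; illustrative, not the paper's code):

```python
import math
from collections import Counter

def js_divergence(sample_a, sample_b):
    """Jensen-Shannon divergence between the empirical distributions of two
    samples of produced forms (0 for identical distributions, 1 bit for
    disjoint ones)."""
    ca, cb = Counter(sample_a), Counter(sample_b)
    na, nb = sum(ca.values()), sum(cb.values())
    support = set(ca) | set(cb)
    p = {x: ca[x] / na for x in support}
    q = {x: cb[x] / nb for x in support}
    m = {x: 0.5 * (p[x] + q[x]) for x in support}

    def kl(r, s):
        return sum(r[x] * math.log2(r[x] / s[x]) for x in r if r[x] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Identical samples diverge by 0; disjoint samples by 1 bit.
assert js_divergence(list("aabb"), list("aabb")) == 0.0
assert abs(js_divergence(["a"] * 4, ["b"] * 4) - 1.0) < 1e-12
```

The learner's sample-vs-teacher divergence is then compared against the divergence between two teacher samples of the same size, which serves as the noise floor.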

Table 1 summarizes the results of an initial experiment (Bíró 2009b). Magri's approach is significantly faster than Boersma's. If performance errors are present, then learning OT is faster than learning HG. Yet, we do not want to draw far-reaching conclusions from this toy grammar; so the talk will focus more on methodological issues of this novel approach, such as the "stability" of the learning process, its dependence on the initial conditions and the order of the learning data, etc.


Tamás Bíró


                     OT             10-HG          4-HG           1.5-HG
always gramm.    M   13; 27; 45     13; 28; 46     12; 27; 48     15; 30; 47
                 B   23; 43; 65     22; 41; 64     22; 42; 64     23; 40; 60
sim. annealing,  M   53; 109; 233   63; 140; 328   60; 148; 366   83; 199; 508
tstep = 0.1      B   80; 171; 462   92; 240; 772   92; 239; 785   117; 290; 694
sim. annealing,  M   64; 131; 305   62; 134; 304   63; 137; 329   72; 163; 437
tstep = 1        B   90; 212; 560   92; 233; 572   84; 212; 646   101; 242; 616

Table 1: Comparing four competence models (OT vs. exponential HG with different bases), three performance algorithms (always the grammatical form vs. simulated annealing with different production speeds) and two learning methods (Boersma's update rule vs. Magri's update rule). For each combination, 2000 learning experiments were conducted, measuring the number of learning steps until convergence. A cell contains the 1st quartile, the median and the 3rd quartile of the distribution of these learning steps.

References

Tamás Bíró (2006). Finding the Right Words: Implementing Optimality Theory with Simulated Annealing. PhD thesis, University of Groningen. Also as ROA-896.

Tamás Bíró (2007). 'The benefits of errors: Learning an OT grammar with a structured candidate set'. In Proceedings of the Workshop on Cognitive Aspects of Computational Language Acquisition, pages 81–88, ACL Prague, June 2007. ROA-929.

Tamás Bíró (2009a). 'Elephants and Optimality Again. SA-OT accounts for pronoun resolution in child language'. To appear in: Selected papers from CLIN 19. ROA-1038.

Tamás Bíró (2009b). 'Learning Competence from Performance Data'. Poster presented at the KNAW Academy Colloquium on Language Acquisition and Optimality Theory, July 2–3, 2009, Amsterdam. http://www.birot.hu/publications/Biro-KNAW-poster-2009.pdf.

Paul Boersma (1997). 'How we learn variation, optionality, and probability'. IFA Proceedings 21: 43–58.

Giorgio Magri (2009). 'New update rules for on-line algorithms for the Ranking problem in Optimality Theory'. Handout, LMA workshop, DGfS 31, Osnabrück, March 2009.

Alan Prince and Paul Smolensky (1993/2004). Optimality Theory: Constraint Interaction in Generative Grammar. Rutgers University Center for Cognitive Science Technical Report 2. ROA-537. Malden, MA & Oxford: Blackwell.





Phonetic perception-production asymmetries reflect phonological feature structure

1. Observed production vs. observed perception: diagonal vs. horizontal category boundaries. We observe in the languages of the world that in the production of a vowel, the distribution of its tokens in the F1-F2 space shows large variation both in the F1 and the F2 direction, i.e., the distributions of e.g. /e/ and /i/ overlap both in F1 and in F2, as shown in Fig. 1 (left). As a result, the production boundary between /e/ and /i/ is diagonal, as shown in the Figure; this boundary is defined as those vowel tokens that had been intended equally likely as /e/ and as /i/. Acoustic analyses of vowel productions confirm that these boundaries are indeed diagonal (Peterson and Barney 1952, Hillenbrand et al. 1995, Adank et al. 2004, Strange et al. 2007, Escudero et al. 2009).

If a listener hears a certain F1-F2 pair, her optimal perception strategy is to perceive this F1-F2 pair as the vowel category that was most likely intended by the speaker; in this way, the listener can minimize her perception errors. With this optimal perception strategy, the category boundaries in perception (i.e., the tokens that have an equal chance of being perceived as either of the two neighbouring vowel categories) will correspond to the category boundaries in production, that is, the perceptual boundaries will be diagonal, just as the production boundaries in Fig. 1 (left).
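The optimal perception strategy amounts to maximum-likelihood classification in the F1-F2 plane. A minimal sketch (our construction; the Gaussian production model, means, and shared standard deviation are invented for illustration):

```python
# Classify an (F1, F2) token as the category most likely to have produced
# it, assuming spherical Gaussian production distributions per vowel.
VOWELS = {"i": (300, 2300), "e": (450, 2100)}  # illustrative mean (F1, F2), Hz
SD = 80.0  # illustrative shared standard deviation on both dimensions

def perceive(f1, f2):
    def log_lik(mean):
        m1, m2 = mean
        return -((f1 - m1) ** 2 + (f2 - m2) ** 2) / (2 * SD ** 2)
    return max(VOWELS, key=lambda v: log_lik(VOWELS[v]))

print(perceive(370, 2250))  # closer to /i/ on both cues
```

With equal spherical distributions the category boundary is the perpendicular bisector of the two means, so when categories differ in both F1 and F2 the predicted perceptual boundary runs diagonally, matching the production boundary.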

This correspondence between the shapes of the perceptual boundaries and the shapes of the produced boundaries is, however, not what we observe in real language users. From listening experiments (Savela 2009, Chistovich et al. 1966) we can see that while the perceptual boundaries between /a/ and /e/ and between /a/ and /o/ are indeed diagonal (in five-vowel systems), the perceptual boundaries between /i/ and /e/ and between /o/ and /u/ are typically horizontal; this is shown in Fig. 1 (right). In other words, for the distinction between high and mid vowels human listeners ignore the F2 cue, although this cue is utilized in their language environment. This discrepancy between perception and production, which seems not to have been noticed before, calls for an explanation. Here we propose a plausible explanation in terms of phonetically based phonological features, supported by computer simulations with artificial language users.

2. Computational modelling with phonemes: diagonal perceptual boundaries. The perception of the vowel distributions in Figure 1 has been modelled within OT, namely in terms of cue constraints such as “an F1 value of [x] is not the phonological vowel category /e/” and “an F2 value of [y] is not the phonological vowel category /i/” (Boersma and Escudero 2008). These cue constraints exist for all possible values of F1 and F2 and for all five vowel categories, and can thus be summarized as the connections in Fig. 2 (left). In the virtual infant’s initial state, all cue constraints are ranked at the same height; this virtual baby is then fed with combinations of F1, F2 and the correct vowel category, and a simulated error-driven perceptual learning procedure (Boersma 1997) causes the cue constraints to become ranked in an optimal way. Fig. 3 (left) shows the ultimate perceptual behaviour of one typical virtual learner: all perceptual boundaries have become diagonal. We conclude that the phonemic cue model of Fig. 2 (left) does not suffice to explain how real human listeners behave.
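The error-driven re-ranking step can be sketched in a few lines (ours, not the Boersma 1997 implementation; the cue and category names are invented, and the continuous ranking values with evaluation noise of the real GLA are omitted):

```python
# Minimal error-driven cue-constraint learning: constraints of the form
# "cue value x is not category C" start near the same rank; after a
# misperception, the constraint punishing the correct category is demoted
# and the one punishing the wrongly chosen category is promoted.
CATS = ["i", "e"]
rank = {(cue, cat): 100.0 for cue in ("F1-low", "F1-high") for cat in CATS}
rank[("F1-low", "e")] -= 2.0   # invented initial bias: low F1 misheard as /e/

def perceive(cue):
    # the category whose "cue is not C" constraint is ranked lowest wins
    return min(CATS, key=lambda cat: rank[(cue, cat)])

def learn(cue, correct, step=1.0):
    heard = perceive(cue)
    if heard != correct:               # error-driven: update on mismatch only
        rank[(cue, correct)] -= step   # demote constraint against the truth
        rank[(cue, heard)] += step     # promote constraint against the error

assert perceive("F1-low") == "e"       # before training: wrong
for _ in range(20):                    # in the input, low-F1 tokens are /i/
    learn("F1-low", "i")
print(perceive("F1-low"))  # -> i
```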

3. Computational modelling with features: realistic horizontal boundaries. Unlike Boersma and Escudero (2008), who modelled the phonological surface structure in terms of five unanalysed vowel phonemes, we decided to analyse the vowels as combinations of six features instead: we replace the vowel phoneme /a/ with the feature combination /low, central/, /e/ with /mid, front/, /i/ with /high, front/, /o/ with /mid, back/, and /u/ with /high, back/. The vowel perception process is then modelled with six families of featural cue constraints, namely “an F1 of [x] is not /high/”, “an F1 of [x] is not /mid/”, “an F1 of [x] is not /low/”, “an F2 of [y] is not /front/”, “an F2 of [y] is not /central/”, and “an F2 of [y] is not /back/”, as summarized in Fig. 2 (right). The simulated listener now ends up with the realistic perceptual behaviour of Fig. 3 (right).

This result can be understood as follows. A pair of vowels that differ in two features allows cue trading of F1 and F2, and can therefore share a diagonal boundary; this happens between /a/ and /e/ and between /a/ and /o/. A pair of vowels that differ in only one feature allows only F1 or F2 as a distinguishing cue, and can therefore only share a horizontal boundary (between /e/ and /i/ and between /o/ and /u/) or a vertical boundary (between /e/ and /o/ and between /i/ and /u/).

4. Conclusion. If cue constraints connect to features rather than to phonemes, simulated listeners will show the same behaviour as humans. This result is robust: the same result emerges if Optimality Theory is replaced with Harmonic Grammar or with the perceptron.

If human perception can indeed be modelled in terms of cue connections, we have found direct evidence for the existence of phonological features. In general, we have provided a method for detecting phonological structure from asymmetries between phonetic perception and production.

Paul Boersma and Katerina Chladkova



Fig. 1. Left: Distribution of vowel tokens in a typical five-vowel system. Dark grey disks denote one standard deviation, light grey disks two standard deviations. The lines denote the production boundaries between pairs of vowels along the front or back edges of the vowel space. Right: Vowel perception in a typical five-vowel system. The lines denote the perceptual boundaries between pairs of vowels along the front or back edges of the vowel space (the disks show the production, for reference).

Fig. 2. Phonemic cue connections (left) and featural cue connections (right).

Fig. 3. Predicted perception if categories are phonemes (left) or features (right), computed by running F1-F2 pairs through the final simulated perception grammar. The disks show the production, for reference.

Adank, P., Van Hout, R., and Smits, R. (2004). An acoustic description of the vowels of Northern and Southern standard Dutch. JASA 116: 1729–1738.

Boersma, P. (1997). How we learn variation, optionality, and probability. IFA Proceedings 21: 43–58.

Boersma, P., and Escudero, P. (2008). Learning to perceive a smaller L2 vowel inventory: an Optimality Theory account. In P. Avery, E. Dresher & K. Rice (eds.) Contrast in phonology: theory, perception, acquisition. Berlin & New York: Mouton de Gruyter. 271–301.

Chistovich, L., Fant, G., and de Serpa Leitao, A. (1966). Mimicking and perception of synthetic vowels, part II. STL-QPSR 7 (3), 1–3.

Escudero, P., Boersma, P., Rauber, A.S., and Bion, R.A.H. (2009). A cross-dialect acoustic description of vowels: Brazilian and European Portuguese. JASA 126: 1379–1393.

Hillenbrand, J., Getty, L.A., Clark, M.J., and Wheeler, K. (1995). Acoustic characteristics of American English vowels. JASA 97: 3099–3111.

Peterson, G.E., and Barney, H.L. (1952). Control methods used in a study of vowels. JASA 24: 175–184.

Savela, J. (2009). Role of selected spectral attributes in the perception of synthetic vowels. Ph.D. dissertation, University of Turku.

Strange, W., Weber, A., Levy, E.S., Shafiro, V., Hisagi, M., and Nishi, K. (2007). Acoustic variability within and across German, French, and American English vowels: phonetic context effects. JASA 122: 1111–1129.

[Figure content not reproducible in plain text. Fig. 1: F1-F2 vowel charts labelled "HUMAN PRODUCTION" and "HUMAN PERCEPTION", with the vowels /a e i o u/ plotted against F1 (vertical) and F2 (horizontal). Fig. 2: connections from the F1 and F2 cues to the phonemes /a/ /e/ /i/ /o/ /u/ (left) and to the features /high/ /mid/ /low/ /front/ /central/ /back/ (right). Fig. 3: character maps of the predicted percept at each F1-F2 point.]



Acquisition and the Complexity of Phonemes and Inventories

The first question in any computational simulation of phoneme acquisition is: what is a phoneme, and what structure is placed on the space of phonemes? For some purposes, the phoneme space might be as simple as a discrete set onto which any stimulus can be mapped. In others, such as de Boer's (2001) well known investigations into the evolution of vowel systems, the phoneme space is a two or three dimensional continuous space, equipped with a metric, and phonemes are points, clusters (in exemplar theory) or distributions (in statistical models) within this space. In the case of consonants, it is less obvious how to do this, as the mapping between articulatory, acoustic and perceptual space is complex and far from one–one. Given that there is some evidence that infants recognize and generalize some features (such as VOT) better than others (such as place of articulation), one option might be to place consonant phonemes in feature space. On the other hand, the ability to generalize over features appears to degrade, perhaps contributing to the difficulty of acquiring new phonemes later in life, which suggests that at some point the mental phoneme space becomes less structured.

The question of the structure of phoneme space becomes especially interesting in the case of languages with highly complex inventories, such as the Caucasian languages and the San languages – particularly since those inventories are usually rather neatly structured when laid out in IPA-style charts. The San language !Xoo has perhaps the world's most complex vowel inventory. According to Traill (1985), the phonetic vowel space comprises a five-vowel system combined (with some restrictions) with any combination of three distinctive voice qualities and also nasalization. Some simplification occurs on a plausible classical phonemic analysis, but there still remain 37 distinct vocalic phonemes, some of which are so rare that they may well not be encountered for some years, if at all. A similar problem arises with the better known click inventory.

In a recent conference presentation, Bradfield (2009) suggested that even in adulthood, the !Xoo vowel inventory is better understood as being structured into concurrent 'phonemes', with each 'voice quality' being a 'phoneme' (in the style of autosegmental phonology, but less drastic). He suggested that this would ease the otherwise challenging acquisition problem, but supported the suggestion only with intuition. In this presentation, we present a first attempt to produce some quantitative information to check this idea.

We set up a simulation in the style of de Boer, with a single 'adult' speaker (representing a population of similar adults), and 'learners' trying to acquire the adult's inventory. De Boer, following Steels (1997) and others, uses 'imitation games' – the learners try to match a vowel said by the adult, and receive extra-linguistic feedback on whether they succeeded; if not, they adjust their phoneme inventory by shifting a vowel or adding a new vowel, depending on various criteria.

Our previous experience with trying to replicate de Boer's work has suggested that his relatively detailed modelling of articulation and acoustics is not crucial to the trend of the results, and moreover we are concerned only with the single issue of the topology of the phoneme space. We therefore cut down the model by mapping articulatory space directly into acoustic and perceptual space. (Because of the way the agents modify their inventories, this is not as strong an assumption as it sounds.) We then set up three models: (1) a simple 5-vowel system (reduced to height and backness, ignoring rounding); (2) a 40-vowel system comprising 5 vowels together with any combination of 3 voice qualities, where the learners do not structure the phoneme space (i.e. they are simply learning 40 vowels in a 5-D space), which corresponds to a simplified version of Traill's (1985) phonemic analysis; (3) the same number of phonetic vowels, but with learners who identify voice qualities independently of vowel quality. We then measured the average number of interactions required for a learner to acquire fully the adult inventory.
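The factorization argument can be illustrated with a much cruder counting model than the imitation games used here (entirely our toy: learners merely need to witness each category once, with no feedback or inventory adjustment):

```python
import random

# Why a factored inventory is faster to acquire: the unstructured learner
# must witness each of the 5*8 = 40 vowels (5 qualities x 8 combinations of
# 3 binary voice settings), while the structured learner only needs to
# witness each of the 5 qualities and each of the 8 voice settings, in any
# combination (coupon-collector style).
random.seed(0)
QUALITIES, VOICES = range(5), range(8)

def interactions_until_complete(structured):
    seen_q, seen_v, seen_pairs, steps = set(), set(), set(), 0
    while True:
        steps += 1
        q, v = random.choice(QUALITIES), random.choice(VOICES)
        seen_q.add(q); seen_v.add(v); seen_pairs.add((q, v))
        done = (seen_q == set(QUALITIES) and seen_v == set(VOICES)
                if structured else len(seen_pairs) == 40)
        if done:
            return steps

runs = 200
flat = sum(interactions_until_complete(False) for _ in range(runs)) / runs
fact = sum(interactions_until_complete(True) for _ in range(runs)) / runs
print(flat > 3 * fact)  # the factored learner needs far fewer exposures
```

The gap (coupon collecting over 40 cells versus over 5 + 8 factors) parallels the roughly twenty-fold versus four-fold slowdowns reported overleaf, though the mechanism in the actual simulation is richer.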

The results (overleaf, with technical notes) provide significant support for the suggestion that a structured inventory is much easier to learn. Interestingly, the agents take nearly twenty times as long to learn the 40-vowel model (2) as the base 5-vowel model (1). But the structured 40-vowel model (3) takes only four times as long as the 5-vowel model to learn.

We remark that it is not yet clear to us whether Bradfield's idea of 'concurrent phonemes' really differs testably from the idea of learners who can generalize over features. This, along with refinement of the model, investigations of its robustness, and extension to consonants and other inventories, remains for the future.

Julian Bradfield


Details of results

The figures here are from example runs with 100 learners. (Learners do not interact with one another, so this is equivalent to 100 single-learner runs.) A learner was deemed to have acquired the adult inventory at the point where it could engage in 500 consecutive interactions without failure. (It would also be possible to implement a deterministic definition by matching inventories, but for ease of implementation this was not done.)

The mean number of interactions required to achieve adulthood (i.e. the number of interactions before the 500 successful ones), and standard deviations, were for each model (to 2 s.f.):
Model (1): mean 41, s.d. 18.
Model (2): mean 730, s.d. 180.
Model (3): mean 150, s.d. 67.

The distribution is of course Poisson-like rather than normal; the high s.d.s reflect a long tail abovethe mean.

References

Bradfield, J. (2009). Clicks, concurrency and the complexity of Khoisan. Presentation at the 17th Manchester Phonology Meeting.

de Boer, B. (2001). The Origins of Vowel Systems. Oxford University Press.

Steels, L. (1997). The synthetic modelling of language origins. Evolution of Communication 1(1), 1–35.

Traill, A. (1985). Phonetic and Phonological Studies of !Xoo Bushman. Hamburg: Buske.




!"##$%&'#()"'*+",-"%.('"**".#/'$%'0%'"1"2)30+'245"36

7#'$/'8"33'"/#093$/:"5'#:0#'#:"')+45-.#$;$#('4*'0'):4%434&$.03')0##"+%'<'$#/')+4)"%/$#('#4'/)+"05'#4'%"8'

84+5/='0/'49/"+;"5'$%':$/#4+$.03'/4-%5'.:0%&"/='4+'$%')/(.:43$%&-$/#$.'"1)"+$2"%#/'$%;43;$%&'%4%."'

84+5/'<'.4++"30#"/'8$#:'#:"'#()"'*+",-"%.('4*'#:"')0##"+%='%4#'#:"'0&&+"&0#"'#4>"%'*+",-"%.('4*'#:"'

84+5/'$%/#0%#$0#$%&'#:"')0##"+%'?/""'"6&6'@020%%'"#'036'ABBCD6''EF()"'*+",-"%.(E':"+"'+"*"+/'#4'#:"'

%-29"+'4*'5$/#$%.#'84+5/'$%'0'30%&-0&"'8:$.:'$%/#0%#$0#"'0'&$;"%'):4%434&$.03')0##"+%6''EF4>"%'

*+",-"%.(E='$%'.4%#+0/#='+"*"+/'#4'#:"'%-29"+'4*'#$2"/'0'&$;"%'84+5'$/'-/"5'$%'0'+")+"/"%#0#$;"'.4+)-/'4*'

/)"".:='$6"6':48'E.4224%E'#:"'84+5'$/6'''G-#'#:$/'49/"+;0#$4%'$/'5$**$.-3#'#4'+".4%.$3"'8$#:'20%('4#:"+'

0/)".#/'4*'30%&-0&"'0.,-$/$#$4%='";"%'$%'):4%434&(='8:$.:'!"#'/"%/$#$;"'#4'#4>"%'*+",-"%.(6''F:"'

)+493"2'$/')4/"5'24/#'/#0+>3('$%'"1"2)30+H90/"5'0))+40.:"/'#4'):4%434&(='$%'8:$.:'84+5/'0+"'

+")+"/"%#"5'%4#'0/'09/#+0.#='/(2943$.'3"1$.03'"%#+$"/='9-#'0/'20//"/'4*'$%5$;$5-03'2"24+$"/'4*'2"0%$%&'

0%5'/4-%5'?$%.3-5$%&'*$%"'):4%"#$.'5"#0$3D6'I1"2)30+'#:"4+(':0/'0##+0.#"5'$%.+"0/$%&'0##"%#$4%'*+42'

):4%434&$/#/'0%5'):4%"#$.$0%/'?/""'"6&6'G(9""'ABBJ='K$"++":-29"+#'ABBJ='K4+#'ABBL='M$+.:%"+'N'

O44+"='*4+#:.42$%&D=')+".$/"3('5-"'#4'$#/'/#+0$&:#*4+80+5'#+"0#2"%#'4*'#4>"%'*+",-"%.('"**".#/='0/'8"33'

0/'$#/'"3"&0%#':0%53$%&'4*'$%.+"2"%#03'/4-%5'.:0%&"6''7%'"1"2)30+'#:"4+(=':48";"+='#()"'*+",-"%.('$/'

%4#'/#+0$&:#*4+80+53('.42)-#093"P'#:"+"'0+"'%4'#()"/')"+'/"='4%3('&+4-)/'4*'#4>"%/6''

Q"'/:48'#:0#'0'#()"'*+",-"%.('"**".#'$%'):4%434&$.03')0##"+%')+45-.#$;$#('"2"+&"/'*+42'0'

/$2)3"'"1"2)30+H90/"5'245"3='#:+4-&:'0%'$%#"+0.#$4%'9"#8""%'0&&+"&0#"'#4>"%'*+",-"%.('0%5'

/$2$30+$#(='8$#:4-#'"1)3$.$#'.42)-#0#$4%'4*'#()"'*+",-"%.(6''F:"'$%#-$#$4%'$/'#:0#'0')0##"+%'8$33'/)+"05'#4'

0'%"8'84+5'24/#'+"05$3('$*'#:"'.30//'4*'#4>"%/'$%/#0%#$0#$%&'#:"')0##"+%'$/'$%&&'(#)*$6"6'4*'348'8$#:$%H.30//'

):4%"#$.'/$2$30+$#('?0/$5"'*+42'#:"')0##"+%'$#/"3*D='0/'84-35'9"'#:"'.0/"'$*'#:"('0+"'#4>"%/'4*'20%('

5$**"+"%#'84+5'#()"/6''7%'/-.:'0'.0/"='0'%"8'84+5'#4>"%'%""5':0;"'3$##3"'/$2$30+$#('#4'"0.:'#4>"%'$%'#:"'

.30//'8:$3"'/#$33'+"20$%$%&'8$#:$%'#:"'E&+0;$#0#$4%03')-33E'4*'#:"')0##"+%6''7*=':48";"+='#:"'#4>"%/'0+"'

#$&:#3('.3-/#"+"5'$%#4'0'*"8'84+5'#()"/='#:"'%"8'84+5'#4>"%'8$33'"$#:"+':0;"'#4'9"';"+('/$2$30+'#4'4%"'4*'

#:"'#()"/'$%'033'$#/'5"#0$3'?"**".#$;"3('%"-#+03$/$%&'8$#:'$#D='4+'$#'8$33'9"'"/.0)"'#:"')-33'4*'#:"')0##"+%'

"%#$+"3(6''
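The diffuse-versus-clustered intuition can be caricatured in a few lines (our sketch, not the authors' model; the one-dimensional phonetic space, decay rate, and token values are invented):

```python
import math

# The "pull" of a pattern on a new word token, modelled as the summed
# similarity to the stored exemplars instantiating the pattern. Token
# counts are equal in both classes; only their dispersion differs.
def pull(new, exemplars, k=1.0):
    return sum(math.exp(-k * abs(new - e)) for e in exemplars)

diffuse   = [0.0, 2.0, 4.0, 6.0, 8.0]  # many word types, spread out
clustered = [4.0] * 5                  # few types, same total token count
new_word = 1.0                         # not very close to any stored token
print(pull(new_word, diffuse) > pull(new_word, clustered))
```

With equal token counts, the diffuse class exerts the stronger pull on a moderately distant novel token, yielding a type-frequency-like effect without any counting of types.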

References

Bybee, J. (2001) Phonology and language use. Cambridge University Press.

Hamann, S., D. Apoussidou & P. Boersma (2009) Modelling the formation of phonotactic restrictions across the mental lexicon. CLS 45.

Kirchner, R. & R. Moore (forthcoming) Computing phonological generalization over real speech exemplars. Journal of Phonetics [available as ROA 1007-1208].

Pierrehumbert, J. (2001) Exemplar dynamics: word frequency, lenition, and contrast. In J. Bybee & P. Hopper (eds.), Frequency effects and the emergence of linguistic structure, 137–157. Amsterdam: John Benjamins.

Port, R. (2007) How are words stored in memory? Beyond phones and phonemes. New Ideas in Psychology 25: 143–170.

Emily Cliff and Robert Kirchner


Modeling OCP-Place with the Maximum Entropy Phonotactic Learner

Modeling speaker judgments has been marked recently by the advent of models that assume distinctive features and natural classes as the representational elements of phonotactic processing. We investigate the performance of one such model, the Hayes and Wilson (2008) Maximum Entropy (MaxEnt) Phonotactic Learner, and show that the model fails to make the generalizations necessary to predict speaker judgments for a language where a complex constraint is active, and furthermore, that in some cases the relationship between gradient speaker judgments and the statistics of the lexicon is not transparent.

Hayes & Wilson's learner defines a set of natural classes based on distinctive features and learns a set of weighted phonotactic constraints by iterating between (i) weighting an existing set of constraints according to the principle of Maximum Entropy, and (ii) adding new constraints based on their Observed/Expected (O/E) ratios given the current constraint set, starting with low ratios and moving incrementally higher.
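The Observed/Expected statistic that drives constraint selection can be illustrated on a toy consonantal lexicon (ours; the Hayes & Wilson learner computes O/E over feature-based natural classes rather than raw segment bigrams):

```python
from collections import Counter

# O = how often a sequence actually occurs in the lexicon; E = its count
# under the unigram (independence) model. Invented toy root lexicon.
lexicon = ["btg", "dkg", "bkd", "dtb", "gbt", "kdb", "btk", "dgk"]
unigrams = Counter(c for w in lexicon for c in w)
total = sum(unigrams.values())

def o_over_e(bigram):
    observed = sum(w.count(bigram) for w in lexicon)
    positions = sum(len(w) - 1 for w in lexicon)   # bigram slots available
    p = (unigrams[bigram[0]] / total) * (unigrams[bigram[1]] / total)
    return observed / (positions * p)

print(o_over_e("bd"))  # -> 0.0: underattested, a good constraint candidate
print(o_over_e("bt"))  # ~ 4.5: overattested, not a candidate
```

Roughly, sequences whose O/E falls below the current threshold become candidate markedness constraints and are then weighted by MaxEnt.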

We tested the MaxEnt learner on data from Amharic, a Semitic language. Like other Semitic languages, Amharic verb roots show OCP violations for place of articulation (Bender & Fulass 1978; Rose & King 2007). Homorganic consonants occur less often in a verb root than expected if they co-occurred freely (Greenberg 1950, McCarthy 1994, Buckley 1997, Frisch, Pierrehumbert & Broe 2004). OCP-Place in a Semitic language poses two distinct challenges. (1) Constraint length: OCP-Place restrictions span up to three consonants. (2) Gradiency: OCP-Place restrictions in Semitic languages are stronger in some word positions and for some places of articulation than others. We trained the MaxEnt learner on a corpus of 4242 Amharic verb roots drawn from Kane (1990), and compared the learner's performance to the judgments of nonce verb roots. Judgment data were collected from 20 native Amharic speakers, who were asked to rate the acceptability of 270 nonce verb roots, balanced for presence/absence of constraint violation, observed/expected ratio, transitional probability, expected probability, and density. 90 nonce roots contained OCP violations. The design was similar to that for Arabic in Frisch & Zawaydeh (2001), and the results showed that speakers assigned lower ratings to nonce forms with OCP violations.

We investigated the claim in Hayes and Wilson (2008) that the grammars that achieve the greatest explanatory coverage (as measured by assigning a high log-likelihood to the lexicon) are also those that best predict speaker judgments of nonce forms. We evaluated automatically learned grammars of many different sizes, as well as a hand-written grammar whose constraints were chosen from those available to the automatic learner so as to embody OCP-Place restrictions on the co-occurrence of similar and identical consonants within a verb root. The constraints of the hand-written grammar were assigned weights via MaxEnt. The predictions of each model were compared to the Amharic native speaker judgments and to the (cross-validated) log-likelihood they assigned to the learning data.

The correlation with speaker judgments was higher for the predictions of the hand-written grammar than for the best learned grammar (r = 0.47 and r = 0.34 respectively). However, the grammars that best predicted speaker judgments were not those with the highest log-likelihood; the correlations between speaker judgments and model predictions peaked with grammars of medium size, while log-likelihood continued to grow substantially before leveling off.

Regarding the difference in performance between the hand-written and automatically learned grammars, our results indicate that the MaxEnt learner seems to show a stronger bias toward selecting constraints that involve aggressive generalization than the speaker-judgment data suggest. For a given level of accuracy (Observed/Expected ratio), the learner's generalization heuristic selects short constraints over longer ones. A majority of the constraints that are acquired first span only one or two segments and capture statistical regularities of the lexicon other than OCP-Place. As the model proceeds towards longer constraints (such as the OCP-Place constraints that constitute the hand-written grammar), OCP-Place restrictions are weakened by the effect of the previously learned non-OCP restrictions and are less likely to be selected. Crucially, this suggests that to model phonotactic acquisition, constraint learning must allow either direct acquisition of high-level generalizations such as those recognized by generative phonology (such as OCP-Place), or some mechanism whereby constraints learned early can be eliminated from the grammar if a more general, albeit longer, constraint is found. Finally, the misalignment between model predictiveness and the log-likelihood of the learning data suggests that there are still open questions regarding the nature of the relationship between the statistics of the lexicon and speaker judgments.

Rebecca Colavin, R. Levy, and Sharon Rose


Bibliography

Bender, M. L., and Fulass, H. (1978). Amharic verb morphology. East Lansing, MI: The African Studies Center, Michigan State University.

Berent, I. and Shimron, J. 1997. The representation of Hebrew words: Evidence from the obligatory contour principle. Cognition 64: 39–72.

Berent, I., Shimron, J. and Vaknin, V. 2001. Phonological constraints on reading: evidence from the Obligatory Contour Principle. Journal of Memory and Language 44: 644–665.

Buckley, E. 1997. Tigrinya root consonants and the OCP. Penn Working Papers in Linguistics 4.3: 19–51.

Greenberg, J. H. 1950. The patterning of root morphemes in Semitic. Word 6: 162–181.

Frisch, S. A., and Zawaydeh, B. A. 2001. The psychological reality of OCP-Place in Arabic. Language 77: 91–106.

Frisch, S. A., Pierrehumbert, J. B., and Broe, M. 2004. Similarity avoidance and the OCP. Natural Language and Linguistic Theory 22: 179–228.

Hayes, B. and Wilson, C. 2008. A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry 39: 379–440.

Kane, T. L. 1990. Amharic–English Dictionary. Otto Harrassowitz.

McCarthy, J. J. (1994). The phonetics and phonology of Semitic pharyngeals. In P. Keating (Ed.), Papers in laboratory phonology III: Phonological structure and phonetic form (pp. 191–283). Cambridge: Cambridge University Press.

Rose, S. and King, L. (2007). Speech error elicitation and co-occurrence restrictions in two Ethiopian Semitic languages. Language and Speech 50: 451–504.



Learning metrical segmentation: the problem of function words

Metrical parsing biases are learned early. English- and French-learning 7.5-month-olds correctly segment bisyllables conforming to their language's dominant metrical pattern (English: strong-weak; French: weak-strong), and fail on bisyllables of the other pattern (Jusczyk, Houston, & Newsome, 1999; Polka & Sundara, 2003). I will argue that English-learning infants need to distinguish function words at phrase edges from other unstressed elements to arrive at the correct parsing bias (cf. Christophe, Millotte, Bernal, & Lidz, 2008). I consider 3 different hypotheses as to how the correct bias may be inferred:
(H1) statistical generalization over the emerging lexicon (Swingley, 2005)
(H2) bootstrap from the phrasal distribution of stresses
(H3) infer from greater statistical coherence of the stressed vowel with the preceding/following vowel
My analysis begins with the assumption that 7.5-month-olds do *not* distinguish function words from other unstressed elements, as motivated by Shi, Werker, & Cutler's (2006) finding that English-learning infants do not distinguish phonetic detail in function words before 11 months of age. Then I show there are problems with each hypothesis. H1 requires 7.5-month-olds to know considerably more wordforms than they are thought to understand. A child-directed corpus analysis shows that English phrases typically begin with weak elements, which under H2 would incorrectly yield an iambic bias. An information-theoretic analysis of the same corpus suggests that there is no directional asymmetry in coherence between stressed and unstressed vowels, which under H3 would not yield any clear parsing bias. These issues disappear if 7.5-month-olds are able to distinguish function words at phrase edges from other unstressed elements.
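The H3-style coherence comparison can be made concrete with a toy stream (entirely our construction; unlike the child-directed corpus analysed here, the artificial stream below is built to be trochaic, so a directional asymmetry does come out):

```python
import math
import random
from collections import Counter

random.seed(0)
WORDS = {"Sa": "wa", "Sb": "wb", "Sc": "wc"}  # trochees: stress + own weak syl

def cond_entropy(pairs):
    """H(second | first) in bits, from a list of (first, second) pairs."""
    joint, margin, n = Counter(pairs), Counter(a for a, _ in pairs), len(pairs)
    return -sum((c / n) * math.log2(c / margin[a]) for (a, _), c in joint.items())

stream = []
for _ in range(500):                  # random word sequence, no pauses
    s = random.choice(list(WORDS))
    stream += [s, WORDS[s]]

fwd = cond_entropy([(stream[i], stream[i + 1])
                    for i in range(len(stream) - 1) if stream[i].startswith("S")])
bwd = cond_entropy([(stream[i], stream[i - 1])
                    for i in range(1, len(stream)) if stream[i].startswith("S")])
print(fwd < bwd)  # stressed syllables cohere with what FOLLOWS: trochaic bias
```

On this artificial trochaic stream the following syllable is fully predictable (zero conditional entropy) while the preceding one is not; the corpus result reported above is that real child-directed English shows no such asymmetry, which is the argument against H3.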

Robert Daland


A computational implementation of Error-Selective Learning

Philip Dilts and Anne-Michelle Tessier, University of Alberta

This paper describes the implementation of a basic version of Error-Selective Learning (ESL: Tessier 2007, 2009), an error-driven Optimality-Theoretic learner. ESL was designed to maximize the benefits of re-ranking constraints via the restrictive algorithms of Prince and Tesar (2004) and Hayes (2004), while also introducing some extra-grammatical factors that create gradual, incremental changes across stages of the learner's development. ESL might therefore be a useful tool for modeling observed longitudinal data from real child phonologies – but only if implemented in a well-described way. In brief: ESL allows the learner to reserve judgment on the errors it encounters, storing the ranking conditions entailed by the input forms it receives in a temporary error Cache. When the errors in the Cache provide strong enough evidence for a new ranking, the most informative single error (stored as a winner-loser pair with its associated ERC; Prince 2002) is drawn from the Cache and added to the learner's permanent collection of ERCs (the Support). Whenever a new error is added to the Support, the learner uses a version of Biased Constraint Demotion (also incorporating the Specific Faithfulness bias of Hayes 2004) to choose a constraint ranking. The Cache is then cleared, and the process of learning repeats. When implementing ESL, we chose to take advantage of the robust and field-tested code of OTSoft (Hayes, Tesar and Zuraw 2003). A small object-oriented framework was created, using Microsoft Visual Basic 6, to help encapsulate the data and structure the interactions between the ESL algorithm and the existing OTSoft functions. In some cases, the new framework simply wrapped data structures and functions taken wholesale, or with slight modifications, from OTSoft. The user interacts with a simple application, selecting a file that contains a collection of tableaux, whose inputs represent the learner's lexicon. (This file is formatted as a .txt input file for OTSoft.)
The user then enters a threshold – i.e. the number of errors driven by a single constraint that the learner considers sufficient to trigger re-ranking – and sets the algorithm running. The program reads the lexicon provided, storing it as an object of the TxtFile class. This lexicon object stores each input tableau as a string in an array of tableaux, with the common header stored once as a separate string. The TxtFile also exposes methods allowing the algorithm to extract these tableaux one at a time in the order in which they were read in the text file. The tableaux are then read using this function and passed into a cache object, one at a time. For each tableau, the algorithm first tells the cache object to use the OTSoft Digest function to convert the tableaux that are in the Cache into a more meaningful internal representation, and generate an array of ERCs and then asks the cache object if any constraint prefers enough Ls to exceeded the user-specified learning threshold. If so, the algorithm then (1) asks the cache object for its best ERC: one that has an L for the trigger constraint, and among those has the fewest Ls on other faithfulness constraints, and among those has the most Ws on other markedness constraints (see Tessier 2009); (2) passes this best ERC, returned by the cache object as a tableau encoded as a string, into a support object; (3) tells the support object to digest its txt representation into a more meaningful collection of violation arrays and run the BCD method provided by OTSoft on that collection of arrays. The algorithm receives the result of each round of BCD (i.e., the ranking that satisfies the conditions in the support), parses that result, and writes it to a file. As a result, the output of the program is a sequence of support-ranking pairs learned by ESL.
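The Cache/Support loop just described can be sketched in a few lines of Python. This is a simplified stand-in for illustration only, not the actual OTSoft/Visual Basic 6 implementation: ERCs are encoded as plain dictionaries mapping constraint names to 'W', 'L', or absent, and the constraint names are invented.

```python
from collections import Counter

# Toy sketch of the Error-Selective Learning loop. An ERC maps each
# constraint it mentions to 'W' (prefers the winner) or 'L' (prefers the
# loser); unmentioned constraints have no preference.

def l_counts(cache):
    """Count, per constraint, how many cached ERCs assign it an L."""
    counts = Counter()
    for erc in cache:
        for con, pref in erc.items():
            if pref == 'L':
                counts[con] += 1
    return counts

def best_erc(cache, trigger):
    """Most informative ERC for the trigger constraint: an L for the
    trigger, then fewest Ls on other constraints, then most Ws."""
    candidates = [e for e in cache if e.get(trigger) == 'L']
    return min(
        candidates,
        key=lambda e: (sum(v == 'L' for c, v in e.items() if c != trigger),
                       -sum(v == 'W' for v in e.values())))

def esl_step(cache, support, threshold):
    """One round: if any constraint has accrued enough Ls, promote the
    single best ERC to the Support and clear the Cache."""
    for con, n in l_counts(cache).items():
        if n >= threshold:
            support.append(best_erc(cache, con))
            cache.clear()
            return True
    return False
```

After each successful `esl_step`, a ranking algorithm such as BCD would be re-run over the Support; that step is omitted here.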

Philip Dilts and Anne-Michelle Tessier


A single-stage computational model of phoneme category acquisition: Results from Inuktitut

Recent computational studies of speech sound category acquisition (de Boer and Kuhl 2003; Vallabha et al. 2007) implicitly continue a long-standing view of phoneme learning as a two-stage process, with learners first discovering surface phones, then subjecting these to a symbolic phoneme-clustering procedure by identifying systematicity in environments (Harris 1951; Peperkamp et al. 2006). This does not complete the acquisition of the phonological grammar, since the processes that give rise to the systematicity must still be identified. Here we argue for an alternate view of learning and phonological cognition in which allophonic processes are treated as transformations in continuous phonetic space and the learning problem as a simultaneous fit of categories and processes, so that a successful learner under our model will have learned both categories and grammar.

In particular, we simulate phoneme learning in a corpus of Inuktitut vowel tokens (recorded F1–F2 values); the task is to induce a mixture model: a finite set of discrete phoneme categories, each of which is centred at some point in the phonetic space and, subject to some probabilistic noise, generates some subset of the attested input points (Vallabha et al. 2007). Crucially, Inuktitut vowels are subject to an allophonic process of retraction before uvular consonants, so that phonemic /i/ and /u/ are realized as [e] and [o] when adjacent to a uvular consonant; the traditional approach would find (perhaps by statistical methods) some number of (discrete) phonetic categories, then, in addition, collapse the allophonically related phonetic categories to obtain phonemic categories.

The current approach, by contrast, operates by identifying general processes that are active in the acoustic space and factoring out their effects prior to category acquisition. We compare the two approaches using in both cases a Gaussian mixture model as a neutral test-bed, with two views of the input data: first, a set containing raw acoustic data, and, second, data with a spectral correction for the lowering process. Importantly, we derive the spectral correction to the raw acoustics by considering the average effect that uvulars have on all vowel tokens (figure 1); no category information is required to derive the correction.

The results are clear. Without transformation of the input data, there is a poor fit to either phonetic or phonemic categories (figure 2). However, with the correction for the interaction between uvular segments and vowel quality, the correct phonemic categorization is immediately obtained (figure 3). In contrast to current theories of phoneme category learning, the learner we present acts to learn not a set of phones and then phonemes, but instead a set of phonemes and processes. In our Inuktitut test case, this approach to phonological learning is more successful than an approach that attempts to identify phones. This result demonstrates the validity of this alternative conception of the phonological process, and suggests that acoustic category formation may actually appear unduly difficult if the effects of phonological processes are not taken into account. Having demonstrated a successful proof of concept, we then consider modifications to the procedure to replace the “omniscient” discovery of the transform with statistical methods for deriving the transform from the uncategorized data. We consider both an approach that assumes some initial categorization of consonants (following Eimas et al. 1971) and a more radically statistical approach.
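The category-free correction step can be sketched as follows. This is a minimal illustration of the idea (average the uvular effect over all vowel tokens, then subtract it), not the authors' implementation; any formant numbers used with these functions in examples are invented, not actual Inuktitut measurements, and the subsequent Gaussian mixture fit is omitted.

```python
# Sketch of the spectral correction: compute the average (F1, F2) shift
# that a following uvular induces, using no vowel-category information,
# then undo that shift on the pre-uvular tokens.

def mean(xs):
    return sum(xs) / len(xs)

def uvular_shift(tokens):
    """tokens: list of (f1, f2, next_is_uvular) triples.
    Returns the average (F1, F2) displacement of pre-uvular tokens."""
    uv = [(f1, f2) for f1, f2, u in tokens if u]
    plain = [(f1, f2) for f1, f2, u in tokens if not u]
    return (mean([t[0] for t in uv]) - mean([t[0] for t in plain]),
            mean([t[1] for t in uv]) - mean([t[1] for t in plain]))

def correct(tokens):
    """Subtract the average uvular effect from pre-uvular tokens,
    collapsing allophones back toward their phonemic centres."""
    d1, d2 = uvular_shift(tokens)
    return [(f1 - d1, f2 - d2) if u else (f1, f2) for f1, f2, u in tokens]
```

With the correction applied, retracted [e]-like realizations of /i/ fall back near the plain [i] tokens, so a single mixture component can cover the phoneme.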


Brian Dillon, Ewan Dunbar, and Bill Idsardi


[Figures 1–3: three F1–F2 scatterplots (F2 from 2500 to 1000 Hz on the horizontal axis, F1 from 800 to 200 Hz on the vertical axis). Figure 1 plots [−uvular] vs. [+uvular] tokens; figure 2 shows the uncorrected data with labels a, e, i, o, u and mixture components 1–5; figure 3 shows the corrected data with labels a, i, u and components 1–3.]

Fig. 1 (left): Transform in acoustic space is derived by subtracting the average F1–F3 values before uvulars from values elsewhere; Fig. 2 (centre): Uncorrected data clusters poorly; Fig. 3 (right): Uvular-corrected data gives phonemic clusters using the same techniques.

References

DE BOER, BART, and PATRICIA KUHL. 2003. Investigating the role of infant-directed speech with a computer model. Acoustics Research Letters Online 4.129–134.

EIMAS, PETER, EINAR SIQUELAND, PETER JUSCZYK, and JAMES VIGORITO. 1971. Speech perception in infants. Science 171.303–306.

HARRIS, ZELLIG. 1951. Methods in Structural Linguistics. Chicago: University of Chicago Press.

PEPERKAMP, SHARON, ROZENN LE CALVEZ, JEAN-PIERRE NADAL, and EMMANUEL DUPOUX. 2006. The acquisition of allophonic rules: Statistical learning with linguistic constraints. Cognition 101.B31–B41.

VALLABHA, GAUTAM, JAMES MCCLELLAND, FERRAN PONS, JANET WERKER, and SHIGEAKI AMANO. 2007. Unsupervised learning of vowel categories from infant-directed speech. Proceedings of the National Academy of Sciences 104.13273–13278.




Modeling the acquisition of cumulative faithfulness effects

Recent research has shown that cumulativity effects occur in the phonologies of natural languages (e.g., Pater, Bhatt & Potts, 2007; Farris-Trimble, 2008). Two main subtypes of cumulativity effects have been noted: cumulative markedness, in which multiple coincident markedness violations are not allowed although each individual violation is permitted, and cumulative faithfulness, in which faithfulness constraints may be violated singly, but those faithfulness violations may not occur together within a given domain. This is contrary to one of the main premises of optimality theory (OT; Prince & Smolensky, 1993/2004), in which hierarchical constraint ranking actively disallows gang-up effects of multiple lesser constraint violations. Harmonic grammar (HG; Legendre, Miyata & Smolensky, 1990), a theory based on numerically weighted constraints rather than hierarchically ranked ones, allows for the cumulativity effects that have been observed without predicting unattested effects (Pater et al., 2007). The summed violations in HG make this possible.

The acquisition of cumulative faithfulness effects (CFEs) is the focus of this paper. Consider, for instance, the data in (1) from Amahl’s early productions (Smith, 1973). At this stage, Amahl’s grammar allows neither fricatives nor final voiced obstruents. Final voiced obstruents are devoiced (1a), and fricatives are realized as stops (1b). However, when a voiced fricative occurs word-finally, the fricative is deleted, rather than undergoing both stopping and devoicing (1c). The question here is why stopping and devoicing do not combine to repair the final voiced fricative; alternatively, if deletion is a viable repair strategy, why are voiceless fricatives and final voiced stops not deleted as well? Both deletion and stopping/devoicing eliminate marked structure. Deletion, though, is a fell-swoop repair that can eliminate multiple marked structures with a single constraint violation. We argue that the cumulative violation of constraints against stopping and devoicing (IDENT[continuant] and IDENT[voice], respectively) is worse than the single violation of a constraint against deletion (MAX). This occurs in HG when the weight of MAX exceeds the weight of either of the IDENT constraints but is less than their sum. This weighting results in stopping or devoicing being preferred over deletion, but deletion is preferred when stopping and devoicing would otherwise coincide.
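The weighting condition can be checked with a few lines of arithmetic. This is a minimal sketch, not the paper's simulation: markedness constraints are assumed to dominate (so a repair must apply), only the faithfulness costs of competing repairs are compared, and the weights 3, 2, 2 are invented values satisfying w(IDENT) < w(MAX) < w(IDENT) + w(IDENT).

```python
# Illustrative HG weights: MAX outweighs each IDENT constraint singly,
# but not the two together. These numbers are invented for illustration.
W = {'Max': 3.0, 'Ident[voice]': 2.0, 'Ident[continuant]': 2.0}

def cost(violated):
    """Weighted sum of faithfulness violations for one candidate repair."""
    return sum(W[c] for c in violated)

def best_repair(repairs):
    """repairs: dict mapping repair name -> list of violated constraints.
    The lowest-cost repair wins (markedness assumed to force some repair)."""
    return min(repairs, key=lambda name: cost(repairs[name]))

# Word-final /z/ ('noise'): stopping AND devoicing vs. one deletion.
final_z = {'delete': ['Max'],
           'stop+devoice': ['Ident[continuant]', 'Ident[voice]']}
# Word-final /s/ ('bus'): a single IDENT violation suffices.
final_s = {'delete': ['Max'], 'stop': ['Ident[continuant]']}
```

Under these weights, deletion (cost 3) beats the cumulative repair (cost 4) only for the voiced fricative, while plain stopping (cost 2) beats deletion for /s/, reproducing the pattern in (1).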

How might the child arrive at a stage in which cumulative faithfulness plays a role? We model this with a version of the Gradual Learning Algorithm derived from HG (e.g., Boersma & Pater, 2007; Jesney & Tessier, 2007; Pater, Jesney & Tessier, 2007). The HG-GLA allows for gradual changes in constraint weights as a result of a mismatch between the child’s own output and his perception of the adult production. The acquisition patterns of a child like Amahl are simulated, with the learner hearing words like ‘bed’, ‘bus’ and ‘noise’ and adjusting his grammar appropriately. We argue that because any marked structure can be repaired with a fell-swoop repair like deletion, while only certain structures can be repaired by devoicing or stopping, the learner receives more evidence for increasing the weights of constraints against fell-swoop repairs, like MAX, than evidence for increasing the weights of more specific constraints, like IDENT[voice] or IDENT[continuant]. As a result, a stage occurs at which MAX has a weight greater than either of the IDENT constraints, but less than their sum, as shown in (2). A CFE is thus a natural stage on the way to fully faithful productions.
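A schematic version of the error-driven weight update can be sketched as follows. This is a simplified perceptron-style rendering of the HG-GLA idea cited above, not the paper's actual simulation; the constraint names, violation counts, and plasticity value are illustrative assumptions.

```python
# Schematic HG-GLA update: on an error, each weight moves by
# plasticity * (loser's violations - winner's violations), so constraints
# violated by the child's erroneous output rise and those violated by the
# adult target fall.

def gla_update(weights, winner_viols, loser_viols, plasticity=0.1):
    """winner_viols: violation counts of the adult target form;
    loser_viols: violation counts of the learner's own output."""
    return {c: w + plasticity * (loser_viols.get(c, 0) - winner_viols.get(c, 0))
            for c, w in weights.items()}
```

For example, hearing adult [bed] while producing devoiced [bet] promotes IDENT[voice] a little; repeated over a lexicon in which MAX-violating repairs are implicated more often, the weights diverge into the CFE configuration.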

We demonstrate that certain assumptions are important to the model’s acquiring a CFE. First, the equal ranking of faithfulness constraints at the earliest stages of learning results in crucial variation in the child’s early productions. This early variation results in differential changes in weights among the faithfulness constraints, leading eventually to the CFE stage. Moreover, we show that in order to arrive at a CFE, markedness constraints must start with a weight higher than that of the faithfulness constraints, such that the weights of the faithfulness constraints have time to diverge before fully faithful productions are achieved. Likewise, the plasticity of the markedness constraints must be relatively small, allowing for gradual change. Finally, we examine the input necessary to acquiring a CFE, exploring how the characteristics of the English lexicon affect the phonological acquisition patterns of English-speaking children.

Ashley Farris-Trimble


(1) Amahl (2;3 to 2;5): selected examples

a. Word-final stop devoicing: ‘bed’, ‘cube’, ‘egg’

b. Fricative stopping: ‘bus’, ‘sun’, ‘brush’

c. Word-final voiced fricatives delete: ‘noise’, ‘cheese’, ‘please’

(2) Constraint weights over time [figure]

References

Boersma, P. & J. Pater (2008). Convergence properties of a gradual learning algorithm for Harmonic Grammar. Ms., University of Amsterdam and University of Massachusetts, Amherst.

Farris-Trimble, A. W. (2008). Cumulative faithfulness effects in phonology. Unpublished doctoral dissertation, Indiana University, Bloomington.

Jesney, K. C. & A.-M. Tessier (2007). Re-evaluating learning biases in Harmonic Grammar. University of Massachusetts Occasional Papers 36: Papers in Theoretical and Computational Phonology. M. Becker (ed.). Amherst, MA: GLSA.

Legendre, G., Y. Miyata & P. Smolensky (1990). Harmonic Grammar – A formal multi-level connectionist theory of linguistic well-formedness: Theoretical foundations. Proceedings of the Twelfth Annual Conference of the Cognitive Science Society. Cambridge, MA: Lawrence Erlbaum, 884-891.

Pater, J., R. Bhatt & C. Potts (2007). Linguistic optimization. Ms., University of Massachusetts, Amherst. [ROA 924]

Pater, J., K. C. Jesney & A.-M. Tessier (2007). Phonological acquisition as weighted constraint interaction. Proceedings of the 2nd Conference on Generative Approaches to Language Acquisition–North America. A. Belikova, L. Meroni and M. Umeda (eds.). Somerville, MA: Cascadilla Proceedings Project, 339-350.

Prince, A. & P. Smolensky (1993/2004). Optimality Theory: Constraint Interaction in Generative Grammar. Malden, MA: Blackwell.

Smith, N. V. (1973). The Acquisition of Phonology: A Case Study. Cambridge, UK: Cambridge UP.


Ashley Farris-Trimble


Accidentally-true constraints in phonotactic learning

Bruce Hayes, Department of Linguistics, UCLA1

The phonotactic learning system proposed by Hayes and Wilson (2008) follows the principle of the inductive baseline: it tries to learn phonotactics using as few principles of Universal Grammar (UG) as possible. The leading idea is that one could learn from such a system’s failures just as much as from its successes. For instance, the simplest version of the system fails to learn patterns of vowel harmony or unbounded stress, but it becomes able to learn them when amplified with UG principles corresponding to classical autosegmental tiers and metrical grids—thus forming a new kind of argument for such representations.

There is a second way in which failures of the baseline system might be informative: it could learn too much rather than too little. The baseline system involved a rather permissive concept of what can be a phonotactic constraint: a constraint’s structural description is simply a sequence of feature matrices, each representing one of the natural classes of segments in a language. Where there are C natural classes and constraints are allowed to have n matrices, there will be C^n possible constraints. In actual practice, this can be a very large number, on the order of one billion.
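The order of magnitude is easy to verify. The figures below are assumed round numbers for illustration (C = 1000 natural classes, n = 3 matrices), not counts from any particular language.

```python
# Quick arithmetic behind "on the order of one billion": with C natural
# classes and constraints of length n, there are C**n possible sequences
# of feature matrices. C and n here are illustrative assumptions.
C, n = 1000, 3
print(C ** n)  # 1000000000
```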

With such a large hypothesis space, it is imaginable that the system might find constraints that are “accidentally true”: they have few or no exceptions in the lexicon, but are not apprehended by native speakers and play no role in their phonotactic intuitions. Hayes and Wilson’s learning simulation for the phonotactics of Wargamay may have done this. While the 100 constraints the system learned included 43 that successfully recapitulate the known phonotactic restrictions of this language (Dixon 1981), a further 57 constraints were discovered that struck the authors as complex and phonologically mystifying. An example is *[–approx, +cor][+high, +back, –main][–cons], which forbids sequences of coronal noncontinuants ([d, !, n, "]), followed by unstressed or secondary-stressed [u, u#], followed by a vowel or glide. Almost any phonologist would agree that this is an unlikely configuration for a language to forbid.

Do real speakers apprehend constraints of this kind? I will report an experimental study now in progress that addresses this question for English. When trained on English data, the Hayes/Wilson system behaves just as it did with Wargamay, learning both sensible and accidental-seeming constraints. The current experiment uses 20 nonce-word quadruplets, each containing: (1) a word that violates exactly one constraint, of the “accidental” type; (2) a word that is violation-free but otherwise similar to (1); (3) a word that violates exactly one constraint that would be considered by phonologists to be natural (e.g. a sonority-sequencing constraint) and whose weight is roughly the same as for the constraint violated by (1); (4) a violation-free control word similar to (3).

If the model is correct, then the overall difference in participant ratings between words of categories (1) and (2) should be the same as that between words of categories (3) and (4). If found, this would be a surprising confirmation of the model, and we will report it as such. However, we anticipate that the (1)-(2) difference will be smaller than the (3)-(4) difference. At this point, we can explore two hypotheses that might explain the disparity: (a) a statistical approach based on comparing the explanatory power of added constraints (Wilson 2009); (b) UG-based approaches, under which real language learners are biased to learn only natural constraints or to assign them relatively high weights.

1 This talk reports work done in collaboration with Jamie White.

Bruce Hayes



References

Dixon, Robert M. W. 1981. Wargamay. In Handbook of Australian languages, volume II, ed. Robert M. W. Dixon and Barry J. Blake, 1–144. Amsterdam: John Benjamins.

Hayes, Bruce and Colin Wilson (2008) A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry 39: 379-440.

Wilson, Colin and Marieke Obdeyn (2009) Simplifying subsidiary theory: statistical evidence from Arabic, Muna, Shona, and Wargamay. Ms., Johns Hopkins University.



Feature-Based Generalization

We propose a simple and mathematically sound n-gram-like model which efficiently and compositionally computes P(NC | NC′) (NC and NC′ are natural classes) based on the estimates of the conditional probability of a segment given a single feature and its value, e.g. P(e | [+consonantal]). Consequently, even though n binary features yield an upper bound of 2^n natural classes, the current model only requires 2n × |Σ| parameters, where |Σ| is the alphabet size.

The key is how probabilities are compositionally determined from the feature-based factors of the model. We illustrate with a simple feature system (1). (2) shows the feature-based bigram factors and (3) the machine obtained by their composition. Although (3) is a state-diagram which appears identical to an atomic bigram model, training happens over the feature-based factors in (2) and not over the composed machine in (3) (see Vidal et al. (2005a,b) for training PFSAs). In fact the machine in (3) need not ever be constructed, since the probabilities on its transitions are given by the equation in (5), which guarantees a well-formed probability distribution at each state, and thus over all strings. (See example (4).) It follows that the equation in (6) gives P(NC | NC′). Analysis reveals the model captures the intuition that featurally-similar sounds behave similarly.

For example, [tr] and [dr] are both well-formed onsets in English. Since word-initial [dr] clusters occur less than half as often in the CMU Pronouncing Dictionary, an atomic bigram model computes P(r | d) to be about four times less likely than P(r | t). But the factored-feature bigram model puts the two probabilities about the same. This happens because [t] and [d] share many features and because many words begin with both voiced and voiceless sounds.

The model’s predictions do not always match actual phonologies. For example, the factored-feature bigram model, when trained on the CMU word-initial onset clusters, predicts that words beginning with [ŋ] ought to be as likely as words beginning with [m] or [n]. This makes sense from the way the model generalizes: since nasals like [m,n] and velars like [k,g] begin words, so can [ŋ].

Rather than being problematic, these cases are instructive. It becomes clear that the behavior of [+nasal,+dorsal] segments does not follow from the behavior of [+nasal] and [+dorsal] segments in English. Albright (2009) suggests that learning consists of “two stages of evaluation: a preliminary initial assessment of probability of segment combinations and subsequent grammatical evaluation of the likelihood of featural cooccurrences.” We agree, and the factored-feature model makes it possible for learners to identify places where initial estimations need to be revised. In the case above, this is achieved provided learners recognize that no words begin with [ŋ].

Unlike previous attempts at feature-based generalization, the behavior of the factored-feature model is analytically transparent. Although Hayes and Wilson (2008) partially attribute the success of their model to phonological features, there are two reasons to be cautious of this claim. First, even though we are able to replicate their model’s performance using features (CMU onset clusters, correlation r=0.946 against the human-subject data of Scholes (1966)), we are unable to replicate their reported result (r=0.885) from an identical model with no features. In this case, we obtain r=0.937 using their software and data. Second, for any maxent grammar G1 whose constraints are stated in terms of featural representations, there is another maxent grammar G2 with constraints stated over segments which describes an equivalent distribution (7). For these reasons, we believe their model’s success is not due to their use of phonological features. On the other hand, Albright (2009) shows that segment-based and feature-based models contribute differently to learning, but the exact effects and limits of feature-based generalization remain unknown.
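The factored conditional (equation (5) of the handout) is small enough to implement directly. The sketch below uses the three-segment toy feature system from the handout (a = [+F, −G], b = [+F, +G], c = [−F, +G]); the per-feature probabilities are invented for illustration, not corpus estimates.

```python
from functools import reduce

# Toy factored-feature bigram model: P(seg | bundle) is a renormalised
# product of per-feature factors P(seg | feature value).

FEATURES = {'a': ['+F', '-G'], 'b': ['+F', '+G'], 'c': ['-F', '+G']}

# P(segment | single feature value): 2n * |alphabet| parameters in all.
# These numbers are invented for illustration.
P_GIVEN_FEATURE = {
    '+F': {'a': 0.5, 'b': 0.4, 'c': 0.1},
    '-F': {'a': 0.1, 'b': 0.1, 'c': 0.8},
    '+G': {'a': 0.2, 'b': 0.4, 'c': 0.4},
    '-G': {'a': 0.7, 'b': 0.2, 'c': 0.1},
}

def p_given_bundle(seg, bundle):
    """Equation (5): product over the bundle's features, renormalised
    over the alphabet so probabilities sum to one."""
    def score(x):
        return reduce(lambda acc, f: acc * P_GIVEN_FEATURE[f][x], bundle, 1.0)
    return score(seg) / sum(score(x) for x in FEATURES)

def p_next(seg, prev):
    """Factored bigram P(seg | prev), conditioning on prev's feature bundle."""
    return p_given_bundle(seg, FEATURES[prev])
```

Because the factors are shared across segments with overlapping feature bundles, featurally similar contexts automatically receive similar conditional distributions, which is the smoothing behaviour described for [tr] vs. [dr] above.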

In contrast, the factored-feature model is mathematically sound, simple, efficient, and shows exactly how the likelihood of segment combinations follows directly from their featural makeup and the estimates obtained from the corpus. As already mentioned, its incorrect predictions lead to clearly defined problems with likely solutions along the lines discussed by Albright (2009).

Jeffrey Heinz and Cesar Koirala


(1) Feature system: a = [+F, −G]; b = [+F, +G]; c = [−F, +G]

(2) [State diagrams of the two feature-based bigram factors, Machine F (states +F, −F) and Machine G (states +G, −G), with transitions labelled a, b, c]

(3) [State diagram of the composed machine F × G, with states +F+G, +F−G, −F+G, −F−G; in appearance it is identical to an atomic bigram model]

(4) P(a | b) = P(a | [+F, +G])

    = P(a | [+F]) · P(a | [+G]) / Σ_{x ∈ {a,b,c}} P(x | [+F]) · P(x | [+G])

For all x ∈ {a, b, c}, P(x | [+F]) and P(x | [+G]) are parameters of the model given by (estimates taken over) Machines F and G.

(5) Let F be some set of features, a0 a segment, and P(ai | f) the estimable parameters. Then:

    P(a0 | F) = ∏_{f ∈ F} P(a0 | f) / Σ_{ai ∈ Σ} ∏_{f ∈ F} P(ai | f)

(6) Consider two feature bundles, F and F′, which describe natural classes NC and NC′. Then:

    P(NC | NC′) = P(F | F′) = Σ_{a ∈ NC} P(a | F′)

(7) We sketch a proof by example, assuming (1) is a fragment of a larger feature system.

    G1: *[+F][+G], weight w1; *[−G][−F], weight w2
    G2: *ab, weight w1; *ac, weight w1 + w2; *bb, weight w1; *bc, weight w1

For any constraint with weight w which is added to G1 (e.g. *[+X], or *[+X][+Y]), w is added to the weight of all segmental sequences in G2 which violate it (adding more segmental constraints with weight w if necessary). This procedure ensures that G1 and G2 assign the same maxent scores to all words.

References

Albright, Adam. 2009. Feature-based generalisation as a source of gradient acceptability. Phonology 26:9–41.

Hayes, Bruce, and Colin Wilson. 2008. A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry 39:379–440.

Vidal, Enrique, Franck Thollard, Colin de la Higuera, Francisco Casacuberta, and Rafael C. Carrasco. 2005a. Probabilistic finite-state machines – part I. IEEE Transactions on Pattern Analysis and Machine Intelligence 27:1013–1025.

—— 2005b. Probabilistic finite-state machines – part II. IEEE Transactions on Pattern Analysis and Machine Intelligence 27:1026–1039.



Learning Gradient Long-Distance Phonotactics by Estimating Strictly Piecewise Distributions

This paper presents a procedure for learning gradient long-distance phonotactic patterns which

1. does not require an independent theory of tiers, contra Hayes and Wilson (2008);

2. does not require the additional structure of OT grammars;

3. is efficient and provably correct.

Examples of long-distance phonotactic patterns include consonantal harmony and vowel harmony. An example of sibilant harmony from Samala (Chumash) is given in (1). Such patterns are not always categorical (Hayes and Londe, 2006).

Strictly k-Piecewise (SPk) distributions describe structured, well-formed probability distributions. They are the probabilistic variant of the SPk languages described by Rogers et al. (to appear). There, SPk grammars decide well-formedness on the basis of discontiguous subsequences of length k. Heinz (2007, 2009) argues that patterns generated by SP2 grammars nontrivially approximate the typology of consonantal harmony patterns, though the learner presented there only handles categorical patterns.

SP2 distributions allow one to compute the conditional probability of a symbol a given the set S of preceding symbols, denoted P(a | S). Following earlier research which directly correlates likelihood with well-formedness (Coleman and Pierrehumbert, 1997; Hayes and Wilson, 2008), the idea is that words like sotos are more well-formed than words like sotoS provided that P(s | {s,o,t}) > P(S | {s,o,t}).

Generally, directly estimating conditional probabilities like P(a | S) is not feasible. This is because, given an alphabet Σ, there are 2^|Σ| sets S which must be kept track of. However, this infeasibility is a consequence of a strong independence assumption, e.g. that P(s | {s,t,o}) is completely independent of P(s | {s,d,o}).

Crucially, SPk distributions make a dependency assumption which gets around this problem: P(a | S) can be computed directly from the probabilities P(a | {b}), for all b ∈ S. This accords with our intuitions that the reason sotoS is ill-formed is not because P(S | {s,t,o}) is vanishingly small, but because P(S | {s}) is. Using the finite-state representation of SP2 grammars (Rogers et al., to appear), it is straightforward to establish Equations 1 and 2, which define well-formed probability distributions over P(a | S) (for all a ∈ Σ and S ⊆ Σ) and over all logically possible words, respectively. Importantly, there are only (|Σ| + 1)^k parameters to estimate for SPk distributions, which is on par with n-gram models (Jurafsky and Martin, 2008).

Standard, provably correct techniques for estimating regular distributions from finite-state representations (Vidal et al., 2005a,b; de la Higuera, in press) are used to estimate the parameters P(a | {b}). This procedure was implemented and then run on lexical corpora of Samala (Chumash) (Applegate, 2007, 1972), Finnish (Goldsmith and Riggle, to appear), and English (CMU pronouncing dictionary).

The results show the effectiveness of the procedure. In Chumash, the procedure reveals the virtually exceptionless sibilant harmony pattern (Table 1). In Finnish, it reveals the backness harmony as well as the transparency of front unrounded vowels (Table 2). Similar to a finding in Goldsmith and Riggle (to appear), the harmony pattern is not only not categorical (as there are many lexical exceptions), it is also asymmetrically biased: front vowels are far less likely to follow back vowels than vice versa. In English, the procedure reveals a weak long-distance constraint against laterals (Table 3), consistent with earlier studies, e.g. Martin (2007).

In all languages, additional long-distance constraints among dissimilar segments are found as well. Whether or not all such constraints are internalized by native speakers needs to be investigated (Becker et al., 2008). Extensions to the model which make use of phonological features and Bayesian priors are briefly discussed.
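The core computation (Equation 1: combining pairwise factors P(a | {b}) into P(a | S)) can be sketched as follows. This is a toy illustration, not the paper's provably correct PFSA estimator: the pairwise probabilities are estimated here by naive relative-frequency counting over a tiny invented corpus with sibilant harmony.

```python
# Minimal SP2 sketch: estimate pairwise factors P(a | {b}) from a corpus,
# then combine them per Equation 1 to score a symbol given all preceding
# symbols. Corpus, alphabet, and estimator are illustrative assumptions.

def estimate_pairs(corpus, alphabet):
    """Relative frequency with which symbol a occurs anywhere after b."""
    counts = {b: {a: 0.0 for a in alphabet} for b in alphabet}
    for word in corpus:
        for i, b in enumerate(word):
            for a in word[i + 1:]:
                counts[b][a] += 1
    probs = {}
    for b in alphabet:
        total = sum(counts[b].values())
        probs[b] = {a: (counts[b][a] / total if total else 0.0)
                    for a in alphabet}
    return probs

def p_given_set(a, preceding, pair_p, alphabet):
    """Equation 1: renormalised product of the pairwise factors for
    every symbol in the preceding set."""
    def score(x):
        s = 1.0
        for b in preceding:
            s *= pair_p[b][x]
        return s
    denom = sum(score(x) for x in alphabet)
    return score(a) / denom if denom else 0.0
```

On a harmonic corpus, a disharmonic continuation like S after {s, o, t} is penalised entirely by the single small factor P(S | {s}), exactly the intuition described above.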

Jeffrey Heinz


(1) [StoyonowonowaS] ‘it stood upright’; cf. *[stoyonowonowaS] (Applegate, 1972, p.72)

Equation 1. For |S| > 1:

    P(a | S) = ∏_{a′ ∈ S} P(a | {a′}) / Σ_{ai ∈ Σ} ∏_{a′ ∈ S} P(ai | {a′})

Equation 2. P(a0 … an) = P(a0 | ∅) · P(a1 | {a0}) · P(a2 | {a0, a1}) · … · P(an | {a0, …, an−1}) · P(END | {a0, …, an})

Eq. 2 note: I assume a uniform distribution over P(ai | ∅) (what begins a word), though these parameters can be estimated directly.

P(x | {b})

             x = s     c       S       tS
    b = s    0.0325  0.0051  0.0013  0.0002
    b = c    0.0212  0.0114  0.0008  0.
    b = S    0.0011  0.      0.067   0.0359
    b = tS   0.0006  0.      0.0458  0.0314

Table 1: 5,119 Samala (Chumash) words from Applegate (p.c.). These sounds contrast with aspirated and glottalized variants, but those distinctions are collapsed here.

P(x | {b})

             x = u     o       a       y       oe      ae      i       e
    b = u    0.0555  0.0399  0.1181  0.0057  0.0022  0.0068  0.0842  0.0721
    b = o    0.0462  0.0328  0.1202  0.0046  0.0023  0.0066  0.1101  0.0698
    b = a    0.0446  0.0309  0.1293  0.0048  0.0017  0.0068  0.0945  0.0603
    b = y    0.0148  0.016   0.0377  0.0444  0.0258  0.0657  0.091   0.0716
    b = oe   0.0231  0.0265  0.0578  0.0299  0.0136  0.0525  0.0953  0.0666
    b = ae   0.0138  0.0137  0.0337  0.0336  0.0148  0.0864  0.0913  0.0725
    b = i    0.0296  0.0305  0.0965  0.0114  0.0061  0.0239  0.0877  0.0792
    b = e    0.0311  0.0258  0.0774  0.0142  0.0047  0.0312  0.0894  0.0714

Table 2: Results of SP2 estimation on 44,040 Finnish words from Goldsmith and Riggle (to appear).

P(x | {b})

             x = l     r
    b = l    0.014   0.0535
    b = r    0.0364  0.0343

Table 3: Results of SP2 estimation on the CMU dictionary (129,463 words), where rhoticized schwa and [r] have been collapsed. All other things being equal, the model predicts [r] following [l] is about four times more likely than [l].

Selected References

Rogers, James, et al. (to appear) On languages piecewise testable in the strict sense. In Proceedings of the 11th Meeting of the Association for Mathematics of Language.

Vidal, Enrique, et al. (2005a) Probabilistic finite-state machines – part I. IEEE Transactions on Pattern Analysis and Machine Intelligence 27:1013–1025.

—— (2005b) Probabilistic finite-state machines – part II. IEEE Transactions on Pattern Analysis and Machine Intelligence 27:1026–1039.



Restrictive Learning with Distributions over Underlying Representations

Overview: Language learners must acquire both a set of underlying representations (URs) and a grammar. This paper argues that these two components are best learned when weighted lexical constraints (Boersma 1999, Apoussidou 2007) define a distribution over URs (cf. exemplar models, e.g., Pierrehumbert 2001, 2003), and the learning algorithm actively seeks a restrictive solution. No distinction between the learning of suppletive allomorphy and of “regular” URs is necessary in this framework. Using simulations within a log-linear model of grammar (Goldwater & Johnson 2003), we demonstrate that a learner incorporating these assumptions achieves considerable success in matching the target language data and extends in a restrictive fashion to unobserved types of input forms. Furthermore, this approach can straightforwardly model a range of patterns that otherwise require abstract URs or other devices.

Distributions over URs: It is commonly assumed that learners faced with an alternation like [bat] ‘bush’ ~ [bada] ‘bushes’ must determine which one of the two surface allomorphs is the target UR for the meaning ‘bush’. We demonstrate that a strict preference for a single underlying form in such situations is not required when the grammar is otherwise restrictive (cf. Apoussidou 2007, Eisenstat 2009). Instead, the learner can settle upon a distribution over a set of URs posited based on the observed set of surface allomorphs. Selection of a UR on a given iteration of EVAL then comes from the interaction of positively-formulated lexical constraints like those in (1) with standard Markedness and IO-Faithfulness constraints. The tableaux in (2) demonstrate the basic process; the UR that can be mapped faithfully while best satisfying the Markedness constraints is selected in each case. This basic allomorphic solution easily extends to a range of cases, including those that have been claimed to require crucial underspecification, e.g., Turkish devoicing (see Kager 2009 for a related proposal).

Learning Restrictively: We implement these constraints using a log-linear model of grammar coupled with a learner that seeks to maximize the probability of observed output forms while biasing Faithfulness constraints toward low weights (similar to the R-measure of Prince & Tesar 2004; see also Jesney & Tessier to appear). We formally implement this bias using the objective function in (3), which maximizes the difference between the summed weights of Faithfulness constraints and the summed weights of non-Faithfulness constraints. The resulting grammar is restrictive even when given novel inputs that violate the phonotactic restrictions of the target language.

Success with Complex Systems: To test our learner we used a version of the Paka language argued by Tesar (2006) to require abstract URs. The basic system is given in (4). Crucially, only four surface patterns are seen in the data: [ˈpaka], [paˈka], [ˈpaːka], and [paˈkaː]; no form surfaces with a long unstressed vowel. Our learner was trained using the set of observed forms, and then tested against the twelve inputs obtained by concatenating a member of the set {/pa/, /paː/, /ˈpa/, /ˈpaː/} with a member of the set {/ka/, /ˈka/, /ˈkaː/}. The test was successful; due to the restrictiveness bias implemented in learning, each input from this expanded set mapped to a licit output form with a probability greater than .999. No reliance on abstract URs was required.

Lexically-Conditioned Variation: Because this model encodes probabilities over a set of URs, it extends naturally to patterns of lexically-conditioned variation that elude systems which rely on single abstract URs. For example, French schwa-deletion is conditioned by both phonological constraints and lexical idiosyncrasy; e.g., s(e)mestre has a lower probability of deletion than s(e)maine (Coetzee & Pater to appear, Eychenne 2006). This is easily modeled here through the association of different weights with the relevant lexical constraints (for a related approach, see Pierrehumbert 2001). This is a significant advantage, given the prevalence of such patterns in both adult and child phonologies (e.g., Menn, Schmidt & Nicholas 2009). We conclude that this approach offers considerable promise, allowing the simultaneous learning of URs and grammars to be effectively modeled, and providing novel solutions for a range of problems faced by single-UR approaches.

Karen Jesney, Joe Pater, and Robert Staubs


(1) Lexical constraints posited based on [bat] ‘bush’ and [bada] ‘bushes’

BUSH→/bat/: Assign a violation mark (–1) if the meaning BUSH is not associated with the UR /bat/
BUSH→/bad/: Assign a violation mark (–1) if the meaning BUSH is not associated with the UR /bad/

(2) a. BUSH

                        *CODAVOICE  IDENT  *VTV   BUSH→/bat/  BUSH→/bad/    H
                        w = 4       w = 3  w = 2  w = 1       w = 1
  ☞ /bat/ → [bat]                                             –1            –1
    /bat/ → [bad]       –1          –1                        –1            –8
    /bad/ → [bat]                   –1            –1                        –4
    /bad/ → [bad]       –1                        –1                        –5

b. BUSH+PLURAL

                        *CODAVOICE  IDENT  *VTV   BUSH→/bat/  BUSH→/bad/    H
                        w = 4       w = 3  w = 2  w = 1       w = 1
    /bat+a/ → [bata]                       –1                 –1            –3
    /bat+a/ → [bada]                –1                        –1            –4
    /bad+a/ → [bata]                –1     –1     –1                        –6
  ☞ /bad+a/ → [bada]                              –1                        –1
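The H column of tableau (2a) can be recomputed mechanically. The sketch below does so from the weights given above and, assuming a standard log-linear (MaxEnt) choice rule over candidates (consistent with the abstract's log-linear model of grammar), shows that /bat/→[bat] takes nearly all of the probability mass:

```python
import math

weights = {"*CodaVoice": 4, "Ident": 3, "*VTV": 2,
           "BUSH->/bat/": 1, "BUSH->/bad/": 1}
# violation profiles of the four (UR, output) candidates in (2a)
violations = {
    ("/bat/", "[bat]"): {"BUSH->/bad/": 1},
    ("/bat/", "[bad]"): {"*CodaVoice": 1, "Ident": 1, "BUSH->/bad/": 1},
    ("/bad/", "[bat]"): {"Ident": 1, "BUSH->/bat/": 1},
    ("/bad/", "[bad]"): {"*CodaVoice": 1, "BUSH->/bat/": 1},
}

def harmony(viols):
    # negated weighted sum of violations; matches the H column in (2a)
    return -sum(weights[c] * n for c, n in viols.items())

H = {cand: harmony(v) for cand, v in violations.items()}
Z = sum(math.exp(h) for h in H.values())
P = {cand: math.exp(h) / Z for cand, h in H.items()}
```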

(3) Objective function for restrictiveness: penalizing high weights while maximizing the difference between the summed weights of Faithfulness constraints (F) and non-Faithfulness constraints (M)

(4) Test language from Tesar (2006: 868); modeled in our system without abstract URs

            /pa-/     /paː-/    /ˈpa-/    /ˈpaː-/
  /-ka/     ˈpaka     ˈpaːka    ˈpaka     ˈpaːka
  /-ˈka/    paˈka     paˈka     ˈpaka     ˈpaːka
  /-ˈkaː/   paˈkaː    paˈkaː    ˈpaka     ˈpaːka

References: Apoussidou, D. 2007. The learnability of metrical phonology. PhD dissertation. University of Amsterdam.
Boersma, P. 1999. Phonology-semantics interaction in OT, and its acquisition. In R. Kirchner, W. Wikeley & J. Pater (eds.), Papers in Experimental and Theoretical Linguistics 6: 24-35. Edmonton: University of Alberta.
Coetzee, A. & J. Pater. to appear. The place of variation in phonological theory. In J. Goldsmith, J. Riggle & A. Yu (eds.), Handbook of Phonological Theory, 2nd edition. [ROA-946].
Eisenstat, S. 2009. Learning Underlying Forms with MaxEnt. MA thesis. Brown University.
Eychenne, J. 2006. Aspects de la phonologie du schwa dans le français contemporain: optimalité, visibilité prosodique, gradience. PhD dissertation. Université de Toulouse-Le Mirail.
Goldwater, S. & M. Johnson. 2003. Learning OT rankings using a Maximum Entropy model. In Proceedings of the Workshop on Variation in Optimality Theory, 111-120. Stockholm University.
Jesney, K. & A.-M. Tessier. to appear. Biases in Harmonic Grammar: the road to restrictive learning. Natural Language and Linguistic Theory.
Kager, R. 2009. Lexical irregularity and the typology of contrast. In K. Hanson & S. Inkelas (eds.), The Nature of the Word: Essays in Honor of Paul Kiparsky. Cambridge, MA: MIT Press.
Menn, L., E. Schmidt & B. Nicholas. 2009. Conspiracy and sabotage in the acquisition of phonology: dense data undermine existing theories, provide scaffolding for a new one. Language Sciences 31: 285-304.
Pierrehumbert, J. 2001. Exemplar dynamics: Word frequency, lenition, and contrast. In J. Bybee & P. Hopper (eds.), Frequency Effects and the Emergence of Lexical Structure, 137-157. Amsterdam: John Benjamins.
Pierrehumbert, J. 2003. Phonetic diversity, statistical learning, and acquisition of phonology. Language and Speech 46(2-3): 115-154.
Prince, A. & B. Tesar. 2004. Learning phonotactic distributions. In René Kager, Joe Pater & Wim Zonneveld (eds.), Fixing Priorities: Constraints in Phonological Acquisition, 245-291. Cambridge: Cambridge University Press.
Tesar, B. 2006. Faithful contrastive features in learning. Cognitive Science 30(5): 863-903.

Karen Jesney, Joe Pater, and Robert Staubs


Humans and models learning palatalization patterns in miniature artificial languages: In support of particular salience of typical product characteristics

All theories of grammar specify the types of generalizations that a human language user relies on in using language productively, and thus restrict the human language learner to pay attention only to certain types of patterns in the data to which s/he is exposed. The present work contrasts alternative theories of grammar in their ability to account for results from experiments on adult learning of miniature artificial languages, in which learners are exposed either to singular-plural pairs (Bybee & Newman 1995) or to individual singular and plural forms presented in random order (Peperkamp et al. 2006). The theories are: 1) the Minimal Generalization Learner (MGL; Albright & Hayes 2003), which posits source-oriented generalizations (rules); 2) stochastic Optimality Theory learned using the Gradual Learning Algorithm (sOT; Boersma 1997), which posits negative product-oriented generalizations (markedness constraints) plus source-oriented identity relations (faithfulness constraints); 3) the UCLA Phonotactic Learner (PL; Hayes & Wilson 2008), which posits negative product-oriented generalizations; 4) observed/expected calculations for segment co-occurrences over product forms, leading to negative product-oriented generalizations against unobserved sequences (FPB; Frisch, Pierrehumbert & Broe 2004); and 5) Network Theory (NT; Bybee 2001), which posits positive product-oriented generalizations. Computational results are presented for the first three theories. The theories are applied to the four languages in (1), which feature velar palatalization before the plural suffix –i; M, N, and K are the numbers of word pairs exemplifying each source→product mapping.

The theories are shown to make the following predictions for the state of the learner after s/he is exposed to one of the four languages:

1) MGL, sOT and NT predict that the subjects who attach –i to {p;t} least should palatalize consonants before –i most. This prediction is confirmed experimentally with human learners (Figure 1). The opposite prediction is made by FPB and PL. No difference is predicted by non-stochastic theories like classical OT or rule-based generative phonology. Examples of {p;t}→{p;t}i are found to strongly increase the incidence of k→ki and to weakly but significantly decrease the incidence of k→ti. The latter result is specifically predicted if learners calculate conditional statistics (‘how often is –i preceded by [t]?’) over products. The negative correlation is expectedly weaker than the positive one, since conditional statistics require more training to estimate (Warker & Dell 2006).

2) Examples of t→ti exemplify both adding –i without changing the consonant, which would result in a failure of palatalization if applied to [k], and ‘plurals should/tend to end in [ti]’, which produces palatalization under NT. The addition of examples of t→ti increases the productivity of alveolar palatalization in both training tasks, and increases the productivity of velar palatalization when corresponding singulars and plurals are not next to each other (Figure 2). The increase in the productivity of palatalization is predicted by NT, FPB, and PL, which (correctly) hypothesize that characteristics of products are more salient than characteristics of source-product mappings. The examples are (incorrectly) expected to disfavor palatalization by MGL and sOT.

Together these results provide evidence for positive product-oriented generalizations (Network Theory, Bybee 2001; see also Stemberger & Bernhardt 1999), including both first-order and second-order (conditional) generalizations (Warker & Dell 2006), i.e., typical characteristics of products appear to be more salient than typical characteristics of source-product mappings.

(1)
  Singular     Plural        Language 1  Language 2  Language 3  Language 4
  {k;g}        {t;d}i        M           M           M           M
  {t;d;p;b}    {t;d;p;b}i    N           3N          N           3N
  {t;d;p;b}    {t;d;p;b}a    3N          N           3N          N
  {t;d}        {t;d}i        0           0           K           K
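The two kinds of product statistics at issue can be made concrete with a toy calculation (an illustration of the distinction, not Kapatsinski's implementation): the first-order statistic is the proportion of plural products ending in [ti]; the conditional (second-order) statistic restricts attention to plurals ending in –i.

```python
# toy set of observed plural (product) forms
plurals = ["pati", "konti", "mupi", "lagi", "suta"]

# first-order statistic: how often do plurals end in [ti]?
first_order = sum(p.endswith("ti") for p in plurals) / len(plurals)

# conditional statistic: how often is -i preceded by [t],
# counting only over plurals that actually end in -i?
i_finals = [p for p in plurals if p.endswith("i")]
conditional = sum(p.endswith("ti") for p in i_finals) / len(i_finals)
```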

Vsevolod Kapatsinski


Figure 1: The effect of adding –i to {t;p} on palatalization (Singulars and plurals in random order):

Figure 2: The effect of adding –i to t on palatalization (Singulars and plurals in random order; the dotted line shows the median):

References: Albright, A., & B. Hayes. 2003. Rules vs. analogy in English past tenses: A computational/experimental study. Cognition 90: 119-161.
Becker, M., & L. Fainleib. 2009. The naturalness of product-oriented generalizations. Rutgers Optimality Archive, 1036-0609.
Boersma, P. 1997. How we learn variation, optionality, and probability. Proceedings of the Institute of Phonetic Sciences of the University of Amsterdam 21: 43–58.
Bybee, J. L. 2001. Phonology and language use. Cambridge: Cambridge University Press.
Frisch, S. A., J. B. Pierrehumbert, & M. B. Broe. 2004. Similarity avoidance and the OCP. Natural Language and Linguistic Theory 22: 179-228.
Hayes, B., & C. Wilson. 2008. A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry 39: 379-440.
Stemberger, J. P., & B. H. Bernhardt. 1999. The emergence of faithfulness. In B. MacWhinney, ed. The emergence of language, 417-446. Mahwah, London: Erlbaum.
Warker, J. A., & G. S. Dell. 2006. Speech errors reflect newly learned phonotactic constraints. Journal of Experimental Psychology: Learning, Memory, and Cognition 32: 387-398.

Vsevolod Kapatsinski


An exemplar-based speech production model

Exemplar Theory has attracted increasing interest from phonologists and phoneticians, principally due to its ability to handle incremental sound change and token frequency effects in sound patterns. Its essence is massive storage of exemplars: memories of individual experiences of speech, including fine phonetic detail; linguistic categories are not represented as symbols, but as 'clouds' of exemplars associated with category labels. Exemplar-based speech processing presumes some calculation of similarity over these exemplars.

However, speech signals are variable-length time series data: it is therefore not obvious how to align the exemplars (or portions thereof) with one another in order to calculate similarity. I present a solution to this problem, PEBLS (Phonological Exemplar-Based Learning System), and show that it is capable of generalising over a lexicon of real speech exemplars, in a way that is sensitive to the token frequency of patterns. Beginning with a corpus of speech recordings (processed into mel frequency cepstral coefficient vectors), a frame-wise transition matrix is precomputed: a transition from frame A to frame B is high-valued if A precedes B in some exemplar, or A precedes B' where B' is similar to B; likewise, if A' precedes B and A' is similar to A. The matrix thus represents a network of (real-valued) potential transitions from parts of one exemplar into parts of all other exemplars in the lexicon.
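The transition-matrix construction just described can be sketched as follows. This is an assumed form, not PEBLS's actual computation: frame similarity is modelled with a Gaussian kernel on Euclidean distance between MFCC frames, and the two smoothing directions (similar successors B', similar predecessors A') are handled by multiplying the attested-transition matrix by the similarity matrix on both sides.

```python
import numpy as np

def transition_matrix(frames, successor, tau=1.0):
    """frames: (n, d) array of MFCC frames pooled over all exemplars.
    successor[i]: index of the frame following i in its exemplar, or -1
    if i is exemplar-final.  Returns T where T[i, j] is high if i precedes
    j, or a frame similar to j, in the corpus (and likewise via frames
    similar to i)."""
    n = len(frames)
    # Gaussian similarity on Euclidean distance between frames
    d2 = ((frames[:, None, :] - frames[None, :, :]) ** 2).sum(-1)
    sim = np.exp(-d2 / tau)
    base = np.zeros((n, n))
    for i, j in enumerate(successor):
        if j >= 0:
            base[i, j] = 1.0               # attested transitions
    # smooth on both sides: frames similar to i inherit i's transitions,
    # and transitions extend to frames similar to j
    return sim @ base @ sim
```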

In speech production, PEBLS computes an optimal path through the transition matrix, using dynamic programming, with a bias towards frames associated with the target word. Preliminary results indicate that this system is capable of generating natural-sounding outputs for particular words by concatenating discontinuous portions of the corpus. Crucially, the 'optimal' output path here reflects not merely the highest similarity score, but also a measure of confidence, i.e. the extent to which the path conforms to patterns which are strongly instantiated in the corpus. This is done by hierarchically clustering the scores at each step in the dynamic programming algorithm, and selecting the cluster that maximises a function of mean similarity value, size, and variance. This version of PEBLS, incorporating a bias towards frames of all tokens of the target word, avoids Kirchner & Moore's (2008) arbitrary selection of a particular exemplar of the target word to serve as the input to production.
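The production step can then be sketched as Viterbi-style dynamic programming (again an assumed form: PEBLS's clustering-based confidence measure is omitted, and word_bias is a hypothetical per-frame weight favouring frames from tokens of the target word):

```python
import numpy as np

def best_path(T, word_bias, L):
    """Highest-scoring length-L frame sequence through transition matrix T,
    with every visited frame weighted by word_bias.  Scores are combined
    in the log domain; back-pointers recover the optimal path."""
    n = T.shape[0]
    logT = np.log(T + 1e-12)
    logb = np.log(word_bias + 1e-12)
    score = logb.copy()                        # best length-1 paths
    back = np.zeros((L, n), dtype=int)
    for t in range(1, L):
        cand = score[:, None] + logT + logb[None, :]
        back[t] = cand.argmax(axis=0)          # best predecessor per frame
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(L - 1, 0, -1):              # backtrace
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```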

Robert Kirchner


An online model of the “early stage” of the acquisition of phonology

INTRODUCTION — Hayes’ (2004) “early stage” (ES) of the acquisition of phonology has two properties: i) the learner knows no morphology yet, i.e. can only posit fully faithful underlying forms; ii) the learner knows some phonotactics already, i.e. entertains a restrictive grammar among the ones compatible with the positive evidence. Current computational models of the ES are essentially batch algorithms, as in Hayes (2004), Prince and Tesar (2004) and Tessier (2007). This paper develops the first on-line model of the ES.

CHALLENGE — Existing provably convergent on-line algorithms for OT perform only constraint demotion, either gradually, as in Boersma (1998, pp. 323-327), or non-gradually, as in Tesar and Smolensky (1998). Yet demotion-only algorithms cannot model the ES. Since the learner posits faithful underlying forms, no faithfulness (F-) constraint is ever loser-preferring. Thus, demotion-only algorithms never modify the initial ranking of the F-constraints. Using Prince and Tesar’s (2004) Azba typology as an example, I show that this is wrong, no matter the initial ranking of the F-constraints. Thus, an on-line model of the ES needs to perform promotion too. Yet many authors have noticed that promotion might be foiled by the “credit problem”: a single ERC does not provide unambiguous information on which one of the many winner-preferring constraints should be credited for accounting in the end for that ERC, and should thus be promoted. Furthermore, a recent counterexample due to Pater (2008) against the promotion-demotion update rule of Boersma (1997) shows that the credit problem might indeed preclude convergence. We are thus faced with the following challenge: on the one hand, we want to perform some constraint promotion in order to model the ES; on the other hand, we don’t want to perform constraint promotion, because of the credit problem.

IDEA — The promotion component of Boersma’s (1997) non-convergent update rule is (1). Crucially, this rule does not distinguish between two ERCs such as (2): in both cases, winner-preferring constraints are promoted by 1. This is not a good idea. ERC (2b) is easy because it unambiguously says that C1 must be promoted. ERC (2a) is hard because it ambiguously says that either C1 or C2 should be promoted. A good update rule should reflect this difference. I thus propose to promote C1 by 1 in the case of (2b) and to promote both C1 and C2 only by 1/2 in the case of (2a). In the general case, I propose (3). I prove that update rule (3), contrary to Boersma’s (1), is guaranteed to converge. The crucial step of the proof is to show that the components of the current ranking vector entertained by the algorithm cannot become too large, contrary to what happens in Pater’s counterexample.

SIMULATIONS — I have run the promotion-demotion update rule (3) on two case studies. The first case study is the 41 languages in the Korean typology of Hayes (2004). I show that the model gets to the correct restrictive ranking for all languages. The second case study is the 37 languages in the Azba typology of Prince and Tesar (2004). I show that the model gets to the correct restrictive ranking in 35 cases, thus outperforming a dedicated algorithm such as Hayes’s (2004) Low Faithfulness Constraint Demotion.

EXPLANATION — Consider for instance the case of Korean, specifically discussed in Hayes (2004). The relevant set of input ERCs is (4). The target ranking must have F1 above F3, in order for M1 and M2 to be ranked in between. The corresponding dynamics of the ranking values entertained by the model in the case of uniform sampling of the input forms is (5). It shows that, as desired, F1 is promoted at a faster rate than F3, so M1 and M2 drop just below F1 and then stop before crossing F3 too, thus yielding the right ranking configuration F1 ≫ M1, M2 ≫ F3. Why is it that F1 grows faster than F3? The 4th ERC in (4) triggers at most one update (because M4 stays on top). Thus, the relative promotion speed of F1 and F3 is determined by the relative number of updates triggered by the first ERC (which promotes F1 but not F3) vs the second ERC (which promotes F3 but not F1). I prove that, no matter the probability distribution of the input forms, the two constraints M1 and M2 stay close and keep oscillating one around the other, as also illustrated in (5). Thus, while the 1st ERC triggers an update every time it is sampled, the 2nd ERC does not (because many times it is sampled, M2 is ranked above M1 and thus the ERC triggers no update).


Giorgio Magri


(1) Promotion component of Boersma’s (1997) promotion-demotion update rule: given an ERC not compatible with the current ranking (vector), promote all winner-preferring constraints by 1.

(2) Two very different ERCs:

  a.  C1  C2  C3  C4        b.  C1  C2  C3  C4
      W   W   E   L             W   E   E   L

(3) Promotion component of my new promotion-demotion update rule: given an ERC not compatible with the current ranking (vector), promote all winner-preferring constraints by 1/w(ERC), where w(ERC) is the total number of winner-preferring constraints in the given ERC.
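Update rule (3) is straightforward to state in code. The promotion amount follows the definition above; the demotion component (demote loser-preferring constraints by 1) is my assumption, in line with standard promotion-demotion rules, since the abstract spells out only the promotion side:

```python
def update(ranking, erc):
    """Apply one promotion-demotion update for an ERC that the current
    ranking vector fails.  erc maps constraint name -> 'W' (winner-
    preferring), 'L' (loser-preferring), or 'E' (even)."""
    winners = [c for c, v in erc.items() if v == "W"]
    for c in winners:
        ranking[c] += 1.0 / len(winners)   # promote by 1/w(ERC)
    for c, v in erc.items():
        if v == "L":
            ranking[c] -= 1.0              # demote (assumed amount)
    return ranking

# ERC (2a) promotes C1 and C2 by 1/2 each; ERC (2b) would promote C1 by 1
r = update({"C1": 0, "C2": 0, "C3": 0, "C4": 0},
           {"C1": "W", "C2": "W", "C3": "E", "C4": "L"})
```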

(4) The relevant set of ERCs for the case of aspiration/voicing in Korean, after Hayes (2004):

                          F1  F2  F3  F4  M1  M2  M3  M4
    (tʰa, tʰa, ta)        W   W                   L         } ERCs relevant for the
    (ada, ada, ata)               W   W   L   W             } relative speed of F1 and F3
    (atʰa, atʰa, ada)     W   W   W   W   W   L   L
    (atʰa, atʰa, adʰa)            W   W   W   L       W

    F1 = IDENT[ASPIRATION]/ONSET      M1 = *[-SONORANT, +VOICE]
    F2 = IDENT[ASPIRATION]            M2 = *[+VOICE][-VOICE][+VOICE]
    F3 = IDENT[VOICE]/ONSET           M3 = *[+SPREAD GLOTTIS]
    F4 = IDENT[VOICE]                 M4 = *[+SPREAD GLOTTIS, +VOICE]

(5) The dynamics of the ranking values for the Korean case (4): [figure omitted: trajectories of the ranking values of M4, F1, F3, F2, F4, M3, and M1, M2 over roughly 7×10^4 iterations]

References
Boersma, Paul. 1997. “How We Learn Variation, Optionality and Probability”. In IFA Proceedings 21, 43–58. University of Amsterdam: Institute for Phonetic Sciences.
Boersma, Paul. 1998. Functional Phonology. Doctoral Dissertation, University of Amsterdam. The Hague: Holland Academic Graphics.
Hayes, Bruce. 2004. “Phonological Acquisition in Optimality Theory: The Early Stages”. In Constraints in Phonological Acquisition, ed. R. Kager, J. Pater, and W. Zonneveld, 158–203. Cambridge University Press.
Pater, Joe. 2008. “Gradual Learning and Convergence”. Linguistic Inquiry 39.2:334–345.
Prince, Alan, and Bruce Tesar. 2004. “Learning Phonotactic Distributions”. In Constraints in Phonological Acquisition, ed. R. Kager, J. Pater, and W. Zonneveld, 245–291. Cambridge University Press.
Tesar, Bruce, and Paul Smolensky. 1998. “Learnability in Optimality Theory”. Linguistic Inquiry 29:229–268.
Tessier, Anne-Michelle. 2007. “Biases and Stages in Phonotactic Learning”. Doctoral Dissertation, UMass Amherst.


Giorgio Magri


Modelling the acquisition of vowel harmony with a lazy learner

I present a simple lazily-evaluated (Aha et al. 1991) model, ExPhon, of phonological acquisition and knowledge which learns a pattern of synchronic alternations that closely mimics the morphophonological phenomena that characterize left-to-right stem-controlled vowel harmony, including learning about opaque and transparent vowels.

Recently there has been a move away from “classical” rationalist theories toward data-driven methods in phonology. Among these models, we distinguish those that use eager versus lazy methods. The former use incoming data to update representations that explicitly describe global generalizations (e.g. numerically weighted constraints, probability distributions, etc.) and then discard their input. In contrast, lazy algorithms retain all of their training data, deferring computation until query time, when a similarity-based analogical mechanism computes a local approximation to some global function. The best-known lazy algorithms in linguistics are exemplar models. Originally (re-)introduced in the psychological literature on categorization (Medin & Schaffer 1978, but cf. Semon 1921 for a view that anticipates many features of contemporary models), exemplar models were introduced to linguistics (cf. Johnson 1997, Pierrehumbert 2001, Kirchner & Moore 2009 inter alia), via research in speech perception (cf. Goldinger 1996).1 The exemplar models implemented to date in phonetics and phonology have largely focused on perception (e.g. speaker normalization in Johnson 1997), or on segment-internal diachronic processes (e.g. lenition in Pierrehumbert 2001, chain shifts in Ettlinger 2007),2 leaving the types of phenomena that typically interest “traditional” phonologists (e.g. productive, general “processes”) comparatively neglected.

ExPhon is a simple exemplar-based production model, which stores its input directly, and produces output via a distance-weighted nearest-neighbour computation. Following Johnson (2005), I take it that stored instances are “of experiences”, and that we experience “word”-sized chunks in speech, rather than e.g. features or segments. Hence, the basic unit of storage in the model is a labelled, word-sized formant trajectory. These are fixed-rate [F1, F2] trajectories of sequences of CV syllables (see fig. 1), derived from locus equations for CV transitions (Sussman et al. 1998). Consonants are from the set [b d g] and vowels are [i e o u].3 Labels are compositional, signalling a lexical category, one of two “cases”, and an optional “plural”. Hence, words in the pseudo-language come in four forms: nom-sg, nom-pl, acc-sg, acc-pl. The vowels in roots have all high or all low F2 (akin to lexical front/back vowel harmony), the plural exponent has fixed F2, while the acc exponent has an F2 that alternates to agree with that of the vowels that precede it.

On the basis of limited input data, the model learns to produce harmonically “correct” novel outputs. More specifically, the model is able to generalize and produce correct morphologically complex forms which it has not been exposed to in the input; e.g. an unknown case-marked form will be output with harmonically correct F2, including neutrality (opaque or transparent). Performance in the model improves mainly as a function of type frequency (i.e. as more words are learned, cf. figs 1 & 2, which show both root form and case form changing as the lexicon grows from 10 to 50 words), although token frequency also affects performance, as the population of nearest neighbours are tokens, and serve to clean up the weighted averages taken in producing some output form.
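The distance-weighted nearest-neighbour computation can be sketched as follows, under my own simplifying assumption that labels are sets of components (lexeme, case, number) compared by symmetric difference; ExPhon's actual similarity is computed over the stored trajectories as well:

```python
import numpy as np

def produce(query_label, memory, k=5):
    """memory: list of (label_set, trajectory) pairs, where trajectory is
    a (T, 2) array of [F1, F2] samples.  Returns the distance-weighted
    average of the k nearest exemplars' trajectories."""
    # label distance: size of the symmetric difference of label sets
    scored = sorted(memory, key=lambda m: len(query_label ^ m[0]))
    nearest = scored[:k]
    dists = np.array([len(query_label ^ lab) for lab, _ in nearest])
    w = 1.0 / (1.0 + dists)                  # closer exemplars weigh more
    trajs = np.stack([traj for _, traj in nearest])
    return (w[:, None, None] * trajs).sum(0) / w.sum()
```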

1 The linguistic view of these models also has a long history, tracing its origins to discussions of analogy and “apperception” in e.g. Baudouin de Courtenay’s 19th-century discussions of phonetics and phonology.

2 Kirchner & Moore 2008 demonstrate what could be construed as a synchronic spirantization process.
3 Modulo the absence of F3.


Fred Mailhot


Figure 1: Formant trajectory for “digebege” after 10 different words have been learned.

Figure 2: Formant trajectory for “digebege” after 50 words have been learned.

References:

Aha, D., Kibler, D. & Albert, M. (1991). Instance-Based Learning Algorithms. Machine Learning, 6.
Goldinger, S. (1996). Words and voices: Episodic traces in spoken word identification and recognition memory. JEP:LMC, 22.
Johnson, K. (2005). Decision and mechanisms in exemplar-based phonology. In Beddor, Solé, Ohala (eds).
Johnson, K. (1997). Speech perception without speaker normalization: an exemplar model. In Johnson & Mullennix (eds).
Kirchner, R. & Moore, R. (2009). Computing phonological generalizations over real speech exemplars. Presented at LSA 2009.
Medin, D. & Schaffer, M. (1978). Context theory of classification learning. JEP:HLM, 7.
Pierrehumbert, J. (2001). Exemplar dynamics: Word frequency, lenition, and contrast. ms.
Semon, R. (1921). Mnemic Psychology.
Sussman, H., Fruchter, D., Hilbert, J. & Sirosh, J. (1998). Linear correlates in the speech signal. BBS, 21.


Fred Mailhot


Learning phonemes with a pseudo-lexicon

Infants acquiring their native language in the first years of life must overcome the variability inherent to speech, which alters phonemes according to their phonological context. Two basic approaches to modeling this acquisition process have been proposed: top-down models use learning algorithms that converge on a phonological grammar which correctly derives word variants from underlying words (Dresher & Kaye 1990; Tesar & Smolensky 1998), but do so by assuming that infants are already able to segment speech into word forms. In bottom-up models (Peperkamp et al. 2006), phonological acquisition is assumed to proceed without any lexical knowledge: infants first learn to undo phonological rules by attending to the distribution of individual segments.

The top-down approach is effective, but unrealistic in its assumption that infants already know words before they begin to learn phonology. The bottom-up approach avoids this problem, but, as we will show, is only effective on data with a very small number of allophones. We propose a third type of model in which the infant uses a crude, easily constructed approximation of the lexicon, which reaps many of the benefits of top-down models while retaining the psycholinguistic plausibility of bottom-up models.

We tested three learning algorithms, one representing each type of model, on both a 7.5-million-word corpus of spoken Japanese (Maekawa et al. 2000) and a nine-million-word corpus of Dutch (Corpus Gesproken Nederlands), in which we implemented a number of artificial phonological rules. Our rules convert each phoneme in the corpus into an allophone which is conditioned by the preceding and following contexts. Several corpora were constructed, ranging from one in which each phoneme has two allophonic variants, to one in which each phoneme has as many allophones as there are possible contexts, in order to measure the effects of allophonic complexity on learning performance. Each algorithm takes as input one of the corpora, and must decide which pairs of allophones belong to the same phoneme.

The first, bottom-up, algorithm simply computes the Kullback-Leibler divergence between the

distributions of each pair of allophones, on the assumption that near-complementary distribution (high

KL) indicates allophones belonging to the same phoneme (Peperkamp et al. 2006). The second, top-down,

algorithm is additionally given the actual word boundaries in the corpus. It then looks for pairs of word

forms that are identical except for the initial and final segments, e.g., WABCDX and YABCDZ, which are

interpreted as evidence that the segments W and Y are allophones of the same phoneme (likewise for X

and Z), on the assumption that the two word forms represent a single word occurring in two different

contexts. Any pairs of allophones for which no such word form pair exists are classified as belonging to

different phonemes. KL divergence is then computed for each pair that remains (i.e., each pair for which

there is at least one minimally-differing pair of word forms in the corpus). Finally, the third, n-gram

algorithm is identical to the word form algorithm except that it is not given the real word boundaries.

Instead, it compiles a list of the most frequent n-grams in the corpus, and uses this as an approximation of

the set of word forms (the tests described below used the most frequent 10% of all 4-, 5-, 6-, 7-, and 8-

grams occurring in the input corpus).
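The three ingredients described above (the KL measure over context distributions, the word-form filter, and the n-gram pseudo-lexicon) can be sketched as a toy re-implementation; the function names, the smoothing constant, and the edge-pairing details are our own assumptions, not the authors' code:

```python
from collections import Counter
from math import log

def kl_divergence(p_counts, q_counts, smooth=1e-6):
    """KL divergence between two smoothed context distributions;
    a high value signals near-complementary distribution."""
    contexts = set(p_counts) | set(q_counts)
    p_tot = sum(p_counts.values()) + smooth * len(contexts)
    q_tot = sum(q_counts.values()) + smooth * len(contexts)
    kl = 0.0
    for c in contexts:
        p = (p_counts.get(c, 0) + smooth) / p_tot
        q = (q_counts.get(c, 0) + smooth) / q_tot
        kl += p * log(p / q)
    return kl

def ngram_pseudo_lexicon(corpus, n_range=(4, 9), top_frac=0.10):
    """Approximate the set of word forms with the most frequent
    4- to 8-grams of the unsegmented corpus."""
    counts = Counter()
    for n in range(*n_range):
        for i in range(len(corpus) - n + 1):
            counts[corpus[i:i + n]] += 1
    keep = max(1, int(len(counts) * top_frac))
    return {form for form, _ in counts.most_common(keep)}

def same_phoneme_candidates(word_forms):
    """Segments that alternate at the edges of otherwise identical
    forms (WABCDX vs. YABCDZ) are candidate allophones of one phoneme."""
    pairs, by_core = set(), {}
    for form in word_forms:
        if len(form) < 3:
            continue
        core = form[1:-1]
        for w, x in by_core.get(core, []):
            if w != form[0]:
                pairs.add(frozenset((w, form[0])))
            if x != form[-1]:
                pairs.add(frozenset((x, form[-1])))
        by_core.setdefault(core, []).append((form[0], form[-1]))
    return pairs
```

In this sketch, allophone pairs surviving the `same_phoneme_candidates` filter would then be scored with `kl_divergence` over their context counts, mirroring the second and third algorithms.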

The graph in Figure 1 displays the results of each of the algorithms on a range of corpora for both

Japanese and Dutch. The y-axis gives the A′-statistic for each algorithm, a discrimination measure which

ranges from 0.5 (chance) to 1.0 (perfect discrimination). The total number of unique allophones in the

corpus is listed on the x-axis. The bottom-up algorithm, using KL divergence alone, does well only on the

simplest corpus—performance falls off rapidly as allophonic complexity increases. Complementary

distribution, in other words, is only an effective way of identifying phonemes if there are very few

allophones—that is, if most of the job of learning phonemic categories has already been done. The top-

down algorithm, using actual word forms, performs much better, not surprisingly given the extra

information it is given. The n-gram algorithm, although not as effective as the word form filter, is

substantially more resistant to allophonic complexity than the KL measure alone.

This research thus demonstrates formally how lexical information can contribute to the learning

of phonemes, and shows how this information can be approximated even when the learner knows nothing

about actual word boundaries. It therefore represents an important step towards developing a learning

algorithm that is both psychologically plausible and effective on realistic data.

Andrew Martin, Sharon Peperkamp and Emmanuel Dupoux

Page 42: PROGRAM & ABSTRACTSkirchner/Abstracts.pdf · 3:05-3:35 Andrew Martin (UCLA), Sharon Peperkamp and Emmanuel Dupoux (LCSP) Learning phonemes with a pseudo-lexicon 3:35-4:05 Giorgio

Figure 1. Performance of allophone clustering (A′-score) as a function of allophonic complexity, measured by the number of bilateral allophones in the corpus, for three algorithms (KL alone, KL + word form filter, and KL + n-gram filter), on Japanese input (left panel) and Dutch input (right panel). Each point represents the mean performance of the algorithm on five randomly generated corpora of the same complexity.

References

Dresher, B. E., & Kaye, J. D. (1990). A computational learning model for metrical phonology. Cognition, 34(2), 137-195.

Maekawa, K., Koiso, H., Furui, S., & Isahara, H. (2000). Spontaneous Speech Corpus of Japanese. Proceedings of LREC, 947-952.

Peperkamp, S., Le Calvez, R., Nadal, J., & Dupoux, E. (2006). The acquisition of allophonic rules: Statistical learning with linguistic constraints. Cognition, 101(3), B31-B41.

Tesar, B., & Smolensky, P. (1998). Learnability in optimality theory. Linguistic Inquiry, 29(2), 229-268.


Statistical and exemplar approaches to speech perception: What abilities must also develop?

Bob McMurray Dept. of Psychology, University of Iowa

Allard Jongman Dept. of Linguistics, Kansas University

Recently, there has been an explosion of statistical models of phonetic category acquisition (Guenther & Gjaja, 1996; de Boer & Kuhl, 2003; Vallabha et al., 2007; Gauthier, Shi & Xu, 2007; McMurray, Aslin & Toscano, 2009). Building on exemplar (Goldinger, 1998; Pierrehumbert, 2003) and statistical learning (Maye, Werker & Gerken, 2002) theory, these models posit that perceptual categories are defined by distributional statistics (e.g. mean and variance), which can be extracted by a range of unsupervised learning devices serving as models of phonological acquisition.

Thus far, the emphasis has been on learnability – do models learn the right categories (though there have been attempts to model the developmental time course [McMurray et al., 2009], cue-weighting [Toscano & McMurray, in press], and the perceptual magnet effect [Guenther & Gjaja, 1996])? Yet, there have been no attempts either to scale models up to realistic numbers of cues and categories, or to model listener performance on the stimuli used for training/testing. This raises a fundamental question: is extracting a statistical distribution of the input enough to characterize listener performance? If not, what other abilities must develop to reach adult perception?

To address this, we collected a corpus of 2873 exemplars of the 8 English fricatives (/f, v, θ, ð, s, z, ʃ, ʒ/) in the context of 6 vowels, spoken by 20 speakers. We measured 24 cues in the frication and the vowel: spectral moments, duration and amplitude components, formant frequencies, and pitch (see Jongman, Wayland & Wong, 2000). This data-set should describe, as completely as possible, the statistics of these fricatives in a variety of noise-inducing contexts. 240 of these fricatives were presented to adult listeners in an 8AFC identification task, with and without the vocalic portion. This reveals the pattern of performance that an ideal statistical model should display. Listeners averaged 91.2% with the complete syllable, and 76.3% with the frication only. They were better at sibilants than non-sibilants and were affected by vowel context (Figure 1).

As a first step, we used multinomial regression to map the 24 cues onto the 8 categories (cf. Cole, Linebaugh, Munson & McMurray, in press; Nearey, 1997). Such models are sensitive to the distributions of cues corresponding to each category, but, unlike prior statistical approaches, are supervised. Thus, the model represents the upper bound on what we may expect from a system that is sensitive to statistics and has as much (if not more) information as possible. The model showed the same pattern of performance as listeners across fricatives, but only averaged 85% with the complete syllable (Figure 2), and 79.1% with the cues in the frication only (Figure 3). Thus, the model performed near listeners for frication alone, though not for the full CV. First-order statistics may not be sufficient to account for listeners' performance, even with an overly powerful model.
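A minimal numpy sketch of a supervised multinomial (softmax) classifier of the kind just described, trained by batch gradient descent on synthetic data; the training regime and hyperparameters are our assumptions, not those of the model actually fitted to the 24 cues:

```python
import numpy as np

def train_softmax(X, y, n_classes, lr=0.5, epochs=500):
    """Multinomial logistic regression fit by batch gradient descent:
    a stand-in for the supervised cue-to-category classifier."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                        # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)            # softmax probabilities
        grad = (P - Y) / n                           # gradient of neg. log-likelihood
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(W, b, X):
    """Most probable category for each row of cues."""
    return np.argmax(X @ W + b, axis=1)
```

In the study the inputs would be the 24 measured cues and the 8 fricative categories; here any well-separated synthetic clusters serve to illustrate the mechanics.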

What, then, would it take for listeners to develop such performance? What does the vocalic portion contribute that was not available to the model? One possibility is that listeners used the vocalic portion to identify the vowel and/or speaker, in order to compensate for its effect on cues in the frication. We modeled this parsing using linear regression to partial out the effects of speaker and vowel on each cue prior to its inclusion in the logistic classifier (as in Cole et al., in press). This model performed substantially better (92.9%; Figure 4), and also matched the pattern of performance across vowels (which the earlier models did not).
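The parsing step (partialling out speaker and vowel effects from each cue before classification) can be sketched with ordinary least squares on dummy-coded factors; this is an illustrative reconstruction, not the authors' code:

```python
import numpy as np

def partial_out(cues, factors):
    """Residualize each acoustic cue against categorical factors
    (e.g. speaker, vowel): fit cue ~ intercept + factor dummies by
    least squares and keep only the residuals for classification."""
    cols = [np.ones(len(cues))]                  # intercept
    for f in factors.T:
        for level in np.unique(f)[1:]:           # drop one reference level
            cols.append((f == level).astype(float))
    D = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(D, cues, rcond=None)
    return cues - D @ beta                       # context-compensated cues
```

After this step, systematic speaker/vowel shifts in a cue are removed, so the classifier sees cue values that are comparable across contexts.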

Thus, simply representing categories in terms of the statistics of the input is not sufficient to model the end-state of development (listener performance), particularly in a high-dimensional task. However, statistically sensitive learning that also includes the ability to account for the effect of other categorical factors (e.g. vowel/speaker) on the continuous cues can do the job. Such top-down interactions may represent the next frontier in statistical approaches to phonological development, though how and when children do this is currently an open question.

Bob McMurray and Allard Jongman

Page 44: PROGRAM & ABSTRACTSkirchner/Abstracts.pdf · 3:05-3:35 Andrew Martin (UCLA), Sharon Peperkamp and Emmanuel Dupoux (LCSP) Learning phonemes with a pseudo-lexicon 3:35-4:05 Giorgio

Figure 1: Listener performance across the 8 fricatives (complete syllable vs. frication noise only, relative to chance).

Figure 2: A comparison of listeners' performance (dashed line) to the exemplar model in the complete fricative condition.

Figure 3: Performance of the exemplar model compared to listeners with only the frication noise.

Figure 4: Performance of the parsing model compared to the listeners based on complete syllables.

References

• Cole, J.S., Linebaugh, G., Munson, C., and McMurray, B. (in press) Vowel-to-vowel coarticulation in English: word boundaries, perceptual parsing, and implications for phonology. Journal of Phonetics

• de Boer, B. & Kuhl, P. K. (2003). Investigating the role of infant-directed speech with a computer model. Acoustic Research Letters Online, 4, 129-134.

• Gauthier, B., Shi, R., and Xu, Y. (2007) Simulating the acquisition of lexical tones from continuous dynamic input. Journal of the Acoustical Society of America, 121(5), EL190-195.

• Goldinger, S. D. (1998). Echoes of echoes? An episodic theory of lexical access. Psychological Review, 105, 251-279.

• Guenther, F. and Gjaja, M. (1996) The perceptual magnet effect as an emergent property of neural map formation. Journal of the Acoustical Society of America, 100, 1111-1112.

• Jongman, A., Wayland, R. & Wong, S. (2000) Acoustic characteristics of English Fricatives. Journal of the Acoustical Society of America. 108(3), 1252-1263

• Maye, J., Werker, J. F., & Gerken, L. (2002). Infant sensitivity to distributional information can affect phonetic discrimination. Cognition, 82, B101-B111.

• McMurray, B., Aslin, R. N., & Toscano, J. C. (2009). Statistical learning of phonetic categories: Computational insights and limitations. Developmental Science, 12, 369-378.

• Nearey, T.M. (1997) Speech perception as pattern recognition. Journal of the Acoustical Society of America, 101(6), 3241-3254.

• Pierrehumbert, J. (2003). Phonetic diversity, statistical learning, and acquisition of phonology. Language and Speech, 46, 115-154.

• Toscano, J., and McMurray, B. (in press) Cue Integration with Categories: A Statistical Approach to Cue Weighting and Combination in Speech Perception. Cognitive Science.

• Vallabha, G. K., McClelland, J. L., Pons, F., Werker, J. F., & Amano, S. (2007) Unsupervised learning of vowel categories from infant-directed speech. Proceedings of the National Academy of Sciences, 104, 13273-13278.


Getting the features you want

Jeff Mielke

In this talk I present work on modeling the acquisition of phonological distinctive features on the basis of different types of phonetic and phonological data that are thought to be available to the language learner.

Phonetic distance between segments is calculated using a similarity metric that takes into account acoustic similarity, vocal tract shape, laryngeal activity, and airflow (the last three based on ultrasound, electroglottography, and aerodynamic measurement, respectively) (Mielke, 2009). Phonological information for particular languages is based on the patterning of segments in P-base (Mielke, 2008).

This approach to features involves the following assumptions: (1) Features are not needed to explain typology (and therefore at least some sound patterns can potentially be treated as primary, not secondary, to features). (2) Features nonetheless play a role in the organization of grammars (e.g. in well-formedness judgments about novel structures (Albright, 2009), Bach tests, etc.). (3) Features can be learned inductively from various channels of phonetic information, phonological alternations, and phonotactics.

Many studies that involve modeling phonological acquisition have a priori features, and it is hoped that this project will enable other people working in this area to use the feature learning algorithm either as a front-end (generating the features needed by models of phonological acquisition) or as a justification for assuming that a feature system will be available to the language learner. As currently formulated, this model requires its own front-end, to provide the segments on which it is based. Parsing the speech stream into phoneme-sized segments and identifying contextual variants with one another is a separate problem being addressed by ongoing research (Dillon and Idsardi, 2009; Peperkamp et al., 2006).

References

Albright, Adam. 2009. Feature-based generalization as a source of gradient acceptability. Phonology 26:9–41.

Dillon, Brian, and William Idsardi. 2009. Investigating statistical approaches to building a phonology. University of Maryland ms.

Mielke, Jeff. 2008. The Emergence of Distinctive Features. Oxford: Oxford University Press.

Mielke, Jeff. 2009. A phonetically-based phonetic similarity metric. Paper presented at NELS 40.

Peperkamp, Sharon, Rozenn Le Calvez, Jean-Pierre Nadal, and Emmanuel Dupoux. 2006. The acquisition of allophonic rules: Statistical learning with linguistic constraints. Cognition 101:B31–B41.

Jeff Mielke


Constraint induction and simplicity bias

Elliott Moreton, University of North Carolina

November 20, 2009

Segment-to-segment phonological dependencies in natural language tend to be assimilatory or dissimilatory, i.e., they relate two distinct tokens of a single phonetic feature in an utterance. Six minimally-different experiments in phonotactic pattern learning by English speakers support the hypothesis that single-feature dependencies, even when they are not phonetically grounded, are detected more readily than two-feature dependencies, even when they are phonetically grounded (corroborating similar findings by Wilson (2003) and Moreton (2008)). Where can this bias towards syntagmatic featural simplicity come from?

The problem is addressed using a generalization of a proposal originally applied to paradigmatic simplicity bias (e.g., the superior learnability of the opposition [p t k]/[b d g] over that of [p d k]/[b t g], Saffran & Thiessen 2003). The core of the idea is that constraints are first induced, then weighted or ranked (Hayes and Wilson, 2008), and that, because of interaction between the data patterns and the constraint inducer, simple patterns are supported by multiple overlapping constraints, leading to faster learning in Maximum Entropy learners, whereas more complex patterns must be learned piecemeal (Pater et al., 2008).
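Why overlapping constraints speed learning can be illustrated with a deliberately minimal two-candidate MaxEnt learner (entirely our own construction, not the paper's inducer): when the ill-formed competitor violates several constraints at once, each update strengthens all of them, so the penalty separating the candidates grows several times faster.

```python
import math

def final_accuracy(n_constraints, lr=0.1, steps=50):
    """Toy two-candidate MaxEnt learner.  The ill-formed competitor
    violates `n_constraints` overlapping constraints; each step applies
    the usual log-likelihood gradient to every violated constraint.
    Returns p(correct form) after training."""
    w = [0.0] * n_constraints
    for _ in range(steps):
        # Harmony difference between candidates = sum of violated weights.
        p_correct = 1 / (1 + math.exp(-sum(w)))
        for i in range(n_constraints):
            w[i] += lr * (1 - p_correct)   # gradient ascent on log-likelihood
    return 1 / (1 + math.exp(-sum(w)))
```

After the same number of updates, a pattern backed by three overlapping constraints assigns the correct form a higher probability than one backed by a single constraint, which is the learning-rate asymmetry the abstract appeals to.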

To achieve this goal, constraints are induced according to a schema which represents constraints as subtrees of representations, accommodating any degree of prosodic and segmental complexity and making every representation a constraint (Burzio, 1999). Syntagmatic feature variables allow direct expression of Agree- and OCP-type constraints. The inducer searches the (very large) constraint space using an evolutionary algorithm (Eiben and Smith, 2003). The result is that single-feature syntagmatic dependencies tend to be supported by multiple overlapping constraints. (Several shortcomings will also be discussed.)

This treatment unifies the bias towards syntagmatically-simple patterns with that towards paradigmatically-simple ones, and points towards applications of constraint multiplicity and constraint generality in explaining learning and typology. More generally, it argues for shifting explanatory weight from a fixed constraint set to constraint schemas, generation, and testing (Boersma, 1998; Hayes, 1999; Smith, 2002, 2004; Boersma and Pater, 2007).


Elliott Moreton


References

Boersma, P. (1998). Functional Phonology: formalizing the interactions between articulatory and perceptual drives. Ph.D. thesis, University of Amsterdam.

Boersma, P. and J. Pater (2007, October). Constructing constraints from language data:the case of Canadian English diphthongs. Handout, NELS 38, University of Ottawa.

Burzio, L. (1999). Surface-to-surface morphology: when your representations turn into constraints. MS, Department of Cognitive Science, Johns Hopkins University. ROA-341.

Eiben, A. E. and J. E. Smith (2003). Introduction to evolutionary computing. Berlin: Springer.

Hayes, B. (1999). Phonetically driven phonology: the role of optimality in inductive grounding. In M. Darnell, E. Moravcsik, M. Noonan, F. Newmeyer, and K. Wheatley (Eds.), Functionalism and Formalism in Linguistics, Volume 1: General Papers, pp. 243–285. Amsterdam: John Benjamins.

Hayes, B. and C. Wilson (2008). A Maximum Entropy model of phonotactics and phonotactic learning. Linguistic Inquiry 39(3), 379–440.

Moreton, E. (2008). Analytic bias and phonological typology. Phonology 25(1), 83–127.

Pater, J., E. Moreton, and M. Becker (2008, November). Simplicity biases in structured statistical learning. Poster presented at the Boston University Conference on Language Development.

Saffran, J. R. and E. D. Thiessen (2003). Pattern induction by infant language learners. Developmental Psychology 39(3), 484–494.

Smith, J. L. (2002). Phonological augmentation in prominent positions. Ph.D. thesis, University of Massachusetts, Amherst.

Smith, J. L. (2004). Making constraints positional: towards a compositional model of Con. Lingua 114(2), 1433–1464.

Wilson, C. (2003). Experimental investigation of phonological naturalness. In G. Garding and M. Tsujimura (Eds.), Proceedings of the 22nd West Coast Conference on Formal Linguistics (WCCFL 22), Somerville, pp. 533–546. Cascadilla Press.


From Sound Change to Grammar Change: words, lexicons, and learners

This work describes the development of an explicit model of the emergence of a new phoneme category

over time. It takes certain standard assumptions from historical and theoretical linguistics, formalizes

them, and examines their consequences in terms of predictions for the shape of possible and likely

synchronic grammars. Specifically, this modeling work is applied to the case study of palatalization, a

common process both diachronically and synchronically, in which obstruent consonants adjacent to high,

front vowels acquire a secondary palatal articulation.

A number of competing claims have been made regarding what would (Kiparsky 2004; de Lacy &

Kingston 2006) or would not (Ohala 1993; Blevins 2004) be necessary to explain the existing typology of

phonological grammars. Unfortunately, the assumptions underlying such claims are not always stated or

made explicit enough to assess. In order to seriously address this question there are, to my mind, at least

three components that must be well specified. In the first place, even the simplest plausible model of the

distribution of synchronic grammars must include some characterization of sound change, as it is

universally acknowledged that languages change over time. Along with this, it seems necessary to

include some parameters regarding the likely shapes of lexicons, under the assumption that the latter will

act as the actual input to the learner. Uncontroversially, speakers learn their native languages, possibly

with help from a UG-derived filter, but certainly from the data they encounter. Thus, the model requires

us to make decisions about a learning algorithm of some kind.

I develop such a three part model in the present work, focusing mainly on the sound change component.

However, I also consider a number of different statistical measures that might comprise a language

learning algorithm. To begin I formulate the following set of explicit hypotheses about the nature of (a

class of internal) sound changes.

• New phoneme categories emerge through the erosion of existing categories

• Sound change proceeds probabilistically on a word-by-word basis, and bi-directionally via misperception errors (in the current study k > kʲ; kʲ > k)

• Pre-existing phonotactics affect the outcome of phoneme categorization

• Category size and location in acoustic space affect the outcome of phoneme categorization
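A minimal simulation of the second hypothesis (probabilistic, word-by-word, bidirectional change) might look like the following; the rates, the lexicon format, and the ASCII "kj" label for the palatalized variant are illustrative assumptions, not the parameters of the actual model:

```python
import random

def transmit(lexicon, p_pal=0.3, p_depal=0.05, generations=10, seed=0):
    """Pass a lexicon of space-separated segment strings through
    `generations` of learners.  Per word, a k before i may be misheard
    as palatalized kj, and an existing kj may revert to k."""
    rng = random.Random(seed)
    for _ in range(generations):
        new_lexicon = []
        for word in lexicon:
            segs = word.split()
            out = []
            for j, s in enumerate(segs):
                nxt = segs[j + 1] if j + 1 < len(segs) else None
                if s == "k" and nxt == "i" and rng.random() < p_pal:
                    s = "kj"                     # palatalizing misperception
                elif s == "kj" and rng.random() < p_depal:
                    s = "k"                      # depalatalizing misperception
                out.append(s)
            new_lexicon.append(" ".join(out))
        lexicon = new_lexicon
    return lexicon
```

Because change applies word by word, which words end up with the new variant depends on the idiosyncrasies of the starting lexicon, which is exactly how unnatural phonotactics can arise in such a model.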

Under a model of probabilistic sound change, unnatural phonotactics (e.g., palatal variants two segments

before /b/) may arise due to idiosyncrasies of the lexicon undergoing change and the particular words that

happen to contain the transformed segments. In simulations with randomly generated lexicons over

multiple generations of speakers such phonotactics do occur, but they are never more frequent than

natural ones, and never replace the natural patterns. ‘Anti-Markedness’ distributions which violate

universal implications (e.g. kʲe tokens but no kʲi tokens) have the potential to arise regularly when changes

are incomplete – at least in a probabilistic sense. But in fact, these probabilistic violations emerge only

under certain assumptions regarding how competition for phoneme categorization occurs (and in no

instance does a categorical implicational violation result). Moreover, under either set of assumptions, the

likeliest and most stable outcomes appear to be systems of phonemic contrast in which the kʲ variant

appears in relatively high (although far from uniform) numbers before all vowels.

The general model presented here, even though it lacks any inherent UG constraints, nevertheless does

not proliferate unnatural or anti-markedness outcomes. This result has important implications for how we

think about sound change and grammar change – not just in the cases where ‘unexpected’ patterns

emerge, but for any type of output grammar whatsoever. Primarily, this work accomplishes some of the

necessary initial mapping of the large space of parameters that can potentially affect theoretical

predictions (e.g., strength of phonotactic biases, dependence of categorization on acoustic distance, etc.).

Further work will continue to clarify exactly what combinations of hypothetical forces are insufficient,

and which are necessary to explain the observed data.

Rebecca Morley


Modeling the acquisition of anterior lingual sibilant fricatives in English: Integrating behavioral data with computational learning models

Benjamin Munson, in collaboration with Mary E. Beckman, Jan Edwards, Jeff Holliday, Hannah Julien, and Fangfang Li

Phonological development involves the acquisition of knowledge at multiple levels of representation. In the earliest stages of acquisition, children accrue information about the distribution of sounds in the primary sensory domains of hearing, seeing, and feeling. This involves accrual both of the characteristics of sounds that individuals in the ambient-language environment produce, and those that the learner herself produces. As development progresses, individuals learn systematic mappings among these representations. Later phonological development sees the emergence of representations that parse this continuous variation into language-specific categories. The emergence of these representations appears to be strongly and reciprocally related to vocabulary growth (Beckman, Munson, & Edwards, 2007; Pierrehumbert, 2003). This talk will present the results of a set of studies completed as part of a large, collaborative project modeling early phonological category acquisition. It will focus specifically on the studies examining the acquisition of /s/ and /ʃ/. It is particularly challenging to model the acquisition of these sounds using previously published computational models of how vowel categories emerge in social interactions (e.g., Westermann & Miranda, 2004, Heintz et al. 2009), as the phonetic spaces associated with fricatives are much more complex than those associated with vowels. The first part of the talk reviews recent work by Li (2008) and Li, Edwards, and Beckman (2009), which showed that the acquisition of these sounds involves the gradual attainment of an adult-like contrast.
The second part of the talk reviews work by Urberg-Carlson, Munson, and Kaiser (2008, 2009) which shows that adults are able to perceive the full range of variation in children's speech—from productions that are clear examples of /s/ and /ʃ/ to those that are acoustically intermediate between those endpoints—when given a non-categorical response modality, such as a visual analog scale (VAS). In these tasks, adults heard a production and saw a two-headed arrow anchored with the text "the 's' sound" at one end and "the 'sh' sound" at the other, and clicked on the location on the line that corresponded to where they believed the fricative to fall perceptually on that dimension. Listeners' ratings were strongly correlated with the acoustic parameters that Li et al. had shown to differentiate between these productions. Subsequent work by Munson, Kaiser, and Urberg-Carlson (2008) and Kaiser, Munson, Li, Holliday, Beckman, and Schellinger (2009) demonstrated that these relationships are robust to differences in task difficulty. The third part of the talk presents preliminary results from an ongoing study examining the dynamics of the relationship between adults' perception of children's productions and the acoustic characteristics of the fricatives that they produce in hypothetical interactions with children. In that task, adults rated individual children's productions on a VAS, then produced the same word that they had just rated. They were instructed to produce the word as if they were responding to the child whose speech they had just rated. Our ongoing analyses examine whether adults respond systematically differently to productions that are clear, adult-like examples of the target sounds and ones that are perceived to be less adult-like forms. In particular, we are examining whether adults produce greater hyperarticulation in response to non-canonical variants of /s/ and /ʃ/ than they do in response to forms that are perceived as adult-like.
Such a finding would be predicted by Cristià's (2009) study of fricatives in child-directed speech, and would presumably facilitate vocabulary growth by giving the child phonetically more-distinct input. [Supported by NSF grants BCS-0729306, BCS-0729140, & BCS-0729277 and NIH grant DC02932]

Benjamin Munson, Mary E. Beckman, Jan Edwards, Jeff Holliday, Hannah Julien, and Fangfang Li


Beckman, M.E., Munson, B., & Edwards, J. (2007). The influence of vocabulary growth on developmental changes in types of phonological knowledge. In J. Cole & J. Hualde (Eds.), Laboratory Phonology 9 (p. 241-264). New York: Mouton de Gruyter.

Cristià, A. (2009). Individual variation in infant speech processing: Implications for language acquisition theories. Unpublished doctoral dissertation, Purdue University.

Heintz, I., Beckman, M.E., Fosler-Lussier, E., & Menard, L. (2009). Evaluating parameters for mapping adult vowels to imitative babbling. In Proceedings of Interspeech 2009 (p. 688-691).

Kaiser, E., Munson, B., Li, F., Holliday, J., Beckman, M., Edwards, J., & Schellinger, S. (2009). Why do adults vary in how categorically they rate the accuracy of children's speech? Poster presented at the spring 2009 meeting of the Acoustical Society of America. Also in Journal of the Acoustical Society of America, 125, 2753.

Li, F. (2008). The phonetic development of voiceless sibilant fricatives in English, Japanese, and Mandarin Chinese. Doctoral dissertation. Department of Linguistics, Ohio State University.

Li, F., Edwards, J., & Beckman, M. E. (2009). Contrast and covert contrast: The phonetic development of voiceless sibilant fricatives in English and Japanese toddlers. Journal of Phonetics, 37, 111-124.

Munson, B., Kaiser, E., & Urberg Carlson, K. (2008). Assessment of children's speech production 3: Fidelity of responses under different levels of task delay. Poster presented at the 2008 ASHA Convention, Chicago, 20-22.

Pierrehumbert, J. (2003). Phonetic diversity, statistical learning, and acquisition of phonology. Language and Speech, 46, 115-154.

Urberg Carlson, K., Kaiser, E., & Munson, B. (2008). Assessment of children's speech production 2: Testing gradient measures of children's productions. Poster presented at the 2008 ASHA Convention, Chicago, 20-22.

Urberg-Carlson, K., Munson, B., & Kaiser, E. (2009). Gradient measures of children's speech production: Visual analog scale and equal appearing interval scale measures of fricative goodness. Poster presented at the spring 2009 meeting of the Acoustical Society of America. Also in Journal of the Acoustical Society of America, 125, 2529.

Westermann, G. & Miranda, E.R. (2004). A new model of sensorimotor coupling in the development of speech. Brain and Language, 89, 393–400.


Phonemic distribution of sounds as a basis for word boundary detection in 6- to 8-month-olds

How do 6- to 8-month-olds segment a stream of sounds into words? One possible solution is to build a model around the specific distribution of sounds of a language, that is, transitional probabilities (Saffran et al., 1996).

The constraints defined in such a model should all be based on general cognitive abilities without hiding any language-specific information in the underlying architecture. As mentioned above, the most important constraint is the inclusion of transitional probabilities in the model. Second, it is important to pin down the exact form of the unit of perception, from which transitional probabilities will be calculated. Here the phoneme seemed best suited: using phonemes allows for a more elaborate inquiry by controlling both possibilities, syllables as well as all other existing combinations of phonemes. Third, it is clear that babies have to memorize words for a longer period of time (Jusczyk/Aslin, 1995; Jusczyk/Hohne, 1997), so that, fourth, the most frequent ones (Shi et al., 2006; Jusczyk et al., 1994) can be mapped top-down (Bortfeld et al., 2005) onto an unknown speech input.

The latest implementation of the model, presented here, takes representative samples of a controlled size from CHILDES (MacWhinney, 1995), converts them into IPA format, and deletes all stress information and white spaces. Transitional probabilities are then calculated for each combination of phoneme chains ranging from 1 to 10 phonemes. White spaces (word boundaries) are marked when a threshold value is reached. This value, defined between 0 and 1, is a variable modelling the sensitivity of the child. A second loop thus runs through ten values of the threshold and outputs the maximum for each phoneme chain combination.
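For chain length 1 (bigrams), the thresholded boundary-marking step might look as follows; the toy stream and threshold value are invented, and the full model additionally sweeps chain lengths up to 10:

```python
from collections import Counter

def transition_probs(stream):
    """P(b | a) for adjacent phonemes, estimated from the stream itself."""
    pairs = Counter(zip(stream, stream[1:]))
    firsts = Counter(stream[:-1])
    return {p: n / firsts[p[0]] for p, n in pairs.items()}

def segment(stream, threshold):
    """Mark a word boundary wherever the transitional probability
    drops below the threshold (the 'sensitivity' variable)."""
    tps = transition_probs(stream)
    words, current = [], [stream[0]]
    for a, b in zip(stream, stream[1:]):
        if tps[(a, b)] < threshold:
            words.append(''.join(current))
            current = []
        current.append(b)
    words.append(''.join(current))
    return words

# "badi" and "kelu" interleaved; one boundary is missed, echoing the
# finding that only a fraction of segmented 'words' are correct.
print(segment(list("badikelubadibadikelu"), 0.9))
# → ['badi', 'kelubadi', 'badi', 'kelu']
```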

Once the values at the optimum of both variables (the partial derivatives with respect to phoneme chain length and the child's sensitivity) are calculated, we assume them to be fixed for a given corpus size. From the now-segmented sample corpus, the most frequent items (between 5 and 30 'words' when a constant function is chosen) are selected and saved in a list ordered by length. Then the next corpus is input and processed as described above. Depending on a fifth variable encoding the delay of the mapping, the word list is either mapped onto the corpus immediately or held until the next specified period.

The simulation allows measuring the exact corpus size necessary to achieve a maximum of correct segmentations as a function of the child's sensitivity and the corresponding unit of perception (fig. 3). The latter, surprisingly, does not matter as long as the sound chain varies somewhere between one and five phonemes. For word segmentation, then, the exact form of the unit of perception provides no additional information. The infant's sensitivity, however, should be minimized: phoneme chains need to occur only a few times in the same environment to be recognized as lexical units, and even if a unit occasionally appears in different sound environments, the infant will choose the most frequent one relative to all others as a lexical entry.

The most important discovery of the simulation, however, concerns a property of the segmented corpus itself. It is true that only about one third (one half with optimized parameters) of all segmented words are correct, and that this number alone cannot account for a starting point in the segmentation process. At first sight this is a convincing argument against the approach, but closer inspection of the corpus makes clear that the wrongly segmented corpora encode further important information that is not obvious superficially: the most frequent segmentations of the wrongly segmented corpus happen to be lexical items.

Only this finding allows solving the segmentation problem, since a certain number of words can now be extracted reliably from each representative corpus. This number is described by a function and encodes the number of words an infant should be able to memorize (fig. 1).i The simulation showed that the best results are attained when the function is kept constant, that is, when in each period/cycle (for each new corpus) the same number of words is memorized. Furthermore, the number of words memorized from each corpus should not exceed 30 entries, since the error function rises rather drastically from there on.

It is now possible to use the extracted words as a source of additional information for defining undetected word boundaries. In particular, adding a recursive structure that maps the existing words onto the sound stream after the transitional probabilities are computed is key to compensating for the shortcomings of the probability calculations.
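A greedy version of that mapping step, with invented names, could look like:

```python
def map_lexicon(stream, lexicon):
    """Carve known words out of an unsegmented stream, longest-first.

    Sketch of the top-down mapping of the memorized word list onto the
    sound stream; unknown material is passed through untouched.
    """
    out, i = [], 0
    words = sorted(lexicon, key=len, reverse=True)
    while i < len(stream):
        for w in words:
            if stream.startswith(w, i):
                out.append(w)
                i += len(w)
                break
        else:
            out.append(stream[i])  # no known word here: emit one phoneme
            i += 1
    return out

print(map_lexicon("badikelubadi", ["badi", "kelu"]))
# → ['badi', 'kelu', 'badi']
```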

The simulation confirms the validity of applying the statistical approach to extract the first lexical items from running speech, taking phoneme combinations as its only source of input. According to this simulation, learning the phonological inventory of a language is a prerequisite to building a lexicon.

Hagen Peukert


References:
Bortfeld, H., Morgan, J. L., & Golinkoff, R. M. (2005). Mommy and me. Psychological Science, 16(4), 298-304.
Jusczyk, P. W., & Aslin, R. N. (1995). Infants' detection of the sound patterns of words in fluent speech. Cognitive Psychology, 29(1), 1-23.
Jusczyk, P. W., & Hohne, E. A. (1997). Infants' memory for spoken words. Science, 277, 1984-1986.
Jusczyk, P. W., Luce, P. A., & Luce, J. C. (1994). Infants' sensitivity to phonotactic patterns in the native language. Journal of Memory and Language, 33(5), 630-645.
MacWhinney, B. (1995). The CHILDES Project: Tools for Analyzing Talk. Hillsdale, NJ: Lawrence Erlbaum.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274(5294), 1926-1928.
Shi, R., Cutler, A., Werker, J., & Cruickshank, M. (2006). Frequency and form as determinants of functor sensitivity in English-acquiring infants. Journal of the Acoustical Society of America, 119(6), EL61-EL67.

Figure 1: Growth of words per cycle (actual and ideal)

Figure 2: Optimal parameter combination (length of phoneme chain and child's sensitivity) at given corpus sizes

i f(ω) = … (for 2 ≤ ω ≤ ∞, 10 ≤ ζ ≤ 30), where ζ is the function specifying the number of words to be memorized and ω is the number of corpora or cycles passed without recourse to a lexicon.




Predicting variation in the order of acquisition of morphophonological patterns. Janet B. Pierrehumbert Northwestern University Linguistics Department and Northwestern Institute on Complex Systems. Morphophonological alternations are acquired as statistical generalizations over pairs (or sets) of related words [1, 2, 3, 4]. For example, productive palatalization of /t/ before the suffix -ion depends on learning such word pairs as create, creation and generate, generation. This now well-established observation -- which is common to analogical models and to models with statistical rules or constraints -- entails that word frequency is one predictive factor for the order of acquisition of different patterns. Learning a general pattern depends on encountering, and learning, a sufficiently large sample of words that exemplify that pattern, and high frequency words are by definition more likely to be encountered sooner during language acquisition than low frequency words. For patterns found chiefly in the erudite vocabulary (at frequencies of 1 per million or below), the word familiarity data in the Hoosier lexicon [5] suggest that even college students may not know many of the words on which generalizations might be based. Experimental studies of the productivity of such patterns indeed typically find extensive individual differences. Frequency cannot fully determine the order of acquisition of different patterns. For the acquisition of phonemes, intrinsic articulatory difficulty also plays a significant role [6]. Analogous observations can be made at more abstract levels. For example, in learning the contrast between the English compound and phrasal stress patterns, a morphosyntactic pattern which is acquired quite late, the cognitive complexity of the relationship between the prosody and the meaning appears to be a bottleneck [7]. 
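The sampling bound can be made concrete under a naive bag-of-words (Poisson) assumption, the very assumption the abstract goes on to question for bursty words; the function and numbers are illustrative only:

```python
import math

def p_encounter(freq_per_million, sample_size):
    """P(at least one token) if word occurrences are independent (Poisson)."""
    rate = freq_per_million / 1_000_000 * sample_size
    return 1 - math.exp(-rate)

# A 1-per-million erudite word is far from guaranteed to appear even in
# a large exposure sample, so some learners lack the evidence base.
print(round(p_encounter(1, 100_000), 3))    # small sample: encounter unlikely
print(round(p_encounter(1, 2_000_000), 3))  # large sample: likely, not certain
```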
In short, frequencies generate probabilistic lower bounds on the vocabulary level needed for acquisition (or, loosely, on the age of acquisition). Ordering predictions take the form "INFREQUENT is unlikely to be learned before FREQUENT." By treating the additional factors as random variables, and viewing sampling effects as another source of interspeaker variation, we end up with an overall picture in which the average order of acquisition for a set of patterns, as determined by the least frequent words needed for a minimal sample to learn each one, is modulated by the uncertainty surrounding the predicted value. In this paper, I explore a new prediction about a source of variation in the order of acquisition. Contrary to naive expectation, the overall frequency of a word is not a very good predictor of the likelihood that one will encounter that word in a finite sample. Some words are very bursty, occurring repeatedly in a short sample and not occurring for a long time in a subsequent or different sample; these words typically represent the topic of discourse and make good keywords for document retrieval [8, 9]. Other words are less bursty, and some even come close to the distribution one would expect from a naive bag-of-words model, in which each word is an independent random sample from the lexicon. In a systematic analysis of semantic factors governing word burstiness in a large database of Usenet posts, Ref. [10] observes that some morphological processes (such as the derivation of abstract nouns from verbs) typically yield words that are more bursty than their stems, whereas others (such as the derivation of sentential adverbs from adjectives) typically yield words that are less bursty. A corollary of this observation is that some morphophonological patterns, such as palatalization before -ion, should exhibit greater interspeaker variability than others for the same overall word frequency statistics. This prediction is made precise by applying the stretched exponential scaling law developed in Ref. [10]. Overall, the study contributes to a program of research in phonology that goes beyond average statistical patterns by seeking to explain the extent and nature of variability as well.

References:
[1] Bybee, J. L. (2001). Phonology and Language Use. Cambridge University Press.
[2] Hay, J. B., & Baayen, R. H. (2005). Shifting paradigms: Gradient structure in morphology. Trends in Cognitive Sciences.
[3] Pierrehumbert, J. (2003). Probabilistic phonology: Discrimination and robustness. In R. Bod, J. Hay, & S. Jannedy (Eds.), Probabilistic Linguistics. Cambridge, MA: MIT Press, 177-228.
[4] Daland, R., Sims, A., & Pierrehumbert, J. B. (2007). Much ado about nothing: A social network model of Russian paradigmatic gaps. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Prague, Czech Republic, June 24-29, 2007. http://www.aclweb.org/anthology/P/P07/P07-1118.pdf
[5] Nusbaum, H. C., Pisoni, D. B., & Davis, C. K. (1984). Sizing up the Hoosier Mental Lexicon: Measuring the familiarity of 20,000 words. Research on Speech Perception Progress Report No. 10, Speech Research Laboratory, Indiana University, Bloomington, 357-376.
[6] Edwards, J., & Beckman, M. E. (2008). Some cross-linguistic evidence for modulation of implicational universals by language-specific frequency effects in phonological development. Language, Learning, and Development, 4, 122-156.
[7] Vogel, I., & Raimy, E. (2002). The acquisition of compound vs. phrasal stress: The role of prosodic constituents. Journal of Child Language, 29, 225-250.
[8] Church, K. W., & Gale, W. A. (1995). Poisson mixtures. Natural Language Engineering, 1, 163-190.
[9] Gries, S. Th. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403-437.
[10] Altmann, E. G., Pierrehumbert, J. B., & Motter, A. E. (2009). Beyond word frequency: Bursts, lulls, and scaling in the temporal distributions of words. PLoS ONE, 4(11): e7678. doi:10.1371/journal.pone.0007678

Janet Pierrehumbert


A Darwinian Account of Underrepresentation of Doubly Marked Forms

In some languages, words with two marked phonological structures are rarer than the joint probability of their component structures would predict (Frisch 1996; see also Albright 2008). For example, a language that allows onsetless syllables and allows codas may have fewer VC syllables than would be expected if the chances of having an onset and having a coda were completely independent. These cases demonstrate a 'complexity effect', where doubly-marked forms are avoided by virtue of the markedness of their components rather than by any substantively motivated prohibition on their co-occurrence (distinguishing such examples from similarity-based co-occurrence restrictions, e.g., OCP-Place; Frisch 1996). We propose that learners are not sensitive to the statistical under-representation of doubly-marked structures, given that grammatical mechanisms for encoding it are problematic. We demonstrate that the unexpectedly low lexical frequencies of such forms emerge over time in a model of lexical competition, despite learners' continued insensitivity to their actual frequency.

Languages that exhibit complexity effects present a challenge to theories of phonotactics that model the well-formedness of marked structures independently, whether through linear constraint interaction (as in OT, HS, or MaxEnt) or local probability calculation (as in a bigram model). Since such models do not differentiate between marked structures in doubly-marked and singly-marked forms, they predict doubly-marked forms to be no worse than the combination of their components. Non-linear interactions of the relevant sort could be captured with Local Constraint Conjunction (as proposed in Levelt & van de Vijver 2004) or with the addition of arbitrarily complex phonotactic constraints, but both of these solutions pose problems for acquisition.
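The effect can be quantified as an observed/expected (O/E) ratio over a lexicon; the counts below are invented for illustration:

```python
def o_e_ratio(counts):
    """O/E for the doubly marked cell of a 2 x 2 lexicon tabulation.

    counts maps (onsetless, has_coda) booleans to token counts.  Under
    full independence O/E = 1; a complexity effect shows up as O/E < 1
    for the doubly marked (True, True) cell.
    """
    total = sum(counts.values())
    p_onsetless = sum(n for (o, _), n in counts.items() if o) / total
    p_coda = sum(n for (_, c), n in counts.items() if c) / total
    expected = p_onsetless * p_coda * total
    return counts[(True, True)] / expected

# Toy lexicon in which VC syllables are rarer than independence predicts.
counts = {(False, False): 500, (True, False): 150,
          (False, True): 150, (True, True): 20}
```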
There is no general theory of when and how conjoined constraints are learned, and Pierrehumbert (2003) argues more generally against the feasibility of complex constraints on the grounds that they require more evidence than a learner is likely to encounter. Furthermore, examples of doubly-marked structure avoidance invariably take the form of statistical under-representation, with no known cases of alternations that are sensitive to one marked structure vs. two, calling into question whether these facts should be encoded in grammatical representations at all. If they are not encoded directly, the learner is free to learn with the simpler independence assumption in place, but this raises the question of how such facts have emerged in the lexicons of unrelated languages.

We propose that apparent complexity effects are better accounted for as a consequence of lexical competition over time (following Martin 2007). Competition among forms of varying markedness produces, over successive generations, a steadily lower observed likelihood of a doubly-marked structure relative to its expected likelihood given the independent probabilities of each of its marked components. The frequencies of all relatively marked structures decline over time as predicted (Martin 2007), while the probability of the doubly-marked structures declines faster than either marked structure alone, and thus also faster than the joint probability of the component structures. In our implementation, lexical items compete for preservation at each generation, with the likelihood of two forms competing scaled relative to their current lexical frequencies, as in (1), and the probability of a form winning calculated from the well-formedness of the two competitors, as in (2). Each generation, an updated frequency for each form is determined by summing the product of (1) and (2) over each possible competitor. Well-formedness is learned at the start of each generation on the basis of the current lexical distribution, but crucially is sensitive only to the independent probabilities of each marked structure.

In sum, our proposal accounts for the under-representation of doubly-marked forms without the need for a mechanism encoding the non-independence of marked phonological structures. Our model predicts that learners are not obliged to develop special constraints for a particular under-represented co-occurrence of two marked structures because its statistical under-representation falls out as a consequence of its being less likely to enter into competitions with other lexical items and its propensity to lose when it does.
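Formulas (1) and (2) from the examples can be turned into a runnable sketch; the forms, frequencies, and well-formedness values are invented, and the update is one plausible reading of the description rather than the authors' implementation:

```python
def generation(freqs, wf):
    """One cycle of lexical competition.

    p(X vs. Y) = freq(X) * freq(Y)                  # formula (1)
    p(X beats Y) = wf(X) / (wf(X) + wf(Y))          # formula (2)
    A form's updated frequency sums (1) * (2) over its competitors.
    """
    new = {x: sum(freqs[x] * freqs[y] * wf[x] / (wf[x] + wf[y])
                  for y in freqs if y != x)
           for x in freqs}
    total = sum(new.values())
    return {x: f / total for x, f in new.items()}

# Syllable types: unmarked, singly marked (x2), doubly marked, with
# well-formedness multiplied per marked structure (0.8 * 0.8 = 0.64).
freqs = {'CV': 0.25, 'V': 0.25, 'CVC': 0.25, 'VC': 0.25}
wf = {'CV': 1.0, 'V': 0.8, 'CVC': 0.8, 'VC': 0.64}
for _ in range(10):
    freqs = generation(freqs, wf)
# VC ends up rarer than the product of the onsetless and coda
# probabilities, i.e. O/E < 1 with no constraint against VC itself.
```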

Kathryn Pruitt, Brian Smith, Andy Martin, and Joe Pater



Examples

(1) Likelihood of competition between X and Y:
    p(X vs. Y) = frequency(X) × frequency(Y)

(2) Likelihood of X out-performing Y in a direct competition:
    p(X > Y) = well-formedness(X) / [well-formedness(X) + well-formedness(Y)]

References
Albright, A. (2008). From clusters to words: Grammatical models of nonce word acceptability. Handout of talk presented at the LSA Annual Meeting, Chicago, IL, January 2008.
Frisch, S. (1996). Similarity and frequency in phonology. Doctoral dissertation, Northwestern University.
Levelt, C., & van de Vijver, R. (2004). Syllable types in cross-linguistic and developmental grammars. In R. Kager, J. Pater, & W. Zonneveld (Eds.), Constraints in Phonological Acquisition. Cambridge: Cambridge University Press.
Martin, A. (2007). The evolving lexicon. Doctoral dissertation, UCLA.
Pierrehumbert, J. (2003). Probabilistic phonology: Discrimination and robustness. In R. Bod, J. Hay, & S. Jannedy (Eds.), Probabilistic Linguistics. MIT Press.



Learning Hidden Metrical Structure with a Log-Linear Model of Grammar

Log-linear grammar is a probabilistic extension of Optimality Theory (OT; Prince and Smolensky 1993), or more directly, of Harmonic Grammar (HG; see overviews in Smolensky and Legendre 2006, Pater 2009). Also known as Maximum Entropy grammar, it was originally proposed for syntax by Johnson (2002), and subsequently applied to phonology by Goldwater and Johnson (2003), Wilson (2006), Jäger (2007), and Hayes, Zuraw, Siptár and Londe (2008), amongst others. Log-linear models have a longer history in statistics and in NLP, and their current popularity in generative linguistics largely stems from the availability of provably convergent learning algorithms (cf. other stochastic versions of OT).

The literature on log-linear grammar sometimes refers to these convergence guarantees without mentioning an important caveat: they hold only if the learner has access to the full structure of the learning data (on this caveat for the OT Constraint Demotion Algorithms, see Tesar and Smolensky 2000; for log-linear learning in NLP, see Riezler 2000). Eisenstat (2008) provides a general model for the learning of hidden structure in the log-linear framework, and shows that it succeeds on cases of learning of phonological underlying representations (URs). It remains unknown to what extent language learning problems create local maxima that can trap such a learner, and to what extent these local maxima can be avoided by applying existing techniques for unsupervised learning.

Hidden metrical structure poses well-known challenges for previous approaches (e.g. Tesar and Smolensky 2000, Boersma and Pater 2008). A toy example of a case that has not been studied before is shown in Table 1. Here the hidden structure is the moraicity of the coda: an overt [cvc] syllable can be syllabified either as bimoraic (1a,c) or as monomoraic (1b,d). Weight-By-Position (W-B-P; Hayes 1995) demands the bimoraic structure; the other constraints demand stress on a heavy syllable (W-T-S, Weight-to-Stress; Prince 1990), and stress on the initial (S-1) or final syllable (S-2). In both target languages, stress is on the initial syllable in a [cvcv] word (2), and on the final syllable of a [cvcv:] word (3). In L1, [cvcvc] patterns with [cvcv:], thus requiring the bimoraic parse (1a); in L2, it patterns with [cvcv], requiring the monomoraic parse (1d).

Our learner aims to maximize the probability of the overt forms in the learning data. To do so, it maximizes the sum of the probabilities of all corresponding full structures. Evaluation in learning is thus solely concerned with overt forms, not with surface structure / hidden structure pairs.

A local maximum for a learner of L2 can be created by giving sufficient weight to W-B-P, effectively removing the possibility of the monomoraic analysis for [cvc]. However, regularization, which assigns a penalty that grows as weight values get higher, allows the learner to discover a solution. In changing the shape of the learning space, regularization can restore convexity to these learning problems (see Smith and Eisner 2006 on regularization and unsupervised learning in NLP).

We have also tested our learner on a benchmark set of learning problems developed by Tesar and Smolensky (2000). The problem set consists of 124 languages that can be generated by 12 metrical constraints. The learner is given the stress patterns, and must infer the correct prosodic structure. The goal for log-linear learning was to maximize the probability of the observed data, subject to a Gaussian regularization prior with σ = 1. The L-BFGS optimization algorithm found constraint weights that assigned highest probability to the observed stress pattern in 88% of the cases, averaging over languages and word types.

In small problems of the type illustrated in Table 1, our learner succeeds consistently. Under regularization this success does not depend on the starting condition. The fact that our learner is not completely successful on the 124 languages could reflect a problem with the grammatical assumptions or the learning algorithm, or might not be a problem at all, if these languages are unattested (Boersma 2003). In ongoing research, we are 1) examining whether these languages are found cross-linguistically, 2) exploring Smith and Eisner's Minimum Risk Annealing, which, by gradually lowering the variance term in regularization, has been shown to avoid local maxima, and 3) investigating a constraint set that operates over grid-based representations (Prince 1983), which posits less hidden structure than the foot-based system of Tesar and Smolensky (2000).
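The marginalization over hidden structure can be sketched for the [cvcvc] input of Table 1; the weights are invented, and the violation vectors follow one reading of the tableau:

```python
import math

# Violation vectors (W-B-P, W-T-S, S-1, S-2) for the two full parses of
# each overt form of /cvcvc/ in Table 1.
CANDIDATES = {
    "cv'cvc": [[0, 0, -1, 0],    # a. bimoraic coda, final stress
               [-1, 0, -1, 0]],  # b. monomoraic coda, final stress
    "'cvcvc": [[0, -1, 0, -1],   # c. bimoraic coda, initial stress
               [-1, 0, 0, -1]],  # d. monomoraic coda, initial stress
}

def overt_probs(weights):
    """P(overt form) = sum of exp(harmony) over its hidden parses,
    normalized over all candidates (MaxEnt over full structures)."""
    scores = {o: sum(math.exp(sum(w * v for w, v in zip(weights, parse)))
                     for parse in parses)
              for o, parses in CANDIDATES.items()}
    z = sum(scores.values())
    return {o: s / z for o, s in scores.items()}

# With S-1 weighted heavily, the initially stressed overt form wins
# regardless of which moraic parse carries the probability mass.
probs = overt_probs([1.0, 1.0, 5.0, 0.0])
```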

Jason Naradowsky, Joe Pater, David Smith, and Robert Staubs


Table 1.

Input      L1  L2  Overt     Full           W-B-P  W-T-S  S-1  S-2
1. cvcvc    1      cv'cvc    a. cvµ'cvµcµ                  -1
                             b. cvµ'cvµc     -1            -1
                1  'cvcvc    c. 'cvµcvµcµ           -1          -1
                             d. 'cvµcvµc     -1                 -1
2. cvcv            cv'cv     a. cvµ'cvµ                    -1
            1   1  'cvcv     b. 'cvµcvµ                         -1
3. cvcv:    1   1  cv'cv:    a. cvµ'cvµµ                   -1
                   'cvcv:    b. 'cvµcvµµ            -1          -1



Phonotactic learning without a priori constraints: a connectionist analysis of Arabic cooccurrence restrictions

Summary. In this paper, we develop a connectionist model of phonotactic learning and apply it to the problem of learning root cooccurrence restrictions in Arabic. Two types of connectionist networks are developed: a multilayer network with a hidden layer, and a single-layer network with recurrent connections. Both are shown to classify Arabic words and nonwords in ways that qualitatively parallel the behavior of experimental subjects in psycholinguistic studies of Arabic. In these networks, units and connections act like soft constraints in the computation of acceptability scores. Because these constraints are malleable and can change gradually over time, the networks learn phonotactic generalizations without requiring the prior existence of a list of possible phonotactic constraints, a fact that sets this model apart from many phonotactic learners.

Arabic cooccurrence restrictions. Our simulations attempt to capture the gradient dissimilatory patterns of Arabic consonantal triliteral roots. In particular, we seek to analyze the phonotactic generalization that roots containing more than one consonant from the same place class are avoided (McCarthy 1986, Pierrehumbert 1993). The simulations described below produce acceptability scores that rate roots in ways that can be compared with a wordlikeness study of Arabic (Frisch and Zawaydeh 2001), summarized in (1). A model is considered a good fit if (i) it captures all and only the significant effects on acceptability from these three experiments, and (ii) the percentage of variation explained, measured by r², is approximately the same.

Multilayer Network (MN). Our first network is a multilayer deterministic network that takes Arabic roots as input and outputs a single score, a value between -1 and 1. The input to the network is a distributed representation made up of a sequence of three segments, where each segment is a string of 17 values, each -1, 0 or 1, corresponding to the feature specifications assumed in Frisch et al. 2004. The network was trained using backpropagation with the goal of outputting 1 when an attested root was input and -1 when a randomly generated root (with the correct segment probabilities) was input. The mature network was then tested on the unattested roots that Frisch et al. used in their psycholinguistic study. Figure 1 below shows how the network rated representative sets of triliteral roots. For the network with 2 or 5 nodes in the hidden layer, the simulated acceptability scores showed a strong qualitative match with the wordlikeness ratings of Frisch and Zawaydeh's subjects. For both the network and the experimental subjects: (i) a large percentage of the acceptability scores were predicted by the presence or absence of two consonants within the same place class, (ii) the number of neighboring attested roots and the expected probability of the root (according to segment probabilities) were not significantly correlated with acceptability, and (iii) there was a significant distinction between systematic gaps and accidental gaps in consonant cooccurrence patterns.

Recurrent Network (RN). The second connectionist network was developed to show that the above results do not depend on the specific assumptions of the Multilayer Network. Our Recurrent Network is composed of a single layer similar to the input layer of the multilayer network. It is fed by an external input representation, and the output is the activation vector of the network at equilibrium. Instead of a hidden layer, generalizations are encoded in a set of recurrent connections between the units of different segments. That is, each node of segment X is connected to all the nodes encoding the two segments other than X. Training of the network involves the successive presentation of attested Arabic roots, during which the weights of the recurrent connections are adjusted using the Delta Rule, a standard connectionist implementation of gradient descent learning. A new measure of acceptability was calculated by measuring the distance between the external input and output vectors, essentially relating overall acceptability to the network's ability to produce an output similar to the input. The mature network produced acceptability scores for Frisch and Zawaydeh's unattested roots that again largely matched the experimental data, though it was even more sensitive to the homorganic consonant restriction than the experimental subjects.

References: Frisch, Stefan, and Zawaydeh, Bushra. 2001. The psychological reality of OCP-Place in Arabic. Language 77: 91-106.

John Alderete, Paul Tupper, and Stefan Frisch



McCarthy, John J. 1986. OCP Effects: Gemination and antigemination. Linguistic Inquiry 17: 207-263.
Pierrehumbert, Janet. 1993. Dissimilarity in the Arabic verbal roots. In NELS 23, 367-381.

(1) Experimental results of Frisch and Zawaydeh 2001

a. Experiment 1. Is the homorganic cooccurrence restriction (a.k.a. the OCP) psychologically real, and not just an effect of lexical statistics?
• independent variables: OCP violations, expected probability, neighborhood density
• results/conclusion: a significant effect of the OCP was found on wordlikeness ratings, with no other effects and no interactions; the OCP accounts for approximately 30% of subject variability

b. Experiment 2. Do subject ratings distinguish between systematic gaps (OCP violations) and accidental gaps (non-OCP-violating, rare consonant combinations)?
• controlled variables: expected probability and neighborhood density
• variable balanced in the stimuli set: bigram probability
• result/conclusion: the OCP had a significant effect on wordlikeness ratings, accounting for approximately 21% of subject variability; subjects thus distinguish between systematic and accidental gaps

c. Experiment 3. Do subject acceptability judgments exhibit different degrees of OCP violation that correlate with different degrees of featural similarity?
• variables balanced in the stimuli: expected probability, neighborhood density, and bigram probability
• independent variable: similarity of phonological features
• result/conclusion: similarity had a significant effect on wordlikeness ratings (approximately 20% of subject variability); the OCP is gradient

Fig. 1. Acceptability scores given by one trial of the feedforward network with 5 hidden nodes. Box plots indicating minimum, first quartile, median, third quartile, and maximum scores are shown for four groups of triliteral roots: (a) all attested roots, (b) all possible roots, (c) roots from Experiment 1 with no OCP violation, (d) roots from Experiment 1 with an OCP violation.
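The Recurrent Network's Delta Rule training and distance-based acceptability can be sketched in miniature; six ±1 units stand in for the 3 × 17 feature units, and all specifics are simplifications:

```python
def delta_rule_train(patterns, lr=0.05, epochs=200):
    """Learn recurrent weights so every unit predicts the others."""
    dim = len(patterns[0])
    w = [[0.0] * dim for _ in range(dim)]
    for _ in range(epochs):
        for x in patterns:
            for i in range(dim):
                y = sum(w[i][j] * x[j] for j in range(dim) if j != i)
                for j in range(dim):
                    if j != i:
                        w[i][j] += lr * (x[i] - y) * x[j]  # Delta Rule
    return w

def acceptability(w, x):
    """Negative input-output distance: higher means more wordlike."""
    out = [sum(w[i][j] * x[j] for j in range(len(x)) if j != i)
           for i in range(len(x))]
    return -sum((a - b) ** 2 for a, b in zip(x, out))

attested = [[1, -1, 1, -1, 1, -1], [-1, 1, -1, 1, -1, 1]]
w = delta_rule_train(attested)
# Attested patterns reconstruct almost perfectly; a pattern violating
# the learned regularity reconstructs poorly, i.e. scores lower.
print(acceptability(w, attested[0]) > acceptability(w, [1, 1, 1, -1, 1, -1]))
# → True
```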



The emergence of natural vowel patterns in a phonetic/phonological acquisition model

Explaining patterns in vowel inventories. Large variation exists between the vowel configurations of different languages, yet a number of common patterns can also be identified in them. For example, vowels tend to be located at the periphery of the available phonetic space. There is also a preference for symmetric systems, i.e. with the same number of back and front vowels (Schwartz, Boë, Vallée & Abry 1997). Increasingly sophisticated computer models (e.g. Liljencrants & Lindblom 1972; de Boer 2001) have shown that trends in vowel systems are largely explained through maximal dispersion of vowels in the available phonetic space, eliminating the need for a prior innate bias toward certain configurations.

However, the above models of vowel dispersion are not directly compatible with current phonological theory; an exception is the exemplar-based model of Wedel (2006). I propose an alternative Optimality Theoretic (OT) / Harmonic Grammar computational model of vowel systems, in which a population of simulated language learners (agents) is equipped with a bidirectional grammar of stochastically ranked constraints and an error-driven learning algorithm (Boersma 1997).

Emerging inventories in a cue constraint grammar. In this model, agents perceive and produce phonemes by mapping between two levels of representation: a phonetic 'articulatory-acoustic' form and a phonological 'surface form'. The phonetic forms are encoded as a pair of values representing the first formant F1 and the effective second formant F'2 (Bladon 1983). In perception, cue constraints (Boersma & Escudero 2008; Boersma 2009) formulated as “phonetic value [x] is not phoneme /y/” decide on the winning form. In production, these cue constraints are supplemented by articulatory constraints, formulated as “do not articulate a sound with phonetic value [x]”. The rankings of the cue constraints are language-specific and change through learning; the rankings of the articulatory constraints are fixed, as they encode anatomic constraints on the available vowel space. Fig. 1 shows an example production tableau.
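As a concrete illustration, evaluation in a stochastically ranked grammar of this kind can be sketched as follows. This is a hypothetical toy implementation, not the author's code; the constraint names and ranking values are made up, though the candidates and winner follow the tableau described in Fig. 1.

```python
import random

def evaluate(candidates, violations, rankings, noise=2.0):
    """Stochastic OT evaluation: add Gaussian noise to each constraint's
    ranking value, sort constraints by the noisy values, and compare
    candidates lexicographically on their violations in that order."""
    noisy = {c: r + random.gauss(0, noise) for c, r in rankings.items()}
    order = sorted(noisy, key=noisy.get, reverse=True)  # highest-ranked first

    def profile(cand):
        return [violations[cand].get(c, 0) for c in order]

    # The candidate with the fewest violations of the top-ranked constraints wins.
    return min(candidates, key=profile)

# Toy tableau for input /A/: cue constraints penalize phonetic values as
# realizations of /A/; an articulatory constraint penalizes an extreme F1.
rankings = {"*[F'2=9]/A/": 100.0, "*[F1=4]": 90.0, "*[F'2=10]/A/": 10.0}
cands = ["[F1=4,F'2=9]", "[F1=5,F'2=9]", "[F1=5,F'2=10]"]
viol = {
    "[F1=4,F'2=9]":  {"*[F'2=9]/A/": 1, "*[F1=4]": 1},
    "[F1=5,F'2=9]":  {"*[F'2=9]/A/": 1},
    "[F1=5,F'2=10]": {"*[F'2=10]/A/": 1},
}
winner = evaluate(cands, viol, rankings, noise=0.0)  # noise=0: deterministic
```

With zero evaluation noise the candidate violating only the lowest-ranked constraint wins; with nonzero noise, occasional ranking reversals produce the variation that drives learning.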

Agents in the simulation interact by producing and perceiving vowels through these grammars. When a perception error occurs, i.e. an agent perceives a different phoneme than was intended by the speaker, the listener shifts the constraint rankings slightly to reduce the probability of such perception errors in the future. In this manner, the agents acquire a stable shared language that has the vowel phonemes dispersed in F1/F'2 space, as was the case for 1-dimensional sibilant contrasts in Boersma & Hamann (2008). Fig. 2 shows an example of a five-vowel system that emerged from a simulation run: here the result resembles a configuration often found in natural language, the set {/i/, /e/, /a/, /o/, /u/}.
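The error-driven ranking shift can be sketched in the same toy terms (hypothetical constraint names and step size; the actual model uses Boersma's 1997 error-driven learning algorithm): constraints violated by the erroneously perceived winner are promoted, and constraints violated by the intended form are demoted.

```python
def update_rankings(rankings, intended_violations, perceived_violations, step=0.1):
    """After a perception error, promote constraints violated by the
    erroneous percept (so it loses next time) and demote constraints
    violated by the intended phoneme (so it wins next time)."""
    new = dict(rankings)
    for c, n in perceived_violations.items():
        if n > 0:
            new[c] += step
    for c, n in intended_violations.items():
        if n > 0:
            new[c] -= step
    return new

# Cue constraints mapping the phonetic value [F1=5] to two phonemes:
rankings = {"*[F1=5]/A/": 50.0, "*[F1=5]/B/": 50.0}
# The listener categorized [F1=5] as /B/, but the speaker intended /A/:
rankings = update_rankings(rankings,
                           intended_violations={"*[F1=5]/A/": 1},
                           perceived_violations={"*[F1=5]/B/": 1})
# Now "*[F1=5]/B/" outranks "*[F1=5]/A/", making the /A/ percept more likely.
```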

Testing processing strategies: the case of F1 symmetry. Having thus shown that a bidirectional cue constraint grammar is able to explain dispersion in F1/F'2 vowel space, we can now use the model as a tool to explore how phonetic information should be represented in a phonological grammar. The grammar as exemplified in Fig. 1 has separate cue constraints for F1 and F'2, implying that both play a distinct role in phonological perception. However, another possibility is that vowels are processed holistically, with cue constraints operating on unique F1/F'2 combinations. A third option is that weighted F1/F'2 constraints are processed, i.e. that Harmonic Grammar is a more appropriate model than the strict ranking of OT.
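The difference between strict ranking and weighted (Harmonic Grammar) evaluation can be made concrete with a toy harmony computation; the constraint names and weights below are invented for illustration only.

```python
def harmony(violations, weights):
    """Harmonic Grammar: a candidate's harmony is the negative weighted sum
    of its constraint violations; the highest-harmony candidate wins."""
    return -sum(weights[c] * n for c, n in violations.items())

weights = {"*[F1=4]/A/": 3.0, "*[F'2=9]/A/": 2.0}
cand_a = {"*[F1=4]/A/": 1}                    # one violation of the heavy constraint
cand_b = {"*[F1=4]/A/": 0, "*[F'2=9]/A/": 2}  # two violations of the lighter one
# Under strict OT ranking cand_b wins (it spares the top constraint), but
# under HG the two lighter violations "gang up" and cand_a wins instead.
assert harmony(cand_a, weights) > harmony(cand_b, weights)
```

Such cumulativity effects are exactly where the weighted and strictly ranked versions of the grammar can make different predictions about which inventories are stable.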

In some cases these different processing strategies give different predictions as to which vowel configurations are stable and learnable. Natural inventories show a tendency for front-back vowel pairs within an inventory to be symmetrical on the phonological feature of height, which translates to their having a nearly identical distribution on the F1 continuum (Lindau 1975). In this model, F1 symmetry is unmaintainable in the case of separately ranked cue constraints, or unstable in the case of harmonic evaluation. Holistic evaluation on the other hand is able to learn systems with symmetric F1 properties.

Conclusion. The model shows that phonetically-based explanations of vowel systems can be integrated into an existing OT model of perception and production. Moreover, a comparison of different processing strategies indicates that holistic rather than formant-based processing of vowels is better able to account for the phenomenon of vowel height symmetry. This shows that integrating phonological theory into these types of computational simulations can yield novel, linguistically relevant results.

Jan-Willem van Leussen

Page 64

[Tableau not reproduced: the candidate phonetic outputs [F1=4, F'2=9], [F1=4, F'2=10], [F1=5, F'2=9], and the winning [F1=5, F'2=10] are evaluated against cue constraints associating F1/F'2 values with the phonemes /A/ and /B/, and against articulatory constraints on those values.]

Figure 1: A simplified example production tableau (the phonemes are arbitrarily named A and B). In this particular evaluation, production maps from a phonemic input form /A/ to a phonetic output form [F1=5 Bark, F'2=10 Bark]. The first two candidates violate a cue constraint for the phoneme /A/, whereas the third candidate violates an articulatory constraint.

Figure 2: production plots showing the transition from a chaotic, confusing five-vowel language to a stable, realistic situation (left: one agent's mean productions after 10,000 iterations of the algorithm; right: another agent's productions after 100,000 iterations). Shading indicates articulatory ease as encoded in the articulatory constraints; ellipses encode one and two standard deviations from the mean.

Bibliography

Bladon, A. (1983). Two-formant models of vowel perception: Shortcomings and enhancements. Speech Communication 2. 305-313.

de Boer, B. (2001). The Origins of Vowel Systems. Oxford Linguistics. Oxford University Press.

Boersma, P. (1997). How we learn variation, optionality, and probability. Proceedings of the Institute of Phonetic Sciences of the University of Amsterdam 21. 43-58.

Boersma, P. (2009). Cue constraints and their interactions in phonological perception and production. In P. Boersma & S. Hamann (eds.), Phonology in perception. Berlin: Mouton de Gruyter. 55-110.

Boersma, P. & P. Escudero (2008). Learning to perceive a smaller L2 vowel inventory: an Optimality Theory account. In P. Avery, E. Dresher & K. Rice (eds.), Contrast in phonology: theory, perception, acquisition. Berlin & New York: Mouton de Gruyter. 271-301.

Boersma, P. & S. Hamann (2008). The evolution of auditory dispersion in bidirectional constraint grammars. Phonology 25. 217-270.

Liljencrants, J. & B. Lindblom (1972). Numerical simulation of vowel quality systems: the role of perceptual contrast. Language 48. 839-862.

Lindau, M. (1975). [Features] for vowels. UCLA Working Papers in Phonetics 30.

Schwartz, J., L. Boë, N. Vallée & C. Abry (1997). Major trends in vowel system inventories. Journal of Phonetics 25. 233-253.

Wedel, A. (2006). Exemplar models, evolution and language change. The Linguistic Review 23. 247-274.

Jan-Willem van Leussen

Page 65

Functional load and feedback stabilization of phonemic category distinctions

I am interested in processes influencing maintenance and loss of contrast between categories that are primarily behaviorally defined. Many categorial distinctions are supported by perceptually stable facts-about-the-world, as in the differences between the categories of water and air. At the other extreme, however, we find categories such as phonemes that seem to function behaviorally through the very fact of their difference. For example, there is nothing particularly natural about a given boundary between adjacent vowels, nor any significant perceptual discontinuity that would by itself support multiple categories across the vowel space. Why doesn't random noise in acquisition and usage rapidly erode these categorial distinctions? Instead, even though the phonetic properties that map to a particular category can shift over time, the number of categorial distinctions in a language often remains quite stable through change, as in the case of chain shifts (Hock and Joseph 1996, Kirchner 1996).

A long-standing intuition concerning 'functional load' holds that the greater the contribution a particular phonemic category makes to overall lexical contrast, the less likely it is to be lost over cycles of acquisition and usage (Martinet 1955, Hockett 1955). However, it has been notoriously difficult to find satisfactory tests of this hypothesis within natural language data (King 1967, Surendran & Niyogi 2006). Working instead with computational simulations of toy models, I have shown that predictions of the functional load hypothesis can be successfully modeled if we assume rich lexical memory. The abstract properties required by this rich-memory model of the lexicon are:

1. Storage of some degree of non-contrastive detail of experienced tokens (reviewed in Johnson 1997) at multiple levels of analysis (Bybee 2002), here modeled through an exemplar-based computational architecture (Pierrehumbert 2001);

2. Feedback between perception and production behavior (e.g., Goldinger 2000, Oudeyer 2002);

3. A bias in perception toward local category centers (Kuhl 1991, Goldstone et al. 2001).
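The three properties above can be illustrated with a toy exemplar loop. This is a minimal sketch with made-up parameters, not the simulations reported here: tokens are produced from stored exemplars with noise, perceived with a magnet-style bias toward the local category mean, and stored back as new exemplars.

```python
import random

random.seed(0)
# Exemplar storage (property 1): each category is a list of stored values
# on a single abstract phonetic dimension.
categories = {"A": [1.0, 1.2, 0.9], "B": [3.0, 3.1, 2.9]}

def produce(cat):
    # Production samples a stored exemplar and adds articulatory noise.
    return random.choice(categories[cat]) + random.gauss(0, 0.2)

def perceive(token, cat, magnet=0.3):
    # Perceptual bias toward the local category center (property 3):
    # shift the incoming token toward the category mean, then store it,
    # feeding perception back into future production (property 2).
    mean = sum(categories[cat]) / len(categories[cat])
    categories[cat].append(token + magnet * (mean - token))

for _ in range(1000):  # cycles of production/perception usage
    cat = random.choice("AB")
    perceive(produce(cat), cat)

mean_a = sum(categories["A"]) / len(categories["A"])
mean_b = sum(categories["B"]) / len(categories["B"])
# The center bias counteracts noise: the two categories remain distinct
# even though nothing but their difference keeps them apart.
assert mean_a < mean_b
```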

In previous work (Wedel 2004, 2006) I have shown that within this general architecture, sounds that are contrastive in a subset of words tend to persist in their canonical form even in words in which they carry little contrast. Sounds that are not contrastive, or contrastive in very few forms, tend to be neutralized over time. However, these toy systems involved only a few tens of words. Here I will show that the results of the toy simulations are maintained when the simulation is expanded to a considerably larger lexicon.

Because predictions arising from the simulations are difficult to test in natural language, I have also begun testing an in vivo analogue of the computational model in the spirit of Kirby et al. 2008. This novel experimental paradigm in essence integrates subject category acquisition and subsequent usage as components of a larger computational 'simulation' in order to more directly test hypotheses about the ways learned categorization bias may influence category evolution over time. I will present experimental results within this paradigm that are consistent with a role for acquired perceptual bias (property 3 above) in the maintenance of categorial distinctions over time. Finally, I have recently begun testing the larger functional load hypothesis within this experimental paradigm and will provide a progress report on this front.

Andrew Wedel

Page 66

References

Bybee, J. L. (2002). Word frequency and context of use in the lexical diffusion of phonetically conditioned sound change. Language Variation and Change 14, 261-290.

Goldinger, S. D. (2000). The role of perceptual episodes in lexical processing. In A. Cutler, J. M. McQueen and R. Zondervan (eds.), Proceedings of SWAP Spoken Word Access Processes (pp. 155-159). Nijmegen: Max-Planck-Institute for Psycholinguistics.

Goldstone, R., Lippa, Y., & Shiffrin, R. (2001). Altering object representations through category learning. Cognition 78, 27-43.

Hock, H. & Joseph, B. (1996). Language History, Language Change and Language Relationship. New York: Mouton de Gruyter.

Hockett, C. F. (1955). A manual of phonology. Baltimore: Waverly Press.

Johnson, K. (1997). Speech perception without speaker normalization. In K. Johnson and J. W. Mullennix (eds.), Talker Variability in Speech Processing. San Diego: Academic Press.

King, R. (1967). Functional Load and Sound Change. Language 43, 831-852.

Kirby, S., Cornish, H., and Smith, K. (2008). Cumulative Cultural Evolution in the Laboratory: an experimental approach to the origins of structure in human language. Proceedings of the National Academy of Sciences 105(31), 10681-10686.

Kirchner, R. (1996). Synchronic Chain Shifts in Optimality Theory. Linguistic Inquiry 27, 341-350.

Kuhl, P. K. (1991). Human adults and human infants show a perceptual magnet effect for prototypes of speech categories, monkeys do not. Perception and Psychophysics 50, 93-107.

Martinet, A. (1955). Economie des changements phonetiques. Bern: Francke.

Oudeyer, P-Y. (2002). Phonemic coding might be a result of sensory-motor coupling dynamics. In B. Hallam, D. Floreano, J. Hallam, G. Hayes and J. Meyer (eds.), Proceedings of the 7th International Conference on the Simulation of Adaptive Behavior (pp. 406-416). Cambridge: MIT Press.

Pierrehumbert, J. (2001). Exemplar dynamics, word frequency, lenition, and contrast. In J. Bybee & P. Hopper (eds.), Frequency effects and the emergence of linguistic structure (pp. 137-157). Amsterdam: John Benjamins.

Surendran, D. and Niyogi, P. (2003). Measuring the usefulness (functional load) of phonological contrasts. Technical Report TR-2003-12, Department of Computer Science, University of Chicago.

Wedel, A. (2004). Category competition drives contrast maintenance within an exemplar-based production-perception loop. In Proceedings of the seventh meeting of the ACL special interest group in computational phonology (pp. 1-10). Association for Computational Linguistics.

Wedel, A. (2006). Exemplar models, evolution and language change. The Linguistic Review 23, 247-74.

Andrew Wedel

Page 67

The implications of analyzing channel bias rationally

What factors shape the synchronic typology of sound patterns, and how should these factors be assessed? One commonly recognized factor is known as channel bias (Moreton, 2008). That is, certain sound patterns might be intrinsically more frequent and salient than other patterns due to the relative robustness of their phonetic precursors. In this talk, I will address two closely related issues concerning the nature of channel bias: the mechanism of phonologization and the evaluation of phonetic precursor robustness.

It is often assumed that sound change takes place when the listener mistakes the effects of the speaker's production system, of ambient effects on the acoustic stream, and of her own perceptual system for properties of the speaker's internal representations. Such an account hinges on the assumption that errors in perception lead to adjustment in perceptual and production norms. The mechanism through which this adjustment takes place (i.e. phonologization) is not only under-articulated; it is further challenged by mounting evidence that listeners are very adept at compensating for contextual variation in speech perception and production.

In this talk, I explore the idea that the likelihood of a new variant becoming phonologized is determined by the robustness of the listener's compensatory response. Perceptual compensation (PC) is modeled in terms of a rational analysis of speech perception and production. Using Bayesian inference, perceptual compensatory responses are explained as the consequence of the listener trying to reconcile evidence with prior beliefs or assumptions. I propose that PC and phonologization are fundamentally one and the same problem: both can be recast as a matter of understanding shifts in speakers' optimization responses.
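The Bayesian view of perceptual compensation can be sketched as follows; the category parameters and contextual shift below are hypothetical numbers for illustration, not values from the model.

```python
from math import exp, pi, sqrt

def gauss(x, mu, sigma):
    # Normal density used as the likelihood of an acoustic value.
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def posterior(x, categories, shift):
    """P(category | acoustic value x). The listener's prior belief that
    context shifts each category's realization by `shift` is built into
    the likelihood, yielding a compensatory response via Bayes' rule."""
    scores = {c: prior * gauss(x, mu + shift, sigma)
              for c, (mu, sigma, prior) in categories.items()}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

# Two vowel-height categories on an F1-like scale: (mean, sd, prior).
cats = {"high": (300.0, 40.0, 0.5), "mid": (450.0, 40.0, 0.5)}
token = 390.0
p_raw = posterior(token, cats, shift=0.0)    # no compensation: favors "mid"
p_comp = posterior(token, cats, shift=60.0)  # context raises F1 by 60 units:
                                             # the same token is now "high"
```

The robustness of this compensatory shift is exactly the quantity the SPROB measure below tracks: the more uncertainty a context introduces, the weaker the compensation and the more likely the variant is to be phonologized.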

Using this same rational model of speech perception, I propose a method, called Slope of Precursor Robustness (SPROB), which estimates phonetic precursor robustness by the degree of uncertainty a given context introduces into the identification of an intended sound category. The utility of SPROB is illustrated through a case study of the phonetic precursors to vowel-to-vowel height dependencies (HH) and dependencies between vowel height and consonantal voicing (HV).

Alan Yu

Page 68

Linear separability and feature selection in the acquisition of tones

This paper is about the learnability of phonological concept classes in terms of properties dependent on the hypothesis spaces where the concept classes are defined: (1) linear separability, and (2) the features that define the dimensions of the spaces. Considering such properties is a prerequisite to deciding what learners are appropriate for phonological concepts. We focus on tonal spaces and tone categories in tone languages as the concept classes to be learned, based on modeling of acoustic data we recorded of tones in isolated and connected speech in Bole, Mandarin, Cantonese, and Igbo. Here we imagine a tonal space as a high-dimensional acoustic space that gets tessellated into regions corresponding to each tone category in a tone language. The regions are defined by exemplars, which are vectors in the tonal space, labelled with the appropriate tone category, cf. [12, 12–13].

A fundamental geometric characterization of the distribution of concept classes in the space they live in is whether the classes are linearly separable. Two sets are linearly separable if we can separate them with hyperplanes—we can do this as long as the convex hulls of the sets, the smallest convex sets that contain the set of examples in each set, do not overlap [3]. Are phonological concept classes (in particular, tones) linearly separable? If they are, then extremely constrained learners—any learners that learn a linear discriminant function, such as the perceptron, linear discriminant analysis (LDA), or the most basic support vector machine (SVM)—are sufficient for learning phonological concept classes. Indeed, in the phonetic literature, it is the standard assumption that linear discriminant functions, i.e. cue weighting models, are appropriate for categorization in acoustic spaces, e.g. [11, 8, 5], even if the data are linearly inseparable, as in LDA modeling of Hmong tones in [1]. However, acoustic spaces as described in the literature are not typically linearly separable. The vowel categories in [10] (Fig. 1a) are typical—and nonlinearly separable—just drawn in a nonstandard manner. If the elliptical concept classes typically drawn around vowels in formant plots were imposed, then they would overlap; the convex hulls also overlap and thus the vowels are nonlinearly separable.
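The separability question can be illustrated with a toy check using made-up 2-D points rather than our tone data: a perceptron finds a separating hyperplane only when the two classes are linearly separable, i.e. only when their convex hulls do not overlap, so bounding its number of training epochs gives a crude separability test.

```python
def perceptron_separable(pts_pos, pts_neg, epochs=1000):
    """Return True if a perceptron finds a line w0*x + w1*y + b = 0 that
    strictly separates the two point sets within the epoch budget.
    By the perceptron convergence theorem this happens iff the classes
    are linearly separable (for a generous enough budget)."""
    data = [(x, y, 1) for x, y in pts_pos] + [(x, y, -1) for x, y in pts_neg]
    w0 = w1 = b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y, label in data:
            if label * (w0 * x + w1 * y + b) <= 0:  # misclassified point
                w0 += label * x
                w1 += label * y
                b += label
                errors += 1
        if errors == 0:
            return True  # a separating hyperplane was found
    return False         # hulls (very likely) overlap

# Two well-separated toy "tone" clusters: convex hulls are disjoint.
sep_a = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
sep_b = [(3.0, 3.0), (4.0, 3.0), (3.5, 4.0)]
# Overlapping clusters: (0.5, 0.3) lies inside the hull of ovl_a.
ovl_a = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
ovl_b = [(0.5, 0.3), (1.5, 0.3), (1.0, 1.3)]
```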

If phonological concepts as defined are not linearly separable, then we can retain linearly constrained learners, but reconcile overlaps in concept classes, e.g. by introducing soft margins in SVMs. We can also consider learners that are not linearly constrained [6], or equivalently, transform the spaces in a manner that allows us to retain linearly constrained learners, e.g. with kernel methods.

However, the question of linear separability of concepts is crucially dependent on how the space where the concepts lie is defined: what set of features define the space. In formant plots, the feature set is F1 and F2, and vowel concepts are typically linearly inseparable. But the addition of other features, such as vowel inherent spectral change, has been shown to produce linear separability in English vowels not linearly separable with F1 and F2 alone, cf. [13]. Feature selection in exemplar theory is implicit in the similarity metric in the classification function. Feature selection can also be driven explicitly to maximally separate the concept classes, which is what we do here. Our goal is to define the best feature sets, in this sense, for tonal spaces across different languages.

In characterizing tonal spaces, the standard feature set includes some measure of f0 height and some measure of f0 change ([4], i.a.). But saying these are in the feature set is too vague, since speech signals are dynamic. One approach to address this is to set up very high-dimensional spaces, as is done in [7] and in general in state space models like HMMs; another, which we take here, is to consider lower-dimensional spaces in which we consider features defined over time windows and as relations between properties at different time points. As an illustration, in Figs. 1b and 1c, we define Cantonese tones in CVs from connected speech by a male speaker in a space set up by f0 height at different time slices in the vowel. Note how setting up a space with f0 at vowel onset/offset [2] results in better separation of the tone classes than f0 at vowel onset and midpoint, but that the tone classes are nevertheless linearly inseparable. Similarly, we find that using f0 at onset/offset does not result in linear separability for Bole or Cantonese tones in isolation or in connected speech, within speakers or across, nor does including a measure of f0 change. We find that even Mandarin tones in isolation cannot be linearly separated by f0 slope and height, though very little overlap occurs, as in [9]. These findings suggest that, to retain linear models with these small standard feature sets, we must allow for soft margins, or find evidence of poor classification performance in humans matching that of linear models, and that we should also consider other features than those described here.

Kristine Yu

Page 69

Figure 1: Linearly inseparable acoustic spaces are the norm, e.g. in vowels defined in F1-F2 space (a), and Cantonese tones defined by f0 height at vowel onset and midpoint (b), or vowel onset and offset (c). The plots show convex hulls and marginal distributions. Note increased separability in (c) vs. (b). [Scatter plots not reproduced: (a) vowel space from [10]; (b) Cantonese tones, onset/midpoint f0; (c) Cantonese tones, onset/offset f0; axes in Hz, tone categories T1–T6.]

References

[1] J. E. Andruski and J. Costello. Using polynomial equations to model pitch contour shape in lexical tones: An example from Green Mong. Journal of the International Phonetic Association, 34(02):125–140, 2004.

[2] J. G. Barry and P. J. Blamey. The acoustic analysis of tone differentiation as a means for assessing tone production in speakers of Cantonese. The Journal of the Acoustical Society of America, 116(3):1739–1748, 2004.

[3] K. P. Bennett and E. J. Bredensteiner. Duality and geometry in SVM classifiers. In Proceedings of the Seventeenth International Conference on Machine Learning, pages 57–64. Morgan Kaufmann Publishers Inc., 2000.

[4] J. T. Gandour and R. A. Harshman. Crosslanguage differences in tone perception: a multidimensional scaling investigation. Language and Speech, 21(1):1–33, 1978.

[5] L. L. Holt and A. J. Lotto. Cue weighting in auditory categorization: Implications for first and second language acquisition. The Journal of the Acoustical Society of America, 119(5):3059–3071, May 2006.

[6] K. Johnson. The auditory/perceptual basis for speech segmentation. OSU Working Papers in Linguistics, 50:101–113, 1997.

[7] R. Kirchner and R. K. Moore. Computing phonological generalization over real speech exemplars. ROA 1007-1208, 2008.

[8] K. R. Kluender and A. J. Lotto. Virtues and perils of an empiricist approach to speech perception. The Journal of the Acoustical Society of America, 105(1):503–511, 1999.

[9] G.-A. Levow. Unsupervised and semi-supervised learning of tone and pitch accent. In Proceedings of the Human Language Technology Conference of the North American Chapter of the ACL, pages 224–231, 2006.

[10] J. D. Miller. Auditory-perceptual interpretation of the vowel. The Journal of the Acoustical Society of America, 85(5):2114–2134, May 1989.

[11] T. M. Nearey. Speech perception as pattern recognition. The Journal of the Acoustical Society of America, 101(6):3241–3254, June 1997.

[12] J. B. Pierrehumbert. Word-specific phonetics. In Laboratory Phonology VII, pages 101–139. Mouton de Gruyter, 2002.

[13] J. B. Pierrehumbert. Phonetic diversity, statistical learning, and acquisition of phonology. Language and Speech, 46(2-3):115–154, June 2003.

Kristine Yu