Posted on 21-Dec-2015

Two models of accentuation that the brain might use

John Goldsmith

University of Chicago

What’s the point?

Take a step back from linguistic analysis, and ask: what is the simplest way to perform the computations that are central and important for the data of metrical systems?

What kind of […neural…] hardware would be good at performing that kind of computation?

An [implicit] assumption...

Linguistics is an autonomous profession, with its own autonomous methods…

But it does not address an autonomous reality, nor does it possess its own autonomous truth.

It shares truth with all other disciplines.

And...

…there is no certainty that linguistic methods will get us as close to the Truth as we wish.

No guarantee that linguistic methods will get us arbitrarily close to the truth.

Put another way, it may be that pouring more data (of a certain sort) into contemporary linguistic methods gives us a theory that is overtrained on its data.

The only escape from that is to cooperate with other research traditions.

Also implicit...

Linguists and non-linguists (psychologists, neurocomputational modelers) must each take a step toward the other to find a common middle ground.

This means...

Non-linguists...

must realize that English is an outlier among languages...

[Diagram: English shown as a small outlier within the space of languages.]

Linguists...

must acknowledge that much of their theory-building is motivated by their everyday convenience. (For example, they strongly prefer models whose computation requires paper or a blackboard at least, but also at most.)

Two models in neurocomputing:

1. In space: lateral inhibition (work done jointly with Gary Larson): discrete unit modeling.

2. In time: neural oscillation.

Dynamic computational nets

- Brief demonstration of the program
- Some background on (some aspects of) metrical theory
- This network model as a minimal computational model of the solution we're looking for
- Its computation of familiar cases
- Interesting properties of this network: inversion and learnability
- Link to neural circuitry


Let’s look at the program --

Dynamic computational nets

- Brief demonstration of the program
- Some background on (some aspects of) metrical theory
- This network model as a minimal computational model of the solution we're looking for
- Its computation of familiar cases
- Interesting properties of this network: inversion and learnability
- Link to neural circuitry

Metrical phonology: work during 1975-1985

- Mark Liberman; Liberman and Prince
- Morris Halle and J.-R. Vergnaud
- Alan Prince
- Bruce Hayes, especially Metrical Stress Theory (1995)

Patterns of alternating stress: the simplest cases

“Create trochaic feet, from left to right”

Feet are assigned pairwise, Strong-Weak, proceeding to the right edge of the word:

[ x x x x x x x . . .
  S W S W S W

. . . x x x x ]
      S W S W

Patterns of alternating stress: the other way...

“Create trochaic feet, from right to left”

x x x x x x x x ]
S W S W S W S W

[ x x x x x x x x x x x ]
    S W S W S W S W S W

(With an odd syllable count, the syllable at the left edge is left unfooted.)
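Both directional parses amount to one pass over the syllable string. A minimal Python sketch (not from the slides; the function name and representation are mine):

```python
def trochaic_feet(n, direction="LR"):
    """Assign trochaic (Strong-Weak) feet over n syllables,
    pairing from the left ("LR") or from the right ("RL").
    A leftover odd syllable is left unfooted ('.')."""
    labels = ["."] * n
    for k in range(0, n - 1, 2):
        if direction == "LR":
            s, w = k, k + 1              # foot starts at the left edge
        else:
            s, w = n - 2 - k, n - 1 - k  # foot starts at the right edge
        labels[s], labels[w] = "S", "W"
    return " ".join(labels)

print(trochaic_feet(7, "LR"))  # S W S W S W .
print(trochaic_feet(7, "RL"))  # . S W S W S W
```

With an even syllable count the two directions coincide; they diverge only on the leftover syllable, exactly as in the grids above.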

This is all very convenient, but...

Should we be thinking about constructing structure? Or computing a result?

What’s the minimal way to compute the right result?

Dynamic computational nets

- Brief demonstration of the program
- Some background on (some aspects of) metrical theory
- This network model as a minimal computational model of the solution we're looking for
- Its computation of familiar cases
- Interesting properties of this network: inversion and learnability
- Link to neural circuitry

[Program demo: initial activation vs. final activation of the network.]

Beta = -0.9: rightward spread of activation

Alpha = -0.9: leftward spread of activation

Dynamic computational nets

- Brief demonstration of the program
- Some background on (some aspects of) metrical theory
- This network model as a minimal computational model of the solution we're looking for
- Its computation of familiar cases
- Interesting properties of this network: inversion and learnability
- Link to neural circuitry

Examples (Hayes)

Pintupi (Hansen and Hansen 1969, 1978; Australia): “syllabic trochees”: stress on odd-numbered syllables (assigned rightward), with an extrametrical ultima:

S s

S s s

S s S s

S s S s s

S s S s S s

S s S s S s s
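The Pintupi pattern above comes down to one parity test plus extrametricality. A minimal sketch (main vs. secondary stress is not distinguished here):

```python
def pintupi(n):
    """Pintupi stress sketch: syllabic trochees assigned rightward,
    final syllable extrametrical.  'S' = stressed, 's' = unstressed."""
    out = []
    for i in range(1, n + 1):                # 1-based syllable positions
        stressed = (i % 2 == 1) and (i < n)  # odd-numbered, not the ultima
        out.append("S" if stressed else "s")
    return " ".join(out)

for n in range(2, 8):
    print(pintupi(n))   # reproduces the six rows above
```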

Weri (Boxwell and Boxwell 1966, Hayes 1980, HV 1987): stress the ultima, plus all odd-numbered syllables, counting from the end of the word.

Warao (Osborn 1966, HV 1987): stress the penult syllable, plus all even-numbered syllables, counting from the end of the word. (Mark the last syllable as extrametrical, and run.)

Maranungku (Tryon 1970): stress the first syllable, and all odd-numbered syllables from the beginning of the word.

Garawa (Furby 1974) (or Indonesian, …): stress on the initial syllable; stress on the penult; stress on all even-numbered syllables, counting leftward from the end; but an “initial dactyl effect”: no stress permitted on the second syllable.
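Weri, Warao, and Maranungku differ only in where the parity count starts and which parity is stressed; a minimal parameterized sketch (function name and encoding mine; Garawa's initial-dactyl interaction is not covered):

```python
def stresses(n, from_end=False, odd=True):
    """Mark syllables stressed by parity counting: count 1, 2, 3, ...
    from the word's start (or end) and stress the odd- (or even-)
    numbered syllables.  'S' = stressed, 's' = unstressed."""
    marks = []
    for i in range(1, n + 1):
        k = (n - i + 1) if from_end else i   # position in counting direction
        marks.append("S" if (k % 2 == 1) == odd else "s")
    return " ".join(marks)

# Weri: odd-numbered syllables counting from the end (includes the ultima)
print(stresses(5, from_end=True, odd=True))    # S s S s S
# Warao: even-numbered syllables counting from the end (includes the penult)
print(stresses(5, from_end=True, odd=False))   # s S s S s
# Maranungku: odd-numbered syllables counting from the start
print(stresses(5, from_end=False, odd=True))   # S s S s S
```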

Seminole/Creek (Muskogean): high tone falls on the final (ultima) or penult syllable, depending on a parity-counting procedure that starts at the beginning of the word (“parity-counting” means counting modulo 2: 1, 0, 1, 0, like counting daisy petals). High tone falls on a count of “0”. But a heavy syllable always gets count “0”.

In words with only light syllables (CV):

1 0 1 0 1 0 1   (the ultima counts 1, so high tone falls on the penult)

1 0 1 0 1 0 1 0   (the ultima counts 0, so high tone falls on the ultima)
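The parity-counting procedure can be sketched directly from the description above (syllable weights 'H'/'L' are my representation, and the heavy-syllable reset is my reading of the slide):

```python
def creek_counts(weights):
    """Seminole/Creek parity-counting sketch.  weights: 'L' (light CV)
    or 'H' (heavy) per syllable.  Counting runs 1, 0, 1, 0, ... from the
    start of the word, but a heavy syllable always gets count 0."""
    counts, c = [], 1
    for w in weights:
        if w == "H":
            c = 0                  # heavy syllables always count 0
        counts.append(c)
        c = 1 - c                  # alternate 1, 0, 1, 0, ...
    # high tone: the ultima if it counts 0, otherwise the penult
    tone = len(weights) - 1 if counts[-1] == 0 else len(weights) - 2
    return counts, tone

counts, tone = creek_counts("LLLLLLL")   # seven light syllables
print(counts)   # [1, 0, 1, 0, 1, 0, 1]
print(tone)     # 5 (the penult, 0-indexed)
```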

Harmonic conditioning: improves well-formedness

Dynamic computational nets

- Brief demonstration of the program
- Some background on (some aspects of) metrical theory
- This network model as a minimal computational model of the solution we're looking for
- Its computation of familiar cases
- Interesting properties of this network: inversion and learnability
- Link to neural circuitry

Each unit is updated from its two neighbors:

$$x_i^{t+1} = P(x_i^t) = \alpha\,x_{i+1}^t + \beta\,x_{i-1}^t + \mathrm{Bias}(i)$$

The connection matrix M is tridiagonal with zero diagonal (shown here for four units; $\alpha$ links each unit to its right neighbor, $\beta$ to its left):

$$M = \begin{pmatrix} 0 & \alpha & 0 & 0 \\ \beta & 0 & \alpha & 0 \\ 0 & \beta & 0 & \alpha \\ 0 & 0 & \beta & 0 \end{pmatrix}$$

Input (underlying representation) is a vector U.

Dynamics:

$$S_{i+1} = U + M S_i \qquad (1)$$

Output is S*: the equilibrium state of (1), which by definition satisfies:

$$S^* = U + M S^*$$

Hence:

$$U = (I - M)\,S^*$$

Quite a surprise!

Inversion, again -- note the near-eigenvector property

Dynamics:

$$S_{i+1} = U + M S_i$$

Output is S*: the equilibrium state, which by definition satisfies:

$$S^* = U + M S^*$$

Hence:

$$U = (I - M)\,S^*$$

Iterating from the input: $S_0 = U$; $S_1 = U + M S_0$; $S_2 = U + M S_1$; ...; $S^* = U + M S_n$ at equilibrium.

(I is the identity matrix.)

Fast recoverability of underlying form

This means that if you take the output S* of a network of this sort, make the output undergo the network effect once -- that's M S* [M is a matrix, S a vector] -- and subtract that from S* -- that's (I - M) S* -- you reconstruct what that network's input state was. (This would be a highly desirable property if we had designed it in!)

$$U = (I - M)\,S^*$$

Learnability

Dynamic computational nets

- Brief demonstration of the program
- Some background on (some aspects of) metrical theory
- This network model as a minimal computational model of the solution we're looking for
- Its computation of familiar cases
- Interesting properties of this network: inversion and learnability
- Link to neural circuitry

Neural circuitry

The challenge of language:

For the hearer: he must perceive the (intended) objects in the sensory input despite the extremely impoverished evidence of them in the signal -- a task like (but perhaps harder than) visual pattern identification.

For the speaker: she must produce and utter a signal which contains enough information to permit the hearer to perceive it as a sequence of linguistic objects.

Visual context:

Mach bands

Lateral inhibition

In a 1- or 2-dimensional array of neurons, neurons:
a. excite very close neighbors;
b. inhibit neighbors in a wider neighborhood;
c. do not affect cells further away.

[Figure: activation at a point produces central excitation surrounded by a region of inhibition.]

A brief run-through on lateral inhibition...

- Hartline and Ratliff 1957, in the horseshoe crab (Limulus).
- Lateral inhibition leads to contrast enhancement and edge detection, under a wide range of parameter settings.
- Early models used non-recurrent connections; later models preferred recurrent patterns of activation...
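Contrast enhancement from a center-surround arrangement can be seen in a few lines. This sketch uses a non-recurrent kernel (narrow excitation minus a broad inhibitory average) on a 1-D luminance step; the widths and weights are illustrative, not physiological values:

```python
import numpy as np

signal = np.array([0.0] * 10 + [1.0] * 10)   # a step edge

center   = np.array([0.25, 0.5, 0.25])       # narrow excitation
surround = np.ones(7) / 7.0                  # broad inhibition
excite  = np.convolve(signal, center,   mode="same")
inhibit = np.convolve(signal, surround, mode="same")
response = excite - 0.6 * inhibit

print(np.round(response, 2))
# undershoot just before the edge, overshoot just after: Mach bands
print(response.min() < 0.0, response.max() > 0.4)   # True True
```

The flat regions settle to a uniform level, while the cells flanking the edge over- and undershoot it, which is the Mach-band effect the slides point to.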

Recurrent lateral inhibition

Recurrent models include loops of activation which retain traces of the input over longer micro-periods.

(Wilson and Cowan 1972; Grossberg 1973, Amari)

Recurrent inhibitory loops also lead to circuits that perform (temporal) frequency detection.

Recurrent lateral inhibition

…also leads to winner-take-all computations, when the weight of the lateral inhibition is great.

Most importantly for us, as noted by Wilson and Cowan 1973, lateral inhibition circuits respond characteristically to spatial frequencies.

Evolution of thinking about the visual cell's receptive field: from a simple characteristic field (Hubel & Wiesel) to a spatial frequency detector (J. P. Jones and L. A. Palmer. 1987. An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. J. Neurophysiol. 58(6):1233-1258).

From U. Bochum: http://www.neuroinformatik.ruhr-uni-bochum.de/ini/VDM/noframes/research/computerVision/imageProcessing/wavelets/gabor/gaborFilter.html

A Gabor function is the product of a sine wave and a Gaussian distribution (equivalently, a convolution of their transforms in the frequency domain): in short, Gabor cell systems implement a local spatial frequency detector.
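A 1-D Gabor kernel is easy to construct; this sketch uses illustrative parameters (freq in cycles per sample) and checks the frequency-detector behavior: a strong response to a sinusoid at the preferred frequency, and almost none to a constant signal:

```python
import numpy as np

def gabor(x, freq=0.5, sigma=2.0, phase=0.0):
    """1-D Gabor: a sinusoid windowed by a Gaussian envelope."""
    gaussian = np.exp(-x**2 / (2 * sigma**2))
    return gaussian * np.cos(2 * np.pi * freq * x + phase)

x = np.arange(-8, 9)
kernel = gabor(x)

flat = np.ones(17)                            # constant luminance
wave = np.cos(2 * np.pi * 0.5 * np.arange(17))  # matching sinusoid
# response ~ 0 to the flat signal, large to the matching frequency
print(abs(np.dot(kernel, flat)), abs(np.dot(kernel, wave)))
```

The Gaussian envelope is what makes the detector *local*: the same inner product taken at different offsets measures frequency content in a small window, which is the point of the receptive-field literature cited above.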

Spatial frequencies

A spatial sine wave...

A spatial square wave...

Observe how a recurrent (feedback) competitive network of lateral inhibition gives rise to a pattern of spatial waves.

Initially, lateral inhibition gives rise to edge detection and classic Mach band phenomena.

end

Hayes's generalizations

- Culminativity: each word or phrase has a single strongest syllable bearing the main stress. (True if that syllable is used to map a tone melody (ETL).)
- Rhythmic distribution: syllables bearing equal levels of stress tend to occur spaced at equal distances.

Stress hierarchies (Liberman/Prince): several levels of stress

Lack of assimilation as a natural process

Metrical grid

[Grid figure: columns of x grid-marks of varying heights over the syllables.]

The height of the grid marks rhythmic prominence. Each level may represent a possible rhythmic analysis (“layer”).

Goldsmith-Larson (dynamic computational) model

Model syllables as units with an activation level; the strength of the activation level roughly corresponds to the height of the column on the metrical grid.

Some generalizations about prosodic systems of the world

Very crude distinction between tone and non-tone languages.

It’s easier to say what a tone language is; not clear that non-tone languages form a homogeneous group.

They have accent/stress...

Light editing of Hayes’ typology of accentual systems...

“Free versus fixed stress”: when is it predictable which syllable is accented?

When it is predictable, what kinds of computation are necessary to make the prediction?

Word-based generalizations (i.e., not sensitive to word-internal morphological structure):

Rhythmic versus non-rhythmic systems

In rhythmic systems, there are upper limits on how many consecutive unstressed syllables there may be. The usual limit is no more than 1. And the usual limit is no more than 1 consecutive stressed syllable.

Hayes's typologies

- Free vs. fixed stress (predictable or not by rule)
- Rhythmic versus morphological stress (morphological: boundary-induced versus use of morphological information to resolve competition)
- Bounded versus unbounded stress (length of span of unstressed syllables)

Is the height of a metrical column a value of a variable? If so, this would explain the Continuous Column Constraint: a grid is ill-formed if a grid mark on level n+1 is not matched by a grid mark on level n in the same column (an effect that shows up in several environments: in stress shift, in immobility of strong beats, in main stress placement, in destressing).
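The constraint is easy to state as a check on grid columns. A minimal sketch, with a column represented (my choice) as the set of levels bearing a mark:

```python
def continuous_column(grid):
    """Continuous Column Constraint: in each column, a grid mark on
    level n+1 must be matched by a mark on level n.  grid is a list of
    columns, each a set of occupied levels (1 = lowest).  Well-formed
    columns are exactly the unbroken runs 1..k."""
    return all(col == set(range(1, len(col) + 1)) for col in grid)

good = [{1}, {1, 2}, {1}, {1, 2, 3}]   # column heights 1, 2, 1, 3
bad  = [{1}, {1, 3}, {1}]              # a mark on level 3 but none on 2
print(continuous_column(good))  # True
print(continuous_column(bad))   # False
```

Note that if columns were represented simply as heights (as the dynamic-net activation levels are), the constraint would hold automatically, which is the slide's point about column height being the value of a variable.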

Is constituency in metrical structure strongly motivated?

#(x . ) (x . ) (x . ) (x . ) ...

# á a á a á a á a ...

... (x . ) (x . ) (x . ) (x . )#

... á a á a á a á a #

We could think of assigning trochaic feet, either from left to right or from right to left.

Syllable weight

Syllables divided into Heavy and Light syllables, primarily by the sum of the sonority of the post-nuclear material in the syllable.

Latin stress rule: no stress on final syllables; stress on the antepenult if the penult is light; else stress on the (heavy) penult.
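As stated, the rule is a two-way branch on penult weight. A minimal sketch, with syllable weights given as 'H'/'L' (representation mine):

```python
def latin_stress(weights):
    """Latin stress rule sketch.  weights: list of 'H'/'L' per syllable.
    No stress on the final syllable; in longer words, stress the penult
    if it is heavy, otherwise the antepenult.  Returns a 0-based index."""
    n = len(weights)
    if n <= 2:
        return 0                      # mono- and disyllables: initial stress
    return n - 2 if weights[n - 2] == "H" else n - 3

# a.mi.cus: heavy penult -> penult stress (index 1)
print(latin_stress(["L", "H", "L"]))      # 1
# fa.ci.lis: light penult -> antepenult stress (index 0)
print(latin_stress(["L", "L", "L"]))      # 0
```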

Hayes's parametric metrical theory

Choice of foot type:
- i. size (maximum: unary/binary/ternary/unbounded)
- ii. Q-sensitivity parameter
- iii. trochaic vs. iambic (S/W, W/S)

Direction of parsing: rightward, leftward
Iterative foot assignment?
Location (create new metrical layer)
Extrametricality...

Extrametricality

Units (segments, syllables, feet,…) can be marked as extrametrical…

if they are peripheral (at the correct periphery)…

and enough remains after they become invisible.

Dynamic computational networks (Goldsmith, Larson)

Goal: to find (in some sense) the minimum computation that gets maximally close to the data at hand. What structure is required in the empirically robust cases?

Each unit is updated from its two neighbors:

$$x_i^{t+1} = P(x_i^t) = \alpha\,x_{i+1}^t + \beta\,x_{i-1}^t + \mathrm{Bias}(i)$$

The connection matrix M is tridiagonal with zero diagonal (shown here for four units):

$$M = \begin{pmatrix} 0 & \alpha & 0 & 0 \\ \beta & 0 & \alpha & 0 \\ 0 & \beta & 0 & \alpha \\ 0 & 0 & \beta & 0 \end{pmatrix}$$

Input (underlying representation) is a vector U.

Dynamics:

$$S_{i+1} = U + M S_i \qquad (1)$$

Output is S*: the equilibrium state of (1), which by definition satisfies:

$$S^* = U + M S^*$$

Hence:

$$U = (I - M)\,S^*$$

Quite a surprise!

Learnability

Larson (1992) showed that these phonological systems were highly learnable from surface data.
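Larson's actual procedure is not reproduced here, but the inversion identity U = (I - M)S* already suggests why learning is easy: each component of the surface form gives a linear equation in the two unknowns alpha and beta, solvable by least squares. A hedged numpy sketch:

```python
import numpy as np

def learn_alpha_beta(U, S_star):
    """Recover alpha, beta from one (underlying, surface) pair via the
    linear system S*_i - U_i = alpha*S*_{i+1} + beta*S*_{i-1}."""
    rows, targets = [], []
    n = len(U)
    for i in range(n):
        right = S_star[i + 1] if i + 1 < n else 0.0   # missing neighbor = 0
        left  = S_star[i - 1] if i - 1 >= 0 else 0.0
        rows.append([right, left])
        targets.append(S_star[i] - U[i])
    (alpha, beta), *_ = np.linalg.lstsq(np.array(rows), np.array(targets),
                                        rcond=None)
    return alpha, beta

# Generate surface data from a known net, then recover its parameters.
n, alpha, beta = 7, 0.3, -0.9
M = np.diag([alpha] * (n - 1), 1) + np.diag([beta] * (n - 1), -1)
U = np.zeros(n); U[0] = 1.0
S_star = np.linalg.solve(np.eye(n) - M, U)       # exact equilibrium
print(np.round(learn_alpha_beta(U, S_star), 3))  # approx [0.3, -0.9]
```

This is only an illustration of the inversion property at work on noiseless data; Larson's 1992 results concern learning from realistic surface data, which this sketch does not attempt.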

A spatial sine wave...

A spatial square wave...

Observe how a recurrent (feedback) competitive network of lateral inhibition gives rise to a pattern of spatial waves.

Initially, lateral inhibition gives rise to edge detection and classic Mach band phenomena.

Moras, Syllables, and Stress

Moras and syllables (sequence of CVCVCV…)