backley Englishvowels_2010


Element Theory and the Structure of English Vowels

Phillip Backley

Tohoku Gakuin University, Japan

February 2009



Contents

Chapter 1. Background and Introduction

Chapter 2. Representing Segmental Structure

2.1 Segments have internal structure

2.2 Articulation versus perception

2.3 Elements as patterns in the speech signal

2.4 Monovalency versus bivalency

2.5 Elements and the grammar

2.6 Summary

Chapter 3. Element Theory and the Representation of Vowels

3.1 Introduction

3.2 What makes |A I U| special?

3.3 |A I U| as simplex expressions

3.4 |A I U| in compounds

3.4.1 Phonetic evidence for element compounds

3.4.2 Phonological evidence for element compounds

3.5 Central vowels

3.5.1 Phonetic evidence for empty vowels

3.5.2 Phonological evidence for empty vowels

Chapter 4. English Vowel Structure

4.1 Introduction

4.2 Front rounding in vowels

4.3 Element dependency

4.4 The representation of English vowels

4.4.1 Introduction

4.4.2 Short vowels

4.4.3 Long monophthongs

4.4.4 Weak vowels

4.4.5 Diphthong structure

4.4.6 |I| diphthongs

4.4.7 |U| diphthongs

4.4.8 |A| diphthongs

Chapter 5. Summary



in terms of grammaticality, for instance. The Optimality view sees grammaticality as

being determined by a once-only evaluation of some lexical input, whereas in standard theory

a grammatical form corresponds to the final stage of a serial derivation process. Yet when it

comes to segmental structure, the two approaches usually converge in the sense that they both

employ distinctive features and they both admit lexical forms comprising linear strings of

segments from which prosodic structure is largely predictable.

Distinctive features are undeniably part of the fabric of mainstream phonology. This

is not an a priori reason to accept their validity as units of linguistic structure, however. In

fact this paper claims that features do not provide the most suitable means of representing the

internal structure of language sounds. Instead, I argue that segmental representations are built

from an alternative set of units called elements, which are mapped onto patterns that humans

perceive in the speech signal. Clearly this departs from the standard view that features are

associated with the articulatory properties of speech production. Below I illustrate the use of

elements in representations by analysing the internal structure of vowels.

The discussion is organised as follows. Section 2 considers some of the problems

associated with distinctive features. In particular, it questions two common assumptions

about the nature of features: their bias towards articulation and their reliance on binary values.

(Readers who are already familiar with these issues and with the thinking behind the Element

Theory approach may skip this section altogether, and proceed to section 3). Then section 3

introduces Element Theory as an alternative way of describing segmental structure. It focuses

on the representation of vowels using the elements |A I U|. Section 4 offers an Element

Theory analysis of the vowel system(s) of English. It shows how an approach based on

phonological elements can shed light on patterns that characterise the shape and behaviour of

vowels in present-day English. Finally, section 5 summarises the main points.



2: Representing Segmental Structure

2.1 Segments have internal structure

There is a long tradition of using segments to describe language sounds. For example,

dictionaries provide segmental (i.e. phonemic) information to show the pronunciation of a

word (e.g. segmental //), and linguists refer to inventories of segments when

comparing one language with another, or when discussing the set of contrastive sounds in a

language. Yet there is overwhelming evidence that segments are not the primary units of

sound structure. Rather, by observing how sounds behave in languages we can uncover a set

of more basic sound properties which collectively describe the internal make-up of segments;

and it is this assumption which has driven the study of segmental phonology since the time of

Trubetzkoy.

According to this view, segments with one or more of the same basic properties in

common are expected to show similar phonological behaviour, whereas segments with little

or no shared internal structure should show quite different behaviour. Identifying these basic

sound properties is therefore central to the task of explaining segmental patterns and

groupings. As any introductory course in phonology attempts to show, understanding the

nature of segment-internal properties should reveal why segments regularly cluster together

only in certain combinations and why segments interact in predictable ways as a result of

coming into contact with each other. So, although the term segment continues to serve as a

convenient label for referring to language sounds, segments themselves should not be seen as

having the status they once had as formal units of linguistic structure.

The standard approach views segments as bundles of co-occurring features, where

each feature picks out one aspect of a segment's behaviour. This means that one feature alone

cannot define any individual segment; in order to characterise a segment in full we must refer

to its combined feature specification, that is, to the sum of its phonological properties.

Nevertheless, single features do have a role in representation systems: each defines an entire

class of segments, where every member of the class shares the same phonological property by

virtue of having the same feature in its representation. With this one property in common,

segments from the same class should, in principle, display similar phonological behaviour

with respect to this property. For example, the feature [+coronal] unites a range of otherwise

disparate sounds including [ ], all of which may follow the vowel [] in English:

the words couch, mouth, owl, blouse, shout, count contain well-formed sequences of []



plus coronal, whereas a segment from any other class is banned from this position (*[],

*[], *[], etc.).

2.2 Articulation versus perception

Because every language shows distributional regularities of the kind just described, there is

little reason to doubt that segments have internal structure. What still remains unresolved,

however, is the question of the nature of this internal structure. In particular, what are the

linguistic units which represent the sub-segmental properties of speech sounds? As I have

noted, the standard approach assumes a set of features adapted from those employed in SPE.

From their labels alone (e.g. [high], [voice], [lateral], etc.) it is clear that features can be

traced back to phonetic properties: primarily, to properties referring to articulation such as

glottal state and tongue position. When they are used to analyse linguistic patterns in speech,

however, they are also associated with the kinds of phonological properties that describe

segmental contrasts and dynamic processes. So there is an underlying assumption that

phonological phenomena are motivated by phonetics, and more specifically by speech

production, that is, by articulation.

Yet the association between phonology and articulation is not a necessary one. The

authors of Fundamentals of Language (Jakobson & Halle 1956) argued that phonological

features should be defined in auditory-acoustic terms, and this view had a major influence on

phonological studies until the time of SPE. For instance, they propose the feature pair

[compact]/[diffuse], where these labels reflect the acoustic properties of the sound classes

they represent. Specifically, these features describe how acoustic energy is distributed across

the spectrum. In compact sounds such as low vowels and back consonants it is concentrated

in the central area of the spectrum (that is, the energy has a [compact] distribution in this

acoustic region), whereas in diffuse sounds such as high vowels and front consonants it

extends more widely across the spectrum (in other words, the energy has a [diffuse]

distribution). The other eight feature pairs proposed in Fundamentals of Language have a

similar acoustic or hearer-oriented characterisation.

The tradition of describing segmental structure in auditory-acoustic terms came to an

abrupt end with the publication of SPE. This was despite the authors of SPE having given

little justification for rejecting auditory-acoustic features or for adopting articulatory features

instead. But such was the influence of SPE on the development of phonological theory that its

preference for articulatory features quickly caught on. And to this day most analyses of



In short, there seems little support for the assumption that speech sounds should be

represented in terms of articulatory properties. If anything, the arguments point towards

speech perception as being primary and speech production only secondary. This was indeed

the accepted position before SPE, as documented in the work of Sapir and Jakobson. It is also

the position that Element Theory attempts to revive. As just indicated, the acquisition facts

suggest that infant learners begin by perceiving adult input forms; on the basis of these input

forms they build mental representations, which serve as the beginnings of their native

lexicon; and only later do they go on to reproduce these stored forms as spoken language. But

while the former (perception) stage is necessary for successful acquisition, the latter

(production) stage is not, as confirmed by the ability of mutes and those with abnormalities of

the vocal apparatus to acquire a native grammar; evidently, the inability to articulate normally

is not a bar to perceiving speech. Conversely, speech production in the profoundly deaf rarely

develops to a native-like level, presumably because their means of perceiving language lacks

the necessary input from the speech signal.

Having argued that speech perception is more fundamental to the grammar than

speech production, it is natural to assume that segments should be formally described in

terms of their perceptual (i.e. auditory) properties, that is, from the hearer's point of view.

Recall, however, that this paper is attempting to develop a representation system which

favours neither the speaker nor the hearer, but which instead models the linguistic knowledge

common to both. As suggested above, this means focusing on the speech signal, the set of

acoustic events which involves the transmission of sound waves through the air and which

acts as an intermediary between the origin of a sound (the vocal organs of the speaker) and its

target (the auditory system of the hearer). This approach is motivated in Harris & Lindsey

(2000), where it is proposed that the speech signal be understood as a channel through which

speakers transmit and monitor [linguistic] information and listeners receive it (Harris &

Lindsey 2000: 185).

As a physical phenomenon, the speech signal is something that can be measured in

concrete terms. So when an utterance is transmitted between speaker and hearer it is possible

to describe its acoustic properties (e.g. amplitude, formant values). However, it seems that

most of these properties are irrelevant to the grammar, and as such, need not be encoded by

features in phonological representations. Indeed, the extensive literature on segmental

structure gives no indication that raw acoustic data such as formant values or voice onset

measurements have any place in formal phonological theory. A simple parallel can be found

in music: although the notes of a musical phrase can be described by referring to their



physical attributes (e.g. frequency in hertz), a musician does not need precise information of

this kind in order to perceive that phrase, store it in memory, or reproduce it as a melody. Nor

do these physical characteristics need to be written on the page of a musical score. A musical

note is identified not by raw acoustic values, but rather, by its overall acoustic shape and its

relation to other notes in the musical context.

Like musicians, language users do not classify sounds according to their acoustic

properties. It is true that phoneticians may use phonetic data such as formant frequency to

describe the sounds of a language, or to compare different languages; importantly, however,

these data do not constitute linguistic information, and as such, do not identify segmental

features. But if the speech signal is the medium by which language is transferred between

speaker and hearer, then which aspects of the signal are relevant to the grammar and to the

communication process? The claim made by Element Theory is that humans perceive specific

information-bearing patterns in the speech signal, and that each pattern is represented by an

element, where an element is taken to be the smallest unit of segmental structure present in

mental representations. This is the position motivated in Harris & Lindsey (2000) and

summarised in Nasukawa & Backley (2008).

2.3 Elements as patterns in the speech signal

The Element-based approach assumes that hearers instinctively seek out linguistic

information: when decoding speech, they ignore most of the incoming acoustic stream and

focus only on the specifically linguistic information contained within the speech signal. Thus

Element Theory recognizes the human ability to extract from running speech only those

acoustic patterns that are relevant to language. And, as just mentioned, it further assumes that

the mental phonological categories represented by elements are mapped directly on to those

same acoustic patterns. So although elements are associated with certain physical patterns in

the speech signal, they exist primarily as mental constructs (that is, as units of phonological

structure) in the internalized grammar. In order to highlight the way the term element can

refer to both the physical and the mental, Harris & Lindsey (2000) describe elements as

auditory images. This label suggests that an element is primarily a grammar-internal object

(a mental image of some linguistically significant information), but that it is also a

grammar-external object (a physical pattern in the speech signal which hearers use to cue

that mental image). The defining characteristics of these speech signal patterns are described

in section 3 below.



So far, the discussion has given only a hearer-oriented view of elements, in which

hearers perceive the speech signal, recover information-bearing patterns from it, and then

associate those patterns with particular elements in phonological structure. But the speech

signal is a neutral medium, and must therefore carry linguistic information which is also

relevant to speakers. In the case of speakers, the same information-bearing patterns function

not as perceptual cues but as production (i.e. articulation) targets. It must be assumed that a

speaker's internalized grammar includes knowledge of the mapping between elements in

lexical representation and their associated acoustic patterns in the speech signal. So in order

to phonetically interpret a word, speakers must access the lexical form of that word, associate

the elements it contains with their corresponding speech signal patterns, and use the vocal

organs to reproduce those target acoustic patterns in an utterance.

Importantly, this process of reproducing an acoustic target succeeds without the need

for an element to contain information about speech production. For the grammar to specify

any mapping between elements and articulation would be at best unnecessary, and at worst

counter-productive, since there is not always a one-to-one correspondence between the shape

of the vocal tract and the resulting sound. Consider a trained ventriloquist, for example, who

can reproduce the speech signal pattern associated with bilabial stops but without using

conventional lip closure. Even untrained speakers typically have available to them a choice of

different articulatory configurations for creating the same acoustic result. For example, to

bring about a general downward shift in vowel formant values, creating a flattening of

the sound spectrum (Jakobson, Fant & Halle 1952: 31), speakers may employ lip rounding,

or a contraction of the pharynx, or a combination of the two. 1 In sum, an element in

phonological representation establishes which signal pattern a speaker must aim for, but it

does not prescribe what the speaker must do to reach the target. A suitable articulation is

something that speakers master only through being experienced users of their native language.

Before returning to the issue of distinctive features, let us review the way some basic

phonological concepts should be (re)defined in light of the preceding discussion on the nature

of Element Theory. First, the elements themselves are to be seen as acoustic images,

primarily as cognitive objects which are present in lexical representations and which serve to

encode contrasts and alternations. However, elements also connect to the external world by

having a direct physical interpretation: they are mapped onto certain acoustic patterns in

the speech signal which carry linguistic information. Thus a phonological representation may

be thought of as a code which allows language users to store and identify these mental

acoustic patterns.

1 For further examples, see Harris & Urua (2001: 79).

In contrast, speech production is an aspect of language use which is not controlled by

the grammar. Tongue position, glottal state, lip attitude and the like do not constitute

linguistic information; rather, they provide a way of delivering the speech signal. So

articulation serves as a vehicle for carrying the linguistic message, but it does not constitute

the message itself. To reinforce this point, we need only consider the communication process:

when a hearer perceives information-bearing patterns in the speech signal, each pattern acts

consistently and reliably as a cue to its associated element; it makes no difference whether

the signal originates from the articulation of an actual utterance, or from the recording of an

actual utterance, or from a synthesized, unarticulated voice on a computer. In each case the

linguistic message is the same, regardless of whether the vocal organs are involved or not,

since articulation is not a component of the mental grammar.

In conclusion, there is little evidence to support the prevailing view that the basic

units of segmental structure are defined in articulatory terms. For this reason, section 3 will

argue for an alternative view of phonological representations in which features or elements

are mapped onto certain patterns in the speech signal. Although these patterns can be

characterized by their acoustic properties, they are to be understood primarily as cognitive

units which carry linguistic information about the identity of morphemes.

2.4 Monovalency versus bivalency

Before going on to introduce the elements in detail, this section addresses another issue

concerning the use of distinctive features: should features (or elements) in representations be

monovalent (single-valued) or bivalent (binary-valued)? The standard model follows a

tradition of employing bivalent features, meaning that the grammar marks the presence of a

phonological property by specifying a positive feature value, while the absence of that

property is shown by the corresponding negative value. For example, l-sounds are specified

as [+lateral] while all other sounds are [−lateral]; this creates an equipollent distinction

between lateral and non-lateral, according to which [+lateral] and [−lateral] appear to have

equal status because the grammar is able to refer to either category. But alongside bivalent

features such as [±lateral] we also find a number of monovalent features being used in some

versions of the standard model (Steriade 1995). Unlike [±lateral], a monovalent feature such

as [round] can only refer to the presence of a given property, not to its absence. This creates a



privative distinction between the opposing categories, because only a single value of the

feature can be expressed in representational terms.

(1)
                     [] vs. []                 [] vs. []
    a. bivalency     [+round] vs. [−round]     [−lateral] vs. [+lateral]
    b. monovalency   [round] vs. ∅             ∅ vs. [lateral]

As (1) shows, there are two ways of referring to the same phonological contrast,

because there are two ways of expressing the absence of a certain property. For example, to

describe a back unrounded vowel such as [] we can either use [−round] (i.e. the negative

value of the bivalent feature [±round]) or we can choose to make no reference to rounding, as

indicated in (1) by ∅ (i.e. the monovalent feature [round] is absent from the segment's

representation). At first sight, the difference between [−round] and ∅ seems trivial, because

the same contrast can be expressed in both systems. However, several authors including

Durand (1995), Kaye (1989), Harris (1994) and Roca (1994) have noted that the choice

between bivalency and monovalency affects our predictions about how language sounds are

grouped into natural classes and how they participate in phonological processes. That is, the

two systems make different grammatical statements.

To illustrate this point, consider the representation of nasal vowels such as [] and [].

These belong to a natural class, a non-random group whose members all share some

physical characteristic (nasal resonance) and, more importantly, some pattern of phonological

behaviour (e.g. vowel lowering, trigger of nasal harmony). It is assumed that these shared

physical and phonological characteristics are an indication that the same structural property

(in this case, nasality) is specified in the representation of each member of the natural

class. In other words, the common structural property defines the natural class. Furthermore,

most theories of segmental structure assume that this class-defining property corresponds to a

basic, indivisible unit in phonological representation, typically a feature or an element. In this

example the basic property is nasality, so it follows that every nasal vowel must have a

nasality feature/element in its segmental make-up.

The Amerindian language Warao (Osborn 1966) illustrates how monovalency and

bivalency make different grammatical predictions (data from Botma 2005):



(2) a. sun          c. summer
    b. walking      d. kind of tree

As (2a-b) show, this language has a lexical contrast between oral and nasal vowels. So in a

monovalent system of representation, the feature [nasal] appears in the structure of [] in (2b),

while [] in (2a) makes no reference to [nasal] and is therefore interpreted as an oral vowel.

Alternatively, under bivalency [] is specified as [+nasal] while [] has [−nasal]. (2c-d) show

that Warao also has a process of nasal harmony, where the presence of a nasal trigger (a nasal

vowel or nasal consonant) causes all target sounds (vowels, laryngeals, glides) to its right to

be nasalised within the word domain. Any harmonic trigger in Warao is characterised as a

segment with [nasal]/[+nasal] in its lexical representation, where this feature defines a natural

class of nasals all united by similar (harmonic) behaviour.
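The harmony process just described can be sketched procedurally. The following is only an illustrative toy, not an analysis of Warao: the segment symbols are ASCII placeholders rather than attested forms, the function name `nasal_harmony` is hypothetical, and the decision to halt spreading at the first non-target segment is an assumption that the description above does not state explicitly.

```python
# Toy target classes (vowels, glides, laryngeals); ASCII placeholders only.
VOWELS = set("aeiou")
GLIDES = set("wj")
LARYNGEALS = set("h")
TARGETS = VOWELS | GLIDES | LARYNGEALS


def nasal_harmony(segments):
    """Spread nasality rightward from each trigger within one word.

    `segments` is a list of (symbol, is_nasal) pairs, where is_nasal=True
    marks a segment that carries [nasal] lexically (a harmonic trigger).
    Assumption: spreading stops at the first non-target segment.
    """
    output = []
    spreading = False
    for symbol, is_nasal in segments:
        if is_nasal:
            spreading = True       # a lexical nasal (re)starts spreading
        elif spreading and symbol in TARGETS:
            is_nasal = True        # target to the right of a trigger
        else:
            spreading = False      # non-target: assumed to halt spreading
        output.append((symbol, is_nasal))
    return output


# Hypothetical word: nasal consonant, then targets, then an obstruent.
word = [("m", True), ("a", False), ("w", False), ("a", False),
        ("k", False), ("a", False)]
result = nasal_harmony(word)
```

Note that the procedure refers only to the presence of [nasal]; nothing in it needs to mention a negative value.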

As expected, oral vowels do not act as harmonic triggers in this language, because

they have no [nasal]/[+nasal] specification. Moreover, they do not constitute a natural class

because they display no unified active behaviour.2 Importantly, the fact that [−nasal] (i.e.

oral) vowels collectively do not do something provides no justification for grouping them

together as a natural class. Yet this is exactly what the bivalent feature system does. Allowing

[−nasal] to appear in representations gives it a grammatical status equal to that of [+nasal],

making it possible for the phonology to refer to [−nasal] as well as [+nasal] as an active

property in some phonological process. However, the evidence does not support this position:

for example, we find no comparable process of oral harmony in which [−nasal] acts as a

harmonic trigger and oralises nasal vowels. In short, it is difficult to motivate the bivalency

prediction that [−nasal] and [+nasal] both exist as basic structural properties, and hence, as

two separate natural classes.

It seems, then, that the problem with bivalent features arises from their ability to refer

to negative categories, that is, to properties which are absent from a segment's structure.

To reinforce this point, consider other negative features besides [−nasal] that characterise oral

vowels under a bivalent feature system. Oral vowels are all non-lateral, for example. But the

feature [−lateral] does not define a natural class either, because it identifies a whole range of

sound classes besides oral vowels (e.g. obstruents, nasal stops, rhotics) which cannot be

unified by the presence of even a single common property. Compare this with a true natural

class such as [+nasal], whose members comprise nasal vowels and consonants; all and only

these sounds act as harmonic triggers in Warao because only these sounds possess the active,

class-defining feature [+nasal].

2 Note that [−nasal] fails to capture the class of segments targeted by nasal harmony in Warao, because this set

includes some non-nasals (e.g. glides) but excludes other non-nasals (e.g. obstruents).

By contrast, the use of monovalent features makes it possible for the segmental

structure itself to show that nasal vowels form a grammatical set whereas oral vowels do not.

[nasal] identifies the nasal vowels as a natural class, while the lack of any equivalent feature

specification for oral vowels indicates that they have no common behaviour; furthermore, it

prevents the grammar from referring to them as a unified set. In more general terms, the

monovalent feature [nasal] groups together nasal vowels and consonants as a natural class, as

evidenced by Warao nasal harmony, whereas the arbitrary set of non-nasal segments (oral

vowels and all non-nasal consonants) displays no common properties and consequently has

no feature specification to indicate natural class status.

The conclusion to be drawn from this comparison between monovalent and bivalent

features is that bivalency makes for an altogether less restrictive system. Since bivalency

forces representations to specify either the presence or the absence of a given property, the

number and nature of specified (and therefore potentially active) phonological properties

exceeds what is actually observed in natural languages. In other words, it predicts the

possibility of many phonological processes (and therefore many grammars) that would

presumably be ruled out by a more constrained theory. Of course, the notion of restrictiveness

now plays a relatively minor role in theory building. By contrast, in early generative theory

the issue of restricting the generative capacity of the grammar was of central concern, when

the focus was on developing a model that could generate any possible grammar and at the

same time rule out any impossible one.
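The difference in restrictiveness can be made concrete with a small sketch. It assumes a toy feature set and a toy inventory (neither drawn from any particular language), and the helper names are hypothetical. It simply lists the classes each system lets the grammar refer to.

```python
FEATURES = ["nasal", "lateral", "round"]  # toy feature set (assumption)


def bivalent_classes(features):
    """Each feature yields two addressable classes: [+F] and [-F]."""
    return [sign + f for f in features for sign in ("+", "-")]


def monovalent_classes(features):
    """Each feature yields one addressable class: [F] (presence only)."""
    return list(features)


# Toy inventory: each segment lists only the properties it possesses.
INVENTORY = {
    "a": set(),
    "a_nasal": {"nasal"},
    "n": {"nasal"},
    "l": {"lateral"},
    "u": {"round"},
    "t": set(),
}


def members(label, inventory):
    """Segments picked out by a label such as 'nasal', '+nasal' or '-nasal'."""
    if label.startswith("-"):
        # A negative value groups every segment that lacks the property.
        return {s for s, props in inventory.items() if label[1:] not in props}
    feature = label.lstrip("+")
    return {s for s, props in inventory.items() if feature in props}


nasal_class = members("nasal", INVENTORY)
non_nasal_class = members("-nasal", INVENTORY)
```

With three features the bivalent system makes six classes available and the monovalent system three. In this toy inventory the class picked out by the negative value groups an oral vowel, a lateral, a rounded vowel and an obstruent, which share nothing except the lack of nasality.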

Even the authors of SPE recognised that the use of bivalent features did not square

easily with the generative ideal. This is clear from the final chapter of SPE, where they

acknowledge an asymmetry between the two values of a feature which cannot be expressed

simply by plus or minus. Their response was to propose a theory of markedness, an

independent mechanism for calculating the grammatical significance of different feature

values, these calculations being based on cross-linguistic generalisations about the choice of a

default or unmarked value over its opposite value. According to their proposal, the relative

markedness of [+feature] or [−feature] could be determined on the basis of, for example, how

widely a feature value was distributed across languages and the stage of acquisition when a

feature value is first used. However, the elaborate way in which markedness theory was

formulated does little to disguise its true identity as a repair strategy and an admission that



valency of a feature appears to be an inherent and unpredictable property of that feature,

simply an observation about its behaviour in the phonology.

But if the task of identifying the basic units of segmental structure comes down to one

of observing active properties, then it is logical to assume that we can observe only what is

there, not what is absent. This means that if [+ant] and [−ant] are both active in the grammar,

they must represent two distinct, equal and independent (albeit complementary) properties

that are both in some sense positive. As such, they are better expressed as a pair of

monovalent features such as [anterior] and [posterior].3 Moreover, if the same idea can also

be extended to other cases where polar values are typically used, then it becomes feasible to

dispense with bivalency altogether: each negative feature displaying active phonological

behaviour is replaced with an equivalent monovalent feature, as illustrated by the

hypothetical example [−ant] → [posterior], while redundant negative features are simply

ignored because they are linguistically insignificant. The result is a wholly monovalent

approach to the representation of segmental properties. This is the position taken in Element

Theory. The following sections will show how the notion of element is entirely consistent

with the theoretical conclusions drawn above: units in segmental representation should be

monovalent and should map onto linguistically significant patterns in the speech signal.

2.5 Elements and the grammar

From the way phonological representations are formulated in the standard approach, it is easy

to gain the impression that features occupy a separate and autonomous level of structure. Of

course they do show a direct relation with prosodic structure, by virtue of being associated to

syllabic constituents or to intervening timing units. But they appear to play no role in

determining or even influencing other aspects of the phonology. This is clear from the fact

that features have been transferred from the standard approach to quite different theoretical

models like Optimality Theory (Kager 1999, McCarthy 2002) without the need for any

modification. In the case of elements, however, the same is not true: here I show how the

decision to employ elements in representations goes hand in hand with other decisions about

the shape of the grammar. In 2.3 it was argued that elements should map onto patterns in the

acoustic signal, and in addition, in 2.4 it was claimed that they should be single-valued. Let

us now consider the effects of these two conditions on the phonological model as a whole. It

3 To my knowledge, [posterior] has never been seriously considered as a member of the feature set. However,

we do find legitimate cases where the standard approach has recast a single bivalent feature in monovalent

terms: for example, [±ATR] may be redefined as [ATR] and [RTR].



marked or positive property. Although [] does have other phonetic qualities including (in

traditional feature terms) [+high] and [−round], Element Theory treats these as unmarked and

phonologically inactive;4 as such, they are not specified in this vowel's structure. When a

speaker interprets |I| as [] the result in phonological terms is pure frontness of F2, since no

other elements are present to indicate other marked properties. This is also the reason why []

is interpreted with the default phonetic qualities [+high] and [−round]: a [+high] vowel results

from the absence of the open element |A| (see footnote 4), while a [−round] vowel is the

phonetic byproduct of there being no round element |U| in the representation of []. The

elements |I A U| are discussed fully in the following section.

The previous paragraph has outlined one of the distinguishing properties of Element

Theory, namely the independent phonetic interpretability of elements. Yet an element's

ability to be interpreted in isolation is something which relates not only to segmental structure

but more generally to the organization of the phonology as a whole. If phonological

representations are pronounceable as they stand, then in principle Element Theory needs no

separate level of phonetic representation. In other words, the use of elements implies a

monostratal organisation of the phonology. Once again this marks a significant departure

from the standard approach, which assumes a bi-stratal (or multi-stratal) model in which two

(or more) levels of representation are required because each serves a different function:

(4) underlying representation    function: lexical storage
                                 (units: abstract, contrastive)

        structure-changing operations

    surface representation       function: input to articulation/perception
                                 (units: concrete, phonetic)

The traditional arrangement in (4) presents phonology as a device for creating

phonetic objects, that is, for taking abstract phonological forms and converting them into

concrete phonetic forms that can serve as the input to external language processes such as

articulation and perception. As Harris (1994) points out, however, this renders phonology a

performance system, its purpose being to generate phonetic representations and check the

4 To capture the height dimension in vowels, Element Theory posits |A| as the marked property. The element |A| loosely equates with the feature [+low]; therefore high (i.e. non-low) vowels like [i] make no reference to |A| in their representations. Section 3 describes the vowel elements in detail.




grammaticality of utterances. In effect, it places phonology outside linguistic competence and

thus outside the confines of the grammar. Yet treating phonology as extra-grammatical

clearly goes against our understanding of what language users know. We assume, for instance,

that linguistic knowledge includes knowledge of certain phonological generalisations like

patterns of alternation and distribution, which are evidently part of linguistic competence

because they exist independently of articulation and/or perception.5

So by assuming a derivational model as in (4), the standard approach gives phonology

a somewhat ambiguous status with respect to its role in the grammar. At best, we might say

that the standard approach allows phonology to straddle both sides of the traditional division

between competence and performance: by capturing a language's structure-changing

operations (i.e. rules or constraints) it relates to competence, whereas by preparing lexical

forms for articulation and/or perception (i.e. derivational output) it relates to performance.

Clearly, however, this situation is at odds with the general assumption that phonology should

be treated as part of the core grammar.

In response, Element Theory avoids this ambiguity by keeping phonology entirely

within the domain of linguistic competence. In an element-based phonology, therefore,

phonological processes do not create phonetic or pronounceable forms; in fact, they have no

direct connection with utterances. Unlike in derivational models, their role is not to take an

abstract representation and convert it into something more physical; rather, they take an

abstract phonological form, such as a stored lexical representation, and impose structural

regularities on it so that it conforms to the grammar of a given language. For example, they

may force contiguous consonants to agree in voicing, or they may cause vowels to shorten in

closed syllables. In other words, phonological processes control grammaticality by generating

the set of grammatical phonological structures of a language. Importantly, however, the

output of such processes will be no less abstract than the input: an element-based process can

only change a phonological object into another phonological object.

Of course, the inability of an element-based phonology to generate phonetic forms is

countered by the phonetic interpretability of elements. As discussed above, it is proposed that

any element expression can be mapped onto its corresponding physical pattern in the speech

signal; moreover, this can take place at any stage of derivation, since lexical representations

and derived representations are assumed to be of the same type. In principle, then, any lexical

5 The traditional bi-stratal model in (4) is also motivated by the supposed advantage of separating idiosyncratic information (in lexical storage) from predictable information (in the structure-changing component). As Harris (1994) points out, however, this position has never been strongly defended in the psycholinguistics literature.



form may be interpreted by a speaker or hearer as it stands. In practice, however, the result is

likely to be an ungrammatical string, because in such cases the phonology has not imposed its

characteristic effects on the grammaticality of the structure in question. So although lexical

forms in Element Theory have much in common with derived forms (for example, both

involve abstract phonological representations, both employ the same structural units, and

both can be pronounced as they are), it is derived forms which are consistently grammatical

and thus relevant to the process of information exchange via the speech signal.

2.6 Summary

What this discussion has shown is that the Element Theory approach to representation takes a

more abstract view of phonology than we find in the standard approach, in the sense that

phonology itself is seen as being concerned only with abstract or cognitive objects. On the

one hand, the standard approach operates primarily as a performance system, generating

phonetic forms and thereby bridging the divide between the cognitive and the physical. On

the other hand, the element-based approach operates exclusively within the cognitive domain,

providing a system for organising language users' knowledge about phonological strings and

about the internal structure of morphemes. So Element Theory incorporates phonology into

the competence grammar as follows:

(5)

component     controls              determining

syntax        sentence structure    how words behave in sentences

morphology    word structure        how morphemes behave in words

phonology     morpheme structure    how elements behave in morphemes

As a component of the cognitive grammar, phonology in Element Theory has little to

say about raw phonetics. Like other theoretical approaches, it does recognise the role of

phonetic factors such as ease of articulation and/or perception in shaping the phonology; but

unlike most other approaches, it does not see any place for phonetic factors in mental

phonological representations. Similarly, speech production is viewed as a grammar-external

process, specifically as a system for transmitting linguistic information; this effectively

puts articulation on a par with writing, since both of these media function as vehicles for

delivering language but neither actually constitutes the linguistic information itself. After all,



the inability to write does not prevent a person from acquiring a normal grammar, and neither

does the inability to speak.

Taking all these points into consideration, this paper develops a model of segmental

representation which uses monovalent elements as the basic units of phonological structure.

Elements represent the cognitive categories that are responsible for conveying linguistic

information about the structure of morphemes. For the purposes of communication, elements

also connect to the physical world by mapping onto information-bearing patterns that humans

perceive in the speech signal. However, their cognitive function remains primary. This means

that the process of identifying elements should begin with an analysis of phonological

behaviour (e.g. distribution, alternation, natural classes); only after an element has been

identified as a grammatical unit can it be associated with a particular speech signal pattern. In

other words, phonological structure is determined primarily through data analysis, and only

secondarily through listening.



3: Element Theory and the Representation of Vowels

3.1 Introduction

Section 2 considered some of the problems inherent in the standard feature-based approach to

segmental representation. It also claimed that these problems could be overcome by imposing

certain conditions on the way the basic units of segmental structure are formulated. In

particular, it advocated single-valued features which stand for abstract phonological

categories. These features, which I will refer to as elements, are the units which characterize

the lexical shape of morphemes but which also map onto information-bearing acoustic

patterns in the speech signal.

Element Theory claims that the segmental properties of all languages are described

using the set of six elements |A I U ʔ H N|. These fall naturally into two subgroups, |A I U|

and |ʔ H N|, the former being associated primarily with vowel structure and the latter with

consonant structure. Admittedly, this split between vocalic and consonantal elements is

something of an oversimplification, since vowel elements do appear in the representation of

consonants, and vice versa. Indeed, as a consequence of abandoning distinctive features, it

becomes possible to play down the importance of the traditional categories 'vowel' and 'consonant' and instead treat these terms simply as informal labels. So for the sake of

convenience I will continue to refer to vowels and consonants as segment types, but this does

not imply any formal bifurcation in terms of their segmental structure. This paper will focus

on vowel representations and therefore on the role of the elements |A I U|. For a description

of consonant representations and the remaining elements |ʔ H N|, see Backley (in prep).

Before discussing the structure of vowels in detail, it is worth making the point that

the set of vowel elements in (6a) is smaller than an equivalent set of features such as (6b):

(6) a. elements for vowels: |A|, |I|, |U|

b. features for vowels: [high], [low], [back], [round], [ATR]

In fact, this difference reflects a more general divergence between the two approaches over

the issue of generative capacity: namely, feature systems tend to over-generate while element

systems tend to under-generate. A single feature usually represents a very specific segmental

(typically articulatory) property, so in order to describe (the articulation of) a segment in full,



the grammar must call upon a sizeable number of different features. For example, Odden

(2005) uses 17 features to describe English consonants and a further 5 features to describe the

vowels. Unfortunately, however, having so many features available opens the door to serious

levels of over-generation, where the set of possible combinations of feature values (and

thus the set of possible segmental contrasts) is far larger than that required by the grammar

of any one language. To address this problem, the phonology must restrict combinability in

some way; restrictions have come in the form of feature-geometric relations (see 2.4 above)

or negative constraints such as *[+ATR, +low] (Archangeli & Pulleyblank 1994).6

In contrast to feature theories, which generate too many segmental expressions and

thus have to impose constraints on their output, Element Theory takes the opposite position

of first generating a minimal set of contrasts capable of describing only the simplest and most

common segmental inventories. As (6) shows, this is made possible by recognizing a

relatively small number of basic structural units. Now, with only a small set of elements to

hand, the phonology must have ways of expanding its generative capacity to accommodate

larger and more complex systems of contrast. Yet Element Theory regards this as the

preferred position, on the grounds that the under-generation approach is more restrictive: it

gives the grammar greater control over the size and shape of segmental systems. So the

function of an element-based grammar is to generate a small set of attested forms rather than

to eliminate a potentially large set of unattested ones. In this way, the set of vowel elements

in (6a) is intentionally small, a fact which reflects the way Element Theory is committed to

addressing the issue of excessive generative capacity that continues to characterize feature-

based models.
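
The contrast in generative capacity can be made concrete with a rough count. The Python sketch below is illustrative only: it assumes that each of the five features in (6b) is binary and freely combinable, and that element expressions are unordered, non-empty combinations of the elements in (6a), ignoring dependency relations and any other structural devices. The function names are hypothetical.

```python
from itertools import combinations, product

FEATURES = ["high", "low", "back", "round", "ATR"]   # from (6b)
ELEMENTS = ["A", "I", "U"]                           # from (6a)

def feature_bundles(features):
    """All fully specified bundles, assuming every feature is binary (+/-)."""
    return [dict(zip(features, values))
            for values in product("+-", repeat=len(features))]

def element_expressions(elements):
    """All non-empty unordered combinations of monovalent elements."""
    return [frozenset(combo)
            for size in range(1, len(elements) + 1)
            for combo in combinations(elements, size)]

bundles = feature_bundles(FEATURES)
expressions = element_expressions(ELEMENTS)
print(len(bundles), len(expressions))
```

Under these assumptions the five features yield 32 bundles, among them combinations such as [+high, +low] that must then be filtered out, whereas the three elements yield 7 expressions.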

3.2 What makes |A I U| special?

For the reasons just outlined, the set of vowel elements should initially be capable of

generating vowel systems that are typologically unmarked, that is, structurally simple and

cross-linguistically widespread. Why then should |A|, |I|, and |U| qualify as the most basic

segmental properties in such systems? Crothers (1978) and other vowel typology surveys

confirm that the universally preferred inventory has the following five-vowel arrangement:

6 Although the filter *[+ATR, +low] succeeds in capturing a distributional regularity, it is nonetheless arbitrary in that it fails to explain why this combination is ungrammatical whereas, for example, [+ATR, −low] is widespread. Even illogical combinations such as *[+high, +low] cannot simply be dismissed as ungrammatical if the features in question really do stand for abstract phonological categories rather than articulatory properties.



(7)

Yet despite the unmarked status of (7), it cannot be assumed that this system of five vowels

corresponds to the presence of five basic phonological properties. For instance, we cannot

automatically treat [a i u e o] as the phonetic instantiation of a corresponding set of elements

such as |A I U E O|. In fact, there are strong arguments to indicate that the mid vowels [e o]

belong to more than one natural class (Harris 1994), which in turn suggests that [e] and [o]

are each represented by more than one element. In other words, the phonological structure of

the mid vowels [e o] is apparently not as basic as that of the corner vowels [i a u].

Treating [i a u] as the least marked vowels follows naturally from their unique

properties. In describing these properties, let us begin with language typology, and with the

fact that [i a u] are cross-linguistically very common, indeed present in almost every known

language. When we examine the smallest attested vowel systems, which usually comprise

only three vowels, we find such systems regularly employing only these corner vowels. The

examples in (8) are from Lass (1984):

(8) [ ] (Tamazight) [ ] (Quechua) [ ] (Moroccan Arabic)

[ ] (Greenlandic) [ ] (Amuesha) [ ] (Gadsup)

A comment is in order about phonetic vowel quality. On the understanding that the

vowel symbols in (8) stand for phonological categories rather than phonetic tokens, we do

expect to find some cross-linguistic variation in the way the same contrastive system is

interpreted phonetically. This applies not only to the systems in (8) but also to 5-vowel

systems. Take Spanish [ ] and Zulu [ ], for example. A comparison of, say,

Spanish [] with Zulu [] would show that these sounds have similar phonological properties

and play the same role in their respective systems. What counts in Element Theory (and in

related theories such as Dependency Phonology) is the behaviour of a sound with respect to

(i) natural classes and (ii) other contrastive sounds in the same system. Phonetic values are

not taken to be the main criterion for identifying melodic representations, which, of course,



[−bk] [+bk]

The arrangement in (10) has an articulatory bias, as it reflects tongue position, specifically

the height and degree of backness of the tongue needed to produce different vowel sounds.

However, a vowel square fails to capture the special status of [i a u], thereby missing an

important generalization concerning typological markedness. Moreover, if Dispersion Theory

is correct in assuming that languages prefer vowels which are maximally distinct, then from

(10) we can infer that the vowels at each of the four corners of the vowel square are equally

unmarked. Yet this is clearly not the case: the [−hi,−bk] vowel [] is cross-linguistically less

common than [i] ([+hi,−bk]) or [u] ([+hi,+bk]), for example.

Here I have reviewed some of the reasons for treating [i a u] as basic vowels.

Element Theory characterizes the special status of these vowels by equating each with an

element from the set |A I U|, where these elements function as active phonological units in

vowel contrasts and vocalic processes. It should be noted that Element Theory is by no means

the first to recognize the significance of |A I U| as phonological primes. The vowel elements

are pre-dated by the 'particles' of Particle Phonology (Schane 1984) and by the 'components' of

Dependency Phonology (Anderson & Ewen 1987), both of which can be traced back to the

three principal underlying and abstract 'characteristics' involved in vowel formation (|u|

'roundness', |i| 'frontness', and |a| 'lowness') first proposed by Anderson & Jones (1974: 16).

What sets Element Theory apart from these other models of vowel representation, however,

is its claim that elements are associated specifically with properties of the speech signal.

Further discussion of the motivation for |A I U| can be found in Rennison (1986).

3.3 |A I U| as simplex expressions

Elements are primarily abstract units of linguistic structure: they determine the lexical shape

of morphemes, and they behave as active properties in phonological processes such as

assimilation and lenition. So we identify individual elements by studying language data, by

analyzing sound contrasts, distributional patterns and dynamic phonological changes. But in

addition, elements connect to the physical world through their association with certain

patterns in the acoustic speech signal. Once an element has been identified through its

phonological properties, an analysis of its phonetic characteristics may be carried out in order

to establish its unique acoustic signature. The typological evidence reviewed in 3.2 pointed

to the existence of three vowel elements |A I U|. This section examines the speech signal



patterns represented by these elements; then, to reinforce the status of |A I U| as phonological

primes, it considers their roles in linguistic structures and dynamic phenomena.

Element Theory assumes that language users focus on three specific patterns in the

speech signal when producing or perceiving vowels. These patterns are revealed by analysing

the distribution of energy across the frequency band from zero to around 3kHz, the

frequency range which contains the first three formants and which is therefore crucial for

perceiving vowel sounds. The figures in (11) show the signal patterns that speakers and

hearers associate with the three abstract phonological categories |A I U|. Spectrograms of the

corresponding vowel sounds [i a u] are given in (12).

(11) Spectral patterns for |I|, |A| and |U|

Figure 1: |I| as a dIp    Figure 2: |A| as a mAss    Figure 3: |U| as a rUmp

(12) Spectrograms of [i a u] showing the first three formants

Figure 4: [i]    Figure 5: [a]    Figure 6: [u]

The pattern for |I| in figure 1 consists of two energy peaks with a characteristic dip in

between. One peak is located at the lower end of the vowel spectrum at around 500Hz (on the

horizontal axis), and the other is at the upper end at approximately 2.5kHz. The peaks

themselves represent bands of energy, typically resulting from the convergence of two

formants; so the same pattern can also be extracted from the spectrogram for [i] in figure 4.



This figure shows a low F1 value for the high vowel [i], as indicated by the concentration of

energy in the 0-500Hz range (cf. the leftmost peak in figure 1). This vowel also has a high F2

converging with F3 at around 2.5kHz, which creates a concentration of energy at the top of

the spectrum (cf. the rightmost peak in figure 1). The sharp drop in energy in the middle of

the spectrum, corresponding to the lighter area between 1-2kHz in figure 4, gives |I| its

mnemonic label dIp.7

The signal pattern for the element |A|, on the other hand, has the informal label mAss.

This term describes a mass of energy located in the centre of the spectrum, peaking at around

1kHz. As figure 2 shows, there is a drop in energy on either side of this mass. The same

characteristic mAss pattern is reflected in the spectrogram for [a] in figure 5, where the

energy peak results from a high F1 value converging with F2 in the 1kHz region. Finally, the

speech signal pattern for the element |U| is characterised by a concentration of energy at the

lower end of the spectrum. In figure 3 the energy peaks are contained within the 0-1kHz band,

while across the higher frequency range we observe a steady fall. This falling spectral shape

has been dubbed rUmp. Again, the pattern is visible in the spectrogram for the corresponding

vowel: figure 6 shows how [u] involves a lowering of all formants, with F1 at around 500Hz

and F2 at around 1kHz.

Of course, the formant patterns in figures 4-6 are subject to some inter-speaker (as

well as intra-speaker) variation. Nevertheless, the above samples taken from my own speech

should illustrate the general physical correlates of the phonological categories |A I U| when

each element is interpreted in isolation. In fact, from an Element Theory point of view such

variation is of no linguistic consequence, since the theory defines elements only in terms of

their overall spectral pattern (i.e. dIp, mAss and rUmp) and not by referring to raw

acoustic data such as precise formant values. In the preceding paragraphs I have used specific

frequency values to describe each spectral pattern in a precise way; but it must be stressed

that numerical data of this kind is for descriptive purposes only; it has no formal place in

the Element Theory grammar.8 A fuller description of the spectral properties of |A I U| can be

found in Harris & Lindsey (1995).

3.4 |A I U| in compounds

3.4.1 Phonetic evidence for element compounds

7 The labels dIp, mAss and rUmp are taken from Harris (1994: 139).

8 Not all models of segmental structure take this position. For example, Flemming (2002) proposes that scales of formant values be incorporated directly into vowel representations.



The definition of elements as speech signal patterns appears to be consistent with the Quantal

Theory explanation for why languages favour triangular vowel systems bounded by |A I U|.

As noted above, Quantal Theory assumes that each corner of the vowel triangle is associated

with a unique and unambiguous acoustic pattern, which is exactly what the vowel elements

represent. The original Quantal Theory descriptions, which refer to patterns of converging

vowel formants, are redefined in (13) in terms of the impressionistic spectral shapes shown in

figures 1-3:

(13)
                          |I|           |A|           |U|
  position of peak(s)     low + high    centre        low
  position of trough(s)   centre        low + high    centre + high

The summary in (13) shows that each vowel element has a pattern which is not only unique

but also highly distinct, given the small number of variables involved. So the three-way

contrast between [i], [a] and [u] should be easy to recognise, and moreover, difficult to

confuse, just as the quantal approach predicts. However, most languages have vowel systems

containing more than just [i a u], which means they must allow elements to combine into

compound expressions. Let us now look at compounding in more detail. We first examine the

effects of compounding on the speech signal, and then consider the phonological properties

of compounds.

It will be recalled from 3.2 that the universally unmarked vowel system consists of

the corner vowels [i a u] plus the mid vowels [e o]. It has already been argued that [i a u]

have a special status as basic vowels, which is reflected in the way each corresponds to a

primary unit of phonological structure, i.e. an element. In contrast, the mid vowels do not

share this status. Instead, the phonological evidence indicates that [e o] are each the result of

combining two elements and interpreting these simultaneously: [e] is represented by the

compound |I A| while [o] comes from |U A|. Now, assuming that every element is associated

with a spectral pattern, and further assuming that all information relating to element structure

is transmitted via the speech signal, we can expect the speech signal itself to contain complex

spectral patterns when a mid vowel is interpreted. The spectral patterns for mid vowels are

shown in (14) and (15):



(14) Spectral pattern for |I A| (versus |I|)

Figure 7: |I A| ([e]) versus Figure 8: |I| ([i])

The mid vowel [e] results from the interpretation of the compound expression |I A|,

with both elements contributing to the overall shape of the composite spectral pattern in

figure 7. In the centre of the spectrum we find the dip between F1 and F2 that characterises |I|,

though this is both narrower and shallower than in the pure dIp pattern in figure 8 (repeated

from figure 1). The difference introduced in figure 7 is accounted for by the presence of |A|,

which produces an energy mass in the same central region with troughs on either side. In

short, the |I A| compound creates a dIp within a mAss: a large central mass of energy

containing a dip inside it.

(15) Spectral pattern for |U A| (versus |U|)

Figure 9: |U A| ([o]) versus Figure 10: |U| ([u])

The mid vowel [o] is the result of interpreting the compound expression |U A|. In

figure 9 the presence of |U| ensures that a concentration of energy is maintained at the lower

end of the spectrum, as we find with the pure rUmp pattern in figure 10 (repeated from figure

3). Unlike [u], however, where the energy peak is located very near the bottom of the

spectrum, the mid vowel [o] shows a concentration of energy somewhat closer to the central



region; as Harris & Lindsey (2000) point out, the energy peak in [o] is far enough above the

bottom of the frequency range to constitute a mAss, with troughs above and below (Harris &

Lindsey 2000: 196). So the |U A| compound produces a rUmp within a mAss: a centralised

mass of energy which falls as the frequency increases.
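
The relation between simplex expressions, compounds and their spectral shapes can be summarised in a small lookup. The Python sketch below is a simplification under stated assumptions: the peak and trough regions are those given in (13), the labels dIp, mAss and rUmp come from the text, and the vowel symbols [i a u e o] are conventional transcriptions assumed here for illustration. The names `PATTERNS`, `VOWELS` and `describe` are hypothetical, and a compound is modelled simply as the set of patterns its elements contribute, which glosses over how those patterns blend in figures 7 and 9.

```python
# Spectral shapes from (13); region names are informal, not formant values.
PATTERNS = {
    "I": {"label": "dIp",  "peaks": {"low", "high"}, "troughs": {"centre"}},
    "A": {"label": "mAss", "peaks": {"centre"},      "troughs": {"low", "high"}},
    "U": {"label": "rUmp", "peaks": {"low"},         "troughs": {"centre", "high"}},
}

# Assumed phonetic interpretations of the expressions discussed in the text.
VOWELS = {
    frozenset("I"): "i", frozenset("A"): "a", frozenset("U"): "u",
    frozenset("IA"): "e", frozenset("UA"): "o",
}

def describe(expression):
    """Return the vowel and the pattern labels carried by an element expression."""
    elements = frozenset(expression)
    labels = sorted(PATTERNS[e]["label"] for e in elements)
    return VOWELS.get(elements), labels

print(describe("IA"))   # compound |I A|
print(describe("UA"))   # compound |U A|
print(describe("U"))    # simplex |U|
```

Expressions not discussed so far, such as |I U|, return no vowel in this sketch because the lookup covers only the five vowels introduced up to this point.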

3.4.2 Phonological evidence for element compounds

So there is phonetic evidence to indicate that mid vowels are complex structures: the spectral

pattern for |I A| (= [e]) combines mAss and dIp, while the pattern for |U A| (= [o]) combines

mAss and rUmp. But structural complexity is primarily a phonological property, which means

that support for the existence of element compounds like |I A| and |U A| should come

primarily from phonological evidence. In the case of mid vowels, the evidence focuses on the

way the individual elements in a compound become visible under certain phonological

conditions. In other words, the phonology allows us to see inside complex expressions and

observe their internal composition.

The following examples are, above all, intended to support the existence of element

compounds in the grammar. Additionally, however, they reinforce the status of |A I U| as

phonological primes, since they demonstrate how these elements regularly participate as

active units in various dynamic phenomena. In this section I shall discuss examples of vowel

processes which make reference only to the five vowels [i e a o u] introduced so far. In

general, these processes cause the internal (element) structure of a vowel to be reorganised or

reinterpreted in some way. This is illustrated by processes such as monophthongisation,

diphthongisation and vowel coalescence. Other process types that demonstrate the workings

of element-based representations include vowel harmony and vowel reduction; I shall touch

on these below, after having discussed the structure of element compounds in more detail.

The history of English provides numerous cases of monophthong formation and

diphthong formation. Following Harris (1994: 100), I describe these two processes together,

since one is essentially a reversal of the other. Many dialects of late Middle English had the

diphthongs [](~[]) and [] in the following words (data from Jones 1989):

(16) a. Middle English []/[]        b. Middle English []

        day    []  day                 law      []  law

        eight  []  eight               dauhter  []  daughter

        vain   []  vain                naught   []  not

        pay    []  pay                 baul     []  ball

During the sixteenth and seventeenth centuries, however, these diphthongs began to develop

the monophthongal realisations [] and [], respectively, which survive in some dialects of

Modern English: for example, British English retains [] in law [] and ball [], while

some regions in northern England also pronounce [] in eight [] and pay []. Expressed

in |A I U| terms, this monophthongisation process involves a simple reorganisation of the

elements in the original diphthong:

(17) a. [] > []              b. [] > []

        N        N              N        N

       x x      x x            x x      x x

       |A|      |A|            |A|      |A|

       |I|      |I|            |U|      |U|

(17a) shows how the interpretation of the expression |A I| has changed during the

development of the English vowel system. In late Middle English |A| and |I| were interpreted

separately, resulting in a diphthong []. In this case, speakers distributed |A| and |I| across the

two prosodic positions in the nuclear domain. Later, however, language users began to

interpret the same elements simultaneously, thereby producing a mid vowel [].9 Segmental

reconfiguration of this kind typically leaves the prosodic structure untouched, so the later

interpretation [] is still tied to a long nucleus. (17b) shows how back diphthongs also

underwent a similar reconfiguration process.
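
The reinterpretation described for (17) can be mimicked in a few lines of Python. This is only a sketch: the class and function names are hypothetical, the vowel symbols are assumed conventional values rather than the transcriptions of the original data, and vowel length is represented simply by the number of timing slots. The point it illustrates is that one and the same lexical structure supports both readings.

```python
from dataclasses import dataclass

# Assumed phonetic values for the expressions involved.
VOWELS = {frozenset("A"): "a", frozenset("I"): "i", frozenset("U"): "u",
          frozenset("AI"): "e", frozenset("AU"): "o"}

@dataclass(frozen=True)
class Nucleus:
    elements: tuple      # lexical elements, in order, e.g. ("A", "I")
    slots: int = 2       # timing positions (x x) in a long nucleus

def interpret_sequentially(nucleus):
    """Diphthong reading: one element per timing slot."""
    return [VOWELS[frozenset(e)] for e in nucleus.elements]

def interpret_simultaneously(nucleus):
    """Monophthong reading: the whole compound spread over every slot."""
    vowel = VOWELS[frozenset(nucleus.elements)]
    return [vowel] * nucleus.slots

day = Nucleus(("A", "I"))   # as in (17a)
law = Nucleus(("A", "U"))   # as in (17b)
print(interpret_sequentially(day), interpret_simultaneously(day))
print(interpret_sequentially(law), interpret_simultaneously(law))
```

Neither function alters the `Nucleus` object, mirroring the claim that nothing is added to or removed from the representation; the two-slot output of the simultaneous reading reflects the retention of the long nucleus.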

Importantly, monophthong formation comes about as a result of speakers and hearers

adjusting their interpretation of the original diphthong structures. The lexical structures

themselves are unchanged: nothing has been added or removed. In the absence of any

representational changes, then, what we see in (17) is the mid vowel interpretations [] of

the compound expressions |A I| and |A U|, respectively. On this basis, it should come as no

surprise that other ways of reinterpreting the same structures have also emerged. For example,

9 The compound expression |A I| can be interpreted as either [] or []. Clearly, in languages with a []~[]

contrast these vowels must have distinct representations. This will be discussed below.



Estuary English (South-East England) has since reverted to a diphthong realisation of |A I|:

day [], eight []. By contrast, in RP and many other dialects we also find a diphthongal

reinterpretation: day [], eight []. These are illustrated in (18):

(18) a. Estuary English: day [i]      b. RP English: day []

        N        N                      N        N

       x x      x x                    x x      x x

     d |A|    d |A|                  d |A|    d |A|

       |I|      |I|                    |I|      |I|

So, historical and dialectal evidence indicates that mid vowels are represented by

compound element expressions. Further support for the structures |A I| and |A U| comes from

other cases of English dialect variation, and in particular from the simplification (in effect,

monophthongisation) patterns found in various African Englishes. The examples in (19) are

taken from Simo Bobda (2007):

(19) a. [] > [] diphthong simplification

        like     []        Sierra Leone, Liberia

        finding  []        Zambia

        primary  []        Kenya

        tribe    []        Uganda

     b. [] > [] diphthong simplification

        round    []        Kenya

        mouth    []~[]     West African Pidgin

        town     []        Liberia

        house    []        Krio

The process of diphthong simplification in African Englishes seems to be accompanied by

concomitant vowel shortening, as these cases of monophthongisation tend to result in a short



vowel. Nevertheless, as far as their segmental structure is concerned they reinforce the

patterns described in (17), and provide additional evidence for (i) the primary status of the

vowel elements |A I U| as active phonological units, and (ii) the representation of mid vowels

as the compounds |A I| and |A U|.

Looking beyond English, we see further evidence for the mid vowel structures |A I|

and |A U| in languages as diverse as Japanese and Maga Rukai. Kubozono (2001) describes

two processes of monophthong formation in Japanese, one historical and the other synchronic.

Towards the end of the Middle Japanese period, the diphthong [] in Sino-Japanese words

underwent monophthongisation to []:

(20) Middle Japanese monophthongisation

[] [] cherry tree ()

[] [] high (), fidelity ()

[] [] capital (), home town ()

The output forms in (20) are subject to an analysis similar to that shown in (17b) for early

English. Meanwhile, in present-day Tokyo Japanese the reinterpretation process described in

(17a) has become a characteristic of casual speech (Kubozono 2001: 63), with [] being

monophthongised to []. The diphthong [] is retained in formal speech, however, resulting

in the alternations shown in (21):¹⁰

(21) Tokyo Japanese monophthongisation

[]~[] usually

[]~[] siblings

[]~[] painful

In view of the Japanese patterns in (20) and (21), it is clear that analysing [ ] as the

element compounds |A I| and |A U|, respectively, does not just capture mid vowel behaviour

in English; rather, it describes a property of the vowel elements themselves. This point is

reinforced by the fact that similar behaviour is also observed in other, unrelated languages. In

Maga Rukai, an Austronesian language spoken in Taiwan, a synchronic process of vowel

10 Hirayama (2003) analyses the Japanese data in (21) using traditional features.



coalescence has created mid vowels that were not present in the proto-language (Hsin 2003).

The nouns in (22a) have the heterosyllabic vowel sequence [][] in the root of the

negative form, which corresponds to [] in the positive. This [] is the result of merging the

phonological properties [] and []. In (22b) we find a parallel alternation between [][] and []:

(22) a. [][] coalescence b. [][] coalescence

negative positive negative positive

bee hemp

bridge tooth

pan excrement

Maga Rukai has a pattern of vowel syncope determined by its iambic foot structure (Hsin

2003: 64). In (22) this is shown as the loss of [] in the root-initial syllable of the positive

form. Yet although the nuclear position itself is suppressed, its segmental content |A| is

retained; this stray element is then interpreted in the adjacent nucleus:

(23) Maga Rukai vowel coalescence: []~[]

N N N N

x x x x x x

c |A| k c |A|k |A|

|I| |I|

So Maga Rukai provides another example of a process which reconfigures a representation in

such a way as to reveal the internal structure of mid vowels. The merger of |A| and |I| in (23)

produces [] in [], while the same analysis also applies to the merger of |A| and |U| to

create [] (e.g. []).
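The coalescence in (23) can also be stated procedurally. The following Python sketch is offered only as an illustration and is not part of the analysis in Hsin (2003): it treats each nucleus as a set of elements, and the function name `syncopate_and_merge` is hypothetical. It also assumes, for simplicity, that the stray elements of a suppressed nucleus always dock onto the following nucleus.

```python
def syncopate_and_merge(nuclei, index):
    """Suppress the nucleus at `index` and pass its elements to the next nucleus.

    `nuclei` is a list of element sets, one per nuclear position, in linear
    order. Rightward docking of the stray elements is an assumption of this
    sketch, chosen to match the configuration in (23).
    """
    if not 0 <= index < len(nuclei) - 1:
        raise ValueError("the suppressed nucleus needs a following nucleus")
    merged = frozenset(nuclei[index]) | frozenset(nuclei[index + 1])
    before = [frozenset(n) for n in nuclei[:index]]
    after = [frozenset(n) for n in nuclei[index + 2:]]
    return before + [merged] + after


# A root with |A| in its first nucleus and |I| in its second, as in (23).
result = syncopate_and_merge([{"A"}, {"I"}], 0)
print(sorted(result[0]))  # ['A', 'I']
```

The position is lost but the element is not: the output has one nucleus fewer than the input, and the surviving nucleus carries the compound |A I|.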

The representations shown here follow the conventions of autosegmental phonology

in having individual elements occupy separate structural levels or tiers; in (23) for instance,

|A| and |I| reside on independent tiers. Although this arrangement is not crucial, it does offer a



position. This difference between schwa and other vowels is to be expected, however, if

Element Theory is correct in its claim that vowel properties are mapped onto the acoustic

signal. The presence of |A|, |I| or |U| is associated with a strong, characteristic spectral pattern;

and to produce such a pattern speakers must adopt a distinct, non-neutral vocal tract shape.

On the other hand, the absence of any characteristic spectral pattern, such as we find in [], is

naturally paired with a vocal tract configuration lacking any distinct shape. A uniformly

shaped tube is unable to manipulate formant values in any linguistically meaningful way, and

the phonetic result is schwa, a central vowel of a neutral or indistinct quality.

So the spectral shape for [] shows none of the characteristic vocalic patterns dIp,

mAss or rUmp, suggesting that [] has no vowel elements in its representation. The absence

of |A I U| effectively leaves an unspecified or representationally empty vowel. As indicated

above, the Element Theory literature also considers schwa to be informationally empty

(Harris & Lindsey 2000), in the sense that having no element structure means it contains no

linguistic information. In Element Theory, representational emptiness and informational

emptiness amount to the same thing.

But if schwa has no element structure, how can it be heard and pronounced? Harris &

Lindsey (1995) argue that the spectral pattern in figure 11 may be viewed as a baseline

resonance that exists latently in all vowels. Usually this pattern is not heard, because in the

presence of |A I U| it is overridden by the more marked patterns dIp, mAss and rUmp. In the

case of most vowels, these marked patterns are superimposed onto the baseline resonance and

have the effect of masking it entirely. In the case of schwa, however, which has no elements,

the baseline resonance is exposed. Language users associate this resonance with the central

region of the acoustic space or, more specifically, with the only area of the vowel space not

occupied by |A I U|:

(26) |A I U| areas of the vowel space

|I| |U|

|A|



It has already been noted that any vowel system may contain a neutral vowel, which

can vary phonetically between [] and [].¹² Now consider the stylised vowel space in (26),

which demonstrates why this phonetic variation is possible, or perhaps even expected. The

absence of |A I U| corresponds to a central area of the vowel space covering a sizeable range

of different vowel qualities, any of which may be targeted by individual languages as the

interpretation of an unspecified vowel. Importantly, phonetic differences such as [] versus

[] are trivial in most languages,¹³ because these variants refer to the same linguistic object,

namely a phonologically empty vowel. Harris & Lindsey (1995) liken the empty vowel to a

blank canvas: a neutral background which becomes hidden when different colours are

painted on to it. And no matter what shade of white or grey the original canvas may be, it is

still interpreted as having no colour as long as it remains empty (i.e. unpainted).

3.5.2 Phonological evidence for empty vowels

It has been stressed that elements should be treated primarily as units of phonological

structure, and that their existence should therefore be supported by evidence from the

phonology. At first sight, however, it seems that a different approach may be needed in the

case of schwa, the empty expression ||, because it contains no elements in its representation

and thus amounts to nothing in phonological terms. In fact this is not the case. Although [ ]

has no segmental content, it is still linked to the prosodic structure (specifically, to a

syllable nucleus), which is clearly within the scope of phonology. If [] is to be viewed as

the interpretation of an empty nucleus, then it should receive a phonological analysis like any

other nucleus. Another reason for treating || as a phonological object is that this empty

expression is often the result of a phonological process that removes element structure (e.g.

from weak syllables). If elements are removed from a vowel expression until nothing remains,

then it becomes possible for the baseline resonance of an empty nucleus to be interpreted.

The following examples from Bulgarian and Turkish illustrate the phonological identity of

empty nuclei.

Like English, Bulgarian (Pettersson & Wood 1987) has a full set of vowel contrasts in

stressed positions but only a reduced set in unstressed positions, as shown in (27). Examples

of these alternating vowels are given in (28) (data from Crosswhite 2004):

12 Other realisations of an unspecified vowel are also possible: e.g. [] in the Jivaro system [ ].

13 In 4.4.4 it will be argued that this is not true of English.



(27) Vowel system(s) of Bulgarian

stressed:

unstressed:

(28) Vowel reduction in Bulgarian (data from Crosswhite 2004)

stressed unstressed

[] village [] villages

[] of horn [] horned

[] work [] worker

Bulgarian illustrates a common pattern whereby unstressed syllables support only a

subset of the vowel contrasts that are possible in stressed syllables: [ ] are neutralised to []

in weak syllables, [ ] become [], and [ ] merge as []. Using traditional features it is not

easy to express these vowel reduction effects as a single process: [] → [] and [] → [] are

captured by [−high] → [+high], whereas the same feature [high] is irrelevant to [] → [] as

both are [−high]; instead, the change from [] to [] must be described as [+low] → [−low].

Yet it is clear that the alternations in (27) are all motivated by the same conditioning factor,

namely the inability of an unstressed nucleus to support certain vowel properties. Restated in

terms of Element Theory, however, the generalisation becomes formally simple: |A| is not

licensed in unstressed syllables. As such, the element |A| is suppressed in those contexts but

language users still interpret any remaining elements.

(29) a. high vowels are unchanged (|A| not present)

[] → [] |I| → |I|

[] → [] |U| → |U|

b. mid vowels are raised (|A| suppressed)

[] → [] |A I| → |I|

[] → [] |A U| → |U|

c. central vowels become unspecified (|A| suppressed)

[] → [] | | → | |

[] → [] |A| → | |



Bulgarian vowel reduction is a process that targets |A|, and because the high vowels in

(29a) lack |A|, they are unaffected. By contrast, the mid vowel compounds [ ] in (29b) do

contain |A|; this element is interpreted in stressed positions, but is suppressed in weak

positions; the loss of |A| leaves a sole |I| or |U| remaining, which is interpreted as the high

vowel [] or [] respectively. Turning to the patterns in (29c), these provide evidence to

support the analysis of [] as an unspecified vowel. As a structurally empty vowel, [] has no

|A| and is thus unaffected by vowel reduction: [] → []. On the other hand, [] has |A| in its

representation, this element being interpreted in stressed syllables. But in unstressed positions

[] loses its entire element structure through the |A|-suppression process, leaving behind an

empty nucleus which is interpreted phonetically as baseline resonance: [] → []. What (29)

shows is that these vowel reduction effects can be unified as a single process only if the

grammar allows for an unspecified vowel to appear in representations. In the absence of any

positive vowel properties (i.e. elements), this vowel is interpreted as neutral or baseline

resonance, typically [].
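The unified statement of Bulgarian reduction can be made concrete with a small sketch. The Python below is illustrative only: it models a vowel expression as a set of elements and reduction as the removal of |A|. The phonetic labels in `INTERPRETATION` are the conventional Element Theory values for a six-vowel system and are assumed here rather than taken from the Bulgarian data; the function names are hypothetical.

```python
# Assumed interpretations of element expressions (conventional values,
# not read off the Bulgarian data).
INTERPRETATION = {
    frozenset(): "ə",        # empty expression: baseline resonance
    frozenset("I"): "i",
    frozenset("U"): "u",
    frozenset("A"): "a",
    frozenset("AI"): "e",    # compound |A I|
    frozenset("AU"): "o",    # compound |A U|
}


def suppress_a(expression):
    """Unstressed nuclei do not license |A|; all other elements survive."""
    return frozenset(expression) - {"A"}


def interpret(expression):
    """Map an element expression to its assumed phonetic interpretation."""
    return INTERPRETATION[frozenset(expression)]


# One operation applied to all six stressed vowels.
for stressed in ("I", "U", "AI", "AU", "", "A"):
    unstressed = suppress_a(stressed)
    print(f"[{interpret(stressed)}] -> [{interpret(unstressed)}]")
```

A single operation, `suppress_a`, yields all three patterns in (29): expressions without |A| pass through unchanged, the compounds lose |A| and surface as high vowels, and a lone |A| reduces to the empty expression. No reference to height or lowness is needed.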

The interpretation of phonologically empty nuclei is also observed in Turkish. This

language, like a number of other Altaic systems, has a well-documented process of vowel

harmony in which suffix vowels agree in backness with root vowels. In traditional analyses

the active property is assumed to be the feature [back], whereas in Element Theory it is the

element |I|. Recall that |I| identifies those vowels with a dIp spectral pattern; these have a

relatively high second formant, which places them in the front area of the vowel space. In

Turkish vowel harmony, when a root vowel contains |I| then the same element is also

interpreted in suffixes. For example, the genitive singular suffix in (30a) has a lexically

empty vowel, so the suffix is pronounced []. Under harmony conditions, shown in (30b), it

copies |I| from the root and the suffix vowel is interpreted as []:

(30) |I| harmony in Turkish

Nom. sg. Gen. sg. Nom. pl.

a. girl stalk

b. rope house



The nominative plural suffix also alternates, between its lexical form [] (with a vowel

containing |A|) and its harmonising form [] (with an additional |I|). Example structures are

shown in (31):

(31) a. b. c.

N N N N N N

x x x x x x

k | | z | | n p n |A| v l |A| r

|I| |I| |I| |I|
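The harmony pattern in (30) and (31) amounts to copying |I| from the root vowel to the suffix vowel. The sketch below, in Python, is a deliberately simplified illustration under that assumption: vowels are sets of elements, only |I| is modelled, and the names `harmonise`, `GENITIVE` and `PLURAL` are hypothetical.

```python
def harmonise(root_vowel, suffix_vowel):
    """Return the suffix vowel after |I| harmony.

    Both arguments are sets of elements. Only |I| is modelled here; other
    aspects of Turkish harmony are ignored in this sketch.
    """
    root, suffix = frozenset(root_vowel), frozenset(suffix_vowel)
    return suffix | {"I"} if "I" in root else suffix


GENITIVE = frozenset()        # lexically empty suffix vowel
PLURAL = frozenset({"A"})     # suffix vowel containing |A|

print(sorted(harmonise({"I"}, GENITIVE)))       # ['I']
print(sorted(harmonise({"A", "I"}, PLURAL)))    # ['A', 'I']
print(sorted(harmonise({"A"}, PLURAL)))         # ['A']
```

When the root vowel lacks |I|, the genitive suffix vowel remains the empty set, which corresponds to the phonetically interpreted empty nucleus discussed in the text.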

The forms in (30) present a somewhat simplified picture of the facts relating to vowel

harmony in Turkish.¹⁴ Nevertheless, they are consistent with the analysis of []/[] given

above, and with the claim that some grammars allow representations to contain structurally

empty nuclei. But if || really has no element content, then why is it not interpreted as

silence? Having no elements means that | | cannot be mapped on to any linguistically

significant patterns in the acoustic signal; that is, it cannot carry segmental information.

However, || is associated with a nuclear position, and this nucleus plays an important role in

the formation of prosodic structure. In combination with other nuclei, it contributes to the

construction of higher prosodic domains such as feet and words, units which convey

linguistic information deemed essential for speech perception and efficient lexical access

(Cutler & Norris 1988). There is evidence, for example, that listeners pay particular attention

to the beginnings of foot and word domains when processing running speech. So, one

consequence of not interpreting an empty nucleus is to reduce the amount of linguistic

(specifically, prosodic) information being transmitted via the speech signal.

This is not to say that empty nuclei can never be silent. In fact, uninterpreted empty

nuclei are a grammatical possibility in many languages, including English (e.g. []

unclear, where marks a silent nucleus). Importantly, however, their appearance needs to be

controlled in order to avoid the emergence of unmanageable sequences of consonants.

Grammars which allow silent empty nuclei must therefore impose restrictions on their

distribution (Charette 1991, Scheer 2004). But if a nucleus is silent, how can we be sure it is

there at all? English provides an answer to this question by showing how the same nucleus is

14 See Charette & Göksel (1996) for a more detailed account.



silent under certain conditions but phonetically interpreted under other conditions. The

following example illustrates the point.

According to one innovative approach to syllable structure, all well-formed lexical

representations end in a nucleus (Kaye 1990). Some languages such as Italian require this

final nucleus to be interpreted, with the result that words must end phonetically in a vowel.

For example, all native Italian words are vowel-final: casa 'house', case 'housing', caso

'chance' (but *cas); additionally, many loanwords in Italian have become vowel-final

through adaptation: gallon (English) → gallone (Italian). By contrast, other languages allow a

final empty nucleus to be silent. As a result, they admit words ending phonetically in a

consonant: peach [] (English), schlimm [] 'bad' (German), rhad [] 'cheap'

(Welsh). Following Kaye (1990), the structure of the English word peach is shown in (32a),

where the word-final empty nucleus is licensed to remain silent.

(32) a. peach b. plural c. peaches

O N O N O N O N O N O N

x x x x x x x x

p |I| | | z | | p |I| | | z | |

As an independent lexical structure, the plural suffix in (32b) also has a final empty

nucleus which is not phonetically interpreted; in segmental terms, the plural marker consists

solely of its onset fricative [].¹⁵ And when a language user constructs the plural noun

peaches by concatenating the two forms (32a) and (32b), the result is the structure in (32c).

Since resyllabification is not permitted in Kaye's model, the plural noun peaches ends up

with two empty nuclei: one from the stem peach, the other from the suffix. It also contains

the two sibilant consonants [] and [], which are phonetically adjacent and thus create an

unmanageable sequence of the kind mentioned above. Specifically, when these sounds are

adjacent, their similar acoustic properties make them perceptually almost indistinguishable.

Yet the perceptibility of [] and [] (and therefore the linguistic information

associated with these segments) can be recovered by exploiting the lexical structure itself.

By phonetically interpreting the intervening empty nucleus || as a neutral vowel [], as was

15 The voicing properties of English obstruents are discussed in Backley (in prep).



observed for Turkish in (31a), important acoustic cues carried by the C-to-V [] transition

and the V-to-C [] transition can be easily perceived; as a result, the linguistic information

carried by [] and [] is transmitted in full. So, without recourse to arbitrary measures such

as the insertion of an epenthetic vowel, we get the form peaches []. This analysis of the

[] plural departs from the usual textbook explanation in two respects. First, [] is seen here

as a product of the existing representation rather than as a newly introduced addition to the

structure. This is presumably a gain for restrictiveness, in that the distribution of empty nuclei

is strictly controlled by the grammar whereas epenthesis can in principle be applied anywhere.

Second, interpreting || as a neutral vowel has a clear linguistic motivation, since it enhances

the perceptibility and recoverability of linguistic information. By contrast, the traditional

vowel epenthesis account is typically concerned with notions such as ease of articulation

which, following the discussion in 2.3 above, is best seen as non-linguistic in nature.
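The contrast between epenthesis and interpretation can be illustrated with a toy spell-out procedure. The Python sketch below is not Kaye's (1990) licensing mechanism; it assumes a much cruder condition, namely that an empty nucleus is interpreted only when it separates two sibilant onsets and is silent otherwise. The transcriptions, the default quality of the neutral vowel, and the names `SIBILANTS` and `realise` are all assumptions made for the example, and obstruent voicing is ignored.

```python
# Assumed sibilant inventory for this sketch.
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}


def realise(pairs, neutral="ə"):
    """Spell out a list of (onset, nucleus) pairs.

    A nucleus of None is an empty nucleus. It is already present in the
    representation; the only question is whether it is pronounced. Here it
    is interpreted as `neutral` only between two sibilant onsets.
    """
    out = []
    for i, (onset, nucleus) in enumerate(pairs):
        out.append(onset)
        if nucleus is not None:
            out.append(nucleus)
            continue
        next_onset = pairs[i + 1][0] if i + 1 < len(pairs) else None
        if onset in SIBILANTS and next_onset in SIBILANTS:
            out.append(neutral)  # empty nucleus is phonetically interpreted
        # otherwise the empty nucleus stays silent
    return "".join(out)


peach = [("p", "iː"), ("tʃ", None)]   # stem ends in an empty nucleus
plural = [("z", None)]                # suffix: onset plus empty nucleus

print(realise(peach))            # piːtʃ
print(realise(peach + plural))   # piːtʃəz
```

Nothing is inserted at any point: the plural form is the plain concatenation of the two lexical structures, and the vowel that surfaces corresponds to a nucleus that was present in the stem all along.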

The behaviour of the English plural suffix provides further evidence for the existence

of empty nuclei in representations. It also shows how linguistic conditions can cause an

empty nucleus to be phonetically interpreted in a language-specific way. One aspect of the

analysis of [] should be clarified, however. I have claimed that || is interpreted as []
