backley Englishvowels_2010
8/11/2019 backley Englishvowels_2010
Element Theory and the Structure of English Vowels
Phillip Backley
Tohoku Gakuin University, Japan
February 2009
Contents
Chapter 1. Background and Introduction
Chapter 2. Representing Segmental Structure
2.1 Segments have internal structure
2.2 Articulation versus perception
2.3 Elements as patterns in the speech signal
2.4 Monovalency versus bivalency
2.5 Elements and the grammar
2.6 Summary
Chapter 3. Element Theory and the Representation of Vowels
3.1 Introduction
3.2 What makes |A I U| special?
3.3 |A I U| as simplex expressions
3.4 |A I U| in compounds
3.4.1 Phonetic evidence for element compounds
3.4.2 Phonological evidence for element compounds
3.5 Central vowels
3.5.1 Phonetic evidence for empty vowels
3.5.2 Phonological evidence for empty vowels
Chapter 4. English Vowel Structure
4.1 Introduction
4.2 Front rounding in vowels
4.3 Element dependency
4.4 The representation of English vowels
4.4.1 Introduction
4.4.2 Short vowels
4.4.3 Long monophthongs
4.4.4 Weak vowels
4.4.5 Diphthong structure
4.4.6 |I| diphthongs
4.4.7 |U| diphthongs
4.4.8 |A| diphthongs
Chapter 5. Summary
in terms of grammaticality, for instance. The Optimality view sees grammaticality as
being determined by a once-only evaluation of some lexical input, whereas in standard theory
a grammatical form corresponds to the final stage of a serial derivation process. Yet when it
comes to segmental structure, the two approaches usually converge in the sense that they both
employ distinctive features and they both admit lexical forms comprising linear strings of
segments from which prosodic structure is largely predictable.
Distinctive features are undeniably part of the fabric of mainstream phonology. This
is not an a priori reason to accept their validity as units of linguistic structure, however. In
fact this paper claims that features do not provide the most suitable means of representing the
internal structure of language sounds. Instead, I argue that segmental representations are built
from an alternative set of units called elements, which are mapped onto patterns that humans
perceive in the speech signal. Clearly this departs from the standard view that features are
associated with the articulatory properties of speech production. Below I illustrate the use of
elements in representations by analysing the internal structure of vowels.
The discussion is organised as follows. Section 2 considers some of the problems
associated with distinctive features. In particular, it questions two common assumptions
about the nature of features: their bias towards articulation and their reliance on binary values.
(Readers who are already familiar with these issues and with the thinking behind the Element
Theory approach may skip this section altogether and proceed to section 3.) Then section 3
introduces Element Theory as an alternative way of describing segmental structure. It focuses
on the representation of vowels using the elements |A I U|. Section 4 offers an Element
Theory analysis of the vowel system(s) of English. It shows how an approach based on
phonological elements can shed light on patterns that characterise the shape and behaviour of
vowels in present-day English. Finally, section 5 summarises the main points.
2: Representing Segmental Structure
2.1 Segments have internal structure
There is a long tradition of using segments to describe language sounds. For example,
dictionaries provide segmental (i.e. phonemic) information to show the pronunciation of a
word (e.g. segmental //), and linguists refer to inventories of segments when
comparing one language with another, or when discussing the set of contrastive sounds in a
language. Yet there is overwhelming evidence that segments are not the primary units of
sound structure. Rather, by observing how sounds behave in languages we can uncover a set
of more basic sound properties which collectively describe the internal make-up of segments;
and it is this assumption which has driven the study of segmental phonology since the time of
Trubetzkoy.
According to this view, segments with one or more of the same basic properties in
common are expected to show similar phonological behaviour, whereas segments with little
or no shared internal structure should show quite different behaviour. Identifying these basic
sound properties is therefore central to the task of explaining segmental patterns and
groupings. As any introductory course in phonology attempts to show, understanding the
nature of segment-internal properties should reveal why segments regularly cluster together
only in certain combinations and why segments interact in predictable ways as a result of
coming into contact with each other. So, although the term segment continues to serve as a
convenient label for referring to language sounds, segments themselves should not be seen as
having the status they once had as formal units of linguistic structure.
The standard approach views segments as bundles of co-occurring features, where
each feature picks out one aspect of a segment's behaviour. This means that one feature alone
cannot define any individual segment; in order to characterise a segment in full we must refer
to its combined feature specification, that is, to the sum of its phonological properties.
Nevertheless, single features do have a role in representation systems: each defines an entire
class of segments, where every member of the class shares the same phonological property by
virtue of having the same feature in its representation. With this one property in common,
segments from the same class should, in principle, display similar phonological behaviour
with respect to this property. For example, the feature [+coronal] unites a range of otherwise
disparate sounds including [ ], all of which may follow the vowel [aʊ] in English:
the words couch, mouth, owl, blouse, shout, count contain well-formed sequences of [aʊ]
plus coronal, whereas a segment from any other class is banned from this position (*[],
*[], *[], etc.).
2.2 Articulation versus perception
Because every language shows distributional regularities of the kind just described, there is
little reason to doubt that segments have internal structure. What still remains unresolved,
however, is the question of the nature of this internal structure. In particular, what are the
linguistic units which represent the sub-segmental properties of speech sounds? As I have
noted, the standard approach assumes a set of features adapted from those employed in SPE.
From their labels alone (e.g. [high], [voice], [lateral], etc.) it is clear that features can be
traced back to phonetic properties, primarily to properties referring to articulation such as
glottal state and tongue position. When they are used to analyse linguistic patterns in speech,
however, they are also associated with the kinds of phonological properties that describe
segmental contrasts and dynamic processes. So there is an underlying assumption that
phonological phenomena are motivated by phonetics, and more specifically by speech
production, that is, by articulation.
Yet the association between phonology and articulation is not a necessary one. The
authors of Fundamentals of Language (Jakobson & Halle 1956) argued that phonological
features should be defined in auditory-acoustic terms, and this view had a major influence on
phonological studies until the time of SPE. For instance, they propose the feature pair
[compact]/[diffuse], where these labels reflect the acoustic properties of the sound classes
they represent. Specifically, these features describe how acoustic energy is distributed across
the spectrum. In compact sounds such as low vowels and back consonants it is concentrated
in the central area of the spectrum (that is, the energy has a [compact] distribution in this
acoustic region), whereas in diffuse sounds such as high vowels and front consonants it
extends more widely across the spectrum (in other words, the energy has a [diffuse]
distribution). The other eight feature pairs proposed in Fundamentals of Language have a
similar acoustic or hearer-oriented characterisation.
The tradition of describing segmental structure in auditory-acoustic terms came to an
abrupt end with the publication of SPE. This was despite the authors of SPE having given
little justification for rejecting auditory-acoustic features or for adopting articulatory features
instead. But such was the influence of SPE on the development of phonological theory that its
preference for articulatory features quickly caught on. And to this day most analyses of
In short, there seems little support for the assumption that speech sounds should be
represented in terms of articulatory properties. If anything, the arguments point towards
speech perception as being primary and speech production only secondary. This was indeed
the accepted position before SPE, as documented in the work of Sapir and Jakobson. It is also
the position that Element Theory attempts to revive. As just indicated, the acquisition facts
suggest that infant learners begin by perceiving adult input forms; on the basis of these input
forms they build mental representations, which serve as the beginnings of their native
lexicon; and only later do they go on to reproduce these stored forms as spoken language. But
while the former (perception) stage is necessary for successful acquisition, the latter
(production) stage is not, as confirmed by the ability of mutes and those with abnormalities of
the vocal apparatus to acquire a native grammar; evidently, the inability to articulate normally
is not a bar to perceiving speech. Conversely, speech production in the profoundly deaf rarely
develops to a native-like level, presumably because their means of perceiving language lacks
the necessary input from the speech signal.
Having argued that speech perception is more fundamental to the grammar than
speech production, it is natural to assume that segments should be formally described in
terms of their perceptual (i.e. auditory) properties, that is, from the hearer's point of view.
Recall, however, that this paper is attempting to develop a representation system which
favours neither the speaker nor the hearer, but which instead models the linguistic knowledge
common to both. As suggested above, this means focusing on the speech signal: the set of
acoustic events which involves the transmission of sound waves through the air and which
acts as an intermediary between the origin of a sound (the vocal organs of the speaker) and its
target (the auditory system of the hearer). This approach is motivated in Harris & Lindsey
(2000), where it is proposed that the speech signal be understood as a channel through which
speakers transmit and monitor [linguistic] information and listeners receive it (Harris &
Lindsey 2000: 185).
As a physical phenomenon, the speech signal is something that can be measured in
concrete terms. So when an utterance is transmitted between speaker and hearer it is possible
to describe its acoustic properties (e.g. amplitude, formant values). However, it seems that
most of these properties are irrelevant to the grammar, and as such, need not be encoded by
features in phonological representations. Indeed, the extensive literature on segmental
structure gives no indication that raw acoustic data such as formant values or voice onset
measurements have any place in formal phonological theory. A simple parallel can be found
in music: although the notes of a musical phrase can be described by referring to their
physical attributes (e.g. frequency in hertz), a musician does not need precise information of
this kind in order to perceive that phrase, store it in memory, or reproduce it as a melody. Nor
do these physical characteristics need to be written on the page of a musical score. A musical
note is identified not by raw acoustic values, but rather, by its overall acoustic shape and its
relation to other notes in the musical context.
Like musicians, language users do not classify sounds according to their acoustic
properties. It is true that phoneticians may use phonetic data such as formant frequency to
describe the sounds of a language, or to compare different languages; importantly, however,
these data do not constitute linguistic information, and as such, do not identify segmental
features. But if the speech signal is the medium by which language is transferred between
speaker and hearer, then which aspects of the signal are relevant to the grammar and to the
communication process? The claim made by Element Theory is that humans perceive specific
information-bearing patterns in the speech signal, and that each pattern is represented by an
element, where an element is taken to be the smallest unit of segmental structure present in
mental representations. This is the position motivated in Harris & Lindsey (2000) and
summarised in Nasukawa & Backley (2008).
2.3 Elements as patterns in the speech signal
The Element-based approach assumes that hearers instinctively seek out linguistic
information: when decoding speech, they ignore most of the incoming acoustic stream and
focus only on the specifically linguistic information contained within the speech signal. Thus
Element Theory recognizes the human ability to extract from running speech only those
acoustic patterns that are relevant to language. And, as just mentioned, it further assumes that
the mental phonological categories represented by elements are mapped directly on to those
same acoustic patterns. So although elements are associated with certain physical patterns in
the speech signal, they exist primarily as mental constructs (that is, as units of phonological
structure) in the internalized grammar. In order to highlight the way the term element can
refer to both the physical and the mental, Harris & Lindsey (2000) describe elements as
auditory images. This label suggests that an element is primarily a grammar-internal object
(a mental image of some linguistically significant information), but that it is also a
grammar-external object (a physical pattern in the speech signal which hearers use to cue
that mental image). The defining characteristics of these speech signal patterns are described
in section 3 below.
So far, the discussion has given only a hearer-oriented view of elements, in which
hearers perceive the speech signal, recover information-bearing patterns from it, and then
associate those patterns with particular elements in phonological structure. But the speech
signal is a neutral medium, and must therefore carry linguistic information which is also
relevant to speakers. In the case of speakers, the same information-bearing patterns function
not as perceptual cues but as production (i.e. articulation) targets. It must be assumed that a
speaker's internalized grammar includes knowledge of the mapping between elements in
lexical representation and their associated acoustic patterns in the speech signal. So in order
to phonetically interpret a word, speakers must access the lexical form of that word, associate
the elements it contains with their corresponding speech signal patterns, and use the vocal
organs to reproduce those target acoustic patterns in an utterance.
Importantly, this process of reproducing an acoustic target succeeds without the need
for an element to contain information about speech production. For the grammar to specify
any mapping between elements and articulation would be at best unnecessary, and at worst
counter-productive, since there is not always a one-to-one correspondence between the shape
of the vocal tract and the resulting sound. Consider a trained ventriloquist, for example, who
can reproduce the speech signal pattern associated with bilabial stops but without using
conventional lip closure. Even untrained speakers typically have available to them a choice of
different articulatory configurations for creating the same acoustic result. For example, to
bring about a general downward shift in vowel formant values, creating a flattening of
the sound spectrum (Jakobson, Fant & Halle 1952: 31), speakers may employ lip rounding,
or a contraction of the pharynx, or a combination of the two.1 In sum, an element in
phonological representation establishes which signal pattern a speaker must aim for, but it
does not prescribe what the speaker must do to reach the target. A suitable articulation is
something that speakers master only through being experienced users of their native language.
Before returning to the issue of distinctive features, let us review the way some basic
phonological concepts should be (re)defined in light of the preceding discussion on the nature
of Element Theory. First, the elements themselves are to be seen as acoustic images,
primarily as cognitive objects which are present in lexical representations and which serve to
encode contrasts and alternations. However, elements also connect to the external world by
having a direct physical interpretation: they are mapped onto certain acoustic patterns in
the speech signal which carry linguistic information. Thus a phonological representation may
be thought of as a code which allows language users to store and identify these mental
acoustic patterns.
1 For further examples, see Harris & Urua (2001: 79).
In contrast, speech production is an aspect of language use which is not controlled by
the grammar. Tongue position, glottal state, lip attitude and the like do not constitute
linguistic information; rather, they provide a way of delivering the speech signal. So
articulation serves as a vehicle for carrying the linguistic message, but it does not constitute
the message itself. To reinforce this point, we need only consider the communication process:
when a hearer perceives information-bearing patterns in the speech signal, each pattern acts
consistently and reliably as a cue to its associated element: it makes no difference whether
the signal originates from the articulation of an actual utterance, or from the recording of an
actual utterance, or from a synthesized, unarticulated voice on a computer. In each case the
linguistic message is the same, regardless of whether the vocal organs are involved or not,
since articulation is not a component of the mental grammar.
In conclusion, there is little evidence to support the prevailing view that the basic
units of segmental structure are defined in articulatory terms. For this reason, section 3 will
argue for an alternative view of phonological representations in which features or elements
are mapped onto certain patterns in the speech signal. Although these patterns can be
characterized by their acoustic properties, they are to be understood primarily as cognitive
units which carry linguistic information about the identity of morphemes.
2.4 Monovalency versus bivalency
Before going on to introduce the elements in detail, this section addresses another issue
concerning the use of distinctive features: should features (or elements) in representations be
monovalent (single-valued) or bivalent (binary-valued)? The standard model follows a
tradition of employing bivalent features, meaning that the grammar marks the presence of a
phonological property by specifying a positive feature value, while the absence of that
property is shown by the corresponding negative value. For example, l-sounds are specified
as [+lateral] while all other sounds are [−lateral]; this creates an equipollent distinction
between lateral and non-lateral, according to which [+lateral] and [−lateral] appear to have
equal status because the grammar is able to refer to either category. But alongside bivalent
features such as [±lateral] we also find a number of monovalent features being used in some
versions of the standard model (Steriade 1995). Unlike [±lateral], a monovalent feature such
as [round] can only refer to the presence of a given property, not to its absence. This creates a
privative distinction between the opposing categories, because only a single value of the
feature can be expressed in representational terms.
(1)
                   [] vs. []                [] vs. []
  a. bivalency     [+round] vs. [−round]    [−lateral] vs. [+lateral]
  b. monovalency   [round] vs. ∅            ∅ vs. [lateral]
As (1) shows, there are two ways of referring to the same phonological contrast,
because there are two ways of expressing the absence of a certain property. For example, to
describe a back unrounded vowel such as [] we can either use [−round] (i.e. the negative
value of the bivalent feature [±round]) or we can choose to make no reference to rounding, as
indicated in (1) by ∅ (i.e. the monovalent feature [round] is absent from the segment's
representation). At first sight, the difference between [−round] and ∅ seems trivial, because
the same contrast can be expressed in both systems. However, several authors including
Durand (1995), Kaye (1989), Harris (1994) and Roca (1994) have noted that the choice
between bivalency and monovalency affects our predictions about how language sounds are
grouped into natural classes and how they participate in phonological processes. That is, the
two systems make different grammatical statements.
To illustrate this point, consider the representation of nasal vowels such as [] and [].
These belong to a natural class: a non-random group whose members all share some
physical characteristic (nasal resonance) and, more importantly, some pattern of phonological
behaviour (e.g. vowel lowering, trigger of nasal harmony). It is assumed that these shared
physical and phonological characteristics are an indication that the same structural property
(in this case, nasality) is specified in the representation of each member of the natural
class. In other words, the common structural property defines the natural class. Furthermore,
most theories of segmental structure assume that this class-defining property corresponds to a
basic, indivisible unit in phonological representation, typically a feature or an element. In this
example the basic property is nasality, so it follows that every nasal vowel must have a
nasality feature/element in its segmental make-up.
The Amerindian language Warao (Osborn 1966) illustrates how monovalency and
bivalency make different grammatical predictions (data from Botma 2005):
(2) a. 'sun'        c. 'summer'
    b. 'walking'    d. 'kind of tree'
As (2a-b) show, this language has a lexical contrast between oral and nasal vowels. So in a
monovalent system of representation, the feature [nasal] appears in the structure of [] in (2b),
while [] in (2a) makes no reference to [nasal] and is therefore interpreted as an oral vowel.
Alternatively, under bivalency [] is specified as [+nasal] while [] has [−nasal]. (2c-d) show
that Warao also has a process of nasal harmony, where the presence of a nasal trigger (a nasal
vowel or nasal consonant) causes all target sounds (vowels, laryngeals, glides) to its right to
be nasalised within the word domain. Any harmonic trigger in Warao is characterised as a
segment with [nasal]/[+nasal] in its lexical representation, where this feature defines a natural
class of nasals all united by similar (harmonic) behaviour.
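
The rightward spreading just described can be sketched in Python. This is an illustration only: the segment labels, the TARGET_CLASSES set and the function name spread_nasal are hypothetical and are not Warao data. The sketch also assumes that a non-target segment (e.g. an obstruent) halts spreading, which the description above does not state explicitly.

```python
# Hypothetical sketch of rightward nasal harmony within a word.
# Each segment is a (label, features) pair, where features is a set of
# monovalent property names. Labels and classes are illustrative only.

TARGET_CLASSES = {"vowel", "laryngeal", "glide"}  # sounds that can be nasalised


def spread_nasal(word):
    """Return a copy of `word` with [nasal] spread rightwards from each trigger."""
    result = []
    spreading = False
    for label, features in word:
        features = set(features)          # copy, so the input is not mutated
        if "nasal" in features:
            spreading = True              # a lexical nasal acts as a trigger
        elif spreading and features & TARGET_CLASSES:
            features.add("nasal")         # a target to the right of a trigger
        else:
            spreading = False             # assumption: non-targets block spreading
        result.append((label, features))
    return result


# A made-up word: nasal consonant, three targets, an obstruent, a vowel.
word = [
    ("m", {"consonant", "nasal"}),
    ("e", {"vowel"}),
    ("h", {"laryngeal"}),
    ("o", {"vowel"}),
    ("k", {"obstruent"}),
    ("o", {"vowel"}),
]
for label, features in spread_nasal(word):
    print(label, "nasal" in features)
```

In this made-up word only the lexical nasal carries [nasal] in the input; the three targets to its right acquire it, while the obstruent and the vowel after it stay oral.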
As expected, oral vowels do not act as harmonic triggers in this language, because
they have no [nasal]/[+nasal] specification. Moreover, they do not constitute a natural class
because they display no unified active behaviour.2 Importantly, the fact that [−nasal] (i.e.
oral) vowels collectively do not do something provides no justification for grouping them
together as a natural class. Yet this is exactly what the bivalent feature system does. Allowing
[−nasal] to appear in representations gives it a grammatical status equal to that of [+nasal],
making it possible for the phonology to refer to [−nasal] as well as [+nasal] as an active
property in some phonological process. However, the evidence does not support this position:
for example, we find no comparable process of oral harmony in which [−nasal] acts as a
harmonic trigger and oralises nasal vowels. In short, it is difficult to motivate the bivalency
prediction that [−nasal] and [+nasal] both exist as basic structural properties, and hence, as
two separate natural classes.
It seems, then, that the problem with bivalent features arises from their ability to refer
to negative categories, that is, to properties which are absent from a segment's structure.
To reinforce this point, consider other negative features besides [−nasal] that characterise oral
vowels under a bivalent feature system. Oral vowels are all non-lateral, for example. But the
feature [−lateral] does not define a natural class either, because it identifies a whole range of
sound classes besides oral vowels (e.g. obstruents, nasal stops, rhotics) which cannot be
unified by the presence of even a single common property. Compare this with a true natural
class such as [+nasal], whose members comprise nasal vowels and consonants; all and only
these sounds act as harmonic triggers in Warao because only these sounds possess the active,
class-defining feature [+nasal].
2 Note that [−nasal] fails to capture the class of segments targeted by nasal harmony in Warao, because this set
includes some non-nasals (e.g. glides) but excludes other non-nasals (e.g. obstruents).
By contrast, the use of monovalent features makes it possible for the segmental
structure itself to show that nasal vowels form a grammatical set whereas oral vowels do not.
[nasal] identifies the nasal vowels as a natural class, while the lack of any equivalent feature
specification for oral vowels indicates that they have no common behaviour; furthermore, it
prevents the grammar from referring to them as a unified set. In more general terms, the
monovalent feature [nasal] groups together nasal vowels and consonants as a natural class, as
evidenced by Warao nasal harmony, whereas the arbitrary set of non-nasal segments (oral
vowels and all non-nasal consonants) displays no common properties and consequently has
no feature specification to indicate natural class status.
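
The difference in what each system lets the grammar refer to can be made concrete with a small Python sketch. The mini-inventory, the segment labels and the helper names below are hypothetical and chosen only for illustration; they do not come from the Warao data.

```python
# Hypothetical mini-inventory. Under bivalency every segment carries a value
# for [nasal]; under monovalency only nasal segments mention [nasal] at all.

BIVALENT = {
    "a":  {"nasal": "-"},
    "a~": {"nasal": "+"},   # "a~" stands for a nasal vowel
    "n":  {"nasal": "+"},
    "t":  {"nasal": "-"},
    "w":  {"nasal": "-"},
}

MONOVALENT = {
    "a":  set(),
    "a~": {"nasal"},
    "n":  {"nasal"},
    "t":  set(),
    "w":  set(),
}


def bivalent_class(feature, value):
    """Segments the grammar can pick out with [+feature] or [-feature]."""
    return {seg for seg, spec in BIVALENT.items() if spec.get(feature) == value}


def monovalent_class(feature):
    """Segments the grammar can pick out with [feature]; absence is not addressable."""
    return {seg for seg, spec in MONOVALENT.items() if feature in spec}


print(sorted(bivalent_class("nasal", "+")))   # the nasals
print(sorted(bivalent_class("nasal", "-")))   # a second addressable set: everything else
print(sorted(monovalent_class("nasal")))      # the nasals; the rest cannot be named
```

The bivalent lookup returns a set for either value, so the non-nasal residue becomes a class the grammar could in principle manipulate. The monovalent lookup offers no query that returns it.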
The conclusion to be drawn from this comparison between monovalent and bivalent
features is that bivalency makes for an altogether less restrictive system. Since bivalency
forces representations to specify either the presence or the absence of a given property, the
number and nature of specified (and therefore potentially active) phonological properties
exceeds what is actually observed in natural languages. In other words, it predicts the
possibility of many phonological processes (and therefore many grammars) that would
presumably be ruled out by a more constrained theory. Of course, the notion of restrictiveness
now plays a relatively minor role in theory building. By contrast, in early generative theory
the issue of restricting the generative capacity of the grammar was of central concern, when
the focus was on developing a model that could generate any possible grammar and at the
same time rule out any impossible one.
Even the authors of SPE recognised that the use of bivalent features did not square
easily with the generative ideal. This is clear from the final chapter of SPE, where they
acknowledge an asymmetry between the two values of a feature which cannot be expressed
simply by plus or minus. Their response was to propose a theory of markedness: an
independent mechanism for calculating the grammatical significance of different feature
values, these calculations being based on cross-linguistic generalisations about the choice of a
default or unmarked value over its opposite value. According to their proposal, the relative
markedness of [+feature] or [−feature] could be determined on the basis of, for example, how
widely a feature value was distributed across languages and the stage of acquisition when a
feature value is first used. However, the elaborate way in which markedness theory was
formulated does little to disguise its true identity as a repair strategy and an admission that
valency of a feature appears to be an inherent and unpredictable property of that feature,
simply an observation about its behaviour in the phonology.
But if the task of identifying the basic units of segmental structure comes down to one
of observing active properties, then it is logical to assume that we can observe only what is
there, not what is absent. This means that if [+ant] and [−ant] are both active in the grammar,
they must represent two distinct, equal and independent (albeit complementary) properties
that are both in some sense positive. As such, they are better expressed as a pair of
monovalent features such as [anterior] and [posterior].3 Moreover, if the same idea can also
be extended to other cases where polar values are typically used, then it becomes feasible to
dispense with bivalency altogether: each negative feature displaying active phonological
behaviour is replaced with an equivalent monovalent feature, as illustrated by the
hypothetical example [−ant] → [posterior], while redundant negative features are simply
ignored because they are linguistically insignificant. The result is a wholly monovalent
approach to the representation of segmental properties. This is the position taken in Element
Theory. The following sections will show how the notion of element is entirely consistent
with the theoretical conclusions drawn above: units in segmental representation should be
monovalent and should map onto linguistically significant patterns in the speech signal.
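
The replacement procedure described above can be sketched as a simple mapping in Python. The tables ACTIVE_NEGATIVES and POSITIVE_NAMES and the function name to_monovalent are hypothetical; the pairing of [−ant] with [posterior] is the text's own hypothetical example, and which negative values count as active would have to be established language by language.

```python
# Sketch: recast a bivalent specification as a set of monovalent features.
# Active negative values receive a positive monovalent counterpart; all other
# negative values are dropped as linguistically insignificant.

ACTIVE_NEGATIVES = {"ant": "posterior"}   # the text's hypothetical example
POSITIVE_NAMES = {"ant": "anterior"}      # renaming of the corresponding plus value


def to_monovalent(bivalent_spec):
    """Convert a dict of feature name -> '+' or '-' into a set of monovalent features."""
    features = set()
    for name, value in bivalent_spec.items():
        if value == "+":
            features.add(POSITIVE_NAMES.get(name, name))
        elif name in ACTIVE_NEGATIVES:
            features.add(ACTIVE_NEGATIVES[name])
        # any other negative value is redundant and leaves no trace
    return features


print(sorted(to_monovalent({"ant": "-", "nasal": "-", "round": "+"})))
# ['posterior', 'round']
```

Note that [−nasal] disappears from the output entirely, whereas [−ant] survives as a positively named property.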
2.5 Elements and the grammar
From the way phonological representations are formulated in the standard approach, it is easy
to gain the impression that features occupy a separate and autonomous level of structure. Of
course they do show a direct relation with prosodic structure, by virtue of being associated to
syllabic constituents or to intervening timing units. But they appear to play no role in
determining or even influencing other aspects of the phonology. This is clear from the fact
that features have been transferred from the standard approach to quite different theoretical
models like Optimality Theory (Kager 1999, McCarthy 2002) without the need for any
modification. In the case of elements, however, the same is not true: here I show how the
decision to employ elements in representations goes hand in hand with other decisions about
the shape of the grammar. In 2.3 it was argued that elements should map onto patterns in the
acoustic signal, and in addition, in 2.4 it was claimed that they should be single-valued. Let
us now consider the effects of these two conditions on the phonological model as a whole. It
3 To my knowledge, [posterior] has never been seriously considered as a member of the feature set. However,
we do find legitimate cases where the standard approach has recast a single bivalent feature in monovalent
terms: for example, [±ATR] may be redefined as [ATR] and [RTR].
marked or positive property. Although [i] does have other phonetic qualities including (in
traditional feature terms) [+high] and [−round], Element Theory treats these as unmarked and
phonologically inactive;4 as such, they are not specified in this vowel's structure. When a
speaker interprets |I| as [i] the result in phonological terms is pure frontness of F2, since no
other elements are present to indicate other marked properties. This is also the reason why [i]
is interpreted with the default phonetic qualities [+high] and [−round]: a [+high] vowel results
from the absence of the open element |A| (see footnote 4), while a [−round] vowel is the
phonetic byproduct of there being no round element |U| in the representation of [i]. The
elements |I A U| are discussed fully in the following section.
The previous paragraph has outlined one of the distinguishing properties of Element
Theory namely, the independent phonetic interpretability of elements. Yet an elementsability to be interpreted in isolation is something which relates not only to segmental structure
but more generally to the organization of the phonology as a whole. If phonological
representations are pronounceable as they stand, then in principle Element Theory needs no
separate level of phonetic representation. In other words, the use of elements implies a
monostratal organisation of the phonology. Once again this marks a significant departure
from the standard approach, which assumes a bi-stratal (or multi-stratal) model in which two
(or more) levels of representation are required because each serves a different function:
(4)  underlying representation    function: lexical storage
                                  (units: abstract, contrastive)
     surface representation       function: input to articulation/perception
                                  (units: concrete, phonetic)
The traditional arrangement in (4) presents phonology as a device for creating
phonetic objects, that is, for taking abstract phonological forms and converting them into
concrete phonetic forms that can serve as the input to external language processes such as
articulation and perception. As Harris (1994) points out, however, this renders phonology a
performance system, its purpose being to generate phonetic representations and check the
4 To capture the height dimension in vowels, Element Theory posits |A| as the marked property. The element
|A| loosely equates with the feature [+low], therefore high (i.e. non-low) vowels like [i] make no reference to |A|
in their representations. Section 3 describes the vowel elements in detail.
[Label displaced from (4): structure-changing operations]
grammaticality of utterances. In effect, it places phonology outside linguistic competence and
thus outside the confines of the grammar. Yet treating phonology as extra-grammatical
clearly goes against our understanding of what language users know. We assume, for instance,
that linguistic knowledge includes knowledge of certain phonological generalisations like
patterns of alternation and distribution, which are evidently part of linguistic competence
because they exist independently of articulation and/or perception.5
So by assuming a derivational model as in (4), the standard approach gives phonology
a somewhat ambiguous status with respect to its role in the grammar. At best, we might say
that the standard approach allows phonology to straddle both sides of the traditional division
between competence and performance: by capturing a language's structure-changing
operations (i.e. rules or constraints) it relates to competence, whereas by preparing lexical
forms for articulation and/or perception (i.e. derivational output) it relates to performance.
Clearly, however, this situation is at odds with the general assumption that phonology should
be treated as part of the core grammar.
In response, Element Theory avoids this ambiguity by keeping phonology entirely
within the domain of linguistic competence. In an element-based phonology, therefore,
phonological processes do not create phonetic or pronounceable forms; in fact, they have no
direct connection with utterances. Unlike in derivational models, their role is not to take an
abstract representation and convert it into something more physical; rather, they take an
abstract phonological form, such as a stored lexical representation, and impose structural
regularities on it so that it conforms to the grammar of a given language. For example, they
may force contiguous consonants to agree in voicing, or they may cause vowels to shorten in
closed syllables. In other words, phonological processes control grammaticality by generating
the set of grammatical phonological structures of a language. Importantly, however, the
output of such processes will be no less abstract than the input: an element-based process can
only change a phonological object into another phonological object.
Of course, the inability of an element-based phonology to generate phonetic forms is
countered by the phonetic interpretability of elements. As discussed above, it is proposed that
any element expression can be mapped onto its corresponding physical pattern in the speech
signal; moreover, this can take place at any stage of derivation, since lexical representations
and derived representations are assumed to be of the same type. In principle, then, any lexical
5 The traditional bi-stratal model in (4) is also motivated by the supposed advantage of separating idiosyncratic
information (in lexical storage) from predictable information (in the structure-changing component). As Harris
(1994) points out, however, this position has never been strongly defended in the psycholinguistics literature.
form may be interpreted by a speaker or hearer as it stands. In practice, however, the result is
likely to be an ungrammatical string, because in such cases the phonology has not imposed its
characteristic effects on the grammaticality of the structure in question. So although lexical
forms in Element Theory have much in common with derived forms (for example, both
involve abstract phonological representations, both employ the same structural units, and
both can be pronounced as they are), it is derived forms which are consistently grammatical
and thus relevant to the process of information exchange via the speech signal.
2.6 Summary
What this discussion has shown is that the Element Theory approach to representation takes a
more abstract view of phonology than we find in the standard approach, in the sense that
phonology itself is seen as being concerned only with abstract or cognitive objects. On the
one hand, the standard approach operates primarily as a performance system, generating
phonetic forms and thereby bridging the divide between the cognitive and the physical. On
the other hand, the element-based approach operates exclusively within the cognitive domain,
providing a system for organising language users' knowledge about phonological strings and
about the internal structure of morphemes. So Element Theory incorporates phonology into
the competence grammar as follows:
(5)
component     controls              determining
syntax        sentence structure    how words behave in sentences
morphology    word structure        how morphemes behave in words
phonology     morpheme structure    how elements behave in morphemes
As a component of the cognitive grammar, phonology in Element Theory has little to
say about raw phonetics. Like other theoretical approaches, it does recognise the role of
phonetic factors such as ease of articulation and/or perception in shaping the phonology; but
unlike most other approaches, it does not see any place for phonetic factors in mental
phonological representations. Similarly, speech production is viewed as a grammar-external
process, specifically as a system for transmitting linguistic information; this effectively
puts articulation on a par with writing, since both of these media function as vehicles for
delivering language but neither actually constitutes the linguistic information itself. After all,
the inability to write does not prevent a person from acquiring a normal grammar, and neither
does the inability to speak.
Taking all these points into consideration, this paper develops a model of segmental
representation which uses monovalent elements as the basic units of phonological structure.
Elements represent the cognitive categories that are responsible for conveying linguistic
information about the structure of morphemes. For the purposes of communication, elements
also connect to the physical world by mapping onto information-bearing patterns that humans
perceive in the speech signal. However, their cognitive function remains primary. This means
that the process of identifying elements should begin with an analysis of phonological
behaviour (e.g. distribution, alternation, natural classes); only after an element has been
identified as a grammatical unit can it be associated with a particular speech signal pattern. In
other words, phonological structure is determined primarily through data analysis, and only
secondarily through listening.
3: Element Theory and the Representation of Vowels
3.1 Introduction
Section 2 considered some of the problems inherent in the standard feature-based approach to
segmental representation. It also claimed that these problems could be overcome by imposing
certain conditions on the way the basic units of segmental structure are formulated. In
particular, it advocated single-valued features which stand for abstract phonological
categories. These features, which I will refer to as elements, are the units which characterize
the lexical shape of morphemes but which also map onto information-bearing acoustic
patterns in the speech signal.
Element Theory claims that the segmental properties of all languages are described
using the set of six elements |A I U H N ʔ|. These fall naturally into two subgroups, |A I U|
and |H N ʔ|, the former being associated primarily with vowel structure and the latter with
consonant structure. Admittedly, this split between vocalic and consonantal elements is
something of an oversimplification, since vowel elements do appear in the representation of
consonants, and vice versa. Indeed, as a consequence of abandoning distinctive features, it
becomes possible to play down the importance of the traditional categories 'vowel' and
'consonant' and instead treat these terms simply as informal labels. So for the sake of
convenience I will continue to refer to vowels and consonants as segment types, but this does
not imply any formal bifurcation in terms of their segmental structure. This paper will focus
on vowel representations and therefore on the role of the elements |A I U|. For a description
of consonant representations and the remaining elements |H N ʔ|, see Backley (in prep).
Before discussing the structure of vowels in detail, it is worth making the point that
the set of vowel elements in (6a) is smaller than an equivalent set of features such as (6b):
(6) a. elements for vowels: |A|, |I|, |U|
b. features for vowels: [high], [low], [back], [round], [ATR]
In fact, this difference reflects a more general divergence between the two approaches over
the issue of generative capacity: namely, feature systems tend to over-generate while element
systems tend to under-generate. A single feature usually represents a very specific segmental
(typically articulatory) property, so in order to describe (the articulation of) a segment in full,
the grammar must call upon a sizeable number of different features. For example, Odden
(2005) uses 17 features to describe English consonants and a further 5 features to describe the
vowels. Unfortunately, however, having so many features available opens the door to serious
levels of over-generation, where the set of possible combinations of feature values (and
thus the set of possible segmental contrasts) is far larger than that required by the grammar
of any one language. To address this problem, the phonology must restrict combinability in
some way; restrictions have come in the form of feature-geometric relations (see 2.4 above)
or negative constraints such as *[+ATR, +low] (Archangeli & Pulleyblank 1994).6
In contrast to feature theories, which generate too many segmental expressions and
thus have to impose constraints on their output, Element Theory takes the opposite position
of first generating a minimal set of contrasts capable of describing only the simplest and most
common segmental inventories. As (6) shows, this is made possible by recognizing a
relatively small number of basic structural units. Now, with only a small set of elements to
hand, the phonology must have ways of expanding its generative capacity to accommodate
larger and more complex systems of contrast. Yet Element Theory regards this as the
preferred position, on the grounds that the under-generation approach is more restrictive because it
gives the grammar greater control over the size and shape of segmental systems. So the
function of an element-based grammar is to generate a small set of attested forms rather than
to eliminate a potentially large set of unattested ones. In this way, the set of vowel elements
in (6a) is intentionally small, a fact which reflects the way Element Theory is committed to
addressing the issue of excessive generative capacity that continues to characterize feature-
based models.
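The difference in generative capacity can be made concrete with a rough count. The sketch below is illustrative only. It assumes that each of the five vowel features in (6b) is binary and freely combinable, and that the three elements in (6a) combine as unordered sets with no further structure; neither assumption is a claim made by either theory, and the function names are hypothetical.

```python
from itertools import combinations, product

FEATURES = ["high", "low", "back", "round", "ATR"]  # the feature set in (6b)
ELEMENTS = ["A", "I", "U"]                           # the element set in (6a)


def feature_matrices(features):
    """All assignments of + or - to every feature (assumes free combination)."""
    return [dict(zip(features, values))
            for values in product("+-", repeat=len(features))]


def element_expressions(elements):
    """All non-empty unordered combinations of monovalent elements."""
    return [frozenset(combo)
            for size in range(1, len(elements) + 1)
            for combo in combinations(elements, size)]


if __name__ == "__main__":
    print(len(feature_matrices(FEATURES)))     # 2**5 = 32
    print(len(element_expressions(ELEMENTS)))  # 2**3 - 1 = 7
```

On these assumptions the feature set yields 32 candidate vowels, which must then be pruned by filters such as *[+ATR, +low], whereas the element set yields 7 expressions, which must be expanded in order to describe larger systems.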
3.2 What makes |A I U| special?
For the reasons just outlined, the set of vowel elements should initially be capable of
generating vowel systems that are typologically unmarked, that is, structurally simple and
cross-linguistically widespread. Why then should |A|, |I|, and |U| qualify as the most basic
segmental properties in such systems? Crothers (1978) and other vowel typology surveys
confirm that the universally preferred inventory has the following five-vowel arrangement:
6 Although the filter *[+ATR, +low] succeeds in capturing a distributional regularity, it is nonetheless arbitrary
in that it fails to explain why this combination is ungrammatical whereas, for example, [+ATR, −low] is
widespread. Even illogical combinations such as *[+high, +low] cannot simply be dismissed as ungrammatical
if the features in question really do stand for abstract phonological categories rather than articulatory properties.
(7)   i       u
        e   o
          a
Yet despite the unmarked status of (7), it cannot be assumed that this system of five vowels
corresponds to the presence of five basic phonological properties. For instance, we cannot
automatically treat [a i u e o] as the phonetic instantiation of a corresponding set of elements
such as |A I U E O|. In fact, there are strong arguments to indicate that the mid vowels [e o]
belong to more than one natural class (Harris 1994), which in turn suggests that [e] and [o]
are each represented by more than one element. In other words, the phonological structure of
the mid vowels [e o] is apparently not as basic as that of the corner vowels [i a u].
Treating [i a u] as the least marked vowels follows naturally from their unique
properties. In describing these properties, let us begin with language typology, and with the
fact that [i a u] are cross-linguistically very common, indeed present in almost every known
language. When we examine the smallest attested vowel systems, which usually comprise
only three vowels, we find such systems regularly employing only these corner vowels. The
examples in (8) are from Lass (1984):
(8) [ ] (Tamazight) [ ] (Quechua) [ ] (Moroccan Arabic)
[ ] (Greenlandic) [ ] (Amuesha) [ ] (Gadsup)
A comment is in order about phonetic vowel quality. On the understanding that the
vowel symbols in (8) stand for phonological categories rather than phonetic tokens, we do
expect to find some cross-linguistic variation in the way the same contrastive system is
interpreted phonetically. This applies not only to the systems in (8) but also to 5-vowel
systems. Take Spanish [ ] and Zulu [ ], for example. A comparison of, say,
Spanish [] with Zulu [] would show that these sounds have similar phonological properties
and play the same role in their respective systems. What counts in Element Theory (and in
related theories such as Dependency Phonology) is the behaviour of a sound with respect to
(i) natural classes and (ii) other contrastive sounds in the same system. Phonetic values are
not taken to be the main criterion for identifying melodic representations, which, of course,
(10) [vowel square with columns [−bk] and [+bk]]
The arrangement in (10) has an articulatory bias, as it reflects tongue position, specifically
the height and degree of backness of the tongue needed to produce different vowel sounds.
However, a vowel square fails to capture the special status of [i a u], thereby missing an
important generalization concerning typological markedness. Moreover, if Dispersion Theory
is correct in assuming that languages prefer vowels which are maximally distinct, then from
(10) we can infer that the vowels at each of the four corners of the vowel square are equally
unmarked. Yet this is clearly not the case: the [−hi,−bk] vowel [] is cross-linguistically less
common than [i] ([+hi,−bk]) or [u] ([+hi,+bk]), for example.
Here I have reviewed some of the reasons for treating [i a u] as basic vowels.
Element Theory characterizes the special status of these vowels by equating each with an
element from the set |A I U|, where these elements function as active phonological units in
vowel contrasts and vocalic processes. It should be noted that Element Theory is by no means
the first to recognize the significance of |A I U| as phonological primes. The vowel elements
are pre-dated by the particles of Particle Phonology (Schane 1984) and by the components of
Dependency Phonology (Anderson & Ewen 1987), both of which can be traced back to
three principal underlying and abstract 'characteristics' involved in vowel formation (|u|
'roundness', |i| 'frontness', and |a| 'lowness'), first proposed by Anderson & Jones (1974: 16).
What sets Element Theory apart from these other models of vowel representation, however,
is its claim that elements are associated specifically with properties of the speech signal.
Further discussion of the motivation for |A I U| can be found in Rennison (1986).
3.3 |A I U| as simplex expressions
Elements are primarily abstract units of linguistic structure: they determine the lexical shape
of morphemes, and they behave as active properties in phonological processes such as
assimilation and lenition. So we identify individual elements by studying language data, by
analyzing sound contrasts, distributional patterns and dynamic phonological changes. But in
addition, elements connect to the physical world through their association with certain
patterns in the acoustic speech signal. Once an element has been identified through its
phonological properties, an analysis of its phonetic characteristics may be carried out in order
to establish its unique acoustic signature. The typological evidence reviewed in 3.2 pointed
to the existence of three vowel elements |A I U|. This section examines the speech signal
patterns represented by these elements; then, to reinforce the status of |A I U| as phonological
primes, it considers their roles in linguistic structures and dynamic phenomena.
Element Theory assumes that language users focus on three specific patterns in the
speech signal when producing or perceiving vowels. These patterns are revealed by analysing
the distribution of energy across the frequency band from zero to around 3kHz, the
frequency range which contains the first three formants and which is therefore crucial for
perceiving vowel sounds. The figures in (11) show the signal patterns that speakers and
hearers associate with the three abstract phonological categories |A I U|. Spectrograms of the
corresponding vowel sounds [i a u] are given in (12).
(11) Spectral patterns for |I|, |A| and |U|
Figure 1: |I| as a dIp    Figure 2: |A| as a mAss    Figure 3: |U| as a rUmp
(12) Spectrograms of [i a u] showing the first three formants
Figure 4: [i]    Figure 5: [a]    Figure 6: [u]
The pattern for |I| in figure 1 consists of two energy peaks with a characteristic dip in
between. One peak is located at the lower end of the vowel spectrum at around 500Hz (on the
horizontal axis), and the other is at the upper end at approximately 2.5kHz. The peaks
themselves represent bands of energy, typically resulting from the convergence of two
formants; so the same pattern can also be extracted from the spectrogram for [i] in figure 4.
This figure shows a low F1 value for the high vowel [i], as indicated by the concentration of
energy in the 0-500Hz range (cf. the leftmost peak in figure 1). This vowel also has a high F2
converging with F3 at around 2.5kHz, which creates a concentration of energy at the top of
the spectrum (cf. the rightmost peak in figure 1). The sharp drop in energy in the middle of
the spectrum, corresponding to the lighter area between 1-2kHz in figure 4, gives |I| its
mnemonic label dIp.7
The signal pattern for the element |A|, on the other hand, has the informal label mAss.
This term describes a mass of energy located in the centre of the spectrum, peaking at around
1kHz. As figure 2 shows, there is a drop in energy on either side of this mass. The same
characteristic mAss pattern is reflected in the spectrogram for [a] in figure 5, where the
energy peak results from a high F1 value converging with F2 in the 1kHz region. Finally, the
speech signal pattern for the element |U| is characterised by a concentration of energy at the
lower end of the spectrum. In figure 3 the energy peaks are contained within the 0-1kHz band,
while across the higher frequency range we observe a steady fall. This falling spectral shape
has been dubbed rUmp. Again, the pattern is visible in the spectrogram for the corresponding
vowel: figure 6 shows how [u] involves a lowering of all formants, with F1 at around 500Hz
and F2 at around 1kHz.
Of course, the formant patterns in figures 4-6 are subject to some inter-speaker (as
well as intra-speaker) variation. Nevertheless, the above samples taken from my own speech
should illustrate the general physical correlates of the phonological categories |A I U| when
each element is interpreted in isolation. In fact, from an Element Theory point of view such
variation is of no linguistic consequence, since the theory defines elements only in terms of
their overall spectral pattern (i.e. dIp, mAss and rUmp) and not by referring to raw
acoustic data such as precise formant values. In the preceding paragraphs I have used specific
frequency values to describe each spectral pattern in a precise way; but it must be stressed
that numerical data of this kind is for descriptive purposes only; it has no formal place in
the Element Theory grammar.8 A fuller description of the spectral properties of |A I U| can be
found in Harris & Lindsey (1995).
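Purely as a descriptive aid, the three spectral shapes can be told apart from the first two formants alone. The sketch below is not part of Element Theory, which gives numerical values no formal status. The cut-off values and the sample formant values are hypothetical round numbers, chosen only so that the three descriptions given above come out distinct, and the function applies to corner-vowel spectra only.

```python
def spectral_pattern(f1_hz, f2_hz):
    """Return 'dIp', 'mAss' or 'rUmp' for a corner-vowel-like spectrum.

    The cut-offs (1500 and 700) are hypothetical values used for
    illustration; they are not taken from the theory or from measurements.
    """
    if f2_hz - f1_hz >= 1500:
        # Low F1 and high F2 leave an energy gap in the middle of the spectrum.
        return "dIp"
    if f1_hz >= 700:
        # A high F1 close to F2 concentrates energy in the centre, near 1kHz.
        return "mAss"
    # Otherwise both formants are low and energy falls away above about 1kHz.
    return "rUmp"


if __name__ == "__main__":
    # Illustrative (F1, F2) pairs in Hz, loosely following the prose description.
    samples = {"i": (300, 2500), "a": (900, 1100), "u": (500, 1000)}
    for vowel, (f1, f2) in samples.items():
        print(vowel, spectral_pattern(f1, f2))
```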
3.4 |A I U| in compounds
3.4.1 Phonetic evidence for element compounds
7 The labels dIp, mAss and rUmp are taken from Harris (1994: 139).
8 Not all models of segmental structure take this position. For example, Flemming (2002) proposes that scales of
formant values be incorporated directly into vowel representations.
The definition of elements as speech signal patterns appears to be consistent with the Quantal
Theory explanation for why languages favour triangular vowel systems bounded by |A I U|.
As noted above, Quantal Theory assumes that each corner of the vowel triangle is associated
with a unique and unambiguous acoustic pattern, which is exactly what the vowel elements
represent. The original Quantal Theory descriptions, which refer to patterns of converging
vowel formants, are redefined in (13) in terms of the impressionistic spectral shapes shown in
figures 1-3:
(13)
                        |I|           |A|           |U|
position of peak(s)     low + high    centre        low
position of trough(s)   centre        low + high    centre + high
The summary in (13) shows that each vowel element has a pattern which is not only unique
but also highly distinct, given the small number of variables involved. So the three-way
contrast between [i], [a] and [u] should be easy to recognise, and moreover, difficult to
confuse, just as the quantal approach predicts. However, most languages have vowel systems
containing more than just [i a u], which means they must allow elements to combine into
compound expressions. Let us now look at compounding in more detail. We first examine the
effects of compounding on the speech signal, and then consider the phonological properties
of compounds.
It will be recalled from 3.2 that the universally unmarked vowel system consists of
the corner vowels [i a u] plus the mid vowels [e o]. It has already been argued that [i a u]
have a special status as basic vowels, which is reflected in the way each corresponds to a
primary unit of phonological structure, i.e. an element. In contrast, the mid vowels do not
share this status. Instead, the phonological evidence indicates that [e o] are each the result of
combining two elements and interpreting these simultaneously: [e] is represented by the
compound |I A| while [o] comes from |U A|. Now, assuming that every element is associated
with a spectral pattern, and further assuming that all information relating to element structure
is transmitted via the speech signal, we can expect the speech signal itself to contain complex
spectral patterns when a mid vowel is interpreted. The spectral patterns for mid vowels are
shown in (14) and (15):
(14) Spectral pattern for |I A| (versus |I|)
Figure 7: |I A| ([e]) versus Figure 8: |I| ([i])
The mid vowel [e] results from the interpretation of the compound expression |I A|,
with both elements contributing to the overall shape of the composite spectral pattern in
figure 7. In the centre of the spectrum we find the dip between F1 and F2 that characterises |I|,
though this is both narrower and shallower than in the pure dIp pattern in figure 8 (repeated
from figure 1). The difference introduced in figure 7 is accounted for by the presence of |A|,
which produces an energy mass in the same central region with troughs on either side. In
short, the |I A| compound creates a dIp within a mAss: a large central mass of energy
containing a dip inside it.
(15) Spectral pattern for |U A| (versus |U|)
Figure 9: |U A| ([o]) versus Figure 10: |U| ([u])
The mid vowel [o] is the result of interpreting the compound expression |U A|. In
figure 9 the presence of |U| ensures that a concentration of energy is maintained at the lower
end of the spectrum, as we find with the pure rUmp pattern in figure 10 (repeated from figure
3). Unlike [u], however, where the energy peak is located very near the bottom of the
spectrum, the mid vowel [o] shows a concentration of energy somewhat closer to the central
region; as Harris & Lindsey (2000) point out, the energy peak in [o] is far enough above the
bottom of the frequency range to constitute a mAss, with troughs above and below (Harris &
Lindsey 2000: 196). So the |U A| compound produces a rUmp within a mAss: a centralised
mass of energy which falls as the frequency increases.
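The relation between element expressions, spectral labels and vowels described in this section can be summarised as a small lookup. The sketch below is an informal illustration and not a formal statement of the theory. The peak and trough positions follow table (13); the vowel symbols are those assumed here for the five-vowel system; the names `SIGNATURE`, `INTERPRETATION` and `interpret` are hypothetical.

```python
# Peak and trough positions for each element, following table (13).
SIGNATURE = {
    "I": {"label": "dIp",  "peaks": {"low", "high"}, "troughs": {"centre"}},
    "A": {"label": "mAss", "peaks": {"centre"},      "troughs": {"low", "high"}},
    "U": {"label": "rUmp", "peaks": {"low"},         "troughs": {"centre", "high"}},
}

# Vowel symbols assumed for the simplex and compound expressions discussed so far.
INTERPRETATION = {
    frozenset("I"): "i",
    frozenset("A"): "a",
    frozenset("U"): "u",
    frozenset("IA"): "e",
    frozenset("UA"): "o",
}


def interpret(expression):
    """Map a set of elements to a vowel symbol and its spectral labels."""
    elements = frozenset(expression)
    vowel = INTERPRETATION[elements]  # raises KeyError for other expressions
    labels = sorted(SIGNATURE[e]["label"] for e in elements)
    return vowel, labels


if __name__ == "__main__":
    print(interpret("I"))         # ('i', ['dIp'])
    print(interpret({"A", "I"}))  # ('e', ['dIp', 'mAss'])
    print(interpret({"U", "A"}))  # ('o', ['mAss', 'rUmp'])
```

Because expressions are stored as unordered sets, |I A| and |A I| are treated as the same object here; any asymmetry between the elements in a compound is deliberately left out of this sketch.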
3.4.2 Phonological evidence for element compounds
So there is phonetic evidence to indicate that mid vowels are complex structures: the spectral
pattern for |I A| (= [e]) combines mAss and dIp, while the pattern for |U A| (= [o]) combines
mAss and rUmp. But structural complexity is primarily a phonological property, which means
that support for the existence of element compounds like |I A| and |U A| should come
primarily from phonological evidence. In the case of mid vowels, the evidence focuses on the
way the individual elements in a compound become visible under certain phonological
conditions. In other words, the phonology allows us to see inside complex expressions and
observe their internal composition.
The following examples are, above all, intended to support the existence of element
compounds in the grammar. Additionally, however, they reinforce the status of |A I U| as
phonological primes, since they demonstrate how these elements regularly participate as
active units in various dynamic phenomena. In this section I shall discuss examples of vowel
processes which make reference only to the five vowels [i e a o u] introduced so far. In
general, these processes cause the internal (element) structure of a vowel to be reorganised or
reinterpreted in some way. This is illustrated by processes such as monophthongisation,
diphthongisation and vowel coalescence. Other process types that demonstrate the workings
of element-based representations include vowel harmony and vowel reduction; I shall touch
on these below, after having discussed the structure of element compounds in more detail.
The history of English provides numerous cases of monophthong formation and
diphthong formation. Following Harris (1994: 100), I describe these two processes together,
since one is essentially a reversal of the other. Many dialects of late Middle English had the
diphthongs [](~[]) and [] in the following words (data from Jones 1989):
(16) a. Middle English []/[]       b. Middle English []
        day [] day                    law [] law
        eight [] eight                dauhter [] daughter
        vain [] vain                  naught [] not
        pay [] pay                    baul [] ball
During the sixteenth and seventeenth centuries, however, these diphthongs began to develop
the monophthongal realisations [] and [], respectively, which survive in some dialects of
Modern English: for example, British English retains [] in law [] and ball [], while
some regions in northern England also pronounce [] in eight [] and pay []. Expressed
in |A I U| terms, this monophthongisation process involves a simple reorganisation of the
elements in the original diphthong:
(17) a. []        []          b. []        []
        N         N              N         N
       x x       x x            x x       x x
       |A|       |A|            |A|       |A|
       |I|       |I|            |U|       |U|
(17a) shows how the interpretation of the expression |A I| has changed during the
development of the English vowel system. In late Middle English |A| and |I| were interpreted
separately, resulting in a diphthong []. In this case, speakers distributed |A| and |I| across the
two prosodic positions in the nuclear domain. Later, however, language users began to
interpret the same elements simultaneously, thereby producing a mid vowel [].9 Segmental
reconfiguration of this kind typically leaves the prosodic structure untouched, so the later
interpretation [] is still tied to a long nucleus. (17b) shows how back diphthongs also
underwent a similar reconfiguration process.
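The reconfiguration in (17) can be sketched as an operation on a two-position nucleus. The sketch below is an illustration under stated assumptions and not the formal analysis: each prosodic position is modelled as a set of elements, a diphthong distributes the elements across the two positions, and a long monophthong is modelled as both positions carrying the full expression. The function names are hypothetical.

```python
def element_content(nucleus):
    """Union of the elements found anywhere in the nucleus."""
    return frozenset().union(*nucleus)


def monophthongise(nucleus):
    """Interpret all elements of the nucleus simultaneously.

    The number of positions is preserved, so the result is still a long
    nucleus; only the distribution of the elements changes.
    """
    merged = element_content(nucleus)
    return [merged for _ in nucleus]


if __name__ == "__main__":
    diphthong = [frozenset("A"), frozenset("I")]  # |A| then |I|, as in (17a)
    long_mid = monophthongise(diphthong)
    print(long_mid == [frozenset("AI"), frozenset("AI")])            # True
    print(element_content(diphthong) == element_content(long_mid))   # True
```

The second check reflects the point that no element is added or removed: the two forms differ only in how the same material is distributed over the nucleus.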
Importantly, monophthong formation comes about as a result of speakers and hearers
adjusting their interpretation of the original diphthong structures. The lexical structures
themselves are unchanged nothing has been added or removed. In the absence of any
representational changes, then, what we see in (17) is the mid vowel interpretations [] of
the compound expressions |A I| and |A U|, respectively. On this basis, it should come as no
surprise that other ways of reinterpreting the same structures have also emerged. For example,
9 The compound expression |A I| can be interpreted as either [] or []. Clearly, in languages with a []~[]
contrast these vowels must have distinct representations. This will be discussed below.
Estuary English (South-East England) has since reverted to a diphthong realisation of |A I|:
day [], eight []. By contrast, in RP and many other dialects we also find a diphthongal
reinterpretation: day [], eight []. These are illustrated in (18):
(18) a. Estuary English: day [i]      b. RP English: day []
          N          N                  N          N
         x x        x x                x x        x x
       d |A|      d |A|              d |A|      d |A|
         |I|        |I|                |I|        |I|
So, historical and dialectal evidence indicates that mid vowels are represented by
compound element expressions. Further support for the structures |A I| and |A U| comes from
other cases of English dialect variation, and in particular from the simplification (in effect,
monophthongisation) patterns found in various African Englishes. The examples in (19) are
taken from Simo Bobda (2007):
(19) a. [][] diphthong simplification
like [] Sierra Leone, Liberia
finding [] Zambia
primary [] Kenya
tribe [] Uganda
b. [][] diphthong simplification
round [] Kenya
mouth []~[] West African Pidgin
town [] Liberia
house [] Krio
The process of diphthong simplification in African Englishes seems to be accompanied by
concomitant vowel shortening, as these cases of monophthongisation tend to result in a short
vowel. Nevertheless, as far as their segmental structure is concerned they reinforce the
patterns described in (17), and provide additional evidence for (i) the primary status of the
vowel elements |A I U| as active phonological units, and (ii) the representation of mid vowels
as the compounds |A I| and |A U|.
Looking beyond English, we see further evidence for the mid vowel structures |A I|
and |A U| in languages as diverse as Japanese and Maga Rukai. Kubozono (2001) describes
two processes of monophthong formation in Japanese, one historical and the other synchronic.
Towards the end of the Middle Japanese period, the diphthong [] in Sino-Japanese words
underwent monophthongisation to []:
(20) Middle Japanese monophthongisation
[] [] cherry tree ()
[] [] high (), fidelity ()
[] [] capital (), home town ()
The output forms in (20) are subject to an analysis similar to that shown in (17b) for early
English. Meanwhile, in present-day Tokyo Japanese the reinterpretation process described in
(17a) has become a characteristic of casual speech (Kubozono 2001: 63), with [] being
monophthongised to []. The diphthong [] is retained in formal speech, however, resulting
in the alternations shown in (21):10
(21) Tokyo Japanese monophthongisation
[]~[] usually
[]~[] siblings
[]~[] painful
In view of the Japanese patterns in (20) and (21), it is clear that analysing [ ] as the
element compounds |A I| and |A U|, respectively, does not just capture mid vowel behaviour
in English; rather, it describes a property of the vowel elements themselves. This point is
reinforced by the fact that similar behaviour is also observed in other, unrelated languages. In
Maga Rukai, an Austronesian language spoken in Taiwan, a synchronic process of vowel
10 Hirayama (2003) analyses the Japanese data in (21) using traditional features.
coalescence has created mid vowels that were not present in the proto-language (Hsin 2003).
The nouns in (22a) have the heterosyllabic vowel sequence [][] in the root of the
negative form, which corresponds to [] in the positive. This [] is the result of merging the
phonological properties [] and []. In (22b) we find a parallel alternation between [][] and []:
(22) a. [][] coalescence b. [][] coalescence
negative positive negative positive
bee hemp
bridge tooth
pan excrement
Maga Rukai has a pattern of vowel syncope determined by its iambic foot structure (Hsin
2003: 64). In (22) this is shown as the loss of [] in the root-initial syllable of the positive
form. Yet although the nuclear position itself is suppressed, its segmental content |A| is
retained; this stray element is then interpreted in the adjacent nucleus:
(23) Maga Rukai vowel coalescence: []~[]
N N N N
x x x x x x
c |A| k c |A|k |A|
|I| |I|
So Maga Rukai provides another example of a process which reconfigures a representation in
such a way as to reveal the internal structure of mid vowels. The merger of |A| and |I| in (23)
produces [] in [], while the same analysis also applies to the merger of |A| and |U| to
create [] (e.g. []).
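The coalescence analysis in (23) can be restated procedurally. The following is a minimal sketch, assuming that each nucleus is modelled as a set of elements and that syncope deletes a nuclear position while leaving its elements to be interpreted in the following nucleus; the function name and the list-of-sets layout are hypothetical conveniences.

```python
# Illustrative sketch: nuclei are modelled as sets of vowel elements.
# The function name and data layout are assumptions, not part of the source.

def syncope_with_coalescence(nuclei, target):
    """Suppress the nucleus at index `target`; its stranded elements are
    interpreted in the following (adjacent) nucleus."""
    if not 0 <= target < len(nuclei) - 1:
        raise ValueError("target must have a following nucleus")
    result = [set(n) for n in nuclei]
    stray = result[target]
    result[target + 1] |= stray   # the stray element merges into the next nucleus
    del result[target]            # the nuclear position itself is suppressed
    return result


# A root whose first nucleus holds |A| and whose second holds |I|
negative_root = [{"A"}, {"I"}]
positive_root = syncope_with_coalescence(negative_root, 0)
print(positive_root == [{"A", "I"}])
```

The same call with `[{"A"}, {"U"}]` yields the compound |A U|, which corresponds to the parallel alternation in (22b).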
The representations shown here follow the conventions of autosegmental phonology
in having individual elements occupy separate structural levels or tiers; in (23) for instance,
|A| and |I| reside on independent tiers. Although this arrangement is not crucial, it does offer a
position. This difference between schwa and other vowels is to be expected, however, if
Element Theory is correct in its claim that vowel properties are mapped onto the acoustic
signal. The presence of |A|, |I| or |U| is associated with a strong, characteristic spectral pattern;
and to produce such a pattern speakers must adopt a distinct, non-neutral vocal tract shape.
On the other hand, the absence of any characteristic spectral pattern, such as we find in [], is
naturally paired with a vocal tract configuration lacking any distinct shape. A uniformly
shaped tube is unable to manipulate formant values in any linguistically meaningful way, and
the phonetic result is schwa, a central vowel of a neutral or indistinct quality.
So the spectral shape for [] shows none of the characteristic vocalic patterns dIp,
mAss or rUmp, suggesting that [] has no vowel elements in its representation. The absence
of |A I U| effectively leaves an unspecified or representationally empty vowel. As indicated
above, the Element Theory literature also considers schwa to be informationally empty
(Harris & Lindsey 2000), in the sense that having no element structure means it contains no
linguistic information. In Element Theory, representational emptiness and informational
emptiness amount to the same thing.
But if schwa has no element structure, how can it be heard and pronounced? Harris &
Lindsey (1995) argue that the spectral pattern in figure 11 may be viewed as a baseline
resonance that exists latently in all vowels. Usually this pattern is not heard, because in the
presence of |A I U| it is overridden by the more marked patterns dIp, mAss and rUmp. In the
case of most vowels, these marked patterns are superimposed onto the baseline resonance and
have the effect of masking it entirely. In the case of schwa, however, which has no elements,
the baseline resonance is exposed. Language users associate this resonance with the central
region of the acoustic space, and more specifically with the only area of the vowel space not
occupied by |A I U|:
(26) |A I U| areas of the vowel space
|I| |U|
|A|
It has already been noted that any vowel system may contain a neutral vowel, which
can vary phonetically between [] and []12. Now consider the stylised vowel space in (26),
which demonstrates why this phonetic variation is possible, or perhaps even expected. The
absence of |A I U| corresponds to a central area of the vowel space covering a sizeable range
of different vowel qualities, any of which may be targeted by individual languages as the
interpretation of an unspecified vowel. Importantly, phonetic differences such as [] versus
[] are trivial in most languages,13 because these variants refer to the same linguistic object,
namely a phonologically empty vowel. Harris & Lindsey (1995) liken the empty vowel to a
blank canvas: a neutral background which becomes hidden when different colours are
painted on to it. And no matter what shade of white or grey the original canvas may be, it is
still interpreted as having no colour as long as it remains empty (i.e. unpainted).
3.5.2 Phonological evidence for empty vowels
It has been stressed that elements should be treated primarily as units of phonological
structure, and that their existence should therefore be supported by evidence from the
phonology. At first sight, however, it seems that a different approach may be needed in the
case of schwa, the empty expression ||, because it contains no elements in its representation
and thus amounts to nothing in phonological terms. In fact this is not the case. Although [ ]
has no segmental content, it is still linked to the prosodic structure (specifically, to a
syllable nucleus), which is clearly within the scope of phonology. If [] is to be viewed as
the interpretation of an empty nucleus, then it should receive a phonological analysis like any
other nucleus. Another reason for treating || as a phonological object is that this empty
expression is often the result of a phonological process that removes element structure (e.g.
from weak syllables). If elements are removed from a vowel expression until nothing remains,
then it becomes possible for the baseline resonance of an empty nucleus to be interpreted.
The following examples from Bulgarian and Turkish illustrate the phonological identity of
empty nuclei.
Like English, Bulgarian (Pettersson & Wood 1987) has a full set of vowel contrasts in
stressed positions but only a reduced set in unstressed positions, as shown in (27). Examples
of these alternating vowels are given in (28) (data from Crosswhite 2004):
12 Other realisations of an unspecified vowel are also possible: e.g. [] in the Jivaro system [ ].
13 In 4.4.4 it will be argued that this is not true of English.
(27) Vowel system(s) of Bulgarian
stressed:
unstressed:
(28) Vowel reduction in Bulgarian (data from Crosswhite 2004)
stressed unstressed
[] village [] villages
[] of horn [] horned
[] work [] worker
Bulgarian illustrates a common pattern whereby unstressed syllables support only a
subset of the vowel contrasts that are possible in stressed syllables: [ ] are neutralised to []
in weak syllables, [ ] become [], and [ ] merge as []. Using traditional features it is not
easy to express these vowel reduction effects as a single process: [][] and [][] are
captured by [-high] → [+high], whereas the same feature [high] is irrelevant to [][] as
both are [-high]; instead, the change from [] to [] must be described as [+low] → [-low].
Yet it is clear that the alternations in (27) are all motivated by the same conditioning factor,
namely the inability of an unstressed nucleus to support certain vowel properties. Restated in
terms of Element Theory, however, the generalisation becomes formally simple: |A| is not
licensed in unstressed syllables. As such, the element |A| is suppressed in those contexts but
language users still interpret any remaining elements.
(29) a. high vowels are unchanged (|A| not present)
[][] |I||I|
[][] |U||U|
b. mid vowels are raised (|A| suppressed)
[][] |A I||AI|
[][] |A U||AU|
c. central vowels become unspecified (|A| suppressed)
[][] | || |
[][] |A||A|
Bulgarian vowel reduction is a process that targets |A|, and because the high vowels in
(29a) lack |A|, they are unaffected. By contrast, the mid vowel compounds [ ] in (29b) do
contain |A|; this element is interpreted in stressed positions, but is suppressed in weak
positions; the loss of |A| leaves a sole |I| or |U| remaining, which is interpreted as the high
vowel [] or [] respectively. Turning to the patterns in (29c), these provide evidence to
support the analysis of [] as an unspecified vowel. As a structurally empty vowel, [] has no
|A| and is thus unaffected by vowel reduction: [][]. On the other hand, [] has |A| in its
representation, this element being interpreted in stressed syllables. But in unstressed positions
[] loses its entire element structure through the |A|-suppression process, leaving behind an
empty nucleus which is interpreted phonetically as baseline resonance: [][]. What (29)
shows is that these vowel reduction effects can be unified as a single process only if the
grammar allows for an unspecified vowel to appear in representations. In the absence of any
positive vowel properties (i.e. elements), this vowel is interpreted as neutral or baseline
resonance, typically [].
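The unified statement of Bulgarian vowel reduction in (29) lends itself to a small illustration. The sketch below assumes that vowel expressions can be modelled as sets of elements and that reduction simply removes |A|; the descriptive labels ("high front", "neutral" and so on) are placeholders chosen for this example rather than transcriptions taken from the data, and the function names are hypothetical.

```python
# Illustrative sketch: vowel expressions as frozensets of elements.
# Labels are descriptive placeholders assumed for this example.
INTERPRETATION = {
    frozenset(): "neutral (baseline resonance)",
    frozenset({"I"}): "high front",
    frozenset({"U"}): "high back",
    frozenset({"A"}): "low",
    frozenset({"A", "I"}): "mid front",
    frozenset({"A", "U"}): "mid back",
}


def reduce_unstressed(expression):
    """|A| is not licensed in unstressed nuclei, so it is suppressed;
    any remaining elements are still interpreted."""
    return frozenset(expression) - {"A"}


def interpret(expression):
    """Map an element expression to its descriptive label."""
    return INTERPRETATION[frozenset(expression)]


for stressed in INTERPRETATION:
    unstressed = reduce_unstressed(stressed)
    print(f"{interpret(stressed):30} -> {interpret(unstressed)}")
```

A single operation, the removal of |A|, covers all three patterns: expressions without |A| are unchanged, the mid compounds surface as high vowels, and the expression consisting only of |A| is left empty and is interpreted as baseline resonance.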
The interpretation of phonologically empty nuclei is also observed in Turkish. This
language, like a number of other Altaic systems, has a well-documented process of vowel
harmony in which suffix vowels agree in backness with root vowels. In traditional analyses
the active property is assumed to be the feature [back], whereas in Element Theory it is the
element |I|. Recall that |I| identifies those vowels with a dIp spectral pattern; these have a
relatively high second formant, which places them in the front area of the vowel space. In
Turkish vowel harmony, when a root vowel contains |I| then the same element is also
interpreted in suffixes. For example, the genitive singular suffix in (30a) has a lexically
empty vowel, so the suffix is pronounced []. Under harmony conditions, shown in (30b), it
copies |I| from the root and the suffix vowel is interpreted as []:
(30) |I| harmony in Turkish
Nom. sg. Gen. sg. Nom. pl.
a. girl stalk
b. rope house
The nominative plural suffix also alternates, between its lexical form [] (with a vowel
containing |A|) and its harmonising form [] (with an additional |I|). Example structures are
shown in (31):
(31) a. b. c.
N N N N N N
x x x x x x
k | | z | | n p n |A| v l |A| r
|I| |I| |I| |I|
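The harmony pattern in (30) and (31) can be expressed as a copying operation. The following sketch assumes, as in the simplified picture given here, that only |I| is active and that a suffix vowel interprets |I| whenever the root vowel contains it; the function name and the constants `GEN_SG` and `NOM_PL` are hypothetical labels introduced for illustration.

```python
# Illustrative sketch of |I| harmony: the suffix vowel interprets |I|
# whenever the root vowel contains |I|. Names are assumptions for this example.

def harmonise(root_vowel, suffix_vowel):
    """Return the suffix vowel expression as interpreted after `root_vowel`."""
    suffix = set(suffix_vowel)
    if "I" in root_vowel:
        suffix.add("I")   # |I| is also interpreted in the suffix
    return suffix


GEN_SG = set()     # lexically empty suffix vowel
NOM_PL = {"A"}     # suffix vowel containing |A|

# Root vowel without |I|: suffixes keep their lexical form.
print(harmonise(set(), GEN_SG) == set(), harmonise({"A"}, NOM_PL) == {"A"})
# Root vowel with |I|: both suffixes gain |I|.
print(harmonise({"I"}, GEN_SG) == {"I"}, harmonise({"A", "I"}, NOM_PL) == {"A", "I"})
```

Note that the empty suffix vowel is a legitimate input and output of the function: when no |I| is available, the result is an empty expression, which is interpreted as a neutral vowel rather than being deleted.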
The forms in (30) present a somewhat simplified picture of the facts relating to vowel
harmony in Turkish.14 Nevertheless, they are consistent with the analysis of []/[] given
above, and with the claim that some grammars allow representations to contain structurally
empty nuclei. But if || really has no element content, then why is it not interpreted as
silence? Having no elements means that | | cannot be mapped on to any linguistically
significant patterns in the acoustic signal; that is, it cannot carry segmental information.
However, || is associated with a nuclear position, and this nucleus plays an important role in
the formation of prosodic structure. In combination with other nuclei, it contributes to the
construction of higher prosodic domains such as feet and words, units which convey
linguistic information deemed essential for speech perception and efficient lexical access
(Cutler & Norris 1988). There is evidence, for example, that listeners pay particular attention
to the beginnings of foot and word domains when processing running speech. So, one
consequence of not interpreting an empty nucleus is to reduce the amount of linguistic
(specifically, prosodic) information being transmitted via the speech signal.
This is not to say that empty nuclei can never be silent. In fact, uninterpreted empty
nuclei are a grammatical possibility in many languages, including English (e.g. []
unclear, where marks a silent nucleus). Importantly, however, their appearance needs to be
controlled in order to avoid the emergence of unmanageable sequences of consonants.
Grammars which allow silent empty nuclei must therefore impose restrictions on their
distribution (Charette 1991, Scheer 2004). But if a nucleus is silent, how can we be sure it is
there at all? English provides an answer to this question by showing how the same nucleus is
14 See Charette & Göksel (1996) for a more detailed account.
silent under certain conditions but phonetically interpreted under other conditions. The
following example illustrates the point.
According to one innovative approach to syllable structure, all well-formed lexical
representations end in a nucleus (Kaye 1990). Some languages such as Italian require this
final nucleus to be interpreted, with the result that words must end phonetically in a vowel.
For example, all native Italian words are vowel-final: casa 'house', case 'housing', caso
'chance' (but *cas); additionally, many loanwords in Italian have become vowel-final
through adaptation: gallon (English), gallone (Italian). By contrast, other languages allow a
final empty nucleus to be silent. As a result, they admit words ending phonetically in a
consonant: peach [] (English), schlimm [] 'bad' (German), rhad [] 'cheap'
(Welsh). Following Kaye (1990), the structure of the English word peach is shown in (32a),
where the word-final empty nucleus is licensed to remain silent.
(32) a. peach b. plural c. peaches
O N O N O N O N O N O N
x x x x x x x x
p |I| | | z | | p |I| | | z | |
As an independent lexical structure, the plural suffix in (32b) also has a final empty
nucleus which is not phonetically interpreted; in segmental terms, the plural marker consists
solely of its onset fricative [].15 And when a language user constructs the plural noun
peaches by concatenating the two forms (32a) and (32b), the result is the structure in (32c).
Since resyllabification is not permitted in Kaye's model, the plural noun peaches ends up
with two empty nuclei: one from the stem peach, the other from the suffix. It also contains
the two sibilant consonants [] and [], which are phonetically adjacent and thus create an
unmanageable sequence of the kind mentioned above. Specifically, when these sounds are
adjacent, their similar acoustic properties make them perceptually almost indistinguishable.
Yet the perceptibility of [] and [] (and therefore the linguistic information
associated with these segments) can be recovered by exploiting the lexical structure itself.
By phonetically interpreting the intervening empty nucleus || as a neutral vowel [], as was
15 The voicing properties of English obstruents are discussed in Backley (in prep).
observed for Turkish in (31a), important acoustic cues carried by the C-to-V [] transition
and the V-to-C [] transition can be easily perceived; as a result, the linguistic information
carried by [] and [] is transmitted in full. So, without recourse to arbitrary measures such
as the insertion of an epenthetic vowel, we get the form peaches []. This analysis of the
[] plural departs from the usual textbook explanation in two respects. First, [] is seen here
as a product of the existing representation rather than as a newly introduced addition to the
structure. This is presumably a gain for restrictiveness, in that the distribution of empty nuclei
is strictly controlled by the grammar whereas epenthesis can in principle be applied anywhere.
Second, interpreting || as a neutral vowel has a clear linguistic motivation, since it enhances
the perceptibility and recoverability of linguistic information. By contrast, the traditional
vowel epenthesis account is typically concerned with notions such as ease of articulation
which, following the discussion in 2.3 above, is best seen as non-linguistic in nature.
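The analysis of the plural in (32) can be sketched as a procedure over onset-nucleus pairs. The code below is a rough illustration under several stated assumptions: segments are written with ASCII placeholder labels (for instance "ch" for the stem-final sibilant and "@" for the neutral vowel), filled nuclei are given as a label rather than as element structure, and an empty nucleus is interpreted only when it stands between two sibilant onsets. None of these names comes from the analysis itself, and the single condition used here simplifies the restrictions that grammars actually impose on silent nuclei.

```python
# Illustrative sketch: words as lists of (onset, nucleus) pairs; an empty
# string marks an empty nucleus. All labels are ASCII placeholders (assumed).
SIBILANTS = {"s", "z", "sh", "zh", "ch", "j"}
NEUTRAL = "@"   # placeholder for the neutral vowel


def concatenate(stem, suffix):
    """No resyllabification: the onset-nucleus pairs are simply juxtaposed."""
    return stem + suffix


def interpret(pairs):
    """Spell out a structure, interpreting an empty nucleus only when it
    separates two sibilant onsets; other empty nuclei stay silent."""
    out = []
    for i, (onset, nucleus) in enumerate(pairs):
        out.append(onset)
        if nucleus:
            out.append(nucleus)   # filled nucleus: always interpreted
        elif (i + 1 < len(pairs)
              and onset in SIBILANTS
              and pairs[i + 1][0] in SIBILANTS):
            out.append(NEUTRAL)   # empty nucleus interpreted between sibilants
    return out


peach = [("p", "I"), ("ch", "")]   # stem with a final empty nucleus
plural = [("z", "")]               # suffix: onset plus empty nucleus

print(interpret(peach))                       # ['p', 'I', 'ch']
print(interpret(concatenate(peach, plural)))  # ['p', 'I', 'ch', '@', 'z']
```

Nothing is inserted by `interpret`: the neutral vowel appears only in a position that the concatenated structure already provides, which is the sense in which this account differs from epenthesis.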
The behaviour of the English plural suffix provides further evidence for the existence
of empty nuclei in representations. It also shows how linguistic conditions can cause an
empty nucleus to be phonetically interpreted in a language-specific way. One aspect of the
analysis of [] should be clarified, however. I have claimed that || is interpreted as []