
Christoph Schubert and Christina Sanchez-Stockhammer (Eds.)
Variational Text Linguistics

Topics in English Linguistics

Editors Elizabeth Closs Traugott Bernd Kortmann

Volume 90

Variational Text Linguistics

Revisiting Register in English

Edited by Christoph Schubert Christina Sanchez-Stockhammer

ISBN 978-3-11-044310-3
e-ISBN (PDF) 978-3-11-044355-4
e-ISBN (EPUB) 978-3-11-043533-7
ISSN 1434-3452

Library of Congress Cataloging-in-Publication Data
A CIP catalog record for this book has been applied for at the Library of Congress.

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

© 2016 Walter de Gruyter GmbH, Berlin/Boston
Cover image: Brian Stablyk/Photographer’s Choice RF/Getty Images
Typesetting: fidus Publikations-Service GmbH, Nördlingen
Printing and binding: CPI books GmbH, Leck
♾ Printed on acid-free paper
Printed in Germany

www.degruyter.com

Acknowledgements

The foundations for this edited collection of articles were laid at the international conference Register revisited: New perspectives on functional text variety in English, which took place at the University of Vechta, Germany, from June 27 to 29, 2013. The aim of the present volume is to conserve the research papers and many inspiring discussions which were stimulated then and to make them available to a larger audience.

It was only possible to achieve this aim thanks to the help of many people joining us in the effort. First and foremost, we would like to thank all contributors for their continued cooperation in this project. Furthermore, we are very grateful to the external peer reviewers who contributed their expertise to the selection and improvement of the contributions. These are (in alphabetical order): Federica Barbieri (Swansea, Wales), Eniko Csomay (San Diego, USA), Jürgen Esser (Bonn, Germany), Maria Freddi (Pavia, Italy), Christer Geisler (Uppsala, Sweden), Bethany Gray (Ames, Iowa, USA), Joachim Grzega (Eichstätt, Germany), Thomas Kohnen (Cologne, Germany), Rocío Montoro (Granada, Spain), Neal Norrick (Saarbrücken, Germany), Caroline Tagg (Birmingham, UK), Sanna-Kaisa Tanskanen (Helsinki, Finland) and Marija Zlatnar Moe (Ljubljana, Slovenia).

We are very happy that this volume appears in the series Topics in English Linguistics (TiEL) and would like to thank the series editors Elizabeth Traugott and Bernd Kortmann as well as Wolfgang Konwitschny, Julie Miess and Birgit Sievert at de Gruyter Mouton for their invaluable support in the preparation of this book. Needless to say, we are to blame for any remaining inadequacies.

Going back to the roots of this project, we would like to express our gratitude to the German Research Foundation/Deutsche Forschungsgemeinschaft (DFG) for the generous funding of the conference as well as to the Kommission für Forschung und Nachwuchsförderung der Universität Vechta, the Universitätsgesellschaft Vechta (UGV), the Volksbank Vechta and the city of Vechta for their financial support and hospitality, which contributed immensely to the memorably pleasant atmosphere of the event.

Christoph Schubert and Christina Sanchez-Stockhammer
April 2016

Table of contents

Acknowledgements

Christoph Schubert
Introduction: Current trends in register research

Section I: Specialised registers

Douglas Biber and Jesse Egbert
Towards a user-based taxonomy of web registers

Heidrun Dorgeloh
The interrelationship of register and genre in medical discourse

Markus Bieswanger
Aviation English: Two distinct specialised registers?

Rolf Kreyer
‘Now niggas talk a lotta Bad Boy shit’: The register hip-hop from a corpus-linguistic perspective

Teresa Pham
The register of English crossword puzzles: Studies in intertextuality

Section II: Cross-register comparison

Christina Sanchez-Stockhammer
Punctuation as an indication of register: Comics and academic texts

Martina Lampert
Linking up register and cognitive perspectives: Parenthetical constructions in academic prose and experimentalist poetry

Stella Neumann and Jennifer Fest
Cohesive devices across registers and varieties: The role of medium in English

Section III: Regional, contrastive and diachronic register variation

Barbara Güldenring
Metaphors in New English academic writing

Steffen Schaub
The influence of register on noun phrase complexity in varieties of English

Valentin Werner
Real-time online text commentaries: A cross-cultural perspective

Javier Pérez-Guerra
Word order is in order here: A diachronic register analysis of syntactic markedness in English

Index

Christoph Schubert
Introduction: Current trends in register research

1 Research interest and goals of the volume

The discipline of text linguistics is firmly established as “any work in language science devoted to the text as the primary object of inquiry” (de Beaugrande and Dressler 1981: 14). Although there is a variety of theories and approaches in text linguistics, common research issues are the definition of “text” in old and new media, the formal and functional connections between sentences, typological classifications of texts and processes in the production and comprehension of texts (cf. Esser 2009: 20–21 and Schubert 2012: 29). As the new discipline of “variational pragmatics”, which investigates contextual language use across regional varieties of English, has been established in recent years (cf. Schneider and Barron 2008), the present volume aims to foster and further develop the discipline of “variational text linguistics”. Since this new field of research covers both functional and regional types of textual variation, it intends to provide novel insights into the multi-faceted concept of “register”. Along the lines of Biber and Conrad’s monograph Register, Genre, and Style (2009: 6), we regard “register analysis” as a perspective on text variety which investigates context-dependent communicative functions of characteristic lexico-grammatical features in discourse. Thus, quantitative results based on adequate corpora are here combined with qualitative assessment. We approach the subject of “register” from a wide perspective, incorporating stylistics, variational linguistics and discourse analysis, so that convergences and synergistic effects between disciplines become obvious.

In recent years, other volumes dedicated to textual variety have placed emphasis on different research foci, which may be illustrated by three examples: the essay collection by Dorgeloh and Wanner (2010) is interested in textual variety in English exclusively from the perspective of syntactic parameters and it investigates genre rather than register. In the volume by Andersen and Bech (2013), genre variation is only one parameter next to diachronic variation in time and geographical variation in space. Moreover, the three types of variation are largely discussed separately, and the editors’ main interest lies in corpus development and analysis. The book by Szmrecsanyi and Wälchli (2014) does not only discuss register and dialectology but also includes language typology and therefore comprises articles on a number of languages such as Dutch or members of the Slavic family. Yet, they also formulate the central diagnosis that “[e]ven though dialectologists, register analysts, typologists, and quantitative linguists all deal with linguistic variation, there is astonishingly little interaction across these fields” (Wälchli and Szmrecsanyi 2014: 1).

Christoph Schubert, University of Vechta

In general, register analysis offers a constantly widening range of research opportunities because of the ever-increasing possibilities of communication, mainly triggered by the advent of modern communication technologies. As the main body of linguistic research has concentrated on well-established and frequent registers such as newspaper writing or face-to-face conversations, many descriptive and theoretical issues have not yet been sufficiently investigated. Accordingly, the report on major register studies in Biber and Conrad (cf. 2009: 271–295) reveals that research on specialized registers has had a clear preference for academic and newspaper texts. In particular, the language of popular genres such as pop music, comics or puzzles has hardly been investigated so far, and there are several forms of electronic communication, such as online text commentaries, which need to be described more closely. Hence, by giving room to the description of registers which have not received an appropriate amount of attention so far, we intend to point out emerging trends as well as new directions for future research. By means of cross-cultural comparisons of registers, the volume aims to build bridges to neighbouring disciplines such as cultural studies, especially with regard to intercultural communication. By pointing out the ubiquitous nature of register, we also intend to show that adequate register choice is not a marginal phenomenon but a fundamental prerequisite for successful communication in specific social situations.

2 Definitions of “register”

As far as the semantic origin of the term “register” is concerned, the linguistic use of the term represents a metaphorical borrowing from the domain of music, in particular organ playing (cf. Renkema 2004: 146), where it refers to a “sliding device controlling a set of organ-pipes which share a tonal quality” or “the compass of a voice or musical instrument; a particular range of this compass” (Trumble and Stevenson 2002: 2514), so that it is common to speak of “the upper/middle/lower register” (Summers et al. 2005: 1380) of a specific instrument. Hence, in this analogy, “[l]anguage is seen to be regulated in the same way as the musical tuning of an organ” (cf. Dittmar 2010: 223), and competent speakers of a language have the ability to fine-tune their linguistic choices according to their intended contextual functions.

As regards the semantic extension of the term register, it is worthwhile to consider different subdisciplines of linguistics in more detail (cf. Gut and Schubert 2012: 4–6). Thus, it is striking that sociolinguistic approaches usually employ a narrow definition of the term, reducing it to the language of occupations, such as “the register of law”, “the register of medicine” and the like. Since the topic of discourse is the central determining factor in this type of approach, it is mainly the vocabulary that is responsible for the constitution of a register. The following two quotations taken from standard introductions to sociolinguistics aptly demonstrate this narrow notion of “register”.

Linguistic varieties that are linked […] to particular occupations or topics can be termed registers. […] Registers are usually characterized entirely, or almost so, by vocabulary differences. (Trudgill 2000: 81)

Register is another complicating factor in any study of language varieties. Registers are sets of language items associated with discrete occupational or social groups. Surgeons, airline pilots, bank managers, sales clerks, jazz fans, and pimps employ different registers. (Wardhaugh 2002: 51)

It is obvious that subject matters connected to certain types of activity are responsible for the linguistic choices made by discourse participants in this type of approach to “register”. Although the second quotation includes the term “social groups”, this is conceptualized in a narrow way, excluding the language of social classes in the sense of working- or middle-class sociolects.

In contrast to this narrow notion of “register”, a wide definition of the term is employed by the tradition of Systemic Functional Linguistics (SFL), as can be seen in the next two definitions taken from a classic introduction to cohesion and a recent study on register variation.

The linguistic features which are typically associated with a configuration of situational features – with particular values of the field, mode and tenor – constitute a register. (Halliday and Hasan 1976: 22, emphasis original)

Just as situations tend to recur and thus form types, registers represent recurring ways of using language in a given situation. […] Registers can thus be described as sub-systems of the language system or, when viewed from below, as types of instantiated texts reflecting a similar situation. (Neumann 2013: 16)

As is the case in the influential monograph by Halliday (1978), “registers” are here seen as functional varieties, corresponding to use in specific contexts, while “dialects” are defined as varieties based on the respective user, who has a certain social or regional background that surfaces in linguistic behaviour. The fact that registers can be rightfully viewed as “sub-systems” of a given language underlines their formative and constitutive character in a language. As for the three situational features determining register choices, “field” refers to the subject matter under discussion, “tenor” pertains to the relationship between the participants in a given context and “mode” characterizes the medium of transmission (cf. also Bex 1996: 94–110 and Matthiessen 1993: 236–238).

This wide notion of “register” is also adopted by the currently prevailing approach of Multidimensional Analysis (MDA) à la Douglas Biber (e.g. Biber 1988, 1995, 2006, 2007; Gray 2013: 363–366), which relies on corpus-derived co-occurrences of lexico-grammatical features that serve equivalent functions in discourse. Despite the enhanced methodology, the definition is relatively similar, since a register is regarded as “a variety associated with a particular situation of use (including particular communicative purposes)” (Biber and Conrad 2009: 6). By increasing the degree of specificity, it is possible to distinguish between “sub-registers” (Biber and Gray 2013), so that, for instance, academic writing can be subdivided into sub-registers such as social science, multi-disciplinary science and humanities.

In text linguistics, the terminological differentiation between “register” and “genre” has always been a notorious issue. One possible solution to the problem is offered by Dorgeloh and Wanner (2010: 10), who suggest three main differences, although the distinction between the concepts is still seen as scalar and gradient. First, while register implies linguistic features dependent on situational contexts, genres are regarded as types of “social action” (Dorgeloh and Wanner 2010: 10) used to perform interindividual tasks. Second, register is dominantly geared towards the function of linguistic features, whereas genres rely to a large degree on “patterned practice” (Dorgeloh and Wanner 2010: 10), involving characteristic textual structures. Third, register operates at a high level of generality, while genre has a more specific and concrete character, such as “on-line medical advice” or a “corporate blog” (Giltrow 2010: 47). In fact, this more specific definition offers a niche for the term “genre” in linguistics, since recently, research on “genre” has been superseded by linguistic interest in “register” (cf. Giltrow 2010: 31). Literary criticism, by contrast, clearly maintains a preference for the concept of “genre”.

An alternative approach to terminological differentiation is provided by Biber and Conrad (2009: 15–23), who regard the three terms “register”, “genre” and “style” as “different perspectives on text varieties” (2009: 15–16). The perspective of register pertains to all kinds of “frequent and pervasive” lexico-grammatical items that fulfil specific communicative functions in “a sample of text excerpts”, so that it can be applied to all sorts of discourse. As opposed to that, “[i]n the genre perspective, the focus is on the linguistic characteristics that are used to structure complete texts” (Biber and Conrad 2009: 16). Thus, genres rely on rather specific expressions that occur “in a particular place in the text” (2009: 16) and thus add up to a distinct rhetorical organization, which can be found in texts with a fixed structure, such as formal letters. Finally, “style” is very similar to “register” but depends on linguistic features that are “not directly functional” and “are preferred because they are aesthetically valued” (Biber and Conrad 2009: 16). That is to say that it is possible to determine the style of specific authors or periods of literary history, because these linguistic items do not correspond to particular contexts of situation but serve the poetic function of language. In conclusion, in an extension of the “music” metaphor previously mentioned in the definition of “register”, “genre” equals the specific musical piece chosen by the church organist, while “style” is the organ-player’s individual interpretation and performance of the composition.1

3 Recent developments in register research

While some twenty years ago in the volume Register Analysis Robert-Alain de Beaugrande still diagnosed that “[t]hroughout much of linguistic theory and method, the concept of ‘register’ has led a rather shadowy existence” (1993: 7), research in the field has gained considerable momentum since then. As regards recent developments in register research in English, five main strands may be distinguished.

First, there are numerous studies on diachronic register variation, which cover various periods of English and usually focus on specific aspects. For instance, Alonso-Almeida (2008) discusses the Middle English medical charm with reference to register, genre and text type variables, whereas Warner (2005) investigates the variable use of do-support in different registers of Early Modern English. Moving on to Modern English, Biber and Finegan (2001) discuss variation in written and spoken registers from the 17th to the 20th centuries. Various 19th-century registers are covered by Geisler (2002) as well as by Egbert (2012). More generally, Davies (2009) examines word frequency in registers from a diachronic perspective, whereas Crespo Garcia (2004) and Taavitsainen (2001) employ a narrower focus, concentrating on the history of the scientific register. Along similar lines, Biber and Gray (2013) investigate diachronic change in news reportage and academic research writing during the twentieth century.

1 The editors sincerely thank Jan Renkema for this metaphorical insight.

Second, there is a considerable body of research on register variation in specialized domains. The dimensions under discussion include parameters such as medium, public and private spheres as well as the discourse of certain fields of knowledge. Research on academic English is most frequent, as shown by Csomay’s (2002) analysis of lectures and Biber’s (2006) comprehensive multidimensional study of spoken and written register variation in university discourse. Fryer (2013) investigates medical research articles with regard to evaluation practices, while Schutz (2013) discusses the use of verbs in registers pertaining to business, linguistics, and medical research. Gotti (2012) argues that academic English is by no means uniform but varies according to a number of criteria, such as disciplinary conventions, expertise in the respective field, and linguistic competence of the author. A particular focus on interdisciplinary discourses is found in Teich (2009), whereas further recent studies on academic English and scientific texts respectively have been published by Bartsch (2009) and Teich (2010). In Quinto-Pozos and Mehta’s (2010) study of American Sign Language, it becomes clear that different registers are present not only in verbal but also in nonverbal communication. Concerning the parameter of medium, earlier studies on spoken and written registers have been complemented by research on computer-mediated communication (Biber 2007). As the research survey in Biber and Conrad (cf. 2009: 271–295) underlines, interest in electronic discourse has significantly increased over the last ten to fifteen years.

Further studies on specialized domains comprise register shifting in US public discourse (Cole 2012), the creation of humour through incongruity in register (Venour, Ritchie and Mellish 2011), the register of news reporting in its social context (Lukin 2010), Business English (Cortés de los Ríos 2010), the evaluative language of corporate social reporting (Fuoli 2013), legal language (Battarbee 2010) and the language of linguistics (Freddi 2005). There is also some research on the use of registers in literary texts, as exemplified by Pollner’s (2005) analysis of language variation in Irvine Welsh’s novel Trainspotting.

Third, a quickly developing trend brings together register research with sociolinguistic investigations of regional variation, usually concentrating on international varieties of English, or “World Englishes”, used as a second language (ESL). Xiao (2009) provides a discussion of general issues of the study of World Englishes from the perspective of multidimensional analysis. The recent volume by Szmrecsanyi and Wälchli (2014) contains a number of papers which combine quantitative techniques in register analysis, dialectology, and language typology. For instance, the contribution by Diwersy, Evert and Neumann (2014) shows how a corpus-driven multivariate approach can be used for the study of both register and regional variation. Hilbert and Krug (2012) present a study on the use of progressives in spoken conversations and written press language in Maltese English, as compared to British and American English. As far as Asian varieties are concerned, there is research on registers in Singapore English (Bao and Hong 2006) and on Indian English registers (Balasubramanian 2009a), complemented by a special focus on adverbials (Balasubramanian 2009b). Regarding Africa, there is multidimensional research on various registers in East African English, pointing out, among other aspects, the presence of a greater degree of formality and an increased involvement of the addressee (Van Rooy et al. 2010). Other papers analyse expository writing in Cameroon English (Nkemleke 2006) and academic texts by African American college students (Syrquin 2006). Neumann (2012) chooses a more comprehensive approach, comparing a number of registers in the Englishes spoken in New Zealand, Hong Kong, India, Jamaica, Singapore and Canada. The ultimate goal of most of these studies is to give a complete and comprehensive account of geographical varieties by describing their internally diversified registers, thus taking sociolinguistics to the next level. Along these lines, Balasubramanian (2009a: 19) argues that “[t]o provide a thorough linguistic description of a variety […], it is important to study registers of that variety – i.e. to study the variation within the dialect” and that “[s]uch study of register was missing in the earlier methodologies of dialectology”. As has been pointed out in research on postcolonial Englishes, it is common for these new Englishes to develop use-related varieties in addition to user-related ones, which corresponds to the stage of “differentiation” in the evolutionary development of postcolonial varieties (cf. Schneider 2007: 52–55). Hence, the study of registers aptly complements sociolinguistic approaches, so that this liaison will undoubtedly prove highly fruitful in future research on linguistic variety.

Fourth, contrastive register analysis investigates register variation across two or more languages and is often linked to questions of translation studies. For instance, Teich (2003) compares textual variety in English and German and thereby significantly extends the scope of Contrastive Linguistics, which used to focus mainly on relatively isolated phonological and morphosyntactic features. Neumann (2013) likewise contrasts English and German registers by including both cross-linguistic variation and variational differences between original and translated texts. One central result is that related registers in the two languages show different register features with regard to the chosen subdimensions, so that individual register studies for both languages are necessary. More specifically, the monograph by Barron (2012) compares public information messages in Irish English and German, while register shifts in translations from English into Slovene are investigated by Zlatnar Moe (2010). Focusing on the digital medium, Hardy (2012) contrasts electronic discourse in Filipino and American English.


Fifth, from an applied linguistic perspective there are numerous publications on register and language teaching. While Painter (2001) writes on general issues of teaching genre and register and Reppen (2001) compares spoken and written registers of school-aged students and adults, many articles – quite unsurprisingly – deal with the teaching of academic English. For instance, Halliday’s Systemic-Functional Linguistics is used for the analysis of student report writing by Gardner (2012), and Gilquin (2008) as well as Moore (2006) investigate Learner Academic Writing. On the basis of similar research interests, Han (2010) discusses the teaching of English for Specific Purposes (ESP) from the perspective of register theory. Another language-pedagogical topic is addressed by Volden (2009), who concentrates on registers used by autistic children. Rühlemann (2008) examines the teaching of the informal conversational register, which is frequently neglected in EFL research. With the exception of language-pedagogical approaches, all of the trends mentioned are taken up by the papers in the present volume.

4 A model for register analysis

All of the contributions in this volume refer to the theoretical model of the influential textbook by Biber and Conrad (2009). The central statement underlying register analysis in this textbook names the following crucial parameters: “[t]he description of a register covers three major components: the situational context, the linguistic features, and the functional relationships between the first two components” (Biber and Conrad 2009: 6). By establishing meaningful relations between these aspects, any given register can be described on the basis of a qualitative and quantitative investigation.

As far as the situational context is concerned, Biber and Conrad expand the three parameters proposed by Halliday (1978) by establishing the following seven characteristics (2009: 40–47): (1) Participants: the addressor(s) as the producer(s) of texts can be defined according to number, situation in society (individual or institutional) and personal parameters (age, gender, education etc.). Addressees as the recipients of texts may also be classified according to number and the question whether they can be personally identified or not. In addition, there may be onlookers, who do not directly contribute to the verbal exchange but whose physical presence may nevertheless influence the linguistic choices made by the interlocutors. (2) Relations among participants: it is crucial to analyse whether the communication is immediately interactive, which social roles are played by the participants in terms of power, whether they have a personal relationship, and to what degree the interactants share relevant background knowledge. (3) Channel: the communication can be conducted in the written or spoken mode, and a particular medium may be utilized, such as telephone, radio, television or the internet. (4) Production circumstances: while spoken communication commonly takes place in real time, written or electronic discourse may be carefully planned and additionally revised. (5) Setting: in spoken interaction, the participants often share time and place, which is usually not the case in written texts. Moreover, communication can take place in a private or public setting or at a specific location such as a church. In temporal terms, linguistic conventions change through the decades and centuries. (6) Communicative purposes: while general discourse intentions include description, persuasion or narration, they may be complemented by specific textual functions referring to particular states of affairs, such as scientific findings or political spin. What is more, the text may be presented as fictitious or factual, and addressors often use linguistic items expressing their personal stance. (7) Topic: the theme of any kind of communication can be classified at a very general level as belonging to a certain field of discourse, such as science or business, while such domains obviously offer manifold possibilities of topical subdivisions.

Those seven situational characteristics can be related to fifteen linguistic categories that may be worthwhile investigating in a register analysis (cf. Biber and Conrad 2009: 78–82): vocabulary features (e.g. technical terms), content word classes, function word classes, derived words, verb features (e.g. tense and aspect), pronoun features, reduced forms and dispreferred structures (e.g. contractions or ellipsis), prepositional phrases, coordination, main clause types, noun phrases, adverbials, complement clauses, word order choices (e.g. raising or extraposition) and special features of conversation (e.g. backchannels, pauses and repetitions). Any of these features may then function as either “register features” or “register markers”, which are distinguished in the following way (Biber and Conrad 2009: 53–54): register features are both pervasive and frequent, as they occur in all parts of a sample text belonging to a given register and appear more often in a selected register than in others. In contrast, register markers are unique to a particular register, as they do not occur in any other register, such as technical expressions in specific types of sport broadcasts.
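The distinction between register features and register markers can be made concrete with a small sketch. The Python code below is an illustration under simplifying assumptions, not the procedure of Biber and Conrad (2009): single word tokens stand in for linguistic features, “pervasive” is approximated as presence in each of two halves of every text, and “frequent” as a higher rate per 1,000 words than in every other register. The function names and the toy corpus are invented for this example.

```python
def split_parts(tokens, n_parts):
    """Split a token list into n_parts contiguous, roughly equal parts."""
    n = len(tokens)
    return [tokens[i * n // n_parts:(i + 1) * n // n_parts] for i in range(n_parts)]


def rate_per_1000(texts, item):
    """Pooled frequency of `item` per 1,000 tokens across a list of texts."""
    total = sum(len(t) for t in texts)
    hits = sum(t.count(item) for t in texts)
    return 1000 * hits / total if total else 0.0


def classify(item, corpus, target, n_parts=2):
    """Label `item` as "marker", "feature" or "neither" for register `target`.

    corpus maps a register name to a list of tokenized texts.
    """
    target_texts = corpus[target]
    others = {reg: texts for reg, texts in corpus.items() if reg != target}
    in_target = any(item in t for t in target_texts)
    in_others = any(item in t for texts in others.values() for t in texts)
    # Marker: attested in the target register and in no other register.
    if in_target and not in_others:
        return "marker"
    # Pervasive (assumed operationalization): present in every part of every text.
    pervasive = all(item in part
                    for t in target_texts
                    for part in split_parts(t, n_parts))
    # Frequent: higher normalized rate than in each of the other registers.
    frequent = all(rate_per_1000(target_texts, item) > rate_per_1000(texts, item)
                   for texts in others.values())
    return "feature" if pervasive and frequent else "neither"


# Toy corpus, invented for illustration only.
corpus = {
    "conversation": [
        "well i think you know what i mean".split(),
        "i guess so but i doubt it".split(),
    ],
    "sports_broadcast": [
        "the striker is offside and i think the referee saw the offside flag".split(),
    ],
}

print(classify("offside", corpus, "sports_broadcast"))  # marker
print(classify("i", corpus, "conversation"))            # feature
print(classify("think", corpus, "conversation"))        # neither
```

In this toy data, “offside” is unique to the broadcast texts, “i” occurs in both registers but is spread throughout the conversational texts at a higher rate, and “think” is neither unique nor pervasive. A realistic analysis would count tagged grammatical features in a much larger corpus rather than raw word forms.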

In order to make a comparison of registers possible, it is necessary to introduce a limited set of dimensions along which various registers show different frequencies of the respective linguistic features. For instance, dimensions used for the study of spoken and written university registers may be “oral versus literate discourse” or “procedural versus content-focused discourse” (Biber and Conrad 2009: 226–230). This approach, accordingly called “multidimensional (MD) analysis”, heavily relies on corpus-derived quantitative data. With the help of factor analysis, co-occurring clusters of linguistic features in target registers can be retrieved. Eventually, it is possible to identify register-specific dimension scores, by means of which the registers can be compared. This approach also underlies the register distinction present in the seminal Longman Grammar of Spoken and Written English (Biber, Johansson, Leech, Conrad and Finegan 1999) as well as in the monograph University Language (Biber 2006), and it is the foundation of numerous studies on registers in recent years. For instance, Biber (2012) challenges the common practice of reference grammars which fail to take into account register distinctions and treat grammatical structures as general features of English at large. Biber’s impact can be measured by the fact that his method of multidimensional analysis has become more and more widespread (e.g. Egbert 2012; Geisler 2002; Reppen 2001; Van Rooy et al. 2010; Xiao 2009). This trend is further corroborated by a recent edited volume which is dedicated explicitly to Biber’s MDA and contains articles on regional and register variation in both English and Romance languages (Sardinha and Pinto 2014).
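The step from co-occurring features to register-specific dimension scores can be sketched in a few lines of Python. The sketch assumes that a factor analysis has already assigned each feature to the positive or negative pole of one dimension; the poles, the feature rates and all names below are hypothetical. It follows the commonly described scoring scheme in which each feature rate is standardized across texts, the standardized values of negative-pole features are subtracted from those of positive-pole features, and a register's score is the mean over its texts.

```python
from statistics import mean, pstdev

# Hypothetical rates per 1,000 words for four texts (invented numbers).
texts = [
    {"register": "conversation", "pronouns": 120.0, "contractions": 40.0, "nouns": 150.0},
    {"register": "conversation", "pronouns": 100.0, "contractions": 30.0, "nouns": 170.0},
    {"register": "academic", "pronouns": 20.0, "contractions": 2.0, "nouns": 300.0},
    {"register": "academic", "pronouns": 30.0, "contractions": 0.0, "nouns": 280.0},
]

# Assumed result of a prior factor analysis for one dimension:
# +1 marks a feature on the positive pole, -1 on the negative pole.
poles = {"pronouns": 1, "contractions": 1, "nouns": -1}


def zscores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def dimension_scores(texts, poles):
    """One score per text: signed sum of standardized feature rates."""
    cols = {feat: zscores([t[feat] for t in texts]) for feat in poles}
    return [sum(sign * cols[feat][i] for feat, sign in poles.items())
            for i in range(len(texts))]


def register_means(texts, scores):
    """Mean dimension score of the texts in each register."""
    by_register = {}
    for text, score in zip(texts, scores):
        by_register.setdefault(text["register"], []).append(score)
    return {reg: mean(vals) for reg, vals in by_register.items()}


scores = dimension_scores(texts, poles)
means = register_means(texts, scores)
for register, value in means.items():
    print(register, round(value, 2))
```

With these invented rates, the conversational texts receive a positive mean score and the academic texts a negative one, so the two registers fall on opposite ends of the hypothetical dimension. The population standard deviation is used here only for simplicity; the choice does not affect the ordering of the registers.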

5 An outline of the volume

This volume is subdivided into three thematic parts, each introduced by general remarks on the respective section topic and by a summary of the individual articles: the first part, specialised registers, is dedicated to the description of individual registers, namely web registers (Biber and Egbert), medical texts (Dorgeloh), Aviation English (Bieswanger), hip-hop (Kreyer) and crossword puzzles (Pham). The second part, cross-register comparison, builds upon that basis by providing register-transcending studies which compare individual registers. More specifically, it contrasts comics and academic texts (Sanchez-Stockhammer), academic prose and minimalist poetry (Lampert) as well as academic writing, administrative writing, timed exams, conversations and broadcast discussions (Neumann and Fest). The third part, regional, contrastive and diachronic register variation, widens the perspective by investigating register variation along international, contrastive-linguistic and historical dimensions. It is dedicated to metaphors in the New Englishes of India, Hong Kong and Singapore (Güldenring) as well as noun phrase structure in Indian English, Jamaican English, Hong Kong English and Canadian English (Schaub). Online text commentaries are analysed contrastively in British and German sports reports (Werner). The diachronic perspective is considered in the discussion of developments of word order from Middle English to Late Modern English (Pérez-Guerra). The paper by Neumann and Fest functions as an apt link between Sections II and III, since it combines cross-register comparisons with regional variation. Although the various contributions to the volume take different research perspectives, all deal with frequent and recurrent linguistic features throughout texts supporting specific superordinate functions. In sum, the papers cover theoretical considerations, case studies and reflections on presently employed methods, suggesting approaches and topics for future research on variational text linguistics in English.

Bibliography

Alonso-Almeida, Francisco. 2008. The Middle English medical charm: Register, genre and text type variables. Neuphilologische Mitteilungen 109(1). 9–38.

Andersen, Gisle & Kristin Bech (eds.). 2013. English corpus linguistics: Variation in time, space and genre. Amsterdam: Rodopi.

Balasubramanian, Chandrika. 2009a. Register variation in Indian English. Amsterdam: Benjamins.

Balasubramanian, Chandrika. 2009b. Circumstance adverbials in registers of Indian English. World Englishes 28(4). 485–508.

Bao, Zhiming & Huaqing Hong. 2006. Diglossia and register variation in Singapore English. World Englishes 25(1). 105–114.

Barron, Anne. 2012. Public information messages: A contrastive genre analysis of state-citizen communication. Amsterdam: Benjamins.

Bartsch, Sabine. 2009. Corpus studies of register variation: An exploration of academic registers. Anglistik: International Journal of English Studies 20(1). 105–124.

Battarbee, Keith. 2010. Shifts in the language of the law: Reading the registers of official-language statutes. Text & Talk 30(6). 637–655.

Bex, Tony. 1996. Variety in written English: Texts in society – societies in text. London: Routledge.

Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge UP.

Biber, Douglas. 1995. Dimensions of register variation: A cross-linguistic comparison. Cambridge: Cambridge UP.

Biber, Douglas & Edward Finegan. 2001. Diachronic relations among speech-based and written registers in English. In Susan Conrad & Douglas Biber (eds.). Variation in English: Multi-dimensional studies, 66–83. Harlow: Pearson Education.

Biber, Douglas. 2006. University language: A corpus-based study of spoken and written registers. Amsterdam: Benjamins.

Biber, Douglas. 2007. Towards a taxonomy of web registers and text types: A multidimensional analysis. In Marianne Hundt, Nadja Nesselhauf & Carolin Biewer (eds.). Corpus linguistics and the Web, 109–131. Amsterdam: Rodopi.

Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge UP.

Biber, Douglas. 2012. Register as a predictor of linguistic variation. Corpus linguistics and linguistic theory 8(1). 9–37.

Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad & Edward Finegan. 1999. Longman grammar of spoken and written English. London: Longman.

Biber, Douglas & Bethany Gray. 2013. Being specific about historical change: The influence of sub-register. Journal of English Linguistics 41(2). 104–134.


Cole, Debbie. 2012. Uptake (un)limited: The mediatization of register shifting in US public discourse. Language in Society 41(4). 449–470.

Cortés de los Ríos, Ma Enriqueta. 2010. A combined genre-register approach in texts of business English. LSP Journal 1(1). 13–28.

Crespo García, Begoña. 2004. The scientific register in the history of English: A corpus-based study. Studia Neophilologica 76(2). 125–139.

Csomay, Eniko. 2002. Variation in academic lectures: Interactivity and level of instruction. In Randi Reppen, Susan M. Fitzmaurice & Douglas Biber (eds.). Using corpora to explore linguistic variation, 203–224. Amsterdam: Benjamins.

Davies, Mark. 2009. Word frequency in context: Alternative architectures for examining related words, register variation and historical change. In Dawn Archer (ed.). What’s in a word-list? Investigating word frequency and keyword extraction, 53–68. Surrey: Ashgate.

De Beaugrande, Robert-Alain. 1993. ‘Register’ in discourse studies: A concept in search of a theory. In Mohsen Ghadessy (ed.). Register analysis: Theory and practice, 7–25. London: Pinter Publishers.

De Beaugrande, Robert-Alain & Wolfgang Ulrich Dressler. 1981. Introduction to text linguistics. London: Longman.

Dittmar, Norbert. 2010. Register. In Mirjam Fried, Jan-Ola Östman & Jef Verschueren (eds.). Variation and change: Pragmatic perspectives, 221–233. Amsterdam: Benjamins.

Diwersy, Sascha, Stefan Evert & Stella Neumann. 2014. A weakly supervised multivariate approach to the study of language variation. In Benedikt Szmrecsanyi & Bernhard Wälchli (eds.). Aggregating dialectology, typology, and register analysis: Linguistic variation in text and speech, 174–204. Berlin: de Gruyter.

Dorgeloh, Heidrun & Anja Wanner. 2010. Introduction. In Heidrun Dorgeloh & Anja Wanner (eds.). Syntactic variation and genre, 1–26. Berlin: De Gruyter Mouton.

Egbert, Jesse. 2012. Style in nineteenth century fiction: A multi-dimensional analysis. Scientific Study of Literature 2(2). 167–198.

Esser, Jürgen. 2009. Introduction to English text-linguistics. Frankfurt/Main: Peter Lang.

Freddi, Maria. 2005. From corpus to register: The construction of evaluation and argumentation in linguistics textbooks. In Elena Tognini-Bonelli & Gabriella Del Lungo Camiciotti (eds.). Strategies in academic discourse, 133–151. Amsterdam: Benjamins.

Fryer, Daniel Lees. 2013. Exploring the dialogism of academic discourse: Heteroglossic engagement in medical research articles. In Gisle Andersen & Kristin Bech (eds.). English corpus linguistics: Variation in time, space and genre, 183–207. Amsterdam: Rodopi.

Fuoli, Matteo. 2013. Texturing a responsible corporate identity: A comparative analysis of appraisal in BP’S and IKEA’S 2009 corporate social reports. In Gisle Andersen & Kristin Bech (eds.). English corpus linguistics: Variation in time, space and genre, 209–235. Amsterdam: Rodopi.

Gardner, Sheena. 2012. Genres and registers of student report writing: An SFL perspective on texts and practices. Journal of English for Academic Purposes 11(1). 52–63.

Geisler, Christer. 2002. Investigating register variation in nineteenth-century English: A multi-dimensional comparison. In Randi Reppen, Susan M. Fitzmaurice & Douglas Biber (eds.). Using corpora to explore linguistic variation, 249–271. Amsterdam: Benjamins.

Gilquin, Gaëtanelle. 2008. Too chatty: Learner academic writing and register variation. English Text Construction 1(1). 41–61.


Giltrow, Janet. 2010. Genre as difference: The sociality of linguistic variation. In Heidrun Dorgeloh & Anja Wanner (eds.). Syntactic variation and genre, 29–51. Berlin: De Gruyter Mouton.

Gotti, Maurizio. 2012. Variation in academic texts. In Maurizio Gotti (ed.). Academic identity traits: A corpus-based investigation, 23–42. Bern: Peter Lang.

Gray, Bethany. 2013. Interview with Douglas Biber. Journal of English Linguistics 41(4). 359–379.

Gut, Ulrike & Christoph Schubert. 2012. Approaches to language variation: Introduction. In Monika Fludernik & Benjamin Kohlmann (eds.). Anglistentag 2011 Freiburg: Proceedings, 3–9. Trier: WVT.

Halliday, Michael A. K. 1978. Language as social semiotic: The social interpretation of language and meaning. London: Arnold.

Halliday, Michael A. K. & Ruqaiya Hasan. 1976. Cohesion in English. London: Longman.

Han, Huabing. 2010. On the methodology employed in ESP teaching under register theory. The 1st Asian ESP conference. [Special edition]. Asian ESP Journal, 158–163.

Hardy, Jack A. 2012. Filipino and American online communication and linguistic variation. World Englishes 31(2). 143–161.

Hilbert, Michaela & Manfred Krug. 2012. Progressives in Maltese English: A comparison with spoken and written text types of British and American English. In Marianne Hundt & Ulrike Gut (eds.). Mapping unity and diversity world-wide, 103–136. Amsterdam: John Benjamins.

Lukin, Annabelle. 2010. ‘News’ and ‘register’: A preliminary investigation. In Ahmar Mahboob & Naomi K. Knight (eds.). Appliable linguistics, 92–113. London: Continuum.

Matthiessen, Christian M. I. M. 1993. Register in the round: Diversity in a unified theory of register analysis. In Mohsen Ghadessy (ed.). Register analysis: Theory and practice, 221–292. London: Pinter Publishers.

Moore, Nick. 2006. Advanced language for intermediate learners: Corpus and register analysis for curriculum specification in English for academic purposes. In Heidi Byrnes (ed.). Advanced language learning: The contribution of Halliday and Vygotsky, 246–264. London: Continuum.

Neumann, Stella. 2012. Applying register analysis to varieties of English. In Monika Fludernik & Benjamin Kohlmann (eds.). Anglistentag 2011 Freiburg: Proceedings, 75–94. Trier: WVT.

Neumann, Stella. 2013. Contrastive register variation: A quantitative approach to the comparison of English and German. Berlin: Mouton de Gruyter.

Nkemleke, Daniel A. 2006. Some characteristics of expository writing in Cameroon English. English World-Wide 27(1). 25–44.

Painter, Clare. 2001. Understanding genre and register: Implications for language teaching. In Anne Burns & Caroline Coffin (eds.). Analysing English in a global context, 167–180. London: Routledge.

Pollner, Clausdirk. 2005. English 0 – and drugs galore: Varieties and registers in Irvine Welsh’s Trainspotting. In Gisela Hermann-Brennecke & Wolf Kindermann (eds.). Anglo-american awareness: Arpeggios in aesthetics, 193–202. Münster: LIT.

Quinto-Pozos, David & Sarika Mehta. 2010. Register variation in mimetic gestural complements to signed language. Journal of Pragmatics 42(3). 557–584.

Renkema, Jan. 2004. Introduction to discourse studies. Amsterdam: John Benjamins.


Reppen, Randi. 2001. Register variation in student and adult speech and writing. In Susan Conrad & Douglas Biber (eds.). Variation in English: Multidimensional studies, 187–199. London: Longman.

Rühlemann, Christoph. 2008. A register approach to teaching conversation: Farewell to Standard English? Applied Linguistics 29(4). 672–693.

Sardinha, Tony Berber & Marcia Veirano Pinto (eds.). 2014. Multi-dimensional analysis, 25 years on: A tribute to Douglas Biber. Amsterdam: John Benjamins.

Schneider, Edgar W. 2007. Postcolonial English: Varieties around the world. Cambridge: Cambridge UP.

Schneider, Klaus P. & Anne Barron (eds.). 2008. Variational pragmatics: A focus on regional varieties in pluricentric languages. Amsterdam/Philadelphia: Benjamins.

Schubert, Christoph. 2012. Englische Textlinguistik: Eine Einführung. 2nd edn. Berlin: Erich Schmidt.

Schutz, Natassia. 2013. How specific is English for academic purposes? A look at verbs in business, linguistics and medical research articles. In Gisle Andersen & Kristin Bech (eds.). English corpus linguistics: Variation in time, space and genre, 237–257. Amsterdam: Rodopi.

Summers, Della et al. (ed.). 2005. Longman dictionary of contemporary English. Harlow: Pearson Education Limited.

Syrquin, Anna F. 2006. Registers in the academic writing of African American college students. Written Communication 23(1). 63–90.

Szmrecsanyi, Benedikt & Bernhard Wälchli (eds.). 2014. Aggregating dialectology, typology, and register analysis: Linguistic variation in text and speech. Berlin: de Gruyter.

Taavitsainen, Irma. 2001. Language history and the scientific register. In Hans-Jürgen Diller & Manfred Görlach (eds.). Towards a history of English as a history of genres, 185–202. Heidelberg: Winter.

Teich, Elke. 2003. Cross-linguistic variation in system and text. Berlin: Mouton de Gruyter.

Teich, Elke. 2009. Scientific registers in contact: An exploration of the lexico-grammatical properties of interdisciplinary discourses. International Journal of Corpus Linguistics 14(4). 524–548.

Teich, Elke. 2010. Exploring a corpus of scientific texts using data mining. In Stefan Th. Gries, Stefanie Wulff & Mark Davies (eds.). Corpus-linguistic applications: Current studies, new directions, 233–247. Amsterdam: Rodopi.

Trudgill, Peter. 2000. Sociolinguistics: An introduction to language and society. 4th edn. London: Penguin.

Trumble, William R. & Angus Stevenson (eds.). 2002. Shorter Oxford English dictionary on historical principles. 2 vols. Oxford: Oxford UP.

Van Rooy, Bertus, Lize Terblanche, Christoph Haase & Joseph Schmied. 2010. Register differentiation in East African English: A multidimensional study. English World-Wide 31(3). 311–349.

Venour, Chris, Graeme Ritchie & Chris Mellish. 2011. Dimensions of incongruity in register humour. In Marta Dynel (ed.). The pragmatics of humour across discourse domains, 125–144. Amsterdam: Benjamins.

Volden, Joanne. 2009. Bossy and nice requests: Varying language register in speakers with autism spectrum disorder (ASD). Journal of Communication Disorders 42(1). 58–73.

Wälchli, Bernhard & Benedikt Szmrecsanyi. 2014. Introduction: The text-feature-aggregation pipeline in variation studies. In Benedikt Szmrecsanyi & Bernhard Wälchli (eds.). Aggregating dialectology, typology, and register analysis: Linguistic variation in text and speech, 1–25. Berlin: de Gruyter.

Wardhaugh, Ronald. 2002. An introduction to sociolinguistics. 4th edn. Oxford: Blackwell.

Warner, Anthony. 2005. Why DO dove: Evidence for register variation in Early Modern English. Language Variation and Change 17(3). 257–280.

Xiao, Richard. 2009. Multidimensional analysis and the study of World Englishes. World Englishes 28(4). 421–450.

Zlatnar Moe, Marija. 2010. Register shifts in translations of popular fiction from English into Slovene. In Daniel Gile, Gyde Hansen & Nike K. Pokorn (eds.). Why translation studies matters, 125–136. Amsterdam: Benjamins.

Section I: Specialised registers

The volume opens with five contributions discussing the lexico-grammatical features of previously underdescribed registers, which are situated on different levels in the hierarchy of specificity: web registers, medical discourse, Aviation English, hip-hop and crossword puzzles. The first two registers comprise heterogeneous sub-registers, as, for instance, a distinction is made among the web registers between interviews, discussion forums, encyclopedia articles, advertisements and recipes, while Aviation English is a twofold construct and hip-hop and crossword puzzles constitute relatively uniform categories. All studies can be situated within the analytical register framework described in Biber and Conrad (2009) and examine to what extent their object of inquiry can be considered a register or where the boundaries between more general categories and sub-registers may be drawn. In addition, Dorgeloh’s contribution extends the model by including the genre perspective in the analyses.

The first paper in the volume, Douglas Biber and Jesse Egbert’s study “Towards a user-based taxonomy of web registers”, stands out from the other papers’ corpus-based approaches by its use of a bottom-up design in which internet users were asked to identify basic situational characteristics of web documents. These characteristics were then used to construct a hierarchical decision tree, which permitted the successful categorisation of most internet texts by the same type of informants in the next step. Among the most important results of this study are the finding that some sub-registers might be easier to identify than their superordinate category and the observation that a relatively large proportion of registers on the internet can be considered hybrid with regard to their communicative purposes.

Hybridity of either form, discourse function or both is also observed by Heidrun Dorgeloh in her study “The interrelationship of register and genre in medical discourse”, which finds hybridity in the three medical registers under consideration: illness blogs, medical case reports and medical case presentations. She argues that the correlations between form and function in medical discourse are less linked to the communicative situation than to the type of activity and concludes that the notion of genre should be given primacy over that of (sub-)registers.

Markus Bieswanger, by contrast, applies a classical Biberian register analysis to the field of air traffic communication in his paper “Aviation English: Two distinct specialised registers?”. While the term Aviation English is generally used to designate both the standardised phraseology promoted by the International Civil


Aviation Organization and the plain English used in exceptional situations where communicative needs transcend the routine repertoire, Bieswanger’s analysis of authentic air traffic communication material manages to demonstrate that these are actually two distinct registers and not just one register with two sub-registers. While Dorgeloh’s and Bieswanger’s material-based approaches place a particular focus on the qualitative analysis of their data in order to explore the boundaries of their particular register(s), the remaining two studies represent quantitative corpus-based studies of specialised corpora.

Rolf Kreyer’s contribution, “‘Now niggas talk a lotta Bad Boy shit’: The register hip-hop from a corpus-linguistic perspective”, targets a question similar to Bieswanger’s, namely whether hip-hop lyrics should be considered a sub-register of pop song lyrics. Based on a corpus of lyrics from the top albums in the US album charts in 2003 and 2011, Kreyer contrasts a hip-hop sub-corpus with lyrics by rappers and hip-hoppers to the lyrics from the remaining albums. His analyses yield differences regarding the semantically annotated content and some non-standard spellings but particularly regarding the absence of the copula. Kreyer therefore concludes that the language used in hip-hop can be considered a register in its own right.

The section closes with Teresa Pham’s corpus analysis entitled “The register of English crossword puzzles: Studies in intertextuality”, in which she reaches the conclusion that cryptic and non-cryptic puzzles constitute sub-registers of the general register of crossword puzzles. The differences with regard to the use of intertextuality between the two types of crossword puzzle suggest the addition of intertextuality to the list of linguistic features that can be used to distinguish registers from each other in the Biberian framework.

Douglas Biber and Jesse Egbert

Towards a user-based taxonomy of web registers

Abstract: There is a well-established need for a comprehensive taxonomy of English web registers grounded in the actual experiences of end-users. In this paper, we introduce a new grant-funded initiative aimed at filling this gap. We first describe the methods used to develop a hierarchical web register framework and introduce our bottom-up, user-based method of web register classification. Using a hierarchical decision tree, a large sample of webpage URLs (N = 1,000) was classified into register and sub-register categories by four raters each. The results indicate that the approach can be effectively used to identify the register category for most internet texts, although the results also show that many texts belong to ‘hybrid’ registers. The primary goals of the paper are to present the overall distribution of internet texts across general registers, sub-registers and ‘hybrid’ registers, and to discuss some of the key characteristics of the major register categories. We conclude with a discussion of challenges and future directions for web register research.

1 Introduction

There is a mind-boggling amount of information available on the World Wide Web. For example, Fletcher (2012: 1) estimates that Google indexes about 40 billion webpages. Although not its intended purpose, the WWW also provides a tremendous resource for linguists, who can use the web as a corpus to investigate linguistic patterns of use. This approach has become so prevalent that the acronym WAC (Web-as-Corpus) has now become commonplace among researchers who explore ways to mine the WWW for linguistic analysis.

One of the major challenges for WAC research is that a typical web search usually provides us with no information about the kinds of texts investigated. For example, Fletcher notes that a linguistic search of the Web-as-Corpus will tell us nothing about:

Douglas Biber, Northern Arizona University
Jesse Egbert, Brigham Young University


For whom and what purpose is the text intended? What […] target audience does it represent? Was it written carefully or carelessly by a native speaker, or is it an unreliable translation by man or machine? Is the document authoritative – accurate in content and representative in linguistic form? (2012: 1341)

Similar problems were noted a decade earlier by Kilgarriff and Grefenstette (2003) in their introduction to a special issue of Computational Linguistics on WAC. Thus, they write:

“Text type” is an area in which our understanding is, as yet, very limited. Although further work is required irrespective of the Web, the use of the Web forces the issue. Where researchers use established corpora, such as Brown, the BNC, or the Penn Treebank, researchers and readers are willing to accept the corpus name as a label for the type of text occurring in it without asking critical questions. Once we move to the Web as a source of data, and our corpora have names like “April03-sample77,” the issue of how the text type(s) can be characterized demands attention. (2003: 343)

These concerns are shared widely among WAC researchers, and as a result, there has been a surge of interest over the last several years in Automatic Genre Identification (AGI): computational methods using a wide range of descriptors to automatically classify web texts into genre classes. The typical methodology used in an AGI study is to manually identify the genre (or register) of selected internet texts and to then test the extent to which computer programs can automatically place those texts into the same categories. However, although some studies have achieved high accuracy rates (e.g., Lindemann and Littig 2010; Santini 2010), serious questions have been raised about the validity of those results. First, some scholars raise doubts about the representativeness of the web corpora analysed in previous AGI studies: researchers often disregard the question of whether the sample used in an AGI study represents the full population of internet texts (see discussion in Santini and Sharoff 2009).
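The evaluation step of such an AGI study can be illustrated with a minimal sketch. It assumes that accuracy is simply the share of texts for which the automatic label matches the manual one; the function name and the genre labels are hypothetical and are not taken from any of the studies cited above.

```python
def agreement_accuracy(manual_labels, predicted_labels):
    """Share of texts whose predicted genre equals the manually assigned genre."""
    if len(manual_labels) != len(predicted_labels):
        raise ValueError("both label lists must cover the same texts")
    if not manual_labels:
        raise ValueError("at least one text is required")
    matches = sum(m == p for m, p in zip(manual_labels, predicted_labels))
    return matches / len(manual_labels)


# Hypothetical labels for four web texts (illustration only).
manual = ["blog", "news", "faq", "blog"]
predicted = ["blog", "news", "blog", "blog"]
accuracy = agreement_accuracy(manual, predicted)
print(f"accuracy: {accuracy:.2f}")
```

A high value on a measure of this kind says nothing about whether the manual labels themselves are valid or whether the sample is representative, which is the concern raised in the surrounding discussion.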

There have also been questions raised about the actual genre/register categories that we are trying to predict. Most studies have followed the same general procedure: they first begin with a list of possible genre categories; then internet texts are manually classified into those categories by an ‘expert’; and then computational methods are used to determine whether those genre categories can be automatically predicted. This approach is based on two assumptions: 1) that researchers have identified the ‘correct’ set of possible genre/register categories found on the web, based on a priori intuitive consideration of internet texts; and 2) that a single expert user is able to ‘correctly’ identify the genre/register category of individual internet texts. Unfortunately, neither assumption seems to be warranted. The few cases where inter-rater reliability is reported have shown


that it tends to be quite low, even for linguists. This is especially true for corpora composed of randomly extracted web texts (see discussion in Sharoff, Wu, and Markert 2010). Given the problems that ‘experts’ have identifying web genre categories, it is not surprising that non-expert web users also vary in their understanding of genre labels (see Crowston, Kwaśnik, and Rubleske 2010) and that reliability among lay users is often unacceptably low (Rosso and Haas 2010).

More importantly, though, it is not clear that the genre categories being predicted in AGI studies are actually valid. This problem has been recognised and discussed in previous research; thus, for example, Rehm et al. (2008: 352) note:

One of the most important problems concerns the elusiveness of the concept of genre. The consequence is that, in practical terms, genre researchers usually have different ideas of what a genre is, how genres should be defined and identified and, therefore, they use different genre labels in their approaches.

A few years ago, there was considerable effort to agree on a standard set of register/genre categories for AGI research, as part of a wiki-based collaboration among Web-as-Corpus experts (http://www.webgenrewiki.org/). That collaborative effort resulted in a list of 78 register/genre distinctions, but the initiative appears to have faded out in the last few years, with little consensus regarding the relative status of those categories. As a result, there is still no generally agreed-on set of register/genre categories used in current AGI research. (In the remainder of this paper, we use the term ‘register’ rather than ‘genre’ to refer to situationally-based textual distinctions, following the research tradition developed in Biber 1995, Biber et al. 1999, Biber and Conrad 2009, etc.).

In the present study, we tackle this problem with a completely different approach: instead of relying on expert coders, we recruit typical end-users of the web for our register analyses, assessing the degree of agreement among those users. Most importantly, we do not force users to choose directly from a pre-defined set of register categories. Rather, we ask users to identify basic situational characteristics of each web document, coded in a hierarchical manner (see below). Those situational characteristics lead to general register categories, which in turn allow users to select a specific sub-register category. By working through a hierarchical decision tree, users are able to identify the register category of most internet texts with a high degree of reliability.

In Section 2 below, we briefly document the methodological procedures used for this project. (Readers are referred to Egbert and Biber 2013 for more detailed discussions.) In Section 3, we introduce the register framework used for our study. In Section 4, then, we describe the overall prevalence of different types of registers on the web and briefly describe and illustrate some of the major web registers identified in the study. Section 5 discusses a more specialised type of register identified by users in this study: ‘hybrid registers’. Finally, in the conclusion we outline our on-going research to extend this methodological approach to a large representative corpus of web documents.

2 Methods

2.1 Corpus for analysis

The corpus used for our study was extracted from the Corpus of Global Web-based English (GloWbE), constructed by Mark Davies (see http://corpus2.byu.edu/glowbe/). The entire corpus contains ca. 1.9 billion words and 1.8 million web pages, collected by using the results of Google searches of highly frequent English 3-grams (e.g., is not the, and from the). The use of n-grams as search engine seeds is an approach that has been used in the past by many WAC scholars (see, e.g., Baroni and Bernardini 2004; Baroni et al. 2009; Sharoff 2005, 2006). Our decision to use 3-grams (rather than 2-grams or 4-grams) was based largely on empirical evidence from the Longman Grammar of Spoken and Written English (Biber et al. 1999). 2-grams are generally collocations that are semantically-based and likely to result in topic-driven Google search results. 4-grams, on the other hand, are much less frequent than 3-grams and were thus not likely to offer us a broad enough sample of n-grams to choose from. To create the actual corpus, the web pages identified through these random searches were downloaded using HTTrack (http://www.httrack.com). Our ultimate goal in this project is to carry out linguistic analyses of internet texts from the range of web registers. To prepare the corpus for such analyses, non-textual material was removed from all web pages (HTML scrubbing and boilerplate removal) using JusText (http://code.google.com/p/justext). Finally, for the present pilot study, we randomly extracted 1,000 web pages from the larger corpus (with URLs from the US, UK, CA, AU, NZ). Roughly 7 % of the web pages in this initial sample were dropped from the register analysis: 33 of the 1,000 web sites in the corpus were no longer available at the time of coding and an additional 36 web pages consisted mostly of photos or graphics. Consequently, the results reported below are based on a corpus of 931 web pages.
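The sample attrition reported above can be checked with a few lines of arithmetic. The figures are those given in the text; the variable names are ours and serve only as illustration.

```python
# Figures reported in the text for the pilot sample.
initial_sample = 1000   # web pages randomly extracted for the pilot study
unavailable = 33        # pages no longer available at the time of coding
mostly_graphics = 36    # pages consisting mostly of photos or graphics

dropped = unavailable + mostly_graphics
retained = initial_sample - dropped
dropped_share = dropped / initial_sample

print(f"dropped: {dropped} pages ({dropped_share:.1%})")
print(f"retained: {retained} pages")
```

The two exclusions sum to 69 pages, i.e. 6.9 % of the initial sample, which matches the “roughly 7 %” stated above and leaves the 931 pages on which the results are based.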


2.2 Overview of procedures

The study described here is part of a larger project, designed to identify the registers found on the web, document the extent to which each of those registers is actually used and ultimately undertake comprehensive linguistic analyses of those register categories as the basis for automatic register and genre identification.

The first step required to reach these goals was to establish a set of register distinctions that end-users actually recognise and can reliably identify. This step turned out to be highly challenging, requiring several rounds of pilot testing with end-users. In the process, we reconsidered our basic approach, developing a decision tree of situational characteristics rather than asking users to directly identify the register category of a given internet text. We discuss these register distinctions, and the development of a web classification tool, in Section 3 below.

Once we had developed this tool, and verified that end-users were able to reliably identify the register distinctions built into the tool, we moved on to the larger pilot study to explore the types and distributions of registers found on the web. We recruited 85 raters (typical end-users of the web) to analyse the 1,000 web pages in our pilot corpus. Raters were recruited through Mechanical Turk, an Amazon-based online crowd-sourcing utility that connects Requesters (people who need small tasks completed by human raters) with Workers (people who are willing to complete those small tasks for money). Each web page was coded by four independent raters, so we were able to analyse the reliability of the coding. We determined that four was the optimal number of raters as a result of several rounds of pilot research. The choice to use 1,000 URLs was based mostly on practicality and the money available to us. While there was consensus on the coding of the majority of pages, this approach also allowed us to identify the existence of ‘hybrid registers’ (see Section 5 below). Finally, we compiled distributional results from the coding, providing the basis for our preliminary description of register variation on the web (Sections 4–5).
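To make the idea of agreement among four independent raters concrete, the following minimal sketch summarises the codes assigned to a single page. The pattern names and thresholds (unanimous, majority, even split, no consensus) are assumptions introduced for illustration only; they are not the criteria applied in the study, and the treatment of hybrid registers is left to Section 5.

```python
from collections import Counter


def summarise_ratings(ratings):
    """Describe how far the raters of one web page agree on its register.

    Returns a (pattern, counts) pair. The pattern names are illustrative
    assumptions, not the study's own criteria.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    counts = Counter(ratings)
    top_votes = max(counts.values())
    if top_votes == len(ratings):
        pattern = "unanimous"
    elif top_votes > len(ratings) / 2:
        pattern = "majority"
    elif sorted(counts.values()) == [2, 2]:
        pattern = "even split"
    else:
        pattern = "no consensus"
    return pattern, dict(counts)


# Hypothetical codes from four raters for one page.
page_pattern, page_counts = summarise_ratings(
    ["narrative", "narrative", "opinion", "narrative"]
)
print(page_pattern, page_counts)
```

Under these assumed thresholds, a page on which the raters divide evenly between two registers would be a natural candidate for closer inspection.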

3 Register categories distinguished in the study

Before undertaking empirical investigation of the registers found on the web, we needed to decide on a set of register categories to be used for the coding. For this purpose, we began with the 78 register/genre categories identified through the wiki-based collaboration of Web-as-Corpus experts (http://www.webgenrewiki.org/; see also the discussion in Rehm et al. 2008). We catalogued the underlying situational characteristics of those 78 categories (e.g., mode, interactivity, communicative purpose; see Biber and Conrad 2009, Chapter 2), and based on that analysis, we developed a framework with the eight general registers shown in Table 1.

Table 1: General web register categories distinguished in the study

A. Internet texts that originated in the spoken mode (e.g., transcripts of speeches or interviews)

B. Internet texts that originated in the written mode
   1. Interactive written internet texts
   2. Non-interactive written internet texts
      2.a. Narratives
      2.b. Informational descriptions or explanations
      2.c. Overt opinions
      2.d. Information presented with the intent to persuade
      2.e. How-to procedures or instructions
      2.f. Lyrical discourse

In our early pilot studies, we asked non-expert users of the internet to categorise web pages by directly identifying the register category of each page. However, this approach proved problematic, in some cases achieving agreement rates below 50 %. As a result, we developed a more bottom-up approach involving a decision tree with basic situational characteristics. At the top level, we asked users to make a 2-way decision about the mode of production:
1. Internet texts that originated in the spoken mode (e.g., transcripts of speeches or interviews)
2. Internet texts that originated in the written mode

Then, for the written texts, we asked users to distinguish between interactive discussions (e.g., discussion forums) versus non-interactive internet texts. Even this simple distinction is often not clear-cut on the web, because authored web documents are often followed by reader comments. We thus made it clear to coders that ‘written interactive discussions’ are distinct from written documents followed by reader comments, and that coders would be able to note the existence of reader comments for non-interactive texts later in the process. These reader comments are common in web documents. While we do not currently have plans to classify documents with reader comments differently than those without comments, coding for their presence makes this a possibility for future analyses.

For the first two general categories above (spoken and interactive written), we immediately asked coders to identify a specific sub-register (see Table 2 below). In both cases, users could select ‘other’ if the page did not fit clearly into one of the existing categories.


For the third general category – written non-interactive internet texts – we asked users to distinguish among general registers based on communicative purpose:
– to narrate or report on EVENTS [past, present, or future]
– to describe or explain INFORMATION
– to express OPINION
– to describe or explain FACTS WITH INTENT TO PERSUADE
– to explain HOW-TO or INSTRUCTIONS
– to express oneself through LYRICS

Then, once a user had selected one of those general categories (2.a.–2.f. in the list above), we asked them to identify the specific sub-register. The full list of general register and specific sub-register distinctions in our framework is listed in Table 2 below.

Table 2: Web registers and sub-registers distinguished in the study

1. Internet texts that originated in the SPOKEN mode
– interview
– formal speech
– transcript of video/audio recording
– TV/movie script
– other (spoken)

2. INTERACTIVE internet texts that originated in the WRITTEN mode
– question/answer forum
– discussion forum
– reader/viewer responses
– other (discussion)

3.–8. Non-interactive internet texts that originated in the written mode

3. NARRATIVES or reports of events [past, present or future]
– news report/blog
– sports report/blog
– personal/diary blog
– historical article
– short story
– novel
– biographical story/history
– magazine article
– memoir
– obituary
– travel blog
– other (narrative)

4. INFORMATIONAL DESCRIPTION or EXPLANATION
– description (place, product, organisation, program, job, etc.)
– description of a person (including celebrity profiles)
– frequently asked questions (FAQ) about information
– encyclopedia article
– abstract
– research article
– course materials
– informational blog
– legal terms and conditions
– technical report
– other (informational)

5. express OPINION
– opinion blog
– review (product, service, movie, etc.)
– advice
– religious blog/sermon
– advertisement
– self-help
– letter to the editor
– other (opinion)

6. describe or explain FACTS WITH INTENT TO PERSUADE
– description with intention to sell
– editorial
– persuasive article or essay
– other (informational persuasion)

7. explain HOW-TO or INSTRUCTIONS
– instructions
– frequently asked questions (FAQ) about how to do something
– how-to
– technical support
– recipe
– other (instructions)

8. express oneself through LYRICS
– poem
– prayer
– song lyrics
– other (lyrical)
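The decision sequence described above (mode of production, then interactivity, then communicative purpose) can be sketched as a small function. This is only an illustrative sketch, not the coding instrument used in the study: the function name `general_register`, the short string labels and the convention of returning the Table 2 register number (1 to 8) are assumptions introduced here.

```python
from typing import Optional

# Communicative purposes offered for non-interactive written texts,
# listed in the order of general registers 3-8 in Table 2.
# The short labels are illustrative, not the wording shown to raters.
PURPOSES = [
    "narrative",
    "informational description",
    "opinion",
    "informational persuasion",
    "how-to",
    "lyrical",
]


def general_register(mode: str,
                     interactive: Optional[bool] = None,
                     purpose: Optional[str] = None) -> int:
    """Walk the three coding decisions and return a register number (1-8)."""
    # Decision 1: mode of production.
    if mode == "spoken":
        return 1
    if mode != "written":
        raise ValueError(f"unknown mode: {mode!r}")
    # Decision 2: interactive discussion vs. non-interactive text.
    if interactive is None:
        raise ValueError("written texts need an interactivity decision")
    if interactive:
        return 2
    # Decision 3: communicative purpose of a non-interactive written text.
    if purpose not in PURPOSES:
        raise ValueError(f"unknown purpose: {purpose!r}")
    return 3 + PURPOSES.index(purpose)


print(general_register("written", interactive=False, purpose="opinion"))  # 5
```

Because a sub-register is chosen only after this path has been fixed, each sub-register in such a scheme is reachable through exactly one general register.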



4 Distribution of registers on the web

Applying the register classification scheme outlined in the last section, we asked 85 raters to code the register characteristics of 1,000 web pages, with each text being coded by four different raters. As noted above, ca. 7 % of the web pages in our initial sample were dropped from the register analysis (pages that were no longer available or consisted mostly of photos or graphics). Thus, the results reported below are based on a corpus of 931 web pages.

As Table 3 shows, at least three raters agreed on the general register category for 62.7 % of the web pages in our corpus. All four raters agreed on the classification of ca. 34 % of the texts, while three of the four raters agreed on the classification of an additional ca. 29 % of the texts. For ca. 11 % of the texts, raters showed a 2-2 split in their classifications. It turned out, though, that many of the specific classifications in these splits occurred repeatedly in the corpus. As a result, we explored the possibility that these common 2-2 splits represent ‘hybrid registers’ on the web. We return to that possibility in Section 5 below.

Table 3: Agreement results for the general register classification of 931 webpages

4 agree    3 agree    2-2 split    2-1-1 split    No agreement    Total
315        269        104          173            70              931
33.8 %     28.9 %     11.1 %       18.6 %         7.6 %           100 %
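The five agreement patterns in Table 3 follow mechanically from the four codes assigned to a page. The following is a minimal sketch, assuming each page's codes are available as a list of four labels; the function name `agreement_pattern` and the example labels are hypothetical and not part of the study's tooling.

```python
from collections import Counter

# Maps the sorted vote counts for a page to the column names of Table 3.
PATTERNS = {
    (4,): "4 agree",
    (3, 1): "3 agree",
    (2, 2): "2-2 split",
    (2, 1, 1): "2-1-1 split",
    (1, 1, 1, 1): "no agreement",
}


def agreement_pattern(codes):
    """Return the Table 3 agreement pattern for one page's four rater codes."""
    if len(codes) != 4:
        raise ValueError("expected exactly four rater codes")
    votes = tuple(sorted(Counter(codes).values(), reverse=True))
    return PATTERNS[votes]


# Toy input, not taken from the corpus.
print(agreement_pattern(["narrative", "narrative", "opinion", "opinion"]))
# 2-2 split
```

With the counts in Table 3, the first two patterns together cover (315 + 269) / 931, i.e. ca. 62.7 % of the pages.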

Table 4 shows that the levels of agreement were somewhat lower for the coding of specific sub-register categories: raters were able to agree on the sub-register for ca. 43 % of the web pages (with 3 or all 4 raters in agreement), while an additional ca. 8 % of these pages were coded with a 2-2 split.

Table 4: Agreement results for the specific sub-register classification of 931 webpages

4 agree    3 agree    2-2 split    2-1-1 split    No agreement    Total
171        231        73           90             366             931
18.3 %     24.8 %     7.8 %        9.8 %          39.3 %          100 %


Taken together, the distributional results from the pilot study show that non-expert web users can, to a large extent, reliably classify web pages into general register categories, and that there is substantial agreement even for specific sub-register categories.

The data obtained from this coding process allow us to begin to explore the content of the web, asking what registers are especially prevalent and which ones are relatively rare. Thus, Table 5 shows the breakdown of general register categories (presented in order of frequency) for all 931 texts in our corpus (see Table 3 above). Table 6 shows the breakdown of specific sub-registers within each of these general register categories.

Table 5: Frequency information for general register categories

General Register                          #      %
Narrative                                 177    19.0
Informational Description/Explanation     140    15.0
Opinion                                   121    13.0
Interactive Discussion                    79     8.5
How-to/Instructional                      27     2.9
Lyrical                                   19     2.0
Informational Persuasion                  15     1.6
Spoken                                    6      0.6
Hybrid (see Section 5)                    277    29.7
No agreement                              70     7.5
Total                                     931    100
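As a cross-check, the counts in Tables 3, 5 and 6 can be reconciled: the pages with three or four raters in agreement should equal the sum of the simple general registers, and the two kinds of split together should make up the hybrid row. The sketch below only re-adds figures printed in the tables (the Opinion count of 121 is taken from Table 6); the variable names are illustrative.

```python
# Counts copied from Table 3 (agreement patterns for general registers).
table3 = {"4 agree": 315, "3 agree": 269, "2-2 split": 104,
          "2-1-1 split": 173, "no agreement": 70}

# Simple general registers (3 or 4 raters in agreement), from Tables 5 and 6.
simple = {"Narrative": 177, "Informational Description/Explanation": 140,
          "Opinion": 121, "Interactive Discussion": 79,
          "How-to/Instructional": 27, "Lyrical": 19,
          "Informational Persuasion": 15, "Spoken": 6}

agreed = table3["4 agree"] + table3["3 agree"]
hybrid = table3["2-2 split"] + table3["2-1-1 split"]

print(sum(simple.values()), agreed)               # 584 584
print(hybrid)                                     # 277
print(agreed + hybrid + table3["no agreement"])   # 931
```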

Table 6: Frequency information for sub-register categories

Register                                  #      %

Narrative                                 177
  News report/blog                        99     55.9
  Sports report/blog                      19     10.7
  Personal/diary blog                     7      4.0
  Historical article                      4      2.3
  Short story                             3      1.7
  Novel                                   2      1.1
  Biographical story/history              1      0.6
  Joke                                    0      0
  Magazine article                        0      0
  Memoir                                  0      0
  Obituary                                0      0
  Other factual narrative                 0      0
  Other fictional narrative               0      0
  Other personal narrative                0      0
  Travel blog                             0      0
  No agreement on sub-register            42     23.7

Informational Description/Explanation     140
  Description of a thing                  34     24.3
  Description of a person                 9      6.4
  Research article                        7      5.0
  Abstract                                5      3.6
  Legal terms and conditions              4      2.9
  FAQ about information                   2      1.4
  Encyclopedia article                    2      1.4
  Informational blog                      2      1.4
  Course materials                        1      0.7
  Technical report                        1      0.7
  No agreement on sub-register            73     52.1

Opinion                                   121
  Opinion blog                            57     47.1
  Review                                  23     19.0
  Advice                                  9      7.4
  Religious blog/sermon                   5      4.1
  Self-help                               1      0.8
  Advertisement                           0      0
  Letter to the editor                    0      0
  No agreement on sub-register            26     21.5

Interactive Discussion                    79
  Question/answer forum                   46     58.2
  Other forum                             7      8.9
  Other discussion                        1      1.3
  Reader/viewer responses                 0      0
  No agreement on sub-register            25     31.6

How-to/Instructional                      27
  How-to                                  13     48.1
  Technical support                       2      7.4
  Recipe                                  1      3.7
  Instructions                            0      0
  FAQ                                     0      0
  No agreement on sub-register            11     40.7

Lyrical                                   19
  Song lyrics                             17     89.5
  Other                                   1      5.2
  Poem                                    0      0
  Prayer                                  0      0
  No agreement on sub-register            1      5.2

Informational Persuasion                  15
  Description with intent to sell         8      53.3
  Persuasive article or essay             2      13.3
  Editorial                               0      0
  Other                                   0      0
  No agreement on sub-register            5      33.3

Spoken                                    6
  Interview                               5      83.3
  Transcript of video/audio               1      16.7
  TV/movie script                         0      0
  No agreement on sub-register            0      0

Based on the data in our pilot corpus, the most common general internet register is Narrative (19 % of the texts in our corpus; see Table 5). Table 6 shows that ca. 65 % of the texts in this general register were classified as either News report/blogs or Sports reports/blogs. Many of these texts are examples of registers found in print media that have simply been transferred to the web. At first we planned to distinguish news/sports blogs, which have their origin on the web, from news/sports reports that have their origin in print media. In practice, though, it proved nearly impossible to determine whether a news/sports report was originally published in a print newspaper or whether it had been written specifically for a web blog. As a result, we treat these reports and blogs as a single category (although it was generally easy for raters to distinguish between news reports/blogs versus sports reports/blogs, based on the topic of the text).

The second most frequent general register is Informational Description/Explanation (15 % of the texts in our corpus; see Table 5). However, as Table 6 shows, raters often failed to agree on the specific sub-register for this general category (52 % of the total texts). In future research, we plan to investigate the possibility of hybrid registers at the sub-register level to better understand the nature of these texts.



Opinion web pages were nearly as common as description pages (see Table 5). Nearly half of these were classified as Opinion blogs (47 %), while another 19 % were classified as Reviews. In general, there was much higher agreement about these sub-register categories of Opinion than there was for the general category of Informational Description/Explanation.

The Interactive Discussion general register was also used relatively frequently, and the majority of these texts were classified as Question/Answer forums. Similar to blogs, these are specialised web registers not found in print media.

The other four general register categories – Lyrical, How-to/Instructional, Informational Persuasion and Spoken – occurred much less frequently than the major categories of Narration, Informational Description/Explanation, Opinion and Interactive Discussion. However, it is clear that these registers each comprise one or two important sub-register categories. For example, the specific sub-registers of song lyrics and spoken interviews were especially prevalent.

While some of these general registers and sub-registers are very similar to traditional print registers (e.g., News reports, Sports reports, Reviews, Research articles, Song lyrics), many of them are unique to the domain of the internet. For example, the sub-registers of Personal/diary blogs and Opinion blogs, as well as the general register of Interactive Discussion, are distinctive to the internet. Furthermore, some of the web registers that appear to be traditional are actually quite different from their printed, non-internet counterparts. This is due to several factors, including the relative ease of ‘publishing’ on the internet and decreased attention to pre-planning and editing common in many internet registers. In future research, we plan to explore these innovative registers in considerably more detail (see Section 6 below).

5 Hybrid registers

At the beginning of Section 4, we noted that many web pages were coded with a 2-2 split. For example, two raters might have coded a given page as a ‘narrative’, while two other raters classified the same page as an ‘informational description/explanation’. One interpretation of these splits is that they simply show a lack of agreement among raters, reflecting a lack of reliability in the register framework. However, the actual distribution of these pairings suggests a different interpretation.

In theory, there are 28 different 2-2 categories that could be formed by combining the 8 general register categories in our framework. So, for example, there are 7 different 2-2 categories that could have been formed by combining ‘narrative’ with one of the other categories (narrative-spoken, narrative-interactive discussion, narrative-informational description, narrative-opinion, narrative-information presented with the intent to persuade, narrative-how-to, narrative-lyrical). Similarly, there are 21 other pairings of general registers that are theoretically possible.
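The counts of 28, 7 and 21 are simply the number of unordered pairs that can be drawn from eight categories. A short enumeration makes this concrete; the list name `REGISTERS` is introduced here for illustration only.

```python
from itertools import combinations

# The eight general registers of the framework (see Tables 1 and 2).
REGISTERS = [
    "Spoken", "Interactive Discussion", "Narrative",
    "Informational Description/Explanation", "Opinion",
    "Informational Persuasion", "How-to/Instructional", "Lyrical",
]

# Every unordered pair of distinct registers is one possible 2-2 category.
pairs = list(combinations(REGISTERS, 2))
with_narrative = [p for p in pairs if "Narrative" in p]

print(len(pairs))                        # 28
print(len(with_narrative))               # 7
print(len(pairs) - len(with_narrative))  # 21
```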

Given this fact, it is surprising that only four combinations of general registers commonly occurred in 2-2 splits (see Table 7): Narrative+Informational Description, Narrative+Opinion, Informational Description+Opinion and Informational Persuasion+Opinion. Other combinations occur in 2-1-1 splits (see Table 8). This restricted set of commonly occurring register combinations suggests an alternative explanation for the lack of agreement among raters: rather than reflecting a problem with the coding rubric, these common 2-2 combinations (and 2-1-1 combinations) can be interpreted as evidence that these texts belong to ‘hybrid’ registers – registers that combine the communicative purposes and other situational characteristics of two or more general registers.

Evidence for this interpretation comes from the fact that these combinations were identified by coders much more often than others. In particular, the frequent hybrid combinations are restricted to four general register categories: Narrative, Informational Description/Explanation, Opinion and Informational Persuasion. These four general register categories are distinguished primarily by their communicative purposes. For example, Table 7 shows that Narrative+Informational Description occurred 43 times, accounting for ca. 41 % of all 2-2 splits. Table 8 shows that Narrative+Description+Other also accounts for ca. 56 % of 2-1-1 splits, further supporting the existence of a hybrid register that combines these purposes.

Table 7: General register 2+2 hybrid combinations

Hybrid Combination (2+2)                                  Count
Narrative + Informational Description/Explanation         43
Narrative + Opinion                                       27
Informational Description/Explanation + Opinion           17
Informational Persuasion + Opinion                        11
Informational Description + Informational Persuasion      6
Informational Description + How-to/Instructional          4
Interactive Discussion + Opinion                          4
Informational Description + Interactive Discussion        3
How-to/Instructional + Opinion                            3
TOTAL                                                     118

Table 8: General register 2+1+1 hybrid combinations

Hybrid Combination (2+1+1)                                Count
Narrative + Description + Opinion                         56
Description + Informational Persuasion + Opinion          40
Narrative + Description + Informational Persuasion        28
Informational Persuasion + Narrative + Opinion            24
Description + How-to/Instructional + Opinion              15
Other combinations                                        10
TOTAL                                                     173
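Tallies like those in Tables 7 and 8 can be produced by reducing each split page to an order-independent combination label and counting how often each label recurs. The sketch below is one hypothetical way to do this; the function name `hybrid_label` and the toy codings are invented for illustration and are not corpus data. As in Table 8, the label for a 2-1-1 split does not record which register received two votes.

```python
from collections import Counter


def hybrid_label(codes):
    """Return a label such as 'Narrative + Opinion' for a 2-2 or 2-1-1 split,
    or None when the page is not a candidate hybrid."""
    votes = Counter(codes)
    pattern = sorted(votes.values(), reverse=True)
    if pattern not in ([2, 2], [2, 1, 1]):
        return None
    # Sort the register names so that rater order does not create
    # distinct labels for the same combination.
    return " + ".join(sorted(votes))


# Toy codings for three pages; these are invented, not corpus data.
pages = [
    ["Narrative", "Opinion", "Narrative", "Opinion"],
    ["Opinion", "Narrative", "Opinion", "Narrative"],
    ["Narrative", "Narrative", "Narrative", "Opinion"],
]
tally = Counter(label for label in map(hybrid_label, pages) if label)
print(tally.most_common())  # [('Narrative + Opinion', 2)]
```

A tally concentrated in a handful of labels, rather than spread thinly over all possible combinations, is the kind of pattern that motivates the hybrid-register interpretation.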

Text Sample 1 illustrates a web page from the Daily Mail with combined Narrative+Informational Description communicative purposes. Two raters coded the sub-register of this text as a news report/blog and two other raters coded it as a description of people. This text occurs online as a single web page (which is still available on the web, despite its dated content). However, the text comprises a series of topics, demarcated only by the use of ALL-CAPS. (The formatting of the 8th paragraph is corrupted in the original version of the page online, since THURSDAY nights and THE fashionable residents seem to begin new topics.) The title of the page (It’s King Tony to see you, ma’am) seemingly relates only to the first of these embedded topics. Such pages are common on the web (and perhaps becoming more common in print media). They have no single topic or communicative purpose, except maybe to present a bunch of information that the author happens to find interesting or amusing. The information in the page is sometimes descriptive and sometimes narrative, resulting in the hybrid nature of such texts.

Text Sample 1: <http://www.dailymail.co.uk/debate/columnists/article-316674/Its-King-Tony-maam.html>

<h> It’s King Tony to see you, ma’am Tony and Cherie Blair arrived at Balmoral last night for their annual get-together with the Queen and the Duke of Edinburgh. The Blairs have spent the summer touring the West Indies, Italy and Greece, hobnobbing with celebrities and world leaders, barely spending a penny of their own money. A Royal tour in all but name. The Windsors spent most of the summer pottering unnoticed around Britain. One can’t help wondering why Her Majesty doesn’t just hand over the key to the castle. BORIS JOHNSON is in big trouble with Commons speaker and former sheet-metal worker Michael ‘Gorbals Mick’ Martin. The Tory MP’s new novel features a Commons Speaker who is a “buttockclenching, fat, tactless, Left-wing Scot who eats the traditional sheet-metal worker’s breakfast of black pudding”. Order! Order!

DON’T be taken in by claims that Tory chairman Liam Fox patched up the row over the warning by Karl Rove --George Bush’s aide – that Michael Howard will never be allowed to meet the President. Rove was “too busy” even to speak to Fox at the Republican convention, let alone sit next to him during Bush’s speech, as was claimed. CHERIE BLAIR’S new job as ambassador for Britain’s 2012 Olympic bid has surprised friends who cannot recall her interest in sport. She is being ‘coached’ by her new spin doctor Jo Gibbons, a former Football Association aide. Gibbons is best friends with Jo Moore, the Labour aide who “coached” the former Transport THURSDAY nights at London disco, Base 1, situated in a basement beneath the Tory Party’s new HQ in Victoria Street, Westminster, are booming. The club has been “adopted” by smart preppy males who work for the Conservatives and pop downstairs for a sweaty session of high-energy dancing once a week. THE fashionable residents of Suffolk resort Walberswick – including film-maker Richard Curtis and his partner Emma Freud, daughter of ex-MP Clement – may be alarmed to learn the least fashionable member of the Cabinet has moved in. Defence Secretary Geoff Hoon, the kind of man who wears knee-length socks with open-toed sandals on his hols, is a new neighbour. Somehow he mingled with them unnoticed at last week’s summer fete. THE death of spin has been greatly exaggerated. Labour HQ has sent out invitations to MPs summoning them to a series of three all-day training sessions on how to ‘spin’ stories to the media.

It is perhaps not surprising that such texts also often include opinionated purposes. (Even Text Sample 1 could be interpreted in that way, although there are few overt lexico-grammatical expressions of stance.) In particular, personal blogs commonly combine narrative and opinionated purposes. For example, Text Sample 2 was coded by two raters as a narrative-personal blog, and by two raters as an opinion blog. A quick read through this text shows both purposes: it begins with a narrative, but it also includes considerable discussion that could be regarded as overt opinion (e.g., my gut is; Here’s one good reason to do that; But I’m already on-side with that argument. It’s time to convince people…; ‘Making the internet happen’ shouldn’t be magic).

Text Sample 2: <http://matthewsheret.com/2011/08/26/time-to-get-out-more/>

<h> Time to get out more So, I’ve been thinking about something else that Laptops and Looms threw up for me. At one point someone -- I think it was Alice Taylor -- remarked that we’re really good at talking about post-digital stuff to one another, but that it’s time to talk to other people. And while many people at the event seemed to think about that in the context of reaching out to manufacturers and discussing new ways of grokking production, my gut is that we should talk more to people totally uninvolved with the whole thing. Here’s one good reason to do that. It was fascinating, hearing what a bunch of people might do if given the opportunity to turn old mills and factories built a hundred and fifty years ago into things that operate in the space between digital interfaces and traditional manufacture. But I’m already on-side with that argument. It’s time to convince people who’ll have to live with those products and live alongside the places that produce them. Here’s another. Russell jokingly mentioned the ‘Google apprenticeship’ as a means of answering some of the questions floating around the room to do with aspiration, but my gut feeling is that you get people engaged with working in companies like Google when you demystify the whole process. ‘Making the internet happen’ shouldn’t be magic that someone else does anymore, it should be something we show off. <h> Find me at <h> Email me

Finally, informational/descriptive texts often incorporate evaluative language, but they are not uniformly regarded as ‘opinionated’. Text Sample 3 presents an extreme case: a business report on a corporation that begins with an explicit disclaimer that the blog represents ‘personal opinions’. However, this text is mostly presented as a simple report of information. It overtly identifies ‘strengths’ and ‘weaknesses’, but the information provided appears to be mostly factual description. Reflecting these combined purposes, two raters coded this text as an opinion blog, one rater coded it as descriptive information and one coded it as a news report/blog.

Text Sample 3: <http://beta.fool.com/leglamp/2012/11/09/get-a-leg-up-on-the-market/16123/>

<h> Get a LEG Up on the Market AnnaLisa is a member of The Motley Fool Blog Network -- entries represent the personal opinions of our bloggers and are not formally edited. Leggett & Platt (NYSE: LEG ) , the diversified bedspring, automotive, and industrial manufacturer, just announced it would pay its dividend early so that shareholders wouldn’t see a big tax on the dividend usually paid out in January. The early Christmas present goes ex-dividend on Dec. 10, with the dividend to be paid out on Dec. 27. Leggett & Platt seems to be one of the first companies to react to an anticipated tax increase on dividends come 2013. This Standard & Poor’s dividend aristocrat is certainly shareholder attentive, but let’s drill down on this company’s strengths, weaknesses, opportunities, and threats. STRENGTHS The company is extremely shareholder friendly, with dividends paid since 1987, and has more than 25 consecutive years of increasing the dividend. An EPS growth rate of 15 %, and a P/E that currently stands at 21.59. The company is diversified across many industries besides their original status as a bedspring company. It also manufactures retail store fixtures and display units, industrial parts (especially for automotive and aviation), and parts for office and residential furniture. Their latest 10-K states the company plans to maintain a 4-5 % growth rate. The company repurchased 10 million shares in 2011. Their latest Q3 earnings release on Oct. 29 beat with EPS rising 45 % over the same quarter a year ago and reflected strong volume and expanding margins. The yield now stands at 4.20 %

WEAKNESSES The payout ratio on the yield is 90 % , very high for a company that is not a REIT or a master limited partnership. Their P/E is higher than the industry average and higher than the 15.63 P/E of competitor Genuine Parts Company (NYSE: GPC ) While they manufacture most of their steel wire in house, steel is their number one raw material and fluctuations in steel prices are a continuing concern, according to their 10-K. Revenue from international operations dropped due to currency fluctuations. […]

Three-way splits, summarised in Table 8 above, suggest that there might be hybrid registers that combine multiple communicative purposes. The most frequent 3-way hybrid is Narrative+Opinion+Description. Text Sample 3 above gives one example of this type. Another example of a 3-way hybrid was coded as a News report/blog (2 raters), a Description of a person (1 rater) and an Opinion blog (1 rater). The title of this text is enough by itself to demonstrate the triad of characteristics recognised by raters: ‘On the road: Bradley Wiggins and Team Sky have made Tour de France history – it’s been emotional’. This text is a blog post that recounts a recent news story (Narrative), describes a team of athletes (Description), and recounts the emotions and attitudes of the author (Opinion).

A different kind of hybrid register is extremely common on the web: pages that present a text followed by reader comments. Table 9 shows that this type of hybrid can occur with any of the non-interactive written registers.¹ However, it is interesting to note that reader comments are much more likely with some registers than others. In particular, pages expressing opinions or persuasion are especially likely to include reader comments: ca. 60 % of opinion pages and 80 % of informational persuasion pages are followed by reader comments.

¹ This option is not applicable to written interactive discussions, which incorporate reader comments by definition. We are not sure why transcribed texts of spoken events are not followed by reader comments in our sub-corpus.


Table 9: Frequency information for texts containing reader comments

Register                     Count    % of register with comments
Narrative                    87       49.1 %
Opinion                      86       61.4 %
Description                  37       30.6 %
Informational Persuasion     12       80.0 %
How-to/Instructional         8        29.6 %
Lyrical                      4        21.1 %
Spoken                       0        0
Discussion                   0        0
Total                        234      --

6 Summary and future directions

The approach for register classification adopted here – a bottom-up hierarchical framework based on underlying situational characteristics – allows us to describe the register characteristics of most web pages. Raters agree on the general register category of ca. 63 % of the web pages included in our corpus (see Table 3 above). Approximately another 25 % of these texts were coded as ‘hybrid’ registers belonging to a few combinations that occur commonly on the web (e.g., Narration + Information Description; Narration + Opinion; see Tables 7 and 8). Taken together, these results indicate that approximately 88 % of web pages can be reliably described for their singular or hybrid register characteristics.

An alternative perspective is to consider the register categories themselves, regarding the extent to which general registers occur in their ‘simple’ state, rather than as hybrids in combination with some other register category. At one extreme, Table 10 shows that interactive discussions (e.g., question-answer forums) and lyrical texts (e.g., songs or poems) usually occur as ‘simple’ registers, with only ca. 30 % of those texts being coded as hybrids in combination with some other register category, a relatively small proportion in comparison with several of the other register categories. At the opposite extreme, Informational Persuasion was almost never identified as the simple register of a web text. However, it was commonly selected by at least one of the raters, suggesting that this communicative priority frequently occurs in hybrid combinations with other general register categories.


Table 10: Extent to which each register category was identified as a simple register (3 or 4 raters in agreement), as a hybrid category (2-2 or 2-1-1 splits), or by only 1 rater

General Register                          3-4 raters     2 raters       1 rater        Total (100 %)
Narrative                                 177 (47 %)     109 (29 %)     91 (24 %)      377
Informational Description/Explanation     140 (30 %)     97 (21 %)      231 (49 %)     468
Opinion                                   121 (50 %)     114 (47 %)     8 (3 %)        243
Interactive Discussion                    79 (69 %)      14 (12 %)      22 (19 %)      115
How-to/Instructional                      27 (33 %)      23 (28 %)      33 (40 %)      83
Lyrical                                   19 (68 %)      3 (11 %)       6 (21 %)       28
Informational Persuasion                  15 (8 %)       38 (21 %)      125 (70 %)     178
Spoken                                    6 (43 %)       8 (57 %)       0 (0 %)        14
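The percentages in Table 10 are row percentages: each count is divided by the total number of texts for which that register was selected by at least one rater. A minimal sketch of that arithmetic, using the counts printed in the table (the names `table10` and `shares` are introduced here for illustration):

```python
# Counts copied from Table 10: (3-4 raters, 2 raters, 1 rater).
table10 = {
    "Narrative": (177, 109, 91),
    "Informational Description/Explanation": (140, 97, 231),
    "Opinion": (121, 114, 8),
    "Interactive Discussion": (79, 14, 22),
    "How-to/Instructional": (27, 23, 33),
    "Lyrical": (19, 3, 6),
    "Informational Persuasion": (15, 38, 125),
    "Spoken": (6, 8, 0),
}


def shares(counts):
    """Return each count as a whole-number percentage of the row total."""
    total = sum(counts)
    return tuple(round(100 * c / total) for c in counts)


for register, counts in table10.items():
    # Prints the register name, the row total and the three row percentages.
    print(register, sum(counts), shares(counts))
```

The first value in each tuple is the proportion of texts for which the register was identified as a ‘simple’ register, which is the figure the discussion of Table 10 relies on.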

Narration, description, exposition and argumentation have long been regarded as core textual distinctions distinguished by their communicative purposes (corresponding to the rhetorical ‘modes’ of discourse; see Connors 1981). In the register framework developed here, we divided these distinctions up in a somewhat different way, based on our survey of the kinds of texts found on the web and our early pilot studies to investigate the distinctions that end-users could reliably make (see Sections 2 and 3 above). Thus, we ended up combining ‘exposition’ and ‘description’ into our category of Informational Description/Explanation, while we split ‘persuasion’ into two categories: Opinion (expressing attitudes with little supporting evidence) and Informational Persuasion (a type of exposition with a clear intent to sell or persuade).

However, our preliminary results, summarised in Table 10, indicate that these general register categories are not equally well-defined for end-users. For example, about half of the texts in our corpus (468 of the 931 texts) were coded as Informational Description/Explanation by at least one rater, suggesting that most texts can be regarded as presenting some kind of description/explanation of information. Texts were also commonly coded as having narrative purposes (377 texts), often in hybrid combinations with other registers.

The results for opinionated/persuasive texts are especially interesting here. On the one hand, the category of simple opinion seems to be relatively well defined: half of the texts classified as such in some way were categorised as simple opinion by 3 or 4 raters. In most other cases, if a text was coded as opinion by two raters, it was coded as narration or description by the other raters. By contrast, the category of Informational Persuasion seems especially problematic: it was almost never identified as the simple register of a text, but there were many instances where one rater noted this communicative priority. Over half of those texts were coded as simple opinion by other raters, suggesting that these two general registers are especially difficult to distinguish. Results like this point to the need for more detailed future research focused on these categories.

In our on-going research, we are applying the framework and analytical approach outlined here to a much larger corpus, with over 50,000 texts randomly sampled from the web. That research effort will allow us to investigate the extent to which the patterns described in Sections 4 and 5 above are typical of the web more generally and to undertake more detailed analysis of specific patterns (especially regarding sub-registers and sub-register hybrids). Beyond that, we plan to analyse the lexico-grammatical characteristics of those texts and eventually undertake predictive research for the purposes of automatic register (genre) identification.

One of the major limitations of the hierarchical approach used for these analyses is that specific sub-registers are restricted to a single general register category on an a priori basis. For example, sports blogs are listed only as a sub-register of Narrative; reviews are listed only as a specific sub-register of Opinion; editorials are listed only as a specific sub-register of Informational Persuasion. This approach was motivated by two considerations: 1) previous research had indicated that end-users become overwhelmed when they are required to directly choose from a massive list of specific sub-registers and 2) we therefore believed that general register categories – isolating specific situational characteristics – would be easier to identify than specific sub-registers. However, review of our findings here suggests the need to further explore these decisions.

As a result, we also plan to explore the possibility that some sub-register distinctions might be easier to directly identify than general register distinctions. For example, a particular text might be a clear instance of a sports blog. However, given the design of our coding framework at present, an end-user might never be given the chance to make that simple classification. For example, if a user decided that a text was primarily opinionated rather than narrative, there would be no possibility of subsequently identifying the text as a ‘sports blog’ (see Table 2 above).

To explore this possibility, we plan to recode a set of web pages from our corpus, asking users to directly choose a specific sub-register category. Then, the results of the hierarchical coding will be compared to the results of the direct sub-register coding for those texts. Our expectation is that the two approaches will uncover complementary patterns. For example, we expect to find some texts that clearly belong to a single specific sub-register but combine multiple general registers (e.g., a sports blog with both narrative and opinionated purposes). We also expect to find some common hybrid sub-register categories that bridge general registers (e.g., a personal blog + opinion blog hybrid; or an editorial + review hybrid). We would not argue that one or the other of these approaches is correct, but taken together, our hope is that we will be able to offer a more comprehensive description of the incredible range of register variation found on the web.
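One way such a comparison could be operationalised is sketched below. This is an illustration only, not the planned procedure: it assumes that each text is coded by several coders in both passes, that a label counts as "clear" when at least 75% of coders choose it, and that the toy data are invented.

```python
from collections import Counter


def agreed_label(labels, threshold=0.75):
    """Return the label chosen by at least `threshold` of the coders, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= threshold else None


def compare_codings(hierarchical, direct, threshold=0.75):
    """Classify each text by whether coders agree in the two coding passes.

    hierarchical: dict mapping text id -> general registers chosen by coders
    direct:       dict mapping text id -> sub-registers chosen by coders
    """
    outcome = {}
    for text_id in hierarchical.keys() & direct.keys():
        general = agreed_label(hierarchical[text_id], threshold)
        specific = agreed_label(direct[text_id], threshold)
        if specific and not general:
            outcome[text_id] = "clear sub-register, mixed general register"
        elif general and not specific:
            outcome[text_id] = "clear general register, mixed sub-register"
        elif general and specific:
            outcome[text_id] = "clear on both codings"
        else:
            outcome[text_id] = "mixed on both codings"
    return outcome


# Invented toy data: four hypothetical coders per text.
hierarchical = {
    "t1": ["Narrative", "Opinion", "Narrative", "Opinion"],
    "t2": ["Opinion", "Opinion", "Opinion", "Opinion"],
}
direct = {
    "t1": ["sports blog", "sports blog", "sports blog", "sports blog"],
    "t2": ["review", "editorial", "review", "editorial"],
}
result = compare_codings(hierarchical, direct)
```

In this toy example, text `t1` corresponds to the first expected pattern (a single specific sub-register combining several general registers), while `t2` is a candidate for a hybrid sub-register category.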

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant No. 1147581. We also thank Anna Gates and Rahel Oppliger for their help with the pilot testing of register classification schemes.

References

Baroni, Marco & Silvia Bernardini. 2004. BootCaT: Bootstrapping corpora and terms from the web. Proceedings of LREC 2004, 1313–1316. Lisbon: ELDA.

Baroni, Marco, Silvia Bernardini, Adriano Ferraresi & Eros Zanchetta. 2009. The WaCky wide web: A collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation 43(3). 209–226.

Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge University Press.

Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad & Edward Finegan. 1999. Longman grammar of spoken and written English. London: Longman.

Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge University Press.

Connors, Robert J. 1981. The rise and fall of the modes of discourse. College Composition and Communication 32(4). 444–455.

Crowston, Kevin, Barbara Kwaśnik & Joseph Rubleske. 2010. Problems in the use-centered development of a taxonomy of web genres. In Alexander Mehler, Serge Sharoff & Marina Santini (eds.), Genres on the Web: Computational models and empirical studies, 69–84. New York: Springer.

Egbert, Jesse & Douglas Biber. 2013. Developing a user-based method of web register classification. In Stefan Evert, Egon Stemle & Paul Rayson (eds.), Proceedings of the 8th Web as Corpus Workshop (WAC-8) @Corpus Linguistics 2013, 16–23.

Fletcher, William H. 2012. Corpus analysis of the World Wide Web. In Carol A. Chapelle (ed.), Encyclopedia of applied linguistics, 1339–1347. Hoboken, NJ: Wiley-Blackwell.

Kilgarriff, Adam & Gregory Grefenstette. 2003. Introduction to the special issue on the Web as Corpus. Computational Linguistics 29. 333–347.


Lindemann, Christoph & Lars Littig. 2010. Classification of Web sites at super-genre level. In Alexander Mehler, Serge Sharoff & Marina Santini (eds.), Genres on the Web: Computational models and empirical studies, 211–235. New York: Springer.

Rehm, Georg, Marina Santini, Alexander Mehler, Pavel Braslavski, Rüdiger Gleim, Andrea Stubbe, Svetlana Symonenko, Mirko Tavosanis & Vedrana Vidulin. 2008. Towards a reference corpus of Web genres for the evaluation of genre identification systems. In Proceedings of the 6th Language Resources and Evaluation Conference, 351–358, Marrakech, Morocco.

Rosso, Mark A. & Stephanie W. Haas. 2010. Identification of Web genres by user warrant. In Alexander Mehler, Serge Sharoff & Marina Santini (eds.), Genres on the Web: Computational models and empirical studies, 47–68. New York: Springer.

Santini, Marina. 2007. Characterizing genres of Web pages: Genre hybridism and individualization. In Proceedings of the 40th Hawaii International Conference on System Sciences (HICSS-40). Hawaii.

Santini, Marina. 2008. Zero, single, or multi? Genre of Web pages through the users’ perspective. Information Processing and Management 44(2). 702–737.

Santini, Marina & Serge Sharoff. 2009. Web genre benchmark under construction. Journal for Language Technology and Computational Linguistics 25(1). 125–141.

Santini, Marina. 2010. Cross-testing a genre classification model for the Web. In Alexander Mehler, Serge Sharoff & Marina Santini (eds.), Genres on the Web: Computational models and empirical studies, 87–127. New York: Springer.

Sharoff, Serge. 2005. Creating general-purpose corpora using automated search engine queries. In Marco Baroni & Silvia Bernardini (eds.), WaCky! Working papers on the Web as Corpus, 63–98. Bologna: Gedit.

Sharoff, Serge. 2006. Open-source corpora: Using the net to fish for linguistic data. International Journal of Corpus Linguistics 11(4). 435–462.

Sharoff, Serge, Zhili Wu & Katja Markert. 2010. The Web library of Babel: Evaluating genre collections. In Proceedings of the Seventh Language Resources and Evaluation Conference, LREC 2010. Malta.

Vidulin, Vedrana, Mitja Luštrek & Matjaž Gams. 2009. Multi-label approaches to Web genre identification. Journal for Language Technology and Computational Linguistics 24(1). 97–114.

Heidrun Dorgeloh

The interrelationship of register and genre in medical discourse

Abstract: This chapter is concerned with medical discourse which is produced beyond the established roles of doctors and patients. The text varieties investigated are all somewhat hybrid, either in form, discourse function, or both. A study based on a small corpus of these texts investigates the presence of features from a narrative discourse mode and finds variable relationships of textual form and textual function, which are then discussed from a genre as well as from a register perspective. While it turns out that the presence of a narrative register crosscuts over specific discourse activities, the genre perspective can explain the nature of this textual variation. It accounts for the pervasiveness of linguistic features but, more importantly, for the variant discourse functions which apply to the verbalisation of medical experience. In such cases, it is argued, a genre analysis logically subsumes and pre-determines a register analysis.

1 Introduction

Medicine uses a variety of texts since it is both an “area of knowledge […] and the applied practice of that knowledge to medical praxis” (Gotti and Salager-Meyer 2006: 9). Accordingly, most linguistic research on medical discourse focuses either on written genres of the medical profession, such as case reports or medical research articles, or on the speech of medical practitioners and their patients, i.e. on medical encounters or interviews. By contrast, the present study is concerned with text varieties in medicine which are produced beyond the established roles of both speaker groups. It deals with illness blogs, on the one hand, and medical case presentations, including some innovative forms, on the other. These constitute, in line with the purpose of the present volume (cf. Schubert, this volume), less established and more hybrid forms of medical case writing and thus provide good cases in point for illustrating new directions in register research. In particular, I will argue for a close interrelationship between register and genre as well as for a primacy of the notion of genre, rather than (sub-)register.

Heidrun Dorgeloh, Düsseldorf University


As laid down in the introduction to this volume, register and genre are different perspectives for analysing text variety: the register perspective considers functional correlations of linguistic co-occurrence patterns with variables from the situation of use while the genre perspective refers to properties of entire texts and has a conventional basis (Biber and Conrad 2009: 15; also Schubert, this volume). It results from this distinction that a register analysis rests upon quantitative co-occurrence patterns in a given situation whereas genre characteristics can actually be quite rare. They contribute to the rhetorical organisation of a text, often occurring only once or in a particular position (Biber and Conrad 2009: 16).

Since textual variation can in principle refer to any level of text classification (Biber 2006: 12), other approaches to register and genre point out that the concepts also differ in the level of generality at which they determine situational varieties (Giltrow 2010; Dorgeloh and Wanner 2010). The concept of a genre focuses primarily on the discourse goals and purposes (e.g. Martin and Rose 2003; Swales 2004), on the kind of “social action” (Miller 1984); therefore the classification is typically more specific for genres than for registers (Giltrow 2010: 30). More specialised text varieties are also referred to as “sub-registers” (cf. Biber and Gray 2013), but genre studies have emphasised that the textual or social event is an important basis for text classification, thus subsuming in one category a co-patterning of setting, structure, and function (Richards and Schmidt 2002: 224). I will argue here that for text varieties of medical discourse, which are often marked by “discourse hybridity” (Sarangi and Roberts 1999; Sarangi 2001; also cf. Biber and Egbert, this volume), a genre perspective in line with these approaches covers the relevant linguistic patterns at a sufficient level of specificity. In particular, I will show that the form-function correlations that one finds have more to do with activity types, such as covered by the concept of genre, than with general situational parameters.

The case studies presented below contrast with more recently developing medical genres. The aim of the analysis is to show that, on the one hand, there are general discourse goals and purposes within medical discourse, notably narration, which crosscut over all the texts investigated. The resulting language variation is covered by the register perspective, since it defines a rather general, presumably universal, register pattern (Biber and Conrad 2009: 259). On the other hand, this pattern serves in a given genre more specific discourse goals, which are expressed by features which need not be frequent or pervasive. For example, the interactional hybridity of a medical encounter includes a narrative discourse type, but this type is embedded within a more complex social event, in which a doctor fulfils several tasks such as data gathering, relationship building, and educating the patient about diagnoses and treatment (Frankel 2000: 85; also Maseide 2003). This variation within one activity produces more hybrid registers. In such cases, the genre perspective has clear advantages over the register perspective, since it focuses on the social activities going on and hence provides text classification at a rather low level of generality. However, this means that the concept of genre must be taken beyond the limits of rhetorical conventions.

The chapter is structured as follows: in Section 2, I offer a more detailed consideration of the concepts of register and genre as categories for text classification from a theoretical point of view. Section 3 introduces three varieties of medical discourse: on the one hand, it describes how they are situated with regard to a general narrative dimension of textual variation (level of form); on the other, the texts are discussed as instantiating different genres (level of discourse function and social activity). The resulting profiles of the three functional varieties show that the sample texts investigated are all hybrid in either form, function or both. This complex picture is typical for the domain of medicine, and it can be best understood from the genre perspective. Based on these profiles, an analysis of characteristic form-function-relations within the medical register, in particular with regard to narrative features, is provided in Section 4, followed by a concluding discussion in Section 5.

2 Some theoretical issues on register and genre

This section will discuss the concepts and positions relevant for the analysis of the medical text varieties in Sections 3 and 4.

2.1 Register and genre in the context of the study of language variation¹

Language variation is conditioned by a variety of social and pragmatic factors. When studied by way of quantitative, corpus-based methodology, there are in principle two research goals that can be pursued: the first is “to describe the variants and use of a word or linguistic structure” and the second “to describe differences among texts and text varieties, such as registers […]” (Biber 2012: 12). While the former approach is variationist in nature, i.e. it presupposes the existence of “formal alternatives which can be considered optional variants, in the sense that they are nearly equivalent in meaning” (Biber et al. 1999: 14), register variation in principle also involves “different ways of saying different things” (Halliday 1978: 35; emphasis added). As a result, the study of textual variation deals with “variation in verbalization [which] is not occasional [… but] UBIQUITOUS” (Croft 2010: 10; emphasis in the original).

1 Cf. also the introduction to Dorgeloh and Wanner (2010).

This difference allows for some insights regarding the nature of both registers and genres. Rosenbach (2002: 77) proposes the attribute “choice-based” for this type of linguistic variation, in contrast to the “variation-based” perspective, which concentrates on sets of formal variants. The study presented here, and in fact the entire volume, belongs to the choice-based, “text-linguistic” tradition (Biber 2012: 12), which means that the texts themselves are the target of the description and not a predictor for the occurrence of formal variants.² It results from this approach that register and genre differences are typically “not categorical (such that one variety has a certain grammatical element or syntactic construction which another has not)” (Kortmann 2006: 603); instead, the choices motivated and reflected by them are “meaningful choices”, in the sense of serving “the […] needs of the language user” (Schulze 1998: 7). As shown below, this applies not only to the occurrence of individual linguistic features, but also to entire patterns of textual form, which can be shared by what are nonetheless distinct text varieties.

Another consequence of the “polyvalent” nature of “grammatical structure in discourse” (Sankoff 1988: 141, emphasis in the original) is that genres, but not registers, are in principle formally “underdetermined” (Giltrow and Stein 2009: 3). Only by virtue of their being “typified responses to situations” (Salmon 2010: 219) do users of a genre generally know what to expect and infer “both the stable and variable aspects of form” (Salmon 2010: 223). For the linguistic variation taking place within them this means that the genre perspective includes both frequently occurring features as well as patterns that occur less pervasively; i.e. the genre perspective logically subsumes, rather than opposes, the register perspective.

2.2 Genre in relation to register and discourse type

Textual variation is “normal in individuals’ linguistic performance” (Honeybone 2011: 167): speakers show “shifts in usage levels” for features associated with the situation, i.e. they switch into specific registers, but they also switch “into and out of genres” (Schilling-Estes 2002: 375). While a register is “associated with a particular situation of use” (Biber and Conrad 2009: 6), the concept of genre focuses primarily on the discourse goals and purposes, including “culturally recognized” patterns (Coupland 2007: 15) for realising them. As a result, the level of genre classification tends to be lower, i.e. more specific, suggesting that genres can, and typically do, contrast in registers, for example when requiring a certain level of formality or technicality. Use of a certain register is therefore a function of, but not a sufficient condition for, a genre, i.e. the genre perspective is the more encompassing one.

2 A detailed account of the distinction can be found in Biber (2012).

In the text-linguistic tradition, discourse goals and purposes have also led to the establishment of text typologies, which often integrate basic rhetorical types (e.g. Kinneavy 1971; Werlich 1976). The text or discourse type here refers to entire texts; but this tradition is still rather separate from genre analysis, if only due to the fact that they “feature in different studies” (Virtanen 2010: 55). By contrast, corpus-linguistic work (e.g. Biber 1988, 1989) understands text types as “co-occurrence variables” (Eckert and Rickford 2001: 5), i.e. these text types are, much like registers, the outcome of a classification based on linguistic form (Biber 1988: 170). It is a central insight from this corpus-based tradition that genre distinctions do not “adequately represent the underlying text types” (Biber 1989: 6). This finding is further support for the position that genres are to a certain extent underdetermined by, and hence independent of, their form.

The category of discourse type, in contrast to text type, refers more directly to the function of a discourse (Virtanen 2010: 57), but, in contrast to the discourse goal pertaining to a genre, this has traditionally meant a discourse classification based on a limited set of functions; for instance, on a classification of illocutions (e.g. Brinker 2005). It is an important insight from this kind of work that the functional discourse types are related in different ways to their linguistic form, since a discourse type can express its function more or less directly (Virtanen 1992a, 2010). Narrative structures, in particular, have been noted to have primary or secondary uses, i.e. they are a textual pattern that “can be put to use in very different genres” (Virtanen 2010: 76).³

The analysis of medical texts presented here rests upon such a principled separation of linguistic form, i.e. register features and text structure, and discourse function. A classification by discourse function leads, at a more general level, to the identification of the discourse type; at a more specific level, it results in genres. The analysis is also based on the assumption that the category of “narrative” refers both to a very basic and presumably universal register and text type (Virtanen 1992a; Biber and Conrad 2009) as well as to a widely used discourse type or meta-genre (Fludernik 1996; Smith 2003). In the domain of medicine, both narrative form and function play a prominent role, since knowledge in this discipline is not just expertise, i.e. “relevant biological and pathological information”, but is primarily evidence based on human experience (Hunter 1991: 8).

3 Werner (this volume), for example, notes the narrative properties of online text commentaries.

It is interesting to note in this context that recent discussions on medical discourse have argued quite explicitly in favour of a more “narrative” kind of medicine (e.g. Charon 2006), emphasising the importance of the individual patient and his or her experience. As a result, there are now genres within the medical register which are innovative particularly with respect to the role of narration. While proper storytelling is absent in professional medical reporting, there are now other types of medical discourse which are more open to narration. This difference, however, does not primarily manifest itself in a more or less extensive use of narrative features. Looking at three different genres from the medical register in this study, I therefore hypothesise here that 1) a narrative discourse function correlates only insufficiently with a narrative form, and that 2) a discourse purpose other than narration does not necessarily result from the absence of narrative form. This in turn suggests that the function or goal of a discourse is not primarily something to be observed in the form of frequencies of occurrence. On a more theoretical level, these findings will lead me to the claim that, with respect to the specific discourse goals and purposes typical of the context of medicine, the target of the description should be the genre, rather than the register.

3 Types of medical discourse

3.1 Sources and voices in medicine

The instances of medical discourse which I will cover in my analysis come from three different sources: illness blogs written by patients, case reports written by doctors, and texts from a special section termed “Clinical Crossroads” of The Journal of the American Medical Association (JAMA). Each of these text varieties is characterised more closely in Sections 3.2 to 3.4. Before discussing these genre profiles, I will first comment on the general nature of the relation between their situational characteristics, in particular the discourse function, and their linguistic form.

The three text varieties represent discourse with different perspectives on the topic of disease or illness; i.e. the medical topic is the only situational variable which they share. The texts differ, not only in the different speaker roles of doctor and patient, but, more specifically, in that these groups of authors assume, by different ways of speaking throughout their own discourse, different “voices” (Mishler 1984: 103). In the professional medical discourse “of disease” (Fleischman 2001: 475), such as in case reports, doctors primarily use the voice of medicine; however, they also have a doctor’s voice when they occur in the discourse as a participant, for example, when concerned with “information about the patient’s current health condition, […] patient compliance, and […] test results” (Murawska 2012: 71). Patients, by contrast, have primarily a voice of health-related storytelling, but over time they also develop a medical competence of their own (Cordella 2004: 119). At some point, diagnosis and further treatment become a collaborative effort, which is when patients also use elements of a voice of medicine. The interactional hybridity of medical discourse referred to above is thus primarily a hybridity of voices and it is one of the central variables that guide linguistic variation across all medical text varieties.

By contrast, illness blogs, professional case reports and the discourse jointly produced by doctors and patients for “Clinical Crossroads” (for details, cf. Section 3.4) differ in a variety of other situational variables, especially those pertaining to production circumstances and setting (cf. Biber and Conrad 2009: 40). The text varieties under investigation are therefore not easily subsumed as one single register. However, instead of taking up a principled position about where a register ends, and a new (sub-)register starts, the analysis below rests upon two observations. On the one hand, the verbalisation of a disease or illness leads to a concern with medical case histories, which cuts across general communicative purposes, such as to narrate or to report (cf. Biber and Conrad 2009: 40). Linguistically, this is marked by a pervasive presence of linguistic features such as “past tense, communication verbs, third person pronouns, and time adverbials”, i.e. the characteristic features of a narrative dimension of linguistic variation (Biber and Conrad 2009: 259). It is with regard to these features, which arise out of the topic of illness, that the texts share the same register.
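To illustrate how the pervasiveness of these four feature groups might be quantified, the following is a rough sketch only. It is not the procedure used in this study: the word lists are small, hypothetical samples, and past tense is approximated crudely by an "-ed" suffix plus a few irregular forms, where a real analysis would rely on a part-of-speech tagger.

```python
import re

# Hypothetical, deliberately small word lists for illustration only.
THIRD_PERSON = {"he", "she", "it", "they", "him", "her", "them",
                "his", "hers", "its", "their", "theirs"}
COMMUNICATION_VERBS = {"say", "says", "said", "tell", "tells", "told",
                       "report", "reports", "reported",
                       "complain", "complains", "complained"}
TIME_ADVERBIALS = {"then", "now", "today", "yesterday", "later",
                   "suddenly", "finally", "often"}
IRREGULAR_PAST = {"was", "were", "had", "went", "got", "took", "told",
                  "said", "gave", "felt", "began", "did"}


def tokenize(text):
    """Lower-case the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def narrative_counts(text):
    """Raw counts of the four narrative feature groups plus the token total."""
    tokens = tokenize(text)
    return {
        "tokens": len(tokens),
        # Crude heuristic: over-counts adjectives and nouns ending in -ed.
        "past_tense": sum(t in IRREGULAR_PAST or t.endswith("ed")
                          for t in tokens),
        "communication_verbs": sum(t in COMMUNICATION_VERBS for t in tokens),
        "third_person_pronouns": sum(t in THIRD_PERSON for t in tokens),
        "time_adverbials": sum(t in TIME_ADVERBIALS for t in tokens),
    }


def per_thousand(counts):
    """Normalise feature counts to a rate per 1,000 tokens."""
    n = counts["tokens"]
    return {k: (1000 * v / n if n else 0.0)
            for k, v in counts.items() if k != "tokens"}


counts = narrative_counts("She reported that she was ill. Then he told them.")
rates = per_thousand(counts)
```

Normalised rates of this kind make texts of different length comparable, but, as argued in this chapter, they capture pervasive register features only and say little about discourse function.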

On the other hand, although there are recognisably different discourse goals involved in the verbalisation of a case history, the difference between “private” and “public” medicine has always been gradual, as the evolution of medical research writing has also shown (Atkinson 1992: 361–363). While professional medicine has long drifted away from the “rhetoric of immediate experience” (Atkinson 1992: 359), and while published case reports are professional and public, only illness blogs constitute real narratives of personal experience. However, nowadays, with the movement towards a narrative medicine, there are also professional texts which aim at being more “patient-focused” again (Winker 2006: 2888).

Genre categories grasp this mixing of purposes and voices present in such developments, not only due to the level of specificity they refer to, but also because genres are often formally underdetermined and may therefore be composed of hybrid form. This is illustrated in Figure 1, which shows the three text varieties as three different genres, with distinctly different discourse goals and purposes, as the discussion has just shown. On the level of the general communicative purpose, i.e. at a high level of generality, these discourse functions can be described as being narrative, non-narrative, or hybrid. This categorisation links up the genre classification to register variation, because the narrative as discourse mode (Georgakopoulou and Goutsos 2004: 43–47) is an important aspect of the register in all three cases. As the analysis below will illustrate in detail, the narrativisation of the events (Georgakopoulou and Goutsos 2004: 43) which have to do with the course of an illness is a major source of hybrid form across the three text varieties and therefore explains some pervasive register features. Before turning to the linguistic features and their interrelationship with the genre category in Section 4, the next three subsections will introduce each text variety and the sample texts used in more detail.

Figure 1: Narrative form and function in medical text varieties (patients’ tale: hybrid form and narrative function; medical case report: hybrid form and non-narrative function; Clinical Crossroads in JAMA: hybrid form and hybrid function)

3.2 Illness blogs: The patients’ tale

Medical topics are among the ubiquitous contents on the internet (Döring 2003: 19). When patients tell their stories on the web, i.e. when they produce narratives of illness (cf. McCullough 1989: 124), this constitutes, not “a solitary occupation”, but one which is shaped by the context of “the community of web users” (Page 2012: 45). Patients’ tales in illness blogs are thus more interactive than when elicited in medical interviews, and they establish a particularly strong relation to the audience: “the primary function of the comments on the […] blogs is to provide or seek support in the form of shared experience, advice, and encouragement” (Page 2012: 45).

From the point of view of this interactive function, illness blogs qualify as patients’ tales, i.e. proper stories, but not in the first place from a structural point of view. Narrative discourse, in essence, “attempts to sweep narrator and audience into a community of rapport”, i.e. the aim is to move, rather than to inform (Georgakopoulou and Goutsos 2004: 53; also Tannen 1989). This means that, although patients’ tales typically employ a “narrative syntax” (Labov 1997: 3), they show the narrative mode primarily due to the “function of personal interest” (Labov and Waletzky 1967: 13; emphasis added). This function rests upon the sharing of the individual experience of illness (Dorgeloh 2012: 263) and distinguishes a patient’s tale, as any other kind of story, from a report, which “is most typically elicited by the recipient […] or in response to circumstances which require an accounting of what went on” (Polanyi 1985: 10–11).

The examples of the variety of illness blogs come from a website where patients share their stories about a rare neurological disease [SPS: The Real Stories⁴]. Note that, as its title suggests, the website focuses primarily on the publication of the stories, and not, as other types of illness blogs, on the discussion and commenting of postings on illness (cf. Page 2012). As sample (1) illustrates, the typical structure is that the patients introduce themselves and then turn to the chronology of the events:

(1) Hi my name is Ann. I was officially diagnosed in Sept of last year. I have had symptoms for the past several years that got worse as the years went on. I was exercising and swimming three times a week and then I started getting more muscle cramps. I went to the doctor and he just told me to take calcium and magnesium and drink more water. It took him a long time to understand that the muscle cramp were extremely painful happening several time a day. I would have abdominal muscle cramps that felt like i was in full-blown labor. They would come on suddenly when I was startled or when I coughed. They would ease up for a few seconds and then just get worse again. Several times my feet and hands would cramp up until they were fully distorted. I did go to a neurologist who seemed to have an idea of what I had but made no effort to diagnosis what I had. He told me that it would not do any good to try to diagnosis my disease and instead gave me all kinds of different pills and most of them did not work well and also caused several side effects. Often when I went to see him I did not feel like he even remembered me. I did finally request a new doctor, which has been a Godsend to me and now is treating me with IVIG, which is working well. My symptoms still get worse at times but they are manageable. I am eager to talk to people that have the same syndrome. Most people do not understand the pain and all the other symptoms. I found your web site today and am eager to learn more. (http://www.stiffpersonsyndrome.net, accessed March 17, 2011)

4 http://www.stiffpersonsyndrome.net; last accessed on March 30, 2015.

The proper narrative contained in (1) ends when the course of the events reaches its most recent state. This description of the current situation (My symptoms still get worse at times but they are manageable) serves as a coda and is followed by an explicit mention of the story point. This point relates to the ill person him- or herself, as in (1), or it centres on the social function of the blog by addressing the readers’ interests, as in (2) and (3):

(2) If in any way I can contribute to bringing awareness to this insidious disease I throw in my hat. (Wendy’s story; http://www.stiffpersonsyndrome.net)

(3) I must tell you that neither my wife nor myself ever gave up hope, In fact just the opposite. We were very pro active in the treatment of our diseases. […] My prayer is for all of you to see your journey through SMS with the knowledge that there is hope for all. Stay the course, keep the faith, and fight on. (John’s story; http://www.stiffpersonsyndrome.net)

The story point expressed in (3) shows that the verbalisation of the experience of illness has a strong component of self-reflection and evaluation. Many illness blogs have such properties of “reflective anecdotes” (Page 2012: 58–59) and in that tend towards less purely narrative text forms. It is highly typical that, instead of the completeness of the recount and the degree of detail which one can expect of more trivial narration (Georgakopoulou and Goutsos 2000: 125), patients’ tales often limit themselves to “remarkable event[s], characterized by an evaluative punch line” (Page 2012: 59). As was illustrated in Figure 1, a patient’s tale therefore possesses hybridity in its narrative form, since it limits the experience which is shared to the main points of interest.

3.3 The medical case report

Case presentations in the form of published case reports are used by medical professionals “to communicate the salient details of patient cases to one another” (Schryer et al. 2003: 63; also Hurwitz 2006: 217), which means that the texts pursue a predominantly professional discourse goal. On a more general level, the discourse function is thus to inform, i.e. state “verifiable events”, rather than to move. This function contrasts with the point of personal interest which applies to proper storytelling, which is why the discourse mode in case reports is essentially non-narrative (cf. Georgakopoulou and Goutsos 2004: 53).

The central component of a case report is the case presentation itself. It begins “ritualistically with a brief account of a patient’s complaint as translated by the doctor” (Hurwitz 2006: 234; emphasis added), followed by an account of the examinations, findings, diagnosis and suggestions for treatment. Text (4) exemplifies such an initial case presentation, referring to the same disease as text (1):

(4) A 27-year-old Hispanic woman presented to the University Medical Center Emergency Department in Las Vegas, Nevada with a sudden onset of shortness of breath and increased difficulty in moving her right arm. She reported that during the evening prior to her presentation, she was lying down when she began to experience shortness of breath with worsening right-arm weakness. She also reported that for the past two months her arm weakness was characterized as having limited strength and range of motion. She also complained of chest pains that were localized behind her sternum. The pain was characterized as a pressure sensation that was non-radiating. She did not have any aggravating or relieving factors. Pertinent positive findings included nausea, palpitations and lightheadedness. Pertinent negative symptoms included no loss of consciousness, headache, vomiting, diarrhea, or vertigo. (Journal of Medical Case Reports 4, 2010)

It has been noted that case reports published in journals “reorganize clinical data using a variety of narrativising techniques” (Hurwitz 2006: 217; also Hunter 1991). However, as one can see in (4), from a narratological viewpoint this is only a “degree-zero” narrativity (Fludernik 1996: 358); i.e. although a sequence of events is verbalised, it is “translated” by a medical professional. The result is a discourse which deals with a disease, i.e. which foregrounds the medical facts and assigns “the sufferer […] the experiencer role” (Fleischman 2001: 476). In such a text, the chronology lacks “experientiality” as the central component of narrativity (Fludernik 1996) and is therefore only a hybrid narrative form.

3.4 ‘Clinical Crossroads’ in JAMA

In 1995, JAMA launched the publication of various types of medical discourse within a section titled “Clinical Crossroads”. The contributions in this section follow the organisation of a “Grand Round” in clinical departments, where case presentations are given from various perspectives. These case presentations are later edited and published in the journal. The full process is described as follows (cf. also Dorgeloh 2014):

54 Heidrun Dorgeloh

The Grand Round begins with the case history of a patient and that patient’s firsthand account of the medical decision he or she faced, occasionally along with the patient’s primary care physician’s perspective. These accounts are followed by questions for the Grand Rounds discussant, which the discussant, usually a well-recognized authority on the clinical topic, addresses based on available evidence in the literature, and, where no evidence exists, clinical experience. Following the presentation, the discussant drafts the manuscript for submission to JAMA, including the case description, the patient’s perspective, the discussion (including references and pertinent tables and figures), and the question-and-answer session that occurred at the end of the Grand Rounds. The manuscript then undergoes editorial evaluation, external peer review, and revision. If the manuscript is revised satisfactorily and determined to have a level of quality appropriate for JAMA, the manuscript is accepted and published in JAMA and usually is featured in Clinician’s Corner. (Winker 2006: 2888)

The idea behind this more innovative medical text variety is to approach a case from various perspectives, including that of the patient. The purpose is not only to offer and exchange information, but to improve medical decisions, which is to be achieved by “aligning the goal of the patient and physician” (Winker 2006: 2888). Since its foundation, the section has been re-structured several times, but the core idea, a joint context for doctors and patients, who contribute different perspectives, has essentially remained unchanged. (5) and (6) are text samples in which a patient and a doctor present the same case:

(5) After I had bladder surgery […], my doctor told me, “I have good news and bad news and good news; it’s not bladder cancer, but the bad news is that it’s something else.” I accepted the complete hysterectomy, which at my age was not disturbing news. But in terms of the treatment and how it was going to affect me, the thing that worried me most was that I kept hearing about nausea, exhaustion, and that I wouldn’t be able to do things. As a result of that, I canceled my teaching for that fall. I remembered being very anxious the first day of chemotherapy because I just didn’t know what to expect. I decided to do the intraperitoneal chemotherapy because it made spatial logic to me. If you are aiming a treatment at the area of the cancer, it was going to get there more rapidly. I probably had some benefit from having had this mode of treatment before I went back to complete the treatment with the IV. Now, I have CAT [computed tomography] scans every 3 to 4 months. I don’t like to go to doctors, my mother never went until she was 80, but I go now because I’ve learned to trust the process, so I keep my appointments. The last time I chatted with the oncologist, I asked him if we could talk about the kinds of symptoms I should look for going forward. What should I expect for myself? (Journal of the American Medical Association, 4 April 2010; Ms W)

(6) Ms W is a 75-year-old woman with epithelial ovarian cancer. She first developed lower abdominal pain in 2008. After workup for a genitourinary origin of the pain, she was found to have a 13.5 × 11 × 15.5–cm complex right adnexal mass. She had an optimal surgery cytoreductive, with less than 1 cm of peritoneal disease remaining at the end of the procedure. The pathologic findings were consistent with epithelial ovarian cancer of mixed endometrioid/clear-cell histology. Her uterus, fallopian tubes, and omentum were free of disease. Metastatic adenocarcinoma was noted in the left paracolic gutter and she was diagnosed as having stage IIIC disease. She then started intraperitoneal and intravenous (IV) cisplatin/paclitaxel chemotherapy, which was switched to IV carboplatin/paclitaxel because of an infection of the intraperitoneal catheter. She was in complete clinical remission after 6 cycles of platinum-based chemotherapy and was then registered in a clinical trial of maintenance abagovomab vs placebo. She is currently not receiving any treatment and is questioning her prognosis and how she should be followed up in the long term. (Journal of the American Medical Association, 4 April 2010; Dr Tess)

The texts in (5) and (6), although from a highly professional medical journal, illustrate that the discourse is intended for the narrative kind of medicine described in Section 2.2. This situation makes for text varieties that show a more mixed character than illness narratives, as exemplified by (1), as well as case reports, such as (4). On the one hand, both (5) and (6) have a chronological structure, i.e. “degree-zero” narrativity; on the other hand, the patient in (5) shows a degree of expertise and professional competence, a voice of medicine (cf. Section 3.1), which makes the register in the text more similar to professional medical discourse, like (4) and (6). As a result, (5) and (6) possess hybridity in form, i.e. they combine narrative and non-narrative register features.

By contrast, considering the discourse function, the doctor’s motivation in this context is not limited to presenting a case to colleagues. Instead, there is a more personal, though third-party, point in telling the patient’s story, as expressed by She is […] questioning her prognosis and how she should be followed up in the long term. Although throughout the main body of the presentation the doctor uses the voice of medicine, the main purpose is collaboration and a joint effort; the presentation thus comes from the doctor’s voice and carries an indirect, and ultimately more hybrid, function for a narrative. As the analysis of linguistic features in the corpus study will show, this complex relationship of form and function is reflected by the genre perspective.

4 Register and genre profile of the three types of medical discourse

4.1 Data and research aims

The analysis which follows is based on a small corpus of texts, covering in roughly equal shares the three genres under investigation and amounting to a total of 3,777 words. The exact proportions are included in Table 1. The analysis is intended as a pilot study and rests upon a limited database, but it will demonstrate how the interpretation of findings on register features benefits from a genre perspective. Numerous studies already document the co-occurrence of features from a narrative dimension of variation on a quantitative basis (starting with Biber 1988, 1989), among which, most notably, the presence of past tense forms, pronominal reference, and time adverbials. The claim here is that these features, which are pervasive to varying extents in the texts investigated, on the one hand testify to the formal hybridity of the genres as illustrated in Figure 1 but, on the other, do not determine the text variety at a sufficient level of specification.

The more integrated genre analysis will be presented in two steps: in 4.2, the register features indicative of a chronology, i.e. past tense narration and time adverbials, are functionally re-interpreted from the point of view of the genre in which they occur. This part of the analysis illustrates that in medical discourse high frequencies of narrative features may in fact correlate with a non-narrative discourse mode. It is argued, in particular, that the dominance of such a narrative text form goes beyond the presence of narrative episodes, which is something that applies to many kinds of discourse (e.g. Csomay 2006, 2007; also Werner, this volume), but is specifically motivated by the “object-oriented” discourse goal of the genres investigated here. Section 4.3 then looks at features reflecting the expression of human experience: pronoun usage and choice of subjects. The aim of this section is to show that, rather than in a grammatical form such as pronoun usage, genres with a narrative as opposed to a non-narrative purpose differ in a characteristic way in their use of semantic categories. The more general claim behind both analyses is that genre categories, in the sense of referring to discourse at a relatively low level of generality, are effective beyond both register features and textual conventions and lead to patterns at several levels of analysis. Complex discourse goals, such as the verbalisation of medical experience, are therefore better accounted for from a genre, rather than from a register perspective.

4.2 Degree-zero narrativity in different medical genres

A narrative discourse mode is primarily associated with events that happened in the past and with their temporal sequencing (Georgakopoulou and Goutsos 2000: 125, 2004: 43). For this reason, the primary narrative register features indicative of this degree-zero narrativity (cf. Section 3.4) are the use of past tense narration and of time adverbials (cf. Biber and Conrad 2009: 119).

In Table 1, the proportion of overall text in the narrative, past tense mode is shown as a word count, compared to the amount of text passages containing other tenses.5 The second feature is the use of time adverbials, which situate the events in their temporal sequence. For example, text (1), shown here as (7), has non-narrative passages (printed in italics) in the beginning and in the closing evaluative comment, serving as a coda, while the main body of the narration is structured in episodes marked by explicit temporal reference (in bold print).

(7) Hi my name is Ann. I was officially diagnosed in Sept of last year. I have had symptoms for the past several years that got worse as the years went on. I was exercising and swimming three times a week and then I started getting more muscle cramps. I went to the doctor and he just told me to take calcium and magnesium and drink more water. It took him a long time to understand that the muscle cramp were extremely painful happening several time a day. I would have abdominal muscle cramps that felt like i was in full-blown labor. They would come on suddenly when I was startled or when I coughed. They would ease up for a few seconds and then just get worse again. Several times my feet and hands would cramp up until they were fully distorted. I did go to a neurologist who seemed to have an idea of what I had but made no effort to diagnosis what I had. He told me that it would not do any good to try to diagnosis my disease and instead gave me all kinds of different pills and most of them did not work well and also caused several side effects. Often when I went to see him I did not feel like he even remembered me. I did finally request a new doctor, which has been a Godsend to me and now is treating me with IVIG, which is working well. My symptoms still get worse at times but they are manageable. I am eager to talk to people that have the same syndrome. Most people do not understand the pain and all the other symptoms. I found your web site today and am eager to learn more. (http://www.stiffpersonsyndrome.net, accessed March 17, 2011)

Table 1 shows the proportion of text passages in the narrative mode across the three genres. While case presentations from the medical case report contain only past tense passages, patients’ tales from the blog, i.e. from a medium that encourages reflection and relation-building (cf. Section 3.2), have a lower proportion of the narrative mode. The texts from “Clinical Crossroads” contain the lowest proportion of proper narration, which is in line with a discourse goal that consists not only in sharing information but also in preparing an adequate decision.

5 Instead of counting verb forms, the proportion of narrative as opposed to non-narrative mode is measured in the relative length of the text passages in which past as opposed to non-past tenses are used.
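The measure described in footnote 5 can be illustrated with a minimal sketch. It assumes that each passage has already been labelled by hand as past or non-past and that words are counted by whitespace splitting; neither the labelling procedure nor the tokenisation is specified in the study, and the names `Passage` and `narrative_share` are hypothetical. The three sample passages are taken from text (7) and serve only as an illustration, not as the corpus.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    text: str
    past_tense: bool  # manual label (assumption): True = narrative, past tense mode


def word_count(text: str) -> int:
    # Whitespace tokenisation is an assumption; the study does not state its method.
    return len(text.split())


def narrative_share(passages: list[Passage]) -> float:
    """Percentage of all words that fall inside past tense passages."""
    total = sum(word_count(p.text) for p in passages)
    if total == 0:
        raise ValueError("no words to measure")
    past = sum(word_count(p.text) for p in passages if p.past_tense)
    return 100.0 * past / total


# Illustrative passages from text (7), labelled by hand for this sketch.
sample = [
    Passage("Hi my name is Ann.", past_tense=False),
    Passage("I was officially diagnosed in Sept of last year.", past_tense=True),
    Passage("My symptoms still get worse at times but they are manageable.", past_tense=False),
]

share = narrative_share(sample)
print(f"{share:.1f} % of words in past tense passages")
```

Measuring by passage length rather than by verb count means that a long stretch of narration weighs more than a short evaluative comment, regardless of how many finite verbs each contains.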


Table 1: Narrative features in three medical genres6

                                                            Illness blog   Case report   Clinical Crossroads
total no. of words                                          1,126          1,391         1,260
proportion of past tense text passages (by no. of words7)   80 %           100 %         59 %
time adverbials per 100 words (absolute frequency)          3.11 (35)      1.44 (20)     2.78 (35)

The results are almost opposite for the occurrence of time adverbials: their frequency is high in the patients’ tales, including the case presentations in “Clinical Crossroads”, and much lower in case reports. Explicit temporal reference thus seems to be directly related to more personal accounts, i.e. to a narrative, or at least to a partly narrative (hybrid) function (cf. Figure 1). The finding is in line with research which has shown that in proper stories time adverbials do not only carry temporal meaning, but are also text-strategic devices (cf. Virtanen 1992b). Note, however, that this applies, in particular, to time adverbials in sentence-initial position, where they mark temporal shifts in the progression of a narrative strategy (Virtanen 2004). For example, in (7) then I started getting more muscle cramps marks the beginning of a new episode, whereas uses of the same temporal adverb in (6) (She then started […] chemotherapy, […]; She […] was then registered in a clinical trial) do not mark a text structure based on temporal sequence and are thus placed sentence-medially. This means that the point of departure is the patient as medical case, and not as a character. The lower number of time adverbials in case reports thus reflects their “topic-oriented strategy” focussing on the medical case, turning them into an expository, rather than a narrative, text (Virtanen 2010: 66–67).
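The normalised frequencies of time adverbials can be recomputed from the absolute counts and word totals reported in Table 1. The following sketch does only that; the function name `per_100_words` and the rounding to two decimals are assumptions made for the illustration.

```python
def per_100_words(count: int, total_words: int) -> float:
    """Normalise an absolute frequency to a rate per 100 words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return round(100.0 * count / total_words, 2)


# (absolute frequency of time adverbials, total no. of words), from Table 1
table1 = {
    "Illness blog": (35, 1126),
    "Case report": (20, 1391),
    "Clinical Crossroads": (35, 1260),
}

rates = {genre: per_100_words(count, words) for genre, (count, words) in table1.items()}
corpus_size = sum(words for _, words in table1.values())

for genre, rate in rates.items():
    print(f"{genre}: {rate} time adverbials per 100 words")
print(f"corpus size: {corpus_size} words")
```

Normalisation is what makes the genres comparable here: the illness blog and “Clinical Crossroads” both contain 35 time adverbials, but the blog sample is shorter, so its rate is higher.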

These two findings together suggest that a differentiated look is necessary when interpreting quantitative results about pervasive linguistic features in their discourse context. In particular, a narrative form and a narrative text function need to be distinguished, as the outline in Sections 3.2 to 3.4 and the illustration in Figure 1 have shown. In the texts investigated, the non-narrative function of the case reports, in the sense of a lack of personal story-point, goes together with an exclusive use of past tense forms, showing that the verbalisation of a chronology of events has a variety of uses (cf. Section 2.2). A narrative function, by contrast, also involves passages in which the narrative mode is absent, since it is evaluative comments, particularly the coda, which verbalise the point of a proper story. In this way, although dominated less by past tense narration, illness blogs as well as case presentations from “Clinical Crossroads” gain their narrative or hybrid function from passages in the non-narrative mode, a form-function complexity which a genre perspective helps to explain.

6 In addition to the individual sample texts discussed for illustration in Section 3, the corpus contains other texts from the same genres, amounting to the word totals indicated.

7 As reflected by the use of past tense verbs. As in texts (1) and (5), this also includes the use of the so-called “habitual conditional” (cf. Haiman and Kuteva 2002: 120).

4.3 From register feature to genre feature: Exploring reference in medical discourse

While past tense forms and time adverbials have to do with the past temporality of the events reported, the pervasiveness of pronominal reference, as opposed to more explicit forms of expression, arises from the fact that a narrative verbalises human experience (Biber and Conrad 2009: 259; cf. also Neumann and Fest, this volume). The presence of a narrator allows “readers to immerse themselves in a different world and in the life of the protagonists” (Fludernik 2009: 6). The main protagonists in a medical process are the doctor and the patient, reference to them being made, in particular, when the doctor’s voice and the patient’s storytelling are used (cf. Section 3.1). By contrast, the voice of medicine tends to de-focus human experience, turning the language of medical discourse into a more scientific register, which is “object”- rather than “agent”-oriented (Atkinson 1999). It is therefore expected that reference to these different components of an illness correlates in significant ways with the genre of medical discourse.

As Table 2 shows, the frequency of pronouns8 is higher in the text samples with a narrative or hybrid function, i.e. in the patients’ tales and in “Clinical Crossroads”. It is lower, though not very low, in the case reports. This reflects the non-narrative, object-oriented discourse goal of professional medical discourse, although the main object of investigation is nonetheless a human agent. The hybrid narrative form of medical case reports is thus also confirmed by the use of pronominal reference.

Since the use of pronouns as a register feature distinguishes the three genres only insufficiently, Table 2 also presents results of an alternative analysis of the referential patterns one finds in the texts. Looking at the subjects of all (finite and non-finite) clauses in the corpus, the instances of the (explicit or implicit) subjects were categorised as referring to the patient, the doctor, or to the domain of medicine.9 Subjects being the unmarked point of departure of the English clause and therefore more often than not the topic (e.g. Börjars and Burridge 2010: 226), it was assumed that their reference is likely to indicate which voice is talking (cf. Section 3.1) and to what extent the discourse truly focuses on human experience.

8 This feature includes personal and possessive (including reflexive) pronouns as well as relative pronouns referring to a noun phrase (and not to a clause).

Table 2: Pronominal reference and reference of topics in the three genres

                                                        Illness blog   Case report   Clinical Crossroads
personal pronouns per 100 words (absolute frequency)    10.48 (118)    7.55 (111)    10.64 (134)
clausal topics per 100 words (absolute frequency)       12.79 (144)    8.63 (120)    13.02 (164)
  patient as topic                                      5.60 (63)      3.45 (48)     8.50 (107)
  doctor as topic                                       1.51 (17)      0.86 (12)     0.56 (7)
  topic from the domain of medicine                     5.16 (58)      4.10 (57)     2.70 (34)
  other topics                                          0.98 (11)      0.22 (3)      1.27 (16)

While the overall frequencies of clausal topics per text category differ mainly for reasons of sentence length, the semantic sub-categorisation contained in Table 2 yields some notable similarities and differences. In particular, illness blogs and case reports are quite similar with respect to their reference to the domain of medicine, and neither reaches the extent of reference to the patient found in “Clinical Crossroads”. Although they pursue opposite, i.e. narrative as opposed to non-narrative, discourse goals and are produced by opposite speaker roles, illness blogs and case reports, which otherwise differ in their use of narrative register features, reveal a striking similarity in this respect.
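The comparison of topic categories can be made explicit by converting the absolute frequencies in Table 2 into shares per genre. The sketch below is an illustration under two assumptions: the shares are computed against the sum of the four category counts (for the illness blog these add up to 149 rather than the 144 clausal topics listed, so the two denominators differ), and the helper name `topic_shares` is hypothetical.

```python
# Absolute frequencies of clausal topics by semantic category, from Table 2
topics = {
    "Illness blog": {"patient": 63, "doctor": 17, "medicine": 58, "other": 11},
    "Case report": {"patient": 48, "doctor": 12, "medicine": 57, "other": 3},
    "Clinical Crossroads": {"patient": 107, "doctor": 7, "medicine": 34, "other": 16},
}


def topic_shares(counts: dict[str, int]) -> dict[str, float]:
    """Share of each category in percent, relative to the sum of all categories."""
    total = sum(counts.values())  # assumption: denominator is the category sum
    if total == 0:
        raise ValueError("no topics counted")
    return {category: round(100.0 * n / total, 1) for category, n in counts.items()}


shares_by_genre = {genre: topic_shares(counts) for genre, counts in topics.items()}

for genre, shares in shares_by_genre.items():
    print(genre, shares)
```

On these figures, topics from the domain of medicine account for roughly two fifths of the topics in the illness blog and nearly half in the case reports, whereas the patient accounts for about two thirds of the topics in “Clinical Crossroads”.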

9 Assuming that every lexical verb gives rise to a clause, each explicit or implicit subject belonging to a lexical verb was categorised semantically. The category “patient” includes reference to the person as well as to body parts. The category “medicine” covers symptoms (weakness, pain), reference to the disease, as well as to elements from the diagnosis (tests, findings) or therapy (e.g. medication or treatment). In the majority of cases, these categories were distinct; there were only two instances of a subject referring to both patient and doctor, as in: The last time I chatted with the oncologist, I asked him if we could talk. Subjects like these were counted towards both categories.


By contrast, the texts from “Clinical Crossroads” show a pattern of reference to topics which reflects the discourse goal of aligning the perspectives of the patient and the doctor (cf. Section 3.4). The focus of this discourse is not so much on the domain of medicine, nor on the role of the doctor, but in line with the objective of a “narrative medicine” it represents a true expression of patient-centred medical care (Gerteis et al. 1993). That such a discourse context provides in fact for a new genre becomes particularly evident if one looks at the proportions of topics as used by the patients, as opposed to ones used by the doctors, in Figure 2.

Figure 2: Use of medical topics10 by both speaker groups

[Figure 2: stacked bar chart. Vertical axis: 0% to 100%. Bars: patient in blog, doctor in case report, patient in CC, doctor in CC. Categories: other, patient, doctor, medicine.]

The results from Table 1 and Figure 2 make obvious that genres with opposite functions, i.e. illness reports and medical case reports, can in fact be more similar than the ones with a related function, such as patients communicating their illness in different situations. The reason is that different voices are used for communicating illness (cf. Section 3.1), which highlight different aspects of the course of the events. While due to general situational parameters, such as speaker or discourse function, illness blogs and “Clinical Crossroads” are similar in their register usage, they nonetheless differ in their choice of topics. It is this interrelationship of form (register), function, and social context, which for the analysis of medical discourse suggests a primacy of the notion of genre.

10 Percentages show the proportion of the four semantic categories in relation to the total of topics as given in Table 2.

5 Conclusion

My analysis of text varieties from medical discourse was intended to show that investigating linguistic variation with a view to genre adds an important perspective to the understanding of form-function relationships in text-linguistic studies. While these commonly rest upon the assumption that “linguistic co-occurrence reflects shared function” (Biber 1989: 5) and present corpus-linguistic evidence for this, the interrelationship of register and genre can only be made explicit by combining the perspectives. Since a genre classifies discourse at a rather low level of generality, especially with regard to the purpose and goal of a discourse, it determines both pervasive linguistic features and the choice of discourse topics and semantic categories. Hence, I have argued here that a genre analysis logically subsumes and pre-determines a register analysis.

Genres, especially in the domain of medicine, make regular use of the narrative discourse type with its attested register features. This is not surprising, given the acknowledged role of the narrative as a basic text type or meta-genre (cf. Section 2.2). A similar interrelationship underlies the observation that the dividing line between lay and professional communication is also one between narrative and non-narrative discourse (Georgakopoulou and Goutsos 2000). The discussion here has added to this view that one needs to distinguish between narrative form and narrative discourse function, and that more professional social and cognitive activities typically go together with more complex (in the sense of more indirect) uses of narrative register variation. Text varieties of this kind are best understood from a genre perspective, which can account for their mixed purposes and voices and, thus, their hybridity in register.

References

Atkinson, Dwight. 1992. The evolution of medical research writing from 1735 to 1985. Applied Linguistics 13. 337–374.

Atkinson, Dwight. 1999. Scientific discourse in sociohistorical context: The Philosophical Transactions of the Royal Society of London 1675–1975. Mahwah, NJ: Lawrence Erlbaum.

Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge University Press.

Biber, Douglas. 1989. A typology of English texts. Linguistics 27. 3–43.

Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad & Edward Finegan. 1999. Longman grammar of spoken and written English. Harlow: Longman.

Biber, Douglas. 2006. University language: A corpus-based study of spoken and written registers. Amsterdam & Philadelphia: John Benjamins.

Biber, Douglas. 2012. Register as a predictor of linguistic variation. Corpus Linguistics and Linguistic Theory 8(1). 9–37.

Biber, Douglas & Bethany Gray. 2013. Being specific about historical change: The influence of sub-register. Journal of English Linguistics 41(2). 104–134.

Börjars, Kersti & Kate Burridge. 2010. Introducing English grammar. London: Arnold.

Brinker, Klaus. 2005. Linguistische Textanalyse: Eine Einführung in Grundbegriffe und Methoden. Berlin: Schmidt.

Charon, Rita. 2006. Narrative medicine: Honoring the stories of illness. Oxford & New York: Oxford University Press.

Cordella, Marisa. 2004. The dynamic consultation: A discourse analytical study of doctor-patient communication. Amsterdam & Philadelphia: John Benjamins.

Coupland, Nikolas. 2007. Style: Language variation and identity. Cambridge: Cambridge University Press.

Croft, William. 2010. The origins of grammaticalization in the verbalization of experience. Linguistics 48. 1–48.

Csomay, Eniko. 2006. Academic talk in American university classrooms: Crossing the boundaries of oral-literate discourse? Journal of English for Academic Purposes 5(2). 117–135.

Csomay, Eniko. 2007. A corpus-based look at linguistic variation in classroom interaction: Teacher talk versus student talk in American University classes. Journal of English for Academic Purposes 6(4). 336–355.

Dorgeloh, Heidrun. 2012. Arztbericht vs. Patientengeschichte: Story point als Genremerkmal im medizinischen Internetdiskurs. In Ansgar Nünning, Jan Rupp, Rebecca Hagelmoser & Jonas Ivo Meyer (eds.), Narrative Genres im Internet: Theoretische Bezugsrahmen, Mediengattungstypologie und Funktionen (WVT-Handbücher zum literaturwissenschaftlichen Studium), 261–276. Trier: WVT.

Dorgeloh, Heidrun. 2014. ‘If it didn’t work the first time, we can try it again’: Conditionals as a grounding device in a genre of illness discourse. Communication & Medicine 11(1). 55–67.

Dorgeloh, Heidrun & Anja Wanner. 2010. Syntactic variation and genre. Berlin & New York: de Gruyter Mouton.

Döring, Nicola. 2003. Sozialpsychologie des Internet. Göttingen: Hogrefe.

Eckert, Penelope & John R. Rickford (eds.). 2001. Style and sociolinguistic variation. Cambridge: Cambridge University Press.

Fleischman, Suzanne. 2001. Language and medicine. In Deborah Schiffrin, Deborah Tannen & Heidi E. Hamilton (eds.), The handbook of discourse analysis, 470–502. Malden, Mass.: Blackwell.

Fludernik, Monika. 1996. Towards a ‘natural’ narratology. London: Routledge.

Frankel, Richard M. 2000. The (socio)linguistic turn in physician-patient communication research. In James E. Alatis, Heidi E. Hamilton & Ai-Hui Tan (eds.), Linguistics, language, and the professions, 81–103. Georgetown: Georgetown University Press.


Georgakopoulou, Alexandra & Dionysis Goutsos. 2000. Mapping the world of discourse: The narrative vs. non-narrative distinction. Semiotica 131(1–2). 112–141.

Georgakopoulou, Alexandra & Dionysis Goutsos. 2004. Discourse analysis: An introduction. Edinburgh: Edinburgh University Press.

Gerteis, Margaret, Susan Edgman-Levitan, Jennifer Daley & Thomas L. Delbanco (eds.). 1993. Through the patient’s eyes: Understanding and promoting patient-centered care. San Francisco: Jossey-Bass.

Giltrow, Janet. 2010. Genre as difference: The sociality of linguistic variation. In Heidrun Dorgeloh & Anja Wanner (eds.), Syntactic variation and genre, 29–52. Berlin & New York: de Gruyter Mouton.

Giltrow, Janet & Dieter Stein. 2009. Genres in the internet. Amsterdam & Philadelphia: John Benjamins.

Gotti, Maurizio & Françoise Salager-Meyer. 2006. Introduction. In Maurizio Gotti & Françoise Salager-Meyer (eds.), Advances in medical discourse analysis: Oral and written contexts, 9–16. Bern: Peter Lang.

Haiman, John & Tania Kuteva. 2002. The symmetry of counterfactuals. In Joan Bybee & Michael Noonan (eds.), Complex sentences in grammar and discourse: Essays in honor of Sandra A. Thompson, 101–124. Amsterdam & Philadelphia: John Benjamins.

Halliday, Michael A. K. 1978. Language as social semiotic: The social interpretation of language and meaning. London: Edward Arnold.

Honeybone, Patrick. 2011. Variation and linguistic theory. In Warren Maguire & April McMahon (eds.), Analysing variation in English, 151–177. Cambridge: Cambridge University Press.

Hunter, Kathryn M. 1991. Doctors’ stories: The narrative structure of medical knowledge. Princeton, NJ: Princeton University Press.

Hurwitz, Brian. 2006. Form and representation in clinical case reports. Literature and Medicine 25(2). 216–240.

Kinneavy, James Louis. 1971. A theory of discourse: The aims of discourse. Englewood Cliffs, NJ: Prentice-Hall.

Kortmann, Bernd. 2006. Syntactic variation in English: A global perspective. In Bas Aarts & April McMahon (eds.), Handbook of English linguistics, 603–624. Oxford: Blackwell.

Labov, William. 1997. Some further steps in narrative analysis. The Journal of Narrative and Life History 7. 395–415.

Labov, William & Joshua Waletzky. 1967. Narrative analysis: Oral versions of personal experience. In June Helm (ed.), Essays on verbal and visual arts, 12–44. Seattle: University of Washington Press.

Martin, James Robert & David Rose. 2003. Working with discourse: Meaning beyond the clause. London: Continuum.

Maseide, Per. 2003. Medical talk and moral order: Social interaction and collaborative clinical work. Text 23(3). 369–403.

McCullough, Laurence B. 1989. The abstract character and transforming power of medical language. Soundings 72(1). 111–125.

Mishler, Elliot G. 1984. The discourse of medicine: Dialectics of medical interviews. Norwood, NJ: Ablex.

Miller, Carolyn R. 1984. Genre as social action. Quarterly Journal of Speech 70. 151–167.

Murawska, Magdalena. 2012. The many narrative faces of medical case reports. Poznan Studies in Contemporary Linguistics 48(1). 55–75.

Page, Ruth. 2012. Stories and social media: Identities and interaction. New York: Routledge.


Polanyi, Livia. 1985. Telling the American story: A structural and cultural analysis of conversational storytelling. Norwood: Ablex.

Richards, Jack C. & Richard W. Schmidt. 2002. Longman dictionary of language teaching and applied linguistics. Harlow, UK: Longman.

Rosenbach, Annette. 2002. Genitive variation in English: Conceptual factors in synchronic and diachronic studies (Topics in English linguistics 42). Berlin & New York: Mouton de Gruyter.

Salmon, William N. 2010. Formal idioms and action: Toward a grammar of genres. Language & Communication 30(4). 211–224.

Sankoff, David. 1988. Sociolinguistics and syntactic variation. In Frederick J. Newmeyer (ed.), Linguistics: The Cambridge survey, 140–161. Oxford: Blackwell.

Sarangi, Srikant & Celia Roberts. 1999. Introduction: Discourse hybridity in medical work. In Srikant Sarangi & Celia Roberts (eds.), Talk, work, and institutional order: Discourse in medical, mediation, and management settings. 61–74. Berlin: Mouton de Gruyter.

Sarangi, Srikant. 2001. Activity types, discourse types and interactional hybridity: The case of genetic counseling. In Srikant Sarangi & Malcolm Coulthard (eds.), Discourse and social life, 1–27. Harlow: Longman.

Schilling-Estes, Natalie. 2002. Investigating stylistic variation. In Jack K. Chambers, Peter Trudgill & Natalie Schilling-Estes (eds.), The handbook of variation and change, 374–401. Oxford: Blackwell.

Schmid, Hans-Jörg. 2013. Is usage more than usage after all? The case of English not that. Linguistics 51(1). 75–116.

Schryer, Catherine, Lorelei Lingard, Marlee Spafford & Kim Garwood. 2003. Structure and agency in medical case presentations. In Charles Bazerman & David R. Russell (eds.), Writing selves/writing societies, 92–96. Fort Collins: WAC.

Schulze, Rainer (ed.). 1998. Making meaningful choices in English: On dimensions, perspectives, methodology, and evidence. Tübingen: Gunter Narr.

Smith, Carlota S. 2003. Modes of discourse: The local structure of texts (Cambridge Studies in Linguistics 103). Cambridge: Cambridge University Press.

Swales, John M. 2004. Research genres: Explorations and applications. Cambridge: Cambridge University Press.

Tannen, Deborah. 1989. Talking voices: Repetition, dialogue and imagery in conversational discourse. Cambridge: Cambridge University Press.

Virtanen, Tuija. 1992a. Issues of text typology: Narrative – a ‘basic’ type of text? Text 12(2). 293–310.

Virtanen, Tuija. 1992b. Given and new information in adverbials: Clause-initial adverbials of time and place. Journal of Pragmatics 17(2). 99–115.

Virtanen, Tuija. 2010. Variation across texts and discourses: Theoretical and methodological perspectives on text type and genre. In Heidrun Dorgeloh & Anja Wanner (eds.), Syntactic variation and genre, 53–84. Berlin & New York: de Gruyter Mouton.

Werlich, Egon. 1976. A text grammar of English. Heidelberg: Quelle & Meyer.

Winker, Margaret A. 2006. Clinical crossroads: Expanding the horizons. The Journal of the American Medical Association 295(24). 2888–2889.

Markus Bieswanger
Aviation English: Two distinct specialised registers?

Abstract: The communication between air traffic controllers and pilots via voice radio is regularly referred to as Aviation English in the literature. Responding to growing international air travel after the Second World War and in reaction to several accidents and incidents at least partly caused by controller-pilot miscommunication, the International Civil Aviation Organization (ICAO) developed a set of standards and recommended practices concerning language use in air traffic control communication. These ICAO guidelines permit the use of two different and precisely defined varieties of Aviation English: standardised phraseology in most routine situations and plain Aviation English when standardised phraseology is insufficient to serve an intended transmission. Based on the official ICAO recommendations and the analysis of text excerpts from authentic air traffic control communication, this paper addresses the question whether the two varieties currently referred to as Aviation English are distinct registers in the sense of Biber and Conrad (2009). The relationship between the two different interpretations of Aviation English in actual controller-pilot communication and the linguistic characteristics of these varieties are investigated and compared. The analysis shows that the two varieties in question are indeed distinct specialised registers and supports the main objective of the volume by demonstrating that adequate register choice is a prerequisite for successful communication, in this case in aviation contexts.

Markus Bieswanger, University of Bayreuth

1 Introduction

For several decades, aviate – navigate – communicate has been widely known as the axiomatic set of any pilot’s duties, particularly during non-routine and emergency situations, but also in everyday routine flying. From the point of view of prioritisation of tasks in high workload situations, the order implies that the primary concern of any flight crew must be to maintain control over their aircraft, the second most important duty is to make sure that the aircraft moves in the direction it is supposed to fly and the third priority is to communicate the intentions of the flight crew to air traffic control and to receive instructions from it. However, this order does not mean that communication plays an unimportant role in aviation. Despite the highly plausible prioritisation of tasks, it should also be noted that communication is included in the set of the three most important duties of pilots (cf. Kostecka 2007: 13).

As a result of a number of incidents and accidents associated with communication problems as well as several decades of continuous growth of air traffic around the globe, communication issues in air traffic control contexts are currently taken very seriously by the aviation authorities and play a heightened role in pilot and air traffic controller training. The International Civil Aviation Organization explains this as follows:

With mechanical failures featuring less prominently in aircraft accidents, more attention has been focused in recent years on human factors that contribute to accidents. Communication is one human element that is receiving renewed attention. (ICAO 2010: vii)

The renewed interest in air traffic control communication also shows in the desire for an exchange of ideas and expertise between aviation professionals and linguists, as illustrated by the recent volume entitled Aviation Communication: Between Theory and Practice (Hansen-Schirra and Maksymski 2013). Voice-based communication between pilots and air traffic controllers, so-called radiotelephony, is regularly referred to as Aviation English or at least constitutes a central part of even the broadest definitions of Aviation English. Moder (2013: 227) provides such a broad definition:

Aviation English describes the English used by pilots, air traffic controllers and other personnel associated with the aviation industry. Although the term may encompass a wide variety of language use situations, including the language of airline mechanics, flight attendants, or ground service personnel, most research and teaching focus on the more specialized communication between pilots and air traffic controllers, often called radiotelephony.

Linguistic publications indeed often adopt a more focused definition of Aviation English as “the language used by pilots and air traffic controllers” (Intemann 2008: 21). The present article follows this definition of Aviation English as the English used in voice-based air traffic control communication, but differs from most previous work in that it does not aim to analyze Aviation English primarily to investigate the reasons for miscommunication in air traffic control and the contribution of communication problems to incidents and accidents (cf., e.g., Bieswanger 2013), but to assess the status of Aviation English from the perspective of register research. In the following, this article will give a short overview of the history of English in air traffic control contexts and then go on to answer the question whether the two varieties currently referred to by the term Aviation English are distinct registers which can be categorised as specialised registers in the sense of Biber and Conrad (2009).

2 English in Air Traffic Control

In 1944, 52 states signed the Chicago Convention, i.e. the first international convention on civil aviation. The convention resulted in the foundation of the International Civil Aviation Organization (ICAO), which became a United Nations Agency in 1947. One of the purposes of the ICAO is to provide international standards for air traffic control and safe flight operations, which includes recommendations on language use in pilot-controller communication. These provisions concerning language use and language requirements are primarily defined in Volume II of Annex 10 to the Convention on International Civil Aviation on Aeronautical Communications (ICAO 2001); additional language recommendations are defined in Annexes 1, 6, and 11. The requirements are further specified in the Manual of Radiotelephony (ICAO 2007a) and the Procedures for Air Navigation Services: Air Traffic Management (ICAO 2007b).

It is mainly as a result of World War II that English was chosen as the basis of the world-wide aviation communication language. It has to be noted that the ICAO recognises national languages and does not forbid the use of languages other than English for local air navigation purposes, provided that all persons involved share that other language. In international aviation, by contrast, the use of English is the rule. Crystal (2003: 108) sums up the reasons for this choice as follows: “[…] they agreed that English should be the international language of aviation when pilots and controllers speak different languages. This would have been the obvious choice for a lingua franca. The leaders of the Allies were English-speaking; the major aircraft-manufacturers were English-speaking; and most of the post-war pilots in the West (largely ex-military personnel) were English-speaking.” Given the economic, technological, and military dominance of Great Britain and especially the USA at that time, other languages were not a realistic option.

The Chicago Convention granted “complete and exclusive sovereignty over the airspace above its territory” (Convention on International Civil Aviation 1944) to each of the contracting states, but also demanded that all contracting states provide adequate regulations for the safety of aviation. The original language of the document is English, but it was translated into French and Spanish as the two other languages “of equal authenticity” (cf. Convention on International Civil Aviation 1944). Today, there are also translations of the document into Russian, Chinese and Arabic, since these are official languages of the United Nations. Currently the ICAO has 190 member states.

The first version of the Convention on International Civil Aviation (1944) does not include any statements on the question of an international air traffic communication language, but it promises further regulations. Today’s air traffic management procedures are the result of an ongoing evaluation and revision of the first document provided in 1946 by the Air Traffic Control Committee of the International Conference on North Atlantic Route Service Organisation (cf. ICAO 2007b: vii). This bias towards North Atlantic air traffic has definitely also contributed to the choice of English.

Responding to the constantly growing international air travel after the Second World War and in reaction to several accidents and incidents at least partly caused by controller-pilot miscommunication (cf., e.g., Cushing 1994; Jones 2003), the ICAO developed a set of standards and recommended practices (SARPs) concerning language use in general and the use of English in particular in air traffic control communication (cf. ICAO 2001; ICAO 2007a; ICAO 2007b), which has been adopted by most countries world-wide. For several decades, until about the turn of the century, these SARPs were almost exclusively devoted to the definition of the so-called “ICAO standardized phraseology” (ICAO 2001: 5-1; for a detailed description cf. Section 3.2 below), which is supposed to “provide the tools for communication in most of the situations encountered in the daily practice of ATC [= air traffic control] and flight” (ICAO 2010: 3-5).

More recently, the ICAO has added SARPs concerning the proficiency in plain Aviation English of all pilots and air traffic controllers involved in international aviation (cf. Mathews 2004; Mitsutomi and O’Brian 2004; ICAO 2010). Experience with standardised phraseology had shown that in unusual and unexpected “cases, where phraseology provides no ready made form of communication, pilots and controllers must resort to plain language” (ICAO 2010: 3-5). The motivation for the demand of a certain level of proficiency in plain Aviation English by all stakeholders in air traffic control communication was similar to the reasons that had earlier led to the development of the standardised phraseology:

Over 800 people lost their lives in three major accidents […]. In each of these seemingly different types of accidents, accident investigators found a common contributing element: insufficient English language proficiency on the part of the flight crew or a controller had played a contributing role in the chain of events leading to the accident. In addition to these high-profile accidents, multiple incidents and near misses are reported annually as a result of language problems, instigating a review of communication procedures and standards worldwide. (ICAO 2010: 1-1)


As a result of accidents and incidents more or less intimately connected to communication problems, currently all pilots and air traffic controllers involved in international aviation have to demonstrate proficiency in plain aviation-related English or plain Aviation English; the required level of proficiency is at least level 4 “operational” on a scale from level 1 “pre-elementary” to level 6 “expert” (ICAO 2010: A-7 and A-8).

To sum up, two varieties of English used for communication between pilots and air traffic controllers are presently referred to by the term Aviation English, namely standardised phraseology, on the one hand, and plain Aviation English, on the other. In this paper, Aviation English will be used as the umbrella term, while standardised phraseology and plain Aviation English will be used to refer to the two individual varieties. The following section will apply the classification of Biber and Conrad (2009) to these varieties and investigate whether we are concerned with two distinct specialised registers referred to by the same designation.

3 Registers of Aviation English

According to Biber and Conrad (2009: 6), “a register is a variety associated with a particular situation of use (including particular communicative purposes).” Biber and Conrad (2009: 6) identify three components of a register analysis: firstly, the situational context of use, i.e. the unique situational characteristics of a certain variety of language use. Secondly, the linguistic analysis, i.e. the description of “typical lexical and grammatical features” (Biber and Conrad 2009: 6) that are pervasive in a variety. Thirdly, the interpretation of the functions of these pervasive linguistic features in the situational context specified earlier. Section 3.1 will be devoted to a situational analysis of the two registers in question, while Sections 3.2 and 3.3 will describe their linguistic characteristics and their specific functions.

3.1 Situational analysis

As already mentioned, Aviation English consists of standardised phraseology and the use of plain English in aeronautical radiotelephony communication. When applying Biber and Conrad’s (2009: 39) “framework for analyzing situational characteristics,” many similarities and some crucial differences concerning the situational context of these two varieties of Aviation English can be identified.


According to Biber and Conrad (2009: 40), the major situational characteristics of registers are: participants, relations among participants, channel, production circumstances, setting, communicative purposes and topic (cf. also Schubert, this volume).

Participants

The participants in both varieties of Aviation English are identical. The stakeholders in aeronautical radiotelephony communication, i.e. pilots and controllers engaging in air traffic control communication, are both addressors producing text as well as intended listeners referred to as addressees (cf. Biber and Conrad 2009: 41). Depending on national regulations, it may or may not be legal for outsiders to listen to air traffic control communication, but there is no difference between the two varieties concerning what Biber and Conrad (2009: 42) call “on-lookers”. Since all parameters concerning participants and participation are identical, differences between the use of standardised phraseology and plain Aviation English cannot be attributed to this situational characteristic.

Relations among participants

There are no differences between the two varieties of Aviation English in the relations among participants either. The participants in air traffic control communication directly interact with each other. Usually, one member of the flight crew interacts with one air traffic controller in a dialogue at any given point in time. In both varieties, the social roles of the interlocutors are identical, there are usually no personal relationships between them and all participants share considerable background knowledge about aviation.

Channel

By channel, Biber and Conrad (2009: 43) mean the binary distinction into the physical modes of speech and writing and what they call the “specific mediums of communication.” Both types of Aviation English are voice-based and thus clearly spoken registers. Written air traffic control communication with the help of a so-called controller-pilot data link is still in its infancy and faces a number of disadvantages that seem to inhibit its more widespread use, such as the ensuing lack of situational awareness of all pilots of surrounding aircraft when messages are exchanged bilaterally between one pilot and one air traffic controller. The specific medium of communication for transmitting speech in air traffic control communication is voice radio. Unlike face-to-face communication, Aviation English thus generally belongs to the types of mediated spoken communication (cf. also setting below).


Production circumstances

As both kinds of Aviation English are spoken registers, there is typically not much time for speakers to plan what to say next and no possibility to “edit or erase language once it is spoken” (Biber and Conrad 2009: 43). As in all spoken conversations, there are certain expectations as to when a speaker has to say something as well as limitations with respect to the length of pauses. Since all pilots a particular air traffic controller is responsible for are tuned to the same frequency and since aviation radio technology does not allow more than one pilot to address the controller at the same time, efficient communication is one of the main concerns in air traffic control communication.

Setting

According to Biber and Conrad (2009: 44), “the setting refers to the physical context of the communication – the time and place” (original emphasis). As with most spoken communication, the time is shared by the interlocutors in air traffic control communication, as the messages are transmitted instantaneously. Aviation English, however, is generally mediated communication and thus the situation is special with respect to place. The participants have a certain knowledge about the place of production of their interlocutor’s speech but do not share the place of production as in face-to-face communication. The quality of transmission in air traffic control communication is one of the reasons for the implementation of SARPs, as it can be adversely affected by weather, distance and other circumstances.

Communicative purposes

The two varieties of Aviation English show their biggest differences in relation to the communicative purposes. It could be argued that both share what Biber and Conrad (2009: 45) call the “general purpose”, i.e. the aim to ensure efficient and effective communication between pilots and controllers, and differ only in the specific purpose. If register status were decided by the general purpose alone, the two varieties of Aviation English could be termed specific “subregister[s]” (Biber and Conrad 2009: 45) of one register. However, according to the ICAO (2001: 5-1), there should be no overlap between these two varieties: “ICAO standardized phraseology shall be used in all situations for which it has been specified. Only when standardized phraseology cannot serve an intended transmission, plain language shall be used.” Considering the fundamentally different and complementary situations of use – routine versus non-routine air traffic control communication (cf. ICAO 2010: 3-4, 3-5) – and the considerable linguistic differences between the two varieties, as shown below, it can be argued that we are concerned with two distinct, albeit related, registers.


Topic

The situation concerning the factor topic resembles the differentiation of communicative purposes: the shared general topic of both varieties is aviation, but the specific topics covered are different. While standardised phraseology is concerned with the fairly restricted aspects of routine air traffic control issues, plain Aviation English covers a broader range of topics in non-routine situations, such as emergencies as well as other unusual or unexpected contexts. “Topic is the most important situational factor influencing vocabulary choice” (Biber and Conrad 2009: 46) and so it is not surprising that standardised phraseology and plain Aviation English should differ to a large extent at the lexical level (cf. also Sections 3.2 and 3.3).

Summary

With respect to the situational characteristics of the two varieties of Aviation English, many of Biber and Conrad’s (2009: 40) parameters such as participants, relations among participants, channel, production circumstances and setting are shared by both registers. However, there are clear differences in the communicative purposes and the range of topics covered by standardised phraseology and plain Aviation English respectively, which leads to the conclusion that we are not concerned with sub-registers of a single register. From the perspective of situational characteristics, which “can be definitely specified” (Biber and Conrad 2009: 33) for both registers, standardised phraseology and plain Aviation English can be categorised as two distinct specialised registers.

3.2 Standardised phraseology

In this section, the linguistic features of standardised phraseology and their functions will be presented and discussed. In contrast to many other registers, the functions of the linguistic features of this variety are clearly and explicitly defined. The register that is officially referred to as “ICAO standardized phraseology” (ICAO 2001: 5-1) is a variety of English that is used in a precisely defined situational context and characterised by prescribed and pervasive linguistic features used for a specific function, mainly “for the purpose of ensuring uniformity in RTF [= radiotelephony] communications” (ICAO 2007a: 3-1) and “to provide maximum clarity, brevity and unambiguity” (ICAO 2007a: 3-2). This variety thus fulfils all the criteria of a “specialized register” in the sense of Biber and Conrad (2009).

The ICAO standardised phraseology is precisely defined in several official documents published by the ICAO. The second volume of Annex 10 to the Convention on International Civil Aviation (ICAO 2001) describes “Aeronautical Communications”, chapter 12 of ICAO Document 4444 on Air Traffic Management (ICAO 2007b) is devoted entirely to “Phraseologies” and ICAO Document 9432, the Manual of Radiotelephony (ICAO 2007a), provides a collection of illustrations of the recommendations given in the other two documents.

Recommendations exist for all levels of language, including lexicon, grammar and pronunciation. According to Biber and Conrad (2009: 6), “[r]egisters are described for their typical lexical and grammatical characteristics” and they state that their “linguistic features are always functional”. Pronunciation features are not included in the list of linguistic features of registers by Biber and Conrad (2009: 6), but since the pronunciation features of the ICAO phraseology are strictly functional and since Biber mentioned phonological features as register features in an earlier study (cf. Biber 1995: 29), they will also be considered linguistic features of this register and thus be presented in this section.

Lexical characteristics

Standardised phraseology is probably best known for its characteristics at the lexical level. At the heart of this register is a reduced vocabulary consisting of a limited number of words and fixed phrases, each with a single precise meaning in the situational context of routine air traffic control communication.

Section 5.2.1.5.8 of Annex 10 to the Convention on International Civil Aviation (ICAO 2001) contains a brief list of words and phrases that “shall be used in radiotelephony communications as appropriate and shall have the meaning ascribed hereunder.” The list contains key terms of radiotelephony communication, such as affirm for ‘yes’, cleared (cf. Transcript 1) for ‘authorised to proceed [with the aircraft] under the conditions specified’, go ahead (cf. Transcript 4) meaning ‘proceed with your message’ but not ‘proceed with your aircraft’, monitor (Transcript 3) for ‘listen out on (frequency)’ and maintain (cf. Transcript 2) for ‘continue in accordance with the condition(s) specified’. Section 12.3 of ICAO Document 4444 on Air Traffic Management (ICAO 2007b) provides a more comprehensive collection of words and phrases to be used in specific circumstances. For example, climb (cf. Transcript 2) is prescribed as the phonetically dissimilar opposite of descend in standardised phraseology, ruling out the use of ascend, which is regularly listed as an antonym of descend in dictionaries of plain English (cf. OALDO 2014). The recommendations even explicitly include words and phrases that should not be used at all. For example, Section 3.1.4 of the Manual of Radiotelephony (ICAO 2007a: 3-1) suggests that “the use of courtesies should be avoided” altogether; however, courtesies such as greeting and parting expressions are often used and tolerated in non-urgent contexts (cf. Transcript 3). Standardised phraseology is thus not among the many text varieties native speakers of a language acquire “without explicitly studying them” (cf. Biber and Conrad 2009: 2) but has to be learned by both native as well as non-native speakers of English with explicit instruction.

From the lexical perspective, two main characteristics of the special register referred to as standardised phraseology can be identified. First, in contrast to most other varieties of English – where it is the rule rather than the exception for words to have multiple meanings – each word and phrase has just one specific and precisely defined meaning in aviation phraseology. Other meanings of words which are polysemous in plain English are thus explicitly excluded from this register and some of the defined meanings of words and phrases in aviation phraseology do not occur outside of this specialised register. Meanings of words and phrases that do not occur in other registers are called “register markers” (Biber and Conrad 2009: 53). Unlike register markers in many other registers, however, these unique characteristics are strictly functional in standardised phraseology (cf. Biber and Conrad 2009: 55). The second main lexical characteristic of this register is the fact that words and phrases are carefully selected to avoid confusion and misunderstandings due to phonetically similar expressions, since “maximum clarity, brevity and unambiguity” (ICAO 2007a: 3-2) are considered the most important aims of the prescription of aviation phraseology.
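The one-word-one-meaning principle can be pictured as a simple lookup table. The following Python sketch is purely illustrative and not part of any ICAO document: the entries are the glosses quoted above from ICAO (2001), whereas the names PHRASEOLOGY, REPLACEMENTS and gloss are hypothetical and the table covers only a handful of terms.

```python
# Illustrative sketch only: a tiny excerpt of the prescribed vocabulary,
# using the glosses quoted in the text. All identifiers are hypothetical.
PHRASEOLOGY = {
    "affirm": "yes",
    "cleared": "authorised to proceed under the conditions specified",
    "go ahead": "proceed with your message",
    "monitor": "listen out on (frequency)",
    "maintain": "continue in accordance with the condition(s) specified",
}

# Plain-English words that the phraseology rules out, mapped to the
# prescribed, phonetically dissimilar alternative.
REPLACEMENTS = {"ascend": "climb"}


def gloss(term: str) -> str:
    """Return the single defined meaning of a phraseology term."""
    key = term.lower()
    if key in REPLACEMENTS:
        return f"not used; say '{REPLACEMENTS[key]}'"
    # A KeyError signals a term outside this small illustrative table.
    return PHRASEOLOGY[key]


print(gloss("affirm"))
print(gloss("ascend"))
```

Because every key maps to exactly one value, the table mirrors the exclusion of polysemy described above; a fuller model would need the complete lists in Annex 10 and Document 4444.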

Grammatical characteristics

At the grammatical level, standardised phraseology is also characterised by a number of pervasive and frequent “register features” (Biber and Conrad 2009: 53).

With respect to the use of verbs in aviation phraseology, the prescription to use most verbs in the list of essential “words and phrases” in the imperative only is certainly striking (cf. ICAO 2001: 5-6 and 5-7). According to the definitions in this list, verbs such as cancel ‘annul the previously transmitted clearance’, check ‘examine a system or procedure’, contact (cf. Transcript 2) ‘establish communications with …’, disregard ‘ignore’, monitor (Transcript 3) ‘listen out on frequency’, maintain (cf. Transcript 2) ‘continue in accordance with the condition(s) specified’, report ‘pass me the following information …’, and many more can only be used in imperatives, which is certainly a register feature of this variety. Aviation phraseology even prescribes the use of certain words as verbs in the imperative which are not commonly used as verbs and thus not listed in this part of speech in general-use dictionaries, e.g. the verbal use of standby (cf. Transcript 4) meaning ‘wait and I will call you’ (ICAO 2001: 5-7).

Another grammatical feature characteristic of aviation phraseology is the specific prescribed order of elements in an utterance and the high frequency of ellipses, as illustrated by the following authentic example:


Transcript 1:
Aerogal seven hundred heavy Kennedy Tower (.) winds calm (.) runway one three left (.) cleared to land
(JFK Tower, own transcript, 2010)

In line with the recommendation in Section 5.2.1.6 “Composition of messages” of Annex 10 to the Convention on International Civil Aviation (ICAO 2001), the message uttered by the air traffic controller at JFK International Airport consists of two main parts, the “call” made up of the call sign of the addressee Aerogal seven hundred heavy and the call sign of the originator Kennedy Tower, and the “text” winds calm (.) runway three one left (.) cleared to land, which provides information concerning the weather and contains the instruction that the plane is cleared to land on runway three one left. The fixed structure permits elliptical constructions and the reduction of function words “to a small number of prepositions” (Moder 2013: 229; cf. also ICAO 2010: 3-4), as illustrated by the above example.
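The two-part composition of a message can be made concrete with a short Python sketch. It is a rough illustration under stated assumptions, not an ICAO procedure: it assumes the “(.)” pause notation of the transcripts in this chapter and that the call sign of the originator is known in advance; the names Message and split_message are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    addressee: str    # call sign of the station being addressed
    originator: str   # call sign of the transmitting station
    text: List[str]   # remaining units of the transmission


def split_message(transcript: str, originator: str) -> Message:
    """Split a transcribed transmission into "call" and "text".

    Assumption: units are separated by the pause marker "(.)" and the
    first unit is the call, which ends with the originator's call sign.
    """
    units = [unit.strip() for unit in transcript.split("(.)")]
    call, text = units[0], units[1:]
    if not call.endswith(originator):
        raise ValueError("originator call sign not found at end of call")
    addressee = call[: -len(originator)].strip()
    return Message(addressee, originator, text)


msg = split_message(
    "Aerogal seven hundred heavy Kennedy Tower (.) winds calm (.) "
    "runway one three left (.) cleared to land",
    originator="Kennedy Tower",
)
print(msg.addressee)
print(msg.text)
```

Applied to Transcript 1, the sketch separates the addressee Aerogal seven hundred heavy from the originator Kennedy Tower and leaves three text units, none of which needs a finite verb or an article to be interpretable within the fixed structure.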

Overall, the grammatical characteristics of standardised phraseology reflect the dominant functions of pilot-controller communication identified by Mell (2004: 13), which are sharing of information (cf. information on the wind conditions in Transcript 1 above), triggering actions, management of the pilot-controller relationship and managing the dialogue. For example, the frequent use of imperatives is directly linked to the category “triggering actions” as “the core function of pilot-controller communications” (Mell 2004: 13) and the prescribed structure reduces the number of words needed for managing the dialogue between pilot and air traffic controller. Transcript 2 illustrates the importance of imperatives for triggering immediate actions (cf. also the imperatives continue, follow and monitor in Transcript 3), in this case right after the decision of the pilots to abort the landing and initiate a go-around:

Transcript 2:
Lufthansa four two four heavy climb [to and] maintain 3000 [feet] (.) fly runway heading […] contact Boston Departure […]
(Boston Tower, own transcript, 2015; imperatives in bold)

Pronunciation characteristics

The ICAO publications on standardised phraseology make specific recommendations, which leads to additional linguistic features of this register. For example, there are recommendations concerning the pronunciation of numbers and letters. The “Radiotelephony Spelling Alphabet” defines the “desired pronunciation” (ICAO 2001: 5-4) of the words representing letters when spelling out “names, service abbreviations and words of which the spelling is doubtful” (ICAO 2001: 5-3). According to the ICAO (2001: 5-4), for example, the letter <z> has to be pronounced as zulu /'zu:lu:/ and <k> has to be realised as kilo /'ki:lo/ (cf. Transcript 3).

Transcript 3:
Delta four twenty-seven (.) good day (.) continue down to kilo kilo [= taxiway KK] (.) follow company [= another Delta jet] seven three seven (.) monitor tower one two three point niner
(JFK Ground, own transcript, 2008, my emphasis)

The pronunciation of numbers, which under most circumstances have to be pronounced as single digits, is also specified in the recommendations for standardised phraseology. Section 5.2.1.4.3 “Pronunciation of Numbers” of Annex 10 to the Convention on International Civil Aviation (ICAO 2001) provides a description of the desired pronunciation of numbers including recommended stress placements:

(ICAO 2001: 5-5)

To avoid misunderstandings in radiotelephony communication, some of the recommended pronunciations of numbers are deliberately different from the common pronunciation of these numbers in many varieties of English spoken by native speakers. The prescribed pronunciation features thus have to be learned by native and non-native speakers of English alike. For example, dental fricatives are regularly replaced by alveolar stops – a recommendation in line with Jenkins’ (2008: 146) recommendations for the so-called “Lingua Franca Core” of English – and so the initial sounds in thousand and three are supposed to be realised as /t/. Unfortunately, these recommendations are “often not adopted by native speakers of English, who typically pronounce ‘3’ and ‘5’ in the usual plain English way” (Moder 2013: 229–230). This is illustrated by Transcript 3, in which the air traffic controller at JFK International Airport in New York City, most likely a native speaker of English, pronounces <3> “in the usual plain English way” (Moder 2013: 230) but realises <9> as niner.
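The digit-by-digit principle can be illustrated with a short sketch. The mapping below is a simplification introduced here and does not reproduce the ICAO table: it uses ordinary spellings except for tree, fife and niner, which stand in for the prescribed realisations of <3>, <5> and <9>, and it ignores stress placement. The function name `spell_out` and the parameter `decimal_word` are hypothetical; the default "decimal" is the word recommended in ICAO phraseology, whereas the controller in Transcript 3 uses "point".

```python
# Simplified respellings for illustration only; stress placement is ignored.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "tree", "4": "four",
    "5": "fife", "6": "six", "7": "seven", "8": "eight", "9": "niner",
}


def spell_out(number: str, decimal_word: str = "decimal") -> str:
    """Render a number digit by digit, as required under most circumstances."""
    words = []
    for char in number:
        if char == ".":
            words.append(decimal_word)
        elif char in DIGIT_WORDS:
            words.append(DIGIT_WORDS[char])
        else:
            raise ValueError(f"unexpected character: {char!r}")
    return " ".join(words)


# The tower frequency from Transcript 3 in the recommended form ...
print(spell_out("123.9"))
# ... and with the separator the controller actually uses.
print(spell_out("123.9", decimal_word="point"))
```

Comparing the first output with the wording of Transcript 3 shows the partial adoption described above: the controller says niner but keeps the ordinary pronunciation of <3>.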

Unlike for most other registers, there are even provisions concerning the speed of delivery of utterances in Aviation English. The ICAO recommends “an even rate of speech not exceeding 100 words per minute” (ICAO 2001: 5-5) and an even slower rate “[w]hen it is known that elements of the message will be written down by the recipient” (ICAO 2007a: 2-1). Studies, however, have shown that particularly native speakers tend to use a much higher speech rate, often over 200 words per minute, which can lead to misunderstandings and the need for time-consuming clarifications (cf. Bieswanger 2013: 19–20). Silberstein and Dietrich (2003: 9) quote an air traffic controller who admits: “I talk faster, a lot faster – I talk so fast that they have to slow me down because they don’t understand me anymore.” Since the speech rate is obviously crucial in Aviation English, all pilots and air traffic controllers have to be trained to develop an awareness of the importance of their speed of delivery.
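What the recommended rate means for a single transmission can be worked out with simple arithmetic. The following sketch is an illustration only: the function names are hypothetical, pause markers "(.)" are treated as transcription symbols rather than words, and the two durations are invented for the sake of the example, since the transcripts above carry no timing information.

```python
def count_words(transmission: str) -> int:
    """Count spoken words, skipping the pause marker "(.)"."""
    return sum(1 for token in transmission.split() if token != "(.)")


def words_per_minute(transmission: str, duration_seconds: float) -> float:
    """Speech rate of one transmission in words per minute."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return count_words(transmission) * 60 / duration_seconds


transcript_1 = (
    "Aerogal seven hundred heavy Kennedy Tower (.) winds calm (.) "
    "runway one three left (.) cleared to land"
)

# Hypothetical durations: 9 seconds versus 4.5 seconds for the same 15 words.
print(words_per_minute(transcript_1, 9.0))   # 100.0
print(words_per_minute(transcript_1, 4.5))   # 200.0
```

Under these assumed timings, the 15 words of Transcript 1 would have to be spread over nine seconds to stay at the recommended limit; delivering them in half that time corresponds to the rate of about 200 words per minute reported for many native speakers.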

Ever since its introduction after the Chicago Convention more than half a century ago, the ICAO standardised phraseology has been refined and expanded. The continuous development of standardised phraseology has been based on pilots’ and controllers’ experiences and the analysis of language-related accidents, in order to cover more areas of language use in aviation, to adopt new procedures and technologies, and to deal with previously unknown or rare situations. For example, in reaction to recent events, the 15th edition of the ICAO Procedures for Air Navigation Services: Air Traffic Management (ICAO 2007b: xv) adds, among other regulations, new “pilot procedures in the event of unlawful interference” and “procedures related to volcanic ash”.

Pilots and air traffic controllers are constantly urged to use standardised phraseology and to avoid non-standard communication whenever possible (cf., e.g., ICAO 2001: 5-1; ICAO 2007a: 3-2; ICAO 2010: 2-3; Prinzo et al. 2010: 15). Despite all efforts to regularly update the standardised phraseology, the ICAO also acknowledges that “[i]t is not possible, however, to develop phraseologies to cover every conceivable situation” (ICAO 2010: 4-2) and that “plain language shall be used” (ICAO 2001: 5-1) when standardised phraseology is not available to cover the communicative needs of the stakeholders in air traffic control communication. The following section will describe the use of plain language in such situations and show that plain Aviation English can also be considered a specialised register.


3.3 Plain Aviation English

Plain language has never been excluded from pilot-controller communication but, quite on the contrary, has always been permitted and used in clearly defined situations in which “standardized phraseology cannot serve an intended transmission” (ICAO 2001: 5-1). As a result of this precise situational context, however, plain Aviation English is fundamentally different from everyday conversations in several respects:

Plain language in aeronautical radiotelephony communications means the spontaneous, creative and noncoded use of a given natural language, although constrained by the functions and topics (aviation and non-aviation) that are required by aeronautical radiotelephony communications, as well as by specific safety-critical requirements for intelligibility, directness, appropriacy, non-ambiguity and concision. (ICAO 2010: 3-5)

Plain Aviation English is thus characterised by features that result from the function it has to fulfil with respect to safety and the topics covered in air traffic control communication. These constraints are the reason for distinctive register features at all linguistic levels, described and illustrated in the following subsections.

Lexical characteristics

The lexicon of plain Aviation English is less precisely defined than the words and phrases used in standardised phraseology, but at the same time more restricted than, for example, the lexicon of everyday conversation in what could be called plain English. The ICAO recommendations make it very clear that the obvious need for plain language in non-routine situations “should in no way be interpreted as permission to chat” (ICAO 2010: 4-3). At the lexical level, plain Aviation English is thus characterised by words and phrases corresponding to topics related to pilot-controller communication. These topics, which are also addressed in textbooks and courses on plain Aviation English (cf., e.g., Emery and Roberts 2008), include, among others, fields such as technology, health, animals, fire and weather (for a detailed list of domains, cf. ICAO 2010: B5-B8). For example, in-flight medical emergencies often make the use of plain Aviation English necessary (cf. Transcript 4). In Transcript 4, standardised phraseology is used in the first two transmissions to establish contact but then turns out to be insufficient to serve all of the communicative needs of the pilots. Hence a code-switch takes place and the further three transmissions are carried out in plain Aviation English. The vocabulary in these transmissions, however, is different from plain everyday English in that it is characterised by aviation-related terms such as diversion, declaring emergency and met report.


Transcript 4:
American 182: Tokyo Control American one eight two
Tokyo Control: American one eight two (.) go ahead
American 182: Yes sir (.) we are (.) have a possible diversion to Narita [=Tokyo Narita International Airport] (.) we are not declaring emergency yet but would like Narita weather
[…] Narita airport is closed, Tokyo Haneda is suggested for a possible diversion
Tokyo Control: American one eight two (.) do you need met report [=weather report] of Haneda?
American 182: Yes sir (.) request met report for Haneda
Tokyo Control: Okay, standby
(Tokyo Control, own transcript, 2014)

Grammatical characteristics

The grammatical structure of plain Aviation English is similar to plain English and only characterised by some tendencies which constitute functionally oriented register features. Of the factors mentioned in the quotation above, “concision” (ICAO 2010: 3-5) is certainly one of the main driving forces responsible for these characteristics. Concision is defined as ‘giving only the information that is necessary, using few words’ in the OALDO (2014). In the context of plain Aviation English, this means that the utterances produced by pilots and air traffic controllers have to be as brief as possible and simply structured. According to Prinzo et al. (2010: 15), the rate of readback errors is affected by “both message length and complexity” and they claim that “controllers should transmit less information more often.” With reference to concision, it has also been reported that the desire for brevity leads to an influence of standardised phraseology on plain Aviation English, showing in the deletion of function words such as determiners even when not using phraseology (ICAO 2010: 3-6). The last two transmissions in Transcript 4 illustrate this claim, as the determiner the is omitted in both transmissions before met report.

Pronunciation characteristics

At the level of pronunciation, plain Aviation English is less restricted than standardised phraseology, as there are no specific recommendations concerning the realisation of individual words and phrases. Other ICAO recommendations concerning pronunciation, however, also apply to the use of plain language and make plain Aviation English more restricted than plain English in many other situations. For example, the recommended speech rate of 100 words or less per minute (ICAO 2001: 5-5; cf. above) is also valid for plain Aviation English, which aims for maximum “intelligibility” (cf. ICAO 2010: 3-5), just like standardised phraseology.


This necessity for maximum mutual intelligibility in pilot-controller communication is also the reason for another requirement concerning the pronunciation of plain Aviation English, namely the demand that all pilots and air traffic controllers “must take care to acquire an internationally understood accent or dialect” (ICAO 2010: 5-6). The ICAO does not specify more precisely what is meant by “internationally understood accent” and does not name any recommended accents in particular, but this fairly vaguely defined rule applies to both native speakers and non-native speakers of English. From a functional perspective, such an accent or dialect is a register feature of plain Aviation English and necessary for efficient and effective communication in air traffic control contexts.

4 Conclusion

The above sections have shown that Aviation English is not monolithic and that there is not one but two varieties referred to as Aviation English, namely standardised phraseology and plain Aviation English. Both varieties occur in precisely defined and complementary situations in pilot-controller communication: standardised phraseology covers most routine situations, whereas plain Aviation English is only permitted in non-routine situations. Both varieties share many of the situational characteristics Biber and Conrad (2009: 39) consider “relevant for describing and comparing registers”. They are employed by the same participants, i.e. pilots and air traffic controllers, with identical relations between the participants, use the same channel, face the same production circumstances and share the same setting. The main differences with regard to the situational characteristics can be found in the communicative purposes and the topics covered. While both varieties share their general purpose, namely to facilitate efficient and effective air traffic control communication, standardised phraseology is restricted to a limited set of frequently used communicative purposes in routine situations, whereas plain Aviation English covers a whole range of less frequently used and non-routine communicative purposes such as emergencies. A similar pattern can be identified concerning the topics covered by these two varieties: while standardised phraseology covers a restricted but very frequently used set of topics in routine air traffic control communication, plain Aviation English covers a much broader range of air traffic related topics in non-routine situations.

Resulting from the partially different situational contexts, both varieties of Aviation English are characterised by pervasive linguistic features that fulfil specific functions in each of the situations. Standardised phraseology is characterised by a very precisely defined reduced set of words and phrases, each with a single prescribed meaning, a grammar marked by ellipsis, short utterances and a frequent use of imperatives, and a prescribed pronunciation of numbers and letters as well as recommendations concerning the speech rate. Reflecting the wider range of communicative purposes and topics covered by plain Aviation English, the lexical, grammatical and pronunciation characteristics are less precisely specified than for standardised phraseology. There are, however, characteristics at all linguistic levels that distinguish plain Aviation English from conversations in plain English, such as a reduced lexicon resulting from the restriction of plain Aviation English to the topics related to aeronautical radiotelephony, a grammar determined by the fundamental need for concision and non-ambiguity, and ICAO recommendations concerning the speech rate and the intelligibility of accents and dialects.

In conclusion, considering situational, linguistic and functional characteristics, the analysis presented in this paper shows that both varieties of Aviation English used in pilot-controller communication can be categorised as specialised registers in the sense of Biber and Conrad (2009: 10; 32–33). They are both fundamentally different from the very general register of conversation, and they are distinct because they differ in their degree of specificity. Compared to plain Aviation English, the situational, linguistic and functional characteristics of standardised phraseology can be much more precisely specified. Standardised phraseology thus represents one extreme of a continuum of specificity of registers, while conversations would be at the other end. Plain Aviation English could be placed somewhere in between, although certainly in the range of specialised registers and closer to aviation phraseology than to everyday conversations.

In air traffic control communication, routine and non-routine situations alternate constantly, meaning that changes in communicative purpose and the switching between the two specialised registers described in this article are the rule rather than the exception in the work-life of pilots and air traffic controllers (cf. Biber and Conrad 2009: 45). The two specialised registers, standardised phraseology and plain Aviation English, however, are the only choices permitted in English-language air traffic control situations; plain English – as used in everyday face-to-face or mediated conversations – is not an option and is explicitly discouraged by the ICAO. Both native speakers and non-native speakers of English have to learn these two specialised registers through explicit instruction, as neither of these specialised registers is among the many registers native speakers acquire “automatically” without any extra effort. The need for situation-specific register selection in air traffic control communication provides yet another example of the fact that the use of the appropriate register in a given situation is the prerequisite for successful communication.


References

Biber, Douglas. 1995. Dimensions of register variation: A cross-linguistic comparison. Cambridge: Cambridge University Press.

Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge University Press.

Bieswanger, Markus. 2013. Applied linguistics and air traffic control: Focus on language awareness and intercultural communication. In Silvia Hansen-Schirra & Karin Maksymski (eds.), Aviation communication: Between theory and practice, 15–30. Frankfurt am Main: Peter Lang.

Convention on International Civil Aviation. 1944. Convention on international civil aviation done at the 7th day of December 1944. Original version available at http://www.icao.int/publications/Documents/7300_orig.pdf (accessed 31 January 2014).

Crystal, David. 2003. English as a global language. 2nd edn. Cambridge: Cambridge University Press.

Cushing, Steven. 1994. Fatal words: Communication clashes and aircraft crashes. Chicago: The University of Chicago Press.

Emery, Henry & Andy Roberts. 2008. Aviation English: For ICAO compliance. Oxford: Macmillan.

Hansen-Schirra, Silvia & Karin Maksymski (eds.). 2013. Aviation communication: Between theory and practice. Frankfurt am Main: Peter Lang.

ICAO (International Civil Aviation Organisation). 2001. Annex 10: Aeronautical telecommunications. Volume II. 6th edn.

ICAO (International Civil Aviation Organisation). 2007a. Manual of radiotelephony. 4th edn. ICAO Document 9432-AN/925.

ICAO (International Civil Aviation Organisation). 2007b. Procedures for air navigation services: Air traffic management. 15th edn. ICAO Document 4444-ATM/501.

ICAO (International Civil Aviation Organisation). 2010. Manual on the implementation of ICAO language proficiency requirements. 2nd edn. ICAO Document 9835-AN/453.

Intemann, Frauke. 2008. ‘Taipei ground, confirm your last transmission was in English … ?’ – An analysis of Aviation English as a world language. In Claus Gnutzmann & Frauke Intemann (eds.), The globalisation of English and the English language classroom, 76–93. 2nd edn. Tübingen: Narr.

Jenkins, Jennifer. 2008. Teaching pronunciation for English as a Lingua Franca: A sociopolitical perspective. In Claus Gnutzmann & Frauke Intemann (eds.), The globalisation of English and the English language classroom, 145–158. 2nd edn. Tübingen: Narr.

Jones, R. Kent. 2003. Miscommunication between pilots and air traffic control. Language Problems and Language Planning 27(3). 233–248.

Kostecka, Robert. 2007. Aviate—Navigate—Communicate. Transport Canada: Aviation safety letter 2/2007, 12–14.

Live-atc.net. www.live-atc.net (accessed 19 February 2015).

Mathews, Elizabeth. 2004. New provisions for English language proficiency are expected to improve aviation safety. ICAO Journal 59(1). 4–6, 27.

Mell, Jeremy. 2004. Language training and testing in aviation need to focus on job-specific competencies. ICAO Journal 59(1). 12–14, 27.

Mitsutomi, Marjo & Kathleen O’Brien. 2004. Fundamental aviation language issues addressed by new proficiency requirements. ICAO Journal 59(1). 7–9, 26–27.

Moder, Carol Lynn. 2013. Aviation English. In Brian Paltridge & Sue Starfield (eds.), The handbook of English for specific purposes, 227–242. Malden: John Wiley & Sons.

OALDO (Oxford advanced learner’s dictionary online). 2014. http://oald8.oxfordlearnersdictionaries.com/ (accessed 31 January 2014).

Prinzo, Veronika O., Alan Campbell, Alfred M. Hendrix & Ruby Hendrix. 2010. U.S. airline transport pilot international flight language experiences. Report 5: Language experiences in native English-speaking airspace/airports. Technical report DOT/FAA/AM-10/18. Washington, DC: Federal Aviation Administration, Office of Aerospace Medicine.

Silberstein, Dagmar & Rainer Dietrich. 2003. Cockpit communication under high cognitive workload. In Rainer Dietrich (ed.), Communication in high risk environments (Special issue 12 of Linguistische Berichte), 9–56. Hamburg: Buske.

Rolf Kreyer

‘Now niggas talk a lotta Bad Boy shit’: The register hip-hop from a corpus-linguistic perspective

Abstract: The present paper aims to provide a first corpus-based analysis of one of the most successful kinds of popular music, namely hip-hop. In particular, the paper explores to what extent hip-hop can be regarded as a register in its own right, analysing data drawn from a 200,000-word corpus of the most successful hip-hop albums in 2003 and 2011. Taking Biber and Conrad’s (2009) register-defining triad of situation of use, linguistic features, and associated functions as a descriptive framework, it is argued that hip-hop can indeed be granted the status of a register in its own right.

1 Introduction

In Western societies, pop songs are an integral part of everyday life: we are surrounded by pop songs in the supermarket, in the elevator or when driving a car. Moreover, listening to pop songs is one of the (if not the) most popular pastimes among adolescents in America or Western Europe (cf., for instance, Schwartz and Fouts 2003). Given the pervasiveness of pop songs, it is surprising that the scientific study of this register does not figure very prominently in linguistics, although pop songs have been given a considerable amount of attention in fields like cultural studies.

In this respect, it is telling that none of the major corpora of the English language provide any lyrics of pop songs. The linguistic analysis of this register is still in its infancy and corpus-linguistic studies are few and far between. An early corpus-based analysis of pop songs is Murphey (1989; cf. also 1990 and 1992). He provides both quantitative and qualitative data from a 13,000-word corpus of pop-song lyrics. His main focus, however, does not lie in the description of a register but in the exploitation of pop songs for the learning and teaching of English as a foreign language. A much more ambitious project is the BLUR (Blues Lyrics collected at the University of Regensburg) corpus, which contains 7,341 song texts comprising roughly 1.5 million words (Miethaner 2001, 2005; Schneider and Miethaner 2006). However, this corpus consisting of recordings from the 1920s to the 1940s was compiled as evidence for earlier African American Vernacular English and, accordingly, is only of limited value for the study of pop songs as an important present-day register. More detailed analyses of modern pop songs can be found in Kreyer and Mukherjee (2007) and Kreyer (2012). The former provide a first attempt at describing the major linguistic properties of the register at issue, such as deviant spellings (also cf. Mukherjee 2000) and lexical/lexico-grammatical aspects. One focus of their research is on the degree to which pop songs can be considered a written or spoken register. The data show that the register is more spoken-like in general, as is shown in similarities in average word length or the high frequency of the personal pronouns you and I. Interestingly, other features that are typical of spoken language, such as the frequent use of you know as a discourse marker, were shown not to be that important in pop songs. Kreyer (2012) explores the use of love-related metaphors in pop songs within the framework of conceptual metaphor theory (e.g. Lakoff and Johnson 1980; Kövecses 2002). He finds that, despite the (perhaps) popular assumption that pop songs are clichéd, metaphors in pop songs are quite varied and creative. The most recent register-related study of pop songs is Werner (2012). Since he is interested in small-scale diachronic as well as varietal aspects of pop songs, his corpus consists of two subcorpora, one with British lyrics and the other with American lyrics. The 1,128 songs included in the corpus span the years 1952–2008 and 1946–2005, totalling 171,968 and 170,234 words, respectively (Werner 2012: 23). Werner’s findings also confirm earlier claims about the informal and conversational nature of pop song lyrics. However, he argues convincingly that subsuming pop song lyrics under the conversational register would go too far. Rather, the low frequencies of typical spoken features such as interjections or non-standard morphosyntactic elements call for a more careful analysis: “the picture of pop-song lyrics as exemplars of spoken/informal register […] had to be […] altered to be thought of as a ‘special’ register” (Werner 2012: 43).

Rolf Kreyer, University of Marburg

The present paper aims to contribute further to our understanding of pop song lyrics from a register perspective by exploring hip-hop as a potential sub-register. A question that comes to mind is whether pop songs can be regarded as one single monolithic register or whether it makes sense to assume more specific registers covered by the umbrella term ‘pop songs’. Biber and Conrad (2009: 10) claim that “[t]here is no one correct level on which to identify a register” and “that registers can be studied on many different levels of specificity”.

The present paper aims at providing a first corpus-based analysis of one of the most successful (musical) genres among pop songs, namely hip-hop. The label ‘genre’ is also to be understood in its linguistic sense at this point, since we cannot yet be sure that hip-hop constitutes a register. Based on data from an updated pilot version of the Giessen-Bonn corpus of Popular music – GBoP (cf. Kreyer and Mukherjee 2007), the paper explores Biber and Conrad’s (2009: 50) three criteria for register analysis (situational characteristics, linguistic characteristics and function; cf. Schubert, this volume) and shows that with regard to all of these, hip-hop must be regarded as a register in its own right.

2 The data

The data for the present study is taken from an extended pilot version of GBoP. It contains lyrics from the top albums from the US album charts of the years 2003 and 2011.¹ More specifically, for 2003, 48 of the top 52 albums were included. Four albums had to be ignored because they either did not contain any lyrics at all or only contained non-English lyrics. The 2003 lyrics were taken from internet lyric archives or from CD booklets (cf. Kreyer and Mukherjee 2007 for details). The 2003 material has been supplemented by the (English) lyrics of the top 50 albums from 2011. These lyrics were primarily taken from A-Z lyrics (www.azlyrics.com). This site is particularly suitable, since the lyrics it provides are usually reviewed by a number of different users, resulting in a fairly ‘reliable’ version of the texts. In some cases, other archives like metrolyrics (www.metrolyrics.com) or lyricsfreak (www.lyricsfreak.com) had to be consulted.

From this compilation of albums, a subcorpus was compiled of albums that would usually be considered as representative of hip-hop. Of course, the decision whether to include an album or not is not an easy one. The criterion applied was whether the featured artist was primarily considered a rapper/hip-hopper (information taken from www.discogs.com). Nelly, for instance, is primarily regarded as a rapper, which is why his album Nellyville was included in the corpus, even though it contains tracks that might rather be considered R&B. Stripped by Christina Aguilera, by contrast, was not included, since the performer is not primarily regarded as a rapper or hip-hopper, although some of the songs in her album would fall under that category. Compilation albums were excluded if they featured more than one artist. All in all, the hip-hop corpus contains the lyrics from 18 albums; 9 from 2003 and 9 from 2011. Table 1 shows the composition of the corpus.

1 My first explorations of the development of pop music registers started in 2012 when the data from 2011 was the most recent data available.


Table 1: The corpus analysed in the present study.

Album                                        # words

2Pac – Better Dayz                            20,349
50 Cent – Get Rich or Die Tryin’              13,711
Chingy – Jackpot                              10,475
Eminem – The Eminem Show                      13,049
Ja Rule – The Last Temptation                  8,425
Missy Elliot – Under Construction              7,360
Nelly – Nellyville                            13,424
Outkast – Speakerboxx/The Love Below          15,043
Sean Paul – Dutty Rock                        10,163

Total 2003                                   111,999

Bad Meets Evil – Hell_The Sequel               9,246
Eminem – Recovery                             15,694
Jay Z & Kanye West – Watch the Throne          7,529
Kanye West – My Beautiful Dark …               8,407
Lil’ Wayne – I am not a Human Being            7,218
Lil’ Wayne – Tha Carter IV                    11,520
Nicki Minaj – Pink Friday                      9,492
The Black Eyed Peas – The Beginning            7,750
Wiz Khalifa – Rolling Papers                   7,564

Total 2011                                    84,420

Total 2003 + 2011                            198,387

Since “[t]he analysis of register characteristics […] will generally focus on the comparison of two or more registers” (Biber and Conrad 2009: 36), the hip-hop data will be contrasted with the data from the remaining albums, in the following referred to as ‘non-hip-hop corpus’ or ‘control corpus’ (cf. Appendix 1 for its composition). Although the number of albums in this control corpus is almost four times as large, the number of words is comparatively small, namely slightly below 350,000.

In all the texts, the original punctuation and spelling deviations were retained. This is particularly important for hip-hop, as spelling conventions are an important means of creating identity (cf. Morgan 2001, 2002 and Olivio 2001). Metatextual comments like verse, chorus or bridge or the identity of the singer in duets, for example, were removed from the text. Choruses were spelt out any time they appeared in the text, i.e. a comment like Chorus [2x] was replaced by a repetition of the lines of the chorus. In those cases where it was not clear from the text layout which words are still part of the chorus and which are part of the verse, an audio version of the song was consulted. Other kinds of repetition were spelt out if they contained words, e.g. a line like She (When she loves) [3x] was represented three times in the corpus (without the [3x], of course). However, if repetitions consisted of non-lexical material only, they were not made explicit, e.g. Oooooh oooh ooohohhh [x2]. All texts were stored in .txt format. An example of a text is given in (1) below (note that <Z>, from German Zeilenumbruch, stands for line break).

(1) G-Unit (What) <Z> We in here (What) <Z> We can get the drama popping <Z> We don’t care (What, what, what) <Z> It’s going down (What) <Z> ’Cause I’m around (What) <Z> 50 Cent, you know how I gets down (Down) <Z> What up, Blood? (What) <Z> What up, Cuz? (What) <Z> What up, Blood? (What) <Z> What up, Gangstaaa? </C> What up, Blood? (What) <Z> What up, Cuz? (What) <Z> What up, Blood? (What) <Z> What up, Gangstaaa? <Z>
(50 Cent – What Up Gangsta?)
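The treatment of repetition markers described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure actually used to prepare the corpus: the function names are hypothetical, only markers of the form [3x] or [x2] at the end of a line are handled, and the test for "non-lexical material" is a crude stand-in (a token made up solely of the letters o, a and h), whereas the original decision was presumably made by hand. Lines with non-lexical repetitions are simply left as they stand.

```python
import re
from typing import List

# A trailing repetition marker such as "[3x]" or "[x2]".
REPEAT = re.compile(r"\s*\[(?:(\d+)x|x(\d+))\]\s*$", re.IGNORECASE)
# Hypothetical heuristic: tokens built only from o, a and h count as non-lexical.
NON_LEXICAL = re.compile(r"^[oah]+$", re.IGNORECASE)


def expand_line(line: str) -> List[str]:
    """Spell out a repeated line unless it contains non-lexical material only."""
    match = REPEAT.search(line)
    if match is None:
        return [line]
    content = line[:match.start()]
    tokens = re.findall(r"[A-Za-z']+", content)
    if all(NON_LEXICAL.match(token) for token in tokens):
        return [line]  # non-lexical repetition is not made explicit
    count = int(match.group(1) or match.group(2))
    return [content] * count


def to_corpus_text(lines: List[str]) -> str:
    """Join the expanded lines with the line-break marker <Z>."""
    expanded = [out for line in lines for out in expand_line(line)]
    return " <Z> ".join(expanded)


print(to_corpus_text(["She (When she loves) [3x]", "Oooooh oooh ooohohhh [x2]"]))
```

Spelling out lexical repetitions before counting matters for any frequency-based comparison: a chorus that is sung three times contributes its words three times to what listeners actually hear.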

All analyses of the corpus material were conducted by using AntConc 3.2.4 (Anthony 2011) and Wmatrix (Rayson 2003, 2009).

3 Hip-hop – a register in its own right?

Following the definition of ‘register’ provided in Biber and Conrad (2009; cf. Schubert, this volume), hip-hop can be regarded as a register in its own right if we can specify a particular situation of use, a particular set of linguistic features and a particular function of these features vis-à-vis the situation of use. This section will discuss the first two of these three aspects. By way of conclusion, possible functions will be explored.

3.1 Situation of use

In many respects, hip-hop and pop songs in general share situational features. For instance, in both cases the channel is identical: the primary mode is (sung) speech and the speech event is captured on a permanent medium (apart from a live concert, of course). Similarly, the settings are identical, e.g. different times and places of communication for the participants. Features of addresser and addressee can be regarded as similar as well, at least on a general level. Production circumstances might be described as ‘revised and edited’ in both cases, although spontaneous rapping plays an extremely important role in hip-hop culture (e.g. during battlin’ or cypha, i.e. rap competitions).

92 Rolf Kreyer

Alongside these similarities, two aspects are worth considering by which hip-hop and other pop songs diverge, namely topic and relations among participants. To explore topic-related differences, the corpus-analysis tool Wmatrix (Rayson 2003, 2009) was used. Wmatrix provides web access to the UCREL Semantic Analysis System (USAS), which automatically assigns semantic categories to all of the lexical items in a given corpus. On the whole, the semantic tagger employs 21 broad semantic categories, which are shown in Figure 1.

A general and abstract terms
B the body and the individual
C arts and crafts
E emotion
F food and farming
G government and public
H architecture, housing and the home
I money and commerce in industry
K entertainment, sports and games
L life and living things
M movement, location, travel and transport
N numbers and measurement
O substances, materials, objects and equipment
P education
Q language and communication
S social actions, states and processes
T Time
W world and environment
X psychological actions, states and processes
Y science and technology
Z names and grammar

Figure 1: The semantic categories of USAS (Archer et al. 2002: 2).

On the highest level of specificity a total of 232 category labels is provided. The category E ‘Emotion’, for instance, contains six subcategories, one of these being subdivided into two further sub-classes. Figure 2 shows the structure of the category ‘Emotion’:


| Category | Subcategory I | Subcategory II | Example |
|---|---|---|---|
| E: Emotion | E1: General | | emotion, hysterical |
| | E2: Liking | | adore, beloved |
| | E3: Calm/Violent/Angry | | gentle, infuriated |
| | E4: Happy/Sad | E4.1: …: Happy | amused, cheerful |
| | | E4.2: …: Contentment | dismay, humour |
| | E5: Fear/Bravery/Shock | | amazed, dread |
| | E6: Worry/Concern/Confident | | anxious, edgy |

Figure 2: The semantic category ‘Emotion’ in USAS (Archer et al. 2002: 10–11).

An example of the semantic tagging can be seen in (2), which shows a few words from Tupac Shakur’s Still Ballin’.

(2) 0000002 510 VV0 Blame Q2.2/G2.2- G2.1
    0000002 520 PPH1 it Z8
    0000002 530 II on Z5
    0000002 540 APPGE my Z8
    0000002 550 NN1 mama S4f

The verb blame is tagged as a ‘speech act term’ (Q2.2) and, alternatively, as either ‘general ethics’ (G2.2) or ‘Crime, law and order: Law & order’ (G2.1). The minus sign following G2.2 indicates the lack of ethics. Note that the tags are not given in alphanumerical order; their sequence depends on the likelihood that USAS assigns to each tag. The following three words, it, on, and my are either tagged as ‘pronoun’ (Z8) or ‘grammatical bin’ (Z5). The tag ‘S4f’ for mama tells us that we are dealing with a kinship term, more specifically, female kin.
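The structure of the tagged lines in (2) can be made explicit with a small parsing sketch. It is only an illustration and not part of Wmatrix or USAS: the class `TaggedToken`, the helper names and the field names `reference` and `position` (our reading of the two leading numbers) are assumptions introduced here. The sketch keeps the candidate semantic tags in the order given, since their sequence reflects the likelihood assigned by the tagger.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaggedToken:
    reference: str            # first number on the line (hypothetical field name)
    position: int             # second number on the line (hypothetical field name)
    pos_tag: str              # part-of-speech tag, e.g. VV0
    word: str                 # the word form itself
    semantic_tags: List[str]  # candidate semantic tags, most likely first


def parse_usas_line(line: str) -> TaggedToken:
    """Split one whitespace-separated line as shown in example (2)."""
    parts = line.split()
    reference, position, pos_tag, word = parts[:4]
    return TaggedToken(reference, int(position), pos_tag, word, parts[4:])


def tag_components(tag: str) -> List[str]:
    """Split a slash-joined tag such as 'Q2.2/G2.2-' into its parts."""
    return tag.split("/")


def is_negative(tag: str) -> bool:
    """A trailing minus sign marks the lack of the quality in question."""
    return tag.endswith("-")


token = parse_usas_line("0000002 510 VV0 Blame Q2.2/G2.2- G2.1")
print(token.word, token.semantic_tags)
```

Applied to the first line of (2), the sketch returns the word Blame with the two candidate tags Q2.2/G2.2- and G2.1, of which G2.2- is recognised as carrying the minus sign.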

Like all automatic annotation, semantic annotation is not fully accurate. In particular, hip-hop, with its idiosyncratic spelling and use of words, can lead to problems. For instance, the frequencies of individual semantic categories showed ‘Food and Farming’ (category F) to be a topic of particular relevance for rappers and hip-hoppers – a somewhat counter-intuitive finding. A closer look at the data quickly revealed that this was due to the ambiguity of the string hoe, namely as a farming tool and in the slang use of the term in the sense of ‘promiscuous woman’. Another problem became apparent with the tag G1.2, ‘Politics’: the Patois personal pronoun form dem, which is highly frequent in the lyrics by Sean Paul, was obviously understood as an abbreviation for democrat or related words. Similarly, the form dat (that), presumably misinterpreted as the acronym for digital audio tape, led to a very high frequency of the semantic category K3, ‘Recorded Sound’, which as a consequence has also been ignored.

Such problematic cases aside, semantic annotation can give us an idea about topics that are comparatively frequent or rare in hip-hop as opposed to other pop songs. To this end, all semantic categories that showed relative frequencies higher than 0.02 % in the hip-hop corpus were checked against the respective categories in the control corpus, i.e. the non-hip-hop corpus. Table 2 provides an overview of some semantic categories that seem especially suited to paint a particular picture of the artists.

Table 2: A sample of semantic categories that are particularly frequent in the hip-hop corpus.

| Semantic category | Rel. freq. in hip-hop (r1) | Rel. freq. in non-hip-hop (r2) | r1/r2 |
|---|---|---|---|
| F3, ‘Cigarettes and Drugs’ | 0.1 % | 0.02 % | 5 |
| G2.1, ‘Crime, Law and Order’ | 0.16 % | 0.05 % | 3.2 |
| G3, ‘Warfare, Defence, Weapons, Army’ | 0.34 % | 0.12 % | 2.83 |
| I1, ‘Money: Generally’ | 0.27 % | 0.07 % | 3.86 |
| I1.1+, ‘Money: Affluence’ | 0.03 % | 0.01 % | 3 |
| I2.1, ‘Business: Generally’ | 0.04 % | 0.01 % | 4 |

An example of a semantic category that is overrepresented in hip-hop is F3, ‘Cigarettes and Drugs’. While the hip-hop corpus contains 193 tokens (0.1 %) that are assigned to that category, other pop songs only show 50 cases in 293,410 words (0.02 %); i.e. in hip-hop there are five times as many words relating to cigarettes and drugs as in other pop songs. An arguably related category is G2.1, ‘Crime, Law and Order’, whose relative frequency in the hip-hop corpus is 3.2 times that of the control corpus, namely 0.16 % as opposed to 0.05 %. Another comparatively frequent hip-hop category is G3, ‘Warfare, Defence, Weapons, Army’, which is over 2.8 times more frequent in hip-hop than in other pop songs, namely 0.34 % as opposed to 0.12 %. In addition to topics related to crime, drugs and weapons, questions of wealth and money seem to play an important role in hip-hop: the categories ‘Money: Generally’ (I1), ‘Money: Affluence’ (I1.1+) and ‘Business: Generally’ (I2.1) are all at least three times more frequently attested here than in the control corpus.
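The arithmetic behind Table 2 can be retraced with a short sketch. It is an illustration only: the function and variable names are ours, and the percentages are the rounded values printed in Table 2, so the ratios are reproduced from rounded input rather than from the raw counts.

```python
def relative_frequency(hits: int, corpus_size: int) -> float:
    """Relative frequency in per cent, rounded to two decimals."""
    return round(100 * hits / corpus_size, 2)


def frequency_ratio(r1: float, r2: float) -> float:
    """Ratio of the hip-hop value (r1) to the control value (r2)."""
    return round(r1 / r2, 2)


# Rounded percentages as printed in Table 2: (hip-hop, non-hip-hop).
TABLE_2 = {
    "F3": (0.10, 0.02),
    "G2.1": (0.16, 0.05),
    "G3": (0.34, 0.12),
    "I1": (0.27, 0.07),
    "I1.1+": (0.03, 0.01),
    "I2.1": (0.04, 0.01),
}

ratios = {cat: frequency_ratio(r1, r2) for cat, (r1, r2) in TABLE_2.items()}

# 50 tokens of category F3 in the 293,410 words of the control corpus.
f3_control = relative_frequency(50, 293_410)

for category, ratio in ratios.items():
    print(category, ratio)
```

The control-corpus value for F3 is the only one that can be recomputed from counts given in the text; the size of the hip-hop corpus is not stated in this section and is therefore not used.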

It can be argued that the overrepresentation of the above categories serves to paint a particular picture of the hip-hop artist as an independent, successful and rich person that is involved in (gun) fights and crime. This image that emerges from the semantic categories is in line with analyses of rap and hip-hop videos. Jones (1997: 353), for instance, claims that rap music shows a high amount of “socially questionable behaviors [… like] guntalk, drugtalk, the presence of alcohol, bleeping of profanity, and gambling” (cf. also DuRant et al. 1997; Smith and Boysen 2002; Kreyer 2015). On the whole, it could be argued that the topics explored in hip-hop promote a ‘bad boy’ image of the artist.

In addition to topic-related contrasts between pop songs and hip-hop, another major difference seems to lie in the relations among the participants, which, in turn, has a bearing on the communicative purpose of hip-hop as opposed to other pop songs. In Biber and Conrad’s (2009) approach, relations among participants are described along four dimensions, namely interactiveness, social roles, personal relationship, and shared knowledge. With regard to this variable, hip-hop seems to have a special status. Spady et al. (1999: 67) provide the following quote from the rapper Method Man: “The streets is where you get you stripes at”. This hints at the important role of street credibility, i.e. a hip-hopper’s being close to his or her cultural background in ‘the streets’. Alim (2006: 113) writes: “Hip-hop Culture not only began in the streets of Black America, but the streets continue to be a driving force in contemporary Hip-hop Culture.” Although successful hip-hop artists, like any other kind of successful pop singer, mostly interact with a displaced audience, “[t]he members of the Black American Street Culture, to whom the artists are directing their lyrics, are not physically present, yet they are in conversation” (Alim 2006: 123). This hints at a relatively high level of (maybe abstract) interactiveness that might not be typical of other pop songs. Similarly, the artists’ focus on street identity and group solidarity seems to have important consequences for the other three dimensions of participant relations: artists assume a relation with the members of their audience that can be characterised by relative similarity of status, a huge amount of shared knowledge (which has been gained on the streets) and a personal relationship that would be described as that of friends or brothas and sistas, rather than that of star and fan as in many other pop music genres.

This special relation of artist and audience leads to an additional communicative purpose, namely that of “staying street”, i.e. of staying connected to the streets and to their cultural background. Hip-hoppers use their art to “represent ‘the streets’” but at the same time “to connect with the streets as a space of culture, creativity, cognition, and consciousness” (Alim 2006: 124). A particularly impressive example of this is provided by JaRule’s Connected from the album The Last Temptation.


(3) We world wide connected, and ya’ll don’t want to fuck with us
    In the streets we respected, so ya’ll don’t want to fuck wit us
    World wide connected nigga, ya’ll don’t want to fuck wit us
    We gangster ass niggas and we hard to hit
    Murder Inc in the role who could fuck wit this

On the whole, then, the situational characteristics of hip-hop and other pop songs warrant the status of hip-hop as a register in its own right.

3.2 Linguistic features

This section discusses orthographical, lexical and grammatical phenomena as possible register features/markers.

3.2.1 Orthographic features – -er/-a and -s/-z

Non-standard spelling is a common feature in written hip-hop culture, which according to Beers Fägersten (2008: 227) “permeate[s] nearly all word types”. Of the 10 most frequent words in her corpus, all of them grammatical, of course, seven have non-standard alternatives, including the pairs the/da, you/u and that/dat. In addition, she finds final orthographic -a as a substitution for both morphemic and non-morphemic -er, as in rappa, younga and holla, neva, respectively. Whereas this example of idiosyncratic spelling represents non-standard phonology, the frequently occurring word-final -z is usually used as a spelling variant that represents standard phonology more precisely than the standard spelling -s.

In his study on spelling conventions in rap music, Olivio (2001: 73) distinguishes between two types of non-standard orthography, namely spelling variants that represent “distinctive features of AAVE [African American Vernacular English] phonology and syntax” and those that do not. He argues “that the meaning of the non-standard orthographic choices depends on its contrast with standard forms” (Olivio 2001: 73). This hints at a conscious decision on the part of the writer to use non-standard orthography. After all, writers seem to be aware of their deviation from the standard, as Olivio argues convincingly. In his corpus, an instance like fo’ shows the awareness of the final consonant that we find in the standard variant for. Similarly, the fact that bombers occurs as bombas in his data shows an awareness of the standard silent <b> in the middle of this word. In line with what was discussed above, Olivio (2001: 72) interprets these choices as:

another way of addressing the particular audience […]. In other words, rap artists construct themselves as ‘authentic’ through the use of language […,] through the use of locally significant images, sounds, and written texts.

He, too, reports on the ‘r-lessness’ of AAVE, as in the two examples above or in cases like gangsta, rida, murda etc. In some cases, stressing the AAVE-pronunciation leads to a decisive shift in meaning, as the late Tupac Shakur points out regarding nigga: “Niggers was the ones on the rope, hanging off the thing; Niggas is the ones with gold ropes, hanging out at clubs” (Lazin 2003). In the following we will take a look at two idiosyncratic spelling features, namely orthographic -a instead of -er and word-final -z as a plural marker. Table 3 shows the frequency of these two non-standard spelling variants in the hip-hop corpus and the non-hip-hop control corpus.²

Table 3: ‘r-less’ forms in the hip-hop corpus and the non-hip-hop control corpus.

| Token | Hip-hop -a | Hip-hop -er | Non-hip-hop -a |
|---|---|---|---|
| anotha | 11 | 83 | 0 |
| balla | 0 | 10 | 1 |
| betta | 12 | 43 | 0 |
| bigga | 1 | 16 | 0 |
| brotha | 1 | 16 | 3 |
| Crossova | 1 | 0 | 0 |
| deala | 1 | 6 | 0 |
| docka | 1 | 0 | 0 |
| Exploda | 1 | 0 | 0 |
| figga | 1 | 19 | 0 |
| fucka | 1 | 2 | 0 |
| gangsta | 41 | 6 | 4 |
| Gangstaa | 3 | 0 | 0 |
| Harda | 1 | 12 | 0 |
| hotta | 1 | 13 | 0 |
| killa | 2 | 15 | 0 |
| lova | 0 | 3 | 12 |
| mobsta | 1 | 1 | 0 |
| motha | 1 | 33 | 16 |
| mothafucka | 7 | 51 | 0 |
| Motherfucka | 1 | 7 | 0 |
| muhfucka | 1 | 0 | 0 |
| Murda | 4 | 112 | 0 |
| mutha | 1 | 0 | 0 |
| Muthafucka | 1 | 0 | 0 |
| muthafucka | 7 | 0 | 0 |
| muthafuka | 1 | 0 | 0 |
| neitha | 1 | 7 | 0 |
| neva | 13 | 463 | 0 |
| nigga | 613 | 0 | 12 |
| Numba | 1 | 49 | 0 |
| otha | 2 | 101 | 0 |
| Ova | 5 | 188 | 0 |
| playa | 18 | 26 | 3 |
| Rida | 3 | 4 | 0 |
| rocka | 2 | 2 | 0 |
| stoppa | 1 | 0 | 0 |
| stunna | 1 | 1 | 0 |
| sucka | 1 | 6 | 0 |
| Sucka | 2 | 0 | 0 |
| supa | 1 | 26 | 0 |
| swagga | 2 | 6 | 0 |
| trigga | 3 | 6 | 0 |
| wanksta | 8 | 0 | 0 |
| whateva | 4 | 37 | 0 |

² The frequencies shown here are not entirely unproblematic because the texts were primarily taken from lyrics archives (i.e. are most likely transcribed by fans) and not from official booklets. To some extent, then, the numbers represent the audience rather than the artists themselves. However, they still provide us with an idea of the use of non-standard spelling within the hip-hop community, of which the artists want and claim to be a part.

Table 3 provides the frequencies of ‘r-less’ forms in hip-hop and non-hip-hop songs (columns ‘Hip-hop -a’ and ‘Non-hip-hop -a’, respectively). In addition, it gives the frequencies with which regularly spelt forms occur in the hip-hop texts (‘Hip-hop -er’). In the corpus we find a total of 45 ‘r-less’ types. 43 of these are attested in the hip-hop corpus. The control corpus, by contrast, only shows seven types of this particular kind of idiosyncratic spelling. With regard to type frequency, we see clearly that this spelling phenomenon is a feature highly typical of hip-hop. It is not surprising that this huge difference in type frequency results in a huge difference in token frequency, namely 785 in hip-hop texts as opposed to 51 in non-hip-hop texts. It is interesting to note, though, that the number of regularly spelt forms is usually higher than that of non-standard forms, even in hip-hop (see below for an explanation), notable exceptions being nigga, gangsta, muthafucka and wanksta, whose spelling is predominantly non-standard. Still, ‘r-less’ forms are a pervasive feature in hip-hop, much more so than in other pop songs: although the number of 51 tokens is fairly substantial, 28 of these occur in merely two pop songs, namely the 12 instances of lova and the 16 instances of motha. The former are all found in the song Eenie Meenie by Sean Kingston featuring Justin Bieber and all instances of motha occur in Girls by Beyoncé. Interestingly, in both cases these unconventional forms are part of the chorus in otherwise rather conventionally spelt songs.



(4) Shawty is a eenie meenie miney mo lova (Eenie Meenie)
(5) Who run this motha? Girls! (Girls!)
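The type and token figures reported for Table 3 can be recomputed from the table itself. The following sketch is merely a check on the arithmetic; the variable names are ours, and the rows are copied from Table 3 in the order token, ‘Hip-hop -a’, ‘Hip-hop -er’, ‘Non-hip-hop -a’.

```python
# Rows of Table 3: token, hip-hop -a, hip-hop -er, non-hip-hop -a.
TABLE_3 = """
anotha 11 83 0
balla 0 10 1
betta 12 43 0
bigga 1 16 0
brotha 1 16 3
Crossova 1 0 0
deala 1 6 0
docka 1 0 0
Exploda 1 0 0
figga 1 19 0
fucka 1 2 0
gangsta 41 6 4
Gangstaa 3 0 0
Harda 1 12 0
hotta 1 13 0
killa 2 15 0
lova 0 3 12
mobsta 1 1 0
motha 1 33 16
mothafucka 7 51 0
Motherfucka 1 7 0
muhfucka 1 0 0
Murda 4 112 0
mutha 1 0 0
Muthafucka 1 0 0
muthafucka 7 0 0
muthafuka 1 0 0
neitha 1 7 0
neva 13 463 0
nigga 613 0 12
Numba 1 49 0
otha 2 101 0
Ova 5 188 0
playa 18 26 3
Rida 3 4 0
rocka 2 2 0
stoppa 1 0 0
stunna 1 1 0
sucka 1 6 0
Sucka 2 0 0
supa 1 26 0
swagga 2 6 0
trigga 3 6 0
wanksta 8 0 0
whateva 4 37 0
"""

rows = {}
for line in TABLE_3.strip().splitlines():
    token, hiphop_a, hiphop_er, control_a = line.split()
    rows[token] = (int(hiphop_a), int(hiphop_er), int(control_a))

all_types = len(rows)
# A type counts as attested in a corpus if it occurs there at least once.
hiphop_types = sum(1 for a, _, _ in rows.values() if a > 0)
control_types = sum(1 for _, _, c in rows.values() if c > 0)
hiphop_tokens = sum(a for a, _, _ in rows.values())
control_tokens = sum(c for _, _, c in rows.values())

print(all_types, hiphop_types, control_types, hiphop_tokens, control_tokens)
```

Note that the sketch treats differently capitalised spellings (e.g. sucka and Sucka) as separate types, as the table does.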

Table 4: Word-final orthographic -z as plural marker in hip-hop and non-hip-hop popsongs.

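| Token | Hip-hop -z | Hip-hop -s | Non-hip-hop -z |
|---|---|---|---|
| Boyz | 21 | 41 | 0 |
| Dredz | 1 | 0 | 0 |
| gangstaz | 0 | 8 | 1 |
| Gunnerz | 1 | 0 | 0 |
| Gunz | 2 | 0 | 0 |
| Hoez | 1 | 148 | 0 |
| Killaz | 1 | 6 | 0 |
| Niggaz | 178 | 380 | 0 |
| Outlawz | 6 | 0 | 0 |
| Ridaz | 8 | 5 | 0 |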

Word-final orthographic -z is considerably less frequent both as far as types and tokens are concerned. In the data we find ten different types all in all (‘hypercorrect’ tokens like beatz or nutz, in which the voiced sibilant is not the correct plural allophone, were excluded), nine of which are only attested in the hip-hop corpus, totalling 219 tokens. The single type that occurs in the control corpus is gangstaz with the frequency of 1. Interestingly, this one occurrence appears in the song That’s how you like it by Beyoncé, featuring the rapper Jay-Z, who uses this form in the line shown below:

(6) I know you’ve heard I’m a gangsta
    They say “Stay away from them gangstaz”
    They never change up, or pull they pants up (Beyoncé: That’s how you like it)

A comparison of the use of non-standard and standard variants (both in the case of ‘r-less’ forms and orthographic -z) quickly reveals that in most cases the standard still is the preferred version of spelling even in the hip-hop corpus. This finding hints at a twofold function of spelling in hip-hop lyrics, as Olivio (2001: 72) points out:

[…] the use of non-standard orthographic choices may be another way of addressing the particular audience, while these forms appear alongside standard orthographic forms which are available to be consumed by a more general audience. In other words, rap artists construct themselves as ‘authentic’ through the use of language and accounts of the social and economic realities in late-capitalist society, and the effects of this reality on the lives of rap artists and their communities; but they also construct an ‘authentic’ audience through the use of locally significant images, sounds, and written texts.³

The only consistent use of non-standard spelling in the present corpus is shown in the texts by the Jamaican rapper Sean Paul. His texts seem to be primarily addressed at a specific audience consisting of speakers of Patois. Consider the example below:

(7) So how can they waan big up dem chest
    But they dun know Dutty Cup we deh ya rated as di best
    A wouldn’t they love diss this is Sean-A-Paul this
    We nuh cater fi nuh guy and only girls we a request (Sean Paul: Like Glue)

The generally mixed occurrence of standard and non-standard spelling in hip-hop notwithstanding, the data presented above show that the word-final -a instead of -er as well as -z as plural marker can be regarded as a register feature of written hip-hop lyrics (of course, partly influenced by attempts to mirror pronunciation while recording or in the actual performance).

3.2.2 Lexical aspects

Other possible register features or even register markers can, of course, be found in the lexis of hip-hop, particularly taboo expressions. Beers Fägersten (2008) reports on the frequency of taboo terms as a feature of hip-hop. In her analysis of a 100,000-word corpus of postings on a hip-hop message board she found that the frequency of “swear words, profanity or taboo terms” such as shit, fuck, ass, nigga and bitch “suggests that such linguistic behaviour is in fact characteristic of the hip-hop community” (Beers Fägersten 2008: 223–224). These taboo words “serve to discursively represent the hip-hop individual, and subsequently the community as well, by virtue of their recognisability as taboo words” (Beers Fägersten 2006: 29).

With some uses of these taboo words we see what Morgan (2002: 121) refers to as inversion, where “an AAE [African American English] word means the opposite of at least one definition of the word in dominant culture”. The word shit, for instance, “can refer to almost anything – positions, events, etc.” (Smitherman 2000: 257). The shit is “a person who is the ultimate; most powerful; above all others; top dog” (Smitherman 2000: 257). Another example is the form nigga, where the idiosyncratic spelling signals a decisive shift in meaning, as discussed above.

³ Of course, orthographic choices play a comparatively minor role since the main way of addressing the audience is through the auditory channel.

Table 5 shows the 30 words that are most key (according to AntConc) in the hip-hop corpus when compared to the non-hip-hop control corpus.

Table 5: The top 30 key word forms in hip-hop when compared to the non-hip-hop corpus.

| Rank | Token | Freq. hip-hop | Rel. freq. hip-hop | Freq. non-hip-hop | Rel. freq. non-hip-hop | Keyness of token in hip-hop |
|---|---|---|---|---|---|---|
| 1 | nigga | 606 | 0.31 | 12 | 0.00 | 1006.88 |
| 2 | shit | 626 | 0.32 | 43 | 0.01 | 874.36 |
| 3 | fuck | 504 | 0.26 | 45 | 0.02 | 660.20 |
| 4 | niggas | 380 | 0.19 | 10 | 0.00 | 615.11 |
| 5 | bitch | 432 | 0.22 | 35 | 0.01 | 580.43 |
| 6 | dem | 232 | 0.12 | 7 | 0.00 | 370.02 |
| 7 | ass | 274 | 0.14 | 27 | 0.01 | 349.05 |
| 8 | wit | 207 | 0.11 | 4 | 0.00 | 344.62 |
| 9 | ya | 618 | 0.32 | 260 | 0.09 | 333.13 |
| 10 | niggaz | 177 | 0.09 | 0 | 0.00 | 325.09 |
| 11 | Zoop | 142 | 0.07 | 0 | 0.00 | 260.81 |
| 12 | yo | 253 | 0.13 | 52 | 0.02 | 239.10 |
| 13 | hoes | 148 | 0.08 | 5 | 0.00 | 232.88 |
| 14 | gon’ | 163 | 0.08 | 13 | 0.00 | 219.86 |
| 15 | bitches | 150 | 0.08 | 9 | 0.00 | 215.50 |
| 16 | fucking | 161 | 0.08 | 14 | 0.00 | 212.40 |
| 17 | gettin’ | 167 | 0.09 | 21 | 0.01 | 196.50 |
| 18 | em | 170 | 0.09 | 25 | 0.01 | 188.35 |
| 19 | murder | 112 | 0.06 | 2 | 0.00 | 187.61 |
| 20 | they | 925 | 0.47 | 705 | 0.24 | 187.39 |
| 21 | get | 1036 | 0.53 | 834 | 0.28 | 182.07 |
| 22 | ai | 867 | 0.44 | 655 | 0.22 | 179.48 |
| 23 | di | 124 | 0.06 | 8 | 0.00 | 175.54 |
| 24 | y’all | 183 | 0.09 | 39 | 0.01 | 169.49 |
| 25 | yuh | 90 | 0.05 | 0 | 0.00 | 165.30 |
| 26 | u | 145 | 0.07 | 20 | 0.01 | 164.82 |
| 27 | pussy | 109 | 0.06 | 5 | 0.00 | 164.25 |
| 28 | them | 431 | 0.22 | 238 | 0.08 | 163.16 |
| 29 | money | 287 | 0.15 | 115 | 0.04 | 163.03 |
| 30 | up | 643 | 0.33 | 454 | 0.15 | 155.53 |

The frequent use of taboo words and profanity that is reported in Beers Fägersten (2006 and 2008) can also be observed in the present corpus, the top five keywords being nigga, shit, fuck, niggas and bitch. Inflectionally related forms occur at rank 10 (niggaz), at rank 15 (bitches) and rank 16 (fucking). In addition, we see a strong preference for terms with strong sexual connotations, such as ass, hoes and pussy. Some of the above list might even be considered register markers. The forms niggaz, Zoop, and yuh do not occur at all in the control corpus. The form Zoop, however, cannot be regarded as indicative of the register, as it lacks the pervasiveness necessary for register features/markers: it only occurs in one song, CG by Nelly.
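How a keyness value of the kind listed in Table 5 comes about can be illustrated with a short sketch. It rests on two assumptions that are not confirmed in the text: that the keyness statistic is the log-likelihood measure commonly used for keyword analysis, and that the corpus sizes are supplied by the user (the size of the hip-hop corpus is not stated in this section, so the value in the example call is purely hypothetical). The function name is ours and is not part of AntConc.

```python
import math


def log_likelihood(freq_a: int, freq_b: int, size_a: int, size_b: int) -> float:
    """Log-likelihood keyness of a word form in corpus A compared to corpus B.

    freq_a / freq_b: frequency of the word form in each corpus
    size_a / size_b: total number of tokens in each corpus
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Frequencies expected if the word were spread evenly over both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    score = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero frequency contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


# Hypothetical call: 200_000 is an assumed size for the hip-hop corpus,
# 293_410 is the size of the control corpus given in section 3.1.
print(round(log_likelihood(606, 12, 200_000, 293_410), 2))
```

A word form that is equally frequent in both corpora (relative to their size) receives a score of zero; the more its distribution is skewed towards one corpus, the higher the score.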

3.2.3 Grammatical features – copula absence

Anyone who has ever listened to hip-hop and has seen hip-hop videos is well aware of the fact that it is an art form which is dominated by African Americans, at least in the US. Are, then, the linguistic features of hip-hop merely a consequence of the AAVE dialect? If that was the case, one would be hard put to argue that these linguistic features fulfil a particular function in a particular situation. An answer to that question is provided by Alim (2009: 117–123) in an analysis of the absence of the present tense copular forms is and are. He compares the frequencies of absence from the language of two hip-hoppers, Juvenile and Eve, in two kinds of texts: an interview and their lyrics. For both artists, Alim (2009: 121–122) finds

an increase in the frequency of absence […] when moving from the interview data to the lyrical data. […] it is clear that both of these artists display the absent form more frequently in their lyrical data than in their interview speech data. […] the data suggest that the more attention the artists pay to their speech (comparing interviews to lyrics) the more ‘non-standard’ their speech becomes […].

His claim “that Hip-hop artists are indeed in conscious control of their copula variability” (Alim 2009: 123) suggests that hip-hoppers deliberately make use of AAVE features to achieve a particular (yet to be identified) effect. It makes sense, therefore, to regard idiosyncratic linguistic features as exponents of register.

We will now look at patterns where a personal pronoun is either followed or not followed by a present tense form of BE (in the past the copula is not absent; cf. Alim 2006: 117) followed by either a NP (with definite or indefinite article) or an ing-form of a verb, as in the examples below.

(8) PersProN + BEpres/ø + a/an
    I am a pitbull off his leash
    a nigga that think he a cracker

(9) PersProN + BEpres/ø + the
    I am the baddest bitch in the petstore
    I the designated driver Chuck never the rider

(10) PersProN + BEpres/ø + …ing/…in’
    the world is falling and I am rising
    Nigga you fucking with a changed man

Originally, it was planned to conduct an automatic search for the above patterns. Since Wmatrix provides us with the means to tag corpora, a query for strings of parts of speech seemed to be the method of choice. However, it was soon found that the accuracy of the CLAWS tagger suffered from idiosyncratic syntax and from idiosyncratic spelling conventions, particularly in the hip-hop corpus. As a consequence, the patterns above were identified on the basis of lexical queries, for instance ‘I a/an’, ‘I’m a/an’ or ‘I am a/an’ as the possible instantiations of pattern (8) with the first person singular personal pronoun. The resulting concordances were post-edited to weed out non-target hits, such as those shown below.

As can be seen in example (11), a query that is only based on lexical information will also find tokens that end in -ing although they are not progressive forms. Example (12) shows a written representation of an extremely reduced variant of I am going to. The example under (13) shows how problems can arise because of Patois transcription and grammar: a is not the indefinite article in this case. Rather, it seems to be an equivalent to an emphatic do in British English.⁴ Example (14) is particularly challenging, since the text alone would allow two readings, namely as an instance of the pattern we are interested in or as an appositive construction. The only way to resolve the ambiguity was to listen to the track, which showed that the second reading is the more plausible one.

(11) I’m everything you love (Kid Rock: I’m Wrong But You Ain’t Right)
(12) I’m a call you as soon as I land (Wiz Khalifa: Top Floor)
(13) We nuh cater fi nuh guy and only girls we a request (Sean Paul: Like Glue)
(14) We the people / Are we the people? (Metallica: Some Kind of Monster)
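The lexical queries for pattern (8) with the first person singular pronoun can be approximated by a regular expression. The sketch below is not the procedure used in the study (the concordances were obtained with corpus tools and post-edited by hand); the names `QUERY_8_I`, `classify` and `find_hits` are hypothetical, and the line "I a soldier" in the usage example is an invented test string. It merely illustrates why purely lexical matching also returns non-target hits such as (12).

```python
import re

# 'I a/an', 'I'm a/an' or 'I am a/an'; both straight and curly apostrophes.
QUERY_8_I = re.compile(r"\bI(?P<copula>\s+am|[’']m)?\s+(?:a|an)\b")


def classify(match: re.Match) -> str:
    """Label a hit as absent, contracted or full form of the copula."""
    copula = match.group("copula")
    if copula is None:
        return "absent"
    return "full" if copula.strip() == "am" else "contracted"


def find_hits(text: str):
    """Return (variant, matched string) pairs for every hit in the text."""
    return [(classify(m), m.group(0)) for m in QUERY_8_I.finditer(text)]


lines = [
    "I am a pitbull off his leash",      # target hit, full form
    "I’m a call you as soon as I land",  # non-target hit, cf. example (12)
    "I a soldier",                       # invented line with absent copula
    "I am alone",                        # no hit: 'a' is not a separate word
]
for line in lines:
    print(line, find_hits(line))
```

The second line is matched as a contracted form although a does not function as an indefinite article there, which is why manual post-editing of the concordances remains necessary.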

The results of our analysis concerning the absence or presence of copula in present tense BE are shown in Tables 6 and 7, which provide a detailed account of the distribution of the individual variants in hip-hop and non-hip-hop, respectively. More specifically, for each personal pronoun the tables provide the frequency of absent, contracted or full form of copula BE either in front of the indefinite article, the definite article or the progressive form (in various realisations) of a verb. Note that for Table 6 an additional row was inserted to include the idiosyncratic written form ya for you. This row was not needed for Table 7, since the form ya could not be found in those songs that were not hip-hop.

Table 6: Copula be and copula absence in the hip-hop corpus (‘abs.’, ‘contr.’ and ‘full’ refer to absent, contracted and full form of the copula, respectively).

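| Pattern | Indef. article abs. | Indef. article contr. | Indef. article full | Def. article abs. | Def. article contr. | Def. article full | …ing/…in’/…in abs. | …ing/…in’/…in contr. | …ing/…in’/…in full | Total abs. | Total contr. | Total full |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| I am/ø a/the/…ing | 0 | 233 | 1 | 1 | 88 | 11 | 0 | 715 | 12 | 1 | 1036 | 24 |
| You are/ø a/the/…ing | 43 | 23 | 1 | 15 | 22 | 3 | 155 | 79 | 1 | 213 | 124 | 5 |
| ya are/ø a/the/…ing | 0 | 0 | 0 | 0 | 0 | 0 | 15 | 0 | 0 | 6 | 0 | 0 |
| He is/ø a/the/…ing | 2 | 5 | 0 | 1 | 7 | 0 | 6 | 14 | 0 | 9 | 26 | 0 |
| She is/ø a/the/…ing | 4 | 3 | 0 | 4 | 3 | 0 | 20 | 10 | 0 | 28 | 16 | 0 |
| It is/ø a/the/…ing | 0 | 76 | 1 | 0 | 29 | 1 | 3 | 28 | 4 | 3 | 133 | 6 |
| We are/ø a/the/…ing | 0 | 0 | 0 | 8 | 1 | 0 | 136 | 15 | 2 | 144 | 16 | 2 |
| They are/ø a/the/…ing | 0 | 0 | 0 | 0 | 0 | 0 | 61 | 4 | 0 | 61 | 4 | 0 |
| Total | | | | | | | | | | 465 | 1355 | 37 |

⁴ I am grateful to André Sherriah for his information on Patois.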

Table 7: Copula be and copula absence in the non-hip-hop control corpus (‘abs.’, ‘contr.’ and ‘full’ refer to absent, contracted and full form of the copula, respectively).



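| Pattern | Indef. article abs. | Indef. article contr. | Indef. article full | Def. article abs. | Def. article contr. | Def. article full | …ing/…in’/…in abs. | …ing/…in’/…in contr. | …ing/…in’/…in full | Total abs. | Total contr. | Total full |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| I am/ø a/the/…ing | 0 | 152 | 14 | 0 | 73 | 20 | 2 | 895 | 16 | 2 | 1120 | 50 |
| You are/ø a/the/…ing | 9 | 60 | 3 | 3 | 98 | 14 | 47 | 290 | 7 | 59 | 448 | 23 |
| He is/ø a/the/…ing | 0 | 14 | 9 | 0 | 7 | 0 | 3 | 19 | 1 | 3 | 40 | 10 |
| She is/ø a/the/…ing | 1 | 34 | 7 | 0 | 6 | 3 | 18 | 50 | 2 | 19 | 90 | 12 |
| It is/ø a/the/…ing | 0 | 83 | 0 | 0 | 50 | 2 | 0 | 118 | 0 | 0 | 251 | 2 |
| We are/ø a/the/…ing | 0 | 0 | 0 | 4 | 11 | 1 | 48 | 67 | 2 | 52 | 78 | 3 |
| They are/ø a/the/…ing | 0 | 0 | 0 | 0 | 1 | 0 | 4 | 22 | 0 | 4 | 23 | 0 |
| Total | | | | | | | | | | 139 | 2050 | 100 |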

A summary of the results shown in the two tables is provided in Figure 3, which compares the relative frequency of absent copula BE in hip-hop as opposed to non-hip-hop lyrics.

As can be seen, the data show a very pronounced preference for copula absence in the hip-hop corpus compared to the non-hip-hop corpus. The largest proportion of copula absence in non-hip-hop songs is found with the personal pronoun we. A closer look at the data shows that, to a large extent, this exception can be explained by the African-American R&B artist R. Kelly. In particular, we find that a total of 17 tokens are found in one song only, namely Ignition. If we ignore this particular song, the relative frequency of copula absence in non-hip-hop already drops to 30 %. All in all, these results suggest that copula absence is indicative of hip-hop. Future research will have to show to what extent this particular feature is also pervasive in other possible sub-registers of pop songs, such as R&B.
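The proportions discussed here can be recomputed from the ‘Total’ columns of Tables 6 and 7. The sketch below is an illustration only; the names are ours, and the treatment of Ignition rests on the assumption that the 17 tokens mentioned in the text are all instances of copula absence after we, which is our reading of the passage.

```python
# Totals per pronoun from Tables 6 and 7: (absent, contracted, full).
HIP_HOP = {
    "I": (1, 1036, 24), "You": (213, 124, 5), "ya": (6, 0, 0),
    "He": (9, 26, 0), "She": (28, 16, 0), "It": (3, 133, 6),
    "We": (144, 16, 2), "They": (61, 4, 0),
}
NON_HIP_HOP = {
    "I": (2, 1120, 50), "You": (59, 448, 23), "He": (3, 40, 10),
    "She": (19, 90, 12), "It": (0, 251, 2), "We": (52, 78, 3),
    "They": (4, 23, 0),
}


def absence_rate(absent: int, contracted: int, full: int) -> float:
    """Share of copula absence among all three variants."""
    total = absent + contracted + full
    return absent / total if total else 0.0


def overall_rate(table: dict) -> float:
    """Absence rate over all pronouns of one corpus."""
    absent, contracted, full = (sum(col) for col in zip(*table.values()))
    return absence_rate(absent, contracted, full)


we_absent, we_contracted, we_full = NON_HIP_HOP["We"]
we_rate = absence_rate(we_absent, we_contracted, we_full)
# Assumption: the 17 tokens from Ignition are all absent-copula cases with 'we'.
we_rate_without_ignition = absence_rate(we_absent - 17, we_contracted, we_full)

print(f"hip-hop overall:      {overall_rate(HIP_HOP):.1%}")
print(f"non-hip-hop overall:  {overall_rate(NON_HIP_HOP):.1%}")
print(f"non-hip-hop 'we':     {we_rate:.1%}")
print(f"'we' minus Ignition:  {we_rate_without_ignition:.1%}")
```

Under this reading, the share of copula absence after we in the control corpus falls from roughly 39 % to about 30 % once the one song is set aside, which is in line with the figure given in the text.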


Figure 3: Copula absence in the hip-hop and the non-hip-hop control corpus. [Bar chart; x-axis: personal pronouns I, You, Ya, He, She, It, We, They; y-axis: 0 % to 100 %; series: absent_hip-hop, absent_other.]

4 Conclusion: The functional dimension

The concept ‘register’ rests on the assumption that a particular group of texts exhibits a set of features that are frequent and pervasive within this group, while at the same time being more or less rare in other groups of texts. In addition, these features are supposed to fulfil a function vis-à-vis the situation in which the texts at issue are used. Having explored the linguistic features above, this section concludes the paper by providing some remarks on the functional dimension of hip-hop lyrics.

In one (maybe two) word(s), the function of hip-hop lyrics may best be described by the term street credibility. Already in our discussion of the situation of use it has become clear that hip-hop artists and their audience partake in a very special kind of relationship. This can be characterised by a high degree of (displaced) interactiveness, not between a star and a fan but between brothaz and sistaz of the same street culture from which hip-hop evolved. One major function of hip-hop lyrics is to demonstrate the artists’ authenticity and to show that they are ‘staying street’. All of the features discussed in the preceding sections can be interpreted along these lines: the major topics as evidenced in the comparatively high frequency of some semantic domains (‘Cigarettes and Drugs’, ‘Warfare …’, ‘Crime, Law and Order’ and money- or business-related concepts) mirror aspects of street life in African American neighbourhoods in the US, where hip-hop evolved. At the same time, idiosyncratic spelling (word-final -a and plural marker -z), lexical features (the frequent use of taboo expressions and profanity often with a significant change of meaning) and grammatical characteristics (copula absence) focus on the common language background of the artist and his or her audience. So, when “niggas talk a lotta Bad Boy shit”, as the late Tupac Shakur raps, they portray themselves as representatives of ‘the streets’, while at the same time connecting back to the streets and the people living there.

References

Anthony, Laurence. 2011. AntConc (Version 3.2.4) [Computer Software]. Tokyo, Japan: Waseda University. http://www.antlab.sci.waseda.ac.jp (accessed May 2014).

Archer, Dawn, Andrew Wilson & Paul Rayson. 2002. Introduction to the USAS category system. University of Lancaster. http://ucrel.lancs.ac.uk/usas/usasguide.pdf (accessed May 2014).

Beers Fägersten, Kristy. 2006. The discursive construction of identity in an internet hip-hop community. Revista Alicantina de Estudios Ingleses 19. 23–44.

Beers Fägersten, Kristy. 2008. A corpus approach to discursive construction of a hip-hop identity. In Annelie Ädel & Randi Reppen (eds.), Corpora and discourse: The challenges of different settings, 211–240. Amsterdam: John Benjamins.


DuRant, Robert H., Michael Rich, S. Jean Emans, Ellen S. Rome, Elizabeth Allred & Elizabeth R. Woods. 1997. Violence and weapon carrying in music videos: A content analysis. Archives of Pediatrics and Adolescent Medicine 151(5). 443–448.

Forman, Murray & Mark Anthony Neal (eds.). 2004. That’s the joint! The hip-hop studies reader. New York: Routledge.

Jones, Kenneth. 1997. Are rap videos more violent? Style differences and the prevalence of sex and violence in the age of MTV. Howard Journal of Communication 8(4). 343–356.

Kövecses, Zoltan. 2002. Metaphor: A practical introduction. Oxford: Oxford University Press.

Kreyer, Rolf. 2012. ‘Love is like a stove – it burns you when it’s hot’: A corpus-linguistic view on the (non-) creative use of love-related metaphors in pop songs. In Sebastian Hoffmann, Paul Rayson & Geoffrey Leech (eds.), English corpus linguistics: Looking back, moving forward, 103–115. Amsterdam: Rodopi.

Kreyer, Rolf. 2015. ‘Funky fresh dressed to impress’: A corpus-linguistic view on gender roles in pop songs. International Journal of Corpus Linguistics 20(2). 174–204.

Kreyer, Rolf & Joybrato Mukherjee. 2007. The style of pop song lyrics: A corpus-linguistic pilot study. Anglia 125. 31–58.

Lakoff, George & Mark Johnson. 1980. Metaphors we live by. Chicago: Chicago University Press.

Lazin, Lauren. 2003. Tupac: Resurrection. Paramount.

Miethaner, Ulrich. 2001. The BLUR (Blues Lyrics Collected at the University of Regensburg) corpus: Blues lyricism and the African American literary tradition. Current Objectives of Postgraduate Studies 2. http://copas.uni-regensburg.de/article/view/64/78 (accessed 3 January 2015).

Miethaner, Ulrich. 2005. I can look through Muddy: Analyzing earlier African American English in blues lyrics (BLUR). Frankfurt am Main: Peter Lang.

Morgan, Marcyliena. 2001. ‘Nuthin’ but a G thang’: Grammar and language ideology in hip-hop identity. In Sonja L. Lanehart (ed.), Sociocultural and historical contexts of African American Vernacular English, 187–210. Athens: University of Georgia Press.

Morgan, Marcyliena. 2002. Language, discourse and power in African American culture. Cambridge: Cambridge University Press.

Mukherjee, Joybrato. 2000. ‘Krisis at Kamp Krusty’: Deviant spellings in popular culture as examples of medium-dependent graphic presentation structures. Arbeiten aus Anglistik und Amerikanistik 25. 161–172.

Murphey, Tim. 1989. The where, when and who of pop song lyrics: The listener’s prerogative. Popular Music 8. 58–70.

Murphey, Tim. 1990. Music and song in language learning: An analysis of pop song lyrics and the use of music and song in teaching English to speakers of other languages. Bern: Lang.

Murphey, Tim. 1992. The discourse of pop songs. TESOL Quarterly 26. 770–774.

Olivio, Warren. 2001. Phat lines: Spelling conventions in rap music. Written Language and Literacy 4. 67–85.

Rayson, Paul. 2003. Matrix: A statistical method and software tool for linguistic analysis through corpus comparison. Lancaster University: Ph.D. thesis.

Rayson, Paul. 2009. Wmatrix: A web-based corpus processing environment. Computing Department, Lancaster University. http://ucrel.lancs.ac.uk/wmatrix/ (accessed May 2014).

Schneider, Edgar W. & Ulrich Miethaner. 2006. When I started to using BLUR. Accounting for unusual verb phrase patterns in an electronic corpus of Earlier African American English. Journal of English Linguistics 34. 233–256.

Schwartz, Kelly D. & Gregory T. Fouts. 2003. Music preferences, personality style, and developmental issues of adolescents. Journal of Youth and Adolescence 32. 205–213.

Seidman, Steven A. 1992. An investigation of sex-role stereotyping in music videos. Journal of Broadcasting and Electronic Media 36(2). 209–216.

Smith, Stacy L. & Aaron R. Boyson. 2002. Violence in music videos: Examining the prevalence and context of physical aggression. Journal of Communication 52(1). 61–83.

Smitherman, Geneva. 2000. Black talk: Words and phrases from the hood to the Amen corner. Boston: Houghton Mifflin Company.

Spady, James G., Charles G. Lee & H. Samy Alim. 1999. Street conscious rap. Philadelphia: Unum Loh Publishers.

Werner, Valentin. 2012. Love is all around: A corpus-based study of pop lyrics. Corpora 7(1). 19–50.


Appendix 1: The non-hip-hop corpus

Top 50 Albums – Non-hip-hop 2003

3 Doors Down – Away From The Sun
Aaliyah – I care 4 U
Alan Jackson – Greatest Hits II …
Audioslave – Audioslave
Avril Lavigne – Let Go
Beyoncé – Dangerously In Love
Celine Dion – One Heart
Cher – The Very Best Of Cher
Christina Aguilera – Stripped
Coldplay – A Rush Of Blood To The Head
Dixie Chicks – Home
Elvis Presley – 30 #1 Hits
Evanescence – Fallen
Faith Hill – Cry
Good Charlotte – The Young And …
Hilary Duff – Metamorphosis
Jennifer Lopez – This Is Me … Then
John Mayer – Room For Squares
Justin Timberlake – Justified
Kelly Clarkson – Thankful
Kenny Chesney – No Shoes, …
Kid Rock – Cocky
Linkin Park – Meteora
Luther Vandross – Dance With My Father
Matchbox Twenty – More Than You …
Metallica – St. Anger
R. Kelly – Chocolate Factory
Rascal Flatts – Melt
Rod Stewart – It Had To Be You …
Santana – Shaman
Shania Twain – Up!
Tim McGraw – Tim McGraw And …
Toby Keith – Unleashed

Top 50 Albums – Non-hip-hop 2011

Brad Paisley – This Is Country Music
Adele – 19
Adele – 21
Beyoncé – 4
Bon Jovi – Greatest Hits
Britney Spears – Femme Fatale
Bruno Mars – Doo-Wops And Hooligans
Chris Brown – F.A.M.E.
Coldplay – Mylo Xyloto
Florence and the Machine – Lungs
Foo Fighters – Wasting Light
Glee – The Music; Season 2
Glee – The Music, The Christmas …
Jackie Evancho – Dream With Me
Jackie Evancho – O Holy Night
Jason Aldean – My Kinda Party
Josh Groban – Illuminations
Justin Bieber – My World 2.0
Justin Bieber – My World’s Acoustic
Justin Bieber – Never Say Never …
Katy Perry – Teenage Dream
Keith Urban – Get Closer
Kenny Chesney – Hemingway’s Whiskey
Kid Rock – Born Free
Lady Antebellum – Need You Now
Lady Antebellum – Own the Night
Lady Gaga – Born This Way
Mumford and Sons – Speak Now
P!nk – Greatest Hits … So Far!!!
R. Kelly – Loveletter
Rascal Flatts – Nothing Like This
Rihanna – Loud
Sugarland – The Incredible Machine
Susan Boyle – The Gift
Taylor Swift – Speak Now
The Band Perry – The Band Perry
The Black Keys – Brothers
Tony Bennett – Duets 2
Zac Brown Band – You Get What You Give

Teresa Pham

The register of English crossword puzzles: Studies in intertextuality

Abstract: Despite their popularity, crossword puzzles have so far been neglected in text-linguistic publications. Therefore, this paper provides a detailed analysis of crosswords. As a textual variety related to a specific situation, fulfilling specific functions and displaying pervasive, frequent linguistic and formal features, this type of linguistic riddle must be regarded as an independent register according to the framework by Biber and Conrad (2009). Moreover, a detailed linguistic analysis establishes non-cryptic and cryptic crosswords as two distinct sub-registers. For the purpose of exploring the role of intertextuality in those two sub-registers, a corpus of 270 intertextual non-cryptic and cryptic clue-answer pairs from The Sun (N.N. 2009) and The Times (Browne 2009) was compiled. A quantitative analysis of this corpus reveals that intertextual references in cryptic puzzles primarily target classical mythology, Shakespeare and the Bible, whereas non-cryptic puzzles additionally require knowledge of Anglo-American popular culture. The qualitative analysis of the corpus discusses the particular forms and functions of intertextuality in non-cryptic and cryptic puzzles (Stocker 1998), providing also an explanation for their use from a cognitive linguistic perspective (Geeraerts & Cuyckens 2007) as well as a comparison with intertextuality in other registers. The paper shows that intertextual references and their particular forms and functions may be distinctive features of certain registers. Intertextuality is context-dependent and used with a particular communicative function and should thus be incorporated as one possible feature into the linguistic analysis of registers according to the framework by Biber and Conrad (2009).

Teresa Pham, University of Vechta

1 Introduction

Crossword puzzles (or simply crosswords) are the most popular type of linguistic puzzle today (cf. Augarde 2003: 57) and hold a permanent place in most British and American newspapers. Given this prominence in regular, if not everyday language use, their marginalisation as a register in text linguistic analysis and the resulting scarcity of relevant linguistic publications are surprising. Most publications on crosswords belong to the discipline of psychology (e.g. Hambrick, Salthouse, and Meinz 1999; Nickerson 2011; Underwood, Deihim, and Batt 1994; Witte and Freund 1995) or examine crosswords from the perspective of didactics (e.g. Mollica 2007; Weisskirch 2006) or cultural studies (e.g. Cornell and Cornell 1980; Stratmann 1995). Other types of word games have been studied in detail (e.g. Dienhart 1998; Fix 2011; Pepicello 1980) and crosswords have been analysed even from a general linguistic (though not specifically register-based) perspective (e.g. Coffey 1998; Mok 1987). Furthermore, some text linguistic publications explicitly refer to puzzles or even crosswords as a register (e.g. Heinemann 2000: 610–611; Furthmann 2006: 133; Rolf 1993: 258). Hence, while their status as a distinct register is largely uncontested, the field of register studies still lacks specific analyses of crosswords.

Therefore, this paper first provides a register analysis of crosswords following the framework by Biber and Conrad (2009; see also Schubert’s introduction to this volume). It then reports the results of a corpus study on English-language crosswords, focusing on the role of intertextuality in the constitution of this register.

2 Crossword puzzles as a register

The OED (Simpson and Weiner 2015) defines crosswords as “puzzle[s] in which a pattern of chequered squares has to be filled in from numbered clues”. Accordingly, crosswords are a type of word game in which answers to clues have to be inserted into a grid of boxes.

2.1 Situational analysis

I. Participants: The clues of crosswords are provided by a setter or compiler, who usually remains anonymous or works under a pseudonym. Puzzles are addressed to a plural, yet un-enumerated set of solvers, who, in most cases, work individually and neither interact with setters nor are in direct, personal contact with them. Furthermore, there is some disagreement on the social status of setters and solvers. Since solving crosswords requires thorough general and sometimes even expert or ‘esoteric’, i.e. uncommon or specialist knowledge, Partridge (1992: 504) draws the following sociolinguistic profile of typical setters and solvers: “humanistically educated speakers of Standard English, with a reasonably deep basis of Western culture, a general knowledge of literature, history, geography and current affairs, familiar with and perhaps active in what have been classed as middle-class sports”. However, since certain strategies of codification, chunks of knowledge and even clues are recurrent, crossword experience is also a major predictor of crossword proficiency (cf. Hambrick, Salthouse, and Meinz 1999: 140). From the cognitive linguistic perspective, this correlation, like the phenomenon of agenda-setting (cf. Scheufele and Tewksbury 2007), is due to the fact that frequent activation makes cognitive representations more easily retrievable. Therefore, others (e.g. Scott and O’Donnell 1998: 237) claim that the knowledge and skills necessary for crosswords can be acquired by everyone and consequently regard crosswords as democratic.

II. Production circumstances and channel: With their close interdependence between clues and answers, crosswords result from a careful and time-consuming process of planning and editing. The reception process may be equally time-consuming and non-linear. Therefore, the written mode is one essential characteristic of crosswords – even in the digital age, where puzzles can be downloaded from websites or generated by computer programmes or applications on mobile devices.

Furthermore, what can be considered as a marker of crosswords and what is equally dependent on their appearance in writing is their physical layout on the page. Answers must be inserted, letter by letter, into a grid available either on paper or digitally and consisting of white (generally lights; cf. Scott and O’Donnell 1998: 219) and black squares (blocks; cf. Moorey 2008: 5). The corresponding numbering of clues and squares indicates into which light the first grapheme of the respective answer has to be inserted. Subsequent letters of the answer are inserted either horizontally or vertically into the grid, depending on whether the clue was labelled Across or Down. Answers are interdependent by their intersecting in so-called crosslights or checked letters (cf. Biddlecombe 2009). Consequently, each correct answer will simplify the search for subsequent intersecting answers to a greater or lesser extent (cf. Nickerson 2011; Goldblum and Frost 1987). The number of letters which are part of only one answer (unchecked letters or unches) is an indicator of the difficulty of a crossword (cf. Augarde 2003: 63; Scott and O’Donnell 1998: 219).
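The distinction between checked and unchecked letters can be made concrete with a minimal sketch. The 5×5 grid below is hypothetical and not taken from any of the puzzles discussed here; the function name, the use of '#' for blocks and '.' for lights, and the simplification that a light counts as part of an Across (or Down) answer whenever it has a horizontal (or vertical) neighbouring light are all assumptions made for illustration.

```python
def count_unches(grid):
    """Return (checked, unchecked) counts for a grid of '#' blocks and '.' lights."""
    rows, cols = len(grid), len(grid[0])

    def is_light(r, c):
        # Cells outside the grid behave like blocks.
        return 0 <= r < rows and 0 <= c < cols and grid[r][c] != "#"

    checked = unchecked = 0
    for r in range(rows):
        for c in range(cols):
            if not is_light(r, c):
                continue
            # Assumption: a light is part of an Across answer if it has a
            # horizontal neighbour light, and of a Down answer if it has a
            # vertical neighbour light.
            in_across = is_light(r, c - 1) or is_light(r, c + 1)
            in_down = is_light(r - 1, c) or is_light(r + 1, c)
            if in_across and in_down:
                checked += 1      # crosslight: shared by two answers
            else:
                unchecked += 1    # unch: part of one answer only


    return checked, unchecked


# Hypothetical lattice-style grid: answers run along rows 0, 2, 4
# and along columns 0, 2, 4.
GRID = [
    ".....",
    ".#.#.",
    ".....",
    ".#.#.",
    ".....",
]

checked, unchecked = count_unches(GRID)
print(checked, unchecked)
```

In this illustrative grid, more than half of the lights are unches, which on the account given above would point to a comparatively difficult puzzle.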

III. Setting: Setters and solvers do not share the physical context of communication. As already mentioned, crosswords (as well as their solutions) are usually originally printed in newspapers, i.e. in a public space, but are typically solved in private. Heinemann (2000: 610–611) therefore assigns them to the (semi-)official, public domain.

IV. Purposes: Crosswords are devoid of the usual purpose of language use, which is communication (cf. Schlepper 1981: 63). On the contrary, the primary purpose of crosswords is to entertain and delight the addressee: they allow setters and solvers alike to manipulate language irrespective of established rules and conventions and thus “provide an opportunity of handling at one’s whim a medium which in other situations very much has a will of its own” (Schlepper 1981: 78; cf. Augarde 2003: vii). However, crosswords may also provide social pleasure when they are solved cooperatively or competitively. Finally, crosswords may be completed to test or consolidate one’s knowledge, or to maintain or boost one’s cognitive capacities (e.g. one’s memory capacity or mental flexibility). Medical research even suggests that such mental exercise reduces the risk of certain diseases like dementia (cf. Moorey 2008: 3).

In order for crosswords to fulfil these functions, it is essential that they, despite answers being encoded, are devised to be solvable by the target ‘solvership’. While an unsolvable puzzle causes frustration, the ability to solve a puzzle is experienced as a success and provides the pleasurable feeling of being part of the intellectual elite.

2.2 Analysis of linguistic features

2.2.1 General features of crossword puzzles

From a discourse analytic perspective, the basic building blocks of crosswords are adjacency pairs, each consisting of a clue and an answer. Each clue encodes its respective answer more or less strongly. A figure in brackets at the end of the clue usually indicates the number of letters of the answer. For answers to intersect in the grid, crosswords require a plural, yet variable number of such clue-answer pairs. The first turn is provided by the setter, whereas the second turn is provided by the solver. Since crosswords are intended to be solvable, only one answer is indubitably correct (sometimes also taking into consideration the number of lights or using the crosslights already filled in the grid). However, since clue-answer pairs function independently, there are usually no cohesive ties between them. On the contrary, linguistic means which are usually cohesive (e.g. articles, personal or demonstrative pronouns) may be employed to encode answers according to certain conventions. In some crosswords, the personal pronouns he or she, for example, may not function as anaphoric or cataphoric references to preceding or subsequent noun phrases, but may point to the fact that the words man or girl (or their letters) are part of the answer (cf. Skinner 2008: 25). In rare cases, the adjacency pairs of a puzzle are linked by a common topic, which may be indicated (more or less directly) by its title. Cohesion between adjacency pairs may also be established by an explicit “Cross-reference” (Partridge 1992: 501). Thus, in example (1) the clue requires prior identification of the answer to clue number 11:

(1) Line also transported 11 to shore (9) – LANDWARDS (Browne 2009: 124)

Apart from that, the only links between clues usually are their appearing together with one uniform layout and the combinatory interdependence of the respective answers in the grid. If, following Halliday and Hasan (1976: 1), a text is defined as “a unit of language in use” whose texture arises from inter-sentential cohesive ties on the surface, crosswords do not normally constitute texts.

Besides cohesion, further standards of textuality according to de Beaugrande and Dressler (1981) are not or only partially met: clues are thematically independent and there is no continuity of or even connection between underlying concepts (coherence). Furthermore, even if clues need to be new and creative to be intellectually challenging for solvers, crosswords do not have the function of transmitting information (informativity). However, the setter’s primary intention of entertaining solvers is evident (intentionality) and, although most clues would be unacceptable and irrelevant in usual communicative situations, crossword initiates accept these linguistic inconsistencies as being part of this type of puzzle (acceptability, situationality). Thus, if we define a text as a passage of language which “functions as a unity with respect to its environment” (Halliday and Hasan 1976: 1) and consider cohesion, informativity (cf. Schubert 2012: 23) and also coherence as frequent, but non-obligatory features of texts, then crosswords must certainly be regarded as texts.

2.2.2 Features of non-cryptic and cryptic crossword puzzles

There are two basic types of English-language crosswords, generally called non-cryptic or primitive and cryptic puzzles (cf. Schlepper 1981: 61). In the latter type, clues are more obscure than in the former and encode the answers more strongly according to certain conventions (see below). Non-cryptic puzzles, which have been published since 1913 (cf. Stephenson 2007: 7), are common in most European and non-European countries. Cryptic crosswords emerged in England towards the end of the 1930s (cf. Scott and O’Donnell 1998: 211). Today they are an integral part of British culture and are regularly published (often alongside non-cryptic puzzles) in most British magazines and newspapers (quality as well as popular, national as well as regional and local). Cryptic crosswords have even influenced puzzles outside Great Britain: cryptic clues occur in some American dailies such as The New York Times (Variety puzzle) and some French newspapers (e.g. Le Figaro, Le Nouvel Observateur; cf. Mok 1987: 98). Since the 1970s, Die Zeit, a weekly national German quality paper, has been publishing a type of crossword puzzle which combines cryptic and straightforward clues (Um die Ecke gedacht, literally ‘thought around the corner’, i.e. thinking outside the box). However, the British cryptic crossword remains unique: “Although traces of the cryptic crossword can be found in some European countries, it is nowhere developed to anything like the extent it has now reached in the UK […]. German-language puzzles are those which come closest to the British model […]. By and large, however, these are all relatively modest by British standards” (Scott and O’Donnell 1998: 211–213).

A quantitative analysis performed on 20 puzzles (523 clue-answer pairs) from The Times (cryptic puzzles; Browne 2009), The Guardian (non-cryptic puzzles; Rusbridger 11.–16.05.2013) and The Sun (two-speed crosswords giving a non-cryptic and a cryptic clue for each answer; N.N. 2009) confirms the basic distinction between the two types of puzzle:

Table 1: Quantitative analysis of non-cryptic and cryptic puzzles

                                       Non-cryptic puzzles    Cryptic puzzles

Length of clues (orthographic units    The Sun: 2.1           The Sun: 6.1
delimited by blanks)                   The Guardian: 3.3      The Times: 6.8
                                       Average: 2.7           Average: 6.5

Length of answers (letters)            The Sun: 6.2           The Sun: 6.2
                                       The Guardian: 6.7      The Times: 7.5
                                       Average: 6.4           Average: 6.9

Despite variability within each type, clues are considerably shorter, and answers slightly shorter, in non-cryptic than in cryptic puzzles. Furthermore, both turns are morpho-syntactically more complex in the latter type. Non-cryptic clues are usually very simple phrases, often consisting of a head only as in (2), sometimes in combination with a simple pre- or postmodifier (3), whereas the corresponding answers are mostly single content words or proper names:

(2) Flowery (6) – FLORAL
(3) Mediterranean volcano (4) – ETNA (N.N. 2009: 105, 103)

Cryptic clues, by contrast, resemble block language headlines. When they are constituted by phrases, these are typically more complex, containing for example longer prepositional phrases or (finite or nonfinite) clauses as postmodifiers (4). Cryptic clues may also have an often elliptical clause structure, taking the form of simple or complex, mainly declarative sentences (cf. Quirk et al. 1985: 40, 803) as in (5). In addition to single content words and proper names, the answers to cryptic clues often comprise morphologically complex lexemes (e.g. idioms as in (5), compound nouns or multi-word verbs) as well as function words (6) or phrases (4).

(4) Bloomer made by top performer in nativity scene? (4,2,9) – STAR OF BETHLEHEM
(5) Find a lovely partner to share a seasonal moment (4,1,7) – PULL A CRACKER
(6) Jarring we hear’s in contrast (7) – WHEREAS (Browne 2009: 122, 110, 124)
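The two measures used in Table 1 and the bracketed letter count at the end of each clue can be illustrated with a short sketch. It assumes that a clue-answer pair is written in the notation used in this paper (clue, bracketed figure, dash, answer); the function names and the regular expression are illustrative assumptions and do not reproduce the procedure actually used to compile Table 1.

```python
import re

# Assumed notation: "<clue> (<n>[,<n>...]) – <ANSWER>", as in the examples above.
PAIR = re.compile(
    r"^(?P<clue>.+?)\s*\((?P<enum>\d+(?:,\d+)*)\)\s*–\s*(?P<answer>.+)$"
)


def parse_pair(line):
    """Split a clue-answer pair into clue text, letter counts and answer."""
    match = PAIR.match(line)
    if match is None:
        raise ValueError(f"not a clue-answer pair: {line!r}")
    enumeration = [int(n) for n in match.group("enum").split(",")]
    return match.group("clue"), enumeration, match.group("answer").strip()


def clue_length(clue):
    """Length of a clue in orthographic units delimited by blanks."""
    return len(clue.split())


def enumeration_matches(enumeration, answer):
    """Check that the bracketed figures match the word lengths of the answer."""
    return [len(word) for word in answer.split()] == enumeration


for line in (
    "Flowery (6) – FLORAL",
    "Mediterranean volcano (4) – ETNA",
    "Bloomer made by top performer in nativity scene? (4,2,9) – STAR OF BETHLEHEM",
):
    clue, enumeration, answer = parse_pair(line)
    print(clue_length(clue), enumeration, enumeration_matches(enumeration, answer))
```

Counting blank-delimited units treats a form such as hear’s as a single unit, which is consistent with the definition given in Table 1.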

Furthermore, the relationship between the turns of the same non-cryptic adjacency pair is overtly governed by the “Rule of Inflection” and the “Rule of Identity” (Schlepper 1981: 67). The former prescribes that clue and answer must “be able to fulfil the same syntactic function” (Schlepper 1981: 67). Therefore, they usually have the same inflection (7) and/or belong to the same formal syntactic category. However, a prepositional phrase may also point to an adverb or a nonfinite clause to an adjective (8).

(7) Least cooked (6) – RAREST (N.N. 2009: 111)
(8) Lacking injury (6) – UNHURT (Rusbridger 11.–16.05.2013)

The latter rule dictates that clue and answer have to be semantically equivalent, allowing (absolute or near) synonymy (9), negated antonymy (10), hyponymy (11) as well as paraphrases and definitions of variable precision (12).

(9) Applaud (5) – CHEER
(10) Not dead (5) – ALIVE
(11) Hairdo (4) – PERM
(12) Short-tempered person (7) – HOTHEAD (N.N. 2009: 9, 25, 69, 27)

Therefore, according to Greimas (1970: 287), crosswords work like a reverse dictionary, where only the definitions are given and the appropriate lemmata have to be provided by the solver. Yet to complicate matters, solving a non-cryptic clue may require considering polysemy, homonymy and proper names. In addition, the relationship between clues and answers may also be syntagmatic, being based on phraseological units such as idioms or collocations.

The aforementioned rules apply less overtly to cryptic crosswords. The reason for this opacity is that cryptic clues have a binary structure. It is only the definition (underlined in the following examples of cryptic clues) that is syntactically and semantically equivalent to the answer. The subsidiary indication, however, encodes the same answer a second time semantically, phonologically or orthographically. Thus, in example (13) the definition huge is a synonym of the answer, whereas the remaining subsidiary indication encodes the answer again, orthographically.

(13) Huge mines exploded around me (7) – IMMENSE (Browne 2009: 28)

Only two clue types deviate from this basic structure: In so-called all-in-one or & lit clues (‘and literally true clues’; cf. Moorey 2008: 22), which are sometimes marked by exclamation marks, the definition and the subsidiary indication are merged (14). Cryptic definition clues (cf. Moorey 2008: 27), by contrast, consist of a misleading definition or paraphrase of the answer (15). They frequently rely on homonymy or a morphological reinterpretation of lexemes or idiomatic expressions and may be marked by question marks. Non-cryptic clues were banned when the rules for cryptic puzzles were reformulated by setters in the 1930s and 1940s (cf. Scott and O’Donnell 1998: 236).

(14) Hood’s resort few disturbed (8,6) – SHERWOOD FOREST
(15) One may move on to another American story (9) – ESCALATOR (Moorey 2008: 148, 106)

The different types of clue-answer relationship typical of non-cryptic and cryptic puzzles are illustrated schematically in Figure 1.

Figure 1: Clue-answer relationship in non-cryptic and cryptic crosswords (CWPs)

A cryptic clue thus offers two approaches to the answer and points to it unambiguously, if interpreted correctly. Some crossword initiates therefore insist that cryptic crosswords are easier to solve than non-cryptic ones (cf. Skinner 2008: 7; Schlepper 1981: 75). However, a solver may encounter several difficulties in interpreting cryptic clues. First, the definition and the subsidiary indication are integrated into a stretch of language which seemingly permits literal interpretation. Yet the sole purpose of the surface structure of the clue is to mislead the solver. Its meaning, however, is exhausted once the clue has been solved. Therefore, clues have to be regarded as a succession of fragments which correspond to neither morpho-syntactic nor orthographic units, since word boundaries may be shifted and punctuation marks overruled: “A cryptic clue is a sentence or phrase, appearing to make some kind of sense and putting ideas into the solver’s head. These often have little or nothing to do with the answer, which can be derived by interpreting all or part of the clue in ways which are less obvious” (Biddlecombe 2009).

Second, the definition and the subsidiary indication are unmarked, may occur in variable order and may even overlap. There may also be words or phrases which are superfluous for solving the clue (cf. Schlepper 1981: 66), added solely for enhancing the coherence of the surface structure. Third, even when the definition has been identified, it may be a zero-derivation, polyseme or homonym and thus, due to the absence of any context, syntactically and/or semantically ambiguous. Fourth, the subsidiary indication may contain several operations of codification not necessarily indicated by signal words (for lists of such indicators cf. Stephenson 2007: 35–63; indicators will be underlined by a broken line in the following examples of cryptic clues).

Cryptic clues whose subsidiary indication encodes the answer semantically, so-called double or multiple definition clues (for the names of clue types used here cf. Moorey 2008: 13–31; Biddlecombe 2009), contain a second definition. They are usually based on polysemy, homonymy, homography or the metaphorical or literal reinterpretation of one or several lexemes in the clue and/or the answer (16).

(16) Poorly educated and characterless? (10) – UNLETTERED (Moorey 2008: 154)

By contrast, homophone clues encode the answer phonologically and are based on the phonological similarity (homeophony) or identity (homophony) of lexemes such as whale and wail in (17).

(17) Marine beast’s audible cry (4) – WAIL (Stephenson 2007: 55)

Most frequently, however, a solver has to recompose the answer orthographically. The easiest case of an orthographic codification is a hidden clue, explicitly containing the graphemes of the answer. In the surface structure of the subsidiary indication, these graphemes are either dispersed or contained consecutively, often across word boundaries. Furthermore, it may be necessary to reverse the order of the graphemes contained in or encoded by the subsidiary indication (anadrome or reversal clues) or to rearrange them (anagram clues). Thus, in (18) the graphemes of live, a synonym of quick, have to appear in inverted order to form a synonym of sin, while in (19) the answer is an anagram of remote:

(18) Quick to return to sin (4) – EVIL (Stephenson 2007: 48)
(19) Unusually remote shooting star (6) – METEOR (Skinner 2008: 18)
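The orthographic operations behind reversal and anagram clues can be checked mechanically once the relevant letter material has been identified. The following minimal sketch assumes that this identification, e.g. recognising live as a synonym of quick, has already been carried out by the solver; the function names are illustrative.

```python
def letters(text):
    """Lower-case graphemes of a string, ignoring blanks and punctuation."""
    return [ch for ch in text.lower() if ch.isalpha()]


def is_reversal(source, answer):
    """True if the answer consists of the graphemes of the source in inverted order."""
    return letters(source)[::-1] == letters(answer)


def is_anagram(fodder, answer):
    """True if the answer is a rearrangement of the graphemes of the fodder."""
    return sorted(letters(fodder)) == sorted(letters(answer))


print(is_reversal("live", "EVIL"))                        # example (18)
print(is_anagram("remote", "METEOR"))                     # example (19)
print(is_anagram("Hood’s resort few", "SHERWOOD FOREST")) # example (14)
```

The check only covers the orthographic half of each clue; the semantic half, i.e. matching the definition to the answer, still requires lexical knowledge that is not modelled here.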

In addition, graphemes may also be substituted (substitution clues) or deleted (take away, apocopative or deletion clues). This is illustrated in (20), where the first letter of gown, a synonym of dress, must be deleted.

(20) Possess a topless dress (3) – OWN (Moorey 2008: 20)

In crosswords of a certain complexity, however, answers may be cut into several chunks, which, theoretically, may consist of single letters. These orthographic chunks are then encoded separately, linearly in charade or additive clues and non-linearly in content or container clues. In (21), the graphemes <arat> have to be inserted into a synonym of cat, namely lion. Dec, the abbreviation for December, the last month of the year, is added by a charade operation.

(21) Statement: Last month, a cat swallowed a rat (11) – DECLARATION (Biddlecombe 2009)

For these operations of codification, all kinds of abbreviations or acronyms may be used, such as of military ranks (e.g. Lt for lieutenant), chemical elements (e.g. Ag for silver) or terms from chess, music or cricket (e.g. W for wicket). Other letter sequences constitute foreign-language articles (e.g. le/la for the, un for one), pronouns (e.g. she for girl) or Roman numerals (e.g. I for one). Therefore, despite the fact that crosswords do not show grammatical cohesion, they may still contain lexemes which otherwise have a cohesive function.

Finally, to further complicate the solving of clues, the aforementioned operations of codification can also be combined (complex clues). Thus, three operations are included in (22): heartless indicates the deletion of the central grapheme of the. By a charade operation (see the explanation of charade clues above), R (for Latin rex ‘king’) is added to <te>. This letter sequence is then inserted into inn, a synonym of public house.

(22) Confine the heartless king in a public house (6) – INTERN (Gilbert 2001: 64)
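The deletion, charade and container operations described above can be sketched as simple string functions and then combined as in examples (20) to (22). The function names and the explicit insertion position are assumptions introduced for illustration: the clues themselves do not state where a chunk is inserted, so the solver has to try the possible positions.

```python
def delete_first(word):
    """Deletion clue: drop the first grapheme ('topless')."""
    return word[1:]


def delete_middle(word):
    """Deletion clue: drop the central grapheme ('heartless').

    Assumes a word with an odd number of graphemes.
    """
    middle = len(word) // 2
    return word[:middle] + word[middle + 1:]


def charade(*chunks):
    """Charade (additive) clue: join the chunks linearly."""
    return "".join(chunks)


def container(outer, inner, position):
    """Container clue: insert inner into outer before index position."""
    return outer[:position] + inner + outer[position:]


# (20) topless 'gown'
print(delete_first("gown"))
# (21) 'dec' followed by 'lion' swallowing 'arat'
print(charade("dec", container("lion", "arat", 1)))
# (22) heartless 'the' plus 'r', placed inside 'inn'
print(container("inn", charade(delete_middle("the"), "r"), 2))
```

Representing each operation as a separate function makes the structure of complex clues visible: example (22) is a container operation whose inserted chunk is itself the result of a deletion followed by a charade.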


2.3 Functional analysis

In view of the purposes of crosswords, their language is shaped by two diametrically opposed requirements: it must, on the one hand, encrypt the answers, yet, on the other hand, point to them unambiguously.

In non-cryptic puzzles, in which the syntactic and semantic relationship between clues and answers is straightforward, a solver’s proficiency depends mainly on his or her factual declarative, encyclopaedic as well as metalinguistic knowledge. Only when clues can activate chunks of knowledge which are stored as cognitive representations in the solver’s memory or when appropriate cognitive representations can be constructed in the process of solving the puzzle (e.g. by consulting an encyclopaedia) can those clues be solved. The language of non-cryptic puzzles mirrors this. Most non-cryptic clues permit a literal, syntactically and semantically unambiguous interpretation of the surface structure. Furthermore, they are characterised by structural simplicity and shortness. What primarily accounts for the difficulty of primitive puzzles are, consequently, the currency of the lexemes functioning as answers among the target ‘solvership’ and the extent to which esoteric knowledge is targeted. In addition, non-cryptic clues may constitute semantically unspecific paraphrases, pointing to several answers such as in (23). Such ambiguity can only be resolved by intersecting answers and thus imposes a specific approach to solving the respective puzzle.

(23) Atlantic county of Eire (5) – SLIGO (N.N. 2009: 11)

To procure even greater entertainment, cryptic puzzles, by contrast, take playing with words, testing mental flexibility and encoding answers to extremes. Their solution requires not only general knowledge but also expert knowledge, abilities or solution strategies. These may concern the specific conventions of codification, the frequency of certain letters, the completion of incomplete lexemes or the solution of anagrams. Cryptic crosswords thus often rely on the various syntagmatic and paradigmatic as well as coincidental formal relationships within the English language, which are largely irrelevant for everyday language use. Besides knowledge, they consequently depend on “fluid cognition” (Hambrick, Salthouse, and Meinz 1999: 131) or “lateral thinking” (Schlepper 1981: 79), i.e. creativity, mental flexibility and logical, abstract reasoning. This focus on a more complex codification of the answers and a more complex reasoning process in cryptic puzzles is mirrored in their language. The structurally more complex and longer surface structure of cryptic clues only seemingly permits literal interpretation but deliberately aims at misleading the solver. Since operations of codification are not necessarily indicated and since the definition, the subsidiary indication and possible indicators are not marked, the surface structure of cryptic clues permits multiple interpretations, semantically as well as morpho-syntactically. As with non-cryptic puzzles, the difficulty of cryptic puzzles increases when rare lexemes or specialised or esoteric knowledge are targeted. As against non-cryptic clues, however, once the structure underlying the clue has been recognised and the operations of codification have been identified, well-constructed cryptic clues can be answered unambiguously, even without resorting to crosslights in the grid.

The previous analysis showed that crosswords are associated with a particular situation and particular purposes, which are reflected in pervasive formal as well as linguistic features. Consequently, crosswords must clearly be regarded as an independent register according to Biber and Conrad’s definition (2009: 31; see also Schubert’s introduction to this volume). Furthermore, the detailed semantic and morpho-syntactic analysis of crosswords revealed that non-cryptic and cryptic puzzles, despite their being based on the same linguistic building blocks, have developed different strategies for fulfilling their primary purpose as entertainment. They codify answers to a different extent and therefore require different skills on the part of the solver. Since, due to this, non-cryptic and cryptic puzzles differ linguistically, those two types of crosswords must be regarded as distinct sub-registers of the register of crosswords.

3 Intertextuality in crossword puzzles: A corpus study

Intertextuality, the seventh standard of textuality according to de Beaugrande and Dressler (1981), implies that knowledge of one or several individual texts or groups of texts (pre-texts) may influence the production and/or reception of another text (the post-text). In registers like newspaper articles or advertisements, intertextuality most frequently takes the form of (unmodified or modified) quotations. Numerous studies have shown that these may have for example the representational function of introducing additional components of meaning into a post-text, the expressive function of supporting the author’s argumentation and/or the conative function of guiding the reader’s reception (cf. Bühler [1934] 1982: 24–33). For an intertextual reference to fulfil (most of) its functions, (more or less extensive) knowledge of the pre-text is required (cf. Schulte-Middelich 1985; Stocker 1998: 73–92). However, since intertextual references are normally doubly referential, pointing to pre-texts as well as to the extra-linguistic world (cf. Pham 2014: 472), most post-texts equally permit a literal, non-intertextual interpretation. So far, however, it has never been studied how intertextuality contributes to the characteristics and purposes of crosswords and to what extent the analysis of intertextual references can contribute to establishing crosswords as a register or non-cryptic and cryptic puzzles as distinct sub-registers.

3.1 Working definitions

The term intertextuality was coined in the late 1960s by the Bulgarian linguist and literary critic Julia Kristeva (1968). Yet, although intertextual references occur particularly frequently in texts from the 20th and 21st centuries, intertextuality is by no means an exclusively modern or postmodern phenomenon. On the contrary, references to previous texts or utterances may be regarded as an intrinsic property of human language. Consequently, the study of intertextual references, especially in the fields of rhetoric and literary theory, can be traced back to classical antiquity, albeit under different labels such as parody, quotation or imitation.

Today, there are two principal tendencies in research on intertextuality. The theory of intertextuality is historically rooted in post-structuralist literary criticism, which deconstructs the traditional concept of text. Post-structuralists like Kristeva, Barthes and Derrida furthermore regard intertextuality as a characteristic of all texts and consequently contest the autonomy of any text. Thus, intertextuality does not refer back to individual, identifiable pre-texts, but to a “texte infini [infinite text]” (Barthes 1973: 59) or a “texte général [general text]” (Derrida 1972: 125), which is extended to comprise even the ‘social’, ‘cultural’ or ‘historical text’ (cf. Barthes [1968] 1977: 146). However, this ontological conception of intertextuality has never developed a feasible method for textual analysis.

Consequently, for actual textual analysis as in the present paper, scholars revert to the second, narrower conception of intertextuality. It regards intertextual references as a gradable feature of some, yet not all texts, examines the forms and functions of such references and, being related to structuralism, approves of the traditional concept of text. For structuralists like Genette (1982) or Riffaterre (1981) intertextuality theoretically refers back to isolated, identifiable pre-texts (or groups of pre-texts). It is this narrow conception of intertextuality that was adopted by linguistics in the 1980s. Linguists usually distinguish between typological intertextuality, i.e. the relationships between post-texts and groups of texts (registers, genres, styles or textual patterns), and referential intertextuality, i.e. the relationships between post-texts and individual, identifiable pre-texts.

The previous section showed that crosswords should be regarded as an independent register comprising two sub-registers. Typical examples of crossword puzzles thus follow certain conventions and are necessarily characterised by typological intertextuality. Consequently, for the present study, analyses were limited to referential intertextuality. The term intertextuality was thus understood to comprise only the relationships between a post-text and one or more individual and identifiable pre-texts. The intertextual subcategory of interfigurality (cf. Müller 1991) includes the mention or appearance of figures and authors of pre-texts in a post-text (“re-used figures [and] authors”, Helbig 1996: 115). Therefore, references to pre-textual figures and authors were equally considered in the present study. Moreover, a text was defined broadly as a formally delimited communicative act which usually exists in written or spoken form but may also consist of other visual or acoustic signs.

3.2 Methodology

A corpus study on intertextuality in crossword puzzles was conducted for this paper. Its primary aim was to investigate the particular forms and functions of intertextual references in this type of word game in order to evaluate their importance for crosswords as a register as well as for non-cryptic and cryptic puzzles as sub-registers.

In the first half of the 20th century, so-called quotation clues were still used in cryptic puzzles. A citation listed in the Oxford Dictionary of Quotations (Partington 1992) was reproduced literally, explicitly marked by quotation marks, italics, the name of the pre-text and/or the name of the author. One part of the original wording was elided and had to be recovered by the solver as in (24), where the quotation is accompanied by a definition:

(24) Consumed. “But answer came there none And this was scarcely odd because They’d ____ every one” (Carroll’s Through the Looking-Glass) (5) – EATEN (Gilbert 2001: 12)

Thus, for devising such clues, the setters relied on their knowledge of those pre-texts. In order to identify the answers, solvers had to be able to access similar knowledge of the pre-texts by activating (or constructing) appropriate cognitive representations (cf. Geeraerts and Cuyckens 2007: 170–187). In 1995, however, quotation clues like (24) were forbidden because they were not strictly cryptic and because some puzzles had devoted too much attention to literary background knowledge (cf. Biddlecombe 2009). By contrast, quotation clues like (25) are still to be found in non-cryptic puzzles.

(25) “A Nightmare on ____ Street” – ELM (Parker 15.04.2013)


This suggests that today references to works of literature or popular culture are considerably more frequent in non-cryptic than in cryptic puzzles and that less knowledge of existing texts is required to solve the latter. Hence, one further aim of the empirical study was to investigate this assumption comparatively by examining intertextual references in the two sub-registers of crosswords as to their frequency, pre-texts, forms and functions.

For the corpus, two collections of crosswords were analysed, both published in 2009, i.e. well after the abolition of quotation clues in cryptic puzzles. In total, 80 non-cryptic puzzles (2080 clue-answer pairs) from The Sun (N.N. 2009) and 80 cryptic puzzles (2372 clue-answer pairs) from The Times (Browne 2009) were scrutinised for intertextual references according to the above definitions. When several references occurred in one clue-answer pair or when references pointed to several pre-texts, those were counted separately. This yielded a corpus of 270 intertextual clue-answer pairs (The Sun: 112; The Times: 158) and 295 intertextual references (The Sun: 112; The Times: 183; 38.0 % vs. 62.0 %), which were manually classified into five categories according to their respective pre-text(s).

Category (1) comprises references to folkloristic and mythological texts, originally transmitted orally. Clue-answer pairs requiring knowledge of literary texts produced by individual authors according to aesthetic standards are summarised in category (2). References to the visual arts are subsumed under category (3) and subdivided into (a) painting/drawing/sculpture and (b) broadcasting/TV series/film. For references to music, category (4) was created with the subcategories (a) classical music (both orchestral and vocal) and (b) popular music. Remaining references to religious, philosophical or other theoretical texts constitute category (5). In some cases, the distinction between these (sub-)categories is not clear-cut. Thus, further criteria were introduced. For example, popular music, in contrast to classical music, was regarded as being typically commercially oriented, addressed to large audiences and distributed by the music industry.

In addition, each group was analysed according to the provenances of the pre-texts or their authors. Thus, texts from Greek and Roman antiquity are classified as Classical, British is the label for pre-texts from the UK and the Republic of Ireland, American denotes pre-texts from the USA, etc. Provenances relevant for fewer than four intertextual references per category were subsumed under Other. Due to their importance for intertextuality, Shakespeare and the Bible are listed separately (cf. Table 2).

126 Teresa Pham

3.3 Quantitative analysis of the corpus

The first conclusion we can draw from the quantitative analysis of the corpus is that, on the whole, and contrary to the previous assumption, intertextual references are relatively more frequent in the cryptic puzzles published in The Times than in the non-cryptic ones from The Sun. While crosswords in The Sun contain 1.4 intertextual references on average, puzzles in The Times contain 2.3 intertextual references. Even if references to different pre-texts occurring in the same clue-answer pair as in example (32) are not counted separately, this distributional difference remains obvious (1.4 vs. 2.0 intertextual clue-answer pairs/puzzle). A comparison with the frequency of intertextual references in non-cryptic puzzles from another quality paper, The Guardian (Rusbridger 11.–16.05.2013; 0.8 references or intertextual clue-answer pairs/puzzle), shows that this difference actually depends on the type of crossword and not on the journalistic standards or the addressed readership of the respective newspapers. Consequently, despite there being considerable variability in the frequency of intertextuality within the same sub-register, cryptic puzzles generally require more knowledge of other texts than non-cryptic puzzles. The qualitative analysis of the corpus will shed light on how intertextual references are incorporated into cryptic puzzles, despite quotation clues having been banned.
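The per-puzzle frequencies and the shares of references quoted here follow directly from the corpus counts reported in the methodology section (80 puzzles per newspaper; 112 and 158 intertextual clue-answer pairs; 112 and 183 intertextual references). The following sketch recomputes them; the class and function names are illustrative assumptions, and the Guardian figure is left out because its underlying counts are not reported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubCorpus:
    newspaper: str
    puzzles: int
    intertextual_pairs: int
    intertextual_references: int

    def references_per_puzzle(self) -> float:
        return round(self.intertextual_references / self.puzzles, 1)

    def pairs_per_puzzle(self) -> float:
        return round(self.intertextual_pairs / self.puzzles, 1)


# Counts as reported for the two collections published in 2009.
SUN = SubCorpus("The Sun", puzzles=80,
                intertextual_pairs=112, intertextual_references=112)
TIMES = SubCorpus("The Times", puzzles=80,
                  intertextual_pairs=158, intertextual_references=183)


def share_of_references(part: SubCorpus, *corpora: SubCorpus) -> float:
    """Percentage of all intertextual references contributed by `part`."""
    total = sum(c.intertextual_references for c in corpora)
    return round(100 * part.intertextual_references / total, 1)


for corpus in (SUN, TIMES):
    print(corpus.newspaper,
          corpus.references_per_puzzle(),
          corpus.pairs_per_puzzle(),
          share_of_references(corpus, SUN, TIMES))
```

Rounding to one decimal place mirrors the precision used in the text.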

Table 2: Composition of the corpus of intertextual clue-answer pairs

Categories and provenances of pre-texts                   THE SUN   THE TIMES   AVERAGE

(1) Folkloristic and mythological texts                      14.3         9.3      11.2
    Classical                                                 8.0         4.9       6.1
    British                                                   4.5         2.7       3.4
    Other                                                     1.8         1.6       1.7

(2) Literature                                               28.6        51.9      43.0
    Classical                                                 1.8         2.7       2.4
    British (excluding Shakespeare)                          16.1        31.1      25.4
    Shakespeare                                               2.7         8.2       6.1
    American                                                  3.6         4.9       4.4
    French                                                    2.7         2.7       2.7
    Other                                                     1.8         2.2       2.0

(3) Visual arts                                              19.6         6.6      11.5
  (a) Painting/drawing/sculpture                              3.6         3.8       3.7
      Italian                                                 1.8         1.1       1.4
      Other                                                   1.8         2.7       2.4
  (b) Video/broadcasting/TV series/films                     16.1         2.7       7.8
      British                                                 8.9         0.5       3.7
      American                                                7.1         2.2       4.1

(4) Music                                                    22.3        13.1      16.6
  (a) Classical music                                         8.0        10.9       9.8
      British                                                 1.8         3.3       2.7
      Italian                                                 3.6         2.7       3.1
      Other                                                   2.7         4.9       4.1
  (b) Popular music                                          14.3         2.2       6.8
      British                                                 8.0         1.1       3.7
      American                                                5.4         1.1       2.7
      Other                                                   0.9         0.0       0.3

(5) Religious, philosophical and other theoretical texts     15.2        19.1      17.6
    Classical                                                 0.0         3.3       2.0
    British                                                   0.9         5.5       3.7
    The Bible                                                14.3         7.1       9.8
    Other                                                     0.0         3.3       2.0

Note: All values are percentages and are calculated based on the number of intertextual references in the crosswords from The Sun (112), The Times (183) or both newspapers (295; labelled Average). Differences for example between percentage sums (shaded cells) corresponding to (sub-)categories and respective individual percentage values (white cells) corresponding to provenances result from rounding to one decimal place.
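The rounding effect mentioned in the note can be illustrated with category (2) for The Sun, where the six provenance values add up to 28.7 although the category is listed with 28.6. The raw counts below are not given in the table; they are an assumption, back-calculated as the only whole numbers out of 112 references that reproduce the published percentages.

```python
SUN_REFERENCES = 112  # total intertextual references in The Sun

# Assumed raw counts (inferred from the rounded percentages, not reported).
ASSUMED_LITERATURE_COUNTS = {
    "Classical": 2,
    "British (excluding Shakespeare)": 18,
    "Shakespeare": 3,
    "American": 4,
    "French": 3,
    "Other": 2,
}


def pct(count: int, total: int) -> float:
    """Percentage rounded to one decimal place, as in Table 2."""
    return round(100 * count / total, 1)


row_percentages = {
    provenance: pct(count, SUN_REFERENCES)
    for provenance, count in ASSUMED_LITERATURE_COUNTS.items()
}

# Adding the already rounded rows accumulates the rounding error ...
sum_of_rounded_rows = round(sum(row_percentages.values()), 1)

# ... whereas the category value is rounded once, from the summed counts.
category_percentage = pct(sum(ASSUMED_LITERATURE_COUNTS.values()),
                          SUN_REFERENCES)

print(row_percentages)
print(sum_of_rounded_rows, category_percentage)
```

Under these assumed counts, the difference of 0.1 percentage points arises solely because each row is rounded before summation.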

Table 2 discloses the most popular pre-textual categories in crosswords in general. Works of literature are by far the most important ones (43.0 %), followed by religious, philosophical or other theoretical texts (17.6 %) and folkloristic and mythological texts (11.2 %). If provenance is considered as well, British literature (including Shakespeare; 31.5 %), the Bible (9.8 %) and myths of classical antiquity (6.1 %) are the most important pre-texts. In addition, Shakespeare is the individual author who is by far most often referred to (6.1 %). This result might be surprising, since it has often been claimed that, at least since the mid-20th century, the traditional pre-texts of the Victorian Age have declined in importance in Anglo-American culture: “until recently Classical mythology, the works of Shakespeare and the Bible were regular sources for compilers” (Scott and O’Donnell 1998: 207; cf. also Hebel 1991: 149). Consequently, the predominance of these pre-texts in crosswords may have been even clearer in the first half of the 20th century. This result supports Partridge’s assumption that typical solvers are thoroughly and “humanistically educated” (Partridge 1992: 504).

Furthermore, it is equally revealing to compare the favourite pre-texts of the two sub-registers of crosswords. Thus, clues in non-cryptic crosswords from The Sun require knowledge of literary works in general (28.6 %), British literature (excluding Shakespeare; 16.1 %) and Shakespeare (2.7 %) less frequently than clues in cryptic crosswords from The Times (51.9 %, 31.1 % and 8.2 %). By contrast, puzzles from The Sun refer to the Bible (14.3 %) and to the oral tradition (14.3 %), especially to classical mythology (8.0 %), relatively more frequently than puzzles from The Times (7.1 %, 9.3 % and 4.9 %). The reason for these different preferences especially with regard to the traditional pre-texts of the Victorian Age might be that a British solver with an average education can be expected to possess more extensive general knowledge of the Bible and all texts of classical mythology than of the 38 plays and 154 sonnets commonly attributed to Shakespeare (cf. Greenblatt 1997: 65–66, 1923–1976). The most striking distributional differences between the two sub-registers can, however, be found in categories (3b) and (4b). Knowledge of (especially Anglo-American) video, broadcasting, TV series, films and popular music is necessary for the solution of nearly one third of all intertextual non-cryptic clues (30.4 %) but is hardly relevant for cryptic puzzles at all (4.9 %). Cryptic crosswords of the corpus thus primarily target traditional pre-texts like classical mythology, Shakespeare and the Bible, whereas non-cryptic puzzles focus on Shakespeare to a smaller, yet on classical mythology and the Bible to a greater extent and additionally require knowledge of texts of the popular, especially Anglo-American culture. However, only a corpus including non-cryptic and cryptic clue-answer pairs from further (popular and quality) newspapers could reveal whether these preferences for certain pre-textual categories are correlated to the respective sub-register of crosswords or to the expected knowledge of the target solvership (or to both).
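Several of the percentages discussed in connection with Table 2 are sums of two table cells rather than cells of their own. The short check below reproduces them from the published values; the variable names are illustrative assumptions, and `Decimal` is used only to avoid floating-point noise when adding one-decimal values.

```python
from decimal import Decimal

# Values copied from Table 2.
BRITISH_LITERATURE_AVERAGE = Decimal("25.4")   # excluding Shakespeare
SHAKESPEARE_AVERAGE = Decimal("6.1")

SUN_SCREEN_MEDIA = Decimal("16.1")     # category (3b), The Sun
SUN_POPULAR_MUSIC = Decimal("14.3")    # category (4b), The Sun
TIMES_SCREEN_MEDIA = Decimal("2.7")    # category (3b), The Times
TIMES_POPULAR_MUSIC = Decimal("2.2")   # category (4b), The Times

british_literature_incl_shakespeare = (
    BRITISH_LITERATURE_AVERAGE + SHAKESPEARE_AVERAGE
)
sun_popular_culture = SUN_SCREEN_MEDIA + SUN_POPULAR_MUSIC
times_popular_culture = TIMES_SCREEN_MEDIA + TIMES_POPULAR_MUSIC

print(british_literature_incl_shakespeare)  # British literature incl. Shakespeare
print(sun_popular_culture)                  # (3b) + (4b), The Sun
print(times_popular_culture)                # (3b) + (4b), The Times
```

Because the inputs are themselves rounded, such sums can deviate by up to 0.1 percentage points from values computed on the raw counts.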

3.4 Qualitative analysis of the corpus

In both sub-registers of crosswords, most intertextual references (95.2 %) involve proper nouns (including titles). Due to their fixed extension but particularly complex intension as well as their high selectivity and explicit markedness (cf. Pfister 1985: 28; Karrer 1985: 106–108), proper nouns contribute to the codification of answers as well as the unequivocal solution of clues. Hence, they are well-suited for intertextual references in crosswords.

In more than two thirds of all intertextual non-cryptic adjacency pairs of the corpus (67.9 %), proper nouns referring to the same pre-text occur in both the clue and the answer, usually in combination with common nouns providing further information on the referent (26). Thus, although these references are unmarked, proper nouns can usually activate the necessary cognitive representations unequivocally even without the grid.

(26) Writer of 1984 (6) – ORWELL (N.N. 2009: 77)

In about one third of the non-cryptic clue-answer pairs of the corpus, proper nouns occur either in the answer as in (27) (23.2 %) or, more rarely, in the clue as in (28) (6.3 %), whereas the other component of the pair gives a semantically equivalent common noun or noun phrase. Only one selective proper noun being involved, more pre-textual knowledge is required for correctly associating clue and answer. Furthermore, the solver may encounter a certain ambiguity, which is resolved only when the number of letters of the answer is considered or crosslights are already given in the grid:

(27) Opera composer (7) – PUCCINI
(28) Puccini work (5) – OPERA (N.N. 2009: 45, 53)

A comparison with non-cryptic puzzles from The Guardian (Rusbridger 11.–16.05.2013) shows that these two types are particularly typical of this sub-register. In the corpus, only three non-cryptic clues (2.7 %) require exact knowledge of the wording of a pre-text and may thus, despite their featuring no explicit markers, be classified as quotation clues. Interestingly, all three refer to texts of popular culture: the catchphrase of a British comedian and the beginnings of two nursery rhymes. These pre-texts can be expected to be common knowledge among British solvers.

(29) Tommy Cooper’s catchphrase (4,4,4) – JUST LIKE THAT
(30) Ride a cock horse to here (7,5) – BANBURY CROSS
(31) Silver-buckled sailor (5,7) – BOBBY SHAFTOE (N.N. 2009: 71, 147, 155)

In cryptic puzzles, by contrast, proper nouns are used with greater variation as intertextual references. One major difference between the two sub-registers in the corpus is that intertextual proper nouns may occur in the subsidiary indication of cryptic clues, i.e. as an intermediate step in the solution of the clue (32.9 %). From a cognitive linguistic point of view, especially well-known proper nouns automatically activate easily accessible pre-textual frames. Whereas the frames activated by intertextual references in non-cryptic puzzles are directly relevant for the answers, this is not always the case in cryptic puzzles. Only lexemes in the definition need to be interpreted literally. Intertextual references in the subsidiary indication, however, usually require no pre-textual knowledge at all. They activate frames which mislead the solver and inhibit finding the answer, especially when knowledge of a completely different pre-text is required. Thus, in (32) no knowledge of Lewis or the Lake poets is necessary because the answer, the name of a different poet, is an anagram of the letters <TV CS Lewis Lake> given in the subsidiary indication.

(32) TV broadcast with C S Lewis and Lake poet (9-4) – SACKVILLE-WEST (Browne 2009: 52)
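That the answer to (32) uses exactly the letters of the subsidiary indication can be verified mechanically. The following is a minimal sketch; the function names are illustrative assumptions, and spaces, hyphens and case are ignored, as they are in the grid.

```python
from collections import Counter


def letter_inventory(text: str) -> Counter:
    """Count the letters of a string, ignoring case, spaces and punctuation."""
    return Counter(ch for ch in text.upper() if ch.isalpha())


def is_anagram(fodder: str, answer: str) -> bool:
    """True if `answer` uses exactly the letters of `fodder`."""
    return letter_inventory(fodder) == letter_inventory(answer)


print(is_anagram("TV C S Lewis Lake", "SACKVILLE-WEST"))
print(is_anagram("relies on", "ELSINORE"))  # the anagram in clue (35)
```

The check confirms a proposed answer; finding the answer still requires the solver to recognise broadcast as an anagram indicator and poet as the definition.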

Cryptic clues whose definitions and answers contain intertextual proper nouns (usually referring to the same pre-text; 15.9 %) resemble the first type of non-cryptic clue discussed before: an intertextual name in the definition is often sufficient for an unequivocal solution and only basic pre-textual knowledge is required. Whereas the additional subsidiary indication first complicates the activation of the necessary cognitive representations, once identified, it indicates the correctness or falsehood of the supposed answer. In (33) the name of a Shakespearean spirit also results from the insertion of the Roman numeral for one into an anagram of Lear. Equally, the answer in (34) is not only indicated by the definition but is also confirmed by the subsidiary indication: for the mythological place name the graphemes of no and lava, paraphrased by sign of volcanic activity, are reversed.

(33) Shakespearean spirit – one into Lear possibility (5) – ARIEL
(34) No sign of volcanic activity about Arthur’s Seat (6) – AVALON (Browne 2009: 132, 56)
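The confirming role of the subsidiary indication in (33) and (34) can be made explicit with two small checks, one for an insertion into an anagram and one for a reversal. This is a sketch under the assumption that the indicated operations have already been identified; the function names are illustrative.

```python
def reverse(text: str) -> str:
    """Reversal: read the letter sequence backwards."""
    return text[::-1]


def is_insertion_into_anagram(answer: str, inserted: str, fodder: str) -> bool:
    """True if removing one occurrence of `inserted` from `answer`
    leaves an anagram of `fodder`."""
    answer, inserted, fodder = answer.lower(), inserted.lower(), fodder.lower()
    for start in range(len(answer) - len(inserted) + 1):
        if answer[start:start + len(inserted)] == inserted:
            remainder = answer[:start] + answer[start + len(inserted):]
            if sorted(remainder) == sorted(fodder):
                return True
    return False


# (33): Roman numeral I inserted into an anagram of 'Lear'
print(is_insertion_into_anagram("ARIEL", "I", "Lear"))

# (34): 'no' followed by 'lava', reversed
print(reverse("no" + "lava").upper())
```

For simplicity the insertion check also accepts the inserted letters at the very start or end of the answer, which a strict reading of 'into' would exclude.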

When intertextual proper nouns occur in the answer (41.1 %) or, more rarely, in the definition only (5.1 %) and the corresponding counterpart is constituted by a semantically equivalent common noun or noun phrase, as with the second type of non-cryptic clue discussed above, the answer can usually not be inferred unambiguously from the definition alone. However, in these cryptic clues, the subsidiary indication may resolve the ambiguity. Furthermore, such clues require more detailed knowledge of pre-texts than the previous categories. While the definition in (35) does not unambiguously identify the intertextual answer, the subsidiary indication requires the formation of an anagram of relies on. By contrast, splitting a couple, i.e. a lady and a man, by S (from succeeded) results in a synonym of the intertextual eponym Casanova in (36).

(35) Relies on horribly haunted castle? (8) – ELSINORE
(36) Casanova succeeded splitting couple? (5,3) – LADY’S MAN (Browne 2009: 40, 72)

Moreover, seven cryptic clues (4.4 %) require knowledge of the exact wording of pre-textual passages. Thus, although they do not follow the traditional pattern of quotation clues (featuring e.g. quotation marks and a gap which has to be recovered), they must be classified as quotation clues. Not only is their share larger than in non-cryptic puzzles, but they also refer to a different category of pre-texts. While only two clues, (37) and (38), refer to popular culture (an English nursery rhyme and a musical based on poems by Eliot), the others require knowledge of works of well-known British and international authors: Shakespeare (39), but only seemingly (40) and (41), Shelley (42), Carroll (43), Gray (41) and Plutarch (40).

(37) When Grundy was christened, 48 hours before Chesterton’s man (7) – TUESDAY
(38) Reason for Macavity’s lack of presence (5) – ALIBI
(39) Underworld scam over shelter – it blighted Gloucester’s winter (10) – DISCONTENT
(40) Composer includes girl in second act of Julius Caesar (7) – VIVALDI
(41) Hamlet’s rude ancestor heard warning priest (10) – FOREFATHER
(42) Lovely old piece describing Shelley’s traveller’s land (7) – ANTIQUE
(43) Giving nasty looks? Alice never heard of such a thing! (12) – UGLIFICATION (Browne 2009: 156, 104, 92, 58, 90, 44, 42)

Finally, three cryptic clues (1.9 %) are based on idioms derived from individual pre-texts. For these, the activation of pre-textual frames may be helpful, yet is by no means essential. The idiomatic collocation representing the answer in (44) is derived from Shakespeare’s Antony and Cleopatra (1.5.72). The subsidiary indication instructs the solver to insert lad (‘boy’) into sad (‘blue’) and to add ays (‘votes’).

(44) Boy in blue votes for Green term (5,4) – SALAD DAYS (Browne 2009: 58)

The qualitative analysis of the corpus revealed that intertextual references in crosswords differ drastically from those in other registers, formally as well as functionally. Whereas intertextuality e.g. in newspapers or advertisements most frequently takes the form of quotations (cf. Pham 2014), interfigural relationships are the predominant formal category in the present corpus. Furthermore, intertextual references in other registers are usually doubly referential (cf. Pham 2014: 472), referring to both the extralinguistic world and the respective pre-texts.


Thus, an advertising slogan like “To smoke or not to smoke” for cigarettes (Mieder 1985: 126) can be interpreted as a statement about the world, expressing that the consumer has to decide between two alternative actions, or as an intertextual reference to Shakespeare’s Hamlet, additionally suggesting that the decision is essential to the consumer. By contrast, a literal, non-intertextual interpretation of references in non-cryptic clues as well as in the definition of cryptic puzzles does not lead to the answer, whereas intertextual references in the subsidiary indication of cryptic clues must be interpreted literally only. In both cases, the clues’ meaning is exhausted as soon as the answer has been identified. Intertextual references in puzzles can thus not be regarded as doubly referential.

The analysis of the corpus and the comparison with non-cryptic intertextual clues from a quality newspaper further identified various types of intertextual clue-answer pairs in non-cryptic and cryptic puzzles. These types typically establish intertextual relationships of different intensity and occur more frequently or even exclusively in one or the other sub-register of crosswords. Cryptic puzzles not only use intertextuality more often to encode the answer. Intertextual clue-answer pairs in cryptic puzzles also tend to require the activation of more comprehensive pre-textual knowledge than in non-cryptic puzzles. Furthermore, cryptic puzzles require knowledge of a greater variety of pre-texts and also of pre-texts which cannot be regarded as part of popular culture. Finally, well-known pre-texts like Shakespeare’s Hamlet are referred to for misleading the solver by activating easily accessible frames of knowledge.

4 Conclusion

While crosswords had never been studied in detail from a text linguistic perspective, the present paper established and analysed crossword puzzles as an independent register with non-cryptic and cryptic puzzles as distinct sub-registers. In addition, neither had referential intertextuality been investigated as a characteristic of crosswords, nor had it been considered as a linguistic feature relevant for register analysis. Thus, Biber and Conrad only mention references to previous scientific publications or postings in chatgroups (2009: 68, 289), but no other types of intertextuality. However, since intertextual clue-answer pairs occur on average more than once in every crossword in the present corpus (1.7 intertextual clue-answer pairs/puzzle), this paper proved intertextuality to be one important strategy of codification in this type of word game. Furthermore, intertextuality is used in a manner differing radically from other texts, formally as well as functionally. As a pervasive, frequent and distinctive linguistic feature of crosswords which is related to the purposes and the communicative situation characteristic of this register, intertextuality must be included in a register analysis of this type of puzzle according to the framework by Biber and Conrad (2009). It might also turn out to be relevant for the analysis of other registers.

Moreover, the present corpus study revealed considerable differences in the way non-cryptic and cryptic puzzles employ intertextual references. It thus confirmed the distinction between two sub-registers of crosswords. In non-cryptic clues, intertextuality typically supports the unambiguous solution of the clue and demands only superficial pre-textual knowledge. In cryptic crosswords, by contrast, intertextual references and even quotation clues are more frequent, despite the latter having been officially banned in 1995. 23 cryptic clue-answer pairs of the corpus such as (40) or (32) even contain references to two or three pre-texts. Thus, cryptic puzzles more frequently require the activation of pre-textual frames than non-cryptic puzzles and these frames need to be more detailed. Cryptic crosswords also feature references which are formally more variable and, at least initially, lead to ambiguities which account for part of the cryptic character of this sub-register. What is specific to cryptic puzzles is the reference to well-known pre-texts in the subsidiary indication for misleading the solver. However, the present corpus permits no conclusion as to whether the pre-textual categories targeted by crosswords are dependent on the type of sub-register or the expected knowledge of the target readership of the newspapers in which these puzzles are published (or both). Thus, further corpus studies should be undertaken to specifically examine this correlation.

Bibliography

Augarde, Tony. 2003. The Oxford guide to word games. Oxford: Oxford University Press.

Barthes, Roland. [1968] 1977. The death of the author. In Roland Barthes, Image music text, 142–148. London: Fontana Press.

Barthes, Roland. 1973. Le plaisir du texte. Paris: Editions du Seuil.

Beaugrande, Robert-Alain de & Wolfgang Ulrich Dressler. 1981. Introduction to text linguistics. London & New York: Longman.

Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge University Press.

Biddlecombe, Peter. 2009. Yet another guide to cryptic crosswords. http://www.biddlecombe.demon.co.uk/yagcc/ (accessed 27 January 2015).

Browne, Richard. 2009. The Times crossword book 13. London: Times Books.

Bühler, Karl. [1934] 1982. Sprachtheorie: Die Darstellungsfunktion der Sprache. Stuttgart & New York: Gustav Fischer Verlag.

Coffey, Steve. 1998. Linguistic aspects of the cryptic crossword. English Today 14(1). 14–18.

Cornell, Alan & Marion Cornell. 1980. Fragen und Antworten im englischen Kreuzworträtsel. In Ernst Burgschmidt (ed.), Beiträge zu einer Linguistischen Landeskunde und Sprachpraxis, 44–63. Braunschweig: Verlag E. Burgschmidt.

Derrida, Jacques. 1972. Positions: Entretiens avec Henri Ronse, Julia Kristeva, Jean-Louis Houdebine, Guy Scarpetta. Paris: Les Editions de Minuit.

Dienhart, John M. 1998. A linguistic look at riddles. Journal of Pragmatics 31. 95–125.

Fix, Ulla. 2011. Das Rätsel: Bestand und Wandel einer Textsorte. Oder: Warum sich die Textlinguistik als Querschnittsdisziplin verstehen kann. In Ulla Fix (ed.), Texte und Textsorten – sprachliche, kommunikative und kulturelle Phänomene, 185–214. 2nd edn. Berlin: Frank & Timme.

Furthmann, Katja. 2006. Die Sterne lügen nicht: Eine linguistische Analyse der Textsorte Pressehoroskop. Göttingen: V&R unipress.

Geeraerts, Dirk & Hubert Cuyckens (eds.). 2007. The Oxford handbook of cognitive linguistics. Oxford: Oxford University Press.

Genette, Gérard. 1982. Palimpsestes: La littérature au second degré. Paris: Éditions du Seuil.

Gilbert, Val. 2001. The Daily Telegraph: How to crack the cryptic crossword. London: Pan Books.

Goldblum, Naomi & Ram Frost. 1987. The crossword puzzle paradigm: The effectiveness of different word fragments as cues for the retrieval of words. Haskins laboratories status report on speech research SR-89/90. 133–146.

Greenblatt, Stephen (ed.). 1997. The Norton Shakespeare. Based on the Oxford Edition. London: W. W. Norton & Company.

Greimas, Algirdas Julien. 1970. L’écriture cruciverbiste. In Algirdas Julien Greimas (ed.), Du sens: Essais sémiotiques, 285–307. Paris: Éditions du Seuil.

Halliday, Michael A. K. & Ruqaiya Hasan. 1976. Cohesion in English. London: Longman.

Hambrick, David Z., Timothy A. Salthouse & Elizabeth J. Meinz. 1999. Predictors of crossword puzzle proficiency and moderators of age–cognition relations. Journal of Experimental Psychology: General 128(2). 131–164.

Hebel, Udo J. 1991. Towards a descriptive poetics of allusion. In Heinrich F. Plett (ed.), Intertextuality, 135–164. Berlin & New York: Walter de Gruyter.

Heinemann, Margot. 2000. Textsorten des Alltags. In Klaus Brinker, Gerd Antos, Wolfgang Heinemann & Sven F. Sager (eds.), Text- und Gesprächslinguistik. Ein internationales Handbuch zeitgenössischer Forschung, 604–614. Berlin & New York: Walter de Gruyter.

Helbig, Jörg. 1996. Intertextualität und Markierung: Untersuchungen zur Systematik und Funktion der Signalisierung von Intertextualität. Heidelberg: Universitätsverlag C. Winter.

Karrer, Wolfgang. 1985. Intertextualität als Elementen- und Struktur-Reproduktion. In Ulrich Broich & Manfred Pfister (eds.), Intertextualität: Formen, Funktionen, anglistische Fallstudien, 98–116. Tübingen: Niemeyer.

Kristeva, Julia. 1968. Le texte clos. Langages 12. 103–125.

Mieder, Wolfgang. 1985. Sprichwort, Redensart, Zitat: Tradierte Formelsprache in der Moderne. Bern, Frankfurt am Main & New York: Peter Lang.

Mok, Quirinus Ignatius Maria. 1987. Mots croisés et ambiguïté. In Brigitte Kampers-Manhe & Co Vet (eds.), Études de linguistique Française offertes à Robert de Dardel par ses amis et collègues, 97–108. Amsterdam: Éditions Rodopi B. V.

Mollica, Anthony. 2007. Crossword puzzles and second-language teaching. Italica 84(1). 59–78.

Moorey, Tim. 2008. How to master the Times crossword: The Times cryptic crossword demystified. London: Harper Collins Publishers.

Müller, Wolfgang G. 1991. Interfigurality: A study on the interdependence of literary figures. In Heinrich F. Plett (ed.), Intertextuality, 101–121. Berlin & New York: Walter de Gruyter.

Nickerson, Raymond S. 2011. Five down, absquatulated: Crossword puzzle clues to how the mind works. Psychonomic Bulletin & Review 18. 217–241.

N.N. 2009. The Sun two-speed crossword book 10. London: Harper Collins.

Parker, Timothy. 15.04.2013. Universal crossword. New York Post. New York: News Corporation.

Partington, Angela (ed.). 1992. The Oxford dictionary of quotations. 4th edn. Oxford & New York: Oxford University Press.

Partridge, John G. 1992. Linguistic reflections on the English crossword puzzle. In Claudia Blank (ed.), Language and civilization. A concerted profusion of essays and studies in honour of Otto Hietsch, 495–504. Frankfurt am Main: Peter Lang.

Pepicello, William J. 1980. Linguistic strategies in riddling. Western Folklore 39(1). 1–16.

Pfister, Manfred. 1985. Konzepte der Intertextualität. In Ulrich Broich & Manfred Pfister (eds.), Intertextualität: Formen, Funktionen, anglistische Fallstudien, 1–30. Tübingen: Niemeyer.

Pham, Teresa. 2014. Intertextuelle Referenzen auf Shakespeare. Eine kognitiv-linguistische Untersuchung. Münster: LIT Verlag.

Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech & Jan Svartvik. 1985. A comprehensive grammar of the English language. Harlow: Longman.

Riffaterre, Michael. 1981. Interpretation and undecidability. New Literary History 12(2). 227–242.

Rolf, Eckard. 1993. Die Funktion der Gebrauchstextsorten. Berlin & New York: de Gruyter.

Rusbridger, Alan (ed.). 11.–16.05.2013. Quick crossword No. 13,418–13,422. London: Guardian Media Group.

Scheufele, Dietram A. & David Tewksbury. 2007. Framing, agenda setting, and priming: The evolution of three media effects models. Journal of Communication 57. 9–20.

Schlepper, Wolfgang. 1981. Confusing poet makes fine stuff (5): The “wrestle with words and meanings” in the crossword puzzle. In Hans-Jürgen Diller, Stephan Kohl, Joachim Kornelius, Erwin Otto & Gerd Stratmann (eds.), anglistik & englischunterricht. Vol. 15, 61–80. Trier: WVT Wissenschaftlicher Verlag Trier.

Schubert, Christoph. 2012. Englische Textlinguistik. Eine Einführung. 2nd edn. Berlin: Erich Schmidt.

Schulte-Middelich, Bernd. 1985. Funktionen intertextueller Textkonstitution. In Ulrich Broich & Manfred Pfister (eds.), Intertextualität: Formen, Funktionen, anglistische Fallstudien, 197–242. Tübingen: Niemeyer.

Scott, W. T. & H. O’Donnell. 1998. Recovering meaning from chaos? Word play and the challenge of sense. In William Pencak & J. Ralph Lindgren (eds.), New approaches to semiotics and the human sciences: Essays in honor of Roberta Kevelson, 203–239. New York: Peter Lang Publishing.

Simpson, John A. & Edmund S. C. Weiner (eds.). 2015. Oxford English dictionary online. Oxford: Oxford University Press. http://oed.com/ (accessed 27 January 2015).

Skinner, Kevin. 2008. How to solve cryptic crosswords. London: Right Way, Constable & Robinson.

Stephenson, Hugh. 2007. Secrets of the setters. How to solve the Guardian crossword. London: Atlantic Books.

Stocker, Peter. 1998. Theorie der intertextuellen Lektüre: Modelle und Fallstudien. Paderborn: Ferdinand Schöningh.

Stratmann, Gerd. 1995. Kreuzworträtsel. In Rüdiger Ahrens, Wolf-Dietrich Bald & Werner Hüllen (eds.), Handbuch Englisch als Fremdsprache (HEF), 192–195. Berlin: Erich Schmidt.

Underwood, Geoffrey, Caroline Deihim & Viv Batt. 1994. Expert performance in solving word puzzles: From retrieval cues to crossword clues. Applied Cognitive Psychology 8. 531–548.

Weisskirch, Robert S. 2006. An analysis of instructor-created crossword puzzles for student review. College Teaching 54(1). 198–201.

Witte, Kenneth L. & Joel S. Freund. 1995. Anagram solution as related to adult age, anagram difficulty, and experience in solving crossword puzzles. Aging, Neuropsychology, and Cognition 2(2). 146–155.

Section II: Cross-register comparison

While the studies in Section I concentrated on single registers, Section II provides cross-register comparisons, in which the distinctive features and markers of registers can be identified with great accuracy and perspicuity by means of juxtaposition. As the contributions will show, such comparisons are particularly revealing when the registers under discussion are from clearly divergent domains. The fact that each of the three papers in Section II includes academic writing demonstrates that this register is highly distinctive and therefore well-suited as a yardstick for text-linguistic collation.

Christina Sanchez-Stockhammer’s study “Punctuation as an indication of register: Comics and academic texts” establishes a link to the papers by Rolf Kreyer and Teresa Pham in Section I, since it also analyses a register from popular culture, in this case the language of comics. At the same time, this contribution enters uncharted linguistic territory by focusing on punctuation as a register marker, which has been widely neglected so far despite its pervasiveness in written discourse. The study is based on two small-scale corpora, namely AcadText, a corpus of journal articles, and CoCo, a comic corpus, both of which were designed and compiled for register comparison by the author. It is shown that different punctuation marks have varying functions and deviant frequencies in relation to the written or spoken mode prominent in the registers. As a result, features of punctuation are suggested as a valid and necessary extension of Biber and Conrad’s (2009) model of register analysis.

In her paper “Linking up register and cognitive perspectives: Parenthetical constructions in academic prose and experimentalist poetry”, Martina Lampert chooses a specific linguistic feature as the standard of register comparison. By concentrating on the syntactic construction of parenthesis, she draws an analogy between a minimalist poem by E. E. Cummings and a scientific research paper within the framework of a microscopic qualitative analysis. She picks two registers which are located at the opposite ends of a continuum of written discourse and pays attention to punctuation marks as well, in this case to parenthetical round brackets (so-called lunulae). Since situational features of register description are closely linked to cognitive principles, a correspondence is established between Biber’s register analysis and Leonard Talmy’s cognitive semantic approach. Lampert concludes by arguing that parenthesis should be included in Biber and Conrad’s (2009) list of lexico-grammatical features relevant to register investigation.

The study “Cohesive devices across registers and varieties: The role of medium in English” by Stella Neumann and Jennifer Fest combines the comparative analysis of academic writing, administrative writing, broadcast discussions, conversations and exams with regional variation. The term “regional” is used here and in the paper in the broader Hallidayan sense grouping variation by the speakers’ geographical background as opposed to functional variation varying by context of use, not by user. Based on data from the International Corpus of English, functional variation is investigated within the six L1 and L2 Englishes of Singapore, Hong Kong, India, Canada, Jamaica and New Zealand. An examination of the lexico-grammatical features of pronouns, conjunctions and lexical density sheds new light on the use of cohesive ties across both varieties and registers. In particular, quantitative surveys show that there are significant differences in the frequency of the cohesive items between spoken and written registers. Along these lines, it becomes obvious that an exhaustive discussion of any regional or national variety of English needs to take into account register variation as well, so that text linguistics is shown to be an indispensable complement to sociolinguistics. Moreover, this paper builds a bridge to Section III, in which the interrelation between regional and register variation is further elucidated.

Christina Sanchez-Stockhammer

Punctuation as an indication of register: Comics and academic texts

Abstract: The currently most established definition of a register is the one developed by Douglas Biber in numerous publications (e.g. Biber 1988, 1995, 2006), namely “a variety associated with a particular situation of use” (Biber and Conrad 2009: 6). The delimitation of individual registers such as telephone conversations or newspaper editorials is based on their situational context, their lexical and grammatical characteristics and the functional relationship obtaining between context and language.

While Biber’s multidimensional approach already considers a multitude of different lexico-grammatical features as potential indicators of register, this paper adds a new perspective by exploring a feature type which has not been taken into account so far in the different versions of his model, namely punctuation.

After discussing the functions of various punctuation marks, the paper presents the corpus-based evidence of a small-scale study on two registers tending towards the extremes of the spoken – written dimension, namely academic texts and comics. To this end, the corpus AcadText was compiled for the present study by analogy to the comic component of the comic corpus CoCo (described in Sanchez-Stockhammer 2012), which comprises excerpts from Superman, Batman and Uncle Scrooge and considers the text occurring in text boxes with narration, inside speech bubbles, as onomatopoeia superimposed on the pictures etc.

The results show that some punctuation marks (such as exclamation marks and round brackets) correlate strongly with spoken and written style respectively and barely occur in the contrasting register. Furthermore, even in those cases where the results are quantitatively similar, differences in usage become obvious upon closer consideration – e.g. the dominant use of commas after introductory interjections or proper nouns with vocative function in comics as compared to more varied uses of that punctuation mark in academic texts. These results suggest that punctuation is indicative of register indeed, and that it makes sense to introduce punctuation as an additional category in Biber’s register model.

1 Introduction: Punctuation and register

Many features of language occur in both speech and writing, but some are specific to one of these two modalities: thus phonetic assimilation phenomena and intonation are by necessity restricted to the spoken modality, since they concern auditory phenomena, whereas punctuation immediately comes to mind as a visual linguistic feature that only occurs in writing. While it is sometimes claimed that punctuation acts as a substitute for prosody and pauses in the written modality, Meyer (1987: 69) notes that “punctuation is at best a rather crude reflection of the complexities of prosody” and that the relation between the two is unsystematic. Thus commas are sometimes but not always used in contexts where one would expect a pause in speech – and sometimes they occur in contexts without a prosodic juncture: for example, the sentence

(1) Those who are fond of sleeping late make unreliable workers.

is usually spoken with a pause after late, but it does not contain a comma if common spelling conventions are adhered to (Meyer 1987: 70). By contrast, the sentence

(2) A couple of the males made good comedy, too.

is realised with a comma but arguably not produced with a pause in speech (Meyer 1987: 71). This raises the question whether the reverse relation between punctuation as the primary feature and prosody as its realisation in speech can also be postulated. One of the few exceptions where it is claimed that a feature of the written modality is rendered in oral communication are so-called air quotes, which are drawn into the air manually while speaking and which “intermodally” refer the listeners/receivers to the printed source of a spoken quote (cf. Lampert 2013). However, air quotes rely on visual gestures rather than prosody. By contrast, punctuation marks are produced orally on some occasions, such as when separating whole numbers from decimals, e.g. in

(3) one point five.

However, in such cases it is actually the terms referring to the punctuation marks that are realised in the spoken modality rather than the corresponding function of the punctuation mark. A third conceivable option regarding the relation between punctuation and prosody is that there is none: for instance, Nunberg (1990: 7) argues that punctuation has no correspondence in speech and that it exploits “the particular expressive resources that graphical presentation makes available” in order to serve the requirements of written communication. Yet whatever the relation between speech and writing is recognised to be, what remains is that punctuation constitutes a characteristic feature of written language. This raises the question whether it is possible to recognise general principles of punctuation underlying all written language use, which are common to all written registers, or whether it is more appropriate to consider more specific tendencies in the use of punctuation marks in particular communicative situations. For instance, the combination <…???!!!> seems acceptable in the sentence

(4) She did what…???!!!

whereas one is extremely unlikely to encounter an example such as

(5) The increasing evidence that language processing is sensitive to lexical and structural co-occurrences at different levels of granularity and abstraction has led to the hypothesis that lexical and structural processing may be unified…???!!!

in actual language usage – at least not in the original context of use.1 Various explanations can be advanced for this: for instance, the first sentence is very short and therefore lends itself to the incredulous intonation associated with such a cluster of punctuation marks far better than the second sentence with its complex structure. Possibly even more importantly, the second sentence contains information situating it in the register of written academic language (it has been adapted from Snider’s 2009 article “Similarity and structural priming”), and it would seem that the punctuation above is unusual for an academic text to say the least. This discrepancy between the constructed example above and readers’ expectations suggests that language users tend to expect particular types of punctuation mark and their combination in some types of text rather than in others. If that is the case, then it should also be possible to use punctuation marks as an indication or even marker of individual registers – a hypothesis that will be explored in the remainder of this paper.

1 By contrast, it is conceivable to encounter the example in an online discussion forum or blog with reference to unclear academic writing. (I am grateful to my anonymous reviewer for pointing this out to me.) In that case, however, the sentence (which is a quotation) and the punctuation marks (which represent a comment) are situated on different linguistic levels. This is yet another example of the more general observation that texts with a metalinguistic function in the sense of Jakobson (1985) may depart from common usage. As a consequence, texts on linguistics should ideally be avoided in the compilation of general-language corpora.

Following Peters (2004: 447), the present contribution distinguishes between word punctuation (comprising e.g. hyphens and apostrophes occurring within unspaced sequences of letters) and sentence punctuation, and it concentrates on the latter. Sentence punctuation is usually characterised by the use of a space on that side of the punctuation mark which is not directly attached to a preceding or following sequence of letters and comprises

full stop .
question mark ?
exclamation mark !
comma ,
semicolon ;
colon :
dash –
slash /
suspension dots …
single quotation marks ‘ ’
double quotation marks “ ”
round brackets ( )
square brackets [ ] .

Register as the second concept which needs to be defined for the empirical study presented here is used with different meanings in the literature (cf. Schubert, this volume). The most commonly used definitions of register are based on the work of Douglas Biber. In numerous publications (e.g. Biber 1988, 1995, 2006), his use of the term has developed from what might be called a synonym of genre (Biber 1995: 910 about Biber 1988) to “a variety associated with a particular situation of use” (Biber and Conrad 2009: 6), i.e. a concept comprising all situation-dependent variation in language use, regardless of the level of specialisation (Biber and Conrad 2009: 32), but with specific sub-registers displaying less variation than more general registers (Biber and Conrad 2009: 33). In Biber’s model (for a summary cf. Schubert, this volume), register features occur throughout texts from a particular register and are more frequent in the target register than in most other registers. Thus the passive voice is not restricted to academic writing and may occur in different types of text, but it is particularly frequent in that register. Register features can be structures on any linguistic level, from words to syntactic constructions. The occurrence of specific lexico-grammatical features in registers is attributed to their functionality (Biber 2006: 11): they are believed to be “particularly well suited to the purposes and situational context of the register” (Biber and Conrad 2009: 6). The co-occurrence of features is therefore interpreted as reflecting their shared functions (Biber 1995: 30). With regard to the features under consideration, Biber’s approach has evolved in the course of time:


– Both Biber (1988: 73–75) and Biber (1995: 94–96) consider 16 major categories comprising 67 linguistic features:

1) Tense and aspect markers
2) Place and time adverbials
3) Pronouns and pro-verbs
4) Questions
5) Nominal forms
6) Passives
7) Stative forms
8) Subordination features
9) Prepositional phrases, adjectives and adverbs
10) Lexical specificity
11) Lexical classes
12) Modals
13) Specialised verb classes
14) Reduced forms and dispreferred structures
15) Co-ordination
16) Negation.

– These are reduced to seven major categories in Biber (2006: 241):

1. vocabulary distributions (e.g., the number of different words in classroom teaching versus textbooks), including the distributional classifications of words from the four content word classes (e.g., common vs. rare nouns, common vs. rare verbs);
2. grammatical part-of-speech classes (e.g., nouns, verbs, first and second person pronouns, prepositions);
3. semantic categories for the major word classes (e.g., activity verbs, mental verbs, existence verbs);
4. grammatical characteristics (e.g., nominalizations, past tense verbs, passive voice verbs);
5. syntactic structures (e.g., that relative clauses, to complement clauses);
6. lexico-grammatical associations (e.g., that-complement clauses and to-complement clauses controlled by communication verbs vs. mental verbs);
7. lexical bundles – i.e. recurrent sequences of words.

– Biber and Conrad (2009: 78–82), by contrast, classify their 75 subcategories (some of which can be split up further) into 15 major categories:

1) Vocabulary features
2) Content word classes
3) Function word classes
4) Derived words
5) Verb features
6) Pronoun features
7) Reduced forms and dispreferred structures
8) Prepositional phrases
9) Coordination
10) Main clause type
11) Noun phrases
12) Adverbials
13) Complement clauses
14) Word order choices
15) Special features of conversation.

Without going into detail what these various categories represent precisely, it becomes immediately obvious that punctuation or other orthographic characteristics (such as capitalisation) do not figure among the distinctive features treated in any of Biber’s approaches, in spite of the fact that Biber (1995: 29) maintains that “[a]ny linguistic feature having a functional or conventional association can be distributed in a way that distinguishes among registers”.

This raises the question whether there are any arguments supporting the deliberate exclusion of punctuation as a distinctive feature. Based on Biber’s definition above, one might consider arguing that punctuation does not constitute a linguistic feature – but this is hard to maintain: while punctuation is restricted to the written modality, it is used nonetheless to represent linguistic meaning (cf. below). Punctuation marks may even reverse the meaning of a sentence completely; compare

(6) The Democrats say the Republicans are sure to win the next election.

in which the Republicans are the assumed victors, as against

(7) The Democrats, say the Republicans, are sure to win the next election.

In the second example, the Democrats are expected to be victorious (cf. Runkel and Runkel 1984: 34). In view of its meaning-distinguishing function, punctuation should consequently be considered a linguistic feature.

If punctuation had no conventional or functional association, as required by the definition of linguistic features above, it should be possible to use all punctuation marks interchangeably. This is, however, not the case (cf. the next section).

Since punctuation is restricted to writing, using it as a feature would seem to have the disadvantage of disregarding all registers belonging to spoken language. This is, however, only true to a certain extent, since spoken texts may be transcribed (e.g. in interviews for magazines or in corpora), and punctuation is conventionally inserted for the convenience of the reader in such cases. The relation between the two dimensions is clarified by Söll and Hausmann (1985: 17), who distinguish between the medium of realisation (auditory vs. visual code) as opposed to the characteristics of conception (spoken vs. written style). Punctuation is thus only present in the visual code but may be used in texts belonging both to the spoken or written style. Söll and Hausmann’s distinction is thus useful e.g. in view of the possibilities offered by computer-mediated communication, which may use the visual code but some kind of spoken style. Note also that Biber and Conrad’s (2009: 78–82) long list of linguistic features includes a subcategory “Special features of conversation”, which is restricted to a subgroup of registers with a tendency towards oral realisation and includes e.g. pauses, fillers and backchannels. As a consequence, the addition of a subcategory “Punctuation”, which applies to registers in the visual code only, would appear to be legitimate.

Furthermore, one should not overlook the fact that Biber and Conrad (2009: 63) speak of a “list of features that you might consider” in register analysis, which means that they do not claim completeness. They also state that “[C]onsulting a corpus-based reference grammar is useful for deciding which features to study”. Since punctuation is only marginally treated in such grammars, possibly in view of written language’s widely assumed status as a secondary system (cf. e.g. Bloomfield 1933: 21), this may have led to its omission from the most influential model of register so far.

To conclude, there are no convincing reasons for excluding punctuation as a possible register feature. Instead, it is argued in the following that there are several good reasons for considering it.

2 Functions of the punctuation marks

Biber’s approach is based on the premise that “linguistic features co-occur in texts because they reflect shared functions” (Biber 1995: 30). This means that it should be possible to establish a link between the punctuation marks occurring in texts (and their functions) and the various lexico-grammatical register features discussed in the previous literature (with their corresponding functions linking them to the situational context and the communicative purpose of the respective register). If that were indeed the case, it should be possible to make an informed guess about (or even recognise) the register of a text based solely on the punctuation marks occurring in that text. The following illustrative passages are extracts from example texts used in Biber and Conrad (2009). Since these “illustrate the linguistic patterns found in previous large-scale analyses of these registers” (Biber and Conrad 2009: 64), they can be considered prototypical representatives of the corresponding registers and should also fulfil that role with regard to punctuation.

: . . . : ? : . : . : ? [ ] : ? : .

Figure 1: Punctuation from text A

Figure 1 constitutes a sequence of punctuation marks which were extracted from a short text (cf. below) by deleting everything except the punctuation marks. Spaces were then added to make the punctuation marks more clearly discernible. Even in this reduced format, which is void of any lexical or syntactic content, it is possible to form some idea about the communicative situation of the text. The task is made easier if paragraph breaks are conserved as well:

: . . .
: ?
: .
: .
: ? [ ]
: ?
: .

Figure 1a: Punctuation from text A with paragraph breaks
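The reduction behind Figures 1 and 1a can be approximated in a few lines of Python. The following is a minimal sketch and not the procedure actually used to produce the figures: the names `SENTENCE_MARKS`, `is_word_internal` and `punctuation_skeleton` are hypothetical, and treating any mark that has a letter or digit directly on both sides as word punctuation is a simplifying assumption (it would, for instance, also discard the slash in and/or).

```python
# Sentence punctuation marks as listed in the paper. Encoding the list as a
# set of single characters is an assumption made for this sketch.
SENTENCE_MARKS = set(".?!,;:–/…‘’“”()[]")


def is_word_internal(line: str, i: int) -> bool:
    """Return True if the character at index i has a letter or digit on both sides."""
    before = line[i - 1] if i > 0 else " "
    after = line[i + 1] if i + 1 < len(line) else " "
    return before.isalnum() and after.isalnum()


def punctuation_skeleton(text: str) -> str:
    """Delete everything except sentence punctuation, keeping line breaks."""
    skeleton_lines = []
    for line in text.splitlines():
        marks = [
            ch
            for i, ch in enumerate(line)
            if ch in SENTENCE_MARKS and not is_word_internal(line, i)
        ]
        # Spaces are inserted between marks to keep them discernible.
        skeleton_lines.append(" ".join(marks))
    return "\n".join(skeleton_lines)


if __name__ == "__main__":
    first_turns = (
        "Judith: Yeah I just found out that Rebekah is going to the "
        "University of Chicago to get her PhD. I really want to go visit her. "
        "Maybe I’ll come out and see her.\n"
        "Eric: Oh is she?\n"
        "Judith: Yeah."
    )
    print(punctuation_skeleton(first_turns))
    # : . . .
    # : ?
    # : .
```

Applied to the first three turns of Text A, the sketch yields the first three lines of Figure 1a; the apostrophe in I’ll is skipped because it sits inside an unspaced sequence of letters.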

The most striking feature is presumably the occurrence of a colon at the beginning of every line, which is followed by either full stops or a question mark, thereby suggesting an interactive communicative situation. Indeed, the text is part of a conversation between a group of friends walking to a restaurant, which is included in the Longman Spoken and Written English Corpus:

Judith: Yeah I just found out that Rebekah is going to the University of Chicago to get her PhD. I really want to go visit her. Maybe I’ll come out and see her.

Eric: Oh is she?

Judith: Yeah.

Eric: Oh good.

Elias: Here, do you want one? [offering a candy]

Judith: What kind is it?

Elias: Cinnamon.

Text A: Text sample 1.1 from the LSWE Corpus (Biber and Conrad 2009: 7–8)

The colons in the full text are actually not line-initial but follow the names of the speakers, just as they would in the scripted version of a play. Following the same type of convention, the information referring to the extra-linguistic context has been added in square brackets at the end of one line. The punctuation marks are thus strongly indicative of conversation.

: ! . : ? : ! : ! : < . > . ! : < . > ? : !

Figure 2: Punctuation from text B

The same is true of Figure 2. The large amount of exclamation marks, colons, question marks and (this time angled) brackets in Text B makes it highly unlikely that the text should be a tax declaration document or newspaper article. While the fact that it is an excerpt from a drama – i.e. scripted speech – and no transcript of a conversation cannot be deduced from punctuation alone, the oral dimension of the text emerges by analogy to Text A.

RUTH: I want to go! I promised Chris Burns I’d meet him.

BEATRICE: Can’t you understand English?

RUTH: I’ve got to go!

BEATRICE: Shut up!

RUTH: <Almost berserk.> I don’t care. I’M GOING ANYWAY!

BEATRICE: <Shoving RUTH hard.> WHAT DID YOU SAY?

TILLIE: Mother!

Text B: Text sample 1.7 from Biber and Conrad (2009: 20): Paul Zindel’s 1970 drama The Effect of Gamma Rays on Man in the Moon Marigolds
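The contrast between Texts A and B can also be expressed numerically. The sketch below is only an illustration under stated assumptions: it counts every full stop, question mark and exclamation mark in a text (including those inside stage directions) and reports each mark's share of the three taken together. The names `TERMINAL_MARKS` and `terminal_mark_shares` are hypothetical and do not stem from the study itself.

```python
from collections import Counter

# The three marks that typically close sentences (an assumption of this sketch).
TERMINAL_MARKS = ".?!"


def terminal_mark_shares(text: str) -> dict:
    """Return the share of '.', '?' and '!' among all three marks in the text."""
    counts = Counter(ch for ch in text if ch in TERMINAL_MARKS)
    total = sum(counts.values())
    if total == 0:
        # No terminal marks at all: report zero shares rather than dividing by zero.
        return {mark: 0.0 for mark in TERMINAL_MARKS}
    return {mark: counts[mark] / total for mark in TERMINAL_MARKS}


if __name__ == "__main__":
    text_b = "\n".join([
        "RUTH: I want to go! I promised Chris Burns I’d meet him.",
        "BEATRICE: Can’t you understand English?",
        "RUTH: I’ve got to go!",
        "BEATRICE: Shut up!",
        "RUTH: <Almost berserk.> I don’t care. I’M GOING ANYWAY!",
        "BEATRICE: <Shoving RUTH hard.> WHAT DID YOU SAY?",
        "TILLIE: Mother!",
    ])
    shares = terminal_mark_shares(text_b)
    print({mark: round(share, 2) for mark, share in shares.items()})
    # {'.': 0.36, '?': 0.18, '!': 0.45}
```

For Text B, exclamation marks account for five of the eleven terminal marks, whereas Text A contains six full stops, three question marks and no exclamation mark. Samples of this size can of course only illustrate the idea, not support any quantitative claim.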

This raises the question what typical register features are linked to punctuation. For example, the large number of first and second person pronouns typical of spoken conversation (Biber and Conrad 2009: 7–8) – which is supported by the prototypical extracts above – cannot be derived from punctuation. By contrast, another characteristic linguistic feature can: the pervasiveness of questions, which are usually marked by sentence-final question marks in many (but not all) transcripts of spoken language, e.g. in Text A, and in texts that are written to be spoken (e.g. Text B). The presence of question marks can thus be linked to the presence of questions: both are indicative of interaction (cf. Biber and Conrad 2009: 7–8). Since questions favour the production of answers as the privileged second pair part (Levinson 1983: 307), full stops following question marks are likely to represent not only statements but answers. This assumption is supported by Texts A and B above. According to Biber (1988: 227), questions “indicate a concern with interpersonal functions and involvement with the addressee”. It follows from this that they should be more frequent in registers involving that function, occurring e.g. more frequently in riddles2 than in front-page newspaper articles (cf. Biber and Conrad 2009: 7–8) and also in scripted or transcribed conversation.

The analysis of the four main types of pragmatic discourse function and the syntactic sentence types in Quirk et al. (1985: 803–804) with the punctuation marks used in the examples of that grammar reveals that this correlation is no coincidence: there is a strong link between
– statements (which mainly convey information), declaratives (in which the subject usually precedes the verb) and full stops, e.g. The Prime Minister resigned.
– questions (which usually seek information), interrogatives (characterised by inversion, e.g. of subject and operator, or sentence-initial wh-question words) and question marks, e.g. Did the Prime Minister resign? or What did the Prime Minister do?
– directives (which are mainly used to instruct someone to do something), imperatives (which have no subject and whose verb is in the base form) and exclamation marks, e.g. Leave me alone!
– exclamations (in which speakers express the extent to which they are impressed), exclamatives (which begin with what/how and usually have no subject-verb inversion) and exclamation marks, e.g. What a funny hat!

It therefore seems safe to claim that the punctuation marks closing sentences follow a prototype-based distribution (cf. e.g. Rosch 1973, 1975) with an ideal exemplar in the centre of the category and fuzzy boundaries in its periphery. The latter would include less typical uses, such as

(8) I’d love a cup of tea.

which is a declarative from the perspective of syntax but pragmatically a directive, inciting the hearer to serve a hot drink (Quirk et al. 1985: 804).

2 Note, however, that puzzles need not necessarily be phrased as questions, e.g. in the case of crosswords (cf. Pham, this volume), which tend not to use question marks.

The punctuation with a full stop in Quirk et al. (1985) for this particular example seems to suggest that in doubtful cases, punctuation follows the syntactic rather than the pragmatic perspective. While the use of an exclamation mark does not seem to be entirely excluded in this particular example (even if an informal internet search confirms the full stop as the norm), other indirect speech acts such as Searle’s (1975: 73) famous

(9) Can you pass the salt?

which is syntactically a question but actually a directive, definitely require the syntactically-based question mark. By contrast, the use of an exclamation mark making

(10) Can you pass the salt!

slightly more explicitly directive would seem quite unusual. As a consequence, we may conclude that there is a strong correlation between punctuation marks and particular grammatical structures – even more than with discourse functions, but often (in direct speech acts), both aspects will coincide.

The communicative purposes of a register determine its discourse functions and the syntactic structures associated with these – which are in turn linked to particular prototypical punctuation marks. However, some registers may simply not require particular types of expression: for instance, instruction manuals do not usually engage in mutual interaction with their readers. As a consequence, one would not expect them to contain any questions and therefore no question marks (except, possibly, the occasional rhetorical question to guide their readers more vividly).

Note, however, that the conventions of particular registers may require the use of particular punctuation marks in spite of communicative purposes or favoured syntactic sentence types which would prototypically result in the use of a different punctuation mark: thus recipes are directive and use a considerable number of verbs in the imperative (cf. Arendholz et al. 2013), but they rarely contain any exclamation marks. This would seem to imply that the conventions associated with particular registers can override more general punctuation tendencies.

The next extract of punctuation also belongs to a highly conventionalised register.

( ). ( ) , ( . , . , . ). , ( ; . ). , ( , ; . ; . ; . ; ). ( ) , ( . ). . . ( ) .

Figure 3: Punctuation from text C


This sample is not only characterised by its complete lack of question marks and exclamation marks but also by a large proportion of full stops and brackets, many commas and even some semicolons. It comes from the introduction to a scientific research article and is thus situated clearly towards the extreme of the written dimension of language conception.

Hybridization between species can severely affect a species status and recovery (Rhymer & Simberloff 1996). Threatened species (and others) may be directly affected by hybridization and gene flow from invasive species, which can result in reduced fitness or lowered genetic variability (Gilbert et al. 1993, Gottelli et al. 1994, Wolf et al. 2001). In other cases, hybridization may provide increased polymorphisms that allow for rapid evolution to occur (Grant & Grant 1992; Rhymer et al. 1994). Species can also be influenced indirectly, because hybridization may affect the conservation status of threatened species and their legal protection (O’Brien & Mayr 1991a, 1991b; Jones et al. 1995; Allendorf et al. 2001; Schwartz et al. 2004; Haig & Allendorf 2005). The Northern Spotted Owl (Strix occidentalis caurina) is a threatened subspecies associated with rapidly declining, late-successional forests in western North America (Gutierrez et al. 1995). Listing of this subspecies under the U.S. Endangered Species Act (ESA) attracted considerable controversy because of concern that listing would lead to restrictions on timber harvest.

Text C: Text sample 6.13 from Biber and Conrad (2009: 163): Scientific research article (Genetic identification of Spotted Owls … , Conservation Biology, 2004).

While scientific research attempts to answer research questions, these are usually formulated indirectly, with the consequence that the number of direct questions and the ensuing question marks is relatively low (although not necessarily zero). Exclamation marks, by contrast, seem to be practically excluded in this register. This is presumably because the discourse functions usually associated with that punctuation mark (cf. Quirk et al. 1985: 803–804 above) contradict the general principles of academic research: it is neither directive (at least not overtly) nor concerned with the expression of emotions such as being impressed. These conventions are communicated between researchers, e.g. by supervisors marking their students’ papers or by means of style guides.3

The occurrence of large numbers of full stops is not only due to the focus of research papers on transmitting information but also to the frequent occurrence in this particular passage of the abbreviation et al., which is rarely found outside academia. The use of brackets is also highly conventionalised: with few exceptions containing additional explanations, most brackets contain references to other texts. This supports the view that particular punctuation marks tend to correlate with particular registers, and that some punctuation marks are employed following register-specific conventions which are particularly adequate for the communicative needs of the register in question. In academic research, this includes the need to refer to previous research in a clear and unobtrusive way.

3 Note, however, that very popular style guides giving advice on academic research, such as Booth et al. (2008), do not mention punctuation (merely style), and others, such as Swales and Feak (2010: 27), limit themselves to the discussion of semicolons, colons, dashes and commas.

If we take all of the above into account, a question that emerges is whether there are any general functions of punctuation marks which may be put to specific ends in individual registers. According to Huddleston and Pullum (2002: 1729–1730), punctuation can be ascribed four main functions from a general perspective:
– indicating boundaries (e.g. full stops mark the end of sentences)
– indicating status (e.g. question marks indicate that a sentence is a question)
– indicating omission (e.g. …)
– indicating linkage (e.g. commas mark that units belong together).4

A more specific but nonetheless brief overview of the functions of individual punctuation marks is provided by Seely (2007: 16–124): the
● full stop
  ○ marks the ends of sentences
  ○ marks complete groups of words
  ○ ends abbreviations
  ○ acts as a separator in e-mail and website addresses
● question mark
  ○ marks the end of a question
  ○ marks statements as doubtful or questionable, e.g. in brackets
● exclamation mark
  ○ ends exclamations
  ○ ends loud or shouted direct speech
  ○ ends sentences expressing amusement
  ○ is used in brackets to express amusement or irony
● comma
  ○ separates items in lists
  ○ encloses sentence parts parenthetically
  ○ marks the divisions between the clauses in complex sentences
  ○ separates sections of sentences or numbers consisting of more than four digits to make them easier to read
  ○ introduces or ends direct speech
● semicolon
  ○ lists items which are very long
  ○ marks a break between two parts of a sentence, which are usually finite clauses that could stand on their own, in order to show the close link between them
● colon
  ○ introduces lists
  ○ introduces direct speech or quotations
  ○ separates two parts of a sentence of which the first leads on to the second
● dash
  ○ encloses sentence parts parenthetically
  ○ introduces something which further develops or exemplifies what has been written before
  ○ introduces asides by the writer
  ○ shows interruptions or break-offs in mid-sentence (in direct speech)
● slash
  ○ indicates alternatives
  ○ shows a range
  ○ is used in some abbreviations (e.g. c/o)
● suspension dots
  ○ reduce the length of quotations
  ○ show incompleteness in direct speech
● quotation marks
  ○ separate direct speech, titles or quotations or ideas marked as not being the author’s
● brackets
  ○ indicate that the words enclosed within are not essential to the meaning of the sentence but provide supplementary information.

4 For a more detailed theoretical account of the guide functions of punctuation cf. Patt (2013).

Even if this account necessarily simplifies a more complex situation, it provides a good point of departure for the consideration of more specific uses of the punctuation marks.

Since full stops are used at the end of statements, they seem to represent a relatively unmarked punctuation mark. They do, however, change their function and become more marked as soon as they are combined into suspension dots, which signal omission.

Question marks are apparently only placed at the end of direct questions, and direct questions always end with a question mark. Even the seemingly exceptional sceptical use listed above can be interpreted as shorthand for a question such as “Is that true?”, e.g. in

(11) There is no such thing as a free lunch. (?)


In most other cases, however, the relation is not as unequivocal, because the punctuation marks have several functions (some of which may overlap with the functions of other punctuation marks): as we have seen, colons can be used to set off the name of characters in a play from their text, but very frequently, they are followed by explanations or specifications and they can therefore commonly be found in registers with an argumentative function, such as academic papers. Alternatively, additional information may be included in brackets or following a dash,5 but different degrees of formality are associated with the various punctuation marks. According to Seely (2007: 84), brackets are “the most formal (and most obvious) way of showing parenthesis”, commas are “less forceful” and dashes “the least formal”. This seems to imply that a superficial analysis of punctuation marks does not suffice: it is not enough to simply count the number of commas, question marks etc. (not even if the number of words in the texts is taken into consideration), but it is also necessary to consider their individual functions and possibly even their stylistic value. This is the only means of identifying highly conventionalised register-specific uses, such as initial exclamation marks expressing negation (e.g. !interesting = not interesting) in “hacker-influenced interactions” (Crystal 2001: 90) or the specialised use of double quotation marks in comics (cf. below).

3 Punctuation in comics vs. academic texts

In order to confirm or reject the hypothesis that punctuation can serve as an indication of register and to identify register-specific usage of punctuation, a small-scale empirical study was conducted. Since register characteristics become most obvious if very different registers are analysed contrastively (Biber and Conrad 2009: 8), a register with a relatively strong tendency towards spoken conceptualisation (namely comics) was contrasted with a register tending towards the written extreme (namely academic texts). For the first of these, the comic component of CoCo, the Comic Corpus described in Sanchez-Stockhammer (2012), was used.

5 Cf. Lampert (this volume) for a detailed treatment of parenthesis.

Table 1: The Comic Corpus (CoCo) texts (cf. Sanchez-Stockhammer 2012: 68)

Text            Words   Sentences   Words per sentence
Batman          868     153         5.67
Superman        744     101         7.37
Uncle Scrooge   774     140         5.53

The language in comics considered in the compilation of CoCo occurs in headings, text boxes with narration, speech bubbles, thought bubbles and subtitles (common particularly in cartoons – which are part of CoCo but were not considered in the present study), as onomatopoeia superimposed on the pictures and as written language within the picture (e.g. inscriptions on signs; cf. Sanchez-Stockhammer 2012: 58–59). Combinations of punctuation marks were also encoded – notably suspension dots, which can also be considered a complex punctuation mark. Neither emoticons (e.g. < :-) >) nor obscenicons (e.g. <!?#*&>) as emotionally loaded combinations of punctuation marks occurred in the dataset.6 Non-linguistic semiotic means (such as the shapes of bubbles used to indicate that their content is spoken, thought, shouted etc.) were not taken into consideration, either.

The corpus of academic texts AcadText was compiled specifically for the present study. It contains three research articles from high-quality journals: one theoretical text (Schneider 2003), one empirical study (Juhasz et al. 2003) and one text by Biber and two co-authors, namely Susan Conrad and Randi Reppen (Biber et al. 1994).7

Following the same approach as in the compilation of the comic corpus wherever possible, all full sentences (including footnotes) and tables were taken from the first two pages with numbers ending in zero from each article. End-of-line hyphens were deleted and m-dashes flanked by spaces. Word-internal bracketing, e.g. in (semi-)automatic, was deleted so as not to skew the automated counts. While full stops, question marks and quotation marks counted as sentence endings, colons and semicolons were considered sentence-internal. Headings and rows in tables counted as one sentence each. It becomes immediately obvious that the number of words per sentence is considerably larger in the academic texts than in the comics.

6 While the absence of emoticons can be explained by the fact that the multimodality of comics permits the representation of facial expression in a more detailed manner by the drawn faces of the interlocutors, the absence of obscenicons from the corpus is presumably due to chance. However, since the expression of anger in comic strips seems to use mainly question marks and exclamation marks from the set of the punctuation marks, while frequently using symbols (e.g. <@>, <#>, <$>, <%>, <&> and <*>) and also drawings of spirals etc. (cf. Law 2010), the treatment of obscenicons belongs to the periphery of the use of punctuation marks anyway.

7 Since academic English is a register with a particularly strong lingua franca element and since all articles in AcadText come from high-quality journals and have consequently undergone intense editing, the native language of the authors was expected to play only a marginal role. While the individual author Schneider has a German-language background, either all or the majority of the authors of the jointly written articles were working at universities in English-speaking countries at the time of publishing.

Table 2: The Corpus of Academic Texts (AcadText)

Text                   Words   Sentences   Words per sentence
Biber et al. (1994)    892     35          25.49
Juhasz et al. (2003)   1,037   40          25.93
Schneider (2003)       1,103   25          44.12
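
The words-per-sentence column in Tables 1 and 2 can be recomputed from the word and sentence counts. The following is a minimal sketch: the function names are illustrative only, and the pooled ratio per corpus is a derived figure that is not reported in the tables themselves.

```python
from decimal import Decimal, ROUND_HALF_UP

# (words, sentences) per text, as reported in Tables 1 and 2.
CORPORA = {
    "CoCo": {
        "Batman": (868, 153),
        "Superman": (744, 101),
        "Uncle Scrooge": (774, 140),
    },
    "AcadText": {
        "Biber et al. (1994)": (892, 35),
        "Juhasz et al. (2003)": (1037, 40),
        "Schneider (2003)": (1103, 25),
    },
}


def words_per_sentence(words, sentences):
    """Mean sentence length in words, rounded half up to two decimals."""
    ratio = Decimal(words) / Decimal(sentences)
    return ratio.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def pooled_ratio(texts):
    """Words per sentence over all texts of one corpus (derived, not in the tables)."""
    total_words = sum(words for words, _ in texts.values())
    total_sentences = sum(sentences for _, sentences in texts.values())
    return total_words / total_sentences


for corpus, texts in CORPORA.items():
    for name, (words, sentences) in texts.items():
        print(corpus, name, words_per_sentence(words, sentences))
    print(corpus, "pooled:", round(pooled_ratio(texts), 2))
```

Decimal arithmetic is used here only so that values ending in 5 at the third decimal place round in the conventional way.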

Since language in comics is heavily constrained by spatial restrictions and mainly contains the written representation of spoken-style language from conversations between speakers, comics as a register should contrast with what is already known from previous research about more prototypically written registers – such as academic texts. From a statistical perspective, the hypothesis H1 is therefore that comics and academic texts should differ in their use of the punctuation marks. H0 is consequently that comics and academic texts do not differ in this respect.

In view of the assumed register specifics, we can formulate the following more specific expectations regarding punctuation in comics: one may expect
1. a relatively large proportion of question marks and exclamation marks (due to the spoken character of this register)
2. no quotation marks (because direct speech is already marked as such by its inclusion in speech bubbles)
3. few commas (because the sentences in comics are presumably relatively short due to spatial restrictions)
4. few semicolons (for the same reason as for the commas)
5. few colons (due to spatial restrictions and the fact that the speakers in a conversation are indicated by the pointed side of speech bubbles in contrast to usual scripted conversation)
6. fewer brackets than dashes (because these represent the most and least formal punctuation marks indicating parenthesis according to Seely 2007: 84)
7. a certain number of suspension dots (in order to permit longer sentences to continue in the following speech bubble).

By contrast, academic texts as a written register are expected to contain
1. a very small proportion of question marks and exclamation marks (due to the written character of this register)
2. a certain proportion of quotation marks (in order to mark passages that were taken over verbatim from another author)
3. many commas (because the sentences in academic texts are presumably relatively long due to the complexity of the subjects treated)
4. many semicolons (for the same reason as for the commas)
5. many colons (because these provide links between sentences and are also used to refer to precise pages in references)
6. more brackets than dashes (because these represent the most and least formal punctuation marks indicating parenthesis according to Seely 2007: 84)
7. a few suspension dots (signalling omission in quotations).

For the quantitative analysis of the punctuation marks, all letters and numbers in the original corpus texts were deleted, and the punctuation marks were counted semi-automatically by using the “replace” function in Microsoft Word. The results in Table 3 were normalised by dividing the absolute results by the number of words in the respective texts, then multiplying them by a thousand (in order to increase readability) and finally rounding them up or down to yield full numbers.
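
The procedure just described (deleting letters and numbers, counting the remaining marks, normalising per 1,000 words) can be approximated in a few lines of code. The sketch below is not the procedure used in the study, which relied on the “replace” function in Microsoft Word; the function names, the whitespace-based word count and the half-up rounding are assumptions made for illustration. It also treats each dot of a suspension-dot sequence as a full stop, whereas the study encoded suspension dots as a mark of their own.

```python
import re
from collections import Counter


def punctuation_skeleton(text):
    """Delete all letters and digits, keeping only the punctuation marks."""
    stripped = re.sub(r"[^\W_]", "", text)  # [^\W_] matches letters and digits
    return " ".join(stripped.split())       # collapse the leftover whitespace


def count_marks(text, marks=".?!,;:"):
    """Count each of the given punctuation marks in the text."""
    counts = Counter(ch for ch in text if ch in marks)
    return {mark: counts[mark] for mark in marks}


def normalise(count, n_words):
    """Occurrences per 1,000 words, rounded half up to a whole number."""
    return int(count / n_words * 1000 + 0.5)


# First line of Text B, with a straight apostrophe for simplicity.
sample = "RUTH: I want to go! I promised Chris Burns I'd meet him."
n_words = len(sample.split())  # assumption: words are whitespace-separated tokens

print(punctuation_skeleton(sample))
for mark, count in count_marks(sample).items():
    print(mark, count, normalise(count, n_words))
```

A single line is of course far too short for meaningful normalised values; the point is only to make the three processing steps explicit.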


Table 3: Normalised results (divided by the number of words per text, multiplied by 1,000 and rounded)

                                Comics                             Academic texts
                                Batman  Superman  Uncle Scrooge    Biber et al.  Juhasz et al.  Schneider
Full stops                      78      50        4                53            69             24
Question marks                  20      22        14               0             1              0
Exclamation marks               53      36        134              0             0              0
Commas                          60      62        37               62            72             71
Semicolons                      0       0         0                4             1              2
Colons                          0       1         0                1             0              11
Dashes                          16      3         0                1             0              1
Slashes                         0       0         0                4             0              0
Suspension dots                 40      43        18               0             0              1
Single quotation marks (pairs)  0       0         0                3             0              9
Double quotation marks (pairs)  2       5         1                0             0              1
Round brackets (pairs)          0       0         0                18            41             12
Square brackets (pairs)         0       0         0                0             0              0
Apostrophes                     58      43        71               1             4              5

For each line (i.e. for each punctuation mark), shaded cells indicate intra-group similarity and inter-group dissimilarity between comics and academic texts. This is either based on a very obvious difference in the results (e.g. for the suspension dots) or, in some cases, on the presence of at least two values larger than zero in one type of register as against all-zero in the three texts from the other register (e.g. for the semicolons).
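
The second of these two criteria is explicit enough to be checked mechanically against the values in Table 3. The sketch below implements only that criterion (at least two non-zero values in one register against all-zero values in the other); the first criterion, a “very obvious difference”, is a matter of judgement and is not modelled. The function name is illustrative, and the output shows only which rows satisfy this one rule, not which cells are shaded in the original table.

```python
# Normalised frequencies from Table 3:
# (Batman, Superman, Uncle Scrooge), (Biber et al., Juhasz et al., Schneider)
TABLE_3 = {
    "Full stops": ((78, 50, 4), (53, 69, 24)),
    "Question marks": ((20, 22, 14), (0, 1, 0)),
    "Exclamation marks": ((53, 36, 134), (0, 0, 0)),
    "Commas": ((60, 62, 37), (62, 72, 71)),
    "Semicolons": ((0, 0, 0), (4, 1, 2)),
    "Colons": ((0, 1, 0), (1, 0, 11)),
    "Dashes": ((16, 3, 0), (1, 0, 1)),
    "Slashes": ((0, 0, 0), (4, 0, 0)),
    "Suspension dots": ((40, 43, 18), (0, 0, 1)),
    "Single quotation marks": ((0, 0, 0), (3, 0, 9)),
    "Double quotation marks": ((2, 5, 1), (0, 0, 1)),
    "Round brackets": ((0, 0, 0), (18, 41, 12)),
    "Square brackets": ((0, 0, 0), (0, 0, 0)),
    "Apostrophes": ((58, 43, 71), (1, 4, 5)),
}


def exclusive_to(present, absent, min_nonzero=2):
    """True if `absent` is all zero and `present` has at least `min_nonzero` values above zero."""
    return all(v == 0 for v in absent) and sum(v > 0 for v in present) >= min_nonzero


comics_only = [m for m, (comics, acad) in TABLE_3.items() if exclusive_to(comics, acad)]
academic_only = [m for m, (comics, acad) in TABLE_3.items() if exclusive_to(acad, comics)]

print("Comics only:", comics_only)
print("Academic texts only:", academic_only)
```

Rows such as question marks or suspension dots fail this strict rule because a single academic text has a non-zero value, which is why the first, more impressionistic criterion is needed as well.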

Note that the number of quotation marks and brackets corresponds to the number of pairings of these punctuation marks. This is because it obligatorily takes two exemplars to set off parentheses – in contrast to dashes or commas, which may open a parenthesis closed by the final punctuation mark in a sentence, e.g. a full stop (cf. Lampert 2011: 91–92). While an alternative single-punctuation-mark use of brackets can be imagined, namely when a single closing bracket is employed to set off the introductory ordering letters in lists, such as


a) xx
b) yy
c) zz,

the fact that this type of usage did not occur in the corpus made it unnecessary to establish a more detailed distinction here. If the results from Table 3 are analysed in relation to the hypotheses formulated above, the following findings emerge:

(i) As expected, there is a marked difference in the use of question marks and exclamation marks in comics and academic texts: only one academic text contains a single question mark at the end of the sentence

(12) What function do beginning and ending lexemes assume in compound recognition?

and no text from this register uses any exclamation marks. This is in line with the usual correlation of these two punctuation marks with conceptually spoken language: all the comic texts contain both question and exclamation marks, although the proportion varies considerably, with normalised results ranging from 14 to 134 instances per 1,000 words.

(ii) The discussion of quotation marks requires a distinction between single and double quotation marks. As for the distribution of the single quotation marks, their analysis made it necessary to distinguish manually between single quotation marks and the formally identical apostrophes. Since apostrophes are word-internal punctuation marks, they were only included in the analyses because of this necessary distinction, but they actually yielded interesting results: while both academic texts and comic texts contain a small number of stylistically neutral genitives (4 in Superman, 3 in Batman, 2 in Uncle Scrooge), the majority of the large number of apostrophes in the comic texts mark either informal contractions (e.g. won’t) or omissions or shortenings characteristic of informal language usage, e.g.

(13) With a swoop to his left an’ a peck to th’ right, he catches rat finks way out west!

However, it seems that there is currently a tendency for an increasing number of academic texts to use contractions, too, e.g. Moore and Notz (2006: 236, Let’s) or Mithun (2012: 53, I’m).

No pairs of single quotation marks were used in the comic texts, as expected, but they occasionally occur in the academic writing (12 pairs in two texts). This result may also be variety-dependent to a certain extent: according to Seely (2007: 60–62), there is a tendency for British English usage to prefer single quotation marks over double quotation marks, whereas American English has the opposite tendency – codified e.g. in The MLA Style Manual (Achtert and Gibaldi 1985: 80).


Note, however, that the article by Schneider, which uses single quotation marks, appeared in Language, which is an American journal.

The analysis of the article by Biber et al. beyond the passage included in the corpus shows that a considerable proportion of single quotation marks enclose no quotations but paraphrases of meaning, e.g. in

(14) an analysis of adjectives marking ‘certainty’

or words which are used metalinguistically, e.g.

(15) any global characterizations of ‘General English’ should be regarded with caution

Contrary to expectations, double quotation marks are almost nonexistent in the AcadText corpus, with only one pair in one text:

(16) we need to remember that ‘nations are mental constructs, “imagined communities” ’ which are constructed discursively […] (Wodak et al. 1999:4).

and it becomes clear that these are merely used to mark quotation marks within a quotation whose reference is given later in the text; the convention being that single quotation marks are doubled in this case and vice versa (cf. Achtert and Gibaldi 1985: 80; Sanchez-Stockhammer, forthcoming).

While this quasi-absence of double quotation marks from AcadText may be attributed to the small size of the random sample or the conventions of individual publishers, chance cannot explain the other unexpected finding, namely the relative frequency of double quotation marks in the comic corpus (8 pairs; at least one per text). Since direct speech is already marked as such by its inclusion in speech bubbles, the double quotation marks must have a different function here: indeed, the quotation marks in the comics are used in their general (academic) function and serve to quote the speech of others. Thus the utterance

(17) Maybe next time, master Bruce.

is countered by

(18) Not “maybe”, Alfred.

Double quotation marks are also employed in the comics to refer to the metalinguistic use of words, e.g.

(19) Funny, I didn’t think you even knew the word “honest,” Penguin.


In Superman, double quotation marks are additionally used on some occasions in narrative boxes to indicate the direct speech or thought of a character not shown in the current panel itself, but whose identity can be deduced from context or from the fact that suspension dots are linking the end of an utterance marked with quotation marks to its beginning in a panel on the previous page (cf. below).

(iii) Contrary to expectations, no marked difference was observable in the use of commas: while the figures are lower for comics overall, they are still surprisingly close to the results obtained for the academic texts. However, a more detailed text-based analysis reveals that commas are mainly used with very specific functions in comics: a very large proportion separate off proper nouns with vocative function from the remainder of the sentence, e.g. in

(20) Toyman, you maniac!

This use is completely missing in the academic texts. Alternatively, commas occur after introductory interjections in the comics, e.g. in

(21) Man, would you look at THAT!

in another use that was not found in the academic writing. These register-specific uses explain why commas occur relatively frequently in the comic texts. The most frequent use of commas in comics which is also to be expected in academic texts (but is not too frequent in the sample) is the delimitation of sentence-initial adverbials, e.g. in

(22) According to the contract, they are RABBIT eggs for your children, King!

(iv) Semicolons, by contrast, only occur in the academic writing, e.g. in Schneider (2003):

(23) traces of the previous stage will still be found; that is, some insecurity remains

Since they are absent from the sample of comic texts – presumably due to the fact that most of their uses require relatively long sentences – they can generally be used as an indication of register with regard to the spoken/written dimension.

(v) Surprisingly, it was observed that the number of colons does not vary extremely between the comics and the academic texts considered. Only Schneider (2003) stands out, since it is the only one among the three academic texts to indicate the precise pages in text-internal references that do not affect quotations.

(vi) Neither sample contained any square brackets. As expected, not a single pair of round brackets was used in the comic corpus – in contrast to the academic texts, where brackets are commonly used to indicate references. The extremely large proportion in Juhasz et al. (2003) with 41 pairs of round brackets is due to the fact that a large part of the passage randomly included in the AcadText corpus is constituted by the results section, in which relevant figures and examples are added in brackets, e.g. in

(24) high-frequency beginning lexemes were responded to quicker than low-frequency beginning lexemes, t1(27) = ± 3.78, p < .01, t2(18) = ± 2.02, p = .059.

While the quasi-absence of dashes from the academic papers in contrast to a larger proportion in CoCo seems to support the view that there is a difference in formality between these two punctuation marks, the quantitative difference is not as marked as one might have expected. Furthermore, the analysis of the texts reveals that dashes are frequently used in consecutive pairs in the Batman comics and also in Superman, which raises the number of dashes. In many cases, the combination < -- > seems to indicate a longer pause, e.g. in

(25) But her insides are all right -- no bleeding there.

This use of dashes represents a function which is not usually required in academic texts.

(vii) The difference in frequency between the use of suspension dots in comics and academic texts is far more pronounced than expected: the only academic text using them is Schneider (2003) in one instance where omission in a quoted passage is indicated:

(26) ‘the discursive constructs of nations and national identities … primarily emphasize national uniqueness and intra-national uniformity but largely ignore intra-national differences’ (Wodak et al. 1999:4).

This is a use which is highly unlikely to occur in comics. However, the low frequency of suspension dots in the sample of academic texts seems to suggest that quotations are usually extracted in shorter portions and that omissions are avoided. This is supported by the quotations in AcadText, all of which represent extracts from individual sentences only, e.g. the following series of quotations from Schneider (2003):

(27) a case of ‘identity revision’ triggered by the insight that one’s traditional identity turns out to be ‘manifestly untrue’ or at least ‘consistently unrewarding’ (Jenkins 1996:95)


Comics, by contrast, use suspension dots very frequently (all texts employ them between 18 and 43 times) and often in order to create cohesion by their occurrence not only at the end of an utterance which is interrupted in one panel, e.g. in

(28) You might be stronger and faster than I am right now, Parasite…

but also at the beginning of the continued speech or thought in the next panel:

(29) …but you’ve barely had forty-eight hours to practice using my powers.

Such interruptions are not merely attributable to spatial restrictions, it seems, but also to the fact that the picture in the new panel corresponds more closely to the action indicated in the second part, such as a punch with a fist in the Superman example above.

The differences between the use of punctuation marks in the texts from the comic corpus and the academic texts are even more striking if considered graphically. Figure 4 summarises the features which are characteristic of comics (question marks, exclamation marks, suspension dots and apostrophes); Figure 5 those which are more typical of academic writing (semicolons, single quotation marks and round brackets).

Figure 4: Punctuation marks occurring more frequently in comics than in academic texts

It may therefore come as a surprise that this striking difference between the two registers cannot be backed statistically: non-parametric statistical tests for independent samples were carried out in SPSS in order to compare the medians between groups (i.e. comics vs. academic texts), but even the Mann–Whitney U test yielded no significant results for any of the variables (e.g. question marks) due to the small number of texts considered. Nonetheless, the graphically immediately obvious difference between comics and academic texts in Figures 4 and 5 permits the tentative conclusion that the use of punctuation in different registers can be employed as a register feature. At the same time, these results call for further empirical research, which is extremely likely to provide statistical backing for the more than obvious tendencies observed in this explorative study.
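
The effect of the small sample on the Mann–Whitney U test can be illustrated with an exact, permutation-based version of the test applied to the normalised question-mark values from Table 3. This is a sketch under stated assumptions: it is not the SPSS procedure used in the study (whose settings, e.g. exact vs. asymptotic p-values, are not reported), and the function names are illustrative.

```python
from itertools import combinations


def u_statistic(xs, ys):
    """Mann-Whitney U for xs: each pair with x > y counts 1, each tie 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)


def exact_two_sided_p(xs, ys):
    """Exact two-sided p-value obtained by enumerating all regroupings of the pooled data."""
    pooled = list(xs) + list(ys)
    expected = len(xs) * len(ys) / 2          # value of U under the null hypothesis
    observed = abs(u_statistic(xs, ys) - expected)
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(xs)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count regroupings that are at least as extreme as the observed one.
        if abs(u_statistic(group_a, group_b) - expected) >= observed - 1e-12:
            hits += 1
    return hits / total


# Normalised question-mark frequencies from Table 3.
comics = [20, 22, 14]
academic = [0, 1, 0]

print("U =", u_statistic(comics, academic))
print("exact two-sided p =", exact_two_sided_p(comics, academic))
```

With three texts per group there are only 20 ways of dividing the six values into two groups of three, so even a complete separation of the groups, as observed here, cannot yield an exact two-sided p-value below 2/20 = 0.10. A conventional significance threshold of 0.05 is therefore out of reach whatever the data look like.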

Figure 5: Punctuation marks occurring more frequently in academic texts than in comics

4 Conclusion

Punctuation is a completely underresearched feature in register studies at the time of writing: thus Barbieri’s extensive annotation of major register and genre studies in Biber and Conrad’s Appendix A (2009: 271–295) does not mention punctuation a single time in the column “features under investigation”. It is only in Barbieri’s summary of Crystal’s (2001) major findings that there is a minor reference to it, when “minimal punctuation” is found to be one of the “common characteristics of internet registers” (Biber and Conrad 2009: 289).

However, the empirical analysis of two register-specific corpora in the present study – one of comics and one of academic texts – suggests that certain types of punctuation tend to occur more frequently in certain types of register and that punctuation can therefore be employed as an indication of register. For instance, some punctuation marks correlate strongly with spoken and written style respectively and barely occur in the contrasting register. While question marks, exclamation marks, suspension dots and apostrophes are far more frequent in comics than in academic texts, the latter use a larger proportion of semicolons, single quotation marks and round brackets. Furthermore, even in those cases where the results are similar from a quantitative perspective, differences in usage emerge upon closer consideration: for instance, comics tend to use commas after introductory interjections or proper nouns with vocative function, whereas academic texts make more varied use of that punctuation mark. Further research into this topic is required to establish the register-distinctive functions of the punctuation marks in more detail and for a larger number of registers.

Biber’s distinction between different registers is “based on the premise that most formal differences reflect functional differences” (Biber 1995: 136). Nonetheless, he claims that his multidimensional approach differs from the studies of his predecessors in that he does not conduct a functional analysis in the first place so as to identify characteristic linguistic features. Instead, he states that he “first identifies groups of co-occurring features and subsequently interprets them in functional terms” (Biber 1988: 24). While this seems to contradict an approach such as the one used in the present study at first sight, one should not forget that Biber’s analyses presuppose a list of linguistic features which were then subjected to statistical analyses. Taking into account that he reviewed “previous research to identify potentially important linguistic features” in his preliminary analysis (Biber 1988: 64) and that these are understood as features “that have been associated with particular communicative functions and therefore might be used to differing extents in different types of text” (Biber 1988: 71–72), it becomes clear that he is not correlating random phenomena but only the results of previous functional analyses – even if these were carried out by other researchers. In this sense, the present study can be regarded as a legitimate suggestion for the extension of the original model.

Within such a framework, punctuation is on a level with the 15 other major categories such as “Special features of conversation” (Biber and Conrad 2009: 82). “Punctuation” is thus tentatively suggested as category 16 with the following subordinate features (some of which did not prove distinctive for comics vs. academic texts but may play a more important role with regard to the differentiation between other registers):

1. full stop
2. question mark
3. exclamation mark
4. comma
5. semicolon
6. colon
7. dash
8. slash
9. quotation marks (single, double)
10. brackets (e.g. round, square, angled)
11. word-internal punctuation (apostrophes, hyphens)
12. combinations of punctuation marks (e.g. suspension dots, emoticons).
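How some of the features in this list might be counted automatically can be sketched as follows. This is merely an illustration under stated assumptions, not the procedure used in the present study: the regular expressions, the feature labels and the function names are hypothetical, only a subset of the twelve features is covered, and ambiguous cases (e.g. full stops in abbreviations, apostrophes vs. single quotation marks) are not resolved.

```python
import re

# Hypothetical patterns for a subset of the proposed category 16 features.
FEATURES = {
    "full stop": r"(?<!\.)\.(?!\.)",      # a single dot, not part of "..."
    "question mark": r"\?",
    "exclamation mark": r"!",
    "comma": r",",
    "semicolon": r";",
    "colon": r":",
    "round brackets": r"\([^()]*\)",      # one count per bracketed pair
    "suspension dots": r"\.{3}|…",        # three dots or the single character
}

def count_features(text):
    """Raw number of occurrences of each feature in the text."""
    return {name: len(re.findall(pattern, text))
            for name, pattern in FEATURES.items()}

def per_thousand_words(text):
    """Frequencies normalised to 1,000 whitespace-separated words."""
    n_words = len(text.split())
    return {name: count * 1000 / n_words
            for name, count in count_features(text).items()}

# Invented sample sentence, not taken from either corpus.
sample = "Wait... what? No! Stop (now); go, go."
print(count_features(sample))
```

Normalising to a fixed number of words makes texts of different lengths comparable, which matters when short comic texts are set against long academic articles.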

In a very wide reading, the division of a text into paragraphs could also be considered as punctuation (cf. Huddleston and Pullum 2002: 1725). According to Nunberg (1990: 17), “punctuation must be considered together with a variety of other graphical features of the text, including font- and face-alternations, capitalization, indentation and spacing”, all of which are said to fulfil a similar function. To this can be added the use of italics and bold print. At first sight, these features seem to go beyond the purely linguistic means and to unduly emphasise the visual and multimodal aspect of written language – but they sometimes find a correspondence in spoken language in pauses, stress, intonation etc., even if it is not completely systematic (cf. above).

What makes the proposed category 16 special is the fact that the register features listed therein are not lexico-grammatical, like the other features included in Biber’s models up to the time of writing. Some of the punctuation features correlate with lexico-grammatical features (e.g. question marks with syntactic questions), which are in turn typical of specific registers (e.g. conversations). However, this does not mean that punctuation is a secondary register feature. Many other punctuation marks correlate with more abstract categories; e.g. quotation marks with quotations, which may take practically any lexical or syntactic form. Furthermore, it is normal that “linguistic features co-occur in texts because they reflect shared functions” (Biber 1995: 30). This does not necessarily imply that one should receive more weight than the other. As a consequence, punctuation is considered a register feature in its own right.

Biber (1988: 71–72) states for register analysis that “the goal is to include the widest possible range of potentially important linguistic features”. The empirical analysis presented here clearly suggests punctuation as such a feature. However, the proposed addition of punctuation to the set of categories is not to be regarded as any form of criticism of the original model, but merely as the suggestion of a valuable category to add to the long list of previously used features.

5 References

Achtert, Walter S. & Joseph Gibaldi. 1985. The MLA style manual. New York: The Modern Language Association of America.


Arendholz, Jenny, Wolfram Bublitz, Monika Kirner & Iris Zimmermann. 2013. Food for thought – or, what’s (in) a recipe? A diachronic analysis of cooking instructions. In Cornelia Gerhardt, Maximiliane Frobenius & Susanne Ley (eds.), Culinary linguistics: The chef’s special, 119–137. Amsterdam: Benjamins.

Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge University Press.

Biber, Douglas. 1995. Dimensions of register variation: A cross-linguistic comparison. Cambridge: Cambridge University Press.

Biber, Douglas. 2006. University language: A corpus-based study of spoken and written registers. Amsterdam: Benjamins.


Bloomfield, Leonard. 1933. Language. New York: Holt.

Booth, Wayne C., Gregory G. Colomb & Joseph M. Williams. 2008. The craft of research. 3rd edn. Chicago: University of Chicago Press.

Crystal, David. 2001. Language and the internet. Cambridge: Cambridge University Press.

Halliday, Michael A.K. 1978. Language as social semiotic: The social interpretation of language and meaning. London: Arnold.

Huddleston, Rodney & Geoffrey K. Pullum. 2002. The Cambridge grammar of the English language. Cambridge: Cambridge University Press.

Jakobson, Roman. 1985. Closing statement: Linguistics and poetics. In Robert E. Innis (ed.), Semiotics: An introductory anthology, 145–175. Bloomington: Indiana University Press.

Lampert, Martina. 2011. Attentional profiles of parenthetical constructions: Some thoughts on a cognitive-semantic analysis of written language. International Journal of Cognitive Linguistics 2(1). 81–106.

Lampert, Martina. 2013. Say, be like, quote (unquote), and the air-quotes: Interactive quotatives and their multimodal implications. English Today 29(4). 45–56.

Law, Gwillim. 2010. Grawlixes past and present. http://www.statoids.com/comicana/grawlist.html (accessed 15 July, 2014).

Levinson, Stephen C. 1983. Pragmatics. Cambridge: Cambridge University Press.

Meyer, Charles F. 1987. A linguistic study of American punctuation. Frankfurt am Main: Peter Lang.

Mithun, Marianne. 2012. The deeper regularities behind irregularities. In Thomas Stolz et al. (eds.), Irregularity in morphology (and beyond), 39–59. Berlin: Akademie.

Moore, David S. & William I. Notz. 2006. Statistics: Concepts and controversies. New York: W.H. Freeman.

Nunberg, Geoffrey. 1990. The linguistics of punctuation. Menlo Park, CA: CSLI.

Patt, Sebastian. 2013. Punctuation as a means of medium-dependent presentation structure in English: Exploring the guide functions of punctuation. Tübingen: Narr.

Peters, Pam. 2004. The Cambridge guide to English usage. Cambridge: Cambridge University Press.

Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech & Jan Svartvik. 1985. A comprehensive grammar of the English language. London: Longman.

Rosch, Eleanor. 1973. On the internal structure of perceptual and semantic categories. In Timothy E. Moore (ed.), Cognitive development and the acquisition of language, 111–144. New York: Academic Press.


Rosch, Eleanor. 1975. Cognitive representations of semantic categories. Journal of Experimental Psychology, General 104(3). 192–233.

Runkel, Philip Julian & Margaret Runkel. 1984. A guide to usage for writers and students in the social sciences. Totowa, New Jersey: Rowman & Allanheld.

Sanchez-Stockhammer, Christina. 2012. Comicsprache – leichte Sprache? In Daniela Pietrini (ed.), Die Sprache(n) der Comics, 55–74. Munich: Meidenbauer.

Sanchez-Stockhammer, Christina. Forthcoming. The transformative power of copying in language. In Corinna Forberg & Philipp W. Stockhammer (eds.), The transformative power of the copy: A transcultural and interdisciplinary approach. Heidelberg: Heidelberg Publishing.

Searle, John. 1975. Indirect speech acts. In Peter Cole & Jerry L. Morgan (eds.), Syntax and semantics. Vol. 3: Speech acts, 59–82. New York: Academic Press.

Seely, John. 2007. Oxford A–Z of grammar and punctuation. Oxford: Oxford University Press.

Snider, Neal. 2009. Similarity and structural priming. In Niels Taatgen & Hedderik van Rijn (eds.), Proceedings of the 31st annual conference of the Cognitive Science Society, 815–820. Austin, TX: Cognitive Science Society.

Söll, Ludwig & Franz Josef Hausmann. 1985. Gesprochenes und geschriebenes Französisch. 3rd edn. Berlin: Erich Schmidt.

Swales, John M. & Christine B. Feak. 2010. Academic writing for graduate students: Essential tasks and skills. 2nd edn. Ann Arbor: The University of Michigan Press.

Trudgill, Peter. 2000. Sociolinguistics: An introduction to language and society. 4th edn. London: Penguin.

Wardhaugh, Ronald. 2002. An introduction to sociolinguistics. 4th edn. Oxford: Blackwell.

Corpora:

Comic Corpus (CoCo):

Re-print: Englisch lernen mit Batman. Bad Guys Gallery. 2007. Munich: Berlitz.

Re-print: Englisch lernen mit Superman. Up, up and away! 2007. Munich: Berlitz.

Walt Disney’s Uncle $crooge. No. 376. April 2008. York (PA): Gemstone.

Corpus of Academic Texts (AcadText):

Biber, Douglas, Susan Conrad & Randi Reppen. 1994. Corpus-based approaches to issues in applied linguistics. Applied Linguistics 15. 169–189.

Juhasz, Barbara, Matthew S. Starr, Albrecht W. Inhoff & Lars Placke. 2003. The effects of morphology on the processing of compound words: Evidence from naming, lexical decisions and eye fixations. British Journal of Psychology 94. 223–244.

Schneider, Edgar. 2003. The dynamics of new Englishes: From identity construction to dialect birth. Language 79. 233–281.

Martina Lampert

Linking up register and cognitive perspectives: Parenthetical constructions in academic prose and experimentalist poetry

Abstract: This paper will explore the possibility of linking up Biber’s register analysis and Talmy’s cognitive semantics, based on the assumption that some fundamental cognitive principles inform situational features and hence would, in part, determine linguistic characteristics. As one case in point, two samples of parenthetical constructions from opposite written registers, academic science writing and minimalist poetry, are scrutinised in an initial qualitative analysis. The study identifies both a general structural and functional similarity in the examples selected for illustration, suggesting that no significant register distinction will ensue, while the parenthetical pattern is likely to exhibit a substantial cross-medial difference between speech and writing. These preliminary findings invoke properties of the human cognitive architecture as well as evolutionary specifics of the language modalities as critical parameters of influence and would speak for their recognition as potential determinants of register and, in turn, for a principled compatibility of the two linguistic approaches.

1 Introduction

In this paper, I will present some arguments for linking up Douglas Biber’s register analysis with a recent (re)conceptualization of register as a cognitive construct framed in Leonard Talmy’s cognitive semantics, suggesting that the traceable principled compatibility of these two major approaches to linguistic analysis might open up some promising insights.

In his forthcoming The Attention System of Language1, Talmy advances the view that register, generally couched in terms of “types of speech situations”, may allow for a consistent re-analysis as speaker attitude, for instance, “toward [a lexical item’s] core meaning itself; toward the speech participants (the speaker himself, the addressee, or the relation between the two); or toward the current circumstance”. That is, in a cognitive semantics perspective, register distinctions would become conceivable as backgrounded speaker role, or attitude, for that matter, which are introjected into the minds of participants, thus inevitably involving attention and memory as relevant cognitive categories. To illustrate:

what might best be treated at root as a speaker’s attitude of respect toward the addressee – or a speaker’s attitude of solemnity about the circumstance – could also be interpreted as the presence of a formal situation that triggers the use of a formal register.

1 As always, I am grateful to Len Talmy for the privilege of granting me access to a very substantial current draft version of this forthcoming book; unless otherwise indicated, all quotes are from this work, and the references to this unformatted draft lack page numbers.

Martina Lampert, Johannes Gutenberg University Mainz

The fundamental significance of register for any appropriate analysis of any linguistic item that surfaces in Talmy’s explication ties in with Biber’s belief that “all linguistic descriptions”, such as, for instance, “collocational studies of particular words […] must include consideration of register differences as a central organizing parameter, if they hope to achieve an accurate account of the patterns of use” (Gray 2013: 361). Accordingly, “register differences should be an essential component of any investigation of language use” (Gray 2013: 369). These two statements, then, concur on the view that, in general, any linguistic construction carries an inherent register ‘signature’.

Moreover, Biber’s and Talmy’s approaches might in fact be read as suggestive of such a link-up, precisely as they are seen to converge in acknowledging the major role of both medial and cognitive determinants of linguistic patterns: introjected in participants’ minds, cognitive parameters appear to effectively constrain pertinent situational characteristics, as, e.g., Biber’s (1988: 160) remark tracing medial-distinctive effects back to “different cognitive constraints on the speakers and writers” unambiguously demonstrates – apart from and additional to the hard-wired effectors of the medium and the tangible properties of the setting in their specific interdependence. Capitalizing on their essentially evolutionary ‘design’, Talmy (2007b) furthermore recognises the prime significance of the options and constraints of both the production and reception circumstances, while attention proves the single most decisive determinant among the situational specifics in communicative interactions to shape a linguistic item’s representational format and its functional potential.

As a case in point, I will focus on a much neglected though highly pervasive phenomenon in language – what I have proposed calling parenthetical constructions (cf. Lampert 1992: 16 and chapter 2 below). To give a cursory impression of the pattern’s range in structural variability, the following examples, exclusively from academic writing, are in order. It should be noted that they are all in line with the formal prototype, as demarcated by parentheses in the schematic illustration (7) below. Examples (1) and (2) are taken from Nunberg (1999) and demonstrate a typical sub-clausal as well as an alleged marginal sentential instance. The two sub-morphemic exemplars (3) and (4), found in titles of scholarly articles, are likewise deemed to be peripheral members of the category, while (5) and (6), retrieved from the academic sub-corpus of the COCA, testify to the principled unconstrainedness of the format even in the formal register.

(1) Yet for all these changes, there is a continuity here, too, in the way that change is (sometimes heatedly) debated and (sometimes grudgingly) accommodated.

(2) And there is a large number of common words for talking about the language itself, for example slang, usage, jargon, succinct, and literate. (It is striking how many of these words are particular to English. No other language has an exact synonym for slang, for example, or a single word that covers the territory that literate covers in English, from “able to read and write” to “knowledgeable or educated”.)

(3) Robertson, John M., Chi-Wei Linn, Joyce Woodford, Kimberly, K. Danos, and Mark A. Hurst. 2001. The (Un)Emotional Male: Physiological, Verbal, and Written Correlates of Expressiveness. The Journal of Men’s Studies 9, 393–412.

(4) Widdowson, Peter. 1990. W(h)ither English? In Martin Coyle, Peter Garside, Malcolm Kelsall & John Peck (eds.), Encyclopaedia of literature and criticism, 1221–1236. London: Routledge.

(5) He took pianists, guitarists and harpists in stride, but expressed shock at “13 young lady violinists (!), 1 young lady violist (!!), 4 violoncellists (!!!) and 1 young lady contra-bassist (!!!!).

(6) While ego orientation did not emerge as a significant predictor of likelihood to aggress in any of the three groups, significant correlations were found between ego orientation and likelihood to aggress for boys, r (????) =.20, p <.005, and girls in the all-girls league, r (???) =.40, p <.005.

Along the general lines sketched in the introductory paragraphs of this paper, I will thus probe into parenthetical constructions’ common cognitive basis, arguing that attention direction turns out to be a relevant consistent principle for the explanation of parenthetical constructions’ usage profile, which would then have to be added, as a principal determinant of the participants’ cognitive make-up, to the list of situational features defining a register (cf. Biber and Conrad 2009: 40; for a similar suggestion, though more global and including punctuation marks in general, see Sanchez-Stockhammer, this volume).

Why parenthetical constructions – and why attention? Apart from the general neglect of attentional effects as ubiquitous phenomena in language (cf. Lampert 2009: 20–25), the pattern has somehow – vaguely, intuitively, anecdotally – been associated with reduced attention and, in consequence, been dismissed as an informational and textual ‘aside’ at least since the beginning of research on parenthetical constructions in Schwyzer’s (1939) seminal study. The central issue is, however, whether it is justified to generalise over an attenuation effect as a distinctive characteristic in the construction’s spoken realisation in the first place2 and, further, unaltered, to its functional equivalent in the written mode, thus tacitly presupposing lowered salience as an unequivocal property of parenthetical constructions across the board.

My key objective, then, is first and foremost a conceptual concern: elaborating on two previous studies (Lampert 1992, 2011), I will outline, in an initial (qualitative and microscopic) analysis, a common usage profile of parenthetical constructions, which might ultimately be added to the list of linguistic features offered in Biber and Conrad (2009: 78–82), paying due respect to constraints exerted by situational characteristics that give rise to modality-sensitive register variation.

Beyond this proposal, I will, however, address a ‘classical’ target of register analysis – the options and constraints of medium-specific properties that give rise to parenthetical constructions’ register profile. Repeatedly, Biber has drawn attention to the prime significance of mode-induced variation, qualifying it as one likely candidate “for universal parameters of register variation: a dimension associated with oral versus literate discourse” (Gray 2013: 367), as it is this “oral/literate opposition […] which emerges as the very first dimension in nearly all MD studies” (Gray 2013: 367). And it is particularly the language-external medial properties of the articulatory-organic-auditory and motor-instrumental-visual channels (cf. Bredel 2008: 11) underlying this dimension that turn out to be the major determinants of variation also in this case, setting parenthetical constructions in the spoken mode apart from their written/printed counterparts.

To provide some evidentiary support for my line of argument, I will confine my analyses to printed text, and to the construction’s presumed representational and positional prototype,3 as illustrated through its generalised schematic template:

(7) xxxxx (xxxxx) xxxxx
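For retrieval purposes, the schematic template in (7) can be approximated by a simple pattern match. The following is a minimal sketch under stated assumptions rather than a method used in this paper: the function name and the two labels ‘word-internal’ and ‘free-standing’ are hypothetical, nested lunulae are not handled, and the distinction drawn is purely graphical (whether a letter directly adjoins the opening or closing lunula).

```python
import re

# Non-nested pair of round brackets enclosing at least one character.
PARENTHETICAL = re.compile(r"\(([^()]+)\)")

def classify_parentheticals(text):
    """Return (parenthesised sequence, label) pairs for each match.

    'word-internal' is assigned when a letter directly precedes "(" or
    directly follows ")", as in sub-morphemic cases; all other matches
    are labelled 'free-standing'.
    """
    results = []
    for match in PARENTHETICAL.finditer(text):
        before = text[match.start() - 1] if match.start() > 0 else " "
        after = text[match.end()] if match.end() < len(text) else " "
        if before.isalpha() or after.isalpha():
            label = "word-internal"
        else:
            label = "free-standing"
        results.append((match.group(1), label))
    return results

# Strings adapted from examples (1), (3) and (4) above.
print(classify_parentheticals(
    "change is (sometimes heatedly) debated and (sometimes grudgingly) accommodated."))
print(classify_parentheticals("The (Un)Emotional Male"))
print(classify_parentheticals("W(h)ither English?"))
```

A graphical criterion of this kind can only pre-sort candidates; whether a given match is sub-clausal or sentential, as in examples (1) and (2), still requires manual inspection.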

In view of mode-specific differences as situational variants, I have selected as test cases for illustration two random samples, representing two extreme written registers, science writing and experimentalist poetry, which quite reasonably may be seen to occupy the opposite ends of a conceived register continuum (cf. my analysis in Section 6). As a general caveat, I am all too aware of the obvious limitations of this sketch. Nevertheless, this study, which suggests a form-to-function correlation on a minor parameter, explicitly draws on the strengths of such functional analysis as an essential component in any register analysis (cf. Biber 1988: 52–53, 55 and 62), especially regarding the identification of individual functions of textual dimensions that are potentially relevant to register distinctions.

2 Some initial empirical evidence challenging this view was offered in a plenary talk at an international workshop on “Cognitive Motivations of Second(ary) Voices: A Multimodal Perspective on Parentheses and Quotations” (Bamberg 12/06/2014).

3 In this case study, I will disregard, for reasons of space, dashes or commas as principal competitors, which exhibit some distinctive constraints on the syntactic patterns they tolerate.

The structure of my paper is then as follows: in Section 2, I will briefly introduce the attention ‘theme’ vis-à-vis parenthetical constructions, in fact as a long-standing issue, from the traditional point of view; Section 3 will then spot major correspondences between the situational factors of register analysis and cognitive parameters pertaining to attention as advanced in cognitive semantics. Section 4 will detail some relevant mechanisms underlying parenthetical constructions in the spoken mode, which then serves as a basis for the attentional profile of its written counterpart elaborated on in Section 5. Probing into the sample illustrations, Section 6 will scrutinise their register-specific commonalities and differences, and the final section addresses some critical issues in view of a more descriptively adequate analysis of parenthetical constructions following from the proposed link-up of register analysis and cognitive semantics.

2 (In)Attention for Parenthetical Constructions

To begin with, a note on terminology is in order: as is evident from the above template, alphabetical orthographic systems have conventionalised figural elements, typically the pairwise occurring delimiters, “with distinct opening and closing characters” (Huddleston and Pullum 2002: 1731) and originally pertaining to a non-alphanumeric representational system. To avoid any ambiguities associated with parenthesis, which is found to refer to both the figural markers and the overall pattern, I will instead borrow Lennard’s (1991: 1) term lunulae as an unequivocal label for the crescent-shaped round brackets that set a (sequence of) alphabetical elements (graphemes, numerals, punctuation marks) off from their linguistic environment (cf. also Brown 2009 in his introductory paragraph, “Some terms”, who follows Lennard’s suggestion). Parenthetical construction will generally serve as a cover term for the whole structural ensemble instantiating the concept parentheticity, and parenthesised sequence is used whenever it appears relevant to make exclusive reference to its ‘content’ (cf. Lampert 2011 for some details).


In stark contrast to parenthetical constructions’ pervasiveness and structural versatility in the written medium in general, and in the more formal registers in particular,4 the absence of any in-depth study on the range of both their structural variability and functional potential manifests a general inattention to parentheticity as an object of research in its own right. This observation may seem quite iconic of the pattern’s presupposed communicative function as conveying a secondary, defocussed and/or incidental aside to the allegedly primary, true and essential message of the non-parenthesised text. Typically confined to the sentence as their host domain, both in pertinent reference grammars and in most linguistic research, parenthetical constructions, especially in the written modality, thus prove under-researched, or effectively un-researched, in (recent) linguistics. If they have become the topic of current research at all, it has been with exclusive reference to the spoken modality (cf., e.g., Dehé 2014, the most recent publication).

It may be interesting to note, however, that attention has been invoked as a framing concept ever since Eduard Schwyzer’s (1939) study, which is arguably the first serious (cross-linguistic) investigation of what may be referred to as the parenthetical construction. Remaining entirely intuitive and vague, attention appears to echo William James’ (1950/1890: 403) famous dictum dismissing the notion as suitable only as a presupposed allusion at best: “Every one knows what attention is.” Schwyzer (1939: 32–33) qualifies parenthesised sequences as ‘aside meanings’ (Zwischengedanke or Nebengedanke) and thus perpetuates, in fact, a long-cherished bias advanced and actually codified, for the English language, at the latest in the Late Modern English grammarians’ accounts (cf. Lennard 1991: 84–113). “Alien” to the primary layer of information and incidental to the current (text) topic, parenthetical constructions “disrupt” both the syntactic structure and the line of argumentation of their environment. Hence, in rhetoric or stylistics and in the notorious usage guides, but also in some grammars, parenthetical constructions are usually considered as either undesirable or as negatively connoted meaningless fillers or wilful digressions – as the “obstinate” title But I Digress of Lennard’s remarkable study on The Exploitation of Parentheses in English Printed Verse indicates. They are widely conceived to testify to authors’ caprice and/or lack of clarity or perspicuity in organizing their texts, a stereotype that is already present in Schwyzer (1939: 5–7 and 27) and has survived until this day, testifying to a poor understanding, if not an actual misconception, of the pattern’s functional versatility in terms of sophisticated information management and elaborate discourse structuring.

4 It comes as no surprise that register-specific details are available neither on the total frequencies nor on the relative proportions of parenthetical constructions; however, on a cursory and informal inspection, frequencies of occurrence and especially variation in structural complexity seem to increase toward the written end of a conceived spoken-written continuum, i.e., those registers that are at a considerable distance to casual and spontaneous conversation where recurrent formulaic patterns like comment clauses dominate.

Along the same lines, major current reference grammars of English locate the content(s) of parenthetical constructions “in the shade as background”, as “additional” and “related”, providing “supplementary information” which is “not part of the main message” (Biber et al. 1999: 137). And, iconically, the pertinent terminology appeals to the concept of (less or lesser) attention: parenthetical constructions qualify as peripheral elements of clause grammar (cf. Biber et al. 1999); or when specified in terms of the syntactic patterns they allow, comment and complement clauses along with appositive or non-restrictive modifiers are the categories regularly reoccurring in the literature. In more idiosyncratic, and at the same time presumably more general terminology, parenthetical constructions emerge as non-dependent, disintegrated supplements (cf. Huddleston and Pullum 2002: 1350) – all of which imply the connotation of minor (structural) relevance.

I will, however, (hope to) demonstrate that this received view, which unequivocally associates the parenthetical pattern with “only” incidental or background information of low conceptual import, may well derive from, or be attributed to, a general misconception tacitly presupposing a “simple” equivalence of the two language modalities: following from a deep-rooted structuralist bias toward or ideology of the spoken language as the primary and “true” medium of communication (cf., e.g., Biber 1988: 5–9 and, very pronouncedly with reference to punctuation, Nunberg 1990: 1–7 or Bredel 2008: 2–11), the attenuating effect of parenthetical delivery is indiscriminately imposed on its written counterpart; and even this tacitly presupposed assumption fails to be confirmed by empirical evidence, at least in two different settings, reading out subclausal quotes in an experiment and public speeches (cf. Kasimir 2008 and Lampert 2014). Such an (over-)generalising approach to parentheticity, however, reveals a principled disregard of fundamental, though never simplistic and binary, characteristics intrinsically associated with mode and medium as well as properties that derive from the human cognitive make-up (cf. Biber 1988: 22, 26, 160–161).

This rather deplorable state of the art may, in part, have been due to the lack of an adequate analytical tool that is sufficiently explicit to capture (all) relevant characteristics, calling on attention as a critical explanatory construct for parentheticity as a linguistic category. Before Section 4 sketches the baseline of such an approach, the brief remarks to follow are intended to address some fundamental preconditions of writing, as they become manifest in situational characteristics and are identified in register analysis. Capitalising on an evolutionary argument, some cursory notes on the options and constraints of the production and reception circumstances are advanced from a cognitive semantics perspective, which might again support the sensibility of the cross-framework alignment proposed in this paper.

3 Situational context and cognitive determinants

Register analysis, as an articulate perspective of linguistic practice, gives precedence to situational characteristics of linguistic events like participants, including their specific social relationships, the particulars of the mode, as well as the setting of the communicative interaction, paying due respect also to the import of communicative purposes and topics. These situational categories are “more basic”, since they “cannot be derived from any linguistic phenomena”, that is, they functionally govern the choice of medially admissible and physically possible linguistic patterns, as their pervasive and conventionalised instantiations in a given context. As a result, “registers differ in their characteristic distribution” (Biber and Conrad 2009: 9) of a particular selection and pattern of lexical and grammatical features that emerge as common determinants of registers and sub-registers along continuous dimensions of linguistic variation (cf. Biber 1988: 9).

To comment on some basic determinants of register and outline the conceptual compatibility of register analysis and cognitive semantics as fundamentally comparative perspectives,5 I will spot the most relevant correspondences between situational and linguistic features in the permanent medium of print (cf. Biber 1988: 36–42; Biber and Conrad 2009: 40–47) as they become manifest in the samples selected for analysis: an experimental report from scientific writing, published online in the summer issue of Brain and Language 2013, and a specimen of experimentalist poetry, E. E. Cummings’ famous untitled poem of 1958, “l(a” (cf. Section 6).

Instantiating, as printed documents, the same physical mode, the poem features a single author, the American poet E. E. Cummings, while the scientific article is co-authored by five US scientists specialising in brain studies, neurology and (cognitive) neuroscience. As professionals in their respective field of expertise, they are likely to exhibit similar general social characteristics as their readers – a feature that is, however, less predictable for the poem. Both represent

5 For register analysis, see Biber (1988: 20) and Biber and Conrad (2009: 36); Talmy’s attention factors are essentially framed in terms of same- and cross-venue comparison.


unequivocal instances of texts with un-enumerated (typically) unknown readers, most likely without any significant amount of interaction, though addressors and addressees will share, to different degrees, specialist background knowledge (colleagues in the case of the academic text6 or ‘fans’ in the case of Cummings), whereas their relative social statuses would perhaps vary more with the poem. Likewise, the samples are presumably identical regarding their principled production and reception circumstances as planned, scripted, revised and (multiply) edited texts lacking any indication as to the actual extent of editing; also, the setting will not significantly differ between the two: the participants neither share time nor place, with the readers typically in private (though some public place is possible, as when reading the poem or the article in class) as well as in complete control over the text; and both samples feature the same specific setting as parts of a published book being relatively contemporary.

Regarding ideational properties, such as topic and communicative purpose, however, the two instances are significantly distinct: while the general and original purpose of the poem is entertainment with no further specification suggesting itself7, the article’s communicative intent may be specified – inform, describe and report; likewise, the factuality status proves a discriminating feature: an imagistic (rather than narrative) poem vs. a factual academic fragment of non-opinionated statements. And whereas the poem does not display any overt stance, the academic text features epistemic stance expressions, for instance, purpose and approach in the samples selected for illustration. The general topic, again, distinguishes between the poem’s entertaining function through a picturesque image and the article’s scientific import, which may be specified as a report on a controlled experiment using an “electronic device designed to alleviate stuttering by manipulating auditory feedback via time delays and frequency shifts” (Foundas et al. 2013: 141).

Contrary, however, to their immediate and decisive impact on lexical choices, it should be noted that, perhaps against expectation, it is not “topical differences [that] are […] influential for determining grammatical differences” (Biber and Conrad 2009: 46). As has been repeatedly emphasised, the key relevance in shaping the overall linguistic appearance is accredited to language-external factors, as “the pervasive grammatical characteristics of a register are mostly determined by the physical situational context and the communicative purposes”

6 Typically, academic prose is contextualised by shared background knowledge (cf. Biber 1988: 48).

7 Disregarding some marginal cases as when the poem serves as an exercise in literary discourse or, like in the present context, register and attention analysis.


(Biber and Conrad 2009: 46; see already Biber 1988: 11 and 38 as well as very explicitly Nunberg 1990: 3–4, 7 and 14–15).

Regarding these two major determinants, I will, in the following, elaborate on the compatibility of Biber’s and Talmy’s approaches as they become relevant for the subsequent analysis of the samples: for a potential mapping of register analysis’ situational features, two factors suggest themselves in cognitive semantics.

First, Talmy (2007b) acknowledges the substantial import of the channel-related situational features (including production circumstances as it were), which rank high as major determinants of linguistic variation. He in fact refers to the fundamental nature of the two modes’ production and reception circumstances inherited from evolution and giving rise to their characteristic modality-related reflexes – a view that would correspond to Biber privileging them as more decisive. More specifically, it is categorical physical differences in the representational format that essentially separate the analogous, coextensive and simultaneous spoken modality (which in principle allows for gradient and relative distinctions as in vocal dynamics) from the exclusively digital and discrete written system of representation. It disallows gradient and relative distinctions and is characteristically confined by two-dimensional space (see Section 5 below for some details on the constraints imposed by conventionalised print).

Second, in Talmy’s cognitive semantics, situational features of register analysis may be conceived as inbuilt in lexical items’ associated meaning sectors. To illustrate: participant-related characteristics like (encyclopaedic and shared) knowledge or epistemic, affective and attitudinal stance become accessible via the conceptual complexes of linguistic items themselves, which, in turn, are notably shaped by another language-external general principle, a language user’s cognitive state (including attention resources and memory capacity). Such cognitive reflexes are at the heart of Talmy’s (forthcoming) The Attention System of Language, and they may be captured, quite generally, as a linguistic item’s attentional profile, critically determining its usage (cf. Sections 4 and 5).

Such salience effects, I would argue on a more general level, comply with register analysis in many respects: Talmy (forthcoming), in fact, proposes to (re-)analyse the contextual components of lexical items as part of their associated meaning; and the “central notion of a speaker’s particular attitude can then – through a backgrounding of the role of the speaker – be interpreted instead as a type of speech situation”, which, in turn, accommodates the concept of register. Accordingly, “any speaker attitude or register pertaining to the core meaning that is lexicalised in a morpheme” as well as targeting “the speech participants (the speaker himself, the addressee, or the relation between the two) or […] the current circumstance” would then appear as “introjecting” register distinctions


into their “minds” and thus be subject to the fundamental attentional processes of activation and attenuation. Under such analysis, “register can always be traced back upstream to speaker attitude”, incorporating specifics of the communicative setting in the contextual sector of an item’s meaning for that matter; and

what [for example] might best be treated at root as a speaker’s attitude of respect toward the addressee – or a speaker’s attitude of solemnity about the circumstance – could also be interpreted as the presence of a formal situation that triggers the use of a formal register (Talmy forthcoming).

4 An attentional analysis of parenthetical delivery

Following this sketch of a situational analysis, this section will focus on and contextualise one meta-linguistic attentional mechanism from Leonard Talmy’s (forthcoming) The Attention System of Language8 that specifically accounts for the pattern of parenthetical delivery in the spoken mode.

In this model, each individual attention-specifying device is seen to increase or decrease the relative attentional weight of a particular linguistic representation’s (semantic) component or (surface) constituent, which, irrespective of its linguistic format or structural category, thus coherently accounts for the linguistic variation in terms of attentionally specified, discriminate usage profiles. It is this functionality of linguistic choices to which “skilled speakers and writers can devote considerable meta-cognitive attention [in] their options for setting an entity’s degree of salience” (Talmy forthcoming) and which arguably again invokes fundamental issues likewise systematically addressed in Biber’s register, genre and style perspectives.

For the present analysis, I will only selectively and cursorily call on two such basic attention factors: one that captures attentional properties of an individual morpheme and one that specifies attentional effects of one entity on another; that

8 Talmy’s forthcoming book introduces a coherent theoretical and powerful analytical factor model of linguistic attention, informed by a sophisticated theory of language-specific attentional parameters and accounting for a wide range of attentional effects in language, so far privileging the (more basic) spoken modality. The individual basic attention factors successively integrate as component mechanisms, or Areas, in (hierarchically organised) Domains: Domain A, Attentional properties of an individual morpheme, Domain B, Attentional properties of a morpheme combination, Domain C, Attentional effects of one entity on another.


is, they assign “different degrees of salience to the parts of an expression or of its reference or of the context” (Talmy 2007a: 264).9

The attentional mechanisms relevant for parenthetical constructions are framed in terms of meta-linguistic causal triggers and targets with two distinguishable attentional effects: for one, as an immediate effect of a target’s “designation as the relevant entity out of the entities co-present in the environment”, its activation level is raised, thus increasing its salience; as a second effect, the conceptual or referential content of the respective entity will either be activated or attenuated, lending this selectional target its specific “dual character” that, in turn, calls for its “differential attentional treatment”, which depends on the actual impact on the referent, i.e., foregrounding or backgrounding it (Talmy forthcoming).

The factor addressing attentional effects of parentheticity in the spoken modality identifies a prosodic device as trigger that first highlights the parenthesised sequence’s referential content as the selected-out target and whose salience is subsequently attenuated via its prosodically differential realisation. The widely assumed medium-specific mechanism induces an “expression-spanning loudness reduction and pitch lowering” that together “seems in general to reduce a hearer’s attention on the expression’s meaning”; and such parenthetical delivery would then “trigger attentional decrease in a target – in particular, to attenuate the expression’s reference”, in effect instructing the addressee to consider the target’s referential content as incidental (Talmy forthcoming).

Accordingly, the parenthesised clause in example (8), “if pronounced as just described, seems to encourage a hearer to treat its content as merely incidental information, readily disregarded” (Talmy forthcoming):

(8) My cousin Sue (who happened to be visiting at the time) wanted to go to the museum.

This “attenuative effect of reduced loudness over an expression derives readily from the attentional principle of quantity” and involves a general cognitive principle: “the smaller the magnitude of some perceptual dimension of a form – here, its loudness – the less salient its referent”. In the spoken modality, then, this triggering device would uniformly decrease the target’s salience through reduced physical parameters. According to the received view, this very mechanism is assumed to be also operative in the written modality and to directly “translate” into the attentional effect attributed to the lunulae. I would, however, argue

9 This is a substantially simplified version of the actual attentional analysis, abstracting away from intriguing details in the description and largely avoiding the usage of Talmy’s terminology.


instead that a corresponding, yet distinct device is called for to accommodate the representational format of its visual counterpart, whose cognitive effects on readers in processing parenthesised sequences seem to be essentially constrained by the physical characteristics of the medium (cf. Lampert 2011 for some suggestions to this end). Notably, the presupposed uniform and iconic attenuation is readily available only to the analogue channel of vocal dynamics (but absent from the written modality in principle), where such reduction of prosodic parameters allows for gradience. As the following section may well demonstrate, the written mode, however, deprives the pattern’s attentional profile of its characteristic dual nature, owing to its representational design features of digital discreteness.

5 Toward an attentional profile of parenthetical constructions

In view of the general comparative perspective, I will now address major medial differences as well as cross-medium correspondences in the “attentional behaviour” of the parenthetical pattern: like the supposed prosodic signature in the spoken medium, the selected-out parenthesised sequence will, in print, undergo (some) activation as an effect of being marked off as different from the adjacent text; but unlike parenthetical delivery in the spoken mode, which allows for gradience along a quantitative parameter (i.e., activation and attenuation from minimal to substantial) and is essentially “fluid” in character as well as subject to individual variation (hence, probably less discriminately effective), the lunulae will attract (some) attention to themselves by virtue of their qualitative difference in figural shape as against their linguistic environment. Though members in the inventory of alphabetical script, the crescent-shaped delimiters are perceivably distinct from their graphemic vicinity on account of their physical make-up instantiating a non-alphanumeric representational system.10

Quite analogous to parenthetical delivery in the spoken modality, the lunulae will attract attention to themselves as well as direct attention to another entity, the parenthesised item(s); however, the figural elements themselves, categorically different from the analogue gradient parameters of vocal delivery, lack any perceptual quality to iconically induce an attention attenuating effect on

10 Nunberg (1990: 6–7), for instance, emphasizes the independence of this figural “linguistic subsystem” as relatively autonomous and sets it on a par with non-linguistic graphical-representational systems (cf. also Biber 1988: 7 and 9; Bredel 2008: 10–14).


the target and will instead initiate an(other) activation process. The digital and discrete lunulae do not readily support gradience in a quantitative dimension, and the vision-based linguistic subsystem in alphabetical languages exclusively relies on the principle of categorical (figural) difference, having conventionalised only discrete, all-or-none devices – with no perceptual gradient feasible to indicate reduced salience in some physical parameter.11 The lunulae are essentially separative, ‘point-like’ spatial delimiters, unequivocally signalling the beginning and end of what is considered the prototype of parenthetical constructions in print. Again, contrary to their functional equivalent of parenthetical overlay delivery in the spoken modality, they are not coextensive with the parenthesised sequence: by their curved shape, wide at their centres and pointed at their two ends, lunulae – iconically speaking – “embrace” a sequence that in this way receives an identity of its own, both separated from and integrated into its environment, and effective in delimiting the item(s) “inside”. Accordingly, any potential attenuation that may be associated with the parenthetical pattern does not derive from perceptual stimuli but would exclusively have to be understood as a mere convention that has been negotiated in the literate community. It is thus ultimately an effect of (prescriptive) formal instruction or cultural practice exhibiting the view that these characters signal the reader to treat the parenthesised target as an aside deserving lesser attention.

In conclusion: while the parenthetical pattern cross-medially shares the essentially dual character of attentional selection and weighting, the outcome is different: discrete lunulae do not allow for the unequivocal attenuating effect of parenthetical delivery. Readers encounter a visual stimulus whose categorical difference both in the type of triggering device and its attentional impact is at the mercy of the written medium’s essentially digital nature; and with the parenthesised sequence being perceptually non-distinct from the previous and subsequent typographical environment, no perceptual effect in either attentional direction is reasonably to be expected. This cross-modal variance in parenthetical constructions’ fundamental characteristics ultimately results in a categorical difference of the same functional pattern: it derives from the tangible features pertaining to the production and reception circumstances and gives rise to the profound (though

11 In principle, the written modality would not prohibit attentional gradience in the target, though: light fonts, e.g., in a context of regular fonts might conventionally correspond to the reduced loudness over an expression in the parenthetical delivery and would in fact implement the attentional principle of reduced quantity in a physical parameter. Exploiting such attenuating potential has obviously never been considered as a possible general strategy.


not absolutely but continuously quantifiable) divide into speech and writing by major situational parameters (cf. Biber 1988: 38–45).

6 Tracing the parenthetical pattern across written registers

Whereas the preceding section focused on mode-dependent differences, the analysis to follow will now highlight commonalities across registers: my preliminary findings from an initial small-scale case study of arguably the extreme ends of the register continuum, (scientific) academic prose and experimental poetry, seem to speak in favour of a common cognitive principle underlying the pattern – despite its enormous range of structural variability (cf. Lampert 2011 for some exemplification). Parenthetical constructions may then qualify as an integration feature (cf. Biber 1988: 43), significantly sharing both the structural pattern and the communicative function across the two samples. In fact, scrutinising a novel candidate for inclusion in the pool of register-indicating characteristics is not unlikely to produce unexpected results, as when “certain linguistic features will occur more frequently […] than […] expected” (Biber and Conrad 2009: 10).

A case in point apparently are the lunulae – a pertinent and prominent signature feature of E. E. Cummings’ minimalist poetry, in combination with an unconventional use and expressive functionalisation of not only punctuation marks in general but also of deviant orthography and innovative typography. Though clearly a major issue of style analysis12 and typically “associated with aesthetic preferences” rather than being functional (Biber and Conrad 2009: 18), Cummings’ usage of lunulae indeed appears to sensibly allow for, or even invite, a comparative analysis regarding the parenthetical pattern, all the more so in light of Biber’s (1988: 13) emphasis that any such function must not be “posited on an a priori basis; rather [it is] required to account for co-occurrence patterns among linguistic features”. Following a similar rationale and giving the linguistic dimension priority for the time being, I would indeed suggest that, across written registers (perhaps even including styles), parenthetical constructions are more likely to share an essentially cognitive function rather than exhibiting a great extent of variation; and it might turn out that it is “only” their contextual co-occurrence

12 Biber and Conrad (2009: 18) note that, although style analyses are “similar to register perspective” in that “typical linguistic features [are] associated with a collection of text samples from a variety”, they characteristically differ “in the underlying reasons for the observed linguistic patterns”.


features that would conceivably discriminate more specialised (sub)register functions, while not “separating”, or telling apart, even opposite written registers, in the face of the most critical defining characteristics of a register: shared communicative functions (cf. Biber and Conrad 2009: 16). Now, what would this commonality across the “extreme” registers of scientific article and experimental poetry mean in light of the assumption that, according to Biber (1988: 16 and 19–20), differences (due to situational factors) are more likely to be expected?

To begin with one randomly chosen article from neuroscience, Foundas, Mock, Corey, Golob and Conture’s “The SpeechEasy device in stuttering and nonstuttering adults: Fluency effects while speaking and reading”, I will address some major issues with respect to the overall line of argumentation in this paper. For reasons of greatest comparability, I have only selected instances of parenthetical constructions from the article that match those in the poem regarding their structural type, i.e., the parenthesised sequences in example (9) exclusively feature lunulae but lack any verbal specification of the relation between their own referent and the referents of the outside linguistic environment (but cf. examples 11 and 12 below).

I will first comment on the three excerpts selected from science writing and spot some major salience-related effects:13

(9) a. In the case of DAF, the speech is amplified and delayed (alteration in the time domain), whereas FSF shifts the whole spectrum of speech.
Gloss: DAF abbreviates externally-delayed auditory feedback, and FSF replaces frequency-shifted feedback.

(9) b. Three speech tasks (Reading Aloud, Monologue, Conversation) were used to examine speech fluency at baseline and in each condition repeated independently for each participant with the device in the left and right ear.

(9) c. For purposes of this study, attention was measured by a computerized version of the CPT with this measure approaching significance with the PWS having higher scores (more impaired attention) compared to controls.
Gloss: CPT abbreviates Conner’s Continuous Performance Test and PWS substitutes people who stutter.

Following the linearity constraints of the reading process,14 a reader of the above samples will, after an uninterrupted sequence of graphemes (delayed, tasks and scores) and an obligatory blank space, encounter the opening character, which would – according to the attentional analysis presented in the previous section –

13 In this description, I do not imply any claim whatsoever about the actual on-line processing.

14 These constraints testify to the strict(er) principle of linearity in written language; cf. Biber (1988: 38), Bredel (2008: 9 and 30–31).

first attract their attention to the character itself, resulting in its activation. The lunula is then immediately succeeded by another uninterrupted sequence of graphemes (alteration, Reading, and more) – the first constituents of the respective parenthesised sequences, all separating, by blank spaces, the word forms in their specific linear sequences. Directly attached to the final items domain, Conversation, and attention, another such figural element, the complementary closing lunula, signals the end of the parenthetical construction, which is immediately followed by a comma as well as another blank space in (9) a., while (9) b. and c. only feature a blank space preceding the word forms of the non-parenthesised text: whereas, were and compared.
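The linear sequence just described (word form, blank space, opening lunula, parenthesised sequence, closing lunula, then comma or blank space) can be made concrete in a minimal Python sketch. The pattern `LUNULAE` and the function `parenthesised_sequences` are hypothetical names introduced here purely for illustration; they assume non-nested lunulae and make no claim about actual on-line processing. The second sample shortens (9) b.

```python
import re

# Hypothetical pattern (an assumption for illustration): a word form, one blank
# space, an opening lunula, a non-nested sequence, a closing lunula, and the
# single character that directly follows the closing lunula.
LUNULAE = re.compile(r"(\w+)\s\(([^()]+)\)(.)")

def parenthesised_sequences(text):
    """Return (preceding word, parenthesised sequence, following character) triples."""
    return [(m.group(1), m.group(2), m.group(3)) for m in LUNULAE.finditer(text)]

# Excerpt (9) a. as quoted in the text.
sample_a = ("In the case of DAF, the speech is amplified and delayed "
            "(alteration in the time domain), whereas FSF shifts the whole "
            "spectrum of speech.")
# Excerpt (9) b., shortened after "baseline".
sample_b = ("Three speech tasks (Reading Aloud, Monologue, Conversation) "
            "were used to examine speech fluency at baseline.")

for sample in (sample_a, sample_b):
    print(parenthesised_sequences(sample))
```

In (9) a. the character following the closing lunula is a comma, in (9) b. a blank space, which is the only formal difference between the two environments that the sketch registers.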

Example (10), Cummings’ untitled experimentalist poem, exhibits only some variations on the same theme:

(10) l(a

le
af
fa

ll

s)
one
l

iness

Note, first, that all instances of the shape <l> have to be considered typographically ambiguous to represent the lower case grapheme of the corresponding lateral approximant /l/, the numeral one, and the first person singular pronoun, I.

In addition to the deviant vertical arrangement of the characters, a reader is confronted with the homograph I, which adds to the effect of alienation initiated by the unconventional assembly of symbols and is likely to arouse surprise, irritation or delight in the recipient. Note that the integration of a parenthetical construction into a morpheme is, in principle, admissible beyond the poetic context, as in formal academic registers, for instance: cf. as one example W(h)ither English, the title of a 1990 article by Peter Widdowson15. Deviating, however, from conventionalised practice, the poem incorporates a sequence of

15 See Lampert (2011) for further exemplification of how deliberate academic writers exploit the device to create the ambivalence crucial for their intended reading.


dissociated graphemes (assembled in four pairs) that have to be reconstructed as the complete simple clause a leaf falls. Different from a canonical, horizontally arranged running text, line breaks and three space lines replace its regular blank spaces, while the closing lunula expectedly follows the final letter <s> of falls directly. Without venturing a final decision on the parenthetical pattern’s degree of alienation in the poem, I would nevertheless argue that the gestalt remains perfectly decodable against its conventional form – in fact, I would suggest that it is the lunulae in the first place that, apart from the two lexical items, high-frequency one and the transparent nonce iness, render the sequence of graphemes “readable” as a clause.

In terms of discourse functions, and irrespective of any structural specifics16 of the parenthesised sequence, (9) a. illustrates one out of only few principal options: the parenthesised sequence (alteration in the time domain) represents an instance of generalisation over the subcategory of DAF (to be spelled out as externally-delayed auditory feedback) as one of its specimens with respect to the time dimension of speech (relevant for an analysis of stuttering); that is, the parenthesised sequence links the more specific information in the preceding text to a superordinate category of temporal changes. The most likely communicative purpose underlying this author-causal strategy is to offer the reader a more general reference system for the information to integrate, in the service of safeguarding an approximation of the shared knowledge base between author and (a less specialist) reader.

(9) b., by contrast, suggests the opposite textual and informational relation between the preceding environment and the parenthesised sequence, identifying the three concrete test items: the parenthetical construction (Reading Aloud, Monologue, Conversation) specifies the subcategories of tasks to establish a baseline of a test person’s speech fluency profile; again, the exact relational specification will have to be reconstituted through inferencing processes on account of the knowledge base available to a reader.

A third option is instantiated in (9) c., where the parenthesised sequence (more impaired attention) most plausibly establishes a same-level category relation of higher scores […] compared to controls, essentially reformulating the same referential content from another perspective, i.e., framed with reference to the

16 It may be worth noticing that the clausal pattern in (10) would comply with an expected general preference in non-academic texts, but see (6), while (9) a. through c. represent the phrasal/nominal prototype of informational writing (cf. Gray 2013: 368). Both, however, document referentially non-explicit structures (without any indication of the relation), which would, following Biber (cf. 1988: 145), go against the stereotype of academic samples.


study’s objective; it may, however, be plausible to conceive this case as a specification as well, since more detail is added to the text’s informational import, ultimately leaving us with two major principled and (logically) complementary relations: generalisation and specification.

Interestingly, the poem indeed appears to instantiate both the very same relational pattern and discourse function suggested for the examples from academic prose: E. E. Cummings’ text features the same delimiter and the same selected-out structure, with no categorically different principle inside and outside the lunulae (even though, admittedly, the non-parenthesised sequence incorporates only two conventional lexemes, one and the nonce iness): hence, it follows the same cognitive principle. Other than the science article, the poem, however, plays on the option of simultaneously meaningful processing alternatives, that is, between specification and generalisation as ‘legitimate’ reading variants of the text – either privileging the non-parenthesised sequence, i.e., specifying the abstract concept loneliness through an example, or generalising over the parenthesised sequence as a (metaphorical) specimen of loneliness; and, like in (9) c., there is even the option to conceive the two component texts as complementary perspectives, ‘melted’ into one ‘statement’.17

It should be added that, in all instances, it is up to the reader to reconstitute the presupposed relation, with the risk of their misconceiving its actual meaning (cf. Lampert 2011 for some detail). To explain: unlike the cases in (9), which do not explicitly instruct the reader how to exactly process the respective information, the structurally variant examples (11) and (12) from the same article below feature optional, conventionalised triggers that maximally control for the author-intended processing of the relational specification. While in (11) i.e. asserts the relation of semantic equivalence between laterality and handedness, the e.g. in (12) indicates that changes in speed rate and amplification represent exemplars of the general category other factors (cf. Foundas et al. 2013: 146–147).

(11) No significant associations were observed between device effect and any of the three measures of motor (manual) laterality (i.e., handedness) (all Bonferroni-corrected p-values = 1.00).

(12) It should be noted, however, that findings regarding the influence of the device on stuttering must be interpreted with caution as other factors (e.g., changes in speech rate, amplification) may contribute to enhanced fluency with this treatment.

17 Any claim to do justice to the literary intricacies involved is explicitly beyond the purpose of these few remarks; my concern is solely with a demonstration of a common cognitive principle.

188 Martina Lampert

As argued in the previous section, the parenthesised sequences, though attentionally made more salient as the selected-out targets, differ neither in quality nor in quantity in any typographical feature(s): without any difference in a perceptual dimension of form, only separated off from the environment, no attenuating or activating effect is feasible; and this perceptual non-distinctness between parenthesised and non-parenthesised text may – plausibly and readily – invoke ambiguity, or irritation, in readers as to how to process the respective information in terms of priority – an (attention) effect that has been around ever since parentheses have been used (cf., e.g., Lennard 1991: 5). It is this perceptual (or cognitive) dimension that appears to be the source of the ambivalence,18 with targets perceptually indiscriminate from their surroundings and quite iconically inviting alternative attributions of salience to either the textual environment or the parenthesised sequence: in form, the lunulae both separate and integrate; and in function, the perceptual non-distinctness opens up reading alternatives between “aside and drama” as there is “nothing […] to prevent [the lunulae] from being […] emphatic” (Lennard 1991: 5). They only delimit a portion of (the same) text, or in Cummings’ case, two “texts” melted into each other as two layers of the same poem (cf. Tartakovsky 2009: 228–229).

This very effect of simultaneously available options among which to choose is well known from another cognitive system: in vision (which is the natural perceptual domain of print), the fundamental (per)ceptual phenomenon of Gestalt psychology’s figure-ground distinction gives rise to the reflexive cognitive ambivalence of visual illusions or bi-stable images as represented in Edgar Rubin’s classic.19 Either the vase is attended to as figure against two faces as (back)ground or the two faces as foregrounded figure against the vase as (back)ground.20 Translated into the medium of print, this effect, emerging from the same general cognitive principle, may lend lunulae their modality-specific profile: as a meta-linguistic/cognitive device, they instruct the reader to simultaneously attend to linguistic alternatives outside and inside of them; and attention will have to be divided to select among different processing options – in an attempt to “liberate” the text from its spatial linearity, or unidimensionality (cf. footnote 16), and

18 Cf. also Brown’s (2009) “ambivalent nature of the parenthesis” and his repeated appeal to the concept of attention, i.e., “foregrounded” vs. “unimportant” and attention vs. importance in the introductory lines of his “dissertation”.

19 On the associated implications of figure-ground reversal in vision see Palmer (1999: 280–287).

20 Perceptual psychology abounds in experiments that confirm a robust effect: having realised this ambivalence, a (per/con)ceiver’s perceptual system is likely to switch to and fro, and it is hard to keep attention on only one “interpretation” or effect (see Palmer 1999). By analogy, the two alternatives are accessible for a reader, who may, however, choose one option as primary.


creating, through competing readings, the illusion of conveying more than one message at a time. Accordingly, an individual reader may choose to either focus their processing capacity on the parenthesised or the non-parenthesised information – depending, in the case of the academic text, presumably on a particular reader’s expertise and knowledge; hence, the actual processing, then, proves a question of adequacy of understanding. In the poetic context, in contrast, the choice appears to be(come) an issue of preference, with the ensuing effect of surprise or delight, and thus, a matter of propensity for playfulness.

A final word goes to the poem, which, according to literary critic Alistair Brown (2009), stretches the edges of the pattern (too far?); he expresses his “verdict” that “the example […] is extreme […], useful for illustrating the range of possibility in the lunulae but hardly representative of the general use.” I would, however, argue that this conclusion is justified only at first sight: in attentional terms, the poem ‘only’ employs the principle of divided attention, which quite naturally invites strategic instrumentalisation – playing on the systematic ambiguity of what to attend to more. This option is indeed exploited by Cummings, yet entirely within the ‘legitimate’ confines admissible in the visual medium of print: “The synthesis of two different possibilities occurs here, but visually rather than metaphorically” (Brown 2009). My conjecture is rather that both the figural elements and the structural pattern – even in such an allegedly extreme case – “just” perform their general cognitive “task” along with its well-known effect, generating (per)ceptual ambiguity – which, if sensible, may indeed be quite disillusioning in the context of an avantgarde piece of art.

Against the distinctive dual nature of parenthetical delivery’s attentional profile with its (potential) selective activation and attenuation in the spoken mode, the pattern in the vision-based written modality rather suggests divided attention as an appropriate reference concept to capture parenthetical constructions’ modality-specific effect: well-known from Gestalt psychology’s figure-ground distinction, it entails that attention should be divided between two possible readings and thus creates the illusion of conveying two “messages” at a time. In particular, with respect to the device’s predictive potential I would argue that the difference in impact across the two sample registers selected for this paper “boils down” to an after-effect of surprise in the poem, resulting from the non-conformity to reader expectations associated with its conventionalised genre norms that are prevalent in formal (academic) writing.


7 New vistas: Balancing out cognitive determinants and situational constraints on parentheticity

Though definitely limited in both scope and variation of the constructional type, with only two samples scrutinised, this outline account may nevertheless have given at least a sense of the central argument, spelling out a principal cognitive determinant of parenthetical constructions in general in its medium-specific profile of print in particular: first, the nature of human cognition imparts, and allows for, certain forms of implementation that are shared across cognitive systems and is largely controlled by attention; as a second major effector, the tangible properties of the production and reception circumstances impose their constraints on the pattern’s structural options, manifesting a fundamental divide between the language modalities and giving rise to two distinct medium-sensitive attentional profiles of the parenthetical construction.

Based on the general principle of divided (visual) attention, parenthetical constructions emerge, in the written mode, as a register and genre-independent phenomenon. The pattern’s perceptual ambivalence “naturally” follows from the lunulae’s attentional activation as non-graphemes, on the one hand, and from the parenthesised sequence’s non-difference from its graphemic environment, on the other. With any palpable perceptual attenuation effect missing in the construction’s formal representation, neither a more nor a less activated sequence may be identified, unless one would permit, once more, that the bias toward the spoken modality be conceptually imposed on print, perpetuating the false ideology that writing is dependent on speech.

Thus, the crucial question is why parenthetical constructions exist in the first place, and why they are so pervasive in the written registers, given the principled option of this production circumstance for multiple revisability. One reasonable suggestion may be that, as a meta-cognitive device, the pattern allows for a conceived additional (or separate) information level, hence fictively circumventing the linearity constraint of the linguistic medium’s spatial two-dimensionality. With their general cognitive parameter of divided attention, parenthetical constructions convey two “messages” at a time, or, as Nunberg (1990: 115–116) has it, they – like quotations – “depart from a presumptive text”. If attenuation as a general feature were retained, equivalent processing of the alternatives regarding a specific ideational content in the linear form of progression available in print is precluded in principle, thus sacrificing a text’s adaptability to the mind-sets and expectations of individual readers that will play a decisive role in determining the preferred reading. Globally, then, parenthetical constructions indeed instantiate a convenient textual strategy that may both support and be (consciously) exploited for specific communicative purposes.

Apart from these general (register and genre-independent) cognitive implications, however, it proves an entirely empirical issue whether the suggested discourse function(s) – here restricted to the complementary logical relations of generalisation and specification (plausible as they may be for the selected cases) – might be hypothesised to hold across registers. In this vein, an in-depth corpus-based register analysis of representative samples that will pay respect to higher-level discourse functions and their expected complex, systematic interaction with situational characteristics is essential to ultimately determine the range of variation across the textual dimensions – whether a limited set of functional relations possibly constrains parenthetical constructions or whether any determination will rest on the unique interaction between the given text and an individual reading experience, probably with few or no a priori generalisations possible.

What the commonality of the two samples from disparate written registers may, however, indeed suggest is the significance of the communicative “task”, which perhaps proves a – or: the – decisive criterion of register variation (cf. Gray 2013: 364), being largely independent of the type of structural integration: while Cummings’ text with its clausal specimen rather invokes a presumptive oral written register, the academic article conforms to the stereotype of phrasal modification characteristic of formal writing; but in both cases the parenthetical construction manifests itself as a “cognitive ‘marker’ of written discourse, which can only be produced in circumstances that allow planning and manipulation of the text” (cf. Gray 2013: 368). Register analysis will certainly contribute its findings to provide insights into detailing the exact distribution of frequencies of distinctive and salient co-occurrence patternings across (sub)registers, paying respect to functional differences between registers in terms of their “internal coherence”, i.e., the degree of variation that they tolerate (Biber 1988: 26). Converging on the same observation of non-linearity or multi-layeredness (cf. Biber 1988: 21), the cognitive semantics view might offer a sensible motivation for the abstracted underlying functional dimension – effects resulting from the cognitive constraints that divided attention (dis)allows.


References

Biber, Douglas. 1988. Variation across speech and writing. Cambridge: CUP.

Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: CUP.

Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad & Edward Finegan. 1999. Longman grammar of spoken and written English. London: Longman.

Bredel, Ursula. 2008. Die Interpunktion des Deutschen. Tübingen: Niemeyer.

Brown, Alistair. 2009. Parentheses and ambiguity in poetry of the twentieth century. http://www.thepequod.org.uk/essays/litcrit/parenthe.htm (accessed 30 January 2015).

Corpus of Contemporary American English (COCA). corpus.byu.edu/coca (accessed 29 September 2015).

Cummings, E. E. 1973. Complete poems, 1904–1962. George J. Firmage (ed.). New York: Liveright Publishing Corporation.

Dehé, Nicole. 2014. Parentheticals in spoken English: The syntax-prosody relation. Cambridge: CUP.

Foundas, Anne L., Jeffrey R. Mock, David M. Corey, Edward J. Golob & Edward G. Conture. 2013. The SpeechEasy device in stuttering and nonstuttering adults: Fluency effects while speaking and reading. Brain and Language 126(2). 141–150.

Gray, Bethany. 2013. Interview with Douglas Biber. Journal of English Linguistics 41(4). 359–379.

Huddleston, Rodney & Geoffrey K. Pullum. 2002. The Cambridge grammar of the English language. Cambridge: CUP.

James, William. 1950 [1890]. The principles of psychology. New York: Dover Publications.

Kasimir, Elke. 2008. Prosodic correlates of subclausal quotation marks. ZAS Papers in Linguistics 49. 67–77.

Lampert, Martina. 1992. Die parenthetische Konstruktion als textuelle Strategie. Zur kognitiven und kommunikativen Basis einer grammatischen Kategorie. München: Otto Sagner.

Lampert, Martina. 2009. Attention and recombinance: A cognitive-semantic investigation into morphological compositionality in English. Frankfurt am Main: Peter Lang.

Lampert, Martina. 2011. Attentional profiles of parenthetical constructions: Some thoughts on a cognitive-semantic analysis of written language. International Journal of Cognitive Linguistics 2(1). 86–106.

Lampert, Martina. Forthcoming. Cognitive motivations of second(ary) voices: A multimodal perspective on parentheses and quotations. [Conference proceedings of the international workshop on secondary syntax: Parentheticals, vocatives, quotations. University of Bamberg, 6 December 2014].

Lennard, John. 1991. But I digress: The exploitation of parentheses in English printed verse. Oxford: Clarendon Press.

Nunberg, Geoffrey. 1990. The linguistics of punctuation. Stanford: CSLI.

Nunberg, Geoffrey. 1999. Introductory Essay to the Norton Anthology of English Literature, Seventh Edition. http://people.ischool.berkeley.edu/~nunberg/norton.pdf [1–22] (accessed 29 September 2015).

Palmer, Stephen E. 1999. Vision science: Photons to phenomenology. Cambridge, MA: MIT Press.

Patt, Sebastian. 2013. Punctuation as a means of medium-dependent presentation structure in English: Exploring the guide functions of punctuation. Tübingen: Narr.

Schwyzer, Eduard. 1939. Die Parenthese im engern und im weitern Sinne. Berlin: de Gruyter.

Talmy, Leonard. 2003. The representation of spatial structure in spoken and signed language. In Karen Emmorey (ed.), Perspectives on classifier constructions in sign language, 169–195. Mahwah, NJ: Erlbaum.

Talmy, Leonard. 2007a. Attention phenomena. In Dirk Geeraerts & Hubert Cuyckens (eds.), The Oxford handbook of cognitive linguistics, 264–293. Oxford: OUP.

Talmy, Leonard. 2007b. Recombinance in the evolution of language. Proceedings of the 39th annual meeting of the Chicago Linguistic Society: The panels. Chicago: Chicago Linguistic Society. 26–60.

Talmy, Leonard. Forthcoming. The attention system in language. Cambridge, MA: MIT Press. [draft version from 2010]

Tartakovsky, Roi. 2009. E. E. Cummings’s parentheses: Punctuation as poetic device. Style 43(2). 215–247.

Widdowson, Peter. 1990. W(h)ither English? In Martin Coyle, Peter Garside, Malcolm Kelsall & John Peck (eds.), Encyclopaedia of literature and criticism, 1221–1236. London: Routledge.

Stella Neumann and Jennifer Fest

Cohesive devices across registers and varieties: The role of medium in English

Abstract: The present paper aims at analysing varieties of English from a functional as well as regional perspective, arguing that these two parameters of variation differ, but are closely related in the way they influence and shape language. For that purpose, the six regional varieties of Singapore, Hong Kong, India, Canada, Jamaica and New Zealand are examined in a corpus-based approach drawing on the data from the International Corpus of English (ICE). All regional varieties are represented in the study by the same five registers: academic writing, administrative writing, broadcast discussions, conversations and exams.

The analysis focuses on the dimension of medium, which is examined in terms of three concrete linguistic markers: the use of pronouns, conjunctions and lexical density. The results clearly show differences along both regional and functional lines which allow comparative conclusions about the speech communities in question.

1 Introduction

English has a peculiar status amongst the languages found in different parts of the world. For numerous reasons it developed along diverse lines in various regions, resulting in a large number of different varieties spoken in almost all corners of the world (for an overview of 76 varieties including pidgins and creoles cf. Kortmann and Lunkenheimer 2013). These different regional varieties show particularities which depend on the socio-cultural background and history of the respective speech communities and the status English has in that context. New varieties continue to evolve, arguably because the role of English is still growing. One interesting question in this context is how to determine whether a speech community’s use of English can be categorised as a new variety with its own set of linguistic features (or whether the observed peculiarities are mistakes and the putative variety is simply learner language). Amongst the criteria that have been mentioned to determine whether a given use of English has emerged as a new variety is the development of a distinct set of registers (cf. Mollin 2007).

Stella Neumann, RWTH Aachen University
Jennifer Fest, RWTH Aachen University


The notions of functional variation, i.e. register, and regional variation are thus closely related (cf. Schubert, this volume). It should be noted that throughout this paper the notion of regional variation is used drawing on Halliday’s distinction between variation according to the language user versus variation according to language use (cf. Halliday 1978: 183): regional variation in this sense refers to speaker-related variation based on his/her geographical provenance in contradistinction to functional variation capturing context-related variation independent of the speakers’ personal background. ‘Regional’ in this sense is being used in a way that is thus broader than the more specific reference to dialects and related, more local varieties, which is more commonly used in variational linguistics.

Varieties of English have been described extensively both individually and comparatively (e.g. Kortmann and Szmrecsanyi 2004; cf. Section 2), and the same is true for register variation (e.g. Biber 1995; Neumann 2013). What is still largely missing is a systematic account of register variation across varieties of English, a notable exception being Xiao (2009; cf. Section 2). Apart from that, even though Systemic Functional Linguistics prioritises paradigmatic relations, i.e. the language user’s choice depending on the meaning s/he wants to express and according to different contexts, register-based research across varieties of English is still at its beginning and often focuses on individual linguistic features (cf. e.g. Güldering, this volume; Schaub, this volume).

This study presents a partial analysis of registers across different varieties of English as part of an ongoing research project that aims at taking stock of the differences and similarities in terms of register variation across varieties of English. In the framework of this corpus-based project, we examine six components of the International Corpus of English (ICE; Nelson, Wallis and Aarts 2002)1 and currently five of its text categories in order to collect findings for the different register parameters drawing on systemic functional register theory (e.g. Halliday and Hasan 1989). A previous study (Neumann 2012) provided a first set of findings on one subcategory for each of the three register parameters “field”, “tenor” and “mode” of discourse respectively, namely experiential domain, social distance and medium. This study takes up medium again, this time concentrating on cohesion, since the choice and frequency of cohesive devices reflect some interesting specificities of the spoken versus written medium.

Since linguistic features are polyfunctional, it should not come as a surprise that a register study re-analyses the same features in the light of different register parameters. In the case of this study, this is true for two of the three indicators

1 <http://ice-corpora.net/ice/> (accessed on 28 April 2015)


that will be examined in Section 4, which have also been discussed by Neumann (2012) in the context of other register parameters.

The remainder of the paper is organised as follows: Section 2 discusses the relationship between register and variety in more detail, thus motivating the approach chosen for this study. We will go on to summarise the corpus methodology including the operationalisation of medium as well as the concrete quantitative measures used in this study in Section 3 before discussing the results of the corpus analysis in Section 4. The paper closes with some concluding remarks in Section 5.

2 Variation across registers and varieties

Initially, “variety” is a cover term for different ways of using language. In an early paper, Gregory (1967), for instance, points out that the term covers as heterogeneous types of dialectal variety as idiolect (“Miss Y’s English”), temporal dialects (“Old English”), geographical dialects (“American English”), social dialects (“Upper Class English”) and what he tentatively calls standard dialects (“Standard English”). These types are closely related to the language user’s situation in time, space and society. Gregory continues to distinguish “diatypic variety”, which has since come to be known as “register”. This type of variety is based on use, depending on the situational context, regardless of the dialectal background of the language user in Gregory’s above sense. The term “variety” has become established as a term covering geographical and social dialects (acknowledging the increasing interaction between these two types of variation), and “register” is used to refer to functional variation, which is due to the recurring characteristics of the situational context. In the course of this development, the interrelation between variety and register has been neglected and scholars specialising in one of the two areas mention a potential impact of the other area only in passing – if at all (e.g. Biber 1995; Neumann 2013: 1).

When exploring register from the point of view of varieties of English, we are actually looking at the question of whether there is one English language with a certain range of registers and varieties as some kind of generalised dialects, as it were, or whether what we label as “English” is actually a loose collection of varieties more closely related to what one might call different languages.

In linguistic theorising, the notion of language is typically used to refer to some kind of abstract system which – depending on the theoretical stance – contains a collection of rules or the options to express different meanings. Functional language theories such as Systemic Functional Linguistics describe the relationship between the abstract system and concrete instances of language use as mediated by probabilities of (co-)occurrence of linguistic features (e.g. Nesbitt and Plum 1988; Halliday and James 1993; cf. also similar ideas in usage-based accounts of the cognitive family of theories). This entails a skewed distribution of features across different types of instances. Systemic Functional Linguistics specifically argues that this probabilistic distribution results in subsystems which filter the available features depending on the requirements or conventions of recurring situations (e.g. Matthiessen 1993; cf. from a sociolinguistic point of view Berruto 2004). The specific constellation of features in a given situational context is called register. Given this link between register and situation types, it appears plausible to assume that registers are constrained by the cultural context in which they occur: given the range of variation in terms of cultural contexts across varieties of English, it is unlikely for registers (i.e. situational contexts) to be congruent across varieties.

Sociolinguistics usually focuses on the description of varieties and often does not emphasise more general claims about language – even though current theorising in the cognitive family of theories tends to interact with sociolinguistics and in particular investigates areas traditionally associated with sociolinguistics (cf., for instance, Kristiansen and Dirven 2008). Descriptive linguistics, on the other hand, tends to ignore variety-specific features and take an immediate shortcut to general claims about the language system. A general account of a language system organised by register is still not the norm.2 A further stratification of language according to varieties is even less common, even though it may make sense to insert variety as an intermediate category between language and register (Berruto 2004). Significant differences in the type and number of registers as well as in their linguistic characterisation should affect the general description of the English language.

On the basis of this line of reasoning, we can conceive of the relationship between language, variety and register as follows: a language may consist of several varieties, and an established variety is partly identifiable as such because it has its own set of registers. More specifically, this means that the particular cultural context of a speech community gives rise to a specific set of situation types which are linked to specific linguistic registers.

One way of verifying this model and – if shown to be viable – of using it for systematic descriptions is to analyse corpora covering a broad range of situations

2 Examples of how the register perspective can be integrated in general descriptions of English are the fourth edition of Halliday’s introduction to functional linguistics (Halliday and Matthiessen 2013) and the Longman Grammar of Spoken and Written English (Biber et al. 1999).


across a broad range of (potential) varieties. Existing related corpus-based investigations either examine a range of linguistic features suitable for claims about registers (e.g. van Rooy et al. 2010) but are restricted to one or at least a narrow range of varieties, or, if comparative, focus on only a few features across a wider range of varieties (e.g. Kortmann and Szmrecsanyi 2004; Nelson 2006; Sand 2004, 2008). A notable exception is Xiao (2009), who adopts Biber’s Multi-Dimensional Approach (e.g. Biber 1995) to compare register variation across five components of the International Corpus of English (ICE). His study is particularly welcome, since it addresses a central shortcoming of Biber’s (1995) study, namely the re-analysis of features developed originally for the study of variation between spoken and written English for a general analysis of register variation in English. By drawing on Biber’s methodology, however, Xiao (2009) also encounters the same methodological limitations of a strongly inductive approach (cf. Neumann 2012; for a critique of factor analysis as a standard statistical technique, cf. also Diwersy, Evert and Neumann 2014).

Although the call for a corpus-linguistic approach to investigating the relationship between variety and register appears straightforward, there are some hard methodological problems which need to be kept in mind when attempting this approach. Assuming – as we do here – that registers differ across diverging cultural contexts, strictly speaking, a variety-specific corpus design is required in order to obtain evidence of differences in registers. However, a corpus design reflecting these diverging cultural contexts may result in the collection of registers which are incomparable across varieties, thus not allowing us to make specific claims about the deviation of similar registers. A corpus design common to the different varieties, as used for the components of the International Corpus of English, avoids this problem by using a fixed set of text categories. This approach, in turn, is at least problematic, because it privileges the analysis of comparable language use and blurs divergences between seemingly comparable registers.3 In the worst case, it may lead to artefacts of the corpus analysis, because texts were collected as specimens of a category according to the common corpus design which do not represent a recurring situation type, and hence a register, in a given variety. Against this background, the results obtained from the analysis of the International Corpus of English, which will be described in the following sections, need to be treated with caution. If it is possible to show differences between corpus texts for a comparable text category across varieties, this, at least, indicates that there could be underlying differences between the populations.

3 This is exactly the problem that Biber’s (1995) comparative corpus design resolves.

3 Methodology

3.1 Data

As already mentioned in the introduction, the texts that were used for our analysis were extracted from the International Corpus of English, a comparative corpus of English worldwide, which contains spoken and written data from a whole range of varieties in the form of different components. Several additional components are currently being collected.

This study adopts the approach to the corpus analysis introduced by Neumann (2012) and thus analyses five different text categories from six ICE components. It should be noted that “text category” is a notion of the corpus compilers which is taken here to provide a rough estimate of what could turn out to be registers. The same is true for the notion of component. Again this is a label for sub-corpora representing what the compilers identified as a variety of English. In what follows, we will assume that the text categories in the ICE components can be roughly equated to registers in varieties. The registers examined are:

AcWrit: Academic writing from the natural sciences (file numbers W2A-021 – W2A-030 of the original corpus design)
AdWrit: Administrative writing (W2D-001 – W2D-010)
BCDiscs: Broadcast discussions (S1B-021 – S1B-040)
Conv: Conversations (S1A-001 – S1A-030)
Exams: Timed exams (W1A-011 – W1A-020)

Altogether, the set totals 80 files per component, with 50 files for two spoken and 30 for three written categories. This roughly mirrors the design of the ICE collection, which contains more spoken than written data, although to a slightly lesser degree. Note that the standard ICE design uses a fixed set of 500 files, where the individual file may, depending on the text category, consist of several texts. Usually, the different texts in one file are not identified by individual IDs but are simply marked by the tag in the internal structure of the file. This poses a problem for register studies, where the unit of analysis is the text (not the file). As it is consequently impossible to compute frequencies per text, we lump all frequencies per file in each text category together in one value in Section 4.
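The file selection and the pooling of frequencies described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names are hypothetical, the file identifier format is simply assumed to follow the ranges listed above, and the normalisation base of 1,000 tokens is an assumption made for illustration only.

```python
# Text categories and file ranges as listed in the text above.
CATEGORY_RANGES = {
    "AcWrit": ("W2A", 21, 30),
    "AdWrit": ("W2D", 1, 10),
    "BCDiscs": ("S1B", 21, 40),
    "Conv": ("S1A", 1, 30),
    "Exams": ("W1A", 11, 20),
}


def file_ids(prefix, first, last):
    """Expand a range such as W2A-021 to W2A-030 into individual file IDs."""
    return [f"{prefix}-{n:03d}" for n in range(first, last + 1)]


def selected_files():
    """Map each text category to its list of file IDs."""
    return {cat: file_ids(*rng) for cat, rng in CATEGORY_RANGES.items()}


def pooled_rate(hits_per_file, tokens_per_file, per=1000):
    """Pool raw counts over all files of one category, then normalise once.

    `per` is a hypothetical normalisation base, not taken from the study.
    Pooling yields a single value per category rather than one per text.
    """
    total_tokens = sum(tokens_per_file.values())
    total_hits = sum(hits_per_file.get(f, 0) for f in tokens_per_file)
    return per * total_hits / total_tokens
```

Pooling in this way gives one frequency per text category and component, which matches the constraint that individual texts within a file cannot be told apart; it also means that longer files weigh more heavily in the pooled value than shorter ones.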

Both spoken categories are classified as dialogic and unscripted, the only difference being that conversations are identified as private, whereas broadcast discussions are marked as public. They are distinguished from broadcast news and broadcast talks, which are marked as monologic and scripted.


The three written registers, too, represent a certain amount of diversity. Timed exams are classified as non-printed writing produced by students. Academic and administrative writing are classified as printed, which can be taken to entail a non-spontaneous nature of the texts. Academic writing is narrowed down for this study to texts from the natural sciences. It is different from popular writing, which also includes texts from the natural sciences, yet can be assumed to aim at a different audience. Lastly, administrative writing is categorised as a type of instructional writing.

The components selected for analysis represent

Canadian English (CAN)
Hong Kong English (HK)
Indian English (IND)
Jamaican English (JA)
New Zealand English (NZ)
Singapore English (SIN).

These varieties represent sufficiently different socio-cultural situations and cover different types of variety. Drawing on the categories used by eWAVE, the electronic World Atlas of Varieties of English (Kortmann and Lunkenheimer 2013), these can be classified as high contact L1 (NZ, CAN) and indigenised L2 varieties (HK, IND, JA, SIN; for details of the classification cf. Neumann 2012). We use the corpus version annotated with part-of-speech information based on the CLAWS7 tagset using the Wmatrix interface (Rayson 2009) as provided by the ICE Corpus team.

Neumann (2012) documents a number of technical difficulties which can be summarised in the following three categories: firstly, mark-up is not entirely standardised (some types of encoding are optional; cf. Wong, Cassidy and Peters 2011). This leads to a fair amount of incomparability between components, which is harmful to the type of register studies discussed here. Secondly, sloppy implementation of the mark-up in some components leads to errors in query output and furthermore is inherited by all later annotation stages (including the part-of-speech annotation we use here). This is also true for the third type of difficulty, the mark-up approach to extra-corpus text. Enclosing extra-corpus text such as editorial comments by tags will lead to the content being included in corpus queries instead of treating it as additional information. Since the present study draws on the non-revised corpus version, its findings are liable to inaccuracies due to these three areas of difficulties.


3.2 Operationalising cohesion

The notion of register, clearly distinctive as functional language variation against variation based on social or regional factors, is identified by the three rather abstract parameters of field, mode and tenor of discourse. Since these register parameters are too general to allow the formulation of concrete corpus queries, derivation of more precise linguistic features is a necessary step in the analysis. Register analysis in the systemic functional framework draws on a stepwise operationalisation of indicators for the abstract parameters by way of intermediate categories, thus avoiding the risk of overgeneralisation by using individual and fairly shallow linguistic features to make far-reaching claims about groups of texts. An example for the stepwise operationalisation could be channel as a specification of mode of discourse, the parameter concerned with the way language is typically structured in a given situational context. Channel refers to the physical way in which language is transmitted in a given register. It could be phonic or graphic with ensuing restrictions for the linguistic features used in the given context. If, for instance, language is transmitted only via the graphic channel, prosody cannot be used to foreground (or background) information, but rather syntactic means such as cleft constructions in English. This means that phonological and syntactic features used for structuring information may be interpreted as operationalisations of channel to decide which particular channel is characteristic of a given register, thus, at the next level of interpretation, also characterising the mode of discourse of this register. While channel would appear a fairly straightforward category which can be determined without extensive linguistic analysis, computer-mediated communication, such as, for example, chat communication, seems to defy traditional classifications of channel, thereby making a linguistic analysis of the specifics of the electronic channel appear useful (for a more detailed introduction to the derivation of register indicators, cf. Neumann 2013).

For this study, the intermediate category of medium, or, more precisely, the way in which spoken and written medium affects the organisation of language in texts (Halliday and Hasan 1989), and its underlying phenomenon of cohesion are selected from the area of mode of discourse. Cohesion refers to the linguistic means which make a text hang together (Halliday and Hasan 1976).

Spoken and written language can be said to vary – amongst other indicators – in the preferred type of cohesive devices and their frequency. For instance, as a corollary of their context dependence, spoken registers tend to draw more on pronominal means to link clauses (Biber et al. 1999: 237); written texts, in particular those where the audience is unknown to the writer, will spell out more lexical information. The registers, independent of the variety they originate from, can be supposed to have these indicators in common at least to a certain degree. Taking the perspective of comparing varieties, we can expect variation in the reliance on specific cohesive devices depending on the extent of register variation within the variety as well as specific socio-cultural characteristics of the speech community such as literacy.

Halliday and Hasan (1976) describe five cohesive devices, namely reference, ellipsis, substitution, conjunction and lexical cohesion. Our analysis presents a closer examination of features related to reference, conjunction and lexical cohesion in the form of pronoun frequency, frequency of conjunctions and lexical density. An example from the corpus visualises these linguistic features quite clearly:

Ivan Pavlov, was the first person studying about classical conditioning in 1903. His demonstration was about the salivating of dog. He noticed that dogs accustomed to the procedure would start salivating before the meat powder was presented. These are considered as unconditioned stimulus and unconditioned response and they occur without previous conditioning. Ivan then used the ringing of a bell as another stimulus to be paired as a presentation with the meat powder. After a number of times, he found that the dogs responded by salivating to the sound of bell alone. (ICE Hong Kong, text W1A-004, Student Essays)

Personal pronouns serve as a basic form of (personal) reference, i.e. the realisation of the same or similar referential meaning by different linguistic expressions. Typically, a full lexical item (including phrases) as an antecedent is taken up in the ensuing text by a pro-form, especially personal pronouns, articles or demonstratives. The excerpt given above includes several such references: Ivan Pavlov, who is the sole human actor in this paragraph, is referred to using the pronouns he and his. Furthermore, the cause and effect of his experiments are referred to as these and they.

In accordance with previous studies and as mentioned above, pronominal reference is considered a characteristic typical of spoken registers (cf. Halliday and Hasan 1976; Biber et al. 1999). Lexical cohesion refers to links between text chunks by repeating a previously used lexical item or by replacing it by a semantically related one. Typical indicators include various types of sense relations; this study, however, examines a summary indicator, namely lexical density, which summarises the overall role of the vocabulary in texts and is said to be higher in written language (cf. Biber et al. 1999: 62; Halliday 2001). In the example from the corpus, there are 43 function words and 49 content words (based on the part-of-speech tagging included in ICE HK), and the 92-word paragraph therefore shows a lexical density of 53.26 %.
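The lexical density figure for the example paragraph can be retraced with a short calculation. The sketch below assumes that lexical density is the share of content words among all words, expressed as a percentage, which is what the reported counts imply; the function name is hypothetical.

```python
def lexical_density(content_words, function_words):
    """Share of content words among all words, as a percentage."""
    total = content_words + function_words
    return 100.0 * content_words / total


# Counts reported for the ICE Hong Kong excerpt (W1A-004).
density = lexical_density(content_words=49, function_words=43)

print(round(density, 2))  # 53.26
```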

Conjunctions represent a third type of cohesive device: they mark the logico-semantic relationships between linguistic units, rather than operating by replacing linguistic units. Consequently they represent transitions between messages (cf. Halliday and Matthiessen 2013: 655). In their discussion of linking adverbials, i.e. adverbials serving to mark logico-semantic relationships, Biber et al. (1999: 884–887) report some clear differences between spoken and written registers, not just with respect to the specific items chosen but also to the frequency of linking devices, with a higher frequency in spoken registers. The example contains two very obvious instances, namely the conjunction and, operating once as a phrase-level and once as a clause-level connector in “These are considered as unconditioned stimulus and unconditioned response and they occur without previous conditioning.”

Halliday and Matthiessen (2013: 657) rightly distinguish clause complexing (in the realm of grammar) from conjunction (in the realm of cohesion) and point out how they complement each other across spoken and written registers. This raises the question of how well a quantitative corpus study can distinguish between grammatical and cohesive features. The approach to the corpus-based analysis of cohesion taken in this paper is liable to a methodological caveat. Although cohesive devices may also operate within the clause, grammar is the main locus of linking elements inside the clause. Consequently, cohesion obtains mainly between clauses. This study does not inspect the cohesiveness of each individual occurrence of the pronouns and conjunctions retrieved from the corpus query. The reported results therefore have to be seen as providing a first indication of certain register characteristics only.

3.3 By all means: Measures for the comparison of registers and varieties

The approach of studying language variation both from a functional as well as a regional point of view requires several successive steps in the analysis. A search for register features in only one variety would lead to narrow results and not fully take into account functional and regional aspects as separate, yet related aspects of language variation. We therefore first examine variation between registers, before focusing on the variation between varieties and ultimately combining individual results. All frequencies are given as percentages relative to the overall number of tokens per file.
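How a relative frequency of this kind might be obtained from part-of-speech-tagged tokens can be sketched as follows. The token list is hand-tagged for illustration and is not corpus output. Which CLAWS7 tags were counted as pronouns and conjunctions is not stated in this section, so the prefix sets below (PP for personal pronouns, CC and CS for coordinating and subordinating conjunctions) are assumptions, as is the exclusion of punctuation from the token count.

```python
# Hand-tagged toy tokens; the CLAWS7-style tags are illustrative only.
tagged_tokens = [
    ("he", "PPHS1"), ("found", "VVD"), ("that", "CST"), ("the", "AT"),
    ("dogs", "NN2"), ("responded", "VVD"), ("and", "CC"),
    ("they", "PPHS2"), ("salivated", "VVD"),
]

# Assumption: personal pronoun tags start with "PP",
# conjunction tags start with "CC" or "CS".
PRONOUN_PREFIXES = ("PP",)
CONJUNCTION_PREFIXES = ("CC", "CS")


def relative_frequency(tokens, prefixes):
    """Percentage of tokens whose tag starts with one of the prefixes."""
    hits = sum(1 for _, tag in tokens if tag.startswith(prefixes))
    return 100.0 * hits / len(tokens)


pronoun_pct = relative_frequency(tagged_tokens, PRONOUN_PREFIXES)
conjunction_pct = relative_frequency(tagged_tokens, CONJUNCTION_PREFIXES)

print(round(pronoun_pct, 2), round(conjunction_pct, 2))  # 22.22 22.22
```

In the study itself, the denominator would be the token count of a whole corpus file rather than a single clause.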

In a first step described in detail in Section 4.1, the three cohesive devices relevant for this study are examined within each register across all six varieties against the grand mean, i.e. the arithmetic mean of the relative frequencies of the respective feature across varieties and registers (Neumann 2012: 84). This grand mean functions as a reference value for the distribution of features across all varieties and registers, thus describing the respective feature without any restrictions. It is a necessary benchmark in order to put the register- and variety-specific results into perspective. We thus give the magnitude of difference between the grand mean for the respective feature and the specific value for one register in one variety by subtracting the grand mean across varieties and registers from the specific value, so that positive values indicate frequencies above the grand mean.
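The relation between a specific value and the grand mean can be illustrated with the pronoun figures from Table 1 in Section 4.1. The sign convention follows that table, where registers with above-average frequencies carry positive values, i.e. the difference is the specific value minus the grand mean. The sketch below, with illustrative variable names, recovers one underlying percentage and checks that the 30 reported differences sum to zero, as deviations from an unweighted arithmetic mean should.

```python
GRAND_MEAN = 5.59  # personal pronouns, % of tokens (Table 1)

# Differences from the grand mean per variety, in the column order
# AcWrit, AdWrit, BCDiscs, Conv, Exams (Table 1).
TABLE_1 = {
    "Canada": [-4.80, -2.93, 4.01, 8.70, 0.47],
    "Hong Kong": [-4.87, -4.27, 2.97, 6.56, -2.13],
    "India": [-4.94, -4.06, 1.51, 5.98, -2.29],
    "Jamaica": [-5.21, -4.64, 4.63, 8.47, -2.64],
    "New Zealand": [-4.89, -2.98, 1.96, 5.15, -2.33],
    "Singapore": [-4.44, -3.50, 4.37, 8.92, -2.78],
}


def difference(value, grand_mean):
    """Difference from the grand mean: positive means above average."""
    return value - grand_mean


# Recover the relative frequency behind one cell (Canadian conversations).
canada_conv = GRAND_MEAN + TABLE_1["Canada"][3]

# Deviations from an unweighted mean of all cells cancel out.
total = sum(sum(row) for row in TABLE_1.values())

print(round(canada_conv, 2))  # 14.29
print(round(difference(canada_conv, GRAND_MEAN), 2))  # 8.7
print(abs(round(total, 2)))  # 0.0
```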

In contrast to this register-oriented characterisation, a second cycle analyses the cohesive devices based on their occurrence within each variety, again as the magnitude of difference from the grand mean, but now disregarding register. Neumann (cf. 2012: 84) describes the related variety mean, i.e. the arithmetic mean of the relative frequency of a feature within a variety across all registers. This study compares other types of descriptive statistics represented by boxplots (cf. Section 4.2).

The major purpose of the final combinatory step is to compare the range of variation that register features display in a variety, thus allowing conclusions about register-specific characteristics. The cohesive features of pronominal reference, conjunctions and lexical density will be analysed in terms of the range of variation across registers within one variety. These range values for the three cohesion indicators are then processed to obtain the mean range of variation, showing the overall degree of register variation in the variety.

As stated in Section 3.1 above, it is impossible to calculate occurrences per text given the structure of the components of the International Corpus of English used for this study. As a consequence, we cannot subject the data set to meaningful mean-based inferential statistics, even though this would allow us to examine the interaction between register and variety statistically. In Section 4, all results will therefore be reported in the form of descriptive statistics.

4 Analysis

4.1 Comparison of the registers

The variation in our corpus which was found on the basis of register showed results that were, for the most part, in line with what has been found about spoken and written language previously (e.g. Biber et al. 1999). This section will look at the results for the individual linguistic features, namely pronouns, conjunctions and lexical density, in more detail and examine their distribution within registers.

The relative frequency of pronouns given as the difference from the grand mean (cf. Section 3.3 for the calculation of the values) in Table 1 is clearly higher in the spoken registers of broadcast discussions and conversations. In these two categories, the values throughout all varieties are above the grand mean, while the written registers of academic writing, administrative writing and timed exams generally display values below the grand mean, with the sole exception of timed exams in Canadian English.

Table 1: Personal pronouns per tokens presented as the difference from the grand mean in percentage points

Pronouns across varieties and registers (grand mean): 5.59 %

Variety      Academic Writing  Administrative Writing  Broadcast Discussions  Conversations  Exams
Canada       –4.80             –2.93                   4.01                   8.70           0.47
Hong Kong    –4.87             –4.27                   2.97                   6.56           –2.13
India        –4.94             –4.06                   1.51                   5.98           –2.29
Jamaica      –5.21             –4.64                   4.63                   8.47           –2.64
New Zealand  –4.89             –2.98                   1.96                   5.15           –2.33
Singapore    –4.44             –3.50                   4.37                   8.92           –2.78

These values, always in comparison to the grand mean, are not unexpected. They do, however, give a first indication of the differences between the registers when compared across the varieties. Although pronouns are frequent in spoken registers in general, Canadian English, Jamaican English and Singapore English stand out in both broadcast discussions and conversations as containing considerably more instances of pronouns than the other three varieties.

Although these variety-based differences are less distinct in the written categories, here, too, Jamaican English stands out as deviating most from the grand mean in all but the timed exams, where Singapore English displays a slightly clearer divergence from the mean. Canadian English, on the other hand, demonstrates less distinctive pronoun frequency in the written registers and stands out as the only variety to contain more pronouns in timed exams than indicated by the grand mean.4

4 This particularity should be treated with caution. The texts in this category in the Canadian ICE component show some striking similarities in topic, which suggests a glitch in the corpus compilation.


Figure 1: Boxplot5 of the percentage of personal pronouns per tokens for each variety in the five registers

The boxplots in Figure 1 show striking register differences, with academic writing and administrative writing having a low frequency and broadcast discussions and conversations displaying high frequencies. Exams display slightly higher frequencies than the other written registers. Interestingly, the range of variation across varieties is higher in the spoken registers with negligible variation in academic writing. Writers in this register seem to align to some common convention in the use of pronouns (cf. Section 4.4).

Similarly, interesting observations can be made when looking at the registers separately. As shown by the diverging extension of the boxes for each text category, which indicates the range of variation of the middle 50 % of all observations, there is much less variation in academic writing and exams across varieties than in the other text categories (with the mentioned outlier of Canadian English). The other three registers show a broader range of the usage of this linguistic feature, most strikingly so the spoken fields of broadcast discussions and conversations. In the first case, the difference from the grand mean varies between 1.51 in India and 4.63 in Jamaica, while in the latter the range is from 5.15 in New Zealand to 8.92 in Singapore.

5 Boxplots contain information on the smallest and largest observation in a category (by the whiskers), the interquartile range, i.e. the middle 50 % of all observations (by the box) and the median, i.e. the value separating the higher half of the observations from the lower half (by the line in the box). Outliers, i.e. observations clearly distant from the other observations, are plotted as points.
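The summary statistics behind such a boxplot can be sketched for the register of conversations, using the six pronoun values implied by Table 1 (grand mean plus difference). The sketch assumes that each box summarises one value per variety; the quartile convention of the plotting software used for the figures is not stated, so the "inclusive" method chosen here is an assumption and the resulting interquartile range is only indicative.

```python
import statistics

GRAND_MEAN = 5.59  # personal pronouns, % of tokens (Table 1)

# Differences from the grand mean for conversations (Table 1).
conv_diffs = {
    "Canada": 8.70, "Hong Kong": 6.56, "India": 5.98,
    "Jamaica": 8.47, "New Zealand": 5.15, "Singapore": 8.92,
}

# Convert the differences back into percentages of tokens.
values = sorted(GRAND_MEAN + d for d in conv_diffs.values())

# Quartiles; the "inclusive" interpolation method is an assumption.
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")

summary = {
    "min": values[0],                      # lower whisker end
    "max": values[-1],                     # upper whisker end
    "median": statistics.median(values),   # line in the box
    "iqr": q3 - q1,                        # height of the box
}

print({name: round(value, 3) for name, value in summary.items()})
```

Because all six values are shifted by the same grand mean, the range and the interquartile range are identical whether they are computed on the percentages or on the differences.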


Figure 2 containing the boxplot of lexical density shows an almost completely complementary picture to Figure 1 with only small differences in the range of variation. Again, timed exams are situated between academic writing and administrative writing, at the one end, and broadcast discussions and conversations at the other end.

Figure 2: Boxplot of the lexical density for each variety in the five registers

Like the use of pronouns, the values for lexical density displayed in the texts meet the expectations depending on their register, as can be seen in Table 2.

Table 2: Lexical density for each register presented as the difference from the grand mean in percentage points

Lexical density across varieties and registers (grand mean): 51.00 %

Variety      Academic Writing  Administrative Writing  Broadcast Discussions  Conversations  Exams
Canada       7.45              5.30                    –6.90                  –9.43          –1.27
Hong Kong    7.36              3.58                    –5.01                  –8.34          2.23
India        7.37              5.90                    –3.96                  –8.26          6.43
Jamaica      7.23              6.57                    –6.84                  –9.69          2.72
New Zealand  6.01              4.87                    –5.45                  –9.74          1.19
Singapore    6.41              6.03                    –5.45                  –8.43          2.33

All varieties show the highest degree of lexical density in the field of academic writing, and there is no clear variation within the register. Administrative writing, too, yields only values above the grand mean. In this register, however, Hong Kong English stands out as having a rather low lexical density, especially in comparison to the higher value in Jamaican English. The difference between the two varieties amounts to 2.99 percentage points.

In contrast to academic writing and administrative writing, the last written category, which contains texts from timed exams, appears surprisingly diverse. Here, Hong Kong English, Jamaican English and Singapore English are closest, ranging only between 2.23 and 2.72 percentage points. New Zealand English shows a less distinct lexical density than these three. In contrast, Indian English shows a value of 6.43, making it the only variety to nearly reach the degree of lexical density it displays in the register of academic writing and surpassing that of administrative writing. Indian English shows the highest average of lexical density in written language, as opposed to Hong Kong English with the lowest.

The spoken registers, on the other hand, exclusively present degrees of lexical density below the grand mean. The register of conversations yields the lowest values for lexical density in all varieties, with Canadian English, Jamaican English and New Zealand English with nearly identical values at the high end of the range, and Hong Kong English, Indian English and Singapore English grouped around a lower range value. In total, however, the range between the two most distant components is no more than 1.48. In comparison, the register of broadcast discussions shows slightly more internal variation. Here, Canadian English and Jamaican English are furthest below the grand mean with values of 6.90 and 6.84, and Hong Kong English, New Zealand English and Singapore English cluster between 5.01 and 5.45, while Indian English again stands out with a rather high lexical density (3.96). This variety seems to rely fairly strongly on lexical means to create cohesion.

While the analyses of the usage of pronouns and lexical density proved to comply with earlier studies of spoken and written registers (cf. Biber et al. 1999: 65, 333–334), the frequency of conjunctions, the last linguistic feature in our research, provides many unexpected values for the six varieties. Neither spoken nor written registers appear consistent in displaying values above or below the grand mean, with the sole exception of the category of broadcast discussions, which, however, still contains a considerable amount of intra-register variation, as Table 3 shows.


Table 3: Conjunctions per tokens presented as the difference from the grand mean in percentage points

Conjunctions across varieties and registers (grand mean): 6.05 %

Variety      Academic Writing  Administrative Writing  Broadcast Discussions  Conversations  Exams
Canada       0.09              –0.14                   0.72                   0.45           0.76
Hong Kong    –0.55             –0.98                   0.43                   0.48           –0.72
India        –0.64             –0.39                   0.69                   0.18           –1.15
Jamaica      –0.84             –0.63                   1.21                   1.01           0.07
New Zealand  –0.20             0.76                    0.53                   0.60           0.23
Singapore    –0.44             –1.51                   0.91                   –0.72          –0.02

Like the use of pronouns and the degree of lexical density, the use of conjunctions in the spoken registers is particularly pronounced in Jamaican English. But while other varieties, mainly Canadian English and Singapore English, show an almost identical distinction for the first two features, Jamaican English stands out regarding conjunctions.

Canadian, Hong Kong, Indian and New Zealand English do not show any extraordinary values in the spoken registers, varying only slightly between broadcast discussions and conversations and displaying values in both categories above the grand mean. Singapore English, in contrast, is the only variety that yields a frequency of conjunctions clearly below the grand mean in the register of conversations. This makes broadcast discussions the only one of the five registers that complies with what is usually observed in spoken language, namely an above-average use of conjunctions in comparison to the grand mean and especially to written texts. Given the setting of broadcast discussions involving several speakers who all contribute to a particular topic, the register is likely to have some argumentative character. The frequent use of conjunctions appears particularly suitable for linking the arguments across clauses.

The written registers in this study each display exceptions, too. In academic writing, Jamaican English is again the variety which deviates furthest from the grand mean, but Canadian English is the only example of a variety using more conjunctions than on average, if only marginally so. In administrative writing, it is New Zealand English that deviates clearly; the use of conjunctions in this variety may reflect some argumentative style in presenting the administrative contents. In contrast, Singapore English shows the strongest tendency towards the written medium by using clearly fewer conjunctions in comparison to the grand mean. The most peculiar values, however, can be found in the register of timed exams. Indian English and, with quite some distance, Hong Kong English, rely much less on conjunctions, both showing a frequency of this cohesive device below the average given by the grand mean. Singapore English almost equals the grand mean. New Zealand English, Jamaican English and Canadian English, on the other hand, contain more conjunctions in exams than the average. Although this makes timed exams the most diverse register, it has to be kept in mind that exams are written by students. Depending on the age and educational degree of the examinees as well as the topic of the exam, their styles of writing will thus differ from each other due to factors other than their language variety alone.

Figure 3: Boxplot of the percentage of conjunctions per tokens for each variety in the five registers

While the boxplots for pronouns and lexical density (Figure 1 and Figure 2) display clear differences between the registers, the differences are much less distinctive for conjunctions (see Figure 3). Academic writing and administrative writing have an almost identical median in terms of the relative frequency, i.e. not in comparison to the grand mean. Only the range of variation across varieties is larger in administrative writing. The other three registers display higher medians.

4.2 Summing up varieties

While the tables and figures above show values per register for all six varieties, this section sums up the register variation which the varieties display for every one of the linguistic features. So far, every variety rendered five values, one per register, for the use of pronouns, conjunctions and lexical density. Every one of these values determines the register-specific distinction of this feature by comparing it to the grand mean. The distribution of each linguistic feature within a variety will be examined with the help of boxplots displaying the descriptive statistics for each variety for a linguistic feature across text categories.

As can be seen in Figure 4, Jamaican English renders the highest range of variation in terms of relative pronoun frequency, closely followed by Singaporean and Canadian English. The latter is clearly distinct from the other varieties because it displays the highest median, whereas all other varieties have an almost identical median. New Zealand and Indian English show the smallest range for the use of pronouns.

Figure 4: Percentage of pronouns per tokens represented as variation of text categories per variety

The picture looks a little more diverse for lexical density of the different varieties (as shown in Figure 5). Here, Canadian English and again Jamaican English display the highest overall range, yet their medians differ greatly, showing that Canadian English, as it did for pronouns, stands out, this time with the lowest median. In terms of range of variation, Jamaican English shows considerable variation across text categories as visible from the widest interquartile range.6 Indian English has the highest median of all varieties, reflecting what was said from the register perspective in Section 4.1. Hong Kong, Jamaican and Singaporean English are almost identical in terms of the median, whereas New Zealand English displays a slightly lower median in comparison with these three varieties. This suggests a tentative interpretation of a similarity between the two L1 varieties, as Canadian English and New Zealand English are the two varieties with the lowest median value for lexical density. Apparently these two L1 varieties have a slightly reduced tendency to draw on lexical means to create cohesion in the registers under investigation.

6 The interquartile range visualises the middle 50 % of all observations.

Figure 5: Lexical density represented as variation of text categories per variety

The most diverse results were yielded for conjunctions (Figure 6). This is hardly surprising, as the closer analysis in the previous section already pointed in this direction; yet an overview over the varieties makes this even more obvious. The two L1 varieties Canadian and New Zealand English are again clearly similar: both display narrow ranges here and the median is rather high with 6.5 (CAN) and 6.6 (NZ). Similar to the findings for lexical density, Jamaican English again has the highest range for frequency of conjunctions, in particular when considering the interquartile range. Its median is relatively high, at least in comparison to the other L2 varieties. Compared to the other L2 varieties, Indian English has a relatively small interquartile range. Singaporean English, despite most values being clustered around a median of 5.7 %, contains two outliers with considerably low as well as high deviations from the median. The two East-Asian varieties (Hong Kong and Singapore English) are similar in displaying the lowest median for conjunctions.


Figure 6: Percentage of conjunctions per tokens represented as variation of text categories per variety

4.3 Synopsis across indicators

The last step to reach a value that represents the register variation within a variety is the combination of the ranges obtained for the individual linguistic features described in the previous section. The absolute range from the highest to the lowest observation across text categories is calculated for every feature, and the three values are added up and their mean obtained, as shown in Table 4.

Table 4: Ranges of variation – Cohesion

Indicator        Canada  Hong Kong  India  Jamaica  New Zealand  Singapore
Pronouns         13.51   11.42      10.92  13.68    10.04        13.36
Lexical density  16.88   15.69      15.63  16.92    15.75        14.84
Conjunctions     0.91    1.46       1.84   2.05     0.96         2.42
Sum of ranges    31.30   28.57      28.39  32.65    26.76        30.62
Mean of ranges   10.43   9.52       9.46   10.88    8.92         10.21
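The calculation of the ranges and their mean can be retraced for a single variety. The sketch below uses the Jamaican English rows of Tables 1 to 3; since all values of a feature are offset by the same grand mean, the range of the differences equals the range of the underlying percentages. The variable names are illustrative.

```python
# Differences from the grand mean for Jamaican English, in the column
# order AcWrit, AdWrit, BCDiscs, Conv, Exams (Tables 1, 2 and 3).
JAMAICA = {
    "pronouns": [-5.21, -4.64, 4.63, 8.47, -2.64],
    "lexical density": [7.23, 6.57, -6.84, -9.69, 2.72],
    "conjunctions": [-0.84, -0.63, 1.21, 1.01, 0.07],
}

# Absolute range per feature: highest minus lowest observation.
ranges = {
    feature: round(max(values) - min(values), 2)
    for feature, values in JAMAICA.items()
}

sum_of_ranges = round(sum(ranges.values()), 2)
mean_of_ranges = round(sum_of_ranges / len(ranges), 2)

print(ranges["pronouns"], ranges["lexical density"], ranges["conjunctions"])
# 13.68 16.92 2.05
print(sum_of_ranges, mean_of_ranges)  # 32.65 10.88
```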

Since this study only analysed three linguistic features, observations in the varieties do not differ much from each other, yet some tendencies can be observed that allow some conclusions. As could already be deduced from the previous discussion, Jamaican English displays the highest register variation for the three indicators as represented by the sum and mean of ranges, if only slightly so. Canadian and Singapore English are almost even and close to Jamaican English; all three display a reasonable amount of variation in their use of cohesive devices, which can be traced back mainly to distinct deviations in the spoken registers. Indian and Hong Kong English, on the other hand, show less variation. The different registers, representing spoken and written language, are therefore less distant from each other in terms of cohesive devices, a pattern which, when looking back at the more detailed results, originates both from the spoken and written parts of the analysis. New Zealand English displays the least variation for the three indicators across registers.

4.4 Discussion

The analysis in the previous sections gives insight into the distribution and usage of cohesive devices from different perspectives, variation across registers as well as regional variation. While the combination of these points of view makes the analyses comparable, the explanatory power of the three features alone is limited: this methodology only reaches its full potential when applied to a broader range of features and registers.

When looking at registers, there is an unsurprising but notable difference between the spoken and written registers. Lexical density and the frequency of pronouns behave complementarily: whenever a register across all varieties displays a low value for one feature, it will invariably display a high value for the other feature. This confirms the findings from the literature mentioned in Section 3. The frequency of conjunctions also confirms the distribution described for other varieties (cf. Biber et al. 1999: 81); the differences between registers are, however, much less pronounced than for the other two features. The registers can be organised along a scale of orality with conversations and broadcast discussions at one end and academic writing and administrative writing at the other end. Interestingly, timed exams always take up a middle position: apparently, while written, they still show a clear influence of the spoken medium as far as cohesive devices are concerned, which might be traced back to the incomplete development of a register repertoire of the students sitting the exams.
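The complementary behaviour of pronoun frequency and lexical density can be checked directly against the values reported in Tables 1 and 2. The following sketch tests a simple operationalisation chosen here for illustration, namely that the two differences from the grand mean have opposite signs in every combination of variety and register; this check is an assumption of the sketch, not a procedure described in the study.

```python
# Differences from the grand mean, in the column order
# AcWrit, AdWrit, BCDiscs, Conv, Exams (Tables 1 and 2).
PRONOUNS = {
    "Canada": [-4.80, -2.93, 4.01, 8.70, 0.47],
    "Hong Kong": [-4.87, -4.27, 2.97, 6.56, -2.13],
    "India": [-4.94, -4.06, 1.51, 5.98, -2.29],
    "Jamaica": [-5.21, -4.64, 4.63, 8.47, -2.64],
    "New Zealand": [-4.89, -2.98, 1.96, 5.15, -2.33],
    "Singapore": [-4.44, -3.50, 4.37, 8.92, -2.78],
}
LEXICAL_DENSITY = {
    "Canada": [7.45, 5.30, -6.90, -9.43, -1.27],
    "Hong Kong": [7.36, 3.58, -5.01, -8.34, 2.23],
    "India": [7.37, 5.90, -3.96, -8.26, 6.43],
    "Jamaica": [7.23, 6.57, -6.84, -9.69, 2.72],
    "New Zealand": [6.01, 4.87, -5.45, -9.74, 1.19],
    "Singapore": [6.41, 6.03, -5.45, -8.43, 2.33],
}


def opposite_signs(a, b):
    """True if one value lies above and the other below the grand mean."""
    return a * b < 0


all_opposite = all(
    opposite_signs(p, d)
    for variety in PRONOUNS
    for p, d in zip(PRONOUNS[variety], LEXICAL_DENSITY[variety])
)

print(all_opposite)  # True
```

The check holds for all 30 cells, including the Canadian timed exams, where an above-average pronoun frequency coincides with a below-average lexical density.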

However, within these two major groups we can also find particularities and variation. The register of conversations displays more extreme deviations from the grand mean than that of broadcast discussions, which might be due to the fact that broadcast talk is more formal in terms of social distance (cf. below) and often prepared to a considerable degree in terms of medium. And, of course, it is public. By contrast, conversations are as spontaneous as can be. When looking at pronouns, especially, the difference between these two registers is clear and reflects their respective functions. While conversations might vary with respect to their goal, the number and identity of the participants can be assumed to be rather stable. Therefore, using pronominal reference poses no problem. Radio audiences, however, are subject to substantial fluctuation, and pronominal reference might be lost on some listeners if they enter the program at random times during its course, making it advisable for radio moderators to avoid or at least minimise the usage of this feature. Furthermore, the purely phonic channel in which the radio broadcasts are transmitted requires more reliance on autosemantic instead of synsemantic reference. These functional assumptions are also reflected in the lexical density, which is higher in broadcast discussions than in conversations. While the latter contain a considerable number of function words (not least of which are pronouns) and are also described as featuring more intricate sentence structures (cf. Halliday 2001), broadcast information has to be designed to be understood easily while at the same time not taking too long in order not to strain the listeners’ attention span. The picture of conversations and broadcast discussions showing similar tendencies but displaying some striking differences has already been shown in the study of social distance on the same data set (cf. Neumann 2012).7 While both registers showed an above average use of contractions and interjections, which can certainly be considered typical for spoken discourse in general, the less spontaneous and, above all, more anonymous register of broadcast talk showed a higher use of titles, especially in the L2 varieties of Indian, Jamaican, Singaporean and Hong Kong English. This certainly is a means of creating or maintaining a distance that is rare in conversations and indicates that in broadcast discussions, the participants might not know each other very well or use the title as a piece of information for the radio audience.

7 The numbers discussed in this paper are updated from those reported in Neumann (2012). Nevertheless, all tendencies reported there remain unchanged and consequently the interpretation is also maintained.

In contrast to the spoken registers, the written categories of academic writing, administrative writing and exams show less variation. The former two are very close in their use of cohesive devices, which might be traced back to their rather high degree of standardisation and norms. Administrative writing, especially, can be assumed to follow strict guidelines or even use pre-fabricated forms or text blocks depending on the topic, which arguably draw more on lexical means than on pronominal reference. Lexical density is above average, which relates to Neumann’s (2012) finding of the register being content-oriented and rather neutral in social distance. Academic writing even shows slightly more extreme tendencies regarding cohesive devices; the reasons for this, however, are surely very different. The register shows very little internal variation when separated according to the regional variety it comes from, which hints at the international character of the research community. Furthermore, the texts for this study were taken from the same general thematic area, namely natural sciences, and thus render the sample even more homogeneous. This observation is confirmed in the study of social distance, where academic writing stands out as the most content-oriented register.

In contrast to this register-based perspective, differences between varieties are less clear-cut. Although there are many small divergences among the six regional varieties, no patterns or groupings emerge that would allow strong claims about the varieties as a whole; rather, individual trends stand out which hint at particularities in some of the varieties. The most striking of these can be found in Jamaican English, which displays extreme values for most features as well as most registers. While other varieties show peaks in the usage of one feature or in single registers, cohesive devices are notably high or low throughout most categories in Jamaican English. Singaporean English, too, is set apart in some aspects, most obviously regarding the use of conjunctions. Especially in administrative writing, conjunctions are far rarer than in the other varieties, and it is the only variety which displays a below-average use of conjunctions in conversations.

Even though there is much less variation across varieties than across registers (an observation which can hardly be surprising given that we are looking at varieties of English in contexts where a sufficient amount of functional variation is required), some interesting observations can be made when comparing the boxplots displaying the register variation within each variety. Canadian English diverges from the other varieties in the median for all three indicators. The other L1 variety, New Zealand English, behaves similarly for lexical density and conjunctions. The three L2 varieties display more variation, so some tendency of the L1 varieties to display more homogeneous patterns than the L2 varieties seems to emerge. Reasons for this could be found in the exonormative characters of these L2 varieties; the fact that the standard towards which the language is oriented originates in a very different part of the world makes an adaptation to a certain degree inevitable. The language is made to fit the needs of the new speech community with regard to societal, regional and geographical contexts, which can be expected to be mirrored in the registers in use. Furthermore, as part of this adaptation, L2 varieties often come into more contact with other languages and might thus display linguistic characteristics originating from these interferences. More generally, one might speculate that this coarse patterning into L1 and L2 varieties reflects exactly this: the status of the respective types of varieties with a long history of (transplanted) mother tongue speakers in the case of Canadian and New Zealand English on the one hand and indigenized non-native varieties with reduced exposure to native English on the other hand. At the same time, the two L1 varieties might also betray more interaction with a standard variety. However, only a multivariate study of the type reported by Szmrecsanyi and Kortmann (2009), one which is based entirely on corpus findings rather than introspective data, can tell whether these assumptions hold true across a wide range of indicators and varieties.

5 Conclusion

The analysis presented in this paper aimed at determining the degrees of cohesion within a variety, based on the particularities that are displayed by different registers. By examining the distribution of cohesive devices as indicators of medium, the distinctiveness of individual registers can be observed and compared across different regional varieties of English. In order to obtain a broader and more representative overview of the register variations within different varieties of English, more linguistic features representing other register parameters have to be analysed. At the same time, more varieties would ensure a more even coverage of the English language, which would be a benefit particularly for the calculation of the comparative value of the grand mean: not only would it then represent more varieties and thus become ever more general or ‘grand’, but every new variety taken into the framework of the study would automatically be drawn on for comparisons by being included in this value.
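The role of the grand mean as a comparative value can be made concrete with a small sketch. It assumes that the grand mean for one feature is the unweighted mean of all variety-by-register values; this is an assumption about the calculation, and all variety labels and numbers below are hypothetical and invented purely for illustration.

```python
from statistics import mean

# Hypothetical lexical-density values (proportions) per variety and register.
# These figures are invented; they are not results from the study.
values = {
    "CAN": {"conversation": 0.40, "academic": 0.60},
    "NZ": {"conversation": 0.42, "academic": 0.58},
    "IND": {"conversation": 0.44, "academic": 0.62},
}


def grand_mean(data):
    """Unweighted mean over every variety-by-register cell (assumed definition)."""
    return mean(v for registers in data.values() for v in registers.values())


def deviations(data):
    """Deviation of each cell from the grand mean of the same data set."""
    gm = grand_mean(data)
    return {
        variety: {register: value - gm for register, value in registers.items()}
        for variety, registers in data.items()
    }


print(grand_mean(values))
print(deviations(values)["CAN"])

# Adding a further (hypothetical) variety shifts the grand mean, and with it
# the reference point against which every existing variety is compared.
extended = dict(values, JAM={"conversation": 0.30, "academic": 0.70})
print(grand_mean(extended))
```

The last step shows the point made above: once a new variety enters the data set, it is automatically part of the value against which all other varieties are measured.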

Similar thoughts of course hold true also for the inclusion of more registers. Both for spoken and written language, the ICE components hold many more files than those of the five registers analysed here, and an augmentation of the data in this way would allow more universal statements about spoken and written texts and their differences. This distinction, then, apart from insights into individual registers, would show most clearly which functions a variety mainly serves in a community by laying open whether written or spoken registers have developed a more distinct character. The present study also showed, however, that despite the usefulness of the investigation of the interaction between register and variety, the International Corpus of English, independent of its undisputed value for other types of varieties-related research questions, may not be the best data set to investigate this interaction. The recently compiled GloWbE Corpus (Davies 2013), a collection of English web texts from 20 countries, cannot be used as an alternative because it does not provide the information needed to distinguish registers. Currently, research is under way (Fest, forthcoming) that will afford a more detailed and statistically more robust analysis of the interaction between register and variety in English based on a corpus compiled for this specific purpose.

The combination of functional and regional variation thus still leaves a wide field to be explored and many questions to be answered. Even with the limited varieties and registers analysed so far, however, it becomes apparent that registers function as a very suitable gateway to understanding and describing the development and status of a variety, determining its particularities and putting it into perspective among Englishes worldwide at the same time.

References

Berruto, Gaetano. 2004. Sprachvarietät – Sprache (Gesamtsprache, Historische Sprache). Linguistic Variety – Language (Whole Language, Historical Language). In Ulrich Ammon, Norbert Dittmar, Klaus J. Mattheier & Peter Trudgill (eds.), Sociolinguistics/Soziolinguistik. An International Handbook of the Science of Language and Society/Ein Internationales Handbuch zur Wissenschaft von Sprache und Gesellschaft, 188–195. Berlin, New York: de Gruyter.

Biber, Douglas. 1995. Dimensions of register variation: A cross-linguistic comparison. Cambridge: CUP.

Biber, Douglas, Geoffrey Leech, Stig Johansson, Susan Conrad & Edward Finegan. 1999. Longman grammar of spoken and written English. Harlow: Longman.

Davies, Mark. 2013. Corpus of Global Web-Based English: 1.9 billion words from speakers in 20 countries. Available online at http://corpus.byu.edu/glowbe/

Diwersy, Sascha, Stefan Evert & Stella Neumann. 2014. A weakly supervised multivariate approach to the study of language variation. In Benedikt Szmrecsanyi & Bernhard Wälchli (eds.), Aggregating dialectology, typology, and register analysis: Linguistic variation in text and speech, 174–204. Berlin: de Gruyter.

Fest, Jennifer. Forthcoming. “News language in varieties of English: A corpus-based analysis of newspaper reports.” PhD Thesis, Department of English, American and Romance Studies, RWTH Aachen University.

Gregory, Michael. 1967. Aspects of varieties differentiation. Journal of Linguistics 3(2). 177–198.

Halliday, Michael A. K. 1978. Language as Social Semiotic: The Social Interpretation of Language and Meaning. London: Arnold.

Halliday, Michael A. K. 2001. Literacy and linguistics: Relationships between spoken and written language. In Anne Burns & Caroline Coffin (eds.), Analysing English in a global context, 181–193. London: Routledge.

Halliday, Michael A. K. & Christian M. I. M. Matthiessen. 2013. Halliday’s introduction to functional grammar. 4th ed, rev. Abingdon: Routledge.

Halliday, Michael A. K. & Ruqaiya Hasan. 1976. Cohesion in English. London: Longman.

Halliday, Michael A. K. & Ruqaiya Hasan. 1989. Language, context, and text: Aspects of language in a social-semiotic perspective. Oxford: OUP.


Halliday, Michael A. K. & Zoe L. James. 1993. A quantitative study of polarity and primary tense in the English finite clause. In John McHardy Sinclair (ed.), Techniques of description: Spoken and written discourse, 3–35. London: Routledge.

Kortmann, Bernd & Benedikt Szmrecsanyi. 2004. Global synopsis: Morphological and syntactic variation in English. In Bernd Kortmann, Edgar W. Schneider, Kate Burridge, Rajend Mesthrie & Clive Upton (eds.), A handbook of varieties of English, 1142–1202. Berlin: Mouton de Gruyter.

Kortmann, Bernd & Kerstin Lunkenheimer (eds.). 2013. eWAVE – The electronic world atlas of varieties of English. Leipzig: Max Planck Institute for Evolutionary Anthropology. http://ewave-atlas.org/. (Accessed on 2014-03-08.)

Kristiansen, Gitte & René Dirven. 2008. Cognitive sociolinguistics: Language variation, cultural models, social systems. Berlin: de Gruyter.

Matthiessen, Christian M. I. M. 1993. Register in the round: Diversity in a unified theory of register analysis. In Mohsen Ghadessy (ed.), Register analysis: Theory and practice, 221–292. London: Pinter Publishers.

Mollin, Sandra. 2007. New variety or learner English? Criteria for variety status and the case of Euro-English. English World-Wide 28(2). 167–185.

Nelson, Gerald. 2006. The core and periphery of World Englishes: A corpus-based exploration. World Englishes 25(1). 115–129.

Nelson, Gerald, Sean Wallis & Bas Aarts. 2002. Exploring natural language: Working with the British component of the International Corpus of English. Amsterdam: Benjamins.

Nesbitt, Christopher & Günter Plum. 1988. Probabilities in a systemic grammar: The clause complex in English. In Robin P. Fawcett & David J. Young (eds.), New Developments in Systemic Linguistics, 6–33. London: Pinter Publishers.

Neumann, Stella. 2012. Applying register analysis to varieties of English. In Monika Fludernik & Benjamin Kohlmann (eds.), Anglistentag 2011 Freiburg: Proceedings, 75–94. Trier: WVT.

Neumann, Stella. 2013. Contrastive register variation. A quantitative approach to the comparison of English and German. Berlin: de Gruyter Mouton.

Rayson, Paul. 2009. Wmatrix: A web-based corpus processing environment. Website. Lancaster. http://ucrel.lancs.ac.uk/wmatrix/. (Accessed on 2014-03-09.)

Sand, Andrea. 2004. Shared morpho-syntactic features of contact varieties: Article use. World Englishes 23(2). 281–298.

Sand, Andrea. 2008. Angloversals? Concord and interrogatives in contact varieties of English. In Terttu Nevalainen, Irma Taavitsainen, Päivi Pahta & Minna Korhonen (eds.), The dynamics of linguistic variation: Corpus evidence on English past and present, 183–202. Amsterdam: Benjamins.

Szmrecsanyi, Benedikt & Bernd Kortmann. 2009. The morphosyntax of varieties of English worldwide: A quantitative perspective. Lingua 119(11). 1643–1663.

Van Rooy, Bertus, Lize Terblanche, Christoph Haase & Joseph Schmied. 2010. Register differentiation in East African English: A multidimensional study. English World-Wide 31(3). 311–349.

Wong, Deanna, Steve Cassidy & Pam Peters. 2011. Updating the ICE annotation system: Tagging, parsing and validation. Corpora 6(2). 115–144.

Xiao, Richard. 2009. Multidimensional analysis and the study of World Englishes. World Englishes 28(4). 421–450.

Section III: Regional, contrastive and diachronic register variation

The final section of the present volume broadens the analytical perspective in order to include further issues that need to be addressed in a comprehensive discussion of variational text linguistics. While Section I gave a detailed analysis of selected registers and Section II provided a juxtaposition of registers, Section III offers a synchronic investigation of regional and contrastive register variation as well as a diachronic study. The contributions will show that these different approaches are by no means mutually exclusive but represent different facets of one common research paradigm.

Both Barbara Güldenring’s paper “Metaphors in New English academic writing” and Steffen Schaub’s contribution “The influence of register on noun phrase complexity in varieties of English” deal with international varieties of global English on the basis of the International Corpus of English, each focusing on one particular linguistic category. In this way, Güldenring and Schaub continue Neumann and Fest’s comparative approach that concluded Section II, although they place more emphasis on variational and sociolinguistic aspects. While Güldenring discusses the semantic phenomenon of metaphor, Schaub concentrates on the syntactic structure of the noun phrase. Güldenring deals with English as a Second Language exclusively in Asia (India, Hong Kong and Singapore), whereas Schaub includes Englishes in Asia (India, Hong Kong and Singapore), the Caribbean (Jamaica) and North America (Canada), covering first- and second-language use. Since it is not feasible to compare all these regional varieties per se, Güldenring focuses on academic discourse, as previous research has shown that this register is particularly rich in metaphor. In particular, she compares metaphors in academic writing from New Englishes with more traditional varieties of English and examines the occurrence of metaphorical domains in the sub-registers Humanities, Natural Science and Social Science with the help of Conceptual Metaphor Theory. By contrast, Schaub takes into account not only academic writing but also the registers of conversation, unscripted speeches and social letters. He argues that an investigation of noun phrase complexity based on modification types sheds new light on both internal register consistency and regional variability, especially with respect to the situational features of communicative purpose and production circumstances. Both contributions demonstrate that such multivariate approaches open manifold new research possibilities with each potential parameter shift.

In contrast to the articles in Section II, Valentin Werner’s paper “Real-time online text commentaries: A cross-cultural perspective” does not compare different registers but studies one particular register across the two linguacultures of German and British English. In this way, Werner’s contribution transcends the monolingual English viewpoint and relates English to an adjacent language in the Germanic family tree. Since it discusses computer-mediated communication, the paper links up with Biber and Egbert’s study of web registers at the beginning of Section I. The medium-dependent register of online text commentaries is further narrowed down by concentrating on the subject matter of sports, with data drawn from the online versions of widely read British and German newspapers. Having established online text commentaries as an emergent register with specific linguistic features, the paper highlights quantitative tendencies of cross-cultural diversification on the basis of communicative intentions and the respective target group of internet users. Hence, apart from purely linguistic deviations, the results of the study also have significant repercussions on cross-cultural differences.

The volume closes with Javier Pérez-Guerra’s article “Word order is in order here: A diachronic register analysis of syntactic markedness in English”, which discusses grammatical developments of word order in written registers from Middle English to Late Modern English. Thus, the paper stands out not only by virtue of its diachronic approach but also because of its distinct focus on the three syntactic constructions of left dislocation, topicalisation and extraposition, which are here considered as register markers. By concentrating on these features, the paper is able to give a comparative account of an exceptionally large number of registers, including handbooks, law, philosophy, science, trials, travelogues, romance, fiction, diaries, drama, history, letters, biography, education, religious treatises and sermons. It is thus demonstrated that some constructions have undergone a significant change in usage, such as left dislocation, which used to be a feature of literate registers but today is a common marker of conversation. Thus, with the help of historical corpora such as the Penn-Helsinki Parsed Corpus of Early Modern English, it is possible to examine the historical background of present-day English registers. The volume comes full circle when Pérez-Guerra finally mentions the hybridity of historical registers, thereby creating a link to hybrid web registers, as postulated by Biber and Egbert in their opening chapter.

Barbara Güldenring
Metaphors in New English academic writing

Abstract: In recent decades, heightened academic interest in World Englishes has led to a growing body of research surrounding institutionalised second-language varieties of English, often referred to as New Englishes. This paper aims at contributing to this research by exploring metaphorical variation in New English academic writing as represented by three components of the International Corpus of English (ICE), namely India, Hong Kong and Singapore. It asks two major questions: What kinds of metaphor occur across New English academic texts? and How are these metaphors distributed across the academic sub-registers of Humanities, Natural Science and Social Science? While generally suggesting that metaphor can be viewed as a characteristic feature of academic writing, this study considers metaphors that are ubiquitous to all varieties and academic disciplines under investigation as well as potentially variety- or discipline-specific conceptualisations, which leads into a brief discussion about metaphorical function in New English academic writing.

1 Introduction

The field of World Englishes, including research devoted to the study of New Englishes1, has grown into a prominent linguistic discipline within the last thirty years. In addition to this body of research, an increasing number of studies devoted to metaphor in authentic discourse have been exploring the relationship between metaphor and register (Goatly 1997; Cameron 2003; Skorczynska and Deignan 2006; Semino 2008; Steen et al. 2010; Semino et al. 2013). Steen et al. (2010: 203) have found that the academic register “is characterized by the highest proportion of metaphor-related words” of the registers they investigated, including news, fiction and conversation. Corroborating this finding, Krennmayr (2011: 322) concludes that “news texts contain a larger proportion of metaphorically used words than fiction and conversation but a smaller proportion than academic texts”. This not only indicates a significant metaphoricity of academic texts vis-à-vis other registers, but it also marks a departure from older views on the value of metaphor in the academic realm:

To draw attention to a philosopher’s metaphors is to belittle him – like praising a logician for his beautiful handwriting. Addiction to metaphor is held to be illicit, on the principle that whereof one can speak only metaphorically, thereof one ought not to speak at all. (Black 1954: 273, also qtd. in Römer 2000: 353).

1 While “World Englishes” in reference to the academic discipline usually refers to the study of any kind of English variety worldwide, “New Englishes”, a term attributed to Platt, Weber, and Ho (1984), denotes those varieties that have grown in direct consequence of English’s spread around the world (prominently via British colonialism) and, thus, have developed as nativised varieties in areas in which English was not traditionally the native language of the population, fulfilling various (institutionalised) societal functions. In a descriptive sense New Englishes are often characterised by variation on all linguistic levels due to substrate influence.

Barbara Güldenring, Philipps-Universität Marburg

Nevertheless, Black (1954: 294) comes to the conclusion that there is “[n]o doubt metaphors are dangerous and perhaps especially so in philosophy. But prohibition against their use would be a wilful and harmful restriction upon our powers of inquiry”. This last statement in particular comes across as a reluctant admission of the inescapable presence of metaphor, particularly in academic texts. Nowadays, after decades of metaphor research, most prominently in the vein of Conceptual Metaphor Theory (henceforth CMT), a negative stance, such as the one described above, has been largely dispelled. This is due to the observations that academic texts display a multitude of metaphors that are closely connected to the phenomena they are used to describe (cf. Römer 2000: 353) and that metaphors, in fact, constitute a large part of expert (as well as everyday) discourse (cf. Jäkel 1997: 284). Furthermore, this understanding of metaphor has led to important insights about the function of metaphor in academic discourse, including its role in the acquisition as well as imparting of knowledge (cf. Drewer 2003, Cameron 2003).

The present pilot study aims at contributing to the growing understanding of the nature of metaphor and register by introducing a varietal perspective on metaphor and register. It also aims at contributing to the cognitive approach to World Englishes, which has only recently developed from the merging of these two previously isolated paradigms (cf. Wolf and Polzenhagen 2009: 1). With these particular goals in mind, I am primarily concerned with the following questions concerning New English metaphor and academic writing:

1) What kinds of metaphor occur across New English academic texts in the first place?
2) In view of their functional contributions, how are these metaphors distributed across the academic sub-registers of Humanities, Natural Science and Social Science?

The first question will be largely devoted to issues pertaining to metaphor distribution on the basis of several conceptual mappings for the target domain concept IDEAS in New English academic texts. In addition to this, I will briefly consider to what extent the varieties under investigation differ in terms of the various entailments or elaborations apparent in shared mappings. Addressing the second question will involve scrutiny of the distribution of IDEAS metaphors according to the academic sub-registers and, consequently, discussion of their functional roles. However, before delving into these issues, the following outlines the theoretical construct and analytic framework to which the present study adheres.

2 Theoretical Background

In their immensely influential book Metaphors We Live By, Lakoff and Johnson (2003 [1980]: 3) present a theory of metaphor, widely known as CMT (Conceptual Metaphor Theory): “metaphor is pervasive in everyday life, not just in language but in thought and action. Our ordinary conceptual system, in terms of which we both think and act, is fundamentally metaphorical in nature”. They continue by asserting that “[t]he essence of metaphor is understanding and experiencing one kind of thing in terms of another” (Lakoff and Johnson 2003: 5). Thus, one important claim of Conceptual Metaphor Theory is an experiential basis for metaphor, which can be clearly seen by how an abstract concept like IDEAS is understood in terms of more concrete concepts, with which we have more direct (bodily) experience:

(1) a. IDEAS ARE FOOD
    What he said left a bad taste in my mouth.

    b. IDEAS ARE PEOPLE
    The theory of relativity gave birth to an enormous number of ideas in physics.

    c. IDEAS ARE PLANTS
    That idea died on the vine.

    d. IDEAS ARE PRODUCTS
    He produces new ideas at an astounding rate.

    e. IDEAS ARE COMMODITIES
    That idea just won’t sell.

    f. IDEAS ARE RESOURCES
    Don’t waste your thoughts on small projects.

    g. IDEAS ARE MONEY
    He’s rich in ideas.

    h. IDEAS ARE CUTTING INSTRUMENTS
    That cuts right to the heart of the matter.

    i. IDEAS ARE FASHIONS
    That idea went out of style years ago.

    j. IDEAS ARE LIGHT-SOURCES
    That’s an insightful idea.

(Lakoff and Johnson 2003 [1980]: 46–48)

This should by no means be considered an exhaustive list of IDEAS metaphors, largely due to the fact that these examples were intuitively formulated and were, at the time, not corroborated with authentic data. However, they do make clear the relationship between the target domain (the abstract concept, e.g. IDEAS) and the source domain (the more concrete, structured concept, e.g. FOOD) as postulated by Conceptual Metaphor Theory. That is, our knowledge of the source domain serves to achieve better comprehension of the target domain of which our knowledge is less structured. Thus, a linguistic metaphor2 such as What he said left a bad taste in my mouth relates our experience with a sensory-bound dislike of certain food to the more abstract dislike of a certain idea.

2 Conceptual metaphors can be distinguished from linguistic metaphors, metaphorically used words (following the terminology by Steen et al. 2010) or metaphorical linguistic expressions. Kövecses (2010: 4) makes this distinction and defines a conceptual metaphor as consisting of “two conceptual domains, in which one domain is understood in terms of another”, while defining linguistic metaphor as “words or other linguistic expressions that come from the language or terminology of the more concrete conceptual domain”. By way of example, reconsider (1c) above: IDEAS ARE PLANTS. That idea died on the vine is the linguistic instantiation of the mapping between IDEAS and PLANTS. A conceptual metaphor makes clear that IDEAS are understood as behaving in a way similar to PLANTS in order to express something about IDEAS via this analogy, and the linguistic metaphor codifies this. Thus, died on the vine not only points to the existence of the IDEAS – PLANTS mapping, it also helps to efficiently communicate that some ideas are not fully developed and thus can be discarded, much like a dead plant.

In terms of academic discourse, Zichler (2010: 97–98) affirms the suitability of conceptual metaphor, because academic discourse often involves the investigation of phenomena that escape our direct experience; metaphors can be seen as an opportunity to reveal the unknown through the known, i.e. the abstract through the concrete. Nevertheless, Partington (1998: 107–108) criticises Lakoff and Johnson for not making room for genre in their theory, because “[c]ertain metaphors may well be much more prevalent in one kind of writing than another, in fact, one of the characterising features of a genre is probably the kind of metaphor generally to be found therein.” By extending this argument to New English academic writing from the register perspective, the present study does suggest that metaphor can be viewed as a characteristic register feature in the sense of Biber and Conrad (2009):

The register perspective combines an analysis of linguistic characteristics that are common in a text variety with the analysis of the situation of use of the variety. The underlying assumption of the register perspective is that core linguistic features like pronouns and verbs are functional, and, as a result, particular features are commonly used in association with the communicative purposes and situational context of the texts.

With the aim of extending the notion of “core linguistic features”, this paper takes the position that metaphors can be viewed as features of the text with which a communicative function is linked.3 In terms of being “linguistic”, metaphors, due to their pervasiveness, are accessible and can be located by their relation to the lexico-grammatical, that is, linguistic realisations of underlying conceptual metaphors. Lakoff and Johnson (2003: 7) maintain that “[s]ince metaphorical expressions in our language are tied to metaphorical concepts in a systematic way, we can use metaphorical linguistic expressions to study the nature of metaphorical concepts” and this has found application in various methods for metaphor identification. Therefore, I will explore metaphors as core functional features that, while being conceptual in nature, provide evidence of their existence via the linguistic expressions that point to them. This, in turn, allows for a more flexible notion of linguistic feature or characteristic in the register perspective. Similarly, Wolf and Polzenhagen (2009: 16–17) make the case for including metaphor in research on varieties of English, otherwise dominated by studies describing variation of the more traditional structural elements of language, i.e. phonology, morphology, and syntax:

From a CL perspective, the core of the descriptive approach advocates, however, a far too narrow understanding of ‘form’ and of what counts as ‘linguistic peculiarities.’ This narrow understanding deliberately excludes important dimensions of variation […] Unaddressed are also crucial aspects of relations between linguistic units, beyond standard structuralist formal and semantic parameters. Specifically, little or no attention is paid to the fact that linguistic material from various domains is systematically linked through metaphoric and metonymic mappings, which constitutes a key dimension of relatedness.

3 For a similar position, cf. Krennmayr (2011).


Thus, the present paper is devoted to the investigation of metaphor as a core feature of the academic register, as constructed by New English varieties. The following sections provide some detail about the corpus data and serve to sketch out the method used to access metaphors via their linguistic realisations encountered in New English academic writing.

3 Corpus data

The data in the present study stem from those components of the International Corpus of English (henceforth ICE) which are representative of the New English varieties associated with Hong Kong, India and Singapore. The subcategory of academic writing was explored for metaphors pertaining to the target domain IDEAS, since it was assumed that this domain would feature prominently in academic writing of all kinds. True to the overall design of the ICE project, for each component under investigation, academic writing includes ten 2,000-word published texts covering the disciplines of Humanities, Natural Science and Social Science respectively.4

Nelson (1996: 32) provides a description of ICE academic writing in terms of intended readership and mode of composition:

Printed material is written for a large, unrestricted audience that the writer does not know. […] Academic writing reaches a smaller, more well-defined readership [as compared to newspapers, popular writing or fiction], but the exact individual readership is unknown to the writer at the time of composition. […] writers of printed works are usually required to follow the house style of the publisher […] for which they are writing. Printed material may have been edited by a number of different people, and the final version is often a product of several earlier revisions. […] Learned writing is produced by specialists for specialists. In the humanities, for example, it may include journal articles by academic historians written for other academic historians.

Integrating this into Biber and Conrad’s (2009) framework for the analysis of situational characteristics of a register, New English academic writing, as represented by ICE, can be described as involving most commonly single or plural authors addressing an un-enumerated audience. The author-reader relationship is on a professional, specialist level that is characterised most significantly by shared knowledge. Furthermore, as Nelson (1996) points out above, ICE academic texts, because of their printed status, most likely entail highly revised and edited production circumstances, whereas the place of communication is public, but not in a shared setting. The communicative purposes are to describe, explain, summarise and report on research pertaining to a specific topic. Based on this description, New English academic writing shares the characteristics that define academic writing associated with other varieties of English.

4 The ICE project also includes texts from the field of Technology.

4 Methodology

In order to identify metaphors in ICE academic writing, a corpus-based study (which has become more prominent in metaphor research in general) was the logical choice. Berber Sardinha (2007: 12) summarises the distinct advantages of studying metaphor with the help of corpora, such as ICE, over intuition-based studies:

corpus-based studies can offer reliable information about the use of metaphors in language. Another [advantage] is that corpora typically include large amounts of data, which can be searched to provide information about the frequency of known metaphorical expressions. Yet another is that genre or register-specific corpora can be explored to indicate metaphors that are typical of certain fields or subject areas.

4.1 Retrieving metaphor candidates

Before reporting on the methodological details of the present study, it is important to determine what type of method can be best employed to retrieve potential metaphor candidates from the corpora for further analysis. Berber Sardinha (2012: 21–22) sorts previous methods into two overriding groups: 1) “sampling techniques”, which involve a pre-selection of lexical units with which to approach the corpus, and 2) “census techniques”, which involve examining each unit of the corpus as a whole. Since the present study makes use of both of these techniques to a certain extent, it is worth briefly considering previous research utilising these methods.

A prominent example of a sampling technique for metaphor retrieval can be found in Deignan (2005: 27), who uses the Bank of English corpus to study metaphors pertaining to horse-racing and gambling and, before consulting the corpus, establishes “the key lexical items in the field [of horse-racing and gambling] using intuition, thesauri, dictionaries and collocational information from concordances”. This method elicits linguistic metaphors such as At 48 he is too young to be in the running for Prime Minister or For months, polls showed the two main parties neck and neck.

A similar sampling technique is used in Stefanowitsch’s (2006) Metaphor Pattern Analysis (MPA), which is a corpus-based approach that aims at investigating metaphorical target domains by pre-determining lexical items that represent those domains. Yet, in contrast to Deignan (2005), Stefanowitsch (2006: 66) is more specific about identifying metaphor on the basis of what he calls metaphorical patterns: “A metaphorical pattern is a multi-word expression from a given source domain (SD) into which one or more specific lexical item [sic] from a given target domain have been inserted”. Furthermore, metaphorical patterns “do not merely instantiate general mappings between two semantic domains […]. [T]hey establish specific paradigmatic relations between target domain lexical items and the source domain items that would be expected in their place in a non-metaphorical use” (Stefanowitsch 2006: 67). This can be illustrated by a linguistic metaphor such as That idea went out of style years ago, for which Lakoff and Johnson (2003 [1980]) formulated the conceptual metaphor IDEAS ARE FASHIONS. According to Metaphor Pattern Analysis, we can clearly see how the metaphorical pattern establishes a paradigmatic relationship between idea and words denoting clothing, which are also expected to fill the same slot, such as in That shirt went out of style years ago. Stefanowitsch (2006: 71) investigates EMOTION metaphors, like ANGER, by pre-defining a set of lexical items that correspond to this domain, e.g. anger, fury, rage, wrath, etc., which help to retrieve metaphors such as ANGER IS AN OPPONENT IN A STRUGGLE (X wrestle with anger, X protect Y from anger, etc.) (Stefanowitsch 2006: 76).

The present study recognises the value of such methods and draws from them by involving, in essence, a sampling technique of a similar kind. However, the particular method used here departs from these types of sampling technique in two important ways. Firstly, Deignan (2005: 92) claims that “the direction of investigation in corpus studies is from the linguistic form through to meaning. It is not possible to use the corpus to proceed in the other direction”. This is certainly valid from the perspective of an approach that relies on pre-formulated lists of lexical items or strings of words. However, the present study aims at challenging this unidirectional view by taking a cue from Hardie et al. (2007) and attempting to approximate the direction of meaning to form. Secondly, the present study automates the initial step of establishing key or representative lexical items for investigating a specific target domain by prompting the corpus itself to provide all lexis related to that domain. In the interest of incorporating both aspects into the present method, I employed the web-based corpus analysis software Wmatrix (Rayson 2009).


In order to approach this corpus-based metaphor study from meaning to form, in a first step, I uploaded the respective ICE texts for each variety-specific sub-register, e.g. India academic Natural Science, to Wmatrix, which semantically annotates corpus texts with the aid of USAS (UCREL Semantic Annotation System)5. “The semantic fields automatically annotated by USAS can be seen as roughly corresponding to the domains of metaphor theory” (Hardie et al. 2007). One such semantic field can be illustrated by the semantic tag used in this study to extract metaphors involved in conceptualising the domain of IDEAS, namely the tag X4.1 (Mental object: Conceptual object). It was assumed that this tag would feature prominently in academic texts from various sub-registers. Therefore, this particular tag was selected to query for IDEAS metaphors, which can be accomplished by simply concordancing for X4.1 and, thus, setting meaning as the starting point and not form.

However, Wmatrix also provides frequency lists of words tagged with X4.1 (e.g. idea, concept, notion, theory, etc.), which give an indication of which linguistic forms are associated with this domain in the corpus texts before concordancing. Therefore, in this second step, we can see an automated version of Deignan’s (2005) or Stefanowitsch’s (2006) pre-determination of lexical items with which to approach the corpus. This, of course, has the distinct advantage that there is no reliance on pre-selected linguistic metaphors, which may or may not be present in the corpus. Thus, the present study was less restricted in terms of which metaphorical data could be uncovered. In addition, the corpus texts acted as their own reference by supplying lexical information from this domain that otherwise would have involved a certain degree of guesswork. Furthermore, with recourse to Metaphor Pattern Analysis, already established advantages remained intact, such as the “retrieval of large number of instances of a target domain item” as well as the potential to quantify certain metaphorical instances of lexis from a particular domain and “to make generalizations concerning the importance of the conceptual metaphors underlying these patterns” (Stefanowitsch 2006: 66). Finally, in a third step, I systematically examined the concordance lines for each X4.1 item with the aid of AntConc (Anthony 2012).

All in all, the retrieval of metaphor candidates from the corpora involved a more automated sampling technique than has been previously employed. Since at this point, we are still talking about “metaphor candidates,” this method is in need of a separate step that identifies these candidates as metaphorical or non-metaphorical. This step takes on characteristics of the “census technique” and will be outlined in the following.
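The retrieval steps described above (a frequency list of the word forms carrying a semantic tag, followed by concordancing on that tag) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input format of (word, tag) pairs, the placeholder tag OTHER, the toy sentence and the function names are hypothetical, and the code does not reproduce Wmatrix, USAS or AntConc, which were the tools actually used in the study.

```python
from collections import Counter

# Hypothetical input: tokens already paired with a semantic tag.
# Only the X4.1 tag comes from the study; "OTHER" is a placeholder and
# the format is an assumption, not an export format of Wmatrix.
TAGGED_TOKENS = [
    ("That", "OTHER"), ("idea", "X4.1"), ("went", "OTHER"), ("out", "OTHER"),
    ("of", "OTHER"), ("style", "OTHER"), ("years", "OTHER"), ("ago", "OTHER"),
    ("and", "OTHER"), ("the", "OTHER"), ("theory", "X4.1"), ("works", "OTHER"),
    ("but", "OTHER"), ("the", "OTHER"), ("idea", "X4.1"), ("survives", "OTHER"),
]


def frequency_list(tagged, target_tag):
    """Count the lower-cased word forms that carry the target semantic tag."""
    return Counter(word.lower() for word, tag in tagged if tag == target_tag)


def concordance(tagged, target_tag, window=3):
    """Return (left context, node, right context) for every tagged hit."""
    words = [word for word, _ in tagged]
    lines = []
    for i, (word, tag) in enumerate(tagged):
        if tag == target_tag:
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            lines.append((left, word, right))
    return lines


freq = frequency_list(TAGGED_TOKENS, "X4.1")   # meaning (the tag) is the starting point
candidates = concordance(TAGGED_TOKENS, "X4.1")
for left, node, right in candidates:
    print(f"{left:>20} [{node}] {right}")
```

Each concordance line produced in this way is only a metaphor candidate; whether it is metaphorical still has to be decided manually.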

5 For a more extensive overview, cf. Archer et al. (2002).


4.2 Identifying metaphors

Once all potential IDEAS metaphors were retrieved from the corpus data, these candidates were manually analysed and marked as being instances of linguistic metaphors or not. In order to do away with as much analyst intuition as possible, for this step I relied on MIP (the Metaphor Identification Procedure), initially developed by the Pragglejaz Group (2007) and further refined as MIPVU (Metaphor Identification Procedure Vrije Universiteit) (Steen et al. 2010),6 which assumes that “[m]etaphorical meaning in usage is indirect meaning: it arises out of a contrast between the contextual meaning of a lexical unit and its more basic meaning, the latter being absent from the actual context but observable in others.” This procedure has been assessed by Berber Sardinha (2012: 22) as belonging to the census technique. Nonetheless, the present study did not fully adhere to the “census” quality of this technique because the texts comprising the ICE academic writing under investigation were not all read in their entirety. However, for making decisions on many of the metaphors, it was often necessary to undertake a closer reading of the greater context to the extent that a contextual meaning for the metaphor could be established.

In order to provide further support for metaphorical decisions made in this study, I often additionally consulted the VU Amsterdam Metaphor Corpus Online, which is the largest corpus annotated for metaphorical language according to MIPVU (Steen et al. 2010), in order to tackle uncertain cases and to benefit from the insight of multiple analysts. If a linguistic metaphor was identified in the VU Amsterdam Metaphor Corpus Online for the uncertain case I was investigating, then I also considered it metaphorical.7 For instance, advocates of a theory or advocating a theory were considered metaphorical due to the fact that in the VU Amsterdam Metaphor Corpus Online, similar formulations were also judged to be metaphorical: Most advocates of biological theories was identified as an indirect metaphor there (Steen et al. 2010).
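The handling of uncertain cases can be summarised as a small decision rule: clear judgements stand, while an uncertain candidate is kept as metaphorical only if a similar formulation is attested as metaphorical in the reference corpus, and is discarded otherwise. The sketch below is a simplified illustration; the set of attested units stands in for look-ups that were carried out manually, and all names and example units apart from advocate are hypothetical.

```python
# Hypothetical stand-in for manual look-ups in a metaphor-annotated
# reference corpus: lexical units found there as metaphor-related.
ATTESTED_AS_METAPHOR = {"advocate"}


def decide(lexical_unit, analyst_judgement, attested=ATTESTED_AS_METAPHOR):
    """Return 'metaphor', 'non-metaphor' or 'discard' for one candidate.

    analyst_judgement is 'metaphor', 'non-metaphor' or 'uncertain'.
    """
    if analyst_judgement != "uncertain":
        return analyst_judgement          # clear cases need no look-up
    if lexical_unit in attested:
        return "metaphor"                 # supported by the reference corpus
    return "discard"                      # avoid relying on intuition alone


print(decide("advocate", "uncertain"))
print(decide("unit_x", "uncertain"))      # 'unit_x' is a made-up placeholder
```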

4.3 Formulating conceptual metaphors

After establishing which linguistic metaphors were present in the data, a final step was undertaken to formulate potential conceptual mappings underlying these metaphors. Harkening back to Lakoff and Johnson’s (2003) list of IDEAS metaphors, a few of these mappings were found in the data, but not all were present. Therefore, the data warranted the consideration of other conceptual mappings, and due to this circumstance, the following (broadly formulated) source domains have been suggested as a means of categorising and, thus, quantifying the metaphors encountered in the data:

6 For details of the individual steps of MIPVU, cf. Pragglejaz Group (2007) and Steen et al. (2010).

7 If it was not found in the VU Amsterdam Metaphor Corpus Online, I discarded the uncertain case under investigation in order to avoid relying solely on my own intuition.

Table 1: Overview of IDEAS metaphors in New English Academic Writing

ORGANISMS: ideas are … Total: 178

PEOPLE: The role of theory is to give quantitative predictions <ICE-SIN:W2A-027#17:1>
PLANTS: Language and thoughts have different genetic roots. <ICE-SIN:W2A-005#56:1>
UNSPECIFIED: thought […] and language […] assume a unitary existence <ICE-SIN:W2A-005#58:1>

OBJECTS: ideas are … Total: 131

OBJECTS IN CONTAINERS: Medical personnel […] should keep this concept in mind <ICE-HK:W2A-027#104:1>
CONTAINERS: philosophy does not confine to one particular subject matter <ICE-IND:W2A-001#18:1>
POSSESSIONS: the parties […] would take the view <ICE-HK:W2A-014#52:1>
LANDMARKS: feminists also turn to the phenomenological reflection on the body, especially to the idea […] <ICE-HK:W2A-003#86:1>

ARTEFACTS: ideas are … Total: 58

TOOLS: countries choose to use environmental issues to spark trade wars <ICE-HK:W2A-011#27:2>
MIRRORS: Through an ideological mirror, individuals are constituted as subjects. <ICE-HK:W2A-002#59:1>
CLOTHS: The common thread linking these two ideals <ICE-HK:W2A-004#41:1>
GOODS: straight thinking, therefore, is at a discount <ICE-IND:W2A-012#57:1>

STRUCTURES: ideas are … Total: 42

BUILDINGS: These early ideas […] form the foundation of the modern idea of corporate social responsibility <ICE-SIN:W2A-017#32:1>
PARTS OF BUILDINGS: These early ideas […] form the foundation of the modern idea of corporate social responsibility <ICE-SIN:W2A-017#32:1>

EVENTS / ACTIVITIES: ideas are … Total: 29

JOURNEYS: the safe and well trodden areas of basic and general principles and practices <ICE-SIN:W2A-002#5:1>
(VIOLENT) CONFLICTS: the concept […] came to be challenged exceedingly in Supreme Court <ICE-IND:W2A-005#32:1>
GAMES: The succeeding tales also […] play off the themes <ICE-IND:W2A-008#59:1>
COMMUNICATIVE EVENTS: The notion of an individual text develops […] to the historical and cultural dialogue <ICE-HK:W2A-002#110:1>

MATTER / ENERGY / OTHER NATURAL PHENOMENA: ideas are … Total: 13

PRECIOUS METAL: The touchstone, of all ideas, should be not their novelty <ICE-IND:W2A-012#97:1>
LIGHT/LIQUID: Christian theology/philosophy also absorbs the idea of process philosophy <ICE-HK:W2A-005#46:1>

IMAGES: ideas are … Total: 7

He regards the political perspective of understanding as the absolute horizon of all reading and interpretation <ICE-HK:W2A-002#67:1>

With this framework in place, it is now possible to pinpoint variation between varieties and sub-registers of ICE academic writing. Although the formulation of conceptual metaphors can be a tricky business at times (and I acknowledge that they are, in fact, plausibility offerings), for the purposes of this study, it was necessary to establish the grounds of comparison, because it makes evaluating the distribution of the various metaphors possible. Nevertheless, these categories were first formulated after analysis of the whole data set and on the basis of their most salient conceptual features, as expressed by the linguistic metaphors. Additionally, taken together, these categories can be viewed as supplying a metaphorical profile for the way the New English varieties, as represented by ICE-Hong Kong, ICE-India and ICE-Singapore, conceptualise the IDEAS domain.

On a final note, due to the nature of the academic register and the highly conventionalised informational language it contains, it was assumed that the bulk of metaphors encountered in the corpora would be of the conventional type, which is in line with previous studies (cf. Jäkel 1997; Steen et al. 2010) and confirmed by the present results. With this in mind, we turn to some specific findings in the following section.



5 Results

The method described above elicited a total of 458 metaphors for the target domain IDEAS across all varieties (Hong Kong, India and Singapore) and academic sub-registers (Humanities, Natural Science and Social Science) with the aid of a total of 1,011 X4.1 lexical items as provided by Wmatrix. Therefore, an initial finding is that a good portion of the words used to talk about IDEAS in New English academic writing show up in metaphors, namely 45.3 %.
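As a quick check of the arithmetic, the reported proportion can be recomputed from the two totals given above. The variable names are illustrative only.

```python
metaphors = 458       # IDEAS metaphors identified across all texts
x41_items = 1011      # X4.1 lexical items supplied by the semantic tagging

share = metaphors / x41_items
print(f"{share:.1%}")  # prints 45.3%
```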

Considering the basic distribution across varieties as well as across sub-registers of New English academic writing, as illustrated in Table 2, Hong Kong and Academic Humanities emerge as the most metaphorical variety and the most metaphorical sub-register, respectively, in terms of conceptualising the IDEAS domain.

Table 2: Distribution of IDEAS metaphors by variety and sub-register

Varieties      Frequency      Sub-registers               Frequency

Hong Kong      199            Academic Humanities         336
India          124            Academic Natural Science     30
Singapore      135            Academic Social Science      92

Taking account of the distributional patterns of IDEAS metaphors according to the source domains involved, some clear preferences can be observed from both the variety as well as the sub-register perspective. This is demonstrated by the results in Table 3.


Table 3: Distribution of IDEAS metaphors by sub-register and variety according to source domains

Hum = Humanities, NS = Natural Science, SS = Social Science

                                          Hong Kong             India                 Singapore
Source domain                             Hum   NS   SS  Total  Hum   NS   SS  Total  Hum   NS   SS  Total

ORGANISMS                                  70    4    8     82   25    3   13     41   37    2   16     55
OBJECTS                                    47    2   14     63   27    4    5     36   21    1   10     32
ARTEFACTS                                  16    0    3     19   14    3    3     20   14    1    4     19
STRUCTURES                                 16    3    1     20    4    4    0      8   10    0    4     14
EVENTS/ACTIVITIES                           3    1    0      4    8    1    3     12   11    0    2     13
MATTER/ENERGY/OTHER NATURAL PHENOMENA       4    0    3      7    2    0    2      4    1    0    1      2
IMAGES                                      4    0    0      4    2    1    0      3    0    0    0      0

Total:                                    160   10   29    199   82   16   26    124   94    4   37    135
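The marginal totals of Table 3 can be cross-checked against the frequencies in Table 2 with a short script. The sketch below simply transcribes the cell counts of Table 3; the data structure and function names are illustrative choices and not part of the original analysis.

```python
# Counts transcribed from Table 3, ordered as
# (Humanities, Natural Science, Social Science) for each variety.
TABLE_3 = {
    "ORGANISMS": {"Hong Kong": (70, 4, 8), "India": (25, 3, 13), "Singapore": (37, 2, 16)},
    "OBJECTS": {"Hong Kong": (47, 2, 14), "India": (27, 4, 5), "Singapore": (21, 1, 10)},
    "ARTEFACTS": {"Hong Kong": (16, 0, 3), "India": (14, 3, 3), "Singapore": (14, 1, 4)},
    "STRUCTURES": {"Hong Kong": (16, 3, 1), "India": (4, 4, 0), "Singapore": (10, 0, 4)},
    "EVENTS/ACTIVITIES": {"Hong Kong": (3, 1, 0), "India": (8, 1, 3), "Singapore": (11, 0, 2)},
    "MATTER/ENERGY/OTHER NATURAL PHENOMENA": {"Hong Kong": (4, 0, 3), "India": (2, 0, 2), "Singapore": (1, 0, 1)},
    "IMAGES": {"Hong Kong": (4, 0, 0), "India": (2, 1, 0), "Singapore": (0, 0, 0)},
}
SUB_REGISTERS = ("Humanities", "Natural Science", "Social Science")


def totals_by_variety(table):
    """Sum all cells belonging to each variety."""
    totals = {}
    for cells in table.values():
        for variety, counts in cells.items():
            totals[variety] = totals.get(variety, 0) + sum(counts)
    return totals


def totals_by_sub_register(table):
    """Sum all cells belonging to each sub-register across varieties."""
    totals = dict.fromkeys(SUB_REGISTERS, 0)
    for cells in table.values():
        for counts in cells.values():
            for name, count in zip(SUB_REGISTERS, counts):
                totals[name] += count
    return totals


variety_totals = totals_by_variety(TABLE_3)
sub_register_totals = totals_by_sub_register(TABLE_3)
print(variety_totals)                 # Hong Kong 199, India 124, Singapore 135
print(sub_register_totals)            # Humanities 336, Natural Science 30, Social Science 92
print(sum(variety_totals.values()))   # 458
```

Both sets of marginals agree with Table 2 and add up to the overall total of 458 metaphors.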

One tendency that is apparent from the data in Table 3 is the reliance on the source domain ORGANISMS to conceptualise IDEAS. As outlined above, this broadly formulated category groups together more specific domains such as PEOPLE and PLANTS. Nevertheless, of the domains in this category, the most prominent for all varieties and sub-registers is, in fact, PEOPLE: the bulk of IDEAS metaphors with the source domain ORGANISMS make use of the PEOPLE domain. For Hong Kong academic Humanities the PEOPLE domain is used 84.3 % of the time (59 out of 70), while in India academic Humanities it is used 96 % of the time (24 out of 25) and in Singapore academic Humanities 81 % of the time (30 out of 37). In Academic Natural Science texts, no matter what variety, PEOPLE is the sole domain involved for the ORGANISMS category. This type of metaphor clearly involves personification, which is in turn “the most obvious ontological metaphor” (Lakoff and Johnson 2003: 33). This finding is consistent with other studies that have pinpointed personification as a characteristic feature of academic texts, especially of the type “when a non-human entity (referring to some discourse entity, such as a text) is the subject with a verb that requires a human agent” (Steen et al. 2010: 108). This type was found in the data across varieties as well as across sub-registers, as the following briefly illustrate:

Hong Kong Academic Humanities:
(2) the body in contemporary thought that may be regarded as legacies of the Cartesian view, which treat the body as primarily an object <ICE-HK:W2A-003#67:1>

India Academic Natural Science:
(3) Actually Judd-Ofelt theory works less satisfactorily <ICE-IND:W2A-025#16:1>

Singapore Academic Social Science:
(4) the concept of dialect groups is too embracing to be able to take care of internal segmentations <ICE-SIN:W2A-016#58:1>
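The shares of PEOPLE metaphors within the ORGANISMS category reported above for the Humanities texts can be recomputed from the raw counts. The following lines are a minimal sketch; the dictionary names are illustrative.

```python
# Humanities counts per variety: PEOPLE metaphors and all ORGANISMS metaphors.
people = {"Hong Kong": 59, "India": 24, "Singapore": 30}
organisms = {"Hong Kong": 70, "India": 25, "Singapore": 37}

people_share = {v: people[v] / organisms[v] for v in people}
for variety, share in people_share.items():
    print(f"{variety}: {share:.1%}")
# Hong Kong: 84.3%
# India: 96.0%
# Singapore: 81.1%
```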

Furthermore, as Table 3 shows, there is also a prominence of another type of ontological metaphor that I have subsumed under the category of OBJECT. Because PEOPLE and OBJECT metaphors make up the bulk of all metaphors collected, their analysis merits special attention, which I will turn to below. However, beforehand, it is worthwhile to consider certain focal points for interpreting the data from the perspectives of variety and sub-register.

From the cross-discipline perspective, it is perhaps not a very surprising result that IDEAS metaphors in general are more present in Humanities (336 metaphors in total), as compared to the other academic sub-registers (30 in Natural Science and 92 in Social Science), considering the nature of the domain itself and its informational contribution to the Humanities texts. IDEAS, as represented in the corpus by lexical items with the tag X4.1, occur in general more frequently in the Humanities as compared to Social Sciences and Natural Sciences (624 items in Humanities, 114 in Natural Sciences and 273 in Social Sciences). This, of course, relates to the specific topical domain(s) of the Humanities texts that warrant discussion of the respective histories of ideas, at least more often than in Natural Science or even Social Science writing, which can be exemplified by a glance at a sample of titles from the Hong Kong corpus texts:

(5) Academic Humanities:
(a) “Re(-)presenting the Unconscious: From Sigmund Freud to Fredric” <ICE-HK:W2A-002>
(b) “Chinese-Western Comparative Drama in Perspective” <ICE-HK:W2A-007>
(c) “Anthropology and Christology in Christian-Confucian Dialogue” <ICE-HK:W2A-005>

(6) Academic Natural Science:
(a) “Infections of the Central Nervous System” <ICE-HK:W2A-021>
(b) “Old stone walls as an ecological habitat for urban trees in Hong Kong” <ICE-HK:W2A-022>
(c) “Patterns of referral to the paediatric specialist clinic of a regional hospital: descriptive study” <ICE-HK:W2A-023>

(7) Academic Social Science:
(a) “A Strategy for Hong Kong Industries, Inc.” <ICE-HK:W2A-012>
(b) “The prospect of mediation in resolving construction disputes” <ICE-HK:W2A-014>
(c) “The Rehabilitation Development Coordinating Committee and the Future of Services Concerning People with Disabilities in Hong Kong” <ICE-HK:W2A-019>

Moreover, IDEAS metaphors, based on this expected topical diversity8, occur with varying elaborations from discipline to discipline and also functionally contribute to academic writing in different ways, as will be demonstrated below.

From the cross-varietal perspective, it is difficult to establish which variety presents itself as most metaphorical on the basis of data concerning one conceptual domain. Additionally, although Hong Kong is clearly characterised by the highest frequency of IDEAS metaphors, these metaphors still show up in comparable numbers in Indian and Singaporean academic writing. Therefore, what is of greater interest here is the consideration of metaphorical variation beyond frequency. Kövecses (2010: 216) states that “two languages may share the same conceptual metaphor, but the metaphor is elaborated differently in the two languages”. For instance, the conceptual metaphors THE BODY IS A CONTAINER FOR THE EMOTIONS and ANGER IS FIRE have an attested existence in both Hungarian and English; in Hungarian the body with fire inside is often elaborated as a pipe – an elaboration that does not appear to be at work in conventional English metaphors of this kind (Kövecses 2010: 216). By extending this notion to the study of varieties, it is possible to establish variation along the lines of this kind of elaboration. For instance, IDEAS were conceptualised in Hong Kong academic Humanities as MORAL GUIDES, illustrated by (8) to (11) in the following section, which was not found to be part of the mappings for the other varieties.

8 In the study of metaphor, we should not underestimate the problematic aspect of topical diversity, which is related to the design of the ICE corpora. As far as the author of the present paper is aware, the ICE texts, despite being carefully selected as representative examples of the text types comprising the general design of the ICE project, were not selected on the basis of topic similarity. Thus, ICE-based research into metaphor may run into the problem of absence of a domain, not because a variety does not make use of this domain, but because it just so happens that the topics of the texts selected do not make use of it. This factor, along with the smaller nature of the ICE components, does in the long run present difficulties for more extensive research into metaphor variation, for which more frequencies for a particular domain may be required. However, in terms of register research, ICE’s design is still the best option for comparative study of varieties and thus has been used in the present study.

Whether or not these differences have an overall characterising role for the study of varieties remains to be seen and would require more extensive research. However, this does indicate a starting point from which to consider overall metaphorical variation along the variety divide. Specifically, it helps to create a basis for separating those metaphors and metaphorical expressions that are ubiquitous to all varieties from potentially variety-specific conceptualisations or at the very least variety-specific domain preferences. This will be briefly considered in the next section, which is followed by a closer look at metaphor variation and function from the sub-register perspective on New English academic writing.

6 Discussion

6.1 Metaphor across New English varieties

In distributional terms, it is clear that the IDEAS domain is conceptualised by all categories in all three varieties, with the exception of the IMAGES category (no instances in Singapore academic writing), which did not contribute many metaphors in general (four for Hong Kong and three for India). Due to the fact that all varieties make use of nearly all source domain categories to conceptualise IDEAS, I conclude that there is no great difference between the varieties, especially in regard to the non-presence of a certain domain.

Nevertheless, the similarities in domain exploitation for IDEAS metaphors do not necessarily exclude potentially variety-specific conceptualisations. If we consider differences in terms of the various entailments or elaborations apparent in shared domain mappings, it becomes clear that varieties, in fact, display a certain degree of variation. Consider (8) to (11) below from Hong Kong academic Humanities:

(8) identity as a woman depends on the specific social regulatory ideals by which female bodies are trained and formed <ICE-HK:W2A-003#26:1>

(9) it is widely accepted that general principles serve to guide moral conduct and decisions <ICE-HK:W2A-004#114:1>

(10) Ethical behaviour is guided by the ethical ideal of caring and not by principles or rules. <ICE-HK:W2A-004#125:1>

(11) we are under the guidance of the ethical ideal, that vision of the best self. <ICE-HK:W2A-004#103:1>

These extracts illustrate an elaboration that could be represented by the conceptual metaphor IDEAS ARE MORAL GUIDES, which occurs a total of 11 times in the Hong Kong corpus. While (8) to (11) display a personification of IDEAS (represented by principles, ideal and vision) in that they are pursuing a uniquely human activity, that is, serving as a good example of moral behaviour or actively guiding and training, this elaboration is not present in Indian and Singaporean academic writing and, thus, has the potential to be variety-specific. IDEAS ARE MORAL GUIDES belongs to a more general metaphor, IDEAS ARE DOMINANT PEOPLE, which, by contrast, has been attested for all varieties and shows no major tendency towards a certain elaboration:

(12) general principles do not always determine what is appropriate <ICE-HK:W2A-004#139:1>

(13) writers […] are very much influenced by the theories of black Aesthetics <ICE-IND:W2A-009#29:1>

(14) Whereas once EAP was dominated by the concept of registers <ICE-SIN:W2A-007#39:1>

Another example for a variety-specific elaboration comes from Singapore, which is the only variety to conceptualise IDEAS as a TEACHER, illustrated by (15):

(15) a controversial issue could be either a good or bad teacher by affecting learning through its contents or through its dynamics. <ICE-SIN:W2A-002#6:1>

Although this TEACHER conceptualisation is unique to the Singaporean corpus, the notion that IDEAS can impact an individual or a society in a positive or negative manner, as illustrated in (15), is still part of metaphors found in all varieties, for instance:

(16) IDEAS ARE PEOPLE WHO HELP
(a) translation as the ideological handmaid of imperialism <ICE-HK:W2A-009#8:1>
(b) they [principles] may all work together to facilitate the use of language <ICE-IND:W2A-002#32:1>
(c) What would soothe her is […] the thought that his action in comforting her is a response to her need <ICE-SIN:W2A-004#29:1>

(17) IDEAS ARE PEOPLE WHO HARM
(a) many forms of oppressive ideologies <ICE-HK:W2A-003#16:1>
(b) Ayer’s notion of philosophy deprives philosophy of its empirical content <ICE-IND:W2A-001#17:1>
(c) understanding of human phenomena are sometimes distorted by […] political beliefs, ideology and sheer ethnocentrism. <ICE-SIN:W2A-005#12:1>


All in all, these metaphors show that, despite the potential for individual preference for certain elaborations, such as IDEAS AS MORAL GUIDES or TEACHERS, New English varieties, specifically in academic writing, tend to draw from the same conceptual pool, that is, their metaphors display more conceptual similarities than differences. This is perhaps not so different from varieties traditionally conceived of as more “standard”, such as British or American English, which would also speak to the strong conventional nature of the academic register, to which I turn in the following.

6.2 Discussion: Metaphor across academic sub-registers

Distributional differences in IDEAS metaphors can also be identified from the sub-register perspective. One obvious observation relates to the distribution of PEOPLE metaphors. Humanities emerges as the most clearly metaphorical sub-register for the conceptualising of this domain, followed by Social Science and Natural Science. Incidentally, each variety individually displays the same tendency, as portrayed in Table 4.

Table 4: Distribution of IDEAS ARE PEOPLE metaphors

             Academic Humanities   Academic Natural Science   Academic Social Science
Hong Kong    59                    4                          7
India        24                    3                          11
Singapore    30                    2                          12
Total        113                   9                          30

All varieties consistently place Humanities on the more metaphorical side and Natural Science on the less metaphorical side of the continuum, with Social Science somewhere in between. This is a general trend for most other categories,⁹ illustrated for the second most prominent ontological metaphor, IDEAS ARE OBJECTS, by Table 5.

9 The exceptions are 1) India academic Natural Science and Social Science, which both contain three IDEAS ARE ARTEFACTS metaphors; 2) Hong Kong and India academic Natural Science has more IDEAS ARE ARTEFACTS metaphors than Social Science; and 3) India academic Natural Science has one IDEAS ARE IMAGES, whereas India Social Science has none. However, the frequencies involved here are very small and do not necessarily detract from the general trend.

Table 5: Distribution of IDEAS ARE OBJECTS metaphors

             Academic Humanities   Academic Natural Science   Academic Social Science
Hong Kong    47                    2                          14
India        27                    4                          5
Singapore    21                    1                          10
Total        95                    7                          29
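The trend reported in Tables 4 and 5 can be checked mechanically. The following Python sketch simply re-enters the counts from the two tables and tests, for each variety, whether Humanities outranks Social Science and Social Science outranks Natural Science. All variable and function names are illustrative assumptions and not part of the study itself.

```python
# Counts re-entered from Tables 4 and 5.
# Column order: (Humanities, Natural Science, Social Science).
# All names are illustrative; they are not taken from the study.
TABLE_4_PEOPLE = {
    "Hong Kong": (59, 4, 7),
    "India": (24, 3, 11),
    "Singapore": (30, 2, 12),
}
TABLE_5_OBJECTS = {
    "Hong Kong": (47, 2, 14),
    "India": (27, 4, 5),
    "Singapore": (21, 1, 10),
}


def column_totals(table):
    """Sum each sub-register column across the three varieties."""
    return tuple(sum(column) for column in zip(*table.values()))


def follows_trend(row):
    """True if Humanities > Social Science > Natural Science for one variety."""
    humanities, natural, social = row
    return humanities > social > natural


for label, table in (("IDEAS ARE PEOPLE", TABLE_4_PEOPLE),
                     ("IDEAS ARE OBJECTS", TABLE_5_OBJECTS)):
    print(label, "column totals:", column_totals(table))
    for variety, row in table.items():
        print(f"  {variety}: trend holds = {follows_trend(row)}")
```

A check of this kind only confirms the ordering of raw frequencies; it says nothing about statistical significance, which the small cell counts would in any case make difficult to assess.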

Despite these frequency differences, the New English academic register as a whole displays a common characteristic, in that it makes use of well-established conventional metaphors, of the kind we would expect to find in other academic English varieties. Examples (18) to (20) represent conventional metaphors from the OBJECTS category that feature in all academic sub-registers:

(18) IDEAS ARE CONTAINERS
(a) The interesting aspect […] lies in its intra-Asian comparative literature perspective <ICE-HK:W2A-007#51:1> (Humanities)
(b) one of the major problem in the lanthanide f-f intensity theory <ICE-IND:W2A-025#28:1> (Natural Science)
(c) Such a view hides in it subtle dangers. <ICE-HK:W2A-019#88:2> (Social Science)

(19) IDEAS ARE OBJECTS IN CONTAINERS
(a) the eternal world become latent dream-thoughts stored in the unconscious psyche <ICE-HK:W2A-002#17:1> (Humanities)
(b) Medical personnel […] should keep this concept in mind <ICE-HK:W2A-027#104:1> (Natural Science)
(c) the word “wealth” […] now occupies the vacated slot in the dirt dictionary as an unworthy concept <ICE-IND:W2A-012:56:1> (Social Science)

(20) IDEAS ARE POSSESSIONS
(a) Full cognizance should be given to the influences on the curriculum planner <ICE-SIN:W2A-002#82:1> (Humanities)
(b) the panoramic view gives the idea that it is more slopy and undulating <ICE-IND:W2A-030#42:1> (Natural Science)
(c) The argument […] was given another perspective <ICE-SIN:W2A-017#63:1> (Social Science)

However, in view of the topical diversity sketched out above, the mere fact that these metaphors can occur in all academic sub-registers (albeit for some in small numbers) is not necessarily an indication of similarity in the way these metaphors are elaborated. Just as we entertained the notion of variety-specific conceptualisations, we can consider discipline-specific conceptualisations, or at least preferences, by examining those metaphors from the OBJECTS category that are not found in all academic sub-registers.

When IDEAS are conceptualised as OBJECTS in general, there is more of a tendency in the Humanities, first and foremost, and in Social Science, secondly, to highlight certain qualities, whereas in Natural Science no specific qualities are attributed to IDEAS as OBJECTS. To exemplify this, consider the following qualities, which can be formulated as individual mappings:

(21) IDEAS ARE MOVEABLE OBJECTS
(a) Wittgenstein has replaced Kant’s concept of mind by language <ICE-IND:W2A-002#7:1> (Humanities)
(b) One view, advanced in the 1920s and 1930s <ICE-SIN:W2A-017#23:1> (Social Science)

(22) IDEAS ARE VISIBLE OBJECTS
(a) Although the concept was never defined formally, it is clear on the basis of these answers <ICE-HK:W2A-004#57:1> (Humanities)
(b) financial statements that show a “true and fair view” <ICE-IND:W2A-020#32:1> (Social Science)

Examples (21) to (22) may not illustrate metaphors in the strictest “discipline-specific” sense due to their occurrence in two separate sub-registers, or they may be an indication of Social Science containing texts of a more “Humanities” nature than a “Natural Science” one. Nevertheless, when considering the frequency of these metaphors, it becomes apparent that Humanities shows a slight preference for them over Social Science, since IDEAS ARE MOVEABLE OBJECTS occurs 8 times in the Humanities and 6 times in Social Science, while IDEAS ARE VISIBLE OBJECTS occurs 18 times in the Humanities and only 4 times in Social Science. In fact, by taking a closer glance at the latter category, we can see a perhaps more suitable candidate for a discipline-specific elaboration, because IDEAS are not only VISIBLE OBJECTS in the Humanities, but also represented as VISIBLE OBJECTS that were previously hidden from view and, by their revealing, have attained the VISIBLE quality:

(23) IDEAS ARE VISIBLE OBJECTS (PREVIOUSLY HIDDEN FROM VIEW)
(a) the article is a legitimate attempt at establishing rapports de fait […] shedding light on certain issues <ICE-HK:W2A-007#44:1>
(b) Subsequently they have exposed this notion as a historical and ideological construct <ICE-HK:W2A-009#6:1>
(c) This paper aims to put a step towards that by highlighting certain pragmetic [sic] principles, some of which may go otherwise unnoticed <ICE-IND:W2A-002#26:1>

This particular elaboration makes up 72.2 % of IDEAS ARE VISIBLE OBJECTS (13 out of 18) in the Humanities texts and perhaps points to a functional role for this metaphor in this academic sub-register. Humanities texts, often in introductory sections, typically inform the reader about the history of ideas involved in the discussion of the topic at hand. For instance, if we consider the greater context of (23c), a linguistic paper from the India corpus entitled “Pragmatic Principles and Language”, it becomes clear that IDEAS ARE VISIBLE OBJECTS (PREVIOUSLY HIDDEN FROM VIEW) functions to locate the paper within these previous ideas and accentuate its contribution to these ideas:

(24) Philosophers have found pragmatics to be quite close to what they have called “ordinary language analysis”. They have often used isolated insights about the working of language in solving philosophical riddles without paying much attention to many of the underlying pragmatic principles of the language that they are using. As they have primarily concerned themselves with the theories of meaning, rules, and other related issues, they were forced to study pragmatics of language incidentally without which they would not have found it possible to explain, for example, what is “meaning”. A fuller understanding of pragmatic aspects of the working of language is yet to be achieved despite numerous attempts by philosophers and linguists. This paper aims to put a step towards that by highlighting certain pragmetic [sic] principles, some of which may go otherwise unnoticed. <ICE-IND:W2A-002#23:1-26:1>

By conceptualising IDEAS (principles) as becoming VISIBLE OBJECTS in need of revelation (highlighting, go otherwise unnoticed), it becomes obvious to the reader that the present article’s aim is to fill those knowledge gaps left by previous “philosophers and linguists”. Incidentally, the other 12 metaphors of this kind found in the Humanities texts function in exactly the same way. This is not necessarily evidence for Natural Science or Social Science texts being completely void of this metaphorical function, despite the data indicating a clear preference for it in the Humanities, which is most likely due to the nature of the topics that these texts comprise.
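The proportions cited in this discussion follow directly from the reported frequencies. As a minimal worked sketch (the variable names and the helper `percentage` are illustrative assumptions, not part of the original analysis), the figures can be recomputed as follows:

```python
# Frequencies as reported in the running text; all names are illustrative.
visible_humanities = 18   # IDEAS ARE VISIBLE OBJECTS in Humanities
visible_social = 4        # IDEAS ARE VISIBLE OBJECTS in Social Science
previously_hidden = 13    # Humanities cases elaborated as PREVIOUSLY HIDDEN FROM VIEW


def percentage(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of the PREVIOUSLY HIDDEN elaboration within Humanities (13 out of 18).
share_hidden = percentage(previously_hidden, visible_humanities)

# Share of all VISIBLE OBJECTS metaphors that fall in Humanities (18 out of 22).
share_humanities = percentage(visible_humanities, visible_humanities + visible_social)

# Instances of the elaboration other than the one discussed as (23c).
remaining_instances = previously_hidden - 1

print(share_hidden, share_humanities, remaining_instances)
```

The second figure is not stated in the text; it is derived here only to make the size of the Humanities preference explicit.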

At this point, we could entertain the possibility that metaphors of this kind act systematically as metaphorical “register features” (cf. Biber and Conrad 2009; Schubert, this volume; Sanchez-Stockhammer, this volume) due to a register’s or, in this case, sub-register’s preference for this particular mapping and function. Furthermore, the more extensively we investigate the relationship between metaphor and register within the study of varieties of English, the more readily we could conceive of the existence of metaphorical “register markers” (cf. Biber and Conrad 2009; Schubert, this volume; Sanchez-Stockhammer, this volume), whose uniqueness is not only determined by the register in which they prominently feature, but also perhaps by the extent to which a variety is nativised.¹⁰

Nevertheless, the present data provides insight into another preference and, thus, another potential metaphorical register feature that can be seen in IDEAS ARE PEOPLE, particularly those that stretch beyond the sentence boundary over a larger portion of the text. Consider (25) below, which serves as an example of how metaphors can influence textual structuring, that is, how they contribute to the cohesion as well as coherence of a text more significantly in Humanities than in Natural Science and Social Science:

(25) It is time for courses to introduce controversial issues in management studies. A controversial issue covers new grounds. It enhances the learning process. It could facilitate further the practice of examining, analyzing and deciding skills. However, if not carefully introduced, controversial issues could generate a disproportionate degree of confusion, and result in demotivating the students. As such, the introduction of a controversial issue in the curriculum would have to be properly managed because a controversial issue could be either a good or bad teacher by affecting learning through its contents or through its dynamics. <ICE-SIN:W2A-002#6:1-11>

We have encountered this metaphor before as IDEAS ARE TEACHERS (15) and determined that it is a metaphor specific to the Singapore corpus. However, in (25) we see that it functions to promote the coherence of the text, because an IDEA (issue) is portrayed as having all those teacher-like qualities one could expect when encountering a real teacher: A good teacher covers new grounds (topic-wise), enhances the learning process, facilitates the practice of skills, while a bad teacher can generate confusion and demotivate students. These qualities are attributed to IDEAS via the repeated presence of the metaphor IDEAS ARE TEACHERS, which is then directly stated at the end of the passage, acting as a summary of sorts.

Here, it is also conceivable to consider this metaphor’s function in creating cohesion due to the fact that almost each instantiation of IDEAS (issue(s), it) is embedded in the same metaphor throughout the passage and all are linked by language pertaining to both helpful attributes of a teacher (e.g. enhancing learning and facilitating practice of skills) as well as negative attributes (e.g. generating confusion and demotivating). This is different for Natural Science and Social Science texts, which do not give such prominence to IDEAS metaphors, and, in doing so, leave little room for them to structure their respective texts in this manner. Again, from this perspective, it seems to make more sense to talk about potential metaphorical “register features” over “register markers” (cf. Schubert, this volume).

10 For extensive discussion about nativisation and the extent to which a variety, as it is developing, orientates itself towards the English input variety, cf. Schneider’s “Dynamic Model” (Schneider 2007, 2003). Furthermore, research is currently being completed by the author of the present paper exploring the relationship between metaphor and nativisation and, thus, considering to what extent a variety, e.g. Indian English, behaves metaphorically differently from its traditional input variety, British English, for certain target domains, e.g. EMOTIONS.

7 Conclusions

The assumption behind the present study is that metaphor is a characteristic and functional feature of the academic register. Although this study focuses on metaphors conceptualising a single domain, it shows that, despite traditional notions of the metaphorical poverty of this register, academic writing is by no means void of metaphorical language, which, in turn, indicates the presence of conceptual metaphors. In particular, New English academic writing, as represented by the ICE components under investigation, makes use of conventional metaphors that can be encountered in academic writing associated with more traditional varieties of English. This is perhaps the result of the highly revised and edited production circumstances and international reach of this register, which, taken together, may discourage more variety-specific conceptualisations in favour of conventional metaphors intelligible to speakers of all varieties and non-native speakers alike. Despite this conventionality, it is nevertheless possible to point out potentially variety-specific conceptualisations by taking a finer-grained look at how a variety elaborates on a more general metaphor. In fact, it is perhaps on this level of analysis that metaphorical variation across varieties can be encountered in general. In order to provide more evidence for this, research on other domains and with other varieties is required.

From the sub-register perspective, it is possible to pinpoint the most metaphorical discipline for a specific domain, e.g. Humanities as most metaphorical for the IDEAS domain. Nevertheless, if other domains were examined, it could very well be the case that a completely different academic sub-register emerges as the most metaphorical. Furthermore, for metaphorical variation across the disciplines in this study of New English academic writing, at this stage it is possible to identify potential candidates for metaphorical “register features” rather than metaphorical “register markers” due to the fact that none in the data were exclusive to one specific academic sub-register, although a preference for certain metaphors can be determined. This also requires more research, which would most certainly benefit from the inclusion of other sub-registers or comparison with metaphorical data from popular texts pertaining to the Humanities, Natural Sciences and Social Sciences, which the ICE corpora also provide. In terms of their functional properties, a metaphor conceptualising a certain domain may exhibit functional features that can only be demonstrated for a particular sub-register, like signalling a paper’s contribution to a body of research in the Humanities. However, here again, further research can improve on the study of metaphorical function by adhering more strictly to a “census” technique, such as MIPVU, as well as relying on texts that do not display such a topical diversity as the ICE components do. Additionally, recent work in metaphorical variation and the varieties¹¹ exploits the advantages of using a significantly larger corpus, like Davies’ (2013) Corpus of Global Web-Based English (GloWbE), in order to make more extensive frequency-based claims about variety-specific domain preferences as well as to contribute to research into web registers (cf. Biber and Egbert, this volume) from the cross-variety perspective.¹² All things considered, employing metaphor as a feature to investigate both variety-based and register variation has the potential to provide many more insights into the nature of these highly relevant fields of study.

11 Cf. Díaz-Vera’s (2015) study on various conceptualisations of LOVE in India, Pakistan and Nigeria.

12 GloWbE provides an opportunity to efficiently compare 20 distinct varieties of English worldwide, of which the bulk could be categorised as belonging to the “New Englishes”.

References

Anthony, Laurence. 2012. AntConc (Version 3.3.5) [Computer Software]. Tokyo, Japan: Waseda University. http://www.antlab.sci.waseda.ac.jp/

Archer, Dawn, Andrew Wilson & Paul Rayson. 2002. Introduction to the USAS Category System. http://ucrel.lancs.ac.uk/usas/usas%20guide.pdf (accessed 5 May 2011).

Berber Sardinha, Tony. 2012. An assessment of metaphor retrieval methods. In Fiona MacArthur, José Luis Oncins-Martínez, Manuel Sánchez-García & Ana María Piquer-Píriz (eds.), Metaphor in use: Context, culture, and communication, 21–50. Amsterdam & Philadelphia: John Benjamins.

Berber Sardinha, Tony. 2007. Metaphor in corpora: A corpus-driven analysis of Applied Linguistics dissertations. Rev. Brasileira de Lingüística Aplicada 7(1). 11–35.

Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: CUP.

Black, Max. 1954. Metaphor. Proceedings of the Aristotelian Society 55. 273–294.

Cameron, Lynne. 2003. Metaphor in educational discourse. London: Continuum.

Davies, Mark. 2013. Corpus of global web-based English. http://corpus.byu.edu/glowbe/.

Deignan, Alice. 2005. Metaphor and corpus linguistics. Amsterdam & Philadelphia: John Benjamins.

Díaz-Vera, Javier E. 2015. Love in the time of corpora. Preferential conceptualizations of love in world Englishes. In Vito Pirrelli, Claudia Marzi & Marcello Ferro (eds.), Word structure and word usage. Proceedings of the NetWordS final conference, 161–165. http://ceur-ws.org/Vol-1347/paper37.pdf (accessed 13 May 2015).

Drewer, Petra. 2003. Die kognitive Metapher als Werkzeug des Denkens. Zur Rolle der Analogie bei der Gewinnung und Vermittlung wissenschaftlicher Erkenntnisse. Tübingen: Narr.

Goatly, Andrew. 1997. The Language of metaphors. London & New York: Routledge.

Hardie, Andrew, Veronika Koller, Paul Rayson & Elena Semino. 2007. Exploring a semantic annotation tool for metaphor analysis. In Matthew Davies, Paul Rayson, Susan Hunston & Pernilla Danielsson (eds.), Proceedings of the Corpus Linguistics 2007 Conference, 1–12. http://corpus.bham.ac.uk/corplingproceedings07/paper/49_Paper.pdf (accessed 19 August 2011).

Jäkel, Olaf. 1997. Metaphern in abstrakten Diskurs-Domänen. Eine kognitiv-linguistische Untersuchung anhand der Bereiche Geistestätigkeit, Wirtschaft und Wissenschaft. Frankfurt am Main: Peter Lang.

Kövecses, Zoltán. 2010. Metaphor: A practical introduction, 2nd edn. Oxford: OUP.

Krennmayr, Tina. 2011. Metaphor in newspapers. Utrecht: LOT.

Lakoff, George & Mark Johnson. 2003 [1980]. Metaphors we live by, 2nd edn. Chicago & London: Chicago UP.

Nelson, Gerald. 1996. The design of the corpus. In Sidney Greenbaum (ed.), Comparing English worldwide: The International Corpus of English, 27–35. Oxford: Clarendon.

Partington, Alan. 1998. Patterns and meanings: Using corpora for English language research and teaching. Amsterdam & Philadelphia: John Benjamins.

Platt, John, Heidi Weber & Ho Mian Lian. 1984. The New Englishes. London: Routledge.

Pragglejaz Group. 2007. A practical and flexible method for identifying metaphorically-used words in discourse. Metaphor and Symbol 22(1). 1–39.

Rayson, Paul. 2009. Wmatrix: A web-based corpus processing environment, Computing Department, Lancaster University. http://ucrel.lancs.ac.uk/wmatrix/

Römer, Christine. 2000. Metaphern in der Wissenschaftssprache: Bildfelder der sprachwissenschaftlichen Fachkommunikation. In Josef Bayer & Christine Römer (eds.), Von der Philologie zur Grammatiktheorie, 353–365. Tübingen: Max Niemeyer.

Schneider, Edgar W. 2007. Postcolonial English: Varieties around the world. Cambridge: CUP.

Schneider, Edgar W. 2003. The dynamics of New Englishes: From identity construction to dialect birth. Language 79(2). 233–281.

Semino, Elena. 2008. Metaphor in discourse. Cambridge: CUP.

Semino, Elena, Alice Deignan & Jeannette Littlemore. 2013. Metaphor, genre, and recontextualization. Metaphor and Symbol 28(1). 41–59.

Skorczynska, Hanna & Alice Deignan. 2006. Readership and purpose in the choice of economics metaphors. Metaphor and Symbol 21(2). 87–104.

Steen, Gerard J., Aletta G. Dorst, J. Berenike Herrmann, Anna Kaal, Tina Krennmayr & Trijntje Pasma. 2010. A method for linguistic metaphor identification. From MIP to MIPVU. Amsterdam & Philadelphia: John Benjamins.

Stefanowitsch, Anatol. 2006. Words and their metaphors: A corpus-based approach. In Anatol Stefanowitsch & Stefan Th. Gries (eds.), Corpus-based approaches to metaphor and metonymy, 63–106. Berlin & New York: Mouton de Gruyter.

Wolf, Hans-Georg & Frank Polzenhagen. 2009. World Englishes: A cognitive sociolinguistic approach. Berlin & New York: Mouton de Gruyter.

Zichler, Csilla. 2010. Metaphern in der Wissenschaftssprache. Sprachtheorie und germanistische Linguistik 20(1). 95–112.

Steffen Schaub

The influence of register on noun phrase complexity in varieties of English

Abstract: This study explores noun phrase (NP) complexity variation in registers of regional varieties of English. The focus is on the description of NP complexity in four registers (academic writing, conversation, unscripted speeches and social letters) across five regional varieties of English (Canada, Hong Kong, India, Jamaica, Singapore). For that, noun phrases are extracted from a register-stratified subsample of the International Corpus of English and annotated for NP complexity based on a four-way categorisation system: i) unmodified, ii) premodified only, iii) postmodified only, iv) pre- and postmodified. The results corroborate the strong influence of register on NP complexity, depending on two situational characteristics: communicative purpose (informational vs. interactional) and mode (written vs. spoken). Finally, it is assessed whether NP complexity is a viable marker of regional variation in comparative varieties research.

1 Introduction

This study explores noun phrase (NP) complexity variation in registers of regional varieties of English. There are three motivations for pursuing this particular research topic: the lack of descriptive work on the noun phrase in varieties of English, a growing interest in register variation in English varieties research and awareness of the strong influence of register on NP structure. These motivations are discussed in more detail in the following.

Descriptive work on the regional varieties of English has developed a focus on comparison. With the emergence of comparable linguistic corpora, such as the International Corpus of English (ICE), linguists have compared individual varieties against a normative ‘yardstick’ (usually British English) or against each other. Most of the attention has been devoted to phonology, lexis and morphosyntax. Interest in the latter was mainly guided by investigations of ‘non-standard’ features, i.e. features reported to occur in Englishes around the world that do not occur in the norm-providing standard varieties. The task is to re-evaluate early feature reports based on anecdotal observation (e.g. Platt, Weber and Ho 1984) and to confirm their validity using empirical means. With regard to the noun phrase across regional varieties of English, three ‘non-standard’ features are frequently mentioned in surveys and grammatical descriptions: noun pluralisation (Platt, Weber and Ho 1984; Ahulu 1998; Hall, Schmidtke and Vickers 2013), use of the article system (Sand 2004; Lamidi 2007; Wahid 2013; Sand forthc.), and subject-verb concord (Asante 1995; Ahulu 1998; Blair and Collins 2001; Sand forthc.). Other, less frequently reported phenomena include variation in the pronoun system (Lamidi 2007; Kortmann and Lunkenheimer 2013), the expression of possession (Kortmann and Lunkenheimer 2013) and adjective comparison (Kortmann and Lunkenheimer 2013).

Steffen Schaub, University of Marburg

More recently, interest in the noun phrase across varieties of English has moved beyond the investigation of isolated morphosyntactic features. Brunner (2014) introduces NP modification patterns as a marker of regional variation across varieties of English. He compares NP structures in British, Kenyan and Singapore English and finds that “[i]n Singapore English, premodified NPs are significantly overrepresented [while] in Kenyan English, postmodifiers are more frequent than premodifiers” (Brunner 2014: 44). He attributes these preferences to contact influence from the indigenous languages of the respective areas, based on their typological profiles (head-final vs. head-initial word order). These findings are drawn from the register of spontaneous spoken conversation, which is “arguably the least stylized and can therefore be expected to be susceptible to contact-induced language change” (Brunner 2014: 30). In order to substantiate the claim that preferences in NP modification are the result of language contact, it is necessary to study more registers to see if these tendencies can be confirmed.

The notion of ‘register’ is a relatively recent addition to research into varieties of English. Register is defined here, in accordance with Biber and Conrad (2009: 6), as “a variety associated with a particular situation of use (including particular communicative purposes)”. So far, English varieties have mainly been handled as homogeneous entities conveniently defined by the borders of political nation-states rather than linguistic criteria, but this is not due to a lack of awareness. Already in early reports we find observations that take register variation into consideration. Platt, Weber and Ho (1984: 49), for instance, frequently differentiate between written and spoken as well as formal and colloquial language when discussing individual features, e.g.: “It is common in some New Englishes to mark the plural of the noun more often in writing and in more formal speech. There would be less marking in colloquial speech”. Nevertheless, for much of World Englishes research, the nation-state variety remained the preferred level of comparison. Macro-scale projects such as the Electronic World Atlas of Varieties of English (Kortmann and Lunkenheimer 2013) show that demarcating varieties even at this general level produces a large number of distinct entities.¹

An essential component of register research is the analysis of grammatical features and their function in particular registers (see Schubert, this volume). With the emergence of comparable, computer-readable corpora in the 1990s, which are also subdivided into genres, it is possible to move beyond anecdotal observation and to verify hypotheses about register variation systematically. For instance, Sand (2004) compares article use across varieties and concludes that “differences across text types are observable and genre differences within one variety are practically always more pronounced than overall variation across varieties” (Sand 2004: 294–295). A growing number of studies extend Biber’s (1988) multidimensional approach to register variation to the study of regional varieties of English (see, for instance, Balasubramanian 2009; Xiao 2009; Neumann 2012; Neumann and Fest, this volume). Balasubramanian (2009: 4) specifically addresses register variation in Indian English, arguing that

just as traditionally recognized ‘native’ varieties of English are recognized for the variation within them, so too, should the emerging new varieties. The ‘native’ varieties of English are recognized for the differences within them stemming from region, social status, and reason for use or register […] to name just a few variables. […] Any study of a new variety of English, then, should focus on identifying the variation within it, (and not just on describing a set of features that characterize the national variety), and provide detailed descriptions of the national variety […].

Xiao (2009) explores variation across twelve registers and five varieties using the multidimensional analysis (MDA) approach developed by Biber (1988). The study encompasses 141 grammatical and semantic features. Xiao concludes that “variations in language use involve regional varieties as well as variants in different registers and along different dimensions” (Xiao 2009: 447). In sum, register differences are increasingly addressed in English varieties research, and it becomes clear that the influence of register on the overall structural variation of regional English varieties must be taken into account.

1 The eWAVE database covers 76 mostly national varieties of English, including, however, a number of localised dialectal varieties, for instance East Anglian English or Appalachian English (Kortmann and Lunkenheimer 2013).

The connection between register and NP complexity has been demonstrated repeatedly. Aarts (1971) analyses NP complexity across four different text types and concludes that NP complexity correlates with syntactic function: While the subject slot prefers ‘light’ noun phrases, the object slot prefers ‘heavy’ ones. In addition, Aarts found a tendency for heavy noun phrases to be much less frequent in spoken than in written texts. The latter point is taken up by de Haan (1993), who confirms Aarts’ (1971) hunch about the relation between NP complexity and text type. De Haan (1993) further investigates the combined influence of text type and syntactic function on NP complexity, and finds that, in some cases, the two reinforce each other, while in other cases they cancel each other out. Halliday (1989) argues that spoken language is no less complex than written language, but that the complexity is located differently. While spoken language has a more elaborate clausal structure, in written language, the complexity lies in the constituents below the clausal level, foremost in what he calls the nominal group. Nominals, in writing, carry “the meat of the message” (Halliday 1989: 72). Schäpers (2009), using a corpus of spoken and written British English, confirms that “[n]oun phrases are more complex in written language with regard to premodification, postmodification, and both pre- and postmodification” (2009: 153). On the level of registers, Biber et al. (1999) find that almost 60 % of noun phrases in academic prose have a modifier, while only 15 % of noun phrases in conversation are modified (Biber et al. 1999: 578). In general, academic prose is characterised by a more frequent use of nouns than conversation (Biber and Conrad 2009: 116–117). The linguistic differences between these two registers, Biber and Conrad argue, can be explained on the basis of their different situational characteristics: while the purpose of conversation is to develop personal relationships, academic prose focuses on communicating information (Biber and Conrad 2009: 109). To sum up, the strong connection between NP complexity and register has been confirmed in various studies of British and American English.

The present study combines the three interconnected research interests outlined above. NP complexity is systematically compared across five varieties of English (Canadian English, Indian English, Jamaican English, Hong Kong English and Singapore English) and four registers (academic writing, conversation, unscripted speeches and social letters). The regional varieties reflect diverse sociocultural and linguistic backgrounds. The registers were selected as counterparts based on two situational characteristics, namely mode (spoken vs. written) and communicative purpose (information vs. interaction).²

2 Although the texts are meant to represent the extremes of these two situational characteristics, a strict line cannot be drawn. For example, social letters may also be used to inform, for instance in work-related exchange between colleagues. Likewise, unscripted speeches contain interactional elements, as will be evident from the discussion of personal pronouns below.


Table 1: Situational characteristics of registers in sample

Mode / Communicative purpose   informational         interactional
written                        academic writing      social letters
spoken                         unscripted speeches   conversation

Based on the discussion above, a number of tentative hypotheses can be formulated. First, it is expected that register exerts a strong influence on NP complexity. Matched with the two situational characteristics mode and communicative purpose, NP complexity is likely to increase a) from interactional to informational texts, and b) from spoken to written texts. For our four registers, this yields the following:

– Academic writing is expected to show the highest frequency of complex noun phrases. This is mainly due to the informational character, the high level of formality and the careful planning and revision during the production process.

– Conversation is a highly interactive face-to-face exchange between two or more parties. Due to these situational characteristics, a higher frequency of pronouns, particularly personal pronouns, is expected. Furthermore, conversation is expected to contain the lowest frequency of complex noun phrases of all four registers, both due to mode and communicative purpose.

– Unscripted speeches are expected to show a higher degree of NP complexity than conversation. This is due to the formal and informational character of unscripted speeches. However, complexity is expected to be lower than in academic writing because of the spoken mode.

– Social letters are expected to contain more complex noun phrases than conversation because they are written and are planned and possibly revised during production. The level of NP complexity, however, is expected to be lower than in academic writing, because the communicative purpose of social letters is to interact.

A second motivation of the present study is to further explore the potential of NP complexity as a marker of regional variation, especially in the light of a register-sensitive comparison (see the discussion in Section 4). Due to the exploratory nature of the study, the results are not tested for statistical significance.

256 Steffen Schaub

2 Methodology

The present section describes the data and the annotation process used in the following analysis. Section 2.1 discusses various categorisation systems used to mark NP complexity and introduces the system used in the analysis to follow. Section 2.2 describes the corpus data and the annotation process.

2.1 Categorising NP complexity

There are a number of methods for categorising NP complexity. The simplest is a binary distinction into ‘simple’ and ‘complex’ noun phrases, although the line is drawn differently by different authors. The most common understanding of this two-way distinction distinguishes between the presence and the absence of modification; in other words, all pre- and/or postmodified noun phrases are ‘complex’,3 while the remaining are ‘simple’. Some authors (de Haan 1993; Biber et al. 1999: 573–655) distinguish four classes of complexity (unmodified, premodified, postmodified, pre- and postmodified), with determination being optional for all four types. A more elaborate system is used in Jucker (1992: 259–260), whose annotation scheme not only specifies the type of head noun and modification(s), but also records the structural depth of the noun phrase, i.e. the degree of embedding in the modification.

The present study makes use of the categorisation system developed in de Haan (1993), which is also used in Biber et al. (1999: 573–655). It distinguishes four classes of NP complexity: class 1 comprises all noun phrases that lack modification, including pronouns, proper nouns, as well as unmodified common nouns. In the analysis to follow, class 1 is further subclassified: personal pronouns have been identified as a word class that is highly sensitive to register, so that a finer distinction of class 1 into personal pronouns on the one hand and other types of NP heads on the other is desirable. Class 2 includes all noun phrases that are premodified only. Class 3 includes all noun phrases that are postmodified only. Finally, class 4 includes all noun phrases that are both pre- and postmodified. As a slight modification to de Haan (1993) and Biber et al. (1999), class 4 here also includes multi-head coordinated constructions, e.g. the men and women. All four classes optionally contain determination. Although four classes are distinguished, the discussions below make occasional reference to the binary simple–complex distinction referred to above. The former is identical with class 1, while the latter comprises classes 2–4. The system is summarised in Table 2.

3 In the present paper, determination is not treated as modification.

Table 2: Categorisation system for NP complexity (based on de Haan 1993); the (+) symbol indicates possible multiple instances

Simple NPs    Class 1   (DET)   –         HEAD      –
Complex NPs   Class 2   (DET)   PREM(+)   HEAD      –
              Class 3   (DET)   –         HEAD      POSTM(+)
              Class 4   (DET)   PREM(+)   HEAD(+)   POSTM(+)
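The four-way scheme in Table 2 can be restated as a small decision procedure. The following Python sketch is only an illustration: the `NounPhrase` record and its field names are hypothetical and not part of the original annotation, which was carried out manually in a spreadsheet. Determiners are deliberately left out of the record, since determination is not treated as modification here.

```python
from dataclasses import dataclass


@dataclass
class NounPhrase:
    """Hypothetical record of one annotated noun phrase (determiners ignored)."""
    heads: int = 1          # number of coordinated heads
    premodifiers: int = 0   # number of premodifying elements
    postmodifiers: int = 0  # number of postmodifying elements


def complexity_class(phrase: NounPhrase) -> int:
    """Return the complexity class (1-4) according to the scheme in Table 2."""
    if phrase.heads > 1:
        # Multi-head coordinated constructions are assigned to class 4 in this study.
        return 4
    if phrase.premodifiers and phrase.postmodifiers:
        return 4
    if phrase.postmodifiers:
        return 3
    if phrase.premodifiers:
        return 2
    return 1


def is_complex(phrase: NounPhrase) -> bool:
    """Binary distinction: class 1 is 'simple', classes 2-4 are 'complex'."""
    return complexity_class(phrase) > 1
```

Under these assumptions, a phrase such as the men and women would be encoded as `NounPhrase(heads=2)` and fall into class 4, while an unmodified pronoun is encoded as `NounPhrase()` and falls into class 1.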

2.2 Corpus and annotation

The analysis to follow in Section 3 is based on a sample of 8,000 noun phrases taken from five components of the International Corpus of English: Canadian English (CAN), Indian English (IND), Jamaican English (JA), Hong Kong English (HK) and Singapore English (SIN). The varieties were selected in order to represent both traditional and ‘new’ Englishes, while at the same time covering different regions of the world. For each variety, texts from four registers were included: academic writing (from the sub-register ‘humanities’), conversation, social letters and unscripted speeches. For each register in each variety, three text units comprising 2,000 words each were selected at random. The resulting sub-corpus is a selection of 60 text units stratified across four registers and five varieties, totalling approximately 120,000 words.

In the following, I will describe the annotation process in more detail. First, the noun phrases are marked in the raw data using a simple bracket-and-label system. Only top-level noun phrases are marked; in other words, noun phrases that are embedded in larger noun phrases are not marked separately. As an illustration of the marking system, consider the sample sentence in (1a) and its marked version in (1b). Note how the embedded NP the line in the larger NP the other end of the line is not marked individually.

(1a) Was a pleasant surprise to hear your voice again from the other end of the line.

(1b) Was [NP a pleasant surprise] to hear [NP your voice] again from [NP the other end of the line].
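The bracket-and-label marking illustrated in (1b) lends itself to automatic extraction. The sketch below is a minimal illustration and assumes that every marked noun phrase opens with the literal label `[NP ` and closes with `]`; the function name `top_level_nps` is hypothetical, and the study itself does not describe the tool used for extraction. Any other bracketed material (for instance an editorial [sic]) is skipped unless it sits inside a marked noun phrase.

```python
def top_level_nps(marked: str) -> list[str]:
    """Collect the text of every top-level [NP ...] bracket in a marked sentence."""
    nps = []
    depth = 0      # current bracket nesting depth
    start = None   # index where the current top-level NP text begins
    for i, ch in enumerate(marked):
        if ch == "[":
            # Only a bracket opened at depth 0 with the NP label starts a new NP.
            if depth == 0 and marked.startswith("[NP ", i):
                start = i + len("[NP ")
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
            if depth == 0 and start is not None:
                nps.append(marked[start:i])
                start = None
    return nps


sentence = ("Was [NP a pleasant surprise] to hear [NP your voice] again "
            "from [NP the other end of the line].")
print(top_level_nps(sentence))
```

Applied to (1b), the function returns three noun phrases; the embedded NP the line is not returned separately, in keeping with the top-level-only marking described above.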

Randomisation was introduced at two steps in the annotation process. First, as stated above, for each register–variety combination, three textual units were picked at random. In these textual units, all noun phrases were marked. Second, a sample of 400 NPs for each register–variety combination was extracted randomly, adding up to a total of 8,000 NPs. In the second annotation step, the extracted noun phrases were annotated in a spreadsheet: the annotation includes the variables complexity, based on the four-way categorisation system outlined in Section 2.1, as well as variety, register and length (in orthographic words).
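The second randomisation step amounts to stratified sampling without replacement. The following sketch illustrates the idea with placeholder data; the function name `sample_per_cell`, the seed and the dummy NP identifiers are assumptions made for the example and do not reproduce the procedure or data actually used.

```python
import random
from itertools import product

VARIETIES = ["CAN", "HK", "IND", "JA", "SIN"]
REGISTERS = ["academic writing", "conversation", "social letters", "unscripted speeches"]


def sample_per_cell(nps_by_cell, k=400, seed=2016):
    """Draw k noun phrases without replacement from every register-variety cell."""
    rng = random.Random(seed)  # fixed seed only so that the sketch is reproducible
    return {cell: rng.sample(nps, k) for cell, nps in nps_by_cell.items()}


# Placeholder data: 500 dummy identifiers per cell stand in for the marked NPs.
marked = {
    (variety, register): [f"{variety}-{register}-NP{i}" for i in range(500)]
    for variety, register in product(VARIETIES, REGISTERS)
}

sample = sample_per_cell(marked)
total = sum(len(nps) for nps in sample.values())
print(len(sample), total)
```

With five varieties and four registers there are 20 cells, so drawing 400 NPs per cell gives the 8,000 NPs reported above. Note that `random.sample` raises a `ValueError` if a cell contains fewer than `k` marked noun phrases.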

3 Results

Table 3 shows the frequencies of the four complexity classes across the four registers for all five varieties combined. In general, simple NPs without modification (class 1) are most frequent overall (5,084 tokens or 64 %). Complex NPs (classes 2 to 4) are considerably less frequent: NPs with premodification (13 %) and postmodification (14 %) are relatively equally frequent, while NPs with both pre- and postmodification are the least frequent class (9 %).

Table 3: NP complexity across registers (class 1 = unmodified NPs incl. pronouns; class 2 = premodified NPs; class 3 = postmodified NPs; class 4 = pre- and postmodified NPs and coordinated multi-head NPs)

          Conversation      Unscripted speeches   Social letters    Academic writing   Total
Class 1   1,559 (77.95 %)   1,291 (64.55 %)       1,402 (70.10 %)   832 (41.60 %)      5,084 (63.55 %)
Class 2   209 (10.45 %)     249 (12.45 %)         249 (12.45 %)     334 (16.70 %)      1,041 (13.01 %)
Class 3   159 (7.95 %)      300 (15.00 %)         209 (10.45 %)     466 (23.30 %)      1,134 (14.18 %)
Class 4   73 (3.65 %)       160 (8.00 %)          140 (7.00 %)      368 (18.40 %)      741 (9.26 %)
Total     2,000 (100 %)     2,000 (100 %)         2,000 (100 %)     2,000 (100 %)      8,000 (100 %)

The frequencies of the four classes vary with regard to register: simple NPs (class 1) are frequent in conversation (78 %), unscripted speeches (65 %) and social letters (70 %), but relatively infrequent in academic writing (42 %). Analogously, complex NPs (classes 2–4) are relatively infrequent in conversation (22 %) and highly frequent in academic writing (58 %). Taking into consideration the two situational characteristics of the registers as defined in the introduction (mode and communicative purpose), NP complexity increases from spoken to written mode: social letters have a higher mean NP complexity4 than conversation (1.54 compared to 1.37), and academic writing has a higher mean NP complexity than unscripted speeches (2.18 compared to 1.66). In addition, NP complexity increases from interactional to informational communicative purpose: unscripted speeches have a higher mean NP complexity than conversation (1.66 compared to 1.37), while academic writing has a higher mean NP complexity than social letters (2.18 compared to 1.54).
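The mean NP complexity values can be recomputed from the class frequencies in Table 3, treating the class number as the complexity value of each noun phrase. The sketch below is illustrative only; the names `TABLE_3`, `mean_complexity` and `complex_share` are introduced for the example.

```python
# Class frequencies per register, copied from Table 3.
TABLE_3 = {
    "conversation":        {1: 1559, 2: 209, 3: 159, 4: 73},
    "unscripted speeches": {1: 1291, 2: 249, 3: 300, 4: 160},
    "social letters":      {1: 1402, 2: 249, 3: 209, 4: 140},
    "academic writing":    {1: 832, 2: 334, 3: 466, 4: 368},
}


def mean_complexity(counts):
    """Sum of the complexity values of all NPs divided by the number of NPs."""
    total = sum(counts.values())
    weighted = sum(cls * n for cls, n in counts.items())
    return weighted / total


def complex_share(counts):
    """Proportion of NPs in classes 2-4 (the 'complex' NPs)."""
    total = sum(counts.values())
    return (total - counts[1]) / total


means = {register: mean_complexity(counts) for register, counts in TABLE_3.items()}
shares = {register: complex_share(counts) for register, counts in TABLE_3.items()}
```

For conversation, for example, the weighted sum is 1,559 + 418 + 477 + 292 = 2,746, which divided by 2,000 gives 1.373. The four computed means (1.373, 1.6645, 1.5435 and 2.185) agree with the values reported in the text up to rounding in the second decimal place.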

Table 4: NP complexity across varieties

          CAN     HK      IND     JA      SIN     Total
Class 1   1,034   989     1,001   997     1,063   5,084
Class 2   193     251     216     173     208     1,041
Class 3   210     219     227     267     211     1,134
Class 4   163     141     156     163     118     741
Total     1,600   1,600   1,600   1,600   1,600   8,000

Table 4 shows the distribution of complexity classes across the varieties for all registers combined. Class 1 is the most frequent and class 4 is the least frequent in all varieties (with a relatively low value in Singapore English). Looking at classes 2 and 3, the frequencies are differently balanced across varieties: while most varieties have a higher frequency of class 3, Hong Kong English shows a tendency towards class 2. Furthermore, the frequencies of classes 2 and 3 are relatively balanced in some varieties (Indian English, Canadian English, Singapore English), while in others there is greater divergence (Jamaican English, Hong Kong English).

Both Table 3 and Table 4 provide a general overview of NP complexity distributions across register and variety. They allow the formulation of first tentative conclusions, such as variety-specific tendencies towards particular classes (e.g. pre- or postmodified NPs). As a second step, it is necessary to look at the distribution of NP classes across both varieties and registers simultaneously.

4 Mean NP complexity is defined here as a numeric value ranging from 1.0 to 4.0. It is the sum of complexity values of n noun phrases divided by n. The higher the mean value, the more frequently we find ‘complex’ noun phrases, i.e. classes 2–4.


Table 5: NP complexity in conversation across all varieties

          CAN   HK    IND   JA    SIN
Class 1   314   298   303   312   332
Class 2   36    59    43    27    44
Class 3   33    31    34    47    14
Class 4   17    12    20    14    10

Table 6: NP complexity in unscripted speeches across all varieties

CAN HK IND JA SIN


Table 7: NP complexity in social letters across all varieties

CAN HK IND JA SIN


Table 8: NP complexity in academic writing across all varieties

CAN HK IND JA SIN


Tables 5 to 8 show the distribution of the complexity classes for each individual register across all varieties. In the following sections, the registers are discussed separately.


3.1 Academic writing

Academic writing yields the highest frequency of complex NPs across all classes (2–4). This is expected, as academic writing is characterised by dense information packaging (due to its informational communicative purpose) and carefully planned and revised production, both of which facilitate the use of complex NPs. In academic writing, NPs contain elaborate pre- and postmodification, and they typically contain the majority of lexical content of a sentence. Examples (2) to (6) illustrate typical uses of noun phrases in academic writing (NPs are emphasised in bold).

(2) The left side of Ayearst’s diptych reproduces in painstaking detail, and with close attention to seventeenth-century techniques of glazing, Rembrandt’s fragmentary Anatomy Lesson of Dr Joan Deijman of 1656, now in the Rijksmuseum, Amsterdam. (ICE-CAN:W2A-001#10:1)

(3) The integration of these two perspectives can form a more comprehensive picture of the person of Jesus Christ. (ICE-HK:W2A-005#14:1)

(4) The whole misunderstanding about Hume’s philosophical position is the outcome of his treatment of causation that is often misunderstood. (ICE-IND:W2A-001#58:1)

(5) The casual centrality of the ‘supernatural’ in Brodber’s fiction is also an excellent example of the writer’s adaptation of marginalised thematic concepts from the oral tradition which she legitimises in the very process of ‘writing them up’. (ICE-JA:W2A-005#X14:1)

(6) Though Wittgenstein was mainly concerned with the problem of philosophical explanation, his writings on the relation between language and thought and language and meaning have tremendous implications for both the theory and practice of linguistic science. (ICE-SIN:W2A-005#48:1)

Analogously, academic writing has the lowest frequency of class-1 (or ‘simple’) NPs in our sample (832 tokens or 41.6 %). The relatively low frequency of unmodified noun phrases can likewise be accounted for by the informational character of the register: unmodified noun phrases carry less information than modified ones. Personal pronouns are particularly uncommon: only 225 tokens (11 % of all NPs in academic writing) are realised by personal pronouns, the most frequent being it (61 tokens) and I (33 tokens). 1st and 2nd person pronouns are rare, which can be attributed to the fact that interaction in academic texts is uncommon. The 2nd person pronoun you is particularly rare, since no specific addressee is involved.

With regard to regional variation, I find that academic writing is largely homogeneous across varieties. Few differences appear to exist with regard to pronouns, although two exceptions are worth a brief discussion here. The first person singular pronoun I occurs more frequently in some varieties (Hong Kong English: 15; Jamaican English: 10) than in others (Canadian English: 2; Indian English: 4; Singapore English: 2). However, it would be premature to attribute a more personal writing style to the Hong Kong and Jamaican English varieties based on such low absolute frequencies. Secondly, looking at the frequencies of you, it is noteworthy that the sample contains six occurrences in Singapore English, while the remaining varieties have zero occurrences. A closer look at the data reveals that all occurrences of you in Singapore English originate from one text unit, which is not an academic text in the traditional sense, but instead could best be described as a guide to real estate investment in Singapore. This text unit is characterised by a much more interactive style of writing; it frequently addresses the reader directly and makes use of imperatives, e.g. Take advantage of this law (ICE-SIN:W2A-001#48:1), or Invest your CPF savings in property (ICE-SIN:W2A-001#49:1). Whether such a text constitutes an instance of academic writing, much less in the humanities, is debatable. Nevertheless, the text could be clearly distinguished from other texts of the same register on the basis of one grammatical feature.

There are slight indications of regional variation in the distribution of the complex NP classes, for instance the relative overuse of class 2 and underuse of class 3 in Indian English. Overall, however, there appears to be little variation in academic writing across varieties. This can be interpreted in two ways: one, there is no discernible difference between regional varieties for this register. An argument in favour of this interpretation would be that the homogeneity of the register, and by extension its conformity on an international level, is guaranteed by the publication process. A second interpretation is that the level of abstraction in categorising NP complexity, as it is used in this analysis, is too superficial to bring to light any discernible differences; in other words, although there may be no differences across regional varieties on the superficial level of abstraction assumed here, significant distributional differences might be observed when, for instance, specifying the types of modification involved. At this point, however, we have to conclude that we cannot find regional variation with regard to NP complexity in academic writing.

3.2 Conversation

Conversation has the highest frequency of simple noun phrases of all registers in the study (78 %). This is in line with Biber et al., who find that ca. 85 % of all NPs in their conversation data have no modifier (Biber et al. 1999: 578). Of the class-1 NPs in conversation, more than half are personal pronouns (857 tokens, or 55 %). This also confirms Biber et al.’s finding that “pronouns are slightly more common than nouns in conversation” (Biber et al. 1999: 235). The relatively frequent reliance on pronouns is due to the “shared situation and personal involvement of the participants” (Biber et al. 1999: 235).

Class-2 NPs are the most common type of modified noun phrase in conversation. They account for 10 % of the NPs. With regard to premodification, Biber et al. find that the vast majority of premodification sequences in noun phrases does not exceed two words (Biber et al. 1999: 597). This is confirmed in the present analysis: the average length (in orthographic words) of class-2 NPs in conversation is 3.2 (including head and any determiners). This means that premodification amounts to 1–2 words on average. The most common type of premodification is by adjective or noun, optionally including a determiner, as the examples below illustrate.

(7) Uhm because David does say that hiking boots make an enormous difference not slide on anything (ICE-CAN:S1A-001#3:1:A)

(8) Sometimes uhm the people uh sorry people of India they are they belong to different communities and they have their separate cultures (ICE-IND:S1A-005#62:1:B)

Longer class-2 NPs (>3 words) are uncommon and usually the result of correction or coordination, as can be seen in examples (9) and (10) below. Proper cases of multiple premodification, as in examples (11) and (12), are rare. This is because the real-time analysis of longer premodification sequences places a heavy cognitive burden on the listener, rendering spoken communication ineffective.5

(9) I know because I I can’t talk to an answering machine telephone answering machine <unc> three-words </unc> (ICE-HK:S1A-009#4:1:D)

(10) nine hundred but on average about four hundred five hundred dollars both lah the reception and the sanctuary (ICE-SIN:S1A-001#33:1:A)

(11) A very bright cheerful smiling face (ICE-IND:S1A-001#108:1:A)

(12) We are entirely functional loving human beings (ICE-CAN:S1A-009#54:1:B)

Postmodified NPs (class 3) are relatively uncommon in conversation (8 %). Postmodification tends to be slightly longer than premodification. The mean word length of the former is 7.1 (as compared to 3.2). This value is relativised to some extent when looking at the median, which is 5. Subtracting head and optional determiner, this means that the length of postmodification averages between 3 and 4 words. The slightly higher mean value (7.1) is caused by rare instances of complex postmodification, as in examples (13) and (14).

5 See Quirk et al. (1985: 1039): “Considerable left-branching is possible in the noun phrase, […] although comprehension becomes more difficult as the complexity of left-branching increases”.

(13) Uhh I remember my friend Mendela that beautiful millionaire meatpacker from Saskatoon who was so nice to me when I was a young man […] (ICE-CAN:S1A-009#85:1:A)

(14) Naturally if Mitterand President Mitterand [sic] can run his government for a period of ten years uh why India cannot have a government consisting of some <uh> party <uh> national party national party representing the national capital or some progressive elements <uh> <uh> in some some political parties like Congress-I Congress-S or even Janata Dal with some <uh> radical members belonging to <uh> communist party or socialist party (ICE-IND:S1A-005#19:1:A)

Finally, class-4 NPs are extremely rare in conversation, accounting for only 4 % of all noun phrases in the data. The most frequent type is a combination of a one-word (nominal or adjectival) premodification plus postmodification by a short prepositional phrase (usually with of), as the following examples illustrate:

(15) But what is after the road No the other side of the road (ICE-SIN:S1A-001#88:1:B)

(16) I said I behave as if this might be the last day of my life […] (ICE-CAN:S1A-009#88:1:A)

(17) […] and you would have seen a different spin to the thing (ICE-JA:S1A-009#X67:1:A)

Orthographically longer class-4 NPs are often the result of multiple coordination or performance phenomena, including repetitions, repairs and hesitations. Example (18) is a coordinated list of postmodified NPs, which contains several repairs and repetitions as well as a hesitation marker (uh).

(18) Political exchange <uh> tourist exchange tourist exchange or scholars exchange of scholars or exchange of technocrats (ICE-IND:S1A-005#37:1:A)6

Comparing the frequencies of the conversation data across varieties, we observe distributional differences, which are mainly the result of individual varieties over- or underusing certain complexity classes. We can pinpoint a) a relative overuse of class-2 NPs in Hong Kong English, b) an underuse of class 2 in Jamaican English, c) an overuse of class 3 in Jamaican English, and d) an underuse of class 3 in Singapore English. Looking at the data, however, it is difficult to identify a pattern which explains the over- or underuse (see discussion in Section 4).
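One way to make ‘relative over- or underuse’ concrete is to compare each observed frequency with the frequency expected if a class were evenly distributed across the five varieties. The chapter does not specify the exact measure, so the ratio used in the sketch below is an assumption; the names `TABLE_5` and `relative_use` are introduced for the example, and no claim about statistical significance is implied.

```python
# Class 2 and class 3 frequencies in conversation, copied from Table 5.
TABLE_5 = {
    2: {"CAN": 36, "HK": 59, "IND": 43, "JA": 27, "SIN": 44},
    3: {"CAN": 33, "HK": 31, "IND": 34, "JA": 47, "SIN": 14},
}


def relative_use(counts):
    """Observed frequency divided by the frequency expected under an even distribution."""
    expected = sum(counts.values()) / len(counts)
    return {variety: n / expected for variety, n in counts.items()}


for cls, counts in TABLE_5.items():
    ratios = relative_use(counts)
    most = max(ratios, key=ratios.get)
    least = min(ratios, key=ratios.get)
    print(f"class {cls}: highest ratio in {most}, lowest ratio in {least}")
```

For class 2, the expected frequency is 209 / 5 = 41.8, so Hong Kong English (59) lies above and Jamaican English (27) below it; for class 3, the expected frequency is 159 / 5 = 31.8, with Jamaican English (47) above and Singapore English (14) below. These are the four tendencies listed in the paragraph above.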

6 The example in (18) is assigned the complexity value 4, as it is a coordinated (multi-head) construction (see Section 2.1). A ‘cleaned-up’ version of the noun phrase could be political exchange, tourist exchange or exchange of scholars or exchange of technocrats.


3.3 Unscripted speeches

Unscripted speeches are characterised by their spoken mode, a spontaneous, conversation-like production situation and the informational and/or persuasive communicative purpose. With regard to NP complexity, unscripted speeches rank between conversation and academic writing. While NP complexity is expected to be high due to the register’s informational communicative purpose, it is expected to be low because it is unscripted and spoken. The result is an intermediate level of NP complexity with slightly higher frequencies in the three complex noun phrase types, as compared to conversation.

Unscripted speeches have the third-highest frequency of class-1 NPs in the sample (1,291 tokens or 65 %). Personal pronouns constitute about half of the class-1 NPs (683 tokens or 53 %). The most frequent personal pronouns are I (162), you (126) and it (109). The reliance on personal pronouns can be related to the setting, since speeches usually take place in public in front of an audience and speakers use personal pronouns to create an impression of interaction between themselves and the audience. Furthermore, speeches frequently have the purpose of persuading the audience, which is facilitated by direct references, such as I and you. Examples (19) and (20) illustrate the kind of direct addressing typically found in speeches.

(19) Okay don’t think that they’re going to give you time okay after your job interview Don’t think they’re going to take care of you in a very big way okay (ICE-CAN:S2A-021#29–30:1:A)

(20) You have to vote more opposition strong opposition not only to establish opposition in parliament Make opposition part of our political culture not only that but also an effective an effective hammer over the head of PAP If you don’t do that what will happen You can bet your last dollar after this election prices will sure to go up (ICE-SIN:S2A-021#34–37:1:A)

With regard to complex noun phrases, unscripted speeches have the second-highest overall frequency in the sample (35 %). This is due to the informational communicative purpose of speeches, which necessitates the use of modified noun phrases to convey information. The overall level of NP complexity is higher in speeches than in social letters, despite the latter being written. In direct comparison, unscripted speeches and social letters make equally frequent use of premodification, while in classes 3 and 4, unscripted speeches surpass social letters. As in conversation, the tendency for a stronger reliance on postmodification instead of premodification in unscripted speeches can be explained on the basis of easier comprehensibility of right-branching (see Quirk et al. 1985: 1039).


Comparing the results across varieties, the following observations are noteworthy: assuming an even distribution, the frequency of premodified NPs (class 2) is relatively low in Canadian English (33 tokens) and high in Hong Kong English (78 tokens). Furthermore, postmodified NPs (class 3) are relatively frequent in Jamaican English (82 tokens), but infrequent in Singapore English (44).

3.4 Social letters

Class-1 NPs are by far the most frequent noun phrase class in social letters, constituting between 65 % and 75 % of all NPs in each 400-NP variety sample. Personal pronouns form the majority of class-1 NPs (ranging from 52 % to 61 % across varieties). This can be attributed to the interactional character of social letters, which mainly rests on the frequent use of I and you.

The frequencies of class-2 and class-3 NPs are relatively balanced, with a slight preference for class 2. Class 4 is the least frequent noun phrase type in this register across all varieties, with the exception of Canadian English. Constructions in this category show a range of variation. A typical kind of class-4 construction is the multi-head NP coordinated with and or or. Class-4 NPs which are not coordinated are often nouns premodified by one adjective or noun and postmodified by a prepositional phrase, as in examples (21) to (23). Complex noun phrases in social letters are very similar to those found in conversation and form a contrast to the lexically heavy class-4 NPs found in academic writing.

(21) I hope that I will be able to come to Kolhapur in the first week of Jan. (ICE-IND:W1B-002#47:1)

(22) My point is that if one can love the other person without calculate what one can get back from the relationship, this will be the greatest love of all. (ICE-HK:W1B-001#144:5)

(23) The team is still waiting for a final reply from the administration of this university but I’m not optimistic. (ICE-SIN:W1B-001#148:2)

More complex examples are rare in social letters. Long, heavily modified noun phrases clearly originate from letters with an academic background, as example (24) illustrates.

(24) I would need a formal invitation from you for collaboration with specific reference to the project & [sic] that it would not involve financial liabilities for the University. (ICE-IND:W1B-005#7:1)

In general, the register category of ‘social letters’ in ICE contains heterogeneous content, with some letters discussing everyday activities (e.g. basketball practice, reports from an exchange year) and others clearly coming from an academic context (correspondence between students and professors). NP complexity is higher in the latter. It remains debatable whether one text category should include both subtypes.

Comparing NP complexity across varieties, there is relative underuse of premodified NPs in Jamaican English, overuse of postmodified NPs in Singapore English, and overuse of pre- and postmodified NPs in Canadian English and Indian English.

4 NP complexity across varieties

In this section, I review the potential of NP complexity as a marker of variation across regional varieties of English. As discussed in the introduction, the field is currently shifting away from studies of regional nation-state varieties as holistic entities and towards acknowledging register variation. Any comparative study of varieties of English, it is argued, must take register into account. The inclusion of register leads to a more discriminating picture of structural preferences in regional varieties of English. Such preferences may occur in
a) one or more varieties and one specific register,
b) one or more varieties and several registers (with shared situational characteristics), and
c) one or more varieties as a whole (i.e. in all registers).

The preceding sections already isolated the first type, namely variety-plus-register-specific preferences, such as the relative overuse of premodified NPs in Hong Kong conversational data. Regarding the second type (registers that share one situational characteristic), we can identify a number of variety-specific tendencies. Again assuming even distribution within NP classes across varieties, the following tendencies can be observed:
– relative overuse of premodified (only) NPs in spoken Hong Kong English
– relative overuse of postmodified (only) NPs in spoken Jamaican English
– relative underuse of postmodified (only) NPs in spoken Singapore English
– relative underuse of premodified (only) NPs in interactional Jamaican English.

These preferences can be matched with descriptions of varieties of English: for instance, the relative overuse of premodification in spoken Hong Kong English is in line with the description in Setter, Wong and Chan (2010: 61). Although this approach enables us to isolate structural preferences of NP complexity for particular varieties and register situations, these tendencies do not take into account other factors influencing NP complexity, such as syntactic function, and have to be interpreted with caution.

The explanation most commonly offered for the emergence of structural innovations in varieties of English (in particular, postcolonial or New Englishes) is language contact (also called ‘transfer’ or ‘cross-linguistic influence’). Gut (2011: 105) points out that “as yet there exists no reliable method of quantifying the relative contribution of cross-linguistic influence on any structure produced by language learners”. This is especially true for NP modification patterns, which are strongly influenced by other factors, such as register and syntactic function. In addition to that, NP modification patterns can only be identified in the form of (statistical) preferences and are thus not directly identifiable as the result of contact-induced change (unlike, for instance, loanwords). The approach in the present study is suitable for detecting candidates for such structural tendencies. However, more factors need to be included and weighed against each other in order to confirm these preferences (see Schilk and Schaub forthc.).

5 Conclusion and outlook

This study systematically compared NP complexity in a selection of registers and across a range of regional varieties of English. The results, based on data from five varieties of English, corroborate the strong connection between NP complexity and register. Across all regional varieties, NP complexity correlates with two situational register characteristics:
– communicative purpose: NP complexity increases from interactional to informational registers, and
– mode: NP complexity increases from real-time and spoken to planned and written registers.

Overall, NP complexity is largely homogeneous within registers across the regional varieties. Consistency is higher in registers which have stricter codification (e.g. academic writing). Nevertheless, assuming even distribution across all varieties, it is possible to isolate individual varieties which show relative over- or underuse of particular NP structures. Furthermore, it is possible to match such preferences for pairs of registers that share situational characteristics. NP complexity has already been established as a register marker, and, it is argued here, is a viable marker of regional variation on the register level.


There are numerous ways in which subsequent research can improve on the study presented here. First, the database of noun phrases has to be extended to provide a more solid empirical foundation. Second, by adding further annotation to the data, such as syntactic function, type of head noun, and type of modification, more fine-grained statements about differences in NP complexity across varieties are possible. This study has also shown that random selection of text units from the International Corpus of English for the purposes of a register analysis is not desirable. The texts included in some of the register categories in ICE are too heterogeneous. Instead, text units have to be carefully selected in order to ensure compatibility across varieties. Finally, any variety-specific structural preferences have to be matched against the typological inventory found in the substrate languages. Only then is it possible to draw any connections to the possible origin of such preferences, and to substantiate claims about structural transfer.

6 References

Aarts, Flor G. A. M. 1971. On the distribution of noun-phrase types in English clause-structure. Lingua 26. 281–293.

Ahulu, Samuel. 1998. Grammatical variation in international English. English Today: The International Review of the English Language 14(4). 19–25.

Asante, Mabel Yeboah. 1995. Ghanaian English: Motivation for divergence from the standard in certain grammatical categories. Tübingen: Eberhard Karls University Tübingen dissertation.

Asante, Mabel Yeboah. 2012. Variation in subject-verb concord in Ghanaian English. World Englishes 31(2). 208–225.

Balasubramanian, Chandrika. 2009. Register variation in Indian English. Amsterdam: Benjamins.

Biber, Douglas. 1988. Variation across speech and writing. Cambridge: CUP.

Biber, Douglas, Stig Johansson, Geoffrey N. Leech, Susan Conrad & Edward Finegan. 1999. Longman grammar of spoken and written English. 9th impr. (2011). Harlow: Longman.

Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: CUP.

Blair, David & Peter Collins. 2001. English in Australia. Amsterdam: John Benjamins.

Brunner, Thomas. 2014. Structural nativization, typology and complexity: Noun phrase structures in British, Kenyan and Singaporean English. English Language and Linguistics 18. 23–48.

Fludernik, Monika & Bernd Kortmann (eds.). 2012. Proceedings: Anglistentag 2011 Freiburg. Trier: Wissenschaftlicher Verlag Trier.

Gut, Ulrike. 2011. Studying structural innovations in new English varieties. In Joybrato Mukherjee & Marianne Hundt (eds.), Exploring second-language varieties of English and learner Englishes: Bridging a paradigm gap (Studies in corpus linguistics 44), 101–124. Amsterdam: John Benjamins.

270 Steffen Schaub

Haan, Pieter de. 1993. Noun phrase structure as an indication of text variety. In Andreas H. Jucker (ed.), The noun phrase in English: Its structure and variability, 85–106. Heidelberg: Winter.

Hall, Christopher J., Daniel Schmidtke & Jamie Vickers. 2013. Countability in World Englishes. World Englishes 32(1). 1–22.

Halliday, Michael A. K. 1989. Spoken and written language. 2nd edn. Oxford: OUP.

Jucker, Andreas H. 1992. Social stylistics: Syntactic variation in British newspapers (Topics in English Linguistics 6). Berlin: Mouton de Gruyter.

Jucker, Andreas H. (ed.). 1993. The noun phrase in English: Its structure and variability. Heidelberg: Winter.

Kortmann, Bernd & Kerstin Lunkenheimer (eds.). 2013. The electronic world atlas of varieties of English. Leipzig: Max Planck Institute for Evolutionary Anthropology. http://ewave-atlas.org (accessed 28 February 2015).

Lamidi, Mufutau T. 2007. The noun phrase structure in Nigerian English. Studia Anglica Posnaniensia: An International Review of English Studies 43. 237–250.

Mukherjee, Joybrato & Marianne Hundt (eds.). 2011. Exploring second-language varieties of English and learner Englishes: Bridging a paradigm gap (Studies in corpus linguistics 44). Amsterdam: John Benjamins.

Neumann, Stella. 2012. Applying register analysis to varieties of English. In Monika Fludernik & Bernd Kortmann (eds.), Proceedings: Anglistentag 2011 Freiburg, 75–94. Trier: Wissenschaftlicher Verlag Trier.

Platt, John, Heidi Weber & Mian Lian Ho. 1984. The new Englishes. London: Routledge and Kegan Paul.

Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech & Jan Svartvik. 1985. A comprehensive grammar of the English language. 4th edn. London: Longman.

Sand, Andrea. 2004. Shared morpho-syntactic features in contact varieties of English: Article use. World Englishes 23(2). 281–298.

Sand, Andrea. forthc. Angloversals? Shared morpho-syntactic features in contact varieties of English. Amsterdam: Benjamins.

Schäpers, Uta Katharina Elisabeth. 2009. Nominal versus clausal complexity in spoken and written English: Theory and description (English Corpus Linguistics 8). Frankfurt: Peter Lang.

Schilk, Marco & Steffen Schaub. forthc. Noun phrase complexity across varieties of English: Focus on syntactic function and text type. English World-Wide 37(1).

Setter, Jane, Cathy Wong & Brian Chan. 2010. Hong Kong English. Edinburgh: Edinburgh UP.

Wahid, Ridwan. 2013. Definite article usage across varieties of English. World Englishes 32(1). 23–41.

Xiao, Richard. 2009. Multidimensional analysis and the study of world Englishes. World Englishes 28(4). 421–450.

Valentin Werner

Real-time online text commentaries: A cross-cultural perspective

Abstract: In the area of electronically-mediated communication, real-time online text commentaries (OTCs) as a new specialised register have become popular as an alternative to traditional broadcasting. OTCs have been recognised as “mediated quasi-interaction” (Chovanec 2010) and a hybrid genre showing characteristics of spoken discourse within a written mode (Jucker 2006), as well as a characteristic combination of simultaneous information and entertainment (“infotainment”), where familiarity or “pseudo-intimacy” (O’Keeffe 2006; cf. Chovanec 2008) between commentator and the audience is created. This contribution helps to situate this emerging register from a cross-cultural perspective. I use OTCs by English and German media outlets from the EURO 2012 football championship to tackle the following issues with the help of a corpus-linguistic approach: (i) What are register-specific structural features of OTCs? (ii) Are there any culture-specific aspects along language boundaries or the dimension “intended readership”? I also consider the interaction of layout and content, production circumstances, and the influence of recent developments (such as the incorporation of Twitter messages) on reporting styles.

Valentin Werner, University of Bamberg

0 Introduction

Real-time online text commentaries (henceforth OTCs)¹ have become more and more popular² and represent an alternative to traditional live TV and radio broadcasting, reporting and commenting on live events controlled for duration, location and topic (cf. Siever 2011: 171), in particular major sports events. As the name implies, they are usually categorised as a written form of web communication (cf. Biber and Egbert, this volume) and are similar to (we)blogs in that they consist of individual consecutive postings (cf. Grieve et al. 2010: 303).

1 Alternative labels are live text commentary (LTC), live blogging, live ticker, news ticker and the more sport-specific minute-by-minute report (MBM) or live match tracker.

2 According to a recent survey, OTCs have become “the default format for covering major breaking news stories, sports events, and scheduled entertainment news”, even surpassing online articles and picture galleries in popularity (Thurman and Walters 2013: 82; cf. also Wells 2011). The growing importance of the format is revealed both by the sheer number of OTCs (almost 150 per month for The Guardian) and also in terms of page view counts, which are at least twice as high for OTCs compared to articles and galleries. User reports seem to confirm that OTCs are the online format par excellence to track stretches of live events, as more than 35 % of respondents follow OTCs continuously. Nearly two fifths of all OTCs are sports-related (Thurman and Walters 2013: 82–95). Data for Der Spiegel are in line with these general findings as OTC football reportage receives more than 1 million clicks per match (see <www.spiegelgruppe.de/spiegelgruppe/home.nsf/0/CEF3A44164AED9BBC1256F720034CBAC>, accessed 20 April 2013).

While previous research has recognised the narrative properties and analysed the vocabulary and morphosyntax of football reportage in general (Brandt and Quentin 1983; Ghadessy 1988; Hennig 2000; Krone 2005; Müller 2007; Levin 2008), other studies have noted that OTCs as “mediated quasi-interaction” (Fairclough 1995: 40) constitute a hybrid register: They show characteristics of spoken discourse within a written mode (Jucker 2006; cf. Lakeberg, this volume) and are an interesting combination of simultaneous information and entertainment (“infotainment”). Thus, familiarity or “pseudo-intimacy” (O’Keeffe 2006: 92; cf. Chovanec 2008, 2010; Jucker 2010) between the commentator and the audience is created.

Two further issues are important for establishing OTCs as a register, that is, (following Biber 1988) a language variety defined by situational (i.e. non-linguistic) characteristics (see also Schubert, this volume). First, “situational context tends to exert functional pressures on linguistic output” (Grieve et al. 2010: 315), which implies there should be common linguistic features traceable across different OTCs, particularly if they report on the same matches. Second, there is the contrastive view. It was hypothesised, albeit for other types of football reportage, that “[t]ypological differences between […] two languages are expected to be neutralised to a certain degree” (Krone 2005: 51) when texts from two languages fulfil the same function (in our case, football match reportage). Others (e.g. Müller 2007: 44), however, have emphasised that cultural differences may lead to noticeable stylistic differences.

Starting from these observations, this paper will address the following aspects with the help of a corpus-linguistic approach:
(i) What are register-specific structural features of OTCs on different levels of linguistic analysis?
(ii) Are there any culture-specific aspects along language boundaries or the dimension “intended readership”; or do OTCs rather form a relatively uniform cross-linguistic/cross-cultural register?

Further aspects addressed are the interaction of the layout of OTCs with their content as well as the influence of very recent developments (such as the incorporation of Twitter messages) on the style of reporting.

After a few notes on data and methodology, the present study first sets out to locate OTCs as a register in general terms in Section 2. Section 3 provides an analysis of the language of OTCs, focusing on vocabulary and collocations and related semantic aspects, discourse features and potential implications of the interaction of format and textual commentary. A discussion of OTCs as a cross-cultural register follows in Section 4, while Section 5 sums up the results and presents some generalisations as well as avenues of further research.

1 Data and methodology

While previous research on OTCs has been dedicated almost exclusively to football reportage from The Guardian (Chovanec 2008, 2009, 2010, 2011; Perez-Sabater et al. 2008; but cf. Jucker 2006, 2010), the present analysis is based on OTCs of two English (The Sun, henceforth SUN; The Guardian, henceforth GUAR) and two German (Bild, henceforth BILD; Der Spiegel (online), henceforth SPON) media outlets, all stemming from the coverage of the UEFA 2012 EURO Championship Finals. This facilitates the comparison of the reportage along language boundaries as well as along the dimension of intended readership.

The print versions of both SUN and BILD can be categorised as tabloids, predominantly aimed at a working-class readership (see also Höke 2007). Another common feature is their circulation, with approximately 2.4 million (SUN) and 2.5 million (BILD) copies sold on a daily basis, making them the most popular papers in England and Germany respectively. In contrast, GUAR and SPON can be viewed as quality-press products, primarily catering to a middle-class readership. Their circulation, with 0.2 million daily (GUAR) and 1.0 million weekly (SPON), is less extensive. The online versions of all four sources are amongst the news sites visited most often nationwide (Press Gazette 2013; see also <daten.ivw.eu/index.php>). For the present analysis it is presumed that the intended readership of the online version roughly corresponds to the intended readership of the printed version (cf. Newsworks 2013a, 2013b).


To have a comparable dataset, the corpus includes a total of 36 match reports for the English and the German squads (amounting to a token count of 120,414 words; see the appendix for a detailed list). In the first instance, the main focus of the structural analysis lies on the English OTCs, which are organised in a linear way, usually in reverse order and time-stamped (see further Section 2).
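The match reports are cited in this chapter by identifiers such as swe_eng_1506_sun. A minimal Python sketch of how such identifiers could be split into their parts is given here. The reading of the scheme (two team codes, a four-digit day-and-month field, and an outlet code) is an assumption inferred from the identifiers themselves, and the names `ReportId` and `parse_report_id` are hypothetical.

```python
from typing import NamedTuple


class ReportId(NamedTuple):
    """Assumed components of a match report identifier."""
    team_a: str
    team_b: str
    day: int
    month: int
    outlet: str


def parse_report_id(report_id: str) -> ReportId:
    """Split an identifier such as 'swe_eng_1506_sun' into its assumed parts."""
    team_a, team_b, date, outlet = report_id.split("_")
    # The date field is assumed to be DDMM (all matches took place in 2012).
    if len(date) != 4 or not date.isdigit():
        raise ValueError(f"unexpected date field in {report_id!r}")
    return ReportId(team_a, team_b, int(date[:2]), int(date[2:]), outlet.upper())
```

Under these assumptions, grouping reports by `outlet` or by team code would allow the comparisons along language boundaries and intended readership described above.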

For the extraction of the data, the running text chunks with the corresponding time stamps were manually copied and saved as plain text files in order to exclude unwanted meta-data and to make them machine-readable. Subsequently, these text files were loaded into Wmatrix (Rayson 2008; see <ucrel.lancs.ac.uk/wmatrix/>). This online annotation tool provides automatic part-of-speech tagging with the CLAWS 7 tagset (<ucrel.lancs.ac.uk/claws/>) as well as semantic annotation with USAS (<ucrel.lancs.ac.uk/usas/>). In addition, it offers various concordancing, wordlist and keyword functions. For the analysis of n-grams, for keyword analyses and for concordance searches, AntConc 3.3.5w (<www.antlab.sci.waseda.ac.jp/antconc_index.html>) was used both for the English and the German data.
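The n-gram step can be illustrated with a minimal Python sketch. It does not reproduce the AntConc procedure used in the study: the simple regular-expression tokeniser, the function names `tokenize` and `ngram_counts`, and the two sample posts are assumptions made purely for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens (a simplifying assumption)."""
    return re.findall(r"[a-zäöüß']+", text.lower())


def ngram_counts(texts, n=2):
    """Count n-grams within each post; n-grams never span two posts."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts


# Invented sample posts, not taken from the corpus
posts = [
    "Corner for England. The corner is cleared.",
    "Another corner for England, and another chance.",
]
bigrams = ngram_counts(posts, n=2)
```

Calling `bigrams.most_common(10)` would then list the most frequent two-word sequences, which is roughly the kind of output an n-gram tool provides for a plain text file.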

In addition, the webpages containing the OTCs were saved in order to access paratextual features such as Twitter feeds, tables, graphs or integrated videos and to assess their potential influence on the main text. The rationale behind including this data is the growing trend in linguistics that “[e]ver more phenomena that would previously have been termed paralinguistic, in the sense of accompanying but only weakly influencing linguistic form and expression, are now being moved into the center of concern” (Bateman 2012: 3990). Therefore, the present corpus can be seen as multimodal.

2 OTCs as a register

2.1 Electronically-mediated communication and sports reportage

Broadly speaking, the little work available to date has described the style of football reportage as resembling conversation (cf. Ferguson 1983: 156–157), although some studies have highlighted its monologic quality, emphasising its narrative properties where the commentator acts as mediator and filter (Brandt and Quentin 1983: 21; Hennig 2000: 44). A comparison of OTCs and traditional types of live reportage, summarising the results of previous analyses (Perez-Sabater et al. 2008; Chovanec 2008, 2010; Jucker 2010; Thurman and Walters 2013), yields the picture displayed in Table 1.


Table 1: Comparison of traditional registers of sports/football live reportage with OTCs

| | Radio | TV | OTC |
|---|---|---|---|
| STRUCTURAL FEATURES | | | |
| Event-related versus non-event-related sections | ✓ | ✓ | ✓ |
| Unscripted | ✓ | ✓ | ✓ |
| Channels (visual/aural/textual) | ✗/✓/✗ | ✓/✓/(✓) | (✓)/✗/✓ |
| Temporal limitation | ✓ | ✓ | ✓ |
| LINGUISTIC FEATURES | | | |
| Narrative style | ✓ | ✓ | ✓ |
| Monologic structure (one-to-many) | ✓ | ✓ | (✓) |
| Orality/informality/casual tone | ✓ | ✓ | (✓) |
| Jargon/slang/idioms | ✓ | ✓ | ✓ |
| Formulaic language | ✓ | ✓ | ✓ |
| Ellipsis | ✓ | ✓ | ✗ |

Table 1 shows the shared characteristics of both OTCs and traditional live reportage as events in mass communication, while humour is another broad communication strategy characteristically used in all types. Owing to the features listed above, sports reportage generally has been described as some kind of “entertainment” genre, even though its primary function arguably is to report factual content (Brandt and Quentin 1983: 20; Chovanec 2011: 253–254).

However, a number of differences emerge on account of the channel of distribution (web page mainly with textual content + interactive elements) and of the particular properties of electronic communication (e.g. the staging of familiarity,³ see further Section 3.2 and Jucker 2010: 66). Above all, a point worth noting is the way in which the recipients consume media forms such as OTCs. They are produced fairly quickly and without many corrections as the commentator is under time pressure due to the co-extensive nature of the event described and its description (Jucker 2010: 64).⁴ Likewise, the consumption is quick and cursory, as is the case with many other electronic offerings (Dürscheid 1999: 21). These findings suggest that there are areas of both overlap and divergence between OTCs and traditional forms of sports reportage. In addition to the aspects mentioned in the foregoing, it will be shown in the following how OTCs can be further related to the domains of sports and news reportage, and why they should nevertheless be categorised as a separate, fairly institutionalised, register serving a discourse community (O’Keeffe 2006: 19, 29).

3 According to Dürscheid (1999: 23), the staging of familiarity (and the resulting “pseudo-intimacy” between participants; O’Keeffe 2006; see further Section 3.2) in written electronic communication is characterised by an apparent closeness of those involved in such a communicative situation. This is due to the immediacy of the exchanges via the electronic medium, which is supported by the use and acceptance of features typically occurring in the spoken mode.

4 Indeed, typos, interpretable as a typical feature of online production under time constraints, repeatedly occur in all of the OTCs analysed (see e.g. examples (41), (56) and (59) below).

2.2 Layout and production

The fundamental difference between commentators in traditional and in electronic media (including OTCs) is the loss of their ‘gatekeeping’ function. With the advent of internet communication, reporters are supposed to transfer, modularise and visualise information without any prioritising (Jucker 2005: 17). OTCs seem to be a nearly perfect format to achieve this, while another of the defining properties is their immediacy and speed and a particular ‘live’ atmosphere, highly valued by the online audience (Simons 2011: 180; Thurman and Walters 2013: 95). That OTCs in practice actually represent a new form of journalism can also be deduced from the fact that the task of creating the input is more often than not assigned to a freelance journalist or intern rather than to a regular editorial staff member. Economic considerations also play a role here, of course. OTCs as a rule are composed in an editorial office in front of a TV screen and only rarely in the football stadium (Holger Müller, personal communication). In the majority of cases a single commentator is responsible for the coverage, who acts as the voice in the OTC. That means he introduces himself and refers to himself in the first person. At times, however, a person mirroring and choosing readers’ mailings for inclusion in the commentary may support the commentator. This person may also be responsible for taking care of any technical issues occurring during the reportage (Thurman and Walters 2013: 91–92; see below for other interactive elements).⁵

5 The corpus even contains a few meta-comments on technical issues during production, after the conventional layout and the technical platform apparently had been changed: Yes, yes this looks a bit different to our usual minute-by-minute reports, but rather than moan about change, why not embrace it? Or moan about it privately. I’m just a drone who’s following orders and doing what he’s told. And besides, I quite like it, because I can put in big red quotation marks… (ukr_eng_1906_guar); I do love this new headline facility… (ukr_eng_1906_guar)

Figure 1: Commentary and overview section of the SUN OTC (from swe_eng_1506_sun; <www.thesun.co.uk/sol/homepage/sport/football/match_centre/article3670013.ece>, accessed 12/07/2012, 10:21)

Jucker defines OTCs as a “complex combination of visual and textual features […] giv[ing] the recipient not only a narrative account of the events so far, but also an overview of the situation at present” (2010: 59). Typically, the textual information is shown in reverse chronological order, with the most recently added post appearing at the top of the page (see Figure 1 for an example).⁶ This post-by-post (or minute-by-minute) reporting style is supposedly a fairly recent development illustrating the influence of structure on activity (O’Keeffe 2006: 31). This means that the special properties of OTCs as a form of electronically mediated communication have an impact on the style of reporting. In fact, OTCs surprisingly resemble a certain type of after-match report which appeared in printed publications as early as the 1950s (see Figure 2).

Figure 2: Excerpt from Kicker FUSSBALL-ILLUSTRIERTE (1954) adapted from Burkhardt (2010: 11)

What is new, however, are the opportunities offered by the technology to use a similar reporting style for live reportage, and the additional options the electronic medium offers. Sometimes the readers have the choice to filter the textual data in order to catch up quickly on the most important events in the match (i.e. goals, fouls and substitutions). Other elements that could be added (usually outside the frame or area where the main commentary appears) are links and embedded audiovisual content (Thurman and Walters 2013: 83). In football reportage in particular, the majority of OTCs offer sections, tabs or links on the score (also of simultaneous matches) and scorers, current and starting team line-up and on general statistics (shots on goal, cards, ball possession, etc.). One of the most intricate OTCs is offered by SPON, where readers can also retrieve the real-time statistics for each individual player. This OTC further includes “heatmaps” (see Figure 3) showing the positions/operating range of the individual player or of the full team on the pitch.

6 The content management system of a media outlet may allow reversing the anti-chronological order once the event has finished, so that the report appears as a kind of article readable from top to bottom. For instance, this is the case with GUAR (Thurman and Walters 2013: 92) but does not apply to the other OTCs explored in this study. Occasionally, earlier postings are corrected or altered in order to make them more readable after the description of the actual event (e.g. during half-time breaks or before the order is reversed; Simons 2011: 181). Thus, OTCs are a register that is both dynamic and static (Chovanec 2010: 239).

Figure 3: Heatmap of the English team (left) and Italian central midfielder Andrea Pirlo (right) in SPON (from ita_eng_2806_spon; <www.spiegel.de/sport/fussball/em-2012-liveticker-spielplan-und-alle-statistiken-a-836448.html>, accessed 02/07/2012, 10:28)

The presence of all of these elements appears to suggest a secondary importance of the textual data of the commentary (cf. Jucker 2005: 17). Actually, the paratextual elements are also mainly textual (that is, they encode information orthographically) and present factual information. This might determine the style and content of the commentary, as factual information is constrained to the paratextual elements (Perez-Sabater et al. 2008: 251; cf. Bateman 2012: 3985). Occasionally (this mainly applies to GUAR), these additional elements are used for mere entertainment purposes without any direct relation to the event described (Thurman and Walters 2013: 85). In any case, it is necessary to consider the combination and interplay between these two categories in a linguistic analysis (see Section 3.3 below).

Generally speaking, we can describe OTCs as examples of mash-ups of different journalistic styles (reporting, commenting, glossing; cf. Simons 2011: 179). Turning to the common layout of the commentary, we can establish the following simplified scheme (in chronological order), abstracted from the four OTC types investigated:

Table 2: OTC phases and their typical content

| Phase | Typical content |
|---|---|
| “Appetiser” (published a few days or hours in advance) | Statements on the relevance of the match |
| Preamble/preview | Self-introduction of the commentator, welcoming the readers, match-related interview passages |
| Background information | Team line-ups, tactics, referees, results in previous encounters, description of atmosphere, jersey colours, national anthems |
| Commentary | Play-by-play description and comment, half-time summary and preview (readers’ comments) |
| Summary and overall match comment | Consequences for teams, naming goal scorers and order of scoring |
| Outlook | Next fixture of the team(s) |
| Goodbye | |

This highly structured layout in large parts corresponds to the progression in traditional football reportage, but OTCs usually finish shortly after the actual match coverage and lack post-match comments and interviews commonly found in radio and especially on TV (cf. Ferguson 1983: 154). Note the differences between the individual OTCs: while the posts of some media (e.g. from SUN) are always organised in the same fixed way (preview – early team news – head to head – the ref – etc.) and are apparently prepared in advance (cf. Simons 2011: 180–181), the data from the other media outlets suggest that they take a more liberal approach and leave the exact arrangement of the posts (particularly in the phases before the actual commentary begins) up to the commentator.⁷

The length of the individual phases may vary. For example, the length of the pre-match coverage ranges between 176 (swe_eng_1506_spon) and 2,969 words (ita_eng_2406_guar), GUAR overall being most verbose in this respect (see Figure 4) and particularly when matches of the English squad are reported (for further quantitative assessment of OTCs, see Section 3.3 below).

7 Boundaries between the (idealised) phases are blurred at times, so that information typically found in one phase may also appear somewhere else. For example, information on jersey colours may appear within the first minutes of the actual match commentary, as illustrated by 1’ KICK OFF Germany, in their all-white kit, start the game kicking from right to left (ger_den_1706_sun).

Figure 4: Length (in words) of pre-match commentary (AVG = overall average; AVG ENG = average of England match reports; AVG GER = average of Germany match reports)
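The comparison summarised in Figure 4 can be retraced with a short Python sketch. The values are the averages given with the figure; the dictionary layout and the helper names `most_verbose` and `eng_to_ger_ratio` are assumptions introduced only for illustration.

```python
# Average pre-match word counts as given with Figure 4
PRE_MATCH_AVG = {
    "AVG":     {"GUAR": 1258.3, "SUN": 696.9,  "BILD": 606.4,  "SPON": 558.9},
    "AVG ENG": {"GUAR": 1708.5, "SUN": 771.25, "BILD": 534.25, "SPON": 298.5},
    "AVG GER": {"GUAR": 898.2,  "SUN": 637.4,  "BILD": 664.2,  "SPON": 767.2},
}


def most_verbose(row):
    """Return the outlet with the highest average in the given row."""
    values = PRE_MATCH_AVG[row]
    return max(values, key=values.get)


def eng_to_ger_ratio(outlet):
    """Ratio of the England-match average to the Germany-match average."""
    return PRE_MATCH_AVG["AVG ENG"][outlet] / PRE_MATCH_AVG["AVG GER"][outlet]
```

On these figures, `most_verbose` returns GUAR for all three rows, and the ratio is above 1 for the two English outlets and below 1 for the two German ones, which suggests that each outlet tends to write more pre-match text about its own national squad.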

The phases before the match actually starts serve at least two important communicative functions. First, the ‘appetiser’ section is a device to incite interest in readers and to emphasise the relevance of the match. (1) and (2) can be seen as typical posts.

(1) A titanic clash awaits. (ger_ita_2806_sun)

(2) Deutschland gegen Niederlande, das ist der Klassiker, das Non-Plus-Ultra im Fußball, ach, was sag ich, der heilige Gral bei dieser EM. Ich begrüße Sie herzlich zu diesem Top-Event (ger_ned_1306_spon)⁸

A second function, also applicable to the background information phase, is to directly address and accommodate the readers into the spectacle and make them part of the match. In this regard OTCs are quite similar to traditional mass media, which aim at linking “the significant and the mundane” (Gerhardt 2006: 131), that is, the allegedly spectacular match and the allegedly ordinary everyday life of the readers. (3) and (4) nicely illustrate this point.

8 Translation: Germany versus the Netherlands, that’s the classic, the non-plus-ultra of football – what am I saying, the Holy Grail of this European Championship. A warm welcome to this top event.

Data underlying Figure 4 (average pre-match word count per outlet):

| | GUAR | SUN | BILD | SPON |
|---|---|---|---|---|
| AVG | 1258.3 | 696.9 | 606.4 | 558.9 |
| AVG ENG | 1708.5 | 771.25 | 534.25 | 298.5 |
| AVG GER | 898.2 | 637.4 | 664.2 | 767.2 |


(3) Good evening, everybody. Are ya nervous? Are ya? (ukr_eng_1906_guar)

(4) Die Nationalhymnen. Gänsehaut für jeden Fußballfan. Was für eine Stimmung. (ukr_eng_1906_bild)⁹

The commentary can be viewed as the core part, with the main communicative function of conveying factual information, although further functions, such as entertainment (see below), should not be discounted. OTCs usually finish with a summary and overall match comment, potentially aimed at members of the audience who only look for a quick round-up of the match and who do not want to read the full coverage.

2.3 Audience participation

Studies of internet communication have always recognised its multimedial nature in the sense that textual data rarely appears in isolation (Dürscheid 1999: 28–29), and the same naturally applies to OTCs. Another dimension of multimediality is the opportunity of interacting with commentators before and while the match coverage is in progress. The question is whether this has ramifications for the structure and content of OTCs.

On the one hand, Chovanec (e.g. 2008) has convincingly shown that audience mail-ins constitute an essential element of OTC football reportage. In addition, he has found that readers’ comments and their citing by the commentator are rarely directly related to the gameplay and thus constitute a second layer of “gossip” with a social rather than an informative function. This considerably extends the scope of the OTC beyond the provision of factual information (as its primary purpose) and is testimony to the entertainment function OTCs can carry. As only a selection of readers’ mails are presented and addressed and, more often than not, reduced to clichés (Chovanec 2008: 260), he labels this type of discourse “quasi-conversational interactions” (Chovanec 2011: 252). Readers may participate in the creation of the content of the OTC, but only at the discretion of the commentators (or their aides; see above). Given that commentator and contributing readers usually do not know each other personally, casual conversation is only simulated to a certain extent. However, the general applicability of Chovanec’s findings is limited as his analyses are restricted to GUAR data only (see also Thurman and Walters 2013: 85).

9 Translation: The national anthems. Goosebumps for every football fan. What an atmosphere.

On the other hand, the advent and growing popularity of genuinely interactive internet applications (the so-called “web 2.0” technologies) could have led to a widespread integration of these into OTCs as another “webby” form of communication, creating dynamic content. The most popular application, potentially also most adapted to OTCs as another immediate form of journalism (cf. Chovanec 2010: 239), is the microblogging service Twitter (<www.twitter.com>). Despite its presence on the market since 2006, only one of the OTCs considered in the present study, SPON, has reserved some space for Tweets (that is, Twitter posts). This area (called “Live-Fanblock”, ‘live fan section’) is placed prominently next to the main commentary box (see Figure 5).

Figure 5: Main commentary and Tweets in SPON (from ger_ita_2406_spon; <www.spiegel.de/sport/fussball/em-2012-liveticker-spielplan-und-alle-statistiken-a-836448.html>, accessed 02/07/2012, 10:29)

Commentators actively encourage readers to participate, as in (5), but they do not cite readers’ Tweets in the main commentary. The one exception to this rule is presented below as (6).

(5) Jetzt ist es amtlich – Klose, Schürrle und Reus spielen von Beginn an. Twittert der DFB. Sollten Sie auch den Drang verspüren, ihren Kommentar via Twitter in den Live-Fanblock rechts nebenan zu Tickern, so benutzen Sie bitte den Hashtag #gergre (ger_gre_2206_spon)¹⁰

10 Translation: Now we know for sure – Klose, Schürrle and Reus are in the starting line-up. Twitters the DFB (= the German football association). Should you also feel the urge to post your comments to the Live-Fanblock to the right, please use the hashtag #gergre


(6) PS: Mein Tweet des Abends: Dehnen ist gut für die Bänder, Bender ist schlecht für die Dänen – @wintersjon! In diesem Sinne, gute Nacht! (ger_den_1706_spon)¹¹

Therefore, rather than engaging in quasi-conversation in the sense defined above, Tweets in SPON should be viewed as truly parallel comment, where readers can express their (unfiltered) opinion and post links.

Although OTCs in GUAR do not comprise a formalised way of incorporating Twitter comparable to the “Live-Fanblock” of SPON, commentators refer to Tweets in a similar fashion as they do with regard to mails (that is, with added comment), albeit rarely in the present data (see (7)).

(7) Over on the Twitter @ianapplegate has this suggestion. Maybe they should at least give Esperanto a go? Can anyone even speak Esperanto? (ger_den_1706_guar)

It emerges from the analysis that, at present, no unequivocal answer can be given to the question as to whether interactive elements influence OTC commentary. However, it could be shown (i) that the extent of how much reader-generated content influences the style and content of OTCs varies considerably and (ii) that different OTCs have different approaches towards interactivity. While two (SUN, BILD) do not provide any opportunity for the readers to get involved, OTC reportage in GUAR provides extensive, though filtered, reader-generated content and related comments, and thus yields a quasi-conversational structure as defined above. The most direct approach arguably is taken by SPON, where Tweets are displayed unfiltered as a by-commentary right next to the commentator’s text. However, the latter does not usually refer to the former in any way, so audience participation could be viewed as constrained in another way.12

11 Translation: PS: My Tweet of the night: Stretching is good for the ligaments, Bender is bad for the Danes – @wintersjon! In this spirit, good night! Note: In the German version, the author of the Tweet exploits the homophony /bendɐ/ between Bänder (‘ligaments’) and Bender (player’s name) for a comic effect.

12 Even if the extent of filtering varies, both ways of incorporating interactive elements presumably take account of a point made in audience studies of other media types. To be precise, Gerhardt (2006: 129) maintains that the audience consists of “active social agents whose lives do not come to a halt when they are exposed to a mass medium”. Accordingly, it could be argued that OTCs with interactive elements take a socially more adequate approach towards their readers. This view is also supported by Simons (2011: 156), asserting that modern audiences have developed a feeling of being entitled to participation and interaction. Therefore, it is argued that state-of-the-art journalistic practice is liable to incorporate social media in order to render mass media production and use a shared experience. A related point of minor importance is that OTCs sometimes also serve as some kind of by-medium to TV broadcasts where a commentator adds “colour commentary” to the “action” on the screen. This is especially salient in designated OTCs on particular shows, such as the regular SPON OTC on “Tatort”, a popular German crime series.


3 The language of OTCs

3.1 Vocabulary, collocations and semantics

3.1.1 General picture

Like traditional types of sports reportage (cf. Ghadessy 1988: 19), OTCs can be expected to contain a substantial amount of technical vocabulary to describe the gameplay. An exploration of the most frequent content words reveals that items can be broadly categorised into what is shown in Table 3.

Table 3: Categories of content words amongst the top 100 wordlist created with AntConc

| Category | Examples for GUAR + SUN | Examples for SPON + BILD |
|---|---|---|
| Names of teams (geographical location) | England, Germany, Sweden, France, Portugal, Ukraine, Italy | England, Deutschland (‘Germany’), Italien (‘Italy’), Portugal, deutschen (‘German’) |
| Temporal location | min, time, (first/second) half, after | Minute (‘minute’), jetzt (‘now’), dann (‘then’), heute (‘today’), nach (‘after’) |
| Sports-/game-related terms | ball, goal, shot, corner, side, kick, area, cross, chance, post, team, game | Ball (‘ball’), Tor (‘goal’), Ecke (‘corner’), (gelbe) Karte (‘(yellow) card’), Strafraum (‘penalty area’), Flanke (‘cross’), Spiel (‘game’), Wechsel (‘substitution’) |
| Names of players and coaches | Hart, Rooney | Hart, Gomez, Klose, Özil, Neuer, Löw |

Overall, the comparison between the most frequent content words in English and German OTCs reveals some striking similarities (especially as regards the first three categories in Table 3), but with a slight change in national focus (as regards the players’ names). Note also that the expression of movement, location and direction figures prominently in terms of function words – mainly prepositions – amongst the highly frequent lexical items (e.g. right, up, left, down, back, over, against, to, in, for, from, on, at, by, into vs. in, auf, mit, von, zu, im, aus, an, bei, gegen, vor, nach, zum, ab, am, über, durch, zur, ins).
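The kind of frequency list summarised in Table 3 can be approximated outside a concordancer. The following is a minimal sketch in Python, not a reconstruction of the AntConc procedure used in the study: the stop list `FUNCTION_WORDS`, the tokenisation rule and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list; a real study would use a fuller one.
FUNCTION_WORDS = {"the", "a", "of", "to", "in", "on", "and", "is",
                  "from", "into", "onto", "but", "he"}

def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())

def top_content_words(text: str, n: int = 100) -> list[tuple[str, int]]:
    """Return the n most frequent tokens that are not in the stop list."""
    counts = Counter(t for t in tokenize(text) if t not in FUNCTION_WORDS)
    return counts.most_common(n)

# Two sentences taken from examples (11) and (15), the latter shortened.
sample = (
    "Evra whips a cross into the England area from the left. "
    "He curls a cross onto the head of Gomez, but the header is weak."
)
print(top_content_words(sample, 3))
```

Whether prepositions such as those listed above are filtered out or counted depends entirely on the stop list, which is why content and function words are best reported separately.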

These findings can be closely related to a semantic keyword analysis in Wmatrix, where the English OTC data are compared against the spoken and written BNC sampler. In this quantitative perspective, salient semantic areas emerge. These are ‘competition’, ‘numbers’ (usually related to spatial and temporal orientation), ‘warfare, defence and the army; weapons’, ‘violent/angry’, ‘chance, luck’, ‘long, tall and wide’, ‘success’, ‘failure’, ‘anatomy and physiology’, illustrated by examples (8) to (15) respectively.13

(8) As it stands, Portugal will go through with a better head-to-head record. (ger_den_1706_sun)

(9) And how England love that decision, because the second effort is sent right onto Lescott’s head, eight yards out, level with the left-hand post. (fra_eng_1106_guar)

(10) That was Klose’s 64th goal for Germany four off Gerd Muller’s record and he almost made it 65 moments later, following up a loose ball and sweeping in a low shot that was kicked behind at the near post by the besieged Sifakis. (ger_gre_2206_guar)

(11) Evra whips a cross into the England area from the left. (fra_eng_1106_guar)

(12) It’s high-stakes major-championship Holland versus Germany. (ger_ned_1306_guar)

(13) Germany also prevailed in the third-place play-off at World Cup 2006, winning 3-1 in Stuttgart. (ger_por_0906_sun)

(14) Designated scapegoat for when it all goes wrong: Pedro Proenca (Portugal). (ita_eng_2406_sun)

(15) He curls a cross onto the head of Gomez, but the big striker’s header is weak and wafted miles to the left of the target. (ger_ita_2806_guar)
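Keyword and semantic-domain comparisons of this kind rest on a keyness statistic that contrasts an item's frequency in the study corpus with its frequency in a reference corpus. The sketch below assumes the log-likelihood measure commonly used for this purpose; the function name and the counts are invented for illustration and are not taken from the study.

```python
import math

def log_likelihood(freq_study: int, size_study: int,
                   freq_ref: int, size_ref: int) -> float:
    """Log-likelihood keyness of one item in a study vs. a reference corpus."""
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    ll = 0.0
    # An observed frequency of 0 contributes nothing (0 * ln 0 is taken as 0).
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll

# Invented counts: an item that is relatively far more frequent in the study corpus.
print(round(log_likelihood(120, 50_000, 40, 1_000_000), 2))
# Equal relative frequency in both corpora yields no keyness at all.
print(log_likelihood(10, 1_000, 100, 10_000))
```

The statistic only signals that a difference is unlikely to be due to chance; whether the item is over- or under-represented has to be read off the relative frequencies.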

The analysis of highly frequent content items and the semantic keyword analysis suggest that OTCs do not fundamentally differ from other forms of football reportage, particularly radio reportage, as “good playing, moments of risk, significant points of heightened competition” (Ferguson 1983: 156–157) receive most extensive coverage. This can be deduced for example from the high salience of ‘success’ and ‘failure’ semantic tags or the high frequencies of players’ names usually involved when chances in a game occur; that is, strikers/offensive players (Rooney, Özil, Klose, Gomez) and goalkeepers (Hart, Neuer).

13 Some of the findings of the corpus software may be due to the metaphorical processes involved (cf. also the usage of the terms shot, target and squad, captain, etc.). It is controversial whether “football is war” metaphors still apply or whether they have conventionalised (see also Section 3.1.2).

Levin (2008: 146) has pointed out that “traditions developed in sports commentary are often unintelligible to the uninitiated”, one reason being that commentators rely on formulaic language with specialised meanings. In order to test this claim, I compared the ten most frequent 4-grams in the material for both languages, as shown in Table 4.

Table 4: The ten most frequent 4-grams extracted with AntConc

| Rank | Freq. | 4-gram (GUAR+SUN) | Freq. | 4-gram (SPON+BILD) |
|---|---|---|---|---|
| 1 | 41 | the edge of the | 13 | Meter vor dem Tor (‘meters before the goal’) |
| 2 | 25 | edge of the area | 11 | auf der anderen Seite (‘on the other side’) |
| 3 | 25 | on the edge of | 11 | aus der zweiten Reihe (‘from the second row’) |
| 4 | 19 | down the inside-right | 10 | Tooor für Deutschland, X:X (‘goal for Germany, X:X’) |
| 5 | 16 | the inside-right channel | 8 | auf dem rechten Flügel (‘on the right wing’) |
| 6 | 14 | down the right and | 7 | in der zweiten Hälfte (‘in the second half’) |
| 7 | 14 | from the edge of | 6 | da war mehr drin (‘there was more in it’) |
| 8 | 13 | down the inside-left | 6 | doch der Ball geht (‘but the ball goes’) |
| 9 | 13 | in the first half | 6 | im Strafraum an den (‘in the penalty area at the’) |
| 10 | 12 | for the first time | 6 | Meter vor dem Kasten (‘meters before the goal’) |
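How a ranked list of 4-grams such as the one in Table 4 comes about can be illustrated with a short Python sketch. It is not the AntConc routine itself: the tokenisation (which keeps hyphenated forms such as inside-right together) and the three sample sentences are assumptions, and n-grams are allowed to cross sentence boundaries for simplicity.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lower-case the text; keep letters and word-internal hyphens together."""
    return re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text.lower())

def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous n-token sequences, in order of occurrence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(text: str, n: int = 4, k: int = 10):
    """The k most frequent n-grams together with their absolute frequencies."""
    return Counter(ngrams(tokenize(text), n)).most_common(k)

# Invented sample sentences built around the collocation discussed in the text.
sample = (
    "A shot from the edge of the area. "
    "He stands on the edge of the area. "
    "Free kick on the edge of the D."
)
for gram, freq in top_ngrams(sample, n=4, k=3):
    print(freq, " ".join(gram))
```

Because overlapping 4-grams are counted separately, one longer collocation surfaces as several highly ranked entries, which is the pattern visible in ranks 1 to 3 of the English column.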

According to the absolute usage frequencies, English OTCs apparently use formulaic expressions much more than the German ones. A particularly common collocation (see ranks 1, 2 and 3 in Table 4), better represented as a 6-gram, is on the edge of the X.14 Levin’s (2008) findings can be confirmed insofar as somebody reading OTC reportage has to have knowledge (i) of conventions, together with a mental image of the layout of a football pitch, and (ii) of football-related jargon. Point (i) is especially illustrated by the English data, where the majority of the 4-grams describe movement and/or position, and point (ii) especially by the German data, where technical terms (partly also related to position) such as Strafraum (‘penalty area’), Flügel (lit. ‘wing’; ‘outer part of the pitch’) or aus der zweiten Reihe (lit. ‘from the second row’; ‘from far away’) appear. The present data therefore suggest that it is not merely “goal scoring and measuring time” (Levin 2008: 146) where formulaic language is employed, although some of the items included in Table 4 (e.g. in the first half; for the first time; Tooor für Deutschland, X:X; in der zweiten Hälfte) support Levin’s claim.

14 Realisations for X occurring in the data are D, six yard box, England box, Italy penalty area, Sweden penalty area, penalty area.

A related aspect is the extended reliance on informal and slang items (Perez-Sabater et al. 2008: 242; cf. Ferguson 1983: 156–157), exemplified by Kasten (‘goal’, lit. ‘box, case’) in Table 4. A recent study on informality (Burkhardt 2010: 14–15) has identified a long-standing tradition of dialectal and informal influence as regards (German) football language, and a similar situation in English appears highly likely. Indeed, the OTC data from both languages confirm a general tendency towards informal usage, as examples (16) to (19) show (see also below):

(16) Neat turn from Ozil who twists in the box before feeding Khedira for a low 20-yarder, which Sifakis parries. (ger_gre_2206_sun)

(17) (…) on the sideline Joachim Low is waving his hands around in frustration like an eejit. (ger_gre_2206_guar)

(18) Huiuiui, dieser Reus hat sich einiges vorgenommen. Diesmal rutscht ihm das Spielgerät über den Schlappen und fliegt zwei Meter am rechten Außenpfosten vorbei. (ger_gre_2206_bild)15

(19) Fortakis hält einfach mal drauf. Neuer hält einfach mal fest. (ger_gre_2206_spon)16

3.1.2 Intended readership

Lexical differences along the dimension “intended readership” are harder to determine. First of all, a quantitative assessment of the lexical density of OTCs (see Table 5) shows only marginal differences between languages and individual OTCs (SD = 1.50) and standardised type/token ratio values approximating values normally found in written data (e.g. of the written components of the International Corpus of English).

15 Translation: Huiuiui, this Reus guy is up for something. This time, the playing device (infml.) slides over his worn-out shoe/slipper and misses the right outer post by two meters.

16 Translation: Fortakis just shoots. Neuer just saves.


Table 5: Standardised type/token ratios (TTR) calculated with frequencies from AntConc

| | GUAR | SUN | BILD | SPON |
|---|---|---|---|---|
| std. TTR | 45.88 | 42.50 | 45.51 | 42.93 |
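For readers unfamiliar with the measure, the following Python sketch shows one common way of standardising a type/token ratio, namely averaging the ratio over consecutive chunks of equal length. The chunk size of 1,000 tokens and the function name are assumptions; the source does not specify how the values in Table 5 were standardised. The second part merely recomputes the spread of the four reported values.

```python
from statistics import mean, pstdev

def standardised_ttr(tokens: list[str], chunk_size: int = 1000) -> float:
    """Mean type/token ratio (in %) over consecutive full chunks of tokens."""
    chunks = [tokens[i:i + chunk_size]
              for i in range(0, len(tokens) - chunk_size + 1, chunk_size)]
    if not chunks:
        raise ValueError("text is shorter than one chunk")
    # A trailing partial chunk is dropped so that all chunks are comparable.
    return mean(100 * len(set(chunk)) / len(chunk) for chunk in chunks)

# Values as reported in Table 5.
reported = {"GUAR": 45.88, "SUN": 42.50, "BILD": 45.51, "SPON": 42.93}
spread = pstdev(list(reported.values()))
print(f"population SD = {spread:.2f}")
```

Computed as a population standard deviation, the four values in Table 5 give a spread of about 1.50, which matches the figure quoted in the text.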

In fact, keyword analyses contrasting the vocabulary of the two OTCs in each language (GUAR vs. SUN and SPON vs. BILD) yield a very diverse picture. First, a look at the top 100 keyness words of GUAR vs. SUN (and vice versa) reveals some (groups of) characteristic items. Commentators for GUAR seem to have a preference for technical terms such as tiki-taka or its ad-hoc (mock) variant (das) bundestikiundtaka17 to describe the particular playing style the Spanish and German teams are known for. On a related note, the acronym TBOF (‘two banks of four’), referring to the traditional tactical formation of the England squad, reaches a high keyness rating. Another conspicuous item in the GUAR data is beard. Here, an idiosyncratic use of the GUAR commentator, again from the Germany vs. Greece match, is responsible for its salience. While at the beginning of the coverage the player Salpingidis is introduced with the metonymic nickname beard to be feared, as in example (20), at a later point in the match, we can witness a process of personification and the reference merely by a physiological feature is taken as established, as can also be seen from the capitalisation of the term in example (21).18

(20) Gekas will go up front, with the beard to be feared, Salpingidis moving to the right of midfield. (ger_gre_2206_guar)

(21) The Beard To Be Feared slides a cool low penalty to the right as Neuer goes the other way. (ger_gre_2206_guar)

17 Burkhardt (2010: 14) presents an overview of the genesis of the term tikitaka. Consider also the word formations das bundestikiundtakafussball (ger_por_0906_guar); I fell asleep after 63 minutes and have only just woken up from a tiki-taka-induced snooze (ita_eng_2406_guar) or Because over-intellectualising Spain’s tiki-totalitarianism isn’t going to be enough when you try to big this up in ten years’ time, I can tell you that for nothing (ger_ita_2806_guar).

18 Cf. the following references to England striker Wayne Rooney: Dicke Chance für Mister Haupthaar! (‘Big opportunity for Mister scalp hair!’; ita_eng_2406_bild); Wieder kommt das lebende Haartransplantat Rooney angeflogen, doch sein Kopfball ist eher eine Rettungstat denn ein Torversuch. (‘Again the living hair transplant Rooney is approaching, but his header is more of a save than an attempt on target.’; ita_eng_2406_spon).

In contrast, we can generalise from the SUN vs. GUAR keyness list that SUN commentators more often than not refer to players by their first names (Mario, Bastian, Antonio, Manuel, Mesut, Cristiano, Miroslav, etc.) and employ more war-/aggression-related terminology (e.g. fires, impact, strike, shot, kill, onslaught) – although it might be argued that some of these items have become conventionalised metaphors. Puns on players’ names and ad-hoc formations are a common feature of all OTCs and illustrate creative language use in this type of sports commentary (see also Section 3.2 on discourse features below; cf. Golebiowski 2012: 58):

(22) It’s Robben-esque at times from Ibrahimovic (…) (swe_eng_1506_guar)

(23) “It’s Goetzille.” Who needs Xaviesta? (ger_gre_2206_guar)

(24) Super Mario was brilliant at times for Manchester City this season (…) (ita_eng_2406_sun)

(25) Immer wieder Mad Mario. (ita_eng_2406_spon)

(26) THE LAHM BELLS ARE RINGING (ger_ned_1306_sun)

(27) LACKING in KLAAS (ger_ned_1306_sun)

(28) Schewagol (ukr_eng_1906_bild)

(29) Kjaer has the ball toe-poked pass him by Muller with the result that Muller is mullered to the ground by the Dane. (ger_den_1706_guar)

The keyword analysis of the German OTCs shows that BILD is much more prone to using dialectal and jargon words than SPON. Two illustrative instances are references to Ball (‘ball’) and Tor (‘goal’). While the standard variants (i.e. Ball and Tor) rank high in the keyness list of SPON, within the top 100 keyness items of BILD a variety of informal terms both for the former (e.g. Kugel ‘bowl’, Leder ‘leather’, Pille ‘pill’, Murmel ‘marble’)19 and the latter (e.g. Kasten ‘box’, Hütte ‘shed’) occur. On a related note, other salient items worth mentioning due to their high keyness in BILD are Schlappen (‘foot’; lit. ‘worn-out shoe/slipper’) or Dampfhammer (‘fast shot on goal’; lit. ‘steam hammer’). This does not mean, however, that SPON commentators do not use informal or jargon items, as the occurrence of some other words listed in Burkhardt (2010) shows (see examples (30) to (32)) – they are just used less frequently.

(30) Also Balotelli sollte heute besser keinen Elfer mehr schießen (ita_eng_2406_spon)20

(31) De Rossi schießt, Hart lässt prallen, Balotelli feuert aus kurzer Distanz drauf, wieder Hart und dann muss Monotolivo das Ding im Nachschuss machen (ita_eng_2406_spon)21

(32) Garmash wagt einen Distanzschuss und knallt aus 30 Metern vom linken Flügel aus auf das Tor. (ukr_eng_1906_spon)22

19 The Kicktionary (<www.kicktionary.de>; Schmidt 2007), a multilingual dictionary of football terms, includes Kugel and Leder (in addition to Spielgerät (‘the thing to play with’)); cf. Neuer faustet das Spielgerät weg (‘Neuer punches the ball away’; ger_por_09_06_bild), but not Pille and Murmel.

20 Translation: Well, Balotelli rather shouldn’t shoot any more penalties (infml.) today.

21 Translation: De Rossi shoots, Hart rebounds the ball, Balotelli fires from a short distance, again Hart and then Montolivo must score [lit. make the thing] in the follow-up.

3.2 Discourse features

Again relating to in-group knowledge (see also Gerhardt 2006: 140; O’Keeffe 2006: 155) required by the audience, an earlier analysis has identified “Britishness” (Chovanec 2008: 261) as common ground of the cross-references in GUAR OTCs. Some of these findings can be extended to OTCs from other media outlets. In-group knowledge is required by the reader whenever commentators refer or allude to particular players, coaches or commentators not part of the current game or action (and their alleged characteristics, statements or achievements). Examples (33) to (38) illustrate that this happens in OTCs of all kinds.

(33) Call it the Crouch Effect, if you will. (swe_eng_1506_guar)

(34) The full-back likes attacking more than defending, apparently, so appears to be the Portuguese equivalent of Glen Johnson. (ger_por_0906_sun)

(35) Gomes slides in, Gascoigne at the Euro 96 semi style, but can’t get his boot to the ball. (ger_ned_1306_guar)

(36) Aber Kroos mit einer Christian-Rahn-Gedächtnis-Ecke. (ger_ita_2806_spon)23

(37) Pirlo kommt trotzdem an den Ball, macht aber den Robben. (ger_ita_2806_spon)24

(38) Balotelli will den Ibrahimovic machen. (ita_eng_2406_bild)25

In the GUAR data, this is also often observable in the readers’ comments included in the actual OTC. A similar effect is created by numerous references to scenes from other games and to other teams, as shown in examples (39) to (43).

(39) Mellberg produces a tackle not too dissimilar to Bobby Moore’s famous one on Jairzinho in the 1970 World Cup. (swe_eng_1506_guar)

(40) He makes it to penalty area before old hand Mellberg stops him in his tracks with a challenge akin to Moore on Pele, 1970. (swe_eng_1506_sun)

(41) I just had a horrible premonition of Balotelli making this match his Maradona ’86 moment and crushing us single-handledly [sic] because he feels like it (ita_eng_2406_guar)

22 Translation: Garmash tries a distance shot and rifles the ball from 30 meters from the left wing towards the goal.

23 Translation: But Kroos with a Christian-Rahn-memorial corner.

24 Translation: Pirlo gets the ball anyway, but does the Robben.

25 Translation: Balotelli wants to do the Ibrahimovic.


(42) Doch im Gegensatz zum FC Bayern nimmt keiner Reißaus oder zeigt auf den Anderen. (ita_eng_2406_bild)26

(43) Schlecht war die deutsche Mannschaft gegen Portugal eigentlich nur im Jahr 2000. Damals setzte es ein 0:3. Aber die Abwehrspieler hießen auch Rehmer oder Nowotny. (ger_por_0906_spon)27

While these intertextual28 references as listed above are not restricted to OTCs from GUAR, these are the ones where they occur most frequently (see Table 6).

Table 6: Average number of intertextual references per match report

| | GUAR | SUN | BILD | SPON |
|---|---|---|---|---|
| cross references | 6.67 | 2.78 | 2.44 | 4.78 |

This is also due to another unique feature of GUAR OTCs, which is reference to popular culture (e.g. actors, movie titles etc.) by both commentators and audience comments, as exemplified in (44) or (45):

(44) See you in 10 minutes for more of the same, or the most dramatic twist since The Crying Game/The Usual Suspects/Fight Club/Turner & Hooch. (ger_gre_2206_guar)

(45) Now that Walcott has replaced Ron Perlman England might actually win. (ita_eng_2406_guar)

All this nicely illustrates the extensive additional knowledge required to become an actual part of the game, or rather its mediated presentation (see also Gerhardt 2006: 140). In other types of media, commentators deliberately employ intertextual references as one way to create “pseudo-intimacy”, that is, “some sense of common identity and nationality or some other familiarity built up through frequent ‘contact’” (O’Keeffe 2006: 92)29 and this seems to be the case also in OTC reportage, most clearly in the GUAR data.

26 Translation: But in contrast to Bayern Munich nobody runs away or points to somebody else.

27 Translation: The only time the German team actually was bad against Portugal was in the year 2000. They got defeated 0:3. But the defenders were called Nowotny and Rehmer.

28 Intertextuality is conceived of in broad terms, including e.g. previous matches, scenes, other players etc. as (non-linguistic) pre-texts. In addition, this intertextuality may also comprise stereotyped (national) clichés requiring generalised cultural knowledge, such as “[…] but Andreas Brehme has to be the best Left Back,” says John Duffy. “He had a few problems in the hairstyle department, mind, but what German doesn’t?” (swe_eng_1506_guar).

29 Cf. also Ferguson’s term “dialog on stage” (1983: 156).


Another remarkable discourse feature already extensively covered by Jucker (2006: 128) is what he labels “parlando prosodics”: in the written medium the commentator imitates “spoken language through exclamations, capitalisation, graphical indication of vowel lengthening […] and hesitations”.30 For reasons of space, suffice it to say that the current dataset also yields a range of examples and that these realisations can be found in OTCs of any provenance (see examples (46) to (50)).

(46) Gooooooooooooooooal! but in the other game. (ger_den_1706_guar)

(47) They couldn’t, could they??? (ger_ita_2806_sun)

(48) Peeeeeeep! Peeeeeeep! Peeeeeeeeeeeeeeep! Nothing more to report here folks. (ger_den_1706_guar)

(49) Aber gut, es bedeutet immerhin: GLEICH GEHT ES LOS! (ger_ger_2206_spon)31

(50) Rooooooooney zahlt zurück. (ukr_eng_1906_bild)32
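Typographic markers of this kind lend themselves to a simple automatic pre-selection of candidate entries. The Python sketch below is an illustration only: the three categories follow the features named above, but the regular expressions, the labels and the function name are assumptions and were not part of the original analysis.

```python
import re

# Illustrative patterns; the concrete expressions are assumptions.
PATTERNS = {
    # The same letter three or more times in a row (Gooooal, Peeeep).
    "letter_lengthening": re.compile(r"([^\W\d_])\1{2,}", re.IGNORECASE),
    # Two or more consecutive words written entirely in capitals.
    "capitalised_run": re.compile(r"\b[A-ZÄÖÜ]{2,}(?:\s+[A-ZÄÖÜ]{2,})+\b"),
    # Doubled or tripled question and exclamation marks.
    "expressive_punctuation": re.compile(r"[!?]{2,}"),
}

def mark_features(entry: str) -> set[str]:
    """Return the labels of all patterns found in one commentary entry."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(entry)}

# Shortened versions of examples (46), (47) and (49).
for entry in ("Goooooal! but in the other game.",
              "They couldn't, could they???",
              "Aber gut, es bedeutet immerhin: GLEICH GEHT ES LOS!"):
    print(sorted(mark_features(entry)), "<-", entry)
```

Such patterns over-generate (German compounds with tripled consonants and capitalised acronym sequences also match), so the hits would still need manual checking.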

Therefore, Perez-Sabater et al.’s (2008: 255) finding that prosody is usually not typographically marked in OTCs from British newspapers has to be revised. In addition, commentators indicate spoken modes of discourse by other means such as (i) question tags, (ii) interjections and (iii) hesitation markers (or combinations of these), all typically found in speech (cf. Chovanec 2008). Examples (51) to (54) illustrate the first type and are commonly used as rhetorical questions or as a means to convey surprise.

(51) You’d fancy that run continuing this year, no? (ger_por_0906_guar)

(52) Motta reißt Kroos um, Italien bekommt Freistoß. Häh? (ger_ita_2806_spon)33

(53) Oh no they didn’t! Football eh? (ger_gre_2206_sun)

(54) Wenn man sowas übersteht, kann doch nichts mehr schiefgehen, oder? (ger_por_0906_spon)34

The wide range of interjections found in the data fulfils a similar function of simulating spoken discourse. Again, they occur across all OTCs, as examples (55) to (59) show.

30 Expressive punctuation, exemplified in (47), could also be added to the list of parlando prosodics and may thus be seen as a characteristic register feature (cf. Sanchez-Stockhammer, this volume).

31 Translation: But well, at least this means: IT’S ABOUT TO START!

32 Translation: Rooooooooney pays back.

33 Translation: Motta knocks Kroos down, Italy gets a free kick. Eh?

34 Translation: If you get over such a thing, nothing can go wrong, right?


(55) Blimey, Liberopoulos is a man on a mission. (ger_gre_2206_sun)

(56) Oooooooooh. A ball as delicious as your mother’s Sunday roast is swung into the box from Ozil but it goes out for a corer [sic]. (ger_den_1706_guar)

(57) Boah! Kann man das bitte nochmal in Zeitlupe sehen? (ger_ned_1306_bild)35

(58) Drei Minuten gibt es obendrauf! Puuh, das ist viel! (ger_por_0906_spon)36

(59) Oh Gott, was macht den [sic] Müller da? (ger_den_1706_spon)37

In the above instances, medium determines content, or at least its typographical representation, and many of the discourse features listed contribute to the creation of “pseudo-intimacy”, also meaning that both commentator and audience “pretend the relationship is not mediated and is carried on as though it were face-to-face” (O’Keeffe 2006: 92).

3.3 Interaction of text and other elements

Another aspect that has largely escaped researchers’ attention is the interaction between formal layout/paralinguistic phenomena and textual/linguistic content. For the four OTCs under investigation, this indeed plays a role. It was already indicated above that some of the OTCs come with many additional features such as team statistics, heatmaps, etc. Thus, it could be hypothesised that the more paralinguistic material is present, the shorter the individual OTCs are.38 This potential interaction can be measured quantitatively by considering absolute token counts (average number of words per match reported) and relating these values to the presence of further structural elements.

Table 7: Average token number per match report

| | GUAR | SUN | BILD | SPON |
|---|---|---|---|---|
| Average token number | 4,646 | 3,105 | 2,580 | 3,047 |

35 Translation: Boah! Can we see this in slow motion again?

36 Translation: Three minutes of additional time! Phew, that’s a lot!

37 Translation: Oh my god, what’s Müller doing there?

38 The present analysis applies a “micro-level approach” (Santini et al. 2010: 11); that is, only elements reachable within one click and which are part of the actual OTC are included (excluding ads and general navigation tabs, etc.).


Table 7 shows the relevant frequencies, and a “wordiness hierarchy” along the lines GUAR > SUN > SPON > BILD emerges, which suggests that the German OTCs are shorter on average. Two aspects are worth considering here: in addition to the textual commentary, the different OTCs rely on various other forms of presentation of match-related information, all allocated to different areas on the page or reachable by clicking on a tab (see Section 2.2 above). Table 8 gives an overview of presence or absence of these features.

Table 8: Comparative overview of presence/absence of paratextual features.

| Feature | GUAR | SUN | BILD | SPON |
|---|---|---|---|---|
| Textual commentary | ✓ | ✓ | ✓ | ✓ |
| Match score and goal scorers | ✓ | ✓ | ✓ | ✓ |
| Parallel matches and scores | ✗ | ✗ | ✗ | ✓ |
| Team line-ups | ✓ | ✓ | ✓ | ✓ |
| Live table | ✗ | ✓ | ✓ | ✓ |
| Tactical formations | ✗ | ✓ | ✓ | ✗ |
| “Event” filter or timeline (goals, cards, substitutions) | ✗ | ✓ | ✓ | ✓ |
| Team and player statistics | ✓ | ✓ | ✓ | ✓ |
| Player positions/“heatmaps” | ✗ | ✓ | ✗ | ✓ |
| Player ratings | ✓ | ✗ | ✓ | ✗ |
| Referee statistics | ✗ | ✗ | ✓ | ✗ |
| Area for Tweets | ✗ | ✗ | ✗ | ✓ |

While Table 8 shows that there are some basic elements for all OTCs (match score and goal scorers, team line-ups, statistics), it also illustrates a fundamental structural split between GUAR and the remaining three OTCs. GUAR emerges as the one with the fewest additional informational elements, necessitating, in turn, a more explicit, or “wordy”, style of reportage. The other OTCs, in contrast, rely more on iconographic and tabular representations (see also Figures 6 and 7), which provides a first explanation for the lower number of tokens in these.
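The relation between the two tables can be made explicit by tallying the features of Table 8 per OTC and setting the totals beside the token averages of Table 7. The Python sketch below does no more than that. Treating every feature as equally weighty is a simplifying assumption that the source does not make, and with only four OTCs the comparison remains descriptive.

```python
# Presence (True) / absence (False) per OTC, transcribed from Table 8,
# in the column order GUAR, SUN, BILD, SPON.
OTCS = ("GUAR", "SUN", "BILD", "SPON")
FEATURES = {
    "Textual commentary":           (True,  True,  True,  True),
    "Match score and goal scorers": (True,  True,  True,  True),
    "Parallel matches and scores":  (False, False, False, True),
    "Team line-ups":                (True,  True,  True,  True),
    "Live table":                   (False, True,  True,  True),
    "Tactical formations":          (False, True,  True,  False),
    "Event filter or timeline":     (False, True,  True,  True),
    "Team and player statistics":   (True,  True,  True,  True),
    "Player positions/heatmaps":    (False, True,  False, True),
    "Player ratings":               (True,  False, True,  False),
    "Referee statistics":           (False, False, True,  False),
    "Area for Tweets":              (False, False, False, True),
}
# Average token number per match report, from Table 7.
AVG_TOKENS = {"GUAR": 4646, "SUN": 3105, "BILD": 2580, "SPON": 3047}

# Unweighted number of features present per OTC (an assumption, see above).
feature_counts = {
    otc: sum(row[i] for row in FEATURES.values())
    for i, otc in enumerate(OTCS)
}

for otc in sorted(OTCS, key=AVG_TOKENS.get, reverse=True):
    print(f"{otc:5} features={feature_counts[otc]:2d} avg_tokens={AVG_TOKENS[otc]}")
```

On this tally GUAR has five of the twelve features, SUN eight, and BILD and SPON nine each, which is in line with the split described above.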


Figure 6: Team line-up, statistics and heatmap from SPON (fra_eng_1106_spon; <www.spiegel.de/sport/fussball/em-2012-liveticker-spielplan-und-alle-statistiken-a-836448.html>, accessed 02/07/2012, 10:30)

A second decisive point is that GUAR focuses on the entertainment aspect (Chovanec 2010: 242), whereas the other three OTCs are more informational in the sense that they provide an extended range of factual information and statistics. This might also be the reason why the individual entries in the commentary are short, as noted by Jucker (2010: 58–60).39 GUAR, in contrast, not only has longer individual entries than the other OTCs, but relies extensively on readers’ comments and replies by the commentator, comprising up to one third of the textual material (in number of words). Another characteristic feature of GUAR is the incorporation of pictures, video clips and links only indirectly related to the actual match, which rather serve to support the entertainment function. The other OTCs do not incorporate audience participation at all (SUN, BILD) or do so in a more direct manner, via Twitter messages displayed next to the main commentary (SPON), thus creating another layer of commentary (see Section 2.3 above), which breaks the uni-directionality of the communication.

39 However, the span (in terms of word length) across the OTCs is considerable and can range from just a few words (e.g. Ecke Deutschland ‘corner Germany’; ger_por_0906_bild) to more than 125 tokens.

Figure 7: Timeline, statistics and commentary from SUN (ita_eng_2406_sun; <www.thesun.co.uk/sol/homepage/sport/football/match_centre/article3670013.ece>, accessed 02/07/2012, 10:20)


4 Discussion: Cross-cultural aspects

Having considered some linguistic and structural aspects of OTCs, this section addresses the question as to whether OTCs should be seen as a cross-cultural register or whether differences are salient along the dimensions of regional provenance or intended readership. Based on the findings from the previous sections, a diverse picture emerges.

A first area with considerable overlap is the general structure of reportage. Many elements (e.g. an “appetiser” section; see Section 2.2) occur universally and also the other components of a textual match report are principally similar. This is determined to some extent by the fact that all OTCs report on the same event with a fixed duration and thematic focus (Siever 2011: 171) – a football match –, so that a certain congruence could be expected. However, with respect to content, GUAR is more extensive in its pre-match coverage of England matches, while the German OTCs use more words to describe Germany playing. Word counts in SUN, however, are relatively indifferent to the type of match reported (see Section 2.2). The picture changes slightly when we consider the average word counts for the full reports on matches by either England or Germany, as shown in Figure 8.

Figure 8: Overall average word count and according to team playing (AVG = overall average; AVG ENG = average of England match reports; AVG GER = average of Germany match reports; y-axis: word count)

| | GUAR | SUN | BILD | SPON |
|---|---|---|---|---|
| AVG | 4746.5 | 3147.2 | 3059.8 | 2528.0 |
| AVG ENG | 5646 | 3525.5 | 3132.8 | 2054.5 |
| AVG GER | 3847 | 2768.8 | 2986.8 | 3001.4 |

Both GUAR and SPON are more extensive in their coverage of the “home” team (these commentaries comprise approximately one third more words than commentaries of the respective other), while this tendency is less clear for SUN (approximately one quarter more words for England matches) and even slightly reverse for BILD. Thus, despite claims that audiences of new media are “potentially global” (O’Keeffe 2006: 16), this finding indicates some kind of persisting “national allegiances”.
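The size of this “home” effect can be recomputed from the values printed with Figure 8. The following Python sketch reads them in the column order GUAR, SUN, BILD, SPON as printed; the dictionary layout, the `HOME_TEAM` mapping and the function name are assumptions made for illustration. Since a surplus can be expressed relative to either report type, both percentages are given.

```python
# Average word counts per full match report, as printed with Figure 8.
AVG_WORDS = {
    "GUAR": {"ENG": 5646.0, "GER": 3847.0},
    "SUN":  {"ENG": 3525.5, "GER": 2768.8},
    "BILD": {"ENG": 3132.8, "GER": 2986.8},
    "SPON": {"ENG": 2054.5, "GER": 3001.4},
}
# Assumed "home" team of each outlet (English vs. German publication).
HOME_TEAM = {"GUAR": "ENG", "SUN": "ENG", "BILD": "GER", "SPON": "GER"}

def home_surplus(otc: str) -> tuple[float, float]:
    """Home-minus-away word count as a share of (away, home) report length."""
    home = AVG_WORDS[otc][HOME_TEAM[otc]]
    away = next(v for team, v in AVG_WORDS[otc].items() if team != HOME_TEAM[otc])
    diff = home - away
    return diff / away, diff / home

for otc in AVG_WORDS:
    vs_away, vs_home = home_surplus(otc)
    print(f"{otc:5} {vs_away:+.1%} of away length, {vs_home:+.1%} of home length")
```

On the values as printed, the surplus for GUAR and SPON is close to one third of the home-team report length and closer to one half of the away-team report length, so the base of the percentage matters when such figures are compared. BILD comes out slightly negative under both readings.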

Turning to the lexicon and collocations, the analysis above revealed that content and function vocabulary are broadly comparable across languages. Equally, OTCs of all types rely on formulaic language, which could be expected in view of earlier research on football discourse. From a quantitative perspective, however, English OTCs tend to use these combinations more than German OTCs, in particular when referring to the location of the action on the pitch. Other commonalities are, first, the usage of slang terms and informal items typical of football language in general. Second, a comparison of the type-token ratios did not yield any significant differences. Thus, one of the points mentioned above, namely the restricted lexical range of this particular register and that especially OTCs associated with yellow press papers (SUN, BILD) are “simple” as regards lexical content, has to be qualified to a certain extent.

An area where the OTCs clearly diverged along the dimension “intended audience” emerged in the keyness analysis. Both the English and the German OTCs yielded some inner differentiation – the former as to a higher salience of war-related metaphors in SUN, the latter as to a higher salience of dialectal and jargon vocabulary in BILD. Given the quantitative evidence, it is highly unlikely that this is a chance finding. Rather, it may be interpreted as an adaptation of the SUN and BILD commentators to the alleged language use of their intended readership. Whether this adaptation is deliberate or intuitive remains a matter of speculation. Puns on players’ names and creative ad-hoc formations can be found across all OTCs, however.
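The passage above does not state which statistic underlies the keyness analysis. Purely as an illustration, the sketch below assumes the log-likelihood measure that is widely used for keyword comparison in corpus linguistics; the function name and all frequencies are invented for the example and do not come from the study.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood (G2) for one item occurring freq_a times in a
    corpus of size_a tokens and freq_b times in a corpus of size_b."""
    total = freq_a + freq_b
    # Expected frequencies if the item were spread evenly over both corpora.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Invented counts: an item found 20 times in one 1,000-token sample
# and 10 times in another sample of the same size.
print(round(log_likelihood(20, 1000, 10, 1000), 2))
```

The higher the value, the less likely it is that the difference in frequency between the two samples is due to chance.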

Discourse features represent a further area where differences and similarities could be observed. On the one hand, the salience of football- and culture-related intertextual references as identified by Chovanec (2008) for GUAR OTCs could also be traced in the other OTCs considered, thus representing another uniting feature. However, these references are most frequent in GUAR and SPON, suggesting that both the creation of an in-group atmosphere and the often-related entertainment aspect are more important in the quality-press related OTCs. On the other hand, the present study confirmed and extended earlier research positing the staging of orality as a trademark feature of OTCs, showcasing creative manipulation of restrictions of the written medium, while no cultural specificity of this phenomenon can be claimed on the basis of the present data (see Perez-Sabater et al. 2008: 256 for a comparison of English, Spanish and French).

Finally, with regard to the interaction between the textual commentary and other elements of the OTCs, it was evident that all OTCs apart from GUAR rely on an extended range of supplementary features (mainly tabular and iconographic), while GUAR may compensate for this lack of factual information with a more extensive description in the textual commentary. In addition, GUAR and, with qualifications, SPON can be viewed as more “entertaining” or “fan-like”, while SUN and BILD are more factual (although the latter pair uses more jargon). This division reproduces Jucker’s (2010: 69) categorisation of OTCs.

300 Valentin Werner

By way of summary, we can posit that there are indeed many commonalities transcending borders (set by cultural specificity and intended readership), but there is also room for variability both within and across language boundaries. This highlights the flexibility of the register despite the formal constraints of the electronic medium.

5 Summary and conclusion

Above all, OTCs emerged from the analysis as a “webby” genre that has gained prominence within the last decade as an immediate form of online journalism, particularly adequate for live coverage of sports events. Production circumstances were established to be markedly different from those of traditional sports reportage and it was shown that OTCs can be viewed as an amalgamation of different journalistic, or, speaking more broadly, discursive styles (narration, description, opinion, quasi-conversation, etc.; see further Biber and Egbert, this volume). Some OTCs relied on an extended number of paratextual elements and the data suggested a split picture as regards the potential influence of audience participation (both in terms of “web 2.0” applications and via other channels) on the reporting. While two (SUN, BILD) did not take account of readers’ contributions, SPON had a designated paratextual element (the “Live-Fanblock” containing Tweets), where the audience could express their views as some kind of parallel comment, and GUAR covered an intermediate position as comments (usually sent-in mails) were frequently quoted and referred to, albeit in a mediated and filtered form. An overall comparison of OTCs and traditional forms of sports reportage indicated that the former should be identified as a new and specific register. At the same time, this showcased the “interweaving of old and new formats” as posited by O’Keeffe (2006: 27) as one of the general properties of newly emerging registers.

Turning to language-related aspects, the present study first showed by way of a lexical and semantic analysis that OTCs do not fundamentally differ from other types of football reportage in their use of technical vocabulary. Second, the exploration of n-grams revealed the importance of position-related collocations and furthermore of informal and slang vocabulary, while differences between the individual OTCs, especially along the dimension “intended readership”, were clearly evident. In contrast, the consideration of discourse features showed a remarkable overlap between the four OTCs, while intertextual references were found to be most salient in OTCs with “entertainment” as a communicative function (GUAR and SPON). However, there were some instances with limitations posed by the electronic (written) format, in particular as regards the staging of orality prominent in OTCs. While all OTCs shared a similar general structure, GUAR emerged as “the odd one out”. It was the one using the most words but the fewest paratextual elements, one potential explanation being that there the entertainment function is strongest, while the other OTCs provided more factual information, supported through tabular and iconographic elements. This highlighted the need to consider the interaction between format and content and the communicative aim of the individual OTCs as well as the tension between information and entertainment emblematic of modern media discourse (cf. Fairclough 1995: 10).

No definitive answer could be given to the second guiding research question as to whether OTCs can be seen as a cross-cultural register. Rather, OTCs emerged as a highly diversified form of reportage. Formal constraints and the similar structure of the matches reported determined similarity to a certain extent. However, the present analysis revealed (mostly quantitative) diversity and flexibility, both across languages (e.g. as regards length of the coverage of the “home” team) and within them (e.g. as to reliance on informal and slang items). I suggest this is again mainly due to the communicative aim of the individual OTCs and adaptation towards their intended audience.

For future exploration, it would be desirable to obtain a better insight into the receptive dimension,40 for instance in terms of eye-tracking experiments establishing how fast users read the OTC text and which elements (statistics, textual commentary, icons etc.) they focus on. From a linguistic point of view, further areas worth considering in more detail are creative language use (see example (60)) as well as metonymies (see examples (20) and (21) above) and metaphors (see example (61) for a musical metaphor; cf. also Burkhardt 2010; Küster 2010: 32; Lewandowski 2012).

(60) The German fans are ole-ing. (ger_gre_2206_guar)

(61) Martin Olsson setzt sich auf links mit einem tollen Solo gegen Walcott und Johnson durch […] (swe_eng_1506_bild)41

40 This could also include a case study focusing on the linguistic properties and functions of the “twitterese” mentioned above.

41 Translation: Martin Olsson prevails against Walcott and Johnson on the left with a great solo.


While the present study offered a select comparison of German and English OTCs, an analysis including even more OTCs from other languages and intended audiences may help to establish a more fine-grained typology of OTCs worldwide, potentially also considering diachronic developments. In this connection, it remains to be seen whether audience participation, found to be relatively restricted in the present study, will play a more important role in the future and whether further technological developments (e.g. in terms of an integration of TV and OTC reportage) will have an impact on the style of reporting.

References

Bateman, John A. 2012. Multimodal corpus-based approaches. In Carol A. Chapelle (ed.), The encyclopedia of applied linguistics, 3983–3991. Oxford: Wiley-Blackwell.

Biber, Douglas. 1988. Variation across speech and writing. Cambridge: CUP.

Brandt, Wolfgang & Regina Quentin. 1983. Zeitstruktur und Tempusgebrauch in Fussballreportagen des Hörfunks [Temporal structure and tense use in radio football reportage]. Marburg: Elwert.

Burkhardt, Armin. 2010. Abseits, Kipper, Tiqui-Taca: Zur Geschichte der Fußballsprache in Deutschland [Offside, keeper, tiki-taka: The history of football language in Germany]. Der Deutschunterricht 62(3). 2–16.

Chovanec, Jan. 2008. Enacting an imaginary community: Infotainment in on-line minute-by-minute sports commentaries. In Eva Lavric, Gerhard Pisek, Andrew Skinner & Wolfgang Stadler (eds.), The linguistics of football, 255–268. Tübingen: Narr.

Chovanec, Jan. 2009. ‘Call Doc Singh’: Textual structure and coherence in live text sports commentaries. In Olga Dontcheva-Navratilova & Renata Povolná (eds.), Coherence and cohesion in spoken and written discourse, 124–137. Newcastle: Cambridge Scholars.

Chovanec, Jan. 2010. Online discussion and interaction: The case of live text commentary. In Leonard Shedletsky & Joan E. Aitken (eds.), Cases on online discussion and interaction: Experiences and outcomes, 234–251. Hershey: IGI Global.

Chovanec, Jan. 2011. Humor in quasi-conversations: Constructing fun in online sports journalism. In Marta Dynel (ed.), The pragmatics of humour across discourse domains, 243–264. Amsterdam: Benjamins.

Dürscheid, Christa. 1999. Zwischen Mündlichkeit und Schriftlichkeit: Die Kommunikation im Internet [Between speech and writing: Communication on the Internet]. Papiere zur Linguistik 60(1). 17–30.

Fairclough, Norman. 1995. Media discourse. London: Arnold.

Ferguson, Charles A. 1983. Sports announcer talk: Syntactic aspects of register variation. Language in Society 12(2). 153–172.

Gerhardt, Cornelia. 2006. Moving closer to the audience: Watching football on television. Revista Alicantina de Estudios Ingleses 19. 125–148.

Ghadessy, Mohsen. 1988. The language of written sports commentary: Soccer – a description. In Mohsen Ghadessy (ed.), Registers of written English: Situational factors and linguistic features, 17–51. London: Pinter.


Golebiowski, Adam. 2012. Wortverschmelzungen und Sportsprache: Zur Kreativität im Wortbildungsbereich [Blends and the language of sport: Creativity in word formation]. In Janusz Taborek, Artur Tworek & Lech Zielinski (eds.), Sprache und Fußball im Blickpunkt linguistischer Forschung [Language and football in the view of linguistic analysis], 51–61. Hamburg: Kovač.

Grieve, Jack, Douglas Biber, Eric Friginal & Tatjana Nekrasova. 2010. Variation among blogs: A multi-dimensional analysis. In Alexander Mehler, Serge Sharoff & Marina Santini (eds.), Genres on the web: Computational models and empirical studies, 303–322. Dordrecht: Springer.

Hennig, Mathilde. 2000. Tempus und Temporalität in geschriebenen und gesprochenen Texten [Tense and temporality in written and spoken texts]. Tübingen: Niemeyer.

Höke, Susanne. 2007. Sun vs. Bild: Boulevardpresse in Großbritannien und Deutschland [Sun vs. Bild: Yellow press in Great Britain and Germany]. Saarbrücken: VDM.

Jucker, Andreas. 2005. News discourse: Mass media communication from the seventeenth to the twenty-first century. In Janne Skaffari, Matti Peikola, Ruth Carroll, Risto Hiltunen & Brita Warvik (eds.), Opening windows on texts and discourses of the past, 7–21. Amsterdam: Benjamins.

Jucker, Andreas. 2006. Live text commentaries: Read about it while it happens. In Jannis K. Androutsopoulos, Jens Runkehl, Peter Schlobinski & Torsten Siever (eds.), Neuere Entwicklungen in der linguistischen Internetforschung [Recent developments in linguistic internet research], 113–131. Hildesheim: Olms.

Jucker, Andreas. 2010. ‘Audacious, brilliant!! What a strike!’ Live text commentaries on the internet as real-time narratives. In Christian R. Hoffmann (ed.), Narrative revisited: Telling a story in the age of new media, 57–77. Amsterdam: Benjamins.

Krone, Maike. 2005. The language of football: A contrastive study of syntactic and semantic specifics of verb usage in English and German match commentaries. Stuttgart: Ibidem.

Küster, Rainer. 2010. ‘Im Tabellenkeller brennt noch Licht’: Metaphern in der Fußballsprache [At the bottom of the table there’s still some light: Metaphors in football language]. Der Deutschunterricht 62(3). 26–37.

Levin, Magnus. 2008. ‘Hitting the back of the net just before the final whistle’: High-frequency phrases in football reporting. In Eva Lavric, Gerhard Pisek, Andrew Skinner & Wolfgang Stadler (eds.), The linguistics of football, 143–155. Tübingen: Narr.

Lewandowski, Marcin. 2012. Football is not only war: Non-violence conceptual metaphors in English and Polish soccer language. In Janusz Taborek, Artur Tworek & Lech Zielinski (eds.), Sprache und Fußball im Blickpunkt linguistischer Forschung [Language and football in the view of linguistic analysis], 79–96. Hamburg: Kovač.

Müller, Torsten. 2007. Football, language and linguistics: Time-critical utterances in unplanned spoken language, their structures and their relation to non-linguistic situations and events. Tübingen: Narr.

Newsworks. 2013a. The Guardian. http://www.newsworks.org.uk/The-Guardian (accessed 20 April 2013).

Newsworks. 2013b. The Sun. http://www.newsworks.org.uk/The-Sun (accessed 20 April 2013).

O’Keeffe, Anne. 2006. Investigating media discourse. London: Routledge.

Perez-Sabater, Carmen, Gemma Pena-Martinez, Ed Turney & Begona Montero-Fleta. 2008. A spoken genre gets written: Online football commentaries in English, French, and Spanish. Written Communication 25(2). 235–261.


Press Gazette. 2013. UK national newspaper sales: Relatively strong performances from Sun and Mirror. http://www.pressgazette.co.uk/uk-national-newspaper-sales-relatively-strong-performances-sun-and-mirror (accessed 21 May 2013).

Rayson, Paul. 2008. From key words to key semantic domains. International Journal of Corpus Linguistics 13(4). 519–549.

Santini, Marina, Alexander Mehler & Serge Sharoff. 2010. Riding the rough waves of genre on the web: Concepts and research questions. In Alexander Mehler, Serge Sharoff & Marina Santini (eds.), Genres on the web: Computational models and empirical studies, 3–30. Dordrecht: Springer.

Schmidt, Thomas. 2007. The Kicktionary: A multilingual resource of the language of football. In Georg Rehm, Andreas Witt & Lothar Lemnitzer (eds.), Data structures for linguistic resources and applications, 189–196. Tübingen: Narr.

Siever, Torsten. 2011. Texte i. d. Enge: Sprachökonomische Reduktion in stark raumbegrenzten Textsorten [Constricted texts: Language-economical reduction in heavily space-constrained text types]. Frankfurt am Main: Lang.

Simons, Anton. 2011. Journalismus 2.0 [Journalism 2.0]. Konstanz: UVK.

Thurman, Neil & Anna Walters. 2013. Live blogging: Digital journalism’s pivotal platform. Digital Journalism 1(1). 82–101.

Wells, Matt. 2011. How live blogging has transformed journalism: The benefits and the drawbacks of the open-to-all digital format. http://www.guardian.co.uk/media/2011/mar/28/live-blogging-transforms-journalism (accessed 13 April 2013).

Appendix

Associated files: guar = The Guardian; sun = The Sun; bild = Bild; spon = Der Spiegel

Match                   Match day    Commentators (if available)                                       Associated files
Germany – Portugal      09/06/2012   GUAR: N/A; SUN: N/A; BILD: N/A; SPON: Christian Paul              ger_por_0906_xx
Germany – Netherlands   13/06/2012   GUAR: N/A; SUN: N/A; BILD: N/A; SPON: Jan Reschke                 ger_ned_1306_xx
Germany – Denmark       17/06/2012   GUAR: Ian McCourt; SUN: N/A; BILD: N/A; SPON: Mike Glindmeier     ger_den_1706_xx
Germany – Greece        22/06/2012   GUAR: Rob Smyth; SUN: N/A; BILD: N/A; SPON: Lukas Rilke           ger_gre_2206_xx
Germany – Italy         28/06/2012   GUAR: N/A; SUN: N/A; BILD: N/A; SPON: Mike Glindmeier             ger_ita_2806_xx
France – England        11/06/2012   GUAR: Scott Murray; SUN: N/A; BILD: N/A; SPON: Christian Paul     fra_eng_1106_xx
Sweden – England        15/06/2012   GUAR: Jacob Steinberg; SUN: N/A; BILD: N/A; SPON: N/A             swe_eng_1506_xx
Ukraine – England       19/06/2012   GUAR: Barry Glendenning; SUN: N/A; BILD: N/A; SPON: N/A           ukr_eng_1906_xx
Italy – England         24/06/2012   GUAR: N/A; SUN: N/A; BILD: N/A; SPON: Mike Glindmeier             ita_eng_2406_xx

Javier Pérez-Guerra

Word order is in order here: A diachronic register analysis of syntactic markedness in English

Abstract: In line with multidimensional proposals under which registers can be stylistically and/or situationally defined by paying attention to the frequency of a selection of linguistic features, this study explores the connection between syntactic markedness at the level of the clause and stylistic characterisation in a number of registers in the history of English. In particular, this chapter investigates three syntactic constructions leading to syntactically marked clausal designs which do not conform to subject-verb-complement word order: left dislocation, topicalisation and subject-inversion/extraposition. The data, retrieved from multi-register parsed corpora, show that the distribution of these constructions correlates with the degree of stylistic specificity and conventionalisation of the registers. In particular, those registers in which these constructions are particularly frequent feature more specific situational or stylistic choices related to literacy or subject-/participant-involvement. As a matter of fact, out of the three constructions, topicalisation has proved to have less radical consequences for the syntax of the clause, and this correlates with its even distribution across registers.

1 Introduction1

The linguistic analysis of registers/genres/text types in a language has always been controversial, possibly because of the intangible status of such key concepts (see Schubert, this volume). As Swales (1990: 33) points out when he refers specifically to genres, “[t]he word [‘genre’] is highly attractive – even to the Parisian timbre of its normal pronunciation – but extremely slippery”. A first terminological remark seems thus in order here as regards the definition of ‘register’, which constitutes the research topic in this study. Following, for example, Taavitsainen (2001), who maintains that genres are based on “external evidence in the context of culture” (140; my italics), where “external evidence” refers to the conventions that have become institutionalised “so that they can function […] as ‘horizons of expectation’ for readers to know what to expect and models of writing for authors” (141), I will use ‘genre’ when I refer exclusively to the cultural and/or social dimension of a given textual category. ‘Register’ will be used here with a focus on the way in which the internal linguistic features of texts are codified in a given text or category of texts, which matches Taavitsainen’s (2001: 141) term ‘text type’. Even though text types and genres commonly go hand in hand since the linguistic characterisation of a textual category prototypically leads to the latter’s conventionalisation and specialisation in fulfilling a certain discursive, communicative or social function, Taavitsainen herself recalls Fairclough’s (1992: 126) claim that a “genre [on occasions] implies not only a particular text type, but also particular processes of producing, distributing and consuming texts”, which broadens the notion of genre and covers elements which lie beyond the scope of this chapter.

1 I am grateful to the following institutions for generous financial support: the Spanish Ministry of Economy and Competitiveness and the European Regional Development Fund (grant no. FFI2013-44065-P), and the Autonomous Government of Galicia (grant no. GPC2014/060).

Javier Pérez-Guerra, University of Vigo

Such lack of definition of concepts such as register, genre or text type has led to multi-faceted studies in this area, adopting a number of different theoretical frameworks. On some occasions, linguists have addressed the linguistic analysis of registers by focusing on the core or prototypical communicative purposes attributed to these in (quite often traditional) stylistics. For example, Swales (1990: 46) notes that “[t]he principal criterion that turns a collection of communicative events into a [register] is some shared set of communicative purposes”. In Halliday’s (1978: 122) Systemic Functional Grammar, registers (genres, in their terminology) are analysed in terms of three variables: their content (or ‘field’), the participants (‘tenor’) and the channel of communication (‘mode’), that is, three dimensions which focus on the communicative elements and purposes involved in a given register. On other occasions, in an approach that will be used in the present chapter, the study of registers has been addressed through focusing on empirically-observable stylometric features (e.g. type-token ratios, length of syllables, words, sentences, paragraphs) which are themselves said to reflect higher-level concepts such as lexical or syntactic complexity, lexical richness and ornamentation, etc. In Biber and Conrad (2009) the two basic approaches just summarised, which I refer to, respectively, as the ‘communicative’ and the ‘language-based’ views, are embodied in a taxonomy which identifies three perspectives on text varieties (see, for a brief overview, their Table 1.1): (i) style, which analyses aesthetic and authorial preferences in a given text or group of texts; (ii) genre, which focuses on the conventional linguistic devices specific to a text variety (e.g. ‘genre markers’ such as Dear Sir in a letter); and (iii) register, which, as already pointed out, deals with the linguistic characteristics common within a text variety ‒ and also with the situation of use of the variety as will be argued later. The taxonomy is described in more detail in Dorgeloh (this volume; Section 2 in particular) and Schubert (this volume).

So far I have equated register with the language-based characterisation of a given textual category. In this scenario, a further dimension of register must be brought into play. In line with previous proposals couched in the multidimensional tradition, Biber and Conrad (2009: 6) claim that the linguistic characteristics of the textual categories, materialised by means of pervasive and frequent linguistic features, are “well suited to the purposes and situational context of the register”. That said, this chapter adheres to such a two-fold view of text varieties, that is, both language-based and situational, and, within a register-centred approach (as suggested in, for example, Biber 1995a: 1), focuses on the study of a number of texts in an attempt to explore register variation over the course of the history of English. On the one hand, I will describe a number of textual categories by exploring their dependency on a list of structural features, thus adhering to what is commonly understood by ‘text type’, that is, “grouping of texts that are similar in their linguistic form” (Biber 1988: 170) or, in other words, codifications of linguistic features (Taavitsainen 2001: 141). On the other hand, I will connect the language-based characteristics of the texts with their situational interpretation, thus accepting, for example, Virtanen’s (2010: 57) claim that such linguistic features “clearly relate to the form that [discourse functions] will take through aggregates of linguistic exponents of the particular text strategies that are associated with them”. The situational interpretation (better said, the functional interpretation) of the linguistic characteristics of a given text type will lead to the latter’s status as a ‘register’, in Biber’s terminology. This approach departs from, for example, Dorgeloh and Wanner’s (2010: 10) terminological account, summarised in Figure 1, where ‘register’ is used as a cover term for text type, genre and style, and sticks to a twofold characterisation of register which mainly comprises both text type and genre in Dorgeloh and Wanner’s sense.


Figure 1: Register, text type, genre and style in Dorgeloh and Wanner (2010)

This chapter will focus on register variation and, more specifically, on the relevance of syntax for this issue. In this respect, Dorgeloh and Wanner (2010) observe that register is “language variation beyond the limits of semantic equivalence, which is why syntax […] provides a promising area of study” (8) and that “[i]t is form, and here morphosyntactic form in particular, that constitutes ‘a prior condition for reasoning about [register]’” (9). In this scenario, under the philosophy of Biber’s (1988, 1995a) groundbreaking multifactorial multidimensional model, this study will combine the main approaches to the analysis of registers already mentioned, that is, communicative and more language-based (syntactic) standpoints, in that findings from the latter will be associated with a corresponding functional interpretation (or dimensional interpretation, as Biber puts it). In other words, by investigating the spread of a number of objectively identified linguistic constructions in a selection of registers, and by interpreting the statistical results of (co-)occurrence, this study will not only shed some light on the functional interpretation of registers but also detect diachronic variation across them. Furthermore, this chapter will suggest some kind of link between syntactic markedness and the degree of (functional) conventionalisation or specialisation of registers.

This paper, then, focuses on the analysis of registers in English while also describing variation in the recent history of the language. It also aims to consider the application of some of the assumptions of Biber’s model to syntactic strategies at a supra-phrasal level. In Section 2, I will very briefly summarise the features of the multidimensional model which constitutes the inspiration for the study, and then present the case study itself and its specific methodology. The results are discussed in Section 3. Section 4 offers a summary of the investigation plus some suggestions for further avenues for research.


2 The case study

Biber’s model, which has inspired this study, is based on three theoretical assumptions, summarised in Schubert (this volume) and recapped here only for introductory purposes: (i) the distinctive characteristics of a register are derived from inherent tendencies affecting the statistical productivity of a number of linguistic features; (ii) the patterns of these (co-)occurring features portray underlying dimensions of variation on which texts differ significantly; and (iii) these dimensions can be interpreted in terms of the social, situational and text-functional roles that their constitutive features have been found to play in previous research. As summarised in Biber (1995b), the sixty-seven features used in the first applications of the model belonged to different fields of linguistic analysis: syntactic (causal subordination, coordination, deletion of complementiser that, wh subject relativisers, pied-piped prepositions, stranded prepositions, participial adverbial clauses), grammatical (morphosyntactic categories such as nouns, adjectives, prepositions, demonstratives) and lexical categories (hedges, amplifiers, emphatics), as well as other metrics such as word length and type/token ratio. As noted above, the factors or families of features lead to the dimensions which are interpreted situationally.

Biber and Conrad (2009: 51) established the pillars of the methodology: the need for a comparative approach, for quantitative analysis and for a representative sample. First, as regards the comparative approach, this study investigates three syntactic constructions, described in Section 2.1, by assessing two variables which will allow for comparison and contrast: diachrony and register. Second, the need for quantitative analysis has been accomplished by the empirical methodology described in Section 2.2. Third, the whole survey is driven by data retrieved from multi-register balanced corpora, as a means of attaining empirical representativeness and significance.

2.1 The linguistic variables

The study outlined in this chapter reports on a construction-driven analysis of historical registers in English by looking at supra-phrasal variables or features which have not thus far been explored in the literature. As pointed out in Section 1, early studies by Biber and his colleagues – and practically all subsequent studies derived from these – are based on counts of lexical features. In fact, even those syntactic features which operate at the clause- or sentence-level were singled out by computing the frequency of lexical items such as complementiser that, specific (causal) conjunctions, members of the closed set of English prepositions, relativisers which, who, etc. In this chapter, and this makes this study particularly innovative, I concentrate on syntactic supra-phrasal variables, specifically word-order phenomena, which cannot be determined by focusing on the occurrence ratios of specific lexical elements. Following the multidimensional model, these will be given so-called social or functional interpretations which will pave the way for the detection of diachronic variation in English as far as sentence linearisation is concerned.

As regards the variables to be analysed here, I have focused on syntactic markedness at the level of the clause. From (at least) a statistical standpoint, the default organisational schema of a declarative clause in English is subject-verb-(complement), this being the most versatile design of the clause from the point of view of information structure and processing. Deviation from such a schema implies some degree of markedness. In particular, in what follows I will focus on three syntactic strategies which, first, lead to marked designs as far as word order is concerned and, second, involve elements other than the subjects in sentence-initial position. Since this methodology aims to determine not only strictly linguistic but also social or situational variation in the language, I will follow Virtanen (2004: 12) in her claim that “the sentence-initial slot itself constitutes a rich source of discourse meanings precisely because of its cognitive relevance for our processing capacities and memory constraints”. The three constructions are:

(i) Topicalisation (TOP), in which a (marked) constituent is in sentence-initial position ‒ example (1) below illustrates the topicalisation of the that-clause object that I had received such from Edward,

(ii) Left dislocation (LFD), with a (marked) non-argument constituent in sentence-initial position ‒ in (2), the constituent he that thynkethe it a harde thynge to agre to the conclusion is a left-dislocated noun phrase which corefers with the pronominal object hym in the ensuing main clause,

(iii) What I call other ‘subject-last’ strategies (SUBJ-LAST), which contain (marked) non-subject constituents in sentence-initial pre-verbal position. The SUBJ-LAST strategy comprises basically those examples of subject-verb inversion and subject-extraposition ‒ example (3) below illustrates subject-verb inversion, with the subject complement very great in sentence-initial position and the subject following the verb; example (4), in which the that-clause that for x. yeres then next folowyng sevãll Comyssions of Sewers shuld be made to dyv~s p~sones functions as the (logical) subject of the sentence and occurs in sentence-final position, involving the insertion of expletive it in sentence-initial (preverbal) position, exemplifies subject-extraposition.2

(1) [That I had received such from Edward]i also I need not mention ∆i (Austen-180X,187.621) [TOP]

(2) […] but [he that thynkethe it a harde thynge to agre to the conclusion,]i it behoueth hymi to shew eyther that some false thynge hath gone before, (BOETHCO-E1-H,99.610) [LFD]

(3) […] and very great was [my pleasure in going over the house and grounds]Subject. (Austen-180X,168.182) [SUBJ-LAST, subject inversion]

(4) yt was enacted ordeigned and graunted by auctorite of the same p~liament, [that for x. yeres then next folowyng sevãll Comyssions of Sewers shuld be made to dyv~s p~sones]Subject, (Statutes(II):524) [SUBJ-LAST, subject extraposition]

As already pointed out, the strategies TOP, LFD and the so-called SUBJ-LAST constructions investigated here have been chosen because they are syntactically marked since they do not comply with the default subject-verb(-complement) design. In particular, their markedness is basically due to the location occupied by the subjects, which are not clause-initial when constituents are topicalised (TOP) or left-dislocated (LFD), when verbs and subjects swap positions (subject-verb inversion, a type of SUBJ-LAST) or when the subjects are placed in clause-final position (subject-extraposition, another instance of the SUBJ-LAST construction). Since subject placement is the trigger for these constructions, in line with the above-mentioned consequences which the unmarked placement of the subject has for the processing and interpretation of clauses and sentences, in what follows I will provide a very brief overview of the informative and/or communicative properties of the strategies TOP, LFD and SUBJ-LAST.

First, TOP merits attention in register analysis because this syntactic strategy involves a specific arrangement of the clause which is not only syntactic but also informative. Following Virtanen (2004: 80–82) [my italics],

Starting points are assumed to be light, small in size, and consist of given information. The reader’s main inferencing effort is expected to take place later in the sentence […]. Secondly, elements placed at the outset of a sentence also help readers anticipate what is to come as they pinpoint what the sentence is about and how it relates to the discourse topic (…). Furthermore, it is occasionally profitable to start with what is regarded as ‘crucial information’ […] Sentence-initial adverbials […] tend to form chains of text-strategic markers which have two basic functions in the discourse. They help create coherence and at the same time they signal text segmentation.

2 TOP, LFD and the constructions within the frame of the SUBJ-LAST strategy have been approached from different perspectives in, for example, Virtanen’s (2010) qualitative scrutiny of sentence openers in narrative texts and in Kreyer’s (2010) paper on sentence-initial locatives in inversion constructions, in which a qualitative perspective on the description of the so-called ‘immediate-observer effect’ function is adopted.

Virtanen thus summarises the informative function of TOP, that is, introducing constituents which do not convey given information in a position which is reserved for given elements according to the given-new principle. This analysis of TOP is in keeping with Prince (1981: 128), who highlights the salient status of topicalised constituents. Prince claims that TOP implies “inference on the part of the hearer that the entity represented by the initial NP stands in a salient partially-ordered set relation to some entity or entities already evoked in the discourse-model”. Furthermore, she contends that “if the entity evoked by the leftmost NP represents an element of some salient set, make the set-membership explicit”.

Second, the discourse functions which have been attributed to LFD in the literature can be reduced to two: (i) a ‘simplifying’ function, according to which a constituent conveying discourse-new information can be placed in sentence-initial position, and (ii) a ‘poset’ function. As for the simplifying function, Prince (1997: 138–139) contends that LFD can “simplify discourse processing by removing a Discourse-new entity from a position in the clause which favors Discourse-old entities, replacing it with a Discourse-old entity (i.e. a pronoun)”. In the same vein, Gundel (1985) and Geluykens (1992) claim that LFD introduces a new topic into discourse. On the other hand, Prince (1997: 138–139) maintains that sentences containing left-dislocated phrases “trigger an inference that the entity represented by the initial NP stands in a salient partially-ordered set relation to some entities already in the discourse-model”, and that this favours the so-called poset function. In other words, the left-dislocated constituent resumes a number of referents previously evoked in the sentence by introducing a new expression which activates earlier (thus, informatively given or old) referents. In short, like TOP, LFD implies the placement of a new constituent in sentence-initial position, the main difference between TOP and LFD being that the former selects an extralinguistic referent already evoked in discourse and marks it as informatively salient, whereas LFD constituents seldom refer to topics which have already been introduced in the discourse.

Third, as already stated, the SUBJ-LAST constructions involve examples of subject-verb inversion and subject-extraposition, illustrated, respectively, in (3) and (4) above. As regards subject-verb inversion, it is commonly acknowledged in the literature (e.g. Green 1980: 583; Birner 1994: 241; Dorgeloh 1997: 46) that the informative principle given-new is not at work in subject-verb inversion, since the preverbal constituent conveys information which is salient in the discourse, whereas the subject is informatively anti-prominent or, in other words, materialises referents which have already been evoked. In fact, Takahashi (1992: 138) contends that subject-verb inversion fulfils a “Subtopically-Presentational-Focus-emphasizing function”, that is, it accommodates (discourse-new) presentational constituents in sentence-initial position and relegates to sentence-final or postverbal position discourse-given grammatical subjects. Bolinger (1992: 294) emphasises the focusing or presentational effect of inversion when he says that it locates the informatively non-prominent subject almost physically ‘on-stage’.

The second SUBJ-LAST construction considered in this chapter is subject-extraposition. Its function is claimed to be different from that of subject-verb inversion (see, for instance, McCawley 1988), since, as a newness device, subject-extraposition accommodates informatively new subjects in final position, thus keeping track of given-new. However, the empirical analysis of extraposed subjects from Late Middle to Present-Day English in Pérez-Guerra (2005: 349–350) shows that information structure is not a decisive factor in explaining subject-extraposition since 60 to 70 percent of the extraposed subjects in this study are informatively referring and the information conveyed by sentence-medial constituents (mostly subject complements) in the examples of subject-extraposition is less referring in nature than that carried by the extraposed subjects.3 In consequence, it can be concluded that both subject-verb inversion and subject-extraposition are mostly new-given constructions and can be subsumed under SUBJ-LAST in the present approach.

This section has provided a basic characterisation of LFD, TOP and SUBJ-LAST in terms of information structure. The syntactically marked organisation of these constructions correlates with their deviation from entrenched informative rules such as given-new. In short, informatively new and/or salient constituents are placed sentence-initially in LFD, TOP and SUBJ-LAST structures, where one would expect elements conveying given information, and informatively given subjects are preferred in postverbal and/or final position in the SUBJ-LAST construction type.

3 The data in Pérez-Guerra (2005: 350) confirm that the determinant of subject-extraposition is not end-focus but end-weight. The strategy of extraposition is, then, redistributional in the sense that its main role is to place long clausal subjects in final position and thus preserve the unmarked subject-verb(-complement) pattern from having non-prototypical material in sentence-initial position.


2.2 The data and the methodology

The data for the present study were retrieved from the following corpora:
– the Penn-Helsinki Parsed Corpus of Middle English, second edition (1150–1500; henceforth PPCME2; Kroch and Taylor 2000),
– the Penn-Helsinki Parsed Corpus of Early Modern English (1500–1710; PPCEME; Kroch et al. 2004),
– the Penn Parsed Corpus of Modern British English (1700–1914; PPCMBE; Kroch et al. 2010).

The periods to be investigated are Middle (ME), Early Modern (EModE) and Late Modern English (LModE), that is, the periods following the initiation of the process of word-order syntacticisation or fixation in English around the default pattern subject-verb(-complement) in declarative clauses. These corpora were selected because, first, they are multi-register and, as noted above, this accommodates the need for representativeness. Second, they are parsed corpora following (almost) identical parsing conventions. These make use of part-of-speech and syntactic tagsets based on what we might call a shallow version of Principles-and-Parameters. To give an example from the corpora, (5’) plots the graphical adaptation of the parsed version of the sentence in (5) from PPCMBE:

(5) a serious cheerfulness; that is the right mood in this as in all cases. (CARLYLE-1835,2,278.374)

(5’) ( (1 IP-MAT (2 NP-LFD (3 D a) (5 ADJ serious) (7 N cheerfulness)) (9 , ;) (11 NP-SBJ-RSP (12 D that)) (14 BEP is) (16 NP-OB1 (17 D the) (19 ADJ right) (21 N mood)) (23 PP (24 P in) (26 NP (27 D this) (29 PP (30 P as) (32 PP (33 P in) (35 NP (36 Q all) (38 NS cases)))))) (40 . .))

(5’) includes part-of-speech tagging (e.g. lexical morphosyntactic categories such as D(eterminer), ADJ(ective), N(oun) or P(reposition)) and syntactic annotation (e.g. phrasal categories such as IP for Inf(lection) phrase ‒ basically corresponding in the Principles-and-Parameters model to the category clause ‒, NP for noun phrase and PP for prepositional phrase, as well as functional labels such as OB1 for object, LFD for left-dislocated constituent and RSP for resumptive, that is, the proform which corefers in the clause with the left-dislocated material).

LFD is parsed as such in the corpora, which means that the data can be retrieved automatically by means of specific software. In this case, the raw empirical results of the search had to undergo extensive manual revision. Thus LFD was retrieved by means of the (CorpusSearch) query in (6), which identifies clauses (or IPs) dominating left-dislocated constituents.

(6) node: IP*
    query: (IP* Doms *-LFD)
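The logic behind the query in (6) can be illustrated outside CorpusSearch with a minimal Python sketch. It is not the software used in this study: the function names are hypothetical, the bracketing is a simplified version of (5’) with the node numbers omitted, and the shell-style wildcards of `fnmatchcase` only approximate the pattern matching of the corpus tool. The sketch looks for any clause labelled IP* which dominates, at any depth, a node labelled *-LFD.

```python
import re
from fnmatch import fnmatchcase

# Simplified bracketing of (5'); node numbers are omitted (an assumption for readability).
EXAMPLE = (
    "(IP-MAT (NP-LFD (D a) (ADJ serious) (N cheerfulness)) (, ;) "
    "(NP-SBJ-RSP (D that)) (BEP is) (NP-OB1 (D the) (ADJ right) (N mood)) "
    "(PP (P in) (NP (D this) (PP (P as) (PP (P in) (NP (Q all) (NS cases)))))) (. .))"
)

def parse(text):
    """Turn one bracketed tree into nested (label, children) tuples; words stay strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip the opening bracket
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1                      # skip the closing bracket
        return (label, children)

    return node()

def descendants(tree):
    """Yield every labelled node below the given node, at any depth."""
    for child in tree[1]:
        if isinstance(child, tuple):
            yield child
            yield from descendants(child)

def words(tree):
    """Collect the words under a node, from left to right."""
    out = []
    for child in tree[1]:
        out.extend(words(child) if isinstance(child, tuple) else [child])
    return out

def clauses_dominating(tree, clause_pattern, target_pattern):
    """Return the nodes matching clause_pattern that dominate a node matching target_pattern."""
    candidates = [tree, *descendants(tree)]
    return [n for n in candidates
            if fnmatchcase(n[0], clause_pattern)
            and any(fnmatchcase(d[0], target_pattern) for d in descendants(n))]

tree = parse(EXAMPLE)
hits = clauses_dominating(tree, "IP*", "*-LFD")
dislocated = [d for d in descendants(tree) if fnmatchcase(d[0], "*-LFD")]
print([h[0] for h in hits], words(dislocated[0]))
```

Matching on the label suffix rather than on the phrasal category is what allows non-nominal left-dislocated constituents to be retrieved as well, as long as they carry the LFD label.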

A very small number of examples of LFD in my database are not nominal,4 as is the case in (7) below, which contains a left-dislocated prepositional phrase and a resumptive pronoun governed by a preposition in the main clause:

(7) But of the tree of the knowledge of good and euill, thou shalt not eate of it: (AUTHOLD-E2-H,II,1G.155)

By contrast, many of the examples parsed as LFD in the corpora which contain non-(pro)nominal resumptives have not been considered in this study. Examples of such constructions are given in (8) to (10), in which the resumptives are, respectively, then, yet and so:

(8) […] but if it worke vpon it selfe, as the Spider worketh his webbe, then it is endlesse, (BACON-E2-H,1,20R.49)

(9) […] and though he suffer’d only the name of a slave, and had nothing of the toil and labour of one, yet that was sufficient to render him uneasy; (BEHN-E3-H,193.231)

(10) And as these Languages ought to be well understood, so they shou’d be learn’d in as short a Time as may be. (ANON-1711,3.6)

As regards TOP, which was not specifically tagged in the corpora used here, the CorpusSearch queries in (11) and (12) were used to retrieve examples, respectively, of topicalised complements (more specifically, nominal objects, subject predicatives5 and prepositional/adverbial complements6 occurring before nominal subjects) and adjuncts (prepositional and adverb phrases preceding nominal subjects). As already pointed out, some of the examples retrieved by the queries had to be excluded manually, since they were not correct instantiations of TOP.

4 A few examples from the database contain TOP and LFD of that-clauses. As regards LFD, since such that-clauses are resumed by a (pro)nominal copy, they fit the concept of LFD as established in this study. An example of a left-dislocated that-clause is given in (i):
(i) [That false Locks as they call them of some Hair, being by curling or otherwise brought to a certain degree of driness, or of stiffness, will be attracted by the flesh of some persons, or seem to apply themselves to it, as Hair is wont to do to Amber or Jet excited by rubbing.]i Of thisi I had a Proof in such Locks worn by two very Fair Ladies that you know. (BOYLE-E3-H,27E.93)

(11) node: IP-MAT*
     query: (IP-MAT* iDoms NP-OB*|NP-SPR)
     AND (IP-MAT* iDoms NP-SBJ*)
     AND (NP-OB*|NP-SPR precedes NP-SBJ*)

(12) node: IP-MAT*
     query: (IP-MAT* iDoms PP*|ADVP*)
     AND (IP-MAT* iDoms NP-SBJ*)
     AND (PP*|ADVP* precedes NP-SBJ*)
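The condition shared by (11) and (12) is that a fronted category and a subject are both immediate daughters of the matrix clause and that the former linearly precedes the latter. The following Python sketch restates that condition over a flat list of daughter labels. The function names and the three label sequences are hypothetical illustrations, not corpus material, and the wildcard matching is only an approximation of the corpus tool.

```python
from fnmatch import fnmatchcase

COMPLEMENTS = ["NP-OB*", "NP-SPR"]   # fronted categories targeted in (11)
ADJUNCTS = ["PP*", "ADVP*"]          # fronted categories targeted in (12)

def matches_any(label, patterns):
    """True if the label matches at least one wildcard pattern."""
    return any(fnmatchcase(label, p) for p in patterns)

def fronted_before_subject(daughters, fronted, subject="NP-SBJ*"):
    """True if some daughter matching `fronted` precedes a subject daughter.

    `daughters` holds the labels of the immediate daughters of one matrix
    clause, in linear order. Checking from the first fronted daughter is
    enough: if any fronted daughter precedes a subject, the first one does.
    """
    for i, label in enumerate(daughters):
        if matches_any(label, fronted):
            return any(fnmatchcase(later, subject) for later in daughters[i + 1:])
    return False

# Hypothetical daughter sequences, for illustration only.
object_first = ["NP-OB1", "NP-SBJ", "MD", "NEG", "VB"]
canonical = ["NP-SBJ", "VBD", "NP-OB1"]
adjunct_first = ["PP", "NP-SBJ", "VBP", "NP-OB1"]

print(fronted_before_subject(object_first, COMPLEMENTS))   # object before subject
print(fronted_before_subject(canonical, COMPLEMENTS))      # default order
print(fronted_before_subject(adjunct_first, ADJUNCTS))     # adjunct before subject
```

A purely positional test of this kind overgenerates, which is consistent with the need to exclude some of the retrieved examples manually.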

Finally, with respect to SUBJ-LAST, the CorpusSearch query in (13) retrieved matrix IPs or clauses containing at least the following two immediate constituents: sentence-final noun phrases functioning as subjects and pronominal (expletive) subjects.

(13) node: IP-MAT*
     query: (IP-MAT* iDomsLast NP-SBJ)
     AND (NP-SBJ iDoms !PRO)

Table 1 provides the raw figures of the distribution of the three constructions under analysis (the TOP data in Table 1 include only topicalised complements, for reasons which will be explained below). Figure 2 sets out the frequencies for LModE normalised to 1,000 clauses (or IPs):

5 An (archaic) illustration of a clause introduced by a topicalised object predicative is Male and female created he them (ERV-OLD-1885,1,20G.66).

6 My database includes only a small number of examples of topicalised prepositional complements (in (i)) and adverbial complements (in (ii)):
(i) To them may be applied what St. James says on a like occasion (BURTON-1762,2,5.116)
(ii) In the inward Frame the various Passions, Appetites, Affections, stand in different Respects to each other. (BUTLER-1726,235.69)


Table 1: Totals of LFD, TOP and SUBJ-LAST constructions from ME to LModE

          LFD      TOP      SUBJ-LAST   Clauses
PPCME2    1,638    1,878    2,989       74,092
PPCEME    575      359      611         34,896
PPCMBE    369      352      677         60,100
Total     2,582    2,589    4,277       169,088
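As an illustration of the normalisation to 1,000 clauses, the sketch below divides the raw totals in Table 1 by the clause counts of each corpus. It is a worked example only: the dictionary layout and the function name are hypothetical, and the rates obtained are pooled per corpus, so they need not coincide with the means in Tables 2 to 4, which average the rates of the individual registers.

```python
# Raw totals and clause counts copied from Table 1.
TABLE_1 = {
    "PPCME2": {"LFD": 1638, "TOP": 1878, "SUBJ-LAST": 2989, "clauses": 74092},
    "PPCEME": {"LFD": 575, "TOP": 359, "SUBJ-LAST": 611, "clauses": 34896},
    "PPCMBE": {"LFD": 369, "TOP": 352, "SUBJ-LAST": 677, "clauses": 60100},
}
CONSTRUCTIONS = ("LFD", "TOP", "SUBJ-LAST")

def per_thousand(hits, clauses):
    """Normalise a raw count to a rate per 1,000 clauses (IPs)."""
    return 1000 * hits / clauses

# Pooled rate of each construction in each corpus.
rates = {
    corpus: {c: per_thousand(row[c], row["clauses"]) for c in CONSTRUCTIONS}
    for corpus, row in TABLE_1.items()
}

# Column totals, as a consistency check against the last row of Table 1.
totals = {c: sum(row[c] for row in TABLE_1.values()) for c in CONSTRUCTIONS}

for corpus, row in rates.items():
    print(corpus, {c: round(v, 2) for c, v in row.items()})
print(totals)
```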

Figure 2: Normalised frequencies of LFD, TOP and SUBJ-LAST constructions in LModE

Since, as Figure 2 shows, the frequencies of topicalised adjuncts (TOP_adj), as in (14) below, and of complements (TOP_compl), in (1) above, differ greatly, I have opted for focusing exclusively on topicalised complements, whose proportion is closer and thus comparable to that of the LFD and the SUBJ-LAST constructions. In this vein, since the criterion for the distinction between complement and adjunct is syntactic (and semantic) selection by the verb, in what follows I will consider only those examples of topicalised constituents which are subcategorised by the verb (e.g. objects, prepositional complements, adverbial complements, predicative complements).

(14) [After that a childe is come to seuen yeres of age,]Adjunct I holde it expedient that he be taken from the company of women (ELYOT-E1-H,23.27)


The proportions of LFD, TOP and SUBJ-LAST were analysed in all the registers in the corpora, namely Biography, Diary, Drama, Education, Fiction, Handbook, History, Law, Letters, Philosophy, Science, Sermon, Religious treatises, Travelogue, Trials and Romance. Due to their archaic style and clausal organisation, I did not include Bible texts. Also, given that comparison with other Fiction texts in the later periods is impracticable, the Fiction material in ME was not analysed. Following Culpeper and Kytö’s (2010: 16–18) typology of registers, those listed above can be argued to provide an overall view of the English language in its recent history: (i) writing-related registers such as Science, Law, Education, Religious treatises, that is, registers which are primarily attested in the written form; (ii) speech-purposed registers, designed to be articulated orally (either read out or performed), like Drama and Sermons; (iii) speech-like texts in the Diaries, Letters and Biographies, which contain features of “communicative immediacy” (Culpeper and Kytö 2010: 17); and (iv) speech-based registers, based on actual real-life speech events, here illustrated by the Trials.

The normalised frequencies of the three constructions in all the registers are plotted respectively in Tables 2, 3 and 4.

Table 2: Normalised frequencies (/1,000 IPs) of LFD, TOP and SUBJ-LAST in ME

                   LFD      TOP      SUBJ-LAST
Biography          15.11    34.62    41.93
Handbook           16.65    11.10    19.26
History            9.25     13.26    36.30
Law                30.70    38.28    17.54
Philosophy         44.18    16.83    21.71
Religious treat.   29.32    31.95    35.12
Romance            5.76     17.10    109.40
Sermons            29.05    31.57    23.38
Travelogue         14.79    15.09    72.72
Mean               21.65    23.31    41.93


Table 3: Normalised frequencies (/1,000 IPs) of LFD, TOP and SUBJ-LAST in EModE

              LFD      TOP      SUBJ-LAST
Biography     30.33    13.31    19.22
Diary         4.07     5.63     25.42
Drama         4.18     8.12     19.77
Education     24.29    10.05    9.12
Fiction       8.77     10.96    59.90
Handbook      33.44    7.17     5.30
History       19.55    17.46    21.03
Law           91.65    8.15     3.64
Letters       14.65    8.99     4.50
Philosophy    26.20    16.16    3.50
Science       41.50    9.94     16.89
Sermon        32.95    6.71     18.17
Travelogue    6.47     5.55     29.29
Trials        6.28     4.63     12.42
Mean          24.59    9.49     17.73

Table 4: Normalised frequencies (/1,000 IPs) of LFD, TOP and SUBJ-LAST in LModE

              LFD      TOP      SUBJ-LAST
Biography     4.34     3.67     5.67
Diary         1.61     5.37     5.55
Drama         1.84     2.61     15.96
Education     8.97     5.12     7.04
Fiction       5.99     5.99     54.17
Handbook      6.87     5.62     5.31
History       5.29     4.70     15.88
Law           11.94    3.47     13.86
Letters       2.42     3.70     4.12
Philosophy    16.27    15.18    5.42
Science       3.49     2.33     3.72
Sermon        29.07    10.65    13.10
Travelogue    1.13     4.75     9.73
Trials        1.85     4.51     0.21
Mean          7.22     5.55     11.41
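The Mean rows in Tables 2 to 4 can be reproduced as the unweighted average of the register rates, and the same figure can serve as a dividing line between registers with lower and higher proportions of a construction. The sketch below does this for the LFD column of Table 4. The function name is hypothetical, and the split is a mechanical illustration only: it returns every register on either side of the mean, without any further selection of the most representative registers.

```python
from statistics import mean

# LFD rates per 1,000 IPs in LModE, copied from Table 4.
LMODE_LFD = {
    "Biography": 4.34, "Diary": 1.61, "Drama": 1.84, "Education": 8.97,
    "Fiction": 5.99, "Handbook": 6.87, "History": 5.29, "Law": 11.94,
    "Letters": 2.42, "Philosophy": 16.27, "Science": 3.49, "Sermon": 29.07,
    "Travelogue": 1.13, "Trials": 1.85,
}

def split_at_mean(freqs):
    """Return the unweighted mean and the registers below / at or above it, in ascending order."""
    m = mean(freqs.values())
    lower = sorted((r for r, f in freqs.items() if f < m), key=freqs.get)
    higher = sorted((r for r, f in freqs.items() if f >= m), key=freqs.get)
    return m, lower, higher

m, lower, higher = split_at_mean(LMODE_LFD)
print(round(m, 2))
print("lower:", lower)
print("higher:", higher)
```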

With a view to determining the statistical role of each construction in the periods under investigation, Figure 3 below displays the frequencies of the three constructions and reveals that, in line with the syntacticisation of subject-verb(-complement) word order in English, they all decrease considerably over time. More specifically, whereas LFD accounted for approximately 20 to 25 examples (per 1,000 IPs) in ME and EModE, its normalised frequency is 7 clauses in LModE. As regards TOP, around 23 clauses per 1,000 contained topicalised complements in ME, this normalised frequency being slightly higher than 10 in LModE. Finally, sentence-final subjects are also rare in LModE, when approximately 13 clauses (per 1,000 IPs) belong to the SUBJ-LAST construction type, and this was the preferred pattern at a normalised frequency of 42 in ME. These proportions evince the statistically marked condition of the three syntactic strategies and thus their potential status as markers of other functional or situational roles. I will return to the connection between markedness and situational delimitation in Section 3.

Figure 3: Frequencies of LFD, TOP and SUBJ-LAST in ME (PPCME2), EModE (PPCEME) and LModE (PPCMBE)

3 Analysis of the data

In this section I employ what Biber (2013) would call both a ‘linguistic variationist’ approach, in which the register itself is taken as a variable, and a ‘text-linguistic’ perspective, according to which the registers or the texts are the research objects. In other words, the small-scale multifeature analysis which is developed in this chapter aims, first, at describing register variation across time and, second, at profiling the situational or functional roles played by three marked word-order designs in the various registers from which the data were extracted.


The section is organised as follows: Section 3.1 deals with the distribution of the LFD data. Section 3.2 focuses on the analysis of the TOP examples from the database. Finally, Section 3.3 considers the diachronic progression of the SUBJ-LAST constructions under investigation.

3.1 Left dislocation and register

Figures 4, 5 and 6 contain the normalised frequencies (per 1,000 clauses) of LFD in, respectively, ME, EModE and LModE. Table 5 provides an overview of the frequency of LFD per register.7

Figure 4: LFD in ME (the dotted line plots the mean normalised frequency)

7 In the columns containing the registers with lower/higher proportions of LFD, TOP and SUBJ-LAST in, respectively, Tables 5, 6 and 7 I have included a selection of the registers occurring either before (lower proportions) or after (higher proportions) the dotted line expressing the mean normalised frequency of the distribution in the figures preceding the tables. As the figures reveal, the grouping of registers into those exhibiting more or fewer examples of the constructions under investigation is not neat and, in consequence, in order to determine the connection between register and syntactic markedness I have considered only those registers which are more representative for that purpose.


Figure 5: LFD in EModE

Figure 6: LFD in LModE


Table 5: LFD and registers across time

         LOWER PROPORTIONS                     HIGHER PROPORTIONS
ME       Romance                               Religious treatises, Law, Philosophy
EModE    Diary, Drama, Trials, Letters         Science, Law
LModE    Travelogue, Diary, Trials, Letters    Philosophy, Law

In light of the proportions of sentences containing left-dislocated constituents in initial position in ME, the following conclusions can be reached: first, the registers which are stylistically less literate (Biography, Romance, Travelogue), that is, those which demand on the reader’s part fewer technical understanding skills and linguistic abilities, contain a lower number of examples of LFD and, second, the registers which are stylistically more literate (Law, Philosophy, Religious treatises) contain more examples of LFD.8 The fact that Sermons (and possibly this can also be applied to the type of texts contained in the Philosophy historical registers, with predominant speech-related/purposed status due to the inclusion of the dialogues in Boethius’ De Consolatione Philosophiae) are grouped with the more literate registers implies that the distribution of LFD is conditioned by register literacy (the more literate the register is, the greater the frequency of LFD) and not by the production circumstances associated with either the spoken or the written medium. As for EModE and LModE, the relative proportions of LFD per register are quite similar and reinforce the view that stylistic literacy also seems to be the significant factor in these periods. As shown in Table 5, this tendency is relatively stable across time.

8 The adscription of the historical registers under investigation to the more/less literate options is based on stylistic pervasiveness within the text types. Even though the degree of stylistic hybridity is noteworthy in some of the registers (see my comments in Section 4), in order to determine connections between register and productivity of LFD, I have adhered to the taxonomy ±literate by relying on the style which is dominant in the texts studied.


From a theoretical perspective, LFD is a strategy which disrupts the unmarked organisation of the clause. First, as already pointed out, subjects are not sentence-initial in contexts of LFD. Second, the constituents in sentence-initial position in LFD contexts (that is, the constituents which are left-dislocated) do not fulfil a syntactic function within the clause or, in other words, cannot be syntactically integrated with the ensuing clause. In fact, LFD is possibly the only syntactic strategy in English which enables the allocation in a clause of a constituent which is semantically connected with the clause and yet syntactically untethered to it. Consequently, the syntax of LFD leads to the characterisation of this construction as a highly marked syntactic device in English. From this perspective, I will argue below, and at greater length in Section 4, that linguistic markedness can be claimed to be closely connected with functional specificity in register analysis, and that this paves the way for the consideration that LFD is a formal indicator of stylistic literacy, at least in the recent history of English. Couched in the terminology of multidimensional register analysis, LFD can be taken as a linguistic feature which positively contributes to the minus-plus dimension ‘less literate versus more literate’.

3.2 Topicalisation and register

Following the outline in Section 3.1, Figures 7, 8 and 9 show the distribution of TOP in, respectively, ME, EModE and LModE in the database. Table 6 summarises the results by classifying the registers into those in which TOP is frequent and those with low levels of TOP.


Figure 7: TOP in ME

Figure 8: TOP in EModE


Figure 9: TOP in LModE

Table 6: TOP and registers across time

         LOWER PROPORTIONS                   HIGHER PROPORTIONS
ME       Handbook, History, Travelogue       Religious treatises, Biography, Law
EModE    Trials, Travelogue, Diary, Sermon   Biography, Philosophy, History, Fiction
LModE    Science, Drama, Law, Biography      Philosophy, Sermon, Diary, Fiction

The distribution of TOP over different registers in ME is considerably more complicated than the partitioning of registers according to the frequencies of LFD, since the families of registers resulting from the grouping in Figure 7 do not lead to an easy explanation in terms of, for example, narrative versus expository status, written versus speech-based nature or dialogic versus monologic character. The binomial condition of less formal versus more formal/literate could possibly constitute the baseline for the assessment of the cline in Figure 7, with less formal registers (for example, Handbook, Travelogue and the speech-purposed Philosophy texts) in the group of registers containing fewer examples of TOP, and more formal registers (Religious treatises and Law) with many more instances of TOP. Nonetheless, Figures 8 and 9, which provide the information corresponding to, respectively, EModE and LModE, and Table 6, with an overview of the prevailing trends over time, reveal that TOP is no longer a textual marker in Modern English, in that it is a frequent syntactic device found in registers like Law and History, commonly classified as formal registers, and in Fiction or Diary, which are indisputably less formal. The data thus make clear the textually unmarked status of TOP as a functional or situational marker.

As mentioned in Section 3.1, in an attempt to give value to the connection between the distribution of formal linguistic features and the situational or functional status of registers, I would like to establish a link between the unmarked textual condition of TOP, resulting from the analysis of the data, and the linguistic characterisation of TOP as a syntactic device in English. Syntactically, TOP involves the promotion of a constituent (either complement or modifier/adjunct) to sentence-initial position, which does not imply the violation of the unmarked subject-verb design of the English declarative clause. In Section 4 I will hold the position that if a given linguistic feature (in this research, a construction type) does not trigger a significant level of linguistic (here, syntactic) markedness, then a blatant functional or situational interpretation derived from the occurrence of the feature will not necessarily be at work. What I will be hypothesising later, although I am aware that this demands further research, is that linguistic markedness runs parallel to consistent functional specificity. If this is indeed the case, it would further emphasise the empirical relevance of multidimensional approaches.

3.3 Subject-last constructions and register

This section provides statistical information corresponding to the SUBJ-LAST constructions analysed in this chapter, namely subject-inversion and subject-extraposition. Figures 10, 11 and 12 display the distribution of SUBJ-LAST across time and Table 7 summarises the groups of registers depending upon the frequency of SUBJ-LAST.


Figure 10: SUBJ-LAST in ME

Figure 11: SUBJ-LAST in EModE


Figure 12: SUBJ-LAST in LModE

Table 7: SUBJ-LAST and registers across time

         LOWER PROPORTIONS            HIGHER PROPORTIONS
ME       Handbook, Law, Philosophy    Travelogue, Romance
EModE    Philosophy, Law              Fiction, Diary, Travelogue, Drama
LModE    Science, Trials, Handbooks   Drama, History, Fiction

Both the previous figures and Table 7 show that the frequencies of the SUBJ-LAST constructions investigated in this study are somehow connected to the degree of subject-involvement, as evinced by the registers in the database. In registers such as Law and Science (and many Handbooks in the LModE corpus), which prototypically avoid speaker/writer- or hearer/reader-oriented linguistic features, one finds fewer examples of SUBJ-LAST constructions. By contrast, practically all the registers in the rightmost column in Table 7 (Travelogue, Romance, Fiction, Diary, Drama) would be described as subject-oriented registers in the traditional stylometric literature and do contain many examples classifiable as SUBJ-LAST in this study. Furthermore, such a functional characterisation of the registers which are more prominent as far as the frequency of the variable SUBJ-LAST is concerned is strikingly stable in the periods explored here.

The finding reported in the previous paragraph reinforces the connection between, on the one hand, the highly marked syntax of a construction and, on the other, its substantive functional defining role. Two remarks seem in order here: first, SUBJ-LAST constructions by definition wreak havoc on the unmarked syntactic design of English clauses, since their syntactic subjects are placed in final postverbal position. Second, the data reflect that the frequency of SUBJ-LAST is a strong indicator of the degree of participant-involvement of a given register. Briefly, then, syntactic markedness and functional priming have been shown to go hand in hand in the recent history of English also as far as subject-inversion and subject-extraposition are concerned.

4 Summary and concluding remarks

This study has drawn on the multidimensional assumption that registers are (basically) linguistic units which can be associated with specific functional, textual and stylistic interpretations, which is in line with Biber and Conrad’s (2009: 1) well-known ‘register perspective’, which “combines an analysis of linguistic characteristics that are common in a text variety with analysis of the situation of use of the variety”. In this study I have explored the premise that a set of linguistic constructions, in particular three syntactic strategies with marked word-order designs in English, can be taken as markers of the functional, textual and stylistic characterisation of registers. The three constructions investigated here are topicalisation, left dislocation and the subject-last constructions (subject-verb inversion and subject-extraposition).

This study has shown, first, that LFD is a linguistic strategy which has been associated with literate registers from ME to LModE. This is a weighty finding, since the connection between LFD and textual literacy is not in keeping with the conversational character which is attributed to LFD in Present-Day English in the literature. To give an example, Biber et al. (1999: 957–958) claim that “Prefaces [LFD] […] are almost exclusively conversational features […] Prefaces are a sign of the evolving nature of conversation”. Second, it was found that TOP can be described as a literacy strategy in ME which has become progressively more textually unmarked in Modern English. Finally, the so-called SUBJ-LAST constructions investigated in this chapter are claimed to feature subject-hearer involvement.

I have also suggested that the data serve to illustrate the link between linguistic markedness and situational definition. It was proposed that those constructions which are syntactically most marked as far as word order is concerned constitute hallmarks of well-defined situational interpretations of the registers in which they occur at an appropriate frequency. In this respect, since TOP does not significantly alter the unmarked subject-verb(-complement) organisation of the English clause, it has thus been shown not to trigger a register-specific situational interpretation and, as already reported, has been defined as a textually unmarked linguistic device. By contrast, the occurrence of LFD and SUBJ-LAST in sentences which end up exhibiting syntactically marked word-order designs has been related to specific situational interpretations: LFD evinces register literacy and SUBJ-LAST is a marker of subject- or participant-involvement in a register.

The study concludes that word-order strategies can be added to the list of linguistic features, units or variables on which register analysis can rely. This notwithstanding, a final remark is in order here to acknowledge the high level of heterogeneity in the registers which the statistical analysis of the texts has identified. First, hybridity in registers is sometimes a formal or a linguistic issue. In this respect, Biber and Finegan (1988: 3) recognise that for some registers “greater linguistic differences exist among texts within the categories than across them”. To give some examples, in this chapter I noted both the speech-related status of some Philosophy texts and the differences in subject-involvement among modern Handbooks. Second, as writers such as Virtanen (2010: 58) contend, “texts are seldom unitype; text types usually appear in embedded hybridized forms, resulting in multiple texts”; the multidimensional model must therefore be able to encompass the existence of texts and even text types which are not prototypical indicators of a given situational or textual interpretation. Finally, as recognised in Biber and Conrad (2009: Chapter 7), hybridity also underlies the classification of texts into registers (see also Biber & Egbert, this volume, for an experiment on the classification of (mostly) hybrid internet registers). Virtanen (2010: 76) also refers to this when she says that “[o]ne and the same text type can be put to use in very different genres [registers], and one and the same genre easily manifests texts that can be related to very different types”. The model would thus benefit from the statistical analysis of individual texts by means of factorial or logistic regression techniques.

To conclude, two issues have been left for further research. On the one hand, the validity of the findings in this study should be tested by extending the time span of the investigation to include Present-Day English data. In this respect, parsed corpora of contemporary English would provide empirical evidence bearing on the issues raised in this chapter. On the other hand, a key issue in historical register variation, one pointed out in Biber and Conrad (2009: 166), is the distinction between language change and register variation. As recognised in Lijffijt et al. (2012), the null assumption in diachronic textual studies has usually been that a single-register corpus provides homogeneous linguistic data over time with regard to unique functional or situational implications. Were this the case, variation in corpus studies would lead straightforwardly to the observation of general diachronic change in language. By contrast, if the defining linguistic and/or stylistic features of registers were claimed to be subject to change over time, then linguistic register variation would not necessarily imply diachronic change in the grammar of the language. This leads us to the conclusion that corpus-based register analysis will benefit from fine-grained analyses of the data in order to detect qualitative inconsistencies which are, on occasion, blurred by the statistical results.

References

Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge University Press.

Biber, Douglas. 1995a. Dimensions of register variation. A cross-linguistic comparison. Cambridge: Cambridge University Press.

Biber, Douglas. 1995b. On the role of computational, statistical, and interpretive techniques in a multi-dimensional analysis of register variation. A reply to Watson. Text 15(3). 341–370.

Biber, Douglas. 2013. Register as a predictor of linguistic variation. Paper presented at ‘Register revisited: New perspectives on functional text variety in English’ International Conference, University of Vechta, 27–29 June.

Biber, Douglas & Jesse Egbert. This volume. Towards a user-based taxonomy of web registers.

Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad & Edward Finegan. 1999. Longman grammar of spoken and written English. London: Longman.

Birner, Betty J. 1994. Information status and word order: An analysis of English inversion. Language 70(2). 233–259.

Bolinger, Dwight. 1992. The role of accent in extraposition and focus. Studies in Language 16(2). 265–324.

Culpeper, Jonathan & Merja Kytö. 2010. Early Modern English dialogues: Spoken interaction as writing. Cambridge: Cambridge University Press.

Dorgeloh, Heidrun. 1997. Inversion in modern English: Form and function. Amsterdam: John Benjamins.

Dorgeloh, Heidrun. This volume. The interrelation of register and genre in the medical register.

Dorgeloh, Heidrun & Anja Wanner. 2010. Introduction. In Heidrun Dorgeloh & Anja Wanner (eds.), Syntactic variation and genre, 1–26. Berlin: Mouton de Gruyter.

Fairclough, Norman. 1992. Discourse and social change. Cambridge: Cambridge University Press.

Geluykens, Ronald. 1992. From discourse process to grammatical construction: On left-dislocation in English. Amsterdam: John Benjamins.

Green, Georgia M. 1980. Some wherefores of English inversion. Language 56. 582–601.


Gundel, Jeanette K. 1985. ‘Shared knowledge’ and topicality. Journal of Pragmatics 9(1). 83–107.

Halliday, Michael A. K. 1978. Language as social semiotic. London: Edward Arnold.

Kreyer, Rolf. 2010. Syntactic constructions as a means of spatial representation in fictional prose. In Heidrun Dorgeloh & Anja Wanner (eds.), Syntactic variation and genre, 277–303. Berlin: Mouton de Gruyter.

Kroch, Anthony & Ann Taylor. 2000. Penn-Helsinki Parsed Corpus of Middle English, second edition.

Kroch, Anthony, Beatrice Santorini & Lauren Delfs. 2004. Penn-Helsinki Parsed Corpus of Early Modern English.

Kroch, Anthony, Beatrice Santorini & Ariel Diertani. 2010. Penn-Helsinki Parsed Corpus of Modern British English.

Lijffijt, Jefrey, Tanja Säily & Terttu Nevalainen. 2012. CEECing the baseline: Lexical stability and significant change in a historical corpus. In Jukka Tyrkkö, Matti Kilpiö, Terttu Nevalainen & Matti Rissanen (eds.), Studies in variation, contacts and change in English. Vol. 10: Outposts of historical corpus linguistics: From the Helsinki Corpus to a proliferation of resources. Helsinki: University of Helsinki (Research unit for Variation, Contacts and Change in English). http://www.helsinki.fi/varieng/series/volumes/10/lijffijt_saily_nevalainen (accessed 9 February 2015).

McCawley, James D. 1988. The syntactic phenomena of English. Vols. 1, 2. Chicago: The University of Chicago Press.

Pérez-Guerra, Javier. 2005. Word order after the loss of the verb-second constraint or the importance of Early Modern English in the fixation of syntactic and informative (un-)markedness. English Studies 86(4). 342–369.

Prince, Ellen F. 1981. Topicalization, focus-movement, and Yiddish-movement: A pragmatic differentiation. In Danny K. Alford, Karen Ann Hunold & Monica A. Macaulay (eds.), Proceedings of the Seventh Annual Meeting of the Berkeley Linguistics Society, 249–264. Berkeley: Berkeley Linguistics Society.

Prince, Ellen F. 1997. On the functions of left-dislocation in English discourse. In Akio Kamio (ed.), Directions in functional linguistics, 117–144. Philadelphia: John Benjamins.

Schubert, Christoph. This volume. Introduction: current trends in register research.

Swales, John M. 1990. Genre analysis: English in academic and research settings. Cambridge: Cambridge University Press.

Taavitsainen, Irma. 2001. Changing conventions of writing: The dynamics of genre, text types, and text traditions. European Journal of English Studies 5(2). 139–150.

Takahashi, Kunitoshi. 1992. Constructionally presentational sentences. Lingua 86. 119–148.

Virtanen, Tuija. 2004. Point of departure: Cognitive aspects of sentence-initial adverbs. In Tuija Virtanen (ed.), Approaches to cognition through texts and discourse, 78–97. Berlin: Mouton de Gruyter.

Virtanen, Tuija. 2010. Variation across texts and discourses: Theoretical and methodological perspectives on text type and genre. In Heidrun Dorgeloh & Anja Wanner (eds.), Syntactic variation and genre, 53–84. Berlin: Mouton de Gruyter.

