Georgia State University
ScholarWorks @ Georgia State University
Applied Linguistics and English as a Second Language Theses
Department of Applied Linguistics and English as a Second Language
Fall 11-30-2010
Analysis of Four-word Lexical Bundles in Published Research Articles Written by Turkish Scholars
Betul Bal Georgia State University
Follow this and additional works at: https://scholarworks.gsu.edu/alesl_theses
Part of the Applied Linguistics Commons, and the First and Second Language Acquisition Commons
Recommended Citation
Bal, Betul, "Analysis of Four-word Lexical Bundles in Published Research Articles Written by Turkish Scholars." Thesis, Georgia State University, 2010. https://scholarworks.gsu.edu/alesl_theses/2
This Thesis is brought to you for free and open access by the Department of Applied Linguistics and English as a Second Language at ScholarWorks @ Georgia State University. It has been accepted for inclusion in Applied Linguistics and English as a Second Language Theses by an authorized administrator of ScholarWorks @ Georgia State University.
ANALYSIS OF FOUR-WORD LEXICAL BUNDLES IN PUBLISHED RESEARCH
ARTICLES WRITTEN BY TURKISH SCHOLARS
by
BETUL BAL
Under the Direction of Viviana Cortes
ABSTRACT
This study investigated the use of lexical bundles in research articles written in English by
Turkish scholars. For the purpose of the study, a corpus of published research articles produced
by Turkish scholars in six different academic disciplines was collected. The four-word lexical
bundles that appeared at least twenty times in this one-million-word corpus were identified and
further analyzed both structurally and functionally based on the previous taxonomies developed
by Biber, Johansson, Leech, Conrad and Finegan (1999) and Biber, Conrad and Cortes (2004).
The results of this study revealed that the lexical bundles found have structural correlates as well
as strong functional features that help to construct discourse in academic writing. The
conclusions drawn from this study could be applied to the teaching of academic genres to
researchers in English as a Foreign Language (EFL) contexts and are expected to provide insights for
further corpus-based studies in academic writing.
INDEX WORDS: Lexical bundles, Research articles, Corpus, Academic writing, Corpus-based
studies
ANALYSIS OF FOUR-WORD LEXICAL BUNDLES IN PUBLISHED RESEARCH
ARTICLES WRITTEN BY TURKISH SCHOLARS
by
BETUL BAL
A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of
Master of Art Education
in the College of Arts and Sciences
Georgia State University
2010
Copyright by
Betul Bal
2010
ANALYSIS OF FOUR-WORD LEXICAL BUNDLES IN PUBLISHED RESEARCH
ARTICLES WRITTEN BY TURKISH SCHOLARS
by
BETUL BAL
Committee Chair: Viviana Cortes
Committee: Diane Belcher
Eric Friginal
YouJin Kim
Electronic Version Approved:
Office of Graduate Studies
College of Arts and Sciences
Georgia State University
December 2010
ACKNOWLEDGEMENTS
It is a pleasure to thank the many people who made this thesis possible.
First, I wish to express my gratitude to the Fulbright Commission for the support that they
gave me in order to study in the U.S.A. and to the Department of Applied Linguistics and ESL at
Georgia State University for helping me write a master’s thesis.
It is difficult to overstate my gratitude and appreciation to Dr. Viviana Cortes, Chair of my
committee and my thesis advisor, who has been and always will be an inspiration for me in my
academic career. It would have been impossible for me to complete this thesis without her
patience, constructive feedback, and insightful advice. I have benefited a lot from her stimulating
ideas and suggestions. I am truly lucky to be her student.
I would also like to thank my committee members, Dr. Diane Belcher, Dr. Eric
Friginal, and Dr. YouJin Kim, for the time they have dedicated to the reading of my thesis and
all the valuable comments they have made. I have learned a lot from each of them during my
time at Georgia State, and I know I will continue to do so in the future.
Moreover, my heartfelt thanks go to my family for their invaluable support and
encouragement. I am grateful to my parents, Hikmet Bal and Saim Bal, who not only raised me,
taught me, and loved me but also believed in me and supported every step I took in my life. I am
also thankful to my sister and brother for their presence and moral support.
Last but not least, I am forever grateful to my significant other, Cenk, for his
encouragement, understanding, endless patience, and for his unconditional love when it was
most required.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER 1. INTRODUCTION
1.1 Purpose of the Study
1.2 Research Questions
1.3 Organization of the Study
CHAPTER 2. LITERATURE REVIEW
2.1 Definition of Corpus and Corpus-Based Studies
2.2 Formulaic Language and Corpora
2.3 Lexical Bundles and Register Variations
CHAPTER 3. METHODOLOGY
3.1 The TSRA Corpus
3.2 Concordancing Software: AntConc
3.3 Structural and Functional Taxonomies
CHAPTER 4. RESULTS AND DISCUSSION
4.1 TSRAC Lexical Bundles
4.2 Structural Analysis of TSRAC Lexical Bundles
4.3 Functional Analysis of TSRAC Lexical Bundles
4.3.1 Stance Bundles
4.3.2 Discourse Organizers
4.3.3 Referential Expressions
CHAPTER 5. CONCLUSION
5.1 Summary of the Results
5.2 Limitations
5.3 Implications
5.4 Suggestions for Further Research
REFERENCES
APPENDIX A: Journals Used in the TSRAC
APPENDIX B: TSRAC Lexical Bundles
LIST OF TABLES
Table 2.1 Major studies on lexical bundles
Table 3.1 Disciplines in the TSRAC
Table 3.2 Structural types of lexical bundles (Biber et al., 1999, p. 1015)
Table 3.3 Functional classification of lexical bundles (Biber, Conrad and Cortes, 2004, p. 384)
Table 4.1 Lexical bundles in TSRAC according to their functions in context
LIST OF FIGURES
Figure 1. AntConc screenshot showing the TSRAC bundles (Anthony, 2007)
Figure 2. AntConc screenshot showing the concordances (Anthony, 2007)
Figure 3. Structural distribution of TSRAC lexical bundles
LIST OF ABBREVIATIONS
TSRA: Turkish Scholars’ Research Articles
TSRAC: Turkish Scholars’ Research Articles Corpus
LSWE: Longman Spoken and Written English
T2K-SWAL: TOEFL 2000 Spoken and Written Academic Language
BASE: British Academic Spoken English
MICASE: Michigan Corpus of Academic Spoken English
CHAPTER 1. INTRODUCTION
Writing for academic purposes is a challenging journey, since creating texts that convey one’s ideas in this environment requires special attention and effort. As Zamel (1998) states, academic discourse has distinguishing features: “because it appears to require a kind of language with its own vocabulary, norms, sets of conventions, and modes of inquiry, academic discourse has come to characterize a separate culture…” (p. 187). Many investigations in the history of language studies have therefore sought to identify these distinguishing features of academic writing. As reviewed by Biber (2006), these studies have focused on different aspects of academic writing, such as expressions of stance (Charles, 2003; Crompton, 1997; Grabe & Kaplan, 1997; Holmes, 1986; Hyland, 1994, 1996a, 1996b; Meyer, 1997; Myers, 1989, 1990; Salager-Meyer, 1994; Silver, 2003; Varttala, 2003), academic registers (Flowerdew, 2002; Hewings, 2001), verb classes (e.g., Hunston, 1995), and the organization of discourse (Ferguson, 2001), to mention only a few. Academic vocabulary has also attracted attention and has been the focus of numerous studies (Coxhead, 2000; Nation, 1990, 2001; Schmitt & McCarthy, 1997). More recently, research has shifted from single lexical items to multi-word expressions, and studies have increasingly focused on formulaic expressions (Altenberg, 1998; Biber, Johansson, Leech, Conrad & Finegan, 1999; Nattinger & DeCarrico, 1992; Pawley & Syder, 1983). All these studies highlight the significance of fixed expressions, which have particular structural forms and perform strong discourse functions. In light of previous research on the presence and significance of formulaic language in academic prose, the present study focuses on a particular multi-word expression called the “lexical bundle” (Biber et al., 1999). Altenberg (1998) is considered one of the first researchers to study recurrent word combinations using empirical methods. Drawing on his work, Biber et al. (1999) focused on recurrent expressions that they called lexical bundles. Lexical bundles have been the focus of various studies (Biber et al., 1999, 2003, 2004; Butler, 1997). Studies of lexical bundles in various English registers, approaching them from different perspectives, have produced noteworthy results. The common conclusion drawn from studies of lexical bundles in academic writing is that lexical bundles constitute a large part of academic texts and that they have structural correlates as well as significant discourse functions that help to construct the text itself (Biber et al., 1999, 2003, 2004; Biber, Conrad & Cortes, 2003; Cortes, 2002, 2004).
Most of these studies are based on lexical bundles in English, with two recent exceptions: Cortes (2008) includes Spanish in her analysis of lexical bundles in academic history writing, and Kim (2009) analyzes the use of lexical bundles in Korean conversation and academic prose. However, although some studies have gone beyond English, little is known about the lexical bundles used by non-native speakers of English when they speak or write in English. The idea of investigating the lexical bundles produced by non-native speakers of English in their published academic writing therefore became the impetus for this study.
1.1 Purpose of the Study
The main objective of the present study is to identify the four-word lexical bundles used by Turkish scholars, who are non-native speakers of English. The academic texts used for this purpose are research articles written by Turkish scholars and published in international journals in six academic disciplines. The lexical bundles identified are compared with the bundles reported in earlier studies of lexical bundles in different academic registers (Biber et al., 1999; Biber & Conrad, 1999; Biber, Conrad & Cortes, 2003, 2004; Cortes, 2004, 2008). Moreover, using both quantitative and qualitative analyses, this study further investigates these lexical bundles in terms of their structures and functions, based on taxonomies previously designed and used for the classification of lexical bundles (Biber, Conrad & Cortes, 2003; Cortes, 2002, 2004).
1.2 Research Questions
To provide a comprehensive analysis of the lexical bundles used by Turkish scholars when they write research articles in English, this study will explore the following research questions:
1. What are the most common four-word lexical bundles found in published research
articles written by Turkish scholars?
2. How much do these lexical bundles have in common with those bundles previously
identified in the literature?
3. What are the structural and functional features of the lexical bundles found in this study?
1.3 Organization of the Study
To address these research questions, Chapter 2 will provide background information on corpora and on how corpus-based studies are conducted. It will then discuss the significance of formulaic language in academic writing, describe lexical bundles, and review recent corpus-based studies of these expressions. Chapter 3 will introduce the procedures followed to compile the corpus, the computer software used, the quantitative and qualitative analyses conducted, and the taxonomies used to analyze the lexical bundles identified in this study. Chapter 4 will describe the characteristics of the TSRAC lexical bundles and report the results of the analyses in detail. Finally, Chapter 5 will offer a brief summary of the study and its results, followed by its limitations, implications for language teachers and researchers, and suggestions for further studies of lexical bundles.
CHAPTER 2. LITERATURE REVIEW
This chapter will provide background information for the present study in two sections: first, an introduction to corpora and corpus-based studies, and second, a review of studies of formulaic language in academic discourse, followed by a detailed review of recent corpus-based studies of lexical bundles in academic prose that are closely related to the present study.
2.1 Definition of Corpus and Corpus-Based Studies
Corpus is a Latin word meaning “body”; in linguistics, it refers to a “body of texts”. Today, however, in the field of Applied Linguistics, the term corpus usually refers to a large collection of machine-readable texts. As noted by McEnery and Wilson (1996), some corpus-based studies were conducted before computers were available for linguistic research (Preyer, 1889; Kading, 1897; Eaton, 1940; Fries & Traver, 1940). In its current sense, however, corpus-based research refers to studies in which a machine-readable corpus is created and analyzed with computer software.
As Conrad (1996) states, there are certain important characteristics of corpus-based
investigations that need to be emphasized. Corpus-based studies
(a) are based on principled collections of naturally occurring texts (the corpus),
(b) use computers for both automatic and interactive analyses, and
(c) include both quantitative analyses and functional interpretations in order to describe
patterns in language features.
As these features suggest, once the corpus has been collected, the researcher may use a concordancing program, for example, to search for a target item or items in the corpus. Such programs list the lines (concordances) in which the target item occurs, which enables further analysis. Once automatic quantitative results such as frequency lists and collocations have been retrieved, qualitative interpretations are made on the basis of these findings. The target item depends on the purpose of the study: it can be a specific language feature such as complex noun phrases (Vande Kopple, 1992), or it can be writer attitude, as in the study by Salager-Meyer (1992).
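To make the notion of a concordance line concrete, the Python sketch below produces simple keyword-in-context (KWIC) lines of the kind a concordancing program displays. It is a minimal illustration under simplifying assumptions (lowercasing and a crude regular-expression tokenizer), not the implementation used by AntConc or any other program; the function names `tokenize` and `kwic` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a text and split it into word tokens (letters, digits and apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def kwic(text, target, width=4):
    """Return (left context, node, right context) for each occurrence of target.

    target may be a single word or a multi-word sequence; width is the number
    of context words shown on each side of the node.
    """
    tokens = tokenize(text)
    node_tokens = tokenize(target)
    n = len(node_tokens)
    if n == 0:
        return []  # an empty search term matches nothing
    lines = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == node_tokens:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            lines.append((left, " ".join(node_tokens), right))
    return lines


if __name__ == "__main__":
    sample = ("The results of the study show that, as a result of the treatment, "
              "the results improved.")
    for left, node, right in kwic(sample, "the results", width=2):
        # Right-align the left context so that the node words line up in a column.
        print(f"{left:>20}  [{node}]  {right}")
```

Returning tuples rather than formatted strings keeps the context available for further (for example, qualitative) sorting and inspection.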
One of the first modern corpus projects was begun by Francis and Kucera at Brown University in 1961. This project deserves mention because it produced the first major computerized corpus, the Brown Corpus: a one-million-word collection of texts written in American English in 1961, randomly sampled from a variety of genres. Representing a significant step from non-digital to digital corpus-based investigation, it has inspired many other corpus studies.
2.2 Formulaic Language and Corpora
In recent years, an increasing number of studies have made use of corpus data to analyze formulaic expressions in different registers, and academic registers1 have attracted particular attention from linguists. Defining and processing formulaic language in academic prose has been the purpose of many studies, starting with Pawley and Syder (1983), followed by Nattinger and DeCarrico (1992) and, more recently, Biber et al. (1999), Wray (2000, 2002), and Cortes (2002, 2004, 2008), to mention only a few. The latest trend in the study of formulaic language in academic writing has focused on a particular type of recurrent expression called lexical bundles (Biber et al., 1999), which are defined below.
1 All the analyses conducted in this study used a register-based perspective, defining register as a situationally defined variety of the language (Biber et al., 1999, p. 15). This perspective differs from other approaches to text types used in text analysis and classification; it has also been used by numerous corpus-based studies to categorize texts.
For many years, groups of words that frequently occur together in a language have been studied and described under different labels, such as recurrent word combinations (Altenberg, 1998; De Cock, 1998), n-grams (Banerjee & Pedersen, 2003), lexical bundles (Biber & Conrad, 1999; Biber, Johansson, Leech, Conrad, & Finegan, 1999; Stubbs, 2007a, 2007b), prefabricated patterns (Granger, 1998), formulas (Granger & Meunier, 2008; Sinclair, 1991; Wray, 2002), clusters (Hyland, 2008a; Schmitt, Grandage & Adolphs, 2004), phrasal lexemes (Moon, 1998), prefabs or lexical phrases (Nattinger & DeCarrico, 1992), sentence stems (Pawley & Syder, 1983), and formulaic sequences (Schmitt & Carter, 2004), among others. These studies focused on different types of word combinations and used different research methods. The present study focuses on a particular type of word combination called lexical bundles, which were first defined in the Longman Grammar of Spoken and Written English (Biber et al., 1999). Lexical bundles are fixed groups of words that frequently occur together and are commonly used in particular registers, that is, in situationally defined varieties of the language. As stated by Biber et al. (1999), lexical bundles are “recurrent expressions, regardless of their idiomaticity, and regardless of their structural status” (p. 990). For a word combination to count as a bundle, it has to meet a set of defining criteria, as explained by Biber (1996). First, since frequency is the defining characteristic of lexical bundles, these expressions must occur frequently in a register: they are simply the most frequently occurring sequences of words in a sub-corpus of texts from a single register. The frequency cut-off point varies from study to study. Biber et al. (1999) required a four-word expression to recur at least ten times per million words and to appear in at least five different texts. Biber, Conrad, and Cortes (2004), on the other hand, required a lexical bundle to occur forty times in a one-million-word corpus, whereas Cortes (2004) set the cut-off point at twenty times per million words. These higher cut-off points are more conservative: they ensure that the analysis concentrates on expressions used with very high frequency. Second, in addition to meeting the frequency threshold, lexical bundles must be used in at least five different texts, which prevents the analysis from focusing on the idiosyncratic usage of individual authors in the corpus. Third, lexical bundles are not idiomatic in meaning: although a lexical bundle functions as a unit, unlike an idiom, its meaning can be understood from the words that make up the bundle. Finally, lexical bundles usually do not represent complete structural units. In fact, Biber et al. (1999) found that in academic writing more than 95% of the lexical bundles were not complete structural units. Cortes (2004) further supports this point: “Lexical bundles are identified empirically, rather than intuitively, as word combinations that recur most commonly in a register, and therefore, lexical bundles are usually not complete structural units, but rather fragmented phrases or clauses with new fragments embedded” (p. 400).
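The frequency and range criteria described above can be expressed as a short procedure. The Python sketch below is a simplified illustration, not the procedure followed in this study or in the studies cited: it assumes a crude regular-expression tokenizer, forms four-word sequences only within each text, normalizes raw counts to a rate per million words, and keeps the sequences that meet two thresholds (by default 20 occurrences per million words, the cut-off used by Cortes (2004), and at least five different texts). The function `find_lexical_bundles` and its parameter names are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a text and split it into word tokens (letters, digits and apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def find_lexical_bundles(texts, n=4, per_million=20, min_texts=5):
    """Return {bundle: (raw frequency, rate per million words, number of texts)}.

    texts is a list of strings, one per text (e.g., one per research article).
    Only n-word sequences meeting both the frequency and the range cut-off are kept.
    """
    freq = Counter()                 # raw frequency of each sequence in the whole corpus
    text_range = defaultdict(set)    # indices of the texts in which each sequence occurs
    total_words = 0
    for idx, text in enumerate(texts):
        tokens = tokenize(text)
        total_words += len(tokens)
        # Sequences are formed within a text, never across text boundaries.
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            freq[gram] += 1
            text_range[gram].add(idx)
    if total_words == 0:
        return {}
    bundles = {}
    for gram, raw in freq.items():
        normalized = raw / total_words * 1_000_000   # occurrences per million words
        if normalized >= per_million and len(text_range[gram]) >= min_texts:
            bundles[gram] = (raw, round(normalized, 1), len(text_range[gram]))
    # List the most frequent bundles first.
    return dict(sorted(bundles.items(), key=lambda kv: kv[1][0], reverse=True))
```

Normalization matters when corpora differ in size: a sequence occurring 10 times in a hypothetical 500,000-word corpus has a rate of 20 per million words, whereas in a corpus of exactly one million words the raw and normalized frequencies coincide. Real studies may also treat punctuation, hyphenation and sentence boundaries differently from this tokenizer.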
Computer software and corpus tools have been essential in studies of lexical bundles, whose purpose is to analyze the collected data and reach empirically grounded conclusions. The present study likewise relies on computer software: the concordance program AntConc, which will be introduced in detail in Chapter 3.
Lexical bundles have attracted considerable attention in language studies. Many corpus-based studies have examined the frequencies of lexical bundles or compared lexical bundles across registers, across contexts, or in the writing of authors with different levels of proficiency (novice vs. experienced authors). Among other results, these studies have found that lexical bundles can readily be associated with various discourse functions. The next section presents some prominent studies of lexical bundles.
2.3 Lexical Bundles and Register Variations
Over the last few decades, the study of formulaic expressions has shifted sharply toward recurrent expressions identified empirically on the basis of frequency, and an increasing number of studies on lexical bundles have been conducted. Most of these studies have reported on the distribution and use of lexical bundles in English, with various purposes and in different registers. Some compared lexical bundles in spoken and written registers, while others compared academic and non-academic registers. In addition, some studies investigated lexical bundles in languages other than English or compared two languages (English vs. Spanish). A few studies have also examined lexical bundles for pedagogical purposes.
Table 2.1 provides an overview of these earlier corpus-based studies of lexical bundles, which used different corpora and had different research foci and purposes. The purposes and findings of each study are explained further in the following paragraphs.
Table 2.1 Major studies on lexical bundles
Author | Year | Corpus | Corpus size (words)
Biber, Johansson, Leech, Conrad, & Finegan | 1999 | LSWE Corpus | Over 40,000,000
Cortes | 2002 | Native freshmen compositions (311 papers) | 360,704
Cortes | 2004 | Published writings and student writings | Published writings: 1,992,531; Student writings: 904,376
Biber, Conrad, & Cortes | 2004 | T2K-SWAL Corpus | 2,009,400
Scott & Tribble | 2006 | MA dissertations (POZ_LIT) and BNC World English Edition | POZ_LIT: 352,258; BNC: 1,500,000
Nesi & Basturkmen | 2006 | BASE corpus and MICASE | 1,270,798
Biber & Barbieri | 2007 | T2K-SWAL and LSWE | T2K-SWAL: 2,541,795; LSWE Academic: 5,330,000
Cortes | 2008 | Published history writing in English and Spanish | English: 1,001,012; Spanish: 1,003,264
Hyland | 2008a | Research articles, doctoral dissertations and master’s theses | 3,400,400
Hyland | 2008b | Research articles, doctoral dissertations and master’s theses | 3,500,000
Kim | 2009 | Korean lexical bundles in conversation and academic texts (The Sejong Corpus) | Conv.: 2,604,054; Acad.: 3,407,020
The results of the studies listed in Table 2.1 emphasize the importance of lexical bundles in different registers, contexts and languages. The first study in the table, Biber et al. (1999), was based on a large corpus of both American and British English conversation and academic prose. Biber et al. (1999) coined the term, noting that “…word forms often co-occur in longer sequences, called lexical bundles” (p. 989). In the same chapter, they state that “both conversation and academic prose use a large stock of different lexical bundles” (p. 993). This claim has become a springboard for further studies of lexical bundles in different registers. Biber et al. (2004) conducted another extensive study of the use of lexical bundles in university classroom teaching and textbooks, in comparison with the LSWE corpus mentioned above. They found that the lexical bundles in their corpora differed dramatically from other linguistic features, and that university lectures used twice as many lexical bundles as conversation and four times as many as textbooks. The structural and functional taxonomies developed in these two studies (Biber et al., 1999, 2004) will also be used in the present study and are described in detail in the methodology chapter.
In addition, using the same corpus, the T2K-SWAL, Biber and Barbieri (2007) looked at
the use of lexical bundles in non-academic university registers and core instructional registers. In
contrast with previous studies which showed that lexical bundles were more common in speech
than in writing, they found that lexical bundles were very common in instructional written course
texts such as course syllabi.
Cortes (2002) analyzed freshman compositions in terms of lexical bundle use. After collecting 311 student papers and using a specially designed computer program, she found 93 different lexical bundles. Further analysis, however, showed that although these lexical bundles were structurally similar to those used in academic prose, functionally they served as temporal or locative markers that created redundancy in students’ writing. The study showed that lexical bundles should be analyzed in detail both structurally and functionally, and that further studies of students’ written production at different levels and in different disciplines were needed. Following this argument, Cortes (2004) compared the writing of university students who were native speakers of English with published journal articles. Her corpus of over two million words covered two disciplines, history and biology. The study revealed that students rarely used the lexical bundles identified in the corpus of published writing. Similarly, Scott and Tribble (2006) compared student and professional writing and concluded that apprentice writers used less varied and less sophisticated lexical bundles.
Going beyond studies that focused on English, Cortes (2008) published another study comparing published history articles in English and in Spanish. After collecting history articles from journals in American English and Argentinean Spanish, Cortes compared the lexical bundles identified in the two corpora and analyzed them in terms of both structure and function. Although the number of lexical bundles differed between the two languages, there was a certain degree of agreement in the expressions identified in each. Another recent study of a language other than English is Kim (2009). Investigating a large corpus of Korean conversation and academic prose, she found that lexical bundles are important expressions in Korean that function as discourse frames for new information.
As Table 2.1 also shows, Nesi and Basturkmen (2006) used 160 monologic lectures from the BASE corpus and MICASE. This study focused on the functions of lexical bundles in academic lectures, revealing that lexical bundles can play a discourse-signaling role in lectures and that it is important to raise students’ awareness of this use.
Two other corpus-based studies of lexical bundles that deserve mention here are by Hyland (2008a, 2008b), who has conducted many studies of linguistic features frequently found in academic discourse. In these two studies, based on corpora of research articles, doctoral dissertations and master’s theses, Hyland emphasized that postgraduate students tended to use more formulaic expressions than native academics and that the use of lexical bundles varied across disciplines.
In addition to studies comparing registers or novice and experienced writers, a few studies have focused on the pedagogical aspects of lexical bundles (Cortes, 2006; Neely & Cortes, 2009). Cortes (2006) reported the results of a study in which she explicitly taught lexical bundles to students in a writing-intensive history class. After evaluating the effectiveness of the tasks she prepared by comparing students’ writing, she concluded that students’ use of the target bundles was rare and uneven, and that a few lessons demonstrating examples of lexical bundles in professional writing did not necessarily lead students to use more lexical bundles or to use them more appropriately. However, she also emphasized that such explicit teaching might increase students’ awareness of these expressions and might lead to more academically appropriate writing.
It should be noted that most studies of lexical bundles have focused on the production of native speakers, whether of English or of another language. So far, little is known about the lexical bundles used by non-native speakers in their academic writing.
This chapter has reviewed several corpus-based studies of lexical bundles in detail. Their results provide valuable information about the significance of lexical bundles and about how they differ both structurally and functionally across academic registers and contexts. They also open opportunities to explore lexical bundles further, which was the impetus for the present investigation. In light of these and other studies of lexical bundles, the following chapter will introduce the data collected for this study and the methodology used.
CHAPTER 3. METHODOLOGY
This chapter describes the steps followed to conduct this study. First, the collection of the
corpus created for the purpose of this study (a corpus of published articles written in English by
Turkish scholars) will be introduced. In the second section, the concordancing program used to
facilitate the search for lexical bundles in the corpus will be described, and in the last section the
taxonomies used for structural and functional analysis of the identified lexical bundles will be
discussed in detail.
3.1 The TSRA Corpus
In one of her works, Conrad (1996) begins her description of the corpus by noting that “In a corpus-based study, the design of the corpus is very important because the corpus must be suitable to the research questions being addressed” (p. 303). Since this study focuses on the lexical bundles used by Turkish scholars in research articles written in English, the corpus had to be compiled carefully to serve this purpose. Only research articles were included, because lexical bundles are register-bound and including more than one type of academic prose could have affected the results. Therefore, rather than including a small number of theses or dissertations by a few researchers, or several different types of academic texts, the corpus was built only from research articles by many different authors, which contributes to the reliability of the study. Using the online database of the Georgia State University Library, articles written between 1990 and 2010 by Turkish authors in six disciplines were collected from various professional journals (see Appendix A for a complete list of journals). Table 3.1 gives the number of words and articles for each discipline in the corpus. The articles collected for the Turkish Scholars Research Articles Corpus used in this study, hereafter referred to as TSRAC, were individually checked to ensure that each came from a journal published in an English-speaking country and had been published within the established period (1990-2010). It was also verified that the authors were Turkish nationals and were in Turkey when they wrote these articles. Articles co-authored by native speakers of English were excluded. After the electronic copies of all the articles had been collected, non-textual material such as titles, page numbers, tables, statistical graphics, numerical data, formulas, and references was removed.
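The source does not say how this cleaning was carried out. The Python sketch below shows how part of it could be automated, under loudly stated assumptions: page numbers are assumed to sit alone on a line, and the reference list is assumed to begin at a line reading "References" or "Bibliography". Tables, figures, formulas, and numerical data cannot be detected this simply and would still need manual removal. The function names `clean_article` and `save_as_plain_text` are hypothetical.

```python
import re
from pathlib import Path

# Assumption: a page number is a line containing only digits.
PAGE_NUMBER = re.compile(r"^\s*\d+\s*$")
# Assumption: the reference list begins at a line reading "References" or "Bibliography".
REFERENCES_HEADING = re.compile(r"^\s*(references|bibliography)\s*$", re.IGNORECASE)


def clean_article(raw: str) -> str:
    """Drop page-number lines and everything from the reference heading onward."""
    kept = []
    for line in raw.splitlines():
        if REFERENCES_HEADING.match(line):
            break  # the reference list and anything after it are discarded
        if PAGE_NUMBER.match(line):
            continue  # skip a line that holds only a page number
        kept.append(line)
    return "\n".join(kept)


def save_as_plain_text(src: Path, dest: Path) -> None:
    """Write the cleaned article as UTF-8 plain text (hypothetical helper)."""
    dest.write_text(clean_article(src.read_text(encoding="utf-8")), encoding="utf-8")
```

Because the rules are deliberately crude, the output of such a script would still need to be checked by hand before the files are loaded into a concordancer.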
In terms of corpus size, the principle suggested by Biber (2006) was followed. According to Biber (2006), “A corpus must be large enough to adequately represent the occurrence of the features being studied”. He goes on to explain that the required size depends on the purpose of the study. For example, if the target feature is a frequent grammatical category such as nouns or verbs, the corpus can be smaller because these features occur frequently. However, if less common features are the target of the study, a larger corpus is essential. Because four-word sequences recur far less often than individual words or grammatical categories, a one-million-word corpus was required for this study. The number of words in the sections of the corpus representing the different academic fields was also kept nearly equal.
Table 3.1 shows the size of the corpus and the disciplines to which the selected research articles belong. Once the corpus reached 1,000,000 words and was ready for analysis, the computer software AntConc was used.
Table 3.1 Disciplines in the TSRAC
Disciplines # of Words # of Articles
Economics 164,745 29
Education 167,541 32
History 169,299 20
Medicine 153,715 44
Psychology 164,358 50
Sociology 185,479 25
Total 1,005,137 200
3.2 Concordancing Software: AntConc
The present study aims to identify the most common lexical bundles in the TSRAC. Different studies have set different criteria for identifying lexical bundles, such as the number of words in each bundle and the cut-off points for frequency and range. This study adopts the cut-off points used by Cortes (2008): “a four-word combination has to occur twenty times in one million words, and has to appear in five or more texts” (p. 46) to be considered a lexical bundle. The focus is on four-word bundles because, as Cortes (2004) observes, “many four-word bundles hold three-word bundles in their structures” (p. 401), and four-word bundles are, in many cases, much more frequent than five-word bundles. Hyland (2008b) likewise notes that four-word lexical bundles are more common and present a wider range of structures and functions.
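Because the frequency criterion is stated per million words while the TSRAC contains 1,005,137 words (Table 3.1), it may help to spell out the usual normalization. The symbols are introduced here only for illustration: f_raw is a raw count in the corpus and N is the corpus size in words.

```latex
f_{\text{norm}} = \frac{f_{\text{raw}}}{N} \times 10^{6},
\qquad
\frac{20}{1\,005\,137} \times 10^{6} \approx 19.9
```

A raw count of twenty in the TSRAC thus corresponds to about 19.9 occurrences per million words, so a raw cut-off of twenty is, for practical purposes, equivalent to the normalized criterion of Cortes (2008). The same conversion underlies rates such as "more than fifty times per million words".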
With the growth of corpus-based research in applied linguistics and language teaching, new tools for analyzing language corpora have been developed. This study used AntConc, a text analysis tool created by Laurence Anthony (2007). The software was chosen because, among other features, it includes word and keyword frequency generators as well as tools for cluster and n-gram analysis. For lexical bundles in particular, AntConc is an efficient tool for identifying word combinations that meet a previously established frequency cut-off. However, it does not apply the range cut-off, so range had to be checked manually, as explained in detail below. The procedure for finding lexical bundles began with clearing the articles of non-textual content such as graphics, formulas, page numbers, references, tables, and figures. Since AntConc requires plain text, all the articles were saved as plain-text files before being loaded into AntConc.
Second, to retrieve lexical bundles from these files, frequency counts of 4-grams were obtained with the “N-Grams” command in AntConc (Anthony, 2007). Once n is specified, this function extracts every n-gram in the whole corpus. AntConc's minimum n-gram frequency setting was used to ensure that every expression retrieved occurred at least twenty times in the corpus. Running AntConc with these settings produced a list of four-word expressions, but the range cut-off still had to be applied manually. The way the software presents file information made it easy to count the number of texts in which each expression occurs. Each expression in the list was therefore checked manually to determine whether it appeared in at least five texts in the corpus. Expressions that appeared in fewer than five texts were not considered lexical bundles and were eliminated.
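The frequency and range filtering described above can be illustrated with a short Python sketch. It is not AntConc's implementation: the tokenizer (lowercased alphabetic words, keeping internal apostrophes and hyphens) is an assumption, n-grams are allowed to run across sentence punctuation, and the function name `find_bundles` is hypothetical. A four-word sequence is kept only if it occurs at least `min_freq` times in the whole corpus and in at least `min_range` different texts, mirroring the cut-offs of twenty occurrences and five texts used in this study.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Assumption: lowercased alphabetic tokens; "well-known" stays a single token.
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def find_bundles(texts: dict[str, str], n: int = 4,
                 min_freq: int = 20, min_range: int = 5) -> dict[str, tuple[int, int]]:
    """Return {n-gram: (frequency, range)} for n-grams that meet both cut-offs."""
    freq = Counter()             # corpus-wide frequency of each n-gram
    found_in = defaultdict(set)  # names of the texts containing each n-gram
    for name, text in texts.items():
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            freq[gram] += 1
            found_in[gram].add(name)
    return {
        gram: (count, len(found_in[gram]))
        for gram, count in freq.items()
        if count >= min_freq and len(found_in[gram]) >= min_range
    }
```

With a dictionary mapping each file name to its cleaned text, `find_bundles(corpus)` would return the candidate bundles together with their frequency and range; doing both checks in one pass is what makes the manual range count unnecessary in this sketch.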
Figure 1. AntConc screenshot showing the TSRAC bundles (Anthony, 2007)
AntConc not only supported the quantitative part of this study, providing a frequency list as shown in Figure 1, but also provided the information required for the qualitative interpretation of the results, which is one of the aims of the study: the description of the structural and functional types of lexical bundles identified in the TSRAC. Previous studies have shown that lexical bundles vary in their grammatical structures and functions (Biber et al., 1999, 2003, 2004). Therefore, each bundle was examined closely in its context in order to determine its functional type. As the last step, the concordance tool of AntConc was used to display the sentences in all the texts in which each bundle occurred, as shown in Figure 2.
Figure 2. AntConc screenshot showing the concordances (Anthony, 2007)
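A keyword-in-context (KWIC) display like the one in Figure 2 can be approximated with the Python sketch below. It is an illustration under assumptions, not AntConc's code: matching is case-insensitive, the words of the bundle may be separated by any whitespace (including line breaks), the context is a fixed number of characters on each side, and the function names `kwic` and `show` are hypothetical.

```python
import re


def kwic(texts: dict[str, str], phrase: str,
         width: int = 40) -> list[tuple[str, str, str, str]]:
    """Return (file, left context, match, right context) for each occurrence of phrase."""
    # Join the words with \s+ so a bundle split across a line break still matches.
    words = (re.escape(word) for word in phrase.split())
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
    hits = []
    for name, text in texts.items():
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            hits.append((name, left, m.group(0), right))
    return hits


def show(hits: list[tuple[str, str, str, str]], width: int = 40) -> None:
    # Right-align the left context so that all matches line up in one column.
    for name, left, match, right in hits:
        flat_left = " ".join(left.split())
        flat_right = " ".join(right.split())
        print(f"{name:<12} {flat_left:>{width}}  [{match}]  {flat_right}")
```

Aligning every occurrence of a bundle in one column is what makes it practical to read many contexts quickly when deciding on a bundle's function.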
Based on these analyses, lexical bundles with similar grammatical structures and functions were grouped together according to the structural and functional taxonomies, on the basis of their use and meaning in context. To make the conclusions more complete and reliable, a second rater assisted with the identification and classification of the lexical bundles.
3.3 Structural and Functional Taxonomies
The structural classification of lexical bundles in the Longman Grammar of Spoken and Written English (Biber et al., 1999) has been widely used in studies of lexical bundles (Cortes, 2002, 2004; Hyland, 2008a, 2008b). A revised version of this classification was used in this study (see Table 3.2). In this taxonomy, lexical bundles are divided into twelve major structural categories, listed in Table 3.2. For the purpose of this study, however, the model was slightly modified by grouping these categories into two broader types, phrasal and clausal. Three subcategories of phrasal bundles were distinguished: “Noun Phrase (NP) based,” “Prepositional Phrase (PP) based,” and “Verb Phrase (VP) based.” NP-based bundles are noun phrases with post-modifier fragments, such as the role of the or the way in which; PP-based bundles begin with a preposition followed by a noun-phrase fragment or another prepositional-phrase fragment, such as at the end of or in relation to the. Finally, VP-based bundles are word combinations with a verb component, such as in order to make or was one of the. Clausal bundles, on the other hand, consist of a verb or adjective followed by a to-clause fragment, as in is likely to be, or a verb phrase followed by a that-clause fragment, as in should be noted that. Bundles that incorporate a that-clause (can be seen that), a to-clause (are more likely to), or an adverbial clause (if there is a) are grouped together as clausal. Although Biber et al. (1999) do not divide lexical bundles into phrasal and clausal types, these two broad categories are used in this study, as shown in Table 3.2.
Table 3.2 Structural Types of Lexical Bundles (Biber et al., 1999, p. 1015)
Category                                              Example
A. Phrasal
   1. NP-based
      (connector +) NP with of-phrase fragment        the end of the
      NP with other post-modifier fragment            the way in which
   2. PP-based
      PP with embedded of-phrase fragment             as a result of
      Other prepositional phrase (fragment)           at the same time, on the other hand
   3. VP-based
      Anticipatory it + VP/adjective P + comp. cl.    it is possible to
      Passive verb + PP fragment                      is based on the
      Copula be + noun phrase/adjective phrase        is one of the, is due to the
      Pronoun/NP + be                                 this is not the, there are a number of
B. Clausal
      (verb/adjective +) to-clause fragment           is likely to be, to be able to
      (VP +) that-clause fragment                     should be noted that
      Adverbial clause fragment                       as shown in figure, if there is a
C. Other expressions                                  as well as the
With regard to the functional categorization of the lexical bundles in this study, the
taxonomy designed by Cortes (2002) and improved by Biber and his colleagues (Biber et al.,
2003, 2004 and 2007) was used. In this taxonomy three major categories were distinguished:
“stance bundles”, “discourse organizers,” and “referential expressions” (see Table 3.3).
Stance bundles are groups of words that reveal the writer's attitude, judgment, or perspective toward a proposition, for example in terms of certainty, uncertainty, or ability, as in it is important to, to come up with, or the fact that the. As their name suggests, “discourse organizers” help to structure the text itself. They have various functions, such as introducing a topic and clarifying or elaborating on it (e.g., a little bit about, as well as the). Finally, “referential expressions,” which are very frequent in academic texts, refer to a particular attribute or condition, or to number, amount, size, or quantity. Expressions that convey information about time and place are also included in this broad category. Bundles that express different referential functions in different contexts are classified as multi-functional referential expressions. For example, the bundle at the end of can refer to either place or time, as in “at the end of this paper” and “at the end of the 19th century.”
Table 3.3 Functional classification of lexical bundles (Biber, Conrad and Cortes, 2004, p. 384)
Categories Example
1. Stance Expressions
A. Epistemic Stance
Personal I think it was
Impersonal are more likely to
B. Attitudinal/ Modality Stance
B.1) Desire if you want to
B.2) Obligation/ Directive
Personal you look at the
Impersonal it is necessary to
B.3) Intention/Prediction
Personal what we are going to
Impersonal is going to be
B.4) Ability
Personal to be able to
Impersonal it is possible to
2. Discourse Organizers
A. Topic Introduction/Focus in this chapter we
B. Topic Elaboration/ Clarification on the other hand
3. Referential Expressions
A. Identification/ Focus one of the most
B. Imprecision and things like that
C. Specification of Attributes
C.1) Quantity Specification a lot of people
C.2) Tangible Framing Att. in the form of
C.3) Intangible Framing Att. in the case of
D. Time/Place/Text Reference
D.1) Place Reference in the United States
D.2) Time Reference at the same time
D.3) Text Deixis as shown in Figure N
D.4) Multi-functional Ref. at the end of
This chapter described how the texts were collected and the corpus compiled, and briefly described the software used to analyze these texts. Finally, it introduced the two taxonomies, developed by Biber et al. (1999) and Biber et al. (2004), that are used for the structural and functional analysis of the lexical bundles found in the TSRAC. Based on the data and procedures just described, the next chapter presents the lexical bundles identified in this study together with their structural and functional classifications.
CHAPTER 4: RESULTS AND DISCUSSION
This chapter introduces the lexical bundles identified in the TSRAC. It also presents the results of the quantitative and qualitative analyses and discusses them.
4.1 TSRAC Lexical Bundles
A total of ninety-nine lexical bundles were identified in the TSRAC (see Appendix B for a complete list). The most frequent were on the other hand, the end of the, as well as the, in the case of, and one of the most, all of which have also been identified as frequent lexical bundles in the literature. In the Longman Grammar of Spoken and Written English, Biber et al. (1999) state that the two most common four-word lexical bundles are in the case of and on the other hand, which are also extremely frequent in the TSRAC.
Fourteen of the ninety-nine lexical bundles occurred more than fifty times per million words, which indicates very frequent use of these recurrent expressions. The first nine of these fourteen bundles had also been identified by Biber et al. (2004) and Cortes (2004, 2008). When each bundle was compared individually with the lexical bundles previously identified in the literature, 53 of the 99 bundles turned out not to have been identified before.
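The comparison with earlier studies amounts to a set operation: bundles in the TSRAC list that are absent from the previously reported lists count as new. The Python sketch below illustrates this; the function name `compare_with_literature` and the toy lists are hypothetical, `previous` would in practice be the union of the lists in Biber et al. (2004) and Cortes (2004, 2008), and exact string matching is an assumption (variants such as to the result of and to the results of would count as different bundles). Applied to the figures reported here, 53 new bundles out of 99 gives a share of about 0.54, that is, more than half.

```python
def compare_with_literature(current: set[str],
                            previous: set[str]) -> tuple[set[str], set[str], float]:
    """Split current bundles into shared and new ones; also return the share of new bundles."""
    shared = current & previous  # already reported in earlier studies
    new = current - previous     # absent from every earlier list
    share_new = len(new) / len(current) if current else 0.0
    return shared, new, share_new


# Toy example (hypothetical lists, not the study's data).
tsrac = {"on the other hand", "in the case of", "it was determined that", "the aim of this"}
earlier = {"on the other hand", "in the case of", "the extent to which"}
shared, new, share_new = compare_with_literature(tsrac, earlier)
print(sorted(new), share_new)  # ['it was determined that', 'the aim of this'] 0.5
```

The same set logic, run in the other direction (`previous - current`), lists frequent bundles from earlier studies that do not occur in the TSRAC, such as the extent to which in the toy data.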
As one of the purposes of this study is to find the structural and functional features of the
lexical bundles produced in the TSRAC, in the next two sections the structural and functional
analyses will be presented. These classifications will be followed by a detailed analysis of those
bundles that were exclusively found in the corpus collected for the present study, the TSRAC.
4.2 Structural Analysis of TSRAC Lexical Bundles
First, in line with the arguments of Biber et al. (1999) and Cortes (2004), the lexical bundles found in the TSRAC are not grammatically complete units, as shown by expressions such as one of the most, the end of the, this study was to, and to the results of. Even though lexical bundles are not complete units, they can be grouped according to their structural characteristics. Overall, there are two broad types of lexical bundles, phrasal and clausal. Phrasal bundles are divided into noun phrase-based, prepositional phrase-based, and verb phrase-based subcategories. Clausal bundles, on the other hand, are formed, for example, by a that-clause fragment or by a verb followed by a to-clause fragment. A third group, other expressions, is described by Biber et al. (1999) as “lexical bundles that do not fit neatly into any of the other categories” (p. 1024). As shown in Figure 3, prepositional phrase (PP)-based bundles form the largest group, with forty-eight bundles, followed by thirty-three noun phrase (NP)-based bundles. Examples of PP-based bundles are in the context of, at the time of, in this study the, in line with the, and in terms of their. NP-based bundles include expressions such as aim of this study, results of this study, an increase in the, and the second half of. Verb phrase (VP)-based bundles are relatively rare; examples are it was found that, it is necessary to, participate in the study, and it is possible to. Only two bundles in this corpus contain clausal fragments (CF): that there is a and to be able to. Finally, the bundles classified as other expressions are as well as the, as well as in, and than half of the.
The lexical bundles that had not been identified before are structurally varied, with bundles from every group except CF: PP (in accordance with the, according to the results, of the most important, in line with the); NP (a result of the, the role of the, the purpose of this, the second half of, the establishment of the); VP (it was determined that, to participate in the, were included in the, participate in the study); and other expressions (than half of the, as well as in).
Figure 3. Structural Distribution of TSRAC Lexical Bundles
4.3 Functional Analysis of TSRAC Lexical Bundles
As explained in Chapter 3, the taxonomy used for the functional analysis of lexical bundles was developed by Biber et al. (2004) and includes three broad categories (stance expressions, discourse organizers, and referential expressions), each with various subcategories. Overall, the lexical bundles used by Turkish scholars were found to perform functions similar to those of bundles previously identified in the literature. In addition to the categories from Biber et al. (2004), the referential sub-category of institution bundles (Cortes, 2008) had to be added to classify some bundles in the TSRAC. Moreover, a group of bundles identified in the TSRAC that had not been identified before performed a function that did not match any of the functions in the existing taxonomy. A new category, labeled research referential, was therefore created within referential expressions to classify these expressions. Table 4.1 presents the functional classification of all the four-word lexical bundles identified in the TSRAC.
Table 4.1 Lexical Bundles in the TSRA Corpus according to Their Functions in Context
Category / Sub-category: Bundles
_____________________________________________________________________
Stance
  a) Epistemic Stance
       Personal
       Impersonal: the fact that the, to the fact that*, of the fact that*
  b) Attitudinal/Modality Stance
       Desire
       Obligation/Directive
         Personal
         Impersonal: of the most important*, it is necessary to, the importance of the*
       Intention/Prediction
         Personal: are more likely to
         Impersonal
       Ability
         Personal: to be able to, it is possible to
         Impersonal
Discourse Organizers
  a) Topic Introduction/Focus: in the present study, that there is a, with respect to the
  b) Topic Elaboration/Clarification: on the other hand, as well as the, in accordance with the*,
       it was determined that*, it was found that*, on the one hand, that there was a*,
       with the help of*, as well as in*, were found to be, was found to be, in addition to the
Referential Expressions
  a) Identification/Focus: one of the most, is one of the
  b) Imprecision
  c) Specification of Attributes
       Quantity Specification: the majority of the, the rest of the, the total number of,
         for the first time, than half of the*, the second half of*
       Tangible Framing Attr.: on the part of, in line with the*, the size of the
       Intangible Framing Attr.: in the case of, as a result of, on the basis of, in terms of the,
         a result of the*, the beginning of the, in the context of, the basis of the*,
         an important role in, the case of the*, in terms of their*, the nature of the,
         the course of the, in the form of, an increase in the, Turkish version of the,
         the ways in which, in the number of, the establishment of the*, at the level of*,
         in the face of, in the field of*, the characteristics of the*,
         the relationship between the, the role of the*
  d) Time/Place/Text Reference
       Time Reference: at the same time, at the time of, in the early #s*, the #s and #s,
         in the #s and*, in the late #s, during the course of*, at the end of
       Place/Event Reference: in the Ottoman Empire*, in the city of*
       Text Deixis: in accordance with the*, of this study was*, this study was to*,
         according to the results*, are presented in Table*, of the present study*
       Institution Reference: the Ministry of Education*, of the Ministry of*,
         of the Turkish Republic*, Ministry of National Education*, by the Ministry of*,
         at the university of
       Multi-Func. Reference: the end of the, of the Ottoman Empire*, at the beginning of
  e) Research Reference: to participate in the*, to the result of*, the results of the,
       the aim of this*, purpose of this study*, aim of this study*, the purpose of this*,
       results of this study*, in this study the*, of this study is*, in a study by*,
       of the patients were*, were included in the*, for the purpose of*,
       participate in the study*
_____________________________________________________________________
* Lexical bundles not previously identified in the literature
4.3.1 Stance Bundles:
According to Biber (2006), stance bundles express personal feelings, attitudes, perspectives, and degrees of certainty or uncertainty. They can be divided into two sub-groups: epistemic stance bundles and attitudinal/modality stance bundles. Epistemic stance bundles comment on the certainty or uncertainty of a proposition; in the taxonomy of Biber et al. (2004), they are personal when the stance is explicitly attributed to the writer or speaker (e.g., I think it was) and impersonal otherwise (e.g., are more likely to). Impersonal epistemic stance bundles in the TSRAC include of the fact that and to the fact that, as in the following examples:
Although the researcher is aware of the fact that the universities involve various levels, this
study only deals with the perceptions of the faculty members. (Edu.)
Another bias is related to the fact that a large part of the available evidence pertains to state
intervention in the economy of the capital city, which should not be construed as evidence
of conditions elsewhere in the empire. (Hist.)
The second sub-category of stance bundles is attitudinal/modality stance, which is further divided into four sub-categories: desire, obligation/directive, intention/prediction, and ability. As these names suggest, such bundles express the writer's attitudes. Examples can be found in the following excerpts from the TSRAC:
Furthermore our results on the difference between single and married women clearly
indicate the importance of the gender based division of labor in the household, indicated by
the slower and weaker response of married women to the macroeconomic changes. (Econ.)
She was one of the most important names in mobilizing the women's vote for the party in
the March 1994 local elections, which brought the party to power in major municipalities
including Ankara and Istanbul. (Soc.)
By tracing how these books have been actively appropriated and filtered through the
conceptual grid of prevailing controversies and ongoing events in the national arena, I hope
to be able to say something about the changing contours of the discipline. (Soc.)
4.3.2 Discourse Organizers:
The lexical bundles in this group either introduce a topic or elaborate on or clarify a topic already introduced. Most of the discourse organizers found were used for elaboration and clarification, as in the examples below:
On the other hand, inflation adjustments made after January 1, 2004 will affect the tax
calculation (Pricewaterhousecoopers, 2004b). (Econ.)
Thus, in addition to the effects of demographical and organizational characteristics, the
effects of the variations in cultural orientations were tested by using rigorous analysis
techniques. (Soc.)
4.3.3 Referential Expressions:
Referential expressions form the last broad functional group and, as Table 4.1 shows, the largest one in the TSRAC. As Biber et al. (2004) state, the bundles in this category “generally identify an entity or single out some particular attribute of an entity as especially important” (p. 393).
This group has four sub-categories: identification/focus, imprecision, specification of attributes, and time/place/text reference. Following Cortes (2008), a further sub-group, institution reference, was added to the referential category. Bundles in this sub-group refer to institutions, for example the Ministry of Education*, of the Ministry of*, of the Turkish Republic*, Ministry of National Education*, by the Ministry of*, and at the university of.
Every type of referential bundle except imprecision occurred in the TSRAC. Moreover, further analysis showed that a new category, “research referential,” had to be added to the taxonomy. This new category contains fifteen of the lexical bundles found in the TSRA corpus. Hyland (2008) introduced a research-oriented category, which he described as “helping writers to structure their activities and experiences of the real world” (p. 13), and included in it bundles such as at the beginning of, the role of the, the size of the, and in the present study. Compared with the research referential bundles in the TSRAC, however, his category is very general, and none of the TSRAC research referential bundles except purpose of this study had been identified and included in Hyland's group of research-oriented bundles. Unlike the bundles listed by Hyland, the research referential bundles in the TSRAC refer specifically to the study itself and provide information about its purpose, procedure, results, or participants, as shown in Table 4.1. Research referential bundles also differ from text deixis: text deixis bundles refer to the paper (article or report) that presents the study, not to the investigation. The research referential bundles, by contrast, refer to features of the study rather than to the text itself. Bundles such as were included in the, participate in the study, and to participate in the refer not to the text but to the study, to the actions needed to conduct it, or to the participants involved in it, as shown in the following examples.
Thus, 223 teacher educators and 2,116 prospective teachers were selected from these
schools in May 2005 and invited to participate in the study by completing the
questionnaire. Follow-up questionnaires were sent in June and July 2005 to those who did
not respond to the first query. (Edu.)
Supervisors from eight different cities were included in the survey data. (Edu.)
In this study the respondents completed the same instrument again after four weeks. (Med.)
The aim of this study was to evaluate current use of surgical antibiotic prophylaxis in
Turkish hospitals and to identify factors associated with appropriate prophylaxis. (Med.)
Although some of the patients were not available in the third stage, dissociative disorder
NOS or dissociative identity disorder was confirmed in all of the patients who were
admitted for an evaluation by the study clinician. (Psyc.)
With the sole exception of the results of the, none of these fifteen research referential lexical bundles had been identified before in the literature.
To sum up, Turkish authors used lexical bundles frequently. While some of these bundles had been identified in previous studies, more than half of them had not been identified as frequent lexical bundles in the literature. Even though Turkish scholars used lexical bundles that were not frequently used by native speakers of English in their written production, their writing was successful: all the articles in the TSRAC were published in well-known journals in each of the disciplines included in the present study. This could indicate stylistic variation in the use of lexical bundles between native and non-native speakers of English writing for scholarly publication. It can be concluded that the use of lexical bundles varies between this specific group of non-native speakers of English and native speakers of English in academic settings.
CHAPTER 5. CONCLUSION
The main purpose of this study was to explore the use of four-word lexical bundles in the
research articles written by Turkish scholars in English. After the compilation of the corpus, the
goal was to further analyze the lexical bundles found in comparison with the bundles previously
identified in the related literature. Both structural and functional analyses were completed in
order to highlight any similarities and differences. This chapter summarizes the results by answering the research questions posed earlier, discusses the limitations of the study, and offers implications and suggestions for further research.
5.1 Summary of Results
The first research question concerned the most common four-word lexical bundles in published research articles written by Turkish scholars. The frequency analysis showed that Turkish scholars used ninety-nine frequent lexical bundles in their research articles (see Appendix B for the complete list).
The second research question asked how many of the lexical bundles in the TSRAC agree with those identified by Biber et al. (2004) and Cortes (2004, 2008). The most frequent bundles in the TSRAC largely agreed with those in the literature: the first nine of the fourteen bundles that occurred more than fifty times per million words had been identified before. Conversely, some bundles that Biber et al. (2004) found to be frequent did not occur in the TSRAC, for example in the absence of, the extent to which, in the presence of, and per cent of the. Overall, 53 of the 99 lexical bundles identified in the TSRAC, more than half, had never been identified in previous studies of lexical bundles. Examples of these bundles are in accordance with the, it was determined that, and during the course of.
The last research question aimed to explore the structural and functional features of the lexical bundles found in this study, based on the structural and functional taxonomies developed by Biber et al. (1999, 2004). There is a high level of agreement between the structural types of lexical bundles defined previously and those found in this study: all the lexical bundles found in this corpus fit into the previously defined structural categories. The functional analysis, however, required some modifications. It revealed a group of lexical bundles that did not fit into any of the previously defined groups, and a new category, research referential bundles, was created for them. Interestingly, with the exception of only one expression, these lexical bundles had not been identified in any of the three studies by Biber et al. (2004) and Cortes (2004, 2008).
A possible reason for this discrepancy is that scholars in Turkey may have been taught to use expressions that emphasize the study itself when writing research articles. It could therefore be useful to conduct a content analysis of academic writing classes in Turkey to see whether they focus on fixed expressions used for research purposes in academic prose.
5.2 Limitations
The results of this study should be treated with some caution: because the TSRAC covers
only six academic disciplines, they cannot be generalized to all disciplines. Moreover, because
the structural and functional analyses were conducted qualitatively by hand, some inconsistencies
may have occurred. It should also be noted that some of the disciplines represented in this corpus
(e.g., medicine, economics) had not previously been investigated in studies of lexical bundles.
This may partly explain the group of bundles never before identified in the literature, since
disciplinary conventions give these frequent expressions a high degree of specificity and make
them strongly discipline-bound.
5.3 Implications
From a pedagogical point of view, the findings of this study could inform the design of
more effective academic writing materials. Even though the bundles exclusive to the TSRAC
appeared in successful writing that led to publication, it is still important to raise awareness of
how often, and for which specific purposes, lexical bundles are used in academic writing. As the
findings suggest, lexical bundles constitute an important part of academic prose, and writing
teachers in particular should highlight this.
5.4 Suggestions for Further Research
This study has contributed to the existing knowledge of lexical bundles; however, further
studies are needed on the use of lexical bundles, especially in international settings. As a further
analysis, it would be interesting to compare each bundle found in the TSRAC with the bundles
identified in previous literature to see whether the Turkish authors used the same lexical bundles
in the same way and for the same purposes and functions. Additionally, it would be useful to
survey Turkish scholars to find out whether they are aware of lexical bundles and their
significance in academic writing. It would also be useful to examine the materials used to teach
academic writing in English in this particular setting, to see whether they include the frequent
lexical bundles that had been identified in previous studies but do not occur in this corpus. The
same materials should also be examined for clues to the origin of the lexical bundles that Turkish
scholars use frequently. A corpus-based study of the lexical bundles found in the academic writing
books available to these scholars could be a starting point.
Finally, a study of lexical bundles in research articles written in Turkish could be
conducted and compared with the lexical bundles used in English by the same authors. Such a
comparison could help to determine whether L1 transfer plays a role in lexical bundle use.
REFERENCES
Altenberg, B. (1998). On the phraseology of spoken English: The evidence of recurrent word
combinations. In A. Cowie (Ed.), Phraseology: Theory, analysis, and applications (pp. 99–122).
Oxford: Oxford University Press.
Anthony, L. (2004). AntConc: A Learner and Classroom Friendly, Multi-Platform Corpus
Analysis Toolkit.
Anthony, L. (2007). AntConc 3.2.1w: Freeware corpus analysis toolkit [Computer software].
Retrieved from http://www.antlab.sci.waseda.ac.jp/
Banerjee, S., & Pedersen, T. (2003). Extended gloss overlap as a measure of semantic
relatedness. In Proceedings of IJCAI-03 (pp. 805–810).
Belcher, D. D. (2007). Seeking acceptance in an English-only research world. Journal of Second
Language Writing, 16, 1–22.
Biber, D. (1996). Investigating language use through corpus-based analyses of association
patterns. International Journal of Corpus Linguistics, 1, 171–197.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). The Longman Grammar
of Spoken and Written English. London: Longman.
Biber, D., & Conrad, S. (1999). Lexical bundles in conversation and academic prose. In H.
Hasselgård & S. Oksefjell (Eds.), Out of corpora: Studies in honour of Stig Johansson (pp.
181–190). Amsterdam: Rodopi.
Biber, D., Conrad, S., & Cortes, V. (2003). Lexical bundles in speech and writing: an initial
taxonomy. In A. Wilson, P. Rayson & T. McEnery (Eds.), Corpus linguistics by the Lune:
a festschrift for Geoffrey Leech (pp. 71–93). Frankfurt: Peter Lang.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at ...: Lexical bundles in university
teaching and textbooks. Applied Linguistics, 25, 371–405.
Biber, D. (2006). University language: A corpus-based study of spoken and written registers.
Amsterdam: John Benjamins.
Biber, D., & Barbieri, F. (2007). Lexical bundles in university spoken and written registers.
English for Specific Purposes, 26, 263–286.
Butler, C. (1997). Repeated word combinations in spoken and written text: Some implications
for functional grammar. In C. Butler, J. Connolly, R. Gatward, & M. Wismans (Eds.), A
fund of ideas: Recent developments in functional grammar (pp. 60–77). Amsterdam:
Institute for Functional Research into Language and Language Use.
Charles, M. (2003). 'This mystery...': A corpus-based study of the use of nouns to construct
stance in theses from two contrasting disciplines. Journal of English for Academic
Purposes, 2, 313–326.
Conrad, S. (1996). Investigating academic texts with corpus-based techniques: An example
from biology. Linguistics and Education, 8, 299–326.
Cortes, V. (2002). Lexical bundles in freshman composition. In R. Reppen, S. M. Fitzmaurice, &
D. Biber (Eds.), Using corpora to explore linguistic variation (pp. 131–145). Amsterdam:
John Benjamins.
Cortes, V. (2004). Lexical bundles in published and student disciplinary writing: Examples from
history and biology. English for Specific Purposes, 23, 397–423.
Cortes, V. (2006). Teaching lexical bundles in the disciplines: An example from a writing
intensive history class. Linguistics and Education, 17, 391–406.
Cortes, V. (2008). A comparative analysis of lexical bundles in academic history writing in
English and Spanish. Corpora, 3, 43–57.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34, 213–238.
Crompton, P. (1997). Hedging in academic writing: Some theoretical problems. English for
Specific Purposes, 16, 271–287.
De Cock, S. (1998). A recurrent word combination approach to the study of formulae in the
speech of native and non-native speakers of English. International Journal of Corpus
Linguistics, 3, 59–80.
Eaton, H. (1940). An English-French-German-Spanish Word Frequency Dictionary. New
York, NY: Dover Publications.
Ferguson, G. (2001). If you pop over there: A corpus-based study of conditionals in medical
discourse. English for Specific Purposes, 20, 61–82.
Flowerdew, J. (Ed.). (2002). Academic Discourse. New York, NY: Longman.
Fries, C., & Traver, A. (1940). English word lists: A study of their adaptability for instruction.
Washington, DC: American Council on Education.
Ghadessy, M. (1995). Thematic development and its relationship to registers and genres. In M.
Ghadessy (Ed.), Thematic development in English texts (pp. 105–128). London: Pinter.
Grabe, W., & Kaplan, R. B. (1997). On the writing of science and the science of writing: Hedging
in science text and elsewhere. In R. Markkanen & H. Schröder (Eds.), Hedging and
Discourse: Approaches to the Analysis of a Pragmatic Phenomenon in Academic Texts
(pp. 151–167). Berlin: Walter de Gruyter & Co.
Granger, S. (1998). Prefabricated patterns in advanced EFL writing: Collocations and formulae.
In A. Cowie (Ed.), Phraseology: Theory, analysis, and applications (pp. 145–160).
Oxford: Oxford University Press.
Granger, S., & Meunier, F. (Eds.). (2008). Phraseology: An interdisciplinary perspective.
Amsterdam: John Benjamins.
Halliday, M. A. K. (1993a). The construction of knowledge and value in the grammar of
scientific discourse: Charles Darwin's The origin of the species. In M. A. K. Halliday &
J. R. Martin (Eds.), Writing science: Literacy and discursive power (pp. 86–107). London:
The Falmer Press.
Halliday, M. A. K. (1993b). On the language of physical science. In M. A. K. Halliday &
J. R. Martin (Eds.), Writing science: Literacy and discursive power (pp. 54–68). London:
The Falmer Press.
Hewings, M. (Ed.). (2001). Academic Writing in Context: Implications and Applications.
Birmingham: The University of Birmingham Press.
Holmes, J. (1986). Doubt and certainty in ESL textbooks. Applied Linguistics, 9, 21–43.
Hunston, S. (1995). A corpus study of some English verbs of attribution. Functions of Language,
2, 133–158.
Hyland, K. (1994). Hedging in academic writing and EAP textbooks. English for Specific
Purposes, 13, 239–256.
Hyland, K. (1996a). Talking to the academy: Forms of hedging in science research articles.
Written Communication, 13, 251–281.
Hyland, K. (1996b). Writing without conviction? Hedging in science research articles. Applied
Linguistics, 17, 433–454.
Hyland, K. (1998). Hedging in Scientific Research Articles. Amsterdam/Philadelphia: John
Benjamins Publishing Company.
Hyland, K. (2003). Second language writing. Cambridge: Cambridge University Press.
Hyland, K. (2008a). As can be seen: Lexical bundles and disciplinary variation. English for
Specific Purposes, 27, 4–21.
Hyland, K. (2008b). Academic clusters: Text patterning in published and postgraduate writing.
International Journal of Applied Linguistics, 18, 41–62.
Kading, J. (1879). Häufigkeitswörterbuch der deutschen Sprache. Steglitz: privately published.
Kim, Y. (2009). Korean lexical bundles in conversation and academic texts. Corpora, 4,
135–165.
Martinez, A. I. (2003). Aspects of theme in the method and discussion sections of biology
journal articles in English. Journal of English for Academic Purposes, 2, 103–123.
McEnery, T., & Wilson, A. (1996). Corpus linguistics. Edinburgh: Edinburgh University
Press.
Meyer, P. G. (1997). Hedging strategies in written academic discourse: Strengthening the
argument by weakening the claim. In R. Markkanen & H. Schröder (Eds.), Hedging and
Discourse: Approaches to the Analysis of a Pragmatic Phenomenon in Academic Texts
(pp. 21–41). Berlin: Walter de Gruyter & Co.
Myers, G. (1989). The pragmatics of politeness in scientific articles. Applied Linguistics, 10,
1–35.
Myers, G. (1990). Writing biology: Texts in the Social Construction of Scientific Knowledge.
Madison, WI: University of Wisconsin Press.
Moon, R. (1998). Fixed Expressions and Idioms in English. Oxford: Oxford University Press.
Nation, I. S. P. (1990). Teaching and Learning Vocabulary. New York, NY: Newbury House.
Nation, I. S. P. (2001). Learning Vocabulary in Another Language. Cambridge: Cambridge
University Press.
Nattinger, J. R., & DeCarrico, J. S. (1992). Lexical phrases and language teaching. Oxford:
Oxford University Press.
Neely, E., & Cortes, V. (2009). A little bit about: Analyzing and teaching lexical bundles in
academic lectures. Language Value, 1, 17–38. Retrieved from
http://www.e-revistes.uji.es/languagevalue
Nesi, H., & Basturkmen, H. (2006). Lexical bundles and discourse signaling in academic
lectures. International Journal of Corpus Linguistics, 11, 283–304.
Pawley, A., & Syder, F. (1983). Two puzzles for linguistic theory: Nativelike selection and
nativelike fluency. In J. Richards & R. Schmidt (Eds.), Language and communication (pp.
191–226). London: Longman.
Preyer, W. (1889). The Mind of a Child. New York, NY: Appleton.
Rica-Peromingo, J. P. (2009). The use of lexical bundles in the written production of Spanish
EFL university students. Applied Linguistics for Specialized Discourse. Conference
Proceedings (pp. 1–7). Riga: University of Latvia Publishing.
Salager-Meyer, F. (1992). A text-type and move analysis study of verb tense and modality
distribution in medical English abstracts. English for Specific Purposes, 11, 93–113.
Salager-Meyer, F. (1994). Hedges and textual communicative function in medical English
written discourse. English for Specific Purposes, 13, 149–170.
Schmitt, N., & McCarthy, M. (Eds.). (1997). Vocabulary: Description, Acquisition and
Pedagogy. Cambridge: Cambridge University Press.
Schmitt, N., Grandage, S., & Adolphs, S. (2004). Are corpus-derived clusters
psycholinguistically valid? In N. Schmitt (Ed.), Formulaic sequences (pp. 127–151).
Amsterdam: Benjamins.
Schmitt, N., & Carter, R. (2004). Formulaic sequences in action: An introduction. In
N. Schmitt (Ed.), Formulaic sequences: Acquisition, processing and use (pp. 1–22).
Amsterdam: John Benjamins.
Scott, M., & Tribble, C. (Eds.). (2006). Textual Patterns: Key Words and Corpus Analysis in
Language Education. Amsterdam and Philadelphia: John Benjamins.
Silver, M. (2003). The stance of stance: A critical look at ways stance is expressed and modeled
in academic discourse. Journal of English for Academic Purposes, 2, 359–374.
Simpson, R. (2004). Stylistic features of academic speech: The role of formulaic expressions. In
T. Upton & U. Connor (Eds.), Discourse in the professions: Perspectives from corpus
linguistics (pp. 37–64). Amsterdam: John Benjamins.
Sinclair, J. (1991). Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Stubbs, M. (2007a). An example of frequent English phraseology: Distribution, structures and
functions. In R. Facchinetti (Ed.), Corpus Linguistics 25 years on (pp. 89–105).
Amsterdam: Rodopi.
Stubbs, M. (2007b). Quantitative data on multi-word sequences in English: The case of the word
'world'. In M. Hoey, M. Mahlberg, M. Stubbs & W. Teubert (Eds.), Text, Discourse and
Corpora: Theory and Analysis (pp. 163–189). London: Continuum.
Vande Kopple, W. J. (1992). Noun phrases and the style of scientific discourse. In S. P. Witte, N.
Nakadate & R. D. Cherry (Eds.), A rhetoric of doing: Essays on written discourse in honor
of James L. Kinneavy (pp. 328–348). Carbondale, IL: Southern Illinois University Press.
Varttala, T. (2003). Hedging in scientific research articles: A cross-disciplinary study. In G.
Cortese & P. Riley (Eds.), Domain-Specific English: Textual Practices across
Communities and Classrooms (pp. 141–174). New York, NY: Peter Lang.
Wray, A. (2000). Formulaic sequences in second language teaching: principle and practice.
Applied Linguistics, 21, 463–489.
Wray, A. (2002). Formulaic language and the lexicon. Cambridge: Cambridge University Press.
Zamel, V. (1998). Questioning academic discourse. In V. Zamel & R. Spack (Eds.),
Negotiating academic literacies: Teaching and learning across languages and cultures
(pp. 187–197). Mahwah, NJ: Erlbaum.
APPENDIX A: Journals Used in the TSRAC
Economics Journals
1. Critical Perspectives on Accounting
2. Disasters
3. Eastern European Economics
4. Economic Development and Cultural Change
5. Economic Modelling
6. Energy Economics
7. European Economic Review
8. International Journal of Urban and Regional Research
9. International Research Journal of Finance and Economics
10. Journal of Productivity Analysis
11. Journal of Asian Economics
12. Journal of Business & Economic Statistics
13. Physica
14. Public Choice
15. Review of International Political Economy
16. Russian and East European Finance and Trade
17. Small Business Economics
18. The Canadian Journal of Economics
19. Water Resources Development
20. World Development
Education Journals
1. Asia-Pacific Journal of Teacher Education
2. Education
3. Education Media International
4. Educational Studies in Mathematics
5. Educational Technology & Society
6. Educational Technology & Society
7. Environmental Education Research
8. European Journal of Education
9. Higher Education
10. International Research in Geographical and Environmental Education
11. International Review of Education
12. Internet and Higher Education
13. Journal of Adolescent & Adult Literacy
14. Journal of Documentation
15. Journal of Education for Teaching
16. Journal of Instructional Psychology
17. Models of Teacher Education
18. Religious Education
19. Review of Education
20. The Journal of Educational Research
Psychology Journals
1. Addictive Behaviors
2. Adolescence
3. Applied Developmental Psychology
4. Archives of Psychiatric Nursing
5. Child Abuse & Neglect
6. Children and Youth Services Review
7. Comprehensive Psychiatry
8. Eating Behaviors
9. European Neuropsychopharmacology
10. Issues in Mental Health Nursing
11. Journal of Applied Developmental Psychology
12. Journal of Clinical Forensic Medicine
13. Journal of Criminal Justice
14. Journal of Environmental Psychology
15. Journal of Loss and Trauma
16. Journal of Psychiatric and Mental Health Nursing
17. Journal of Psychiatric Research
18. Journal of Psychology
19. Journal of Psychosomatic Research
20. Learning and Individual Differences
21. Psychiatry and Clinical Neurosciences
22. Psychiatry Research
23. Social Psychiatry and Psychiatric Epidemiology
24. Social Behavior and Personality
25. Social Science and Medicine
26. Technological Forecasting & Social Change
27. The Journal of Experimental Education
28. The Social Science Journal
Medicine Journals
1. Applied Developmental Psychology
2. Applied Nursing Research
3. Clinical Infectious Diseases
4. Culture, Health & Sexuality
5. European Journal of Epidemiology
6. European Journal of Oncology Nursing
7. Infection Control and Hospital Epidemiology
8. International Journal of Nursing Studies
9. Journal of Clinical Forensic Medicine
10. Journal of Midwifery & Women's Health
11. Journal of Professional Nursing
12. Journal of the Association of Nurses in AIDS Care
13. Nurse Education in Practice
14. Nurse Education Today
15. Pediatrics International
16. Quality of Life Research
17. Reproductive Health Matters
18. Safety Science
19. Social Science & Medicine
20. Social Indicators Research
21. Technological Forecasting & Social Change
22. The European Journal of Health Economics
23. The Journal of Infectious Diseases
24. Tobacco Control
History Journals
1. International Journal of Middle East Studies
2. Journal of Contemporary History
3. Journal of Interdisciplinary History
4. Journal of Social History
5. Journal of the Economic and Social History of the Orient
6. Law & Society Review
7. Middle Eastern Studies
8. The International History Review
9. The Journal of Economic History
Sociology Journals
1. Comparative Politics
2. Comparative Studies in Society and History
3. Contemporary Sociology
4. Ethnology
5. European Journal of Population
6. Fashion Theory
7. Feminist Studies
8. Human Studies
9. International Labor and Working-Class History
10. Journal of Black Studies
11. Journal of Law, Economics, & Organization
12. Journal of Medical Ethics
13. Law & Society Review
14. Middle East Journal
15. Middle Eastern Studies
16. Political Psychology
17. Social Indicators Research
18. Women's Studies Quarterly
APPENDIX B: TSRAC Lexical Bundles
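Bundles marked with an asterisk (*) had not been identified in previous studies of lexical bundles.
Frequency TSRAC Lexical Bundles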
44 a result of the*
36 according to the results*
30 aim of this study*
30 an important role in
23 an increase in the
23 are more likely to
29 are presented in Table*
64 as a result of
34 as well as in*
88 as well as the
22 at the beginning of
63 at the end of
21 at the level of*
57 at the same time
31 at the time of
20 at the University of
23 by the Ministry of*
21 during the course of*
25 for the first time
20 for the purpose of*
24 in a study by*
49 in accordance with the*
28 in addition to the
27 in line with the*
59 in terms of the
27 in terms of their*
72 in the case of
25 in the city of*
33 in the context of
28 in the early s*
21 in the face of
21 in the field of*
24 in the form of
22 in the late s
22 in the number of
37 in the Ottoman Empire*
41 in the present study
23 in the #s and*
27 in this study the*
52 is one of the
24 it is necessary to
21 it is possible to
27 it was determined that*
27 it was found that*
25 Ministry of National Education*
26 of the fact that*
26 of the Ministry of*
35 of the most important*
48 of the Ottoman Empire*
21 of the patients were*
24 of the present study*
26 of the Turkish Republic*
27 of this study is*
43 of this study was*
60 on the basis of
24 on the one hand
151 on the other hand
29 on the part of
67 one of the most
20 participate in the study*
31 purpose of this study*
29 results of this study*
23 than half of the*
30 that there is a
20 that there was a*
34 the aim of this*
32 the basis of the*
34 the beginning of the
30 the case of the*
21 the characteristics of the*
26 the course of the
107 the end of the
22 the establishment of the*
53 the fact that the
20 the importance of the*
40 the majority of the
30 the Ministry of Education*
27 the nature of the
30 The purpose of this*
21 the relationship between the
37 the rest of the
35 the results of the
32 the role of the*
25 the s and s
23 the second half of*
23 the size of the
31 the total number of
23 the ways in which
40 this study was to*
26 to be able to
36 to participate in the*
27 to the fact that*
36 to the results of*
24 Turkish version of the*
54 was found to be
34 were found to be
21 were included in the*
22 with respect to the
20 with the help of*