  • SEBASTIAN DUNAT

    Akademia Techniczno-Humanistyczna w Bielsku-Białej

    Vocabulary analysis: A corpus based study of “analyze” clusters

    and collocates in academic and spoken discourse

    Key words: corpus linguistics, academic discourse, spoken discourse, clusters, collocates,

    quantitative statistical surveys

    Słowa klucze: językoznawstwo korpusowe, dyskurs akademicki, dyskurs mówiony, zbitki wyrazowe,
    kolokaty, ilościowe badania statystyczne

    Introduction

    Corpora are analyzed in various ways to uncover linguistic information, such as the frequency
    of certain keywords or collocations, using either an online corpus search engine, where one is
    available, or programs designed for this purpose. As Jane Sunderland states:
    A corpus is a representative, substantial body of semantically collected and recorded data,
    spoken or written, which is normally electronically stored as text on a PC.1 A corpus might be
    labeled not only with syntactic or lexical features, but also with speaker or text features.
    Various corpora are available online; they constitute a great library of examples, and their
    data can be analyzed with any linguistic tool. Some corpora provide a search engine for easier
    data acquisition.

    Corpus linguistics obtains and analyzes large quantities of data and tries to answer research
    questions which may concern words or grammatical structures, the frequency of their use, how
    they link with other words or structures, and their range of possible meanings.2 According to
    Biber, Conrad and Reppen,3 the characteristics of corpus-based analysis are:

    ▪ Empirical analysis of patterns of language use in natural texts

    ▪ Development of a corpus, or a large collection of natural texts

    ▪ Use of computers for a wide range of analysis techniques (e.g. automatic or interactive)

    ▪ Reliance on both quantitative and qualitative analysis

    1 Sunderland, J., Language and Gender: An advanced resource book. London: Routledge, 2006, p. 56.
    2 Ibid.
    3 Biber, D., Conrad, S., Reppen, R., Corpus Linguistics: Investigating language structure and use. Cambridge: CUP, 1998, p. 4.

    Method

    Computational linguistics practical tasks

    The development of communication between humans and computers in all areas of linguistic
    analysis led Hausser4 to present several practical tasks of computational linguistics, although
    the list is not complete and remains open to discussion:

    ▪ Indexing and retrieval in textual databases

    ▪ Machine translation

    ▪ Automatic text production

    ▪ Automatic text checking

    ▪ Automatic content analysis

    ▪ Automatic tutoring

    ▪ Automatic dialog and information systems

    The first of these practical tasks concerns textual databases, which consist of various kinds
    of electronically stored data (texts, sentences, word frequencies). Ease of access makes such
    databases a great tool for researchers interested in any type of text or passage relevant to
    their specific analysis. The biggest freely available database is the World Wide Web, but its
    unstructured form might make it difficult to obtain precise data.5 Second, machine translation
    has remarkable potential to make research easier through the automatic or semi-automatic
    translation of research articles from around the world. Third, precise linguistic knowledge
    might improve automatic text production and help to create various forms of highly flexible and
    interactive systems.6 It might be applied to the modification of maintenance manuals for new
    product lines, or of product descriptions. Automatic text checking, the fourth task, serves a
    variety of computer applications, for example simple spelling auto-correction. Moreover, there
    are word form recognition programs and syntax error checking applications based on syntactic
    parsers.7 The fifth practical task, automatic content analysis, may provide summaries of
    literature, even in specialized fields such as science or economics. Automatic content analysis
    is a precondition for concept-based indexing, needed for accurate retrieval from textual
    databases, as well as for adequate machine translation.8 The sixth task, automatic tutoring, can
    provide interactive online systems for foreign language teaching and practice. Furthermore, such
    systems could provide data on students' errors or on the amount of time needed to complete
    various exercises.9 Automatic tutoring opens a new field of research in which textbooks are
    replaced by an electronic medium that assists in teaching and learning.10 The last task uses
    information systems to provide automatic information services, for example bus and train
    schedules, tax consulting, or medical databases.11

    4 Hausser, R., Computational linguistics: Human-Computer Communication in Natural Language (3rd Ed.). Springer, 2014, p. 30.
    5 Ibid.
    6 Ibid.

    Discourse definition

    According to Mary Bucholtz, there are two definitions of discourse. The first, formal
    definition derives from the organization of linguistic units: by analogy with the definitions
    of morphology and syntax, discourse is the linguistic level in which sentences are combined
    into larger units.12 The alternative definition, on the other hand, focuses on discourse as
    language used in context: language as it is put to use in social situations, not the more
    idealized and abstracted linguistic forms that are the central concern of much linguistic
    theory.13

    On the basis of these definitions, Bucholtz defines discourse analysis as a collection of
    perspectives on situated language use that involve a general shared theoretical orientation and
    a broadly methodological approach.14

    7 Ibid.
    8 Ibid., p. 31.
    9 Ibid.
    10 Ibid.
    11 Ibid.
    12 Bucholtz, M., “Theories of Discourse as Theories of Gender: Discourse Analysis in Language and Gender Studies.” The Handbook of Language and Gender. Blackwell Publishing Ltd., 2003, p. 43.
    13 Ibid.
    14 Ibid., p. 45.

    Present Study

    The aim of this study was to examine the variety of collocates and clusters used with the verb
    analyze in two discourses, academic and spoken, between the years 2010 and 2015. The material
    used for the research comes from the Corpus of Contemporary American English (COCA), available
    online. Three hundred examples were collected in total, 150 for each of the fields, categorized
    by the corpus as spoken discourse and academic discourse. The study described the corpus data
    and classified it in terms of the following selected categories:

    ▪ Year

    ▪ 2-word cluster

    ▪ 3-word cluster

    ▪ Collocates

    The researcher would like to check whether:

    ▪ There is an evident difference in the quantitative use of the studied verb's collocates
    between the researched discourses, with a significant advantage of one of the collocate types
    (in the number of tokens used).

    ▪ There is evident variation in the use of tokens with a classified type of cluster across the
    presented years and fields.

    ▪ There is a quantitative difference in the distribution of sentences containing the studied
    types of clusters or collocates across the diachronic spectrum, with a predominance of one
    cluster/collocate type in at least one of the studied years.

    Additionally, Pearson’s chi-squared tests are carried out to verify the following hypotheses:

    ▪ One of the parts of speech clustered with the tested verb will have a higher frequency of
    use in at least one of the researched fields.

    ▪ The distribution of specific collocates will be greater for at least one of the studied
    fields. Moreover, this will be supported by a statistically significant result (p-value less
    than 0.05).
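As an illustration of how a Pearson's chi-squared test of this kind can be computed, the following minimal Python sketch tests a 2x2 contingency table for independence. The function name is hypothetical, and the choice of table is an assumption made only for this example: it takes the frequencies of the collocates to and the in the spoken and academic lists of Table 3, which is not necessarily the comparison the study itself tested. No continuity correction is applied.

```python
import math


def chi_squared_2x2(table):
    """Pearson's chi-squared test of independence for a 2x2 table.

    `table` is [[a, b], [c, d]]; returns (chi-squared statistic, p-value).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # With 1 degree of freedom the chi-squared survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Illustrative table (assumption): rows are the collocates "to" and "the",
# columns are the spoken and academic frequencies from Table 3.
observed = [[89, 118],
            [77, 94]]
chi2, p = chi_squared_2x2(observed)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}, significant at 0.05: {p < 0.05}")
```

A result counts as statistically significant under the criterion stated above only when the p-value falls below 0.05.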

    Research

    Corpus Description

    The present research divided the corpus into two discourse fields: spoken and academic.
    Additionally, it divided the data diachronically into six yearly categories (2010 to 2015);
    each combination of field and year has the same number of examples (25). Graph 1 visualizes the
    distribution of the corpus data.

    Graph 1 Corpus data distribution

    The following tables present the corpus clusters and collocates obtained with the AntConc
    software for the purpose of this research.
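As a rough illustration of what a cluster list contains, the following Python sketch counts the 2-word and 3-word sequences that begin with a node word. The tokenizer, the function names and the sample sentences are assumptions made for this example; they do not reproduce AntConc's settings or the COCA data. In particular, this simplified tokenizer discards punctuation, whereas the cluster lists in the tables keep it.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def clusters(tokens, node="analyze", size=2):
    """Count clusters of `size` words that start with the node word."""
    counts = Counter()
    for i, token in enumerate(tokens):
        # Only keep clusters that fit entirely inside the token list.
        if token == node and i + size <= len(tokens):
            counts[" ".join(tokens[i:i + size])] += 1
    return counts


# Invented sample text (assumption), not taken from the corpus.
sample = (
    "We will analyze the week in politics. "
    "They analyze the data and analyze the results. "
    "Can you analyze it for me?"
)
tokens = tokenize(sample)
two_word = clusters(tokens, size=2)
three_word = clusters(tokens, size=3)

for cluster, freq in two_word.most_common():
    print(freq, cluster)
for cluster, freq in three_word.most_common():
    print(freq, cluster)
```

Sorting by frequency, as `most_common()` does, gives the same kind of ranked list as the Freq. and Cluster columns in the tables.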

    Table 1 2-word clusters for spoken and academic discourse

    (Each cell gives Freq. followed by Cluster.)

    Rank | Spoken 2010 | Spoken 2011 | Academic 2010 | Academic 2011
    1 | 5 analyze the | 10 analyze the | 4 analyze the | 11 analyze the
    2 | 2 analyze - critically | 2 analyze these | 3 analyze their | 2 analyze a
    3 | 2 analyze history | 2 analyze. ! rep | 2 analyze data | 2 analyze and
    4 | 2 analyze what | 1 analyze all | 1 analyze and | 2 analyze menu
    5 | 2 analyze. (begin | 1 analyze for | 1 analyze arguments | 2 analyze them
    6 | 1 analyze a | 1 analyze old | 1 analyze as | 1 analyze duration
    7 | 1 analyze and | 1 analyze president | 1 analyze commonalities | 1 analyze family
    8 | 1 analyze both | 2 analyze that | 1 analyze concrete | 1 analyze pending
    9 | 1 analyze her | 1 analyze today | 1 analyze differences | 1 analyze specific
    10 | 1 analyze his | 1 analyze volcanic | 1 analyze excavated | 1 analyze through
    11 | 1 analyze it | 1 analyze wedding | 1 analyze how | 1 analyze variation
    12 | 1 analyze seafood | 1 analyze. ! bill | 1 analyze human | 1 analyze your
    13 | 1 analyze that | 1 analyze. joy | 1 analyze if | 1 analyze. some
    14 | 1 analyze their | | 1 analyze its |
    15 | 1 analyze these | | 1 analyze public |
    16 | 1 analyze this | | 1 analyze such |
    17 | 1 analyze those | | 1 analyze validly |
    18 | 1 analyze with | | 1 analyze why |
    19 | 1 analyze, fox | | 1 analyze, diagram |

    Rank | Spoken 2012 | Spoken 2013 | Academic 2012 | Academic 2013
    1 | 10 analyze the | 7 analyze it | 8 analyze the | 5 analyze the
    2 | 3 analyze it | 4 analyze the | 3 analyze their | 2 analyze how
    3 | 1 analyze and | 2 analyze and | 3 analyze and | 1 analyze a
    4 | 1 analyze attorney | 2 analyze that | 2 analyze data | 1 analyze and
    5 | 1 analyze brains | 2 analyze them | 1 analyze bigger | 1 analyze as
    6 | 1 analyze by | 2 analyze what | 1 analyze complete | 1 analyze canadian
    7 | 1 analyze from | 1 analyze - dee | 1 analyze image | 1 analyze each
    8 | 1 analyze his | 1 analyze details | 1 analyze metrics | 1 analyze factors
    9 | 1 analyze what | 1 analyze for | 1 analyze moderators | 1 analyze hiv
    10 | 1 analyze why | 1 analyze how | 1 analyze my | 1 analyze laboratory
    11 | 1 analyze yourself | 1 analyze their | 1 analyze operations | 1 analyze meaningful
    12 | 1 analyze, attorneys | 1 analyze your | 1 analyze postcard | 1 analyze multiple
    13 | 1 analyze, we | | 1 analyze student | 1 analyze performance
    14 | 1 analyze. pinsky | | | 1 analyze specific
    15 | | | | 1 analyze text
    16 | | | | 1 analyze their
    17 | | | | 1 analyze them
    18 | | | | 1 analyze what
    19 | | | | 1 analyze, explain
    20 | | | | 1 analyze. beyond

    Rank | Spoken 2014 | Spoken 2015 | Academic 2014 | Academic 2015
    1 | 6 analyze the | 7 analyze the | 10 analyze the | 8 analyze the
    2 | 2 analyze, fox | 2 analyze it | 1 analyze a | 2 analyze data
    3 | 1 analyze (ph | 2 analyze this | 1 analyze and | 2 analyze teachers
    4 | 1 analyze all | 2 analyze what | 1 analyze data | 2 analyze and
    5 | 1 analyze both | 1 analyze -- rachel | 1 analyze each | 1 analyze classroom
    6 | 1 analyze doha | 1 analyze all | 1 analyze hi | 1 analyze content
    7 | 1 analyze gregory | 1 analyze anything | 1 analyze how | 1 analyze film
    8 | 1 analyze him | 1 analyze exactly | 1 analyze possible | 1 analyze hypotheses
    9 | 1 analyze in | 1 analyze his | 1 analyze research | 1 analyze information
    10 | 1 analyze it | 1 analyze human | 1 analyze risk | 1 analyze relationships
    11 | 1 analyze that | 1 analyze my | 1 analyze students | 1 analyze surveillance
    12 | 1 analyze this | 1 analyze probabilities | 1 analyze their | 1 analyze their
    13 | 1 analyze to | 1 analyze these | 1 analyze treatment | 1 analyze this
    14 | 1 analyze what | 1 analyze where | 1 analyze, figured | 1 analyze unknown
    15 | 1 analyze when | 1 analyze why | 1 analyze, interpret | 1 analyze, critically
    16 | 1 analyze where | 1 analyze, experts | 1 analyze; at |
    17 | 1 analyze whether | 1 analyze. (begin | |
    18 | 1 analyze, does | 1 analyze. army | |
    19 | 1 analyze. caution | | |

    Table 1 presents 2-word clusters for spoken and academic discourse in the years 2010 to 2015.
    In every year the most frequent cluster combines analyze with the definite article (the),
    except for 2013 in the spoken discourse, where analyze with the pronoun it is the most frequent
    2-word cluster. The indefinite article (a) occurs only once in the spoken discourse (2010),
    whereas it is visible in the academic discourse in 2011, 2013 and 2014. The pronouns his/her
    occur in the spoken discourse in 2010, 2012 and 2015, and him occurs in 2014. Analyze it occurs
    in the spoken discourse in 2010, 2012, 2013, 2014 and 2015. Moreover, demonstratives are common
    in 2-word clusters with analyze, and they occur more frequently in the spoken discourse. This
    is listed in the academic discourse in 2015. All four demonstratives are visible in the spoken
    2-word clusters of 2010. The years 2011, 2014 and 2015 use two demonstratives each. The year
    2012 has none, and in 2013 only the demonstrative that is used. Analyze and is visible on both
    lists, spoken and academic, but it occurs more frequently on the academic list. The years 2011,
    2014 and 2015 of the spoken discourse do not use and in any of the 2-word clusters.
    Furthermore, the pronouns used in the 2-word clusters in the spoken discourse are: their in
    2010; yourself and we in 2012; them, their and your in 2013; him in 2014; and my in 2015. On
    the academic discourse list the following are more frequent: their, its (2010); them, your
    (2011); their, my (2012); their, them (2013); their (2014 and 2015). The spoken discourse used
    wh-adverbs more often than the academic one. What, when, where and whether are used in 2014;
    what, where and why in 2015; what and why in 2012; what in 2010 and 2013. In the academic
    discourse only what and why are used, in 2013 and 2010 respectively. By, from, for, with, in
    and to are on the spoken discourse list, and none of them occurs on the academic list. As, if,
    how and at are more frequent in the academic discourse clusters; with the exception of how,
    which occurs on the 2013 spoken list, none of them is present in the spoken discourse in the
    researched years. Analyze each occurs on the academic list (2013, 2014), while analyze both
    occurs on the spoken discourse list (2010, 2014). Analyze data (2010, 2012, 2014, 2015) is
    visible only on the academic list, and analyze all (2011, 2014, 2015) can be seen only on the
    spoken discourse list. Some and such are present on the academic list, in 2011 and 2010
    respectively; they do not occur on the spoken discourse list.

    Table 2 3-word clusters for spoken and academic discourse

    (Each cell gives Freq. followed by Cluster.)

    Rank | Spoken 2010 | Spoken 2011 | Academic 2010 | Academic 2011
    1 | 2 analyze - critically analyze | 2 analyze the week | 1 analyze and solve | 2 analyze menu) to
    2 | 2 analyze history from | 2 analyze the weeks | 1 analyze arguments and | 2 analyze the recorded
    3 | 2 analyze. (begin-video | 2 analyze these things | 1 analyze as many | 1 analyze a 2-stage
    4 | 1 analyze a political | 1 analyze all this | 1 analyze commonalities in | 1 analyze a data
    5 | 1 analyze and dissect | 1 analyze for me | 1 analyze concrete historical | 1 analyze and implement
    6 | 1 analyze both sides | 1 analyze old data | 1 analyze data regarding | 1 analyze and organize
    7 | 1 analyze her appearances | 1 analyze president obama | 1 analyze data. results | 1 analyze duration. somewhat
    8 | 1 analyze his diet | 1 analyze that speech | 1 analyze differences between | 1 analyze family homelessness
    9 | 1 analyze it to | 1 analyze the contenders | 1 analyze excavated artifacts | 1 analyze pending tax
    10 | 1 analyze seafood samples | 1 analyze the cost | 1 analyze how housing | 1 analyze specific gas
    11 | 1 analyze that car | 1 analyze the entire | 1 analyze human remains | 1 analyze the case
    12 | 1 analyze the -- the | 1 analyze the records | 1 analyze if and | 1 analyze the country
    13 | 1 analyze the motivation | 1 analyze the situation | 1 analyze its broader | 1 analyze the magazines
    14 | 1 analyze the overnight | 1 analyze the very | 1 analyze public spheres | 1 analyze the music
    15 | 1 analyze the press | 1 analyze today's | 1 analyze such hybridity | 1 analyze the narrative
    16 | 1 analyze the weeks | 1 analyze volcanic moon | 1 analyze the data | 1 analyze the pairwise
    17 | 1 analyze their expressions | 1 analyze wedding rituals | 1 analyze the implications | 1 analyze the planning
    18 | 1 analyze these numbers | 1 analyze. ! bill-maher | 1 analyze the interaction | 1 analyze the results
    19 | 1 analyze this flow | 1 analyze. ! rep-nancy | 1 analyze the transcript | 1 analyze the value
    20 | 1 analyze those reports | 1 analyze. ! rep-peter | 1 analyze their business | 1 analyze them as
    21 | 1 analyze what the | 1 analyze. joy-behar | 1 analyze their long | 1 analyze them to
    22 | 1 analyze what's | 1 analyze. that would | 1 analyze their own | 1 analyze through perform
    23 | 1 analyze with karl | | 1 analyze validly. in | 1 analyze variation among
    24 | 1 analyze, fox news | | 1 analyze why, after | 1 analyze your motivation
    25 | | | 1 analyze, diagram, and | 1 analyze. some brief

    Rank | Spoken 2012 | Spoken 2013 | Academic 2012 | Academic 2013
    1 | 5 analyze the week | 2 analyze it. so | 1 analyze and improve | 2 analyze the link
    2 | 1 analyze and investigate | 1 analyze - dee dee | 1 analyze and interpret | 1 analyze a large
    3 | 1 analyze attorney general | 1 analyze and debate | 1 analyze bigger and | 1 analyze and evaluate
    4 | 1 analyze brains in | 1 analyze and write | 1 analyze complete genomes | 1 analyze as well
    5 | 1 analyze by the | 1 analyze details of | 1 analyze data and | 1 analyze canadian identity
    6 | 1 analyze from the | 1 analyze for you | 1 analyze data; and | 1 analyze each clause
    7 | 1 analyze his disturbed | 1 analyze how to | 1 analyze image-space | 1 analyze factors that
    8 | 1 analyze it because | 1 analyze it solely | 1 analyze metrics, update | 1 analyze hiv/aids
    9 | 1 analyze it. (begin | 1 analyze it with | 1 analyze moderators for | 1 analyze how stereotypes
    10 | 1 analyze it? do | 1 analyze it, not | 1 analyze my data | 1 analyze how their
    11 | 1 analyze the legality | 1 analyze it, to | 1 analyze operations, track | 1 analyze laboratory and
    12 | 1 analyze the loose | 1 analyze it. a | 1 analyze postcard representation | 1 analyze meaningful data
    13 | 1 analyze the romney | 1 analyze that at | 1 analyze student learning | 1 analyze multiple data
    14 | 1 analyze the tapes | 1 analyze that correctly | 1 analyze the bacteriology | 1 analyze performance within
    15 | 1 analyze the whole | 1 analyze the mission | 1 analyze the data | 1 analyze specific risk
    16 | 1 analyze what these | 1 analyze the political | 1 analyze the difference | 1 analyze text structures
    17 | 1 analyze why they | 1 analyze the situation | 1 analyze the openurls | 1 analyze the data
    18 | 1 analyze yourself and | 1 analyze the vocals | 1 analyze the orf | 1 analyze the resulting
    19 | 1 analyze, attorneys kimberly | 1 analyze their components | 1 analyze the production | 1 analyze the time
    20 | 1 analyze, we create | 1 analyze them with | 1 analyze the statistical | 1 analyze their data
    21 | 1 analyze. pinsky# well | 1 analyze them. but | 1 analyze the types | 1 analyze them, has
    22 | | 1 analyze what you | 1 analyze their associations | 1 analyze what these
    23 | | 1 analyze what's | 1 analyze their classroom | 1 analyze, explain, or
    24 | | 1 analyze your 23andme | 1 analyze their own | 1 analyze. beyond the
    25 | | | 1 analyze, and generate |

    Rank | Spoken 2014 | Spoken 2015 | Academic 2014 | Academic 2015
    1 | 2 analyze, fox news | 4 analyze the week | 1 analyze a column | 2 analyze the data
    2 | 1 analyze (ph) my | 2 analyze this, may | 1 analyze and evaluate | 1 analyze and interpret
    3 | 1 analyze all the | 1 analyze -- rachel campos | 1 analyze data to | 1 analyze classroom data
    4 | 1 analyze both jewel | 1 analyze all of | 1 analyze each participant | 1 analyze data and
    5 | 1 analyze doha's | 1 analyze anything. you | 1 analyze hi in | 1 analyze data, read
    6 | 1 analyze gregory, but | 1 analyze exactly what | 1 analyze how and | 1 analyze film music
    7 | 1 analyze him or | 1 analyze his donor | 1 analyze possible teacher | 1 analyze hypotheses six
    8 | 1 analyze in terms | 1 analyze human behavior | 1 analyze research studies | 1 analyze information to
    9 | 1 analyze it on | 1 analyze it all | 1 analyze risk factors | 1 analyze relationships between
    10 | 1 analyze that and | 1 analyze it faster | 1 analyze students' self | 1 analyze surveillance data
    11 | 1 analyze the candidates | 1 analyze my interview | 1 analyze the alignment | 1 analyze teachers' pck
    12 | 1 analyze the flavoring | 1 analyze probabilities and | 1 analyze the app | 1 analyze teachers' use
    13 | 1 analyze the game | 1 analyze the causes | 1 analyze the apps | 1 analyze the first
    14 | 1 analyze the media | 1 analyze the news | 1 analyze the data | 1 analyze the images
    15 | 1 analyze the situation | 1 analyze the nominee | 1 analyze the demographic | 1 analyze the melody
    16 | 1 analyze the week | 1 analyze these bills | 1 analyze the gain | 1 analyze the performances
    17 | 1 analyze this story | 1 analyze what 11 | 1 analyze the literary | 1 analyze the sources
    18 | 1 analyze to further | 1 analyze what he | 1 analyze the records | 1 analyze the word
    19 | 1 analyze what happened | 1 analyze where the | 1 analyze the svs | 1 analyze their e
    20 | 1 analyze when the | 1 analyze why, why | 1 analyze the tool | 1 analyze this dataset
    21 | 1 analyze where the | 1 analyze, experts everywhere | 1 analyze their own | 1 analyze unknown science
    22 | 1 analyze whether they | 1 analyze. (begin-video | 1 analyze treatment effects | 1 analyze, and report
    23 | 1 analyze, does this | 1 analyze. army sergeant | 1 analyze, figured most | 1 analyze, critically examine
    24 | 1 analyze. caution, you | | 1 analyze, interpret, and | 1 analyze; at times
    25 | | | 1 analyze; at times |

    The most frequent 3-word clusters for the spoken and academic discourses are presented in
    Table 2. The most frequent cluster in the spoken discourse is analyze the week/s. It is visible
    on the lists of all the researched years except 2013, and it is used 15 times in total
    throughout the corpus. Analyze the situation occurs three times on the spoken discourse lists
    (2011, 2013 and 2014). For the academic discourse the most frequent 3-word cluster is analyze
    the data, which occurs 5 times in total, followed by analyze the link (4 instances) and analyze
    the time (2 instances). Furthermore, the researcher would like to underline the differences
    between the two discourses in the 3-word clusters that contain determiners. In the spoken
    discourse these are: the motivation, the overnight, the press, the weeks (2010); the week, the
    weeks, the contenders, the cost, the entire, the records, the situation, the very (2011); the
    week, the loose, the romney, the tapes, the whole (2012); the mission, the political, the
    situation, the vocals (2013); the flavoring, the game, the media, the situation, the week
    (2014); the causes, the news, the nominee (2015); and a political (2010). In the academic
    discourse the 3-word clusters with determiners are: the implications, the interaction, the
    transcript (2010); the case, the country, the magazines, the music, the narrative, the
    pairwise, the planning, the results, the value (2011); the bacteriology, the data, the
    difference, the openurls, the orf, the production, the statistical, the types (2012); the link,
    the data, the resulting, the time (2013); the alignment, the app, the apps, the data, the
    demographic, the gain, the literary, the records, the svs, the tool (2014); the first, the
    images, the melody, the performances, the sources, the word (2015); and a 2-stage, a data
    (2011); a large (2013); a column (2014). Note that the only two clusters present on both lists
    are analyze the records and analyze what these.
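The comparison of the two cluster lists described above amounts to a set intersection. The short Python sketch below shows the idea on a handful of 3-word clusters copied from Table 2; the variable names are hypothetical, and the two sets are deliberately small subsets chosen for illustration, not the full spoken and academic lists.

```python
# Small illustrative subsets of the 3-word clusters in Table 2 (assumption:
# the full lists would be loaded from the AntConc output instead).
spoken_clusters = {
    "analyze the week",
    "analyze the situation",
    "analyze the records",
    "analyze what these",
    "analyze why they",
}
academic_clusters = {
    "analyze the data",
    "analyze the link",
    "analyze the records",
    "analyze what these",
    "analyze their own",
}

# Clusters attested in both subcorpora.
shared = sorted(spoken_clusters & academic_clusters)
# Clusters attested in only one subcorpus.
spoken_only = sorted(spoken_clusters - academic_clusters)
academic_only = sorted(academic_clusters - spoken_clusters)

print("shared:", shared)
print("spoken only:", spoken_only)
print("academic only:", academic_only)
```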

    The clusters used in the corpus might suggest that the spoken discourse uses more what and
    where clusters: what these, why they (2012); how to (2013); what happened, when the, where the,
    whether they (2014); where the (2015). The academic discourse, on the other hand, uses more how
    clusters: how housing (2010); how stereotypes, how their, what these (2013); how and (2014).

    Clusters of analyze plus a verb differ between the two subcorpora. In the spoken discourse one
    may find: analyze and dissect (2010); analyze and investigate (2012); analyze and debate,
    analyze and write (2013). The academic discourse uses: analyze and solve (2010); analyze and
    implement, analyze and organize (2011); analyze and improve, analyze and interpret (2012);
    analyze and evaluate (2013, 2014); analyze and interpret, analyze and report (2015).

    Furthermore, the academic discourse uses the pronoun their more frequently than the spoken
    discourse; one may find the following examples: their business, their long, their own (2010);
    them as, them to, your motivation (2011); my data, their associations, their classroom, their
    own (2012); how their, their data (2013); their own (2014). The spoken discourse, on the other
    hand, uses other pronouns: her appearances, his diet (2010); for me (2011); for you, their
    components, them with, it solely, it with, it not, it to (2013); it on (2014); anything you, it
    all, his donor, it faster (2015).

    Among the various 3-word clusters that could be of interest because of the nouns they combine
    with, those which use demonstratives should be underlined. In the spoken discourse the
    following are used: history from, seafood samples, both sides, this flow, those reports (2010);
    these things, all this, old data, volcanic moon, wedding rituals (2011); attorney general,
    brains in (2012); details of (2013); all the, both jewel, in terms (2014); all of, human
    behavior, probabilities and, these bills, experts everywhere, and army sergeant (2015). In the
    academic discourse, respectively, the following are listed: commonalities in, differences
    between, excavated artifacts, human remains (2010); family homelessness, pending tax, variation
    among, specific gas (2011); complete genomes, image-space, moderators for, postcard
    representation, student learning (2012); as well, Canadian identity, each clause, factors that,
    meaningful data, multiple data, performance within, specific risk, text structures (2013); data
    to, each participant, possible teacher, research studies, risk factors, students' self (2014);
    classroom data, film music, information to, relationships between, surveillance data, teachers'
    pck, unknown science, and this dataset (2015). The researcher may conclude that the
    demonstratives this and those are more frequently used in the spoken discourse, while the
    demonstrative that is more frequent in the academic discourse.

    Table 3 First twenty academic and spoken collocates for the verb analyze (window span of 5L and 5R)

    Rank | Spoken: Freq. | Freq. left | Freq. right | Statistic | Collocate | Academic: Freq. | Freq. left | Freq. right | Statistic | Collocate
    1 | 89 | 78 | 11 | 3.98188 | to | 118 | 104 | 14 | 4.09570 | to
    2 | 77 | 13 | 64 | 3.57089 | the | 96 | 46 | 50 | 3.68879 | and
    3 | 58 | 30 | 28 | 3.58809 | and | 94 | 25 | 69 | 3.46859 | the
    4 | 33 | 26 | 7 | 3.78556 | we | 33 | 4 | 29 | 4.17859 | data
    5 | 29 | 18 | 11 | 3.47177 | you | 28 | 8 | 20 | 3.02041 | of
    6 | 29 | 6 | 23 | 3.66732 | it | 22 | 8 | 14 | 3.03969 | in
    7 | 26 | 5 | 21 | 3.31423 | s | 18 | 6 | 12 | 3.33515 | for
    8 | 25 | 24 | 1 | 4.12199 | will | 16 | 15 | 1 | 4.29647 | used
    9 | 22 | 2 | 20 | 4.46808 | news | 15 | 3 | 12 | 4.01072 | their
    10 | 17 | 9 | 8 | 3.12308 | i | 13 | 9 | 4 | 3.48234 | students
    11 | 17 | 8 | 9 | 2.85100 | a | 12 | 4 | 8 | 3.18629 | as
    12 | 16 | 15 | 1 | 4.44161 | here | 12 | 3 | 9 | 2.54583 | a
    13 | 15 | 3 | 12 | 3.67043 | what | 11 | 10 | 1 | 3.34086 | were
    14 | 15 | 5 | 10 | 2.67043 | that | 11 | 10 | 1 | 4.06076 | can
    15 | 15 | 7 | 8 | 3.02657 | of | 10 | 8 | 2 | 3.92325 | was
    16 | 14 | 2 | 12 | 4.57089 | week | 10 | 8 | 2 | 3.01072 | that
    17 | 14 | 7 | 7 | 3.71291 | they | 9 | 5 | 4 | 3.46639 | we
    18 | 14 | 13 | 1 | 4.76354 | brooks | 9 | 5 | 4 | 4.05136 | use
    19 | 13 | 8 | 5 | 2.94041 | is | 9 | 4 | 5 | 3.68879 | how
    20 | 13 | 7 | 6 | 3.03771 | in | 8 | 6 | 2 | 3.78190 | they
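To make the Freq. left and Freq. right columns concrete, the following Python sketch counts the words that fall within a window of five tokens to the left and five to the right of a node word. The function name, the whitespace tokenization and the sample sentence are assumptions made for this illustration; the sketch does not reproduce AntConc's implementation, and it leaves out the Statistic column because the measure used is not stated here.

```python
from collections import Counter


def collocates(tokens, node="analyze", span=5):
    """Count words occurring up to `span` tokens left and right of the node."""
    left, right = Counter(), Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        # Window to the left of the node (clipped at the start of the text).
        left.update(tokens[max(0, i - span):i])
        # Window to the right of the node (slicing clips at the end).
        right.update(tokens[i + 1:i + 1 + span])
    return left, right


# Invented sample sentence (assumption), not taken from the corpus.
tokens = "we want to analyze the data and to analyze the results of the week".split()
left, right = collocates(tokens)
total = left + right  # overall frequency = left frequency + right frequency

for word, freq in total.most_common(3):
    print(word, freq, left[word], right[word])
```

Each printed row follows the layout of the table: collocate, total frequency, frequency to the left of the node, and frequency to the right.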

    The most frequent collocate in both subcorpora is to, with 89 instances in spoken and 118 in academic discourse. And is second on the academic list and third on the spoken list, with 96 and 58 examples respectively. The, third on the academic list with 94 instances, is second for spoken discourse with 77 instances. The determiner a is more frequent on the spoken list than on the academic one (17 and 12 instances respectively). Data, fourth on the academic list with 33 examples, does not occur on the spoken list. Spoken discourse uses we and they more frequently (33 and 14 examples, against 9 and 8 in academic discourse). Their is not present on the spoken list; it holds 9th position on the academic list with a frequency of 15. The same holds for use and how, with 9 examples each on the academic list. The present form of the verb to be (is) occurs on the spoken list, while its past counterparts (was, were) occur on the academic one: is holds 19th position (13 examples), was 15th (10 examples), and were 13th (11 examples). The demonstrative that occurs on both lists, with 15 examples for spoken and 10 for academic discourse (14th and 16th position respectively). The prepositions in and of are also shared: in is 6th on the academic list and 20th on the spoken list (22 and 13 examples), while of is 15th on the spoken and 5th on the academic list (15 and 28 examples). News and week appear on the spoken list only, in 9th (22) and 16th (14) position. Used and students, in 8th and 10th position, are present only on the academic list.

    It is worth noting that analyze rarely collocates with modal verbs: only will (spoken, 8th position, 25 instances) and can (academic, 14th position, 11 instances) appear on the lists. Spoken discourse uses the personal pronouns I, you and it, with 17, 29 and 29 instances respectively (see Table 3). In the same category, academic discourse uses only the possessive pronoun their, listed 9th with 15 instances. Lastly, the spoken list includes the wh-word what in 13th position (15 examples), which has no counterpart on the academic list; the only wh-word among the academic top twenty is how.
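    The comparison of the two ranked lists can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not the procedure used in the study: the word lists are copied from Table 3, while the function name compare_lists is an assumption introduced here.

```python
# Top-twenty collocates copied from Table 3, in rank order.
SPOKEN = ["to", "the", "and", "we", "you", "it", "s", "will", "news", "i",
          "a", "here", "what", "that", "of", "week", "they", "brooks", "is", "in"]
ACADEMIC = ["to", "and", "the", "data", "of", "in", "for", "used", "their", "students",
            "as", "a", "were", "can", "was", "that", "we", "use", "how", "they"]


def compare_lists(first, second):
    """Return (shared, only_first, only_second), each kept in its source rank order.

    compare_lists is a hypothetical helper written for this illustration.
    """
    first_set, second_set = set(first), set(second)
    shared = [word for word in first if word in second_set]
    only_first = [word for word in first if word not in second_set]
    only_second = [word for word in second if word not in first_set]
    return shared, only_first, only_second


shared, spoken_only, academic_only = compare_lists(SPOKEN, ACADEMIC)
print("shared:", shared)
print("spoken only:", spoken_only)
print("academic only:", academic_only)
```

    The shared words are the collocates that can be compared across both discourses; the remaining ones characterise a single list.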

    Table 4 Parts of speech used in the 2-word clusters

    Rank

    Spoken Academic

    2010 2011 2012 2010 2011 2012

    Keyword Freq Keyword Freq Keyword Freq Keyword Freq Keyword Freq Keyword Freq

    1 dt 7 nn 6 pp 4 nns 4 nn 4 nns 4

    2 pp 4 dt 3 in 2 nn 3 dt 2 nn 3

    3 nn 3 jj 2 nns 2 wrb 2 pp 2 pp 2

    4 in 1 in 1 np 2 rb 2 in 1 dt 1

    5 that 1 that 1 dt 1 pp 2 jj 1 jjr 1

    6 rb 1 rb 1 wp 1 jj 2 rb 1 np 1

    7 wp 1 wrb 1 vvn 1 np 1 vv 1

    8 np 1 nn 1 np 1 vvg 1

    9 vv 1 in 1


    10 dt 1

    Rank

    2013 2014 2015 2013 2014 2015

    Keyword Freq Keyword Freq Keyword Freq Keyword Freq Keyword Freq Keyword Freq

    1 pp 4 nn 5 dt 3 nn 5 dt 3 nn 6

    2 dt 2 dt 4 pp 3 dt 3 nn 3 nns 4

    3 in 1 in 3 nn 3 jj 2 nns 2 dt 2

    4 that 1 pp 2 rb 2 pp 2 in 1 pp 1

    5 wp 1 to 2 wrb 2 rb 2 jj 1 rb 1

    6 wrb 1 wrb 2 nns 2 np 2 pp 1 np 1

    7 nn 1 that 1 jj 1 wp 1 uh 1 vv 1

    8 nns 1 rb 1 wp 1 wrb 1 wrb 1

    9 np 1 wp 1 vv 1 nns 1 np 1

    10 vv 1 vvz 1 vv 1 vvd 1

    As Table 4 shows, determiners (dt) are more frequent in spoken discourse: 7 (2010), 3 (2011), 1 (2012), 2 (2013), 4 (2014), 3 (2015). Singular nouns (nn), with 3 (2010), 4 (2011), 4 (2012), 5 (2013), 3 (2014), 6 (2015), and plural nouns (nns), with 4 (2010), 4 (2012), 1 (2013), 2 (2014), 4 (2015), are used more often in academic discourse. Interestingly, no plural nouns occurred in the 2011 part of the academic subcorpus. Proper nouns (np) do not occur in the 2-word clusters in spoken discourse. They are present on the academic list with frequencies of 1 (2010), 1 (2012), 2 (2013), 1 (2014), 1 (2015). Furthermore, personal pronouns (pp) are present and more frequent on the spoken list. Wh-adverbs (wrb) occur on both lists (4 instances each), but in different years: for spoken discourse in 2012 (1), 2013 (1) and 2014 (2), and for academic discourse in 2010 (2), 2013 (1) and 2014 (1). Wh-pronouns (wp), on the other hand, are more frequent on the spoken list, with one example in each of 2010, 2012, 2013, 2014 and 2015. For academic discourse wh-pronouns appear only in 2013 and 2014, with 2 and 1 examples respectively. Adjectives (jj) and comparatives (jjr) are more common in academic discourse: 2 (2010), 1 (2011), 1 (2012), 2 (2013), 1 (2014). Their distribution in spoken discourse is 2 (2011) and 1 (2015). Furthermore, spoken discourse uses prepositions (in) more often: 1 (2010), 1 (2011), 2 (2012), 1 (2013), 3 (2014), against 1 (2010), 1 (2011) and 1 (2014) in academic discourse. Verbs in the base form (vv), past tense (vvd), gerund/participle (vvg), past participle (vvn) and 3rd person singular present (vvz) are uncommon but present on both lists: there are 4 instances in spoken discourse and 6 in academic discourse. It is worth noting that spoken discourse uses only the base form and the 3rd person singular, while academic discourse uses the base form, past tense, past participle and gerund/participle.
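    A tally of this kind can be sketched as follows. The sketch assumes that each cluster is available as a string of word_TAG tokens, that the node word itself is left out of the count, and that tags are lower-cased as in Table 4; the sample clusters are invented for illustration and are not taken from the corpus, and tally_tags is a hypothetical helper name.

```python
from collections import Counter


def tally_tags(tagged_clusters):
    """Count part-of-speech tags of the words that accompany the node word.

    Assumes word_TAG tokens separated by spaces (an assumed input format).
    """
    counts = Counter()
    for cluster in tagged_clusters:
        for token in cluster.split():
            word, _, tag = token.rpartition("_")
            if word.lower().startswith("analyz"):
                continue  # skip the node word (analyze, analyzed, analyzing)
            counts[tag.lower()] += 1
    return counts


# Hypothetical 2-word clusters, invented for illustration only.
sample = ["analyze_VV the_DT", "analyze_VV data_NN", "analyzing_VVG it_PP", "analyze_VV the_DT"]
counts = tally_tags(sample)
for tag, freq in counts.most_common():
    print(tag, freq)
```

    Running the same tally separately for each year and each subcorpus would yield one Keyword/Freq column of the kind shown in Table 4.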


    Table 5 Parts of speech used in the 3-word clusters

    Rank

    Spoken Academic

    2010 2011 2012 2010 2011 2012

    Keyword Freq Keyword Freq Keyword Freq Keyword Freq Keyword Freq Keyword Freq

    1 dt 13 nn 14 dt 9 nns 11 nn 15 nn 12

    2 nn 10 dt 12 pp 7 jj 7 dt 13 nns 11

    3 nns 7 nns 6 nn 7 nn 7 to 4 dt 8

    4 pp 4 that 2 in 5 in 5 jj 3 cc 6

    5 in 2 jj 2 nns 3 dt 4 pp 3 pp 4

    6 jj 2 in 1 vv 2 pp 4 rb 3 vv 3

    7 to 2 md 1 cc 2 rb 4 nns 3 jj 3

    8 vv 2 pdt 1 jj 2 cc 2 vv 2 in 1

    9 wp 2 pos 1 rb 2 wrb 2 cc 2 jjr 1

    10 cc 1 pp 1 vvp 2 np 2 cd 1

    11 np 1 rb 1 wp 1 vv 1 in 1

    12 pos 1 wrb 1 jjr 1 vvg 1

    13 rb 1 vvg 1 vvn 1

    14 that 1 vvn 1 vvp 1

    Rank

    2013 2014 2015 2013 2014 2015

    Keyword Freq Keyword Freq Keyword Freq Keyword Freq Keyword Freq Keyword Freq

    1 pp 12 dt 12 nn 10 dt 8 dt 12 nn 14

    2 in 7 nn 11 dt 9 nn 8 nn 10 nns 13

    3 dt 5 pp 5 pp 6 nns 7 nns 10 dt 8

    4 rb 5 in 4 nns 5 jj 5 jj 4 cc 3

    5 nn 5 nns 3 wp 3 cc 3 in 3 cd 3

    6 that 4 cc 2 wrb 3 pp 3 vv 2 jj 3

    7 to 4 that 2 rb 2 rb 3 cc 2 vv 2

    8 nns 3 to 2 cc 1 vv 2 to 2 in 2

    9 vv 2 wrb 2 cd 1 in 2 np 2 pos 2

    10 cc 2 jjr 1 in 1 wrb 2 jjs 1 sym 2

    11 wp 2 pdt 1 jj 1 that 1 pos 1 to 2

    12 jj 1 pos 1 md 1 sym 1 pp 1 pp 1

    13 pos 1 rb 1 rbr 1 wp 1 uh 1 rb 1

    14 wrb 1 rp 1 np 1 np 1 wrb 1 vvd 1

    15 wp 1 vvg 1 vvd 1

    16 np 1 vvz 1

    17 vvn 1 vhz 1

    18 vvz 1

    Table 5 presents the parts of speech used in the 3-word clusters of the researched material. Academic discourse uses nouns (nn, nns), adjectives (jj) and determiners (dt) most frequently. Nouns, the most frequent part of speech in academic discourse, occur with the following frequencies: 18 (2010), 18 (2011), 23 (2012), 15 (2013), 20 (2014), 27 (2015). Adjectives are listed with frequencies of 7 (2010), 3 (2011), 3 (2012), 5 (2013), 4 (2014), 3 (2015), and determiners with 4 (2010), 13 (2011), 8 (2012), 8 (2013), 12 (2014), 8 (2015). The most frequent parts of speech in spoken discourse are determiners (dt), nouns (nn, nns) and personal pronouns (pp). Determiners have frequencies of 13 (2010), 12 (2011), 9 (2012), 5 (2013), 12 (2014), 9 (2015); nouns are listed with 17 (2010), 20 (2011), 10 (2012), 8 (2013), 14 (2014), 15 (2015) examples; and personal pronouns are used 4, 1, 7, 12, 5 and 6 times, chronologically across the researched years. In academic discourse personal pronouns are less frequent: chronologically 4, 3, 4, 3, 1 and 1 examples. Adjectives in spoken discourse are scarce but fairly constant: chronologically 2, 2, 2, 1, 1 and 1. Notably, spoken discourse has one comparative adjective (jjr) in 2014, while academic discourse has comparatives in 2010 (1) and 2012 (1) and one superlative (jjs) in 2014.

    Prepositions and subordinating conjunctions (in) are more frequent in spoken discourse: 2 (2010), 1 (2011), 5 (2012), 7 (2013), 4 (2014), 1 (2015), against 5 (2010), 1 (2011), 1 (2012), 2 (2013), 3 (2014), 2 (2015) in academic discourse. The complementizer that, on the other hand, is used only in spoken discourse: 1 (2011), 2 (2012), 4 (2013) and 2 (2014) instances. Next, to is used in spoken discourse 2 (2010), 4 (2013) and 2 (2014) times, and in academic discourse 4 (2011), 2 (2014) and 2 (2015) times. Coordinating conjunctions (cc) are present on both lists: spoken discourse uses 1, 0, 2, 2, 2 and 1, chronologically across the researched years, while academic discourse uses them more frequently, with 2, 2, 6, 3, 2 and 3 instances. Wh-pronouns (wp) and wh-adverbs (wrb) are used more frequently in spoken discourse. The frequencies of wh-pronouns are 2 (2010), 1 (2012), 2 (2013), 1 (2014) and 3 (2015); in academic discourse there is only one instance, in 2013. Wh-adverbs have frequencies of 1 (2012), 1 (2013), 2 (2014), 3 (2015) on the spoken list and 2 (2010), 2 (2013), 1 (2014) on the academic list. Other adverbs (rb) are distributed more evenly: 4 (2010), 3 (2011), 3 (2013), 1 (2015) in academic discourse and 1 (2010), 1 (2011), 2 (2012), 5 (2013), 1 (2014), 2 (2015) in spoken discourse. Next, verbs (vv, vvg, vvd, vvz, vvn, vvp, vhz) are used more frequently in academic discourse, with 4 (2010), 5 (2011), 3 (2012), 5 (2013), 3 (2014) and 3 (2015), than in spoken discourse, with 2 (2010), 4 (2012), 2 (2013) and 2 (2014). Notably, the 3rd person singular present of have (vhz), past tense (vvd) and gerund/participle (vvg) verbs occur only on the academic list. Lastly, the possessive ending (pos) is present on both lists but is more frequent in spoken discourse, with 1 (2010), 1 (2011), 1 (2013) and 1 (2014), against 1 (2014) and 2 (2015) in academic discourse.
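    The combined noun frequencies quoted for the 3-word clusters are the sums of the nn and nns rows of Table 5. The short sketch below reproduces that addition; the per-year values are copied from Table 5, and the names TABLE_5_NOUNS and combined_by_year are assumptions introduced for illustration.

```python
YEARS = [2010, 2011, 2012, 2013, 2014, 2015]

# Per-year frequencies of singular (nn) and plural (nns) nouns, copied from Table 5.
TABLE_5_NOUNS = {
    "academic": {"nn": [7, 15, 12, 8, 10, 14], "nns": [11, 3, 11, 7, 10, 13]},
    "spoken": {"nn": [10, 14, 7, 5, 11, 10], "nns": [7, 6, 3, 3, 3, 5]},
}


def combined_by_year(tag_rows):
    """Add the per-year frequencies of several tags into one series."""
    return [sum(year_values) for year_values in zip(*tag_rows.values())]


noun_totals = {name: combined_by_year(rows) for name, rows in TABLE_5_NOUNS.items()}
for discourse, totals in noun_totals.items():
    print(discourse, dict(zip(YEARS, totals)))
```

    The same helper could be applied to any group of tags that the discussion treats as one category, for example the verb tags.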


    Conclusions

    It may be concluded that the quantitative use of the studied noun collocates differs clearly between the two discourses, with some collocate types predominating in the number of tokens. The cluster types vary across both discourses and across the studied years, and the distribution of sentences containing them differs quantitatively over time. In addition, some cluster types are more frequent than others in the studied years. It is worth noting that the verb analyze rarely collocates with modal verbs: only will and can appear on the lists, the first for spoken and the second for academic discourse. This trend may constitute a basis for further research.

    Table 6 Pearson’s Chi-squared test for significance of the parts of speech used within researched clusters

    Cluster/part of speech                  Academic freq.  Spoken freq.  Pearson's Chi-squared test
    All eleven categories                                                 X-squared = 42.748, df = 10, p-value = 5.517e-06

    singular/mass noun                      90              76            X-squared = 5.5518, df = 2, p-value = 0.06229
    plural noun                             70              32
    proper noun                             12              7

    adjectives                              31              11            X-squared = 9.5238, df = 1, p-value = 0.002028

    coordinating conjunctions               18              8             X-squared = 6.6637, df = 2, p-value = 0.03573
    preposition/subordinating conjunctions  17              28
    to                                      8               10

    adverbs                                 17              17            X-squared = 0.15357, df = 1, p-value = 0.6951
    wh-adverbs                              9               13

    determiners                             65              80            X-squared = 1.5517, df = 1, p-value = 0.2129

    personal pronouns                       26              52            X-squared = 8.6667, df = 1, p-value = 0.003241

    Pearson’s chi-squared tests for significance (see Table 6) show that noun clusters are more frequent in academic discourse, but the difference cannot be regarded as significant: the p-value is 0.062, above the 0.05 threshold adopted here. Adjectives in the clusters are significantly more frequent in academic discourse (p = 0.002). Coordinating conjunctions are more frequent in academic discourse, while subordinating conjunctions and prepositions are more frequent in spoken discourse; with a p-value of 0.036, the differences in the use of conjunctions are significant. For adverbs the test does not show that either discourse uses them more frequently (p = 0.695). Determiners are more numerous in spoken than in academic discourse, but with a p-value of 0.213 the difference is not significant. Finally, personal pronouns are significantly more frequent in spoken discourse (p = 0.003). The overall p-value for the surveyed data is 0.0000055 (5.517e-06), which indicates that the distribution of cluster types differs between the discourses and can serve as a basis for further research.
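    The single-category rows of Table 6 appear consistent with a goodness-of-fit test of the two frequencies against an even split. The sketch below is written under that assumption and is not a statement of how the reported figures were computed; the helper name chisq_equal_split is hypothetical, and only the Python standard library is used. Rows that group several categories (nouns, conjunctions, adverbs) are not covered by this sketch.

```python
import math


def chisq_equal_split(count_a, count_b):
    """Pearson goodness-of-fit test of two counts against an even split (df = 1)."""
    expected = (count_a + count_b) / 2
    statistic = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # For one degree of freedom the chi-squared upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# (academic, spoken) frequencies copied from Table 6.
adj_stat, adj_p = chisq_equal_split(31, 11)   # adjectives
pp_stat, pp_p = chisq_equal_split(26, 52)     # personal pronouns
print(f"adjectives: X-squared = {adj_stat:.4f}, df = 1, p-value = {adj_p:.6f}")
print(f"personal pronouns: X-squared = {pp_stat:.4f}, df = 1, p-value = {pp_p:.6f}")
```

    Comparing the printed values with the corresponding rows of Table 6 is a quick way to check that the frequencies and the test results were transcribed consistently.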

    Table 7 Pearson’s chi-squared test for significance of the words used as collocates in 5L to 5R window span

    Collocate           Academic freq.  Spoken freq.  Pearson's Chi-squared test
    All nine collocates                               X-squared = 32.873, df = 8, p-value = 6.491e-05

    a                   12              17            X-squared = 0.86207, df = 1, p-value = 0.3532
    and                 96              58            X-squared = 9.3766, df = 1, p-value = 0.002198
    in                  22              13            X-squared = 2.3143, df = 1, p-value = 0.1282
    of                  28              15            X-squared = 3.9302, df = 1, p-value = 0.04743
    that                10              15            X-squared = 1, df = 1, p-value = 0.3173
    the                 94              77            X-squared = 1.6901, df = 1, p-value = 0.1936
    they                8               14            X-squared = 1.6364, df = 1, p-value = 0.2008
    to                  118             89            X-squared = 4.0628, df = 1, p-value = 0.04384
    we                  9               33            X-squared = 13.714, df = 1, p-value = 0.0002128

    Firstly, the indefinite article a is used more frequently in spoken discourse, but the difference between the discourses is slight (see Table 7). The definite article, on the other hand, is used more frequently in academic discourse. Neither difference can be regarded as significant, since both p-values are greater than 0.05 (0.353 and 0.194). Secondly, and is more frequent in academic discourse: the difference in frequency between the discourses is 38, and the p-value of 0.002 is significant. Thirdly, in is used 22 times in academic discourse, 9 more than in spoken discourse; however, with a p-value of 0.128 the difference is not significant. The p-value for that is 0.317, so the difference is likewise not significant, and the same holds for they (p = 0.201). On the other hand, of, to and we have p-values small enough to be regarded as significant: 0.047, 0.044 and 0.0002 respectively. The first two are used more frequently in academic discourse, while the third is significantly more frequent in spoken discourse. Taken together, the collocates in Table 7 yield an overall p-value of 0.00006 (6.491e-05), which indicates that collocate use differs between the discourses and might provide a good basis for further study.
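    The overall figure for Table 7 can be approximated by treating the nine collocates and the two discourses as a 9 x 2 contingency table. The sketch below is an illustration under that assumption, using only the Python standard library; the function names are hypothetical, and the closed-form tail probability used here is valid only for an even number of degrees of freedom.

```python
import math

# (academic, spoken) frequencies of the nine shared collocates, copied from Table 7.
TABLE_7 = {
    "a": (12, 17), "and": (96, 58), "in": (22, 13), "of": (28, 15), "that": (10, 15),
    "the": (94, 77), "they": (8, 14), "to": (118, 89), "we": (9, 33),
}


def chisq_independence(rows):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    rows = [list(row) for row in rows]
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for row, row_total in zip(rows, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, df


def upper_tail_even_df(statistic, df):
    """Chi-squared upper-tail probability; this closed form holds only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half = statistic / 2
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


stat, df = chisq_independence(TABLE_7.values())
p_value = upper_tail_even_df(stat, df)
print(f"X-squared = {stat:.3f}, df = {df}, p-value = {p_value:.3e}")
```

    The per-collocate rows of Table 7 have one degree of freedom each, so they would need a different tail formula than the one used in this sketch.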

    Bibliography

    BIBER D., CONRAD S., REPPEN R., Corpus Linguistics: Investigating language structure and use, Cambridge, Cambridge University Press, 1998.

    BUCHOLTZ M., Theories of Discourse as Theories of Gender: Discourse Analysis in Language and Gender Studies, The Handbook of Language and Gender, Blackwell Publishing Ltd., 2003.

    Corpus of Contemporary American English. N.p., n.d. Accessed 14.11.2019, 15.11.2019. Available online https://corpus.byu.edu/coca/

    HAUSSER R., Computational linguistics: Human-Computer Communication in Natural Language (3 Ed.). Springer, 2014.

    SUNDERLAND J., Language and Gender: An advanced resource book, Routledge, London 2006.

    The R Foundation for Statistical Computing. R x64 version 3.4.1., 30 Jun. 2017. Freeware software. Available online <http://cran.r-project.org/>

    ANTHONY L., TagAnt x64, version 1.2.0., 15 Sep. 2015. Freeware software. Available online <http://www.laurenceanthony.net/software/tagant>

    ANTHONY L., ProtAnt x64, version 0.1., 21 Mar. 2017. Freeware software. Available online <http://www.laurenceanthony.net/software/protant>

    ANTHONY L., AntConc 3.5.7., 30 Sept. 2018. Freeware software. Available online

    Vocabulary analysis: A corpus-based study of clusters and collocates of the verb “analyze” in academic and spoken discourse.

    There are many corpora available online which can be freely labeled with various linguistic features. Many of them constitute a great library of examples, and the data they contain can be used for analysis with any linguistic tool. The aim of this study was to examine the variety of collocates and clusters used with the verb “analyze” in two discourses: academic and spoken. The paper presents a description of corpus data (300 examples), previously classified according to selected research categories. Computational linguistics tools were used here to carry out statistical tests on the corpus data. Pearson’s chi-squared tests demonstrated the significance of the use of some clusters and collocates in a quantitative comparison within the studied material. In conclusion, the variety in the use of clusters and collocates within the studied discourses may serve as a basis for further, interesting research.