Statistika za filologe (Statistics for Philologists)
Transcript, retrieved 8/21/2019
Statistics for Linguists: A Tutorial
Mark Dras
Centre for Language Technology
Macquarie University
HCSNet Summerfest
28 November 2006
Tutorial Structure
structured primarily around examples of tasks linguists might be interested in
within these, statistical ideas that are useful:
  hypothesis testing
  various statistical measures (χ², likelihood ratios, ...)
  statistical distributions
  some other useful ideas (e.g. Latent Semantic Analysis)
basic material taken from Manning and Schütze [1999]
another useful overview: Krenn and Samuelsson [1997]
Collocations
Outline
1 Collocations
Frequency
Hypothesis Testing (+ background: Basic Probability Theory)
The t-Test
Pearson's Chi-Square Test (+ background: Distributions)
Likelihood Ratio Test (+ background: Conditional Probability)
Fisher's Exact Test
2 Verb Subcategorisation
Precision and Recall
3 Semantic Similarity
Latent Semantic Indexing
4 Register Analysis
5 References
Collocations
Definitions
collocation: an expression consisting of two or more words that
correspond to some conventional way of saying things
Firth (1957): Collocations of a given word are statements of the
habitual or customary places of that word
examples:
  noun phrases: strong tea, weapons of mass destruction
  phrasal verbs: to make up
  other standard phrases: the rich and powerful
has limited compositionality
  a broader notion than an idiom; example: international best practice
Collocations Frequency
Frequency
most basic idea: start with a corpus, and count the relevant
frequencies
if looking for two-word collocations, just count frequencies of pairs of adjacent words
obvious problem: we get lots of useless high-frequency words
from the New York Times:

  C(w1 w2)   w1     w2
  80871      of     the
  58841      in     the
  26430      to     the
  21842      on     the
  ...        ...    ...
  12622      from   the
  11428      New    York
  10007      he     said
  ...        ...    ...
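As a sketch (not from the slides), the basic counting step takes a few lines of Python; the corpus here is a made-up toy sentence, not the New York Times data:

```python
from collections import Counter

# toy stand-in corpus (hypothetical; the slide's counts come from the NYT)
corpus = "the rate of the growth of the economy slowed in the last year".split()

# count frequencies of pairs of adjacent words
bigram_counts = Counter(zip(corpus, corpus[1:]))

# the most frequent pair in this toy text is the uninteresting ("of", "the")
top_pair, top_count = bigram_counts.most_common(1)[0]
```

Even on a toy text, the top pair is a high-frequency function-word combination, which is exactly the problem the slide points out.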
Collocations Frequency
Frequency
the basic frequency idea may still be OK, under conditions:
  1. use it when looking to verify specific alternatives or patterns; or
  2. add a filter
Collocations Frequency
Example #1a: Eggcorns
described in Language Log
http://itre.cis.upenn.edu/~myl/languagelog/
idea: something like a mistaken but plausible reanalysis of a word
or phrase
examples: to step foot in, baited breath, free reign, hone in, ripe with mistakes, for all intensive purposes, manner from Heaven, give up the goat
like a folk etymology, a malapropism, or a mondegreen, but not quite any of these
collection by Chris Waigl
http://www.eggcorns.lascribe.net
interested in seeing whether eggcorn is gaining currency
compare by Google hits, e.g. inclement weather (173K whG) vs inclimate weather (11K whG) vs incliment weather (719 whG)
Collocations Frequency
Example #1b: Snowclones
also defined at Language Log
idea: adaptable cliche frames
examples: Have X, will travel; X is the new Y; X, we have a problem
again, use Google hits
Collocations Frequency
Aside: Google Counts
there's been discussion about the reliability of Google-derived frequencies
see Jean Véronis's blog and Language Log
example of the problem:
  Google query: junco partner lyrics (9440 whG)
  Google query: junco partner lyrics connick (279 whG)
  Google query: junco partner lyrics -connick (930 whG)
frequency counts over 100K are generally regarded as unreliable, but this may also be the case for smaller counts
the problems appear to be related to Google's indexing and its treatment of near-identical page matches
Collocations Frequency
Adding Filters
alternatively, if the problem is to find rather than to verify, we can use filters based on part of speech
for instance, in the previous example of extracting collocations, we can use patterns like the following:

  Tag pattern   Example
  A N           linear function
  N N           regression coefficients
  A A N         Gaussian random variable
  N P N         degrees of freedom
  ...           ...
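A minimal sketch of such a filter (the tagged candidates below are hypothetical, and only a subset of the slide's patterns is listed):

```python
# allowed part-of-speech tag patterns, as in the slide's table (subset)
PATTERNS = {("A", "N"), ("N", "N"), ("N", "P", "N")}

# hypothetical (word, tag) sequences for candidate n-grams
candidates = [
    [("linear", "A"), ("function", "N")],
    [("degrees", "N"), ("of", "P"), ("freedom", "N")],
    [("of", "P"), ("the", "Det")],       # a frequent but useless pair
]

def passes_filter(ngram):
    """Keep an n-gram only if its tag sequence matches an allowed pattern."""
    return tuple(tag for _, tag in ngram) in PATTERNS

kept = [ng for ng in candidates if passes_filter(ng)]
```

The high-frequency "of the" pair is rejected because its tag sequence matches no allowed pattern.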
Collocations Frequency
Adding Filters
applied to the same New York Times text, we get

  C(w1 w2)   w1       w2          Tag pattern
  11487      New      York        A N
  7261       United   States      A N
  5412       Los      Angeles     N N
  3301       last     year        A N
  ...        ...      ...         ...
  1074       chief    executive   A N
  1073       real     estate      A N
  ...        ...      ...         ...

similarly, given particular adjectives, we can find the most frequent co-occurring nouns:

  w         C(strong, w)     w           C(powerful, w)
  support   50               force       13
  safety    22               computers   10
  sales     21               position    8
  ...       ...              ...         ...
  man       9                man         8
  ...       ...              ...         ...

good, but still not perfect
  e.g. man occurring in both lists
  we want to ignore it if man is relatively common by itself
Collocations Hypothesis Testing (+ background: Basic Probability Theory)
Random Variable
the probability of an event is the likelihood that it will occur,
represented by a number between 0 and 1:
probability 0: impossibility
probability 1: certainty
probability 0.5: equally likely to occur as not
a random variable ranges over all the possible types of outcomes for the event being measured ... example:
  random variable X = the result of rolling a die
  P(X = 1) = the probability of the die showing 1 = 1/6
  P(X = 1) = P(X = 2) = ... = P(X = 6) = 1/6
properties:
  the probability of an outcome is always between 0 and 1
  the sum of the probabilities of all outcomes is 1
Collocations Hypothesis Testing (+ background: Basic Probability Theory)
Summary Measures for Random Variables
the (arithmetic) mean, or average:

  E(X) = Σᵢ xᵢ P(xᵢ)

from the die example, E(X) = 1·(1/6) + 2·(1/6) + ... + 6·(1/6) = 3.5

the variance, to measure the spread:

  Var(X) = E((X − E(X))²) = E(X²) − (E(X))² = Σᵢ xᵢ² P(xᵢ) − (E(X))²

from the die example, Var(X) = 1·(1/6) + 4·(1/6) + ... + 36·(1/6) − 3.5² ≈ 2.92

note that these apply to the whole population
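The die computation above can be checked exactly with rational arithmetic (a sketch, not from the slides):

```python
from fractions import Fraction

# fair die: E(X) = sum_i x_i P(x_i), Var(X) = sum_i x_i^2 P(x_i) - (E(X))^2
p = Fraction(1, 6)
mean = sum(x * p for x in range(1, 7))                  # 7/2 = 3.5
var = sum(x * x * p for x in range(1, 7)) - mean ** 2   # 35/12, about 2.92
```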
Collocations Hypothesis Testing (+ background: Basic Probability Theory)
Example
imagine a six-sided die where each outcome wasn't equally likely, but had
  P(X=1) = P(X=6) = 1/100, P(X=2) = P(X=5) = 4/100, P(X=3) = P(X=4) = 45/100
E(X) = 1·(1/100) + 2·(4/100) + ... + 6·(1/100) = 3.5, as before
Var(X) = 1·(1/100) + 4·(4/100) + ... + 36·(1/100) − 3.5² = 0.53
Collocations Hypothesis Testing (+ background: Basic Probability Theory)
Estimating Probabilities
Maximum Likelihood Estimator (MLE)
  used to estimate the theoretical probability from a sample
  if a specific event has occurred m times out of n occasions, the MLE probability is m/n
  the larger the number of occasions measured, the more accurate the MLE
sample mean and variance, given observations xᵢ:
  sample mean

    x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ

  sample variance

    s² = Σᵢ₌₁ⁿ (xᵢ − x̄)² / (n − 1)
Collocations Hypothesis Testing (+ background: Basic Probability Theory)
Estimating Probabilities
imagine we don't know the population probabilities
we want to estimate them from a sample

  die outcome   number of times rolled
  1             16
  2             18
  3             13
  4             16
  5             19
  6             18
  total         100

  x̄ = (16·1 + ... + 18·6) / 100 = 3.58

  s² = (16·(1 − 3.58)² + ... + 18·(6 − 3.58)²) / 99 ≈ 3.05
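A minimal check of the slide's arithmetic, using the observed roll counts:

```python
# observed rolls from the slide: outcome -> number of times rolled
counts = {1: 16, 2: 18, 3: 13, 4: 16, 5: 19, 6: 18}
n = sum(counts.values())                                   # 100 rolls

xbar = sum(x * c for x, c in counts.items()) / n           # sample mean
s2 = sum(c * (x - xbar) ** 2 for x, c in counts.items()) / (n - 1)  # sample variance
```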
Collocations Hypothesis Testing (+ background: Basic Probability Theory)
Estimating Probabilities
problem: sparse data
  it's more difficult to estimate the probability of a rare event
  if the corpus doesn't register, say, a rare word, the MLE for the word is 0
possible solution: smoothing
  even though the MLE seems like the right way of estimating probabilities, it's not always the right one
  infrequent events can get too little probability mass
  we can redistribute some of the probability mass
Collocations Hypothesis Testing (+ background: Basic Probability Theory)
Hypothesis Testing
using frequencies, as previously, we might decide that new companies is a collocation because it has a high frequency
however, we don't really think it is one; maybe it's just that new and companies are individually frequent, and appear together by chance
hypothesis testing is a way of assessing whether something is due to chance
it has the following procedure:
  formulate a NULL HYPOTHESIS H0, that there is no association beyond chance
  calculate the probability p that the event would occur if H0 were true
  then, if p is too low (usually 0.05 or smaller), reject H0; retain it as a possibility otherwise
Collocations The t-Test
The t-Test
we want a test that will say how likely or unlikely a certain event is to occur
the t-test compares a sample mean with a population mean, relative to the sample's variability:

  t = (x̄ − μ) / √(s²/N)

where N is the size of the sample, and μ is the population mean according to the null hypothesis
look up this t-value in a table
  the table gives the t-value for a given confidence level and a given number of degrees of freedom (d.f. = N − 1)

  p        0.05    0.01    0.005   0.001
  d.f. 1   6.314   31.82   63.66   318.3
  10       1.812   2.764   3.169   4.144
  20       1.725   2.528   2.845   3.552
  ∞        1.645   2.326   2.576   3.091
Collocations The t-Test
The t-Test
non-linguistic example:
  H0: the mean height of a population of men is 158cm (vs a population of shorter men)
  sample data: size 200, with x̄ = 169 and s² = 2600

  t = (169 − 158) / √(2600/200) ≈ 3.05

looking this up in the table, t > 2.576, so we can reject H0 with 99.5% confidence
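The slide's t statistic, computed directly (the critical value 2.576 is the large-d.f. entry for p = 0.005 from the table):

```python
import math

# one-sample t statistic: t = (xbar - mu0) / sqrt(s2 / N)
xbar, mu0, s2, N = 169.0, 158.0, 2600.0, 200

t = (xbar - mu0) / math.sqrt(s2 / N)
reject_at_995 = t > 2.576   # reject H0 with 99.5% confidence
```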
Collocations Pearson's Chi-Square Test (+ background: Distributions)
Distributions
a PROBABILITY DISTRIBUTION FUNCTION is a function describing
the mapping from random variable values to probabilities
these can be either discrete (from a finite set) or continuous
we've already seen a UNIFORM DISTRIBUTION (the original die example)
this was a discrete function:

  P(X = x) = 1/n  (where n is the number of outcomes)
Collocations Pearson's Chi-Square Test (+ background: Distributions)
Gaussian Distribution
another important (continuous) one is the GAUSSIAN (OR NORMAL) DISTRIBUTION
defined by the function

  f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))

population mean is μ, variance σ²
a lot of data can be assumed to have this distribution, e.g. heights in a population
the t-test described previously assumes a normal distribution
Collocations Pearsons Chi-Square Test (+ background: Distributions)
Bernoulli Distribution
the discrete BERNOULLI DISTRIBUTION measures the probability of success in a yes/no experiment, with this probability called p
defined by P(X = 1) = p, P(X = 0) = 1 − p
population mean is p, variance is p(1 − p)
Collocations Pearsons Chi-Square Test (+ background: Distributions)
Binomial Distribution
the discrete BINOMIAL DISTRIBUTION measures the probability of the number of successes in a sequence of n independent yes/no experiments (Bernoulli trials), each of which has probability p
defined by

  P(X = k) = (n choose k) p^k (1 − p)^(n−k)

population mean is np, variance is np(1 − p)
models things like the probability of getting k heads from n tosses of a fair coin
for large n, it can be approximated by the normal distribution
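A sketch of the binomial pmf and its normal approximation for large n (the coin-toss numbers are illustrative, not from the slides):

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) = (n choose k) p^k (1 - p)^(n - k)"""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# probability of exactly 5 heads in 10 tosses of a fair coin: 252/1024
p_five = binom_pmf(5, 10, 0.5)

# for large n the binomial is close to a Gaussian with mu = np, var = np(1 - p)
n, p = 1000, 0.5
mu, var = n * p, n * p * (1 - p)
normal_approx = math.exp(-((500 - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
```

For n = 1000 the Gaussian density at the mean agrees with the exact binomial probability to well under a percent.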
Pearson's Chi-Square Test
as for the t-test, the χ² test has an associated number of degrees of freedom
  for a table of dimensions r × c, there are (r − 1)(c − 1) degrees of freedom
we check the distribution for χ²:

  p        0.99      0.95     0.10    0.05    0.01    0.005   0.001
  d.f. 1   0.00016   0.0039   2.71    3.84    6.63    7.88    10.83
  2        0.020     0.10     4.60    5.99    9.21    10.60   13.82
  3        0.115     0.35     6.25    7.81    11.34   12.84   16.27
  4        0.297     0.71     7.78    9.49    13.28   14.86   18.47
  100      70.06     77.93    118.5   124.3   135.8   140.2   149.4

the X² value is less than the critical value for α = 0.05, so we wouldn't reject H0: i.e. we wouldn't take new companies as a collocation, as before with the t-test
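As a sketch of the computation behind such a test, the 2×2 X² statistic can be computed with the standard shortcut formula; the counts below are hypothetical, not the new companies data:

```python
def chi_square_2x2(o11, o12, o21, o22):
    """Pearson's X^2 for a 2x2 contingency table (shortcut form)."""
    n = o11 + o12 + o21 + o22
    num = n * (o11 * o22 - o12 * o21) ** 2
    den = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    return num / den

# a perfectly independent (hypothetical) table gives X^2 = 0 ...
x2_indep = chi_square_2x2(10, 40, 30, 120)
# ... while an associated one exceeds the 3.84 critical value (d.f. 1, p = 0.05)
x2_assoc = chi_square_2x2(30, 20, 10, 140)
```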
Comparison: Chi-Square vs t-Test
for the previous example, there's quite a lot of overlap
  for example, the top 20 bigrams according to the t-test are the same as the top 20 for χ²
however, χ² is also appropriate for large probabilities, where the normality assumption of the t-test fails
Collocations Likelihood Ratio Test (+ background: Conditional Probability)
Conditional Probability
we've already in fact used the notion of independent events
  two events are independent of each other if the occurrence of one does not affect the probability of the occurrence of the other
  tossing a coin and winning the lottery: independent
  speeding and having an accident: not independent
conditional probability: the probability that one event occurs given that another event occurs
Collocations Likelihood Ratio Test (+ background: Conditional Probability)
Example
the following table shows the weather conditions for 100 horse
races and how many times Harry won:
rain shine
win 15 5
no win 15 65
Harry won 20 out of 100 races: P(win) = 0.2 (by MLE)
the conditional probability of Harry winning given rain is
  P(win | rain) = 15/30 = 0.5
compare this with the χ² test: under the null hypothesis, the observed data was compared against the situation where the words were independent
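A minimal check of the slide's numbers, computing the marginal and conditional probabilities from the table:

```python
# joint counts from the slide's table: 100 races, outcome x weather
counts = {("win", "rain"): 15, ("win", "shine"): 5,
          ("no win", "rain"): 15, ("no win", "shine"): 65}
n = sum(counts.values())

p_win = (counts[("win", "rain")] + counts[("win", "shine")]) / n   # 20/100
p_win_given_rain = counts[("win", "rain")] / (
    counts[("win", "rain")] + counts[("no win", "rain")])          # 15/30
```

Since P(win | rain) = 0.5 differs from P(win) = 0.2, winning and rain are not independent.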
Collocations Likelihood Ratio Test (+ background: Conditional Probability)
Likelihood Ratio
another approach to hypothesis testing
more appropriate for sparse data than χ²
also more interpretable: it says how much more likely one hypothesis is than another
here, we explicitly examine two hypotheses to explain the bigram w1 w2:
  Hypothesis 1: P(w2 | w1) = p = P(w2 | ¬w1)
  Hypothesis 2: P(w2 | w1) = p1 ≠ p2 = P(w2 | ¬w1)
Hypothesis 1 represents independence of w1 and w2; Hypothesis 2 represents dependence (and hence a possible collocation)
Likelihood Ratio
we'll use the usual MLEs for p, p1, p2, writing c1, c2, c12 for the number of occurrences of w1, w2, w1 w2:

  p = c2/N,   p1 = c12/c1,   p2 = (c2 − c12)/(N − c1)

we'll also use the notation for a binomial distribution:

  b(k; n, p) = (n choose k) p^k (1 − p)^(n−k)

now, the likelihoods are
for Hypothesis 1:

  L(H1) = b(c12; c1, p) · b(c2 − c12; N − c1, p)

for Hypothesis 2:

  L(H2) = b(c12; c1, p1) · b(c2 − c12; N − c1, p2)
Collocations Likelihood Ratio Test (+ background: Conditional Probability)
Likelihood Ratio
the log of the likelihood ratio is then

  log λ = log (L(H1) / L(H2))

for bigrams of powerful:

  −2 log λ   C(w1)   C(w2)   C(w1 w2)   w1 w2
  1291.42    12593   932     150        most powerful
  99.31      379     932     10         politically powerful
  82.96      932     934     10         powerful computers
  80.39      932     3424    13         powerful force
  57.27      932     291     6          powerful symbol
  ...        ...     ...     ...        ...

the quantity −2 log λ has a χ² distribution
so you can do hypothesis testing using the χ² table
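A sketch of the computation, working in log space via lgamma to avoid underflow on corpus-sized counts (the token total below is a hypothetical round number, not the slide's corpus size):

```python
import math

def log_binom(k, n, p):
    """log b(k; n, p), computed via lgamma for numerical stability."""
    if p == 0.0:
        return 0.0 if k == 0 else float("-inf")
    if p == 1.0:
        return 0.0 if k == n else float("-inf")
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)

def neg2_log_lambda(c1, c2, c12, n):
    """-2 log lambda for bigram w1 w2, following the slide's L(H1), L(H2)."""
    p = c2 / n
    p1 = c12 / c1
    p2 = (c2 - c12) / (n - c1)
    ll_h1 = log_binom(c12, c1, p) + log_binom(c2 - c12, n - c1, p)
    ll_h2 = log_binom(c12, c1, p1) + log_binom(c2 - c12, n - c1, p2)
    return -2.0 * (ll_h1 - ll_h2)

# counts of roughly the slide's scale: w1 932x, w2 934x, together 10x, in 14M tokens
score = neg2_log_lambda(932, 934, 10, 14_000_000)
```

The score far exceeds the 10.83 critical value (d.f. 1, p = 0.001), while counts that exactly match the independence prediction give a score of 0.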
Collocations Likelihood Ratio Test (+ background: Conditional Probability)
Comparison: Chi-Square and Likelihood Ratio
the likelihood ratio has an intuitive meaning
  from the previous table, the bigram powerful symbol is e^(0.5 × 57.27) ≈ 2.729 × 10^12 times more likely to occur than would be expected from the individual words alone
  comparison carried out by Dunning [1993]
χ² tends to be less accurate with sparse data
  as a rule of thumb, it needs a large sample, and counts in each cell (i.e. occurrences of words or bigrams) of at least 5
  the events we're interested in in text (individual words or n-grams) are in fact often less frequent than this: related to the Zipfian distribution of words
  as an example, Dunning selected words from a 500,000-word corpus with frequencies of between 1 and 4; these included words like abandonment, clause, meat, poi and understatement
the log likelihood ratio is more accurate here (but still needs counts of at least 1)
Collocations Fisher's Exact Test
Fisher's Exact Test
the previous tests have all been PARAMETRIC tests: that is, they assume some distribution
it's possible to use a NON-PARAMETRIC test, which makes no such assumptions
the trade-off is that it's typically more time-consuming to calculate, and is only feasible for smaller amounts of data
Fisher's Exact Test computes the significance of an observed table by exhaustively computing the probability of every table that would have the same marginal totals
suggested as an alternative to the previous tests by Pedersen [1996]
Fisher's Exact Test
consider again a 2 × 2 contingency table:

                     w1 = new             w1 ≠ new
  w2 = companies     E1,1                 E1,2
                     (new companies)      (e.g. old companies)
  w2 ≠ companies     E2,1                 E2,2
                     (e.g. new machines)  (e.g. old machines)

the probability of obtaining any such set of values is

  p = ((E1,1 + E1,2) choose E1,1) · ((E2,1 + E2,2) choose E2,1)
      / ((E1,1 + E1,2 + E2,1 + E2,2) choose (E1,1 + E2,1))
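A sketch of the test: the hypergeometric probability of one table, and the one-sided significance obtained by summing over all tables at least as extreme with the same marginals (the 3/1/1/3 counts are hypothetical):

```python
from math import comb

def table_prob(e11, e12, e21, e22):
    """Hypergeometric probability of one 2x2 table with fixed marginals."""
    n = e11 + e12 + e21 + e22
    return comb(e11 + e12, e11) * comb(e21 + e22, e21) / comb(n, e11 + e21)

def fisher_right_tail(e11, e12, e21, e22):
    """P of seeing e11 or more co-occurrences, holding the marginals fixed."""
    total = 0.0
    k_max = min(e11 + e12, e11 + e21)
    for k in range(e11, k_max + 1):
        d = k - e11   # shift d cases from the off-diagonal onto the diagonal
        total += table_prob(e11 + d, e12 - d, e21 - d, e22 + d)
    return total

p_observed = table_prob(3, 1, 1, 3)       # 16/70
p_tail = fisher_right_tail(3, 1, 1, 3)    # 16/70 + 1/70 = 17/70
```

The exhaustive enumeration over tables is what makes the test exact, and also what limits it to small counts.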
Verb Subcategorisation
Outline
1 Collocations
Frequency
Hypothesis Testing (+ background: Basic Probability Theory)
The t-Test
Pearson's Chi-Square Test (+ background: Distributions)
Likelihood Ratio Test (+ background: Conditional Probability)
Fisher's Exact Test
2 Verb Subcategorisation
Precision and Recall
3 Semantic Similarity
Latent Semantic Indexing
4 Register Analysis
5 References
Verb Subcategorisation
Verb Subcategorisation
verbs express their semantic arguments with different syntactic
means
  the class of verbs with semantic arguments theme and recipient has a subcategory expressing these via a direct object and a prepositional phrase: he donated a large sum of money to the church
  a second subcategory permits double objects: he gave the church a large sum of money
these subcategorisation frames are typically not in dictionaries
we might be interested in identifying them via statistics
Brent [1993] developed the system Lerner to assign one of six frames to verbs:

  Description       Good Example           Bad Example
  NP only           greet them             *arrive them
  tensed clause     hope he'll attend      *want he'll attend
  infinitive        hope to attend         *greet to attend
  NP & clause       tell him he's a fool   *yell him he's a fool
  NP & infinitive   want him to attend     *hope him to attend
  NP & NP           tell him the story     *shout him the story
Verb Subcategorisation
Algorithm for Learning Subcat Frames
Lerner had two steps:
  1. Define cues. Define a regular pattern of words and syntactic categories which indicates the presence of the frame with high certainty. For a particular cue c_j we define the probability of error ε_j, which indicates how likely we are to make a mistake if we assign frame f to verb v based on cue c_j.
  2. Do hypothesis testing. Initially assume the frame is not appropriate for the verb: this is the null hypothesis H0. We reject H0 if the cue c_j indicates with high probability that H0 is wrong.
example: cue for frame NP only (transitive verb)
  (OBJ | SUBJ-OBJ | CAP) (PUNC | CC)
  OBJ = accusative case personal pronouns; SUBJ-OBJ = nominative or accusative case personal pronouns; CAP = capitalised word; PUNC = punctuation; CC = subordinating conjunction
  positive indicator for transitive verb: consider ... greet/V Peter/CAP ,/PUNC ...
Hypothesis Testing
suppose verb v_i occurs a total of n times in the corpus, and that there are m ≤ n occurrences with a cue for frame f_j
assume also some error rate ε_j in inferring frame f_j from cue c_j
this suggests a binomial distribution
then reject the null hypothesis H0 that v_i does not permit f_j with the following probability of error:

  pE = P(v_i does not permit f_j | C(v_i, c_j) ≥ m) = Σᵣ₌ₘⁿ (n choose r) ε_jʳ (1 − ε_j)ⁿ⁻ʳ

various values for ε_j were assessed
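The binomial tail above is a direct sum; a sketch with hypothetical counts (verb seen 100 times, cue fires 3 times, cue error rate 0.008):

```python
from math import comb

def p_error(n, m, eps):
    """P(>= m cue occurrences in n verb tokens | H0: frame not permitted),
    i.e. the binomial tail sum from the slide."""
    return sum(comb(n, r) * eps**r * (1 - eps) ** (n - r) for r in range(m, n + 1))

# hypothetical: verb occurs 100 times, cue fires 3 times, error rate eps = 0.008
pe = p_error(100, 3, 0.008)
reject_h0 = pe < 0.05   # if so, the frame is assigned to the verb
```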
Verb Subcategorisation Precision and Recall
Precision and Recall
typically, when building a statistical model to do something, you
want to evaluate how suitable it is
for example, how well it performs a simple task
the measures of PRECISION and RECALL are one way of doing that
imagine you have a system for sorting your objects of interest into two piles, relevant and irrelevant
  the system can make two types of error: classifying a relevant object as irrelevant, or an irrelevant one as relevant
  system decisions can then be broken into four categories: true positive (TP), false positive (FP), false negative (FN), true negative (TN)

                      actually:
                      relevant   irrelevant
  system predicts:
    relevant          TP         FP
    irrelevant        FN         TN
Verb Subcategorisation Precision and Recall
Precision and Recall
precision is the proportion of system-predicted relevant objects that are correct:

  PRE = TP / (TP + FP)

recall is the proportion of actually relevant objects that the system managed to predict as relevant:

  REC = TP / (TP + FN)

example: there's a set of 200 documents, of which 40 are actually relevant; your system says that 50 are relevant, including 20 of the ones that actually are

                      actually:
                      relevant   irrelevant
  system predicts:
    relevant          TP = 20    FP = 30
    irrelevant        FN = 20    TN = 130

then, PRE = 20/50 and REC = 20/40
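A minimal check of the example's numbers:

```python
# counts from the slide's worked example (200 documents in total)
tp, fp, fn, tn = 20, 30, 20, 130

precision = tp / (tp + fp)   # 20/50
recall = tp / (tp + fn)      # 20/40
```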
Verb Subcategorisation Precision and Recall
Verb Subcategorisation: Lerner Accuracy
for Lerner, precision and recall values were calculated for various ε_j
this is the table for the tensed clause frame:

  ε_j     TP   FP   TN   FN   MC   %MC   PRE    REC
  .0312   13   0    30   20   20   32    1.00   .39
  .0156   19   0    30   14   14   22    1.00   .58
  .0078   22   1    29   11   12   19    .96    .67
  .0039   25   1    29   8    9    14    .96    .76
  .0020   27   3    27   6    9    14    .90    .82
  .0010   29   5    25   4    9    14    .85    .88
  .0005   31   8    22   2    10   16    .79    .94
  .0002   31   13   17   2    15   24    .70    .94
  .0001   33   19   11   0    19   30    .63    1.00

MC is the total misclassified
F-Measure
there's typically a trade-off between precision and recall
there are a number of ways of combining them into a single
measure
one is the F-measure, the weighted harmonic mean of the two:

  F = (2 · PRE · REC) / (PRE + REC)
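Applied to the precision/recall numbers from the earlier example:

```python
def f_measure(pre, rec):
    """Harmonic mean of precision and recall (equal weighting)."""
    return 2 * pre * rec / (pre + rec)

# PRE = 0.4, REC = 0.5 from the earlier worked example
f = f_measure(0.4, 0.5)   # 4/9
```

The harmonic mean always lies between the two values, pulled towards the smaller one.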
Semantic Similarity
Outline
1 Collocations
Frequency
Hypothesis Testing (+ background: Basic Probability Theory)
The t-Test
Pearson's Chi-Square Test (+ background: Distributions)
Likelihood Ratio Test (+ background: Conditional Probability)
Fisher's Exact Test
2 Verb Subcategorisation
Precision and Recall
3 Semantic Similarity
Latent Semantic Indexing
4 Register Analysis
5 References
Semantic Similarity
Semantic Similarity
there are a number of resources that group words together by
semantic relatedness
  examples are thesauruses, WordNet
  semantic relations are synonymy, hypernymy, etc.
  e.g. dog and canine might be in a class together; this might be a hyponym of a class corresponding to animal
you might want to automatically derive classes to capture relations
  for when you have a new unknown word: e.g. in Susan had never eaten a fresh durian before, you don't know what kind of thing durian is
  or if you want types of classes other than the standard ones
Semantic Similarity Latent Semantic Indexing
Latent Semantic Indexing (LSI)
in LSI, we look at the interaction of terms and documents
the purpose of this interaction is twofold:
  to have the documents tell us which terms should be grouped together
  to have the grouped-together terms tell us about the similarity of the documents
this interaction is described by a matrix, and the grouping is carried out by a process called Singular Value Decomposition (SVD)
Example
say we have 5 terms of interest (cosmonaut, astronaut, moon, car, truck) and 6 documents
we describe their interaction by a matrix A, where cell aij contains the count of term i in document j

  A =
              d1   d2   d3   d4   d5   d6
  cosmonaut    1    0    1    0    0    0
  astronaut    0    1    0    0    0    0
  moon         1    1    0    0    0    0
  car          1    0    0    1    1    0
  truck        0    0    0    1    0    1

this can be thought of as a five-dimensional space (defined by the terms) with six objects in that space (the documents)
what we want to do is reduce the dimensions, thus grouping similar terms
Semantic Similarity Latent Semantic Indexing
Dimensionality Reduction
there are many possible types of dimensionality reduction
LSI chooses the mapping such that the reduced dimensions correspond to the greatest axes of variation
  that is, if the new dimensions are numbered 1 ... k, dimension 1 captures the greatest amount of commonality, dimension 2 the second greatest, and so on
this process is carried out by the matrix operation called Singular Value Decomposition
here, the term-by-document matrix A(t×d) is decomposed into three other matrices:

  A(t×d) = T(t×n) S(n×n) (D(d×n))^T

this decomposition is (almost) unique
Semantic Similarity Latent Semantic Indexing
Example
  T =
              Dim 1   Dim 2   Dim 3   Dim 4   Dim 5
  cosmonaut   −0.44   −0.30    0.57    0.58    0.25
  astronaut   −0.13   −0.33   −0.59    0.00    0.73
  moon        −0.48   −0.51   −0.37    0.00   −0.61
  car         −0.70    0.35    0.15   −0.58    0.16
  truck       −0.26    0.65   −0.41    0.58   −0.09

consider the columns ...
Semantic Similarity Latent Semantic Indexing
Example
  S =
   2.16   0.00   0.00   0.00   0.00
   0.00   1.59   0.00   0.00   0.00
   0.00   0.00   1.28   0.00   0.00
   0.00   0.00   0.00   1.00   0.00
   0.00   0.00   0.00   0.00   0.39

this matrix embodies the weight of the dimensions
the values always go from largest to smallest
Example
  D^T =
           d1      d2      d3      d4      d5      d6
  Dim 1   −0.75   −0.28   −0.20   −0.45   −0.33   −0.12
  Dim 2   −0.29   −0.53   −0.19    0.63    0.22    0.41
  Dim 3    0.28   −0.75    0.45    0.20    0.12   −0.33
  Dim 4    0.00    0.00    0.58    0.00   −0.58    0.58
  Dim 5   −0.53    0.29    0.63    0.19    0.41   −0.22
Semantic Similarity Latent Semantic Indexing
Example
so far we've just transformed the dimensions; now to reduce them
for this example, we decide to reduce to 2 dimensions
to look at the documents, we combine this reduced dimensionality with the weighting of the dimensions
derive the new matrix B(2×d) = S(2×2) D^T(2×d):

  B =
           d1      d2      d3      d4      d5      d6
  Dim 1   −1.62   −0.60   −0.44   −0.97   −0.70   −0.26
  Dim 2   −0.46   −0.84   −0.30    1.00    0.35    0.65
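The whole pipeline can be reproduced with numpy as a sketch (note that SVD columns are only determined up to sign, so individual entries may come out with flipped signs relative to the slides):

```python
import numpy as np

# term-document matrix A from the slides (rows: cosmonaut, astronaut, moon, car, truck)
A = np.array([
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
], dtype=float)

# A = T S D^T via Singular Value Decomposition
T, S, Dt = np.linalg.svd(A, full_matrices=False)

# keep the two largest singular values: documents in the reduced 2-d space
B = np.diag(S[:2]) @ Dt[:2, :]
```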
Semantic Similarity Latent Semantic Indexing
Conceptually . . .
we can imagine that the terms are made up of semantic particles
  perhaps along the lines of Wierzbicka's semantic primitives
  however, these are not defined a priori; they are only a consequence of the relations in the given set of documents
LSI rearranges things so that the terms with the greatest number of semantic particles in common are grouped together
Register Analysis
Outline
1 Collocations
Frequency
Hypothesis Testing (+ background: Basic Probability Theory)
The t-Test
Pearson's Chi-Square Test (+ background: Distributions)
Likelihood Ratio Test (+ background: Conditional Probability)
Fisher's Exact Test
2 Verb Subcategorisation
Precision and Recall
3 Semantic Similarity
Latent Semantic Indexing
4 Register Analysis
5 References
Register Analysis
work done by Douglas Biber
these notes from Biber [1993]
idea: different registers have systematic patterns of variation
  e.g. professional letters vs academic prose
  we can do descriptive analyses, based on frequencies or proportions of selected characteristics
  however, we may also want to identify groups of characteristics distinguishing registers
Register Analysis
Descriptive Analysis
example: from the Brown corpus, mean frequencies of three dependent clause types (per 1000 words):

  register             relative   causative adverbial    that complement
                       clauses    subordinate clauses    clauses
  press reports        4.6        0.5                    3.4
  official documents   8.6        0.1                    1.6
  conversations        2.9        3.5                    4.1
  prepared speeches    7.9        1.6                    7.6

from this, we can see e.g. that relative clauses are common in official documents and prepared speeches relative to conversation
we may be interested in grouping many of these characteristics of text together
Register Analysis
Dimension Identification
Biber carried out a quantitative analysis of 67 linguistic features in
the LOB and London-Lund corpora
  features included: tense and aspect markers, place and time adverbials, pronouns and pro-verbs, nominal forms, prepositional phrases, adjectives, lexical specificity, lexical classes (e.g. hedges, emphatics), modals, specialised verb classes, reduced forms and discontinuous structures, passives, stative forms, dependent clauses, coordination, and questions
  frequencies were counted, and normalised to per-1000-word values
then, FACTOR ANALYSIS was carried out
  this is a dimensionality reduction procedure very similar to LSI
  the dimensions similarly end up in decreasing order of explanatory power
Register Analysis
Dimension Identification
after inspecting the results of the factor analysis, Biber interpretively labelled the first five dimensions:
  1. Informational vs Involved Production
  2. Narrative vs Nonnarrative Concerns
  3. Elaborated vs Situation-Dependent Reference
  4. Overt Expression of Persuasion
  5. Abstract vs Nonabstract Style
example of features associated with Dimension 1:

  functions               linguistic features           characteristic registers
  Monologue               nouns, adjectives             informational exposition
  Careful Production      prepositional phrases         e.g. official documents
  Informational           long words                    academic prose
  Faceless

  Interactive             1st and 2nd person pronouns   conversations
  (Inter)personal Focus   questions, reductions         (personal letters)
  Involved                stative verbs, hedges         (public conversations)
  Personal Stance         emphatics
  On-Line Production      adverbial subordination
Outline
1 Collocations
Frequency
Hypothesis Testing (+ background: Basic Probability Theory)
The t-Test
Pearson's Chi-Square Test (+ background: Distributions)
Likelihood Ratio Test (+ background: Conditional Probability)
Fisher's Exact Test
2 Verb Subcategorisation
Precision and Recall
3 Semantic Similarity
Latent Semantic Indexing
4 Register Analysis
5 References
References
Douglas Biber. Using Register-Diversified Corpora for General Language Studies. Computational Linguistics, 19(2):219-241, 1993.

Michael Brent. From grammar to lexicon: Unsupervised learning of lexical syntax. Computational Linguistics, 19(2):243-262, 1993.

Ted Dunning. Accurate Methods for the Statistics of Surprise and Coincidence. Computational Linguistics, 19(1):61-74, 1993.

Brigitte Krenn and Christer Samuelsson. The Linguist's Guide to Statistics: DON'T PANIC. URL http://coli.uni-sb.de/~christer. Version of December 19, 1997.

Christopher Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. The MIT Press, Cambridge, MA, USA, 1999.

Ted Pedersen. Fishing for Exactness. In Proceedings of the South-Central SAS Users Group Conference, Austin, TX, USA, 1996.