Contingency (frequency) tables


Page 1: Contingency (frequency) tables

Contingency (frequency) tables

Dependence of two qualitative variables

Page 2: Contingency (frequency) tables

Examples of problems

• Does the survival of a person sent to a cholera-affected area depend on whether the person has been vaccinated against cholera?

• Is there any connection between hair colour and sex?

• Are parasite species distributed independently?

Page 3: Contingency (frequency) tables

Contingency table

                 FACTOR 2
FACTOR 1         Category 1   Category 2   Category 3   Sum
Category 1       f11          f12          f13          R1
Category 2       f21          f22          f23          R2
Sum              C1           C2           C3           n

Page 4: Contingency (frequency) tables

Dependence of survival on vaccination

                 Survived in tropics
Vaccinated       yes        no         sum
yes              100        10         110
no               100        110        210
sum              200        120        320

Mutual dependence of two species

                 Species 1
Species 2        present    absent     sum
present          100        200        300
absent           200        1000       1200
sum              300        1200       1500

Page 5: Contingency (frequency) tables

Relationship between two categorical variables in a table

• when one of the variables is manipulated

• when one of the variables is probably a cause and the other a consequence (response), but the study is based on non-manipulative observations

• and finally, when the possible causality is unclear

Page 6: Contingency (frequency) tables

Basic rules from the theory of probability

• The probability of the joint occurrence of two independent events is $P_{i,j} = P_i \cdot P_j$

• Example: In a population, half of the members are male ($P_{\text{male}} = 0.5$) and a tenth of all individuals are albino ($P_{\text{albino}} = 0.1$). If albinos are equally common in both sexes (i.e. albinism and sex are independent events), then the probability that a randomly chosen individual is an albino male is $P_{\text{male}} \cdot P_{\text{albino}} = 0.5 \cdot 0.1 = 0.05$

Page 7: Contingency (frequency) tables

Basic rules from the theory of probability

• The expected number of successes $E(a)$ in $n$ experiments, where the probability of a success is $P_a$, is

• $E(a) = P_a \cdot n$

• Example: If the probability that a mutation occurs is 0.02, then among 100 randomly chosen individuals we expect 2 individuals with this mutation
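The two probability rules can be checked numerically. The following is a minimal Python sketch using the numbers from the slide examples; the variable and function names are illustrative and not part of the original slides.

```python
# Probabilities from the albino example on the slides.
p_male = 0.5
p_albino = 0.1

# Product rule for independent events: P(male and albino) = P(male) * P(albino).
p_albino_male = p_male * p_albino


def expected_count(p_success, n_trials):
    """Expected number of successes: E(a) = P_a * n."""
    return p_success * n_trials


# Mutation example: probability 0.02 in 100 randomly chosen individuals.
expected_mutants = expected_count(0.02, 100)

print(p_albino_male, expected_mutants)
```

The product rule holds only under independence; if albinism were more common in one sex, the joint probability would have to be estimated directly.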

Page 8: Contingency (frequency) tables

How do we compute $\chi^2$?

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - \hat{f}_i)^2}{\hat{f}_i} = \sum \frac{(O - E)^2}{E}$$

How do we obtain the expected values?

$H_0$ says that the events are independent, so we use the rule for the probability of the joint occurrence of two independent events.

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(f_{ij} - \hat{f}_{ij})^2}{\hat{f}_{ij}}$$

Page 9: Contingency (frequency) tables

Calculation of expected values

                 FACTOR 2
FACTOR 1         Category 1   Category 2   Category 3   Sum
Category 1       f11          f12          f13          R1
Category 2       f21          f22          f23          R2
Sum              C1           C2           C3           n

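With the help of the marginal sums:

$$P_{i\cdot} = \frac{R_i}{n}, \qquad P_{\cdot j} = \frac{C_j}{n}, \qquad P_{ij} = P_{i\cdot} \cdot P_{\cdot j}$$

$$\hat{f}_{ij} = E(f_{ij}) = P_{ij} \cdot n = \frac{R_i}{n} \cdot \frac{C_j}{n} \cdot n = \frac{R_i \cdot C_j}{n}$$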
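As an illustration of the marginal-sum rule, here is a minimal Python sketch that computes the expected frequencies and the resulting chi-square statistic for the vaccination table shown earlier in the slides. The function names are hypothetical, and the table is stored as a list of rows.

```python
def expected_frequencies(observed):
    """Expected cell frequencies under independence: E(f_ij) = R_i * C_j / n."""
    row_sums = [sum(row) for row in observed]          # R_i
    col_sums = [sum(col) for col in zip(*observed)]    # C_j
    n = sum(row_sums)                                  # grand total
    return [[r * c / n for c in col_sums] for r in row_sums]


def chi_square(observed):
    """Pearson chi-square: sum over all cells of (f - f_hat)^2 / f_hat."""
    expected = expected_frequencies(observed)
    return sum(
        (f - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for f, e in zip(obs_row, exp_row)
    )


# Vaccination table: rows = vaccinated yes/no, columns = survived yes/no.
vaccination = [[100, 10], [100, 110]]
expected = expected_frequencies(vaccination)
chi2 = chi_square(vaccination)

print(expected)
print(round(chi2, 2))
```

The same two functions work for any r x c table, because the expected values depend only on the row sums, the column sums and n.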

Page 10: Contingency (frequency) tables

How many cell values do I need to know in order to know the result of the complete experiment (given the fixed marginal frequencies)?

$\mathrm{df} = (c - 1) \cdot (r - 1)$

where c is the number of columns and r is the number of rows

Page 11: Contingency (frequency) tables

Critical value at the 5% level of significance for df = 3 ($\chi^2 = 7.815$).

Page 12: Contingency (frequency) tables

What we usually write in a paper

The area under the $\chi^2$ density curve to the right of the observed value is 0.029, so we write: $\chi^2 = 8.99$, df = 3, P = 0.029
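The reported P value can be reproduced with the Python standard library alone. The sketch below is an assumption-laden illustration, not the method used on the slides: it relies on the closed-form upper tail of the chi-square distribution for integer degrees of freedom, and the function names are hypothetical. In practice a library routine such as `scipy.stats.chi2.sf` would normally be used instead.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """df = (c - 1) * (r - 1) for an r x c contingency table."""
    return (n_cols - 1) * (n_rows - 1)


def chi2_upper_tail(x, df):
    """P(X >= x) for a chi-square variable with integer df >= 1 and x > 0.

    Starts from Q(1/2, y) = erfc(sqrt(y)) or Q(1, y) = exp(-y), with y = x / 2,
    and applies Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    """
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Values reported on the slide: chi-square = 8.99 with df = 3.
p_value = chi2_upper_tail(8.99, 3)
print(round(p_value, 3))
```

A table with 2 rows and 4 columns (or 4 rows and 2 columns) is one layout that gives df = 3; the slides do not say which table produced the value 8.99.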

Page 13: Contingency (frequency) tables

Yates’ correction is sometimes used here as well (when expected frequencies are extremely low):

$$\chi^2 = \sum_{i=1}^{k} \frac{(|f_i - \hat{f}_i| - 0.5)^2}{\hat{f}_i}$$

It gives better protection against Type I error, but the test is weaker.
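To see the effect of the correction, the following minimal Python sketch computes the statistic with and without the 0.5 adjustment for the vaccination table from the slides. The function name and the `yates` flag are illustrative; the code applies the slide formula literally.

```python
def chi_square(table, yates=False):
    """Pearson chi-square for a table of counts, optionally with Yates' correction."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    total = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            f_hat = row_sums[i] * col_sums[j] / n   # expected frequency
            diff = abs(f - f_hat)
            if yates:
                diff -= 0.5                         # shrink each deviation by 0.5
            total += diff ** 2 / f_hat
    return total


vaccination = [[100, 10], [100, 110]]
chi2_plain = chi_square(vaccination)
chi2_yates = chi_square(vaccination, yates=True)

print(round(chi2_plain, 2), round(chi2_yates, 2))
```

With counts this large the correction changes little; it matters most when the expected frequencies are small.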

Page 14: Contingency (frequency) tables

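Another test criterion, also with a $\chi^2$ distribution

$$G = 2\left[\sum_{i}\sum_{j} f_{ij} \ln f_{ij} - \sum_{i} R_i \ln R_i - \sum_{j} C_j \ln C_j + n \ln n\right]$$

$$G = 4.60517\left[\sum_{i}\sum_{j} f_{ij} \log f_{ij} - \sum_{i} R_i \log R_i - \sum_{j} C_j \log C_j + n \log n\right]$$

the so-called likelihood ratio (LR) $\chi^2$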
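The G statistic can be sketched in a few lines of Python. The example below evaluates both the natural-logarithm and the decimal-logarithm form for the vaccination table from the slides and compares them with the equivalent form 2 Σ f ln(f / f̂). Function names are hypothetical, and the convention 0 · log 0 = 0 is an assumption added for empty cells.

```python
import math


def xlogx(x, log=math.log):
    """x * log(x), with the convention 0 * log(0) = 0."""
    return x * log(x) if x > 0 else 0.0


def g_statistic(table, log=math.log, factor=2.0):
    """G = factor * [sum f log f - sum R log R - sum C log C + n log n]."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    cells = sum(xlogx(f, log) for row in table for f in row)
    rows = sum(xlogx(r, log) for r in row_sums)
    cols = sum(xlogx(c, log) for c in col_sums)
    return factor * (cells - rows - cols + xlogx(n, log))


vaccination = [[100, 10], [100, 110]]
g_ln = g_statistic(vaccination)                                      # natural logarithms
g_log10 = g_statistic(vaccination, log=math.log10, factor=4.60517)   # decimal logarithms

# Equivalent form using expected frequencies f_hat = R_i * C_j / n.
n = 320
row_sums, col_sums = [110, 210], [200, 120]
g_direct = 2.0 * sum(
    f * math.log(f / (row_sums[i] * col_sums[j] / n))
    for i, row in enumerate(vaccination)
    for j, f in enumerate(row)
)

print(round(g_ln, 2), round(g_log10, 2), round(g_direct, 2))
```

The constant 4.60517 is 2 · ln 10 rounded to five decimals, which is why the two forms agree only up to rounding.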

Page 15: Contingency (frequency) tables

Similar results

“Normal” $\chi^2 = 8.99$

Page 16: Contingency (frequency) tables

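2 by 2 tables

                 Character 1
Character 2      present      absent       sum
present          a            b            m = a + b
absent           c            d            n = c + d
sum              r = a + c    s = b + d    N = a + b + c + d

$$\chi^2 = \frac{N (ad - bc)^2}{m \, n \, r \, s} = \frac{n (f_{11} f_{22} - f_{12} f_{21})^2}{R_1 R_2 C_1 C_2}$$

(in the second form, $n$ is the grand total, as in the general contingency table notation)

Note that for a table that exactly matches the null hypothesis,

ad = bc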
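The shortcut formula for 2 by 2 tables is easy to check against the general cell-by-cell computation. The following minimal Python sketch does so for the vaccination table from the slides; the function names are illustrative.

```python
def chi_square_shortcut(a, b, c, d):
    """chi-square = N * (ad - bc)^2 / (m * n * r * s) for a 2 x 2 table."""
    m, n = a + b, c + d          # row sums
    r, s = a + c, b + d          # column sums
    total = a + b + c + d        # N
    return total * (a * d - b * c) ** 2 / (m * n * r * s)


def chi_square_general(a, b, c, d):
    """Same statistic from observed and expected frequencies, cell by cell."""
    total = a + b + c + d
    cells = [(a, a + b, a + c), (b, a + b, b + d), (c, c + d, a + c), (d, c + d, b + d)]
    result = 0.0
    for f, row_sum, col_sum in cells:
        f_hat = row_sum * col_sum / total
        result += (f - f_hat) ** 2 / f_hat
    return result


chi2_short = chi_square_shortcut(100, 10, 100, 110)
chi2_full = chi_square_general(100, 10, 100, 110)
chi2_null = chi_square_shortcut(10, 20, 30, 60)   # ad = bc, so the statistic is 0

print(round(chi2_short, 2), round(chi2_full, 2), chi2_null)
```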

Page 17: Contingency (frequency) tables

Statistical and causal dependence

• Causal dependence can be demonstrated only by a manipulative experiment

                 Survived in tropics
Vaccinated       yes        no         sum
yes              100        10         110
no               100        110        210
sum              200        120        320

For a “correct” experiment everyone has to get an injection, but half of the people receive only a placebo (compare what is feasible with what statistics demands).

Page 18: Contingency (frequency) tables

Fundamental rules for the experimenter

• Every treatment has to have its control

• The control differs from the treatment only in the effect that I want to demonstrate (this is often very difficult to achieve)

• I must have independent replications

Page 19: Contingency (frequency) tables

Advantages of experiments and of observational studies

• Causality can be demonstrated by an experiment

• The range of experimental manipulations is usually limited

• Almost every experimental treatment has side effects, which are sometimes unpredictable

Page 20: Contingency (frequency) tables

Fisher’s exact test

How large is the probability that I get this table, or one deviating even more from the null hypothesis, given the marginal frequencies (provided that the null hypothesis is true; computed with the help of combinatorics)?

It is used for 2 x 2 tables when the numbers of observations are low.

Page 21: Contingency (frequency) tables

If I have the table

          +     -     marg.
+         5     7     12
-         4     20    24
marg.     9     27    36

then Fisher’s test directly computes the probability of this table and of all tables that are more extreme (from the point of view of H0), i.e.

          +     -     marg.
+         6     6     12
-         3     21    24
marg.     9     27    36

          +     -     marg.
+         7     5     12
-         2     22    24
marg.     9     27    36

          +     -     marg.
+         8     4     12
-         1     23    24
marg.     9     27    36

          +     -     marg.
+         9     3     12
-         0     24    24
marg.     9     27    36

The sum of all these probabilities is the achieved level of significance for the one-tailed test (that is why statistical software also prints 2*p).
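The combinatorial calculation can be sketched with the Python standard library. With fixed marginal frequencies, the probability of each table follows the hypergeometric distribution; the function names below are hypothetical, and "more extreme" is taken to mean a larger upper-left cell, as in the sequence of tables above.

```python
from math import comb


def table_probability(a, row1, row2, col1):
    """Probability of the 2 x 2 table with upper-left cell a, given fixed margins."""
    return comb(row1, a) * comb(row2, col1 - a) / comb(row1 + row2, col1)


def fisher_one_tailed(a, b, c, d):
    """Sum the probabilities of the observed table and of all tables
    with a larger upper-left cell and the same marginal frequencies."""
    row1, row2, col1 = a + b, c + d, a + c
    a_max = min(row1, col1)
    return sum(table_probability(k, row1, row2, col1) for k in range(a, a_max + 1))


# Observed table from the slide: 5 7 / 4 20 (margins 12, 24 and 9, 27).
p_one_tailed = fisher_one_tailed(5, 7, 4, 20)

# Sanity check: probabilities over all possible tables sum to 1.
total_probability = sum(table_probability(k, 12, 24, 9) for k in range(0, 10))

print(round(p_one_tailed, 4), round(total_probability, 6))
```

Doubling the one-tailed value, as mentioned on the slide, is only one of several conventions for a two-tailed P value.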

Page 22: Contingency (frequency) tables

Let us compare two tables:

                 Species 1
Species 2        present    absent     sum
present          100        200        300
absent           200        1000       1200
sum              300        1200       1500

                 Species 1
Species 2        present    absent     sum
present          10         20         30
absent           20         100        120
sum              30         120        150

$\chi^2$ and the power of the test grow with the number of observations, even though both tables are very probably samples from the same population.
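The dependence on sample size can be shown directly. This minimal Python sketch applies the 2 by 2 shortcut formula from the earlier slide to both species tables; the function name is illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """chi-square = N * (ad - bc)^2 / (m * n * r * s) for a 2 x 2 table."""
    total = a + b + c + d
    return total * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


chi2_large = chi_square_2x2(100, 200, 200, 1000)   # n = 1500
chi2_small = chi_square_2x2(10, 20, 20, 100)       # n = 150, same proportions

print(round(chi2_large, 2), round(chi2_small, 2), round(chi2_large / chi2_small, 2))
```

Because the cell proportions are identical, the statistic scales exactly with the total number of observations: ten times more data gives a ten times larger value.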

Page 23: Contingency (frequency) tables

Measures of association strength in a 2 x 2 table, independent of sample size

$Y = ad / bc = f_{11} f_{22} / (f_{21} f_{12})$; disadvantage: it is asymmetric, running from 0 (negative association) through 1 (independence) to $+\infty$ (positive association)

A second measure runs from -1 through 0 (independence) to +1; -1 and +1 mean the maximal possible association for the given values of the marginal frequencies

$$V = \frac{f_{11} f_{22} - f_{12} f_{21}}{\sqrt{R_1 R_2 C_1 C_2}}, \qquad V^2 = \frac{\chi^2}{n}$$

V runs from -1 through 0 (independence) to +1; -1 and +1 mean the maximal possible association for any values of the marginal frequencies
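That these measures do not depend on sample size can be illustrated with the two species tables from the previous slide. The sketch below is a minimal Python example; the function names are hypothetical, and V is computed as the difference of cross-products divided by the square root of the product of the marginal sums.

```python
import math


def cross_product_ratio(a, b, c, d):
    """Y = ad / bc (requires b and c to be non-zero)."""
    return (a * d) / (b * c)


def v_coefficient(a, b, c, d):
    """V = (ad - bc) / sqrt(R1 * R2 * C1 * C2)."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


large = (100, 200, 200, 1000)   # n = 1500
small = (10, 20, 20, 100)       # n = 150, same proportions

y_large, y_small = cross_product_ratio(*large), cross_product_ratio(*small)
v_large, v_small = v_coefficient(*large), v_coefficient(*small)

print(y_large, y_small)
print(round(v_large, 4), round(v_small, 4))
```

Both tables give the same Y and the same V, while their chi-square values differ by a factor of ten.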

Page 24: Contingency (frequency) tables

Multidimensional frequency tables

Nowadays generalized linear models are used in these cases.

[Figure: three-dimensional frequency table with the dimensions Years, Species A (present / absent) and Species B (present / absent)]