Contingency (frequency) tables

Dependence of two qualitative variables

Examples of problems

• Does the survival of a person sent to a cholera-affected area depend on whether the person has been vaccinated against cholera?

• Is there any connection between hair colour and sex?

• Are parasite species distributed independently?

Contingency table

                        FACTOR 2
                        Category 1   Category 2   Category 3   Sum
FACTOR 1   Category 1   f11          f12          f13          R1
           Category 2   f21          f22          f23          R2
           Sum          C1           C2           C3           n

Dependence of survival on vaccination

                    Survived in the tropics
                    yes    no     sum
Vaccinated   yes    100    10     110
             no     100    110    210
             sum    200    120    320

Mutual dependence of two species

                       Species 1
                       present   absent   sum
Species 2   present    100       200      300
            absent     200       1000     1200
            sum        300       1200     1500

Relationship between two categorical variables in a table

• when one of the variables is manipulated

• when one of the variables is probably a cause and the other a consequence (response), but the study is based on non-manipulative observations

• and finally, when the possible causality is unclear

Basic rules from theory of probability

• The probability of the joint occurrence of two independent events is P(i,j) = P(i) · P(j)

• Example: Half of the members of a population are male (Pmale = 0.5) and a tenth of all individuals are albino (Palbino = 0.1). If albinos are equally common in both sexes (i.e. albinism and sex are independent events), then the probability that a randomly chosen individual is an albino male is Pmale · Palbino = 0.5 · 0.1 = 0.05

Basic rules from theory of probability

• The expected number of successes E(a) in n experiments, where the probability of a success is Pa, is

• E(a) = Pa · n

• Example: If the probability that a mutation occurs is 0.02, then among 100 randomly chosen individuals we expect 2 individuals with this mutation
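Both rules reduce to simple arithmetic. The short sketch below only replays the numbers from the two examples above; the variable names are illustrative and not part of the slides.

```python
# Product rule for independent events (albino example).
p_male = 0.5
p_albino = 0.1
# Valid only under the assumption that sex and albinism are independent.
p_albino_male = p_male * p_albino

# Expected number of successes E(a) = Pa * n (mutation example).
p_mutation = 0.02
n_individuals = 100
expected_mutants = p_mutation * n_individuals

print(f"P(albino male) = {p_albino_male:.2f}")
print(f"Expected mutants among {n_individuals} individuals = {expected_mutants:.0f}")
```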

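How do we compute χ²?

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \sum_{i=1}^{k} \frac{(f_i - \hat{f}_i)^2}{\hat{f}_i}$$

where f_i is the observed frequency and f̂_i the expected frequency in category i, and k is the number of categories.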
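As a minimal sketch of the χ² sum for a single row of k categories, the function below adds up (O − E)²/E. The counts are hypothetical (100 individuals compared against an assumed 1:2:1 ratio) and serve only to illustrate the arithmetic.

```python
def chi_square(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 100 individuals in three categories.
observed = [18, 55, 27]
# Expected counts under an assumed 1:2:1 ratio for n = 100.
expected = [25, 50, 25]

chi2 = chi_square(observed, expected)
print(f"chi-square = {chi2:.2f}, k = {len(observed)} categories")
```

Each term measures how far one category departs from expectation, scaled by the expected count, so the same absolute deviation counts for more when little was expected.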

How do we obtain the expected values?

H0 states that the events are independent, so we use the rule for the probability of the joint occurrence of two independent events.

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(f_{ij} - \hat{f}_{ij})^2}{\hat{f}_{ij}}$$

Calculation of expected values

                        FACTOR 2
                        Category 1   Category 2   Category 3   Sum
FACTOR 1   Category 1   f11          f12          f13          R1
           Category 2   f21          f22          f23          R2
           Sum          C1           C2           C3           n

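Using the marginal sums:

$$P_{i\cdot} = \frac{R_i}{n}, \qquad P_{\cdot j} = \frac{C_j}{n}, \qquad P_{ij} = P_{i\cdot}\, P_{\cdot j}$$

$$E(f_{ij}) = P_{ij} \cdot n = \frac{R_i}{n} \cdot \frac{C_j}{n} \cdot n = \frac{R_i\, C_j}{n}$$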
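The calculation can be sketched for the vaccination table shown earlier (100, 10 / 100, 110). The function names are illustrative; the code simply applies E(f_ij) = R_i · C_j / n to every cell and then sums (f − f̂)²/f̂ over the whole table.

```python
def expected_frequencies(table):
    """Expected cell counts under independence: R_i * C_j / n."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    return [[r * c / n for c in col_sums] for r in row_sums]


def chi_square_table(table):
    """Chi-square statistic summed over all cells of an r x c table."""
    expected = expected_frequencies(table)
    return sum(
        (f - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for f, e in zip(obs_row, exp_row)
    )


# Rows: vaccinated yes / no; columns: survived yes / no.
vaccination = [[100, 10],
               [100, 110]]

expected = expected_frequencies(vaccination)
chi2 = chi_square_table(vaccination)

for row in expected:
    print(row)
print(f"chi-square = {chi2:.2f}")
```

For this table the expected counts are 68.75, 41.25, 131.25 and 78.75, and the statistic comes out at about 57.72.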

How many cell values do I need to know in order to know the result of the complete experiment (given fixed marginal frequencies)?

df = (c - 1) · (r - 1)

where c is the number of columns and r is the number of rows.

Critical value at the 5% significance level for df = 3.

What we usually write in our paper

The area under the χ² density to the right of the observed value is 0.029, so we write χ² = 8.99, df = 3, P = 0.029
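The reported P value is the upper-tail area of the χ² distribution. A minimal sketch using only the Python standard library follows; `chi2_sf` is a hypothetical helper name, and it relies on the closed-form tail formulas that hold for integer degrees of freedom (a statistics package would normally be used instead).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable, integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        series = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * series
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k-1/2) / Gamma(k+1/2)
    series = sum(
        half ** (k - 0.5) / math.gamma(k + 0.5)
        for k in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series


def degrees_of_freedom(n_rows, n_cols):
    """df = (c - 1) * (r - 1) for an r x c contingency table."""
    return (n_cols - 1) * (n_rows - 1)


p_value = chi2_sf(8.99, 3)
print(f"chi-square = 8.99, df = 3, P = {p_value:.3f}")
```

The same function can be used to check a tabulated critical value: for df = 3 the tail area at about 7.815 is close to 0.05.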

Even here, Yates' correction is sometimes used (when the expected frequencies are extremely low):

$$\chi^2 = \sum_{i=1}^{k} \frac{(|f_i - \hat{f}_i| - 0.5)^2}{\hat{f}_i}$$

It gives better protection against Type I error, but a weaker test.
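To see the effect of the correction, the sketch below computes the statistic with and without the 0.5 adjustment. It reuses the vaccination table purely for illustration (its expected frequencies are not low); the function name and the `yates` flag are assumptions of this example.

```python
def chi_square_table(table, yates=False):
    """Chi-square for an r x c table; optionally subtract 0.5 from each |f - f_hat|."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    total = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            e = row_sums[i] * col_sums[j] / n  # expected frequency R_i * C_j / n
            deviation = abs(f - e)
            if yates:
                deviation -= 0.5  # continuity correction as in the formula above
            total += deviation ** 2 / e
    return total


vaccination = [[100, 10],
               [100, 110]]

uncorrected = chi_square_table(vaccination)
corrected = chi_square_table(vaccination, yates=True)
print(f"uncorrected = {uncorrected:.2f}, with Yates' correction = {corrected:.2f}")
```

The corrected value is the smaller of the two, which is why the corrected test rejects H0 less readily.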

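Another test criterion, also with a χ² distribution

$$G = 2\left[\sum_i \sum_j f_{ij} \ln f_{ij} - \sum_i R_i \ln R_i - \sum_j C_j \ln C_j + n \ln n\right]$$

or, with base-10 logarithms (4.60517 = 2 ln 10),

$$G = 4.60517\left[\sum_i \sum_j f_{ij} \log f_{ij} - \sum_i R_i \log R_i - \sum_j C_j \log C_j + n \log n\right]$$

the so-called likelihood-ratio χ² (LR)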
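The two forms of G can be checked against each other numerically. The sketch below evaluates both on the vaccination table and compares them with the equivalent expression 2 Σ f_ij ln(f_ij / f̂_ij), where f̂_ij = R_i C_j / n; the function names are illustrative, and empty cells are skipped on the usual convention that 0 · ln 0 = 0.

```python
import math


def _xlogx_sum(values, log):
    """Sum of v * log(v), skipping zeros (0 * log 0 is taken as 0)."""
    return sum(v * log(v) for v in values if v > 0)


def g_statistic(table, log=math.log, factor=2.0):
    """G = factor * [sum f ln f - sum R ln R - sum C ln C + n ln n]."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    cells = [f for row in table for f in row]
    return factor * (
        _xlogx_sum(cells, log)
        - _xlogx_sum(row_sums, log)
        - _xlogx_sum(col_sums, log)
        + n * log(n)
    )


def g_from_expected(table):
    """Equivalent form: 2 * sum f * ln(f / f_hat), with f_hat = R_i * C_j / n."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    return 2.0 * sum(
        f * math.log(f / (row_sums[i] * col_sums[j] / n))
        for i, row in enumerate(table)
        for j, f in enumerate(row)
        if f > 0
    )


vaccination = [[100, 10],
               [100, 110]]

g_ln = g_statistic(vaccination)                                   # natural logarithms
g_log10 = g_statistic(vaccination, log=math.log10, factor=4.60517)  # base-10 form
g_direct = g_from_expected(vaccination)
print(f"G (ln) = {g_ln:.4f}, G (log10) = {g_log10:.4f}, G (direct) = {g_direct:.4f}")
```

The three values agree up to the rounding of the constant 4.60517.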
