Advanced Quantitative Methods: Hypothesis testing


Transcript of Advanced Quantitative Methods: Hypothesis testing

Page 1: Advanced Quantitative Methods: Hypothesis testing

Probability distributions · Fundamentals · t- and F-tests

Advanced Quantitative Methods:

Hypothesis testing

Johan A. Elkink

University College Dublin

12 February 2013

Johan A. Elkink hypothesis testing

Page 2: Advanced Quantitative Methods: Hypothesis testing


1 Probability distributions

2 Fundamentals

3 t- and F-tests

Page 4: Advanced Quantitative Methods: Hypothesis testing

Family of distributions

Bernoulli trial

“An experiment in which s trials are made of an event, with probability p of success in any given trial.”

(Weisstein, Eric W. “Bernoulli Trial.” http://mathworld.wolfram.com/BernoulliTrial.html)

Page 6: Advanced Quantitative Methods: Hypothesis testing

Binomial distribution

“The (...) probability distribution (...) of obtaining exactly n successes out of N Bernoulli trials.”

P(n|N) = N!/(n!(N − n)!) · p^n (1 − p)^(N−n)

(Weisstein, Eric W. “Binomial Distribution.” http://mathworld.wolfram.com/BinomialDistribution.html)
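Not part of the original slides, but the formula above is easy to check against R's built-in dbinom(); the values of p, N and n below are arbitrary.

```r
# Check the binomial formula P(n|N) = N!/(n!(N-n)!) p^n (1-p)^(N-n)
# against R's built-in dbinom(). The parameter values are arbitrary.
p <- 0.3; N <- 10; n <- 4
manual  <- factorial(N) / (factorial(n) * factorial(N - n)) * p^n * (1 - p)^(N - n)
builtin <- dbinom(n, size = N, prob = p)
all.equal(manual, builtin)  # TRUE
```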

Page 8: Advanced Quantitative Methods: Hypothesis testing

Binomial distribution

P(n|N) = N!/(n!(N − n)!) · p^n (1 − p)^(N−n)

lim(N→∞) P(n|N) = 1/√(2πNpq) · e^(−(n−Np)²/(2Npq))
                = 1/√(2πσ²) · e^(−(n−n̄)²/(2σ²))

i.e. the limiting distribution of the binomial distribution is the normal distribution, with σ² ≡ Npq (where q ≡ 1 − p).

(Weisstein, Eric W. “Binomial Distribution.” http://mathworld.wolfram.com/BinomialDistribution.html)

Page 10: Advanced Quantitative Methods: Hypothesis testing

Normal distribution

Also called Gaussian distribution, but Gauss did not invent it.

(Davidson & MacKinnon 1999: 130-135)

Page 11: Advanced Quantitative Methods: Hypothesis testing

χ²-distribution

The sum of squares of r independent standard normal variables is distributed chi-squared with r degrees of freedom, i.e. if xᵢ ∼ N(0, 1), then:

∑ᵢ xᵢ² ∼ χ²(r)   (summing over i = 1, …, r)

(Weisstein, Eric W. “Chi-Squared Distribution.” http://mathworld.wolfram.com/Chi-SquaredDistribution.html)
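As an aside (not in the slides), this construction is easy to verify by simulation in R: sums of r squared standard normals should have the χ²(r) mean, r, and variance, 2r.

```r
# Simulate 10,000 draws of the sum of r squared standard normals
# and compare with the chi-squared(r) mean (r) and variance (2r).
set.seed(1)
r <- 5
x <- matrix(rnorm(10000 * r), ncol = r)  # 10,000 rows of r N(0,1) draws
s <- rowSums(x^2)                        # each row summed: one chi-squared draw
mean(s)   # close to r = 5
var(s)    # close to 2r = 10
```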

Page 12: Advanced Quantitative Methods: Hypothesis testing

χ²-distribution

[Figure: densities of the χ²(1), χ²(2), χ²(3) and χ²(5) distributions.]

Page 14: Advanced Quantitative Methods: Hypothesis testing

F-distribution

If

x ∼ χ²(m)
y ∼ χ²(n)

with x, y independent, then

(x/m) / (y/n) ∼ F(m, n)

has an F-distribution with m and n degrees of freedom.
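A simulation sketch of this construction (not in the slides), using arbitrary degrees of freedom:

```r
# Simulate the ratio of two scaled, independent chi-squared variables
# and compare its mean with the theoretical F(m, n) mean, n/(n - 2).
set.seed(2)
m <- 4; n <- 8
x <- rchisq(10000, m)
y <- rchisq(10000, n)
f <- (x / m) / (y / n)
mean(f)                  # close to n/(n - 2) = 8/6
mean(rf(10000, m, n))    # draws from R's own F(m, n) for comparison
```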

Page 15: Advanced Quantitative Methods: Hypothesis testing

F-distribution

[Figure: densities of the F(1,1), F(1,10), F(10,1) and F(10,10) distributions.]

Page 17: Advanced Quantitative Methods: Hypothesis testing

t-distribution

If

x ∼ N(0, 1)
y ∼ χ²(r)

with x, y independent, then

x / √(y/r) ∼ t(r)

has a t-distribution with r degrees of freedom.
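The same construction checked by simulation (not in the slides; r = 10 is arbitrary):

```r
# Simulate x / sqrt(y/r) for x ~ N(0,1), y ~ chi-squared(r), independent,
# and compare with the t(r) moments: mean 0, variance r/(r - 2).
set.seed(3)
r <- 10
x <- rnorm(10000)
y <- rchisq(10000, r)
t_draws <- x / sqrt(y / r)
mean(t_draws)   # close to 0
var(t_draws)    # close to r/(r - 2) = 1.25
```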

Page 19: Advanced Quantitative Methods: Hypothesis testing

t-distribution

Imagine we have a sample of size n and we calculate the sample mean of x:

x̄ = (1/n) ∑ᵢ xᵢ

with variance estimator

σ̂² = (1/(n − 1)) ∑ᵢ (xᵢ − x̄)²

then (n − 1)σ̂²/σ² ∼ χ²(n − 1) (because σ̂² is based on a sum of squares, and the use of deviations from the mean removes one degree of freedom).

Page 22: Advanced Quantitative Methods: Hypothesis testing

t-distribution

x̄ is a sum of normally distributed values, so is itself normally distributed; (n − 1)σ̂²/σ² has a χ²-distribution, so

t = (x̄ − µ) / √(σ̂²/n)

has a t-distribution with n − 1 degrees of freedom.

As n increases, the t-distribution approaches the normal distribution. The t-distribution is the replacement for the normal distribution when n is small and σ² is unknown.

(The t(1) distribution is also called the Cauchy distribution.)
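A sketch (not in the slides) of the same statistic computed by hand and via t.test(), on simulated data:

```r
# t statistic for a sample mean, by hand and via t.test() (H0: mu = 0)
set.seed(4)
x <- rnorm(25, mean = 2, sd = 3)
n <- length(x)
t_manual <- (mean(x) - 0) / sqrt(var(x) / n)
t_manual
t.test(x, mu = 0)$statistic   # identical value
```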

Page 23: Advanced Quantitative Methods: Hypothesis testing

t-distribution

[Figure: densities of the N(0,1), t(1), t(2) and t(5) distributions.]

Page 25: Advanced Quantitative Methods: Hypothesis testing

i.i.d.

We make three assumptions about our data to proceed:

The observations are independent

The observations are identically distributed

The population has a finite mean and a finite variance

A variable for which the first two assumptions hold is called i.i.d.

Page 27: Advanced Quantitative Methods: Hypothesis testing

Independent observations

Intuitively: the value for one case does not affect the value for another case on the same variable. More formally: P(x₁ ∩ x₂) = P(x₁)P(x₂).

Examples of dependent observations:

grades of students in different classes;

stock values over time;

economic growth in neighbouring countries.

Page 29: Advanced Quantitative Methods: Hypothesis testing

Identically distributed

All the observations are drawn from the same random variable, with the same probability distribution.

An example where this is not the case would generally be panel data. E.g. larger firms will have larger variations in profits, thus their variance differs, thus these are not observations from the same probability distribution.

Page 30: Advanced Quantitative Methods: Hypothesis testing

Random sample

A proper random sample is i.i.d.

The law of large numbers and the Central Limit Theorem help us to predict the behaviour of our sample data.

Page 31: Advanced Quantitative Methods: Hypothesis testing

Law of large numbers

The law of large numbers (LLN) states that, if these three assumptions are satisfied, the sample mean will approach the population mean with probability one as the sample becomes infinitely large.

Page 35: Advanced Quantitative Methods: Hypothesis testing

Central Limit Theorem

If these three assumptions are satisfied:

The sample mean is normally distributed, regardless of the distribution of the original variable.

The sample mean has the same expected value as the population mean (LLN).

The standard deviation (standard error) of the sample mean is: S.E.(x̄) = σ_x̄ = σₓ/√n.
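A simulation sketch of all three claims at once (not in the slides), using a deliberately skewed population:

```r
# CLT sketch: means of samples from a skewed exponential distribution
# (population mean 1, population sd 1) are approximately normal, centred
# on the population mean, with standard error sigma/sqrt(n).
set.seed(5)
n <- 50
means <- replicate(10000, mean(rexp(n, rate = 1)))
mean(means)     # close to 1
sd(means)       # close to 1/sqrt(50), approx. 0.141
```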

Page 36: Advanced Quantitative Methods: Hypothesis testing

Sample and population size

Note that the standard error depends only on the sample size, not on the population size.

Page 37: Advanced Quantitative Methods: Hypothesis testing

Central Limit Theorem: unknown σ

When the population standard deviation σₓ is unknown, we can use the sample estimate:

σ̂_x̄ = σ̂ₓ/√n

σ̂ₓ = √( ∑ᵢ₌₁ⁿ (xᵢ − x̄)² / (n − 1) )
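Written out by hand (not in the slides), this estimate agrees with R's sd(), which also uses the n − 1 denominator:

```r
# The sample estimate written out by hand agrees with R's sd(),
# which also uses the n - 1 denominator.
set.seed(6)
x <- rnorm(30, mean = 10, sd = 2)
n <- length(x)
s_manual  <- sqrt(sum((x - mean(x))^2) / (n - 1))
se_manual <- s_manual / sqrt(n)       # estimated standard error of the mean
all.equal(s_manual, sd(x))            # TRUE
```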

Page 40: Advanced Quantitative Methods: Hypothesis testing

Type I and II errors

Type I error: rejecting a null hypothesis that is true (e.g. α = .05)

Type II error: not rejecting a null hypothesis that is false

(Davidson & MacKinnon 1999: 126)

Page 42: Advanced Quantitative Methods: Hypothesis testing

Power

Power: the probability of rejecting a null hypothesis that is false:

1 − P(Type II error)

The power of a test increases when:

the true value is further from the null hypothesis value;

the variance is lower;

the sample size is larger.

(Davidson & MacKinnon 1999: 126)
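All three effects can be explored by simulation; a sketch (not in the slides) for a one-sample t-test, with arbitrary effect size and sample size:

```r
# Power of a one-sample t-test by simulation: true mean 0.5, H0: mu = 0,
# n = 30, sd = 1. Compare with the analytical answer from power.t.test().
set.seed(7)
reject <- replicate(2000, t.test(rnorm(30, mean = 0.5), mu = 0)$p.value < 0.05)
mean(reject)   # simulated power
power.t.test(n = 30, delta = 0.5, sd = 1, type = "one.sample")$power
```

Increasing delta or n, or decreasing sd, in the call above raises the power, as the slide states.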

Page 44: Advanced Quantitative Methods: Hypothesis testing

p-value

The p-value is the probability of a Type I error when rejecting thenull hypothesis.

You can say a test is “statistically significant” if p < α, but the p-value contains more information by itself.

(Davidson & MacKinnon 1999: 128)

Page 47: Advanced Quantitative Methods: Hypothesis testing

α = .05

Note that this value is completely arbitrary and merely a habit since the publication of Fisher (1923).

The p-value is, one could argue, just a complicated way of measuring the sample size.

“Another interesting example (...) is the propensity for published studies to contain a disproportionally large number of Type I errors; studies with statistically significant results tend to get published, whereas those with insignificant results do not.” (Kennedy 2008: 61)

Page 48: Advanced Quantitative Methods: Hypothesis testing

α = .05

[Figure not included in transcript.]

(Gerber & Malhotra 2006)

Page 49: Advanced Quantitative Methods: Hypothesis testing


Confidence intervals

Instead of an arbitrary threshold, it is often more illuminating to present confidence intervals or graphical presentations of levels of uncertainty.

Page 53: Advanced Quantitative Methods: Hypothesis testing

t-test

H₀: β = 0
H₁: β ≠ 0

We can calculate the t-value by subtracting the value under the null and dividing by the standard error:

t = β̂ / σ̂_β̂

Because β̂ has a normal distribution and σ̂²_β̂ (after scaling) a χ²-distribution, t has the t-distribution with n − k degrees of freedom.

Page 54: Advanced Quantitative Methods: Hypothesis testing


t-test

bhat <- solve(t(x) %*% x) %*% t(x) %*% y    # OLS estimates
e <- y - x %*% bhat                         # residuals
vhat <- as.numeric(t(e) %*% e / (n - k)) * solve(t(x) %*% x)  # estimated variance-covariance matrix
se <- sqrt(diag(vhat))                      # standard errors
p <- 2 * (1 - pt(abs(bhat / se), n - k))    # two-tailed p-values
cbind(bhat, se, p)

abs() takes the absolute value; the 2 * is there because it is a two-tailed test.

Page 55: Advanced Quantitative Methods: Hypothesis testing

t-test

H₀: β = 0      H₁: β ≠ 0      t = β̂ / σ̂_β̂

H₀: β = a      H₁: β ≠ a      t = (β̂ − a) / σ̂_β̂

H₀: β₃ = β₂    H₁: β₃ ≠ β₂    t = (β̂₃ − β̂₂) / σ̂_(β̂₃ − β̂₂)
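A sketch (not in the slides) of a test against a null other than zero, using a fitted lm on simulated data:

```r
# t tests against nulls other than zero, from a fitted lm (simulated data)
set.seed(11)
x <- rnorm(100)
y <- 2 + 1.5 * x + rnorm(100)
fit <- lm(y ~ x)
b  <- coef(fit)["x"]
se <- sqrt(vcov(fit)["x", "x"])
a  <- 1                                              # H0: beta = a = 1
t_a <- (b - a) / se
p_a <- 2 * (1 - pt(abs(t_a), df = fit$df.residual))  # two-tailed p-value
c(t_a, p_a)
```

With a = 0 this reproduces the t value in summary(fit) exactly.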

Page 56: Advanced Quantitative Methods: Hypothesis testing

Sums of squares

SST Total sum of squares: ∑(yᵢ − ȳ)²
SSE Explained sum of squares: ∑(ŷᵢ − ȳ)²
SSR Residual sum of squares: ∑eᵢ² = ∑(yᵢ − ŷᵢ)² = e′e

The key to remember is that SST = SSE + SSR.

Sometimes instead of “explained” and “residual”, “regression” and “error” are used, respectively, so that the abbreviations are swapped (!).

Page 58: Advanced Quantitative Methods: Hypothesis testing

Anova

Variation   SS             df      MS
Explained   ∑(ŷᵢ − ȳ)²     k − 1   SSE/(k − 1)
Residual    ∑(yᵢ − ŷᵢ)²    n − k   SSR/(n − k)
Total       ∑(yᵢ − ȳ)²     n − 1

[SSE/(k − 1)] / [SSR/(n − k)] ∼ F(k − 1, n − k)

Page 59: Advanced Quantitative Methods: Hypothesis testing

F-test

ssr <- t(e) %*% e                             # residual sum of squares
yhat <- x %*% bhat                            # fitted values
sse <- t(yhat - mean(y)) %*% (yhat - mean(y)) # explained sum of squares
Fstat <- (sse / (k - 1)) / (ssr / (n - k))
1 - pf(Fstat, k - 1, n - k)

(The statistic is named Fstat rather than F, since F is R's shorthand for FALSE.)

Page 61: Advanced Quantitative Methods: Hypothesis testing

Restrictions

H₀: y = X⁽¹⁾β₁ + ε
H₁: y = X⁽¹⁾β₁ + X⁽²⁾β₂ + ε

where X⁽¹⁾ is n × k and X⁽²⁾ is n × r.

USSR Unrestricted sum of squared residuals
RSSR Restricted sum of squared residuals

F_β₂ = [(RSSR − USSR)/r] / [USSR/(n − k − r)] ∼ F(r, n − k − r)
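A sketch (not in the slides) on simulated data: anova() on two nested lm fits reports this F statistic, and the same value follows from the formula above. The variables x1, x2, x3 are hypothetical.

```r
# Restricted vs unrestricted regression on simulated data: anova() reports
# the F statistic, and the same value follows from the RSSR/USSR formula.
set.seed(8)
n <- 100
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y <- 1 + 2 * x1 + 0.5 * x2 + rnorm(n)
restricted   <- lm(y ~ x1)             # H0 model
unrestricted <- lm(y ~ x1 + x2 + x3)   # adds r = 2 regressors
anova(restricted, unrestricted)        # F test of the restriction

rssr <- sum(resid(restricted)^2)
ussr <- sum(resid(unrestricted)^2)
r <- 2
Fstat <- ((rssr - ussr) / r) / (ussr / unrestricted$df.residual)
Fstat                                  # identical to the anova() F
```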

Page 63: Advanced Quantitative Methods: Hypothesis testing

Chow test

You might have a situation where you want to know whether the coefficients differ for two groups in the sample.

You could test this with the following regression:

[y₁]   [X₁]       [0 ]
[y₂] = [X₂] β  +  [X₂] γ + ε,

whereby the difference between the coefficients for the two groups is captured by γ.

Page 64: Advanced Quantitative Methods: Hypothesis testing

Chow test

It turns out you can just run two separate (“unrestricted”) regressions for the two groups, as well as a (“restricted”) regression of y on X, so that we get:

F_γ = [(RSSR − SSR₁ − SSR₂)/k] / [(SSR₁ + SSR₂)/(n − 2k)] ∼ F(k, n − 2k)

(Davidson & MacKinnon 1999: 146-147)
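A sketch (not in the slides) of this recipe on simulated data; the grouping variable g and the effect sizes are hypothetical.

```r
# Chow test on simulated data for two groups whose slope differs.
set.seed(9)
n1 <- 60; n2 <- 60; n <- n1 + n2
x <- rnorm(n)
g <- rep(0:1, c(n1, n2))                        # group indicator
y <- 1 + 2 * x + 0.8 * g * x + rnorm(n)         # slope differs across groups
ssr1 <- sum(resid(lm(y[g == 0] ~ x[g == 0]))^2) # group 1, unrestricted
ssr2 <- sum(resid(lm(y[g == 1] ~ x[g == 1]))^2) # group 2, unrestricted
rssr <- sum(resid(lm(y ~ x))^2)                 # pooled, restricted
k <- 2                                          # intercept + slope
Fchow <- ((rssr - ssr1 - ssr2) / k) / ((ssr1 + ssr2) / (n - 2 * k))
pval  <- 1 - pf(Fchow, k, n - 2 * k)
c(Fchow, pval)
```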

Page 65: Advanced Quantitative Methods: Hypothesis testing

Exercise: US wages

Open the uswages.dta data set and regress log(wage) on educ, exper and race.

1 Interpret all test results in the standard output.

2 Perform a test evaluating whether education and experience jointly contribute.

3 Perform a Chow test to see if the regression is different for urbanised versus rural respondents (smsa).

4 (bonus) Repeat the t- and F-tests using matrix algebra.

Page 66: Advanced Quantitative Methods: Hypothesis testing

Appendix

Page 67: Advanced Quantitative Methods: Hypothesis testing

Outline

4 Probability distributions in R

5 Likelihood tests

Page 72: Advanced Quantitative Methods: Hypothesis testing

Probability distributions in R

You know x and want to know the density at that point: d

You know x and want to know the area up to that point: p

You know x and want to know the area beyond that point: 1 - p

You know the area and want to know the x value: q

You want random numbers drawn from that distribution: r

Page 73: Advanced Quantitative Methods: Hypothesis testing

Probability distributions in R: dnorm

[Figure: the standard normal density, plotted from:]

x <- seq(-3, 3, .01)
y <- dnorm(x)

Page 74: Advanced Quantitative Methods: Hypothesis testing

Probability distributions in R: dnorm

[Figure: the standard normal density, with the point dnorm(−1) = .24 marked.]

> dnorm(-1)
[1] 0.2419707

Page 75: Advanced Quantitative Methods: Hypothesis testing

Probability distributions in R: pnorm

[Figure: the standard normal density, with the area pnorm(−1) = .16 shaded.]

> pnorm(-1)
[1] 0.1586553

Page 76: Advanced Quantitative Methods: Hypothesis testing

Probability distributions in R: pnorm

[Figure: the standard normal density; the area beyond −1 is 1 − pnorm(−1).]

> 1 - pnorm(-1)
[1] 0.8413447

Page 77: Advanced Quantitative Methods: Hypothesis testing

Probability distributions in R: pnorm

[Figure: the standard normal density; qnorm() recovers the x value for a given area to the left.]

> qnorm(.1586553)
[1] -0.9999998

Page 78: Advanced Quantitative Methods: Hypothesis testing

Probability distributions in R: rnorm

[Figure: histogram of x.random, 100 draws from the standard normal.]

x.random <- rnorm(100)

Page 79: Advanced Quantitative Methods: Hypothesis testing

Probability distributions in R: rnorm

[Figure: the same histogram, drawn on the density scale.]

hist(x.random, freq=FALSE)

Page 80: Advanced Quantitative Methods: Hypothesis testing

Probability distributions in R

These functions work for many distributions:

rnorm() draw randomly from the normal distribution
pchisq() get the area under the χ²-distribution
1 - pf() get the area beyond a point under the F-distribution
rbinom() draw randomly from the binomial distribution
1 - pt() get the p-value from the t-distribution (one-tailed)

Page 81: Advanced Quantitative Methods: Hypothesis testing

Probability distributions in R

Twenty throws (“Bernoulli trials”) with a coin:

> x <- rbinom(20,1,.5)

> factor(x, labels=c("head","tails"))

[1] head head head tails head tails tails head

[9] tails head head head tails tails tails head

[17] tails tails tails head

Page 83: Advanced Quantitative Methods: Hypothesis testing

Likelihood

“The likelihood that any parameter (or set of parameters) should have any assigned value (or set of values) is proportional to the probability that if this were so, the totality of observations should be that observed.”

(Fisher 1922, 310)

Page 86: Advanced Quantitative Methods: Hypothesis testing

Maximum Likelihood

The likelihood is proportional to the probability of observing the data you have (give or take some arbitrarily small deviation), given some parameter estimate:

L(β|y, X) = αP(y, X|β)

The likelihood function is thus proportional to the probability density function of the given sample, as a function of the parameter values.

The estimator that maximizes this function also maximizes this probability density function and is the Maximum Likelihood Estimator (MLE or ML).

(Fisher 1922)
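A small numerical sketch (not in the slides) of maximizing a likelihood, for a normal mean with the standard deviation assumed known:

```r
# ML estimate of a normal mean (sd known and fixed at 1), found numerically
# with optimize(); it coincides with the analytical MLE, the sample mean.
set.seed(12)
x <- rnorm(50, mean = 3, sd = 1)
loglik <- function(mu) sum(dnorm(x, mean = mu, sd = 1, log = TRUE))
mle <- optimize(loglik, interval = c(-10, 10), maximum = TRUE)$maximum
c(mle, mean(x))   # nearly identical
```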

Page 89: Advanced Quantitative Methods: Hypothesis testing

Three tests

Three tests are based on this likelihood:

Likelihood Ratio (LR) test:

H₀: 2 log(L_U/L_R) = 2(log L_U − log L_R) = 0

Wald (W) test:

H₀: (β̂_MLE − β*)² / var(β̂_MLE) = 0

Lagrange Multiplier (LM) test:

H₀: ∂ ln L/∂β = 0
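A sketch (not in the slides) of the LR version on simulated data, using logLik() on nested lm fits:

```r
# LR test of one restriction, using logLik() on nested lm fits
# (simulated data; x2 is irrelevant in truth, so H0 holds).
set.seed(10)
n <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 0.7 * x1 + rnorm(n)
lU <- as.numeric(logLik(lm(y ~ x1 + x2)))   # unrestricted log likelihood
lR <- as.numeric(logLik(lm(y ~ x1)))        # restricted (H0: beta2 = 0)
lr <- 2 * (lU - lR)                         # asymptotically chi-squared(1)
1 - pchisq(lr, df = 1)                      # p-value
```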

Page 90: Advanced Quantitative Methods: Hypothesis testing

Three tests

[Figure not included in transcript.]

(Kennedy 2008: 57)

Page 93: Advanced Quantitative Methods: Hypothesis testing

Three tests

Asymptotically, LR, W, and LM are all χ²-distributed.

In small samples, W ≥ LR ≥ LM.

We return to these tests, and how to do them in R, later in the course.

(Kennedy 2008: 64)
