Assignment #3
Chapter 5: 28, 36, 37
Chapter 6: 16, 18, 19
Due tomorrow, Oct. 9th, by 2pm in your TA’s homework box
Assignment #4
Chapter 7: 21, 22, 28
Due next Friday, Oct. 16th, by 2pm in your TA’s homework box
Reading
For Today: Chapter 8
For Tuesday: Chapter 9
Labs
Lab 5 is no longer required
Tuesday, Thursday and Friday labs: No labs the week of Oct. 19th (Midterm week)
Monday labs: No lab on Oct. 12th (Thanksgiving)
Wednesday labs: No lab on Nov. 11th (Remembrance Day)
Second part of Chapter 7 Review
Binomial distribution
Probability of obtaining X left-handed flowers in a random sample of n = 27 flowers, if the proportion of left-handed flowers in the population is 0.25.
Binomial test
The binomial test uses data to test whether a population proportion p matches a null expectation for the proportion.
H0: The relative frequency of successes in the population is p0 .
HA: The relative frequency of successes in the population is not p0 .
Binomial test
The null distribution can be calculated using the binomial formula for each possible value of X, where n = your sample size, p = p0 and X = number of successes.
$$ \Pr[X] = \binom{n}{X} p^{X} (1 - p)^{n - X} $$
P = the probability of the observed number of successes in your sample (X) plus the probabilities of all equally or more extreme values of X. If P < 0.05, we reject H0.
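The P-value definition above can be sketched in a few lines of Python. This is only an illustration, not the course's prescribed method: the function names are hypothetical, the two-sided P-value is taken as twice the tail that is at least as extreme as the observed X (capped at 1), and the values 16 successes in 20 trials with p0 = 0.5 are borrowed from the red/blue example used later in these notes.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Pr[X] = C(n, X) * p^X * (1 - p)^(n - X)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_test_two_sided(x, n, p0):
    """Two-sided P-value: double the tail at least as extreme as x.

    Assumption: 'equally or more extreme' is handled by doubling one tail,
    and the result is capped at 1.
    """
    if x >= n * p0:
        tail = sum(binomial_pmf(k, n, p0) for k in range(x, n + 1))
    else:
        tail = sum(binomial_pmf(k, n, p0) for k in range(0, x + 1))
    return min(1.0, 2 * tail)


# Illustrative values: 16 successes out of 20 trials, null proportion 0.5
p_value = binomial_test_two_sided(16, 20, 0.5)
print(f"P = {p_value:.4f}")
```

With these inputs P falls below 0.05, so H0 would be rejected at the 5% level.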
Estimating Proportions: Proportion of successes in a sample
$$ \hat{p} = \frac{X}{n} $$
The hat (^) shows that this is an estimate of p.
p is the true population proportion.
The standard error of the estimate of a proportion is the standard deviation of the sampling distribution of p̂:
$$ \sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} $$
We usually don’t know p, so we estimate the standard error with
$$ SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} $$
95% confidence interval for a proportion
$$ p' = \frac{X + 2}{n + 4} $$
$$ p' - 1.96\sqrt{\frac{p'(1 - p')}{n + 4}} \;\le\; p \;\le\; p' + 1.96\sqrt{\frac{p'(1 - p')}{n + 4}} $$
This is the Agresti-Coull confidence interval.
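As a rough sketch, the sample proportion, its estimated standard error and the Agresti-Coull interval can be computed as follows. The function names are hypothetical, and the inputs (X = 16, n = 20) are illustrative values borrowed from the red/blue example later in these notes.

```python
from math import sqrt


def proportion_and_se(x, n):
    """Return the sample proportion X/n and its estimated standard error."""
    p_hat = x / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se


def agresti_coull_95(x, n):
    """95% Agresti-Coull interval: p' = (X + 2)/(n + 4), then p' +/- 1.96 * SE."""
    p_prime = (x + 2) / (n + 4)
    half_width = 1.96 * sqrt(p_prime * (1 - p_prime) / (n + 4))
    return p_prime - half_width, p_prime + half_width


# Illustrative values: 16 successes out of 20 trials
p_hat, se = proportion_and_se(16, 20)
lower, upper = agresti_coull_95(16, 20)
print(f"p-hat = {p_hat:.3f}, SE = {se:.3f}")
print(f"95% CI: {lower:.3f} <= p <= {upper:.3f}")
```

Note that the interval is centred on p' rather than on the raw proportion X/n.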
Fitting probability models to frequency data
Probability Model
A probability distribution that represents how we think a natural process works
Example of a probability model: The proportional model
A simple probability model in which the frequency of occurrence of events is proportional to the number of opportunities.
Example of expectations from the proportional model
Day of the week of 350 births Observed Frequencies
Day of the week of 350 births: Expected Frequencies

Day     Number of days in 1999   Proportion of days in 1999   Expected frequency of births
Sun     52                       52/365                       49.863
Mon     52                       52/365                       49.863
Tues    52                       52/365                       49.863
Wed     52                       52/365                       49.863
Thurs   52                       52/365                       49.863
Fri     53                       53/365                       50.822
Sat     52                       52/365                       49.863
Sum     365                      1                            350
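The expected frequencies in this table follow directly from the proportional model: each day's share of the 350 births equals its share of the 365 days of 1999. A minimal sketch (variable names are illustrative):

```python
# Number of times each weekday occurred in 1999 (from the table above)
days_in_1999 = {
    "Sun": 52, "Mon": 52, "Tues": 52, "Wed": 52,
    "Thurs": 52, "Fri": 53, "Sat": 52,
}
total_births = 350
total_days = sum(days_in_1999.values())

# Proportional model: expected births = total births * proportion of days
expected = {
    day: total_births * count / total_days
    for day, count in days_in_1999.items()
}

for day, value in expected.items():
    print(f"{day:<6}{value:8.3f}")
```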
Goodness-of-fit tests
Compare an observed frequency distribution with the frequency distribution expected under a simple probability model.
Binomial test: Limited to categorical variables with only two possible outcomes
χ2 test: Can handle categorical and discrete numerical variables having more than two outcomes
χ2 Goodness-of-fit test
Uses a test statistic called χ2 to measure the discrepancy between an observed discrete frequency distribution and the frequencies expected under a simple probability model serving as the null
hypothesis.
Hypotheses for χ2 test
H0: The data come from a particular discrete probability distribution.
HA: The data do not come from that distribution.
Test statistic for χ2 test
$$ \chi^2 = \sum_{\text{all classes}} \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i} $$
The month of birth for 1245 NHL players
Month       Number of players
January     133
February    125
March       114
April       119
May         119
June        123
July        96
August      91
September   83
October     84
November    73
December    85
Data from http://www.nhl.com/players/search/all.html in 2006
Hypotheses for birth month example
H0: The probability of an NHL player's birth occurring in any given month is equal to the national proportion for that month.
HA: The probability of an NHL player's birth occurring in any given month is not equal to the national proportion for that month.
Month       Number of players   Expected (%)
January     133                 7.94
February    125                 7.63
March       114                 8.72
April       119                 8.63
May         119                 8.95
June        123                 8.57
July        96                  8.76
August      91                  8.5
September   83                  8.54
October     84                  8.19
November    73                  7.70
December    85                  7.86
Total       1245                100
NHL compared to all Canadians
Computing Expected values

Month       Number of players   Expected (%)   Expected (of 1245)
January     133                 7.94           99
February    125                 7.63           95
March       114                 8.72           109
April       119                 8.63           107
May         119                 8.95           111
June        123                 8.57           107
July        96                  8.76           109
August      91                  8.5            106
September   83                  8.54           106
October     84                  8.19           102
November    73                  7.70           96
December    85                  7.86           98
Total       1245                100%           1245
Note: For simplicity, we have rounded the expected column to integers. In any real calculation, we would keep a couple of decimal places.
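The expected counts are simply the national percentages applied to the 1245 players. A short sketch (variable names are illustrative) that reproduces the rounded column:

```python
# National birth percentages by month, January to December (from the table)
expected_percent = [7.94, 7.63, 8.72, 8.63, 8.95, 8.57,
                    8.76, 8.5, 8.54, 8.19, 7.70, 7.86]
n_players = 1245

# Expected count = percentage of births in that month * number of players
expected_counts = [pct / 100 * n_players for pct in expected_percent]

# Rounded to integers only to match the simplified table above
rounded = [round(e) for e in expected_counts]
print(rounded)
```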
The calculation for January
$$ \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = \frac{(133 - 99)^2}{99} = \frac{1156}{99} $$
Calculating χ2
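$$ \chi^2 = \sum_{\text{all classes}} \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i} $$
$$ = \frac{1156}{99} + \frac{900}{95} + \frac{25}{109} + \frac{144}{107} + \frac{64}{111} + \frac{256}{107} + \frac{169}{109} + \frac{225}{106} + \frac{529}{106} + \frac{324}{102} + \frac{529}{96} + \frac{169}{98} $$
$$ = 44.77 $$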
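The same sum can be checked with a short script. The helper name `chi_square_statistic` is illustrative; the expected counts are the rounded values used above.

```python
def chi_square_statistic(observed, expected):
    """Sum of (Observed - Expected)^2 / Expected over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# NHL players born in each month, January to December
observed = [133, 125, 114, 119, 119, 123, 96, 91, 83, 84, 73, 85]
# Expected counts, rounded to integers as in the table above
expected = [99, 95, 109, 107, 111, 107, 109, 106, 106, 102, 96, 98]

chi2 = chi_square_statistic(observed, expected)
print(f"chi-square = {chi2:.2f}")
```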
The sampling distribution of χ2 (null distribution) by simulation
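One way to build such a null distribution by simulation is sketched below. It is an assumption-laden illustration, not the procedure used to make the slide: it draws 1245 birth months at random using the national percentages as weights, recomputes χ2 for each simulated sample, and reports the fraction of simulated values at least as large as the observed one. The number of simulations and the random seed are arbitrary choices, and unrounded expected counts are used, so the observed χ2 differs slightly from 44.77.

```python
import random


def chi_square_statistic(observed, expected):
    """Sum of (Observed - Expected)^2 / Expected over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [133, 125, 114, 119, 119, 123, 96, 91, 83, 84, 73, 85]
expected_percent = [7.94, 7.63, 8.72, 8.63, 8.95, 8.57,
                    8.76, 8.5, 8.54, 8.19, 7.70, 7.86]
n_players = sum(observed)
total_percent = sum(expected_percent)
expected = [pct / total_percent * n_players for pct in expected_percent]

observed_chi2 = chi_square_statistic(observed, expected)

rng = random.Random(1)   # fixed seed so the run is repeatable (arbitrary)
n_sims = 1000            # arbitrary; more simulations give a smoother curve
months = list(range(12))

null_chi2 = []
for _ in range(n_sims):
    # Sample birth months under H0: national proportions apply to players
    draws = rng.choices(months, weights=expected_percent, k=n_players)
    counts = [0] * 12
    for month in draws:
        counts[month] += 1
    null_chi2.append(chi_square_statistic(counts, expected))

# Approximate P-value: share of simulated values >= the observed value
p_sim = sum(value >= observed_chi2 for value in null_chi2) / n_sims
print(f"observed chi-square = {observed_chi2:.2f}, simulated P = {p_sim}")
```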
Sampling distribution of χ2 (null distribution) by the χ2 distribution
Degrees of freedom
The number of degrees of freedom of a test specifies which of a family of distributions to use.
Degrees of freedom for χ2 test
df = (Number of categories)
– (Number of parameters estimated from the data)
– 1
Degrees of freedom for NHL month of birth
df = 12 - 0 - 1 = 11
Finding the P-value
Critical value
The value of the test statistic where P = α.
Table A - χ2 distribution
The 5% critical value
P < 0.05 (χ2 = 44.77 exceeds the 5% critical value of 19.68 for df = 11), so we can reject the null hypothesis. NHL players are not born in the same proportions per month as the population at large.
χ2 test as approximation of binomial test
• χ2 goodness-of-fit test works even when there are only two categories, so it can be used as a substitute for the binomial test.
• Very useful if the number of data points is large.
  – Imagine if, in our red/blue wrestler example, rather than 16/20 wins by red, we had 1600/2000 wins by red. Imagine calculating:
    $$ \Pr[1600] = \frac{2000!}{1600!\,400!}\; 0.5^{1600}\; 0.5^{400} $$
  – And then imagine calculating:
    $$ P = 2\,(\Pr[1600] + \Pr[1601] + \cdots + \Pr[2000]) $$
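By hand this sum is impractical, which is the point of the slide. With arbitrary-precision integers it can still be evaluated, and it is instructive to set it beside the one-line χ2 calculation for the same hypothetical 1600/2000 result. The sketch below assumes the doubled-tail definition of P given above; variable names are illustrative.

```python
from math import comb

n, x = 2000, 1600

# Exact binomial tail with p = 0.5: every outcome has probability 0.5^n,
# so the tail is (number of outcomes with X >= 1600) / 2^n.
tail_count = sum(comb(n, k) for k in range(x, n + 1))
p_exact = 2 * tail_count / 2**n   # Python divides the big integers exactly

# Chi-square shortcut: expected counts are 1000 red wins and 1000 blue wins
observed = [1600, 400]
expected = [1000, 1000]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(f"exact two-sided P = {p_exact:.3e}")
print(f"chi-square = {chi2}")
```

Both routes lead to the same decision; the χ2 version just needs far less arithmetic.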
The experiment and the results
• Animals use red as a sign of aggression
• Does red influence the outcome of wrestling, taekwondo, and boxing?
– 16 of 20 rounds had more red-shirted than blue-shirted winners in these sports in the 2004 Olympics
– Shirt color was randomly assigned
Hill, RA, and RA Barton. 2005. Red enhances human performance in contests. Nature 435:293.
Stating the hypotheses
H0: Red- and blue-shirted athletes are equally likely to win (proportion = 0.5).
HA: Red- and blue-shirted athletes are not equally likely to win (proportion ≠ 0.5).
χ2 test as approximation of binomial test
Shirt color of winners   Observed   Expected
Red (success)            16         10
Blue (failure)           4          10
Sum                      20         20
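$$ \chi^2 = \sum_{\text{all classes}} \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i} = \frac{36}{10} + \frac{36}{10} = 7.2 $$
$$ \chi^2_{0.05,1} = 3.84, \qquad \chi^2_{0.005,1} = 7.88 $$
3.84 < 7.2 < 7.88, so 0.05 > P > 0.005. Because 7.2 also exceeds the 1% critical value (6.63 for df = 1), this narrows to 0.01 > P > 0.005.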
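For one degree of freedom the P-value can be computed directly rather than bracketed from a table, because a χ2 variable with df = 1 is the square of a standard normal variable. The sketch below relies on that identity, P = erfc(sqrt(χ2 / 2)), which holds only for df = 1; it is offered as a check, not as the method used in the course.

```python
from math import erfc, sqrt

# Shirt color of winners: red, blue
observed = [16, 4]
expected = [10, 10]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Valid for df = 1 only: Pr[chi-square > x] = Pr[|Z| > sqrt(x)]
p_value = erfc(sqrt(chi2 / 2))

print(f"chi-square = {chi2:.1f}, P = {p_value:.4f}")
```

The result lies between 0.005 and 0.01, in agreement with the table lookup.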
Fitting the binomial distribution is different than the binomial test
Binomial test: uses data to test whether the proportion of successes in one set of trials matches a null expectation of the proportion of successes.
Fitting the binomial distribution: uses data to test whether the observed distribution of the proportion of successes in multiple sets of trials matches the null expectation of a binomial distribution.
Assumptions of χ2 test
• No more than 20% of categories have Expected<5
• No category with Expected ≤ 1
Fitting other distributions: the Poisson distribution
The Poisson distribution describes the probability that a certain number of events occur in a block of time or space, when those events happen independently of each other and occur with equal probability at every point in time or space.
Poisson distribution
$$ \Pr[X] = \frac{e^{-\mu}\,\mu^{X}}{X!} $$
Example: Number of goals per side in World Cup Soccer
Q: Is the outcome of a soccer game (at this level) random?
In other words, is the number of goals per team distributed as expected by pure chance?
Hypotheses
• H0: Number of goals per side follows a Poisson distribution.
• HA: Number of goals per side does not follow a Poisson distribution.
World Cup 2002 scores
Number of goals for a team (World Cup 2002)
What’s the mean, µ?
$$ \bar{x} = \frac{37(0) + 47(1) + 27(2) + 13(3) + 2(4) + 1(5) + 1(8)}{128} = \frac{161}{128} = 1.26 $$
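The mean is a frequency-weighted average of the number of goals. A minimal sketch using the counts from the calculation above (variable names are illustrative):

```python
# goals scored by a side -> number of sides with that score (World Cup 2002)
goal_counts = {0: 37, 1: 47, 2: 27, 3: 13, 4: 2, 5: 1, 8: 1}

n_sides = sum(goal_counts.values())
total_goals = sum(goals * freq for goals, freq in goal_counts.items())
mean_goals = total_goals / n_sides

print(f"{total_goals} goals / {n_sides} sides = {mean_goals:.2f}")
```

Because this mean is estimated from the data, it counts as one estimated parameter when the degrees of freedom are worked out.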
Poisson with µ = 1.26
Example:
$$ \Pr[2] = \frac{e^{-\mu}\,\mu^{X}}{X!} = \frac{e^{-1.26}\,(1.26)^{2}}{2!} = \frac{(0.284)(1.59)}{2} = 0.225 $$
Poisson with µ = 1.26

X     Pr[X]
0     0.284
1     0.357
2     0.225
3     0.095
4     0.030
5     0.008
6     0.002
7     0
≥8    0
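The probabilities in this table can be regenerated from the Poisson formula. A minimal sketch, with the hypothetical helper name `poisson_pmf` and µ fixed at the rounded value 1.26:

```python
from math import exp, factorial


def poisson_pmf(x, mu):
    """Pr[X] = e^(-mu) * mu^X / X!"""
    return exp(-mu) * mu**x / factorial(x)


mu = 1.26
probs = {x: poisson_pmf(x, mu) for x in range(8)}
# Everything not covered by 0..7 belongs to the ">= 8" class
prob_8_plus = 1 - sum(probs.values())

for x, pr in probs.items():
    print(f"{x:>3}  {pr:.3f}")
print(f" >=8  {prob_8_plus:.3f}")
```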
Finding the Expected

X     Pr[X]    Expected
0     0.284    36.3
1     0.357    45.7
2     0.225    28.8
3     0.095    12.1
4     0.030    3.8
5     0.008    1.0
6     0.002    0.2
7     0        0.04
≥8    0        0.007

Too small! (The expected frequencies for X = 4 and above.)
Calculating χ2

X     Expected   Observed   (Observed_i − Expected_i)^2 / Expected_i
0     36.3       37         0.013
1     45.7       47         0.037
2     28.8       27         0.113
3     12.1       13         0.067
≥4    5.0        4          0.200

$$ \chi^2 = \sum_{\text{all classes}} \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i} = 0.429 $$
Degrees of freedom
df = (Number of categories)
– (Number of parameters estimated from the data)
– 1
= 5 – 1 – 1 = 3
Critical value
Comparing χ2 to the critical value
$$ \chi^2 = 0.429, \qquad \chi^2_{0.05,3} = 7.81 $$
0.429 < 7.81
So we cannot reject the null hypothesis. There is no evidence that the score of a World Cup Soccer game is not Poisson distributed.
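The whole Poisson goodness-of-fit calculation can be put together in one sketch. It assumes µ = 1.26 and pools all scores of 4 or more into a single class so that no expected frequency is too small. Because it keeps unrounded expected values, its χ2 comes out slightly different from the 0.429 obtained with the rounded table. The critical value 7.81 is taken from the notes above rather than computed.

```python
from math import exp, factorial


def poisson_pmf(x, mu):
    """Pr[X] = e^(-mu) * mu^X / X!"""
    return exp(-mu) * mu**x / factorial(x)


mu = 1.26                       # mean estimated from the data
n_sides = 128
observed = [37, 47, 27, 13, 4]  # classes 0, 1, 2, 3 and >=4 (2 + 1 + 1 pooled)

# Expected counts for 0..3 goals; the >=4 class takes whatever remains
expected = [n_sides * poisson_pmf(x, mu) for x in range(4)]
expected.append(n_sides - sum(expected))

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# df = categories - parameters estimated from the data (mu) - 1
df = len(observed) - 1 - 1

critical_value = 7.81           # 5% critical value for df = 3, from the notes
reject_h0 = chi2 > critical_value

print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {reject_h0}")
```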
World Cup 2002 scores
Poisson distribution