Introduction to Statistics - Petra Christian...

Introduction to Statistics

Learning about the World

Siana Halim

TOPICS

Inferences about Means

Comparing Means

Paired Samples and Blocks

Comparing Counts

References:

• De Veaux, Velleman, Bock, Stats: Data and Models, Pearson Addison Wesley, International Edition, 2005

• John A. Rice, Mathematical Statistics and Data Analysis, Duxbury Press, 1995


Inferences about Means

Motor vehicle crashes are the leading cause of death for people of every age between 4 and 33 years old.

Speeding is a contributing factor in 29% of all fatal accidents.

Siwalankerto is a busy street that passes through a residential neighborhood. Residents there are concerned that vehicles traveling on Siwalankerto often exceed the posted speed limit of 30 km/hour.


Speed (km/hour)

29 29 24

34 34 34

34 32 36

28 31 31

30 27 34

29 37 36

38 29 21

31 26

[Histogram of the speeds: x-axis Speed, y-axis # of Cars]

We are interested both in estimating the true mean speed and in testing whether it exceeds the posted speed limit.


A Sampling Distribution for Means

When the conditions are met, the standardized sample mean,

$$t = \frac{\bar{y} - \mu}{SE(\bar{y})}, \qquad SE(\bar{y}) = \frac{s}{\sqrt{n}}$$

follows a Student’s t-model with n-1 degrees of freedom. We also use this model to obtain a P-value for testing the hypothesis

H0: μ = μ0

One-sample t-interval

When the conditions are met, the confidence interval for the population mean, μ, is

$$\bar{y} \pm t^*_{n-1} \times SE(\bar{y})$$
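As a rough illustration, the one-sample t statistic and 95% interval can be computed for the 23 speeds listed earlier. This is only a sketch: the critical value 2.074 is an assumed table lookup for 22 degrees of freedom, and the variable names are made up for the example.

```python
import math
import statistics

# Speeds of the 23 cars from the Speed table (km/hour).
speeds = [29, 34, 34, 28, 30, 29, 38, 31,
          29, 34, 32, 31, 27, 37, 29, 26,
          24, 34, 36, 31, 34, 36, 21]

MU_0 = 30.0     # hypothesized mean: the posted speed limit
T_STAR = 2.074  # assumption: 95% two-sided critical value for 22 df, read from a t-table

n = len(speeds)
y_bar = statistics.mean(speeds)
s = statistics.stdev(speeds)      # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)             # SE(y-bar) = s / sqrt(n)
t_stat = (y_bar - MU_0) / se      # standardized sample mean under H0
ci = (y_bar - T_STAR * se, y_bar + T_STAR * se)

print(f"n = {n}, mean = {y_bar:.2f}, s = {s:.2f}, SE = {se:.3f}")
print(f"t = {t_stat:.2f} with {n - 1} df")
print(f"95% CI for the mean speed: ({ci[0]:.1f}, {ci[1]:.1f})")
```

The P-value itself still has to come from the t-model with 22 degrees of freedom (a table or statistical software); the sketch stops at the statistic and the interval.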


Sample Size

How large a sample do we need? The simple answer is “more”. But more data cost money, effort, and time, so how much is enough?

Suppose your computer takes 2 hours on average to download a movie. You hear about a program that claims to download movies in less than an hour. You’re interested enough to spend $29.95 for it, but only if it really delivers.

So you get the free evaluation copy and test it by downloading 10 different movies. Of course, the mean downloading time is not exactly 1 hour as claimed. Observations vary. If the margin of error was 8 minutes, though, you’d probably be able to decide whether the software is worth the money. Doubling the sample size would require another 10 hours of testing and reduce your margin of error to a bit under 6 minutes. You’d need to decide whether that was worth the effort.
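The "bit under 6 minutes" figure follows from the margin of error shrinking in proportion to 1/sqrt(n). The sketch below assumes the sample standard deviation stays the same and ignores the small change in the critical value t* as n grows; the function names are hypothetical.

```python
import math

def rescaled_margin(me_old, n_old, n_new):
    """Margin of error after changing the sample size, assuming s and t* are unchanged."""
    return me_old * math.sqrt(n_old / n_new)

def sample_size_for(me_old, n_old, me_target):
    """Smallest n that reaches the target margin under the same assumptions."""
    return math.ceil(n_old * (me_old / me_target) ** 2)

me_20 = rescaled_margin(8.0, 10, 20)        # doubling the sample from 10 to 20 movies
n_for_4 = sample_size_for(8.0, 10, 4.0)     # halving the margin needs four times the data

print(f"Margin of error with 20 movies: {me_20:.2f} minutes")   # 5.66
print(f"Movies needed for a 4-minute margin: {n_for_4}")        # 40
```

Halving the margin of error therefore costs four times the sample, which is why "more" gets expensive quickly.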


Comparing Two Means

Should we buy generic rather than brand-name batteries?

Brand Name Generic

194.0 190.7

205.5 203.5

199.2 203.5

172.4 206.5

184.0 222.5

169.5 209.4


Plot the data

Comparing the Means

The population model parameter of interest is the difference between the means, μ1 - μ2.

$$SD(\bar{y}_1 - \bar{y}_2) = \sqrt{Var(\bar{y}_1) + Var(\bar{y}_2)} = \sqrt{\left(\frac{\sigma_1}{\sqrt{n_1}}\right)^2 + \left(\frac{\sigma_2}{\sqrt{n_2}}\right)^2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

$$SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$


A sampling distribution for the difference between two means

When the conditions are met, the standardized sample difference between the means of two independent groups,

$$t = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{SE(\bar{y}_1 - \bar{y}_2)}$$

can be modeled by a Student’s t-model with a number of degrees of freedom found with a special formula:

$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$
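To make the standard error and the degrees-of-freedom formula concrete, here is a small sketch that applies them to the battery data from the Comparing Two Means table. The function name and the choice to subtract brand-name from generic are assumptions made for the example.

```python
import math
import statistics

# Battery data from the Brand Name / Generic table.
brand = [194.0, 205.5, 199.2, 172.4, 184.0, 169.5]
generic = [190.7, 203.5, 203.5, 206.5, 222.5, 209.4]

def two_sample_se_and_df(a, b):
    """Standard error of (mean(a) - mean(b)) and the approximate t-model df."""
    v1 = statistics.variance(a) / len(a)   # s1^2 / n1
    v2 = statistics.variance(b) / len(b)   # s2^2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(a) - 1) + v2 ** 2 / (len(b) - 1))
    return se, df

diff = statistics.mean(generic) - statistics.mean(brand)
se, df = two_sample_se_and_df(generic, brand)

print(f"difference in means = {diff:.2f}")
print(f"SE = {se:.2f}, df = {df:.2f}")
```

The degrees of freedom are usually not a whole number, and they always fall between the smaller of n1-1 and n2-1 and the total n1+n2-2.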


Assumptions and Conditions

1. Independence Assumption:

Randomization condition

10% Condition

2. Normal Population Assumption

3. Independent Group Assumption

To use the two-sample t methods, the two groups we are comparing must be independent of each other.

No statistical test can verify this assumption. You have to think about how the data were collected.


Two-sample t-interval

When the conditions are met, we are ready to find the confidence interval for the difference between means of two independent groups, μ1 - μ2. The confidence interval is

$$(\bar{y}_1 - \bar{y}_2) \pm t^*_{df} \times SE(\bar{y}_1 - \bar{y}_2), \qquad SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

The critical value t*df depends on the particular confidence level that you specify and on the number of degrees of freedom, which we get from the sample sizes and a special formula.
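The interval can be sketched end to end for the battery data. Since the degrees of freedom are fractional, the critical value is found here by numerically integrating the t density (Simpson's rule) and bisecting. That numerical approach is a choice made for this example, not something the slides prescribe; statistical software or a table would normally supply t*. All function names are hypothetical.

```python
import math
import statistics

brand = [194.0, 205.5, 199.2, 172.4, 184.0, 169.5]
generic = [190.7, 203.5, 203.5, 206.5, 222.5, 209.4]

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|], using the symmetry of the t-model."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(conf, df):
    """Two-sided critical value t* for the given confidence level, by bisection."""
    target = 1 - (1 - conf) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

v1 = statistics.variance(generic) / len(generic)
v2 = statistics.variance(brand) / len(brand)
se = math.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1 ** 2 / (len(generic) - 1) + v2 ** 2 / (len(brand) - 1))

diff = statistics.mean(generic) - statistics.mean(brand)
t_star = t_critical(0.95, df)
ci = (diff - t_star * se, diff + t_star * se)

print(f"t* = {t_star:.3f} for df = {df:.2f}")
print(f"95% CI for generic - brand: ({ci[0]:.1f}, {ci[1]:.1f})")
```

An interval that lies entirely above zero would suggest the generic batteries in this sample lasted longer on average, subject to the assumptions and conditions listed earlier.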


Testing the Difference between Two Means

Price Offered for a Used Camera ($)

Buying from a Friend    Buying from a Stranger

275                     260

300                     250

260                     175

300                     130

255                     200

275                     225

290                     240

300


Two-sample t-test for the difference between the means of two independent groups

The conditions for the two-sample t-test for the difference between the means of two independent groups are the same as for the two-sample t-interval. We test the hypothesis

H0: μ1 - μ2 = Δ0,

where the hypothesized difference Δ0 is almost always 0, using the statistic

$$t = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}{SE(\bar{y}_1 - \bar{y}_2)}, \qquad SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

When the conditions are met and the null hypothesis is true, this statistic can be closely modeled by a Student’s t-model with a number of degrees of freedom given by a special formula. We use that model to obtain a P-value.
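As an illustration, the test can be run on the used-camera prices with Δ0 = 0. The slides do not state the alternative hypothesis, so the two-sided P-value below is an assumption. The tail area is obtained by numerically integrating the t density, which is just one workable approach; the function names are hypothetical.

```python
import math
import statistics

# Prices offered for a used camera ($).
friend = [275, 300, 260, 300, 255, 275, 290, 300]
stranger = [260, 250, 175, 130, 200, 225, 240]

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|], using the symmetry of the t-model."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

DELTA_0 = 0.0  # hypothesized difference under H0

v1 = statistics.variance(friend) / len(friend)      # s1^2 / n1
v2 = statistics.variance(stranger) / len(stranger)  # s2^2 / n2
se = math.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1 ** 2 / (len(friend) - 1) + v2 ** 2 / (len(stranger) - 1))

diff = statistics.mean(friend) - statistics.mean(stranger)
t_stat = (diff - DELTA_0) / se
p_two_sided = 2 * (1 - t_cdf(abs(t_stat), df))      # assumption: two-sided alternative

print(f"difference = {diff:.2f}, SE = {se:.2f}")
print(f"t = {t_stat:.2f}, df = {df:.2f}, two-sided P = {p_two_sided:.4f}")
```

A small P-value here would indicate that a difference this large would be unusual if friends and strangers were offered the same price on average.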


Pooled variance t-test and confidence interval for the difference between the means of two independent groups

The conditions for the pooled variance t-test for the difference between the means of two independent groups (commonly called a “pooled t-test”) are the same as for the two-sample t-test, with the additional assumption that the variances of the two groups are the same. We test the hypothesis

H0: μ1 - μ2 = Δ0,

where the hypothesized difference Δ0 is almost always 0, using the statistic

$$t = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}{SE_{pooled}(\bar{y}_1 - \bar{y}_2)}, \qquad SE_{pooled}(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_{pooled}^2}{n_1} + \frac{s_{pooled}^2}{n_2}} = s_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$


where the pooled variance is

$$s_{pooled}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}$$

When the conditions are met and the null hypothesis is true, this statistic can be closely modeled by a Student’s t-model with (n1-1)+(n2-1) degrees of freedom. We use that model to obtain a P-value.

The corresponding confidence interval is

$$(\bar{y}_1 - \bar{y}_2) \pm t^*_{df} \times SE_{pooled}(\bar{y}_1 - \bar{y}_2)$$

where the critical value t* depends on the confidence level and is found with (n1-1)+(n2-1) degrees of freedom.
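The pooled calculation can be sketched on the battery data purely as an arithmetic illustration. Whether the equal-variance assumption actually holds for those data is not claimed here, and the variable names are made up for the example.

```python
import math
import statistics

brand = [194.0, 205.5, 199.2, 172.4, 184.0, 169.5]
generic = [190.7, 203.5, 203.5, 206.5, 222.5, 209.4]

n1, n2 = len(generic), len(brand)
s1_sq = statistics.variance(generic)   # sample variances (divide by n - 1)
s2_sq = statistics.variance(brand)

# Pooled variance: weighted average of the two sample variances.
df_pooled = (n1 - 1) + (n2 - 1)
s_pooled_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df_pooled

# Pooled standard error and t statistic for H0: mu1 - mu2 = 0.
se_pooled = math.sqrt(s_pooled_sq / n1 + s_pooled_sq / n2)
diff = statistics.mean(generic) - statistics.mean(brand)
t_pooled = diff / se_pooled

# Unpooled standard error for comparison.
se_unpooled = math.sqrt(s1_sq / n1 + s2_sq / n2)

print(f"pooled variance = {s_pooled_sq:.1f}, df = {df_pooled}")
print(f"SE pooled = {se_pooled:.3f}, SE unpooled = {se_unpooled:.3f}")
print(f"t = {t_pooled:.2f}")
```

With equal sample sizes the pooled and unpooled standard errors coincide, so the two methods differ only in their degrees of freedom (10 here, against roughly 9 from the special formula).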


Is the Pool All Wet?

So when should you use pooled-t methods rather than two-sample t methods? Never!

When the variances of the two groups are in fact equal, the two methods give pretty much the same result.

Pooled methods have a small advantage (slightly narrower confidence intervals, slightly more powerful tests), mostly because their degrees of freedom are usually a bit larger, but the advantage is slight.

When the variances are not equal, the pooled methods are just not valid, and can give poor results.
