Probability & Statistics in Engineering ++-- -2 +2 +3 -3 0909.400.01 / 0909.400.02 Dr. P.’s...


Page 1

Probability & Statistics in Engineering


0909.400.01 / 0909.400.02

Dr. P.’s Clinic Consultant Module in

Unless indicated otherwise, all cartoons are from The Cartoon Guide to Statistics by L. Gonick and W. Smith, 1993, Harper Resource.

Hypothesis Testing

Lecture 7

Page 2

© 2003 All Rights Reserved, Robi Polikar, Rowan University, Dept. of Electrical and Computer Engineering

Today in P&S

• Review: confidence intervals for the proportion of successes, for the large-sample mean, and for the small-sample mean
• Hypothesis testing
  – Null hypothesis vs. alternative hypothesis
  – A statistician’s cherished values: the α-value, the β-value, the p-value and all that jazz…
  – “We find the defendant guilty of committing a type II error…, your honor!”
  – Type I and type II errors in hypothesis testing
• After the exam, next week: tests of hypotheses
  – Large-sample significance tests for proportions
  – Large-sample tests for the population mean
  – Small-sample tests for the population mean

Page 3

Confidence Intervals

One of the most important uses of statistics is making decisions from incomplete data. We often wish to make an inference about a population by looking only at the data available from a small sample. Examples include:
• the average shelf-life of a product
• the average blood pressure of patients after a specific treatment
• the percent defect rate of a product
• average test scores
• expected election results

The problem is that our sample carries uncertainty. It is a random selection from a population whose statistical properties are unknown to us.


Page 4

Confidence Intervals

We have seen that we often simply use the sample mean or the sample proportion of successes for most of our inferences. But how good is such an estimate? A confidence interval quantifies the uncertainty in the estimate. It gives a range of plausible values for the unknown parameter together with a stated level of confidence.

Page 5

Confidence Intervals

For the proportion of successes, the 100(1−α)% confidence interval for the true probability of success p is

$$\hat p \pm z_{\alpha/2}\, s_{\hat p} = \hat p \pm z_{\alpha/2}\sqrt{\frac{\hat p\,(1-\hat p)}{n}}$$

Recall that for, say, a 95% confidence level, α = 0.05. The CI above is computed by finding the critical z-values $\pm z_{\alpha/2}$, each of which cuts off a tail area of α/2 under the standard normal curve. The resulting procedure captures the true (unknown) value with 95% probability: over repeated samples, 95% of the intervals constructed this way contain it.

For sample means computed from large samples, the 100(1−α)% confidence interval for the true mean μ is

$$\bar x \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}$$

Here s is the sample standard deviation, so s/√n is the estimated standard error of the sample mean. Critical z-values are easily obtained from tables or from Matlab. The most commonly used values are:

Confidence level (%)    Critical value $z_{\alpha/2}$
99.73                   3.00
99                      2.58
98                      2.33
96                      2.05
95.45                   2.00
95                      1.96
90                      1.645
80                      1.28
68.27                   1.00
50                      0.675

[Figure: standard normal curve; the central area 1−α lies between $-z_{\alpha/2}$ and $+z_{\alpha/2}$, and each tail has area α/2.]
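The critical values in this table can be computed rather than looked up. The following minimal Python sketch does this and also evaluates the two large-sample intervals above. The slides mention Matlab; here Python's standard-library `statistics.NormalDist` is used instead. The helper names `z_critical`, `mean_ci` and `proportion_ci` are illustrative and do not come from the lecture.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Critical value z_{alpha/2} for a two-sided 100*confidence% interval."""
    alpha = 1.0 - confidence
    # Upper alpha/2 point of the standard normal: the area to its right is alpha/2.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def mean_ci(xbar: float, s: float, n: int, confidence: float):
    """Large-sample CI for the mean: xbar +/- z_{alpha/2} * s / sqrt(n)."""
    margin = z_critical(confidence) * s / n ** 0.5
    return xbar - margin, xbar + margin


def proportion_ci(p_hat: float, n: int, confidence: float):
    """Large-sample CI for a proportion: p_hat +/- z_{alpha/2} * sqrt(p_hat(1 - p_hat)/n)."""
    margin = z_critical(confidence) * (p_hat * (1.0 - p_hat) / n) ** 0.5
    return p_hat - margin, p_hat + margin


if __name__ == "__main__":
    # Regenerate the critical-value table (the table shows these rounded).
    for level in (99.73, 99, 98, 96, 95.45, 95, 90, 80, 68.27, 50):
        print(f"{level:>6}%  z = {z_critical(level / 100):.3f}")
```

Because the exact quantile is used, the printed values agree with the table up to its rounding. For example, 2.576 prints where the table shows 2.58.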

Page 6

Example

Here is one sample of size 100 from a group of students’ weights (in lbs). Unbeknownst to us, the population is normal with a mean weight of 160 lbs and a standard deviation of 20 lbs; these are the parameters we wish to estimate. We want to find the 90%, 95% and 99% confidence intervals (α = 0.1, 0.05 and 0.01, respectively) for the students’ mean weight. We base them on sample statistics computed from the data below.

136 136 162 176 153 157 169 180 150 191

115 138 173 164 143 158 141 128 167 174

179 189 136 140 169 169 160 158 174 199

149 161 150 186 189 148 147 146 202 170

166 135 154 165 149 157 170 139 132 197

164 159 135 189 143 151 171 135 139 153

160 137 133 182 155 158 155 165 180 137

139 133 178 146 173 190 118 151 152 155

141 154 160 134 142 147 162 161 132 183

152 179 147 158 135 133 191 152 166 137

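Sample size: n = 100
Sample mean: $\bar x$ = 157.46
Sample std. dev.: s = 18.89

We use $\bar x \pm z_{\alpha/2}\, s/\sqrt{n}$ with the critical values $z_{0.1/2} = z_{0.05} = 1.645$, $z_{0.05/2} = z_{0.025} = 1.96$ and $z_{0.01/2} = z_{0.005} = 2.58$:

$$90\%:\quad 157.46 \pm 1.645\,\frac{18.89}{\sqrt{100}} = [154.4,\ 160.6]$$

$$95\%:\quad 157.46 \pm 1.96\,\frac{18.89}{\sqrt{100}} = [153.8,\ 161.2]$$

$$99\%:\quad 157.46 \pm 2.58\,\frac{18.89}{\sqrt{100}} = [152.6,\ 162.3]$$

All three intervals contain the true mean of 160 lbs. The higher the confidence level, the wider the interval.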
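As a check on the arithmetic, this sketch recomputes the three intervals from the reported summary statistics (n = 100, $\bar x$ = 157.46, s = 18.89). It uses exact normal quantiles from Python's `statistics.NormalDist` instead of the rounded table values. The function name is illustrative only.

```python
from statistics import NormalDist

# Summary statistics reported on this slide for the 100 student weights (lbs).
n, xbar, s = 100, 157.46, 18.89


def large_sample_ci(xbar, s, n, alpha):
    """Return (lower, upper) for xbar +/- z_{alpha/2} * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    margin = z * s / n ** 0.5
    return xbar - margin, xbar + margin


intervals = {alpha: large_sample_ci(xbar, s, n, alpha) for alpha in (0.10, 0.05, 0.01)}
for alpha, (lo, hi) in intervals.items():
    print(f"{100 * (1 - alpha):.0f}% CI: [{lo:.1f}, {hi:.1f}]")
```

Running it prints `90% CI: [154.4, 160.6]`, `95% CI: [153.8, 161.2]` and `99% CI: [152.6, 162.3]`. This agrees with the hand computation to one decimal place.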

Page 7

Small Sample Size

So far we have quietly slipped the phrase “for sufficiently large sample sizes” into our calculations. Exactly what is sufficiently large? It depends on the problem, but usually n > 40. What happens if n is not sufficiently large?

Recall that to calculate the confidence interval we needed $\frac{\bar X - \mu}{\sigma/\sqrt{n}}$. This includes σ, which was unknown to us, so we replaced σ with s, the sample standard deviation. The quantity s/√n is then the estimated standard error of the sample mean.

$\frac{\bar X - \mu}{\sigma/\sqrt{n}}$ is indeed normal (for a normal population). However, $\frac{\bar X - \mu}{s/\sqrt{n}}$ is only approximately normal, and only for large n.

In fact, $\frac{\bar X - \mu}{s/\sqrt{n}}$ has a Student’s t-distribution.

Page 8

T-Distribution

Let $\bar X$ be the mean of a random sample of size n from a normal distribution with mean μ. Then the random variable

$$T = \frac{\bar X - \mu}{S/\sqrt{n}}$$

has a probability distribution called the (Student’s) t-distribution with ν = n − 1 degrees of freedom (df).

For large n, the r.v. S will take a value s close to the true σ; for small n, however, this is not the case. Therefore, the t-distribution resembles the normal distribution for large n but deviates from it (it has heavier tails) for smaller n.

[Figure: the standard normal density compared with t densities for large n (large ν) and small n (small ν).]

Page 9

Properties of T-Distribution

[Figure: a $t_\nu$ curve; the central area 1−α lies between $-t_{\alpha/2,\nu}$ and $t_{\alpha/2,\nu}$, and each tail has area α/2.]

Let $t_\nu$ denote the density curve for ν degrees of freedom.

1. Each $t_\nu$ curve is bell-shaped and centered at 0.

2. Each $t_\nu$ curve is more spread out than the standard normal (z) curve.

3. As ν increases, the spread of the corresponding $t_\nu$ curve decreases.

4. As ν → ∞, the sequence of $t_\nu$ curves approaches the standard normal curve. For this reason the z curve is sometimes called the t curve with df = ∞.

5. Let $t_{\alpha,\nu}$ be the number on the measurement axis for which the area under the t curve with ν df to the right of $t_{\alpha,\nu}$ is α. Then $t_{\alpha,\nu}$ is called a t critical value; it is the counterpart of the z critical value of the normal distribution. For brevity, when the meaning is obvious, we drop ν and simply write $t_\alpha$, just like $z_\alpha$.
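Python's standard library has no t-distribution. The sketch below therefore approximates the area under the $t_\nu$ density by composite Simpson integration, then inverts that area by bisection to obtain the critical value $t_{\alpha,\nu}$ of property 5. This is an illustrative numerical approach, not how Matlab's `tinv` is implemented. In practice a library routine such as SciPy's `scipy.stats.t.ppf` would be the usual choice. The helper names are hypothetical.

```python
import math


def t_pdf(x: float, nu: float) -> float:
    """Density of Student's t with nu degrees of freedom."""
    # log-gamma keeps the normalizing constant finite for large nu
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    return math.exp(log_c) / math.sqrt(nu * math.pi) * (1 + x * x / nu) ** (-(nu + 1) / 2)


def t_cdf(x: float, nu: float, steps: int = 2000) -> float:
    """Area under the t_nu curve to the left of x (composite Simpson's rule on [0, |x|])."""
    if x < 0:
        return 1.0 - t_cdf(-x, nu, steps)  # the curve is symmetric about 0
    h = x / steps
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3  # half of the total area lies left of 0


def t_critical(alpha: float, nu: float, tol: float = 1e-7) -> float:
    """t_{alpha,nu}: the point with area alpha to its right (bisection on t_cdf)."""
    if not 0 < alpha < 0.5:
        raise ValueError("alpha must be in (0, 0.5)")
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while t_cdf(hi, nu) < target:  # widen the bracket for heavy-tailed small-nu curves
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for nu in (2, 5, 10, 30, 1000):
    print(f"nu = {nu:>4}: t_0.025 = {t_critical(0.025, nu):.3f}")
```

As ν grows, the printed $t_{0.025,\nu}$ values shrink toward $z_{0.025} \approx 1.96$, which illustrates properties 3 and 4. For comparison, standard t tables give 4.303, 2.571, 2.228 and 2.042 for ν = 2, 5, 10 and 30.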

Page 10

Confidence Intervals Using T-Distribution

For smaller sample sizes (where the original distribution is normal), we can write the confidence interval as follows. Let $\bar x$ and s be the sample mean and standard deviation computed from a random sample of size n from a normal population with mean μ. The 100(1−α)% confidence interval for μ is:

$$\left(\bar x - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}},\ \ \bar x + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}\right), \qquad\text{i.e.}\qquad \bar x \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$

Strictly speaking, the t-distribution applies only if the population from which the sample is drawn is normal. In practice, however, the t-based interval works well as long as the population distribution is approximately mound-shaped.

In Matlab:
• tcdf(x,v) returns the area under the $t_v$ curve to the left of x.
• tinv(p,v) returns the t critical value to the left of which the area under the curve is p.
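The sketch below illustrates the small-sample interval. It treats the first row of the student-weight data (10 values) as if it were the entire sample; that choice is purely hypothetical. The critical value $t_{0.025,9} = 2.262$ is read from a standard t table. It is the value Matlab's `tinv(0.975, 9)` gives, rounded to three decimals.

```python
import math
import statistics

# Hypothetical small sample: the first row (10 values) of the student-weight data.
sample = [136, 136, 162, 176, 153, 157, 169, 180, 150, 191]
n = len(sample)
xbar = statistics.mean(sample)   # sample mean
s = statistics.stdev(sample)     # sample std. dev. (divides by n - 1)

# t_{0.025, 9} for a 95% interval with n - 1 = 9 df, taken from a t table.
t_crit = 2.262

margin = t_crit * s / math.sqrt(n)
print(f"n = {n}, mean = {xbar:.2f}, s = {s:.2f}")
print(f"95% t-interval: [{xbar - margin:.2f}, {xbar + margin:.2f}]")
```

It prints a mean of 161.00, s ≈ 18.26 and the 95% t-interval [147.94, 174.06]. The interval is much wider than the large-sample ones, for two reasons: n is small, and $t_{0.025,9} > z_{0.025}$.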

Page 11

Hypothesis Testing

Estimating the value of a parameter, even along with its confidence interval, has little meaning unless we use that information to make a decision. Consider the statement: “The probability that a randomly selected processor from a specific manufacturer will be flawed is 0.24% ± 0.01%, with a confidence level of 95%”… So what…? Shall we decide that this is a reliable processor?

Confidence intervals are most useful when we make decisions based on statistical tests. Given an observation based on a finite random sample, could this observation be entirely due to chance?

In hypothesis testing (HT), we compare two hypotheses against each other. We then determine whether we have enough statistical evidence to reject the hypothesis that the observation is entirely due to chance.

Page 12

Hypothesis Testing: Setting the Stage

We will start with an example to familiarize ourselves with the terminology. Note that the example can easily be substituted into a number of engineering or non-engineering scenarios.

As the CEO of Owl Superior Chip Co., you hear the announcement of your competitor Lentil’s new chips: Pantsium XIX, and its low-cost version, Crapleron.

Lentil declares that their chips, even the low-cost versions, are 99.99% defect free (that is, only 0.01% of their chips are flawed).

Since you have been in this business for quite some time (2 ½ months), you think this is pretty impressive, if not too good to be true… You are suspicious.

You know that Pantsium is pretty reliable, but 99.99% on Crapleron…? You suspect that Lentil is cheating in its figures, and that the 99.99% is primarily for the Pantsium chips, not for the Craplerons… How to prove it?

You later learn that in estimating the 99.99% figure, they took a sample of 80 chips, of which only 4 were Craplerons… You consider going to court, claiming false advertising! They reply: “…well, we randomly picked 80 chips from a production run that manufactures equal numbers of chips of each kind. The fact that there were only 4 Craplerons in the sample is purely coincidental. There is no foul play!”

Page 13

By chance…?

Other versions of the same scenario:

• A company whose workforce of 80 employees consists of 76 males and 4 females. The company claims that it does not favor males and that having only 4 females is purely by chance. Its explanation is that on the days they were hiring, only men happened to apply. Yet men and women are equally likely to apply and to succeed in such a position.

• A southern state in the 1960s: out of a panel of 80 potential jurors, only 4 were African-American, in a district where 50% of all eligible citizens were African-American.

In each case, 50% of all eligible employees / jurors / chips are women / African-American / Craplerons. Yet in a random sample of 80 employees / jurors / chips, only 4 are women / African-American / Craplerons!

Could this be the result of pure chance?

Page 14

What are the odds?

Suppose the selection is really random and each group makes up 50% of the total population. Then the number of women / AAs / Craplerons in the sample is a binomial random variable X with p = 0.5 and n = 80.

Thus the chance of getting only 4 (or fewer) women / AAs / Craplerons is P(X ≤ 4), which is

$$P(X \le 4) = \sum_{k=0}^{4} \binom{80}{k} (0.5)^k (1-0.5)^{80-k} \approx 0.0000000000000000014$$

You think you have enough statistical evidence to reject Lentil’s claim that having only 4 Craplerons in their sample was random, pure chance. You go to court!

“The probability of 4 Craplerons in a sample of 80 is 0.0000000000000000014! …your honor…”
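The tail probability can be reproduced directly from the binomial formula. This minimal Python sketch sums the five terms k = 0, …, 4. For comparison, it also evaluates the single term P(X = 4). The helper name `binom_cdf` is illustrative and is not part of the lecture.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


p_le_4 = binom_cdf(4, 80, 0.5)           # P(X <= 4)
p_eq_4 = math.comb(80, 4) * 0.5**80      # P(X = 4) alone
print(f"P(X <= 4) = {p_le_4:.2e}")        # about 1.38e-18
print(f"P(X  = 4) = {p_eq_4:.2e}")        # about 1.31e-18
```

Almost all of the tail probability comes from the k = 4 term.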

Page 15

What are the odds?

To drive the point home, you argue that this probability is even smaller than the chance of getting three consecutive royal flushes in poker. It is also tens of millions of times smaller than the chance of hitting the big jackpot once. Remember?

Picking 6 numbers out of 52 in order: 0.000000000068

Getting 4 (or fewer) Craplerons in a sample of 80: 0.0000000000000000014!

So the judge rejects Lentil’s claim (hypothesis) of random selection!
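The comparison can be checked with a few lines of arithmetic. This sketch makes three assumptions for the sake of illustration:
• a royal flush is any of the 4 possible royal flushes in a 5-card hand, with probability 4/C(52,5);
• deals are independent;
• the lottery requires matching 6 distinct numbers out of 52 in order, with probability 1/(52·51·50·49·48·47).

```python
import math

p_craplerons = sum(math.comb(80, k) for k in range(5)) / 2**80   # P(X <= 4), X ~ Bin(80, 0.5)
p_royal_flush = 4 / math.comb(52, 5)                             # one 5-card royal flush
p_three_royal = p_royal_flush ** 3                               # three in a row (independent deals)
p_jackpot = 1 / math.perm(52, 6)                                 # 6 numbers out of 52, in order

print(f"4 or fewer Craplerons: {p_craplerons:.1e}")
print(f"three royal flushes:   {p_three_royal:.1e}")
print(f"one ordered jackpot:   {p_jackpot:.1e}")
print(f"two jackpots in a row: {p_jackpot ** 2:.1e}")
```

Under these assumptions the script prints roughly 1.4e-18, 3.6e-18, 6.8e-11 and 4.7e-21. The Craplerons outcome is therefore rarer than three royal flushes in a row, though not as rare as two ordered jackpots in a row.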

Page 16

Formal Definitions

A statistical hypothesis is a claim about the value or values of one or more parameters. Examples:
• The proportion of defective chips is p < 0.01%.
• The average SAT math score in NJ is μ > 500.
• The average wattage of a 60 W bulb is w = 60 W.

In any hypothesis testing problem, there are two competing hypotheses:
• H0, the null hypothesis: the protected hypothesis that is initially assumed to be true, e.g., that the observations are the result of pure chance.
• Ha, the alternative hypothesis: the claim that the null hypothesis is false, e.g., that the observations are not due to chance but are the result of a real effect plus random variation.

The test analyzes the observed data to determine whether there is enough evidence to reject the null hypothesis in favor of the alternative hypothesis. The burden of proof is on the alternative hypothesis: if the data do not strongly support Ha, the test fails to reject H0.

Page 17

H0 vs. Ha

Often we wish to find out whether a new value / theory / treatment plan is better than the previous / existing one.

• H0: the claim that the current value/theory/plan is better.
• Ha: the alternative claim that the new value/theory/plan is better.

We replace the current with the new only if there is enough convincing and compelling evidence to do so.

Example: take the defective chips example. If we develop a new procedure to fabricate the chips, we would use it if and only if it produces fewer defects. Suppose the current procedure has a proportion of defective chips p = 0.01. Then:

• Ha, on which the burden of proof is placed, is the assertion that the new procedure has p < 0.01.
• H0 is the initial, prior claim that p = 0.01.

The null hypothesis is always stated in the form H0: θ = θ0, where θ0 is called the null value. The alternative hypothesis can take any of the following three forms:

• Ha: θ > θ0 (which implicitly assumes H0: θ ≤ θ0)
• Ha: θ < θ0 (which implicitly assumes H0: θ ≥ θ0)
• Ha: θ ≠ θ0 (which implicitly assumes H0: θ = θ0)

Page 18

Choosing an Appropriate Test

Suppose that a 9 V battery, when fresh, is required to provide 9.1 V. As the quality control engineer, you draw a random sample of size n to determine whether you are in compliance.

You design an experiment where H0: μ = 9.1 V and

a. Ha: μ > 9.1 V   b. Ha: μ < 9.1 V   c. Ha: μ ≠ 9.1 V

You would choose (a) because, in this formulation, H0 indicates non-compliance. As a quality control engineer, you put the burden of proof on asserting that the specs are satisfied.

If we were to choose one of the other options, H0 would indicate compliance. Ha would then put the burden of proof on asserting that the batteries are in non-compliance. If you were challenged in a legal proceeding, however, the alleger would have to choose test (b).

Suppose 5 pCi/L is the borderline for radioactivity in water. Which test would you choose?

Choose H0: μ = 5 pCi/L vs. Ha: μ < 5 pCi/L. Then the water is believed unsafe unless proven otherwise. That is, the burden of proof is on showing that the water is indeed safe (μ < 5 pCi/L). Choosing Ha: μ > 5 pCi/L would instead mean that the water is presumed safe unless proven otherwise.

Suppose you manufacture 20 A fuses for home use. If a fuse burns out at < 20 A, users will complain that fuses burn out prematurely. If a fuse burns out at > 20 A, a fire may occur due to the malfunctioning fuse. Which test should you choose?

Choose H0: μ = 20 A vs. Ha: μ ≠ 20 A. This time a departure from 20 A in either direction is equally costly, so the test must be able to detect deviations on both sides.

Page 19

Testing Procedure

Step 1: Formulate the hypotheses and determine the null value.

The null hypothesis asserts that the current / status quo situation is preferred:
• H0: Lentil’s sample was purely random; there was a 50% chance to pick either chip.
• H0: The new drug will lower cholesterol by (no more than) 20%.
• H0: The new engine technology will allow a gas mileage of (no more than) 30 mpg.
• H0: The defective component ratio of our product is the same as the competitor’s.

The alternative hypothesis claims that the null hypothesis should be rejected in favor of the new procedure:
• Ha: Lentil’s sample was not purely random but biased: there was a > 50% chance to pick Pantsium in the sample.
• Ha: The new drug will lower cholesterol by > 20%.
• Ha: The new engine technology will allow a gas mileage > 30 mpg.
• Ha: The defective component ratio of our product is < that of the competitor’s.

Page 20

Testing Procedure

Step 2: Choose a test statistic and the formula for computing it.

A test statistic is a function of the sample data on which the decision (reject H0 or do not reject H0) will be based. It is the statistical value that assesses your evidence against the null hypothesis.

• For the random sampling of chips example, the test statistic would be the binomial random variable with probability of success p = 0.5 and number of trials n = 80.
  – For applications of the form “proportion of successes”, the test statistic is generally based on the observed proportion of successes $\hat p$. It is compared with the presumed probability of success $p_0$ (the null value). For a large enough sample size this statistic is approximately normal, and its standardized form is

$$z = \frac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}}$$

• For the gas mileage problem, the test statistic would be the sample mean of the gas mileages of the cars with the new technology, assumed normally distributed: H0: μnew_tech = 30 mpg vs. Ha: μnew_tech > 30 mpg.
  – For all applications of the form “average value”, the test statistic is generally the mean of the random sample (sample mean). It is compared with the presumed average $\mu_0$ (the null value). By the CLT, for a sufficiently large sample size this statistic is also approximately normal, and its standardized form is

$$z = \frac{\bar x - \mu_0}{\sigma/\sqrt{n}}$$
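Both standardized statistics are one-line formulas. This sketch evaluates them with hypothetical helper names and assumes σ is known for the mean test. The proportion case uses Lentil's sample (4 Craplerons out of 80, p0 = 0.5). The mean case plugs in the braking example's boundary value $\bar x$ = 115.2, with μ0 = 120, σ = 10 and n = 36, taken from later in this lecture.

```python
import math


def z_proportion(p_hat: float, p0: float, n: int) -> float:
    """Large-sample statistic for a proportion: (p_hat - p0) / sqrt(p0 (1 - p0) / n)."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


def z_mean(xbar: float, mu0: float, sigma: float, n: int) -> float:
    """Statistic for a mean with known sigma: (xbar - mu0) / (sigma / sqrt(n))."""
    return (xbar - mu0) / (sigma / math.sqrt(n))


# Lentil's sample: 4 Craplerons out of 80 chips, null value p0 = 0.5.
print(f"z (proportion) = {z_proportion(4 / 80, 0.5, 80):.2f}")   # about -8.05
# Braking example numbers: xbar = 115.2, mu0 = 120, sigma = 10, n = 36.
print(f"z (mean)       = {z_mean(115.2, 120, 10, 36):.2f}")      # -2.88
```

The large negative z for Lentil's sample says the observed proportion lies about 8 standard errors below the null value.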

Page 21

Testing Procedure

Step 3: State the rejection region for a selected significance level. The rejection region is the set of all test statistic values for which H0 will be rejected.

• For Lentil’s random sampling, we may want to reject their hypothesis if the probability of selecting only 4 Craplerons at random is less than a specific value. In the previous example, the de facto rejection threshold (for the judge) was the probability of three royal flushes in a row or of hitting the big jackpot.

• For the gas mileage example (Ha: μ > 30 mpg), we may choose the rejection region to be an average gas mileage greater than 35 mpg.
  – Since H0 is the default hypothesis, we need a convincing and compelling argument to reject it. Therefore, the rejection region is usually picked so as to give H0 plenty of “benefit of the doubt”.

• The value α, the significance level, sets the confidence we wish to have when we reject H0. Consider α = 0.05 (95% confidence) in the car example. We would place the rejection threshold so that, if the new technology really averaged only 30 mpg (H0 true), a sample average above the threshold would occur in only 5% of samples.
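A one-sided rejection region for a sample mean can be expressed as a cut-off c on $\bar x$. The function below is a hypothetical helper that assumes a known σ and a normal (or large-sample) $\bar X$. Note that it uses the one-sided critical value $z_\alpha$, not $z_{\alpha/2}$. The slides do not give σ and n for the car example, so the usage line borrows the braking example's numbers from later in this lecture (μ0 = 120 ft, σ = 10 ft, n = 36).

```python
from statistics import NormalDist


def rejection_threshold(mu0: float, sigma: float, n: int, alpha: float, tail: str) -> float:
    """Cut-off c for a one-sided z test on the sample mean at significance level alpha.

    tail="upper": reject H0 if xbar > c   (Ha: mu > mu0)
    tail="lower": reject H0 if xbar < c   (Ha: mu < mu0)
    """
    se = sigma / n ** 0.5                  # standard error of the sample mean
    z = NormalDist().inv_cdf(1 - alpha)    # upper-tail critical value z_alpha
    if tail == "upper":
        return mu0 + z * se
    if tail == "lower":
        return mu0 - z * se
    raise ValueError("tail must be 'upper' or 'lower'")


print(f"reject H0 if xbar < {rejection_threshold(120, 10, 36, 0.002, 'lower'):.1f}")
```

With α = 0.002 this prints `reject H0 if xbar < 115.2`, which is the region R2 analyzed in the braking example.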

Page 22

Testing Procedure

Step 4: Compute the sample quantities and decide whether H0 should be rejected.

• For the random sampling example, we compute the probability P(X ≤ 4 | p0 = 0.5, n = 80).
• For the car example, we compute P($\bar X$ ≥ 35 | μ0 = 30, σ = …, n = …).
• We then compare these values to the rejection region at the specified confidence level.

A commonly used figure of merit is the p-value. It answers the following question: if the null hypothesis were true, what is the probability of observing a test statistic at least as extreme as the one we observed?

The smaller the p-value, the stronger the evidence against the null hypothesis.

Much more about the p-value later…
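For a z statistic, the p-value is a normal tail area. Which tail it is depends on the form of Ha (θ < θ0, θ > θ0, or θ ≠ θ0). The sketch below uses hypothetical helper names. Its usage line uses z = −2.88, the braking example's standardized boundary value, as an illustration.

```python
from statistics import NormalDist


def z_p_value(z: float, tail: str) -> float:
    """P-value of an observed z statistic, computed under H0 (standard normal reference)."""
    phi = NormalDist().cdf
    if tail == "lower":        # Ha: theta < theta0
        return phi(z)
    if tail == "upper":        # Ha: theta > theta0
        return 1 - phi(z)
    if tail == "two-sided":    # Ha: theta != theta0
        return 2 * (1 - phi(abs(z)))
    raise ValueError("tail must be 'lower', 'upper' or 'two-sided'")


def reject_h0(p_value: float, alpha: float) -> bool:
    """Reject H0 when the p-value falls below the significance level."""
    return p_value < alpha


p = z_p_value(-2.88, "lower")
print(f"p-value = {p:.4f}, reject at alpha = 0.01: {reject_h0(p, 0.01)}")
```

This prints a p-value of about 0.0020, so H0 would be rejected at α = 0.01 but not at α = 0.001.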

Page 23

Testing Procedure

Suppose the p-value is less than a threshold corresponding to the rejection region (the significance level α). Then we agree that there is statistically compelling evidence against H0. For the random sampling example, p = 1.4 × 10⁻¹⁸. We therefore have enough evidence to rule out Lentil’s claim that having only 4 Craplerons in their sample was purely coincidental!

Page 24

Errors in Hypothesis Testing

Can we make errors despite being overcautious and giving H0 plenty of benefit of the doubt…? Of course. In fact, there are two types of errors we can make.

To make the point, think of the fire detector in your house, and how often it goes off if you make the toast a little too dark!

This is called a Type I error: an alarm without a fire.

Page 25

Errors in Hypothesis Testing

Every cook knows how to avoid a type I error: just remove the batteries! But then a fire can go undetected. This is called a Type II error: a fire without an alarm!

We can reduce the chance of a Type II error by increasing the sensitivity of the sensor. That, in turn, increases the probability of a Type I error.

Page 26

Errors in Hypothesis Testing

We can put these observations in a table, called the decision table.

Now consider the null hypothesis that there is no fire, and Ha: FIRE! The alarm then corresponds to rejection of the null hypothesis.

                               No fire (H0 true)     Fire (H0 false)
Alarm (reject H0)              Type I error          Correct decision
No alarm (do not reject H0)    Correct decision      Type II error

Statistically speaking:
• A type I error is committed if we reject the null hypothesis when in fact it is true.
• A type II error is committed if we fail to reject the null hypothesis when in fact it is false.

Page 27

Type I Error

Examples: for the car example, suppose we observed 50 cars and checked their gas mileage. The average gas mileage of those 50 cars might be, say, 35.7 mpg, when in fact the true average is below 35. Then, by rejecting H0, we make a type I error.

On the other hand, the average gas mileage of those 50 cars might be, say, 34.6 mpg, when in fact the true average is above 35. Then, by not rejecting H0, we make a type II error.

Note that the significance level α mentioned earlier is the probability of committing a type I error. In other words, it is the probability of obtaining a result in the rejection region when H0 is indeed true:

P(rejecting H0 | H0 is true) = P(type I error | H0) = α

With 100(1−α)% confidence, we then claim that the observed outcome is statistically very unlikely under H0, and hence reject H0. The lower the α, the higher our confidence in rejecting H0, and the lower the probability of committing a type I error.
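The statement P(type I error | H0) = α can be checked by simulation. This Monte Carlo sketch repeatedly draws samples with H0 true. It applies a lower-tailed z test at α = 0.05 and counts how often H0 is wrongly rejected. The parameter values (borrowed from the braking example), the fixed seed and the function name are all assumptions of the illustration.

```python
import random
from statistics import NormalDist, fmean


def simulated_type_i_rate(mu0=120.0, sigma=10.0, n=36, alpha=0.05, trials=10_000, seed=1):
    """Fraction of samples drawn with H0 TRUE (mean mu0) for which a lower-tailed z test rejects H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(alpha)   # reject when z < z_crit (a negative number)
    se = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        xbar = fmean([rng.gauss(mu0, sigma) for _ in range(n)])
        if (xbar - mu0) / se < z_crit:
            rejections += 1                # a type I error, since H0 was true
    return rejections / trials


print(f"estimated P(type I error) = {simulated_type_i_rate():.3f} (alpha = 0.05)")
```

The estimated rate should come out close to 0.05. It is not exactly 0.05 because of simulation noise, which shrinks as `trials` grows.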

Page 28

Type II Error

But sometimes we are more interested in the type II error: is our alarm sensitive enough?

In the past, factories discharging chemicals into waterways were required to show that the discharge had no effect on the downstream wildlife. This was H0. The factory could continue operating as long as H0 was not rejected at the 0.05 significance level.

So a polluter who suspected he was in violation of EPA standards could devise an ineffective pollution-monitoring program:

• Type I error: reject H0 when it is true (shut down the factory when in fact its discharge really has no effect on wildlife).
• Type II error: accept H0 when it is false (the factory continues when in fact it is decimating the wildlife).

Page 29

Type II Error

Such a test, say “interviewing the ducks”, is equivalent to removing the batteries from the fire detector. Both are designed to reduce (remove) type I error.

Of course, such a test greatly increases the probability of committing a type II error. That is, it accepts H0 that the factory discharge is harmless when in fact it is not.

Page 30

β: Type II Error

Just as we limit the probability of committing a type I error through the significance level α, we can also limit the probability of making a type II error.

We define β = P(accepting H0 | Ha is true) = P(type II error | Ha). Thus β is the probability of making a type II error. The lower the β, the more confident we are of not committing a type II error.

Our confidence in not making a type I error is 1 − α. Likewise, our confidence in not making a type II error is 1 − β, which is called the power of a hypothesis test.

Note that the two types of error, type I and type II, are always in competition: for a fixed sample size, reducing one increases the other.

Of course, we’re happy to report that environmental regulations have changed since then. Pollution-monitoring programs are now required to show a high probability of detecting serious pollution events, that is, a very small β. This reveals any hidden flaws in the monitoring program.
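β depends on which alternative value of the mean is actually true. For a lower-tailed region “reject H0 if $\bar x$ < c”, β is the area to the right of c under the normal curve centered at the true mean. The sketch below uses a hypothetical helper and assumes a known σ. It evaluates β and the power 1 − β for the braking example's region $\bar x$ < 115.2 (σ = 10 ft, n = 36) at a few true means, including the 115 ft case asked about in that example.

```python
from statistics import NormalDist


def beta_lower_tailed(c: float, mu_true: float, sigma: float, n: int) -> float:
    """beta = P(do not reject H0 | true mean mu_true) for the region 'reject H0 if xbar < c'."""
    se = sigma / n ** 0.5
    return 1 - NormalDist().cdf((c - mu_true) / se)   # P(xbar >= c)


# Braking example: reject H0 if xbar < 115.2, sigma = 10 ft, n = 36.
for mu in (118, 115, 112):
    b = beta_lower_tailed(115.2, mu, 10, 36)
    print(f"true mean {mu} ft: beta = {b:.3f}, power = {1 - b:.3f}")
```

At a true mean of 115 ft, β ≈ 0.452. In other words, the better design would still not be implemented nearly half the time. β falls (and power rises) as the true mean moves further below 120.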

Page 31

A Complete ExampleA Complete Example

A new design for braking systems is proposed. For the current system, the true average braking distance at 40 mph is 120 ft. The new system is to be implemented only if there is substantial evidence that it reduces the braking distance significantly.

a. Identify the parameter of interest and state the appropriate hypotheses for testing the new system.

Suppose the new system’s braking distance has σ = 10 ft. Let $\bar{X}$ be the sample average braking distance of the new system for n = 36 observations.

b. Which rejection region is most appropriate? R1: $\bar{x} > 124.80$, R2: $\bar{x} < 115.20$, R3: $\{\bar{x} > 125.13 \text{ or } \bar{x} < 114.87\}$

c. What is the significance level for the region selected above? How would we change the region to obtain a 99.9% confidence level?

d. What is the probability that the new design is NOT implemented when its true average braking distance is actually 115 ft and the appropriate region from above is used?

e. Let $Z = \dfrac{\bar{X} - 120}{\sigma/\sqrt{n}}$. What is the significance level for the rejection region z < -2.33? How about z < -2.88?
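A quick way to compare the three candidate regions in part (b) is to compute each one's probability under H0. The sketch below assumes, as the example does, that the sample mean is normal with mean 120 and standard deviation σ/√n = 10/6 under H0; the variable names are illustrative.

```python
from statistics import NormalDist

MU0, SIGMA, N = 120.0, 10.0, 36                # H0 mean (ft), known sigma (ft), sample size
xbar_h0 = NormalDist(MU0, SIGMA / N ** 0.5)    # sampling distribution of Xbar under H0

alpha_r1 = 1 - xbar_h0.cdf(124.80)                          # R1: xbar > 124.80
alpha_r2 = xbar_h0.cdf(115.20)                              # R2: xbar < 115.20
alpha_r3 = xbar_h0.cdf(114.87) + (1 - xbar_h0.cdf(125.13))  # R3: both tails

for name, size in (("R1", alpha_r1), ("R2", alpha_r2), ("R3", alpha_r3)):
    print(f"{name}: P(reject H0 | mu = 120) = {size:.4f}")
```

All three regions come out with nearly the same size (about 0.002), so α alone cannot decide between them; the choice has to come from the direction of Ha.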


Solution

a. Let μ = true average braking distance for the new design at 40 mph. The burden of proof should be on showing that the new braking distance is lower; hence, H0: μ = 120 vs. Ha: μ < 120.

b. We want to give the null hypothesis the benefit of the doubt, so we need significant evidence that the new average distance is substantially less than that of the existing system. Since Ha is lower-tailed, the rejection region must lie in the lower tail (R1 is upper-tailed and R3 is two-tailed). Therefore, we should choose R2: reject H0 if $\bar{x} < 115.2$ (< 120).

c. Recall that the significance level is the probability of a type I error, that is, rejecting H0 when in fact we shouldn’t. We will reject H0 if the observed average is below 115.2. Under H0, $\bar{X}$ is normal with mean 120 (the assumed average for the existing system) and standard deviation $\sigma/\sqrt{n} = 10/6$, so the area of the shaded rejection region is:

$$\alpha = P(\bar{X} < 115.2 \mid \mu = 120) = P\left(\frac{\bar{X} - 120}{10/\sqrt{36}} < \frac{115.2 - 120}{10/6}\right) = P(Z < -2.88) \approx 0.002 \;\Rightarrow\; 99.8\% \text{ confidence}$$

Now, if we want α = 0.001 (that is, an increase to 99.9% confidence), we should expect a smaller rejection region. From the Gaussian tables, the z-value that gives a shaded area of 0.001 is -3.08. Then the new rejection region threshold c is:

$$P\left(Z < \frac{c - 120}{10/6}\right) = 0.001 \;\Rightarrow\; \frac{c - 120}{1.667} = -3.08 \;\Rightarrow\; c = 120 - 3.08 \times 1.667 \approx 114.87$$

so we reject H0 if $\bar{x} < 114.87$.

[Figure: sampling distribution of $\bar{X}$ under H0 (mean 120), shaded below 115.2 and, for α = 0.001, below 114.87.]
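The threshold calculation in part (c) can be checked with a few lines of Python. This is a sketch under the same assumptions (normal sample mean, known σ); lower_threshold is a hypothetical helper. Using the exact standard normal quantile instead of the table value -3.08 gives c ≈ 114.85 rather than 114.87, a difference due only to table rounding, and α = 0.002 recovers the R2 threshold of 115.2.

```python
from statistics import NormalDist

MU0, SIGMA, N = 120.0, 10.0, 36
SE = SIGMA / N ** 0.5                     # sigma / sqrt(n) = 10/6 ≈ 1.667 ft


def lower_threshold(alpha, mu0=MU0, se=SE):
    """Critical value c such that P(Xbar < c | mu = mu0) = alpha (lower-tailed z-test)."""
    z_alpha = NormalDist().inv_cdf(alpha)  # lower-tail standard normal quantile
    return mu0 + z_alpha * se


# Table-based value used in the solution: z = -3.08
c_table = MU0 - 3.08 * SE
for alpha in (0.002, 0.001):
    print(f"alpha = {alpha}: reject H0 if xbar < {lower_threshold(alpha):.2f} ft")
print(f"with the table value z = -3.08: c = {c_table:.2f} ft")
```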


Solution – Cont.

d. What is the probability that the new design is NOT implemented when its true average braking distance is actually 115 ft and the appropriate region from above is used?

▪ If we do not implement the new design, then we must have failed to reject H0 (presumably because we think we do not have enough evidence). But in fact the true average distance for the new design is 115 ft, which is less than 120 ft, so Ha is true. Clearly, we are committing a type II error (failing to reject H0 when it should have been rejected).

▪ According to our decision rule, we will not reject H0 if the sample average braking distance is at least 115.2. Therefore, we are looking for the probability that the observed average braking distance is greater than 115.2 when in fact the sample is drawn from a population with a mean of 115:

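$$\beta = P(\bar{X} > 115.20 \mid \mu = 115) = P\left(Z > \frac{115.2 - 115}{10/6}\right) = P(Z > 0.12) = 0.4522$$

[Figure: sampling distributions of $\bar{X}$, with 115, 115.2, and 120 marked on the axis.]

e. Since $Z = \dfrac{\bar{X} - 120}{\sigma/\sqrt{n}}$ is standard normal under H0,

$$P(Z < -2.33) \approx 0.01, \qquad P(Z < -2.88) \approx 0.002$$

The second region is exactly R2 (115.2 corresponds to z = -2.88), so its significance level matches part (c).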
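Parts (d) and (e) reduce to normal CDF lookups, which the following sketch reproduces with the Python standard library instead of the Gaussian table. It makes the same assumptions as the example (normal sample mean, known σ = 10 ft, n = 36); the variable names are illustrative.

```python
from statistics import NormalDist

MU0, SIGMA, N = 120.0, 10.0, 36
SE = SIGMA / N ** 0.5                       # sigma / sqrt(n) = 10/6 ft

# (d) Type II error: fail to reject H0 (xbar >= 115.2) although the true mean is 115.
beta = 1 - NormalDist(115.0, SE).cdf(115.2)

# (e) Under H0, Z = (Xbar - 120) / (sigma / sqrt(n)) is standard normal.
alpha_233 = NormalDist().cdf(-2.33)
alpha_288 = NormalDist().cdf(-2.88)         # same region as R2, since 115.2 <-> z = -2.88

print(f"beta = {beta:.4f}, power = {1 - beta:.4f}")
print(f"P(Z < -2.33) = {alpha_233:.4f}, P(Z < -2.88) = {alpha_288:.4f}")
```

The large β (about 0.45) shows the α-β competition at work: with n = 36 and such a strict α, a design whose true mean is 115 ft would fail to be implemented roughly 45% of the time.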


No new Homework

The midterm exam will be held during the regular class meeting time, Thursday, October 23, 10:50 AM to 12:05 PM.

The exam will consist of several fill-in-the-blank / short-answer questions that test your comprehension of statistical concepts, and a few problems that test your ability to apply these concepts to real-world problems.

The conceptual comprehension part will include questions from all the material we have discussed so far, including today’s hypothesis testing; however, the numerical problem section will only include material from Lectures 1 through 6.

The problems will be similar in nature to those you have solved for homework. The questions are designed with the assumption that you have only 60 minutes to solve them. You may bring one equation sheet (one page, both sides) with you. You may put any equation you think you may need on it; however, you may NOT put any definitions, descriptions, explanations, etc. on this sheet. Equation sheets will be collected at the end of the exam.

You may not use your books or laptops. Standard calculators are allowed. I will also provide a table of standard Gaussian distribution values.

Complete solutions are necessary for full credit; drawing cartoons, however, is not! Dr. Linda Head will proctor the exam.

MIDTERM: Thursday, Oct 23