Week 1_Hypothesis Testing_Part 1.ppt

77
A bit concept review... Statistics is ... Population is ... Sample is ... Variable is ... Parameter is ... Relation between statistics & parameters ... Difference between statistics and probability Statistics assumptions : Population and its parameters are (unknown / known) Sample is used to ..... parameters

Transcript of Week 1_Hypothesis Testing_Part 1.ppt

  • A bit concept review...Statistics is ...Population is ...Sample is ...Variable is ...Parameter is ...Relation between statistics & parameters ...Difference between statistics and probabilityStatistics assumptions : Population and its parameters are (unknown / known)Sample is used to ..... parameters

  • A bit concept review... (2)A statistic is a numerical characteristic of a sample. A population is a collection of all units of interest.A sample is a subset of a population that is actually observed.A measurable property or attribute associated with each unit of a population is called a variable. A parameter is a numerical characteristic of a population. Statistics are used to infer the values of parameters. In probability, we assume that the population and its parameters are known and compute the probability of drawing a particular sample. In statistics, we assume that the population and its parameters are unknown and the sample is used to infer the values of the parameters.

  • A bit concept review... (3)Statistik Ilmu pengetahuan yang berhubungan dengan data statistik dan fakta yang benar.

    Statistik Suatu kajian ilmu pengetahuan yang menggunakan teknik pengumpulan data, teknik pengolahan data, teknik analisis data, penarikan kesimpulan, dan pembuatan kebijakan/keputusan yang cukup kuat alasannya berdasarkan data dan fakta yang benar.

  • A bit concept review... (4)

  • A bit concept review... (5)Populasi sejenis sampelPopulasi TerbatasPopulasi tak terbatas a. Populasi homogen b. Populasi heterogen

  • A bit concept review... (6)Teknik samplingProbability samplingSimple random samplingProportionate stratified random samplingDisproportionate stratified random samplingArea samplingNon-probability samplingSampling sistematisSampling kuotaSampling aksidentalPurposive samplingSampling jenuhSnowball sampling

  • A bit concept review... (7)DATA?Data kualitatifData kuantitatif

  • A bit concept review... (8)SKALA PENGUKURAN?Skala NominalSkala OrdinalSkala IntervalSkala Ratio

  • Hypothesis TestingNiniet Indah Arvitrida, ST, [email protected] Nopember Institute of TechnologyINDONESIA2013

  • Course 1 ObjectivesStudents able toFormulate null and alternative hypotheses for applications involving a single population mean or proportionFormulate a decision rule for testing a hypothesisKnow how to use the test statistic, critical value, and p-value approaches to test the null hypothesisKnow what Type I and Type II errors areCompute the probability of a Type II error

  • Review of Estimating Population Value

  • Point and Interval EstimatesA point estimate is a single number, a confidence interval provides additional information about variabilityPoint EstimateLower Confidence LimitUpperConfidence LimitWidth of confidence interval

  • Point EstimatesWe can estimate a Population Parameter

    with a Sample Statistic(a Point Estimate)MeanProportionppx

  • Confidence IntervalsHow much uncertainty is associated with a point estimate of a population parameter?

    An interval estimate provides more information about a population characteristic than does a point estimate

    Such interval estimates are called confidence intervals

  • Estimation Process(mean, , is unknown)PopulationRandom SampleMean x = 50Sample

  • Confidence LevelConfidence LevelConfidence in which the interval will contain the unknown population parameterA percentage (less than 100%)

  • Confidence Interval for ( Known) AssumptionsPopulation standard deviation is knownPopulation is normally distributedIf population is not normal, use large sampleConfidence interval estimate

  • Finding the Critical ValueConsider a 95% confidence interval:z.025= -1.96z.025= 1.96Point EstimateLower Confidence LimitUpperConfidence Limitz units:x units:Point Estimate0

  • Margin of ErrorMargin of Error (e): the amount added and subtracted to the point estimate to form the confidence intervalExample: Margin of error for estimating , known:

  • Factors Affecting Margin of ErrorData variation, : e as Sample size, n : e as n Level of confidence, 1 - : e if 1 - Chap 7-*

  • ExampleA sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is .35 ohms.

    Determine a 95% confidence interval for the true mean resistance of the population.Chap 7-*

  • ExampleA sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is .35 ohms. Solution:Chap 7-*(continued)

  • InterpretationWe are 95% confident that the true mean resistance is between 1.9932 and 2.4068 ohms Although the true mean may or may not be in this interval, 95% of intervals formed in this manner will contain the true meanAn incorrect interpretation is that there is 95% probability that this interval contains the true population mean. (This interval either does or does not contain the true mean, there is no probability for a single interval)Chap 7-*

  • Students t DistributionChap 7-*t0t (df = 5) t (df = 13)t-distributions are bell-shaped and symmetric, but have fatter tails than the normalStandard Normal(t with df = )Note: t z as n increases

  • Approximation for Large SamplesSince t approaches z as the sample size increases, an approximation is sometimes used when n 30:Chap 7-*Technically correctApproximation for large n

  • Now we talk about HYPOTHESIS

  • What is a Hypothesis?A hypothesis is a claim (assumption) about a population parameter:

    population mean

    population proportion

    Example: The mean monthly cell phone bill of this city is = $42Example: The proportion of adults in this city with cell phones is p = .68

  • The Null Hypothesis, H0States the assumption (numerical) to be tested Example: The average number of TV sets in U.S. Homes is at least three ( )Is always about a population parameter, not about a sample statistic

  • The Null Hypothesis, H0Begin with the assumption that the null hypothesis is trueSimilar to the notion of innocent until proven guiltyAlways contains = , or signMay or may not be rejected(continued)

  • The Alternative Hypothesis, HAIs the opposite of the null hypothesise.g.: The average number of TV sets in U.S. homes is less than 3 ( HA: < 3 )Never contains the = , or signMay or may not be acceptedIs generally the hypothesis that is believed (or needs to be supported) by the researcher

  • PopulationClaim: thepopulationmean age is 50.(Null Hypothesis:REJECTSupposethe samplemean age is 20: x = 20SampleNull Hypothesis20likely if = 50?=IsHypothesis Testing ProcessIf not likely, Now select a random sample H0: = 50 )x

  • Reason for Rejecting H0Sampling Distribution of x = 50If H0 is trueIf it is unlikely that we would get a sample mean of this value ...... then we reject the null hypothesis that = 50.20... if in fact this were the population meanx

  • Level of Significance, Defines unlikely values of sample statistic if null hypothesis is trueDefines rejection region of the sampling distributionIs designated by , (level of significance)Typical values are .01, .05, or .10Is selected by the researcher at the beginningProvides the critical value(s) of the test

  • Level of Significance and the Rejection RegionH0: 3 HA: < 30H0: 3 HA: > 3H0: = 3 HA: 3aa /2 Represents critical valueLower tail testLevel of significance = a00a /2aUpper tail testTwo tailed testRejection region is shaded1-1-1-

  • Errors in Making DecisionsType I Error Reject a true null hypothesisConsidered a serious type of error

    The probability of Type I Error is Called level of significance of the testSet by researcher in advanceThe probability of Type I Error is

  • Errors in Making DecisionsType II ErrorFail to reject a false null hypothesis

    The probability of Type II Error is (continued)

  • Outcomes and ProbabilitiesState of NatureDecisionDo NotRejectH0No error (1 - )aType II Error ( )RejectH0Type I Error( )aPossible Hypothesis Test Outcomes H0 False H0 TrueKey:Outcome(Probability)No Error ( 1 - )

  • Type I & II Error Relationship Type I and Type II errors can not happen at the same time Type I error can only occur if H0 is true Type II error can only occur if H0 is false

    If Type I error probability ( ) , then Type II error probability ( )

  • Factors Affecting Type II ErrorAll else equal, when the difference between hypothesized parameter and its true value

    when when when n

  • Critical Value Approach to TestingConvert sample statistic (e.g.: ) to test statistic ( Z or t statistic )Determine the critical value(s) for a specified level of significance from a table or computerIf the test statistic falls in the rejection region, reject H0 ; otherwise do not reject H0

  • Lower Tail TestsReject H0Do not reject H0The cutoff value, or , is called a critical valuea-zx-zx0H0: 3 HA: < 3

  • Upper Tail TestsReject H0Do not reject H0The cutoff value, or , is called a critical valueazxzx0H0: 3 HA: > 3

  • Two Tailed TestsDo not reject H0Reject H0Reject H0There are two cutoff values (critical values): or/2-z/2x/2 z/2x/200H0: = 3 HA: 3z/2x/2LowerUpperx/2LowerUpper/2

  • Critical Value Approach to TestingConvert sample statistic ( ) to a test statistic ( Z or t statistic )

    x Known Large Samples UnknownHypothesis Tests for Small Samples

  • Calculating the Test Statistic Known Large Samples UnknownHypothesis Tests for Small SamplesThe test statistic is:

  • Calculating the Test Statistic Known Large Samples UnknownHypothesis Tests for Small SamplesThe test statistic is:But is sometimes approximated using a z:(continued)

  • Calculating the Test Statistic Known Large Samples UnknownHypothesis Tests for Small SamplesThe test statistic is:(The population must be approximately normal)(continued)

  • Review: Steps in Hypothesis Testing1. Specify the population value of interest2.Formulate the appropriate null and alternative hypotheses3. Specify the desired level of significance4. Determine the rejection region5. Obtain sample evidence and compute the test statistic6. Reach a decision and interpret the result

  • Hypothesis Testing ExampleTest the claim that the true mean # of TV sets in US homes is at least 3.(Assume = 0.8)1. Specify the population value of interestThe mean number of TVs in US homes2.Formulate the appropriate null and alternative hypothesesH0: 3 HA: < 3 (This is a lower tail test)3. Specify the desired level of significanceSuppose that = .05 is chosen for this test

  • Hypothesis Testing Example4. Determine the rejection regionReject H0Do not reject H0 = .05-z= -1.6450This is a one-tailed test with = .05. Since is known, the cutoff value is a z value: Reject H0 if z < z = -1.645 ; otherwise do not reject H0(continued)

  • Hypothesis Testing Example5. Obtain sample evidence and compute the test statisticSuppose a sample is taken with the following results: n = 100, x = 2.84 ( = 0.8 is assumed known) Then the test statistic is:

  • Hypothesis Testing Example 6. Reach a decision and interpret the resultReject H0Do not reject H0 = .05-1.6450-2.0Since z = -2.0 < -1.645, we reject the null hypothesis that the mean number of TVs in US homes is at least 3 (continued)z

  • Hypothesis Testing Example An alternate way of constructing rejection region:Reject H0 = .052.8684Do not reject H032.84Since x = 2.84 < 2.8684, we reject the null hypothesis(continued)xNow expressed in x, not z units

  • p-Value Approach to TestingConvert Sample Statistic (e.g. ) to Test Statistic ( Z or t statistic )Obtain the p-value from a table or computerCompare the p-value with If p-value < , reject H0If p-value , do not reject H0 x

  • p-Value Approach to Testingp-value: Probability of obtaining a test statistic more extreme ( or ) than the observed sample value given H0 is trueAlso called observed level of significanceSmallest value of for which H0 can be rejected (continued)

  • p-value exampleExample: How likely is it to see a sample mean of 2.84 (or something further below the mean) if the true mean is = 3.0?p-value =.0228 = .052.868432.84x

  • p-value exampleCompare the p-value with If p-value < , reject H0If p-value , do not reject H0 Here: p-value = .0228 = .05Since .0228 < .05, we reject the null hypothesis(continued)p-value =.0228 = .052.868432.84

  • Example: Upper Tail z Test for Mean ( Known) A phone industry manager thinks that customer monthly cell phone bill have increased, and now average over $52 per month. The company wishes to test this claim. (Assume = 10 is known)H0: 52 the average is not over $52 per monthHA: > 52 the average is greater than $52 per month(i.e., sufficient evidence exists to support the managers claim)Form hypothesis test:

  • Suppose that = .10 is chosen for this test

    Find the rejection region:Reject H0Do not reject H0 = .10z=1.280Reject H0Reject H0 if z > 1.28Example: Find Rejection Region(continued)

  • Review:Finding Critical Value - One TailZ.07.091.1.3790.3810.38301.2.3980.40151.3.4147.4162.4177z01.28.08Standard Normal Distribution Table (Portion)What is z given a = 0.10?a = .10Critical Value = 1.28.90.3997.10.40.50

  • Example: Test StatisticObtain sample evidence and compute the test statisticSuppose a sample is taken with the following results: n = 64, x = 53.1 (=10 was assumed known) Then the test statistic is:(continued)

  • Example: DecisionReach a decision and interpret the result:Reject H0Do not reject H0 = .101.280Reject H0Do not reject H0 since z = 0.88 1.28i.e.: there is not sufficient evidence that the mean bill is over $52z = .88(continued)

  • p -Value SolutionCalculate the p-value and compare to Reject H0 = .10Do not reject H01.280Reject H0z = .88(continued)p-value = .1894Do not reject H0 since p-value = .1894 > = .10

  • Example: Two-Tail Test( Unknown) The average cost of a hotel room in New York is said to be $168 per night. A random sample of 25 hotels resulted in x = $172.50 and s = $15.40. Test at the = 0.05 level.(Assume the population distribution is normal)H0: = 168 HA: 168

  • Example Solution: Two-Tail Testa = 0.05n = 25 is unknown, so use a t statisticCritical Value: t24 = 2.0639Do not reject H0: not sufficient evidence that true mean cost is different than $168Reject H0Reject H0a/2=.025-t/2Do not reject H00t/2a/2=.025-2.06392.06391.46H0: = 168 HA: 168

  • Hypothesis Tests for ProportionsInvolves categorical valuesTwo possible outcomesSuccess (possesses a certain characteristic)Failure (does not possesses that characteristic)Fraction or proportion of population in the success category is denoted by p

  • ProportionsSample proportion in the success category is denoted by p

    When both np and n(1-p) are at least 5, p can be approximated by a normal distribution with mean and standard deviation (continued)

  • Hypothesis Tests for ProportionsThe sampling distribution of p is normal, so the test statistic is a z value:np 5andn(1-p) 5Hypothesis Tests for pnp < 5orn(1-p) < 5Not discussed in this chapter

  • Example: z Test for Proportion A marketing company claims that it receives 8% responses from its mailing. To test this claim, a random sample of 500 were surveyed with 25 responses. Test at the = .05 significance level.Check: n p = (500)(.08) = 40n(1-p) = (500)(.92) = 460

  • Z Test for Proportion: Solutiona = .05 n = 500, p = .05Reject H0 at = .05H0: p = .08 HA: p .08Critical Values: 1.96Test Statistic:Decision:Conclusion:z0RejectReject.025.0251.96-2.47There is sufficient evidence to reject the companys claim of 8% response rate.-1.96

  • p -Value SolutionCalculate the p-value and compare to (For a two sided test the p-value is always two sided)Do not reject H0Reject H0Reject H0/2 = .0251.960z = -2.47(continued)p-value = .0136:Reject H0 since p-value = .0136 < = .05z = 2.47-1.96/2 = .025.0068.0068

  • Type II ErrorType II error is the probability of failing to reject a false H0Reject H0: 52Do not reject H0 : 525250Suppose we fail to reject H0: 52 when in fact the true mean is = 50

  • Type II ErrorSuppose we do not reject H0: 52 when in fact the true mean is = 50Reject H0: 52Do not reject H0 : 525250This is the true distribution of x if = 50This is the range of x where H0 is not rejected(continued)

  • Type II ErrorSuppose we do not reject H0: 52 when in fact the true mean is = 50Reject H0: 52Do not reject H0 : 525250Here, = P( x cutoff ) if = 50(continued)

  • Calculating Suppose n = 64 , = 6 , and = .05Reject H0: 52Do not reject H0 : 525250So = P( x 50.766 ) if = 50(for H0 : 52)50.766

  • Calculating Suppose n = 64 , = 6 , and = .05Reject H0: 52Do not reject H0 : 525250(continued)Probability of type II error: = .1539

  • Using PHStatOptions

  • Sample PHStat OutputInputOutput

  • Chapter SummaryAddressed hypothesis testing methodologyPerformed z Test for the mean ( known)Discussed pvalue approach to hypothesis testingPerformed one-tail and two-tail tests . . .

  • Chapter SummaryPerformed t test for the mean ( unknown)Performed z test for the proportionDiscussed type II error and computed its probability(continued)