Analysis of Sample Mean

  • 8/3/2019 Analysis of Sample Mean

    Analysis of Means

    Farrokh Alemi, Ph.D.

    Kashif Haqqi M.D.

    Table of Contents

    Review

    Objectives

    Definitions

    Expected Value

    Normal Distribution

    Distribution of Mean

    Central Limit Theorem

    Standard Normal Distribution

    Use of Z Values

    Confidence Interval

    Hypothesis

    Two Types of Error

    One-tailed Tests

    Steps in Testing a Hypothesis

    When to Assume Normal Distribution for Means

    Use t-distribution

    Review

    Frequency distribution

    Mean, median, and mode

    Standard deviation and range

    Statistics is the art of making sense of distributions.

    Objectives

    Describe different distributions, including the normal and t-distributions.

    Calculate and interpret confidence intervals using normal distributions.

    Understand the types of error that occur with hypothesis testing.

    Test hypotheses using the t-distribution.

    Example You Should Be Able to Answer at the End

    Is it important to ask these types of questions?

    The average cost of rehabilitation in the industry is $25,000, with a standard deviation of $3,000. Assume that the average cost in our hospital is $30,000.

    With 95% confidence, would you say that our cost is different from the industry's?

    Definitions

    A random variable is a variable whose values are determined by chance.

    A probability distribution gives the probability with which each value of a random variable is observed.

    The probability of a value is the frequency of occurrence of that value divided by the total frequency of occurrence of all values.

    Example of Probability Estimates

    We examined the waiting time of 50 people at our emergency room and found that 10 people waited up to 5 minutes, 20 people waited 5.001 to 10 minutes, 13 people waited 10.001 to 15 minutes, and 7 people waited 15.001 to 20 minutes.

    What is the probability of waiting up to 5 minutes?

    What is the probability of waiting up to 10 minutes?

    Distributions help us make probability estimates about observed values.

    Example of Probability Estimates (Continued)

    The probability of waiting up to 5 minutes is the number of people who waited up to 5 minutes divided by the total number of people: 10/50 = 0.20.

    The probability of waiting up to 10 minutes is the number of people who waited up to 10 minutes divided by the total number of people: (10+20)/50 = 0.60.
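
The same estimates can be sketched in Python. This is only an illustration of the frequency-over-total rule; the interval labels and variable names are chosen here and are not part of the slides.

```python
# Waiting-time counts from the emergency room example (50 people in total).
counts = {
    "up to 5 min": 10,
    "5 to 10 min": 20,
    "10 to 15 min": 13,
    "15 to 20 min": 7,
}

total = sum(counts.values())

# Probability of each interval = its frequency / total frequency.
probabilities = {interval: n / total for interval, n in counts.items()}

# Waiting up to 10 minutes combines the first two intervals.
p_up_to_5 = probabilities["up to 5 min"]
p_up_to_10 = (counts["up to 5 min"] + counts["5 to 10 min"]) / total

print(p_up_to_5)   # 0.2
print(p_up_to_10)  # 0.6
```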

    Expected Value

    The expected value of a distribution is the mean of the distribution.

    It represents our long-run expectation about the distribution.

    The expected value of X is given by summing, over every value i that X can take, the product of i and its probability of occurring, p(X=i).

    Expected value = mean = Σ p(X=i) * i, summed over all values i.

    Example Calculation of Expected Value or Mean

    We examined the waiting time of 50 people at our emergency room and found that 10 people waited up to 5 minutes, 20 people waited 6 to 10 minutes, 13 people waited 11 to 15 minutes, and 7 people waited 16 to 20 minutes.

    What is the mean waiting time at our emergency room?

    Example Calculation of Expected Value or Mean (Continued)

    Do this in Excel: http://biostatistics.gmu.edu/means.xls

    Each interval is represented by its midpoint.

    Observed waiting time   Frequency   Probability   Probability times waiting time
    2.5                     10          0.20          0.50
    7.5                     20          0.40          3.00
    12.5                    13          0.26          3.25
    17.5                    7           0.14          2.45
    Total                   50          1.00          9.20

    The expected value or mean is 9.2 minutes.
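
As an alternative to the spreadsheet, the calculation can be sketched in a few lines of Python. The variable names are illustrative; the midpoints and frequencies are the ones from the waiting-time example.

```python
# Interval midpoints (minutes) and how many people fell in each interval.
midpoints = [2.5, 7.5, 12.5, 17.5]
frequencies = [10, 20, 13, 7]

total = sum(frequencies)
probabilities = [f / total for f in frequencies]

# Expected value = sum of (probability * value) over all values.
expected_value = sum(p * x for p, x in zip(probabilities, midpoints))

print(round(expected_value, 2))  # 9.2
```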
    Normal Distribution

    A symmetric distribution, meaning that data are evenly distributed about the mean.

    Mean, median, and mode are the same value.

    It has one mode and looks like a bell-shaped curve.

    Normal Distribution Continued

    The curve is continuous; there are no gaps or holes.

    The curve never touches the X-axis because any value is possible, although extreme values have vanishingly small probabilities.

    99.7% of values are within 3 standard deviations of the mean.

    Distribution of Mean

    If you repeatedly take samples of observations and average each sample, the averages form a distribution of the mean.

    The distribution of the mean has the same mean as the distribution of the observations.

    Standard deviation of the mean = Standard error = Standard deviation of the observations / Square root of the sample size.

    Example

    What are the mean, standard deviation, and standard error for the following data: 4, 5, 6?

    Mean = 5

    Standard deviation = 1

    Standard error = 1 / √3 = 1 / 1.73 = 0.58
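
The same three numbers can be checked with Python's standard library. This is a minimal sketch; it assumes the sample standard deviation (dividing by n - 1), which is what gives the value of 1 in the example.

```python
import math
import statistics

data = [4, 5, 6]

mean = statistics.mean(data)
sd = statistics.stdev(data)        # sample standard deviation (divides by n - 1)
se = sd / math.sqrt(len(data))     # standard error of the mean

print(mean, sd, round(se, 2))
```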

    Central Limit Theorem

    Consider any distribution of observations with mean μ and standard deviation σ.

    As the sample size n increases, the means of samples of n observations approach a Normal distribution with mean μ and standard deviation σ / square root (n).

    The theorem is important because it helps us ignore questions about the shape of the distribution and focus on its mean and standard deviation.

    Do this in Excel: http://biostatistics.gmu.edu/avgisnormal.xls
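
A small simulation can illustrate the theorem without the spreadsheet. The sketch below is not the method used in the Excel file: it assumes a skewed exponential distribution with mean 1 and standard deviation 1, and the sample size, number of repetitions, and seed are arbitrary choices made here.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def sample_mean(n):
    """Average of n draws from an exponential distribution (mean 1, SD 1)."""
    return sum(rng.expovariate(1.0) for _ in range(n)) / n

n = 50              # observations per sample (assumed)
repetitions = 5000  # number of repeated samples (assumed)
means = [sample_mean(n) for _ in range(repetitions)]

mean_of_means = statistics.mean(means)
sd_of_means = statistics.stdev(means)
predicted_se = 1.0 / math.sqrt(n)  # sigma / square root (n)

# Share of sample means within 1.96 standard errors of the true mean.
share_within = sum(abs(m - 1.0) <= 1.96 * predicted_se for m in means) / repetitions

print(round(mean_of_means, 3), round(sd_of_means, 3), round(predicted_se, 3))
print(round(share_within, 3))
```

Even though single observations are strongly skewed, the sample means should center near 1, spread out by roughly 1 / square root (50), and fall within 1.96 standard errors about 95% of the time.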
    Standard Normal Distribution

    A Normal distribution.

    Mean of zero.

    Standard deviation of 1.

    Z = (Observed value - mean) / standard deviation of the average.

    Here, standard deviation of the mean = standard error = standard deviation of observations divided by the square root of the sample size.

    Example Calculation of Z

    What is the Z value for the observed mean of 16, if the mean of the sample means is 10 and the standard error is 2?

    Z = (16-10) / 2 = 3.

    Another Example

    What is the Z value for the mean 16 of 4 observations, if the average of the means of repeated samples is 10 and the standard deviation of the observations is 2?

    Standard deviation of mean = 2 / 4^0.5 = 2/2 = 1

    Z value for 16 = (16-10)/1 = 6
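
Both Z examples follow the same recipe, which can be sketched as one small function. The name `z_value` and its parameters are illustrative only; passing `n=1` treats the supplied standard deviation as the standard error, as in the first example.

```python
import math

def z_value(observed_mean, expected_mean, sd, n=1):
    """Z = (observed mean - expected mean) / (sd / square root of n)."""
    standard_error = sd / math.sqrt(n)
    return (observed_mean - expected_mean) / standard_error

print(z_value(16, 10, 2))        # standard error given directly: 3.0
print(z_value(16, 10, 2, n=4))   # SD of observations with 4 observations: 6.0
```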

    Use of Z Values

    99.7% of data are between z=3 and z=-3.

    Z is the number of standard deviations that

    X is away from the mean.

    0.15% of data are below z=-3.

    0.15% of data are above z=3.

    Use of Z Value (Continued)

    95% of data are within z=1.96 and z=-1.96

    5% are outside z=1.96 and z=-1.96

    2.5% of data are below z=-1.96

    2.5% of data are above z=1.96
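
These percentages can be checked against the standard Normal distribution using `statistics.NormalDist` from Python's standard library. The helper name `share_within` is an illustrative choice.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def share_within(z):
    """Share of a standard Normal distribution between -z and +z."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

print(round(share_within(3), 4))        # about 0.997
print(round(share_within(1.96), 4))     # about 0.95
print(round(std_normal.cdf(-1.96), 4))  # lower tail, about 0.025
```

The tail below z=-3 comes out at about 0.135%, which the slides round to 0.15%.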

    Confidence Interval

    For Normal distributions, the 95% two-tailed confidence interval covers the observations between z=-1.96 and z=1.96.

    Example

    What is the 95% confidence interval for a mean of 10 and a standard deviation of 2?

    Lower limit = 10-1.96*2 = 6.08.

    Upper limit = 10+1.96*2 = 13.92.

    At 13.92, Z value is (13.92-10)/2 = 1.96.

    At 6.08, Z value is (6.08-10)/2 = -1.96.

    95% of data fall within these limits.
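
The limits above can be reproduced with a short function. This is a sketch under the slide's assumptions (Normal data and z = 1.96); the function name is made up for illustration.

```python
def two_tailed_interval(mean, sd, z=1.96):
    """Lower and upper limits that are z standard deviations from the mean."""
    half_width = z * sd
    return mean - half_width, mean + half_width

lower, upper = two_tailed_interval(10, 2)
print(round(lower, 2), round(upper, 2))  # 6.08 13.92
```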

    Two Tailed Confidence Interval

    What percentage of data are between z=1.96 and z=-1.96? Answer: 95%. Often referred to as a two-tailed confidence interval.

    What percentage of data are below z=1.96? Answer: 97.5%. Often referred to as a one-tailed confidence interval.

    What percentage of data are above z=-1.96? Answer: 97.5%. Often referred to as a one-tailed confidence interval.
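
The one-tailed answers can be verified, and the lookup reversed, with `statistics.NormalDist`. The sketch below also recovers the critical value from a chosen tail area; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

below_upper = std_normal.cdf(1.96)        # share of data below z = 1.96
above_lower = 1 - std_normal.cdf(-1.96)   # share of data above z = -1.96

# Going the other way: the z value that leaves 2.5% in the upper tail.
critical_z = std_normal.inv_cdf(0.975)

print(round(below_upper, 3), round(above_lower, 3), round(critical_z, 2))
```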

    Hypothesis

    A statistical hypothesis is a conjecture about a population parameter.

    The null hypothesis is that there is no difference between the parameter and a value.

    The alternative hypothesis states that there is a specific difference.

    Experimental data can only reject a hypothesis, not accept it.

    Possible Outcomes of Hypothesis Test

    There are four possible outcomes:

    1. We reject a hypothesis that is true.

    2. We reject a hypothesis that is false.

    3. We do not reject a hypothesis that is true.

    4. We do not reject a hypothesis that is false.

    Two Types of Error

                                   Hypothesis is true    Hypothesis is false
    We reject hypothesis           Type one error        Correct
    We do not reject hypothesis    Correct               Type two error

    Type 1 Error

    The level of significance is the maximum probability of type 1 error, symbolized by alpha, α.

    When we base our decision on 95% confidence intervals, 5% of the data are ignored at the two tails of the distribution. Therefore, there is a 5% chance that we will reject a hypothesis that is true.

    Type one error = 5%, α = 0.05.
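
A simulation can make the 5% figure concrete. The sketch below repeatedly samples from a population where the null hypothesis is true and counts how often a two-tailed test at z = 1.96 rejects it. The population mean, standard deviation, sample size, trial count, and seed are all assumptions chosen for illustration.

```python
import math
import random

def type_one_error_rate(trials=20000, n=4, mu=10.0, sigma=2.0,
                        critical_z=1.96, seed=1):
    """Fraction of samples that wrongly reject a true null hypothesis."""
    rng = random.Random(seed)
    standard_error = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where the null hypothesis holds.
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        z = (sample_mean - mu) / standard_error
        if abs(z) > critical_z:
            rejections += 1
    return rejections / trials

rate = type_one_error_rate()
print(rate)  # expected to be close to 0.05
```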

    One-tailed Tests

    In a two-tailed test, the hypothesis is rejected when the value is above the upper limit or below the lower limit.

    In a one-tailed test that a parameter is larger than a particular value, the hypothesis is rejected when the value is above the upper limit.

    One-tailed Tests (Continued)

    When we base our decision on 95% confidence intervals, 2.5% of the data are ignored at one tail of the distribution. Therefore, there is a 2.5% chance that we will reject a hypothesis that is true.

    α = 0.025.

    Steps in Testing a Hypothesis

    1. State the null hypothesis.

    2. Identify the alternative hypothesis.

    3. Is this a one tailed or two tailed test?

    4. Decide the critical Z value above or below which the hypothesis is rejected, usually 1.96.

    5. Calculate the Z value corresponding to the observation.

    6. Reject or do not reject the hypothesis by comparing the calculated Z to the critical values.

    Example

    The average cost of rehabilitation in the industry is $25,000, with a standard deviation of $3,000. In our hospital, the average cost is $30,000.

    With 95% confidence, would you say that our cost is different from the industry's?

    Do this in Excel: http://biostatistics.gmu.edu/stand.xls
    Steps in Testing Example Hypothesis

    1. The null hypothesis: Our costs are the same as the industry's.

    2. Alternative hypothesis: Our cost is higher or lower than the industry average.

    3. This is a two-tailed test.

    4. The critical Z is +1.96 or -1.96.

    5. Observed Z = (30000-25000)/3000 = 1.67.

    6. Since 1.67 lies between -1.96 and +1.96, do not reject the null hypothesis.
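
The steps for the rehabilitation cost example can be sketched as a short function. The function name is hypothetical, the null hypothesis is taken to be that our cost equals the industry's, and, as on the slide, the industry standard deviation of 3000 is used directly as the standard error.

```python
def two_tailed_z_test(observed, hypothesized, standard_error, critical_z=1.96):
    """Return the observed Z and whether the null hypothesis is rejected."""
    z = (observed - hypothesized) / standard_error
    reject = abs(z) > critical_z
    return z, reject

# Null hypothesis: our cost equals the industry cost of 25,000.
z, reject = two_tailed_z_test(observed=30000, hypothesized=25000,
                              standard_error=3000)

print(round(z, 2), reject)  # 1.67 False
```

Because the observed Z is smaller than 1.96 in absolute value, the difference is not large enough to reject the null hypothesis at the 95% level.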

    When to Assume Normal Distribution for Means

    When the population variance is known and observations have a Normal distribution.

    When the population variance is unknown and there are more than 30 observations.

    Otherwise, use the t-distribution as an approximation for the Normal distribution.

    Use t-distribution

    If the values in the population are Normal.

    If we have fewer than 30 observations.

    If we have to estimate the standard deviation from the sample because the variance of the population is not known.

    The t-distribution is used as an approximation for near-Normal data.

    Calculating t Statistic

    t = (observed average - mean) / standard deviation of the average.

    The critical value of t depends on the sample size through the degrees of freedom (sample size minus 1).

    For a one-tailed test with alpha = 0.025 or a two-tailed test with alpha = 0.05, the critical t value for a sample size of 10 is 2.26 and for a sample size of 20 is 2.09.

    If we are examining a sample size of 10, 95% of data are between t=-2.26 and t=2.26.

    If we are examining a sample size of 10, 97.5% of data are below t=2.26.
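
Critical t values are normally read from a table. As a rough cross-check, they can also be approximated with the standard library alone by integrating the t density numerically and searching for the cutoff. The sketch below takes that approach; the function names, step count, and search range are choices made here, and degrees of freedom are taken as sample size minus 1.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_central_area(t, df, steps=2000):
    """Area under the t density between -t and +t (Simpson's rule)."""
    h = t / steps  # integrate from 0 to t, then double by symmetry
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3

def critical_t(df, confidence=0.95):
    """Two-tailed critical value, found by bisection on the central area."""
    low, high = 0.0, 50.0
    for _ in range(60):
        mid = (low + high) / 2
        if t_central_area(mid, df) < confidence:
            low = mid
        else:
            high = mid
    return (low + high) / 2

for sample_size in (10, 20):
    df = sample_size - 1
    print(sample_size, df, round(critical_t(df), 3))
```

As the sample size grows, the critical value shrinks toward the Normal value of 1.96.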

    Testing With t-distribution

    1. State the null hypothesis.

    2. Identify the alternative hypothesis.

    3. Is this a one tailed or two tailed test?

    4. Decide the critical t value above or below which the hypothesis is rejected; the value depends on the sample size.

    5. Calculate the t value corresponding to the observation.

    6. Reject or do not reject the hypothesis by comparing the calculated t to the critical values.
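
The six steps can be sketched in Python as follows. The data are hypothetical costs invented purely for illustration (they are not the data on the slides), the function name is made up, and the critical value 2.228 is the two-tailed 5% table value for 10 degrees of freedom, which matches a sample of 11 observations.

```python
import math
import statistics

def one_sample_t_test(sample, hypothesized_mean, critical_t):
    """Return the t statistic and whether the null hypothesis is rejected."""
    n = len(sample)
    # Standard error estimated from the sample standard deviation.
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - hypothesized_mean) / standard_error
    return t, abs(t) > critical_t

# Hypothetical rehabilitation costs (dollars) for 11 patients; invented data.
costs = [27000, 31000, 29500, 33000, 28000, 30500,
         32000, 26500, 34000, 29000, 30000]

# Steps 1-3: null hypothesis is a mean cost of 25,000; two-tailed test.
# Step 4: critical t for 10 degrees of freedom at alpha = 0.05 is 2.228.
# Steps 5-6: compute t and compare it with the critical value.
t, reject = one_sample_t_test(costs, hypothesized_mean=25000, critical_t=2.228)

print(round(t, 2), reject)
```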

    Example Data

    Selecting Data Analysis

    Select Descriptive Statistics

    Enter Data Range

    Result

    The confidence interval is the mean plus or minus the value Excel reports as the confidence level. If it does not include $30,000, then our hospital has a different cost structure than other hospitals in our database.