Llsk_confidence Interval Printout

  • 8/11/2019 Llsk_confidence Interval Printout

    Confidence Intervals

    Prof. Benjamin HK Yip, Division of Family Medicine, School of Public Health and Primary Care

    Overview

    Outline
    - Background
    - Confidence intervals (CIs)
    - Examples

    Learning Objectives
    - To understand CI construction
    - To be able to name 3 factors that affect CIs
    - To be able to interpret CIs found in the literature

    Motivation

    - Research/clinical question: Do HK young adults have a low BMD (L2-L4)?
    - Example: In this particular class, the sample mean of BMD (L2-L4) is 0.96 g/cm².
    - Questions:
      - How meaningful is this sample mean?
      - Would you trust this estimate?

    Statistical inference

    - Methods for drawing conclusions about a population from sample data:
      - Parameter estimation and confidence intervals
      - Hypothesis testing (p-values)
    - Question: What allows us to make valid inferences about a population based only on a sample?
      - Probability (see previous lecture)
      - A random process, i.e., randomization is the key

    Why should I sample instead of using the entire population?

    The reasons to avoid using the entire population are the following:
    - Cost ($) and time
    - Impractical
    - Inaccurate
      - There is a lot of error to control and monitor
      - Lists are rarely up to date
    - Random sampling

    Terminology

    Population parameter
    - A quantity that describes a population.

    Sample statistic
    - An estimate of the population parameter.

    Statistical inference
    - The process of drawing conclusions about a population based on observations in a sample.

    Framework for statistical inference

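    [Diagram: Population and Sample linked by two arrows. Sampling (random sampling, study design) leads from the population to the sample; inference (statistical estimation, hypothesis testing) leads from the sample back to the population. Population parameters (Greek symbols, e.g., σ²) correspond to the sample statistics x̄, s², r, and p.]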

    Example of population, sample, and parameters

    Arbitrary population:
    - Objective: Smoking => cancer
    - Population: ???
    - The underlying true process, which is universally true for the population

    Definitions

    Confidence interval (CI)
    - A range of values that probably contains the population value.

    Confidence limits
    - The values that mark the boundaries of the confidence interval.

    Construction

    Most CIs have the following form:

    Sample statistic ± (critical value) × (SE of sample statistic)

    The second term, (critical value) × (SE of sample statistic), is the margin of error.

    Construction
    - The sample statistic is a point estimate based on the sampled data (e.g., sample mean, sample proportion).
    - The critical value represents the desired confidence level, based on distribution theory (normal, t, Poisson).
    - The SE of the sample statistic measures the precision of the sample estimate. When the estimate is a mean (central tendency), it is also called the standard error of the mean (SEM). SE differs from SD, but they are related (see later slides).
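
The general form "statistic ± critical value × SE" can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `confidence_interval` is made up here, the normal distribution is assumed for the critical value, and the SE of 0.02 is a hypothetical number chosen only to have something to plug in next to the class mean of 0.96 g/cm².

```python
from statistics import NormalDist


def confidence_interval(estimate, se, level=0.95):
    """Return (lower, upper) as estimate ± critical value × SE.

    Assumes a normal (z) critical value, which is reasonable for large samples.
    """
    alpha = 1 - level
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when level = 0.95
    margin = critical * se  # the margin of error
    return estimate - margin, estimate + margin


# Hypothetical SE of 0.02 for the sample mean BMD of 0.96 g/cm²
lower, upper = confidence_interval(estimate=0.96, se=0.02)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

For a small sample, a Student's t critical value would normally replace the normal one; the rest of the construction stays the same.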

    Critical value

    - Decide which distribution the desired CI is based on.
      - Continuous: Normal or Student's t
      - Count: Poisson
      - Binary: Binomial
    - In general, the normal (z-table) is the default distribution, provided the sample size is large enough (central limit theorem).
    - Decide the type I error rate (α): the probability of incorrectly claiming a significant result (false positive). In general:
      - α = 0.05

    Recall

    $\Pr(-1.96 < z < 1.96) = 0.95$
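
The recalled probability can be checked numerically instead of with a z-table. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative only.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution, N(0, 1)

# Probability mass between -1.96 and 1.96
prob = z.cdf(1.96) - z.cdf(-1.96)

# Critical value that leaves 2.5% in the upper tail (alpha = 0.05, two-sided)
critical = z.inv_cdf(0.975)

print(round(prob, 4), round(critical, 2))  # prints: 0.95 1.96
```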

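    SE and SD

    Quantity             Population   Sample
    Mean                 μ            m or x̄ (unbiased estimator)
    Standard deviation   σ            SD (estimator; SD² is unbiased for σ²)

    $SD = \sqrt{\frac{1}{n-1} \sum_{i} (x_i - \bar{x})^2}$

    $SE = \frac{\sigma}{\sqrt{N}} \approx \frac{SD}{\sqrt{N}}$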
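
To make the relation between SD and SE concrete, here is a small sketch that computes both from raw data. The BMD values are invented for illustration and the helper name `sd_and_se` is hypothetical; `statistics.stdev` uses the n − 1 denominator, matching the SD formula on the slide.

```python
import math
import statistics


def sd_and_se(values):
    """Return (SD, SE of the mean) for a list of observations."""
    sd = statistics.stdev(values)      # sample SD with the n - 1 denominator
    se = sd / math.sqrt(len(values))   # SE = SD / sqrt(N)
    return sd, se


# Made-up BMD measurements (g/cm²), for illustration only
bmd = [0.91, 1.02, 0.88, 0.97, 1.05, 0.93, 0.99, 0.95]
sd, se = sd_and_se(bmd)
print(f"mean = {statistics.mean(bmd):.3f}, SD = {sd:.3f}, SE = {se:.3f}")
```

The SE shrinks as more observations are added, while the SD settles around the spread of the underlying data.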

    SE vs SD
    - The standard deviation (SD) tells you the variability of your data.
    - The standard error of the mean (SEM) tells you how good your estimate of the mean is (its precision). It is in general smaller than the SD, but don't let this be a reason for choosing to report it!
    - Which one to use depends on the context: do you want to describe the variability of the data or the precision of your estimate of the mean?

    Probability and Confidence Interval

    - From the CLT we know that

      $\frac{\bar{x} - \mu}{SE} \sim z \sim N(0, 1)$

    - From an N(0,1) table we have

      $\Pr\left(-1.96 < \frac{\bar{x} - \mu}{SE} < 1.96\right) = 0.95$

    - Rearranging gives

      $\Pr(\bar{x} - 1.96 \times SE < \mu < \bar{x} + 1.96 \times SE) = 0.95$

    - Thus, the interval $\bar{x} \pm 1.96 \times SE$ is a 95% CI for $\mu$

    November 08, 2012, Benjamin Yip
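
The rearranging step can be written out in full. The derivation below is a sketch of the algebra only, assuming SE > 0 and using the same symbols as the slide ($\bar{x}$ for the sample mean, $\mu$ for the population mean):

```latex
\begin{align*}
-1.96 &< \frac{\bar{x} - \mu}{SE} < 1.96 \\
-1.96 \times SE &< \bar{x} - \mu < 1.96 \times SE
  && \text{(multiply through by } SE > 0\text{)} \\
-\bar{x} - 1.96 \times SE &< -\mu < -\bar{x} + 1.96 \times SE
  && \text{(subtract } \bar{x}\text{)} \\
\bar{x} - 1.96 \times SE &< \mu < \bar{x} + 1.96 \times SE
  && \text{(multiply by } -1\text{, which reverses the inequalities)}
\end{align*}
```

Each line describes the same event, so the probability of 0.95 carries through unchanged.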

    *Theory behind CI
    - Constructing a CI is simple; it needs only 3 components: the statistic (e.g., mean), the SE of the statistic, and the desired confidence level (% CI).
    - However, the logic behind it is more complicated. It involves three types of standard deviation (SD): the SD of the population, the SD of the sample, and the SD of the sampling distribution (i.e., the SE).

    - When we calculate the sample mean we are usually interested not in the mean of this particular sample, but in the mean for individuals of this type: in statistical terms, of the population from which the sample comes. We usually collect data in order to …

    [Figure: 95% CIs for the mean diastolic BP from 20 simulated studies, 50 subjects per simulation. A reference line indicates the true mean (75 mmHg). Only 1 CI missed the true mean.]
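
A simulation along the lines of this figure can be sketched as follows. The true mean of 75 mmHg, the 20 studies, and the 50 subjects come from the slide; the population SD of 10 mmHg, the normal model for diastolic BP, the seed, and the function name are assumptions made only for this illustration.

```python
import math
import random
import statistics

Z_95 = 1.96  # normal critical value for a 95% CI


def count_covering_intervals(n_studies, n_subjects, true_mean, true_sd, seed=0):
    """Simulate studies and count how many 95% CIs contain the true mean."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_studies):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n_subjects)]
        mean = statistics.mean(sample)
        se = statistics.stdev(sample) / math.sqrt(n_subjects)
        if mean - Z_95 * se < true_mean < mean + Z_95 * se:
            covered += 1
    return covered


# Assumed population SD of 10 mmHg (not given on the slide)
hits = count_covering_intervals(n_studies=20, n_subjects=50, true_mean=75, true_sd=10)
print(f"{hits} of 20 intervals contain the true mean")
```

With only 20 studies the count varies from run to run; over many simulated studies the proportion of intervals that contain the true mean settles close to 95%.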

    An example
    - Suppose that you would like to compare the effects of a newly developed drug (drug A) and a current drug (drug B) on systolic blood pressure (SBP).
    - Let's say 35 patients were randomly assigned to receive drug A and another 35 were assigned to drug B. The mean SBP among drug A and drug B patients was 107 mmHg (SD = 19) and 125 mmHg (SD = 20), respectively.
    - Construct a 95% CI for each group. Do the CIs overlap, and what is the interpretation?

    95% CI for Drug A

    $95\%\ CI = \text{mean} \pm z_{1-\alpha/2} \times SE = \text{mean} \pm z_{1-\alpha/2} \times \frac{SD}{\sqrt{N}}$

    Mean =
    $z_{1-\alpha/2}$ =
    SE = SD/sqrt(N) =
    95% CI =
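
The worksheet can be filled in with a short calculation. The sketch below applies the slide's formula, mean ± z × SD/√N, to the numbers from the example (drug A: mean 107, SD 19; drug B: mean 125, SD 20; N = 35 each). The function name `mean_ci` is illustrative, and the normal critical value is used as on the slide; a t critical value would give slightly wider intervals for N = 35.

```python
import math
from statistics import NormalDist


def mean_ci(mean, sd, n, level=0.95):
    """Return (lower, upper) for a mean using a normal critical value."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = sd / math.sqrt(n)  # SE = SD / sqrt(N)
    return mean - z * se, mean + z * se


ci_a = mean_ci(mean=107, sd=19, n=35)  # roughly (100.7, 113.3)
ci_b = mean_ci(mean=125, sd=20, n=35)  # roughly (118.4, 131.6)

# Two intervals overlap when each one starts before the other ends
overlap = ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

print(f"Drug A: ({ci_a[0]:.1f}, {ci_a[1]:.1f})")
print(f"Drug B: ({ci_b[0]:.1f}, {ci_b[1]:.1f})")
print("CIs overlap:", overlap)
```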

    Graph the CIs

    [Figure: 95% CIs for Drug A and Drug B drawn against a common axis with ticks at 100, 110, 120, 130, and 140.]

    The non-overlapping CIs indicate a true (i.e., significant) mean difference: drug A is more effective at lowering SBP than drug B.

    In general:

    Source: http://www.measuringusability.com/blog/ci-10things.php

    Factors that affect the width of a CI are:

    - Targeted confidence level, 1 − α (higher level, wider CI)
    - Sample size, N (larger sample size, narrower CI)
    - Variability or standard deviation, σ (or SD) (higher SD, wider CI)

    $\text{mean} \pm z_{1-\alpha/2} \times \frac{SD}{\sqrt{N}}$
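
The effect of each factor can be seen by varying one input at a time. This is a minimal sketch: `ci_width` is a hypothetical helper that returns the full width, 2 × z × SD/√N, and the baseline values (SD = 19, N = 35) are borrowed from the drug A example purely for illustration.

```python
import math
from statistics import NormalDist


def ci_width(sd, n, level=0.95):
    """Full width of a normal-based CI for a mean: 2 × z × SD / sqrt(N)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return 2 * z * sd / math.sqrt(n)


baseline = ci_width(sd=19, n=35, level=0.95)
print(f"baseline (95%, N=35, SD=19): {baseline:.2f}")
print(f"higher level (99%):          {ci_width(sd=19, n=35, level=0.99):.2f}")
print(f"larger sample (N=140):       {ci_width(sd=19, n=140):.2f}")
print(f"more variability (SD=38):    {ci_width(sd=38, n=35):.2f}")
```

Because N enters through a square root, quadrupling the sample size halves the width, whereas doubling the SD doubles it.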

    Factors that affect the width of a CI: targeted confidence level, 1 − α
    - Intuition: a higher confidence level without improving data quality means a larger margin of error.
    - As the targeted confidence level increases, the CI width increases, provided all other quantities remain unchanged.

    Factors that affect the width of a CI: sample size, N
    - Intuition: a larger sample size means more information, which implies better inference.
    - As the sample size increases, the CI width decreases, provided all other quantities remain unchanged.

    Factors that affect the width of a CI: variability, SD
    - Intuition: more variability, or a larger spread, makes it more difficult to estimate the population value without large amounts of data.
    - As the variability increases, the CI width increases, provided all other quantities remain unchanged.

    5 things to know about CI

    1. A CI tells you the most likely range of the unknown population parameter (e.g., mean, proportion).
    2. A CI provides both the location and the precision of a measure.
    3. Three things influence the width of a CI (α, N, SD).
    4. A CI estimated from sample data may or may not contain the population average.
    5. Overlap in CIs is a quick way to check for statistical significance. However, the term significance is more closely related to hypothesis testing.
