Statistics. A Word on Statistics - Wislawa Szymborska Out of every hundred people, those who always...

Statistics


A Word on Statistics - Wislawa Szymborska

Out of every hundred people,
those who always know better:
fifty-two.

Unsure of every step:
almost all the rest.

Ready to help,
if it doesn't take long:
forty-nine.

Always good,
because they cannot be otherwise:
four -- well, maybe five.

Able to admire without envy:
eighteen.

Led to error
by youth (which passes):
sixty, plus or minus.

Those not to be messed with:
four-and-forty.

Living in constant fear
of someone or something:
seventy-seven.

Capable of happiness:
twenty-some-odd at most.

Harmless alone,
turning savage in crowds:
more than half, for sure.


Cruel
when forced by circumstances:
it's better not to know,
not even approximately.

Wise in hindsight:
not many more
than wise in foresight.

Getting nothing out of life except things:
thirty
(though I would like to be wrong).

Balled up in pain
and without a flashlight in the dark:
eighty-three, sooner or later.

Those who are just:
quite a few, thirty-five.

But if it takes effort to understand:
three.

Worthy of empathy:
ninety-nine.

Mortal:
one hundred out of one hundred --
a figure that has never varied yet.


Today

• Introduction to statistics

• Looking at our qualitative data in a quantitative way

• More exploration of the data

• Presentations

• Tutorials


Why statistics are important

Statistics are concerned with difference: how much does one feature of an environment differ from another?

Example: suicide rates per 100,000 people


Why statistics are important

Relationships: how much does one feature of the environment change as another measure changes?

Example: the response of the fear centre of white people to black faces, depending on their exposure to diversity as adolescents


The two tasks of statistics

Magnitude: What is the size of the difference or the strength of the relationship?

Reliability: To what degree can the measures of magnitude be replicated with other samples drawn from the same population?


Magnitude – what’s our measure?

Suicide rates/100,000 people

• Raw number? Rate?

• Some aggregate of numbers? Mean, median, mode?


Arithmetic mean or average

The mean ($M$ or $\bar{X}$) is the sum ($\sum X$) of all the sample values ($X_1 + X_2 + X_3 + \dots + X_N$) divided by the sample size ($N$):

$$M = \frac{\sum X}{N}$$

Carbon footprint scores (N = 45, in ascending order):

63 64 66 67 68 70 70 71 71
72 73 73 74 74 74 75 75 75
75 76 76 77 77 78 79 79 79
79 80 80 80 81 81 83 84 84
84 84 85 85 85 86 89 90 92


Compute the mean

             Total   Polynesian   Other
Total (ΣX)   3483    971          2512
N            45      13           32
Mean         77.4    74.7         78.5
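As a check on the table, here is a minimal Python sketch that recomputes the three means. It assumes the 45 carbon footprint scores are the ones listed on the previous slide and that the Polynesian group is the 13 values listed in the deck's variance table; the names `scores`, `polynesian` and `mean` are illustrative only.

```python
# Carbon footprint scores from the slides (N = 45), in ascending order.
scores = [63, 64, 66, 67, 68, 70, 70, 71, 71,
          72, 73, 73, 74, 74, 74, 75, 75, 75,
          75, 76, 76, 77, 77, 78, 79, 79, 79,
          79, 80, 80, 80, 81, 81, 83, 84, 84,
          84, 84, 85, 85, 85, 86, 89, 90, 92]

# Assumption: the Polynesian subset is the 13 values in the variance table.
polynesian = [64, 66, 67, 70, 71, 74, 75, 77, 79, 80, 81, 81, 86]


def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


total_mean = mean(scores)
polynesian_mean = mean(polynesian)

# The "Other" group is everything outside the Polynesian subset,
# so its total and N follow by subtraction.
other_total = sum(scores) - sum(polynesian)
other_n = len(scores) - len(polynesian)
other_mean = other_total / other_n

print(round(total_mean, 1), round(polynesian_mean, 1), round(other_mean, 1))
# prints: 77.4 74.7 78.5
```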


The median

• The median is the "middle" value of the sample. There are as many sample values above the sample median as below it.

• If the number (N) in the sample is odd, then the median is the value at the (N-1)/2+1 position when the sample is ordered from smallest to largest value. E.g. if N=45, the median is the value at the (45-1)/2+1 = 23rd position.

• If the sample size is even, then the median is defined as the average of the values at the N/2 and N/2+1 positions. E.g. if N=32, the median is the average of the values at the 32/2 (16th) and 32/2+1 (17th) positions.

Why use the median?
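The position rules for odd and even N can be sketched in a few lines of Python. This is only an illustration of the rule as stated on the slide; the function name `median` and the example list (the 13 Polynesian scores from the variance table later in the deck) are chosen here for demonstration.

```python
def median(values):
    """Median by the position rule: middle value for odd N,
    average of the two middle values for even N."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd N: 1-based position (N - 1) / 2 + 1.
        position = (n - 1) // 2 + 1
        return ordered[position - 1]  # convert to 0-based index
    # Even N: average of the values at 1-based positions N/2 and N/2 + 1.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


# Assumption: the 13 Polynesian scores from the variance table.
polynesian = [64, 66, 67, 70, 71, 74, 75, 77, 79, 80, 81, 81, 86]
polynesian_median = median(polynesian)  # 7th of 13 ordered values

even_example = median([1, 2, 3, 4])  # average of the 2nd and 3rd values
```

With N = 13 the rule picks the 7th ordered value, which is 75 and agrees with the Polynesian median reported in the deck.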


Other measures of central tendency

• The mode is the single most frequently occurring data value. If two or more values occur equally often, then the data set is called bi-modal, tri-modal, etc.

• The midrange is the midpoint of the sample - the average of the smallest and largest data values in the sample.

• The geometric mean (log transformation) and the harmonic mean (inverse transformation) – both used where data is skewed with the aim of creating a more even distribution
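These measures can be tried out with Python's standard `statistics` module. The sketch below is illustrative only: it assumes the 13 Polynesian scores from the variance table as input and Python 3.8 or later (for `multimode` and `geometric_mean`).

```python
import statistics

# Assumption: the 13 Polynesian scores from the variance table.
polynesian = [64, 66, 67, 70, 71, 74, 75, 77, 79, 80, 81, 81, 86]

# Mode(s): multimode returns every value tied for most frequent,
# so a bi-modal sample would give a list of two values.
modes = statistics.multimode(polynesian)

# Midrange: average of the smallest and largest values.
midrange = (min(polynesian) + max(polynesian)) / 2

# Geometric mean (based on logs) and harmonic mean (based on reciprocals).
geometric = statistics.geometric_mean(polynesian)
harmonic = statistics.harmonic_mean(polynesian)
arithmetic = statistics.mean(polynesian)

print(modes, midrange)
print(round(harmonic, 2), round(geometric, 2), round(arithmetic, 2))
```

For positive data the harmonic mean is never larger than the geometric mean, which in turn is never larger than the arithmetic mean, so the three values give a quick sense of how much the transformations pull the average down.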


Compute the median and mode

Carbon footprint scores (N = 45, in ascending order):

63 64 66 67 68 70 70 71 71
72 73 73 74 74 74 75 75 75
75 76 76 77 77 78 79 79 79
79 80 80 80 81 81 83 84 84
84 84 85 85 85 86 89 90 92


Mean, median, mode, mid-range

           Total        Polynesian   Other
Total      3483         971          2512
N          45           13           32
Mean       77.4         74.7         78.5
Median     77           75           78.5
Mode       75, 79, 84   81           84
Midrange   77.5         75           77.5


The underlying distribution of the data


Normal distribution


Three things we must know before we can say events are different

1. the difference in mean scores of two or more events

- the bigger the gap between means the greater the difference

2. the degree of variability in the data

- the less variability the better


Variance and Standard Deviation

These are estimates of the spread of data. They are calculated by measuring the distance between each data point and the mean.

The variance ($s^2$) is the average of the squared deviations of each sample value from the mean:

$$s^2 = \frac{\sum (X - M)^2}{N - 1}$$

The standard deviation ($s$) is the square root of the variance.


Calculating the variance and the standard deviation for the Polynesian sample

x       (x - Mx)   (x - Mx)^2
64      -10.7      114.3
66      -8.7       75.6
67      -7.7       59.2
70      -4.7       22.0
71      -3.7       13.6
74      -0.7       0.5
75      0.3        0.1
77      2.3        5.3
79      4.3        18.6
80      5.3        28.2
81      6.3        39.8
81      6.3        39.8
86      11.3       127.9
Total   971                   544.8

Mean (Mx) = 74.7
Nx = 13
Variance (sx^2) = 544.8/13 = 41.9
Standard deviation (sx) = 6.5

Note that here the sum of squared deviations is divided by Nx = 13. Dividing by Nx - 1 = 12 gives a variance of 45.4.
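The worked table can be reproduced with a short Python sketch. It computes the sum of squared deviations once and then shows both divisors, N and N - 1, since the table's figures (41.9 and 6.5) correspond to dividing by N; the variable names are illustrative.

```python
import math

# The 13 Polynesian scores from the table.
polynesian = [64, 66, 67, 70, 71, 74, 75, 77, 79, 80, 81, 81, 86]

n = len(polynesian)
mean_x = sum(polynesian) / n

# Sum of squared deviations from the mean (last column of the table).
sum_of_squares = sum((x - mean_x) ** 2 for x in polynesian)

# Dividing by N gives the figures shown in the table.
variance_n = sum_of_squares / n
sd_n = math.sqrt(variance_n)

# Dividing by N - 1 gives the sample variance.
variance_n_minus_1 = sum_of_squares / (n - 1)
sd_n_minus_1 = math.sqrt(variance_n_minus_1)

print(round(sum_of_squares, 1), round(variance_n, 1), round(sd_n, 1))
print(round(variance_n_minus_1, 1), round(sd_n_minus_1, 1))
```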


All normal distributions have similar properties. The percentage of the scores that lies between one standard deviation (s) below the mean and one standard deviation above it is always about 68.27%.
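The share of a normal distribution within a given number of standard deviations of the mean can be checked with the error function from Python's `math` module. The helper name `fraction_within` is an illustrative choice, not something from the slides.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution lying within k standard
    deviations either side of the mean."""
    return math.erf(k / math.sqrt(2))


within_one_sd = fraction_within(1.0)    # roughly 0.683
within_1_96_sd = fraction_within(1.96)  # roughly 0.95

print(round(within_one_sd * 100, 2), round(within_1_96_sd * 100, 1))
```

The second value is why 1.96 standard deviations is used later in the deck for the 95% confidence interval.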


Is there a difference between Polynesian and “other” scores?


Is there a significant difference between Polynesian and “other” scores?


Three things we must know before we can say events are different

3. The extent to which the sample is representative of the population from which it is drawn

- the bigger the sample the greater the likelihood that it represents the population from which it is drawn

- small samples have unstable means. Big samples have stable means.


Estimating difference

The measure of stability of the mean is the Standard Error of the Mean (SEM): the standard deviation divided by the square root of the number in the sample.

$$\mathrm{SEM} = \frac{s}{\sqrt{N}}$$

So the stability of the mean is determined by the variability in the sample (this can be affected by the consistency of measurement) and the size of the sample.

The standard error of the mean is the standard deviation of the distribution of sample means we would obtain if we were to measure the mean again and again.
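A minimal sketch of the standard error calculation, using the Polynesian figures quoted in the deck (SD = 6.5, N = 13). The function name `standard_error` is illustrative, and the second call uses a hypothetical sample four times as large to show the effect of sample size.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: standard deviation / sqrt(sample size)."""
    return sd / math.sqrt(n)


sem_polynesian = standard_error(6.5, 13)

# Hypothetical: the same variability in a sample four times as large.
sem_larger_sample = standard_error(6.5, 52)

print(round(sem_polynesian, 2), round(sem_larger_sample, 2))
```

Quadrupling the sample size halves the standard error, which is the sense in which big samples have more stable means.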


Yes, it’s significant. The mean of the smaller sample is not too variable. Its Standard Error of the Mean = 6.5/√13 = 1.80. The 95% confidence interval extends 1.96 standard errors (3.52 units) either side of the mean. This gives a range from 71.2 to 78.2. The “Other” mean (78.5) falls just outside this confidence interval.

[Figure: Distribution of the standard error of the mean for the Polynesian sample (Mean = 74.7, SD = 6.5, N = 13)]
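The confidence interval can be recomputed as follows. This sketch uses the rounded figures quoted in the deck (mean 74.7, SD 6.5, N 13, Other mean 78.5), so the half-width comes out at about 3.53 rather than the 3.52 obtained from unrounded inputs; the interval limits agree to one decimal place.

```python
import math

# Figures quoted in the deck for the Polynesian sample.
polynesian_mean = 74.7
polynesian_sd = 6.5
polynesian_n = 13
other_mean = 78.5

sem = polynesian_sd / math.sqrt(polynesian_n)

# 95% of a normal distribution lies within 1.96 standard errors of the mean.
half_width = 1.96 * sem
lower = polynesian_mean - half_width
upper = polynesian_mean + half_width

# The decision rule from the slide: is the Other mean outside the interval?
other_is_outside = not (lower <= other_mean <= upper)

print(round(lower, 1), round(upper, 1), other_is_outside)
# prints: 71.2 78.2 True
```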


Is the difference between means significant?

What is clear is that the mean of the Other group is just outside the area where there is a 95% chance that the mean for the Polynesian group will fall, so it is likely that the Other mean comes from a different population than the Polynesian mean.

The convention is to say that if mean 2 falls outside the area (the confidence interval) where 95% of mean 1 scores are estimated to be, then mean 2 is significantly different from mean 1. We say the probability of mean 1 and mean 2 being the same is less than 0.05 (p<0.05) and the difference is significant.


The significance of significance

• Not an opinion

• A sign that very specific criteria have been met

• A standardised way of saying that:

- There is a difference between two groups: p<0.05;

- There is no difference between two groups: p>0.05;

- There is a predictable relationship between two groups: p<0.05; or

- There is no predictable relationship between two groups: p>0.05.

• A way of getting around the problem of variability


One and two tailed tests

If you argue for a one tailed test, saying the difference can only be in one direction, then you can add the 2.5% error from the side where no data is expected to the side where it is.

[Figure: M1 distribution. 2-tailed test: 95% of the distribution lies between -1.96 and +1.96 standard deviations, with 2.5% in each tail. 1-tailed test: the whole 5% lies in one tail.]


If we were to argue for a one tailed test (that Polynesian people were more eco-sustainable than the Others), the 95% confidence region can extend all the way to the left of the SEM distribution, leaving the whole 5% in the right tail, rather than being centred with 2.5% on either side. This means that instead of going to the 47.5% line on the right we go to the 45% line = 1.65 standard errors, or 3.0 units.

[Figure: Normal distribution of the standard error of the mean for the Polynesian sample (Mean = 74.7, SD = 6.5, N = 13)]
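The cut-offs for one and two tailed tests can be looked up with `statistics.NormalDist` (Python 3.8 or later). The sketch below is illustrative: it finds the point on the standard normal curve that leaves 2.5% or 5% in the upper tail, then converts the one tailed cut-off into score units using the Polynesian SD and N quoted in the deck.

```python
import math
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Two tailed test at p < 0.05: 2.5% in each tail, so 97.5% lies below the cut-off.
z_two_tailed = standard_normal.inv_cdf(0.975)

# One tailed test at p < 0.05: all 5% in one tail, so 95% lies below the cut-off.
z_one_tailed = standard_normal.inv_cdf(0.95)

# Convert the one tailed cut-off into score units for the Polynesian sample.
sem = 6.5 / math.sqrt(13)
one_tailed_margin = z_one_tailed * sem

print(round(z_two_tailed, 2), round(z_one_tailed, 3), round(one_tailed_margin, 1))
```

The one tailed cut-off is about 1.645, which is commonly rounded to 1.65, and the margin comes to about 3.0 units.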


T-tests

$$t = \frac{M_x - M_y}{\sqrt{S_x^2/N_x + S_y^2/N_y}}$$

where $t$ is the value generated and:

$M_x$ = the mean carbon footprint of participants with higher incomes

$M_y$ = the mean carbon footprint of participants with moderate to low incomes

$S_x^2$ = the variance of the carbon footprint of participants with higher incomes

$S_y^2$ = the variance of the carbon footprint of participants with moderate to low incomes

$N_x$ = the number of participants with higher incomes

$N_y$ = the number of participants with moderate to low incomes


T-test result. This does exactly what we have done except it argues that in every sample the first data point is fixed and that other data points are free to vary in relation to it. Consequently, when estimating variance we should divide by (N-1) not N. That makes this test more conservative.

t-Test: Two-Sample Assuming Unequal Variances

                               Polynesian   Other
Mean                           74.69        78.50
Variance                       45.397       44.26
Observations                   13           32
Hypothesized Mean Difference   0
Degrees of freedom (df)        43
t Stat                         -1.73
p(T<=t) one-tail               0.045
t Critical one-tail            + or - 1.68
p(T<=t) two-tail               0.090
t Critical two-tail            + or - 2.02
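The t statistic in this output can be recomputed from the means, variances and sample sizes. The sketch assumes the t-test formula takes the difference between the means and divides it by the square root of the summed variance-over-N terms; it does not attempt to reproduce the degrees of freedom or the p values, and the function name `t_statistic` is illustrative.

```python
import math


def t_statistic(mean_x, mean_y, var_x, var_y, n_x, n_y):
    """Difference between two means divided by the standard error
    of that difference (unequal-variances form)."""
    standard_error = math.sqrt(var_x / n_x + var_y / n_y)
    return (mean_x - mean_y) / standard_error


# Figures taken from the Polynesian / Other output table.
t = t_statistic(74.69, 78.50, 45.397, 44.26, 13, 32)

# Decision against the critical values quoted in the table.
significant_one_tailed = abs(t) > 1.68
significant_two_tailed = abs(t) > 2.02

print(round(t, 2), significant_one_tailed, significant_two_tailed)
# prints: -1.73 True False
```

This matches the table: the difference is significant on a one tailed test (p = 0.045) but not on a two tailed test (p = 0.090).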


Impact of gender on safety

t-Test: Two-Sample Assuming Unequal Variances

                               Women   Men
Mean                           2.05    1.58
Variance                       0.95    0.99
Observations                   21      12
Hypothesized Mean Difference   0.00
df                             23
t Stat                         1.30
P(T<=t) one-tail               0.10
t Critical one-tail            1.71
P(T<=t) two-tail               0.21
t Critical two-tail            2.07


Impact of religion on safety

t-Test: Two-Sample Assuming Unequal Variances

                               No religion   Religion
Mean                           1.81          2.00
Variance                       1.23          0.86
Observations                   16            15
Hypothesized Mean Difference   0.00
df                             29
t Stat                         -0.51
P(T<=t) one-tail               0.31
t Critical one-tail            1.70
P(T<=t) two-tail               0.61
t Critical two-tail            2.05


Impact of work on safety

t-Test: Two-Sample Assuming Unequal Variances

                               Work in MPHS   Not working in MPHS
Mean                           2.43           1.47
Variance                       0.26           1.15
Observations                   14             19
Hypothesized Mean Difference   0.00
df                             27
t Stat                         3.39
P(T<=t) one-tail               0.00
t Critical one-tail            1.70
P(T<=t) two-tail               0.00
t Critical two-tail            2.05


Correlations and Chi-square


The correlation with the glacier went unnoticed.
The debate proceeded and receded with slow heated monotonous cold regularity
although never reversing at the same point of disagreement.

The correlation with the glacier went. . .
The weight of paper and opinion
now far-exceeding the frozen mountain, even at its zenith.
But no amount of FSC vellum could paper over the crevasse cracked argument.

The correlation with the glacier . . . .
The blue-green water vein bled
But no aerial artery replenished the source.
The constant melt etching the message
of increased bloodletting from the waning carcase


The correlation with the . . . . .
Lost in the science of the unknown.
The pre-historic signpost, scarred by graffiti,
slowly shrank and collapsed
Its incremental deficit matched by political will.

The correlation . . . . . .
We are, we were, the new dinosaurs,
like the sun-burnt beached berg
doomed for demise in the new non-ice age.
No-one will record its disappearance or ours.

The correlation with humanity went unnoticed.

Correlation by John S http://allpoetry.com/poem/9257026-Correlation-by-JohnS


Chi-square test - comparing MPHS samples with the local populations

• Looks at the magnitude or size of the difference between observed and expected values ($O-E$) and then squares those differences so they are all positive: $(O-E)^2$.

• Adjusts those differences so they are relative to the size of the expected values: $(O-E)^2/E$. This is a variance measure and takes care of effects that are due to the size of the expected value, which in turn is related to the sample size.

• Calculates a chi-square value, which is the sum of the adjusted differences ($\sum (O-E)^2/E = 14.03$). This is compared with the value that chi-square would have to reach to be significant for the number of categories used (n).

• The question: Is the MPHS sample representative of the cultural mix of the MPHS population?


What would we predict?

In red are the number of participants we would predict (we EXPECT) based on the percent in each category in the MPHS population (2006). In blue is what we got (we OBSERVED). Is the match sufficiently close?

Age     MPHS sample (n)   MPHS sample (%)   MPHS population (%)
18-30   9                 27%               48%
31-40   13                39%               22%
41-50   9                 27%               18%
>50     2                 6%                13%
Total   33                100%


Does the MPHS sample match the population age distribution?

Age     O    E    O-E     (O-E)^2   (O-E)^2/E
18-30   9    16   -6.69   44.72     2.85
31-40   13   7    5.82    33.89     4.72
41-50   9    6    3.09    9.54      1.61
>50     2    4    -2.22   4.94      1.17

E is shown rounded to the nearest whole number; the other columns use the unrounded expected values.

chi-square = 10.35

Degrees of freedom = N-1 = 3, where N = the number of categories, not the number of participants.

Value of chi-square (χ2) for p<0.05 = 7.81

Actual χ2 is more than 7.81, therefore there is a significant difference between the MPHS sample and the MPHS population.
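The chi-square steps can be sketched in Python for the age table. One assumption is needed: the table shows the expected counts rounded to whole numbers, so the unrounded values used here are recovered from the O-E column (E = O minus the listed difference). The critical value 7.81 is the one quoted on the slide.

```python
# Observed counts for the four age bands (18-30, 31-40, 41-50, >50).
observed = [9, 13, 9, 2]

# Assumption: unrounded expected counts recovered as O - (O-E) from the table.
expected = [15.69, 7.18, 5.91, 4.22]

CRITICAL_VALUE = 7.81  # chi-square for p < 0.05 with 3 degrees of freedom


def chi_square(observed_counts, expected_counts):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e
               for o, e in zip(observed_counts, expected_counts))


chi2 = chi_square(observed, expected)
degrees_of_freedom = len(observed) - 1  # number of categories minus one
significant = chi2 > CRITICAL_VALUE

print(round(chi2, 2), degrees_of_freedom, significant)
# prints: 10.35 3 True
```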


Does the MPHS sample match the population distribution of number of children?

Children               O    E    O-E     (O-E)^2   (O-E)^2/E
No Children            6    10   -3.79   14.33     1.46
One Child              10   5    5.00    25.02     5.01
Two Children           7    7    0.08    0.01      0.00
Three Children         6    5    1.46    2.12      0.47
Four Children          1    2    -1.48   2.19      0.88
Five Children          1    1    -0.08   0.01      0.01
Six or More Children   0    1    -1.19   1.41      1.19

chi-square = 9.02

Degrees of freedom = N-1 = 6, where N = the number of categories, not the number of participants.

Value of chi-square (χ2) for p<0.05 = 12.59

Actual χ2 is less than 12.59, therefore there is no significant difference between the MPHS sample and the MPHS population.


r = 0.904, N = 33, p < 0.001


Correlations

$$r = \frac{\sum (X - M_X)(Y - M_Y)}{N \, S_X \, S_Y}$$

X = GDP purchasing power in $'000s

Y = Better Life Index (0-10)

MX = mean of X = 25.2 ($25,200)

MY = mean of Y = 6.34

SX = standard deviation of X = 7.02

SY = standard deviation of Y = 1.44

r = correlation coefficient = +0.90
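A minimal sketch of the correlation formula in Python. Because the formula divides by N, the standard deviations here are also computed with N in the denominator. The GDP and Better Life Index values below are hypothetical numbers made up for illustration, not the data behind the slide.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient: sum of cross-products of deviations,
    divided by N times the two standard deviations (each computed with N)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross_products / (n * sd_x * sd_y)


# Hypothetical data: GDP purchasing power ($'000s) and a 0-10 index.
gdp = [10, 15, 20, 25, 30, 35]
life_index = [4.0, 5.5, 5.0, 6.5, 7.0, 8.0]
r = pearson_r(gdp, life_index)

# A perfect curved relationship (y = x squared) has no linear correlation.
r_curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(round(r, 2), round(r_curved, 2))
```

The second call shows that r only measures linear association: the y values are fully determined by x, yet the coefficient is zero.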


One or two tails? Have we made a prior prediction? Yes, that life satisfaction will increase with wealth, so a 1 tailed test.

What degrees of freedom? For a correlation, df = N-2 = 33-2 = 31.

What level of significance should be chosen? It depends on the number of correlations. Here p<0.05 is appropriate, as there is only one correlation. Often there are hundreds, in which case a tougher criterion should be chosen.

Where can we find the critical values of r? In a table of critical values of the correlation coefficient.

Correlation between felt safety and people in the household

                            Children   Adults   Total people in household   Total safety
Children                    1.00
Adults                      0.18       1.00
Total people in household   0.70       0.83     1.00
Total safety                -0.09      -0.24    -0.22                       1.00

Critical value for p<0.05, df=30: r=0.349
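Reading the matrix against the critical value can be sketched as a simple comparison. The code assumes that a correlation counts as significant when its absolute size reaches the critical value of 0.349 quoted on the slide, regardless of sign; the dictionary keys are illustrative labels.

```python
# Correlations copied from the matrix (each pair listed once).
correlations = {
    ("children", "adults"): 0.18,
    ("children", "total people in household"): 0.70,
    ("adults", "total people in household"): 0.83,
    ("children", "total safety"): -0.09,
    ("adults", "total safety"): -0.24,
    ("total people in household", "total safety"): -0.22,
}

CRITICAL_R = 0.349  # value quoted on the slide for p < 0.05, df = 30


def is_significant(r, critical=CRITICAL_R):
    """Assumption: compare the absolute correlation with the critical value."""
    return abs(r) >= critical


significant_pairs = [pair for pair, r in correlations.items()
                     if is_significant(r)]

for pair in significant_pairs:
    print(pair, correlations[pair])
```

Only the two correlations with household size pass, which is expected because children and adults are both counted in the household total. None of the correlations with total safety reaches the critical value.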


• http://www.medcalc.org/manual/chi-square-table.php


Correlation and regression

• Correlation quantifies the degree to which two random variables are related. Correlation does not fit a line through the data points. You are simply computing a correlation coefficient (r) that tells you how much one variable tends to change when the other one does.

• Linear regression finds the best line that predicts the size of one variable (the dependent variable) from another variable which is treated as fixed (the independent variable). The coefficient of determination (r2) tells how much of the variability of the dependent variable is accounted for by the independent variable.
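To make the contrast concrete, here is a minimal least-squares sketch that fits a line and reports r squared. The data points are hypothetical, chosen only so the arithmetic is easy to follow, and the function name `fit_line` is illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x,
    plus the proportion of variance in y accounted for by x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    r_squared = sum_xy ** 2 / (sum_xx * sum_yy)
    return slope, intercept, r_squared


# Hypothetical data: x is the predictor, y is the variable being predicted.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

slope, intercept, r_squared = fit_line(x_values, y_values)
predicted_at_6 = intercept + slope * 6

print(round(slope, 2), round(intercept, 2), round(r_squared, 2))
# prints: 0.6 2.2 0.6
```

Here 60% of the variability in y is accounted for by x, and the fitted line can be used to predict y for a new x, which correlation alone does not provide.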


Correlations


A perfect relationship, but not a linear correlation

[Figure: scatterplot of y against x]


A powerful relationship, but not a correlation: what’s happening here?


Normality of the data and Homoscedasticity


r = 0.904, N = 33, p < 0.001


How correlation is used and misused


Tests of significance

• Tests of difference – t-tests, analysis of variance, chi-square, odds ratios

• Tests of relationship – correlation, regression analysis

• Tests of difference and relationship – analysis of covariance, multiple regression analysis.


Inferential statistics


How safe do MPHS people feel?

1. Feeling safe in their own home: yes=1, no=0

2. Feeling safe in their local part of MPHS: yes=1, no=0

3. Feeling safe in MPHS generally: yes=1, no=0

Total safety score = sum of items 1 to 3. Range = 0 to 3.

If people don’t refer to item 1, score it as 1.

If people score 0 on item 2, they must score 0 on item 3.
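The scoring rules can be written as a small function. This is a sketch of one reading of the rules: an item that was not mentioned is passed as `None`, and the function and parameter names are illustrative.

```python
def total_safety_score(home, local, general):
    """Total safety score (0 to 3) from three yes=1 / no=0 items.

    home may be None when the participant did not refer to feeling
    safe at home; that case is scored as 1.
    """
    if home is None:
        home = 1
    # Feeling unsafe locally implies feeling unsafe in MPHS generally.
    if local == 0:
        general = 0
    return home + local + general


# Home not mentioned, safe locally and generally.
score_not_mentioned = total_safety_score(None, 1, 1)

# Unsafe locally: the general item is forced to 0 even if coded 1.
score_unsafe_locally = total_safety_score(1, 0, 1)

print(score_not_mentioned, score_unsafe_locally)
# prints: 3 1
```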
