SADC Course in Statistics Processing single and multiple variables Module I3 Sessions 6 and 7.

Page 1

SADC Course in Statistics

Processing single and multiple variables

Module I3 Sessions 6 and 7

Page 2

Learning objectives

• Students should be able to:

• Provide and interpret the appropriate summary statistics for practical examples of quantitative data

• Relate the general ideas of statistics, in relation to variability, to the measures of variability

• Recognise the role of statistics in “taming” variability

• Construct and interpret a simple analysis of variance (ANOVA) table

• Explain why both the standard deviation and the variance are used to summarise variation

Page 3

Contents

• Activity 1: This presentation

• Activity 2: Practical 1 - Review
  • A further quick check that you are comfortable with the summary statistics

• Activity 3: Practical 2
  • Apply the summaries to real data
  • And see what happens when there are outliers, etc.

• Activity 4: Practical 3
  • Processing multiple variables
  • To see whether variation can be explained
  • Which introduces the “Analysis of Variance” (ANOVA) as a descriptive tool

• Activity 5: Review of the ideas

Page 4

Why variation is SO important

• From D. S. Moore, in Statistics: A Guide to the Unknown, 4th Edition:

• “Variation is everywhere. Individuals vary. Repeated measurements on the same individual vary. The science of statistics provides tools for dealing with variation.”

• These are the tools we examine here

Page 5

CAST and summary statistics

We continue to use CAST in these sessions

Page 6

The aim of Practical 1

• Understanding simple formulae remains important

• See the examples in this session, e.g. the standard deviation (stdev), mean deviation (mdev) and coefficient of variation (cv)

• Can you calculate them in Excel, both using built-in functions and from first principles?
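
As a rough illustration of what "from first principles" means here, the sketch below computes the same three summaries step by step in Python rather than Excel. The function name `summary_stats` and the data values are hypothetical, chosen only for illustration. The sketch assumes the sample (n - 1) form of the standard deviation, as in Excel's STDEV, and the mean deviation divided by n, as in Excel's AVEDEV.

```python
import math


def summary_stats(values):
    """Compute basic measures of variation from first principles.

    Hypothetical helper for illustration; not part of the course material.
    """
    n = len(values)
    mean = sum(values) / n

    # Sum of squared deviations from the mean (what Excel's DEVSQ returns).
    devsq = sum((x - mean) ** 2 for x in values)

    # Sample variance and standard deviation use n - 1 (as Excel's VAR and STDEV do).
    variance = devsq / (n - 1)
    stdev = math.sqrt(variance)

    # Mean deviation: average absolute deviation from the mean (as Excel's AVEDEV does).
    mdev = sum(abs(x - mean) for x in values) / n

    # Coefficient of variation, expressed as a percentage of the mean.
    cv = 100 * stdev / mean

    return {"mean": mean, "devsq": devsq, "variance": variance,
            "stdev": stdev, "mdev": mdev, "cv": cv}


# Hypothetical data, not taken from the statistics glossary.
data = [2, 4, 4, 5, 7, 8]
stats = summary_stats(data)
for name, value in stats.items():
    print(f"{name:>8}: {value:.3f}")
```

Comparing each printed value with the corresponding built-in function in a spreadsheet is one way to check that the formula has been understood.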

Page 7

Practical 1 – using built-in functions

Example using data from the statistics glossary

These terms all use Excel’s built-in functions, as we show on the next slide

Page 8

Practical 1 - review

The statistics as Excel functions

The terms should be (or become) familiar

Page 9

Practical 1 - continued

[Figure: the same calculations using Excel functions, and from first principles]

Page 10

DFID and climate – again!

Reducing the vulnerability of the poor to current climate variability is the starting point for adaptation to climate change.

Climatic variability is a fundamental driver of poverty in poor countries. The climate is changing and it is highly likely that it will worsen poverty and hinder efforts to achieve the Millennium Development Goals.

The poor cannot cope with current climatic variation in many parts of the world, but this issue is often ignored in poverty assessments or national development planning.

Responses to existing climatic variability should be mainstreamed into national development plans and processes.

Current responses by individuals and governments to the impacts of climate variability can be used as the basis for adaptation to the increasing climate variability that will be associated with longer-term climate change.

Interpreting variability is so important

Page 11

Practical 2 – summaries for climatic data

The start of the rains is important to many people, and is very variable from year to year

Consider the effect of “oddities” on the summary values
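
The effect of an "oddity" can be sketched in a few lines of Python. The day numbers below are hypothetical (they are not the climatic data used in the practical), and the helper name `summarise` is an assumption made for this example. The point is only to show which summaries move a lot, and which move little, when one unusual year is added.

```python
from statistics import mean, median, stdev


def summarise(days):
    """Return the mean, median and sample standard deviation of a list."""
    return {"mean": mean(days), "median": median(days), "stdev": stdev(days)}


# Hypothetical start-of-rains dates, as day numbers within the year.
typical_years = [305, 310, 312, 318, 320, 325, 330]

# The same record with one odd year in which the rains started very late.
with_oddity = typical_years + [400]

before = summarise(typical_years)
after = summarise(with_oddity)

for name in ("mean", "median", "stdev"):
    print(f"{name:>6}: {before[name]:7.1f} -> {after[name]:7.1f}")
```

In this made-up record the median barely changes, while the mean and especially the standard deviation are pulled strongly by the single odd value.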

Page 12

Practical 3 – Introducing ANOVA

• The example of rice yields is used

• The yields are very variable
  • The lowest is less than 20 (t/ha*10)
  • The highest is more than 60 (t/ha*10)
  • The standard deviation = 11 (t/ha*10)

• The farmers use different varieties

• Could knowing the variety explain some of the variation?

• Variation is not so much the problem

• Unexplained variation IS the problem

Page 13

You use Excel and CAST

ANOVA table

Sums of squares

Degrees of freedom

Mean squares

Page 14

The terms and an example

Page 15

Understanding the terms

Total corrected sum of squares

devsq function in Excel – practical 1

Overall mean square

This IS the variance

d.f. = (n-1)
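
In symbols, these terms can be written as follows. The notation (observations x_i, their mean x̄, and n observations) is assumed here for illustration and is not taken from the slides.

```latex
\[
SS_{\text{total}} = \sum_{i=1}^{n} (x_i - \bar{x})^2,
\qquad
s^2 = \frac{SS_{\text{total}}}{n-1},
\qquad
s = \sqrt{s^2}
\]
```

The first expression is what the devsq function returns, the second is the overall mean square (the variance), and the third is the standard deviation.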

Page 16

Understanding the terms continued

Residual (unexplained) or within groups sum of squares

Is much smaller than the overall SS

Residual mean square (residual variance)

Is therefore also much smaller than the overall variance

Page 17

Understanding the terms continued again

Overall standard deviation = √18.07 = 4.25

Residual (unexplained) standard deviation = √4.97 = 2.23

Is correspondingly much smaller
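
A minimal sketch of how such a table could be built from first principles is given below. It assumes a single grouping factor (for example, variety); the function name `one_way_anova` and the yields are hypothetical and are not the data from the practical. Each row holds a sum of squares, its degrees of freedom, and the mean square (sum of squares divided by degrees of freedom).

```python
import math


def one_way_anova(groups):
    """Build a descriptive one-way ANOVA table.

    groups: dict mapping a group label to a list of observations.
    Returns a dict of rows, each with 'ss', 'df' and 'ms'.
    """
    all_values = [x for values in groups.values() for x in values]
    n = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n

    # Total corrected sum of squares: deviations from the overall mean.
    total_ss = sum((x - grand_mean) ** 2 for x in all_values)

    explained_ss = 0.0
    residual_ss = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        # Explained (between groups): group means around the overall mean.
        explained_ss += len(values) * (group_mean - grand_mean) ** 2
        # Residual (within groups): observations around their own group mean.
        residual_ss += sum((x - group_mean) ** 2 for x in values)

    rows = {
        "explained": (explained_ss, k - 1),
        "residual": (residual_ss, n - k),
        "total": (total_ss, n - 1),
    }
    return {name: {"ss": ss, "df": df, "ms": ss / df}
            for name, (ss, df) in rows.items()}


# Hypothetical yields (t/ha*10) for three varieties.
yields = {
    "variety A": [52, 56, 60],
    "variety B": [38, 40, 42],
    "variety C": [20, 24, 28],
}

table = one_way_anova(yields)
for name, row in table.items():
    print(f"{name:>9}: SS = {row['ss']:7.1f}  d.f. = {row['df']}  MS = {row['ms']:6.1f}")

overall_sd = math.sqrt(table["total"]["ms"])
residual_sd = math.sqrt(table["residual"]["ms"])
print(f"overall s.d. = {overall_sd:.2f}, residual s.d. = {residual_sd:.2f}")
```

With these made-up yields, knowing the variety explains most of the variation, so the residual standard deviation is much smaller than the overall one.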

Page 18

Most used measures of variation

• This example shows why the standard deviation and the variance are the most used measures of variation, even if they are not so simple to interpret

• You can “do arithmetic” with them

• You can split the variation into explained and unexplained using these terms

• This doesn’t work with the quartiles or the mean deviation
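
The "arithmetic" referred to here is the following identity for the sums of squares and their degrees of freedom. The notation is assumed for illustration: k groups, n_g observations x_{gj} in group g, group means x̄_g, overall mean x̄, and n observations in total.

```latex
\[
\underbrace{\sum_{g=1}^{k}\sum_{j=1}^{n_g} (x_{gj} - \bar{x})^2}_{\text{total SS}}
=
\underbrace{\sum_{g=1}^{k} n_g (\bar{x}_g - \bar{x})^2}_{\text{explained SS}}
+
\underbrace{\sum_{g=1}^{k}\sum_{j=1}^{n_g} (x_{gj} - \bar{x}_g)^2}_{\text{residual SS}},
\qquad
(n-1) = (k-1) + (n-k)
\]
```

No comparable identity holds for absolute deviations or for quartile-based measures, which is why they cannot be split in this way.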

Page 19

Learning objectives

• Are you now able to:

• Provide and interpret the appropriate summary statistics for practical examples of quantitative data

• Relate the general ideas of statistics, in relation to variability, to the measures of variability

• Recognise the role of statistics in “taming” variability

• Construct and interpret a simple analysis of variance (ANOVA) table

• Explain why both the standard deviation and the variance are used to summarise variation

Page 20

Now you know more about variability

The next sessions show how to interpret the results as statements of risk, etc.