SADC Course in Statistics Time Series: An Introduction (Session 01)

Page 1 (slide 1)

SADC Course in Statistics

Time Series: An Introduction

(Session 01)

Page 2 (slide 2)

Time Series Learning Objectives

By the end of the next 4 sessions, devoted to time series, you will be able to

• appreciate the broader concept of data where time is a factor

• understand basic time series concepts and terminology

• decompose a time series to look at trends and seasonal effects, and do simple forms of forecasting

• concisely summarize results of time series analysis in writing

Page 3 (slide 3)

Learning Objectives – this session

By the end of this session, you will be able to

• give examples of data collected over time

• state objectives of a time series analysis

• appreciate the importance of graphing data

• interpret key features emerging from an examination of a time series

• report main findings from a graphical presentation of time series data

Page 4 (slide 4)

Basics: Definitions and Notation

• A time series is a collection of observations made sequentially through time

• Such observations may be denoted by

$Y_1, Y_2, Y_3, \ldots, Y_t, \ldots, Y_T$

where $Y_t$ is the observation at time $t$, since data are usually collected at discrete points in time

• The interval between observations can be any time interval (hours within days, days, weeks, months, years, etc.).
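As a minimal sketch of this notation, the Python snippet below stores a short series as (time, value) pairs and looks up the observation at time t. The temperature values and the start date are invented for illustration; they do not come from the course data.

```python
from datetime import date, timedelta

# Hypothetical daily maximum temperatures (degrees C); values are invented.
values = [29.5, 31.0, 30.2, 28.7, 27.9]
start = date(2007, 1, 1)  # assumed start date, one observation per day

# Pair each observation Y_t with the time at which it was made.
series = [(start + timedelta(days=i), y) for i, y in enumerate(values)]

T = len(series)            # number of observations, the T in Y_T
t = 3                      # time index, counted from 1 as in Y_1, ..., Y_T
when, y_t = series[t - 1]  # Python lists are 0-based, so Y_t sits at t - 1

print(f"T = {T}; Y_{t} = {y_t} observed on {when.isoformat()}")
```

Keeping the time alongside each value, rather than the values alone, means the ordering and spacing of the observations are never lost.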

Page 5 (slide 5)

Some areas of applications

• Time series can occur in a wide range of fields, from economics to sociology and from meteorology to financial investment

• Some examples of time series are:

– Monthly closings of the stock exchange index

– Malaria incidence or deaths over calendar years

– Daily maximum temperatures

– Hourly records of babies born at a maternity hospital

• Can you suggest other examples?

Page 6 (slide 6)

Basics: Types of time series

• Observations made continually in time give rise to a Continuous Time Series, e.g.

– Thermometer readings at a Met station (continuously measured)

– Monitoring of whether air pollution has reached unacceptable levels at an industrial site (air pollution levels are continuous)

• More often, observations are taken only at specific points in time, giving rise to a Discrete Time Series, e.g. (the type of the measured variable is shown in brackets)

– annual number of road accidents (discrete)

– maximum daily temperature (continuous)

– whether or not there was daily rain (binary)
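To illustrate how a discrete time series can arise from a process that runs continuously, the sketch below reduces readings taken during the day to one daily maximum and one daily rain indicator. All readings are invented, and treating any rainfall above 0 mm as a rainy day is an assumption made only for this example.

```python
# Hypothetical readings: (day, hour, temperature in degrees C); values are invented.
readings = [
    (1, 6, 18.2), (1, 12, 27.4), (1, 18, 23.1),
    (2, 6, 17.5), (2, 12, 29.0), (2, 18, 24.6),
    (3, 6, 19.1), (3, 12, 25.3), (3, 18, 22.0),
]
# Hypothetical daily rainfall totals in mm.
rain_mm = {1: 0.0, 2: 4.2, 3: 0.0}

# Daily maximum temperature: one continuous-valued observation per day.
daily_max = {}
for day, _hour, temp in readings:
    daily_max[day] = max(temp, daily_max.get(day, temp))

# Rain / no rain: one binary observation per day (assumed rule: more than 0 mm).
rained = {day: mm > 0 for day, mm in rain_mm.items()}

print(daily_max)
print(rained)
```

Both results are discrete time series (one value per day), even though one holds a continuous measurement and the other a binary one.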

Page 7 (slide 7)

Objectives of a time series analysis

• Description (often with monitoring data)

– Merely to describe the patterns over time

• Explanation

– Can the pattern observed over time be explained in terms of other factors or causes? This helps in understanding the behaviour of the series

• Prediction (forecasting)

– Can past records help us to predict what will happen in the future?

• Improving the past system/behaviour

– If factors affecting the behaviour of a variable over time can be identified, action may be taken to improve the system, e.g. action over increasing levels of air pollution

Page 8 (slide 8)

Analysing series with a time element

• Where the time element is just incidental, it may not be necessary to use a formal time series analysis approach

– e.g. start of the rainy season each year at a tobacco farm

• The analysis used depends on the objective(s) of the study

• It can vary from just descriptive methods to more advanced analysis approaches

• In these time series sessions, we will largely concentrate on simple approaches.

Page 9 (slide 9)

Approach in this session

• We begin with some examples showing the importance of graphing the data to get an insight into the distribution over time

• For other examples, refer to 2.1.1 in CAST for SADC – Higher Level

• We then summarise some lessons that can be learnt from graphing the data in time

Page 10 (slide 10)

Jumping to conclusions from raw data

Data (interval-scale): Company profits ('000 dollars)

Objective: To study changes in profit figures over consecutive quarters

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1      667         631         675         699
2      739         695         751         779
3      823         795         835         875
4      931         855         939         967

Impression is that the 4th quarter is always higher than the 1st quarter

Page 11 (slide 11)

Take a look again…

Previous impression is largely because there is a general increase over time

[Figure: time plot of profits (in '000 $) against year, 1993 to 1998; vertical axis from 600 to 1000]
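The point can also be checked numerically. The sketch below uses the quarterly profit figures from the table on the previous slide; the variable names are illustrative only. It compares the 4th quarter with the 1st quarter of the same year, and then with the 1st quarter of the following year.

```python
# Quarterly profits ('000 dollars) from the table on the previous slide.
profits = {
    1: [667, 631, 675, 699],
    2: [739, 695, 751, 779],
    3: [823, 795, 835, 875],
    4: [931, 855, 939, 967],
}

# Annual means show the overall upward movement.
annual_mean = {year: sum(q) / len(q) for year, q in profits.items()}

# Within each year, Q4 exceeds Q1 ...
q4_minus_q1 = {year: q[3] - q[0] for year, q in profits.items()}

# ... but the following Q1 exceeds that Q4 as well, which is what a
# general increase over time would produce.
next_q1_minus_q4 = {
    year: profits[year + 1][0] - q[3]
    for year, q in profits.items()
    if year + 1 in profits
}

print(annual_mean)       # {1: 668.0, 2: 741.0, 3: 832.0, 4: 923.0}
print(q4_minus_q1)       # {1: 32, 2: 40, 3: 52, 4: 36}
print(next_q1_minus_q4)  # {1: 40, 2: 44, 3: 56}
```

Since each 1st quarter is also higher than the 4th quarter before it, "Q4 is higher than Q1" says more about the trend than about the 4th quarter itself.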

Page 12 (slide 12)

Jumping to conclusions from summaries

Objective: to emphasize the need for graphing distributions in order to get a clearer understanding of the data distribution

Data (interval-scale): Breaking strengths of parcel string tested on a piece selected every 5 minutes from one spool during production. 100 samples from each of 3 different days (simulated data)

          Day 1   Day 2   Day 3
Mean      20.81   20.81   20.81
Std Dev    0.72    0.72    0.72

Data source: Petruccelli, J; MSOR Connections Vol 7 No 2, 2007

Summary statistics identical!
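A minimal sketch of why identical summaries can hide very different behaviour over time: the mean and standard deviation ignore the order of the observations. The values below are randomly generated for illustration and are not the simulated data from the cited source; the names day_a, day_b and day_c are hypothetical.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical breaking strengths; these are NOT the data from the slide.
values = [random.gauss(20.8, 0.7) for _ in range(100)]

# Three "days" built from exactly the same 100 values in different time orders.
day_a = list(values)                                # random order: no pattern over time
day_b = sorted(values)                              # steady upward drift
day_c = sorted(values)[50:] + sorted(values)[:50]   # upper half first, then the lower half

def summary(xs):
    """Mean and standard deviation, rounded as in a summary table."""
    return round(statistics.mean(xs), 2), round(statistics.stdev(xs), 2)

def half_means(xs):
    """Mean of the first half and of the second half of the series."""
    mid = len(xs) // 2
    return round(statistics.mean(xs[:mid]), 2), round(statistics.mean(xs[mid:]), 2)

for name, day in [("A", day_a), ("B", day_b), ("C", day_c)]:
    print(name, summary(day), half_means(day))
```

The three summaries agree, while the half-by-half means differ, which is the kind of pattern a time plot reveals at a glance.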

Page 13 (slide 13)

Take a look again…

[Figure: three time plots, one each for Day 1, Day 2 and Day 3, of breaking strengths (vertical axis, 19 to 23) against consecutive samples (horizontal axis, 0 to 100)]

• The distributions are definitely different!

Page 14 (slide 14)

Discussion exercise

• Level of data for analysis depends on objectives

– Level: time period

» Botswana hours of sunshine data

– Level: Local, National, International

» Malaria incidence with rainfall pattern relationship (between variables)

» Malaria incidence comparisons (between countries)

• In small groups, study the information on slides 15-20. Discuss what the graphs indicate and report back to the whole class after 20 minutes.

Page 15 (slide 15)

Zambia Rainfall Data

Problem: Farmers in Southern Zambia are moving out of the province because they believe that climate change is affecting farm production.

A local NGO promoting Conservation Farming insists that the problem is due to bad farming practice.

A study was commissioned to investigate the problem; one of the events investigated was the “Start of the Rains” (defined as >20 mm of rainfall in 3 days, after 15 November)

Page 16 (slide 16)

Start of Rains

Objective: to investigate if there has been any change in start of the rainy season in Southern Zambia

Data source: Moorings Station, Monze, Southern Zambia

Data (interval-scale): “Start of Rains” calculated as day number (from July 1st) of the first 3-day spell with >20 mm rain after November 15th

What is your answer to the question?

[Figure: time plot of the day number with >20 mm rain over 3 days (vertical axis, 120 to 200) against season (horizontal axis, 1920 to 2000)]
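The "Start of Rains" definition can be sketched as a small function. This is one possible reading, not the study's actual procedure. It assumes that July 1st counts as day 1, that a spell must begin on or after November 15th, that the reported day is the last day of the first qualifying spell, and that dates with no record count as 0 mm. The function name and the rainfall figures are hypothetical.

```python
from datetime import date, timedelta

def start_of_rains(daily_rain, season_year, threshold_mm=20.0, spell_days=3):
    """Day number (1 July = day 1) on which the first qualifying spell ends.

    daily_rain maps datetime.date -> rainfall in mm; missing dates count as 0.
    Returns None if no spell in the season exceeds the threshold.
    """
    july_1 = date(season_year, 7, 1)
    earliest = date(season_year, 11, 15)   # assumed: spells start on or after this date
    last = date(season_year + 1, 6, 30)    # end of the July-to-June season

    day = earliest
    while day + timedelta(days=spell_days - 1) <= last:
        spell = [day + timedelta(days=k) for k in range(spell_days)]
        total = sum(daily_rain.get(d, 0.0) for d in spell)
        if total > threshold_mm:
            return (spell[-1] - july_1).days + 1
        day += timedelta(days=1)
    return None

# Invented rainfall records for one season.
rain = {
    date(2000, 11, 10): 30.0,   # heavy rain, but before 15 November, so ignored
    date(2000, 11, 20): 8.0,
    date(2000, 11, 21): 7.0,
    date(2000, 11, 22): 6.0,    # 8 + 7 + 6 = 21 mm over 3 days
}
print(start_of_rains(rain, 2000))   # 145, i.e. 22 November
```

Applying such a function to each season in turn gives one value per year, which is the series shown in the time plot.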

Page 17 (slide 21)

Lessons summarised

• The level to which the data needs to be summarised before analysis depends on the objective(s) of the study

• The specific analysis depends on the objectives; a descriptive analysis will often be sufficient

• Different levels of data will be needed depending on whether the problem is being looked at on the international, national or local level

• It is imperative, however, that quality data be made accessible to ensure that conclusions arising from the analysis are correct.

Page 18 (slide 22)

Time Plots

• A time plot is a plot of the measurement of interest against the time of the observation

• No matter what you decide is the appropriate way to analyse your data, the time factor must not be ignored.

• As we have seen in the examples considered in this session, it is very important to start the exploration of a time series with a graphical representation of the data.

• However, there are a number of points to be kept in mind when drawing such a plot, as discussed in the next two slides

Page 19 (slide 23)

Choice of sampling interval

The two figures show an ECG of a healthy woman. The bottom one is measured at a shorter interval, while the top one is measured at a longer interval and misses the characteristic peak of the heartbeat.

So the choice of the sampling interval is quite important: sampling too frequently can be costly, while sampling too infrequently might miss essential characteristics
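The effect of the sampling interval can be sketched with a made-up signal. The series below is not ECG data; it is a flat baseline with an isolated sharp peak every 50 samples, chosen only to mimic the shape of the problem. The helper name subsample is hypothetical.

```python
# Hypothetical signal: baseline of 0.0 with a sharp peak every 50 samples.
# It only mimics the shape of a heartbeat trace; it is not ECG data.
fine = [1.0 if i % 50 == 25 else 0.0 for i in range(200)]

def subsample(series, step):
    """Keep every step-th observation, starting from the first."""
    return series[::step]

medium = subsample(fine, 5)    # indices 0, 5, 10, ... include 25, 75, 125, 175
coarse = subsample(fine, 10)   # indices 0, 10, 20, ... never hit 25, 75, ...

print(max(fine))     # 1.0: the peaks are present at the finest interval
print(max(medium))   # 1.0: still captured when sampling every 5th point
print(max(coarse))   # 0.0: the longer interval misses every peak
```

The coarse series is a tenth of the size, which is the cost saving, but it shows a flat line where the original had four peaks.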

Page 20 (slide 24)

Choice of aspect ratio

• Notice: different aspect ratios emphasize different characteristics of the series – the top one brings out the differences in the peaks while the lower one highlights the way the peaks rise and fall

Page 21 (slide 25)

To join or not to join

Same data as in slide 18 but without the points joined up

[Figure: plot of percent1 and percent3 (vertical axis, 30 to 110) against year (horizontal axis, 1975 to 2000), with the points not joined]

Page 22 (slide 26)

To join or not to join

Advantage of joining – usually easier to digest

Disadvantage – gives impression of continuity; definitely a risk when missing values exist
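One way to reduce the risk with missing values is to join points only within runs of consecutive observed values, leaving a visible gap where data are missing. The sketch below shows that idea under stated assumptions: missing values are coded as None, the function name is hypothetical, and the figures are invented.

```python
def joinable_segments(times, values):
    """Split a series into runs of consecutive non-missing observations.

    Missing values are represented by None (an assumption of this sketch).
    Lines would be drawn within each run but not across the gaps.
    """
    segments, current = [], []
    for t, y in zip(times, values):
        if y is None:
            if current:
                segments.append(current)
            current = []
        else:
            current.append((t, y))
    if current:
        segments.append(current)
    return segments

years = [1990, 1991, 1992, 1993, 1994, 1995]
percent = [52.0, 55.5, None, None, 71.0, 69.5]   # invented values
print(joinable_segments(years, percent))
# [[(1990, 52.0), (1991, 55.5)], [(1994, 71.0), (1995, 69.5)]]
```

Plotting each segment as its own line avoids suggesting a smooth rise from 1991 to 1994 that the data do not support.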

Return now to the example on slide 15 for some practical work, to ensure that the learning objectives are achieved…

(Details are outlined in Practical 01)

Page 23 (slide 27)

With reference to slide 19, note that: “The World Health Organisation does not warrant that the information contained in the web site is complete and correct and shall not be liable whatsoever for any damages incurred as a result of its use”.

The WHO website further adds: “Extracts of WHO information can be used for private study or for educational purposes without permission. Wider use requires permission to be obtained from WHO”.