A Brief Introduction to Statistical Forecasting

35
A Brief Introduction to Statistical Forecasting Kevin Werner

Transcript of A Brief Introduction to Statistical Forecasting

Page 1: A Brief Introduction to Statistical Forecasting

A Brief Introduction to Statistical Forecasting

Kevin Werner

Page 2: A Brief Introduction to Statistical Forecasting

Outline

• Principle Component Theory• Applications• Z Score• VIPER

Page 3: A Brief Introduction to Statistical Forecasting

Statistical regression

Basic Forecast Methods

May 1 snowpack % avg

Apr-

Jul s

trea

mflo

w %

avg

S Fork Rio Grande, Colo

Snowpack

Soil water

Snow

Rainfall

Runoff

Heat

Simulation modeling

Credit: Tom Pagano

Page 4: A Brief Introduction to Statistical Forecasting

The General Linear Regression Model

where:Y = dependent variableXi = independent variables

bi = regression coefficients

n = number of independent variables

n

iii XbbY

10

Credit: Dave Garen

Page 5: A Brief Introduction to Statistical Forecasting

The Problem

If X’s are intercorrelated, they contain redundant information, and the b’s cannot be meaningfully estimated.However, we don’t want to have to throw out most of the X’s but prefer to retain them for robustness.

n

iii XbbY

10

Credit: Dave Garen

Page 6: A Brief Introduction to Statistical Forecasting

ExampleStreamflow = bo + b1 * (Snotel A) + b2 * (Snotel B)

-> Snotel sites are very well correlated-> An optimal b1 and b2 will be difficult to determine since the correlation is so strong

Page 7: A Brief Introduction to Statistical Forecasting

The Solution

Possibilities:1) Pre-combine X’s into composite index(es), e.g., Z-score method2) Principal components regressionThese are similar in concept but differ in the mathematics.

n

iii XbbY

10

Credit: Dave Garen

Page 8: A Brief Introduction to Statistical Forecasting

Principal Components Analysis

Principal components regression is just like standard regression except the independent variables are principal components rather than the original X variables.

Principal components are linear combinations of the X’s.

Credit: Dave Garen

Page 9: A Brief Introduction to Statistical Forecasting

Principal Components AnalysisEach principal component is a weighted sum of all the X’s:

n

jjjXePC

111

n

jjjXePC

122

n

jjnjn XePC

1

. .

.

Credit: Dave Garen

Page 10: A Brief Introduction to Statistical Forecasting

Principal Components Analysis

The e’s are called eigenvectors, derived from a matrix equation whose input is the correlation matrix of all the X’s with each other.

Principal components are new variables that are not correlated with each other.

The principal components transformation is equivalent to a rotation of axes.

Credit: Dave Garen

Page 11: A Brief Introduction to Statistical Forecasting

Principal Components Analysis

0

2

4

6

8

10

12

14

16

18

20

0 10 20 30 40 50 60 70 80

X1

X 2

R2 = 0.698R = 0.836

PC1 = e11 X1 + e12 X2

PC2 = e21 X1 + e22 X2

Credit: Dave Garen

Page 12: A Brief Introduction to Statistical Forecasting

Principal Components Analysis

The eigenvectors (weights) are based solely on the intercorrelations among the X’s and have no knowledge of Y (in contrast to Z-score, for which the opposite is true).

Principal components can be used for purely descriptive purposes, but we want to use them as independent variables in a regression.

Credit: Dave Garen

Page 13: A Brief Introduction to Statistical Forecasting

Credit: Dennis Hartmann

Page 14: A Brief Introduction to Statistical Forecasting

Principal Components Analysis -- Example

Independent Variables:

X1 – X5 Snow water equivalent at 5 stations

X6 – X10 Water year to date precipitation at 5 stations

X11 Antecedent streamflow

X12 Climate teleconnection index

Credit: Dave Garen

Page 15: A Brief Introduction to Statistical Forecasting

Correlation MatrixX1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 Y

X1 1.0

.72

.67

.76

.81

.54

.31

.54

.38

.50

.18

.64

.65

X2 1.0

.67

.45

.80

.62

.45

.47

.31

.49

.14

.39

.60

X3 1.0

.49

.72

.84

.76

.86

.68

.85

.48

.56

.80

X4 1.0

.62

.42

.26

.36

.56

.38

.28

.59

.68

X5 1.0

.62

.49

.51

.44

.62

.32

.59

.73

X6 1.0

.93

.87

.83

.90

.63

.43

.85

X7 1.0

.82

.85

.90

.67

.32

.76

X8 1.0

.74

.84

.64

.39

.70

X9 1.0

.80

.70

.49

.84

X10 1.0

.64

.46

.79

X11 1.0

.36

.51

X12 1.0

.64

Credit: Dave Garen

Page 16: A Brief Introduction to Statistical Forecasting

First Five Eigenvectors

PC1 PC2 PC3 PC4 PC5

X1 0.265 0.444 0.004 0.074 -0.104

X2 0.249 0.325 -0.483

-0.030

0.315

X3 0.335 0.016 -0.178

0.149 -0.314

X4 0.229 0.353 0.456 -0.595

-0.009

X5 0.287 0.332 -0.148

0.120 0.412

X6 0.339 -0.168

-0.162

-0.106

-0.040

X7 0.308 -0.329

-0.150

-0.058

-0.015

X8 0.317 -0.197

-0.114

0.027 -0.261

X9 0.304 -0.240

0.299 -0.313

-0.103

X10 0.330 -0.197

-0.197

0.072 -0.129

X11 0.235 -0.349

0.351 0.168 0.692

X12 0.232 0.262 0.473 0.675 -0.212

% var.

62.7 15.8 7.8 3.8 3.2

Credit: Dave Garen

Page 17: A Brief Introduction to Statistical Forecasting

Principal Components Regression Procedure

• Try the PC’s in order• Test for regression coefficient significance (t-test)• Stop at first insignificant component• Transform regression coefficients to be in terms

of original variables• Sign test – coefficient signs must be same as

correlation with Y

Credit: Dave Garen

Page 18: A Brief Introduction to Statistical Forecasting

Summary

• Principal components analysis is a standard multivariate statistical procedure

• Can be used for descriptive purposes to reduce the dimensionality of correlated variables

• Can be taken a step further to provide new, non-correlated independent variables for regression

• PC’s taken in order, subject to t-test and sign test

• Final model is expressed in terms of original X variables Credit: Dave Garen

Page 19: A Brief Introduction to Statistical Forecasting

Soil Moisture at the interannual timescale

• Another example demonstrating importance of land surface processes in the climate system: Werner, 1999:– GCM run with and without active

land surface model in South America to explore the importance of land surface processes in the climate system variability in the Nordeste region.

– Both simulations include full atmospheric model, slab ocean model (no ocean dynamics), and dynamic land surface model everywhere except tropical South America in the Data Land simulation.

Page 20: A Brief Introduction to Statistical Forecasting

• Modeled variability– Full dynamic land surface

model simulation contains variability resembling observed variability with connection between NH and SH SSTs.

– Fixed land surface model shows no connected variability between NH and SH SSTs

Soil Moisture at the interannual timescale

Page 21: A Brief Introduction to Statistical Forecasting

Resources

• Dave Garen VIPER slides• Dennis Hartmann lecture notes (

http://www.atmos.washington.edu/~dennis/)

Page 22: A Brief Introduction to Statistical Forecasting

What does z-score regression do?

1. Combines predictors into weighted indices,emphasizing good stations, minimizing bad ones. 2. Compensates for missing data with remaining data.

3. Regresses index against target predictand

Credit: Tom Pagano

Page 23: A Brief Introduction to Statistical Forecasting

What is a z-score?

A z-score is a “normalized anomaly”:Z = value - average

standard deviation

Credit: Tom Pagano

Page 24: A Brief Introduction to Statistical Forecasting

What is a z-score?

A z-score is a “normalized anomaly”:Z = value - average

standard deviation

Credit: Tom Pagano

Page 25: A Brief Introduction to Statistical Forecasting

What is a z-score?

A z-score is a “normalized anomaly”:Z = value - average

standard deviation

60

135avg stdev

30

15

Credit: Tom Pagano

Page 26: A Brief Introduction to Statistical Forecasting

What is a z-score?

A z-score is a “normalized anomaly”:Z = value - average

standard deviation

60

135avg stdev

30

15

Z = (90 – 60)/15 = +2

Credit: Tom Pagano

Page 27: A Brief Introduction to Statistical Forecasting

How good are the results

Under conditions of serially compete data,and relatively “normal” conditionsPCA and Z-Score are effectively indistinguishable*

Skill and behavior is similar to the official published outlooks**

However… Any tool is a weapon if you hold it right.(aka “A fool with a tool is still a tool”)

*Viper technical note - 1 basin ** Pagano dissertation – 29 basins Credit: Tom Pagano

Page 28: A Brief Introduction to Statistical Forecasting

Super Quick Primer on VIPER

Page 29: A Brief Introduction to Statistical Forecasting

The Viper Main InterfaceLayout and interpretation

Credit: Tom Pagano

Page 30: A Brief Introduction to Statistical Forecasting

The Viper Main InterfaceLayout and interpretation

Selectingpredictors and

predictands

Global month changes

Credit: Tom Pagano

Page 31: A Brief Introduction to Statistical Forecasting

The Viper Main InterfaceLayout and interpretation

Selectingpredictors and

predictands

Predictorsquality, availability

Global month changes

Historical statisticsCredit: Tom Pagano

Page 32: A Brief Introduction to Statistical Forecasting

The Viper Main InterfaceLayout and interpretation

Selectingpredictors and

predictands

Predictorsquality, availability

Forecast vs observed time series

Station availability, weights

Global month changes

Historical statisticsCredit: Tom Pagano

Page 33: A Brief Introduction to Statistical Forecasting

The Viper Main InterfaceLayout and interpretation

Selectingpredictors and

predictands

Predictorsquality, availability

Forecast vs observed time series

Station availability, weights

Fcst vs obsscatterplot

Helpervariable

Scatterplot/Forecast

progression

Global month changes

Historical statisticsCredit: Tom Pagano

Page 34: A Brief Introduction to Statistical Forecasting

The Viper Main InterfaceLayout and interpretation

Selectingpredictors and

predictands

Predictorsquality, availability

Probabilitybounds

Forecast vs observed time series

Station availability, weights

Fcst vs obsscatterplot

Helpervariable

Scatterplot/Forecast

progression

Settings

Global month changes

Historical statisticsCredit: Tom Pagano

Page 35: A Brief Introduction to Statistical Forecasting

The Viper Main InterfaceLayout and interpretation

Probabilitybounds

Forecast vs observed time series

Station availability, weights

Fcst vs obsscatterplot

Helpervariable

Scatterplot/Forecast

progression

Settings

Historical statistics

There’s more if you scroll right:Relate any variable to another

Credit: Tom Pagano