MLOS forecasting



Page 1: MLOS forecasting

Introduction Data processing Methods Results Conclusion and recommendations

Notes on the development of an experimental seasonal MLOS forecasting scheme for the Pacific Islands

Nicolas Fauchereau (1,2), Scott Stephens (1), Nigel Goodhue (1), Rob Bell (1), Doug Ramsay (1)

[email protected]

(1) NIWA Ltd., Auckland, New Zealand
(2) Oceanography Dept., University of Cape-Town, Cape-Town, South Africa

June 20, 2013

Page 2: MLOS forecasting


Table of contents

1 Introduction
2 Data processing
    Mean Level of the Sea anomalies (MLOS)
    Predictors sets
        Indices
        SST EOFs
3 Methods
    Regression
    Classification
4 Results
5 Conclusion and recommendations

Page 3: MLOS forecasting

Introduction

Rationale
Set out in the “White Paper”:
- high impact from sea level extremes
- value in developing an “extreme calendar”
- extreme tides + NTR (MLOS + “high frequency”)

Goal
Compared to existing PEAC scheme:
- Extend coverage to non-US affiliated Islands
- Frequency: every month for the coming 3 months (Island Climate Update)
- Performance of the model, type of forecast (probabilistic?)

Page 4: MLOS forecasting

Introduction

Objective
Provide recommendations:
- Data processing, predictand
- Choice of the set of predictors
- Statistical methods for prediction
- Operational implementation

Implementation
For 3 Islands in the Pacific (presenting a wide range of variability):
- "Hindcast": forecast for T+1 to T+3 using information at T0 (e.g. May for June-August)
- Different predictors
- Different methods (state of the art Machine Learning)
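The hindcast set-up (predictors known at T0, target season covering T+1 to T+3) can be sketched as follows. This is only an illustration under assumptions: the function name `hindcast_pairs`, the use of a plain seasonal mean as target, and the toy values are hypothetical and not taken from the slides.

```python
from statistics import mean

def hindcast_pairs(predictor, mlos, lead_months=3):
    """Pair the predictor at month T0 with the mean MLOS over T+1..T+lead_months.

    Both inputs are equal-length, gap-free monthly series.
    """
    pairs = []
    for t0 in range(len(mlos) - lead_months):
        # Target season, e.g. June-August when T0 is May
        target = mean(mlos[t0 + 1 : t0 + 1 + lead_months])
        pairs.append((predictor[t0], target))
    return pairs

# Toy monthly series, January to August (made-up values)
predictor = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
mlos = [0.00, 0.01, 0.02, 0.03, 0.04, 0.06, 0.09, 0.12]
pairs = hindcast_pairs(predictor, mlos)
```

The last pair uses the May predictor with the June-August mean, so no information from the target season leaks into the predictor.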

Page 5: MLOS forecasting

Sea-Level-records

Guam
- Coordinates: (144.7833 E., 13.4500 N.)
- Record: 1948-03-10 to 2008-12-31
- Proportion of days missing: 12 %

Kiribati, Tarawa
- Coordinates: (172.9300 E., 1.3625 N.)
- Record: 1974-05-03 to 2012-07-30
- Proportion of days missing: 8 %

Cook Islands, Rarotonga
- Coordinates: (200.2147 E., i.e. 159.7853 W., 21.2048 S.)
- Record: 1977-04-24 to 2011-08-31
- Proportion of days missing: 2 %

Page 6: MLOS forecasting

Sea-Level-records

Hourly sea-level (cm), tidal and high frequency component removed (Scott, Nigel, Rob)

1 Daily then Monthly averages
2 Series truncated to start on 1979-01-01
3 Climatology over 1979-2008
4 3-points running averages of monthly anomalies with respect to climatology
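The processing steps listed above can be sketched in plain Python. This is a minimal sketch under assumptions: it starts from daily values (the hourly de-tiding is not shown), the running average is taken as centred, and all function names and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def monthly_means(daily, start_year=1979):
    """Steps 1-2: average daily values by month, dropping missing days and early years.

    daily: iterable of ((year, month, day), value_or_None).
    """
    buckets = defaultdict(list)
    for (year, month, _day), value in daily:
        if value is not None and year >= start_year:
            buckets[(year, month)].append(value)
    return {key: mean(vals) for key, vals in sorted(buckets.items())}

def anomalies(monthly, base=(1979, 2008)):
    """Step 3: subtract the calendar-month climatology of the base period."""
    by_month = defaultdict(list)
    for (year, month), value in monthly.items():
        if base[0] <= year <= base[1]:
            by_month[month].append(value)
    clim = {month: mean(vals) for month, vals in by_month.items()}
    return {(y, m): v - clim[m] for (y, m), v in monthly.items() if m in clim}

def running_mean3(values):
    """Step 4: centred 3-point running average (centring is an assumption)."""
    return [mean(values[i - 1 : i + 2]) for i in range(1, len(values) - 1)]

# Made-up daily values, only to exercise the functions
daily = [((1978, 1, 1), 9.0),
         ((1979, 1, 1), 1.0), ((1979, 1, 2), 3.0), ((1979, 1, 3), None),
         ((1980, 1, 1), 4.0), ((1980, 1, 2), 6.0)]
monthly = monthly_means(daily)
anom = anomalies(monthly)
smoothed = running_mean3([1.0, 2.0, 3.0, 7.0])
```

`running_mean3` expects a gap-free list of consecutive months, so a real series with missing months would need the gaps handled first.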

[Figure: MLOS Seasonal Time-series, 1979-2009; series shown: Guam, Kiribati, Cooks]

Page 7: MLOS forecasting

Sea-Level-records

5 categories ("labels") for classification algorithms:
1 "well below" = (−inf, −0.15]: labelled -2
2 "below" = (−0.15, −0.05]: labelled -1
3 "normal" = (−0.05, +0.05]: labelled 0
4 "above" = (+0.05, +0.15]: labelled 1
5 "well above" = (+0.15, inf): labelled 2
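As an illustration, the five right-closed intervals can be mapped to labels with a sorted list of bin edges. The names `EDGES`, `NAMES` and `mlos_label` are hypothetical; only the thresholds and labels come from the slide.

```python
from bisect import bisect_left

# Upper bounds of the first four classes; each interval is closed on the right
EDGES = [-0.15, -0.05, 0.05, 0.15]
NAMES = {-2: "well below", -1: "below", 0: "normal", 1: "above", 2: "well above"}

def mlos_label(anomaly):
    """Return the class label (-2..2) for one MLOS anomaly.

    bisect_left counts the edges strictly below the value, so a value
    exactly on an edge falls in the lower class, matching (a, b] intervals.
    """
    return bisect_left(EDGES, anomaly) - 2

labels = [mlos_label(a) for a in (-0.20, -0.15, -0.10, 0.0, 0.05, 0.10, 0.30)]
```

Using `bisect_left` rather than `bisect_right` is what keeps the boundary values (for example +0.05) in the lower class.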

Page 8: MLOS forecasting

Predictors sets

Choice of the predictors set is dictated by:

- Relevance: need to reflect plausible physical relationships between Ocean-Climate system and Sea-Level.
- Operational constraints: must be available in near real time (within the first 5 days of Month 1 for forecast Season Month 1 - Month 3).

Page 9: MLOS forecasting

Indices

Indices of SST and Atmospheric variables, monthly time-scale:

- NINOS (1+2, 3.4, 3, 4): from CPC
- Southern Oscillation Index (SOI): calculated by NIWA, data from BoM
- El Nino Modoki Index (EMI): calculated from ERSST dataset
- Seasonal Cycle: (first 3 harmonics on MLOS climatology)
- Regional SST anomalies ...
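One possible reading of the "first 3 harmonics" seasonal-cycle predictor is a Fourier fit of the 12-month climatology, keeping the mean and harmonics 1 to 3. The sketch below assumes that reading; `harmonic_fit` and the synthetic climatology are hypothetical.

```python
import math

def harmonic_fit(clim, n_harmonics=3):
    """Rebuild a monthly climatology from its mean and first n_harmonics harmonics.

    Valid for n_harmonics below half the number of samples (here 12 months).
    """
    n = len(clim)
    fit = [sum(clim) / n] * n
    for k in range(1, n_harmonics + 1):
        angles = [2 * math.pi * k * m / n for m in range(n)]
        # Discrete Fourier coefficients of harmonic k
        a = 2 / n * sum(c * math.cos(w) for c, w in zip(clim, angles))
        b = 2 / n * sum(c * math.sin(w) for c, w in zip(clim, angles))
        fit = [f + a * math.cos(w) + b * math.sin(w) for f, w in zip(fit, angles)]
    return fit

# Synthetic climatology: annual cycle plus a small 5th-harmonic wiggle (made up)
clim = [0.02 + 0.05 * math.cos(2 * math.pi * m / 12)
        + 0.01 * math.sin(2 * math.pi * 5 * m / 12) for m in range(12)]
smooth = harmonic_fit(clim)
```

In this toy case the fit keeps the annual cycle and drops the fifth harmonic, which is the smoothing effect such a predictor is meant to provide.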

Page 10: MLOS forecasting

Indices: Regional SSTs

Regression of SST anomalies on MLOS anomalies (lead 1 month)

Page 11: MLOS forecasting

Sea-Surface-Temperatures EOFs

- EOF analysis of monthly anomalies of ERSST SSTs.
- First 9 Principal Components used as predictors
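An EOF decomposition of this kind is commonly computed with a singular value decomposition. The sketch below assumes NumPy is available, uses random numbers as a stand-in for the ERSST anomalies, and omits latitude weighting; `eof_analysis` is a hypothetical name, not the code used for the slides.

```python
import numpy as np

def eof_analysis(anom, n_modes=9):
    """EOFs of a (time, space) anomaly matrix via SVD.

    Returns principal components (time, n_modes), spatial patterns
    (n_modes, space) and the fraction of variance explained by each mode.
    """
    anom = anom - anom.mean(axis=0)          # remove the time mean at each grid point
    u, s, vt = np.linalg.svd(anom, full_matrices=False)
    pcs = u[:, :n_modes] * s[:n_modes]       # PC time series, used as predictors
    eofs = vt[:n_modes]                      # spatial patterns
    explained = s[:n_modes] ** 2 / np.sum(s ** 2)
    return pcs, eofs, explained

rng = np.random.default_rng(0)
sst_anom = rng.standard_normal((120, 50))    # synthetic stand-in, NOT ERSST data
pcs, eofs, explained = eof_analysis(sst_anom)
```

The columns of `pcs` are mutually uncorrelated by construction, which is convenient when they are fed to a regression.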

Page 12: MLOS forecasting

Methods

Machine Learning
- Regression: continuous dependent variable
- Classification: discrete, categorical dependent variable

Regression
1 Generalized Linear Models: extension of linear regression for distributions of the exponential family (Normal, Poisson, Binomial, Multinomial, etc.)
  - Ordinary Least Squares (Linear Regression)
  - Penalized Least Squares (Ridge Regression, LARS, LASSO)
  - Logistic Regression
2 Multivariate Adaptive Regression Splines (MARS):
  - Non-parametric multivariate regression method
  - Models non-linearities and interactions between predictors
  - Similarities with stepwise regression and CART (Classification And Regression Trees: recursive partitioning)
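MARS builds its fit from hinge functions of the form max(0, x - c) and max(0, c - x), which is how it captures non-linearities. The sketch below shows only these basis functions with made-up coefficients and knot; it does not implement the forward and backward selection steps of the algorithm.

```python
def hinge(x, knot, direction=+1):
    """MARS basis function: max(0, x - knot) for +1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))

def mars_like_model(x):
    """Piecewise-linear response with a change of slope at the knot.

    Intercept, coefficients and knot are hypothetical, for illustration only.
    """
    return 0.02 + 0.10 * hinge(x, 0.5, +1) - 0.04 * hinge(x, 0.5, -1)

values = [mars_like_model(x) for x in (0.0, 0.5, 1.5)]
```

Products of two hinge functions on different predictors would represent the interactions mentioned above.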

Page 13: MLOS forecasting

Methods

Classification
1 Logistic Regression
  - Binomial or multinomial (categorical) response variable
  - Models probability of observation to belong to each class
2 Support Vector Machines (SVM)
  - Optimal hyperplane (2 classes) or set of hyperplanes (k classes)
  - Kernel trick: map data to higher dimensional space to deal with non-linearly separable classes
  - Radial Basis Function is a widely used kernel
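Two building blocks of these classifiers are easy to write down: the softmax that turns per-class linear scores into probabilities in multinomial logistic regression, and the Radial Basis Function kernel used by SVMs. The scores and the `gamma` value below are hypothetical; this is a sketch of the formulas, not of a full fitting procedure.

```python
import math

def softmax(scores):
    """Turn one linear score per class into class probabilities."""
    shift = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and y)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Made-up scores for the five MLOS classes (well below ... well above)
probs = softmax([-1.0, 0.5, 2.0, 0.5, -1.0])
similarity = rbf_kernel([0.0, 0.0], [1.0, 0.0])
```

The kernel equals 1 for identical points and decays towards 0 with distance, with `gamma` controlling how fast.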

Page 14: MLOS forecasting

Approach

- All the methods referred to above are tested in turn, using successively the Indices and the SST EOFs set as predictors
- Applied to Guam, Kiribati and Cooks
- "Best" model selected using objective measures (e.g. R-squared) + cross-validation + expert judgment
- Results presented in detail for Guam only
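The slides do not say which cross-validation scheme was used. As one possible illustration, the sketch below computes an out-of-sample R-squared with contiguous k-fold cross-validation for a single-predictor least-squares fit; all names and data are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    mx, my = mean(x), mean(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

def r_squared(obs, pred):
    """Coefficient of determination of predictions against observations."""
    obs_mean = mean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - obs_mean) ** 2 for o in obs)
    return 1 - ss_res / ss_tot

def cross_validated_r2(x, y, n_folds=5):
    """Predict each contiguous fold from a model fitted on the other folds."""
    n = len(x)
    pred = [None] * n
    for k in range(n_folds):
        test = range(k * n // n_folds, (k + 1) * n // n_folds)
        train = [i for i in range(n) if i not in test]
        intercept, slope = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in test:
            pred[i] = intercept + slope * x[i]
    return r_squared(y, pred)

x = [float(i) for i in range(20)]
y_exact = [2.0 * xi + 1.0 for xi in x]
y_noisy = [0.5 * xi + (-1) ** i for i, xi in enumerate(x)]   # made-up noise
r2_exact = cross_validated_r2(x, y_exact)
r2_noisy = cross_validated_r2(x, y_noisy)
```

Contiguous folds are used here because neighbouring seasons of a smoothed series are strongly correlated, so random folds would tend to give an optimistic score.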

Page 15: MLOS forecasting

Results for Guam

Notes on the Guam time-series
- 12 % of missing values
- Large gap October 1997 - January 1999, 26 consecutive seasons missing
- Trend from about 2002

[Figure: Guam time-series, 1979-2004; curves shown: original time-series, quadratic fit, time-series minus quadratic fit]

Page 16: MLOS forecasting

Results: Logistic regression (Multinomial)

- Predictors set = SST PCs + seasonal cycle
- Success rate: 66.2 % (random: 20 %)

Probabilistic forecast

[Figure: Example of a Multinomial Logistic regression probabilistic forecast; columns: well-below, below, normal, above, well-above; rows: time (seasons) 0 to 9; colour scale: probability, 0.0 to 0.8]

Page 17: MLOS forecasting

Results: MARS

- Predictors set = SST PCs + seasonal cycle + damped linear term
- R-squared: 0.85

[Figure: Guam MARS Model, observed vs predicted, 1979-2009. Var (R2): 92.50, MSE: 0.0011, GCV: 0.0017, RSQ: 0.8556, GRSQ: 0.7800]

Page 18: MLOS forecasting

Results: Support Vector Machines

- Predictors set = SST PCs + seasonal cycle + damped linear term
- Success rate (with intermediate "regularization" parameter): 96 %

Confusion matrix

      WB    B    N    A   WA
WB    14    2    1    0    0
B      0   64    1    0    0
N      0    2  117    1    0
A      0    0    2   85    0
WA     0    0    0    3    4
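The quoted success rate can be recovered from the confusion matrix as the sum of the diagonal divided by the total count. The slide does not state whether rows are observed or predicted classes; the overall rate does not depend on that, while the per-row rates below do. Function names are hypothetical.

```python
LABELS = ["WB", "B", "N", "A", "WA"]
CONFUSION = [
    [14, 2, 1, 0, 0],
    [0, 64, 1, 0, 0],
    [0, 2, 117, 1, 0],
    [0, 0, 2, 85, 0],
    [0, 0, 0, 3, 4],
]

def success_rate(matrix):
    """Fraction of cases on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def per_row_rate(matrix):
    """Diagonal count divided by the row total, one value per class."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]

rate = success_rate(CONFUSION)        # 284 / 296, about 0.96
row_rates = dict(zip(LABELS, per_row_rate(CONFUSION)))
```

The per-row view shows that the rarest class (WA, 7 cases) is matched far less often (4 of 7) than the overall 96 % suggests.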

Page 19: MLOS forecasting

Conclusion and recommendations

- For regression (continuous): MARS with SST EOFs
- For classification (categorical): SVM with SST EOFs
- How to deal with the (non-linear) trend? Here we used a damped linear term, but this is a somewhat ad-hoc solution
- Include Pacific Decadal Oscillation
- Ensemble techniques (Random Forests, bagging, boosting) for classification?
- Hybrid predictor set? EOF on enhanced indices set
- Length of the time-series (30 years is really the minimum)
