MLOS forecasting

  • date post

    11-May-2015
  • Category

    Business

description

Notes on the development of an experimental MLOS forecasting scheme for the Pacific Islands

Transcript of MLOS forecasting

  • 1. Introduction | Data processing | Methods | Results | Conclusion and recommendations
    Notes on the development of an experimental seasonal MLOS forecasting scheme for the Pacific Islands
    Nicolas Fauchereau (1,2), Scott Stephens (1), Nigel Goodhue (1), Rob Bell (1), Doug Ramsay (1)
    Nicolas.Fauchereau@niwa.co.nz
    (1) NIWA Ltd., Auckland, New Zealand
    (2) Oceanography Dept., University of Cape Town, Cape Town, South Africa
    June 20, 2013

2. Table of contents
    1 Introduction
    2 Data processing
        • Mean Level of the Sea anomalies (MLOS)
        • Predictors sets
            • Indices
            • SST EOFs
    3 Methods
        • Regression
        • Classification
    4 Results
    5 Conclusion and recommendations

3. Introduction
    Rationale
        • Set out in the White Paper
        • High impact from sea-level extremes
        • Value in developing an extreme calendar
        • Extreme tides + NTR (MLOS + high frequency)
    Goal
        Compared to the existing PEAC scheme:
        • Extend coverage to non-US-affiliated islands
        • Frequency: every month for the coming 3 months (Island Climate Update)
        • Performance of the model, type of forecast (probabilistic?)

4. Introduction
    Objective
        Provide recommendations on:
        • Data processing, predictand
        • Choice of the set of predictors
        • Statistical methods for prediction
        • Operational implementation
    Implementation
        For 3 islands in the Pacific (presenting a wide range of variability):
        • Hindcast: forecast for T+1 to T+3 using information at T0 (e.g. May for June-August)
        • Different predictors
        • Different methods (state-of-the-art machine learning)

5. Sea-level records
    Guam
        • Coordinates: 144.7833° E, 13.4500° N
        • 1948-03-10 to 2008-12-31
        • Proportion of days missing: 12 %
    Kiribati, Tarawa
        • Coordinates: 172.9300° E, 1.3625° N
        • 1974-05-03 to 2012-07-30
        • Proportion of days missing: 8 %
    Cook Islands, Rarotonga
        • Coordinates: 200.2147° E, 21.2048° S
        • 1977-04-24 to 2011-08-31
        • Proportion of days missing: 2 %
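The hindcast setup described under Implementation (information at T0, forecast for T+1 to T+3) can be sketched as follows. This is only an illustration under stated assumptions: both series are monthly lists aligned on the same month index, the target is taken as the plain mean of the three following months, and the function name `make_hindcast_pairs` and all values are hypothetical, not taken from the slides.

```python
from statistics import mean

def make_hindcast_pairs(predictor, mlos, lead_months=3):
    """Pair the predictor at month T0 with the mean MLOS anomaly over T+1..T+lead_months.

    Assumes both inputs are monthly lists aligned on the same month index.
    """
    pairs = []
    for t0 in range(len(mlos) - lead_months):
        target = mean(mlos[t0 + 1 : t0 + 1 + lead_months])
        pairs.append((predictor[t0], target))
    return pairs

# Made-up monthly values, index 0 = January
predictor = [0.5, 0.4, 0.3, 0.2, 0.1, 0.0, -0.1, -0.2]
mlos = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07]

pairs = make_hindcast_pairs(predictor, mlos)
# pairs[4] pairs the May predictor (index 4) with the June-August mean (indices 5 to 7)
```

Building the pairs this way keeps the predictor strictly earlier than the target season, which is what makes the exercise a hindcast and not a fit on simultaneous data.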
6. Sea-level records
    Hourly sea level (cm), with the tidal and high-frequency components removed (Scott, Nigel, Rob):
    1 Daily, then monthly averages
    2 Series truncated before 1979-01-01
    3 Climatology over 1979-2008
    4 3-point running averages of monthly anomalies with respect to the climatology
    [Figure: MLOS seasonal time-series for Guam, Kiribati and Cooks, 1979-2009]

7. Sea-level records
    5 categories (labels) for classification algorithms:
    1 well below = (-inf, -0.15]: labelled -2
    2 below = (-0.15, -0.05]: labelled -1
    3 normal = (-0.05, +0.05]: labelled 0
    4 above = (+0.05, +0.15]: labelled 1
    5 well above = (+0.15, +inf): labelled 2

8. Predictors sets
    The choice of the predictors set is dictated by:
    • Relevance: predictors need to reflect plausible physical relationships between the ocean-climate system and sea level.
    • Operational constraints: predictors must be available in near real time (within the first 5 days of Month 1 for the forecast season Month 1 to Month 3).

9. Indices
    Indices of SST and atmospheric variables, monthly time-scale:
    • NINO indices (1+2, 3.4, 3, 4): from CPC
    • Southern Oscillation Index (SOI): calculated by NIWA, data from BoM
    • El Niño Modoki Index (EMI): calculated from the ERSST dataset
    • Seasonal cycle: first 3 harmonics of the MLOS climatology
    • Regional SST anomalies ...

10. Indices: Regional SSTs
    Regression of SST anomalies on MLOS anomalies (lead 1 month)

11. Sea-surface temperature EOFs
    • EOF analysis of monthly anomalies of ERSST SSTs
    • First 9 principal components used as predictors
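The anomaly processing and the five-category labelling described on the sea-level record slides can be sketched as below. This is a minimal illustration, not the authors' code: the function names are hypothetical, the climatology here is computed over whatever data is passed in (the slides use 1979-2008), and missing values are not handled.

```python
from bisect import bisect_left
from statistics import mean

def monthly_anomalies(values, months):
    """Subtract from each monthly value the mean of all values for the same calendar month."""
    clim = {m: mean(v for v, mm in zip(values, months) if mm == m) for m in set(months)}
    return [v - clim[m] for v, m in zip(values, months)]

def running_mean_3(series):
    """Centred 3-point running mean; the two end points are dropped."""
    return [mean(series[i - 1 : i + 2]) for i in range(1, len(series) - 1)]

# Upper bounds of the first four categories; every interval is closed on the right
BOUNDS = [-0.15, -0.05, 0.05, 0.15]
LABELS = [-2, -1, 0, 1, 2]

def categorise(anomaly):
    """Map an anomaly to a label in -2..2 using the right-closed bins above."""
    return LABELS[bisect_left(BOUNDS, anomaly)]

# Made-up example: two years of two "months" each
anoms = monthly_anomalies([1.0, 2.0, 3.0, 4.0], [1, 2, 1, 2])
labels = [categorise(a) for a in [-0.20, -0.15, -0.10, 0.0, 0.05, 0.10, 0.30]]
```

`bisect_left` is used so that a value sitting exactly on a bound (for example -0.15) falls in the lower category, matching the right-closed intervals listed on the slide.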
12. Methods
    Machine learning
        • Regression: continuous dependent variable
        • Classification: discrete, categorical dependent variable
    Regression
    1 Generalized Linear Models: extension of linear regression to distributions of the exponential family (Normal, Poisson, Binomial, Multinomial, etc.)
        • Ordinary Least Squares (linear regression)
        • Penalized Least Squares (Ridge regression, LARS, LASSO)
        • Logistic regression
    2 Multivariate Adaptive Regression Splines (MARS):
        • Non-parametric multivariate regression method
        • Models non-linearities and interactions between predictors
        • Similarities with stepwise regression and CART (Classification And Regression Trees: recursive partitioning)

13. Methods
    Classification
    1 Logistic regression
        • Binomial or multinomial (categorical) response variable
        • Models the probability that an observation belongs to each class
    2 Support Vector Machines (SVM)
        • Optimal hyperplane (2 classes) or set of hyperplanes (k classes)
        • Kernel trick: map the data to a higher-dimensional space to deal with classes that are not linearly separable
        • The Radial Basis Function (RBF) is a widely used kernel

14. Approach
    • All the methods referred to above are tested in turn, using successively the indices and the SST EOFs as the predictor set
    • Applied to Guam, Kiribati and Cooks
    • Best model selected using objective measures (e.g. R-squared) + cross-validation + expert judgment
    • Only the results for Guam are presented in detail

15. Results for Guam
    Notes on the Guam time-series
    • 12 % of missing values
    • Large gap October 1997 - January 1999, 26 consecutive seasons missing
    • Trend from about 2002
    [Figure: Guam time-series, 1979-2004: original time-series, quadratic fit, and time-series minus quadratic fit]
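The model-selection step in the Approach slide (R-squared plus cross-validation) can be illustrated with a small sketch. The slides do not say which cross-validation scheme was used, so leave-one-out is an assumption here, as is the use of a single predictor with ordinary least squares; all function names and data are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares fit y = a + b*x for a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b

def loo_predictions(x, y):
    """Leave-one-out cross-validation: predict each point from a fit on all the others."""
    preds = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(a + b * x[i])
    return preds

def r_squared(y, preds):
    """1 - SS_res / SS_tot, computed on the supplied predictions."""
    my = mean(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Made-up data: a nearly linear relationship with small noise
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.1, 0.9, 2.1, 2.9, 4.1, 4.9]
cv_r2 = r_squared(y, loo_predictions(x, y))
```

Scoring on held-out points gives a more honest measure than the in-sample R-squared, which matters when flexible methods such as MARS or SVM are fitted to roughly 30 years of seasonal data.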
16. Results: Logistic regression (multinomial)
    • Predictors set = SST PCs + seasonal cycle
    • Success rate: 66.2 % (random: 20 %)
    • Probabilistic forecast
    [Figure: Example of a multinomial logistic regression probabilistic forecast: probability of each category (well-below, below, normal, above, well-above) against time (seasons)]

17. Results: MARS
    • Predictors set = SST PCs + seasonal cycle + damped linear term
    • R-squared: 0.85
    [Figure: Guam MARS model, observed and predicted, 1979-2009. Var (R2): 92.50, MSE: 0.0011, GCV: 0.0017, RSQ: 0.8556, GRSQ: 0.7800]

18. Results: Support Vector Machines
    • Predictors set = SST PCs + seasonal cycle + damped linear term
    • Success rate (with intermediate regularization parameter): 96 %
    Confusion matrix (WB = well below, B = below, N = normal, A = above, WA = well above):
             WB    B    N    A   WA
        WB   14    2    1    0    0
        B     0   64    1    0    0
        N     0    2  117    1    0
        A     0    0    2   85    0
        WA    0    0    0    3    4

19. Conclusion and recommendations
    • For regression (continuous): MARS with SST EOFs
    • For classification (categorical): SVM with SST EOFs
    • How to deal with the (non-linear) trend? Here we used a damped linear term, but it is a bit of an ad hoc solution
    • Include the Pacific Decadal Oscillation
    • Ensemble techniques (Random Forests, bagging, boosting) for classification?
    • Hybrid predictor set? EOF on an enhanced indices set
    • Length of the time-series (30 years is really a minimum)
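As a quick check, the 96 % success rate quoted for the SVM can be recomputed from the confusion matrix on the same slide. The sketch below assumes the success rate is the share of cases on the diagonal; the slides do not say whether rows are observed or predicted categories, and the calculation does not depend on it. The function names are hypothetical.

```python
LABELS = ["WB", "B", "N", "A", "WA"]
CONFUSION = [
    [14, 2, 1, 0, 0],
    [0, 64, 1, 0, 0],
    [0, 2, 117, 1, 0],
    [0, 0, 2, 85, 0],
    [0, 0, 0, 3, 4],
]

def success_rate(matrix):
    """Fraction of cases on the diagonal (both category indices agree)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def misses_beyond_adjacent(matrix):
    """Count cases whose two category indices differ by more than one."""
    return sum(
        count
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if abs(i - j) > 1
    )

rate = success_rate(CONFUSION)
print(f"{rate:.1%}")  # 284 of 296 cases: prints 95.9%
```

Almost all misclassifications fall in an adjacent category: only one case in the matrix is off by more than one class.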