Getting started with GEM-SA Marc Kennedy Central Science Laboratory, York Tony O’Hagan, Jeremy Oakley University of Sheffield

Getting started with GEM-SA

Marc Kennedy, Central Science Laboratory, York

Tony O’Hagan, Jeremy Oakley, University of Sheffield

Part 1: Getting started

Starting GEM-SA program
Creating input and output files
Explanation of the menus, toolbars, etc.
Description of the project window

Starting GEM-SA

Double-click the GEM-SA icon to start

The main window appears, with
– Menu
– Toolbar
– Sensitivity analysis output grid
  (tab windows for other types of output)
– Log window

[Screenshot of the main window, with callouts: menu, toolbar, sensitivity analysis output grid, log window]

Toolbar icons

New project

Open project

Save project

Print output report

Edit project

Generate input design points

Rescale an input

Standardise design

Copy input design to clipboard

Convert input to integer

Run the analysis

Help

Sensitivity analysis output grid

This will report the sensitivity results after the analysis is complete
– One line for each input parameter
– One line for each pair of inputs, if joint effects are selected

Log Window output

Tells us
– Which training data are being loaded/saved
– Transformations applied to the data
– Fitted Gaussian process parameters
– Summary of uncertainty analysis results

Creating a GEM project

To build the emulator we first need 3 files:
– Data file of code inputs
– Data file of code outputs
– GEM-SA project file

Restrictions on input/output data

Single output
– Multiple outputs must be treated individually
– GEM can read a file containing multiple outputs, but a single column is specified within a project

Max 30 input parameters

Max 400 training points

The data files should be plain text files
– One line for each point
– Input file can be space or tab delimited
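These restrictions can be checked before the files are loaded into GEM-SA. The following is a minimal Python sketch, not part of GEM-SA: the function name `check_training_data` is hypothetical, the limits are simply the ones quoted on this slide, and the file contents are passed in as strings so that the example is self-contained.

```python
# Limits quoted on the slide; everything else here is illustrative.
MAX_INPUTS = 30
MAX_POINTS = 400


def check_training_data(input_text, output_text, output_column=0):
    """Validate training data and return (n_points, n_inputs, outputs).

    output_column is the 0-based column of the outputs file to use,
    since a project works with a single output.
    """
    # str.split() with no argument accepts both spaces and tabs.
    inputs = [line.split() for line in input_text.splitlines() if line.strip()]
    outputs = [line.split() for line in output_text.splitlines() if line.strip()]

    if not inputs:
        raise ValueError("input file is empty")
    n_inputs = len(inputs[0])
    if any(len(row) != n_inputs for row in inputs):
        raise ValueError("every input row needs the same number of columns")
    if n_inputs > MAX_INPUTS:
        raise ValueError(f"{n_inputs} inputs exceeds the limit of {MAX_INPUTS}")
    if len(inputs) > MAX_POINTS:
        raise ValueError(f"{len(inputs)} points exceeds the limit of {MAX_POINTS}")
    if len(outputs) != len(inputs):
        raise ValueError("need exactly one output row per input row")

    # float() raises ValueError if any entry is not numeric.
    for row in inputs:
        for value in row:
            float(value)
    selected = [float(row[output_column]) for row in outputs]
    return len(inputs), n_inputs, selected


# Two training points, two inputs, two output columns (second one selected).
n_points, n_inputs, y = check_training_data("1 2\n3\t4\n", "10 20\n30 40\n", output_column=1)
```

In practice the two strings would come from reading the input and output text files.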

Generating a new input design

Designs can be generated using the “Generate input design points” toolbar icon or the menu: Input -> Generate…

The design dialog appears

Generating a new input design

Click OK and fill in the required range for each input

Click OK again

Editing input designs

If you select a column, you can rescale values of that input or round values to be integers

Designs can be loaded into or saved from this window using the Inputs menu. Use the “Copy input design to clipboard” toolbar icon to copy the points to the clipboard for use in other programs

Types of design

GEM-SA can generate 2 types of design
– LP-τ
– Maximin Latin Hypercube (MmLH) designs

Both have good space-filling properties
– Ensure all regions of the input space are well represented

LP-τ is quick to generate, and good for increasing the input design sequentially

MmLH can be better in high dimensions
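To make the maximin Latin hypercube idea concrete, here is a minimal Python sketch. It is not GEM-SA's algorithm (which these slides do not describe): it simply draws a number of random Latin hypercubes on the unit cube and keeps the one whose closest pair of points is furthest apart. The function names, the number of candidates and the seed are all assumptions made for illustration.

```python
import itertools
import math
import random


def latin_hypercube(n_points, n_inputs, rng):
    """One random Latin hypercube design on [0, 1]^n_inputs."""
    columns = []
    for _ in range(n_inputs):
        # Each input gets exactly one point in each of n_points equal strata.
        strata = list(range(n_points))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n_points for s in strata])
    # Transpose: one row per design point.
    return [list(row) for row in zip(*columns)]


def min_distance(design):
    """Smallest Euclidean distance between any two design points."""
    return min(math.dist(a, b) for a, b in itertools.combinations(design, 2))


def maximin_lhs(n_points, n_inputs, n_candidates=200, seed=0):
    """Keep the candidate design with the largest minimum distance."""
    rng = random.Random(seed)
    candidates = (latin_hypercube(n_points, n_inputs, rng) for _ in range(n_candidates))
    return max(candidates, key=min_distance)


design = maximin_lhs(n_points=10, n_inputs=3)
```

Each column of `design` would then be rescaled from [0, 1] to the required range of the corresponding input.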

Creating output data from these inputs

Each row from the input design must be used to generate a single output, e.g. using
– Spreadsheet
  Simple, but requires functional form
– Script
  Only need executable code
  Loop through inputs, modify code input file
– Modify code to loop through the points
  Can be difficult, need source code

Example: using a spreadsheet

Copy the input design to the clipboard using the “Copy input design to clipboard” toolbar icon

Open Excel and paste inputs

Create formula in final column

Copy formula for all rows of the design

Cut and paste special (values) in a new sheet

Save as text file

Example: using a script

Read base input file (read by executable code)

Loop through lines of input design file
– Replace selected inputs in base input file
– Run executable code with new input file
– Calculate single output and add to training output file
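The loop above can be sketched in Python as follows. This is an illustration under stated assumptions, not a GEM-SA facility: `build_training_outputs`, `run_model` and `summarise` are hypothetical names, and the simulator is represented by a stand-in function. With a real executable, `run_model` would write the modified input file and launch the code (for example with `subprocess.run`), then read back its raw results.

```python
def build_training_outputs(design_rows, base_inputs, varied_names, run_model, summarise):
    """Run the model once per design row and collect one output per run."""
    outputs = []
    for row in design_rows:
        # Start from the base input file and replace only the selected inputs.
        model_inputs = dict(base_inputs)
        model_inputs.update(zip(varied_names, row))
        raw_results = run_model(model_inputs)
        # Reduce the raw model results to the single training output.
        outputs.append(summarise(raw_results))
    return outputs


def fake_model(inputs):
    """Stand-in for the executable code: returns two raw results."""
    return [inputs["a"] + inputs["b"], inputs["a"] * inputs["c"]]


design = [[1.0, 2.0], [3.0, 4.0]]             # two design points for inputs a, b
base = {"a": 0.0, "b": 0.0, "c": 10.0}        # c is held at its base value
training_outputs = build_training_outputs(
    design, base, ["a", "b"], fake_model, summarise=lambda raw: raw[0]
)

# One line per training point, ready to save as the outputs text file.
output_file_text = "\n".join(str(y) for y in training_outputs)
```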

The project window

Appears whenever you
– Load a project
– Edit a project
– Create new project

This window has 3 tabs
– Files
– Options
– Simulations

[Screenshot of the project window, with callouts: names for the input files; names for the output files]

[Screenshot of the project window, with callouts: how many inputs?; what are the input names?; which column from output file?]

[Screenshot of the project window, with callouts: which joint effects should be calculated?; what should be calculated, and how?]

[Screenshot of the project window, with callouts: are the inputs uncertain?; what prior mean for the output?]

[Screenshot of the project window, with callouts: what kind of prediction?; what kind of cross validation?]

[Screenshot of the project window, with callouts: MCMC control parameters; how many points used to calculate main effects, joint effects; how many realisations of predictions, main and joint effects to generate]

Part 2: Uncertainty Analysis Using GEM-SA

Part 2: Outline

Setting up the project

Running a simple analysis

More complex analyses

Setting up the project

Create a new project

Select Project -> New, or click the “New project” toolbar icon

Project dialog appears

We’ll specify the data files first

Files

The “Inputs” file contains one column for each parameter and one row for each model training run (the design)

The “Outputs” file contains the outputs from those runs (one column, in this example)

Using “Browse” buttons, select input and output files

Our example

We’ll use the example “model1” in the GEM-SA DEMO DATA directory

This example is based on a vegetation model with 7 inputs
– RESAEREO, DEFLECT, FACTOR, MO, COVER, TREEHT, LAI

The model has 16 outputs, but for the present we will consider output 4
– June monthly GPP

Number of inputs

Click on Options tab

Select the number of inputs directly, or click “From Inputs File”

Define input names

Click on “Names …”

The “Input parameter names” dialog opens

Enter parameter names

Click “OK”

Complete the project

We will leave all other settings at their default values for now

Click “OK”

The Input Parameter Ranges window appears

Close and save project

Click “Defaults from input ranges” button

Click “OK”

Select Project -> Save
– Or click the “Save project” toolbar icon

Choose a name and click “Save”

Running a simple analysis

Build the emulator

Click the “Run the analysis” toolbar icon to build the emulator

A lot of things now start to happen!
– The log window at the bottom starts to record various bits of information
– A little window appears showing progress of minimisation of the roughness parameter estimation criterion
– A new window appears in the “Main Effects” tab and several graphs appear
– Progress bar at the bottom

Focus on the log window

Ignore the outputs in the “Main Effects” and “Sensitivity Analysis” windows for now
– These will be explained later

Focus on the log window

This reports two key things
– Diagnostics of the emulator build
– The basic uncertainty analysis results

These also appear in the “Output Summary” window and can be printed using the “Print output report” toolbar icon

Emulation diagnostics

Note where the log window reports

    Estimating emulator parameters by maximising probability distribution...
    maximised posterior for emulator parameters: sigma-squared = 0.342826, roughness = 0.217456 0.0699709 0.191557 16.9933 0.599439 0.459675 1.01559

The first line says roughness parameters have been estimated by the simplest method

The values of these indicate how non-linear the effect of each input parameter is
– Note the high value for input 4 (MO)

Uncertainty analysis – mean

Below this, the log reports

    Estimate of mean output is 24.145, with variance 0.00388252

So the best estimate of the output (June GPP) is 24.1 (mol C/m2)
– This is averaged over the uncertainty in the 7 inputs
  Better than just fixing inputs at best estimates
– There is an emulation standard error of 0.062 in this figure

Uncertainty analysis – variance

The final line of the log is

    Estimate of total output variance = 73.9033

This shows the uncertainty in the model output that is induced by input uncertainties
– The variance is 73.9
– Equal to a standard deviation of 8.6
– So although the best estimate of the output is 24.1, the uncertainty in inputs means it could easily be as low as 16 or as high as 33
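The figures quoted from the log can be reproduced with a few lines of arithmetic. The Python sketch below only uses the three numbers reported in the log; reading “as low as 16 or as high as 33” as the mean plus or minus one standard deviation is an interpretation, although it matches the quoted values after rounding.

```python
import math

# Values copied from the log window.
mean_estimate = 24.145
emulation_variance = 0.00388252      # uncertainty about the mean due to emulation
total_output_variance = 73.9033      # output uncertainty induced by the inputs

emulation_se = math.sqrt(emulation_variance)      # about 0.062
output_sd = math.sqrt(total_output_variance)      # about 8.6

# Mean plus or minus one standard deviation of the output.
low = mean_estimate - output_sd
high = mean_estimate + output_sd

print(f"emulation standard error: {emulation_se:.3f}")
print(f"output standard deviation: {output_sd:.1f}")
print(f"one-sd interval: {low:.1f} to {high:.1f}")
```

The emulation standard error is tiny compared with the output standard deviation, so almost all of the uncertainty here comes from the inputs rather than from the emulator.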

More complex analyses

Input distributions

Default is to assume the uncertainty in each input is represented by a uniform distribution
– Range determined by the range of values found in the input file, or input manually

A normal (Gaussian) distribution is generally a more realistic representation of uncertainty
– Range unbounded
– More probability in the middle

Changing input distributions

In Project dialog, Options tab, click the button for “All unknown, product normal”

Then OK

A new dialog opens to specify means and variances

Model 1 example

Uniform distributions from input ranges

Normal distributions to match
– Range is 4 std devs
– Except for MO: narrower distribution

Parameter   Uniform lower   Uniform upper   Normal mean   Normal variance
RESAEREO    80              200             140           900
DEFLECT     0.6             1               0.8           0.01
FACTOR      0.1             0.5             0.3           0.01
MO          30              100             60            100
COVER       0.6             0.99            0.8           0.01
TREEHT      10              40              25            100
LAI         3.75            9               6.5           1
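The “range is 4 std devs” rule can be written as a short calculation. The Python sketch below is illustrative only (the function name `normal_from_range` is hypothetical): it takes the midpoint of the uniform range as the mean and a quarter of the range as the standard deviation. It reproduces the RESAEREO, DEFLECT and FACTOR rows; the other rows of the table differ from the rule, and appear to have been rounded or chosen by hand (MO deliberately so).

```python
def normal_from_range(lower, upper, sds_per_range=4.0):
    """Mean and variance of a normal whose range spans sds_per_range std devs."""
    mean = (lower + upper) / 2.0
    sd = (upper - lower) / sds_per_range
    return mean, sd * sd


resaereo_mean, resaereo_var = normal_from_range(80, 200)    # 140, 900
deflect_mean, deflect_var = normal_from_range(0.6, 1.0)     # about 0.8, 0.01
factor_mean, factor_var = normal_from_range(0.1, 0.5)       # about 0.3, 0.01

# For MO the rule would give mean 65 and variance 306.25;
# the table uses a narrower distribution (mean 60, variance 100).
mo_mean, mo_var = normal_from_range(30, 100)
```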

Effect on UA

After running the revised model, we see:
– It runs faster, with no need to rebuild the emulator
– The mean is changed a little and variance is halved

The emulator fit is unchanged

    Estimate of mean output is 26.2698, with variance 0.00784475
    Estimate of total output variance = 38.1319

Reducing the MO uncertainty further

If we reduce the variance of MO even more, to 49:
– UA mean changes a little more and variance reduces again
– Notice also how the emulation uncertainty has increased (0.004 for uniform)
– This is because the design points cover the new ranges less thoroughly

    Estimate of mean output is 26.3899, with variance 0.0108792
    Estimate of total output variance = 27.1335

Cross-validation

In the Project dialog, look at the bottom menu box, labelled “Cross-validation”

There are 3 options
– None
– Leave-one-out
– Leave final 20% out

CV is a way of checking the emulator fit
– Default is None because CV takes time

Leave-one-out CV

After estimating roughness and other parameters, GEM predicts each training run point using only the remaining n-1 points

Results appear in the log window; the root mean-squared standardised error should be close to 1

    Cross Validation Root Mean-Squared Error = 0.907869
    Cross Validation Root Mean-Squared Relative Error = 4.34773 percent
    Cross Validation Root Mean-Squared Standardised Error = 1.15273
    Largest standardised error is 4.32425 for data point 61
    Cross Validation variances range from 0.18814 to 3.92191
    Written cross-validation means to file cvpredmeans.txt
    Written cross-validation variances to file cvpredvars.txt
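The cross-validation summaries can be recomputed from the predictive means and variances that GEM-SA writes out. The slides do not give GEM-SA's exact formulas, so the Python sketch below uses the usual definitions as an assumption: errors are observed minus predicted, relative errors are divided by the observed value, and standardised errors are divided by the predictive standard deviation. The function name and the tiny data set are made up for illustration.

```python
import math


def cv_summary(observed, predicted_means, predicted_variances):
    """Cross-validation summaries under the assumed (standard) definitions."""
    n = len(observed)
    errors = [y - m for y, m in zip(observed, predicted_means)]
    standardised = [e / math.sqrt(v) for e, v in zip(errors, predicted_variances)]
    # 0-based index of the point with the largest absolute standardised error.
    worst = max(range(n), key=lambda i: abs(standardised[i]))
    return {
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "rms_relative_percent": 100.0
        * math.sqrt(sum((e / y) ** 2 for e, y in zip(errors, observed)) / n),
        "rms_standardised": math.sqrt(sum(z * z for z in standardised) / n),
        "largest_standardised": abs(standardised[worst]),
        "worst_index": worst,
    }


# Made-up example: two held-out points.
summary = cv_summary(
    observed=[10.0, 20.0],
    predicted_means=[11.0, 18.0],
    predicted_variances=[4.0, 1.0],
)
```

If the emulator's predictive variances are well calibrated, standardised errors behave roughly like standard normal draws, which is why a root mean-squared standardised error close to 1 is the target.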

Leave out final 20% CV

This is an even better check, because it tests the emulator on data that have not been used in any way to build it

Emulator is built on first 80% of data and used to predict last 20%

    Cross Validation Root Mean-Squared Error = 1.46954
    Cross Validation Root Mean-Squared Relative Error = 7.4922 percent
    Cross Validation Root Mean-Squared Standardised Error = 1.73675
    Largest standardised error is 5.05527 for data point 22
    Cross Validation variances range from 0.277304 to 4.88653
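The 80/20 split itself is simple to sketch. The Python function below is a hypothetical illustration (how GEM-SA rounds the split when the number of runs is not a multiple of 5 is not stated in the slides); with 120 training runs it builds on the first 96 and holds out the last 24.

```python
def split_final_fraction(rows, holdout_fraction=0.2):
    """Split rows into (build set, held-out set), holding out the final rows."""
    # Rounding rule is an assumption; at least one row is always held out.
    n_holdout = max(1, round(len(rows) * holdout_fraction))
    return rows[:-n_holdout], rows[-n_holdout:]


runs = list(range(1, 121))            # run numbers 1..120
build_runs, holdout_runs = split_final_fraction(runs)
```

Because the held-out points are always the final rows of the files, the order of the rows in the training data matters for this check.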

Other options

There are various other options associated with the emulator building that we have not dealt with

But we’ve done the main things that should be considered in practice

And it’s enough to be going on with!

When it all goes wrong

How do we know when the emulator is not working?
– Large roughness parameters
  Especially ones hitting the limit of 99
– Large emulation variance on UA mean
– Poor CV standardised prediction error
  Especially when some are extremely large

In such cases, see if a larger training set helps
– Other ideas like transforming output scale

Part 3: Sensitivity Analysis in GEM-SA

Example

Again we use the ForestETP vegetation model
– 7 input parameters
– 120 model runs

Objective: conduct a variance-based sensitivity analysis to identify which uncertain inputs are driving the output uncertainty.

Exploratory scatter plots

Sensitivity Analysis Walkthrough

1. Project -> New

2. Click “Browse” for the Inputs File
– From the GEM-SA Demo Data/Model1/ folder, select “emulator7x120inputs.txt”

3. Click “Browse” for the Outputs File
– From the GEM-SA Demo Data/Model1/ folder, select “out11.txt”

4. Select the Options tab

Sensitivity Analysis Walkthrough

5. Change the Number of Inputs to 7.

6. Leave the other options unchanged
– Input uncertainty options: All unknown, uniform
– Prior mean options: Linear term for each input
– Generate predictions as: function realisations (correlated points)

Sensitivity Analysis Walkthrough

7. Click OK

8. Select “Default from input ranges” then OK

9. Project -> Run, or use the “Run the analysis” toolbar icon

Main effect plots

[Figure: main effect plot for input X6, with annotations:]

Fixing X6 = 18, this point shows the expected value of the output (obtained by averaging over all other inputs).

Simply fixing all the other inputs at their central values and comparing X6=10 with X6=40 would underestimate the influence of this input

(The thickness of the band shows emulator uncertainty)
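The difference between averaging over the other inputs and fixing them at central values can be seen on a toy example. The Python sketch below is not the vegetation model: `toy_model` is a made-up function of two inputs, both uniform on [0, 1], chosen so that the effect of the first input depends on the second. The average over the second input is taken on a fine midpoint grid rather than by emulation.

```python
def toy_model(x1, x2):
    """Made-up model in which the effect of x1 is scaled by x2 squared."""
    return x1 * x2 ** 2


def main_effect(x1, n_grid=1000):
    """Expected output with x1 fixed, averaging over x2 ~ uniform(0, 1)."""
    x2_values = [(i + 0.5) / n_grid for i in range(n_grid)]
    return sum(toy_model(x1, x2) for x2 in x2_values) / n_grid


# Change in the output when x1 moves across its whole range...
averaged_change = main_effect(1.0) - main_effect(0.0)            # about 1/3
# ...compared with the change seen when x2 is pinned at its central value.
central_change = toy_model(1.0, 0.5) - toy_model(0.0, 0.5)       # exactly 0.25
```

In this toy case the central-value comparison gives 0.25, whereas the averaged main effect changes by about 0.33, so pinning the other input understates the influence of the first one.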

Variance of main effects

Main effects for each input. Input 6 has the greatest individual contribution to the variance

Main effects sum to 66.8% of the total variance

Interactions and total effects

Main effects explain 2/3 of the variance
– Model must contain interactions

Any input can have small main effect, but large interaction effect, so overall this input is still ‘important’

Can ask GEM-SA to compute all pair-wise interaction effects
– 435 in total for a 30 input model: can take some time!
– Useful to know what to look for
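The count of 435 is the number of ways of choosing 2 inputs from 30. A short Python check, which also shows how quickly the count shrinks when only a subset of inputs is included:

```python
import math

pairs_30_inputs = math.comb(30, 2)   # every pair in a 30-input model
pairs_7_inputs = math.comb(7, 2)     # every pair in a 7-input model
pairs_4_inputs = math.comb(4, 2)     # pairs among a chosen subset of 4 inputs

print(pairs_30_inputs, pairs_7_inputs, pairs_4_inputs)
```

This is why it pays to use the main and total effects first to decide which inputs are worth including in the joint effects.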

Interactions and total effects

For each input Xi

Total effect of Xi = main effect for Xi + all interactions involving Xi

Total effect >> main effect implies interactions in the model

So for any input with large total effect relative to the main effect
– investigate possible interactions involving that input
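The relationship between main, interaction and total effects can be illustrated with a few made-up numbers. In the Python sketch below the variance percentages are invented, the function name is hypothetical, and only pair-wise interactions are included, so the result is a lower bound on the total effect whenever higher-order interactions are present.

```python
def total_effects(main_effects, pair_interactions):
    """Add each pair-wise interaction to the totals of both inputs involved."""
    totals = dict(main_effects)
    for (first, second), share in pair_interactions.items():
        totals[first] += share
        totals[second] += share
    return totals


# Invented percentages of total output variance, for illustration only.
main = {"X1": 40.0, "X2": 20.0, "X3": 5.0}
interactions = {("X2", "X3"): 10.0}

totals = total_effects(main, interactions)

# Flag inputs whose total effect clearly exceeds the main effect.
suspects = [name for name in main if totals[name] > 1.5 * main[name]]
```

Here X3 has a small main effect (5%) but a total effect of 15%, so it would be flagged as a candidate for investigating interactions; the 1.5 threshold is arbitrary.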

Interactions and total effects

Total effects for inputs 4 and 7 are much larger than their main effects. This implies the presence of interactions

Interaction effects

10. Project -> Edit, or use the “Edit project” toolbar icon

11. In Options tab, tick calculate joint effects

12. De-select all inputs under “Inputs to include in joint effects”, then select X4, X5, X6, X7

Interaction effects

13. Click OK, then OK again

14. Project -> Run, or use the “Run the analysis” toolbar icon

Interaction effects

Note interactions involving inputs 4 and 7

Main effects and selected interactions now sum to almost 92% of the total variance

Exercise

1. Set up a new project using SAex1_inputs.txt for the inputs and SAex1_outputs.txt for the output
– 8 input parameters (uniform on [0,1])
– 100 model runs

2. Estimate the main effects only for this model and identify the influential input variables

3. By comparing main effects with total effects, can you spot any interactions?

4. Estimate any suspected interactions to test your intuition!