Getting started with GEM-SA Marc Kennedy Central Science Laboratory, York Tony O’Hagan, Jeremy...
Getting started with GEM-SA
Marc Kennedy, Central Science Laboratory, York
Tony O’Hagan, Jeremy Oakley, University of Sheffield
Part 1: Getting started
– Starting the GEM-SA program
– Creating input and output files
– Explanation of the menus, toolbars, etc.
– Description of the project window
Starting GEM-SA
Double-click the GEM-SA icon to start. The main window appears, with:
– Menu
– Toolbar
– Sensitivity analysis output grid
– Tab windows for other types of output
– Log window
[Screenshot: main window, with the menu, toolbar, log window and sensitivity analysis output grid labelled]
Toolbar icons
New project
Open project
Save project
Print output report
Edit project
Generate input design points
Rescale an input
Standardise design
Copy input design to clipboard
Convert input to integer
Run the analysis
Help
Sensitivity analysis output grid
This reports the sensitivity results after the analysis is complete:
– One line for each input parameter
– One line for each pair of inputs, if joint effects are selected
Log Window output
Tells us:
– Which training data are being loaded/saved
– Transformations applied to the data
– Fitted Gaussian process parameters
– Summary of uncertainty analysis results
Creating a GEM project
To build the emulator we first need 3 files:
– Data file of code inputs
– Data file of code outputs
– GEM-SA project file
Restrictions on input/output data
Single output
– Multiple outputs must be treated individually
– GEM can read a file containing multiple outputs, but a single column is specified within a project
Max 30 input parameters
Max 400 training points
The data files should be plain text files
– One line for each point
– Input file can be space or tab delimited
Generating a new input design
Designs can be generated using the toolbar icon or the menu: Input -> Generate…
The design dialog appears
Click OK and fill in the required range for each input
Click OK again
Editing input designs
If you select a column, you can rescale values of that input or round values to be integers
Designs can be loaded into or saved from this window using the Inputs menu. Use the toolbar icon to copy the points to the clipboard for use in other programs
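A sketch of what rescaling and rounding a column amount to, assuming the rescale is a linear map from an old range (here [0, 1] by default) onto a new one; GEM-SA's own dialog may define the mapping differently. Both function names are hypothetical.

```python
def rescale_column(rows, col, new_lo, new_hi, old_lo=0.0, old_hi=1.0):
    """Return a copy of rows with column col mapped linearly from [old_lo, old_hi] to [new_lo, new_hi]."""
    scale = (new_hi - new_lo) / (old_hi - old_lo)
    out = [list(row) for row in rows]
    for row in out:
        row[col] = new_lo + (row[col] - old_lo) * scale
    return out


def round_column(rows, col):
    """Return a copy of rows with column col rounded to integers (Python's round sends halves to even)."""
    out = [list(row) for row in rows]
    for row in out:
        row[col] = int(round(row[col]))
    return out


design = [[0.0, 0.25], [0.5, 0.5], [1.0, 0.75]]
print(rescale_column(design, 0, 10, 40))  # [[10.0, 0.25], [25.0, 0.5], [40.0, 0.75]]
```

Both helpers return new lists, so the original design is left untouched.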
Types of design
GEM-SA can generate 2 types of design:
– LP-τ
– Maximin Latin Hypercube (MmLH) designs
Both have good space-filling properties
– Ensure all regions of the input space are well represented
LP-τ designs are quick to generate, and good for increasing the input design sequentially
MmLH designs can be better in high dimensions
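A rough illustration of the maximin idea, assuming a simple random-search construction: generate many random Latin hypercubes in [0, 1]^d and keep the one whose closest pair of points is furthest apart. This is not necessarily GEM-SA's algorithm, and LP-τ sequences are not shown.

```python
import math
import random


def random_lhs(n, d, rng):
    """One random Latin hypercube: n points in [0, 1]^d, one point in each of n strata per input."""
    columns = []
    for _ in range(d):
        strata = list(range(n))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n for s in strata])
    return [list(point) for point in zip(*columns)]


def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two points in the design."""
    return min(math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:])


def maximin_lhs(n, d, tries=200, seed=0):
    """Keep the random Latin hypercube whose closest pair of points is furthest apart."""
    rng = random.Random(seed)
    best, best_score = None, -1.0
    for _ in range(tries):
        candidate = random_lhs(n, d, rng)
        score = min_pairwise_distance(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


design = maximin_lhs(n=20, d=7)   # e.g. 20 runs of a 7-input model
```

The points lie in the unit cube; each column would then be rescaled to that input's range before running the model.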
Creating output data from these inputs
Each row from the input design must be used to generate a single output, e.g. using:
– Spreadsheet
  Simple, but requires functional form
– Script
  Only need executable code
  Loop through inputs, modify code input file
– Modify code to loop through the points
  Can be difficult, need source code
Example: using a spreadsheet
Copy the input design to the clipboard using the toolbar icon
Open Excel and paste inputs
Create formula in final column
Copy formula for all rows of the design
Copy, then Paste Special (values) into a new sheet
Save as text file
Example: using a script
Read base input file (read by executable code)
Loop through lines of input design file:
– Replace selected inputs in base input file
– Run executable code with new input file
– Calculate single output and add to training output file
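The script steps above can be sketched in Python. Everything model-specific here is a hypothetical assumption: the base input file is treated as a template with placeholders such as {DEFLECT}, and the executable `./mymodel` is assumed to read `model_input.txt` and print one number.

```python
import subprocess
from pathlib import Path


def make_input_text(template, values):
    """Fill placeholders such as {DEFLECT} in the base input file (hypothetical format)."""
    return template.format(**values)


def run_model(input_text, workdir):
    """Hypothetical runner: write model_input.txt, run ./mymodel, read one number from stdout."""
    workdir = Path(workdir)
    (workdir / "model_input.txt").write_text(input_text)
    result = subprocess.run(["./mymodel", "model_input.txt"], cwd=workdir,
                            capture_output=True, text=True, check=True)
    return float(result.stdout.split()[0])


def build_training_outputs(design_path, names, template, outputs_path, runner):
    """Run the model once per design row and write one output value per line."""
    outputs = []
    for line in Path(design_path).read_text().splitlines():
        if not line.strip():
            continue
        values = dict(zip(names, (float(tok) for tok in line.split())))
        outputs.append(runner(make_input_text(template, values)))
    Path(outputs_path).write_text("\n".join(repr(y) for y in outputs) + "\n")
    return outputs
```

In use, `runner` would be something like `lambda text: run_model(text, "run_dir")`. Passing it as a parameter keeps the loop testable without the real executable. Note that `str.format` treats any literal braces in the base file as placeholders, so a real template may need a different substitution scheme.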
The project window
Appears whenever you:
– Load a project
– Edit a project
– Create a new project
This window has 3 tabs:
– Files
– Options
– Simulations
[Screenshots of the project window tabs, with callouts:]
Names for the input files
Names for the output files
How many inputs?
What are the input names?
Which column from output file?
Which joint effects should be calculated?
What should be calculated, and how?
Are the inputs uncertain?
What prior mean for the output?
What kind of prediction?
What kind of cross validation?
MCMC control parameters
How many points used to calculate main effects, joint effects
How many realisations of predictions, main and joint effects to generate
Part 2: Uncertainty Analysis Using GEM-SA
Part 2: Outline
Setting up the project
Running a simple analysis
More complex analyses
Setting up the project
Create a new project
Select Project -> New, or click the toolbar icon
Project dialog appears
We’ll specify the data files first
Files
The “Inputs” file contains one column for each parameter and one row for each model training run (the design)
The “Outputs” file contains the outputs from those runs (one column, in this example)
Using “Browse” buttons, select input and output files
Our example
We’ll use the example “model1” in the GEM-SA DEMO DATA directory
This example is based on a vegetation model with 7 inputs:
– RESAEREO, DEFLECT, FACTOR, MO, COVER, TREEHT, LAI
The model has 16 outputs, but for the present we will consider output 4:
– June monthly GPP
Number of inputs
Click on Options tab
Select the number of inputs, or click “From Inputs File”
Define input names
Click on “Names …”
The “Input parameter names” dialog opens
Enter parameter names
Click “OK”
Complete the project
We will leave all other settings at their default values for now
Click “OK”
The Input Parameter Ranges window appears
Close and save project
Click “Defaults from input ranges” button
Click “OK”
Select Project -> Save
– Or click the toolbar icon
Choose a name and click “Save”
Running a simple analysis
Build the emulator
Click the Run toolbar icon to build the emulator. A lot of things now start to happen!
– The log window at the bottom starts to record various bits of information
– A little window appears showing progress of minimisation of the roughness parameter estimation criterion
– A new window appears in the “Main Effects” tab and several graphs appear
A progress bar appears at the bottom
Focus on the log window
Ignore the outputs in the “Main Effects” and “Sensitivity Analysis” windows for now
– These will be explained later
Focus on the log window. This reports two key things:
– Diagnostics of the emulator build
– The basic uncertainty analysis results
These also appear in the “Output Summary” window and can be printed using the toolbar icon
Emulation diagnostics
Note where the log window reports …
The first line says roughness parameters have been estimated by the simplest method
The values of these indicate how non-linear the effect of each input parameter is
– Note the high value for input 4 (MO)
Estimating emulator parameters by maximising probability distribution...
maximised posterior for emulator parameters: sigma-squared = 0.342826, roughness = 0.217456 0.0699709 0.191557 16.9933 0.599439 0.459675 1.01559
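To see what a roughness value of 16.99 means compared with values below 1, here is a sketch assuming the common Gaussian correlation form c(x, x') = exp(-Σ b_i (x_i − x'_i)²), with roughness b_i and inputs on a [0, 1] scale. Both the form and the scaling are assumptions to check against the GEM-SA documentation.

```python
import math

# Roughness estimates copied from the log above, inputs in the order listed earlier
ROUGHNESS = {"RESAEREO": 0.217456, "DEFLECT": 0.0699709, "FACTOR": 0.191557,
             "MO": 16.9933, "COVER": 0.599439, "TREEHT": 0.459675, "LAI": 1.01559}


def correlation(x, y, b):
    """Assumed Gaussian correlation: exp(-sum_i b_i * (x_i - y_i)^2)."""
    return math.exp(-sum(bi * (xi - yi) ** 2 for xi, yi, bi in zip(x, y, b)))


def correlation_for_one_input(name, step):
    """Correlation between two points that differ only in input `name`, by `step`."""
    names = list(ROUGHNESS)
    b = [ROUGHNESS[n] for n in names]
    x = [0.5] * len(names)
    y = list(x)
    y[names.index(name)] += step
    return correlation(x, y, b)


for name in ROUGHNESS:
    print(f"{name:9s} {correlation_for_one_input(name, 0.25):.3f}")
```

Under these assumptions, two runs a quarter of the range apart in MO are only weakly correlated (about 0.35), while for DEFLECT the correlation stays above 0.99: the output changes much more rapidly with MO.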
Uncertainty analysis – mean
Below this, the log reports
So the best estimate of the output (June GPP) is 24.1 (mol C/m²)
– This is averaged over the uncertainty in the 7 inputs
– Better than just fixing inputs at best estimates
– There is an emulation standard error of 0.062 in this figure
Estimate of mean output is 24.145, with variance 0.00388252
Uncertainty analysis – variance
The final line of the log is
This shows the uncertainty in the model output that is induced by input uncertainties
– The variance is 73.9
– Equal to a standard deviation of 8.6
– So although the best estimate of the output is 24.1, the uncertainty in inputs means it could easily be as low as 16 or as high as 33
Estimate of total output variance = 73.9033
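The arithmetic behind these statements, using only the two log lines: the emulation standard error is the square root of the variance of the mean estimate, and one output standard deviation either side of the mean gives the "16 to 33" range.

```python
import math

mean_estimate = 24.145          # "Estimate of mean output is 24.145"
emulation_variance = 0.00388252 # emulator uncertainty about that mean
output_variance = 73.9033       # variance induced by input uncertainty

emulation_se = math.sqrt(emulation_variance)
output_sd = math.sqrt(output_variance)
low, high = mean_estimate - output_sd, mean_estimate + output_sd

print(f"emulation standard error = {emulation_se:.3f}")
print(f"output standard deviation = {output_sd:.2f}")
print(f"mean +/- 1 sd: {low:.1f} to {high:.1f}")
```

This gives a standard error of about 0.062, a standard deviation of about 8.6, and a range of roughly 15.5 to 32.7, which the slide rounds to 16 and 33.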
More complex analyses
Input distributions
A normal (Gaussian) distribution is generally a more realistic representation of uncertainty
– Range unbounded
– More probability in the middle
Default is to assume the uncertainty in each input is represented by a uniform distribution
– Range determined by the range of values found in the input file, or input manually
Changing input distributions
In the Project dialog, Options tab, click the button for “All unknown, product normal”
Then click OK. A new dialog opens to specify means and variances
Model 1 example
Uniform distributions from input ranges
Normal distributions to match
– Range is roughly 4 std devs for most inputs
– Except for MO, which has a narrower distribution
Parameter  Uniform lower  Uniform upper  Normal mean  Normal variance
RESAEREO 80 200 140 900
DEFLECT 0.6 1 0.8 0.01
FACTOR 0.1 0.5 0.3 0.01
MO 30 100 60 100
COVER 0.6 0.99 0.8 0.01
TREEHT 10 40 25 100
LAI 3.75 9 6.5 1
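The "range equals 4 standard deviations" rule can be sketched as: mean at the midpoint, standard deviation equal to a quarter of the range. This is only the rule of thumb from the slide, applied to the uniform ranges in the table.

```python
def normal_from_range(lower, upper, n_sd=4.0):
    """Mean at the midpoint; the range spans n_sd standard deviations. Returns (mean, variance)."""
    mean = (lower + upper) / 2
    sd = (upper - lower) / n_sd
    return mean, sd ** 2


RANGES = {"RESAEREO": (80, 200), "DEFLECT": (0.6, 1.0), "FACTOR": (0.1, 0.5),
          "MO": (30, 100), "COVER": (0.6, 0.99), "TREEHT": (10, 40), "LAI": (3.75, 9)}

for name, (lo, hi) in RANGES.items():
    mean, var = normal_from_range(lo, hi)
    print(f"{name:9s} mean = {mean:g}, variance = {var:g}")
```

For RESAEREO, DEFLECT, FACTOR and (after rounding) COVER this reproduces the table. For MO, TREEHT and LAI the table's values differ from what the rule gives; MO is the deliberately narrower case.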
Effect on UA
After running the revised model, we see:
– It runs faster, with no need to rebuild the emulator
– The mean is changed a little and the variance is roughly halved
The emulator fit is unchanged
Estimate of mean output is 26.2698, with variance 0.00784475
Estimate of total output variance = 38.1319
Reducing the MO uncertainty further
If we reduce the variance of MO even more, to 49:
– UA mean changes a little more and variance reduces again
– Notice also how the emulation variance of the UA mean has increased (0.004 for uniform)
– This is because the design points cover the new ranges less thoroughly
Estimate of mean output is 26.3899, with variance 0.0108792
Estimate of total output variance = 27.1335
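Putting the three runs side by side on a standard-deviation scale makes the trade-off clearer. The numbers are copied from the log lines quoted in this part; only square roots are taken.

```python
import math

# (emulation variance of the UA mean, total output variance) from the log lines
RUNS = {
    "uniform inputs": (0.00388252, 73.9033),
    "normal inputs": (0.00784475, 38.1319),
    "normal, MO variance 49": (0.0108792, 27.1335),
}

for label, (emulation_var, output_var) in RUNS.items():
    print(f"{label:24s} emulation s.e. = {math.sqrt(emulation_var):.3f}   "
          f"output s.d. = {math.sqrt(output_var):.2f}")
```

As the input distributions narrow, the output standard deviation falls (about 8.6, 6.2, 5.2), while the emulation standard error of the mean rises (about 0.062, 0.089, 0.104).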
Cross-validation
In the Project dialog, look at the bottom menu box, labelled “Cross-validation”
There are 3 options:
– None
– Leave-one-out
– Leave final 20% out
CV is a way of checking the emulator fit
– Default is None because CV takes time
Cross Validation Root Mean-Squared Error = 0.907869
Cross Validation Root Mean-Squared Relative Error = 4.34773 percent
Cross Validation Root Mean-Squared Standardised Error = 1.15273
Largest standardised error is 4.32425 for data point 61
Cross Validation variances range from 0.18814 to 3.92191
Written cross-validation means to file cvpredmeans.txt
Written cross-validation variances to file cvpredvars.txt
Leave-one-out CV
After estimating roughness and other parameters, GEM predicts each training run point using only the remaining n-1 points
Results appear in the log window. The root mean-squared standardised error should be close to 1
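A sketch of how summary statistics like those in the log could be computed from the observed outputs and the CV predictive means and variances. The exact formulas GEM-SA uses are an assumption here. The standardised error divides each error by its predictive standard deviation, so a root mean square near 1 means the emulator's stated uncertainty matches its actual errors.

```python
import math


def cv_summary(observed, pred_means, pred_vars):
    """Assumed definitions of the CV statistics reported in the log."""
    n = len(observed)
    errors = [y - m for y, m in zip(observed, pred_means)]
    std_errors = [e / math.sqrt(v) for e, v in zip(errors, pred_vars)]
    worst = max(range(n), key=lambda i: abs(std_errors[i]))
    return {
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "rel_rmse_percent": 100 * math.sqrt(sum((e / y) ** 2 for e, y in zip(errors, observed)) / n),
        "std_rmse": math.sqrt(sum(z * z for z in std_errors) / n),
        "largest_std_error": abs(std_errors[worst]),
        "largest_at_point": worst + 1,  # counted from 1 (an assumption)
        "var_range": (min(pred_vars), max(pred_vars)),
    }
```

The predictive means and variances could come from the cvpredmeans.txt and cvpredvars.txt files named in the log, although their layout is not described here.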
Leave out final 20% CV
This is an even better check, because it tests the emulator on data that have not been used in any way to predict it
Emulator is built on first 80% of data and used to predict last 20%
Cross Validation Root Mean-Squared Error = 1.46954
Cross Validation Root Mean-Squared Relative Error = 7.4922 percent
Cross Validation Root Mean-Squared Standardised Error = 1.73675
Largest standardised error is 5.05527 for data point 22
Cross Validation variances range from 0.277304 to 4.88653
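The split itself is simple: the runs are kept in file order, the first 80% train the emulator, and the final 20% are held out. How GEM-SA rounds when 20% is not a whole number is not stated; this sketch uses Python's `round`.

```python
def split_final_fraction(rows, test_fraction=0.2):
    """Split runs into the first (1 - test_fraction) and the final test_fraction, in file order."""
    n_test = round(len(rows) * test_fraction)
    n_train = len(rows) - n_test
    return rows[:n_train], rows[n_train:]


train, test = split_final_fraction(list(range(120)))
print(len(train), len(test))  # 96 24
```

Because the split follows file order, a design file sorted by one of the inputs would give an unrepresentative held-out set.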
Other options
There are various other options associated with the emulator building that we have not dealt with
But we’ve done the main things that should be considered in practice
And it’s enough to be going on with!
When it all goes wrong
How do we know when the emulator is not working?
– Large roughness parameters
  Especially ones hitting the limit of 99
– Large emulation variance on UA mean
– Poor CV standardised prediction error
  Especially when some are extremely large
In such cases, see if a larger training set helps
– Other ideas like transforming output scale
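The warning signs above can be turned into a simple checklist. Only the roughness cap of 99 comes from the slide; the other thresholds (a standardised error above 3, an emulation standard error above 1% of the UA mean) are illustrative assumptions to adjust for your own model.

```python
import math


def emulator_warnings(roughness, cv_std_errors, ua_mean, ua_mean_variance,
                      roughness_cap=99.0, std_error_limit=3.0, rel_se_limit=0.01):
    """Return human-readable warnings; thresholds other than the cap of 99 are illustrative."""
    warnings = []
    for i, b in enumerate(roughness, start=1):
        if b >= 0.95 * roughness_cap:
            warnings.append(f"roughness for input {i} is at or near the limit ({b:g})")
    rel_se = math.sqrt(ua_mean_variance) / abs(ua_mean)
    if rel_se > rel_se_limit:
        warnings.append(f"emulation s.e. is {100 * rel_se:.1f}% of the UA mean")
    extreme = [i for i, z in enumerate(cv_std_errors, start=1) if abs(z) > std_error_limit]
    if extreme:
        warnings.append(f"{len(extreme)} CV standardised errors exceed {std_error_limit:g}")
    return warnings
```

With the roughness values and UA mean from the earlier log, only the CV check would fire if some standardised errors were as large as the 4.3 reported for leave-one-out.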
Part 3: Sensitivity Analysis in GEM-SA
Example
Again we use the ForestETP vegetation model:
– 7 input parameters
– 120 model runs
Objective: conduct a variance-based sensitivity analysis to identify which uncertain inputs are driving the output uncertainty.
Exploratory scatter plots
Sensitivity Analysis Walkthrough
1. Project -> New
2. Click “Browse” for the Inputs File
– From the GEM-SA Demo Data/Model1/ folder, select “emulator7x120inputs.txt”
3. Click “Browse” for the Outputs File
– From the GEM-SA Demo Data/Model1/ folder, select “out11.txt”
4. Select the Options tab
5. Change the Number of Inputs to 7.
6. Leave the other options unchanged:
– Input uncertainty options: All unknown, uniform
– Prior mean options: Linear term for each input
– Generate predictions as: function realisations (correlated points)
7. Click OK
8. Select “Defaults from input ranges”, then OK
9. Project -> Run, or use the toolbar icon
Main effect plots
Fixing X6 = 18, this point shows the expected value of the output (obtained by averaging over all other inputs).
Simply fixing all the other inputs at their central values and comparing X6=10 with X6=40 would underestimate the influence of this input
(The thickness of the band shows emulator uncertainty)
[Figure: main effect plot for input X6, with a band showing emulator uncertainty]
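A brute-force sketch of the main effect idea, using a hypothetical toy function in place of the emulator (GEM-SA works from the emulator itself). The main effect of x2 at a given value is the output averaged over the other inputs; for this toy function that average grows faster in x2 than the curve obtained by fixing the other inputs at their central values, echoing the point above.

```python
import random


def toy_model(x1, x2, x3):
    """Hypothetical stand-in for the simulator or emulator; each input uniform on [0, 1]."""
    return x1 + x2 * x3 ** 2


def main_effect(i, value, n=20000, seed=1):
    """Estimate E[f(X) | X_i = value] by Monte Carlo averaging over the other inputs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = [rng.random() for _ in range(3)]
        x[i] = value
        total += toy_model(*x)
    return total / n


for v in (0.0, 0.5, 1.0):
    averaged = main_effect(1, v)         # main effect of x2 (index 1)
    central = toy_model(0.5, v, 0.5)     # other inputs fixed at central values
    print(f"x2 = {v}: main effect {averaged:.3f}, fixed at centre {central:.3f}")
```

Analytically, the main effect is 0.5 + x2/3, whereas fixing the others at 0.5 gives 0.5 + x2/4, so the second approach understates the influence of x2.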
Variance of main effects
Main effects for each input. Input 6 has the greatest individual contribution to the variance
Main effects sum to 66.8% of the total variance
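The percentages here are variance-based main effect indices. A worked example with a hypothetical function, Y = X₁ + X₂X₃ with independent inputs uniform on [0, 1] (not the ForestETP model), shows why main effects need not sum to 100% when the model contains an interaction:

```latex
\begin{aligned}
S_i &= \frac{\operatorname{Var}\{E(Y \mid X_i)\}}{\operatorname{Var}(Y)} \\[4pt]
\operatorname{Var}(Y) &= \tfrac{1}{12} + \left(\tfrac{1}{9} - \tfrac{1}{16}\right) = \tfrac{19}{144} \\
E(Y \mid X_1) &= X_1 + \tfrac14 &&\Rightarrow\; S_1 = \frac{1/12}{19/144} = \tfrac{12}{19} \\
E(Y \mid X_2) &= \tfrac12 + \tfrac12 X_2 &&\Rightarrow\; S_2 = S_3 = \frac{1/48}{19/144} = \tfrac{3}{19} \\
S_1 + S_2 + S_3 &= \tfrac{18}{19} \approx 94.7\%
\end{aligned}
```

The missing 1/19 of the variance is the X₂–X₃ interaction.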
Interactions and total effects
Main effects explain 2/3 of the variance
– Model must contain interactions
An input can have a small main effect but a large interaction effect, in which case it is still ‘important’ overall
Can ask GEM-SA to compute all pair-wise interaction effects
– 435 in total for a 30-input model, which can take some time!
Useful to know what to look for
For each input Xi
Total effect of Xi = main effect for Xi + all interactions involving Xi
Total effect >> main effect implies interactions in the model
So for any input with a large total effect relative to its main effect
– Investigate possible interactions involving that input
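Main and total effects can also be estimated by brute-force sampling when a model is cheap to run. This sketch swaps in Monte Carlo "pick-freeze" estimators (Saltelli's for main effects, Jansen's for total effects) in place of GEM-SA's emulator-based calculation, applied to a hypothetical toy function Y = x1 + x2·x3 with inputs uniform on [0, 1]. For this function the exact main effects are 12/19, 3/19, 3/19 and the total effects 12/19, 4/19, 4/19, so inputs 2 and 3 show the "total effect larger than main effect" signature of an interaction.

```python
import random


def toy_model(x):
    """Hypothetical test function with an x2-x3 interaction."""
    return x[0] + x[1] * x[2]


def sobol_indices(f, d, n=20000, seed=0):
    """Monte Carlo main-effect (Saltelli) and total-effect (Jansen) indices for U(0,1) inputs."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(d)] for _ in range(n)]
    B = [[rng.random() for _ in range(d)] for _ in range(n)]
    fA = [f(a) for a in A]
    fB = [f(b) for b in B]
    pooled = fA + fB
    mean = sum(pooled) / len(pooled)
    var = sum((y - mean) ** 2 for y in pooled) / (len(pooled) - 1)
    main, total = [], []
    for i in range(d):
        # A with column i taken from B: shares x_i with B and every other input with A
        fABi = [f(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        main.append(sum(yb * (yab - ya) for ya, yb, yab in zip(fA, fB, fABi)) / n / var)
        total.append(sum((ya - yab) ** 2 for ya, yab in zip(fA, fABi)) / (2 * n) / var)
    return main, total


main, total = sobol_indices(toy_model, 3)
for i, (s, st) in enumerate(zip(main, total), start=1):
    print(f"x{i}: main effect {s:.3f}, total effect {st:.3f}")
```

The estimates carry Monte Carlo noise of a few hundredths at this sample size; larger `n` tightens them.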
Total effects for inputs 4 and 7 are much larger than their main effects. This implies the presence of interactions
Interaction effects
10. Project -> Edit, or use the toolbar icon
11. In Options tab, tick calculate joint effects
12. De-select all inputs under “Inputs to include in joint effects”, then select X4, X5, X6, X7
13. Click OK, then OK again
14. Project -> Run, or use the toolbar icon
Note interactions involving inputs 4 and 7
Main effects and selected interactions now sum to almost 92% of the total variance
Exercise
1. Set up a new project using SAex1_inputs.txt for the inputs and SAex1_outputs.txt for the output
– 8 input parameters (uniform on [0,1])
– 100 model runs
2. Estimate the main effects only for this model and identify the influential input variables
3. By comparing main effects with total effects, can you spot any interactions?
4. Estimate any suspected interactions to test your intuition!