Benchmarking robust regression techniques
for global energy confinement scaling in tokamaks
Geert Verdoolaege
Department of Applied Physics, Ghent University, Ghent, Belgium
Laboratory for Plasma Physics, Royal Military Academy (LPP-ERM/KMS), Brussels, Belgium
IAEA TM Fusion Data Processing, Validation and Analysis, May 30, 2017
Overview
1 Motivation
2 Geodesic least squares regression (GLS)
3 Energy confinement scaling
4 Conclusion
Regression analysis
Parametric dependencies
Validation, prediction
Ordinary least squares
Uncertainties:
All variables ($x$ and $y$)
Heterogeneous data, outliers
Model: deterministic + stochastic component
$y = \beta_0 + \beta_1 x + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma_\varepsilon^2)$
Collinearity: regularization
Power scaling laws: astronomy, biology, geology, finance, ...
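As a minimal sketch of the ordinary least squares baseline for the model y = β0 + β1 x + ε, the closed-form fit below uses only the Python standard library. The function name `ols_fit` and the data are made up for illustration; the second fit shows how a single outlier pulls the OLS slope away from the underlying trend.

```python
def ols_fit(xs, ys):
    """Closed-form ordinary least squares for y = b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Made-up data lying exactly on y = 1 + 2 x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
clean = [1.0 + 2.0 * x for x in xs]
# Same data with the last point replaced by an outlier.
with_outlier = clean[:-1] + [30.0]

b0_clean, b1_clean = ols_fit(xs, clean)
b0_out, b1_out = ols_fit(xs, with_outlier)
print(b0_clean, b1_clean)
print(b0_out, b1_out)
```

The outlier alone changes both coefficients substantially, which is the sensitivity that robust regression aims to reduce.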
Robust regression analysis
Need a robust, general-purpose regression technique that is easy to apply.
Two measurements
Zooming in...
Example 1: electron density
Example 1: electron density distribution
Example 2: inter-ELM time
Example 2: inter-ELM time distribution
Difference/distance between measurements
Euclidean distance
Which distance?
A point and a distribution
Sum of squares
Mahalanobis distance
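$p(y_i \mid x_i, \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{y_i - \mu_i}{\sigma}\right)^2\right]$ → maximum likelihood
$\mu_i = f_i(x_i, \theta)$, e.g. $\mu_i = \beta_0 + \beta_1 x_i$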
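The step from the Gaussian likelihood to a sum of squares can be written out explicitly. The following is a sketch assuming $n$ independent measurements that share one known $\sigma$; the symbol $n$ and the independence assumption are introduced here for illustration.

```latex
-\ln \prod_{i=1}^{n} p(y_i \mid x_i, \theta)
  = n \ln\!\left(\sqrt{2\pi}\,\sigma\right)
  + \frac{1}{2} \sum_{i=1}^{n} \left(\frac{y_i - \mu_i}{\sigma}\right)^{2}
```

The first term does not depend on $\theta$, so maximizing the likelihood over $\theta$ amounts to minimizing the sum of squared, $\sigma$-scaled residuals $(y_i - \mu_i)/\sigma$, each of which is a one-dimensional Mahalanobis distance between the measurement and the model prediction.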
Mahalanobis distance
Telling cats from dogs
Rao geodesic distance
Information geometry
The Gaussian probability space
Pseudosphere model
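For two univariate Gaussian distributions the Rao geodesic distance has a known closed form under the Fisher information metric. The sketch below implements that closed form; the function name `rao_distance` and the numerical values are illustrative and not taken from the slides.

```python
import math


def rao_distance(mu1, sigma1, mu2, sigma2):
    """Rao geodesic distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Closed form for univariate Gaussians under the Fisher information metric.
    """
    if sigma1 <= 0.0 or sigma2 <= 0.0:
        raise ValueError("standard deviations must be positive")
    dmu2 = (mu1 - mu2) ** 2
    numerator = dmu2 + 2.0 * (sigma1 - sigma2) ** 2
    denominator = dmu2 + 2.0 * (sigma1 + sigma2) ** 2
    delta = math.sqrt(numerator / denominator)  # always in [0, 1)
    return 2.0 * math.sqrt(2.0) * math.atanh(delta)


# Equal means: the distance reduces to sqrt(2) * |ln(sigma2 / sigma1)|.
d_scale = rao_distance(0.0, 1.0, 0.0, math.e)
# Equal sigmas and nearby means: the distance approaches |mu1 - mu2| / sigma.
d_shift = rao_distance(0.0, 1.0, 0.01, 1.0)
print(d_scale, d_shift)
```

Unlike the Euclidean distance between the means, this distance also responds to a change in the standard deviation, which is what lets it compare a point-like measurement with a broad distribution.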
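GLS with linear model
Modeled distribution:
$\frac{1}{\sqrt{2\pi\left(\sigma_y^2 + \sum_{j=1}^{m} \beta_j^2 \sigma_{x,j}^2\right)}} \exp\left\{-\frac{1}{2}\,\frac{\left[y - \left(\beta_0 + \sum_{j=1}^{m} \beta_j x_{ij}\right)\right]^2}{\sigma_y^2 + \sum_{j=1}^{m} \beta_j^2 \sigma_{x,j}^2}\right\}$
Observed distribution:
$\frac{1}{\sqrt{2\pi}\,\sigma_\mathrm{obs}} \exp\left[-\frac{1}{2}\,\frac{(y - y_i)^2}{\sigma_\mathrm{obs}^2}\right]$
Rao GD: geodesic distance between the modeled and the observed distribution
To be estimated: $\sigma_\mathrm{obs}, \beta_0, \beta_1, \ldots, \beta_m$
iid data: minimize sum of squared GDs ⇒ geodesic least squares (GLS) regression
If $\sigma_\mathrm{mod} = \sigma_\mathrm{obs}$ (with $\sigma_\mathrm{mod}^2 = \sigma_y^2 + \sum_{j=1}^{m} \beta_j^2 \sigma_{x,j}^2$ the variance of the modeled distribution) ⇒ Mahalanobis distance
G. Verdoolaege et al., Nucl. Fusion 55, 113019, 2015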
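The GLS objective can be sketched for a single predictor (m = 1): for each data point, compute the Rao geodesic distance between the modeled Gaussian and the observed Gaussian, and sum the squares. The code below is an illustration only. The data are made up, all names are hypothetical, and a coarse grid search stands in for whatever optimizer the actual method uses.

```python
import math
from itertools import product


def rao_distance(mu1, sigma1, mu2, sigma2):
    """Closed-form Rao geodesic distance between two univariate Gaussians."""
    dmu2 = (mu1 - mu2) ** 2
    delta = math.sqrt((dmu2 + 2.0 * (sigma1 - sigma2) ** 2)
                      / (dmu2 + 2.0 * (sigma1 + sigma2) ** 2))
    return 2.0 * math.sqrt(2.0) * math.atanh(delta)


def gls_objective(beta0, beta1, sigma_obs, xs, ys, sigma_x, sigma_y):
    """Sum of squared geodesic distances for the model y = beta0 + beta1 * x."""
    # Standard deviation of the modeled distribution (error propagation in x).
    sigma_mod = math.sqrt(sigma_y ** 2 + beta1 ** 2 * sigma_x ** 2)
    return sum(
        rao_distance(beta0 + beta1 * x, sigma_mod, y, sigma_obs) ** 2
        for x, y in zip(xs, ys)
    )


# Made-up data lying exactly on y = 1 + 2 x, with assumed error bars.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
sigma_x, sigma_y = 0.1, 0.2

# Coarse grid search (illustrative substitute for a proper optimizer).
beta0_grid = [0.5 * i for i in range(5)]          # 0.0 ... 2.0
beta1_grid = [1.0 + 0.5 * j for j in range(5)]    # 1.0 ... 3.0
sigma_obs_grid = [0.1 * k for k in range(1, 11)]  # 0.1 ... 1.0

best_beta0, best_beta1, best_sigma_obs = min(
    product(beta0_grid, beta1_grid, sigma_obs_grid),
    key=lambda p: gls_objective(p[0], p[1], p[2], xs, ys, sigma_x, sigma_y),
)
print(best_beta0, best_beta1, best_sigma_obs)
```

On this noise-free toy data the grid minimum recovers the generating coefficients, and the selected σ_obs is the grid value closest to the modeled standard deviation.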
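Global confinement scaling
Engineering parameters:
$\tau_{E,\mathrm{th}} = \beta_0\, I_p^{\beta_I} B_t^{\beta_B} \bar{n}_e^{\beta_n} P_l^{\beta_P} R^{\beta_R} \kappa^{\beta_\kappa} \varepsilon^{\beta_\varepsilon} M_\mathrm{eff}^{\beta_M}$
Dimensionless variables:
$\omega_{ci}\tau_{E,\mathrm{th}} = \alpha_0\, \rho_*^{\alpha_\rho} \beta_t^{\alpha_\beta} \nu_*^{\alpha_\nu} q_{95}^{\alpha_q} \kappa^{\alpha_\kappa} \varepsilon^{\alpha_\varepsilon} M_\mathrm{eff}^{\alpha_M}$
ITPA global H-mode database: 1296 measurements from 9 tokamaks
IPB98(y,2):
$\tau_{E,\mathrm{th}} \propto I_p^{0.93} B_t^{0.15} \bar{n}_e^{0.41} P_l^{-0.69} R^{1.97} \kappa^{0.78} \varepsilon^{0.58} M_\mathrm{eff}^{0.19}$
$\omega_{ci}\tau_{E,\mathrm{th}} \propto \rho_*^{-2.70} \beta_t^{-0.90} \nu_*^{-0.01} q_{95}^{-3.0} \kappa^{3.3} \varepsilon^{0.73} M_\mathrm{eff}^{0.96}$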
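A power law of this form becomes a linear model after taking logarithms, which is what the log-linear fits rely on. The sketch below evaluates both forms using the IPB98(y,2) exponents and the prefactor 0.056 quoted in this talk. The input values are arbitrary placeholders, the helper names are hypothetical, and the units are not stated on the slides, so the numerical result should not be read as a physical prediction.

```python
import math

# IPB98(y,2) coefficients as quoted in the talk.
IPB98_BETA0 = 0.056
IPB98_EXPONENTS = {
    "I_p": 0.93, "B_t": 0.15, "n_e": 0.41, "P_l": -0.69,
    "R": 1.97, "kappa": 0.78, "epsilon": 0.58, "M_eff": 0.19,
}


def power_law(beta0, exponents, values):
    """Evaluate beta0 * prod(values[name] ** exponents[name])."""
    result = beta0
    for name, exponent in exponents.items():
        result *= values[name] ** exponent
    return result


def log_linear(beta0, exponents, values):
    """Evaluate ln(beta0) + sum(exponents[name] * ln(values[name]))."""
    return math.log(beta0) + sum(
        exponent * math.log(values[name])
        for name, exponent in exponents.items()
    )


# Arbitrary positive placeholder inputs (not real machine parameters).
example = {
    "I_p": 2.0, "B_t": 3.0, "n_e": 5.0, "P_l": 10.0,
    "R": 3.0, "kappa": 1.6, "epsilon": 0.3, "M_eff": 2.0,
}

tau = power_law(IPB98_BETA0, IPB98_EXPONENTS, example)
log_tau = log_linear(IPB98_BETA0, IPB98_EXPONENTS, example)
print(math.isclose(math.log(tau), log_tau))
```

The two forms agree on noise-free values, but fitting them to noisy data is not equivalent: the logarithm changes the error distribution, which is why the log-linear and nonlinear fits are compared separately.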
Issues with IPB98
ITER-relevance
Unconfirmed predictions
New predictor variables
Not robust:
Heterogeneous data
Outliers
Log-linear vs. nonlinear
Methodology
Proportional error bars
Unconstrained
100 bootstrap samples:
Average
95% confidence interval
Benchmarking:
Ordinary least squares (OLS)
Iteratively reweighted least squares (ROB)
Bayesian: uninformative priors, marginalized σ (BAY)
Kullback-Leibler least squares (KLD)
Geodesic least squares (GLS)
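The bootstrap step (100 resamples, then the average and a 95% interval) can be sketched as follows. The sample mean stands in for the estimator here; in the benchmark it would be a full regression fit followed by a prediction. The function name, the data, and the choice of a percentile interval are assumptions made for illustration.

```python
import random
import statistics


def bootstrap_summary(data, estimator, n_boot=100, seed=0):
    """Return (average, lower, upper) of an estimator over bootstrap resamples.

    lower and upper are the 2.5% and 97.5% percentiles of the estimates.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Resample the data with replacement, keeping the original size.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    # n=40 yields cut points every 2.5%; the first and last bound 95%.
    cuts = statistics.quantiles(estimates, n=40)
    return statistics.fmean(estimates), cuts[0], cuts[-1]


# Made-up measurements.
sample = [4.1, 3.9, 4.4, 4.0, 4.3, 3.8, 4.2, 4.5, 3.7, 4.1]
average, lower, upper = bootstrap_summary(sample, statistics.fmean)
print(average, lower, upper)
```

Resampling whole data points keeps each measurement's predictor and response values together, so the same routine applies when the estimator is a regression fit.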
Regression results
Method    β0     βI    βB    βn    βP     βR    βκ    βε    βM     τ̂E,th (s)
IPB98     0.056  0.93  0.15  0.41  −0.69  1.97  0.78  0.58  0.19   4.9
OLS ll.   0.049  0.78  0.32  0.44  −0.67  2.24  0.39  0.58  0.18   4.3 ± 0.25
OLS nl.   0.058  0.67  0.50  0.47  −0.83  2.60  1.0   0.86  −0.26  3.5 ± 0.33
ROB       0.046  0.77  0.32  0.45  −0.66  2.26  0.33  0.57  0.24   4.4 ± 0.24
BAY       0.051  0.87  0.13  0.47  −0.67  2.13  0.17  0.49  0.23   4.3
KLD ll.   0.056  0.61  0.49  0.46  −0.81  2.53  0.93  1.0   0.18   3.2 ± 0.29
KLD nl.   0.053  0.60  0.49  0.49  −0.81  2.57  0.94  1.0   0.18   3.3 ± 0.37
GLS ll.   0.048  0.65  0.44  0.49  −0.76  2.52  0.63  0.87  0.27   4.0 ± 0.23
GLS nl.   0.047  0.65  0.44  0.50  −0.75  2.52  0.62  0.85  0.22   4.1 ± 0.25
ll. = log-linear, nl. = nonlinear
Robustness w.r.t. error bars
J.G. Cordey et al., Nucl. Fusion 45, 1078, 2005
Interpretation on pseudosphere: JET data
Conclusions
Geodesic least squares regression: flexible and robust
Easy to use, fast optimization
Works for linear and nonlinear relations and any distribution model
Revisit established scaling laws, contribute to new regression analyses
Robust estimation of confinement scaling
Comparing probability distributions:
Quantification of stochasticity
Model validation