Uncertainties in blood flow calculations and data
Rachael Bragg and Pierre Gremaud (NCSU)
August 10, 2014
Goals
1. introduce methods from machine learning
- "Machine learning (...) deals with the construction and study of systems that can learn from data, rather than follow only explicitly programmed instructions." (Wikipedia)
- "Machine learning is the science of getting computers to act without being explicitly programmed." (A. Ng)
- nature: x −→ nature −→ y
- math, stat: x −→ linear regression, other models −→ y
- machine learning: x −→ regression trees −→ y
2. illustrate new concepts on Cerebral Blood Flow (CBF) studies
Problem
Can we estimate local CBF
- cheaply
- continuously and in real time
- accurately
- or at least with "error bars"?
Problem
cheap: Transcranial Doppler (TCD)
expensive: Magnetic Resonance Imaging (MRI)
Mangia et al., J. Cereb. Blood Flow Metab., 32 (2012)
Problem
For a given patient, the two measurements "should" agree. Yet:
"It is intriguing that methods measuring the same physiological parameter do not correlate." (Henriksen et al., J. Magn. Res. Imaging, 2012)
Hypothesis
different patients react differently to the measurement protocols
so...
- let's group patients into "like" groups
- let's apply local "models" in each group
to do so, we let the "data speak"....
Overview
- linear and nonlinear approximations
- local regression and trees
- classification
- random forests
- back to CBF, UQ and other acronyms
Mathematical challenge
- predictor variable (vector): x = [x1, ..., xd]
- response variable (scalar): y
WANTED: value (or distribution) of y for given x, i.e.
y = f(x)
CHALLENGE: we do not have f but "just" data
[xi, yi] = [x_{i,1}, ..., x_{i,d}, yi], i = 1, ..., N
For us: d = 14, N = number of patients ≈ 200
Approximation 101: linear
"Pretend" we know f and x ∈ Ω = [0,1]^d
- partition ∆ of Ω into cells ω
- piecewise constant (to simplify) approximation:
  f_h(x) = Σ_{ω∈∆} c_ω χ_ω(x)
- best constants: c_ω = (1/|ω|) ∫_ω f(x) dx = mean of f on ω
- well known result:
  ‖f − f_h‖ ≤ C(d) N^{−1/d} ‖∇f‖
  where N = m^d = number of cubes of side length h = 1/m
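A quick numerical check of this rate in d = 1 (so N^{−1/d} = N^{−1}: halving h should roughly halve the error); the test function f(x) = sin(2πx) and the grid-based cell means are illustrative choices, not from the talk:

```python
import numpy as np

def piecewise_constant_error(f, m, n_grid=4000):
    # cells [k/m, (k+1)/m); best constant on each cell = mean of f on the cell
    x = (np.arange(n_grid) + 0.5) / n_grid          # fine grid on [0,1]
    cell = np.minimum((x * m).astype(int), m - 1)   # cell index of each point
    fx = f(x)
    means = np.array([fx[cell == k].mean() for k in range(m)])
    return np.max(np.abs(fx - means[cell]))         # sup-norm error on the grid

f = lambda t: np.sin(2 * np.pi * t)
errors = [piecewise_constant_error(f, m) for m in (4, 8, 16)]
# errors shrink roughly like 1/m, i.e. like N^{-1} in d = 1
```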
Approximation 102: nonlinear
Choose better partitions based on f /data
- "equivariation" partition (Kahane 1961)
- easy in 1d (partition depends on f)
- "optimal" partitions in higher dimensions not doable
[Figure: adaptive partition of [0,1]^2]
Minimization −→ recursive dyadic partitioning

x1 < .3
  yes: x2 < .7
    yes: ω1
    no:  ω2
  no: x1 < .5
    yes: ω3
    no: x2 < .2
      yes: ω4
      no: x1 < .8
        yes: ω5
        no: x2 < .8
          yes: ω6
          no:  ω7

easy to read off the corresponding partition:
[Figure: partition of [0,1]^2 with split thresholds t1, ..., t7]
Trees and data
- loop on split variables xj, j = 1, 2, ...
  - loop on split values s
    - ω1(j, s) = {x : xj ≤ s}, ω2(j, s) = {x : xj > s}
    - error = min_{j,s} [ Σ_{xi ∈ ω1(j,s)} (yi − c1)^2 + Σ_{xi ∈ ω2(j,s)} (yi − c2)^2 ]
  - end
- end
[Figure: data y over (x1, x2) ∈ [0,1]^2, and error vs. split value]
Resulting "baby tree": x1 < .3, with leaves ω1 and ω2.
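The double loop above can be sketched as follows; the toy data (a jump in y at x1 = 0.3, mirroring the baby tree) are hypothetical:

```python
import numpy as np

def best_split(X, y):
    """Exhaustive search over (variable j, value s) minimizing the squared error."""
    best = (None, None, np.inf)                    # (j, s, error)
    for j in range(X.shape[1]):                    # loop on split variables
        for s in np.unique(X[:, j])[:-1]:          # loop on candidate split values
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if err < best[2]:
                best = (j, s, err)
    return best

rng = np.random.default_rng(0)
X = rng.random((200, 2))
y = np.where(X[:, 0] < 0.3, 0.0, 1.0)              # y jumps at x1 = 0.3
j, s, err = best_split(X, y)                       # should pick variable 0, s near 0.3
```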
Regression tree
1. consider all binary splits on every predictor
2. select the split with lowest MSE whose child nodes each contain at least MinLeaf observations
3. impose split
4. repeat recursively for child nodes

Stop if any of the following holds
- node is pure (MSE < qetoler × MSE(full data))
- fewer than MinParent observations in node
- |child node| < MinLeaf
Classification tree
- What about categorical variables? (gender (F/M), diabetes (Y/N), hypertensive (Y/N), car manufacturer (AMC/Aston Martin/Ferrari/Datsun/Peugeot/Rolls Royce/Yugo, etc.))
- MSE −→ Gini impurity:
  Σ_{k=1}^{K} p_mk (1 − p_mk)
  where p_mk = (1/|ω_m|) Σ_{xi ∈ ωm} δ_{xi,k} = fraction of items from class k in ω_m
- i.e., how often a randomly chosen element from ω_m would be incorrectly labeled if it were randomly labeled according to the distribution of classes in ω_m
- issues with mixed data...
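The Gini impurity above is quick to sketch; the label sets here are made-up examples:

```python
import numpy as np

def gini(labels):
    """Gini impurity: sum_k p_k (1 - p_k) over the class fractions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float((p * (1 - p)).sum())

pure = gini(["F", "F", "F"])        # one class only: impurity 0
mixed = gini(["F", "M", "F", "M"])  # 50/50 split: impurity 0.5
```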
What’s good about trees
1. easy to understand
2. can handle both categorical and numerical predictors
3. can handle missing data
4. fast
5. no model!
What’s not so good about trees
1. trees are unstable
2. predictions are not smooth
3. biased toward predictor variables with high variation
4. no model ⇒ little analysis
Doing better: bagging
bootstrap aggregating
- for b = 1 to B
  - draw bootstrap sample of size N from training data (uniformly and with replacement)
  - grow tree T_b on the bootstrapped data
- end
- average to get prediction for x:
  f(x) = (1/B) Σ_{b=1}^{B} T_b(x)
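A sketch of the bagging loop, with a depth-1 "stump" standing in for the full tree T_b (a simplification for brevity, not the talk's setup); data are synthetic:

```python
import numpy as np

def fit_stump(x, y):
    """Best single split of 1-d data: (split value, left mean, right mean)."""
    best = (x.min(), y.mean(), y.mean(), np.inf)
    for s in np.unique(x)[:-1]:
        left, right = y[x <= s], y[x > s]
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if err < best[3]:
            best = (s, left.mean(), right.mean(), err)
    return best[:3]

def bagged_predict(x_train, y_train, x_new, B=50, seed=0):
    rng = np.random.default_rng(seed)
    N = len(x_train)
    preds = []
    for _ in range(B):
        idx = rng.integers(0, N, N)              # bootstrap: uniform, with replacement
        s, lm, rm = fit_stump(x_train[idx], y_train[idx])
        preds.append(lm if x_new <= s else rm)   # T_b(x_new)
    return float(np.mean(preds))                 # average of the B trees

rng = np.random.default_rng(1)
x = rng.random(100)
y = np.where(x < 0.5, 0.0, 1.0) + 0.1 * rng.standard_normal(100)
p0, p1 = bagged_predict(x, y, 0.1), bagged_predict(x, y, 0.9)  # near 0 and near 1
```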
Issues with bagging
- the trees T_b are correlated: identically distributed (i.d.) but not i.i.d.
- i.i.d.: var(Σ_i X_i) = Σ_i var(X_i) ⇒
  var(f(x)) = σ^2 / B
- correlated i.d.: var(Σ_i X_i) = Σ_i var(X_i) + 2 Σ_{i<j} cov(X_i, X_j) ⇒
  var(f(x)) = ρ σ^2 + ((1 − ρ)/B) σ^2
- ρ ↓ and B ↑ ⇒ variance ↓
Random forests (Breiman 2001)
decrease tree correlation by splitting based on m < d variables
[Figure: correlation between trees vs. number m of randomly selected fitting variables, m = 1, ..., 13]
OOB errors: error check on the training data
- for each (xi, yi), construct the RF predictor by averaging only the trees grown from bootstrap samples not containing (xi, yi)
[Figure: out-of-bag MSE vs. number of trees, for m = 1, 4, 5, 10]
m = 5 < d = 14 wins
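A from-scratch sketch of the OOB check, again with depth-1 stumps standing in for trees; each stump picks its split variable from m < d randomly chosen predictors, as in a random forest. The data and parameters are hypothetical:

```python
import numpy as np

def fit_stump(X, y, cols):
    """Best single split among the columns in `cols`; returns a predictor."""
    best = (cols[0], np.median(X[:, cols[0]]), y.mean(), y.mean(), np.inf)
    for j in cols:
        for s in np.unique(X[:, j])[:-1]:
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if err < best[4]:
                best = (j, s, left.mean(), right.mean(), err)
    j, s, lm, rm, _ = best
    return lambda Z: np.where(Z[:, j] <= s, lm, rm)

def oob_mse(X, y, B=100, m=2, seed=0):
    rng = np.random.default_rng(seed)
    N, d = X.shape
    preds, counts = np.zeros(N), np.zeros(N)
    for _ in range(B):
        idx = rng.integers(0, N, N)                  # bootstrap sample
        cols = rng.choice(d, size=m, replace=False)  # m of the d split variables
        tree = fit_stump(X[idx], y[idx], cols)
        oob = np.setdiff1d(np.arange(N), idx)        # points NOT in this sample
        preds[oob] += tree(X[oob])                   # average only OOB trees
        counts[oob] += 1
    keep = counts > 0
    return float(np.mean((preds[keep] / counts[keep] - y[keep]) ** 2))

rng = np.random.default_rng(2)
X = rng.random((150, 3))
y = np.where(X[:, 0] < 0.5, 0.0, 1.0)  # y depends on the first variable only
err = oob_mse(X, y, m=2)               # small, since most stumps see variable 0
```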
Results for our problem
[Figure: CBF out-of-bag prediction vs. CASL MRI measurement]
MSE divided by ≈ 8
But wait, there is more...
Trees can be used to assess variable importance
1. Gini importance: at each split, the MSE reduction is attributed to the split variable and accumulated over all trees for each variable ⇒ bias toward high-variability predictors
2. permutation importance: in each tree, compute the MSE for the OOB samples; then randomly permute the values of a variable and compute the increase in OOB MSE
room for improvements and analysis...
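The permutation idea can be sketched as follows; the "model" here is a hand-coded stump on the first predictor rather than a fitted forest, so only that predictor should show an error increase:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.random((300, 3))
y = np.where(X[:, 0] < 0.5, 0.0, 1.0)

def predict(Z):
    # hypothetical fitted model: depends only on the first predictor
    return np.where(Z[:, 0] < 0.5, 0.0, 1.0)

def mse(Z):
    return float(np.mean((predict(Z) - y) ** 2))

base = mse(X)                              # baseline error of the model
importance = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])   # break the link between x_j and y
    importance.append(mse(Xp) - base)      # increase in MSE
```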
Variable importance
[Figure: average error increase (permutation importance) per predictor: TCD BFV, CO2, Weight, Age, Head Width, Terr. Mass, HCT %, Height, Angle, Gender, Diabetes, Head Length, Area, Hypertension]
Clustering
- consider each pair of patients
- count the number of times the pair lands in the same leaf over the trees of the forest
⇒ proximity matrix A
- clustering algorithms (spectral or other) can be applied to A
[Figure: distance to Cluster 1 centroid vs. distance to Cluster 2 centroid; patients separate into Cluster 1 and Cluster 2]
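The proximity idea in miniature; random one-split stumps stand in for the forest's trees (an illustrative simplification), and "same leaf" means "same side of the split". Two synthetic patient blobs should then show higher within-blob proximity:

```python
import numpy as np

rng = np.random.default_rng(4)
# two hypothetical groups of 20 "patients" each, in 2 predictors
X = np.vstack([rng.normal(0.2, 0.05, (20, 2)),
               rng.normal(0.8, 0.05, (20, 2))])
N, B = len(X), 200

A = np.zeros((N, N))                     # proximity matrix
for _ in range(B):
    j = rng.integers(0, X.shape[1])      # random split variable
    s = rng.uniform(X[:, j].min(), X[:, j].max())  # random split value
    side = X[:, j] <= s                  # "leaf" of this one-split tree
    A += side[:, None] == side[None, :]  # +1 for pairs in the same leaf
A /= B                                   # fraction of trees; entries in [0, 1]

within = A[:20, :20].mean()              # proximity inside group 1
across = A[:20, 20:].mean()              # proximity between the two groups
```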
Conclusion
- machine learning: powerful for "messy" problems
- simple, efficient
- may be hard to interpret and analyze
- low-hanging fruit for mathematicians...
More references
literature
Statistical modeling: the two cultures, L. Breiman, Statistical Science, 16 (2001), pp. 199–231.
The elements of statistical learning, T. Hastie, R. Tibshirani, J. Friedman, 2nd edition, Springer Series in Statistics, 2009.
Cerebral blood flow measurements: ..., R. Bragg, P.A. Gremaud, V. Novak, in preparation.
software
MATLAB: fitensemble from the Statistics Toolbox
R: randomForest package
Java: Weka (http://www.cs.waikato.ac.nz/ml/weka/)