Olga Senyukova - Machine Learning Applications in Medicine
MACHINE LEARNING APPLICATIONS
IN MEDICINE
Olga Senyukova
Graphics & Media Lab
Faculty of Computational Mathematics and Cybernetics
Lomonosov Moscow State University
Medical data
Medical images
Physiologic signals
Other: narrative, textual, numerical, etc.
Medical images
X-ray
MRI
CT
Ultrasound
Computed tomography (CT)
1972, Sir Godfrey Hounsfield
X-rays are computer-processed to produce
tomographic images
https://en.wikipedia.org/wiki/CT_scan
Computed tomography (CT)
insightci.com.au
Magnetic resonance imaging (MRI)
1973, Paul C. Lauterbur and Peter Mansfield
Allows localizing the image by slices
Source: K. Toennies
Magnetic resonance imaging (MRI)
www.raleighrad.com
Electrocardiography (ECG)
1901, Einthoven
Recording of the electrical activity of the heart by
electrodes placed on the body
intensivecarehotline.com
RR time series
RR time series (lengths of interbeat intervals) are widely
used for ECG analysis
www.elsevier.es
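As a minimal sketch of how an RR time series is obtained, assume the R-peak times of an ECG have already been detected (peak detection itself is not shown). The function names and the sample times below are hypothetical.

```python
def rr_intervals(r_peak_times_s):
    """Return interbeat (RR) intervals in seconds from sorted R-peak times."""
    return [t1 - t0 for t0, t1 in zip(r_peak_times_s, r_peak_times_s[1:])]


def mean_heart_rate_bpm(rr):
    """Mean heart rate in beats per minute from RR intervals in seconds."""
    return 60.0 / (sum(rr) / len(rr))


# Hypothetical R-peak times (seconds), for illustration only.
r_peaks = [0.0, 0.8, 1.6, 2.5, 3.3]
rr = rr_intervals(r_peaks)
```

The resulting list `rr` is the time series that later analysis steps take as input.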
Human gait time series
reylab.bidmc.harvard.edu
Analysis: what for?
Normal or diseased?
Where is the diseased area?
What changes occur over time
(especially after treatment)?
Is a specific condition present
(e.g. overtraining in an
athlete)?
…
www.fresher.ru
Main tasks: images
Detection
aneurysm
Segmentation
Matching (Registration)
Main tasks: physiologic signals
Diagnostics
Healthy
Disease XXX
Disease YYY
Template Matching
Condition ZZZ
The same or
not???
Machine learning in medical imaging:
challenges
Slide by D. Rueckert
Images are often 3D or 4D:
the numbers of voxels and of extracted features are very large
The number of images for training is often limited:
a “large” dataset typically means 100 to 1000 images
the “small sample size problem”
Machine learning in medical imaging:
challenges
Training data is expensive
annotation of images is resource intensive (manpower,
cost, time)
it is sometimes possible to augment the training set with
unlabelled images
Training data is sometimes imperfect
training data may be wrongly labelled
e.g. diseases such as Alzheimer’s require confirmation
through pathology (difficult and costly to obtain)
Slide by D. Rueckert
The InnerEye project
Measuring brain tumors
Localizing and identifying vertebrae
Kinect for surgery
Source: A. Criminisi & the InnerEye team @ MSRC
Anatomy localization via regression
forests
A. Criminisi, et al.
Med Image Analysis
2013
Decision forests
Leo Breiman, 2001
A. Criminisi, J. Shotton (eds.). Decision Forests in
Computer Vision and Medical Image Analysis //
Advances in Computer Vision and Pattern
Recognition. 2013
A decision forest consists
of decision trees…
Decision tree
Each internal node: a split (test) function
Each leaf: class label (predictor)
Source: A. Konushin
Regression tree
Axes: input value, continuous label
• Green – high uncertainty
• Red – low uncertainty
• Thickness – the number of samples
from the training set
Several of the following slides are adapted from
A. Criminisi and J. Shotton
Regression tree: training
• S_0 – the whole training set
• S_j – the part of the training set at the jth node
$p(y \mid x) \sim \mathcal{N}\big(y;\ \bar{y}(x),\ \sigma_y^2(x)\big)$
Regression tree: training
Split function parameters at the jth node maximize the information gain
At each part (L,R):
fit a line to the points
(e.g. least squares)
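for each x we have $p(y \mid x) \sim \mathcal{N}\big(y;\ \bar{y}(x),\ \sigma_y^2(x)\big)$
$\theta_j^* = \arg\max_{\theta_j} I(S_j, \theta_j)$
$I(S_j, \theta_j) = \sum_{(x,y) \in S_j} \log(\sigma_y(x)) - \sum_{i \in \{L,R\}} \sum_{(x,y) \in S_j^i} \log(\sigma_y(x))$
$\bar{y}$ – green line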
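The split selection described on this slide can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes a 1D input and approximates the predictive standard deviation in each part by one constant value (the standard deviation of the least-squares residuals), so each sum of log-deviations reduces to the sample count times one logarithm. All function names are hypothetical.

```python
import numpy as np


def residual_std(x, y):
    """Fit y = a*x + b by least squares; return the std of the residuals."""
    if len(x) < 3:
        return None  # too few points for a meaningful line fit
    a, b = np.polyfit(x, y, deg=1)
    residuals = y - (a * x + b)
    return max(float(np.std(residuals)), 1e-6)  # floor avoids log(0)


def information_gain(x, y, threshold):
    """Gain of splitting at `threshold`, with one constant sigma per part (assumption)."""
    left = x < threshold
    right = ~left
    s_all = residual_std(x, y)
    s_left = residual_std(x[left], y[left])
    s_right = residual_std(x[right], y[right])
    if s_all is None or s_left is None or s_right is None:
        return -np.inf
    return len(x) * np.log(s_all) - (left.sum() * np.log(s_left)
                                     + right.sum() * np.log(s_right))


def best_split(x, y, candidates):
    """Return the candidate threshold with the largest information gain."""
    gains = [information_gain(x, y, t) for t in candidates]
    return candidates[int(np.argmax(gains))]
```

For data that follows one line on the left and another on the right of some point, the gain is largest when the threshold sits at that point, because both fitted lines then have small residuals.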
Example
Different models
Predictor models:
constant
polynomial and linear
probabilistic linear
Weak learners (split functions):
axis-aligned
generic oriented hyperplane
conic section
Regression forest
$\mathbf{v} = (x_1, \ldots, x_d) \in \mathbb{R}^d$
Randomness
Bagging: each tree is trained on a random subset
of the whole training set
Randomness
Randomized node optimization: optimize a split
function at the jth node w.r.t. a small random subset
of parameter values
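Instead of $\theta_j^* = \arg\max_{\theta_j \in \mathcal{T}} I(S_j, \theta_j)$ over all parameter values $\mathcal{T}$,
we use $\theta_j^* = \arg\max_{\theta_j \in \mathcal{T}_j} I(S_j, \theta_j)$ over a small random subset $\mathcal{T}_j \subset \mathcal{T}$ (!!!)
$\theta_j = (\phi_j, \psi_j, \tau_j)$
$\phi_j$ selects features from the whole feature set
$\psi_j$ is a weak learner type (axis-aligned, linear, etc.)
$\tau_j$ is a set of splitting thresholds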
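Both sources of randomness can be illustrated with a deliberately small sketch. It makes several assumptions that are not on the slides: each "tree" is a single split (a stump) with constant predictors in its two leaves, the input is 1D, and the split is scored by the sum of squared errors rather than the information gain. All names are hypothetical, and each bootstrap sample is assumed to contain at least two distinct input values.

```python
import numpy as np


def train_stump(x, y, rng, n_candidates=5):
    """Randomized node optimization: only a few random thresholds are tried."""
    thresholds = np.unique(x)[1:]  # guarantees both sides are non-empty
    k = min(n_candidates, len(thresholds))
    candidates = rng.choice(thresholds, size=k, replace=False)
    best = None
    for t in candidates:
        left, right = y[x < t], y[x >= t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1], best[2], best[3]  # threshold, left value, right value


def train_forest(x, y, n_trees=25, seed=0):
    """Bagging: every stump is trained on a bootstrap sample of the data."""
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(x), size=len(x))  # sample with replacement
        forest.append(train_stump(x[idx], y[idx], rng))
    return forest


def predict_forest(forest, x):
    """Forest output: the average of the individual tree predictions."""
    preds = [np.where(x < t, lo, hi) for t, lo, hi in forest]
    return np.mean(preds, axis=0)
```

Averaging many randomized stumps gives a smoother prediction than any single stump, which is the effect the forest relies on.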
Forest vs tree
The labeled database
Anatomy localization
Key idea: all voxels in the image vote for the
position of the organ
Each organ is defined by its 3D axis-aligned
bounding box
$\mathbf{b}_c = (b_c^L, b_c^R, b_c^A, b_c^P, b_c^H, b_c^F), \quad c \in C$
C = {liver, spleen, kidneyL, kidneyR, …}
Anatomy localization
For each input voxel the distribution of
relative displacements to the organ bounding box
is obtained
$\mathbf{v} = (v_x, v_y, v_z)$
$\mathbf{d}_c(\mathbf{v}) = (d_c^L, d_c^R, d_c^A, d_c^P, d_c^H, d_c^F)$
$f(\mathbf{v}; \theta_j)$ – feature response
Anatomy localization
Voxel clusters with the highest confidence of
prediction are considered to be salient regions for
localization of an organ
salient regions are shown in green
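The voting idea can be sketched as below, under stated assumptions. The displacement is taken to be the voxel position (with each coordinate repeated for the two opposite box faces) minus the box coordinates, so every voxel yields one estimate of the box. Keeping only the most confident voxels and averaging them with confidence weights is an illustrative aggregation rule. The function name, `top_fraction`, and the coordinate ordering are hypothetical.

```python
import numpy as np


def vote_bounding_box(voxels, displacements, confidences, top_fraction=0.25):
    """Estimate a 6-number bounding box from per-voxel displacement predictions.

    voxels:        (n, 3) voxel positions (vx, vy, vz)
    displacements: (n, 6) predicted displacements to the six box faces
    confidences:   (n,)   confidence of each voxel's prediction
    """
    v_hat = np.repeat(voxels, 2, axis=1)   # (vx, vx, vy, vy, vz, vz)
    votes = v_hat - displacements          # each voxel's estimate of the box
    n_keep = max(1, int(len(voxels) * top_fraction))
    keep = np.argsort(confidences)[-n_keep:]  # most confident voxels only
    weights = confidences[keep] / confidences[keep].sum()
    return weights @ votes[keep]
```

Voxels with unreliable predictions are simply excluded from the vote, which mirrors the role of the salient regions.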
Context-rich features
Features: mean intensity in randomly displaced boxes
Features for CT and MRI
CT: we can rely on absolute intensity values
MRI: only intensity differences are meaningful
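A minimal sketch of such box features, assuming a 3D NumPy volume and cubic boxes clipped to the volume bounds. The function names, the `half_size` parameter, and the use of a difference of two box means for MRI are illustrative assumptions.

```python
import numpy as np


def box_mean(volume, voxel, offset, half_size=1):
    """Mean intensity in a cube centered at voxel + offset, clipped to the volume."""
    shape = np.array(volume.shape)
    center = np.asarray(voxel) + np.asarray(offset)
    lo = np.clip(center - half_size, 0, shape - 1)
    hi = np.clip(center + half_size + 1, 1, shape)
    return float(volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]].mean())


def ct_feature(volume, voxel, offset, half_size=1):
    """CT: absolute mean intensity of one displaced box."""
    return box_mean(volume, voxel, offset, half_size)


def mri_feature(volume, voxel, offset1, offset2, half_size=1):
    """MRI: difference between the mean intensities of two displaced boxes."""
    return (box_mean(volume, voxel, offset1, half_size)
            - box_mean(volume, voxel, offset2, half_size))
```

Adding a constant to every voxel changes the CT-style feature but leaves the difference feature unchanged, which is why the latter suits uncalibrated MRI intensities.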
Learning clinically useful information
from medical images
Biomedical Image Analysis Group
Department of Computing
Daniel Rueckert
Segmentation using registration
Slide by D. Rueckert
Multi-atlas segmentation using classifier
fusion
Multi-atlas segmentation using classifier
fusion and selection
Selection of atlases
How do we select the atlases most similar to our image?
Atlases should be clustered by disease/population
Manifold learning is used to discover such clusters
efficiently
Manifold learning
Several following slides are adapted from D. Rueckert
Find a manifold
Embed the data in
the manifold
(project to a lower-
dimensional space)
Manifold learning: Laplacian eigenmaps
Given a graph G = (V, E)
Each vertex vi corresponds to an image
Each edge weight wij defines the similarity between
image i and j
Define the diagonal matrix T that contains the degree
sums for each vertex:
$t_{ii} = \sum_j w_{ij}$
Manifold learning: Laplacian eigenmaps
Normalized graph Laplacian:
$L = T^{-1/2} (T - W) T^{-1/2}$
$\min \sum_{i,j} w_{ij} \| y_i - y_j \|^2$
The eigendecomposition of L
provides manifold coordinates
y_i for each vertex i (or image)
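A minimal NumPy sketch of this embedding, assuming a symmetric similarity matrix with zero diagonal and no isolated vertices. Taking the eigenvectors that follow the trivial one (eigenvalue near zero) as coordinates is the usual convention and is an assumption here; the function name is hypothetical.

```python
import numpy as np


def laplacian_eigenmaps(W, n_components=1):
    """Embed graph vertices using eigenvectors of the normalized graph Laplacian."""
    t = W.sum(axis=1)                    # vertex degrees (diagonal of T)
    t_inv_sqrt = 1.0 / np.sqrt(t)
    # L = T^{-1/2} (T - W) T^{-1/2} = I - T^{-1/2} W T^{-1/2}
    L = np.eye(len(W)) - t_inv_sqrt[:, None] * W * t_inv_sqrt[None, :]
    eigenvalues, eigenvectors = np.linalg.eigh(L)  # eigenvalues in ascending order
    # Skip the first (trivial) eigenvector; keep the next n_components.
    return eigenvectors[:, 1:1 + n_components]


# Toy similarities between six "images" placed on a line (illustrative data).
positions = np.array([0.0, 0.1, 0.2, 2.0, 2.1, 2.2])
W = np.exp(-0.5 * (positions[:, None] - positions[None, :]) ** 2)
np.fill_diagonal(W, 0.0)
Y = laplacian_eigenmaps(W, n_components=1)
```

In this toy case the first coordinate takes opposite signs on the two groups of images, so the clusters can be read directly from the embedding.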
Manifold learning for multi-atlas
segmentation
We have two sets of images:
labeled (atlases)
unlabeled
We want to label all the unlabeled images
We can do it iteratively:
label a subset of the unlabeled images using the most similar
already-labeled images
the newly labeled images can be used as atlases in the next
iteration
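The iterative scheme can be sketched as follows. This is only an illustration of the propagation order: a real multi-atlas pipeline would register the selected atlases to each image and fuse their segmentations, whereas this sketch simply copies the label of the nearest atlas in manifold coordinates. The function name and `batch_size` are hypothetical.

```python
import numpy as np


def propagate_labels(coords, labels, batch_size=2):
    """Iteratively label images, closest-to-the-atlases first.

    coords: (n, d) manifold coordinates of all images
    labels: list of labels, with None for unlabeled images
    """
    labels = list(labels)
    while any(label is None for label in labels):
        labeled = [i for i, label in enumerate(labels) if label is not None]
        unlabeled = [i for i, label in enumerate(labels) if label is None]
        nearest = {}
        for i in unlabeled:
            dist = np.linalg.norm(coords[labeled] - coords[i], axis=1)
            nearest[i] = (dist.min(), labeled[int(dist.argmin())])
        # Label only the images closest to the current atlas pool.
        for i in sorted(unlabeled, key=lambda i: nearest[i][0])[:batch_size]:
            labels[i] = labels[nearest[i][1]]  # stand-in for registration + fusion
    return labels
```

Because newly labeled images join the atlas pool, images far from the original atlases are reached through a chain of similar intermediate images.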
Manifold learning for multi-atlas
segmentation
Wolz et al., Neuroimage, 2010
Example
Wolz et al., Neuroimage, 2010
Segmentation of brain lesions in MRI
Olga V. Senyukova, “Segmentation of blurred objects by
classification of isolabel contours”. Pattern Recognition,
2014
Data were provided by the Children's Clinical and Research
Institute of Emergency Surgery and Trauma
The proposed algorithm
Each MRI slice is processed separately
To improve speed and robustness, the
regions containing lesions can be specified manually
Lesions inside these regions are segmented
automatically
Algorithm overview
Input region → isolabel contours I(x,y)=const → closed isolabel contours → nonlinear SVM classification
Isolabel contours
In geography
each isolabel contour (one color):
constant height f(x,y)=h
In image processing
each isolabel contour (one color):
constant intensity f(x,y)=I
How to distinguish lesion contours?
Visually we can do it easily!
Let’s use the same set of features for automatic
classification of isolabel contours
Features of isolabel contours
To distinguish isolabel contours delineating
lesions, 4 features were proposed:
I_mean
I_mean inside the contour / I_mean inside the bounding box (BBox)
I_max - I_min
I_variance
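A hedged sketch of how the four features might be computed for one closed contour. It assumes the contour is given as a boolean mask of the pixels it encloses and that the mean, range, and variance are all taken over those enclosed pixels; the slide does not spell this out, so treat these choices as assumptions. The function name is hypothetical.

```python
import numpy as np


def contour_features(image, mask):
    """Return [mean, mean / bounding-box mean, max - min, variance] for one contour.

    image: 2D array of intensities
    mask:  2D boolean array, True for pixels inside the closed contour
    """
    inside = image[mask]
    rows, cols = np.nonzero(mask)
    bbox = image[rows.min():rows.max() + 1, cols.min():cols.max() + 1]
    return np.array([
        inside.mean(),                # mean intensity inside the contour
        inside.mean() / bbox.mean(),  # contrast against the bounding box
        inside.max() - inside.min(),  # intensity range
        inside.var(),                 # intensity variance
    ])
```

Each contour is thus reduced to a 4-element feature vector that a classifier can consume.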
Labeled training base
Various regions on many images:
a user clicks on lesion contours, which get the label “lesion”
all other isolabel contours automatically get the label “non-lesion”
[ɸ1, ɸ2, ɸ3, ɸ4] -> non-lesion
[ɸ1, ɸ2, ɸ3, ɸ4] -> lesion
[ɸ1, ɸ2, ɸ3, ɸ4] -> lesion
[ɸ1, ɸ2, ɸ3, ɸ4] -> lesion
[ɸ1, ɸ2, ɸ3, ɸ4] -> non-lesion
[ɸ1, ɸ2, ɸ3, ɸ4] -> non-lesion
[ɸ1, ɸ2, ɸ3, ɸ4] -> non-lesion
[ɸ1, ɸ2, ɸ3, ɸ4] -> lesion
[ɸ1, ɸ2, ɸ3, ɸ4] -> lesion
…
[ɸ1, ɸ2, ɸ3, ɸ4] is
a feature vector
Binary classification via SVM
We have a binary classification task: each isolabel contour belongs to one of two classes, lesions or non-lesions
One of the best classifiers is SVM – Support Vector Machine
original linear SVM: Vladimir Vapnik, Alexey Chervonenkis, 1963
applying a kernel trick results in nonlinear SVM: Bernhard Boser, Isabelle Guyon, Vladimir Vapnik, 1992
Linear SVM
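support vectors, margin
positive samples ($y_i = 1$): $\mathbf{w} \cdot \mathbf{x}_i + b \ge 1$
negative samples ($y_i = -1$): $\mathbf{w} \cdot \mathbf{x}_i + b \le -1$
Maximizing the margin $2 / \|\mathbf{w}\|$,
we solve a quadratic
optimization problem:
minimizing $\frac{1}{2} \mathbf{w}^T \mathbf{w}$
subject to $y_i (\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1$
Solution is a hyperplane:
$\mathbf{w} \cdot \mathbf{x} + b = \sum_i \alpha_i y_i \, \mathbf{x}_i \cdot \mathbf{x} + b$
$\mathbf{x}_i$ – support vectors
$\alpha_i$ – learned weights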
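The quantities on this slide can be checked numerically on a toy problem. The four points, the weight vector, the bias, and the dual weights below are hand-picked for illustration and are not the output of a solver.

```python
import numpy as np

# Toy 2D data: two positive and two negative samples (illustrative).
X = np.array([[2.0, 2.0], [3.0, 3.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([1, 1, -1, -1])

# Hand-picked separating hyperplane w.x + b = 0 (assumption, not solver output).
w = np.array([1.0, 1.0])
b = -3.0

functional_margins = y * (X @ w + b)       # constraint: each must be >= 1
margin_width = 2.0 / np.linalg.norm(w)     # distance between the two margin lines
is_support = np.isclose(functional_margins, 1.0)  # points lying on the margin

# Dual view: w is a weighted sum of support vectors, w = sum_i alpha_i * y_i * x_i.
alphas = np.array([1.0, 0.0, 1.0, 0.0])    # non-zero only for support vectors
w_from_dual = (alphas * y) @ X
```

Only the two points lying exactly on the margin carry non-zero weights, and they alone determine the hyperplane.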
Nonlinear SVM
For linearly separable data linear SVM is excellent
What about the data that is not linearly separable?..
We can make it linearly separable by mapping it to
a higher-dimensional space
Nonlinear SVM: kernel trick
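Instead of $\sum_i \alpha_i y_i \, \mathbf{x}_i \cdot \mathbf{x} + b$ we have $\sum_i \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b$
where $K(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i) \cdot \varphi(\mathbf{x}_j)$
$K(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\gamma \| \mathbf{x}_i - \mathbf{x}_j \|^2)$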
For classification of isolabel contours nonlinear SVM
with RBF (radial basis function) kernel is used
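To illustrate why a kernel helps, the sketch below evaluates a kernelized decision function on the XOR pattern, which no single line can separate. The dual weights, the bias, and the kernel width `gamma` are hand-picked assumptions rather than trained values.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def decision(x, support_vectors, labels, alphas, bias, gamma=1.0):
    """Kernelized decision value: sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(alpha * label * rbf_kernel(sv, x, gamma)
               for sv, label, alpha in zip(support_vectors, labels, alphas)) + bias


# XOR-like data: not linearly separable in the original 2D space.
support_vectors = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
labels = [1, 1, -1, -1]
alphas = [1.0, 1.0, 1.0, 1.0]  # hand-picked, not trained
bias = 0.0

predictions = [1 if decision(x, support_vectors, labels, alphas, bias) > 0 else -1
               for x in support_vectors]
```

With these values every point receives the sign of its own label, because the kernel gives the point itself the largest weight.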
Ensemble-based analysis of RR and gait
Olga Senyukova
Valeriy Gavrishchaka, Department of Physics, West
Virginia University
Springer, 2013, 2015
RR and gait time series
Normal?
Huntington’s disease?
Parkinson’s disease?
…
Normal?
Arrhythmia?
Congestive heart failure?
…
Ensemble learning techniques
An ensemble can work better than a single classifier
Weak learner 1: accuracy 0.61
Weak learner 2: accuracy 0.73
…
Weak learner N: accuracy 0.65
Ensemble of classifiers: accuracy 0.9
AdaBoost
Freund and Schapire, 1997
On each iteration, it focuses on the samples that are
hardest to classify
AdaBoost
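$\mathbf{x}_i,\ i = 1, \ldots, N$ – training data, $y_i \in \{-1; +1\}$ – labels
Initial weights of all N samples: $w_0(i) = 1/N$
M iterations, from m = 1 to M:
find $T_m = \arg\min_{T_j} \sum_{i=1}^{N} w_{m-1}(i) \, [y_i \ne T_j(\mathbf{x}_i)]$, with weighted error $\varepsilon_m$
set $\alpha_m = \frac{1}{2} \log \frac{1 - \varepsilon_m}{\varepsilon_m}$
update $w_m(i) = w_{m-1}(i) \exp(-\alpha_m y_i T_m(\mathbf{x}_i)) / Z_m$, where $Z_m$ normalizes the weights to sum to 1
Classifier output: $H(\mathbf{x}) = \mathrm{sign}\left(\sum_{m=1}^{M} \alpha_m T_m(\mathbf{x})\right)$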
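The AdaBoost loop can be sketched in plain Python. The weak learners here are assumed to be 1D threshold classifiers (decision stumps), a common choice that the slide does not prescribe; all function names are hypothetical.

```python
import math


def stump_predict(x, threshold, polarity):
    """Threshold classifier: `polarity` below the threshold, `-polarity` otherwise."""
    return polarity if x < threshold else -polarity


def adaboost_train(xs, ys, n_rounds):
    """Train an ensemble of stumps; returns a list of (alpha, threshold, polarity)."""
    n = len(xs)
    weights = [1.0 / n] * n
    values = sorted(set(xs))
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
    ensemble = []
    for _ in range(n_rounds):
        # Find the stump with the smallest weighted error.
        best = None
        for t in thresholds:
            for polarity in (1, -1):
                err = sum(w for w, x, y in zip(weights, xs, ys)
                          if stump_predict(x, t, polarity) != y)
                if best is None or err < best[0]:
                    best = (err, t, polarity)
        err, t, polarity = best
        err = min(max(err, 1e-12), 1 - 1e-12)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        # Increase the weights of misclassified samples, then renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(x, t, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
        ensemble.append((alpha, t, polarity))
    return ensemble


def adaboost_predict(ensemble, x):
    """Sign of the weighted vote of all weak learners."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1
```

On a label sequence such as +,+,+,-,-,-,+,+,+,- over the points 0..9, no single stump is error-free, yet three boosted stumps classify every training point correctly.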
Good classifier example
Iteration 1 of 3
T1
Iteration 2 of 3
T2
Iteration 3 of 3
STOP T3
Final model
$H(\mathbf{x}) = \mathrm{sign}[0.42\, T_1(\mathbf{x}) + 0.70\, T_2(\mathbf{x}) + 0.72\, T_3(\mathbf{x})]$
Ensemble decomposition learning
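We apply an ensemble-based classifier to a point x:
$H(\mathbf{x}) = \sum_{m=1}^{M} \alpha_m T_m(\mathbf{x})$
Each x can be described by its ensemble
decomposition vector (EDL vector):
$D(\mathbf{x}) = [\alpha_1 T_1(\mathbf{x}),\ \alpha_2 T_2(\mathbf{x}),\ \ldots,\ \alpha_M T_M(\mathbf{x})]$
We can classify data points by comparing their EDL
vectors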
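A minimal sketch of the decomposition: the ensemble output is a weighted sum of weak-learner outputs, and the EDL vector keeps the individual terms instead of summing them. The weights and the three threshold rules below are hypothetical stand-ins for trained indicators.

```python
def edl_vector(alphas, weak_learners, x):
    """Per-learner weighted outputs: [alpha_1*T_1(x), ..., alpha_M*T_M(x)]."""
    return [alpha * learner(x) for alpha, learner in zip(alphas, weak_learners)]


def ensemble_output(alphas, weak_learners, x):
    """Ensemble score: the sum of the components of the EDL vector."""
    return sum(edl_vector(alphas, weak_learners, x))


# Hypothetical weights and weak learners, each returning +1 or -1.
alphas = [0.5, 0.3, 0.2]
weak_learners = [
    lambda x: 1 if x[0] > 1.0 else -1,
    lambda x: 1 if x[1] < 0.8 else -1,
    lambda x: 1 if x[0] + x[1] > 1.5 else -1,
]

x = (1.2, 0.9)
d = edl_vector(alphas, weak_learners, x)
```

Two inputs with the same ensemble score can still have different EDL vectors, which is the extra information the method exploits.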
EDL: learning
All available data, labeled «normal/abnormal»
Indicators from nonlinear dynamics: MSE (multiscale entropy), DFA (detrended fluctuation analysis)
AdaBoost: building a general «normal/abnormal» classifier
Ensemble classifier: α1·MSE1 + α2·DFA2 + … + αN·MSEN
Ensemble classifier
Training sample x (disease XXX)
Applying the ensemble α1·MSE1 + α2·DFA2 + … + αN·MSEN
Each indicator (MSE, DFA) outputs +1 (normal) or -1 (abnormal)
EDL vector: $D(\mathbf{x}) = [\alpha_1 \cdot 1,\ \alpha_2 \cdot (-1),\ \ldots,\ \alpha_M \cdot (-1)]$
EDL: testing
Input y
Applying the ensemble α1·MSE1 + α2·DFA2 + … + αN·MSEN
EDL vector: $D(\mathbf{y}) = [\alpha_1 \cdot (-1),\ \alpha_2 \cdot 1,\ \ldots,\ \alpha_M \cdot 1]$
Is $D(\mathbf{x}) \approx D(\mathbf{y})$?
yes: y has disease XXX
no: y does not have disease XXX
In a multi-class classification problem, the class of y is the class of the training example
with the closest EDL vector:
$\mathbf{y} \in C_k, \quad k = \arg\min_i \min_{\mathbf{x} \in C_i} \| D(\mathbf{x}) - D(\mathbf{y}) \|$
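The nearest-vector rule can be sketched as below. Euclidean distance is an assumption, since the slides do not name the distance used, and the training vectors and class names are made up for illustration.

```python
import math


def classify_by_edl(train_vectors, train_classes, query_vector):
    """Return the class of the training example with the closest EDL vector."""
    distances = [math.dist(v, query_vector) for v in train_vectors]
    return train_classes[distances.index(min(distances))]


# Hypothetical EDL vectors of three training examples and their classes.
train_vectors = [
    [0.4, -0.7, -0.7],
    [0.4, 0.7, 0.7],
    [-0.4, -0.7, 0.7],
]
train_classes = ["disease XXX", "normal", "disease YYY"]

query = [0.4, -0.7, -0.5]
predicted = classify_by_edl(train_vectors, train_classes, query)
```

Here the query is closest to the first training vector, so it receives that example's class.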
Results
CHF/Arrhythmia classification
Real data from
http://www.physionet.org/physiobank
Thank you for your attention!
knizhnayaraduga.ru