Machine Learning Applications in Medicine (Olga Senyukova)
MACHINE LEARNING APPLICATIONS IN MEDICINE
Olga Senyukova
Graphics & Media Lab
Faculty of Computational Mathematics and Cybernetics
Lomonosov Moscow State University
Medical data
Medical images
Physiologic signals
Other: narrative, textual, numerical, etc.
Medical images
X-ray
MRI
CT
Ultrasound
Computed tomography (CT)
1972, Sir Godfrey Hounsfield. X-rays are computer-processed to produce tomographic images.
https://en.wikipedia.org/wiki/CT_scan
Computed tomography (CT)
insightci.com.au
Magnetic resonance imaging (MRI)
1973, Paul C. Lauterbur and Peter Mansfield. Allows imaging the body slice by slice.
Source: K. Toennies
Magnetic resonance imaging (MRI)
www.raleighrad.com
Electrocardiography (ECG)
1901, Willem Einthoven. Recording of the electrical activity of the heart with electrodes placed on the body.
intensivecarehotline.com
RR time series
RR time series (interbeat interval lengths) are widely used for ECG analysis
www.elsevier.es
Human gait time series
reylab.bidmc.harvard.edu
Analysis: what for?
Normal or diseased?
Where is the diseased area?
What changes occur over time (especially after treatment)?
Does a specific condition take place (e.g. overtraining of an athlete)?
…
www.fresher.ru
Main tasks: images
Detection (e.g. of an aneurysm)
Segmentation
Matching (registration)
Main tasks: physiologic signals
Diagnostics: healthy, disease XXX, or disease YYY?
Template matching: does condition ZZZ take place (the same or not)?
Machine learning in medical imaging: challenges
Slide by D. Rueckert
Images are often 3D or 4D: the number of voxels and the number of extracted features is very large.
The number of images for training is often limited: "large" datasets typically mean 100 to 1000 images (the "small sample size problem").
Machine learning in medical imaging: challenges
Training data is expensive: annotation of images is resource-intensive (manpower, cost, time); it is sometimes possible to augment training using unlabelled images.
Training data is sometimes imperfect: it may be wrongly labelled, e.g. diseases such as Alzheimer's require confirmation through pathology (difficult and costly to obtain).
Slide by D. Rueckert
The InnerEye project
Measuring brain tumors
Localizing and identifying vertebrae
Kinect for surgery
Source: A. Criminisi & the InnerEye team @ MSRC
Anatomy localization via regression forests
A. Criminisi, et al. Med Image Analysis 2013
Decision forests
Leo Breiman, 2001
A. Criminisi, J. Shotton (eds.). Decision Forests in Computer Vision and Medical Image Analysis // Advances in Computer Vision and Pattern Recognition, 2013
Decision forest consists of decision trees…
Decision tree
Each internal node: a split (test) function
Each leaf: a class label (predictor)
Source: A. Konushin
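As a minimal illustration of this structure, routing an input through such a tree can be sketched in a few lines of Python; the split tests and class labels here are made up for illustration, not taken from the lecture:

```python
# Hypothetical hand-built tree: each internal node holds a split (test)
# function, each leaf holds a class label, as described above.
tree = {"test": lambda x: x[0] < 0.5,
        "left": {"leaf": "class A"},
        "right": {"test": lambda x: x[1] < 0.5,
                  "left": {"leaf": "class B"},
                  "right": {"leaf": "class C"}}}

def predict(node, x):
    """Route x down the tree: apply the split function at each internal
    node, return the label stored at the leaf that is reached."""
    while "leaf" not in node:
        node = node["left"] if node["test"](x) else node["right"]
    return node["leaf"]

a = predict(tree, (0.2, 0.9))   # first test passes: left leaf
b = predict(tree, (0.9, 0.1))   # first test fails, second passes
c = predict(tree, (0.9, 0.9))   # both tests fail
```

Training chooses the split functions; prediction is just this traversal.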
Regression tree
An input value is mapped to a continuous label.
• Green – high uncertainty
• Red – low uncertainty
• Thickness – the number of samples from the training set
Source: A. Criminisi, J. Shotton
Regression tree: training
• S0 – the whole training set
• Sj – the part of the training set reaching the jth node
p(y|x) = N(y; ȳ(x), σ_y²(x))
Source: A. Criminisi, J. Shotton
Regression tree: training
Split function parameters at the jth node maximize the information gain:
θ_j* = argmax_θ I(S_j, θ)
I(S_j, θ) = Σ_{(x,y)∈S_j} log(σ_y(x)) − Σ_{i∈{L,R}} Σ_{(x,y)∈S_j^i} log(σ_y(x))
At each part (L, R): fit a line to the points (e.g. by least squares); then for each x we have p(y|x) = N(y; ȳ(x), σ_y²(x)), with ȳ the fitted (green) line.
Source: A. Criminisi, J. Shotton
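A minimal sketch of this training criterion, assuming the simplest constant predictor per node (so σ_y reduces to the standard deviation of the labels in the node) and a 1-D axis-aligned split; the toy data and the minimum-node-size guard are my own additions:

```python
import numpy as np

def node_cost(y):
    # Sum over the node's samples of log(sigma_y): with a constant
    # predictor, sigma_y is just the std of the labels in the node
    return len(y) * np.log(np.std(y) + 1e-12)

def best_split(x, y, min_samples=2):
    """Exhaustively search 1-D thresholds for the split that maximizes
    the information gain I(S_j, theta)."""
    best_gain, best_t = -np.inf, None
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        if len(left) < min_samples or len(right) < min_samples:
            continue  # skip degenerate splits with near-empty children
        gain = node_cost(y) - node_cost(left) - node_cost(right)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

# Piecewise-constant toy data: the best split separates the two plateaus
x = np.array([0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0])
y = np.array([1.0, 1.1, 0.9, 1.0, 5.0, 5.1, 4.9, 5.0])
t, gain = best_split(x, y)
```

The split lands between the two plateaus, where both children become low-variance.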
Example
Source: A. Criminisi, J. Shotton
Example
Source: A. Criminisi, J. Shotton
Different models
Predictor models:
• Constant
• Polynomial / linear
• Probabilistic linear
Weak learners (split functions):
• Axis-aligned hyperplane
• Generic oriented hyperplane
• Conic section
Source: A. Criminisi, J. Shotton
Regression forest
v = (x_1, …, x_d) ∈ R^d
Source: A. Criminisi, J. Shotton
Randomness
Bagging: each tree is learned on a random subset of the whole training set
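The bagging step can be sketched as follows; the dataset size and number of trees are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
indices = np.arange(100)                # indices of the full training set

# Bagging: each tree is trained on a bootstrap sample of the training
# set, drawn with replacement, so every tree sees a different dataset
n_trees = 5
bags = [rng.choice(indices, size=len(indices), replace=True)
        for _ in range(n_trees)]

# A bootstrap sample of size n contains about 63% distinct items
unique_fractions = [len(np.unique(b)) / len(indices) for b in bags]
```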
Randomness
Randomized node optimization: optimize the split function at the jth node w.r.t. a small random subset T_j of parameter values:
θ_j* = argmax_{θ∈T_j} I(S_j, θ)   instead of   θ_j* = argmax_{θ∈T} I(S_j, θ)
A split parameter is a triplet θ = (φ, ψ, τ), where
• φ selects features from the whole feature set
• ψ is a weak learner type (axis-aligned, linear, etc.)
• τ is a set of splitting thresholds
Source: A. Criminisi, J. Shotton
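The idea can be sketched as follows; the objective below is a stand-in for the real information gain I(S_j, θ), and the parameter grid and subset size are made up:

```python
import numpy as np

rng = np.random.default_rng(1)

# T: the whole set of candidate split parameters (here: 1000 thresholds)
all_params = np.linspace(0.0, 1.0, 1000)

# Randomized node optimization: this node only searches a small random
# subset T_j of T, which decorrelates the trees of the forest
T_j = rng.choice(all_params, size=10, replace=False)

def objective(theta):
    # stand-in for the information gain I(S_j, theta)
    return -(theta - 0.37) ** 2

theta_j = max(T_j, key=objective)   # argmax over T_j, not over all of T
```

Each node (and each tree) draws its own T_j, so two trees rarely pick identical splits.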
Forest vs tree
Source: A. Criminisi, J. Shotton
The labeled database
Source: A. Criminisi, J. Shotton
Anatomy localization
Key idea: all voxels in the image vote for the position of the organ
Each organ c ∈ C is defined by its 3D axis-aligned bounding box
b_c = (b_c^L, b_c^R, b_c^A, b_c^P, b_c^H, b_c^F) ∈ R^6
C = {liver, spleen, kidneyL, kidneyR, …}
Source: A. Criminisi, J. Shotton
Anatomy localization
For each input voxel v = (v_x, v_y, v_z) we obtain a distribution of relative displacements to the organ bounding box:
d_c(v) = (d_c^L, d_c^R, d_c^A, d_c^P, d_c^H, d_c^F)
f(v; θ) – feature response
Source: A. Criminisi, J. Shotton
Context-rich features
Features: mean intensity in randomly displaced boxes
Source: A. Criminisi, J. Shotton
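Such box-mean features are cheap to evaluate with an integral (summed-area) image; a small 2-D sketch, where the reference position, displacement, and box size are arbitrary:

```python
import numpy as np

def box_mean(integral, y0, x0, y1, x1):
    """Mean intensity of image[y0:y1, x0:x1] via a summed-area image."""
    total = (integral[y1, x1] - integral[y0, x1]
             - integral[y1, x0] + integral[y0, x0])
    return total / ((y1 - y0) * (x1 - x0))

img = np.arange(25, dtype=float).reshape(5, 5)
# integral image padded with a zero row/column for easy indexing
integral = np.zeros((6, 6))
integral[1:, 1:] = img.cumsum(0).cumsum(1)

# feature response: mean intensity in a 2x2 box displaced from a voxel
ref_y, ref_x, dy, dx = 1, 1, 1, 2      # hypothetical displacement
m = box_mean(integral, ref_y + dy, ref_x + dx, ref_y + dy + 2, ref_x + dx + 2)
```

After one O(n) pass to build the integral image, any box mean costs four lookups, which is what makes thousands of randomly displaced boxes per voxel affordable.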
Features for CT and MRI
CT: we can rely on absolute intensity values
MRI: only intensity differences make sense
Source: A. Criminisi, J. Shotton
Learning clinically useful information from medical images
Biomedical Image Analysis Group Department of Computing Daniel Rueckert
Segmentation using registration
Slide by D. Rueckert
Multi-atlas segmentation using classifier fusion
Multi-atlas segmentation using classifier fusion and selection
Selection of atlases
How to select the atlases most similar to our image?
Atlases should be clustered by disease/population
Manifold learning is used to efficiently discover such clusters
Manifold learning
Source: D. Rueckert
Find a manifold
Embed the data into the manifold (project to a lower-dimensional space)
Manifold learning: Laplacian eigenmaps
Given a graph G = (V, E):
• each vertex v_i corresponds to an image
• each edge weight w_ij defines the similarity between images i and j
Define a diagonal matrix T which contains the degree sums for each vertex: t_ii = Σ_j w_ij
Slide by D. Rueckert
Manifold learning: Laplacian eigenmaps
Normalized graph Laplacian:
L = T^(−1/2) (T − W) T^(−1/2)
The embedding minimizes Σ_ij w_ij ||y_i − y_j||²
The eigendecomposition of L provides the manifold coordinates y_i for each vertex i (or image)
Source: D. Rueckert
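A minimal numerical sketch with four "images" forming two similarity clusters; the weights are made up, and only a 1-D embedding is extracted:

```python
import numpy as np

# Similarity graph over four "images": two tight clusters {0,1} and {2,3}
W = np.array([[0.0, 0.9, 0.1, 0.1],
              [0.9, 0.0, 0.1, 0.1],
              [0.1, 0.1, 0.0, 0.9],
              [0.1, 0.1, 0.9, 0.0]])

degrees = W.sum(axis=1)                    # t_ii = sum_j w_ij
T = np.diag(degrees)
T_inv_sqrt = np.diag(1.0 / np.sqrt(degrees))
L = T_inv_sqrt @ (T - W) @ T_inv_sqrt      # normalized graph Laplacian

# eigh returns eigenvalues in ascending order; the first eigenvalue is 0,
# and the eigenvector of the second-smallest one gives 1-D coordinates
eigvals, eigvecs = np.linalg.eigh(L)
coords = eigvecs[:, 1]
```

In the 1-D coordinates, the two clusters land on opposite sides of zero, so clustering by disease/population reduces to thresholding or k-means in the embedded space.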
Manifold learning for multi-atlas segmentation
We have two sets of images: labeled (atlases) and unlabeled
We want to label all the unlabeled images
We can do it iteratively:
• label a part of the unlabeled images using the most similar already-labeled images
• these images can be used as atlases for the next iteration
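The iterative scheme can be sketched with scalar "manifold coordinates" standing in for images; the coordinates and atlas labels below are invented for illustration:

```python
import numpy as np

# 1-D manifold coordinates of five images; the first two are labeled atlases
coords = np.array([0.0, 1.0, 0.2, 0.5, 0.8])
labels = {0: "A", 1: "B"}              # hypothetical atlas labels

# Iteratively label the unlabeled image closest (on the manifold) to any
# labeled one; newly labeled images act as atlases in later iterations
while len(labels) < len(coords):
    unlabeled = [i for i in range(len(coords)) if i not in labels]
    i, j, _ = min(((i, j, abs(coords[i] - coords[j]))
                   for i in unlabeled for j in labels),
                  key=lambda t: t[2])
    labels[i] = labels[j]
```

A real system would propagate segmentations by registration rather than copy a class label, but the hop-by-hop ordering over the manifold is the same.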
Manifold learning for multi-atlas segmentation
Wolz et al., Neuroimage, 2010
Example
Wolz et al., Neuroimage, 2010
Segmentation of brain lesions in MRI
Olga V. Senyukova, “Segmentation of blurred objects by classification of isolabel contours”. Pattern Recognition, 2014
Data was provided by the Children's Clinical and Research Institute of Emergency Surgery and Trauma
The proposed algorithm
Each MRI slice is processed separately.
In order to improve speed and robustness, the regions containing lesions can be specified manually.
Lesions inside these regions are segmented automatically.
Algorithm overview
Input region → isolabel contours I(x,y) = const → closed isolabel contours → nonlinear SVM classification
Isolabel contours
In geography, each isolabel contour (one color) corresponds to a constant height: f(x,y) = h.
In image processing, each isolabel contour (one color) corresponds to a constant intensity: f(x,y) = I.
How to distinguish lesion contours? Visually we can do it easily! Let's use the same set of features for automatic classification of isolabel contours.
Features of isolabel contours
In order to distinguish isolabel contours delineating lesions we propose 4 features:
• I_mean
• I_mean inside the contour / I_mean inside the bounding box
• I_max − I_min
• I_variance
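With NumPy, features of this kind can be computed from an image and a mask of the region enclosed by a contour; the toy image is made up, and the exact definitions in the paper may differ (e.g. in how the bounding box is taken):

```python
import numpy as np

def contour_features(image, mask):
    """Four features of the region enclosed by an isolabel contour.
    mask is a boolean array marking the pixels inside the contour."""
    inside = image[mask]
    ys, xs = np.where(mask)
    bbox = image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    return np.array([
        inside.mean(),                   # I_mean inside the contour
        inside.mean() / bbox.mean(),     # I_mean inside / I_mean in bbox
        inside.max() - inside.min(),     # I_max - I_min
        inside.var(),                    # I_variance
    ])

img = np.array([[1.0, 1.0, 1.0, 1.0],
                [1.0, 5.0, 7.0, 1.0],
                [1.0, 5.0, 7.0, 1.0],
                [1.0, 1.0, 1.0, 1.0]])
mask = img > 2.0                         # a toy "bright lesion" region
phi = contour_features(img, mask)        # the feature vector for one contour
```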
Labeled training base
Various regions on many images: a user can click on lesion contours, which get the label "lesion"; all other isolabel contours automatically get "non-lesion".
[ɸ1, ɸ2, ɸ3, ɸ4] → lesion
[ɸ1, ɸ2, ɸ3, ɸ4] → non-lesion
…
[ɸ1, ɸ2, ɸ3, ɸ4] is a feature vector
Binary classification via SVM
We have a binary classification task: each isolabel contour belongs to one of two classes, lesion or non-lesion.
One of the best classifiers is SVM – the Support Vector Machine:
• original linear SVM: Vladimir Vapnik, Alexey Chervonenkis, 1963
• applying a kernel trick results in nonlinear SVM: Bernhard Boser, Isabelle Guyon, Vladimir Vapnik, 1992
Linear SVM
Positive samples: w·x_i + b ≥ +1 (y_i = +1)
Negative samples: w·x_i + b ≤ −1 (y_i = −1)
The margin between the support vectors is 2/||w||.
Maximizing 2/||w||, we solve a quadratic optimization problem:
minimize (1/2) wᵀw
subject to y_i (w·x_i + b) ≥ 1
Solution is a hyperplane: f(x) = Σ_i α_i y_i (x_i·x) + b
• x_i – support vectors
• α_i – learned weights
Nonlinear SVM
For linearly separable data, linear SVM is excellent.
What about data that is not linearly separable?
We can make it linearly separable by mapping it to a higher-dimensional space.
Nonlinear SVM: kernel trick
Instead of f(x) = Σ_i α_i y_i (x_i·x) + b we have f(x) = Σ_i α_i y_i K(x_i, x) + b,
where K(x_i, x_j) = ϕ(x_i)·ϕ(x_j)
RBF kernel: K(x_i, x_j) = exp(−γ ||x_i − x_j||²)
For classification of isolabel contours I use nonlinear SVM with the RBF (radial basis function) kernel.
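A sketch of the resulting decision function; the support vectors, weights α_i, and γ below are made-up stand-ins for a trained model, not values from the lecture:

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    # K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)
    return np.exp(-gamma * np.sum((a - b) ** 2))

def decision(x, support_vectors, alphas, labels, b=0.0, gamma=0.5):
    """Nonlinear SVM decision: sign(sum_i alpha_i y_i K(x_i, x) + b)."""
    s = sum(a * y * rbf_kernel(sv, x, gamma)
            for sv, a, y in zip(support_vectors, alphas, labels))
    return np.sign(s + b)

# Hypothetical learned model: one support vector per class
svs = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
alphas = [1.0, 1.0]
labels = [+1, -1]

pred_near_pos = decision(np.array([0.1, 0.1]), svs, alphas, labels)
pred_near_neg = decision(np.array([1.9, 2.1]), svs, alphas, labels)
```

The kernel trick never computes ϕ(x) explicitly: only kernel values between x and the support vectors are needed at test time.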
Ensemble-based analysis of RR and gait
Olga Senyukova
Valeriy Gavrishchaka, Department of Physics, West Virginia University
Springer, 2013, 2015
RR and gait time series
Normal?
Huntington’s disease?
Parkinson’s disease?
…
Normal?
Arrhythmia?
Congestive heart failure?
…
Ensemble learning techniques
An ensemble can work better than a single classifier:
…
accuracy: 0.61 accuracy: 0.73 accuracy: 0.65
base classifier 1
base classifier 2
base classifier N
Ensemble of classifiers
accuracy: 0.9
AdaBoost
Freund and Schapire, 1997. On each iteration, focuses on the hardest-to-classify samples.
AdaBoost
x_i, i = 1, …, N – training data; y_i ∈ {−1; +1} – labels
Initial weights of all N items: w^(0)(i) = 1/N
M iterations, for m = 1 to M:
• find T_m = argmin_{T_j} ε_m(T_j), where ε_m(T_j) = Σ_i w^(m)(i) [y_i ≠ T_j(x_i)]
• if ε_m ≥ 1/2 then stop
• set α_m = (1/2) log((1 − ε_m)/ε_m)
• update w^(m+1)(i) = w^(m)(i) exp(−α_m y_i T_m(x_i)) / Z_m
Classifier output: H(x) = sign(Σ_{m=1}^M α_m T_m(x))
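The loop above can be sketched with 1-D threshold "stumps" as the base classifiers T_j; the toy data is invented, and a small clamp on ε avoids log(0) for perfectly separable data:

```python
import numpy as np

def train_adaboost(x, y, M=3):
    """AdaBoost with 1-D stumps T(x) = s * sign(x - t).
    Returns the chosen stumps (t, s) and their weights alpha_m."""
    N = len(x)
    w = np.full(N, 1.0 / N)                 # initial weights 1/N
    xs = np.sort(x)
    thresholds = (xs[:-1] + xs[1:]) / 2     # candidate split points
    stumps, alphas = [], []
    for _ in range(M):
        best = None
        for t in thresholds:                # find the stump with minimal
            for s in (+1, -1):              # weighted error eps_m
                pred = s * np.sign(x - t)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, t, s)
        err, t, s = best
        if err >= 0.5:                      # no better than chance: stop
            break
        err = max(err, 1e-10)               # clamp so alpha stays finite
        alpha = 0.5 * np.log((1 - err) / err)
        pred = s * np.sign(x - t)
        w = w * np.exp(-alpha * y * pred)   # reweight: mistakes grow
        w /= w.sum()                        # normalization constant Z_m
        stumps.append((t, s))
        alphas.append(alpha)
    return stumps, alphas

def predict(x, stumps, alphas):
    # H(x) = sign(sum_m alpha_m T_m(x))
    votes = sum(a * s * np.sign(x - t) for (t, s), a in zip(stumps, alphas))
    return np.sign(votes)

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([-1, -1, -1, 1, 1, 1])
stumps, alphas = train_adaboost(x, y)
acc = (predict(x, stumps, alphas) == y).mean()
```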
Good classifier example
Iteration 1 of 3: T1
Iteration 2 of 3: T2
Iteration 3 of 3: T3 – STOP
Final model
sign[0.42 T1(x) + 0.70 T2(x) + 0.72 T3(x)]
Ensemble decomposition learning
We apply an ensemble-based classifier H(x) = sign(Σ_{m=1}^M α_m T_m(x)) to a vector x.
Each x can be described by its ensemble decomposition vector (EDL vector):
D(x) = [α_1 T_1(x), α_2 T_2(x), …, α_M T_M(x)]
We can classify data points by comparing their EDL vectors.
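A toy sketch of the decomposition; the three base classifiers and their weights are invented (the weights echo the earlier 0.42/0.70/0.72 example). It shows that two inputs can get the same ensemble output H(x) while having clearly different EDL vectors D(x):

```python
import numpy as np

# Hypothetical trained ensemble: three +/-1 base classifiers with weights
alphas = np.array([0.42, 0.70, 0.72])
classifiers = [lambda x: np.sign(x - 1.0),
               lambda x: np.sign(x - 2.0),
               lambda x: -np.sign(x - 4.0)]

def edl_vector(x):
    # D(x) = [alpha_1*T_1(x), alpha_2*T_2(x), ..., alpha_M*T_M(x)]
    return np.array([a * T(x) for a, T in zip(alphas, classifiers)])

def ensemble_output(x):
    # H(x) = sign(sum_m alpha_m T_m(x)): the sum of the EDL components
    return np.sign(edl_vector(x).sum())

d1, d2 = edl_vector(3.0), edl_vector(5.0)   # same H, different D
```

The binary label discards exactly the per-classifier pattern that D(x) keeps, which is what makes EDL vectors usable for finer-grained (multi-class) discrimination.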
EDL: learning
All available data, labeled "normal/abnormal"
MSE, DFA, … – indicators from nonlinear dynamics, used as base classifiers
AdaBoost builds a general "normal/abnormal" classifier:
α_1·MSE_1 + α_2·DFA_2 + … + α_N·MSE_N
Ensemble of classifiers
Training example x
Applying the ensemble α_1·MSE_1 + α_2·DFA_2 + … + α_N·MSE_N
Each base classifier (MSE, DFA, …) outputs +1 (normal) or −1 (abnormal)
EDL vector: D(x) = [α_1·(+1), α_2·(−1), …, α_M·(−1)]
EDL: testing
Testing example y
Applying the ensemble α_1·MSE_1 + α_2·DFA_2 + … + α_N·MSE_N
EDL vector: D(y) = [α_1·(−1), α_2·(+1), …, α_M·(+1)]
D(x) ≈ D(y)? If yes, x and y belong to the same class; if no, they do not.
In a multi-class classification problem, the class of y is the class of the training example with the closest EDL vector:
y belongs to class C_i if ||D(x_i) − D(y)|| = min_{k∈C} ||D(x_k) − D(y)||
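This multi-class rule is a nearest-neighbor search in EDL space; the EDL vectors and class names below are made up for illustration:

```python
import numpy as np

# Hypothetical EDL vectors of labeled training examples (one row each)
train_edl = np.array([[0.4, 0.7, 0.7],      # class "CHF"
                      [0.4, 0.7, -0.7],     # class "arrhythmia"
                      [-0.4, -0.7, 0.7]])   # class "normal"
train_labels = ["CHF", "arrhythmia", "normal"]

def classify_by_edl(d_y):
    """Assign y the class of the training example whose EDL vector
    is closest (in Euclidean norm) to D(y)."""
    dists = np.linalg.norm(train_edl - d_y, axis=1)
    return train_labels[int(np.argmin(dists))]

label = classify_by_edl(np.array([0.38, 0.72, -0.65]))
```

Note that the general ensemble itself stays binary ("normal/abnormal"); the multi-class decision lives entirely in the comparison of decomposition vectors.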
Results
CHF/Arrhythmia classification Real data from
http://www.physionet.org/physiobank
Thank you for your attention!