Post on 22-Nov-2014
PREDICTING STROKE PATIENT
RECOVERY FROM BRAIN IMAGES:
A MACHINE LEARNING
APPROACH
Alastair Smith
Supervised by Prof. Glyn Humphreys
Introduction
Objectives
Can machine learning techniques applied to Computed
Tomography (CT) brain imaging data provide meaningful
predictions of functional recovery in stroke patients?
Which of the machine learning techniques explored provides the most accurate
predictions?
Which aspects of the images are utilised by the machine learning algorithms to inform
predictions?
Introduction
Stroke: The Consequences
Recovery & Rehabilitation:
Effects include physical disability, loss of cognitive and communication skills, mental
health problems.
Recovery program specific to patient symptoms and commonly requires intervention
from physiotherapists, psychologists, occupational therapists, speech therapists and
specialist nurses and doctors.
A third of patients make a close to full physical recovery and are able to live an
independent life, a third will require assistance in daily activities, and a third of
patients will die within a year. (http://www.nhs.uk)
Impact in the U.K. (National Stroke Strategy, 2007)
Every year approximately 110,000 people in England have a stroke, with over 900,000
people currently living in England who have had a stroke.
Stroke is the single largest cause of adult disability with a third of people who have a
stroke left with long-term disability.
Stroke costs the NHS and the economy about £7 billion a year. Despite U.K. services being
among the most expensive, outcomes for U.K. patients are comparatively poor, with
unnecessarily long lengths of stay and high levels of avoidable disability and mortality.
Introduction
Machine Learning Techniques:
Increasingly Influential in Neuroscience and Clinical Medicine
(Bellazzi & Zupan, 2008)
Informing individual patient management, selecting appropriate
treatments (Seker et al, 2003)
Brain Imaging Data
Large number of features, small number of samples
Techniques must avoid the ‘overfitting’ problem this creates
Machine Learning & Brain Imaging (1)
Introduction
MRI & fMRI
Support Vector Machine (SVM) applied to MRI data
Ecker et al (2010), Autistic Spectrum Disorder
Kloppel et al (2008), Alzheimer's Disease (acc = 96%, n=68)
Detection of other diseases: Fan et al (2005), Kawasaki et al (2007)
SVM applied to fMRI data
Classifiers developed to distinguish between stimuli, mental states and behaviours, demonstrating data contains sufficient information
For review see Norman et al (2006) and Haynes & Rees (2006)
Saur et al (2010) predicting recovery of stroke patients' language abilities after 6 months (acc = 76%, n=21)
Relevance Vector Regression (RVR) applied to fMRI data
Stonnington et al (2010):
Predicted continuous measure
Clinical measures of Alzheimer's Disease
Predicted Score and actual scores highly correlated (p<0.0001, n=163)
Introduction
Machine Learning & Brain Imaging (2)
PET & RVM
Phillips et al (2011):
Distinguish between levels of consciousness
Acc = 100%, n = 58
Computed Tomography (CT)
Automated image segmentation, Li et al (2006)
Haemorrhage detection, Liu et al (2008)
Reid et al (2010):
CT-derived variables did not significantly improve multivariate logistic
regression models' predictions of functional recovery in stroke patients
Method
Nottingham Extended ADL
Ranked assessment of patients' ability to complete activities of daily living (ADL) independently
Developed specifically for use with stroke patients (Nouri & Lincoln, 1987)
Completed by patient or carer via post or interview
Demonstrated to be a useful measure of outcome in stroke research
Gladman et al (1993)
Cited in 14 studies as a measure of stroke patient outcomes (Green et al, 2001)
Composed of 21 questions, split into 4 subsections:
Mobility, Kitchen, Domestic, Leisure
High scores indicate low disability
Maximum score = 21, Minimum Score = 0
Method
Data Acquisition
Participants
Patients admitted to stroke units within West Midlands area
Recruited as part of Birmingham University Cognitive Screen (BUCS) project
All patients selected for current study had suffered ischemic stroke
Inclusion Criteria:
• Informed Consent
• New Acute Stroke
• Alert
• Sufficient English Comprehension
Exclusion Criteria:
• Unwell
• Decline to participate
• Concentration span <35mins
         Age     Time from stroke to scan (days)   Time from stroke to testing (days)   n
NEADL    69.54   1.79                              299.3                                155
Method
NEADL data sets
Group                Score   n    Mean   SD
Good Recovery        >=17    65   19.3   1.46
Poor Recovery        <17     90   9.02   4.72
Very Good Recovery   >=17    65   19.3   1.46
Very Poor Recovery   <=12    65   14.5   1.24
[Figure: number of patients (No.) by NEADL score (0 to 21), showing the Good Recovery and Poor Recovery groups, with Very Good Recovery marked as the top 42nd percentile and Very Poor Recovery as the bottom 42nd percentile]
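The grouping of NEADL scores into data sets can be sketched as below. This is only an illustration: the thresholds (>=17 and <=12) are taken from the table above, while the function names and the example scores are made up for this sketch and do not come from the study.

```python
def label_recovery(score, good_min=17):
    """Label for the full data set: 'Good' if score >= good_min, else 'Poor'."""
    return "Good" if score >= good_min else "Poor"


def label_extreme(score, very_good_min=17, very_poor_max=12):
    """Label for the extremes data set; mid-range scores are excluded (None)."""
    if score >= very_good_min:
        return "Very Good"
    if score <= very_poor_max:
        return "Very Poor"
    return None


# Hypothetical NEADL scores, not patient data.
scores = [21, 17, 16, 13, 12, 3]
full_labels = [label_recovery(s) for s in scores]
extreme_scores = [s for s in scores if label_extreme(s) is not None]
```

Scores of 13 to 16 fall into the Poor Recovery group in the full data set but are dropped from the extremes data set, which is what separates the two classes more widely.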
Method
Data Acquisition
Computed Tomography (CT) images:
Capture density of tissue
In-plane resolution 0.5x0.5mm², slice thickness 4-5mm
Whole Brain
Pre-processing & Image Compression
Images of poor quality (due to head movement or other imaging issues) removed from sample
Images normalised to an in-house CT template (Ashburner & Friston, 2003) using SPM8
Images segmented using unified segmentation SPM8 (Seghier et al, 2005) to form Grey Matter, White Matter and Cerebrospinal Fluid images
A further Abnormal tissue class was produced by adding an additional probability map (Seghier et al, 2008)
Grey and White Matter images smoothed using a 12mm³ FWHM Gaussian kernel
Method
Training & Testing
Cross Validation
Applied in 5 folds
Data set(s) randomly divided into 5 equal test sets
In each fold
Model trained on all samples not present in test set
Model tested on ability to assign correct labels to test set
Measures of performance
Performance measures record mean performance across all 5 folds
Accuracy = Proportion of correct classifications
Specificity = Proportion of ‘Bad’ (poor recovery) samples correctly classified as ‘Bad’
Sensitivity = Proportion of ‘Good’ (good recovery) samples correctly classified as ‘Good’
MCC = Matthews Correlation Coefficient (Matthews, 1975)
Common measure of performance for classifiers within machine learning literature
Balanced measure that remains informative when class sizes are uneven
Correlation coefficient equal to phi coefficient
+1 = perfect prediction, 0 = no better than chance, -1 = total disagreement
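The cross-validation split and the four performance measures can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's original code: the function names, the random seed and the toy labels are assumptions made for this example.

```python
import math
import random


def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and deal them into n_folds near-equal test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[k::n_folds] for k in range(n_folds)]


def classification_metrics(y_true, y_pred, positive="Good"):
    """Accuracy, sensitivity, specificity and MCC from paired label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        # MCC is conventionally set to 0 when any marginal count is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


folds = five_fold_indices(155)  # 155 = size of the NEADL sample
test_set = set(folds[0])
# In each fold the model is trained on every sample outside the test set.
train_indices = [i for i in range(155) if i not in test_set]

# Toy labels (not study data): 4 'Good' and 6 'Bad' patients.
y_true = ["Good"] * 4 + ["Bad"] * 6
y_pred = ["Good", "Good", "Good", "Bad", "Bad", "Bad", "Bad", "Bad", "Good", "Good"]
metrics = classification_metrics(y_true, y_pred)
```

In a full run the metrics would be computed once per fold and then averaged across the 5 folds, as described above.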
Method
Improving Efficiency
Recursive Feature Elimination (RFE):
Features with the lowest weights attributed by the model are eliminated
iteratively
On each iteration:
Feature with lowest weight identified and eliminated from training data
New model trained on new training set
Training therefore becomes focused on voxels for which high weights are
assigned
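The elimination loop described above can be sketched as below. Note the assumptions: the study used the weights of its own models (for example the SVM), whereas this sketch substitutes a ridge-regularised least-squares fit as a stand-in linear model, and the function names and synthetic data are hypothetical.

```python
import numpy as np


def fit_linear_weights(X, y, ridge=1e-3):
    """Stand-in linear model: ridge-regularised least squares on +/-1 labels."""
    n_features = X.shape[1]
    A = X.T @ X + ridge * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)


def recursive_feature_elimination(X, y, n_keep):
    """Repeatedly drop the feature with the smallest absolute weight, then refit."""
    active = list(range(X.shape[1]))
    while len(active) > n_keep:
        weights = fit_linear_weights(X[:, active], y)
        weakest = int(np.argmin(np.abs(weights)))
        del active[weakest]  # eliminate one feature per iteration
    return active


# Synthetic data (not study data): 6 features, only columns 2 and 4 carry signal.
rng = np.random.default_rng(0)
y = np.repeat([1.0, -1.0], 50)
X = rng.normal(size=(100, 6))
X[:, 2] += 3.0 * y
X[:, 4] -= 3.0 * y
kept = recursive_feature_elimination(X, y, n_keep=2)
```

With voxel data the feature count is far larger, so practical implementations often remove a batch of features per iteration rather than one; the slides describe the one-at-a-time variant.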
Principal Component Analysis (PCA):
Reduce dimensionality of data set
Transforms set of correlated variables to smaller set of
uncorrelated variables
PCA applied to 2D data set (Jehan, 2005)
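A minimal sketch of PCA as a dimensionality reduction step is given below, using a singular value decomposition of the centred data. The 0.99 retained-variance default is an assumption chosen to echo the "99%" label in the results tables; the function name and the synthetic data are likewise illustrative only.

```python
import numpy as np


def pca_reduce(X, variance_to_keep=0.99):
    """Project X onto the fewest principal components reaching the variance target."""
    X_centred = X - X.mean(axis=0)
    _, singular_values, Vt = np.linalg.svd(X_centred, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(explained)
    # Smallest number of components whose cumulative variance meets the target.
    n_components = int(np.searchsorted(cumulative, variance_to_keep)) + 1
    n_components = min(n_components, len(explained))
    components = Vt[:n_components]
    return X_centred @ components.T, components, explained[:n_components]


# Synthetic data (not study data): the third column is the sum of the first two,
# so three correlated variables carry only two dimensions of variance.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 2))
X = np.column_stack([base[:, 0], base[:, 1], base[:, 0] + base[:, 1]])
scores, components, ratios = pca_reduce(X)
```

The columns of `scores` are uncorrelated with one another, which is the property the slide refers to.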
Method
Machine Learning Techniques
Support Vector Machine (Classifier):
Images treated as points in a high-dimensional space
SVM aims to identify a hyperplane that separates the two classes while maximising the margin (the distance from the hyperplane to the nearest samples of each class).
The hyperplane is defined by the set of images (support vectors) that lie on the maximal margin
Joachims (2002, 1999), based on Vapnik (1995)
Sparse Logistic Regression (Classifier):
Logistic regression method applied within Bayesian framework
Sparse Gaussian prior is assumed with mean zero
Iterative algorithm in which least informative features are pruned according to assigned weights
Yamashita et al (2008)
Relevance Vector Machine (Classification & Regression)
Applies Bayesian techniques within a functional form similar to that of an SVM
Probabilistic model therefore able to indicate probability of class membership
By altering the conditional distribution of the target variable RVMs can be applied to both classification and regression problems
Tipping et al (2001, 2003).
[Figure: optimal separating hyperplane defined by set of support vectors]
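The maximal-margin idea behind the SVM can be written compactly. The formulation below is the standard soft-margin linear SVM, given as a reference sketch; the slides do not state the margin type, kernel or penalty value used, so those details are assumptions. Here x_i is the vectorised image of patient i, y_i in {-1, +1} is its recovery label, xi_i is a slack variable and C is the penalty on margin violations.

```latex
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\,(w^{\top}x_i + b)\ \ge\ 1-\xi_i,\qquad \xi_i\ \ge\ 0 .
```

The margin width is 2/||w||, so minimising ||w|| maximises the margin. The support vectors are the samples for which the constraint holds with equality or is violated, and they alone determine the hyperplane.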
Results
NEADL Results (SVM)
UnG = Unsmoothed Grey Matter, AbT = Abnormal Tissue, SmG = Smoothed Grey Matter
Model  Variant               Tissue Type  Accuracy (max)  Accuracy (mean)  Sensitivity (max)  Specificity (max)  MCC (max)  p< (max)
SVM    Standard              UnG          65%             n/a              54%                73%                0.27       0.001
SVM    with PCA              AbT          69%             59%              46%                87%                0.30       0.001
SVM    with RFE              AbT          69%             62%              66%                71%                0.37       0.0001
SVM    99% Var               AbT          70%             60%              66%                73%                0.40       0.0001
SVM    Extremes              SmG          74%             65%              71%                76%                0.48       0.0001
[Figure: relevance map shown in frontal section, horizontal plane and sagittal plane, with R/L orientation markers]
Relevance map thresholded at 90%:
• Voxels with weights (absolute value) attributed by model in top 10 percentile
• Blue = negative weight
• Red = positive weight
Results
NEADL Results (SVM & SLR)
Model  Variant               Tissue Type  Accuracy (max)  Accuracy (mean)  Sensitivity (max)  Specificity (max)  MCC (max)  p< (max)
SVM    Standard              UnG          65%             n/a              54%                73%                0.27       0.001
SVM    with PCA              AbT          69%             59%              46%                87%                0.30       0.001
SVM    with RFE              AbT          69%             62%              66%                71%                0.37       0.0001
SVM    99% Var               AbT          70%             60%              66%                73%                0.40       0.0001
SVM    Extremes              SmG          74%             65%              71%                76%                0.48       0.0001
SLR    Standard              UnG          58%             n/a              50%                63%                0.13       0.15
SLR    with PCA (99%) & RFE  AbT          68%             58%              74%                62%                0.37       0.0001
Results
NEADL Results (SVM, SLR & RVM)
Model  Variant               Tissue Type  Accuracy (max)  Accuracy (mean)  Sensitivity (max)  Specificity (max)  MCC (max)  p< (max)
SVM    Standard              UnG          65%             n/a              54%                73%                0.27       0.001
SVM    with PCA              AbT          69%             59%              46%                87%                0.30       0.001
SVM    with RFE              AbT          69%             62%              66%                71%                0.37       0.0001
SVM    99% Var               AbT          70%             60%              66%                73%                0.40       0.0001
SVM    Extremes              SmG          74%             65%              71%                76%                0.48       0.0001
SLR    Standard              UnG          58%             n/a              50%                63%                0.13       0.15
SLR    with PCA (99%) & RFE  AbT          68%             58%              74%                62%                0.37       0.0001
RVM    Standard              SmG          67%                              53%                76%                0.33       0.0001
RVM    with PCA (99%) & RFE  AbT          69%             58%              77%                62%                0.40       0.0001
Results
NEADL Results (SVM, SLR, RVM & RVR)
Classification:
Model  Variant               Tissue Type  Accuracy (max)  Accuracy (mean)  Sensitivity (max)  Specificity (max)  MCC (max)  p< (max)
SVM    Standard              UnG          65%             n/a              54%                73%                0.27       0.001
SVM    with PCA              AbT          69%             59%              46%                87%                0.30       0.001
SVM    with RFE              AbT          69%             62%              66%                71%                0.37       0.0001
SVM    99% Var               AbT          70%             60%              66%                73%                0.40       0.0001
SVM    Extremes              SmG          74%             65%              71%                76%                0.48       0.0001
SLR    Standard              UnG          58%             n/a              50%                63%                0.13       0.15
SLR    with PCA (99%) & RFE  AbT          68%             58%              74%                62%                0.37       0.0001
RVM    Standard              SmG          67%                              53%                76%                0.33       0.0001
RVM    with PCA (99%) & RFE  AbT          69%             58%              77%                62%                0.40       0.0001
Regression:
Model  Variant                                    Tissue Type  Pearson's r (max)  Pearson's r (mean)  RMSE (min)  p< (max)
RVR    Standard                                   UnG          0.28               n/a                 6.75        0.001
RVR    with PCA (99%), RFE & Standardised Scores  AbT          0.39               0.35                0.76        0.0001
Discussion
Summary
Abnormal Tissue, Smoothed Grey Matter and Unsmoothed Grey Matter consistently
outperform other tissue types
Application of PCA and RFE improves model performance
Best performance produced when model trained on extreme samples within data set
RVM, SVM & SLR classifiers predict patient recovery with significant levels of accuracy
(p<0.001)
SVM & RVM produce similar levels of performance yet outperform SLR
RVR predictions are significantly correlated with true scores (max r = 0.39, p<0.001)
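The two measures reported for the regression model, Pearson's r and RMSE, can be computed as sketched below. This is a plain standard-library illustration; the function names and example numbers are made up and are not study data.

```python
import math


def pearson_r(actual, predicted):
    """Pearson correlation between actual and predicted scores."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    ss_a = sum((a - mean_a) ** 2 for a in actual)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(ss_a * ss_p)


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted scores."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical NEADL scores and model predictions.
actual_scores = [2, 9, 12, 17, 20]
predicted_scores = [5, 8, 14, 15, 18]
r_value = pearson_r(actual_scores, predicted_scores)
error = rmse(actual_scores, predicted_scores)
```

RMSE is expressed in the units of the score, so values computed on raw scores and on standardised scores are not directly comparable, which is worth keeping in mind when reading the two RVR columns.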
Discussion
Wider Implications
Performance comparable to results in literature
Saur et al (2010) predict language outcome 6 months after stroke with 76% accuracy
using SVM classifier
Stonnington et al (2010) correlation between predicted and actual clinical measures of
Alzheimer's Disease (P<0.0001)
Stroke lesions generally more heterogeneous than those typically found in
Alzheimer's Disease patients
Few studies within the current literature apply Machine Learning to CT data to
predict patient recovery
Discussion
Methodological Issues
Model evaluation and selection
Noise may account for maximum values
Accepted methods of evaluation and model selection:
Average across 100 trials with sample order randomised
Adapt algorithm to select when performance peaks
Analyse in the context of 100 random trials with scores randomly assigned
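The last evaluation method, comparing observed performance with trials in which scores are randomly assigned, is commonly implemented as a permutation test. The sketch below is one way to do it under stated assumptions: `evaluate` is a hypothetical callback that would, in a real pipeline, rerun the whole cross-validated training with the given labels; here it is replaced by a toy function that scores fixed predictions.

```python
import random


def permutation_p_value(evaluate, labels, n_trials=100, seed=0):
    """Estimate how often randomly assigned labels score at least as well."""
    rng = random.Random(seed)
    observed = evaluate(labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_trials):
        rng.shuffle(shuffled)
        if evaluate(shuffled) >= observed:
            at_least_as_good += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_good + 1) / (n_trials + 1)


# Toy setup (not study data): fixed predictions that match 18 of 20 true labels.
true_labels = ["Good"] * 10 + ["Bad"] * 10
predictions = list(true_labels)
predictions[0], predictions[10] = "Bad", "Good"


def toy_accuracy(labels):
    """Stand-in for retraining: accuracy of the fixed predictions against labels."""
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


observed_accuracy, p_value = permutation_p_value(toy_accuracy, true_labels)
```

With 100 trials the smallest p-value this estimate can return is 1/101, so more trials are needed to support thresholds stricter than about 0.01.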
Discussion
Future Study
Improving Performance:
Poor performance currently restricts application to patient management or assessment of intervention programs
Additional Variables – e.g. blood vessel affected
Isolate ROI:
Informed by literature (Saur et al, 2010)
Weight maps (Ecker et al, 2010)
Ensemble methods (Opitz, 1999):
Train on individual lobes
Bootstrap Aggregating
Predict improvement in ADL scores
Saur et al, 2010
Investigate role of weighted voxels
Discussion
Acknowledgments
Alan Meeson
Provided:
Original code for machine learning algorithms
Support and guidance throughout project
Vaia Lestou
Assisted in the design and analysis of current study