Predicting Stroke Patient Recovery from Brain Images: A Machine Learning Approach

Alastair Smith

Supervised by Prof. Glyn Humphreys

Introduction

Objectives

Can machine learning techniques applied to Computed Tomography (CT) brain imaging data provide meaningful predictions of functional recovery in stroke patients?

By exploring multiple machine learning techniques, which approach provides the most accurate predictions?

Which aspects of the images do the machine learning algorithms use to inform predictions?

Stroke: The Consequences

Recovery & Rehabilitation:

Effects include physical disability, loss of cognitive and communication skills, and mental health problems.

The recovery programme is specific to each patient's symptoms and commonly requires intervention from physiotherapists, psychologists, occupational therapists, speech therapists, and specialist nurses and doctors.

A third of patients make a close to full physical recovery and are able to live independently, a third will require assistance with daily activities, and a third of patients will die within a year (http://www.nhs.uk).

Impact in the U.K. (National Stroke Strategy, 2007)

Every year approximately 110,000 people in England have a stroke, and over 900,000 people currently living in England have had a stroke.

Stroke is the single largest cause of adult disability: a third of people who have a stroke are left with long-term disability.

Stroke costs the NHS and the economy about £7 billion a year. Despite U.K. services being among the most expensive, outcomes for U.K. patients are comparatively poor, with unnecessarily long lengths of stay and high levels of avoidable disability and mortality.

Machine Learning Techniques:

Increasingly influential in neuroscience and clinical medicine (Bellazzi & Zupan, 2008)

Informing individual patient management, selecting appropriate treatments (Seker et al, 2003)

Brain Imaging Data

Large number of features, small number of samples

Machine learning techniques help avoid the resulting ‘overfitting’ problem

Machine Learning & Brain Imaging (1)

MRI & fMRI

Support Vector Machine (SVM) applied to MRI data

Ecker et al (2010), Autistic Spectrum Disorder

Kloppel et al (2008), Alzheimer's Disease (acc = 96%, n=68)

Detection of other diseases: Fan et al (2005), Kawasaki et al (2007)

SVM applied to fMRI data

Classifiers developed to distinguish between stimuli, mental states and behaviours, demonstrating that the data contain sufficient information to discriminate between them

For review see Norman et al (2006) and Haynes & Rees (2006)

Saur et al (2010): predicted recovery of stroke patients' language abilities after 6 months (acc = 76%, n = 21)

Relevance Vector Regression (RVR) applied to MRI data

Stonnington et al (2010):

Predicted a continuous measure: clinical scores of Alzheimer's Disease

Predicted and actual scores highly correlated (p < 0.0001, n = 163)

Machine Learning & Brain Imaging (2)

PET & RVM

Phillips et al (2011):

Distinguish between levels of consciousness

Acc = 100%, n = 58

Computed Tomography (CT)

Automated image segmentation, Li et al (2006)

Haemorrhage detection, Liu et al (2008)

Reid et al (2010):

CT-derived variables did not significantly improve the predictions of multivariate logistic regression models of functional recovery in stroke patients

Method

Nottingham Extended ADL

Ranked assessment of a patient's ability to complete activities of daily living (ADL) independently

Developed specifically for use with stroke patients (Nouri & Lincoln, 1987)

Completed by patient or carer via post or interview

Demonstrated to be a useful measure of outcome in stroke research

Gladman et al (1993)

Cited in 14 studies as a measure of stroke patient outcomes (Green et al, 2001)

Composed of 21 questions, split into 4 subsections:

Mobility, Kitchen, Domestic, Leisure

High scores indicate low disability

Maximum score = 21, minimum score = 0
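A minimal sketch of how a total score in this range could be computed. It assumes that each of the 21 items is scored 1 (done independently) or 0 (not done independently), which is consistent with the stated 0 to 21 range. The function and constant names are hypothetical.

```python
from typing import Sequence

N_ITEMS = 21  # number of NEADL questions used in this study


def neadl_total(item_scores: Sequence[int]) -> int:
    """Sum the item scores into a total between 0 and 21.

    Assumes (hypothetically) that every item is scored 0 or 1, which is
    consistent with the stated 0 to 21 range. Higher totals indicate
    lower disability.
    """
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(item_scores)}")
    if any(score not in (0, 1) for score in item_scores):
        raise ValueError("each item score must be 0 or 1")
    return sum(item_scores)


# Example: a patient independent in 18 of the 21 activities
print(neadl_total([1] * 18 + [0] * 3))  # 18
```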

Data Acquisition

Participants

Patients admitted to stroke units within the West Midlands area

Recruited as part of Birmingham University Cognitive Screen (BUCS) project

All patients selected for current study had suffered ischemic stroke

Inclusion Criteria:

• Informed Consent

• New Acute Stroke

• Alert

• Sufficient English Comprehension

Exclusion Criteria:

• Unwell

• Decline to participate

• Concentration span <35mins

Data set | Age (years) | Time from stroke to scan (days) | Time from stroke to testing (days) | n
NEADL | 69.54 | 1.79 | 299.3 | 155

NEADL data sets

Group | NEADL score | n | Mean | SD
Good Recovery | >= 17 | 65 | 19.3 | 1.46
Poor Recovery | < 17 | 90 | 9.02 | 4.72
Very Good Recovery | >= 17 | 65 | 19.3 | 1.46
Very Poor Recovery | <= 12 | 65 | 14.5 | 1.24

[Figure: distribution of NEADL scores (x-axis: score, 0 to 21) for the Good Recovery and Poor Recovery groups]
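The two data sets can be derived from the total scores by applying the thresholds in the table above at face value. In the "Good vs Poor" set, 17 or more is labelled good. In the "extremes" set, 17 or more is labelled very good, 12 or less is labelled very poor, and 13 to 16 is excluded. This is a sketch with hypothetical names. It does not reproduce any additional sample balancing that may have been applied to reach n = 65 per group.

```python
import numpy as np

GOOD_THRESHOLD = 17       # Good / Very Good Recovery: NEADL >= 17
VERY_POOR_THRESHOLD = 12  # Very Poor Recovery: NEADL <= 12


def good_vs_poor(scores):
    """Label every patient: 1 = Good Recovery (>= 17), 0 = Poor Recovery (< 17)."""
    scores = np.asarray(scores)
    return (scores >= GOOD_THRESHOLD).astype(int)


def extremes(scores):
    """Keep only Very Good (>= 17, label 1) and Very Poor (<= 12, label 0).

    Patients scoring 13 to 16 are excluded; the boolean mask records
    which patients were kept.
    """
    scores = np.asarray(scores)
    kept = (scores >= GOOD_THRESHOLD) | (scores <= VERY_POOR_THRESHOLD)
    return kept, (scores[kept] >= GOOD_THRESHOLD).astype(int)


scores = [21, 17, 16, 12, 0]
print(good_vs_poor(scores))  # [1 1 0 0 0]
kept, labels = extremes(scores)
print(labels)                # [1 1 0 0]
```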

Data Acquisition

Computed Tomography (CT) images:

Capture density of tissue

In-plane resolution 0.5x0.5mm², slice thickness 4-5mm

Whole Brain

Pre-processing & Image Compression

Images of poor quality (due to head movement or other imaging issues) removed from sample

Images normalised to an in-house CT template (Ashburner & Friston, 2003) using SPM8

Images segmented using SPM8 unified segmentation (Seghier et al, 2005) to form Grey Matter, White Matter and Cerebrospinal Fluid images

A further abnormal tissue class was produced by adding an extra tissue probability map (Seghier et al, 2008)

Grey and white matter images smoothed using an isotropic Gaussian kernel of 12 mm FWHM
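SPM8 performs this smoothing step itself. The NumPy sketch below only illustrates what a 12 mm FWHM kernel means in voxel terms. The FWHM converts to a standard deviation via sigma = FWHM / (2 * sqrt(2 ln 2)), roughly FWHM / 2.355. The 2 mm voxel size is an assumption, not a value taken from the study, and the function names are hypothetical.

```python
import math
import numpy as np


def fwhm_to_sigma(fwhm_mm, voxel_size_mm):
    """Convert a Gaussian FWHM in mm to a standard deviation in voxels."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0))) / voxel_size_mm


def gaussian_kernel_1d(sigma):
    """Normalised 1-D Gaussian kernel truncated at +/- 3 sigma."""
    radius = max(1, math.ceil(3.0 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def smooth_volume(volume, fwhm_mm=12.0, voxel_size_mm=2.0):
    """Separable isotropic Gaussian smoothing of a 3-D image (zero padding)."""
    kernel = gaussian_kernel_1d(fwhm_to_sigma(fwhm_mm, voxel_size_mm))
    out = np.asarray(volume, dtype=float)
    if min(out.shape) < kernel.size:
        raise ValueError("every image dimension must be at least the kernel length")
    for axis in range(out.ndim):
        # convolve every 1-D line along this axis with the same kernel
        out = np.apply_along_axis(
            lambda line: np.convolve(line, kernel, mode="same"), axis, out
        )
    return out


grey = np.zeros((40, 40, 40))
grey[20, 20, 20] = 1.0                  # a single bright voxel
smoothed = smooth_volume(grey)
print(round(float(smoothed.sum()), 6))  # 1.0 (intensity is spread, not lost)
```

Under the assumed 2 mm voxels, a 12 mm FWHM corresponds to a sigma of about 2.55 voxels.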

Training & Testing

Cross Validation

Applied in 5 folds

Each data set randomly divided into 5 equally sized test sets

In each fold

Model trained on all samples not present in test set

Model tested on ability to assign correct labels to test set
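The cross-validation procedure above can be sketched as follows. With the 155 NEADL patients, each test set holds 155 / 5 = 31 patients. The nearest-class-mean classifier is a hypothetical stand-in used only so the example runs; it is not one of the classifiers used in the study.

```python
import numpy as np


def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Randomly partition sample indices into n_folds (near-)equal test sets.

    Yields (train_idx, test_idx); every sample appears in exactly one test set.
    """
    rng = np.random.default_rng(seed)
    test_sets = np.array_split(rng.permutation(n_samples), n_folds)
    for k, test_idx in enumerate(test_sets):
        # train on every sample that is not in the current test set
        train_idx = np.concatenate([t for j, t in enumerate(test_sets) if j != k])
        yield train_idx, test_idx


def cross_validate(X, y, fit, predict, n_folds=5, seed=0):
    """Return the mean test-set accuracy across the folds."""
    accuracies = []
    for train_idx, test_idx in five_fold_splits(len(y), n_folds, seed):
        model = fit(X[train_idx], y[train_idx])
        accuracies.append(np.mean(predict(model, X[test_idx]) == y[test_idx]))
    return float(np.mean(accuracies))


# Hypothetical stand-in classifier (nearest class mean), NOT the SVM used in the study
def fit_nearest_mean(X, y):
    return {label: X[y == label].mean(axis=0) for label in np.unique(y)}


def predict_nearest_mean(model, X):
    labels = list(model)
    dists = np.stack([np.linalg.norm(X - model[label], axis=1) for label in labels])
    return np.asarray(labels)[np.argmin(dists, axis=0)]


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (50, 10)), rng.normal(1, 1, (50, 10))])
y = np.array([0] * 50 + [1] * 50)
print(cross_validate(X, y, fit_nearest_mean, predict_nearest_mean))
```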

Measures of performance

Performance measures record mean performance across all 5 folds

Accuracy = Proportion of correct classifications

Specificity = Proportion of ‘Bad’ samples correctly classified as ‘Bad’

Sensitivity = Proportion of ‘Good’ samples correctly classified as ‘Good’

MCC = Matthews Correlation Coefficient (Matthews, 1975)

Common measure of classifier performance in the machine learning literature

Balanced measure that remains informative when the classes are of very different sizes

Equivalent to the phi coefficient

+1 = perfect prediction, 0 = no better than random prediction, -1 = total disagreement
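The four measures can be computed from the confusion-matrix counts, with ‘Good’ treated as the positive class. MCC uses the standard formula (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). This is a minimal sketch with a hypothetical function name. Setting MCC to 0 when a marginal count is zero is a common convention, not something stated in the slides.

```python
import math


def classification_metrics(y_true, y_pred, positive="Good"):
    """Accuracy, sensitivity, specificity and MCC for a two-class problem.

    'Good' recovery is the positive class and anything else ('Bad') the
    negative class, matching the definitions above.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # convention: MCC = 0 when any row or column of the confusion matrix is empty
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


y_true = ["Good", "Good", "Bad", "Bad"]
y_pred = ["Good", "Bad", "Bad", "Bad"]
print(classification_metrics(y_true, y_pred))
# accuracy 0.75, sensitivity 0.5, specificity 1.0, MCC ≈ 0.577
```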

Improving Efficiency

Recursive Feature Elimination (RFE):

Features with the lowest weights attributed by the model are eliminated iteratively

On each iteration:

Feature with lowest weight identified and eliminated from training data

New model trained on new training set

Training therefore becomes focused on voxels for which high weights are assigned
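The RFE loop above can be sketched as follows. In the study the weights would come from the trained classifier. Here a least-squares fit (`least_squares_weights`, hypothetical) stands in for the model, and "lowest weight" is assumed to mean smallest absolute weight. This version removes one feature per iteration, as described. For whole-brain voxel data, removing a fraction of features per iteration is often used instead, for speed.

```python
import numpy as np


def least_squares_weights(X, y):
    """Hypothetical stand-in for a trained linear classifier's weight vector."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def recursive_feature_elimination(X, y, n_keep, fit_weights=least_squares_weights):
    """Repeatedly retrain and drop the feature with the smallest |weight|.

    Returns the original column indices of the n_keep surviving features.
    """
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        w = fit_weights(X[:, remaining], y)       # new model on the reduced data
        del remaining[int(np.argmin(np.abs(w)))]  # eliminate the weakest feature
    return remaining


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1]  # only features 0 and 1 matter
print(recursive_feature_elimination(X, y, n_keep=2))  # [0, 1]
```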

Principal Component Analysis (PCA):

Reduce dimensionality of data set

Transforms a set of correlated variables into a smaller set of uncorrelated variables

PCA applied to 2D data set (Jehan, 2005)
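A minimal PCA sketch via the singular value decomposition of the centred data. It keeps the fewest components that explain a chosen share of the variance. A threshold of 0.99 is assumed here to correspond to the "99% Var" condition in the results, and the function name is hypothetical. With images, the number of samples is far smaller than the number of voxels, so the reduced SVD returns at most n_samples components.

```python
import numpy as np


def pca(X, variance_to_keep=0.99):
    """Project centred data onto the fewest principal components that explain
    at least `variance_to_keep` of the total variance.

    Returns (scores, components, explained_ratio).
    """
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    k = min(k, len(s))  # guard against rounding when variance_to_keep = 1.0
    scores = Xc @ vt[:k].T  # the new, mutually uncorrelated variables
    return scores, vt[:k], explained[:k]


rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2.0 * x + 0.05 * rng.normal(size=200)])  # correlated 2-D data
scores, components, explained = pca(X, variance_to_keep=0.99)
print(scores.shape)  # (200, 1): one component carries over 99% of the variance
```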

Machine Learning Techniques

Support Vector Machine (Classifier):

Images treated as points in a high-dimensional space

SVM aims to identify a hyperplane that separates the two classes while maximising the margin, i.e. the distance between the hyperplane and the nearest images of either class.

The hyperplane is defined by the set of images (support vectors) that lie on the maximal margin

Joachims (2002, 1999), based on Vapnik (1995)
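The cited SVM software (Joachims) solves the SVM optimisation in its dual form. As a simpler illustration of the same soft-margin linear SVM, this sketch minimises the primal objective, (lam/2)·||w||² plus the mean hinge loss, by full-batch subgradient descent. The learning rate, regularisation strength and epoch count are illustrative assumptions, and the function names are hypothetical. Each image would be one row of `X`, with voxels as columns.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Soft-margin linear SVM fitted by full-batch subgradient descent on
        lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))).
    Labels y must be -1 or +1. Returns the hyperplane (w, b).
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_svm(w, b, X):
    """Classify by which side of the hyperplane w.x + b = 0 each image falls."""
    return np.where(X @ w + b >= 0, 1, -1)


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
w, b = train_linear_svm(X, y)
print(np.mean(predict_svm(w, b, X) == y))  # 1.0 on this well-separated toy data
print(2.0 / np.linalg.norm(w))             # width of the margin, 2 / ||w||
```

At the optimum, only the points with y·(w·x + b) <= 1 contribute to the gradient. These are the (approximate) support vectors that define the hyperplane.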

Sparse Logistic Regression (Classifier):

Logistic regression applied within a Bayesian framework

A sparse (automatic relevance determination) Gaussian prior with zero mean is assumed for the weights

Iterative algorithm in which least informative features are pruned according to assigned weights

Yamashita et al (2008)
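A rough sketch of sparse Bayesian logistic regression with an automatic relevance determination prior. Each weight gets its own prior precision, and features whose precision grows very large are pruned. This is not the estimation procedure of Yamashita et al (2008). The sketch swaps in a Laplace approximation with MacKay-style re-estimation of the precisions, the scheme Tipping used for RVM classification. The iteration counts, pruning threshold and names are all assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sparse_logistic_regression(X, y, n_outer=50, n_newton=20, prune_alpha=1e6):
    """ARD (sparse Bayesian) logistic regression for labels y in {0, 1}.

    Each weight w_i has a zero-mean Gaussian prior with its own precision
    alpha_i. Features whose alpha_i grows past prune_alpha are pruned
    (their weight is fixed at zero). Returns (weights, kept_feature_indices).
    """
    d = X.shape[1]
    alpha, w, keep = np.ones(d), np.zeros(d), np.arange(d)
    for _ in range(n_outer):
        if keep.size == 0:
            break
        Xk, a, wk = X[:, keep], alpha[keep], w[keep].copy()
        # 1) MAP weights under the current priors (Newton-Raphson)
        for _ in range(n_newton):
            p = sigmoid(Xk @ wk)
            grad = Xk.T @ (y - p) - a * wk
            hess = Xk.T @ (Xk * (p * (1 - p))[:, None]) + np.diag(a)
            wk += np.linalg.solve(hess, grad)
        # 2) Laplace approximation: posterior covariance = inverse Hessian
        p = sigmoid(Xk @ wk)
        hess = Xk.T @ (Xk * (p * (1 - p))[:, None]) + np.diag(a)
        gamma = 1.0 - a * np.diag(np.linalg.inv(hess))  # how well-determined each weight is
        # 3) Re-estimate each precision; gamma <= 0 means "explained by the prior alone"
        alpha[keep] = np.where(gamma > 0, gamma / np.maximum(wk**2, 1e-300), np.inf)
        w[:] = 0.0
        w[keep] = wk
        # 4) Prune the least informative features
        pruned = alpha[keep] >= prune_alpha
        w[keep[pruned]] = 0.0
        keep = keep[~pruned]
    return w, keep


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = np.zeros(10)
true_w[:2] = [2.0, -2.0]  # only features 0 and 1 are informative
y = (rng.random(200) < sigmoid(X @ true_w)).astype(float)
w, keep = sparse_logistic_regression(X, y)
print(keep)  # indices of the surviving features
```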

Relevance Vector Machine (Classification & Regression)

Applies Bayesian techniques within a functional form similar to that of an SVM

As a probabilistic model, it can indicate the probability of class membership

By altering the conditional distribution of the target variable, RVMs can be applied to both classification and regression problems

Tipping et al (2001, 2003).
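A compact sketch of the regression form of the RVM, in the style of Tipping's (2001) iterative re-estimation. There is one basis function per training example, each with its own prior precision. Examples whose precision diverges are pruned, and the survivors are the "relevance vectors". The Gaussian kernel, its width, the initial noise precision and the pruning threshold are illustrative assumptions, and the function names are hypothetical. Pearson's r and RMSE are printed because those are the regression measures reported in the results.

```python
import numpy as np


def rbf_kernel(A, B, width=1.0):
    """Gaussian (RBF) kernel between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * width**2))


def fit_rvr(X, t, width=1.0, n_iter=300, prune_alpha=1e6):
    """Relevance vector regression: one ARD prior per training example.

    Returns (indices of the relevance vectors, their posterior mean weights,
    the estimated noise precision beta).
    """
    phi = rbf_kernel(X, X, width)   # design matrix: one basis function per example
    n = len(t)
    alpha = np.ones(n)              # prior precision of each weight
    beta = 1.0 / (0.1 * np.var(t))  # noise precision, initial guess
    keep = np.arange(n)
    for _ in range(n_iter):
        p = phi[:, keep]
        sigma = np.linalg.inv(np.diag(alpha[keep]) + beta * p.T @ p)  # posterior covariance
        mu = beta * sigma @ p.T @ t                                    # posterior mean
        gamma = 1.0 - alpha[keep] * np.diag(sigma)
        alpha[keep] = np.where(gamma > 0, gamma / np.maximum(mu**2, 1e-300), np.inf)
        resid = t - p @ mu
        beta = max(n - np.clip(gamma, 0.0, 1.0).sum(), 1e-6) / max(resid @ resid, 1e-12)
        keep = keep[alpha[keep] < prune_alpha]  # drop irrelevant examples
    p = phi[:, keep]
    sigma = np.linalg.inv(np.diag(alpha[keep]) + beta * p.T @ p)
    mu = beta * sigma @ p.T @ t
    return keep, mu, beta


def predict_rvr(X_train, keep, mu, X_new, width=1.0):
    return rbf_kernel(X_new, X_train[keep], width) @ mu


x = np.linspace(0.0, 2.0 * np.pi, 50)[:, None]
rng = np.random.default_rng(0)
t = np.sin(x[:, 0]) + 0.1 * rng.normal(size=50)
keep, mu, beta = fit_rvr(x, t)
pred = predict_rvr(x, keep, mu, x)
print(len(keep), "relevance vectors out of", len(t))
print("Pearson's r:", np.corrcoef(pred, t)[0, 1])
print("RMSE:", np.sqrt(np.mean((pred - t) ** 2)))
```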

[Figure: optimal separating hyperplane defined by the set of support vectors]

Results

NEADL Results (SVM)

Measure | Standard | With PCA | With RFE | 99% Var | Extremes
Tissue type | UnG | AbT | AbT | AbT | SmG
Accuracy / Pearson's r (max) | 65% | 69% | 69% | 70% | 74%
Accuracy / Pearson's r (mean) | n/a | 59% | 62% | 60% | 65%
Sensitivity (max) | 54% | 46% | 66% | 66% | 71%
Specificity (max) | 73% | 87% | 71% | 73% | 76%
MCC / RMSE (max / min) | 0.27 | 0.30 | 0.37 | 0.40 | 0.48