Decrypting Cryptogenic Epilepsy: Machine
Learning Methods for Detecting Cortical
Malformations
A dissertation submitted by
Bilal Ahmed
In partial fulfillment of the requirements
for the degree of Doctor of Philosophy in
Computer Science
TUFTS UNIVERSITY
May 2016
ADVISER: Carla E. Brodley
Abstract
Epilepsy is a common neurological disorder, affecting approximately 1% of
the world’s population. Uncontrolled epilepsy can have harmful effects on the
brain and increases the risk of injuries and sudden death. Cortical malformations,
particularly focal cortical dysplasia (FCD), are recognized as one of the
most common sources of treatment-resistant epilepsy (TRE). Surgical resection
of the abnormal tissue is the only treatment for TRE patients, and a successful
outcome results in complete seizure freedom. The chance of success is 66%
when the lesion is visually detected on MRI (MRI-positive), and only 29%
for cases with undetected lesions (MRI-negative). Approximately 45%-60% of
histologically confirmed FCD lesions are missed by expert neuroradiologists.
This dissertation develops automated methods for detecting cortical malformations
in MRI-negative patients using surface-based morphometry. Using
data from MRI-negative patients to train machine learning (ML) algorithms
involves a number of confounding factors that limit their applicability to the
lesion detection task. These include label noise, which arises from the
subjectivity of determining the cortical region to resect when no abnormality is
visible. In addition, inter-subject and intra-subject variations in brain
morphology limit the generalization of ML methods trained on data aggregated
from different individuals. To address these issues, we develop two novel ML
methods. We propose a multitask learning (MTL) method that models each patient
as a separate learning task and uses the results of the intracranial EEG (iEEG)
exam as added supervision to mitigate label noise. Next, we develop hierarchical
conditional random fields (HCRF) for outlier detection, a semi-supervised
learning method that does not require labeled training data. By correcting for
all three factors (label noise, intra-subject variation and inter-subject
variation), HCRF outperforms the baseline methods and the MTL method.
The high detection rate of the proposed methods for MRI-negative patients
(75% for HCRF) shows that some electrophysiologically and histologically
abnormal cortical regions are not visually apparent to the human eye
but can be detected using ML methods. Incorporating such ML methods in
the pre-surgical evaluation protocol has the potential to enhance the chances
of detecting the lesion prior to surgery, leading to more patients being
referred for resective surgery.
Acknowledgements
Foremost, I would like to express my sincere gratitude to my thesis advisor,
Prof. Carla E. Brodley, for her supervision of my research. I consider that all
I have achieved during the course of my doctorate, and the fun I have had,
would not have been possible without her support and patience. I would like
to thank my thesis committee: Prof. Roni Khardon, Prof. Ben Hescott, Prof.
Shuchin Aeron and Dr. Thomas Thesen, for their insightful comments and
suggestions.
I am also grateful to the following former or current staff at Tufts University
for their support during my graduate study: Jeannine Vangelist, Donna
Cirelli, Gail Fitzgerald, Sarah Richmond, George D. Preble and the excellent
Systems support staff. I have benefited immensely from the advice of the
people at the Tufts research computing group.
I would like to give special thanks to the Epilepsy Foundation, USA, for
awarding me the pre-doctoral training scholarship, and to the FACES (Finding
A Cure for Epilepsy and Seizures) organization for their financial support.
My friends have helped me stay sane through these difficult years. Their
support and care helped me overcome difficult times and stay the course in my
graduate studies. I greatly value their friendship and I deeply appreciate their
belief in me. I would especially like to thank Mashhood Ishaque, Noman
H. Khan, Ehsan Ullah, Nathan Ricci, Saeed Majidi, Alireza Aghassi, Gilad
Barash, Haris Ghafoor, Mamoon Raja, Abdur-Rehman Rashid and Syed Musa
Bukhari.
None of this would have been possible without the never-ending love and
unconditional support of my parents, Baba and Ammi. They inspired me to
reach for the stars and dream big. I would also like to thank my parents-in-law,
Uncle and Aunty. I am also indebted to my loving wife, Sabeeka, for believing
in me even under the most trying circumstances, and for the numerous pep
talks that showed me the light when everything seemed bleak.
Contents
1 Introduction
1.1 Lesion Detection in Epilepsy Patients
1.2 Machine Learning For Lesion Detection
1.3 Intra-cranial EEG as Auxiliary Supervision
1.4 Identifying Lesions As Outliers
1.5 Thesis Contributions
1.6 Roadmap
2 Focal Cortical Dysplasia and Surface-Based Morphometry
2.1 Focal Cortical Dysplasia
2.2 Surface-Based Morphometry
2.2.1 Surface Reconstruction
2.2.2 Registration
2.2.3 Morphological Features
2.3 Radiological Features of FCD Lesions
2.4 Computational Methods for Detecting FCD Using Surface-Based Morphometry
3 A Vertex-Based Classifier
3.1 Eliminating Label Noise
3.1.1 Removing False Positives
3.1.2 Removing False Negatives
3.2 Reducing Cortical Complexity
3.3 Overcoming Class Imbalance
3.4 Empirical Evaluation
3.4.1 Data Preprocessing
3.4.2 Training
3.4.3 Testing
3.4.4 Results
3.5 Sensitivity Analysis
3.5.1 Data Stratification
3.5.2 Mask Reduction
3.5.3 Bagging
3.6 Conclusion
4 Leveraging iEEG for FCD Lesion Detection
4.1 Multitask Learning
4.2 MTL with Auxiliary Label Information
4.2.1 Regularized Multi-task Learning (MTL)
4.2.2 Incorporating Auxiliary Label Information
4.2.3 Globally-Consistent Label Ranking (GC)
4.2.4 Task-Specific Label Ranking (TS)
4.3 Detecting Cortical Malformations
4.3.1 Data Description
4.3.2 Segmentation
4.3.3 Creating Electrode Maps
4.4 Results
4.4.1 Baseline Selection
4.4.2 Experimental Setup
4.4.3 Performance Analysis
4.5 Conclusion
5 Hierarchical Conditional Random Fields For Detecting FCD Lesions
5.1 Hierarchical Conditional Random Fields
5.2 HCRFs for Lesion Detection
5.2.1 Segmentation
5.2.2 HCRF Construction
5.2.3 Lesion Detection
5.3 Empirical Evaluation
5.3.1 Data Pre-processing and Parameter Selection
5.3.2 Evaluation Methodology
5.3.3 Cluster Ranking
5.4 Results
5.4.1 Individual Features
5.4.2 Combining Features
5.4.3 Ranking Criterion and the Detection Rate
5.5 HCRF versus Human Expert
5.6 Conclusion
6 Conclusion
Appendices
A Patient Information
B HCRF Results for MRI-Positive Patients
Bibliography
List of Figures
2.1 Automatic segmentation of the gray/white matter boundary and surface
extraction.
2.2 Registration of different cortical surfaces.
2.3 Summary of the five morphometric features estimated at each cortical vertex.
3.1 Manual mask reduction for an MRI-positive patient.
3.2 Overview of the training and test phase of a vertex-based classifier.
3.3 Detection results for the machine learning based approach on an
MRI-positive and an MRI-negative subject.
3.4 Effects of changing the manually determined thresholds for mask reduction.
3.5 Effects of changing the classifier design on the detection rate of
MRI-negative patients.
4.1 Mapping iEEG electrodes on the cortical surface.
5.1 Constructing a Hierarchical Conditional Random Field (HCRF) for a
flattened cortical parcellation image isolated using a neuroanatomical atlas.
5.2 Detection results for MRI-negative patient NY67 using HCRF and cortical
thickness.
5.3 Comparison of detection rates, precision and recall between the
HCRF-based approach and the baseline method using individual morphological
features.
5.4 Detection results for MRI-negative patient NY294 using HCRF and cortical
thickness.
5.5 Comparison of detection rates, precision and recall between the
HCRF-based approach and the z-score based baseline method when the detection
scores are combined across features.
5.6 Comparison of detection rates, precision and recall between the
HCRF-based approach and the logistic regression based baseline method when
the detection scores are combined across features.
5.7 Comparison of detection rates, precision and recall between the
HCRF-based approach and the z-score based baseline method when the detection
scores are combined across cortical thickness and mean curvature.
5.8 Cluster ranking criterion and its effects on the detection rate.
5.9 An MRI-positive patient with abnormal detections outside the resection
zone.
B.1 Detection results for an MRI-positive patient using HCRF and cortical
thickness.
B.2 Comparison of detection rates, precision and recall between the
HCRF-based approach and the baseline method using individual morphological
features for MRI-positive patients.
List of Tables
3.1 The detection performance of the z-score baseline approach and the
proposed scheme (ML) on MRI-positive subjects. The true positive rate (TPR)
and false positive rate (FPR) are calculated as the percentage of lesional
vertices correctly labeled and the percentage of non-lesional vertices
incorrectly labeled, respectively. The Dice coefficient (DC), which measures
the degree of spatial overlap (shown here as a percentage) between the
detected clusters and the expert-marked lesion on the cortical surface, is
also listed (‘-’ represents a value of zero; for TPR and DC it signifies that
no abnormal cluster overlapping the lesion was detected).
3.2 Results for MRI-negative subjects. For each subject, the true positive
rate (TPR) and false positive rate (FPR) are calculated as the percentage of
lesional vertices correctly labeled and the percentage of non-lesional
vertices incorrectly labeled, respectively. The Dice coefficient (DC) is also
shown as a percentage to quantify the overlap between the detected clusters
and the resection on the cortical surface (‘-’ represents a value of zero for
FPR and no detection for TPR and DC).
3.3 A comparison of detection results using the z-score based method and the
ML methods for MRI-positive subjects only, with different variations in the
design of the ML approach: (A) no stratification along the sulcal values;
(B) stratifies the data based on the sulcal depth values but does not reduce
the lesion mask; (C) uses stratification and lesion reduction by calculating a
threshold for each sulcal level using cortical thickness values, but does not
use bagging (TPR and FPR are given as percentages; ‘-’ represents a value of
zero for FPR and no detection for TPR).
3.4 A comparison of detection results using the z-score based method and the
ML method for MRI-negative subjects only, with different variations in the
design of the ML approach: (A) no stratification along the sulcal values;
(B) stratifies the data based on the sulcal depth values but does not reduce
the lesion mask; (C) uses stratification and lesion reduction by calculating a
threshold for each sulcal level using cortical thickness values, but does not
use bagging (FPR is given as a percentage; ‘-’ represents a value of zero for
FPR).
4.1 Range of values for the model hyper-parameters used in the grid search.
The grid search optimized the area under the curve (AUC) over the model
parameter set (MPS), consisting of three patients whose data are distinct from
the fifteen patients used for performance analysis.
4.2 Detailed results for MRI-negative subjects. LDA is the Fisher linear
discriminant analysis based method adapted from [43], ML represents the
stratified classification scheme described in Chapter 3, MTL represents
regularized MTL [31] without auxiliary supervision, and GC and TS are the
globally-consistent and the task-specific approaches, respectively (‘-’
represents a value of zero for FPR and no detection for recall and precision;
‘*’ marks MRI-positive patients).
A.1 Demographic and seizure-related information for the MRI-positive patients.
A.2 Demographic and seizure-related information for the MRI-negative patients.
Chapter 1
Introduction
In this research we address the task of detecting structurally abnormal cortical
regions in patients suffering from treatment-resistant epilepsy (TRE) caused
by focal cortical dysplasia (FCD). We take a machine learning approach that
uses the magnetic resonance imaging (MRI) data of TRE patients, with the
end goal of improving the early detection rate in patients whose MRI scans
are deemed normal by neuroradiologists. Visual detection of the dysplastic
cortical region (FCD lesion) depends on factors such as reviewer training
and the location of the lesion within the complex convoluted structure of
the brain. Section 1.1 introduces the clinical process of lesion detection for
TRE patients and its impact on surgical outcomes. Section 1.2 formalizes the
lesion detection problem from a machine learning perspective and explains the
major confounding factors in the data that warrant the development of novel
learning techniques. Sections 1.2, 1.3 and 1.4 give an overview of three new
learning algorithms specifically tailored for the task of FCD lesion detection,
which constitute the main technical contributions of this research. Section
1.5 summarizes these contributions, and Section 1.6 provides a guide to the
rest of the thesis.
1.1 Lesion Detection in Epilepsy Patients
Epilepsy is a common neurological disorder, affecting approximately 1% of the
population [39]. It is characterized by profoundly abnormal neural activity
during seizures and interictal periods. Uncontrolled epilepsy can have harmful
effects on the brain and increases the risk of injuries and sudden death [11].
About one third of epilepsy patients remain refractory to medical treatment
[55]. Cortical malformations, particularly focal cortical dysplasia (FCD), are
recognized as the most common source of pediatric epilepsy [11, 96] and the
third most common source in adults suffering from TRE [45, 92]. Early detection
and subsequent surgical removal of the FCD lesion are the most effective
treatment for stopping seizures and are often the last hope for these patients.
For patients suffering from FCD-based TRE, an initial radiological evaluation
of the patient’s MRI is carried out by a panel of experienced radiologists to
locate the lesion. In some cases the lesion is located by visual inspection
(MRI-positive), while in most cases the patient’s MRI is read as normal
(MRI-negative). A number of factors, such as the highly complex folding pattern
of the brain [44, 12], reviewer experience [107] and the specific characteristics
of the FCD lesion [64, 53], limit the chances of visual detection. Visual
inspection is followed by an intracranial EEG (iEEG) exam. This invasive
procedure requires precise implantation of intracranial electrodes, which
becomes a challenging task when the MRI provides no target. Once the seizure
onset zone is identified, resective surgery can be performed. Note, however,
that whether resective surgery is possible depends on the specific location of
the lesion. In certain cases, such as a lesion in the motor or visual cortex,
resection may not be possible because it would lead to a loss of basic life
functions.
For MRI-positive patients, the chance that the patient will be seizure-free
after surgery is 66%, whereas for MRI-negative patients it is only 29% [57].
It is estimated that 70-80% of cases with FCD escape visual MRI inspection
[11, 96] (i.e., are MRI-negative).
Despite a growing number of studies demonstrating that resective surgery
is effective for TRE patients whose main indication is FCD, it remains
underutilized [9]. This is especially true for MRI-negative patients. Not only
are these individuals less likely to be referred to a specialized epilepsy
center by neurologists [38], but many epilepsy specialists are also reluctant
to operate without a well-defined lesion. In this thesis we describe and
evaluate novel machine learning algorithms that have higher sensitivity for
identifying FCD lesions in MRI data than other reported computational methods
for FCD lesion detection [14, 96, 43], resulting in an increased detection rate
for MRI-negative patients during their pre-surgical evaluation. The ultimate
goal of this research is to increase the use of resective surgery, leading to a
better quality of life for FCD patients.
1.2 Machine Learning For Lesion Detection
We use surface-based morphometry (SBM) [24] to extract a surface model from
the structural MRI scans of the patients. SBM represents the cortex as a
two-dimensional folded sheet embedded in three-dimensional space [35].
Technically, the folded sheet is represented as a triangulated surface, and each
vertex on the surface can be characterized by morphological features such as
cortical thickness [33] and curvature. Using SBM, we can accurately align the
extracted surfaces of different individuals so that there is a one-to-one
correspondence among their cortical regions. This alignment is crucial for
comparing regions across individuals, because brain morphology varies
considerably between subjects with demographic factors such as age, gender,
handedness and level of education. By aligning the brains of different
individuals to a common surface, we can correct for inter-subject variation by
matching exact locations across brains.
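Because registration gives every subject the same vertex indexing, a patient’s feature map can be compared vertex by vertex against a control cohort. Tables 3.1 and 3.2 report a z-score baseline; the following is a minimal sketch of vertex-wise z-scoring under the assumption that all surfaces are already registered to a common surface. It is illustrative only and not necessarily the exact baseline used in this thesis; the function name, the `eps` guard and the cut-off of 3 are assumptions.

```python
import numpy as np

def vertex_zscores(patient, controls, eps=1e-8):
    """Z-score each vertex of a registered patient surface against controls.

    patient:  (n_vertices,) one feature per vertex, e.g. cortical thickness.
    controls: (n_controls, n_vertices) the same feature for every control;
              column i is assumed to be the same cortical location in every brain.
    eps:      small constant guarding against zero variance (illustrative).
    """
    mu = controls.mean(axis=0)            # per-vertex control mean
    sigma = controls.std(axis=0, ddof=1)  # per-vertex sample standard deviation
    return (patient - mu) / (sigma + eps)

# Toy data: 3 controls x 4 vertices; the patient's vertex 2 is unusually thick.
controls = np.array([[2.5, 2.6, 2.4, 2.7],
                     [2.6, 2.5, 2.5, 2.6],
                     [2.4, 2.7, 2.6, 2.8]])
patient = np.array([2.5, 2.6, 3.6, 2.7])
z = vertex_zscores(patient, controls)
candidates = np.flatnonzero(np.abs(z) > 3.0)  # hypothetical cut-off; flags vertex 2
```

Without registration, column i would refer to different anatomy in different subjects and the per-vertex mean and standard deviation would be meaningless.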
The machine learning task is to train learning algorithms to distinguish
between normal and lesional vertices on the extracted surfaces. To this end, we
use training data from healthy controls and from FCD patients (both MRI-positive
and MRI-negative) who underwent surgical resection and whose resected tissue
showed evidence of FCD on neuropathological examination.
As described above, the task seems straightforward and fits nicely within
the supervised learning framework for binary classification. We can train a
classifier by collecting the vertices from the patient’s affected area as
positive instances and using the corresponding vertices from healthy controls
as negative instances. However, there are a number of confounding factors
arising from human subjectivity and data complexity which, if not properly
addressed, result in classifiers with low sensitivity. The major confounding
factors in the data include:
Label Noise: In the absence of an MRI-visible lesion, the diagnostic
methodology rests on the accurate placement of iEEG electrodes and subsequent
analysis by a surgical board. For MRI-negative patients, the absence of a
visually detected target negatively impacts the accuracy of electrode placement,
which in turn significantly undermines the surgical outcome [88, 89]. The goal
of resective surgery is to remove the entire lesion; if any part of the lesion
is left behind, the outcome will not be successful. Surgeons therefore tend to
resect “generous” margins around the lesion, so when the resection area is used
as the positive label, the healthy tissue in these margins introduces false
positives. The problem is more pronounced in MRI-negative patients because, in
the absence of an MRI-identified target, abnormal vertices are delineated by the
extent of the tissue removed in surgery, and the resected tissue may include a
gradation from abnormal to normal tissue. In addition, there are false negatives
outside the resected regions of patients, which arise when lifetime seizure
burden leads to cortical abnormalities or when additional developmental lesions
that are not epileptogenic are present.
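One way to reduce false positives from generous margins is to shrink the resection mask to the vertices that actually look abnormal; Section 3.1.1 removes false positives, and the caption of Table 3.3 describes lesion reduction using a cortical-thickness threshold for each sulcal level. The sketch below shows only the general idea with a single, assumed threshold on vertex-wise z-scores; it is not the procedure used in Chapter 3, and the function names and the default threshold are hypothetical.

```python
import numpy as np

def reduce_mask(resection_mask, z, threshold=2.0):
    """Shrink a resection mask to vertices that deviate from controls.

    resection_mask: boolean (n_vertices,), the surgically resected region.
    z: (n_vertices,) vertex-wise z-scores of one feature (e.g. thickness).
    threshold: assumed cut-off on |z|; resected vertices below it are treated
        as healthy margin and dropped from the positive class.
    """
    return resection_mask & (np.abs(z) >= threshold)

def margin_fraction(resection_mask, reduced_mask):
    """Fraction of originally positive vertices discarded as presumed margin."""
    n = resection_mask.sum()
    return 0.0 if n == 0 else 1.0 - reduced_mask.sum() / n

# Toy usage: 4 resected vertices; only vertices 0 and 2 deviate strongly.
mask = np.array([True, True, True, True, False])
z = np.array([3.1, 0.4, -2.5, 1.0, 4.0])
reduced = reduce_mask(mask, z)
frac = margin_fraction(mask, reduced)
```

Vertex 4 has a large z-score but lies outside the resection, so it is untouched by this step; vertices like it correspond to the false negatives described above.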
Anatomic Complexity: The anatomic complexity and heterogeneity of the folded
cortical tissue reduce the ability to discriminate lesional tissue from normal
cortex, which is one of the reasons why a large number of lesions remain elusive
to human perception in routine radiological evaluation [93, 44]. Recent studies
have shown that subtle FCD lesions occur more frequently at the bottom of the
sulci [41]. Similarly, the distributions of features such as cortical thickness
and gray-white contrast (GWC) exhibit a covariate shift depending on where the
region is located within the folded cortex [17].
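A common way to counter such depth-dependent covariate shift is to stratify vertices by sulcal depth and normalize each feature within its stratum, so that a vertex is compared only with vertices at a similar depth. Chapter 3 stratifies the data by sulcal depth; the quantile-based bins, the number of strata and the per-stratum z-scoring below are assumptions made for illustration.

```python
import numpy as np

def stratified_standardize(feature, depth, n_bins=3):
    """Standardize a feature separately within sulcal-depth strata.

    feature: (n_vertices,) e.g. cortical thickness or GWC.
    depth:   (n_vertices,) sulcal depth per vertex.
    n_bins:  number of quantile-based strata (assumed choice).
    Returns the per-stratum z-scored feature and each vertex's stratum index.
    """
    edges = np.quantile(depth, np.linspace(0.0, 1.0, n_bins + 1))
    # Use interior edges only, so every vertex lands in a stratum 0..n_bins-1.
    strata = np.digitize(depth, edges[1:-1], right=True)
    out = np.empty_like(feature, dtype=float)
    for s in range(n_bins):
        idx = strata == s
        if idx.any():
            mu, sd = feature[idx].mean(), feature[idx].std()
            out[idx] = (feature[idx] - mu) / (sd if sd > 0 else 1.0)
    return out, strata

# Toy usage: deep vertices are systematically "thicker" in raw units.
depth = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
thickness = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
z, strata = stratified_standardize(thickness, depth, n_bins=2)
```

After stratification the shallow and deep groups have identical normalized values, so the raw offset between depths no longer masquerades as an abnormality.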
Chapter 2 provides a detailed description of SBM and an overview of FCD lesion
detection schemes that utilize SBM. In Chapter 3 we develop an FCD lesion
detection mechanism tailored specifically to counter the confounding factors
outlined above. The empirical results show that, by appropriately addressing
these domain idiosyncrasies, a higher detection rate can be achieved for
MRI-negative patients than with a recently reported detection approach [96].
1.3 Intra-cranial EEG as Auxiliary Supervision
Our second approach investigates iEEG as an auxiliary source of labels in
the supervised learning framework to augment the noisy vertex labels. Recall
that the main confounding factor in the data for FCD lesion detection is label
noise, which arises when the entire resected tissue is treated as an FCD lesion.
For patients who have undergone surgery and are subsequently seizure-free, the
resection zone can nevertheless be regarded as a source of weak (noisy)
supervision. In addition to the resection zones, we also have the results of the
iEEG analysis for MRI-negative patients, which we incorporate as an auxiliary
source of supervision to mitigate the effects of label noise that arise when
only resection zones are used as ground truth.
Before resective brain surgery, all patients are subjected to an invasive
intracranial EEG (iEEG) exam, in which subdural electrodes are implanted on the
cortical surface to record electrical activity [112]. A board of certified
epileptologists reviews these recordings to determine the region responsible
for generating the seizures (i.e., the seizure onset zone). To isolate the
abnormal region, each electrode is labeled as being part of the seizure onset
zone or not. iEEG has been shown to be highly effective for localizing FCD
lesions [65]. However, for MRI-negative patients there is no visible lesion to
guide precise electrode implantation, which results in sampling errors; in such
cases the identified abnormal region fails to capture the entire lesion in about
40% of cases [43]. In addition, subdural electrodes are unable to record
electrical activity from the bottom of the sulci, so when the lesion is located
at the bottom of a sulcus, iEEG analysis will not be effective in locating the
seizure onset zone [13]. Therefore, the outcome of the iEEG analysis constitutes
another source of weak supervision.
The output of an iEEG analysis labels each electrode as part of the seizure
onset zone, active during the initial stages of seizure onset, or active toward
the end of the seizure. Based on these classifications, we interpret the
electrode labels as the output of a pairwise ranking function: vertices that
fall within the range of an electrode labeled as part of the seizure onset zone
are considered more positive (have a higher rank) than vertices covered by
electrodes with a different label. However, the criteria the epileptologists use
for classifying electrodes depend on a number of patient-specific factors [81],
seizure morphology and semiology [101], etc. Therefore, the underlying semantics
of the pairwise ranking function vary from one patient to another. In addition
to the inter-patient variability of iEEG assessment, the morphology of the human
brain, such as its thickness, curvature and overall structure, is affected by
demographic factors such as age, gender and education [84, 83]. Because each
patient’s data has its own morphological characteristics, treating the data from
all patients identically would lead to poor classification accuracy.
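To make the ranking interpretation concrete, the sketch below turns one patient’s electrode labels into pairwise constraints (a vertex under a seizure-onset-zone electrode should outrank a vertex under an electrode with any other label) and scores a candidate ranking with a pairwise hinge loss. Pairs are built within a single patient only, in keeping with the patient-specific semantics described above. The label strings, the hinge loss and the margin are assumptions; the GC and TS formulations of Sections 4.2.3 and 4.2.4 may use the ranking information differently.

```python
import numpy as np
from itertools import product

def soz_pairs(vertex_electrode_label):
    """Pairwise ranking constraints for one patient from iEEG electrode labels.

    vertex_electrode_label: one entry per vertex, the label of the electrode
        covering it ("soz" for seizure onset zone, any other string for a
        different label, e.g. the hypothetical "early"/"late"), or None if no
        electrode covers the vertex.
    Returns (i, j) pairs meaning vertex i should rank above vertex j.
    """
    labels = list(vertex_electrode_label)
    soz = [i for i, lab in enumerate(labels) if lab == "soz"]
    other = [j for j, lab in enumerate(labels) if lab is not None and lab != "soz"]
    return list(product(soz, other))

def pairwise_hinge_loss(scores, pairs, margin=1.0):
    """Mean hinge loss; zero when each i outscores its j by at least `margin`."""
    if not pairs:
        return 0.0
    i, j = np.array(pairs).T
    return float(np.mean(np.maximum(0.0, margin - (scores[i] - scores[j]))))

# Toy usage: vertices 0 and 3 lie under SOZ electrodes; vertex 2 is uncovered.
labels = ["soz", "early", None, "soz", "late"]
pairs = soz_pairs(labels)
loss = pairwise_hinge_loss(np.array([2.0, 0.5, 0.0, 1.0, 0.0]), pairs)
```

Here only the pair (3, 1) violates the margin (1.0 - 0.5 = 0.5 < 1), giving a mean loss of 0.125 over the four pairs.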
To model inter-patient variability, both in brain morphology and in iEEG-based
electrode ranking, we treat each patient as a separate learning task and learn a
joint classifier using the multitask learning framework [22]. To this end, we
use the patient’s MRI to isolate the resected region (positive instances) and
extract the same region from an age- and gender-matched healthy control
(negative instances). The positive labels provided by the resection zones are
augmented with the ranking information provided by iEEG. To use the ranking
information as an auxiliary source of supervision, we extend the regularized
multitask learning framework [30, 31] to learn a common classifier across the
training subjects. This classifier can then be used to detect FCD lesions in new
patients, i.e., those who have not yet undergone iEEG electrode placement. We
evaluate the proposed technique on a dataset comprising the individual resection
zones of patients and the corresponding cortical regions from matched controls.
Using this combined supervision, our multitask learning approach detects
abnormal regions within the resection zones of all fifteen MRI-negative patients
in the dataset, albeit with a higher false positive rate than other supervised
learning methods. By comparison, our vertex-based method (cf. Section 1.2)
correctly detected lesions in 60% of patients, and another recently reported
supervised approach [42] achieved a detection rate of 73%. Chapter 4 provides
the technical details of this approach along with experimental results.
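A common form of regularized multitask learning writes each task’s weight vector as a shared component plus a task-specific offset, w_t = w0 + v_t, and penalizes both, so that a large penalty on v_t pulls all patients toward a common classifier. The sketch below uses that form with a logistic loss and plain gradient descent, treating each patient as one task. It omits the iEEG ranking term, and the loss, penalty weights, learning rate and function name are assumptions rather than the formulation of Section 4.2.

```python
import numpy as np

def fit_regularized_mtl(tasks, lam_shared=0.1, lam_task=1.0, lr=0.1, n_iter=500):
    """Regularized multitask logistic regression (illustrative sketch).

    tasks: list of (X_t, y_t) pairs, one per patient; X_t is (n_t, d) and
        y_t holds labels in {0, 1}. Bias terms are omitted for brevity.
    Each task uses w_t = w0 + v_t, with penalties lam_shared*||w0||^2 and
    lam_task*||v_t||^2. Returns (w0, [v_1, ..., v_T]).
    """
    d = tasks[0][0].shape[1]
    w0 = np.zeros(d)
    vs = [np.zeros(d) for _ in tasks]
    for _ in range(n_iter):
        g0 = 2.0 * lam_shared * w0
        gvs = []
        for (X, y), v in zip(tasks, vs):
            p = 1.0 / (1.0 + np.exp(-(X @ (w0 + v))))  # predicted P(lesional)
            g = X.T @ (p - y) / len(y)                 # mean logistic-loss gradient
            g0 = g0 + g                                # shared part sees every task
            gvs.append(g + 2.0 * lam_task * v)
        w0 = w0 - lr * g0
        vs = [v - lr * gv for v, gv in zip(vs, gvs)]
    return w0, vs

# Toy usage: two "patients" whose labels are mostly driven by feature 0.
X1 = np.array([[1.0, 0.0], [2.0, 0.5], [-1.0, 0.2], [-2.0, -0.3]])
X2 = np.array([[1.5, 1.0], [0.8, -1.0], [-0.7, 0.5], [-1.2, -0.5]])
y = np.array([1.0, 1.0, 0.0, 0.0])
w0, vs = fit_regularized_mtl([(X1, y), (X2, y)])
```

For a new patient with no iEEG data there is no task-specific offset to learn, so a natural choice in this sketch is to score vertices with the shared vector w0 alone.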
1.4 Identifying Lesions As Outliers
Our third method for lesion detection overcomes the effects of label noise by
formulating FCD lesion detection as an outlier detection problem. To this end,
we define a cortical lesion as a region that would be considered an outlier when
compared to the same region across a control cohort. This approach bypasses the
use of noisy vertex labels to train a classifier.
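A minimal sketch of scoring one cortical region as an outlier relative to the same region in a control cohort, assuming registered surfaces so that a region mask selects the same location in every brain. Summarizing the region by its mean feature value and scoring with an absolute z-score against the control means are assumptions for illustration; the outlier score used in Chapter 5 may differ.

```python
import numpy as np

def region_outlier_score(patient_feat, control_feats, region):
    """Outlier score of one cortical region relative to a control cohort.

    patient_feat:  (n_vertices,) feature on the registered surface.
    control_feats: (n_controls, n_vertices) the same feature for each control.
    region:        boolean (n_vertices,) mask selecting the sub-region.
    Returns |z| of the patient's regional mean against the control means.
    """
    p = patient_feat[region].mean()
    c = control_feats[:, region].mean(axis=1)  # one summary value per control
    sd = c.std(ddof=1)
    return abs(p - c.mean()) / (sd if sd > 0 else 1.0)

# Toy usage: the region covers vertices 0 and 1.
controls = np.array([[2.4, 2.4, 9.0],
                     [2.5, 2.5, 9.0],
                     [2.6, 2.6, 9.0]])
patient = np.array([3.0, 3.0, 0.0])
score = region_outlier_score(patient, controls, np.array([True, True, False]))
```

No lesion labels are used anywhere: the only supervision is the knowledge that the controls are healthy.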
Unlike other neurological disorders that affect a particular region of the
cortex, such as autism [71] and schizophrenia [79], FCD lesions can occur
anywhere in the cortex and vary in size. To minimize the chances of missing
subtle lesions on the cortical surface, we model lesion detection as a
multi-scale salient object detection problem using hierarchical conditional
random fields (HCRF) [78, 75]. In our case, the saliency of an object is defined
by its degree of “outlier-ness”.
We use image segmentation to isolate sub-regions of the cortex that have
similar morphological properties. Instead of segmenting the image at a single
scale, we segment it at different scales to obtain sub-regions of varying size.
Each sub-region is given an outlier score by comparing it to the same region
extracted from the control population. Finally, these outlier scores are
combined across the different scales using a tree-structured conditional random
field, and the final outlier scores are thresholded to obtain the detected
lesion(s).
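As a rough illustration of how a tree-structured CRF can combine scores across scales, the sketch below computes exact node marginals with sum-product message passing on a tree in which each node is a region at some scale and each edge links a region to the coarser region that contains it. Labels are binary (0 normal, 1 lesional). The mapping from outlier scores to unary potentials, the Potts-style parent-child coupling and all parameter values are assumptions; this is not the HCRF construction of Section 5.2.2.

```python
import numpy as np

def unary_from_scores(scores, tau=2.0, alpha=1.0):
    """Assumed mapping: label 1 (lesional) is favored when the score exceeds tau."""
    s = np.asarray(scores, dtype=float)
    return np.stack([np.ones_like(s), np.exp(alpha * (s - tau))], axis=1)

def tree_crf_marginals(parent, unary, beta=1.0):
    """Exact node marginals for a binary tree-structured CRF (sum-product).

    parent: parent[n] is the index of node n's parent; node 0 is the root
        (parent -1) and nodes are ordered so that parent[n] < n.
    unary:  (n_nodes, 2) positive potentials for labels 0 and 1.
    beta:   strength of the coupling exp(beta * [x_parent == x_child]).
    Returns an (n_nodes, 2) array of marginal probabilities.
    """
    n = len(parent)
    pair = np.array([[np.exp(beta), 1.0], [1.0, np.exp(beta)]])
    up = np.ones((n, 2))      # product of messages each node gets from children
    msg_up = np.ones((n, 2))  # message each node sends to its parent
    for c in range(n - 1, 0, -1):          # children have larger indices
        m = pair @ (unary[c] * up[c])      # sum over x_c of psi * belief(x_c)
        msg_up[c] = m / m.sum()
        up[parent[c]] *= msg_up[c]
    down = np.ones((n, 2))    # message each node gets from its parent
    for c in range(1, n):
        p = parent[c]
        b = unary[p] * down[p] * up[p] / msg_up[c]  # exclude c's own message
        m = pair.T @ b
        down[c] = m / m.sum()
    marg = unary * up * down
    return marg / marg.sum(axis=1, keepdims=True)

# Toy usage: a coarse region (node 0) split into two finer regions (1 and 2).
marg = tree_crf_marginals([-1, 0, 0], unary_from_scores([2.5, 3.5, 0.5]))
```

Thresholding the lesional column of `marg` at the finest scale would give a detection map; the coupling lets a strongly outlying coarse region raise the belief in its children, and vice versa.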
HCRFs have previously been used for object detection and semantic image
labeling, where they require accurate pixel-level labels. The accuracy of HCRFs
in these domains is highly sensitive to label noise, and in most cases the
pixel-level labels need to be refined manually to obtain accurate results. In
our formulation, we extend the HCRF framework to binary object
detection/segmentation when only image captions are available; in our case, the
image captions indicate whether a brain is healthy or diseased. A caveat to this
contribution is that the images must be accurately registered so that a
one-to-one correspondence can be made between sub-regions.
The HCRF-based outlier detection scheme achieved a detection rate of 75% on
twenty MRI-negative patients, compared with 55% for our vertex-based scheme [1]
and 60% for another baseline approach. For MRI-positive patients, the HCRF-based
method achieved a detection rate of 92%, compared with 85% for the baseline. The
HCRF-based method also achieved higher recall and precision than the baselines
for both MRI-positive and MRI-negative patients. Chapter 5 provides the
technical details of the HCRF-based outlier detection scheme for FCD lesion
detection and detailed experimental results, including a comparison of its
performance with that of an expert neuroradiologist.
9
1.5 Thesis Contributions
The main focus of the research presented in this thesis is the development and
evaluation of automated methods for detecting cortical malformations in MRI-
negative epilepsy patients. As a first step we identify the main confounding
factors that result when the training data consists of MRI-negative patients.
Next, we develop two novel machine learning methods: a regularized multi-
task learning (MTL) method with auxiliary supervision from iEEG analysis
and hierarchical conditional random fields (HCRF) for outlier detection. In
separate evaluations, both methods outperformed recently reported methods
in the lesion detection literature.
Given that experienced neuroradiologists were unable to visually
locate the lesion in MRI-negative patients, the high detection rate of the
proposed methods shows that some electrophysiologically and histopatholog-
ically abnormal cortical regions are not visually apparent
but can be detected with the aid of machine learning methods. Furthermore,
incorporating automated lesion detection methods in the pre-surgical evalua-
tion protocol can enhance the chances of detecting the lesion prior to surgery,
leading to more patients being referred to resective surgery.
1.6 Roadmap
The rest of the thesis is organized as follows. Chapter 2 provides a brief intro-
duction to surface-based morphometry and a review of different approaches to
lesion detection that utilize surface-based morphometry. In Chapter 3, we identify
the confounding factors present in this data and develop an initial supervised
vertex-based lesion detection method that addresses them. In Chapter 4, we
develop methods of regularized multitask learning with auxiliary supervision,
which incorporate iEEG data to mitigate the effects of label noise in supervised
learning. Chapter 5 describes and evaluates hierarchical conditional random
fields (HCRF) for outlier detection and their application to detecting FCD
lesions. Chapter 6 discusses future avenues of research in this domain and
presents our concluding remarks.
Chapter 2
Focal Cortical Dysplasia and
Surface-Based Morphometry
“If the human brain were so simple that we could understand it, we would be so simple that we couldn't.”
Emerson M. Pugh
Epilepsy affects around 50 in 100,000 people every year, and a third of them
have medically intractable seizures, that is, seizures that cannot be controlled
through medication [55]. Treatment-resistant epilepsy (TRE), also known as
drug-resistant epilepsy, carries the risks of premature death, seizure-related
injuries, social isolation and an overall low quality of life [56]. For TRE
patients, surgical resection of the affected cortical region is the only treatment
and usually their last hope for leading a normal, seizure-free life. Focal cortical
dysplasia (FCD), a malformation of cortical development (MCD), is the most
common epileptogenic lesion in children and the third most common in adults
with TRE [45, 92].
2.1 Focal Cortical Dysplasia
Focal cortical dysplasia (FCD) represents a group of structural disorders
resulting from malformations of cortical development (MCD). MCDs are
structural and metabolic abnormalities of the brain that arise during
gestation. About 25% of all reported cases of epilepsy are caused by MCD [110].
Among these cases, FCD is the most prevalent etiology, accounting for 45% of
them [110, 76].
FCD is classified into three subtypes [18, 7]:
1. FCD Type I: results from abnormal neuronal migration.
2. FCD Type II: results from abnormal neural proliferation.
3. FCD Type III: lesions accompanied by hippocampal sclerosis or
tumors.
Surgical resection of the dysplastic brain tissue is the only treatment for
FCD-based TRE patients, and a successful outcome results in complete seizure
freedom for the patient. The success of the surgical procedure rests on the iden-
tification and delineation of the full FCD lesion during pre-surgical evaluation,
which currently involves an expert visual inspection of the patient’s MRI. The
chances of a successful surgical outcome in the presence of a visually detected
lesion are 66%, compared to only 29% when the lesion is not detected during
pre-surgical MRI evaluation [57, 99]. Recent advances in neuroimaging
technology, especially MRI, have revolutionized the detection and evaluation of
structural lesions associated with FCD, which in turn has led to higher success
rates for resective surgery [95]. However, approximately 45% of histologically
confirmed FCD lesions go undetected during visual inspection [110].
A successful surgical outcome depends on the complete removal of the
FCD lesion detected on the patient’s pre-surgical MRI [92]. In some cases,
even with a visually identified FCD lesion, surgery is not feasible because the
lesion overlaps with the eloquent cortex, that is, the cortical regions that are
mainly responsible for sensory, linguistic and motor processing. Hence, before
the target for resection is identified, all patients undergo an invasive
intracranial EEG (iEEG) exam to accurately identify the extent of the lesion
and to map the eloquent cortex. In this exam, subdural electrodes are implanted
on the cortical surface to record electrical activity [112]. A board of certified
epileptologists reviews this information to determine the region that is
responsible for generating the seizures, that is, the seizure onset zone.
iEEG has been shown to be effective for localizing FCD lesions [65]. However,
for MRI-negative patients there is no visible lesion to guide precise electrode
implantation, which results in sampling errors. In such cases the identified
abnormal region fails to capture the lesion in its entirety in about 40% of the
cases, leading to poor surgical outcomes [99]. Therefore, patients who lack an
MRI-visible lesion are less likely to be referred to a specialized epilepsy center
by neurologists [38] and many epilepsy specialists are reluctant to operate
without a well-defined lesion. For these reasons, resective surgery remains
underutilized, despite a growing number of studies demonstrating that surgery
is effective for patients with focal TRE [9].
The relative inability to locate subtle FCD lesions on structural MRI scans
has led to the development of mathematical and computational models of the
brain’s morphology, such as its shape, folding patterns and tissue character-
istics, derived from structural MRI. These models facilitate comparisons of
cortical structures among different brains, and help in quantifying disease and
variability patterns. The models and the resulting algorithms are collectively
known as morphometry. A number of different morphometric algorithms exist,
such as voxel-based morphometry [4], sulcal morphometry [80, 44] and
surface-based morphometry [24]. Next, we describe surface-based morphometry and
the imaging biomarkers that are used by neuroradiologists to identify FCD lesions.
2.2 Surface-Based Morphometry
The cortical surface is the outer layer of the brain, modeled as a folded
two-dimensional surface in three-dimensional space. Even with optimized image
acquisition, identifying and delineating FCD lesions is highly dependent on
reviewer expertise. The rate of FCD lesion detection by non-expert and expert
neuroradiologists ranges from 39% to 50% [107]. Similarly, certain image
biomarkers of FCD, such as subtle abnormalities in cortical curvature and
sulcal/gyral patterns, may not be easily identifiable on planar MRI slices [93, 8].
In such cases, computational models of the cortex derived from structural MRI
have been shown to increase the sensitivity of locating FCD lesions [14, 44, 96].
Surface-based morphometry (SBM) is one such methodology. It provides the
means to characterize and analyze the human brain by explicitly modeling the
cortex with a suitable geometric model derived from structural MRI scans [24].
Modeling the brain using explicit surface models has the advantages of
sub-millimeter accuracy in measuring morphological features [33], more
precise registration [37, 50] and high sensitivity in identifying differences
in morphological features [59]. SBM has been used successfully for analyzing
and detecting abnormalities in various neurological disorders such as
schizophrenia [79], autism [71], and epilepsy [96, 43].
2.2.1 Surface Reconstruction
Structural T1-weighted MRI scans are used to extract the cortical surface by
delineating the boundary between the gray and white matter [24]. This process
is referred to as surface reconstruction [24], and involves: (i) segmentation of
the white matter, (ii) tessellation of the gray/white matter (GWM) boundary,
(iii) inflation of the folded surface, and (iv) correction of topological defects.
Once the surface is reconstructed, it is further refined by classifying all white
matter voxels in the MRI volume to locate the GWM boundary, which is
delineated to sub-millimeter accuracy by further refining the white matter
surface. After refining the gray/white matter boundary, the pial surface is
located by deforming the surface outward [35]. The reconstructed surface is
represented as a triangulated mesh, and different morphological features can be
estimated at each vertex to characterize the cortex. Note that the spatial
resolution of the reconstructed surface differs from that of the original MRI
volume. Figure 2.1 shows the results of surface reconstruction on a subject’s MRI
along with the resulting surface models.

Figure 2.1: (Top): Results of automatic segmentation and classification of white matter voxels on an MRI volume to locate the gray/white matter boundary (yellow) and the pial surface boundary (red). (Bottom): Three different surface models obtained from the surface reconstruction phase.
Figure 2.2: Inter-subject registration using surface-based morphometry. (a): Mapping the curvature values from the pial surface to a sphere. (b): Aligning the spheres to a group average sphere by matching the curvature on a vertex-by-vertex basis. (c): Transforming the aligned sphere back to a surface model.
2.2.2 Registration
The reconstructed surface is closed at the brain stem and can therefore be
regarded topologically as a sphere [35]. Different morphological transforms can
be applied to register the cortical surface to a standard surface, also known as a
group atlas. Registration is achieved by aligning specific sulcal and gyral patterns
across the reconstructed cortical surfaces while minimizing metric distortion.
Figure 2.2 shows the different steps involved in the registration process. The
use of structural landmarks to guide the registration process results in a more
accurate alignment among different brains [50], which in turn allows more
precise comparisons of individual cortical structures across subjects [36].
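Once a subject's sphere has been aligned to the atlas, per-vertex features can be carried over to the atlas vertices, which gives every subject the same set of vertices for comparison. The sketch below is a hypothetical illustration of that transfer only: it assumes the aligned sphere is already available from the registration step and uses a brute-force nearest-neighbour lookup by angle; a real pipeline may use a more careful interpolation scheme, and the function name is our own.

```python
import numpy as np


def resample_to_atlas(subject_sphere, subject_values, atlas_sphere):
    """Nearest-neighbour transfer of per-vertex values onto atlas vertices.

    subject_sphere: (Vs, 3) coordinates of the subject's sphere after alignment
    subject_values: (Vs,) a per-vertex feature such as cortical thickness
    atlas_sphere:   (Va, 3) coordinates of the atlas (group average) sphere
    Returns (Va,) values, one per atlas vertex.
    """
    # Project both meshes onto the unit sphere so that a dot product is the cosine
    # of the angle between two points (a larger cosine means a closer point).
    s = subject_sphere / np.linalg.norm(subject_sphere, axis=1, keepdims=True)
    a = atlas_sphere / np.linalg.norm(atlas_sphere, axis=1, keepdims=True)
    nearest = np.argmax(a @ s.T, axis=1)  # brute force: builds a (Va, Vs) matrix
    return subject_values[nearest]


# Toy example: six axis points; the atlas lists them in another order and radius.
subject = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                    [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
values = np.arange(6.0)
order = [2, 0, 1, 3, 5, 4]
atlas = 100.0 * subject[order]
print(resample_to_atlas(subject, values, atlas))
```

Normalizing to the unit sphere makes the lookup independent of the sphere radius, which is why the atlas in the example can be scaled arbitrarily.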
2.2.3 Morphological Features
In this work, we use five morphological features to characterize the cortex:
Figure 2.3: Summary of the five morphometric features estimated at each cortical vertex. A. The pial surface (blue) and the white matter surface (pink) on the underlying MRI, B. cortical thickness, C. gray-white contrast, D. mean curvature, E. sulcal depth/gyral height, and F. Jacobian distortion.
1. Cortical thickness is defined as the distance between the gray/white matter
boundary and the outermost surface of the gray matter (pial surface). It is
calculated at each vertex as the average of two measurements [33]: (a) the shortest
distance from the white matter surface to the pial surface; and (b) the
shortest distance from the pial surface at each point to the white matter
surface.
2. Gray/white-matter contrast (GWC) represents the degree of blurring at
the gray/white-matter boundary. GWC is estimated by calculating the
non-normalized T1 image intensity contrast at 0.5 mm above and below
the gray/white boundary with trilinear interpolation of the images. The
range of GWC values lies in [−1, 0], with values near zero indicating a
higher degree of blurring of the gray/white boundary.
3. Curvature is measured as 1/r, where r is the radius of the inscribed
(osculating) circle, and mean curvature is the average of the two principal
curvatures, with units of 1/mm [74]. Mean curvature quantifies the sharpness of
cortical folding at the gyral crown or within the sulcus, and can be used
to assess small secondary and tertiary folds in the cortical
surface.
4. Sulcal depth characterizes the folded structure of the cortex. It is esti-
mated by calculating the dot product of the movement vectors with the
surface normal [35], and results in the calculation of the depth/height of
each point above the average surface. The values of sulcal depth lie in
the range [−2, 2] with lower values indicating a location in the sulcus
whereas higher values indicate a location on the gyral crown.
5. Jacobian distortion measures the distortion at each vertex during regis-
tration. In the registration process, as defined above, each subject’s gyral
and sulcal features are aligned by warping the entire brain to a spheri-
cal average surface (i.e., the standard brain). During this process, each
vertex is subjected to a nonlinear spherical transform. Jacobian distor-
tion measures the magnitude of the nonlinear transform at each vertex
needed to warp each vertex on the subject’s brain to a target vertex on
the average surface [36]. It is a measure of global brain deformation and
has been used at the vertex level for the detection of abnormal cortical
regions in autism [28].
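As a worked illustration of item 3 (our own numbers, not measurements from this study), mean curvature averages two principal curvatures, each the reciprocal of a radius. Signs are omitted here because they depend on the orientation convention of the surface.

```latex
\begin{aligned}
H &= \frac{\kappa_1 + \kappa_2}{2}, \qquad \kappa_i = \frac{1}{r_i} \\
\text{sphere, } r = 5\,\text{mm}: \quad H &= \frac{0.2 + 0.2}{2} = 0.2\,\text{mm}^{-1} \\
\text{cylinder, } r = 5\,\text{mm}: \quad H &= \frac{0.2 + 0}{2} = 0.1\,\text{mm}^{-1}
\end{aligned}
```

A tighter fold (smaller r) therefore yields a larger mean curvature, and a surface curved in only one direction, such as an idealized elongated gyral crown, has half the mean curvature of a sphere of the same radius.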
Figure 2.3 illustrates the estimation of the morphological features using SBM.
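The definitions of items 1, 2 and 5 can be sketched in Python/NumPy. This is a simplified illustration under explicit assumptions, not the SBM software used in this work: white[i] and pial[i] are taken to be the same vertex on the two surfaces; point-to-surface distances are approximated by distances to the nearest mesh vertex; intensities are sampled in voxel coordinates, so the 0.5 offset equals 0.5 mm only for 1 mm isotropic voxels; GWC is assumed to be (gray - white)/(gray + white), which falls in [−1, 0] whenever white matter is at least as bright as gray matter; and Jacobian distortion is approximated by the ratio of per-vertex areas after and before the warp. All function names are hypothetical.

```python
import numpy as np


def nearest_distance(src, dst):
    """Distance from each point in src (N, 3) to its closest vertex in dst (M, 3)."""
    diffs = src[:, None, :] - dst[None, :, :]
    return np.linalg.norm(diffs, axis=-1).min(axis=1)


def cortical_thickness(white, pial):
    """Item 1: average of white->pial and pial->white shortest distances.

    Assumes white[i] and pial[i] are the same vertex on the two surfaces and
    approximates point-to-surface distance by point-to-nearest-vertex distance.
    """
    return 0.5 * (nearest_distance(white, pial) + nearest_distance(pial, white))


def trilinear(volume, points):
    """Trilinear interpolation of a 3-D volume at (N, 3) voxel coordinates."""
    base = np.clip(np.floor(points).astype(int), 0, np.array(volume.shape) - 2)
    frac = points - base
    out = np.zeros(len(points))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                weight = ((frac[:, 0] if dx else 1 - frac[:, 0])
                          * (frac[:, 1] if dy else 1 - frac[:, 1])
                          * (frac[:, 2] if dz else 1 - frac[:, 2]))
                out += weight * volume[base[:, 0] + dx, base[:, 1] + dy, base[:, 2] + dz]
    return out


def gray_white_contrast(volume, boundary, normals, offset=0.5):
    """Item 2 (assumed formula): (gray - white) / (gray + white)."""
    gray = trilinear(volume, boundary + offset * normals)   # outward, into gray matter
    white = trilinear(volume, boundary - offset * normals)  # inward, into white matter
    return (gray - white) / (gray + white)


def vertex_areas(verts, faces):
    """Give each vertex one third of the area of every triangle it belongs to."""
    v0, v1, v2 = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    tri_area = 0.5 * np.linalg.norm(np.cross(v1 - v0, v2 - v0), axis=1)
    areas = np.zeros(len(verts))
    for corner in range(3):
        np.add.at(areas, faces[:, corner], tri_area / 3.0)
    return areas


def jacobian_distortion(verts_before, verts_after, faces):
    """Item 5 (assumed proxy): local area ratio after / before the warp."""
    return vertex_areas(verts_after, faces) / vertex_areas(verts_before, faces)
```

For example, a flat 3×3 grid of white-surface vertices with the pial surface shifted 2.5 units along its normal gives a thickness of 2.5 at every vertex, and uniformly scaling a mesh by 2 gives a Jacobian distortion of 4 everywhere.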
2.3 Radiological Features of FCD Lesions
Typical MRI features of FCD include cortical thickening or thinning, blurring
of the gray-white matter boundary, increased signal intensities on
fluid-attenuated inversion recovery (FLAIR) and/or T2-weighted images, a
transmantle stripe of T2 hyperintensity, and localized brain atrophy [67]. Below
we describe the efficacy of each of the previously mentioned morphological
features in identifying FCD lesions from a diagnostic imaging perspective.
Cortical Thickness: Thickening of the cortex is reported in 50-92% of FCD
cases [92, 10]. Cortical thickening results from the presence of balloon cells
(FCD type II) and is usually found in conjunction with blurring of the GWM
boundary. It has been reported as the most sensitive feature for automated
methods of detecting FCD lesions, especially in type-II patients [96, 43, 2].
GW Contrast: Blurring of the GWM boundary is another common finding
in MRI-positive patients, reported in 60-80% of FCD cases [92]. High levels
of blurring are observed mostly in FCD type-II patients due to the presence
of immature balloon cells and neuronal hypertrophy [97]. Cortical thickening
combined with blurring of the GWM boundary was found in approximately
64% of FCD type-II patients [64].
Sulcal depth and curvature: Subtle changes in sulcal depth and curvature are
difficult to observe and assess on planar MRI slices [8]. However, FCD le-
sions have been associated with varying degrees of sulcal and curvature based
anomalies [8]. Hong et al. [43] found sulcal depth to be helpful in identifying
FCD type-II lesions; however, in the same study sulcal depth was also
responsible for generating the most extra-lesional clusters (detections deemed
false positives based on expert-marked lesions).
Overall, 45% of histologically confirmed FCD lesions go undetected during
visual inspection of the MRI [110], which, among other factors, can be
attributed to the anatomical complexity of the folded structure of the cortex.
For example, about 80% of FCD lesions located deep within the sulcus cannot
be detected through visual inspection [12]. Similarly, 87% of FCD type-I cases
and 33% of FCD type-II cases have been reported as having normal MRI
(MRI-negative) [94, 53]. This makes FCD the most common histopathological
finding in focal epilepsy patients with no visible lesion.
2.4 Computational Methods for Detecting FCD
Using Surface-Based Morphometry
In this section we review existing computational methods of FCD lesion
detection that specifically use surface-based morphometry, and then discuss
their critical limitations. For methods that do not use SBM, please see the
recent and comprehensive surveys in Bernasconi et al. [11], Kini et al. [49],
and Duncan et al. [27].
Besson et al. [14] use a combination of surface- and texture-based features
to represent each vertex on the surface: cortical thickness, curvature
and sulcal depth, along with gray-white contrast and T1 signal hyperintensity.
A four-layer neural network was trained to detect abnormal vertices using
leave-one-subject-out cross-validation. The dataset consisted of nineteen
MRI-positive patients who had “small” FCD lesions. The neural network based
classifier was able to detect abnormal regions within the expert-marked lesions
of 95% of patients. A second, fuzzy k-nearest neighbors classifier was used to
further refine the results and reduce the false positive rate. For this purpose,
each detected cluster was represented by the mean and standard deviation of
the individual features. The final detection rate after post-processing by the
second-level classifier was 68%.
Hong et al. [43] developed a two-stage Fisher linear discriminant analysis
(LDA) [16] classifier to detect FCD type-II lesions in patients who were
radiologically classified as MRI-negative during their pre-surgical assessment.
However, the lesions were identified on the pre-surgical MRI scans after surgery
and were traced manually by an expert using texture-based maps. Therefore,
as far as the learning algorithm is concerned, the patients were MRI-positive. Each
vertex was represented using cortical thickness, sulcal depth, curvature, gray-
white contrast and relative intensity from the T1-weighted MRI volume. A
leave-one-subject-out evaluation strategy was used to assess the performance
of the lesion detection scheme. As a first step, a vertex-level LDA classifier
was used to classify each vertex on the reconstructed cortical surface as being
lesional or non-lesional for both controls and patients. These detections were
then further refined using a second LDA classifier that was trained to discrim-
inate between actual FCD lesions (detections made inside the manually traced
resection zones of patients) and spurious lesional detections made on controls.
For secondary classification, each cluster was represented by the mean and
standard deviation of the original individual features. The proposed scheme
was able to detect abnormal regions that co-localized with the expert-marked
lesions in 14/19 (74%) patients.
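Both Besson et al. [14] and Hong et al. [43] describe each detected cluster by the mean and standard deviation of its features before applying a second-stage classifier. A minimal sketch of that summarization step follows. It assumes the first-stage detections have already been grouped into clusters (for example, as connected components on the mesh, not shown) and leaves out the second-stage classifier itself; the function name is hypothetical.

```python
import numpy as np


def cluster_descriptors(features, cluster_ids):
    """Summarize each detected cluster by per-feature mean and standard deviation.

    features:    (V, F) per-vertex features (e.g. thickness, GWC, curvature)
    cluster_ids: (V,) integer cluster id per vertex, -1 for undetected vertices
    Returns (ids, descriptors), where descriptors has shape (K, 2F).
    """
    ids = np.unique(cluster_ids[cluster_ids >= 0])
    if ids.size == 0:
        return ids, np.empty((0, 2 * features.shape[1]))
    rows = []
    for c in ids:
        members = features[cluster_ids == c]
        rows.append(np.concatenate([members.mean(axis=0), members.std(axis=0)]))
    return ids, np.vstack(rows)
```

Each row of the result is a fixed-length vector regardless of cluster size, which is what lets a standard classifier operate at the cluster level.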
Thesen et al. [96] used a semi-supervised univariate z-score-based thresholding
approach on registered SBM data of MRI-positive patients to classify
each vertex as lesional or normal, using cortical thickness, GWC, curvature,
sulcal depth and Jacobian distortion individually. The dataset consisted
of eleven MRI-positive patients, five of whom had FCD as the primary
indication. They nominate cortical thickness along with GWC as the most
informative features for FCD lesion detection in MRI-positive patients. By
combining results from cortical thickness and GWC, the lesion was correctly
detected in ten out of the eleven patients.
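A univariate z-score approach of this kind can be sketched as vertex-wise standardization against registered controls followed by a threshold on |z|. The cut-off k = 2 and the function names are hypothetical, and this is not a reproduction of the implementation in [96].

```python
import numpy as np


def vertex_zscores(patient, controls):
    """Standardize a patient's feature at every vertex against registered controls.

    patient:  (V,) feature values on atlas vertices
    controls: (C, V) the same feature for C controls
    """
    mu = controls.mean(axis=0)
    sd = controls.std(axis=0, ddof=1)
    sd = np.where(sd > 0, sd, np.inf)  # vertices with no control variability get z = 0
    return (patient - mu) / sd


def zscore_detections(patient, controls, k=2.0):
    """Flag vertices whose |z| exceeds a hypothetical cut-off k."""
    return np.abs(vertex_zscores(patient, controls)) > k
```

The source does not state how the thickness and GWC results were combined; one simple assumption is a union, e.g. `zscore_detections(thick, thick_ctrl) | zscore_detections(gwc, gwc_ctrl)`, where the variable names are hypothetical.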
Most of the techniques mentioned above deal either with MRI-positive
patients [14, 96] or patients who were initially deemed MRI-negative during
their preliminary radiological screening, but whose lesions were later found to be
visible on MRI [43]. In contrast to these studies, our data includes strictly
MRI-negative patients whose lesions are not visible on their MRI, but whose resected
tissues have been histologically verified to contain FCD.
The goal of resective surgery is to remove the entire lesion. If any part of
the lesion is left behind, the outcome will not be successful. This introduces
label noise, because the expert-marked lesion can contain normal vertices; the
margin around the lesion is marked in a “generous” manner so as to increase
the chances of capturing the entire lesion. Chapter 3 provides empirical evi-
dence that label noise needs to be mitigated particularly when MRI-negative
patients are part of the training data. A possible way to eliminate label noise
would be to train exclusively on MRI-positive patients. However, the features
that characterize the lesion in MRI-negative vs MRI-positive patients may not
be concordant. For example, in FCD type-I (a high proportion of MRI-negative
patients have type-I lesions [49]) the abnormal features such as sulcal-depth
and curvature are hard to interpret on planar MRI slices [93, 8]. Therefore,
training exclusively on MRI-positive patients will limit the classifier’s
detection ability. In Chapter 4 we take a different approach to eliminating label
noise and augment the weak labels provided by the marked resection with the
results of iEEG evaluation. In Chapter 5 we develop an outlier detection method
based on hierarchical conditional random fields (HCRF) that overcomes label
noise by posing FCD lesion detection as an outlier detection problem, and does
not use the resected regions as ground truth for training.
Most lesion detection methods cited previously typically employ a
post-processing method to reduce the false positive rate. In this strategy a portion
of the vertices labeled lesional by the classifier are relabeled as normal. This
can be done by training a second-level classifier to classify the detected clusters
as lesional or non-lesional [14, 43]. Similarly, different heuristics can also be
used such as the surface area of the detected clusters [96]. Discarding any
detected region based on its size or surface area can result in discarding the
actual lesion or part of the lesion, because FCD can be located in any part of
the cortex, is highly variable in size, and may occur in multiple lobes [18]. In
Chapter 5 we develop a ranking methodology which ranks the detected clusters
based on a combination of their surface area and degree of abnormality. This
strategy avoids discarding any findings and instead provides the
radiologist with multiple findings, which can be assessed visually or using iEEG.
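The ranking idea can be sketched as follows. The text does not specify how surface area and degree of abnormality are combined, so the product used here is a hypothetical placeholder, as is the function name; the point of the sketch is only that every cluster is kept and reordered rather than discarded.

```python
import numpy as np


def rank_clusters(areas, abnormality):
    """Order clusters from most to least suspicious without discarding any.

    areas:       (K,) surface area of each detected cluster (e.g. in mm^2)
    abnormality: (K,) degree of abnormality, e.g. mean |z| inside the cluster
    Hypothetical scoring rule: area * abnormality.
    """
    score = np.asarray(areas, dtype=float) * np.asarray(abnormality, dtype=float)
    return np.argsort(-score, kind="stable")  # indices, highest score first
```

With areas `[10, 100, 50]` and abnormality `[5, 1, 3]`, the scores are 50, 100 and 150, so the clusters are presented in the order 2, 1, 0; a small but highly abnormal cluster is still listed rather than removed.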
Chapter 3
A Vertex-Based Classifier
“The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.”
John W. Tukey
In this chapter we develop a lesion detection scheme to classify each
vertex on the cortical surface as “lesional” or “normal”. To this end, we use
labeled training data comprising healthy controls and histopathologically
verified MRI-positive and MRI-negative patients who have undergone resec-
tive surgery. The classifier developed here highlights the idiosyncrasies of this
data that directly impact the design of a lesion detection scheme. From a
supervised learning perspective there are three main challenges that must be
addressed to develop an effective classifier:
1. Class label noise arises due to the subjectivity involved in identifying
and delineating the lesions in both MRI-positive and the MRI-negative
patients, resulting in a significant number of false positives in the data (much
more so for MRI-negative patients than MRI-positive patients). Label
noise is further aggravated by the presence of false negatives among the
extra-lesional (outside the resected regions) vertices of patients. This happens
because dysplastic regions can develop due to a number of causes, such
as prolonged untreated epilepsy.
2. Anatomical complexity of the folded structure of the cortical surface
reduces the discernibility between dysplastic and normal tissue, and is
one of the main reasons why a large number of lesions remain elusive in
routine visual MRI evaluation [94].
3. Class imbalance results from the relatively low ratio of lesional vertices
to that of normal vertices for a particular patient, which is further com-
pounded by the higher availability of healthy control data as compared
to patient data.
This chapter explores the development of a vertex-based classifier that is de-
signed to explicitly address these issues.
3.1 Eliminating Label Noise
Label noise arises because the expert-marked lesion for MRI-positive patients,
and the resected tissue for MRI-negative patients can contain normal tissue
along with lesional tissue, causing normal tissue to be labeled as lesional. A
second source of false positives stems from the goal of resective surgery, which is
to remove the lesion in its entirety. Incomplete removal of the lesion can lower
the chances of a patient being seizure-free after resective surgery from 66% to
29% [92, 57]. This introduces false positives because the margins around the
lesion/resection area tend to be “generous”. The problem is more pronounced
in MRI-negative patients because in the absence of an MRI identified target,
abnormal vertices are delineated by the extent of the tissue removed in surgery.
The resected tissue may include a gradation from abnormal to normal tissue.
From a supervised ML perspective, treating all the resected vertices in the
case of MRI-negative patients as being lesional introduces false positives into
the training data, which can adversely affect classifier accuracy.
3.1.1 Removing False Positives
To ameliorate the impact of false positive label noise we pre-process the train-
ing data by manually reducing the lesion for both MRI-negative and MRI-
positive patients. The strategy is to eliminate those vertices from the lesional
regions that are not significantly different from the vertices outside the lesion.
In order to define the notion of “significance” we compare the distribution of
the normalized values of a morphological feature such as cortical thickness,
curvature, etc., for the vertices inside the labeled lesion/resection area to that
of the vertices outside the labeled lesion/resection area. Based on the assump-
tion that the lesion/resection area contains cortical structures characterized
by cortical malformation, we want to identify and select vertices within the
lesion/resection that are significantly different from the average feature values
outside the lesion/resection i.e., normal cortex.
Figure 3.1 shows an example of mask reduction when cortical thickness is
used to characterize the cortex for an MRI-positive patient. It can be seen that
the patient has abnormal thinning in the expert-marked lesion, therefore, we
would like to select the vertices from the lesion that are in the left tail of the
distribution, at the same time ensuring that the sampling region has mini-
mal overlap with the extra-lesional (outside the resection/lesion) distribution
(marked by τthin in Figure 3.1). As a patient can have both abnormally thick
and thin values, we calculate two thresholds for each subject, namely τthin and
τthick. These two thresholds can be seen as selecting only those vertices from
the marked lesion that can be regarded as outliers when compared to the
tissue outside the lesion.

Figure 3.1: Manually calculating τthin for an MRI-positive patient who has cortical thinning in his lesion area. The plot shows the densities P(X) of cortical thickness z-scores for normal and lesional vertices, with τthin marked. τthick is undefined for this patient because no vertices in the lesional area are significantly thicker than the vertices outside the lesion.
We chose to work with cortical thickness, which is one of the most
informative features for characterizing FCD lesions [96]. First, the thickness
measurements across all the vertices for a given subject are standardized (as
z-scores) using first-order statistics calculated at each vertex from the controls.
This vertex-based normalization is done after registering all the controls and
subjects to the average surface. In cases where the lesional thickness density has
heavier tails than the non-lesional density on both sides, we can calculate both
τthin and τthick. In patients where the structural abnormalities are characterized
only by cortical thinning or cortical thickening, as is the case in FCD type-I
or type-II respectively [18], only one of the left or right tails of the lesional
density would be heavier, in which case only one threshold can be calculated
and the other remains undefined. Similarly, for some patients we cannot find
appropriate thresholds, which occurs as a result of undetected abnormalities
in the non-lesional area due to factors other than epilepsy (e.g., head trauma,
long-term untreated epilepsy, etc.). If we are unable to detect any appropriate
threshold(s) then that particular subject contributes no vertices to the training
data.
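The lesion reduction rule can be expressed as a short sketch. It assumes the thickness values have already been converted to vertex-wise z-scores against the controls and that τthin and τthick were chosen beforehand (manually, as described above); a missing threshold is passed as None, and when both are missing the patient contributes no lesional vertices. The function name and the inclusive comparisons are illustrative assumptions. Separate masks are returned for thinning and thickening because separate thin/thick classifiers are trained.

```python
import numpy as np


def reduce_lesion(z, lesion_mask, tau_thin=None, tau_thick=None):
    """Keep only the marked-lesion vertices that are thickness outliers.

    z:           (V,) vertex-wise thickness z-scores (standardized against controls)
    lesion_mask: (V,) True inside the expert-marked lesion / resection
    tau_thin, tau_thick: pre-chosen thresholds; None means undefined for this patient
    Returns (thin_mask, thick_mask) of lesional training vertices.
    """
    z = np.asarray(z, dtype=float)
    lesion_mask = np.asarray(lesion_mask, dtype=bool)
    thin = np.zeros_like(lesion_mask)   # all False by default
    thick = np.zeros_like(lesion_mask)
    if tau_thin is not None:
        thin = lesion_mask & (z <= tau_thin)     # abnormally thin, left tail
    if tau_thick is not None:
        thick = lesion_mask & (z >= tau_thick)   # abnormally thick, right tail
    return thin, thick
```

Vertices outside the marked lesion are never selected, and the reduction is applied only when building training data, consistent with the text.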
The lesion reduction procedure is applied only to training data; for a test
subject the classifier is evaluated on all vertices and none of the vertices are left
out. This selective procedure of isolating vertices in the hope of eliminating
label noise is similar to the global and maximum difference approach taken in
[111] where the normalized gray matter difference image for a patient was used
to minimize the overlap between his/her lesional and non-lesional vertices.
3.1.2 Removing False Negatives
There are also false negatives in the non-lesional areas of patients, which can
arise because a lifetime seizure burden of a given patient can lead to cortical
abnormalities outside the seizure onset zone [63, 66] and/or the possibility of
additional non-epileptogenic dysplastic lesions [32]. In addition, patients who
are suffering from epilepsy due to developmental factors may have additional
lesions that are either not epileptogenic or have latent epileptogenicity [32].
Based on these considerations, we did not include non-lesional vertices from
the subjects as negative instances in our training data. Instead, all negative
instances were taken from a cohort of 62 healthy controls.
3.2 Reducing Cortical Complexity
The folding of the cortex varies across individuals and hinders the visibility of
subtle FCD lesions hidden deep within the folds. Recent studies have shown
that subtle FCD lesions tend to occur more frequently at the bottom of the
sulcus [41]. Similarly, different sulcal levels have different thickness and GWC
distributions [17], indicating that there are three distinct sub-populations of
vertices. Given these insights, we quantize the data into three non-overlapping
levels, where a sulcal depth in the range [−2, −1) represents the sulcus, (1, 2]
represents the gyrus, and vertices with a sulcal depth in [−1, 1] are labeled as
wall vertices.
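The three-level quantization can be written as a small helper. The integer codes and the function name are hypothetical, and depths are assumed to lie in [−2, 2] as stated in Chapter 2.

```python
import numpy as np

SULCUS, WALL, GYRUS = 0, 1, 2  # hypothetical integer codes for the three levels


def sulcal_level(depth):
    """Map sulcal depth to sulcus [-2, -1), wall [-1, 1] or gyrus (1, 2]."""
    depth = np.asarray(depth, dtype=float)
    return np.where(depth < -1, SULCUS, np.where(depth > 1, GYRUS, WALL))
```

Note that both boundary values −1 and 1 fall into the wall level, matching the closed interval [−1, 1].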
Using the above stratification technique, we calculate the two
thresholds τthin and τthick per sulcal level for eliminating false positives (cf.
Section 3.1.1). This results in a total of six distinct thresholds, which may
or may not exist for a particular patient as explained previously. We train
two separate classifiers for each sulcal level: one to detect cortical thickening
and one to detect cortical thinning. Although we use cortical thickness to
reduce the lesion/resection region, the classifiers use four features to represent
each vertex: cortical thickness, gray/white contrast, cortical curvature and
Jacobian distortion.
3.3 Overcoming Class Imbalance
There are far fewer lesional vertices than non-lesional vertices. If this imbalance is not addressed, it can lead to a classifier that labels every vertex as non-lesional, because doing so maximizes classification accuracy [47]. Recall that we obtain “non-lesional” vertices for our training data from the set of healthy controls. The number of available controls is higher than the number of patients who have undergone resective surgery, because only a few patients proceed to surgery when no visible lesion is found on their MRI. This results in class imbalance, where the normal (negative) instances considerably outnumber the positive (lesional) instances.
To counter the effects of class imbalance, we use bagging coupled with
under-sampling, which has been shown to work well both empirically and
Figure 3.2: Different steps involved in the (A) training and (B) test phase of the vertex-based classifier. Note that the lesion reduction step is applied only to the training patients. For a test subject we calculate two labels per vertex: one from each thick/thin classifier. The final label of the vertex is calculated as the maximum of both predicted labels.
theoretically for imbalanced datasets [108]. Each one of our six classifiers is
replaced by a bag of ten classifiers. Within a bag, each classifier is trained on
all the lesional vertices and an equal number of randomly sampled negative
instances. To classify a vertex as lesional or non-lesional we first use its sulcal
depth to choose the two correct bags of classifiers: one for detecting cortical
thickening and the other for detecting cortical thinning.
We have chosen to work with logistic regression [16], which is a linear
classification algorithm. We selected logistic regression based on its relatively
fast training time and because it outputs a classification score that can be
interpreted as a label probability. The prediction of a bag is obtained by a majority vote of its ten logistic regression classifiers, and the final label of the vertex is the maximum of the two bag predictions (thinning and thickening).
Figure 3.2 illustrates the overall classifier design during training and testing.
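The bagged, under-sampled classifier can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions. A simple unregularized gradient-descent logistic regression stands in for whatever implementation was actually used. "Majority" is taken to mean more than half of the ten votes. The names `train_bag`, `bag_label` and `vertex_label` are hypothetical. The per-classifier threshold ρ = 0.95 is the value reported in Section 3.4.3.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, lr=0.1, epochs=200):
    """Unregularized logistic regression fit by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_bag(pos, neg, n_models=10, seed=0):
    """Each model sees all positives plus an equal-sized random sample of negatives."""
    rng = random.Random(seed)
    bag = []
    for _ in range(n_models):
        neg_sample = rng.sample(neg, len(pos))  # under-sampling; assumes len(neg) >= len(pos)
        X = pos + neg_sample
        y = [1] * len(pos) + [0] * len(neg_sample)
        bag.append(train_logreg(X, y))
    return bag

def bag_label(bag, x, rho=0.95):
    """Threshold each model's probability at rho, then take a majority vote."""
    votes = sum(1 for m in bag if predict_proba(m, x) > rho)
    return int(votes > len(bag) / 2)

def vertex_label(x, thin_bag, thick_bag):
    """Final label of a vertex: y = max(y_thin, y_thick)."""
    return max(bag_label(thin_bag, x), bag_label(thick_bag, x))

# Toy usage with two features per vertex and 20 "control" negatives.
neg = [[-2.0 - 0.1 * i, -2.0 + 0.05 * i] for i in range(20)]
thick_bag = train_bag([[2.0, 2.5], [2.5, 2.0], [3.0, 3.0], [2.2, 2.8]], neg)
thin_bag = train_bag([[-2.5, 2.5], [-3.0, 2.0], [-2.0, 3.0], [-2.8, 2.2]], neg)
print(vertex_label([3.0, 3.0], thin_bag, thick_bag))
```

Because every in-bag model sees all positives and an equal number of negatives, no single model can reach high accuracy by predicting "non-lesional" everywhere. The vote then averages out the variance introduced by the random negative samples.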
3.4 Empirical Evaluation
We tested the vertex-based classifier defined here on a sample of 31 patients,
24 of which were MRI-negative and 7 MRI-positive. All subjects were selected
from a large registry of patients with epilepsy treated at the New York Univer-
sity School of Medicine Comprehensive Epilepsy Center who signed consent
for a research MRI scanning protocol. Criteria for inclusion in this study in-
cluded: (1) completion of a high resolution T1-weighted MRI scan; (2) surgical
resection to treat focal epilepsy; (3) diagnosis of FCD on neuropathological ex-
amination of the resected tissue. Demographic and seizure-related information
for these participants is provided in Appendix A. In addition, MRI scans of 62 neurotypical controls (31 females/31 males; ages 17-65; mean age = 33; SD = 12.5) were acquired using identical imaging parameters.
Exclusion criteria for the control group included any history of psychiatric or
neurological disorders.
3.4.1 Data Preprocessing
The reconstructed cortical surfaces of all the subjects and controls were reg-
istered to an average surface. Furthermore, the feature values at each vertex
were z-score normalized based on first and second order statistics calculated
across the control population. Normalization of the feature values plays a vital role in mitigating the effects of inter-personal variation in cortical morphology resulting from demographic factors such as age and gender, which can otherwise lead to a high number of false positives [96].
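A per-vertex z-score against the control population can be sketched as follows. The sketch assumes all surfaces are registered, so that index v refers to the same vertex in every subject. The use of the sample standard deviation and the function name `zscore_against_controls` are assumptions.

```python
import statistics

def zscore_against_controls(subject, controls):
    """subject[v]: a feature value at vertex v for one subject.
    controls[c][v]: the same feature at vertex v for control c (same registered surface)."""
    z = []
    for v, value in enumerate(subject):
        ctrl = [c[v] for c in controls]
        mu = statistics.mean(ctrl)    # first-order statistic across controls
        sd = statistics.stdev(ctrl)   # second-order statistic (sample SD, an assumption)
        z.append((value - mu) / sd)
    return z

controls = [[2.0, 3.0], [2.2, 3.2], [2.4, 3.4]]
print(zscore_against_controls([2.6, 3.0], controls))  # approximately [2.0, -1.0]
```

Each feature (thickness, GWC, curvature, Jacobian distance) would be normalized separately in this way, before training or thresholding.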
3.4.2 Training
All positive (lesional) instances consisted of vertices located in the manually reduced lesion/resection zone (cf. Section 3.1) of both MRI-positive and MRI-negative training subjects. The corresponding vertices from the controls were included in the training data as negative (non-lesional) instances. This training data was partitioned into three distinct subsets based on sulcal depth. Using the two thresholds calculated for each subject, τthin and τthick, we further decompose each of the three initial subsets into two non-overlapping sets corresponding to thin and thick vertices. Thus, our data stratification procedure yields six subsets of training instances.
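The decomposition into six subsets can be sketched as below. The exact thresholding rule is an assumption of this sketch. A lesional vertex is taken to be "thin" if its (normalized) thickness falls below τthin and "thick" if it exceeds τthick. Otherwise mask reduction drops it. A threshold that does not exist for a subject is represented by None. The function name and data layout are hypothetical.

```python
SULCAL_LEVELS = ("sulcus", "wall", "gyrus")

def stratify_lesional(vertices, thresholds):
    """Split one subject's lesional vertices into (level, 'thin'/'thick') subsets.

    vertices: iterable of (vertex_id, sulcal_level, thickness) tuples.
    thresholds: {level: (tau_thin, tau_thick)}; either value may be None when the
    threshold does not exist for this subject. Vertices between the two thresholds
    are dropped, which is our assumed reading of mask reduction.
    """
    subsets = {(lvl, kind): [] for lvl in SULCAL_LEVELS for kind in ("thin", "thick")}
    for vid, level, thickness in vertices:
        tau_thin, tau_thick = thresholds.get(level, (None, None))
        if tau_thin is not None and thickness < tau_thin:
            subsets[(level, "thin")].append(vid)
        elif tau_thick is not None and thickness > tau_thick:
            subsets[(level, "thick")].append(vid)
    return subsets

verts = [(0, "sulcus", -2.5), (1, "sulcus", 0.1), (2, "sulcus", 2.4),
         (3, "gyrus", 3.0), (4, "wall", -3.0)]
out = stratify_lesional(verts, {"sulcus": (-2.0, 2.0), "gyrus": (None, 2.5)})
print({k: v for k, v in out.items() if v})
# {('sulcus', 'thin'): [0], ('sulcus', 'thick'): [2], ('gyrus', 'thick'): [3]}
```

Vertex 4 is dropped because no threshold exists for the wall level in this toy subject. Vertex 1 is dropped because it lies between the two sulcus thresholds.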
Six bags, each of ten logistic regression classifiers, were trained to detect either cortical thickening or cortical thinning at one of the three sulcal levels. Any linear classifier could be used within this framework. Each base-level logistic regression classifier was trained on a balanced dataset, i.e., with an equal number of positive and negative instances, obtained by randomly under-sampling [108] the negative instances (culled from the control data).
3.4.3 Testing
The output of each logistic regression classifier within the bag is the probability
that the input vertex belongs to the positive (lesional) class. To convert this
probability into a class label, we need to define a threshold ρ for the output
probability values such that the vertices having a predicted probability above
ρ are deemed lesional and those that fall below ρ are considered non-lesional.
In the results shown in Tables 3.1 and 3.2 we use ρ = 0.95.
We use a leave-one-patient-out cross-validation (LOOCV) strategy to test
the performance of our proposed classification scheme. For this purpose we left
out a single subject from the training data and trained the stratified classifiers
on vertices belonging to all the remaining subjects and all the controls. To
classify a vertex from the test subject, we first select the two bags of classifiers
corresponding to the sulcal depth of the vertex, and the output of each bag
is calculated based on the values of cortical thickness, GWC, curvature and
Jacobian distance for that vertex. Thus, we predict two labels for each test vertex, indicating whether it is deemed lesional by the “thinning” classifier ($y_{\text{thin}}$) or the “thickening” classifier ($y_{\text{thick}}$). These two predictions are combined into a single label by taking their maximum, i.e., $y := \max\{y_{\text{thin}}, y_{\text{thick}}\}$.
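The leave-one-patient-out loop can be sketched as follows. `train_fn` and `predict_fn` are hypothetical placeholders for the stratified, bagged training and the per-vertex prediction described above. As the text specifies, the controls stay in every training fold.

```python
def leave_one_patient_out(patients, controls, train_fn, predict_fn):
    """Yield (held-out patient id, predicted vertex labels) for every fold."""
    for i, held_out in enumerate(patients):
        train_patients = patients[:i] + patients[i + 1:]  # all other patients
        model = train_fn(train_patients, controls)        # controls are always in training
        yield held_out["id"], [predict_fn(model, x) for x in held_out["vertices"]]

# Toy usage: the "model" is just the list of training ids, and prediction is a fixed cut-off.
patients = [{"id": "P1", "vertices": [0.1, 0.9]}, {"id": "P2", "vertices": [0.5]}]
folds = leave_one_patient_out(patients, controls=[],
                              train_fn=lambda ps, cs: [p["id"] for p in ps],
                              predict_fn=lambda model, x: int(x > 0.4))
print(list(folds))  # [('P1', [0, 1]), ('P2', [1])]
```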
After each vertex of the test subject has been classified, the results are post-processed to remove insignificant detections. To this end, we define a detected cluster as a set of contiguous vertices that are labeled as lesional. Clusters were then filtered by surface area to eliminate false positives [96]: in our experiments, all clusters with a surface area of less than 50 mm² were discarded, following exactly the post-processing strategy outlined in [96]. Although discarding detected clusters increases the possibility of discarding subtle abnormal regions, we perform this step to have a consistent comparison between the proposed method and the baseline.
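One way to implement the cluster filtering is a breadth-first search over the surface-mesh adjacency, summing a per-vertex area for each connected component. The adjacency lists, the per-vertex areas (for example, a share of the adjacent triangle areas) and the function name are assumptions. Only the 50 mm² cut-off comes from the text.

```python
from collections import deque

def significant_clusters(labels, neighbors, vertex_area, min_area=50.0):
    """labels[v] in {0, 1}; neighbors[v]: adjacent vertex ids on the surface mesh;
    vertex_area[v]: surface area (mm^2) attributed to vertex v."""
    seen, clusters = set(), []
    for start, lab in enumerate(labels):
        if lab != 1 or start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:  # breadth-first search over contiguous lesional vertices
            v = queue.popleft()
            comp.append(v)
            for u in neighbors[v]:
                if labels[u] == 1 and u not in seen:
                    seen.add(u)
                    queue.append(u)
        if sum(vertex_area[v] for v in comp) >= min_area:  # discard clusters < 50 mm^2
            clusters.append(sorted(comp))
    return clusters

# Toy chain mesh 0-1-2-3-4-5 with 20 mm^2 per vertex.
chain = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
print(significant_clusters([1, 1, 0, 1, 1, 1], chain, [20.0] * 6))  # [[3, 4, 5]]
```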
A test subject is considered a true positive after post-processing if any of the remaining clusters partially or completely overlaps with the original lesion/resection area [96, 14]. If all the significant clusters fall outside the lesion/resection region, the test subject is regarded as a false negative. Note that detections outside the lesion/resection zone may actually represent abnormal cortical tissue (cf. Section 3.1). Thus, the statistics provided here represent a lower bound on actual classifier performance.
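The subject-level decision reduces to an overlap test between the surviving clusters and the lesion/resection mask. Here is a minimal sketch, with clusters and masks assumed to be collections of vertex indices and a hypothetical function name:

```python
def subject_detected(clusters, lesion_vertices):
    """True positive at the subject level: some surviving cluster overlaps the lesion/resection."""
    lesion = set(lesion_vertices)
    return any(not lesion.isdisjoint(cluster) for cluster in clusters)

print(subject_detected([[3, 4, 5]], [5, 6]))  # True
```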
3.4.4 Results
We use the z-score based approach proposed in [96] as a baseline, which uses
a single feature to detect abnormal vertices. Specifically, the vertices of the
registered data are z-score normalized using first and second order statistics
from the control population. Then the resulting z-scores are thresholded at
z = 2.1 to identify lesional vertices. Although any one of the five available features can be used within this approach, we chose cortical thickness, which Thesen et al. [96] reported to be the most effective feature for detecting FCD lesions. Furthermore, the detected clusters for the
z-score based approach were post-processed using the same method as outlined
in Section 3.4.3.
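The baseline's vertex-level rule can be sketched as a threshold on the per-vertex thickness z-scores. The flagged vertices would then go through the same cluster filtering. The text does not say whether z or |z| is compared with 2.1, or whether the comparison is strict, so both variants are exposed here as an assumption. The function name is hypothetical.

```python
def zscore_baseline(z_scores, threshold=2.1, two_sided=True):
    """Flag a vertex as lesional when its thickness z-score exceeds the threshold.
    Whether |z| or z is thresholded is an assumption; both variants are exposed."""
    if two_sided:
        return [int(abs(z) > threshold) for z in z_scores]
    return [int(z > threshold) for z in z_scores]

print(zscore_baseline([2.5, -2.5, 1.0]))                   # [1, 1, 0]
print(zscore_baseline([2.5, -2.5, 1.0], two_sided=False))  # [1, 0, 0]
```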
We used the Dice coefficient (DC) [26] to quantify the performance of both
the proposed approach and the z-score baseline. DC is a set similarity metric
that is a special case of the kappa statistic [113]. It is commonly used to
measure the accuracy of segmentation in medical images [114, 85, 6] when
ground truth is available. We use the DC to measure the overlap between the final detected clusters (after post-processing) and the available resection (for MRI-negative patients) or the expert-traced lesion (for MRI-positive patients). Let $M_{\text{pred}}$ be the binary mask representing all the final detected clusters, and let $M_{\text{label}}$ be the binary mask representing the vertices within the lesion/resection zone for a given subject. The DC is then calculated as:
$$\mathrm{DC}(M_{\text{pred}}, M_{\text{label}}) = \frac{2\,|M_{\text{pred}} \cap M_{\text{label}}|}{|M_{\text{pred}}| + |M_{\text{label}}|} \qquad (3.1)$$
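Equation (3.1) can be computed directly on vertex sets, treating each binary mask as the set of vertex indices it marks. The value returned for two empty masks is our own convention, not something the text specifies.

```python
def dice(pred_vertices, label_vertices):
    """Dice coefficient between two vertex sets (binary masks stored as sets of indices)."""
    pred, label = set(pred_vertices), set(label_vertices)
    if not pred and not label:
        return 1.0  # convention for two empty masks; an assumption of this sketch
    return 2 * len(pred & label) / (len(pred) + len(label))

print(dice([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

The tables report DC multiplied by 100.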
Subj.   Z-Score                ML
Id.     TPR     FPR    DC      TPR     FPR    DC
NY49    11.85   1.00   19.92   24.76   2.27   34.58
NY53    20.28   2.60   29.60   27.72   4.46   35.42
NY123   29.80   3.68   27.61   31.33   4.50   26.36
NY143   16.38   0.60   12.28   20.03   2.00    5.81
NY156   26.12   1.20   38.69   25.65   2.11   36.14
NY187   -       0.50   -       -       0.90   -
NY194    7.79   0.14   14.00   11.48   0.58   18.18
Mean    16.03   1.40   20.30   20.14   2.41   22.36
Table 3.1: The detection performance of the z-score baseline approach and the proposed scheme (ML) on MRI-positive subjects. The true positive rate (TPR) and false positive rate (FPR) are calculated as the percentage of lesional vertices correctly labeled and the percentage of non-lesional vertices incorrectly labeled, respectively. The Dice coefficient (DC) measuring the degree of spatial overlap (shown here as a percentage) between the detected clusters and the expert-marked lesion on the cortical surface is also listed (‘-’ represents a value of zero, and for both TPR and DC signifies that no abnormal cluster was detected that overlapped with the lesion).
DC ranges from 0 (no overlap) to 1 (identical masks), so higher values are better. In addition to DC, we also estimate the false positive rate (the percentage of non-lesional vertices incorrectly predicted as lesional) and the true positive rate (the percentage of lesional vertices correctly detected). All performance metrics are calculated using the original expert-marked resection/lesion zones as the ground truth; therefore, the estimates of true positive rate should be considered lower bounds.
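The two rates can be sketched on vertex sets in the same way. One assumption is that the FPR denominator is every surface vertex outside the lesion/resection mask, which we read from the table captions' "non-lesional vertices". The function name and the `n_vertices` parameter are hypothetical.

```python
def tpr_fpr(pred_vertices, lesion_vertices, n_vertices):
    """Percentages: TPR over lesional vertices, FPR over all remaining (non-lesional) vertices."""
    pred, lesion = set(pred_vertices), set(lesion_vertices)
    tp = len(pred & lesion)  # lesional vertices correctly labeled
    fp = len(pred - lesion)  # non-lesional vertices incorrectly labeled
    return 100.0 * tp / len(lesion), 100.0 * fp / (n_vertices - len(lesion))

print(tpr_fpr([0, 1, 2, 10], [0, 1, 2, 3], n_vertices=20))  # (75.0, 6.25)
```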
Table 3.1 shows the performance of both approaches on MRI-positive subjects. Both approaches detect significant clusters in the expert-marked lesional area for the same subjects. The ML-based approach detects larger clusters within the lesion area than the baseline, but has a higher false positive rate. However, the differences between the two approaches in true positive rate and DC were not statistically significant. Figure 3.3(a) shows the detection results using the proposed approach on an MRI-positive subject after post-processing.
Figure 3.3: The detection results for the machine learning based approach on (a) an MRI-positive and (b) an MRI-negative subject. The inflated lateral and medial cortical surfaces show the original expert-marked lesion or resection zone as the regions outlined by the white solid curve, and the significant lesional clusters discovered by the machine learning approach as the yellow solid filled regions. The MRI slice on the right shows the abnormal area in the actual brain volume that corresponds to the clusters discovered inside the lesion/resection.
Whereas both approaches have the same detection rate for MRI-positive subjects, the proposed approach outperforms the z-score based method for MRI-negative subjects. Table 3.2 compares the performance of both approaches on MRI-negative subjects. The ML approach correctly detects significant clusters inside the resection zone for 14 out of 24 subjects, whereas the z-score based method correctly detects lesions in only 9. The proposed approach has a higher true positive rate and a higher DC. The difference between the DC values of the proposed approach and the baseline was found to be significant using a two-tailed t-test (t(23) = 3.34, p = 0.0029). However, the proposed method has a higher FPR than the z-score method (1.04% versus 0.58%). Figure 3.3(b) shows the detection results using the proposed approach on an MRI-negative subject after post-processing.
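The reported statistic can be checked against the DC columns of Table 3.2. The sketch below makes two assumptions: the test is paired over the 24 MRI-negative subjects (consistent with 23 degrees of freedom), and ‘-’ entries are read as zero. The p-value needs the Student t distribution (for example, from a statistics library), which this standard-library-only sketch omits.

```python
import math
import statistics

# Dice coefficients (%) from Table 3.2, subjects NY46 ... NY322 in table order; '-' read as 0.
dc_zscore = [0, 5.11, 8.13, 0.14, 0, 0, 0, 0, 0, 0, 0, 2.93,
             3.59, 0, 0, 0, 0, 6.01, 0, 0, 5.64, 0, 10.05, 3.25]
dc_ml = [1.78, 7.24, 14.45, 0.15, 1.07, 0, 0, 0, 0, 0, 8.97, 2.57,
         5.80, 0, 0, 1.88, 0, 10.30, 0, 0, 13.30, 4.32, 12.83, 3.66]

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for per-subject differences a - b."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1

t, df = paired_t(dc_ml, dc_zscore)
print(f"t({df}) = {t:.2f}")  # t(23) = 3.34
```

Under these assumptions the per-subject differences reproduce the reported t(23) = 3.34. This suggests a paired design, although the text itself does not state one.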
Most of the automated FCD lesion detection schemes reported in the lit-
erature have been evaluated on MRI-positive patients [43, 96]. However, our
dataset contains 24 MRI-negative patients whose lesions were not detected
by experienced radiologists. The baseline z-score based method used here is
the actual computational tool being used at NYU Comprehensive Epilepsy
Center for patient evaluation. In this context, the higher detection rate of the proposed scheme indicates a higher sensitivity to subtle cortical malformations in FCD patients. We conjecture that one reason for this higher sensitivity is the use of all five features to characterize abnormal regions, which the univariate baseline method overlooks.
3.5 Sensitivity Analysis
The vertex-based classifier developed in this chapter is designed specifically to
circumvent the idiosyncrasies of the FCD dataset. In this section we evaluate
the impact of each individual design choice on classifier performance. Specifically, these design choices are data stratification based on sulcal depth, bagging, and manual reduction of the expert-marked lesion/resection area.
3.5.1 Data Stratification
In order to determine whether correcting for cortical complexity by stratify-
ing classifiers by sulcal depth results in improved detection rates, we re-ran
the training phase in the leave-one-patient-out cross-validation without this
correction (note that we retain bagging and mask reduction). As depicted
in Tables 3.3 and 3.4 (compare the ML column to column A), the true posi-
tive rate dropped from 20.1% to 12.9%, in the MRI-positive group, and lesion
Subj.   Z-Score                ML
Id.     TPR     FPR    DC      TPR     FPR    DC
NY46    -       0.34   -       0.95    0.74    1.78
NY51    2.86    1.00   5.11    4.15    1.02    7.24
NY67    4.35    0.26   8.13    8.30    0.65   14.45
NY68    0.09    1.33   0.14    0.12    1.69    0.15
NY72    -       -      -       0.55    0.25    1.07
NY98    -       0.33   -       -       0.81    -
NY116   -       -      -       -       0.39    -
NY130   -       0.16   -       -       0.25    -
NY148   -       0.10   -       -       0.12    -
NY149   -       0.84   -       -       1.68    -
NY169   -       1.02   -       9.41    1.98    8.97
NY171   2.45    1.00   2.93    2.94    1.80    2.57
NY177   1.88    0.14   3.59    3.20    0.32    5.80
NY207   -       0.05   -       -       0.60    -
NY212   -       1.01   -       -       1.60    -
NY226   -       0.50   -       1.09    0.60    1.88
NY241   -       0.33   -       -       0.40    -
NY255   3.23    0.42   6.01    6.18    1.30   10.30
NY259   -       0.50   -       -       0.58    -
NY294   -       0.50   -       -       1.40    -
NY297   2.98    0.14   5.64    7.98    0.56   13.30
NY299   -       3.13   -       3.30    4.86    4.32
NY312   6.10    0.50   10.05   9.04    0.97   12.83
NY322   1.74    0.31   3.25    2.02    0.49    3.66
Mean    1.07    0.58   1.87    2.47    1.04    3.68
Table 3.2: Results for MRI-negative subjects. For each subject the true positive rate (TPR) and false positive rate (FPR) are calculated as the percentage of lesional vertices correctly labeled and the percentage of non-lesional vertices incorrectly labeled, respectively. The Dice coefficient (DC) is also shown as a percentage to quantify the overlap between the detected clusters and the resection on the cortical surface (‘-’ represents a value of zero for FPR and no detection for TPR and DC).
Subj.   Z-Score       ML            (A)           (B)           (C)
Id.     TPR   FPR     TPR   FPR     TPR   FPR     TPR   FPR     TPR   FPR
NY49    11.8  1.0     24.8  2.3      8.7  0.6     -     -       0.6   -
NY53    20.3  2.6     27.7  4.5     23.5  4.2     10.6  1.2     4.3   0.2
NY123   29.8  3.7     31.3  4.5     31.4  4.6     25.2  1.2     7.4   0.1
NY143   16.4  0.6     20.0  2.0     -     0.4     -     -       -     0.1
NY156   26.1  1.2     25.7  2.1     26.4  1.8     20.1  0.4     2.1   -
NY187   -     0.5     -     0.9     -     0.6     -     0.4     -     -
NY194    7.8  0.1     11.5  0.6     -     -       -     -       -     -
Mean    16.0  1.4     20.1  2.4     12.9  1.7      8.0  0.5     2.1   0.1
Table 3.3: A comparison of detection results using the z-score based method and the ML methods for MRI-positive subjects only, with different variations in the design of the ML approach: (A) no stratification along the sulcal values; (B) stratifies the data based on the sulcal depth values, but does not reduce the lesion mask; (C) uses stratification and lesion reduction by calculating a threshold for each sulcal level using cortical thickness values, but does not use bagging. TPR and FPR are measured as percentages, and ‘-’ represents a value of zero for FPR and no detection for TPR.
Subj.   Z-Score          ML               (A)              (B)              (C)
Id.     Detected  FPR    Detected  FPR    Detected  FPR    Detected  FPR    Detected  FPR
NY46    n         0.34   y         0.74   n         -      n         -      n         -
NY51    y         1.00   y         1.02   y         0.95   y         0.30   n         -
NY67    y         0.30   y         0.65   y         0.19   n         -      n         -
NY68    y         1.33   y         1.69   y         1.56   y         0.81   n         -
NY72    n         -      y         0.25   n         -      n         -      n         -
NY98    n         0.33   n         0.81   n         0.56   n         -      n         -
NY116   n         -      n         0.39   n         -      n         -      n         -
NY130   n         0.16   n         0.25   n         0.16   n         -      n         -
NY148   n         0.10   n         0.12   n         0.10   n         -      n         -
NY149   n         0.84   n         1.68   n         0.18   n         -      n         0.06
NY169   n         1.02   y         1.98   n         0.09   n         -      n         -
NY171   y         1.00   y         1.80   y         -      n         -      n         -
NY177   y         0.14   y         0.32   y         0.30   n         -      n         -
NY207   n         0.05   n         0.60   n         -      n         -      n         -
NY212   n         1.01   n         1.60   n         1.13   n         0.53   n         0.06
NY226   n         0.50   y         0.60   n         0.46   n         0.20   n         -
NY241   n         0.33   n         0.40   n         0.35   n         0.08   n         -
NY255   y         0.42   y         1.30   y         0.08   n         -      n         -
NY259   n         0.50   n         0.58   n         0.50   n         0.17   n         0.06
NY294   n         0.50   n         1.40   n         0.20   n         -      n         -
NY297   y         0.14   y         0.56   y         -      n         -      n         -
NY299   n         3.13   y         4.86   n         0.30   n         -      n         -
NY312   y         0.50   y         0.97   y         0.73   y         -      n         -
NY322   y         0.31   y         0.49   y         0.12   n         -      n         -
Mean    9/24      0.58   14/24     1.04   8/24      0.33   3/24      0.09   0/24      0.005
Table 3.4: A comparison of detection results using the z-score based method and the ML method for MRI-negative subjects only, with different variations in the design of the ML approach: (A) no stratification along the sulcal values; (B) stratifies the data based on the sulcal depth values, but does not reduce the lesion mask; (C) uses stratification and lesion reduction by calculating a threshold for each sulcal level using cortical thickness values, but does not use bagging. FPR is given as a percentage, and ‘-’ represents a value of zero for FPR.
detection dropped from 58% to 33% in the MRI-negative group. This sug-
gests that different feature combinations might be more prevalent in specific
regions (e.g., sulcus, gyrus, wall), which is consistent with the observation of
region-specific dysplasia subtypes (e.g., bottom-of-the-sulcus dysplasia) [41].
3.5.2 Mask Reduction
The proposed classification approach relies heavily on the availability of clean
training data (i.e., positive and negative instances with minimal label noise).
This observation holds irrespective of the choice of the base linear classifier
(logistic regression in our case). There are two main issues that need to be
evaluated for this strategy: i) is mask reduction necessary? and ii) how resilient
is the classifier performance to the choice of the manually selected thresholds?
Mask reduction aims to reduce all the vertices labeled as lesional to only
those that were significantly thicker or thinner than non-lesional vertices. We
tested the improvement in detection rates when utilizing this strategy by re-
running our analysis without mask reduction (note that we retain stratification
and bagging). The results are depicted in Tables 3.3 and 3.4 (compare the ML
column to column B) and show a drop in detection rates for both the MRI-
positive group (from 6/7 (86%) to 3/7 (43%)) and the MRI-negative group
(from 14/24 (58%) to 3/24 (12.5%) detections). This indicates that class label
noise is a significant issue for both groups that can be corrected by utilizing a
mask reduction strategy with a separate threshold for cortical thickening and
cortical thinning.
Given the high impact of mask reduction on classifier performance, the
thresholds used in manually reducing the resection/lesion area (cf. Section
3.1.1) play a critical role. These thresholds are selected manually, and thus
involve human subjectivity. In order to test the sensitivity of our results
[Plot: Detection Rate (y-axis) versus False Positive Rate (x-axis) for the original thresholds τ and relative perturbations of ±5%, ±7.5% and ±10%.]
Figure 3.4: The effect of changing the manually determined thresholds for reducing the resected region to eliminate label noise on the detection rate of MRI-negative subjects. The detection rate represents the percentage of MRI-negative subjects for whom significant clusters were detected within the marked resected area.
to these thresholds, we perturbed the thresholds by a relative amount and
recorded the effect of this change on the final detection rates for both MRI-
positive and MRI-negative subjects.
Let ∆ represent the relative change in both thresholds τthin and τthick (cf. Section 3.1). A positive change increases the absolute value of each threshold, which makes the criterion for keeping vertices of the marked lesion/resection area as lesional training instances more exclusive. Conversely, a negative change lowers the thresholds and (possibly) selects a higher number of vertices as lesional. The detection rate for MRI-positive subjects was not affected by this perturbation. Figure 3.4 plots the detection rate versus the FPR for different perturbations of the thresholds for MRI-negative subjects. The detection rate for MRI-negative subjects changes when the manually identified thresholds are adjusted by more than 5% of their original values. Adjusting the thresholds for the MRI-negative subjects increases both the detection rate and the false positive rate. Note that the thresholds used in our experiments were set manually once, independently for each subject, and were not adjusted throughout the course of this study.
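The perturbation can be sketched as scaling each threshold by (1 + ∆). For ∆ > 0 this increases its absolute value, as described above. The sketch treats the thresholds as signed values around zero (for example, z-scores, with τthin negative); that convention and the helper names are assumptions.

```python
def perturb(tau, delta):
    """Relative change: |tau| grows for delta > 0 and shrinks for delta < 0."""
    return tau * (1.0 + delta)

def n_kept(thickness, tau_thin, tau_thick):
    """Count lesional vertices that survive mask reduction under (tau_thin, tau_thick)."""
    return sum(1 for t in thickness if t < tau_thin or t > tau_thick)

thickness = [-2.5, -2.15, -1.0, 0.0, 2.1, 2.5]  # toy normalized thickness values
for delta in (-0.10, -0.05, 0.0, 0.05, 0.10):
    print(delta, n_kept(thickness, perturb(-2.0, delta), perturb(2.0, delta)))
```

Positive ∆ keeps fewer vertices (a more exclusive criterion), and negative ∆ keeps at least as many.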
The results show that even though the proposed approach is robust to
slight perturbations in the mask reduction thresholds, this step is crucial for
both the resulting detection rate and the false positive rate. This finding
emphasizes the presence of label noise in data, especially when the resected
regions of MRI-negative patients are used as ground truth. Similarly, this step
also highlights the main limitation of the proposed approach: valuable expert
time is needed to establish the mask reduction thresholds for all new training
subjects.
3.5.3 Bagging
We examined the impact of bagging on the classifier by replacing each bag
with a single logistic regression classifier, trained on all the negative instances,
while retaining both stratification and mask reduction. As can be seen from
Tables 3.3 and 3.4, eliminating bagging resulted in the most substantial drop
in performance; the TPR of MRI-positive group dropped from 20.1% to 2.1%
and for the MRI-negative group the detection rate dropped from 14/24 (58%)
to 0/24 (0%). In other words, failing to correct for the class imbalance problem
resulted in zero detection of MRI-negative FCD lesions.
Figure 3.5 summarizes the changes in detection rates for MRI-negative
patients as the design choices are omitted individually (i.e., one at a time)
from the proposed classifier design.
3.6 Conclusion
This chapter provided key insights for developing an effective FCD lesion de-
tection scheme using surface-based morphometry for MRI-negative patients.
[Plot: Effect of Classifier Design on the Detection Rate for MRI-Negative Subjects. y-axis: Detection Rate (%); bars: (A) No Bagging, (B) Without Lesion Reduction, (C) Without Sulcal Stratification, (D) Combined.]
Figure 3.5: A comparison of detection rate for MRI-negative subjects with different variations in the design of the proposed approach, including A) without bagging, but with sulcal stratification and mask reduction, B) without mask reduction but with bagging and sulcal stratification, C) without stratification but with bagging and lesion reduction, and D) using all three corrections.
We identified three different confounding factors that can undermine classifier
performance if not taken into account. The empirical evaluation shows that
the detection results are substantially improved compared to a baseline scheme that does not incorporate design choices tailored specifically to counter these confounding factors in the data.
Our results offer a potential advancement of neurodiagnostic tools for the
more challenging population of MRI-negative patients. However, the case-control methods we use require a large normal control cohort scanned with the same MRI parameters as the patient, and thus cannot readily be applied in every clinical setting. Furthermore, automated detection and classification of lesions should not replace careful visual analysis by a trained expert. Rather, quantitative approaches can be used to supplement visual analysis by highlighting areas with a high lesional probability, similar to a focus-of-attention mechanism. Quantitative lesion detection methods such as the one developed in this chapter aim to increase the chances of accurately detecting the lesion in the pre-surgical phase, and are primarily designed to highlight all potentially problematic cortical regions that may be overlooked by expert radiologists during conventional visual inspection. Therefore, a classification scheme that identifies FCD lesions with higher accuracy during the initial diagnostic stages, especially in MRI-negative patients, can enhance the chance of a successful outcome for resective surgery.
The training data consisting of the MRI-negative cases were derived from
resection areas that were identified using iEEG. FCD pathology was present in
the resection area in all such patients; however, non-lesional tissue may have
also been part of the resection. We reduced this problem (label noise) by ap-
plying a manual mask reduction step which resulted in enhanced performance.
Mask reduction is the main shortcoming of the proposed approach; it requires
expert time to establish the mask reduction thresholds and is prone to human
errors. In the next chapter we develop a machine learning model that incor-
porates the results of the iEEG exam as an additional source of supervision,
with the aim of eliminating the manual mask reduction step.
In summary, this chapter demonstrated that a quantitative morphometric
method using surface-based brain modeling, combined with machine learn-
ing algorithms and novel strategies to deal with the complexity of cortical
malformations, results in improved detection of FCD. Improved detection of
neocortical structural lesions is likely to increase the number of patient re-
ferrals to specialized tertiary epilepsy centers for surgical consideration, and
in many cases, may decrease the delay between initial diagnosis and surgery.
This has significant implications for improved seizure and cognitive outcomes
in patients with FCD and concomitant epilepsy.
Chapter 4
Leveraging iEEG for FCD
Lesion Detection
“The signal is the truth. The noise is what distracts us from the truth.”
Nate Silver
The classification scheme developed in the last chapter highlighted a num-
ber of confounding factors that inhibit the performance of supervised learning
algorithms for lesion detection in MRI-negative patients. Specifically, the pro-
posed detection scheme involved a manual mask reduction step, requiring both
expert time and knowledge to determine the mask reduction thresholds. Mask
reduction is an ad-hoc procedure aimed at reducing the label noise that arises
when resected regions are treated as ground truth. Similarly, the morphology
of the human brain such as its thickness, curvature and the overall structure
in general are affected by different demographic factors such as age, gender
and education [84, 83]. Learning a classifier by aggregating the data from the
patient population (as done in the previous chapter), and not accounting for
these sub-trends in data, has a negative impact on classifier accuracy. In this
chapter we adopt a multitask learning (MTL) approach for detecting FCD lesions in patients with treatment-resistant epilepsy (TRE). This approach uses the results of the iEEG exam as a second source of supervision, in addition to the vertex labels provided by the resections of individual patients. Furthermore, instead of pooling the data from the entire patient population, we now treat each patient as a separate learning task.
As mentioned previously, before undergoing resective brain surgery, all patients are subjected to an invasive intracranial EEG (iEEG) exam. In this exam, subdural electrodes are implanted on the cortical surface to record electrical activity [112]. A board of certified epileptologists reviews this information to determine the region responsible for generating the seizures, i.e., the seizure onset zone. To isolate the abnormal region, each electrode is labeled as being part of the seizure onset zone or not. iEEG has been shown
to be effective for localizing FCD lesions [65]. However, for MRI-negative pa-
tients there is no visible lesion to guide precise electrode implantation, which
results in sampling errors. In such cases the identified abnormal region fails
to capture the lesion in its entirety in about 40% of the cases [43], resulting
in poor surgical outcomes. In this work, we augment the MRI labels (i.e., the resected regions of MRI-negative patients and the expert-marked lesions of MRI-positive patients) with the results of iEEG analysis to mitigate the effects of label noise.
By using this combined supervision, our goal is to show that the manual mask-reduction step can be bypassed, and that the results of the iEEG exam can be used to focus the attention of the learner on regions that might have a higher chance of being abnormal than other regions within the resection. Note that during classification we only have access to the patient’s MRI data; the extra supervision from iEEG is used only for training. Thus, the proposed method serves as a pre-surgical patient evaluation tool that detects candidate lesional regions on a patient’s MRI data; these candidate regions are further evaluated using invasive iEEG and video monitoring to locate the final target for resective surgery.
To address inter-patient variability, we treat each patient as a separate classification task. To this end, we use the patient’s MRI to isolate the resected region (positive instances) and extract the same region from an age- and gender-matched healthy control subject (negative instances). We then use MTL to learn a common classifier across all tasks, using the datasets gathered from all the subjects. The common classifier can then be used to detect FCD lesions in new patients.
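The per-patient task construction can be sketched as below. Several details are assumptions for illustration: the matching rule (same gender, closest age), the ±1 labels, and the dictionary layout. The text only states that the control is age- and gender-matched and that the same region is extracted from it.

```python
def match_control(patient, controls):
    """Hypothetical matching rule: same gender, smallest age difference."""
    same_gender = [c for c in controls if c["gender"] == patient["gender"]]
    return min(same_gender, key=lambda c: abs(c["age"] - patient["age"]))

def build_tasks(patients, controls):
    """One task per patient: resected vertices (+1) vs. the same vertices of a matched control (-1)."""
    tasks = {}
    for p in patients:
        c = match_control(p, controls)
        pos = [p["features"][v] for v in p["resection"]]
        neg = [c["features"][v] for v in p["resection"]]  # same indices on the registered surface
        tasks[p["id"]] = (pos + neg, [1] * len(pos) + [-1] * len(neg))
    return tasks

patients = [{"id": "P1", "age": 30, "gender": "F",
             "features": [[1.0], [2.0], [3.0]], "resection": [0, 2]}]
controls = [{"id": "C1", "age": 50, "gender": "F", "features": [[0.1], [0.2], [0.3]]},
            {"id": "C2", "age": 32, "gender": "F", "features": [[0.4], [0.5], [0.6]]},
            {"id": "C3", "age": 30, "gender": "M", "features": [[0.7], [0.8], [0.9]]}]
print(build_tasks(patients, controls)["P1"])
# ([[1.0], [3.0], [0.4], [0.6]], [1, 1, -1, -1])
```

Keeping each patient's data as a separate task is what lets the MTL model regulate how much any single patient's (possibly noisy) resection labels influence the shared classifier.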
The contributions of this work are:
• We extend the regularized multitask learning framework [31] (described briefly in Section 4.1) to incorporate auxiliary label information as an additional source of supervision when the training data has weak labels.
• We model the case when the auxiliary information has similar semantics across tasks, and the case when its semantics differ among tasks (Section 4.2).
• We cast the task of automated detection of FCD lesions in MRI-negative epilepsy patients as a multitask learning problem, and incorporate the results of their iEEG exams to provide additional supervision that ameliorates the label noise arising when resection zones are used as ground truth (Sections 4.3 and 4.4).
4.1 Multitask Learning
Multitask learning (MTL) simultaneously learns multiple related prediction tasks that can be represented using a shared common structure [22]. MTL is ideally suited for domains in which the data are collected from different sources, none of which individually has enough data to learn a reliable prediction model. However, because the data are collected from disparate sources, the underlying distribution of each source has its own distinct characteristics. This causes a covariate shift that degrades the performance of a classifier learned by simply pooling the data together. MTL exploits the “relatedness” among tasks by sharing common information through a joint representation, while regulating the influence of each data source on the final model. A number of different approaches
have been taken to develop robust MTL frameworks, including hierarchical
Bayes [98], regularization [31], and Gaussian processes [109]. These have been
successfully applied in various application domains, such as object detection
[100], conjoint analysis [58], classification of light curves [109], etc.
Like traditional learning methods, MTL requires accurately annotated training data. However, in many domains there is significant label noise arising from human subjectivity, imprecise measurements, etc. Because MTL shares information among tasks during the learning phase, the impact of label noise can be compounded, undermining the performance of the prediction model. In this work we formulate an MTL model that learns from imprecise labels, given access to an additional source of information that provides, for each instance, a score quantifying the confidence associated with its label. We also address the added complexity that arises when this auxiliary label information exhibits a covariate shift and behaves differently across tasks. To this end, we extend the regularized MTL framework [31] to incorporate auxiliary supervision when the training labels are uncertain. We chose this framework because it admits a solution similar to support vector learning [104], which learns large-margin classifiers and allows the use of kernel functions [86] to learn both linear and non-linear solutions.
Label noise is prevalent in most real-world datasets [40, 68], especially in domains where human experts are involved in the labeling process. In general, there are two main approaches to dealing with label noise in supervised learning. The first is to identify the noisy labels [21] and either discard them or assign them lower weights [77]. The second approach assumes that we are provided with scores quantifying the uncertainty of each training label. For example, in learning from probabilistic labels [90, 61] the class of each instance is specified by a probability distribution over the possible class labels. Our work falls within this second approach, but instead of assuming that the training labels are “soft”, we assume access to a secondary source of information that provides the means to infer the probability that a particular instance belongs to the positive or the negative class. In this regard, the work closest to ours is that of Nguyen et al. [69], who assume the availability of an additional source of label information and model this side information as inducing a pairwise ranking in the context of learning a binary classifier. We adopt a similar ranking approach; the main difference is that we consider the inclusion of additional label information in the context of MTL rather than single-task learning. Moreover, we go beyond simply incorporating the additional label information into the regularized MTL framework by taking two different modeling approaches. In the first, we assume that the additional source behaves uniformly across tasks; in the second, we allow its underlying semantics to differ across tasks.
4.2 MTL with Auxiliary Label Information
We first provide the details of our notation and review the regularized MTL framework [31]. We then describe our proposed modifications to this framework to incorporate auxiliary label information when the training labels are imprecise. For clarity and ease of comparison we mostly use the same notation as Evgeniou et al. [31].
Notation: We consider data from $T$ related classification tasks, given as $(x_i^t, y_i^t)$, where $x_i^t \in \mathbb{R}^d$ and $y_i^t \in \{-1, 1\}$ for all $i \in \{1, 2, \ldots, m\}$ and $t \in \{1, 2, \ldots, T\}$. All tasks share the same feature space and, without loss of generality, we assume that all tasks have an equal number ($m$) of training instances and that the underlying data distributions of the tasks are different but related. The goal is to simultaneously learn $T$ classifiers, one per task, such that $f_t(x_i^t) = y_i^t$ for all $t \in \{1, 2, \ldots, T\}$.
In addition to the labeled data, we also have a label score $r_i^t$ for each training instance. This score is defined relative to either the positive or the negative class and represents the degree of “positive-ness” or “negative-ness” of the instance. For clarity, in the rest of the chapter we assume that the score is generated relative to the positive class, so that $r_i^t$ reflects the degree of belief that instance $i$ of task $t$ belongs to the positive class. The scores induce a pairwise ranking of the instances of each task, $\Pi_t = \{(i, j) : r_i^t \ge r_j^t\}$; this ranking function is adapted from rank-SVM [48].
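The ranking $\Pi_t$ can be materialized directly from the scores of one task. The sketch below is illustrative only: the helper name `rank_pairs` is hypothetical, and it keeps only strictly ordered pairs ($r_p^t > r_q^t$), an assumption not stated in the text. Tied pairs are dropped because a tie would produce both $(p, q)$ and $(q, p)$, whose unit-margin rank constraints cannot hold simultaneously.

```python
import numpy as np


def rank_pairs(scores):
    """Return all (p, q) index pairs with scores[p] > scores[q] for one task.

    Hypothetical helper: ties are skipped so that no pair appears in both
    orders (the two margin constraints would then contradict each other).
    """
    scores = np.asarray(scores, dtype=float)
    # Boolean matrix whose entry (p, q) is True when instance p outranks q.
    greater = scores[:, None] > scores[None, :]
    p_idx, q_idx = np.nonzero(greater)
    return list(zip(p_idx.tolist(), q_idx.tolist()))


# Example: instance 0 has a high auxiliary score, instances 1 and 2 do not.
r = [1.0, 0.0, 0.0]
print(rank_pairs(r))  # [(0, 1), (0, 2)]
```

With binary auxiliary scores (e.g., near an onset electrode or not), the number of pairs is the product of the sizes of the two groups.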
4.2.1 Regularized Multi-task Learning (MTL)
The regularized MTL approach [31] learns a separate classification function $f_t(x) = w_t \cdot x$ for each task $t$, where
$$w_t = w_0 + v_t \tag{4.1}$$
Here $w_0 \in \mathbb{R}^d$ is a vector of parameters common to all tasks and $v_t \in \mathbb{R}^d$ are the task-specific parameters. If the tasks are highly similar, the $v_t$ are small relative to $w_0$, and vice versa. All the classifiers $f_t$ can be learned simultaneously by solving the following optimization problem:
$$
\begin{aligned}
\min_{w_0, v_t, \xi_i^t} \quad & \sum_{t=1}^{T} \sum_{i=1}^{m} \xi_i^t + \frac{\lambda_1}{T} \sum_{t=1}^{T} \|v_t\|^2 + \lambda_2 \|w_0\|^2 \\
\text{subject to} \quad & y_i^t \, (w_0 + v_t) \cdot x_i^t \ge 1 - \xi_i^t, \quad \xi_i^t \ge 0 \qquad \forall i, \forall t
\end{aligned}
\tag{4.2}
$$
where $\lambda_1$ and $\lambda_2$ are regularization parameters that control the relatedness of the tasks. A large value of the ratio $\lambda_1/\lambda_2$ drives the task-specific parameter vectors toward zero, so that all tasks share the identical solution $w_0$. Conversely, a small value of $\lambda_1/\lambda_2$ makes the tasks independent and drives $w_0$ to zero [31]. This model can be viewed as jointly learning a mean support vector machine (SVM), represented by $w_0$, and $T$ task-specific SVMs, each represented by $w_t$, such that each $w_t$ has a large margin while being as close as possible to $w_0$ [31]. Equation 4.2 can be solved efficiently through its dual. If we re-parametrize Equation 4.2 by defining two new parameters, $C = \frac{T}{2\lambda_1}$ and $\mu = \frac{T\lambda_2}{\lambda_1}$, the dual problem takes the form [31]:
$$
\begin{aligned}
\max_{\alpha_i^t} \quad & \sum_{t=1}^{T} \sum_{i=1}^{m} \alpha_i^t - \frac{1}{2} \sum_{t=1}^{T} \sum_{i=1}^{m} \sum_{s=1}^{T} \sum_{j=1}^{m} \alpha_i^t y_i^t \alpha_j^s y_j^s K_{st}(x_i^t, x_j^s) \\
\text{subject to} \quad & 0 \le \alpha_i^t \le C \qquad \forall i, \forall t
\end{aligned}
\tag{4.3}
$$
where $\alpha_i^t$ are the Lagrange multipliers. This dual is identical to the dual of a binary SVM without a bias term, with a modified kernel; therefore, any standard SVM solver can be used to solve for $\alpha$. The structure of MTL is captured by the kernel function $K_{st}(\cdot, \cdot)$:
$$K_{st}(x_i^t, x_j^s) = \left( \frac{1}{\mu} + \delta_{st} \right) x_i^t \cdot x_j^s \tag{4.4}$$
where $\delta_{st}$ is the Kronecker delta ($\delta_{st} = 1$ if $s = t$, and 0 otherwise). This multi-task kernel couples the inner products of instances across tasks according to $\mu = T\lambda_2/\lambda_1$, which represents the degree of relatedness among the tasks. The usual kernel trick can be applied here to learn non-linear classifiers, by replacing the standard inner product in Equation 4.4 with a kernel function [31].
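To make Equations 4.3 and 4.4 concrete, the sketch below builds the linear multi-task Gram matrix, solves the box-constrained dual, and recovers $w_0$ and $v_t$ using the $\beta$-free versions of Equations 4.7 and 4.8 ($w_0 = \frac{1}{\mu}\sum_{t,i} \alpha_i^t y_i^t x_i^t$, $v_t = \sum_i \alpha_i^t y_i^t x_i^t$). It is a minimal illustration under stated assumptions, not the implementation used in this work: SciPy's L-BFGS-B (which handles box constraints) stands in for a dedicated QP or SVM solver, and the function names and toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def mtl_kernel(X, tasks, mu):
    """Linear multi-task kernel of Eq. 4.4: (1/mu + delta_st) * <x, x'>."""
    same_task = (tasks[:, None] == tasks[None, :]).astype(float)
    return (1.0 / mu + same_task) * (X @ X.T)


def solve_mtl_dual(X, y, tasks, C, mu):
    """Maximize the dual of Eq. 4.3 subject to the box constraints 0 <= alpha <= C."""
    Q = np.outer(y, y) * mtl_kernel(X, tasks, mu)
    # L-BFGS-B minimizes, so we minimize the negated dual objective.
    objective = lambda a: 0.5 * a @ Q @ a - a.sum()
    gradient = lambda a: Q @ a - 1.0
    res = minimize(objective, np.zeros(len(y)), jac=gradient,
                   method="L-BFGS-B", bounds=[(0.0, C)] * len(y))
    return res.x


def recover_weights(alpha, X, y, tasks, mu):
    """w0 = (1/mu) * sum of alpha*y*x over all tasks; v_t = the same sum within task t."""
    terms = (alpha * y)[:, None] * X
    w0 = terms.sum(axis=0) / mu
    v = {t: terms[tasks == t].sum(axis=0) for t in np.unique(tasks)}
    return w0, v


# Toy data: two tasks sharing one linear concept (no bias term, as in Eq. 4.2).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.5, 0.5, (10, 2)), rng.normal(-1.5, 0.5, (10, 2))])
y = np.array([1.0] * 10 + [-1.0] * 10)
tasks = np.array([0, 1] * 10)
alpha = solve_mtl_dual(X, y, tasks, C=1.0, mu=1.0)
w0, v = recover_weights(alpha, X, y, tasks, mu=1.0)
pred = np.sign([(w0 + v[t]) @ x for x, t in zip(X, tasks)])
print("training accuracy:", (pred == y).mean())
```

The decision value $(w_0 + v_t) \cdot x$ computed from the recovered weights equals the kernel expansion $\sum_{s,j} \alpha_j^s y_j^s K_{st}(x_j^s, x)$, which is why only the Gram matrix of Equation 4.4 is needed during training.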
4.2.2 Incorporating Auxiliary Label Information
Most real-world datasets contain varying levels of label noise that result from human subjectivity, missing information, imprecise measurements or variations in expert opinions over time. The MTL framework outlined above requires accurate class labels during the learning phase, and the presence of label noise can seriously undermine its performance. We consider the case when the labels are noisy but there is another source of supplementary label information that can accurately grade the “positive-ness” of an instance. This additional source may represent the subjective judgment of one or more domain experts about a particular instance, based either on the same set of features that are available to the learner or on some other view of the data. For example, in our case the data available to the learner consist of image features extracted from the patient’s MRI, while the auxiliary label information is provided by the results of an iEEG examination carried out by a panel of expert epileptologists.
Modeling Side Information as Ranking
In this work we interpret the auxiliary label scores as the output of a pairwise ranking function: an instance with a higher score is more “positive” than an instance with a lower score. We can model the behavior of the ranking function as being globally consistent (i.e., it does not vary across tasks). If we consider the ranking function as representing an expert’s judgment, this assumption requires that his/her criteria for ascertaining the rank of an instance do not change from one task to another. This is a strong assumption, because most real-life experts make varying judgments based on the nature of the task at hand. For example, the range of cortical thickness values that is abnormal for a patient depends on the patient’s age and gender [84]; therefore, an expert’s criteria would be calibrated differently for different patients. We can model this variation with a task-specific approach that allows each task to have its own ranking function.
From the perspective of MTL, the difference between the two approaches is whether the rankings are shared across tasks or not. In the task-specific approach, we cannot compare the ranks of two instances belonging to different tasks, because the underlying semantics of ranking are different. In the globally-consistent case, rank information can be shared among tasks without introducing any discrepancies.
We incorporate the auxiliary label information by modifying the original regularized MTL model (Equation 4.2) so that the final model not only maximizes classification accuracy but also preserves the pairwise ranking. As in rank-SVM [48], we take the rank of each instance as being proportional to its distance from the separating hyperplane:
$$(i, j) \in \Pi_t \;\Rightarrow\; w_t \cdot x_i^t \ge w_t \cdot x_j^t$$
where $\Pi_t$ is the ranking function for task $t$. For each pair of instances belonging to task $t$, we augment the original MTL problem (Equation 4.2) with a pairwise rank constraint. A similar approach is taken in [69] for training single-task binary SVMs in the presence of label noise.
4.2.3 Globally-Consistent Label Ranking (GC)
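Here we consider all pairwise ranking functions $\Pi_t$, $\forall t$, to be identical. In this case we modify the original MTL framework (Equation 4.2) by adding rank constraints that involve both the shared and the task-specific parameters. The new optimization problem is:
$$
\begin{aligned}
\min_{w_0, v_t, \xi_i^t, \eta_{pq}^t} \quad & \frac{1}{2} \sum_{t=1}^{T} \|v_t\|^2 + \frac{\mu}{2} \|w_0\|^2 + C \sum_{t=1}^{T} \sum_{i=1}^{m} \xi_i^t + C' \sum_{t=1}^{T} \sum_{(p,q) \in \Pi_t} \eta_{pq}^t \\
\text{subject to} \quad & y_i^t \, (w_0 + v_t) \cdot x_i^t \ge 1 - \xi_i^t \qquad \forall i, \forall t \\
& (w_0 + v_t) \cdot (x_p^t - x_q^t) \ge 1 - \eta_{pq}^t \qquad \forall t, \forall (p, q) \in \Pi_t \\
& \xi_i^t \ge 0, \quad \eta_{pq}^t \ge 0
\end{aligned}
\tag{4.5}
$$
The first three terms of the objective are Equation 4.2 multiplied by $T/(2\lambda_1)$ and rewritten in terms of $C$ and $\mu$.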
where $\eta_{pq}^t$ are slack variables that allow some of the rank constraints to be violated, and $C'$ is a positive scalar representing the relative cost of violating a rank constraint. It is defined as a multiple of the original MTL cost parameter: $C' = aC$, $a \in \mathbb{R}^+$. Equation 4.5 can be viewed as learning a classifier from a dataset augmented with a fixed number of pseudo-examples corresponding to the difference vectors generated by the rank constraints. This becomes more evident if we rewrite the rank constraints as:
$$z_{pq}^t \, (w_0 + v_t) \cdot \Delta_{pq}^t \ge 1 - \eta_{pq}^t$$
where $\Delta_{pq}^t = x_p^t - x_q^t$ is a pseudo-example and $z_{pq}^t = 1$ is its label. By augmenting the data for each task with the pseudo-examples $\Delta_{pq}^t$ and their labels $z_{pq}^t = 1$, we combine the two sets of constraints into a single classification problem. However, the number of pseudo-examples is quadratic in the number of instances in the original dataset for each task. Because every pseudo-example is labeled positive, the number of positive instances becomes substantially higher than the number of negative instances, which in the worst case can result in a degenerate solution in which the resulting hyperplanes only respect the rank constraints. The trade-off between preserving the ranking and accurate classification is controlled by the cost parameters $C'$ and $C$, respectively. These cost parameters are analogous to the cost parameter of the traditional SVM [86], and we set them by grid search, the standard procedure for training SVMs.
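The augmentation step described above can be sketched for a single task: every ranked pair $(p, q)$ contributes a pseudo-example $\Delta_{pq}^t = x_p^t - x_q^t$ with label $z_{pq}^t = +1$. The helper `augment_task` is hypothetical, and, as an assumption, pairs are formed only between strictly ordered scores. The loop at the end illustrates the quadratic growth in the number of pseudo-examples.

```python
import numpy as np


def augment_task(X, y, scores):
    """Append rank pseudo-examples Delta_pq = x_p - x_q (label +1) to one task.

    Returns the stacked instances, their labels (y or z = +1), and a boolean
    mask that is True for original instances and False for pseudo-examples.
    """
    X = np.asarray(X, dtype=float)
    scores = np.asarray(scores, dtype=float)
    p_idx, q_idx = np.nonzero(scores[:, None] > scores[None, :])
    deltas = X[p_idx] - X[q_idx]
    X_aug = np.vstack([X, deltas])
    labels = np.concatenate([np.asarray(y, dtype=float), np.ones(len(deltas))])
    is_original = np.concatenate([np.ones(len(X), bool), np.zeros(len(deltas), bool)])
    return X_aug, labels, is_original


# With m instances and distinct scores there are m(m-1)/2 pseudo-examples.
# Prints "10 45", "50 1225", and "100 4950" on separate lines.
for m in (10, 50, 100):
    _, _, orig = augment_task(np.zeros((m, 3)), np.ones(m), np.arange(m))
    print(m, (~orig).sum())
```

With binary scores (instances near an onset electrode versus all others), the count is instead the product of the two group sizes, which is still large relative to the number of negative instances.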
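The problem in Equation 4.5 can be solved efficiently by formulating its dual using the kernel function from Equation 4.4:
$$
\begin{aligned}
\max_{\alpha_i^t, \beta_{pq}^t} \quad & \sum_{t=1}^{T} \sum_{i=1}^{m} \alpha_i^t + \sum_{t=1}^{T} \sum_{(p,q) \in \Pi_t} \beta_{pq}^t
- \frac{1}{2} \sum_{t=1}^{T} \sum_{i=1}^{m} \sum_{s=1}^{T} \sum_{j=1}^{m} \alpha_i^t y_i^t \alpha_j^s y_j^s K_{st}(x_i^t, x_j^s) \\
& - \sum_{t=1}^{T} \sum_{i=1}^{m} \sum_{s=1}^{T} \sum_{(p,q) \in \Pi_s} \alpha_i^t y_i^t \beta_{pq}^s z_{pq}^s K_{st}(x_i^t, \Delta_{pq}^s) \\
& - \frac{1}{2} \sum_{t=1}^{T} \sum_{(p,q) \in \Pi_t} \sum_{s=1}^{T} \sum_{(k,l) \in \Pi_s} \beta_{pq}^t z_{pq}^t \beta_{kl}^s z_{kl}^s K_{st}(\Delta_{pq}^t, \Delta_{kl}^s) \\
\text{subject to} \quad & 0 \le \alpha_i^t \le C, \quad 0 \le \beta_{pq}^t \le C' \qquad \forall i, \forall t, \forall (p,q) \in \Pi_t
\end{aligned}
\tag{4.6}
$$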
where $\alpha_i^t$ and $\beta_{pq}^t$ are the Lagrange multipliers corresponding to the classification and rank constraints, respectively.
It is worth mentioning that the pseudo-examples are created on a per-task basis; i.e., no pseudo-examples result from comparing the ranks of two instances from different tasks. The assumption of global consistency is instead exploited in the construction of the rank constraints (cf. Equation 4.5), in which both $w_0$ and $v_t$ are required to preserve the ranking. This becomes explicit in the optimal solutions for $w_0$ and $v_t$, obtained by setting the derivatives of the Lagrangian of problem 4.5 to zero:
$$w_0^* = \frac{1}{\mu} \left( \sum_{t=1}^{T} \sum_{i=1}^{m} \alpha_i^t y_i^t x_i^t + \sum_{s=1}^{T} \sum_{(p,q) \in \Pi_s} \beta_{pq}^s z_{pq}^s \Delta_{pq}^s \right) \tag{4.7}$$
$$v_t^* = \sum_{i=1}^{m} \alpha_i^t y_i^t x_i^t + \sum_{(p,q) \in \Pi_t} \beta_{pq}^t z_{pq}^t \Delta_{pq}^t \tag{4.8}$$
where $\alpha$ and $\beta$ are the Lagrange multipliers of Equation 4.6. Both solutions depend on the rank multipliers $\beta$, so the ranking information influences the shared component $w_0$ as well as each task-specific component $v_t$.
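Because Equation 4.6 is a standard box-constrained SVM dual over the augmented set (original instances with labels $y_i^t$ and bound $C$, pseudo-examples with labels $z_{pq}^t = 1$ and bound $C'$), the globally-consistent model can be sketched end to end as below. This is a hedged illustration, not this work's implementation: SciPy's L-BFGS-B replaces the Matlab QP solver, pairs use strict score ordering (an assumption), and `solve_gc` and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def solve_gc(task_data, C, a, mu):
    """Globally-consistent model: augmented dual (Eq. 4.6), then Eqs. 4.7-4.8.

    task_data: list of (X_t, y_t, r_t) tuples, one per task; C' = a * C.
    Pairs use strict score ordering r_p > r_q (an assumption).
    """
    Xs, labs, tids, ubs = [], [], [], []
    for t, (X, y, r) in enumerate(task_data):
        X, r = np.asarray(X, float), np.asarray(r, float)
        p, q = np.nonzero(r[:, None] > r[None, :])
        D = X[p] - X[q]                                   # pseudo-examples Delta_pq^t
        Xs += [X, D]
        labs += [np.asarray(y, float), np.ones(len(D))]   # labels y_i^t and z_pq^t = +1
        tids.append(np.full(len(X) + len(D), t))
        ubs += [np.full(len(X), C), np.full(len(D), a * C)]
    Z, lab, tid, ub = (np.concatenate(parts) for parts in (Xs, labs, tids, ubs))
    # Eq. 4.4 applied to every pair of rows (instances and pseudo-examples alike).
    K = (1.0 / mu + (tid[:, None] == tid[None, :])) * (Z @ Z.T)
    Q = np.outer(lab, lab) * K
    res = minimize(lambda c: 0.5 * c @ Q @ c - c.sum(), np.zeros(len(lab)),
                   jac=lambda c: Q @ c - 1.0, method="L-BFGS-B",
                   bounds=list(zip(np.zeros(len(ub)), ub)))
    terms = (res.x * lab)[:, None] * Z        # alpha*y*x and beta*z*Delta terms
    w0 = terms.sum(axis=0) / mu               # Eq. 4.7
    v = [terms[tid == t].sum(axis=0) for t in range(len(task_data))]  # Eq. 4.8
    return w0, v


rng = np.random.default_rng(1)
tasks = []
for _ in range(2):
    X = np.vstack([rng.normal(1.5, 0.5, (5, 2)), rng.normal(-1.5, 0.5, (5, 2))])
    y = np.array([1.0] * 5 + [-1.0] * 5)
    r = np.array([1.0, 1.0] + [0.0] * 8)   # hypothetical: two instances near onset electrodes
    tasks.append((X, y, r))
w0, v = solve_gc(tasks, C=1.0, a=0.1, mu=1.0)
acc = np.mean([np.sign((w0 + v[t]) @ x) == yi
               for t, (X, y, _) in enumerate(tasks) for x, yi in zip(X, y)])
print("training accuracy:", acc)
```

The kernel couples pseudo-examples of different tasks through the $1/\mu$ term, which is exactly how the rank information reaches the shared component $w_0$ in this model.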
4.2.4 Task-Specific Label Ranking (TS)
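To model the peculiarities that may exist in the source of auxiliary information as we move from one task to another, we can limit the influence of the rank constraints on the overall solution by letting them affect only the task-specific components. The formulation is similar to Equation 4.5, with modified rank constraints:
$$
\begin{aligned}
\min_{w_0, v_t, \xi_i^t, \eta_{pq}^t} \quad & \frac{1}{2} \sum_{t=1}^{T} \|v_t\|^2 + \frac{\mu}{2} \|w_0\|^2 + C \sum_{t=1}^{T} \sum_{i=1}^{m} \xi_i^t + C' \sum_{t=1}^{T} \sum_{(p,q) \in \Pi_t} \eta_{pq}^t \\
\text{subject to} \quad & y_i^t \, (w_0 + v_t) \cdot x_i^t \ge 1 - \xi_i^t \qquad \forall i, \forall t \\
& z_{pq}^t \, v_t \cdot \Delta_{pq}^t \ge 1 - \eta_{pq}^t \qquad \forall t, \forall (p, q) \in \Pi_t \\
& \xi_i^t \ge 0, \quad \eta_{pq}^t \ge 0
\end{aligned}
\tag{4.9}
$$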
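By not allowing the rank information to directly influence the shared component $w_0$, the ranking function $\Pi_t$ is no longer coupled across tasks and can behave differently for different tasks. Note that although the shared weight vector $w_0$ is not required to preserve the rankings, it is still indirectly affected by the rank constraints through $v_t$ (cf. Equation 4.1). The dual of Equation 4.9 can be formulated as:
$$
\begin{aligned}
\max_{\alpha_i^t, \beta_{pq}^t} \quad & \sum_{t=1}^{T} \sum_{i=1}^{m} \alpha_i^t + \sum_{t=1}^{T} \sum_{(p,q) \in \Pi_t} \beta_{pq}^t
- \frac{1}{2} \sum_{t=1}^{T} \sum_{i=1}^{m} \sum_{s=1}^{T} \sum_{j=1}^{m} \alpha_i^t y_i^t \alpha_j^s y_j^s \left( \frac{1}{\mu} + \delta_{st} \right) \langle x_i^t, x_j^s \rangle \\
& - \sum_{t=1}^{T} \sum_{i=1}^{m} \sum_{(p,q) \in \Pi_t} \alpha_i^t y_i^t \beta_{pq}^t z_{pq}^t \langle x_i^t, \Delta_{pq}^t \rangle
- \frac{1}{2} \sum_{t=1}^{T} \sum_{(p,q) \in \Pi_t} \sum_{(k,l) \in \Pi_t} \beta_{pq}^t z_{pq}^t \beta_{kl}^t z_{kl}^t \langle \Delta_{pq}^t, \Delta_{kl}^t \rangle \\
\text{subject to} \quad & 0 \le \alpha_i^t \le C, \quad 0 \le \beta_{pq}^t \le C' \qquad \forall i, \forall t, \forall (p,q) \in \Pi_t
\end{aligned}
\tag{4.10}
$$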
where $\alpha_i^t$ and $\beta_{pq}^t$ are the Lagrange multipliers corresponding to the classification and rank constraints, respectively, and $\langle \cdot, \cdot \rangle$ is the canonical dot product. This dual can be written in a form identical to Equation 4.6 by using a new kernel function. Let $\mathcal{X}^t \subset \mathbb{R}^d$ be the augmented data for task $t$, obtained by combining all the original instances ($x_i^t$) and the pseudo-examples ($\Delta_{pq}^t$), and denote its $k$-th element by $x_k^t$, where $k \in \{1, \ldots, |\mathcal{X}^t|\}$. Let $u_k^t$ be an indicator variable defined as:
$$u_k^t = \begin{cases} 1 & \text{if } x_k^t \text{ is an original instance } x_i^t, \\ 0 & \text{if } x_k^t \text{ is a pseudo-example } \Delta_{pq}^t. \end{cases}$$
Using these indicator variables, the primal-to-dual transformation yields the kernel function:
$$K_{st}(x_k^t, x_l^s) = \left( \frac{u_k^t u_l^s}{\mu} + \delta_{st} \right) x_k^t \cdot x_l^s \tag{4.11}$$
This multitask kernel [30] does not allow the ranking function $\Pi_t$ to directly affect $w_0$, which prevents the auxiliary label information from being shared among tasks. In this formulation the optimal solution for $v_t$ does not change and remains identical to Equation 4.8. The difference lies in the optimal solution for the shared parameter vector $w_0$, which in this case is:
$$w_0^* = \frac{1}{\mu} \sum_{t=1}^{T} \sum_{i=1}^{m} \alpha_i^t y_i^t x_i^t \tag{4.12}$$
As expected, $w_0$ is no longer affected by the rank constraints.
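The effect of the indicator $u_k^t$ in Equation 4.11 can be checked numerically. The sketch below (hypothetical function names and random toy data) builds the task-specific kernel and recovers $w_0$ from original instances only (Equation 4.12) and $v_t$ from all rows of task $t$ (Equation 4.8). Any box-constrained solver, such as the L-BFGS-B approach used for the plain MTL dual, could supply the multipliers; here they are simply fixed for illustration.

```python
import numpy as np


def ts_kernel(Z, tid, u, mu):
    """Task-specific multitask kernel of Eq. 4.11: (u_k u_l / mu + delta_st) <z_k, z_l>.

    Z   -- rows are original instances and pseudo-examples of all tasks
    tid -- task index of each row
    u   -- 1 for an original instance, 0 for a pseudo-example (indicator u_k^t)
    """
    u = np.asarray(u, dtype=float)
    same_task = (tid[:, None] == tid[None, :]).astype(float)
    return (np.outer(u, u) / mu + same_task) * (Z @ Z.T)


def ts_weights(coef, lab, Z, tid, u, mu):
    """w0 from original instances only (Eq. 4.12); v_t from all rows of task t (Eq. 4.8)."""
    terms = (coef * lab)[:, None] * Z
    w0 = terms[np.asarray(u) == 1].sum(axis=0) / mu
    v = {t: terms[tid == t].sum(axis=0) for t in np.unique(tid)}
    return w0, v


# Hypothetical augmented data: rows 2 and 5 are pseudo-examples.
rng = np.random.default_rng(0)
Z = rng.normal(size=(6, 3))
tid = np.array([0, 0, 0, 1, 1, 1])
u = np.array([1, 1, 0, 1, 1, 0])
K = ts_kernel(Z, tid, u, mu=2.0)
print(K[2, 3] == 0, K[5, 0] == 0)  # True True: pseudo-examples never couple across tasks
```

For any multipliers, the decision value of row $k$, $(u_k^t w_0 + v_t) \cdot x_k^t$, equals the corresponding entry of the kernel expansion, which is what makes Equation 4.10 expressible in the form of Equation 4.6.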
In both the globally-consistent and task-specific cases, the final optimization takes the form of a standard quadratic program (QP) with box constraints [19], which can be solved using any off-the-shelf QP solver.¹
¹In our implementation we used the QP solver included in the Optimization Toolbox for Matlab (http://www.mathworks.com).
4.3 Detecting Cortical Malformations
Instead of classifying individual vertices, we classify image patches (super-pixels) taken from the flattened reconstructed surfaces of patients and controls as “lesional” or “normal”, using a classifier trained on data from both healthy controls and patients.
4.3.1 Data Description
The dataset consists of 16 MRI-negative and 2 MRI-positive FCD patients collected over a three-year period at the level-4 NYU comprehensive epilepsy treatment center. Of the sixteen MRI-negative patients, thirteen overlap with the MRI-negative patients used in the previous chapter, and three new patients were added. One patient, NY68, was initially classified as MRI-negative but was declared MRI-positive on later examination; we therefore treat him as MRI-positive in the current evaluation. All the patients underwent successful resective surgery, and histopathological examination of the resected tissue showed evidence of FCD. This may seem a small set, but note that only a few MRI-negative patients proceed to surgery, and of those only a third have successful outcomes [57]. The controls were matched from a cohort of 115 neurotypical controls. All patients and controls were scanned on the same scanner with the same specialized T1 MRI sequence.
Our training and test data comprise image patches taken from the resected regions of the patients and the corresponding regions of their matched controls. We focus only on the resected regions because all the patients included in our experiments were completely (11, Engel Class 1) or partially (4, Engel Class 2) seizure-free after surgery, which shows that their resected regions contained the primary FCD lesion(s). Furthermore, for new MRI-negative patients, seizure semiology (i.e., the signs and symptoms of a seizure) provides a credible but crude estimate of the location of the epileptogenic zone [81, 70, 101]. In these cases our methods can be applied to the suspected cortical region to detect possible abnormalities.
We learn all the patient-specific classifiers simultaneously using our proposed MTL approaches. In the original regularized MTL framework (cf. Section 4.2.1), the learned model is tested on held-out data from the same tasks that generated the training data. In our case, however, the test data come from new patients who were not part of the training data. To generalize the model so that it can classify data from out-of-sample tasks, all the task-specific parameter vectors $v_t$ are discarded, and only the mean component $w_0$ is retained for detecting FCD lesions in new patients [29].
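When a non-linear kernel is used, $w_0$ is not available as an explicit vector, but its score on a new patient's super-pixel can still be computed from the training multipliers. A minimal sketch, assuming a Gaussian RBF base kernel $k(x, x') = \exp(-\gamma \|x - x'\|^2)$ substituted for the inner product as described for Equation 4.4: the shared score is $w_0 \cdot \phi(x) = \frac{1}{\mu} \sum_{t,i} \alpha_i^t y_i^t k(x_i^t, x)$, i.e., the $\delta_{st}$ part of the kernel is dropped. The function names and toy multipliers are hypothetical.

```python
import numpy as np


def rbf(A, B, gamma):
    """Gaussian RBF kernel exp(-gamma * ||a - b||^2) between the rows of A and B."""
    d2 = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def shared_scores(X_new, X_train, alpha, y_train, mu, gamma):
    """Score a new patient's instances with the shared component w0 only.

    In kernel form w0 . phi(x) = (1/mu) * sum_i alpha_i y_i k(x_i, x); the
    delta_st part of the multi-task kernel (the v_t terms) is dropped.
    """
    return rbf(X_new, X_train, gamma) @ (alpha * y_train) / mu


X_train = np.array([[0.0, 0.0], [3.0, 3.0]])
y_train = np.array([-1.0, 1.0])
alpha = np.array([1.0, 1.0])  # hypothetical multipliers from training
X_new = np.array([[3.0, 3.0], [0.0, 0.0]])
scores = shared_scores(X_new, X_train, alpha, y_train, mu=1.0, gamma=1.0)
print(np.sign(scores))  # [ 1. -1.]
```

Since $\mu$ only rescales the shared score, it does not change the sign of the prediction; a detection threshold on the score would, however, have to account for it.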
4.3.2 Segmentation
All patient and control surfaces were registered to an average surface, so that there was a one-to-one correspondence between them. After registration, the resected region(s) of each patient and the corresponding region(s) of his/her matched control were isolated and flattened to obtain a standard 2-D image. We use cortical thickness as the image intensity, because cortical thickness has been established as one of the most informative features for characterizing FCD lesions [14, 96, 43]. Note that even though we use only cortical thickness to obtain the super-pixels, a larger set of morphological features (including cortical thickness) is used to describe each super-pixel during the learning phase.
We use Quickshift [106] for unsupervised segmentation. Quickshift is a fast mode-seeking algorithm similar to mean shift [23]. It performs a hierarchical segmentation of the image, in which the sub-trees represent image segments. One of the main advantages of Quickshift is that the number and size of the segments need not be specified. Additionally, Quickshift does not penalize boundary regions, and it produces a diverse set of segments with different shapes and sizes.
Quickshift requires setting two parameters: the size of the Gaussian kernel ($\sigma_{QS}$) used by a Parzen window density estimator, and the maximum distance ($\delta_{QS}$) permitted between two pixels that are part of the same segment. The scale parameter $\sigma_{QS}$ is varied to change the average size of the segments, and $\delta_{QS}$ is set to a multiple of $\sigma_{QS}$ [105]; thus, higher values of $\sigma_{QS}$ produce larger segments. All patients and controls were segmented using the same Quickshift parameters ($\sigma_{QS} = 8$, $\delta_{QS} = 32$), which were optimized on a parameter estimation set of three patients kept distinct from the fifteen remaining patients whose results are reported here.
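To illustrate the roles of $\sigma_{QS}$ and $\delta_{QS}$, the sketch below is a brute-force NumPy rendition of the quick shift idea on a small 2-D thickness map: estimate a Parzen density at every pixel, link each pixel to its nearest denser pixel within $\delta_{QS}$, and treat each resulting tree as one segment. It is not the implementation used in this work; the `ratio` weighting of thickness against pixel coordinates and the index-based tie-breaking are assumptions, and its $O(n^2)$ memory limits it to tiny images. Established implementations (e.g., VLFeat or scikit-image's `skimage.segmentation.quickshift`) are preferable in practice.

```python
import numpy as np


def quickshift_2d(img, sigma=2.0, max_dist=8.0, ratio=1.0):
    """Brute-force quick shift on a small 2-D scalar image.

    sigma    -- width of the Gaussian Parzen window (role of sigma_QS)
    max_dist -- largest distance at which a pixel may link to a parent (delta_QS)
    ratio    -- weight of the intensity axis relative to pixel coordinates (assumed)
    """
    h, w = img.shape
    rows, cols = np.mgrid[0:h, 0:w]
    feats = np.column_stack([rows.ravel(), cols.ravel(), ratio * img.ravel()])
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(axis=-1)
    # Parzen (kernel) density estimate at every pixel.
    density = np.exp(-d2 / (2.0 * sigma ** 2)).sum(axis=1)
    n = len(density)
    idx = np.arange(n)
    parent = idx.copy()
    for i in range(n):
        # "Higher" pixels: larger density, with ties broken by index.
        higher = (density > density[i]) | ((density == density[i]) & (idx > i))
        cand = np.flatnonzero(higher & (d2[i] <= max_dist ** 2))
        if cand.size:
            parent[i] = cand[np.argmin(d2[i, cand])]  # nearest higher pixel
    # Pixels without a parent are roots; each root's tree is one segment.
    root = parent.copy()
    while not np.array_equal(root, parent[root]):
        root = parent[root]
    _, labels = np.unique(root, return_inverse=True)
    return labels.reshape(h, w)


# Synthetic 12x12 thickness map with a thicker right half.
thickness = np.full((12, 12), 2.5)
thickness[:, 6:] = 4.0
seg = quickshift_2d(thickness, sigma=2.0, max_dist=8.0, ratio=10.0)
print(len(np.unique(seg)))
```

The toy call uses small $\sigma$ and $\delta$ values suited to a 12x12 image, not the $\sigma_{QS} = 8$, $\delta_{QS} = 32$ tuned on the real flattened maps. Because the scaled thickness difference exceeds `max_dist`, no segment can span both halves.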
After segmentation, each super-pixel is treated as an independent instance. We use the mean and standard deviation of cortical thickness, gray-white contrast (GWC), curvature, sulcal depth, Jacobian distortion, and local gyrification index (LGI) to represent each super-pixel [96]. Additionally, we include the average surface area measured on both the pial and the white matter surfaces.
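The super-pixel descriptor described above can be sketched as follows, assuming each morphological measure is available as a 2-D map aligned with the segmentation labels. The function name is hypothetical, and the population standard deviation (NumPy's default, `ddof=0`) is an assumption.

```python
import numpy as np


def superpixel_features(labels, feature_maps, area_maps=()):
    """One feature row per super-pixel.

    labels       -- (H, W) integer super-pixel ids from the segmentation
    feature_maps -- list of (H, W) maps (e.g. thickness, GWC, curvature, sulcal
                    depth, Jacobian distortion, LGI); each contributes mean and std
    area_maps    -- list of (H, W) maps (e.g. pial and white-matter area);
                    each contributes its mean
    """
    seg_ids = np.unique(labels)
    rows = []
    for s in seg_ids:
        mask = labels == s
        row = []
        for fmap in feature_maps:
            row += [fmap[mask].mean(), fmap[mask].std()]
        row += [amap[mask].mean() for amap in area_maps]
        rows.append(row)
    return seg_ids, np.array(rows)


labels = np.array([[0, 0], [1, 1]])
thickness = np.array([[2.0, 4.0], [3.0, 3.0]])
pial_area = np.array([[1.0, 1.0], [2.0, 2.0]])
ids, F = superpixel_features(labels, [thickness], [pial_area])
# Segment 0: mean 3, std 1, area 1; segment 1: mean 3, std 0, area 2.
print(F)
```

With the six measures plus two area maps, each super-pixel would be described by 6 x 2 + 2 = 14 values.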
4.3.3 Creating Electrode Maps
To include the additional label information from the iEEG exam, we need to map the implanted electrodes onto each patient’s cortical surface. However, the spatial resolutions of iEEG and MRI are not identical: the MRI voxels (and their corresponding surface vertices) are smaller than the point spread function that defines the iEEG generators. More importantly, localizing the source for a given electrode is subject to the ill-posed inverse problem [112]. These are well-known limitations of iEEG that preclude an accurate and unambiguous assignment of voxels/vertices to an electrode; epileptologists and neurosurgeons face the same problem when deciding which parts of the brain to resect based solely on iEEG results. Solving this problem is beyond the scope of this work. However, a sphere centered on each electrode, with a radius of half the inter-electrode distance (approximately 10 mm), provides a reasonable criterion for matching iEEG and MRI. Therefore, all surface vertices within a radius of 5 mm (Euclidean distance) of an electrode’s location were considered within range of that electrode. We selected only the electrodes that overlapped with the resected region and that were labeled by the experts either as part of the seizure-onset zone or as having recorded abnormal electrical activity during seizure onset. Super-pixels containing the selected electrodes were given higher label scores than the other super-pixels in the resection zone. Figure 4.1 shows an example in which the iEEG electrodes are mapped onto the cortical surface.
Figure 4.1: Mapping iEEG electrodes on the cortical surface. The red spheres represent grid electrodes, and the blue spheres represent strip electrodes. During iEEG monitoring, all the electrodes are monitored for abnormal electrical activity arising from clinical and sub-clinical seizures. Each electrode is then labeled by expert epileptologists as being part of the seizure-onset zone or not.
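The electrode-to-surface mapping can be sketched as a radius query followed by a per-super-pixel score. Assumptions for illustration only: vertex and electrode coordinates are in millimeters in the same space, the electrodes passed in are already the selected ones (overlapping the resection and labeled by the experts), a super-pixel "contains" an electrode if any of its vertices lies within range, and the scores are binary. All names are hypothetical.

```python
import numpy as np


def vertices_near_electrodes(vertices, electrodes, radius_mm=5.0):
    """True for every surface vertex within radius_mm (Euclidean) of any selected electrode."""
    d2 = ((vertices[:, None, :] - electrodes[None, :, :]) ** 2).sum(axis=-1)
    return (d2 <= radius_mm ** 2).any(axis=1)


def superpixel_scores(vertex_segment, near, high=1.0, low=0.0):
    """Higher label score for super-pixels that contain an in-range vertex."""
    hit = set(vertex_segment[near].tolist())
    return {int(s): (high if int(s) in hit else low) for s in np.unique(vertex_segment)}


vertices = np.array([[0.0, 0, 0], [3.0, 0, 0], [10.0, 0, 0], [20.0, 0, 0]])  # mm
electrodes = np.array([[0.0, 0, 0]])  # one selected seizure-onset electrode
segment_of_vertex = np.array([0, 1, 1, 2])
near = vertices_near_electrodes(vertices, electrodes)
print(superpixel_scores(segment_of_vertex, near))  # {0: 1.0, 1: 1.0, 2: 0.0}
```

These scores are the $r_i^t$ from which the pairwise ranking $\Pi_t$ is built for each patient.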
4.4 Results
The lesions of all 18 patients used in our experimental evaluation were located in the temporal region, which is one of the most common locations of FCD in adults [52]. We chose to work with this limited dataset to reduce the computational complexity resulting from the number of pseudo-instances created to incorporate the ranking constraints (from iEEG) in the proposed models (cf. Section 4.2.2). All patients had undergone successful resective surgery and were histopathologically verified to have FCD. Our experiments are designed to test two hypotheses: i) the regularized MTL framework is more effective than traditional learning methods in which the data from all patients are treated the same, and ii) including auxiliary supervision in the MTL framework boosts the lesion detection rate. Below we explain the selection of the baseline methods and the experimental setup, and then provide the details of how the hyper-parameters were set for the proposed methods and the baselines.
4.4.1 Baseline Selection
To show that the regularized MTL framework [31] can be used effectively to detect FCD lesions, we adapt the LDA-based classification scheme of Hong et al. [43] as one of the baselines. To this end, we use LDA to classify super-pixels using the same set of features as our proposed MTL methods. Instead of using a two-stage classifier, in which the first classifier detects the lesional vertices and the second post-processes the detection results to reduce the number of false detections, we train a single LDA-based classifier that classifies super-pixels as lesional or not. We also use our vertex-based approach (ML) from the previous chapter as a baseline, without any post-processing, to facilitate a fair comparison.
Furthermore, we compare the performance of our proposed methods with the single-task support vector machine with rank constraints (SVMR), based on the formulation of Nguyen et al. [69]. As both SVMR and our proposed methods use iEEG-based auxiliary information, this comparison is aimed at establishing the efficacy of an MTL formulation for FCD lesion detection. Similarly, to highlight the benefit of incorporating auxiliary label information, we contrast the performance of our proposed task-specific (TS) and globally-consistent (GC) approaches with the regularized MTL framework [31], which does not incorporate any rank constraints.
4.4.2 Experimental Setup
We used a leave-one-patient-out cross-validation strategy, in which we left out one patient’s data and trained on the remaining patients. Hyper-parameters for the proposed methods and the baselines were set using the data of three MRI-negative patients (NY343, NY394, NY299), whose iEEG data were not available, and their matched controls. We refer to this set of three tasks (i.e., three patients and their matched controls) as the model parameter set (MPS). The data of these three patients are distinct from the fifteen patients and controls used in our experiments.
Setting the Hyper-Parameters
Note that the hyper-parameters were set individually for each test subject using the MPS. Below we provide the details of how the parameters were set for the different baselines and our proposed methods:
• LDA: The detection threshold ($\tau \in [0, 1]$) for LDA was optimized by maximizing the area under the curve (AUC) over the MPS, during each round of leave-one-task-out cross-validation.
• ML: The detection threshold ($\tau \in [0, 1]$) for logistic regression was optimized by maximizing the AUC over the MPS.
• SVMR: A single-task SVM with the RBF kernel that incorporates ranking constraints, based on the model in [69]. In addition to the cost parameter (identical to the cost parameter of a traditional SVM) and the scale parameter of the RBF kernel, there is a third parameter $a$ that defines the relative cost of violating a ranking constraint ($C' = aC$). All three parameters were set by optimizing the AUC over the MPS.
• MTL: The regularized MTL framework, which does not incorporate any auxiliary supervision and uses the resection zone as class labels. Its model parameters, set by grid search [31], are the misclassification cost ($C$), the task-relatedness parameter ($\mu$), and the scale ($\gamma$) of the RBF kernel. To find suitable values, we used a three-level grid and optimized the AUC over the MPS.
• GC & TS: The proposed methods, which incorporate auxiliary supervision derived from iEEG data. In addition to the three model parameters of MTL, there is a fourth parameter $a$ that defines the relative cost $C' = aC$ of violating a rank constraint (cf. Equations 4.5 and 4.9). To find suitable values, we designed a four-level grid and optimized the AUC over the MPS.
Parameter   Range
µ           $10^{-7}, 5^{-6}, 10^{-6}, 5^{-5}, \ldots, 10^{3}$
C           $2^{-10}, 2^{-9}, \ldots, 2^{10}$
γ           $2^{-10}, 2^{-9}, \ldots, 2^{10}$
a           $10^{-6}, 5^{-5}, 10^{-5}, \ldots, 10^{3}$
Table 4.1: Range of values for the model hyper-parameters used in the grid search. The grid search optimized the area under the curve (AUC) over the model parameter set (MPS), consisting of three patients whose data are distinct from the fifteen patients used for performance analysis.
Table 4.1 lists the ranges for the parameters used in the grid search for SVMR,
MTL, GC and TS.
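The parameter selection can be sketched as an exhaustive grid search that keeps the combination with the highest AUC on the MPS. In this sketch the AUC is computed with the Mann-Whitney rank statistic (ties counted as one half), and `evaluate` is a hypothetical callback standing in for training a model and scoring the MPS. Only the $C$ and $\gamma$ grids of Table 4.1 are spelled out; the $\mu$ and $a$ grids would be added to the dictionary in the same way.

```python
import itertools

import numpy as np


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    scores, labels = np.asarray(scores, float), np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels != 1]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def grid_search(grid, evaluate):
    """Return the parameter combination maximizing evaluate(**params), and its score.

    grid: dict name -> list of candidate values; evaluate: hypothetical callback
    that trains on the training tasks and returns the AUC on the MPS.
    """
    best, best_params = -np.inf, None
    names = list(grid)
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best:
            best, best_params = score, params
    return best_params, best


grid = {"C": [2.0 ** k for k in range(-10, 11)],
        "gamma": [2.0 ** k for k in range(-10, 11)]}
toy = lambda C, gamma: -abs(np.log2(C)) - abs(np.log2(gamma))  # stand-in for MPS AUC
print(grid_search(grid, toy))
```

The grid grows multiplicatively: the $C$ and $\gamma$ ranges alone give 21 x 21 = 441 combinations, each requiring a full training run.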
4.4.3 Performance Analysis
We developed the proposed methods with a view to their final use as focus-of-attention tools that help neuroradiologists detect visually elusive FCD lesions. Therefore, the detection rate (sensitivity), i.e., the number of patients whose lesions are correctly detected, constitutes the main result. For a more detailed performance analysis, we also calculate the recall and the false positive rate (FPR). Recall is the percentage of the vertices in the patient’s resection zone correctly identified as lesional, while FPR is the percentage of vertices incorrectly labeled as lesional in the matched control’s selected region. Note that the recall estimates should be considered lower bounds, because they are calculated using the noisy resection zones as ground truth.
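The three evaluation quantities can be computed as follows, assuming the per-vertex predictions on the patient's resection zone and on the matched control's region are available as boolean arrays (hypothetical inputs). A patient counts as detected when the recall is greater than zero, as in Table 4.2.

```python
import numpy as np


def patient_metrics(pred_patient, pred_control):
    """Recall and FPR for one patient/control pair.

    pred_patient: boolean predictions on the vertices of the patient's resection zone.
    pred_control: boolean predictions on the corresponding region of the matched control.
    """
    recall = float(np.mean(pred_patient))  # fraction of resection-zone vertices flagged lesional
    fpr = float(np.mean(pred_control))     # fraction of control vertices flagged lesional
    return recall, fpr


def summarize(results):
    """results: list of (recall, fpr) per patient -> detection count, mean recall, mean FPR."""
    recalls = np.array([r for r, _ in results])
    fprs = np.array([f for _, f in results])
    detected = int((recalls > 0).sum())    # detected if recall > 0
    return detected, float(recalls.mean()), float(fprs.mean())


r, f = patient_metrics([True, False, False, False], [False, False, False, True, False])
print(r, f)  # 0.25 0.2
```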
Table 4.2 compares the detection rate (the number of patients with a recall higher than zero), recall, and false positive rate of the proposed task-specific (TS) and globally-consistent (GC) approaches with the selected baselines. Among the baselines, MTL outperforms both LDA and ML by correctly detecting the lesions in twelve patients, whereas LDA and ML detect the lesion in eleven and nine patients, respectively. MTL not only achieves a higher detection rate but also a higher average recall (0.10) than both LDA (0.09) and ML (0.07). TS, on the other hand, detects the lesion in all fifteen patients and achieves a higher average recall (0.14) than LDA, ML, and MTL. TS and SVMR have the same detection rate, but SVMR clearly outperforms TS in average recall (0.50 vs. 0.14). The high detection rates of TS and SVMR show that iEEG-based auxiliary supervision enhances the sensitivity of the lesion detection scheme, but at the cost of an increased false positive rate: both SVMR and TS have higher FPRs than any of the other methods. However, the average FPR of TS (11%) is considerably lower than that of SVMR, which is unacceptably high (37%). Taken together, these results indicate that an MTL-based formulation coupled with auxiliary supervision (TS) attains the highest detection rate while keeping the FPR far below that of single-task learning with the same supervision (SVMR).
Turning now to GC, we see that it has the lowest detection rate of all the methods, correctly detecting lesions in only four of the fifteen patients. The low detection rate of GC compared to TS supports our assumption that the information obtained from the iEEG analysis has task-specific semantics, i.e., that the criteria used for determining the seizure-onset zone differ from patient to patient. When this information was shared freely among tasks, the label noise was further compounded, resulting in the low overall detection rate of the GC method.
         Recall                                  False Positive Rate
Id.      LDA   ML    SVMR  MTL   GC    TS        LDA   ML    SVMR  MTL   GC    TS
NY67     0.07  0.04  0.48  -     -     0.09      -     0.05  0.22  -     -     0.21
NY68*    0.11  0.19  0.39  0.28  0.09  0.25      0.18  0.14  0.56  0.32  0.15  0.29
NY148    0.17  -     0.52  0.22  0.02  0.11      0.07  0.03  0.26  0.08  -     0.03
NY169    -     0.12  0.30  0.20  -     0.20      0.10  0.10  0.35  0.17  -     0.13
NY186    0.15  0.03  0.60  0.03  -     0.03      -     0.02  0.26  0.03  -     0.03
NY187*   0.12  -     0.62  -     -     0.09      -     0.05  0.39  0.04  -     0.09
NY212    -     -     0.30  -     -     0.09      0.14  0.03  0.26  0.02  -     -
NY226    0.16  0.13  0.67  0.12  0.06  0.12      0.07  0.04  0.30  0.06  0.03  0.03
NY255    0.15  0.06  0.49  0.08  0.46  0.22      -     -     0.48  -     0.16  0.05
NY259    0.04  -     0.42  0.04  -     0.04      0.05  0.01  0.23  0.09  -     0.13
NY294    0.08  -     0.37  0.07  -     0.05      -     0.05  0.25  0.11  -     0.15
NY297    -     0.05  0.57  0.05  -     0.05      0.07  0.07  0.23  0.05  -     0.05
NY312    0.13  0.21  0.95  0.16  -     0.44      0.13  0.11  0.70  0.13  -     0.14
NY351    -     -     0.28  0.14  -     0.21      0.02  0.16  0.65  0.16  -     0.15
NY371    0.09  0.15  0.53  0.04  -     0.10      0.15  0.05  0.39  -     -     0.12
Mean     0.09  0.07  0.50  0.10  0.04  0.14      0.07  0.06  0.37  0.08  0.02  0.11
Table 4.2: Detailed results for MRI-negative subjects. LDA is the Fisher linear discriminant analysis based method adapted from [43], ML represents the stratified classification scheme described in Chapter 3, SVMR is the single-task SVM with rank constraints based on [69], MTL represents regularized MTL [31] without auxiliary supervision, and GC and TS are the globally-consistent and the task-specific approaches, respectively. ‘-’ denotes a value of zero for FPR and no detection for recall; ‘*’ marks MRI-positive patients.

4.5 Conclusion
In this work we addressed the problem of MTL in the presence of uncertain labels, assuming an additional source of supervision that we modeled as a pairwise ranking function. To this end, we extended the regularized MTL
framework [31] by incorporating additional rank constraints. We modeled two cases: a single ranking function shared by all tasks, and a separate ranking function for each task. For the task-specific case, we developed a new multitask kernel that ensures ranks are not directly shared among tasks. In all cases, the model parameters were found by solving a quadratic program (QP) with box constraints [19], which can be solved with any standard QP solver. We demonstrated the efficacy of the proposed method on the challenging problem of detecting FCD lesions in TRE patients.
By incorporating label scores from iEEG analysis, the task-specific approach and the SVMR baseline correctly detected lesional regions within the resections of all patients, whereas the other baseline methods achieved lower detection rates. However, the task-specific approach achieved a lower false-positive rate than SVMR. Although the proposed task-specific approach is effective in identifying FCD lesions, it has the following limitations:
• For larger sets of patients, the number of pseudo-examples added to the dataset to incorporate rank constraints can severely increase the training time of the proposed algorithm. In the worst case, the number of pseudo-examples grows quadratically, $O(n^2)$, with the number of training instances $n$.
• It has been shown that outlying tasks can seriously undermine the performance of multitask learning algorithms. In some cases they even cause negative transfer, where learning the tasks jointly degrades performance [82, 54]. Therefore, including patients or controls who are outliers in terms of their demographic characteristics, lesion location or other pathological findings can limit the sensitivity of the proposed scheme.
• In certain cases, the lesion may not completely co-register with the seizure onset zone [81], and the cortical region under the electrodes that are active during a seizure may actually be normal tissue. In such cases, the pairwise ranking function would falsely attribute higher weights to normal instances.
In the next chapter, we cast lesion detection as an outlier detection problem
and develop a semi-supervised method which does not require vertex-level
labels for training. By discarding the need for vertex-level labels, we do away
with the need for iEEG as an auxiliary source of supervision to augment the
weak labels provided by the resected regions.
Identifying the abnormal region in cryptogenic epilepsies relies on a confluence of evidence from multiple sources, such as MRI, PET and iEEG. Despite the limitations above, the high sensitivity of the proposed method can have a positive impact on FCD lesion detection when it is applied to carefully selected patients and controls. Furthermore, the proposed method can be applied to other domains in which decisions are based on converging evidence from disparate sources.
Chapter 5
Hierarchical Conditional Random Fields for Detecting FCD Lesions
“Sometimes it's not enough to know what things mean, sometimes you have to know what things don't mean.”
Bob Dylan
Most automated FCD lesion detection methods are vertex-based classifiers [15, 96, 43], similar to the ones developed in Chapters 3 and 4. These studies classify individual vertices of the cortical surface as lesional or normal, using labeled training data from MRI-positive patients and controls. These methods and their evaluation studies fail to address four crucial issues:
(1) The goal of resective surgery is to remove the entire lesion; if any part of the lesion is left behind, the outcome will not be successful. The margin around the lesion is therefore marked “generously” to increase the chances of capturing the entire lesion. As a result, the expert-marked lesion can contain normal vertices, which introduces label noise. In Chapter 3 we used a stratified logistic regression classifier to detect lesions in MRI-negative patients. By manually reducing the resection masks of MRI-negative patients to correct for label noise, we achieved a detection rate of 58%, compared with 12% when the original resection masks were used as the ground truth. However, the manual mask reduction procedure is ad hoc, and its reduction thresholds introduce human subjectivity.
(2) Individual vertices are assumed to be independent and identically distributed (i.i.d.). This is a strong assumption, as it completely ignores the spatial correlation between neighboring vertices. Work in other domains, such as object detection and segmentation in natural images, has shown that modeling spatial correlations leads to superior performance [78, 75].
(3) Vertex-based classification methods typically use a post-processing step to reduce the false positive rate, in which some of the vertices labeled lesional by the classifier are relabeled as normal. One option is to train a second-level classifier that labels the detected clusters as lesional or non-lesional [14, 43]. Another is to apply a heuristic, such as the surface area of the detected clusters [96]. Discarding any
detected region based on its size or surface area can result in discarding the
actual lesion or part of the lesion, because FCD lesions can be located in any
part of the cortex, vary in size, and occur in multiple lobes [18].
(4) Results are evaluated either on MRI-positive patients [14, 96] or on patients who were initially deemed MRI-negative during their preliminary radiological screening but whose lesions were later found to be visible on MRI [43]. However, the real challenge is to find lesions in MRI-negative patients. A lesion that is visually detected during pre-surgical evaluation can substantially increase the chances of a successful surgical outcome [57]. It can also guide iEEG electrode placement, which minimizes sampling errors [43] and allows an accurate delineation of the resection target.
In this chapter we develop a lesion detection method that is designed to
explicitly address these issues. First, we model lesion detection as an out-
lier detection problem. The assumption is that a lesional cortical region is
an outlier in a suitable feature space when compared to the same cortical re-
gion across a control (normal) population. This view eliminates the use of
noisy class labels, and consequently bypasses the need for any manual mask
reduction procedure.
Second, instead of classifying individual vertices we classify segmented
patches of the cortex. The patches are obtained using unsupervised segmen-
tation of the flattened cortex that isolates regions with homogeneous feature
values. As the size of the FCD lesions varies widely, using a single scale to
isolate the lesion may not be effective. To minimize the chances of missing
the lesion, we employ a multiscale strategy in which the segmentation is car-
ried out at different scales of varying granularity. The interplay between the
patches obtained in this scale hierarchy is modeled as a tree-structured conditional random field (CRF) [91], rooted at the coarsest scale and with leaves at the finest scale. These random fields are also known as hierarchical conditional random fields (HCRFs) [78], because they model the dependencies between patch labels within the scale hierarchy. Because HCRFs classify image patches rather than vertices, they can exploit the spatial dependencies in the data. In addition, the tree structure explicitly captures larger-scale spatial interactions, as detailed in Section 5.2.2.
Third, we define a ranking criterion which takes into account both the size
and probability of a cortical region (cluster) that is labeled as being lesional.
Ranking eliminates the need to post-process the results. It also provides a natural way of presenting the results to a radiologist as a “focus of attention” mechanism.
Finally, we evaluate our approach on MRI-negative patients whose resec-
tions contained the primary FCD lesion, confirmed by a histological exam on
the resected tissue. Approximately 45% of histologically confirmed FCD lesions go undetected during visual inspection, that is, the patients are MRI-negative [110]. The chances of a successful surgical outcome are 66% when a lesion is visually detected, compared with only 29% when it is not [57, 99]. Consequently, neurologists are less likely to refer patients who lack an MRI-visible lesion to specialized epilepsy centers [38], and many epilepsy specialists are reluctant to operate without a well-defined lesion. For these reasons, resective surgery remains underutilized, despite a growing number of studies demonstrating that surgery is effective for patients with focal TRE [9]. Computational methods of FCD lesion detection that achieve high sensitivity in MRI-negative cases could therefore substantially increase the number of patients who undergo resective surgery and achieve a better quality of life.
5.1 Hierarchical Conditional Random Fields
Hierarchical Conditional Random Fields (HCRFs) provide a suitable frame-
work for supervised image segmentation [78], object detection and semantic im-
age labeling [75]. In the original HCRF framework proposed for figure-ground
segmentation [78], an image is first segmented into a number of patches at dif-
ferent scales. Each patch is then classified as being part of the background or
foreground, using a suitable binary classifier based on image features such as
texture, SIFT, etc. Exploiting the fact that the labels assigned to overlapping
patches between different scales should agree, an HCRF (a tree-structured conditional random field) is constructed to model these inter-scale interactions.
The image is thus modeled as a forest, where the root node for each tree cor-
responds to a patch obtained at the coarsest scale, whereas the leaves reside
at the finest scale. The joint probability of all patch labels is estimated by
running inference on the HCRFs. The image is segmented by thresholding the
final probabilities at the leaves. Plath et al. [75] extend this framework to more than two classes. Awasthi et al. [5] also use HCRFs for multi-class image labeling. Instead of obtaining image patches by segmentation, they impose a grid on the image at different scales and model the HCRF as a quad-tree. These multiscale methods are highly sensitive to the accuracy of pixel-level labels. For example, in Murphy et al. [78], manually refining the bounding boxes around the region of interest (ROI) in the training images to eliminate extraneous pixels significantly increased accuracy.
5.2 HCRFs for Lesion Detection
For FCD lesion detection, we have training data from MRI-negative patients who have undergone surgical resection and are seizure-free. The resected cortical region can be used to obtain vertex-level labels for training a classifier. However, as explained previously, these labels tend to be highly noisy, and a classifier trained on them will produce noisy predictions [1]. To address this problem, we extend the HCRF framework proposed in [78] to perform outlier detection on registered image data. Unlike the approaches mentioned previously, our method cannot rely on vertex-level labels. It works in a semi-supervised manner, using only global labels, i.e., whether a cortical surface belongs to a healthy control or to a patient. We therefore define an FCD lesion as a region of the brain that is an outlier when compared to the same region across a population of normal controls.
The construction of the HCRF for FCD lesion detection involves the fol-
lowing steps [2]:
1. Segment the cortex at multiple scales, to obtain image patches of varying
sizes.
2. Assign an outlier score to each image patch by comparing it to the same cortical region across the control population. This one-to-one comparison is possible because the cortical surface of every control and every patient is registered to the same average surface.
3. Construct multiple HCRFs, one for each image patch obtained at the coarsest scale.
4. Run inference on the HCRFs to calculate the posterior probability at each node. The lesion is detected by thresholding the posteriors at the leaves.
We start by describing our approach to segmentation.
5.2.1 Segmentation
The functional organization of the cortex is two-dimensional, e.g., the functional mapping of the primary visual areas [103]. As an initial simplification, we therefore work with the flattened cortex. This simplifies the segmentation procedure and lets us use well-established image segmentation techniques. Using SBM, the cortex is modeled as a two-dimensional surface that contains approximately 0.15 million vertices on average. Although the entire cortex can be flattened, segmenting it and running inference on the resulting HCRFs would require significant computational resources. To reduce the processing load, we subdivide the lesion detection task into smaller regions of the cortical surface. These regions are defined by a standard neuro-anatomical atlas, which outlines cortical regions based on their morpho-functional properties [34]. These
regions are also known as parcellations. One such atlas is shown in Figure
5.1. Instead of segmenting the entire cortical surface at once, we isolate these
parcellations one at a time and flatten them individually to obtain a stan-
dard two-dimensional image, which we then segment at multiple scales. Any
morphological feature (e.g., cortical thickness, curvature, etc.) can be used to
represent the intensity values in the resulting image. Figure 5.1 illustrates the
overall HCRF construction process for a parcellation.
We use quick shift [106] for unsupervised segmentation. One of the main
advantages of using quick shift is that the number and size of segments need
not be specified. Additionally, quick shift does not penalize for boundary
regions, and produces a diverse set of segments having different shapes and
sizes. It should be noted that any segmentation method can be used, as long
as it has the ability to segment the image at different scales.
The standard quick shift algorithm is a fast mode-seeking algorithm similar to mean shift [23]. It performs a hierarchical segmentation of the image, in which the sub-trees represent image segments. It has two parameters: the size (σ) of the Gaussian kernel used by a Parzen window density estimator, and the maximum distance (∆) allowed between two pixels of the same segment. We vary the scale parameter σ to change the average size of the segments and set ∆ to a multiple of σ [105], so higher values of σ produce larger segments. Different combinations of these parameters give the scale hierarchy that is the basic building block of the HCRF, as explained next.
Figure 5.1: Constructing an HCRF using a standard neuro-anatomical atlas (left) and a parcellation image (top-right). Any morphological feature can be used to represent the image (this image was created using cortical thickness). At the bottom are image patches obtained at two different scales using quick shift. Each image patch at the coarser scale (bottom-left) becomes a root with children at the adjacent finer scale (bottom-right).
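The sketch below illustrates the multiscale segmentation with a minimal, deliberately unoptimized quick shift for a single-feature flattened parcellation image. It follows the two parameters described above: σ for the Parzen density and ∆ (here `max_dist`) as the maximum link length. Three details are our own illustrative assumptions: distances are measured in a joint (row, column, feature) space, a 3σ search window is used, and density ties are broken by pixel index. This is not the implementation of [106] or of any library.

```python
import numpy as np


def quickshift_gray(img, sigma, max_dist):
    """Quick shift on a 2-D single-feature image; returns an integer label map.

    Illustrative sketch: distances are taken in joint (row, col, feature) space and
    both the density estimate and the neighbour search use a 3*sigma window.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    r = int(np.ceil(3 * sigma))

    def window(i, j):
        i0, i1 = max(0, i - r), min(h, i + r + 1)
        j0, j1 = max(0, j - r), min(w, j + r + 1)
        ii, jj = np.mgrid[i0:i1, j0:j1]
        d2 = (ii - i) ** 2 + (jj - j) ** 2 + (img[i0:i1, j0:j1] - img[i, j]) ** 2
        return ii * w + jj, d2                      # flat indices, squared distances

    # 1) Parzen-window density with a Gaussian kernel of width sigma
    density = np.empty(h * w)
    for i in range(h):
        for j in range(w):
            _, d2 = window(i, j)
            density[i * w + j] = np.exp(-d2 / (2.0 * sigma ** 2)).sum()

    # 2) link each pixel to its nearest neighbour of higher density
    #    (ties broken by flat index, so the links always form a forest)
    parent = np.arange(h * w)
    for i in range(h):
        for j in range(w):
            me = i * w + j
            idx, d2 = window(i, j)
            higher = (density[idx] > density[me]) | (
                (density[idx] == density[me]) & (idx > me))
            if not higher.any():
                continue                            # local mode: stays a root
            d2 = np.where(higher, d2, np.inf)
            k = np.argmin(d2)                       # flat position inside the window
            # 3) links longer than max_dist are cut: the pixel stays a root
            if np.sqrt(d2.flat[k]) <= max_dist:
                parent[me] = idx.flat[k]

    # 4) follow links to their roots; each tree is one segment
    root = parent
    while True:
        nxt = root[root]
        if np.array_equal(nxt, root):
            break
        root = nxt
    _, labels = np.unique(root, return_inverse=True)
    return labels.reshape(h, w)
```

Calling `quickshift_gray` once for each σ in {2, 3, 4} with `max_dist = 5 * sigma` (the values reported in Section 5.3.1) gives one label map per scale. Together these maps form the scale hierarchy.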
5.2.2 HCRF Construction
Once the multiscale segmentation of a subject is complete, we have a set of patches at different scales. Let $I_p^k$ be the $p$th patch obtained at the $k$th scale. We can collect the corresponding patches from all controls and then estimate a label $y \in \{0, 1\}$ for $I_p^k$, where $y = 1$ indicates that $I_p^k$ is an outlier. This label cannot be considered independent of the labels of other patches that overlap with $I_p^k$ at other scales.
We model the joint prediction of these mutually dependent patch labels using a tree-structured HCRF. Let $I_p^{k+1}$ be an image patch at level $k+1$. Its parent is the patch $I_q^k$ at the immediately coarser level $k$ that has maximal overlap with $I_p^{k+1}$ [78]. We find the index $q$ as follows:
$$q := \arg\max_{q} \frac{|I_p^{k+1} \cap I_q^{k}|}{|I_q^{k}|} \qquad (5.1)$$
Each patch at the coarsest scale is the root of a tree having leaves at the finest
scale. Therefore, the parcellation image is represented by a forest, where each
tree is modeled as an HCRF, as shown in Figure 5.1.
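Equation (5.1) can be computed for all patches at once from two label maps of the same image at adjacent scales. The sketch assumes that segment labels are numbered consecutively from 0, as `np.unique(..., return_inverse=True)` produces, and `assign_parents` is a hypothetical helper name.

```python
import numpy as np


def assign_parents(fine, coarse):
    """For each patch p at scale k+1, return its parent q at scale k per eq. (5.1).

    fine, coarse: label maps of the same image at two adjacent scales, with labels
    numbered consecutively from 0 (assumption of this sketch).
    """
    fine = np.asarray(fine).ravel()
    coarse = np.asarray(coarse).ravel()
    n_fine, n_coarse = fine.max() + 1, coarse.max() + 1
    overlap = np.zeros((n_fine, n_coarse))          # overlap[p, q] = size of the intersection
    np.add.at(overlap, (fine, coarse), 1)           # count pixels shared by p and q
    size_q = np.bincount(coarse, minlength=n_coarse)  # |I_q^k|
    return np.argmax(overlap / size_q, axis=1)       # normalized overlap, eq. (5.1)
```

Applying this to every pair of adjacent scales links each patch to exactly one parent. Each coarsest-scale patch therefore becomes the root of one tree in the forest.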
CRFs model the joint conditional probability distribution of all the patch labels $\mathbf{y} = (y_1, \ldots, y_n)$ in the tree, given the values of the input morphological feature $x$. In general, this can be written as:
$$p(\mathbf{y} \mid x, \theta) = \frac{1}{Z(x, \theta)} \prod_{i} \phi(y_i \mid x, \theta) \prod_{(i, \pi(i))} \psi(y_i, y_{\pi(i)}) \qquad (5.2)$$
where $\pi(\cdot)$ denotes the parent patch and $Z(x, \theta)$ is the normalization constant, also called the partition function. The node potential $\phi(\cdot)$ represents the local evidence for the label $y_i$ based on the observed data $x$. The edge potentials $\psi(\cdot)$ model the coupling between adjacent labels. Because the graph is a tree, we can efficiently calculate $Z(x, \theta)$ and the posterior probabilities of the patch labels at all scales using standard belief propagation [73].
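Because each HCRF is a tree, the posteriors of equation (5.2) can be computed exactly with two-pass sum-product belief propagation. The sketch below is a generic textbook version, not the code used in this work. It assumes binary labels and strictly positive potentials. The tree is stored in a hypothetical parent-array form (`parent[i] = -1` for a root), and `psi[i]` holds ψ(y_i, y_π(i)) as a 2×2 array.

```python
import numpy as np


def tree_marginals(parent, phi, psi):
    """Exact posterior marginals p(y_i | x) on a tree-structured CRF (eq. 5.2).

    parent[i]: index of pi(i), or -1 for a root (a forest is allowed)
    phi[i]:    node potential over the labels {0, 1}, shape (2,)
    psi[i]:    edge potential psi(y_i, y_pi(i)) as a (2, 2) array; unused for roots
    All potentials are assumed strictly positive.
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    for i, p in enumerate(parent):
        if p >= 0:
            children[p].append(i)
    order = [i for i in range(n) if parent[i] < 0]   # roots first
    pos = 0
    while pos < len(order):                          # breadth-first: parents before children
        order.extend(children[order[pos]])
        pos += 1

    inside = np.array(phi, dtype=float)              # phi_i times messages from children
    up = np.ones((n, 2))                             # message i -> parent, over y_parent
    for i in reversed(order):                        # upward pass (leaves to roots)
        for c in children[i]:
            inside[i] *= up[c]
        if parent[i] >= 0:
            edge = np.asarray(psi[i], dtype=float)
            up[i] = edge.T @ inside[i]               # sum over y_i
            up[i] /= up[i].sum()                     # rescale to avoid overflow

    down = np.ones((n, 2))                           # message parent -> i, over y_i
    for i in order:                                  # downward pass (roots to leaves)
        p = parent[i]
        if p >= 0:
            edge = np.asarray(psi[i], dtype=float)
            outside = down[p] * inside[p] / up[i]    # parent's belief without i's message
            down[i] = edge @ outside                 # sum over y_parent
            down[i] /= down[i].sum()

    belief = inside * down
    return belief / belief.sum(axis=1, keepdims=True)
```

In the HCRF, `psi[i]` would be the matrix of equation (5.6) between patch i and its parent. The text does not state exactly how the outlier probability becomes φ, so a choice such as φ_i = (1 − LoOP, LoOP) would be an assumption.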
Traditionally, for conditional random fields the node and edge potentials
are jointly learned from labeled training data (see [91] for details). For our
application, because the labels are noisy and we have chosen to work in an
unsupervised manner, we set the node and edge potentials separately, which
we describe next. Similar strategies for parameter estimation in HCRFs have
been used for figure-ground segmentation [78] and for object detection [75] in
natural images.
Node Potentials
The node potential is modeled to reflect our belief about the abnormality of an individual image patch. Most outlier detection methods produce poorly calibrated outlier scores, i.e., the range of the scores depends on the dataset [87]. This makes it difficult to compare scores that the same method produces on different datasets. Popular outlier detection methods such as the local outlier factor (LOF) [20] and the local correlation integral (LOCI) [72] suffer from this problem. We need an outlier detection method whose standardized scores carry the same meaning at each scale and can therefore be compared across scales. This is an important design choice, because running inference on scores that are not comparable across scales would produce meaningless results. We therefore use local outlier probabilities (LoOP) [51], a standardized variant of LOF. Its scores lie in the range [0, 1] and can be interpreted as the probability that a data point is an outlier.
LoOP assumes that each data instance x has a context set S ⊆ D, and the
set of distances between x and s ∈ S has a Gaussian distribution [51]. The
standard deviation of these distances σ(x, S) combined with a significance
factor λ produces the probabilistic set distance of x to S [51] defined as:
$$\mathrm{pdist}(\lambda, x, S) := \lambda \cdot \sigma(x, S) \qquad (5.3)$$
where S is determined using a k-nearest neighbor query. The parameter λ controls the sensitivity of the final probability estimates: an instance that deviates by more than λ times the standard deviation is considered an outlier. The values of λ are analogous to the empirical confidence levels of the standard normal distribution [51]. The probabilistic local outlier factor for x can then be calculated in a manner similar to LOF:
$$\mathrm{PLOF}_{\lambda,S}(x) := \frac{\mathrm{pdist}(\lambda, x, S)}{\mathbb{E}_{s \in S}\left[\mathrm{pdist}(\lambda, s, S(s))\right]} - 1 \qquad (5.4)$$
PLOF values greater than zero indicate that the given instance may be an outlier. To convert a PLOF value into a probability estimate, we assume that PLOF values are distributed around 0 with a standard deviation of $\sqrt{\mathbb{E}[(\mathrm{PLOF})^2]}$. The final probability can then be calculated as:
$$\mathrm{LoOP}_S(x) := \max\left\{0,\ \mathrm{erf}\left(\frac{\mathrm{PLOF}_{\lambda,S}(x)}{\lambda \sqrt{2\,\mathbb{E}[(\mathrm{PLOF})^2]}}\right)\right\} \qquad (5.5)$$
where erf(·) is the Gauss error function [3].
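Equations (5.3)–(5.5) can be sketched directly in NumPy. The sketch assumes Euclidean distances between patch feature vectors, distinct data points, a context set S(x) made of the k nearest neighbors of x (excluding x itself), and k smaller than the number of points. It follows the general LoOP recipe of [51] and is not the exact code used here.

```python
import math
import numpy as np


def loop_scores(X, k=10, lam=3.0):
    """Local outlier probabilities (eqs. 5.3-5.5) for every row of X.

    Assumes Euclidean distances, distinct rows and k < len(X); the context set
    S(x) is the k nearest neighbours of x (x itself excluded).
    """
    X = np.asarray(X, dtype=float)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)                      # exclude x from its own context
    nn = np.argsort(d, axis=1)[:, :k]                # indices of S(x)
    nn_dist = np.take_along_axis(d, nn, axis=1)
    sigma = np.sqrt((nn_dist ** 2).mean(axis=1))     # standard distance sigma(x, S)
    pdist = lam * sigma                              # eq. (5.3)
    plof = pdist / pdist[nn].mean(axis=1) - 1.0      # eq. (5.4)
    nplof = lam * np.sqrt((plof ** 2).mean())        # lambda * sqrt(E[PLOF^2])
    if nplof == 0.0:                                 # perfectly uniform data: no outliers
        return np.zeros(len(X))
    erf = np.vectorize(math.erf)
    return np.maximum(0.0, erf(plof / (nplof * math.sqrt(2.0))))  # eq. (5.5)
```

To score a patient patch against the controls, one option (our assumption, not stated in the text) is to stack the control vectors of the same patch with the patient's vector and keep the last score. With hypothetical arrays, this would be `loop_scores(np.vstack([controls, patient]), k=10)[-1]`.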
Edge Potentials
Each edge in the HCRF represents the dependency between the “parent” image patch at scale t and the “child” patch at scale t + 1. We set the edge potential to reflect the visual similarity between the two patches, measured by the chi-squared distance between the histograms of scale invariant feature transform (SIFT) features [62] of the parent and child patches. The labels of image patches that are visually similar within the scale hierarchy are therefore more strongly coupled than those of dissimilar patches. This heuristic is similar to the one chosen in [78].
To estimate the histograms of the SIFT features for each image, we initially
learn a codebook of m codewords using the control data. For each control
image in the subset we flatten and isolate the parcellation, and then calculate
a SIFT feature vector at each pixel. These vectors are then clustered into m
clusters using k-means clustering. Each feature has its own range of values and describes a different morphological property of the cortex (see Section 2.2.3), so we learn a separate codebook for each parcellation/feature combination. The edge potential between two adjacent nodes in the tree is then calculated as [78, 75]:
$$\psi(y_i, y_j) = \begin{pmatrix} e^{\gamma \eta_{ij}} & e^{-\gamma \eta_{ij}} \\ e^{-\gamma \eta_{ij}} & e^{\gamma \eta_{ij}} \end{pmatrix} \qquad (5.6)$$
where γ is a free parameter that represents the strength of coupling between adjacent levels in the CRF, and $\eta_{ij} = e^{-\chi^2(x_i, x_j)}$. Here $x_l$ is the normalized histogram of SIFT features for the $l$th patch in the HCRF, and $\chi^2(\cdot, \cdot)$ is the chi-squared distance between two normalized histograms with n bins each, defined as:
$$\chi^2(P, Q) = \frac{1}{2} \sum_{i=1}^{n} \frac{(P_i - Q_i)^2}{P_i + Q_i} \qquad (5.7)$$
where P and Q are normalized histograms.
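Equations (5.6) and (5.7) translate into a few lines of NumPy. The sketch assumes that each pixel of a patch has already been assigned a codeword index from the k-means codebook described above, which is not shown. `bow_histogram`, `chi2_distance` and `edge_potential` are hypothetical helper names. Bins that are empty in both histograms are skipped to avoid 0/0.

```python
import numpy as np


def bow_histogram(codes, m):
    """Normalized histogram of codeword indices (0..m-1) for the pixels of one patch."""
    counts = np.bincount(np.asarray(codes).ravel(), minlength=m).astype(float)
    return counts / counts.sum()


def chi2_distance(P, Q):
    """Eq. (5.7); bins that are empty in both histograms contribute nothing."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    den = P + Q
    nz = den > 0
    return 0.5 * np.sum((P[nz] - Q[nz]) ** 2 / den[nz])


def edge_potential(hist_i, hist_j, gamma):
    """Eq. (5.6): 2x2 table coupling the labels of a parent and a child patch."""
    eta = np.exp(-chi2_distance(hist_i, hist_j))     # 1 for identical histograms
    agree, disagree = np.exp(gamma * eta), np.exp(-gamma * eta)
    return np.array([[agree, disagree], [disagree, agree]])
```

With γ = 50, the value selected in Section 5.3.1, the largest entry is about e^50 ≈ 5 × 10^21, which is still well within double precision. Multiplying the whole table by a constant does not change the posteriors.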
5.2.3 Lesion Detection
For each subject, we calculate the posterior probabilities at each node of the
HCRF for every parcellation by running belief propagation [73]. The final
detection is obtained by thresholding the posterior beliefs at the leaves of
each HCRF [78, 75]. Different strategies for thresholding can be used, such
as defining a single threshold across all subjects, or calculating a threshold
for each subject individually. In this work we calculate an adaptive threshold
for each patient separately. This decision is based on the observations that
1) FCD lesions can manifest differently for different individuals, and 2) the
morphological features vary with different demographic factors such as gender
and age. For example, cortical thickness is correlated with the age of the patient [84]. To this end, we sort the posterior probabilities and define the threshold as the lowest probability among the top K probability estimates. In practice, K can be left as a free parameter that the user varies to see which regions are deemed lesional at different levels of confidence. This gives the radiologist a knob to turn that shows more or fewer candidate lesions. This is a desirable feature, because the detection scheme presented
here is designed to be a part of the comprehensive pre-surgical evaluation
protocol that includes MRI, Positron Emission Tomography (PET), scalp EEG
and iEEG. The final resection target is determined by combining evidence
from all evaluations. Therefore, the ability to generate multiple cortical maps
delineating possible lesions at different confidence levels provides a richer set of
evidence which in turn increases the probability of capturing the actual lesion.
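The adaptive per-patient threshold described above is simple to state in code. The sketch below is illustrative: `top_k_detection` is a hypothetical name, and K is assumed to be at least 1. Leaves whose posterior ties with the threshold are also flagged, so slightly more than K leaves can be returned.

```python
import numpy as np


def top_k_detection(posteriors, K):
    """Adaptive per-patient threshold: the lowest of the K highest leaf posteriors."""
    p = np.asarray(posteriors, dtype=float)
    ranked = np.sort(p.ravel())[::-1]                # highest first
    threshold = ranked[min(K, ranked.size) - 1]
    return threshold, p >= threshold                 # leaves flagged as lesional
```

Increasing K lowers the threshold and reveals more candidate regions. This is the “knob” the radiologist turns.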
5.3 Empirical Evaluation
Our data consist of MRI-negative patients who have undergone resective
surgery and whose resected tissue was histologically verified to contain
abnormal tissue. Each patient who undergoes surgery is assigned an “Engel”
class. An Engel class of 1 represents complete seizure freedom whereas an
Engel class of 4 represents no improvement. We selected only patients with an
Engel class outcome of 1 for our experiments in order to verify that the region
resected was indeed the primary lesion and that no additional epileptogenic
lesions were present in other parts of the brain. This resulted in a dataset with
twenty MRI-negative patients (refer to Appendix A for patient related infor-
mation). This may appear to be a small dataset, but few patients proceed to
surgery when no visible lesion is found on their MRI, and of those who do, fewer than a third experience complete seizure freedom [57]. These twenty patients include all MRI-negative patients who underwent surgery at the New York University comprehensive epilepsy treatment center during the past three years and were classified post-surgically as Engel class 1. Developing automated lesion detection mechanisms for MRI-negative patients is an active area of research, and our sample size is consistent with existing work in the domain [14, 96, 43].
However, in contrast to our evaluation, other studies evaluate their proposed
detection schemes on MRI-positive patients (i.e., patients whose lesion was vis-
ible on the MRI during the initial evaluation or was found visually at a later
stage). It is important to note that our sample consists of “pure” MRI-negative patients. Our results therefore target the patient population that actually needs an automated lesion detection scheme, and in which such a scheme can have a positive impact on the outcome of resective surgery.
5.3.1 Data Pre-processing and Parameter Selection
After the surface has been reconstructed using the FreeSurfer software¹, we used the Desikan-Killiany atlas [25] to isolate the different parcellations. Note that any suitable neuro-anatomical atlas can be used to subdivide the cortical surface. Each parcellation is flattened to obtain a standard 2-D image, where the intensity of each pixel can be represented by any one of the four morphological features.
The values of the different parameters, such as the segmentation scales and the number of nearest neighbors used to calculate the outlier probabilities, depend on several factors. These include the size of the control population, the age distribution of the control cohort and the gender of the subject. We therefore treat them as free parameters that can be varied over a preset range of values to obtain different detection results. Whether an image patch is an outlier depends on the set of controls used to learn the “normal” model. Most morphological features vary with demographic factors such as age, gender and education. Ideally, we would choose a customized set of controls for each patient. We currently do not have enough controls to customize for age and other factors, but we do select controls based on the patient’s gender.
¹Available at http://surfer.nmr.mgh.harvard.edu/
To select the parameters for the various aspects of our method, we used a
validation set consisting of two MRI-positive and two MRI-negative patients,
which are distinct from the patients used to evaluate our method. We used all
115 controls to learn a separate codebook of SIFT features for every parcella-
tion/feature combination. Dense SIFT features were calculated at each pixel.
We tested vocabulary sizes of 50, 100 and 500 and selected a vocabulary size
of 50 as it resulted in higher recall and precision on the validation set. This
codebook was used subsequently to estimate the histograms of SIFT features
at each pixel location for all patient parcellation images in the test set.
Each parcellation image was segmented at three different scales using quick
shift. We used σ ∈ {2, 3, 4} and set ∆ to 5σ. These values were chosen
such that the smallest possible lesion in our validation set is over-segmented
(i.e., there are multiple segments that contain the lesional area). This increases
the probability that a patch can be entirely formed from lesion vertices, rather
than having patches that partially overlap with the lesion, which would be
harder to detect as outliers. With these settings and cortical thickness as the feature, the validation set produced an average of 4255 ± 107 HCRF models per patient, with 19292 ± 373 leaves at the finest scale. Although the validation set is small in terms of the number of patients, we conjecture that the resulting numbers of HCRF models and instances are adequate for setting the model parameters.
Finally, before performing outlier detection, we reduce the dimension of each patch using principal component analysis (PCA) [102]. The PCA is fitted on the control data only. We retained the top m principal components, which accounted for 95% of the variance in the data. Based on results on the validation set (obtained independently for each feature), we set the number of nearest neighbors in LoOP to k = 10 and the coupling parameter γ (cf. equation (5.6)) to 50.
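The control-only PCA step can be sketched with NumPy's SVD. The sketch assumes that every patch is a fixed-length vector of feature values. Registration to the average surface makes the same patch comparable across subjects. `fit_control_pca` and `project` are hypothetical helper names. The number of components m is the smallest count whose cumulative explained variance reaches 95%.

```python
import numpy as np


def fit_control_pca(controls, var_keep=0.95):
    """PCA fitted on control patches only; keeps the fewest components reaching var_keep."""
    X = np.asarray(controls, dtype=float)            # shape (n_controls, n_values)
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)              # variance ratio per component
    m = int(np.searchsorted(np.cumsum(explained), var_keep)) + 1
    return mean, vt[:min(m, vt.shape[0])]


def project(patches, mean, components):
    """Project control or patient patches onto the control-derived components."""
    return (np.asarray(patches, dtype=float) - mean) @ components.T
```

Fitting on controls alone keeps the patient's possibly abnormal values from shaping the reduced space in which they are later judged to be outliers.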
5.3.2 Evaluation Methodology
The final detection for each subject is obtained by thresholding the posterior probabilities at the leaves of the HCRF, which correspond to the segments at the finest scale. We derive the detection thresholds by dividing the top percentile of the final outlier probabilities into ten equal parts. The first threshold is the lowest probability among the highest 0.1% of scores, the second is the lowest among the highest 0.2%, and so on. For the results presented in this section, we determine five such thresholds, giving five different possible detections. Because the mechanism is adaptive, it always detects something, even when all the probabilities are very small. We therefore set $1 \times 10^{-4}$ as the minimum probability, and no threshold is calculated below this value.
This limiting value was selected based on the observation that any threshold
calculated below this value resulted in more than 80% of the cortex being la-
beled as lesional for the patients in the validation set. This lower bound on
the threshold is a free parameter of the model and can be adjusted according
to the needs of the user.
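The threshold schedule can be sketched as follows. Two choices are our own assumptions. The size of each top set is rounded up with a ceiling, and the five thresholds are taken to be the first five steps (top 0.1%, 0.2%, ..., 0.5%). `detection_thresholds` is a hypothetical name. The `min_prob` floor is the 1 × 10^-4 lower bound described above.

```python
import numpy as np


def detection_thresholds(probs, n_thresholds=5, min_prob=1e-4):
    """Thresholds taken from the top percentile of scores in steps of 0.1%."""
    ranked = np.sort(np.ravel(probs))[::-1]                 # highest first
    thresholds = []
    for t in range(1, n_thresholds + 1):
        k = max(1, int(np.ceil(ranked.size * t / 1000)))    # size of the top (t x 0.1%) set
        thr = ranked[k - 1]                                 # lowest score in that set
        if thr < min_prob:        # lower bound: thresholds only decrease, so stop here
            break
        thresholds.append(float(thr))
    return thresholds
```

Because the thresholds decrease monotonically, the loop can stop at the first one that falls below the floor.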
5.3.3 Cluster Ranking
We evaluate and contrast the performance of the detection techniques in an
information retrieval framework. We first obtain clusters by thresholding the
posterior probabilities at a given threshold. All detected clusters are then
ranked using the following score function:
score(c; α) = α s(c) + (1 − α) o(c),   0 ≤ α ≤ 1   (5.8)
where c is a cluster detected at a pre-defined threshold, s(·) ∈ [0, 1] is the
relative surface area of the cluster, calculated as the ratio between the surface
area of c and the total surface area labeled as lesional, and o(·) ∈ [0, 1] is a
scoring function that represents the degree of “outlier-ness” of the cluster. For
the HCRF, we model o(·) as the average of the outlier probabilities at the
vertices that make up the cluster. α is a tradeoff parameter: α = 1 defines a
ranking based solely on cluster size, whereas α = 0 results in a ranking based
only on the clusters' probability of being lesional. Intermediate values of α
define a ranking in which a smaller cluster detected at a stringent threshold
can be ranked higher than a larger cluster detected at a more lenient threshold,
and vice versa. Ideally, the highest-ranked clusters should lie within the
lesion/resection zone of the patient.
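Equation 5.8 can be illustrated with the sketch below. `Cluster` and `rank_clusters` are hypothetical names; s(c) is computed relative to the total area of all clusters detected at the same threshold, and o(c) as the mean vertex outlier probability, following the HCRF definition above.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Cluster:
    """A detected cluster (hypothetical container for illustration)."""
    area: float                     # surface area of the cluster
    vertex_probs: Sequence[float]   # outlier probabilities of its vertices


def rank_clusters(clusters: List[Cluster], alpha: float) -> List[Cluster]:
    """Sort clusters by score(c; alpha) = alpha*s(c) + (1 - alpha)*o(c)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    total_area = sum(c.area for c in clusters)  # total area labeled lesional

    def score(c: Cluster) -> float:
        s = c.area / total_area                         # relative size
        o = sum(c.vertex_probs) / len(c.vertex_probs)   # mean outlier prob.
        return alpha * s + (1.0 - alpha) * o

    return sorted(clusters, key=score, reverse=True)
```

With α = 1 the ordering depends only on size, and with α = 0 only on the mean outlier probability.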
Baseline Methods
We compare the results of our proposed technique against the univariate z-
score based technique reported in [96], and against the vertex-based approach
developed in Chapter 3. Both techniques require registration of the control and
patient surfaces to an average surface.
The z-score based baseline calculates z-scores at each vertex for the patients,
which are then thresholded to obtain the detection results. We calculate the
z-scores relative to gender-matched controls instead of all the controls. We
have chosen this technique as the baseline method because (i) it is a semi-
supervised approach and does not require accurate vertex-level labels, and
(ii) it has been part of the pre-surgical evaluation at the NYU Comprehensive
Epilepsy Center, where the patients included in our evaluation were treated.
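A minimal sketch of the vertex-wise z-score baseline, assuming features are already registered to a common average surface so that vertex i refers to the same location in every subject. The text does not state whether the sample or population standard deviation is used, nor the exact thresholding rule; this sketch assumes the sample standard deviation. Names are hypothetical.

```python
import numpy as np


def zscore_map(patient, controls, patient_sex, control_sex):
    """Vertex-wise z-scores of one patient against sex-matched controls.

    patient:  (n_vertices,) feature values on the common average surface
    controls: (n_controls, n_vertices) registered control feature values
    """
    matched = controls[np.asarray(control_sex) == patient_sex]
    mu = matched.mean(axis=0)
    sd = matched.std(axis=0, ddof=1)  # sample standard deviation
    return (patient - mu) / sd
```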
We also use the vertex-based classification scheme developed in Chapter 3 to
compare against the HCRF results when the detection results are combined
across the four morphological features. Recall that the vertex-based scheme
required manual reduction of the resection masks to eliminate label noise. We
perform all the pre-processing steps, including mask reduction, and use leave-
one-patient-out cross-validation to obtain the detection for each patient in the
dataset.
We omit the last step of both baseline methods, which post-processes the
detections to eliminate “small” clusters based on their surface area [96]. To
facilitate comparison, we calculate multiple thresholds in exactly the same
manner as outlined above for HCRF, and rank the clusters at each threshold
using Equation 5.8. We next describe the measures used to evaluate our
proposed method.
Detection Rate
Detection rate is defined as the fraction of patients for whom one or more
detected clusters overlap with the resected area. Usually a post-processing step
is applied to the raw detections before estimating the detection rate. Hong
et al. [43] use the training data to train a classifier that distinguishes clusters
detected in the resection/lesional area from extra-lesional clusters; this classifier
is then applied to the clusters detected on the test subject before estimating
the detection rate. Similarly, in Thesen et al. [96], all clusters below a pre-set
size threshold are discarded, and a successful detection results if one or more
of the remaining clusters overlap with the lesional area. Discarding detected
clusters based on their size increases the risk of discarding subtle lesions.
Instead of discarding detected clusters, we use cluster ranking to estimate the
detection rate. To this end, we calculate five thresholds based on the outlier
probabilities for the HCRF method, and similarly for the z-score method. After
ranking the detected clusters using Equation 5.8, at each threshold we consider
a subject to be correctly detected if a cluster among the top n (where n is small
relative to the total number of detected clusters) completely or partially
overlaps with the lesion/resection. This produces more conservative estimates
of the detection rate than approaches that do not use cluster ranking.
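The ranking-based detection rate at a single threshold can be sketched as follows, assuming each patient's clusters have already been sorted by Equation 5.8 and reduced to overlap flags (a hypothetical representation).

```python
from typing import List, Sequence


def detection_rate(ranked_hits: List[Sequence[bool]], top_n: int = 10) -> float:
    """Fraction of patients with a resection-overlapping cluster in the top n.

    ranked_hits[p][r] is True if the cluster ranked r-th (by Eq. 5.8) for
    patient p completely or partially overlaps the lesion/resection.
    """
    detected = sum(1 for hits in ranked_hits if any(hits[:top_n]))
    return detected / len(ranked_hits)
```

A patient whose only overlapping cluster is ranked below position n counts as missed, which is what makes this estimate conservative.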
Precision and Recall
To compare the quality of the detections, we calculated the precision and recall
for both HCRF and the z-score based method, considering all detected clusters
at each threshold. We define recall as the ratio of the total surface area of all
clusters that overlap with the resection zone to the surface area of the resection
zone. Similarly, we define precision as the ratio of the total surface area of the
clusters overlapping with the resection zone to the total surface area of all the
detected clusters.
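The area-based definitions above can be made concrete as follows. Read literally, the full area of every cluster that touches the resection counts toward both measures, not only the overlapping part; the sketch follows that reading. Clusters and the resection zone are represented as sets of vertex ids with per-vertex areas; this representation and the function name are assumptions.

```python
from typing import Dict, List, Set


def precision_recall(clusters: List[Set[int]], resection: Set[int],
                     vertex_area: Dict[int, float]):
    """Area-based precision and recall as defined in the text.

    clusters:    detected clusters, each a set of vertex ids
    resection:   vertex ids inside the resection zone
    vertex_area: surface area associated with each vertex
    """
    def area(vertices):
        return sum(vertex_area[v] for v in vertices)

    # Full area of every cluster that overlaps the resection zone.
    hit_area = sum(area(c) for c in clusters if c & resection)
    total_area = sum(area(c) for c in clusters)
    recall = hit_area / area(resection)
    precision = hit_area / total_area if total_area > 0 else 0.0
    return precision, recall
```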
Accurately calculating the false positive rate for the proposed detection
scheme is challenging for several reasons. A patient can have abnormalities
outside the lesion/resection zone that may not be epileptogenic. For example,
abnormal cortical thinning remote from the epileptogenic onset region has been
observed in focal epilepsy [63, 60] and attributed to the destructive impact of
chronic seizures on brain structure rather than to malformations during cor-
tical development. This might result in elevated extra-lesional false positives
when detecting structural malformations characterized by abnormal cortical
thickness. We have compared our detections on MRI-positive patients with
those of an expert neuroradiologist (for details see Section 5.5). In 50% of the
cases, the expert identified abnormal regions that coincided with detections
outside the resection, which would be classified as false positives by our
evaluation methodology because it uses resection zones as the ground truth.
This problem becomes more challenging for MRI-negative patients, whose
structural abnormalities are not visible on their MRI. Because such unlabeled
abnormalities act as false negatives in our labeled data and would inflate
estimates of the false positive rate, we use precision to evaluate the efficacy of
our proposed scheme. Furthermore, given that structural abnormalities may
exist outside the resection zone (false negatives), the precision estimates
provided here should be treated as lower bounds.
5.4 Results
In this section we provide a comprehensive evaluation of the HCRF lesion de-
tection framework for MRI-negative patients. The first set of results uses each
of the four individual features (i.e., cortical thickness, curvature, GWC, and
sulcal depth) on its own. In the second phase we test different strategies for
combining the detection results obtained from individual features.
5.4.1 Individual Features
In our experiments we first evaluate the HCRF framework independently for
each of the four morphological features: cortical thickness, gray/white-matter
contrast, curvature and sulcal depth. We contrast the performance of the
HCRF framework with the z-score based univariate method [96]. In the next
set of experiments we analyze different mechanisms of combining the detec-
tions from individual features, and use both baseline methods to contrast the
performance. Recall that the ranking function (Equation 5.8) has a direct
impact on the detection rate: by setting the tradeoff parameter we can assign
more weight to either cluster size or the average cluster outlier probability.
To facilitate comparison between the proposed method and the baselines, we
initially set the tradeoff parameter α to 1, so that clusters are ranked only by
their surface area.
Figure 5.2: Detection results for patient NY67 using cortical thickness, shown on an inflated model of the lateral cortical surface. The resected region is delineated as the white circled region and the detection results are shown as filled yellow regions. Large clusters are detected within the resection zone at the individual scales (i) and (iii), and small clusters are detected at scale (ii), prior to combining the outlier probabilities using HCRF. However, at the second scale (ii) a large cluster is detected outside the resection. When these findings are combined using the HCRF, as shown in (iv), the largest detected cluster is within the resection zone while the false detection in (ii) is suppressed. (v) shows the detection made by the z-score vertex-based approach. The results are shown for the most stringent (first) threshold without any post-processing. (vi) shows the lesion highlighted on a T1 MRI slice.
Figure 5.3(a) shows the comparison of the detection rates for MRI-negative
patients when cortical thickness is used to represent the cortex. HCRF per-
forms better than the z-score baseline across all five thresholds for the top
five detections. When the top ten largest clusters are considered, HCRF detects
the lesion in 14 (70%) patients, while the baseline detects it in only 11 (55%).
HCRF also achieves higher recall and precision, as shown in Figures 5.3(b)-
5.3(c). The difference between the recall values of the proposed method
(1.1140 ± 0.5654) and the baseline (0.8035 ± 0.4745) was significant at
t(9) = 7.9927, p < 0.001. Similarly, the differences in precision for HCRF
(10.4710 ± 1.0248) and the baseline (9.0608 ± 0.5577) were found to be
significant at t(9) = 6.1161, p < 0.001 using a paired t-test. Figure 5.2 provides
an example of the clusters detected by HCRF and the baseline for one patient.
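The paired t-statistic used throughout this section can be computed as below. The ten paired observations (hence t(9)) appear to be the HCRF and baseline values at the ten detection thresholds plotted in Figure 5.3; that pairing is our reading of the figures, not something the text states. The sketch uses only the standard library and the function name is hypothetical. The p-value requires the t distribution with n − 1 degrees of freedom, which `scipy.stats.ttest_rel` provides together with the same statistic.

```python
import math
import statistics


def paired_t(x, y):
    """Paired t-statistic and degrees of freedom for matched samples x, y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    d = [a - b for a, b in zip(x, y)]              # per-pair differences
    se = statistics.stdev(d) / math.sqrt(len(d))   # std. error of mean diff.
    return statistics.mean(d) / se, len(d) - 1
```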
Using GWC, HCRF detects abnormal clusters within the resection zones of ten
(50%) patients, as opposed to the baseline, which detects only nine (45%), as
shown in Figure 5.3(d). Figure 5.3(e) shows that the recall of the HCRF
method (0.6931 ± 0.3702) is significantly higher (t(9) = 7.1317, p < 0.001)
than the recall of the baseline method (0.3815 ± 0.2334). Figure 5.3(f) compares
the precision of HCRF and the baseline using GWC. The differences in the
precision values for HCRF (7.5286 ± 0.5769) and the baseline (6.4313 ± 0.2987)
were found to be significant at t(9) = 4.2350, p = 0.0022 using a paired t-test.
Although HCRF outperforms the baseline when using GWC, the resulting
detection rate is lower than that of HCRF with cortical thickness.
Figure 5.3(g) shows the comparison of the detection rates using curvature
to represent the cortex. HCRF dominates the z-score baseline across all five
thresholds, for both the top five and top ten detections. HCRF detects abnormal
clusters within the resection zones of 13 (65%) patients, while the baseline
detects only 9 (45%) when the top ten largest clusters are considered.
Figures 5.3(h)-5.3(i) show that HCRF also achieves higher recall and precision,
respectively. The difference between the recall values of the proposed method
(0.9473 ± 0.4927) and the baseline (0.5549 ± 0.3903) was significant at
t(9) = 8.825, p < 0.001. Similarly, the differences in precision for HCRF
(8.5373 ± 1.2063) and the baseline (6.6313 ± 0.6597) were found to be significant
at t(9) = 3.9135, p < 0.0035 using a paired t-test. Figure 5.4 shows the
resulting detections from both HCRF and the baseline when curvature is
used to characterize the cortex for an MRI-negative patient.
[Figure 5.3 panels: (a), (d), (g), (j) detection rate vs. detection threshold (1-5) for Z-Score and HCRF, top-5 and top-10 clusters; (b), (e), (h), (k) recall (%) vs. detection threshold (1-10); (c), (f), (i), (l) precision (%) vs. detection threshold (1-10), HCRF and Z-Score.]
Figure 5.3: Comparison of detection rates, precision and recall between the HCRF based approach and the baseline method using thickness (a)-(c), GWC (d)-(f), curvature (g)-(i) and sulcal depth (j)-(l). Here, α = 1, so that larger clusters are ranked higher (refer to Equation 5.8).
Figure 5.4: Detection results for patient NY294 based on curvature. The white outlined area represents the region that was resected, while the filled yellow patches represent the detected clusters at the first detection threshold for both the HCRF and the z-score based method. The detected clusters at the individual scales are shown in (i)-(iii). Very small (almost negligible) clusters are detected that overlap with the resected region. However, after running belief propagation (iv), a large cluster is detected within the resection zone while the outliers are eliminated. (v) shows the results for the vertex-based z-score method, while (vi) shows the lesion highlighted on a T1 MRI slice.
When sulcal depth is used to represent the cortex, HCRF and the baseline
method achieve the same detection rate: both detect abnormal clusters that
overlap with the resections of 12 (60%) patients (Figure 5.3(j)). However, as
Figures 5.3(k)-5.3(l) show, HCRF achieves higher recall and precision. The
difference between the recall values for HCRF (0.9585 ± 0.5013) and the
baseline (0.4891 ± 0.3165) was significant at t(9) = 7.7730, p < 0.001.
Similarly, the differences in precision for HCRF (8.7008 ± 0.8541) and the
baseline (6.0124 ± 0.4679) were found to be significant at t(9) = 8.0983,
p < 0.001 using a paired t-test.
Using individual features, HCRF achieves a maximum detection rate of 70%,
while the baseline has a maximum detection rate of 60%, when the top ten
largest clusters are considered. For the baseline, sulcal depth and cortical
thickness achieve higher detection rates than GWC and curvature. Cortical
thickness outperforms all other features in terms of average precision and
recall. For the HCRF method, sulcal depth and curvature achieve near-identical
precision and recall, with GWC ranking the lowest.
An important consideration is the degree of consensus between the individ-
ual features with respect to the detected patients. If the features disagree to
some degree, then combining their detections can potentially increase the
overall detection rate. Considering the top ten largest clusters, two patients
were not detected by any of the four features. Cortical thickness and curvature
together detect a total of 16 patients, differing on one patient each. On the
other hand, all but one of the patients detected by GWC and sulcal depth were
also detected by either thickness or curvature. These results suggest that if we
combine the output probabilities of all four features and then use the same
thresholding and ranking technique, we should be able to achieve a detection
rate higher than that of any individual feature. We investigate the combination
of all four features in the next section.
5.4.2 Combining Features
In this section we investigate whether the HCRF based method achieves a
higher detection rate when the detections of the individual features are
combined. As a first strategy, we can simply aggregate the posterior proba-
bilities obtained by applying HCRF to each individual feature. Because every
feature defines its own segmentation of a given parcellation image, the
probabilities obtained at the leaves of the HCRF cannot be aggregated directly.
To solve this issue, we map the posterior probabilities obtained at the leaves of
the HCRF back to the cortical surface for each feature, and then define a
combination rule at every vertex. We use two basic aggregation rules: in the
first, we average the probabilities across the four features, and in the second,
each vertex is assigned the maximum of the four individual probabilities.
Aggregation based on averaging is similar to a majority-vote rule. In this
strategy, vertices for which most of the features have a high probability of
being abnormal will be considered abnormal in the final detection. By
smoothing the outlier probabilities at each vertex, averaging lowers estimation
errors and thereby the false positive rate. The second strategy, which uses the
maximum across the probabilities, labels a vertex as lesional even if only one
of the features assigns it a high outlier probability. This should lead to a
higher detection rate, but also to a larger number of false positives.
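The leaf-to-vertex mapping and the two aggregation rules can be sketched with NumPy indexing, assuming each vertex stores the index of the finest-scale (leaf) segment that contains it for the given feature. The array layout and function names are hypothetical.

```python
import numpy as np


def leaf_probs_to_vertices(leaf_probs, vertex_leaf):
    """Map posterior probabilities at the HCRF leaves back to vertices.

    leaf_probs:  (n_leaves,) posterior probability of each finest-scale segment
    vertex_leaf: (n_vertices,) index of the leaf segment containing each vertex
    """
    return np.asarray(leaf_probs)[np.asarray(vertex_leaf)]


def combine(per_feature_vertex_probs, rule="average"):
    """Vertex-wise aggregation across features: 'average' or 'maximum'."""
    p = np.vstack(per_feature_vertex_probs)  # (n_features, n_vertices)
    if rule == "average":
        return p.mean(axis=0)
    if rule == "maximum":
        return p.max(axis=0)
    raise ValueError(f"unknown rule: {rule}")
```

Because each feature has its own segmentation, the mapping is done per feature with that feature's `vertex_leaf`; only after this step do all features share the same vertex index.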
Performance Comparison to Baseline Methods
We compare the performance of HCRF with combined features to both the
z-score method (ZSC) [96] and the vertex-based classifier (ML) developed in
Chapter 3. For ZSC we calculate a single z-score estimate at each vertex
by averaging, or taking the maximum of, the z-scores calculated for each
individual feature. Similarly, for the ML technique we combine the probabil-
ities calculated at each vertex as the average (or maximum) across the bag
of logistic regression classifiers. Figures 5.5 and 5.6 contrast the results of
HCRF under the two aggregation strategies with the z-score based baseline
method and the logistic regression based method from Chapter 3 (ML),
respectively.
[Figure 5.5 panels: (a), (d) detection rate vs. detection threshold (1-5) for Z-Score and HCRF, top-5 and top-10 clusters; (b), (e) recall (%) and (c), (f) precision (%) vs. detection threshold (1-10); (g)-(i) the same measures for HCRF with cortical thickness, GWC, curvature, sulcal depth, and the average and maximum over all features.]
Figure 5.5: Comparison of detection rates, precision and recall between the HCRF based approach and the z-score based baseline method when the detection scores are averaged across features (a)-(c), and when the final output score is computed as the maximum across features (d)-(f). (g) contrasts the detection rate of both aggregation strategies with that of the individual features when the top ten largest clusters are considered, and (h)-(i) provide the same comparison for recall and precision. Note that α = 1, so that larger clusters are ranked higher (refer to Equation 5.8).
Figure 5.5(a) shows the detection rates when the probabilities are averaged
across features. The baseline performs better than HCRF at the early
thresholds; however, HCRF produces better results as the threshold becomes
more lenient. Considering the top ten largest clusters, HCRF achieves a
detection rate of 60%, which is slightly higher than that of the baseline (55%).
The recall and precision of the HCRF method are significantly higher than
those of the baseline, as shown in Figures 5.5(b)-5.5(c). Using a paired t-test,
the difference in the recall values of HCRF (1.0206 ± 0.5797) and the baseline
(0.7046 ± 0.4410) was significant at t(9) = 7.1317, p < 0.001, and the difference
in the precision values of HCRF (9.2946 ± 0.9708) and the baseline
(8.1422 ± 0.5595) was significant at t(9) = 4.2350, p = 0.0022.
[Figure 5.6 panels: (a), (d) detection rate vs. detection threshold (1-5) for ML and HCRF, top-5 and top-10 clusters; (b), (e) recall (%) and (c), (f) precision (%) vs. detection threshold (1-10), HCRF and ML.]
Figure 5.6: Comparison of detection rates, precision and recall between the HCRF based approach and the logistic regression based baseline method when the detection scores are averaged across features (a)-(c), and when the final output score is computed as the maximum across features (d)-(f). Note that α = 1, so that larger clusters are ranked higher (refer to Equation 5.8).
Similarly, the ML baseline method achieves a maximum detection rate of
55%, which is lower than the detection rate of HCRF, as shown in Figure 5.6(a).
In terms of precision, ML achieves a higher average precision than HCRF.
However, using a paired t-test, the difference in the precision values of HCRF
(9.2946 ± 0.9708) and the ML baseline (9.3424 ± 0.4004) was not significant
(t(9) = −0.1975, p = 0.8479). On the other hand, HCRF has a higher average
recall (1.0206 ± 0.5797) than the ML method (0.8199 ± 0.4702), and the
difference in the average recall values was significant at t(9) = 5.1051,
p < 0.001.
When the posterior probability at each vertex is calculated as the maximum
across the four features, HCRF achieves a higher detection rate than the
z-score method, as shown in Figure 5.5(d). When the top ten largest clusters
are considered, HCRF detects abnormal clusters within the resection zones of
13 (65%) patients, while the z-score method detects only 10 (50%) and the
logistic regression based method (ML) only 9 (45%). HCRF achieves higher
recall (t(9) = 8.825, p < 0.001) and precision (t(9) = 3.9135, p < 0.0035)
than the z-score based baseline, as shown in Figures 5.5(e)-5.5(f), respectively.
Compared to the ML method, HCRF has a higher average recall (t(9) = 2.3474,
p < 0.05) but a lower average precision (t(9) = −1.0694, p = 0.3127), although
the difference in precision is not significant.
Performance Comparison to Individual Features
A comparison of the detection rates of both aggregation strategies with those
of the individual features is shown in Figure 5.5(g). Both strategies perform
worse than cortical thickness, achieving a maximum detection rate of 65% when
the top ten largest clusters are considered; curvature achieves the same
detection rate. Similarly, in terms of recall and precision, neither combination
strategy outperforms any of the individual features except GWC.
Both the averaging and maximum strategies achieved lower recall and pre-
cision than cortical thickness (Figures 5.5(h) and 5.5(i), respectively). The
maximum strategy also achieved lower precision than both curvature and
sulcal depth. For the averaging technique, on the other hand, the differences
in recall and precision compared to curvature and sulcal depth were not
significant.
One reason the combined strategies fail to perform better is that each feature
has its own idiosyncrasies, which, when not accounted for, introduce noise into
the ranking/detection process. As an example, consider sulcal depth and
curvature. When used within the HCRF framework, both features achieve
similar precision and recall but different detection rates. This suggests that
although sulcal depth detects clusters within the resection zones of patients,
it also detects larger clusters outside the resection zone. If the detections of
sulcal depth and curvature are combined, the noisy clusters detected by sulcal
depth can lower the overall detection rate. There are two possible solutions:
1) select only informative features and discard the noisy ones, and 2) tune the
tradeoff parameter (α) in the ranking function (Equation 5.8) so that the ranks
of smaller, highly abnormal clusters remain resilient to the presence of larger
noisy clusters. Note that changing the ranking function has no effect on the
overall precision and recall, because cluster ranking only influences the
detection rate.
To explore option 1, feature selection, we selected cortical thickness and
curvature because of their higher detection rates, precision and recall. We
employ the same aggregation strategies as before, namely averaging and max-
imum. Figure 5.7 shows that when only thickness and curvature are used,
both the averaging and maximum strategies produce higher detection rates,
precision and recall than the baseline (Figures 5.7(a)-5.7(f)).
More interestingly, when compared to the individual features, the combination
of curvature and cortical thickness achieves significantly higher precision and
recall, with the exception of cortical thickness (for which the average precision
and recall are higher but the differences are not statistically significant), as
shown in Figures 5.7(h)-5.7(i), respectively. However, as Figure 5.7(g) shows,
the detection rate, although higher than that of the all-feature aggregation
strategies, does not exceed that achieved by thickness alone.
[Figure 5.7 panels: (a), (d) detection rate vs. detection threshold (1-5) for Z-Score and HCRF, top-5 and top-10 clusters; (b), (e) recall (%) and (c), (f) precision (%) vs. detection threshold (1-10); (g)-(i) the same measures for the individual features and for the average and maximum of thickness and curvature.]
Figure 5.7: Comparison of detection rates, precision and recall between the HCRF based approach and the z-score based baseline method when the detection scores are averaged across thickness and curvature (a)-(c), and when the final output score is computed as the maximum between the two features (d)-(f). (g) contrasts the detection rate of both averaging and maximum with that of the individual features when the top ten largest clusters are considered, and (h)-(i) compare the recall and precision. Note that α = 1, so that larger clusters are ranked higher (refer to Equation 5.8).
Next, we explore the effect of tuning the size/probability tradeoff param-
eter α on the detection rate of both individual and combined strategies.
5.4.3 Ranking Criterion and the Detection Rate
Thus far we have fixed the ranking criterion to be the size of the detected clus-
ter. However, as defined in Equation 5.8, we can tune the tradeoff parameter
so that the cluster ranking criterion accounts for both the size and the
average outlier probability of the cluster. To ascertain how α influences the
performance of the HCRF framework, we varied α over its entire range of values
and determined the detection rate for individual features, the combination of
all features, and the combination of the top two features.
For a given input feature (or combination of features), we sampled the range
α ∈ [0, 1] at 21 uniformly spaced points (i.e., twenty equal steps of 0.05). At
each resulting value of α we re-estimated the ranking of the clusters. The
detection rate corresponding to each value of α was determined as the maximum,
across the first five thresholds, of the detection rate obtained using the top ten
ranked clusters. Figure 5.8(a) shows the detection rates of each individual
feature for different values of α. Both cortical thickness and sulcal depth
achieve their maximum detection rates when α = 1, whereas curvature does so
for the intermediate values α = 0.50 and α = 0.75 (50% and 75% on the axis of
Figure 5.8). On the other hand, the detection rate of GWC drops as α increases.
Thus, every feature has its own idiosyncratic dependency on α, which should be
taken into account when combining the outputs of multiple features, especially
because the goal is to improve the overall detection rate.
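The α sweep can be sketched as below, assuming that for every threshold and patient the detected clusters are available as (s, o, overlaps-resection) triples; this data layout and the function names are hypothetical. `np.linspace(0, 1, 21)` produces the 21 uniformly spaced values of α.

```python
from typing import List, Sequence, Tuple

import numpy as np

# (s, o, overlaps_resection): relative area, mean outlier probability, hit flag
ClusterInfo = Tuple[float, float, bool]


def detection_rate_at_alpha(per_threshold: List[List[Sequence[ClusterInfo]]],
                            alpha: float, top_n: int = 10) -> float:
    """Best detection rate over thresholds for one value of alpha.

    per_threshold[t][p] lists the clusters of patient p at threshold t.
    """
    best = 0.0
    for patients in per_threshold:
        detected = 0
        for clusters in patients:
            # Rank by Eq. 5.8: alpha * s + (1 - alpha) * o.
            ranked = sorted(clusters,
                            key=lambda c: alpha * c[0] + (1.0 - alpha) * c[1],
                            reverse=True)
            if any(hit for _, _, hit in ranked[:top_n]):
                detected += 1
        best = max(best, detected / len(patients))
    return best


def sweep_alpha(per_threshold, n_points: int = 21, top_n: int = 10):
    """Detection rate at n_points uniformly spaced values of alpha in [0, 1]."""
    alphas = np.linspace(0.0, 1.0, n_points)
    rates = [detection_rate_at_alpha(per_threshold, a, top_n) for a in alphas]
    return alphas, rates
```

Re-ranking is cheap compared with re-running the HCRF, since only the sort order changes with α; the thresholds and clusters stay fixed.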
Figures 5.8(b)-5.8(c) compare the influence of α on the detection rates
of the two combination strategies (averaging and maximum) when all the fea-
tures are used and when only cortical thickness and curvature are used, re-
spectively. The highest detection rate, 75%, results from averaging the
posterior probabilities of cortical thickness and curvature (Figure 5.8(c)).
[Figure 5.8 plot panels: (a)-(c) detection rate versus α (%) for cortical thickness, GWC, curvature and sulcal depth, and for the maximum and average combinations of all features or of thickness and curvature; (d) detection rate of the z-score baseline and HCRF for thickness, GWC, curvature, sulcal depth, average of all, maximum of all, average of the top two and maximum of the top two features.]
Figure 5.8: Effect of α on the detection rates of (a) individual features, (b) the combination of all four morphological features and (c) the combination of the top two ranked features. In (b) and (c) we have omitted the detection rates of sulcal depth and GWC to improve the clarity of the plot. (d) compares the overall median detection rate of HCRF with the baseline method using different input features and their combinations across the entire range of α.

To contrast the performance of the HCRF-based method under different input settings across the entire range of α, we used a two-sided Wilcoxon signed-rank test [46]. Figure 5.8(d) compares the median detection rate of HCRF and the z-score based method for different input features (and their combinations). For each individual feature except sulcal depth, HCRF achieves a higher detection rate than the z-score based baseline; for sulcal depth, the baseline outperforms HCRF. Similarly, when cortical thickness and curvature are combined to define the final detection, HCRF dominates the baseline, achieving the highest detection rate (75%) by averaging the posterior probabilities of the two input features. However, when the same averaging technique is used to combine the results of all four features, HCRF and the baseline perform comparably (albeit worse than with only two features), and the difference in their performance across the different values of α is not statistically significant.
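For readers who want to reproduce this kind of comparison, here is a minimal, dependency-free sketch of a two-sided exact Wilcoxon signed-rank test on paired detection rates. It assumes one HCRF rate and one baseline rate per α value. The function name and the example numbers are hypothetical and illustrative only; they are not the values behind Figure 5.8.

```python
from typing import List, Sequence, Tuple


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided exact Wilcoxon signed-rank test for paired samples.

    Zero differences are dropped and tied |differences| receive average ranks.
    Returns (W_plus, p_value), where W_plus sums the ranks of positive x - y.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    # Round away floating-point noise so equal differences are treated as ties.
    diffs = [round(a - b, 12) for a, b in zip(x, y)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Rank |d| from 1..n, giving tied values the average of their ranks.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks: List[float] = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)

    # Exact null distribution: each rank carries a + or - sign with probability 1/2.
    # Doubling the ranks turns half-integer (tied) ranks into integers for counting.
    doubled = [round(2 * r) for r in ranks]
    total = sum(doubled)
    counts = [1] + [0] * total  # counts[s] = number of sign patterns with doubled sum s
    for r in doubled:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]

    # Two-sided p: patterns at least as far from the null mean as the observed one.
    center = total / 2
    observed_gap = abs(round(2 * w_plus) - center)
    extreme = sum(c for s, c in enumerate(counts) if abs(s - center) >= observed_gap)
    return w_plus, min(1.0, extreme / 2 ** n)


# Hypothetical detection rates at six alpha values (illustrative numbers only).
hcrf = [0.60, 0.65, 0.70, 0.75, 0.70, 0.65]
base = [0.55, 0.55, 0.55, 0.55, 0.50, 0.50]
print(wilcoxon_signed_rank(hcrf, base))  # (21.0, 0.03125)
```

When SciPy is available, `scipy.stats.wilcoxon` provides the same test with more options. Note that with only six paired values, the smallest attainable two-sided exact p-value is 2/2^6 ≈ 0.031, even when every difference favors the same method.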
5.5 HCRF versus Human Expert
In 2014, we compared the results of HCRF detection to those of a senior neuroradiologist, Dr. Ruben Kuzniecky², at New York University's comprehensive epilepsy treatment center, one of the world's leading tertiary epilepsy treatment centers. Dr. Kuzniecky was first presented with the anonymized clinical scans of each subject, which included T1- and T2-weighted MRI along with FLAIR. He was then asked to identify any possible abnormality while blinded to all post-surgical data and to the HCRF detection results. We compared Dr. Kuzniecky's findings to the clusters found by HCRF at the most lenient (fifth) threshold, using cortical thickness as the feature of interest.
For this experiment, we required patients who had a full pre-surgical clinical MRI dataset (e.g., T2-weighted images and FLAIR). Pre-surgical clinical MRI sequences were not obtainable for many subjects because they had been referred from external centers. The resulting set included six of the MRI-positive patients and only three of the seizure-free MRI-negative patients. Because their clinical scans were available, we additionally included three MRI-negative patients who did not have an Engel class 1 outcome (i.e., they were not seizure-free), giving six MRI-negative patients in total.

² Dr. Kuzniecky is Co-Director of the NYU Comprehensive Epilepsy Center and Director of Epilepsy Research. He trained in neurology, epilepsy, and EEG at the Montreal Neurologic Institute, McGill University, Canada. He has authored over three books, 36 chapters, and over 250 journal articles on a number of topics related to epilepsy, and has received epilepsy research grants from the NIH and numerous foundations. He is Co-Principal Investigator of the Epilepsy Phenome Genome Project (EPGP), the largest epilepsy genetic study of its type, funded by the National Institute of Neurological Disorders and Stroke (NINDS). He is also a Co-Principal Investigator of the Human Epilepsy Project, a new international initiative to investigate biomarkers in epilepsy. His research interests focus on brain imaging, malformations of brain development, and epilepsy. Dr. Kuzniecky has been recognized multiple times in "Best Doctors in America" and with many honorary lectures around the world.
For the MRI-positive patients, the radiologist detected abnormalities in all six cases. These findings corresponded to HCRF detections among the top five clusters for five subjects, and among the top ten clusters for all six. For three of the MRI-positive subjects, his findings also identified abnormal regions that overlapped with extra-lesional HCRF clusters ranked among the top ten. An example is depicted in Figure 5.9. For the MRI-negative subjects, the radiologist was unable to identify a visible abnormality in any of the six. HCRF, on the other hand, identified clusters within the resection zone for three of the five subjects with Engel class 1-3 outcomes. For the one MRI-negative subject with an Engel class 4 outcome, none of the top ten HCRF clusters overlapped the resection zone.
These preliminary results on the MRI-negative patients, albeit on a small sample, are promising: HCRF identified high-ranked clusters within the resection zones of MRI-negative patients who achieved complete to partial seizure freedom after surgery. Overall, the results suggest that the HCRF approach is more sensitive than an expert radiologist to histopathologically confirmed lesions that are not visible on MRI, even when a full set of clinical MRI sequences is available for review.
5.6 Conclusion
Any method for automated detection of FCD lesions is meant to augment the standard comprehensive clinical evaluation protocol for epilepsy surgery candidates. This standard protocol typically involves a neurological exam, scalp electroencephalography (EEG), a neuropsychological exam, positron emission tomography (PET), and magnetic resonance imaging (MRI). Because widespread network abnormalities are common in focal epilepsy, each of these methods has a high false positive rate. Thus, convergence of evidence from multiple sources is critical to determining the region(s) most likely to host the seizure onset zone. In this work, we addressed the challenging task of detecting FCD lesions within a semi-supervised image segmentation framework, and developed a novel semi-supervised segmentation method based on hierarchical conditional random fields (HCRF). We evaluated the proposed method on four morphological features, and also investigated different mechanisms for combining the outcomes of these input features.

Figure 5.9: An MRI-positive patient (NY363) for whom the clusters detected outside the resection are also abnormal. Detected clusters using HCRF (a)-(b); the resected area (c); and the area corresponding to the largest cluster outside the resection (d), shown on a T1 MRI slice.
In an empirical evaluation involving twenty histologically verified MRI-negative patients who had undergone resective surgery and were subsequently seizure-free, our proposed method achieved higher detection rates than a baseline method using four morphological features. When the detections based on these features were combined, HCRF still detected abnormal clusters within the resection zones of more patients than the selected baseline. Not only did the proposed method have a high detection rate, it also achieved significantly higher precision and recall across all features and their combinations.
In this work we establish that each of the four morphological features, namely cortical thickness, GWC, curvature and sulcal depth, exhibits different behavior for different settings of the cluster ranking criterion, and that some features produce noisier detections than others. These two observations indicate that any method that aims to combine detections from different features should consider feature-specific properties, such as the false positive rate, and adjust the ranking criterion accordingly to achieve a higher detection rate.
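The following small Python sketch illustrates how the ranking criterion might be adjusted per feature. Equation 5.8 is not reproduced here, so the score α·(area / maximum area) + (1 - α)·(mean probability) is an assumed form. It was chosen only because it ranks purely by surface area when α = 1, as described in Appendix B. The `Cluster` class and all values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    label: str
    area_mm2: float   # surface area of the cluster (hypothetical units)
    mean_prob: float  # average abnormality probability of its vertices


def rank_clusters(clusters: List[Cluster], alpha: float) -> List[Cluster]:
    """Rank clusters by an ASSUMED score alpha*area_norm + (1 - alpha)*mean_prob.

    This is a hypothetical stand-in for Equation 5.8; alpha = 1 ranks by area only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    max_area = max(c.area_mm2 for c in clusters)

    def score(c: Cluster) -> float:
        return alpha * (c.area_mm2 / max_area) + (1.0 - alpha) * c.mean_prob

    return sorted(clusters, key=score, reverse=True)


clusters = [Cluster("A", 400.0, 0.55), Cluster("B", 120.0, 0.95), Cluster("C", 250.0, 0.70)]
for alpha in (1.0, 0.5, 0.0):
    print(alpha, [c.label for c in rank_clusters(clusters, alpha)])
```

Under this assumed score, a feature that produces many large but spurious clusters would favor a smaller α. One option would be to select a separate α for each feature on held-out patients, although this is a suggestion rather than a procedure described in the dissertation.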
Because identifying the abnormal region in cryptogenic epilepsy is a multifaceted procedure based on a confluence of evidence from multiple sources, the high detection rate of our proposed method could have a substantial impact in the application domain by enhancing the sensitivity of the patient evaluation methodology, as compared to conventional visual assessment of the patient's MRI by trained radiologists. Indeed, our 75% detection rate on the MRI-negative patients in our evaluation dataset (compared to a human expert detection rate of 0%) suggests that this method can be an effective tool in the pre-surgical evaluation of TRE patients who are likely to undergo surgical resection.
Chapter 6
Conclusion
This thesis demonstrates that machine learning methods can be used to in-
crease the sensitivity of identifying epileptogenic cortical malformations in
treatment-resistant epilepsy patients, whose MRI evaluations are deemed nor-
mal by expert neuro-radiologists.
In Chapter 3, we investigated the main confounding factors that inhibit the performance of supervised learning algorithms for lesion detection, and designed a customized classification scheme tailored specifically to counter them. Our analysis showed that using training data from both MRI-negative and MRI-positive patients, coupled with data pre-processing, led to a higher detection rate for MRI-negative patients than a current automated lesion detection method actively used at New York University's comprehensive epilepsy treatment center (our data collection source). In the course of this analysis, we identified label noise as the leading confounding factor, along with inter-subject and intra-subject variations in brain morphology. To counter label noise, we performed ad hoc manual reduction of the resected region in patients, which improved performance.
Chapter 4 proposed a novel multitask learning algorithm able to incorporate multiple sources of supervision. From the perspective of lesion detection, we reinforced the weak vertex labels provided by the resected regions by using the results of the invasive iEEG exam as an additional source of supervision. Similarly, we treated each patient as a separate learning task, with the goal of countering the effects of inter-subject variations in brain morphology. Our evaluation on a dataset consisting of patients with identical resection regions and their matched controls showed that the proposed algorithm was able to detect abnormal regions within the resected regions of all patients. We further established that using auxiliary supervision increased the detection rate, and that the multitask formulation reduced the false positive rate measured on control data, as compared to recently reported methods of lesion detection.
Having identified label noise as the main confounding factor when using training data from MRI-negative patients, in Chapter 5 we cast lesion detection as an outlier detection problem. To this end, we developed a lesion detection framework based on semi-supervised hierarchical conditional random fields (HCRF). This method employed a multiscale strategy to locate the lesion and explicitly modeled spatial dependencies among neighboring vertices. Furthermore, we proposed a cluster ranking criterion that ranks clusters based on their size and their average probability of being abnormal. We evaluated this method on both MRI-negative and MRI-positive patients, and it achieved higher detection rates, with higher precision and recall, than two baseline methods. Moreover, HCRF correctly detected abnormal regions overlapping the resection zones of more MRI-negative patients than an expert neuroradiologist, who was unable to detect any abnormalities even with access to more comprehensive imaging data.
One of the primary aims of the proposed methods for lesion detection in MRI-negative patients has been to counter the effects of label noise. To this end, we performed manual mask reduction (Chapter 3), augmented the weak labels with the results of iEEG as an additional source of supervision (Chapter 4), and discarded vertex-level labels, defining cortical lesions as outliers relative to the same regions in normal (control) brains (Chapter 5). Another confounding factor is the inter-regional variation in feature distributions within a brain, together with the inter-subject variation in brain morphology arising from age, handedness and other demographic factors. The vertex-level classifier developed in Chapter 3 z-score normalized the feature values using control data prior to training, to counter the effects of inter-subject variation. Similarly, the multitask method developed in Chapter 4 modeled each patient (and a matched control) as a separate task to reduce the effects of differences in brain morphology between individuals. However, neither of these methods took inter-regional variation of feature values into account. In contrast, the HCRF framework (Chapter 5) models both inter-regional and inter-subject variation by comparing a given cortical region from a patient to the same region across the control population, and assigning outlier scores based on the k controls most similar to it in a suitable feature space. By correcting for all three factors (label noise, inter-regional and inter-subject variation), HCRF achieves higher performance than the other two methods.
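The region-versus-controls comparison can be illustrated with a minimal k-nearest-neighbour outlier score. This sketch assumes that each cortical region is summarized by a feature vector on comparable scales (for example, after z-scoring) and that the outlier score is the mean Euclidean distance to the k closest controls at the same region. Both the distance and the averaging are assumptions; the score actually used by HCRF is defined in Chapter 5.

```python
import math
from typing import List, Sequence


def knn_outlier_score(patient_region: Sequence[float],
                      control_regions: List[Sequence[float]],
                      k: int) -> float:
    """Mean Euclidean distance from a patient's region descriptor to the k
    most similar controls, each described at the same cortical region.

    A larger score means the patient's region is unlike even its closest controls.
    """
    if not 1 <= k <= len(control_regions):
        raise ValueError("k must be between 1 and the number of controls")
    distances = sorted(math.dist(patient_region, c) for c in control_regions)
    return sum(distances[:k]) / k


# Hypothetical two-feature descriptors of one region in three control brains.
controls = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
print(knn_outlier_score((0.0, 0.0), controls, k=2))   # 2.5
print(knn_outlier_score((9.0, 12.0), controls, k=2))  # 7.5
```

Comparing only against the k closest controls, rather than the whole control population, is what allows the score to tolerate normal inter-subject variation. A patient region is flagged only when it is far even from the controls that resemble it most.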
The superior performance of the methods proposed in this thesis, compared to both existing methods and human experts, suggests that machine learning based automated detection of epileptogenic malformations can serve as an additional information source that adds weight to candidate brain regions for further radiologic or electrophysiological probing as part of the pre-surgical protocol. Ultimately, such methods can serve as an additional check to ensure that subtle, visually elusive regions are not overlooked. This is critical, considering that expert radiologists were unable to identify an abnormality in any of the MRI-negative patients, whereas the detection rates of the proposed methods ranged from 58% to 75%. This suggests that some electrophysiologically and histopathologically abnormal regions are not visually apparent to the human eye but can be detected with machine learning methods. However, as with all other information sources in the comprehensive epilepsy evaluation, results from automated detection methods should never be considered in isolation to identify a target brain region for surgical removal. Rather, they could become a valuable addition to the clinical comprehensive epilepsy evaluation protocol, particularly if they add weight to potentially abnormal regions that may have been overlooked by standard radiological review. Similarly, the results of the proposed methods can inform electrode placement for the iEEG evaluation, thus increasing the chances of finding the lesion(s) prior to the surgical procedure. Identifying such abnormal regions prior to surgical resection has been shown to increase the chances of a patient becoming seizure-free (66%) compared to cases in which no potential targets are detected (29%).
Future avenues of research include the use of features derived from other imaging techniques, such as positron emission tomography (PET) and fluid-attenuated inversion recovery (FLAIR) MRI. Similarly, morphological features can be combined with network-based features derived from sources such as diffusion tensor imaging (DTI), diffusion-weighted imaging (DWI), functional MRI (fMRI) and magnetoencephalography (MEG). There is also a need for automated lesion detection methods that can work with data collected at different imaging centers (i.e., heterogeneous scanners and scanning sequences), as most current methods (including the work presented in this thesis) rely on single-source data.
Appendix A
Patient Information
Participants were selected from a large registry of patients with epilepsy treated
at the New York University School of Medicine Comprehensive Epilepsy Center
who signed consent for a research MRI scanning protocol. Criteria for inclusion
in this study included: (1) completion of a high resolution T1-weighted MRI
scan; (2) surgical resection to treat focal epilepsy; (3) diagnosis of FCD on
neuropathological examination of the resected tissue. This appendix provides
the demographic and seizure-related information for these participants.
Imaging for the research protocol was performed at the New York University Center for Brain Imaging on a Siemens Allegra 3T scanner. Image acquisitions included a conventional 3-plane localizer and a T1-weighted volume pulse sequence (TE = 3.25 ms, TR = 2530 ms, TI = 1100 ms, flip angle = 7°, field of view (FOV) = 256 mm, matrix = 256×256, voxel size = 1×1×1.3 mm, scan time: 8:07 min). Acquisition parameters were optimized for increased gray/white matter image contrast. The T1-weighted image was reoriented into a common space, roughly similar to alignment based on the AC-PC line. Images were corrected for nonlinear warping caused by non-uniform fields created by the gradient coils.
Clinical imaging sequences for radiological review were acquired at the
NYU Department of Radiology on a 3-Tesla Siemens scanner. Clinical se-
quences were variable across patients but commonly included a high-resolution
T1-weighted MPRAGE (magnetization-prepared rapid gradient echo), T2-
weighted images (axial and coronal, varying slice thickness from 1 to 3 mm),
and fluid-attenuated inversion recovery (FLAIR) images (26 mm slice thick-
ness). The research T1-weighted MPRAGE images used in our analyses were
included in the set of images reviewed by the clinical radiology team. Con-
ventional visual analysis of the clinical scans resulted in an MRI diagnosis of
FCD in 13 patients (MRI-positive) and a normal report in 24 patients (MRI-
negative). The higher number of MRI-negative patients in this sample may
be due to a tendency for patients with more complex, MRI-negative epilepsy
to be referred to the Level 4 epilepsy treatment center.
Table A.1 provides the demographic and seizure-related information for the MRI-positive patients. Note that three MRI-positive patients, namely NY68, NY116 and NY169, were initially classified as MRI-negative; however, on a later evaluation they were re-classified as MRI-positive. These three patients appear as MRI-negative in Chapter 3.

Table A.2 provides the demographic and seizure-related information for the MRI-negative patients. All Engel class I patients except three were used for evaluating the HCRF framework (Chapter 5). The three excluded patients were NY212, NY297 and NY312: NY212 was discarded because of missing scan sequence data, while the other two were discarded because they had incomplete post-surgical histopathological data.
Patient  Location              Age    Sex  Seizure Onset Age  Seizure Frequency  Engel Class
NY68     L Temporal            26     M    15                 12                 2
NY116    R Temporal            30     M    22                 84                 1
NY123    L Parietal            14     M    7                  730                2
NY143    R Frontal             38     F    4                  1248               1
NY156    L Parietal & Frontal  20     M    7                  182                2
NY169    R Temporal            26     M    3                  1277               1
NY174    R Temporal & Frontal  16     M    9                  52                 2
NY187    L Temporal            45     F    5                  14                 1
NY342    R Temporal & Frontal  47     F    32                 12                 3
NY363    L Temporal            24     F    13                 52                 3
NY388    L Frontal             30     F    8                  1640               1
NY453    R Temporal            18     F    8                  12                 1
NY459    L Basal Frontal       17     M    1.5                365                1
Mean                           27.00       10.35              436.92
Table A.1: Demographic and seizure-related information for the MRI-positive patients.
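As a quick arithmetic check, the Mean row of Table A.1 can be recomputed from the rows above. The dictionary below copies the table's values, and only the Python standard library is used.

```python
from statistics import fmean

# (age, seizure onset age, seizure frequency), copied from Table A.1
mri_positive = {
    "NY68": (26, 15, 12), "NY116": (30, 22, 84), "NY123": (14, 7, 730),
    "NY143": (38, 4, 1248), "NY156": (20, 7, 182), "NY169": (26, 3, 1277),
    "NY174": (16, 9, 52), "NY187": (45, 5, 14), "NY342": (47, 32, 12),
    "NY363": (24, 13, 52), "NY388": (30, 8, 1640), "NY453": (18, 8, 12),
    "NY459": (17, 1.5, 365),
}
ages, onsets, freqs = zip(*mri_positive.values())
print(f"{fmean(ages):.2f} {fmean(onsets):.2f} {fmean(freqs):.2f}")  # 27.00 10.35 436.92
```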
Patient  Location                       Age    Sex  Seizure Onset Age  Seizure Frequency  Engel Class
NY46     R Temporal                     41     M    3                  52                 1
NY67     R Temporal                     27     M    13                 1825               1
NY72     R Temporal                     46     M    2                  74                 2
NY148    L Temporal                     37     M    35                 3                  2
NY149    R Frontal                      32     F    11                 1460               1
NY159    R Parietal                     21     F    8                  2190               1
NY171    R Temporal                     26     F    19                 5                  4
NY177    L Temporal                     38     F    19                 5                  3
NY186    L Temporal                     35     F    6                  1095               2
NY212    L Temporal                     37     M    21                 166                1
NY226    L Temporal                     40     F    5                  8                  1
NY255    R Temporal                     20     F    15                 48                 1
NY259    L Temporal                     26     F    9                  288                2
NY294    R Temporal                     51     F    1                  12                 1
NY297    R Temporal                     51     F    8                  52                 1
NY299    R Temporal                     28     F    13                 37                 2
NY312    L Temporal                     43     F    6                  24                 1
NY315    L Occipital                    47     F    9                  12                 1
NY322    R Frontal, Insular & Temporal  24     F    9                  12                 1
NY338    R Temporal                     30     M    19                 120                1
NY343    R Temporal                     32     M    21                 1825               1
NY351    L Temporal                     30     M    12                 12                 1
NY371    R Temporal                     17     M    17                 365                1
NY375    R Temporal                     16     F    2                  54                 1
NY394    R Temporal                     27     M    19                 72                 1
NY404    R Temporal                     51     F    45                 6                  1
NY441    L Temporal                     41     M    31                 72                 1
NY451    R Inferior Parietal            25     M    9                  912                1
NY455    L Temporal                     61     M    19                 12                 1
NY486    L Temporal                     29     F    27                 96                 1
Mean                                    34.30       14.43              363.80
Table A.2: Demographic and seizure-related information for the MRI-negative patients.
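The size of the HCRF evaluation cohort follows directly from Table A.2 and the exclusions listed above: all Engel class I patients except NY212, NY297 and NY312. The sketch below copies the Engel classes from the table and recovers the twenty MRI-negative patients used in Chapter 5.

```python
# Engel class per MRI-negative patient, copied from Table A.2
engel = {
    "NY46": 1, "NY67": 1, "NY72": 2, "NY148": 2, "NY149": 1, "NY159": 1,
    "NY171": 4, "NY177": 3, "NY186": 2, "NY212": 1, "NY226": 1, "NY255": 1,
    "NY259": 2, "NY294": 1, "NY297": 1, "NY299": 2, "NY312": 1, "NY315": 1,
    "NY322": 1, "NY338": 1, "NY343": 1, "NY351": 1, "NY371": 1, "NY375": 1,
    "NY394": 1, "NY404": 1, "NY441": 1, "NY451": 1, "NY455": 1, "NY486": 1,
}
excluded = {"NY212", "NY297", "NY312"}  # missing scans or incomplete histopathology
hcrf_cohort = sorted(p for p, c in engel.items() if c == 1 and p not in excluded)
print(len(hcrf_cohort))  # 20
```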
Appendix B
HCRF Results for MRI-Positive Patients
We tested the proposed HCRF framework on a set of thirteen MRI-positive patients (refer to Appendix A for patient information), using four individual morphological features: cortical thickness, gray/white contrast (GWC), curvature and sulcal depth. HCRF was evaluated using the same parameter settings as described in Section 5.3.1. The detection rate was determined by setting α to 1 (cf. Equation 5.8), so that all clusters are ranked based only on their surface area. We include these results to demonstrate the superior performance of HCRF compared to the z-score based baseline method.
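The detection-rate criterion used throughout this appendix can be sketched as follows. A patient counts as detected if any of the top-k clusters, ranked by size as in the α = 1 setting, overlaps the resection zone. This sketch assumes that clusters and resections are sets of surface-vertex indices, and it uses vertex count as a stand-in for surface area. The function names and toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Patient = Tuple[List[Set[int]], Set[int]]  # (detected clusters, resected vertices)


def detected(clusters: List[Set[int]], resection: Set[int], top_k: int) -> bool:
    """True if any of the top_k largest clusters shares a vertex with the resection.

    Ranking by size alone mirrors the alpha = 1 setting; vertex count stands in
    for surface area here, which is a simplifying assumption.
    """
    ranked = sorted(clusters, key=len, reverse=True)
    return any(cluster & resection for cluster in ranked[:top_k])


def detection_rate(patients: Dict[str, Patient], top_k: int) -> float:
    """Fraction of patients with at least one top_k cluster inside the resection."""
    hits = sum(detected(clusters, resection, top_k) for clusters, resection in patients.values())
    return hits / len(patients)


toy = {
    "p1": ([{1, 2, 3}, {10, 11}, {20}], {20}),  # hit only if 3 clusters are considered
    "p2": ([{5, 6}], {6}),
}
print(detection_rate(toy, top_k=2), detection_rate(toy, top_k=3))  # 0.5 1.0
```

For example, 12 detected patients out of 13 gives 12/13 ≈ 0.92, which matches the 92% reported below for cortical thickness.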
Figure B.1: Detection results for an MRI-positive patient (NY156) shown on an inflated model of the lateral cortical surface. The actual lesion is delineated as the white circled region and the detection results are shown as filled yellow regions. Detected clusters after thresholding outlier probabilities at each individual scale (a)-(c), after running belief propagation (d), and using the z-score based approach (e). The results are shown for the highest ranking threshold without any post-processing. (f) shows the lesion highlighted on a T1 MRI slice.

Figure B.2(a) compares the detection rates for MRI-positive patients when cortical thickness is used to represent the cortex. HCRF performs better than the z-score baseline using both the top five and the top ten detections. Considering the ten largest clusters, HCRF detects the lesion in 12 (92%) patients, while the baseline detects it in 11 (85%). HCRF also achieves higher recall, as shown in Figure B.2(b). The difference between the recall values of the proposed method (14.4321 ± 0.6078) and the baseline (2.0878 ± 1.0392) was significant (t(9) = 87.0053, p < 0.001, paired t-test). In terms of precision, HCRF has higher values (19.6634 ± 0.6749) than the baseline (19.1915 ± 1.1462); however, the difference was not significant. Figure B.1 provides an example of the clusters detected using HCRF and the baseline for one patient.
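The reported t(9) statistics are consistent with ten paired values per method, for example one recall or precision value per detection threshold in Figure B.2. That pairing is our reading rather than something stated explicitly. A minimal paired t statistic can be computed with the standard library as below. The function name is hypothetical. For p-values, `scipy.stats.ttest_rel` performs the same paired test when SciPy is available.

```python
import math
from statistics import fmean, stdev
from typing import Sequence, Tuple


def paired_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    d = [a - b for a, b in zip(x, y)]
    sd = stdev(d)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = fmean(d) / (sd / math.sqrt(len(d)))
    return t, len(d) - 1


t, df = paired_t([2, 4, 6, 8], [1, 2, 3, 4])
print(round(t, 4), df)  # 3.873 3
```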
[Figure B.2 plot panels: (a), (d), (g), (j) detection rate versus detection threshold (1-5) for ZSC and HCRF considering the top five and top ten clusters; (b), (e), (h), (k) recall (%) and (c), (f), (i), (l) precision (%) versus detection threshold (1-10) for HCRF and the z-score baseline.]
Figure B.2: Comparison of detection rates, precision and recall between the HCRF-based approach and the baseline method using thickness (a)-(c), GWC (d)-(f), curvature (g)-(i) and sulcal depth (j)-(l). Here, α = 1 so that larger clusters are ranked higher (refer to Equation 5.8).

Using GWC, HCRF detects abnormal clusters within the resection zones of eight (61.5%) patients, whereas the baseline detects nine (69%), as shown in Figure B.2(d). Figure B.2(e) shows that the recall of HCRF (2.0367 ± 1.0882) is significantly higher (t(9) = 6.7320, p < 0.001) than that of the baseline (0.7814 ± 0.5094). Figure B.2(f) compares the precision of HCRF and the baseline using GWC. The difference between HCRF (11.4304 ± 1.0621) and the baseline (8.4030 ± 0.5979) was significant (t(9) = 15.8046, p < 0.001, paired t-test). Thus, although HCRF outperforms the baseline in recall and precision using GWC, its resulting detection rate is lower than that of the baseline.
Figure B.2(g) compares the detection rates using curvature to represent the cortex. HCRF detects abnormal clusters within the resection zones of 9 (69%) patients, while the baseline detects only 8 (61.5%) when the ten largest clusters are considered. Figures B.2(h) and B.2(i) show that HCRF achieves higher recall and precision, respectively, than the baseline. The difference between the recall values of the proposed method (1.2575 ± 0.6311) and the baseline (0.8718 ± 0.4904) was significant (t(9) = 6.5582, p < 0.001). However, the difference in precision between HCRF (12.3179 ± 1.6842) and the baseline (11.6453 ± 0.8254) was not significant (t(9) = 1.6289, p = 0.1378, paired t-test).
When sulcal depth is used to represent the cortex, the baseline dominates HCRF at the first three thresholds; however, HCRF performs better at the more lenient thresholds. Overall, HCRF detects abnormal clusters that overlap with the resections of 6 (46%) patients, while the baseline detects the lesion in 5 (38.5%) patients (Figure B.2(j)). Figure B.2(k) shows that the recall of HCRF (2.6564 ± 0.1726) is significantly higher (t(9) = 11.0850, p < 0.001) than that of the baseline (0.9731 ± 0.6418). Figure B.2(l) compares the precision of HCRF and the baseline using sulcal depth. On average, the baseline has higher precision than HCRF; however, the difference between HCRF (12.5145 ± 0.8396) and the baseline (13.5163 ± 1.4811) was not significant (t(9) = −1.8481, p = 0.0976).

Using individual features, HCRF achieves a maximum detection rate of 92%, while the baseline has a maximum detection rate of 85%, when the ten largest clusters are considered. Cortical thickness outperforms all other features in terms of average precision and recall.
Bibliography
[1] B. Ahmed, C. E. Brodley, K. E. Blackmon, R. Kuzniecky, G. Barash, C. Carlson, B. T. Quinn, W. Doyle, J. French, O. Devinsky, and T. Thesen. Cortical feature analysis and machine learning improves detection of MRI-negative focal cortical dysplasia. Epilepsy & Behavior, 48:21–28, 2015.
[2] B. Ahmed, T. Thesen, K. Blackmon, Y. Zhao, O. Devinsky, R. Kuzniecky, and C. Brodley. Hierarchical conditional random fields for outlier detection: An application to detecting epileptogenic cortical malformations. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 1080–1088, 2014.
[3] L. Andrews. Special Functions of Mathematics for Engineers. SPIE Optical Engineering Press, 1992.
[4] J. Ashburner and K. J. Friston. Voxel-Based Morphometry-The Methods. NeuroImage, 11(6):805–821, 2000.
[5] P. Awasthi, A. Gagrani, and B. Ravindran. Image modelling using tree structured conditional random fields. In IJCAI, pages 2060–2065, 2007.
[6] K. Babalola, B. Patenaude, P. Aljabar, J. Schnabel, D. Kennedy, W. Crum, S. Smith, T. Cootes, M. Jenkinson, and D. Rueckert. Comparison and evaluation of segmentation techniques for subcortical structures in brain MRI. In Medical Image Computing and Computer-Assisted Intervention, MICCAI 2008, pages 409–416, 2008.
[7] A. J. Barkovich, R. Guerrini, R. I. Kuzniecky, G. D. Jackson, and W. B. Dobyns. A developmental and genetic classification for malformations of cortical development: update 2012. Brain, 135(5):1348–1369, 2012.
[8] A. J. Barkovich and C. A. Raybaud. Neuroimaging in disorders of cortical development. Neuroimaging Clinics of North America, 14(2):231–254, 2004.
[9] S. R. Benbadis, L. Heriaud, W. O. Tatum IV, and F. L. Vale. Epilepsy surgery, delays and referral patterns-are all your epilepsy patients controlled? Seizure, 12(3):167–170, 2003.
[10] A. Bernasconi and N. Bernasconi. Unveiling epileptogenic lesions: The contribution of image processing. Epilepsia, 52:20–24, 2011.
[11] A. Bernasconi, N. Bernasconi, B. C. Bernhardt, and D. Schrader. Advances in MRI for 'cryptogenic' epilepsies. Nature Reviews Neurology, 7(2):99–108, 2011.
[12] P. Besson, F. Andermann, F. Dubeau, and A. Bernasconi. Small focal cortical dysplasia lesions are located at the bottom of a deep sulcus. Brain, 131(12):3246–3255, 2008.
[13] P. Besson, F. Andermann, F. Dubeau, and A. Bernasconi. Small focal cortical dysplasia lesions are located at the bottom of a deep sulcus. Brain, 131(12):3246–3255, 2008.
[14] P. Besson, N. Bernasconi, O. Colliot, et al. Surface-based texture and morphological analysis detects subtle cortical dysplasia. In MICCAI, pages 645–652, 2008.
[15] P. Besson, N. Bernasconi, O. Colliot, A. Evans, and A. Bernasconi. Surface-based texture and morphological analysis detects subtle cortical dysplasia. In Proceedings of the 11th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI '08, Part I, pages 645–652, 2008.
[16] C. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., 2006.
[17] K. Blackmon, E. Halgren, W. B. Barr, C. Carlson, O. Devinsky, J. DuBois, B. T. Quinn, J. French, R. Kuzniecky, and T. Thesen. Individual differences in verbal abilities associated with regional blurring of the left gray and white matter boundary. J Neurosci., 31(43):15257–15263, 2011.
[18] I. Blumcke, M. Thom, E. Aronica, D. D. Armstrong, H. V. Vinters, et al. The clinicopathologic spectrum of focal cortical dysplasias: A consensus classification proposed by an ad hoc task force of the ILAE diagnostic methods commission. Epilepsia, 52(1):158–174, 2011.
[19] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[20] M. Breunig, H.-P. Kriegel, R. Ng, and J. Sander. LOF: Identifying Density-Based Local Outliers. In ACM SIGMOD ICMD, pages 93–104. ACM, 2000.
[21] C. E. Brodley and M. A. Friedl. Identifying mislabeled training data. J. A. I. Res., 11:131–167, 1999.
[22] R. Caruana. Multitask learning. Mach. Learn., 28(1):41–75, 1997.
[23] D. Comaniciu and P. Meer. Mean shift: a robust approach toward feature space analysis. PAMI, 24(5):603–619, 2002.
[24] A. M. Dale, B. Fischl, and M. I. Sereno. Cortical surface-based analysis: I. Segmentation and surface reconstruction. NeuroImage, 9(2):179–194, 1999.
[25] R. Desikan, F. Ségonne, B. Fischl, et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage, 31(3):968–980, 2006.
[26] L. R. Dice. Measures of the amount of ecologic association between species. Ecology, 26(3):297–302, 1945.
[27] J. S. Duncan, G. P. Winston, M. J. Koepp, and S. Ourselin. Brain imaging in the assessment for epilepsy surgery. The Lancet Neurology, 15(4):420–433, 2016.
[28] C. Ecker, A. Marquand, J. Mourão-Miranda, P. Johnston, E. M. Daly, M. J. Brammer, S. Maltezos, C. M. Murphy, D. Robertson, S. C. Williams, and D. G. M. Murphy. Describing the brain in autism in five dimensions: magnetic resonance imaging-assisted diagnosis of autism spectrum disorder using a multiparameter classification approach. The Journal of Neuroscience, 30(32):10612–10623, 2010.
[29] A. Esbroeck, L. Smith, Z. Syed, S. Singh, and Z. Karam. Multi-task seizure detection: addressing intra-patient variation in seizure morphologies. Machine Learning, 102(3):309–321, 2015.
[30] T. Evgeniou, C. A. Micchelli, M. Pontil, et al. Learning multiple tasks with kernel methods. J. Mach. Learn. Res., 6:615–637, 2005.
[31] T. Evgeniou and M. Pontil. Regularized multi-task learning. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '04, pages 109–117, 2004.
[32] S. Fauser, S. M. Sisodiya, L. Martinian, M. Thom, C. Gumbinger, H.-J. Huppertz, C. Hader, K. Strobl, B. J. Steinhoff, M. Prinz, J. Zentner, and A. Schulze-Bonhage. Multi-focal occurrence of cortical dysplasia in epilepsy patients. Brain, 132(8):2079–2090, 2009.
[33] B. Fischl and A. M. Dale. Measuring the thickness of the human cerebral cortex from magnetic resonance images. Proceedings of the National Academy of Sciences, 97:11050–11055, 2000.
[34] B. Fischl, D. Salat, E. Busa, et al. Whole brain segmentation: Automated labeling of neuroanatomical structures in the human brain. Neuron, 33(3):341–355, 2002.
[35] B. Fischl, M. I. Sereno, and A. M. Dale. Cortical surface-based analysis: II: Inflation, flattening, and a surface-based coordinate system. NeuroImage, 9(2):195–207, 1999.
[36] B. Fischl, M. I. Sereno, R. B. Tootell, and A. M. Dale. High-resolution intersubject averaging and a coordinate system for the cortical surface. Human Brain Mapping, 8(4):272–284, 1999.
[37] S. S. Ghosh, S. Kakunoori, J. Augustinack, A. Nieto-Castanon, I. Kovelman, N. Gaab, J. A. Christodoulou, C. Triantafyllou, J. D. Gabrieli, and B. Fischl. Evaluating the validity of volume-based and surface-based brain image registration for developmental cognitive neuroscience studies in children 4 to 11 years of age. NeuroImage, 53(1):85–93, 2010.
[38] A. S. Hakimi, M. V. Spanaki, L. A. Schuh, B. J. Smith, and L. Schultz. A survey of neurologists’ views on epilepsy surgery and medically refractory epilepsy. Epilepsy & Behavior, 13(1):96–101, 2008.
[39] W. A. Hauser and D. C. Hesdorffer. Epilepsy: frequency, causes and consequences. Epilepsy Foundation of America, 1990.
[40] R. J. Hickey. Noise modelling and evaluating learning from examples. Artificial Intelligence, 82(1–2):157–179, 1996.
[41] P. A. Hofman, G. J. Fitt, A. S. Harvey, R. Kuzniecky, and G. Jackson. Bottom-of-sulcus dysplasia: imaging features. AJR Am J Roentgenol., 196(4):881–885, 2011.
[42] S.-C. Hong, K.-S. Kang, D. W. Seo, S. B. Hong, M. Lee, D.-H. Nam, J.-I. Lee, J. S. Kim, H.-J. Shin, K. Park, W. Eoh, Y.-L. Suh, and J.-H. Kim. Surgical treatment of intractable epilepsy accompanying cortical dysplasia. Journal of Neurosurgery, 93(5):766–773, 2000.
[43] S. J. Hong, H. Kim, D. Schrader, N. Bernasconi, B. C. Bernhardt, and A. Bernasconi. Automated detection of cortical dysplasia type II in MRI-negative epilepsy. Neurology, 83(1):48–55, 2014.
[44] H.-J. Huppertz, J. Kassubek, D.-M. Altenmüller, T. Breyer, and S. Fauser. Automatic curvilinear reformatting of three-dimensional MRI data of the cerebral cortex. NeuroImage, 33(10):1932–1938, 2012.
[45] R. I. Kuzniecky and A. J. Barkovich. Malformations of cortical development and epilepsy. Brain Dev., 23(1):2–11, 2001.
[46] N. Japkowicz and M. Shah. Evaluating Learning Algorithms: A Classification Perspective. Cambridge University Press, New York, NY, USA, 2011.
[47] N. Japkowicz and S. Stephen. The class imbalance problem: A systematic study. Intell. Data Anal., 6(5):429–449, 2002.
[48] T. Joachims. Optimizing search engines using clickthrough data. In ACM SIGKDD, KDD ’02, pages 133–142, 2002.
[49] L. G. Kini, J. C. Gee, and B. Litt. Computational analysis in epilepsy neuroimaging: A survey of features and methods. NeuroImage: Clinical, to appear, 2016.
[50] A. Klein, S. S. Ghosh, B. Avants, B. Yeo, B. Fischl, B. Ardekani, J. C. Gee, J. Mann, and R. V. Parsey. Evaluation of volume-based and surface-based brain image registration methods. NeuroImage, 51(1):214–220, 2010.
[51] H.-P. Kriegel, P. Kroger, E. Schubert, and A. Zimek. LoOP: Local Outlier Probabilities. In ACM CIKM, pages 1649–1652, 2009.
[52] P. Krsek, B. Maton, B. Korman, E. Pacheco-Jacome, P. Jayakar, C. Dunoyer, et al. Different features of histopathological subtypes of pediatric focal cortical dysplasia. Annals of Neurology, 63(6):758–769, 2008.
[53] P. Krsek, T. Pieper, A. Karlmeier, M. Hildebrandt, D. Kolodziejczyk, P. Winkler, E. Pauli, I. Blümcke, and H. Holthausen. Different presurgical characteristics and seizure outcomes in children with focal cortical dysplasia type I or II. Epilepsia, 50(1):125–137, 2009.
[54] A. Kumar and H. Daume. Learning task grouping and overlap in multi-task learning. In Proceedings of the 29th International Conference on Machine Learning (ICML-12), pages 1383–1390, 2012.
[55] P. Kwan and M. J. Brodie. Early identification of refractory epilepsy. New England Journal of Medicine, 342(5):314–319, 2000.
[56] P. Kwan, S. C. Schachter, and M. J. Brodie. Drug-resistant epilepsy. New England Journal of Medicine, 365(10):919–926, 2011.
[57] M. L. Bell et al. Epilepsy surgery outcomes in temporal lobe epilepsy with a normal MRI. Epilepsia, 50(9):2053–2060, 2009.
[58] P. J. Lenk, W. S. DeSarbo, P. E. Green, and M. R. Young. Hierarchical Bayes conjoint analysis: Recovery of partworth heterogeneity from reduced experimental designs. Marketing Science, 15(2):173–191, 1996.
[59] J. P. Lerch and A. C. Evans. Cortical thickness analysis examined through power analysis and a population simulation. NeuroImage, 24(1):163–173, 2005.
[60] J. Lin, N. Salamon, A. Lee, et al. Reduced neocortical thickness and complexity mapped in mesial temporal lobe epilepsy with hippocampal sclerosis. Cereb. Cortex, 17(9):2007–2018, 2007.
[61] P. L. Lopez-Cruz, C. Bielza, P. Larranaga, et al. Learning conditional linear Gaussian classifiers with probabilistic class labels. In CAEPIA ’13, pages 139–148, 2013.
[62] D. Lowe. Object recognition from local scale-invariant features. In ICCV, pages 1150–1157, 1999.
[63] C. McDonald, D. J. Hagler Jr., M. E. Ahmadi, et al. Regional neocortical thinning in mesial temporal lobe epilepsy. Epilepsia, 49(5):794–803, 2008.
[64] C. Mellerio, M.-A. Labeyrie, F. Chassoux, C. Daumas-Duport, E. Landre, B. Turak, F.-X. Roux, J.-F. Meder, B. Devaux, and C. Oppenheim. Optimizing MR imaging detection of type 2 focal cortical dysplasia: Best criteria for clinical practice. American Journal of Neuroradiology, 39(1):80–86, 2008.
[65] K. J. Miller, M. den Nijs, P. Shenoy, J. W. Miller, R. P. N. Rao, and J. G. Ojemann. Real-time functional brain mapping using electrocorticography. NeuroImage, 37(2):504–507, 2007.
[66] S. Mueller, K. Laxer, J. Barakos, I. Cheong, P. Garcia, and M. Weiner. Widespread neocortical abnormalities in temporal lobe epilepsy with and without mesial sclerosis. NeuroImage, 46(2):353–359, 2009.
[67] A. Muhlebner, R. Coras, K. Kobow, M. Feucht, T. Czech, H. Stefan, D. Weigel, M. Buchfelder, H. Holthausen, T. Pieper, M. Kudernatsch, and I. Blumcke. Neuropathologic measurements in focal cortical dysplasias: validation of the ILAE 2011 classification system and diagnostic implications for MRI. Acta Neuropathologica, 123(2):259–272, 2011.
[68] D. F. Nettleton, A. Orriols-Puig, and A. Fornells. A study of the effect of different types of noise on the precision of supervised learning techniques. Artificial Intelligence Review, 33(4):275–306, 2010.
[69] Q. Nguyen, H. Valizadegan, M. Hauskrecht, et al. Learning classification with auxiliary probabilistic information. In IEEE ICDM ’11, pages 477–486, 2011.
[70] S. Noachtar and A. Peters. Semiology of epileptic seizures: A critical review. Epilepsy Behav., 15(1):2–9, 2009.
[71] C. Nordahl, D. Dierker, I. Mostafavi, et al. Cortical folding abnormalities in autism revealed by surface-based morphometry. J Neurosci., 27(43):11725–11735, 2007.
[72] S. Papadimitriou, H. Kitagawa, P. Gibbons, and C. Faloutsos. LOCI: fast outlier detection using the local correlation integral. In ICDE, pages 315–326, 2003.
[73] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers Inc., 1988.
[74] R. Pienaar, B. Fischl, V. Caviness, N. Makris, and P. E. Grant. A methodology for analyzing curvature in the developing brain from preterm to adult. International Journal of Imaging Systems and Technology, 18(1):42–68, 2008.
[75] N. Plath, M. Toussaint, and S. Nakajima. Multi-class image segmentation using conditional random fields and global classification. In ICML, pages 817–824, 2009.
[76] A. A. Raymond, D. R. Fish, S. M. Sisodiya, N. Alsanjari, J. M. Stevens, and S. D. Shorvon. Abnormalities of gyration, heterotopias, tuberous sclerosis, focal cortical dysplasia, microdysgenesis, dysembryoplastic neuroepithelial tumour and dysgenesis of the archicortex in epilepsy: Clinical, EEG and neuroimaging features in 100 adult patients. Brain, 118(3):629–660, 1995.
[77] U. Rebbapragada and C. E. Brodley. Class noise mitigation through instance weighting. In ECML ’07, pages 708–715, 2007.
[78] J. Reynolds and K. Murphy. Figure-ground segmentation using a hierarchical conditional random field. In CRV, pages 175–182, 2007.
[79] L. Rimol, R. Nesvag, D. Hagler Jr., et al. Cortical volume, surface area, and thickness in schizophrenia and bipolar disorder. Biological Psychiatry, 71(6):552–560, 2012.
[80] D. Rivière, J.-F. Mangin, D. Papadopoulos-Orfanos, J.-M. Martinez, V. Frouin, and J. Régis. Automatic recognition of cortical sulci of the human brain using a congregation of neural networks. Medical Image Analysis, 6(2):77–92, 2002.
[81] F. Rosenow and H. Luders. Presurgical evaluation of epilepsy. Brain, 124(9):1683–1700, 2001.
[82] M. T. Rosenstein, Z. Marx, L. P. Kaelbling, and T. G. Dietterich. To transfer or not to transfer. In NIPS ’05 Workshop, Inductive Transfer: 10 Years Later, 2005.
[83] P. Rzezak, P. Squarzoni, F. L. Duran, T. de Toledo Ferraz Alves, J. Tamashiro-Duran, C. M. Bottino, S. Ribeiz, P. A. Lotufo, P. R. Menezes, M. Scazufca, and G. F. Busatto. Relationship between brain age-related reduction in gray matter and educational attainment. PLoS ONE, 10(10):1–15, 2015.
[84] D. Salat, R. Buckner, A. Snyder, et al. Thinning of the cerebral cortex in aging. Cerebral Cortex, 14(7):721–730, 2004.
[85] M. Sampat, Z. Wang, M. Markey, G. Whitman, T. Stephens, and A. Bovik. Measuring intra- and inter-observer agreement in identifying and localizing structures in medical images. In 2006 IEEE International Conference on Image Processing, pages 81–84, 2006.
[86] B. Scholkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2001.
[87] E. Schubert, R. Wojdanowski, A. Zimek, and H.-P. Kriegel. On evaluation of outlier rankings and outlier scores. In SDM, pages 1047–1058, 2012.
[88] S. M. Sisodiya. Surgery for focal cortical dysplasia. Brain, 127(11):2383–2384, 2004.
[89] S. M. Sisodiya, S. Fauser, J. H. Cross, and M. Thom. Focal cortical dysplasia type II: biological features and clinical perspectives. The Lancet Neurology, 8(9):830–843, 2009.
[90] P. Smyth, U. M. Fayyad, M. C. Burl, P. Perona, and P. Baldi. Inferring ground truth from subjective labelling of Venus images. In NIPS ’94, pages 1085–1092, 1994.
[91] C. Sutton and A. McCallum. An Introduction to Conditional Random Fields, 2010. eprint arXiv:1011.4088.
[92] J. T. Lerner et al. Assessment and surgical outcomes for mild type I and severe type II cortical dysplasia: a critical review and the UCLA experience. Epilepsia, 50(6):1310–1335, 2009.
[93] L. Tassi, N. Colombo, R. Garbelli, S. Francione, G. Lo Russo, R. Mai, F. Cardinale, M. Cossu, A. Ferrario, C. Galli, M. Bramerio, A. Citterio, and R. Spreafico. Focal cortical dysplasia: neuropathological subtypes, EEG, neuroimaging and surgical outcome. Brain, 125(8):1719–1732, 2002.
[94] L. Tassi, N. Colombo, R. Garbelli, S. Francione, G. Lo Russo, R. Mai, F. Cardinale, M. Cossu, A. Ferrario, C. Galli, M. Bramerio, A. Citterio, and R. Spreafico. Focal cortical dysplasia: neuropathological subtypes, EEG, neuroimaging and surgical outcome. Brain, 125(8):1719–1732, 2002.
[95] J. F. Tellez-Zenteno, R. Dhar, L. Hernandez-Ronquillo, and S. Wiebe. Long-term outcomes in epilepsy surgery: antiepileptic drugs, mortality, cognitive and psychosocial aspects. Brain, 130(2):334–345, 2007.
[96] T. Thesen et al. Detection of Epileptogenic Cortical Malformations with Surface-Based MRI Morphometry. PLoS ONE, 6(2):1–10, 2011.
[97] M. Thom, L. Martinian, A. Sen, J. H. Cross, B. N. Harding, and S. M. Sisodiya. Cortical neuronal densities and lamination in focal cortical dysplasia. Acta Neuropathologica, 110(4):383–392, 2005.
[98] S. Thrun and L. Pratt. Learning to Learn. Kluwer Academic Publishers, 1996.
[99] J. F. Téllez-Zenteno, L. H. Ronquillo, F. Moien-Afshari, and S. Wiebe. Surgical outcomes in lesional and non-lesional epilepsy: A systematic review and meta-analysis. Epilepsy Research, 89(2–3):310–318, 2010.
[100] A. Torralba, K. Murphy, W. Freeman, et al. Sharing features: efficient boosting procedures for multiclass object detection. In IEEE CVPR ’04, pages 762–769, 2004.
[101] K. Tufenkjian and H. O. Luders. Seizure semiology: Its value and limitations in localizing the epileptogenic zone. J Clin Neurol., 8(4):243–250, 2012.
[102] M. A. Turk and A. P. Pentland. Face recognition using eigenfaces. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’91), pages 586–591, 1991.
[103] D. C. Van Essen, H. A. Drury, S. Joshi, and M. I. Miller. Functional and structural mapping of human cerebral cortex: Solutions are in the surfaces. Proceedings of the National Academy of Sciences, 95(3):788–795, 1998.
[104] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag New York, Inc., 1995.
[105] A. Vedaldi and B. Fulkerson. VLFeat: An open and portable library of computer vision algorithms. http://www.vlfeat.org/, 2008a.
[106] A. Vedaldi and S. Soatto. Quick shift and kernel methods for mode seeking. In ECCV, pages 705–718, 2008b.
[107] J. von Oertzen, H. Urbach, S. Jungbluth, M. Kurthen, M. Reuber, G. Fernández, and C. E. Elger. Standard magnetic resonance imaging is inadequate for patients with refractory focal epilepsy. Journal of Neurology, Neurosurgery & Psychiatry, 73(6):643–647, 2002.
[108] B. C. Wallace, K. Small, C. E. Brodley, and T. A. Trikalinos. Class imbalance, redux. In Proceedings of the IEEE International Conference on Data Mining (ICDM ’11), pages 754–763, 2011.
[109] Y. Wang and R. Khardon. Sparse Gaussian processes for multi-task learning. In ECML/PKDD ’12, pages 711–727, 2012.
[110] Z. I. Wang, A. V. Alexopoulos, S. E. Jones, Z. Jaisani, I. M. Najm, and R. A. Prayson. The pathology of magnetic-resonance-imaging-negative epilepsy. Mod Pathol, 26(8):1051–1058, 2013.
[111] M. Wilke, J. Kassubek, S. Ziyeh, A. Schulze-Bonhage, and H. Huppertz. Automated detection of gray matter malformations using optimized voxel-based morphometry: a systematic approach. NeuroImage, 20(1):330–343, 2003.
[112] J. Yuan, Y. Chen, and E. Hirsch. Intracranial electrodes in the presurgical evaluation of epilepsy. Neurological Sciences, 33(4):723–729, 2012.
[113] A. Zijdenbos, B. Dawant, R. Margolin, and A. Palmer. Morphometric analysis of white matter lesions in MR images: method and validation. IEEE Transactions on Medical Imaging, 13(4):716–724, Dec 1994.
[114] K. H. Zou, S. K. Warfield, A. Bharatha, C. M. Tempany, M. R. Kaus, S. J. Haker, W. M. Wells III, F. A. Jolesz, and R. Kikinis. Statistical validation of image segmentation quality based on a spatial overlap index: scientific reports. Academic Radiology, 11(2):178–189, 2004.