Clustering neural systems using generative embedding

    Semester Thesis

    Ajita Gupta

    December 7, 2011

    Advisor: Kay H. Brodersen1,2

    Supervisors: Professor Joachim M. Buhmann1, Professor Klaas E. Stephan2

1 Machine Learning Laboratory, Department of Computer Science, ETH Zurich
2 Laboratory for Social and Neural Systems Research, Department of Economics, University of Zurich


    Abstract

Multivariate decoding models have been increasingly used to infer cognitive or clinical brain states from measures of brain activity. The performance of conventional clustering techniques is restricted by high data dimensionality, low sample size and a lack of mechanistic interpretability. In this project, we extend previous work on the classification of neural dynamical systems to clustering, asking what structure can be discovered when no labelling information is available. We illustrate the utility of our approach in the context of neuroimaging and validate our solutions in relation to known clinical diagnostics. We also investigate how our solutions can be visualized and interpreted in the context of the underlying generative model. We envisage that future applications of model-based clustering may help dissect spectrum disorders into physiologically better-defined subgroups.


    Acknowledgements

I would like to express my deepest gratitude to everyone who has accompanied me throughout the course of this project over the past four months.

To begin with, I would like to thank Prof. Dr. Joachim M. Buhmann and Prof. Dr. Dr. med. Klaas E. Stephan for giving me this wonderful opportunity to work in their group, as well as to attend the first International SystemsX.ch Conference on Systems Biology this year. It was a memorable experience and valuable exposure for my academic career.

I am grateful to Kay H. Brodersen for his guidance, help and patience. He has been an excellent advisor and has consistently engaged with me in insightful conversations, providing valuable hints and regular feedback.

I have been fortunate to work with Kate I. Lomakina. She mentored me at the initial stage of my project by introducing me to the neuroscience background, as well as the fundamental mathematical concepts needed to accomplish my task.

Special thanks to the participants of the Dynamic Causal Modelling Seminar. They were a constant source of inspiration; this project builds upon their expertise and input.

My heartfelt thanks to Prof. Dr. Andreas Krause, Prof. Dr. Randy McIntosh, Alexander Vezhnevets, Alberto G. Busetto and Lin Zhihao for their contributions.

Finally, I would like to thank my family for their unflinching support and cooperation.


    Contents

1 Introduction
  1.1 Motivation
  1.2 Adopted Approach

2 Methods
  2.1 Dynamic causal modelling (DCM)
  2.2 Generative embedding
    2.2.1 Model inversion
    2.2.2 Kernel construction
  2.3 Clustering Techniques
    2.3.1 K-Means clustering
    2.3.2 Gaussian Mixture Models
  2.4 Model selection
    2.4.1 Distortion
    2.4.2 Davies-Bouldin Index
    2.4.3 Log Likelihood
    2.4.4 Bayesian Information Criterion
  2.5 Predictive validity
    2.5.1 Balanced Purity
    2.5.2 Normalized Mutual Information
  2.6 Bootstrap
  2.7 Illustration of Clusters

3 Results
  3.1 Application to LFP Data
    3.1.1 DCM for LFP
    3.1.2 Clustering Validation
    3.1.3 Visualization
    3.1.4 Computational Efficiency
  3.2 Application to fMRI Data
    3.2.1 DCM for fMRI
    3.2.2 Regional Correlations
    3.2.3 PCA Reduction
    3.2.4 Clustering Validation
    3.2.5 Visualization
    3.2.6 Computational Efficiency

4 Discussion
  4.1 Summary of Results
  4.2 Limitations
  4.3 Future Work

A MATLAB Implementation
  A.1 Feature Extraction
  A.2 Clustering
  A.3 Bootstrap
  A.4 Visualization
  A.5 Operating on Cluster

Bibliography


    List of Figures

1.1 Analysis Pipeline
2.1 K-Means Algorithm
2.2 Gaussian Mixture Models
3.1 LFP data (Experimental Design)
3.2 Internal Validation on LFP Data for k-Means
3.3 Internal Validation on LFP Data for Gaussian mixture models
3.4 External Validation on LFP Data
3.5 MDS for k-Means on LFP Data
3.6 MDS for Gaussian Mixture Models on LFP Data
3.7 Criteria stability for LFP Data
3.8 Right Hemisphere of the Brain
3.9 Internal Validation on fMRI Data
3.10 Approach Comparison for fMRI Data
3.11 MDS for k-Means on fMRI Data
3.12 MDS for GMM on fMRI Data
3.13 Criteria stability for fMRI Data


    List of Tables

A.1 Feature Extraction
A.2 Clustering
A.3 Bootstrap
A.4 Visualization
A.5 Cluster Operation


    Chapter 1

    Introduction

    1.1 Motivation

Complex biological systems can be studied using dynamical systems models. These are built upon differential equations which describe how individual system elements interact in time. Even simple systems may induce exceptionally complex trajectories; conversely, a system that seems complicated at the surface may be driven by a surprisingly simple mechanism underneath. This is a fundamental insight into cognition that we obtain from dynamics.

In two previous studies, a novel approach was proposed which applies dynamical systems models in neurobiology using the concept of generative embedding.

In the first study, a generative model of local field potentials (LFP) in mice was used to read out the trial-by-trial identity of a sensory stimulus from activity in the somatosensory cortex [1]. In the subsequent analysis, a model of functional magnetic resonance imaging (fMRI) data was utilized to diagnose aphasia, an impairment of language ability, in human stroke patients. The model was based on activity in non-lesioned, specifically thalamo-temporal, brain regions recorded during speech processing [2].

Whether generative embedding could also aid in discovering plausible structure in unlabelled datasets has remained an open question.


    1.2 Adopted Approach

In this project, we have addressed this question by developing a model-based clustering approach. More precisely, we (i) inverted a dynamic causal model [3] of neuroimaging data in a trial-wise or subject-wise fashion; (ii) constructed a kernel function; (iii) applied baseline clustering algorithms to the subject-specific parameter estimates; (iv) compared the solution with respect to group labels given a priori; and finally (v) interpreted the obtained results with regard to the underlying model (see Figure 1.1).

After the described analysis pipeline had been laid down, we applied our methods to experimental data. At the outset, we analysed LFP data recorded from mice with different whiskers being stimulated on each trial, asking whether the model might recognize the distinct sets of trials. Moreover, we reanalysed previously acquired functional MRI data from stroke patients and healthy controls, examining what group structure might emerge when feeding unlabelled data through a generative kernel. Using these examples, we investigated questions of model order selection and examined the nature and extent of agreement between our unsupervised approach and a conventional supervised analysis.

We anticipate that analyses based on mechanistically interpretable models will play an increasingly vital role in the future. For instance, they might become particularly relevant for the objective grouping of spectrum disorders [4], where the intention is to split groups of patients with similar symptoms into pathophysiologically distinct subsets. This work represents one step towards this goal.

Figure 1.1: Analysis Pipeline. This figure depicts the five salient milestones in using generative embedding for model-based clustering of fMRI data.


    Chapter 2

    Methods

In this chapter, we elaborate on the mathematical concepts which serve as the foundation of our analysis. We then look at the various analysis and illustration techniques, followed by a discussion of the criteria used to validate our approach.

    2.1 Dynamic causal modelling (DCM)

Our entire analysis is built upon a mathematical modelling technique called Dynamic Causal Modelling (DCM), whose generic idea we describe in this section.

The key concept of DCM (see [3] for an elaborate description) is to treat the brain as a deterministic nonlinear dynamic system. Effective connectivity is parameterised in terms of coupling between brain regions (e.g. of neuronal activity in different regions).

Dynamic causal modelling is distinguished from alternative approaches by accommodating the nonlinear and dynamic aspects of neuronal interactions, as well as by framing the estimation problem in terms of experimental perturbations. DCM calls upon the same experimental design principles to evoke region-specific interactions that are used in traditional approaches. However, the causal (or explanatory) variables now become external inputs, whereas the parameters represent effective connectivity.

DCM represents an attempt to establish forward models which are able to capture convincingly how neuronal dynamics respond to different inputs and generate the recorded responses. This reflects the increasing acceptance of neuronal models and their importance for understanding measured brain activity (see [5] for a discussion).


    2.2 Generative embedding

Today's traditional clustering methods are restricted by two major issues. Firstly, even the most refined algorithms have great difficulty in separating useful features from uninformative sources of noise. Secondly, most clustering studies are blind to the neuronal mechanisms that discriminate between brain states (e.g., diseases). Hence, they are unable to improve our mechanistic understanding of the discovered structures.

Generative embedding stands on two major pillars: a generative model for the selection of mechanistically interpretable features and a discriminative method for clustering (see Figure 1.1).

Generative kernels are used to construct a so-called generative score space, in which the set of observations is mapped to statistical representations [1, 6–21]. Well-known examples are the P-kernel [22] and the Fisher kernel [17].

Generative models have proven highly beneficial in explaining how observed data are generated by the underlying (neuronal) system. One example in neuroimaging is DCM, which is the foundation of our work. It enables statistical inference on physiological quantities that are not directly observable with current methods, such as directed interregional coupling strengths and their modulation, e.g., by synaptic gating [23]. From a pathophysiological perspective, disturbances of synaptic plasticity and neuromodulation are at the heart of psychiatric spectrum diseases such as schizophrenia [4] or depression [24].

It is therefore likely that the clustering of disease states could benefit from exploiting estimates of these quantities. We anticipate that generative embedding for model-based clustering yields better class separability when fed into a discriminative clustering algorithm than conventional techniques based on brain structure or activation, and that it provides convincing solutions to the challenges outlined above.

    2.2.1 Model inversion

Bayesian inversion of a given dynamic causal model $m$ defines a projection $X \to M$ that maps subject-specific data $x \in X$ to a multivariate probability distribution $p(\theta \mid x, m)$ in a parametric family $M$. The model $m$ determines the neuronal regions of interest, the external inputs $u$, the synaptic connections and a prior distribution over the parameters $p(\theta \mid m)$. Given the model $m$ and the data $x$, model inversion proceeds in an unsupervised and sample-wise manner.

Combining the prior density over the parameters $p(\theta \mid m)$ with the likelihood function $p(x \mid \theta, m)$ yields the posterior density $p(\theta \mid x, m)$. The most efficient way of performing the inversion is to maximize a variational free-energy bound on the log model evidence, $\ln p(x \mid m)$, under Gaussian assumptions about the posterior (see [25] for an overview). Model inversion yields a probability density $p(\theta \mid x, m)$ for each sample $x$, characterized by the vector of posterior means $\mu_\theta \in \mathbb{R}^d$ and a covariance matrix $\Sigma_\theta \in \mathbb{R}^{d \times d}$, given $d$ parameters.


    2.2.2 Kernel construction

The kernel defines a similarity measure with which to compare inverted generative models. The choice of this metric depends on the definition of the feature space. In the case of generative embedding, for instance, one could consider the posterior means or maximum a posteriori (MAP) estimates of relevant model parameters (e.g. parameters encoding synaptic connection strengths).

We define a mapping $M \to \mathbb{R}^d$ that extracts the MAP estimates $\theta_{\mathrm{MAP}}$ from the posterior distribution $p(\theta \mid x, m)$. This d-dimensional vector space represents the discriminative information required for the subsequent clustering step. In cases where group differences carry rich information, it might be beneficial to integrate elements of the posterior covariance matrix into the vector space.

After creating a generative score space, any conventional kernel $k : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ can be used to compare two inverted models. The simplest one is the linear kernel $k(x_i, x_j) = \langle x_i, x_j \rangle$, expressing the inner product between two vectors $x_i$ and $x_j$. Nonlinear kernels (e.g. quadratic, polynomial or radial basis functions), on the other hand, have several disadvantages for generative embedding. Complex kernels come with an increased risk of overfitting. Moreover, the contribution of each model parameter is easier to interpret in relation to the underlying model when the parameters do not undergo further transformation. Therefore, a simple linear kernel is highly recommended.
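As an illustration, the following MATLAB sketch builds a linear kernel (Gram) matrix from a set of MAP parameter estimates. The variable names (Theta, K) and the placeholder data are assumptions for illustration only, not the study's actual implementation.

```matlab
% Minimal sketch: linear kernel over a generative score space.
% Theta is assumed to be an n-by-d matrix whose i-th row holds the MAP
% parameter estimates of trial or subject i (placeholder values here).
Theta = randn(8, 14);          % 8 samples, 14 model parameters

K = Theta * Theta';            % linear kernel: K(i,j) = <theta_i, theta_j>

% The rows of Theta can be fed directly into kmeans or fitgmdist, or K can
% be passed to any kernel-based method.
```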

    2.3 Clustering Techniques

    Clustering considers the problem of identifying groups of data points in a high-dimensional space.

Suppose we have a data set consisting of N observations of a random D-dimensional variable x. Our goal is to partition the data set into some number K of clusters, where we shall assume for the moment that the value of K is known a priori. Intuitively, one might think of a cluster as comprising a group of data points which exhibit similar behaviour.

    In this section, we will look at two commonly used baseline techniques: k-Means clustering and Gaussian mixture models.

2.3.1 K-Means clustering

K-means [26] is the simplest unsupervised learning algorithm that solves the well-known clustering problem. The goal is to find an assignment of data points to clusters, as well as a set of cluster centres called centroids, such that the sum of the squared distances of each data point to its centroid becomes minimal.

We can achieve this goal with an iterative procedure which alternately optimizes the data point assignments and the centroid positions. This is repeated until there is no further change in the assignments. Because each phase reduces the value of the cost function, convergence of the algorithm is assured. One example is illustrated in Figure 2.1.

Figure 2.1: K-Means Algorithm. This figure (adapted from [27]) shows the iterative k-Means algorithm for k = 2 on a given data set.
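For concreteness, a minimal MATLAB sketch of this procedure is shown below, using the Statistics Toolbox function kmeans; the data matrix X and the cluster number k are illustrative placeholders rather than the study's actual variables.

```matlab
% Minimal sketch of k-means clustering on a generative score space.
X = randn(100, 14);                      % placeholder feature vectors (n-by-d)
k = 2;

[idx, centroids, sumd] = kmeans(X, k, ...
    'Replicates', 10);                   % multiple restarts to avoid poor local optima

distortion = sum(sumd);                  % total squared point-to-centroid distance
```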

    2.3.2 Gaussian Mixture Models

Gaussian mixture models are formed by combining multivariate normal density components. Similar to k-means clustering, Gaussian mixture modelling uses an iterative algorithm, Expectation-Maximization (EM), which converges to a local optimum. An example is given in Figure 2.2.

Gaussian mixture models are more flexible, since they take into account both variances and covariances. Thus, the EM outcome is able to accommodate clusters of variable size and correlation much better than k-means.

Furthermore, data assignments are now referred to as soft, since they are not strictly binary (i.e. hard assignments), but are based on the maximum posterior probabilities for each data point.
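A minimal MATLAB sketch of such soft assignments is given below. It assumes a recent Statistics Toolbox (fitgmdist); older releases expose the same model via gmdistribution.fit. The variables X and k are placeholders.

```matlab
% Minimal sketch of Gaussian mixture clustering with soft assignments.
X = randn(100, 14);                      % placeholder feature vectors
k = 2;

gm = fitgmdist(X, k, 'Replicates', 10, 'RegularizationValue', 1e-6);

resp = posterior(gm, X);                 % responsibilities (soft assignments)
[~, hardIdx] = max(resp, [], 2);         % hard labels, if needed for comparison
```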


Figure 2.2: Gaussian Mixture Models. This figure (adapted from [27]) illustrates the EM algorithm on the same data set for two clusters.

    2.4 Model selection

One of the fundamental challenges of clustering is how to evaluate results without auxiliary information. A common approach for evaluating clustering results is to use validity indices. Clustering validity approaches can rely on two different kinds of criteria: external and internal. The final selection depends on the kind of information available and the nature of the problem.

For model selection, clustering results are evaluated purely based on the data themselves. The best scores are typically assigned to solutions with high similarity within clusters and low similarity between clusters.

    In our analysis, we looked at four different internal validation measures.

    2.4.1 Distortion

The simplest quality measure for cluster validation is the distortion. It is used as the cost function in k-Means clustering and is given by (see [27])

\[
J = \sum_{n=1}^{N} \sum_{i=1}^{k} r_{ni} \, \lVert x_n - \mu_i \rVert^2 \qquad (2.1)
\]

The distortion represents the sum of the squared distances of each data point to the centre of its corresponding cluster. Our goal is to find values for the assignments $r_{ni}$ and the cluster means $\mu_i$ that minimize $J$; the lower $J$, the better the model fit. Here, $k$ denotes the number of clusters, whereas $N$ is the total number of data points.
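A minimal MATLAB sketch of Eq. (2.1) is given below; it reuses the illustrative variables X, idx and centroids from the k-means sketch above.

```matlab
% Distortion J of a k-means solution, computed explicitly as in Eq. (2.1).
J = 0;
for n = 1:size(X, 1)
    J = J + sum((X(n, :) - centroids(idx(n), :)).^2);
end
% J is equivalent to sum(sumd) as returned by kmeans.
```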


    2.4.2 Davies-Bouldin Index

The Davies-Bouldin Index (DBI) goes one step further, since it aims to identify sets of clusters that are compact and well separated. It is defined by

\[
\mathrm{DBI} = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{\sigma_i + \sigma_j}{d(c_i, c_j)} \qquad (2.2)
\]

where $d(c_i, c_j)$ denotes the distance between the centroids of clusters $i$ and $j$ (the inter-cluster distance) and $\sigma_i$ the average distance of the members of cluster $i$ to its centroid (the intra-cluster distance). The smaller the value, the more appropriate the clustering solution.
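The following MATLAB sketch computes the index of Eq. (2.2) for a k-means solution. It again reuses the illustrative variables X, idx and centroids, and it assumes Euclidean distances throughout, since the exact distance definitions are not spelled out above.

```matlab
% Davies-Bouldin index for a clustering given by idx and centroids.
k = size(centroids, 1);
sigma = zeros(k, 1);
for i = 1:k
    members  = X(idx == i, :);
    diffs    = bsxfun(@minus, members, centroids(i, :));
    sigma(i) = mean(sqrt(sum(diffs.^2, 2)));        % average intra-cluster distance
end

dbi = 0;
for i = 1:k
    ratios = zeros(k, 1);
    for j = 1:k
        if j ~= i
            ratios(j) = (sigma(i) + sigma(j)) / norm(centroids(i, :) - centroids(j, :));
        end
    end
    dbi = dbi + max(ratios);                        % worst-case neighbour of cluster i
end
dbi = dbi / k;
```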

    2.4.3 Log Likelihood

The results of GMM clustering differ from those computed by k-means. Thus, we need a measure (see [27]) which takes probabilities, i.e. soft assignments, into account:

\[
\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left( \sum_{i=1}^{k} \pi_i \, \mathcal{N}(x_n \mid \mu_i, \Sigma_i) \right) \qquad (2.3)
\]

In a mixture of Gaussians, the goal is to maximize this likelihood function with respect to the parameters (comprising the means $\mu_i$ and covariances $\Sigma_i$ of the components, as well as the mixing coefficients $\pi_i$). The higher the likelihood value, the better the model fits the data.

    2.4.4 Bayesian Information Criterion

The Bayesian Information Criterion (BIC) is designed to avoid the overfitting problem and can be computed using the following formula:

\[
\mathrm{BIC} = -2 \ln(L) + k \ln(N) \qquad (2.4)
\]

Here, $N$ is the sample size, $L$ is the maximized value of the likelihood function for the estimated model, and $k$ is the number of free parameters (here: clusters) in the Gaussian model. The first term of the BIC represents model fit, whereas the second term expresses model complexity. The model with the lowest BIC is considered the optimal trade-off between the two.
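A minimal MATLAB sketch of this model-selection loop is given below. It relies on the NegativeLogLikelihood and BIC properties of fitted gmdistribution objects; note that MATLAB counts all free mixture parameters in its BIC, whereas Eq. (2.4) as written uses the number of clusters, so the absolute values may differ slightly. X is the illustrative data matrix used above.

```matlab
% Negative log likelihood and BIC over a range of cluster numbers (GMM).
maxK = 10;
nll  = zeros(maxK, 1);
bic  = zeros(maxK, 1);
for k = 1:maxK
    gm     = fitgmdist(X, k, 'Replicates', 10, 'RegularizationValue', 1e-6);
    nll(k) = gm.NegativeLogLikelihood;   % -ln p(X | pi, mu, Sigma), cf. Eq. (2.3)
    bic(k) = gm.BIC;                     % fit/complexity trade-off, cf. Eq. (2.4)
end
[~, bestK] = min(bic);                   % lowest BIC marks the preferred model
```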

    2.5 Predictive validity

Clustering results can also be evaluated with respect to an external criterion, such as known class labels. External criteria measure how closely the clustering solution has captured known structure in the data.

We opted for the following two metrics of external validation, since they were the most suitable for our application.


    2.5.1 Balanced Purity

Purity is a simple evaluation measure [28]. To compute purity, each cluster is assigned to the label which is most frequent in the cluster. The accuracy of this assignment is measured by counting the number of correctly assigned points and dividing by the total number of samples. Formally:

\[
\mathrm{purity}(\Omega, L) = \frac{1}{N} \sum_{k} \max_{j} \, |\omega_k \cap l_j| \qquad (2.5)
\]

Here, $\Omega = \{\omega_1, \omega_2, \ldots, \omega_K\}$ is the set of clusters and $L = \{l_1, l_2, \ldots, l_J\}$ is the set of labels. We interpret $\omega_k$ as the set of points in cluster $k$ and $l_j$ as the set of points carrying label $j$.

The term balanced is used to emphasize that the labels should be treated as if they were uniformly distributed within the data set: if the data set is imbalanced, the purity will be inflated. In order to remove this bias, we perform a linear shift:

\[
\text{balanced purity} = \frac{1}{2} \, \frac{\mathrm{purity} - R_{\mathrm{labels}}}{1 - R_{\mathrm{labels}}} + \frac{1}{2} \qquad (2.6)
\]

Here, $R_{\mathrm{labels}}$ refers to the relative size of the largest label class. In the case where the label classes have equal cardinality ($R_{\mathrm{labels}} = 50\%$), the balanced purity reduces to the ordinary purity measure. Bad clusterings have purity values close to 0, whereas a perfect clustering has a purity of 1. The purity increases with a growing number of clusters; in particular, the purity is 1 if each point gets its own cluster. Thus, we cannot use this measure to trade off the quality of the clustering against the number of clusters.
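A minimal MATLAB sketch of Eqs. (2.5) and (2.6) is shown below, e.g. saved as balanced_purity.m. The cluster assignments idx and the known labels are assumed to be numeric vectors, and R_labels is computed as the relative frequency of the largest label class, in line with the interpretation given above; the function name is illustrative.

```matlab
function bp = balanced_purity(idx, labels)
% Purity (Eq. 2.5) and its linear shift to balanced purity (Eq. 2.6).
N   = numel(labels);
cnt = crosstab(idx, labels);             % clusters-by-labels contingency table

pur = sum(max(cnt, [], 2)) / N;          % assign each cluster to its majority label
R   = max(sum(cnt, 1)) / N;              % relative size of the largest label class

bp  = 0.5 * (pur - R) / (1 - R) + 0.5;   % linear shift: chance level maps to 0.5
end
```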

    2.5.2 Normalized Mutual Information

A measure that allows us to make this trade-off is the Normalized Mutual Information (NMI) [28]:

\[
\mathrm{NMI}(\Omega, L) = \frac{I(\Omega; L)}{[H(\Omega) + H(L)]/2} \qquad (2.7)
\]

where $I$ denotes the mutual information:

\[
I(\Omega; L) = \sum_{k} \sum_{j} P(\omega_k \cap l_j) \log \frac{P(\omega_k \cap l_j)}{P(\omega_k) \, P(l_j)}
            = \sum_{k} \sum_{j} \frac{|\omega_k \cap l_j|}{N} \log \frac{N \, |\omega_k \cap l_j|}{|\omega_k| \, |l_j|} \qquad (2.8)
\]


$P(\omega_k)$, $P(l_j)$ and $P(\omega_k \cap l_j)$ are the probabilities of a data point being in cluster $\omega_k$, carrying label $l_j$, and lying in the intersection of $\omega_k$ and $l_j$, respectively. The second equality follows from the maximum likelihood estimates of these probabilities (i.e., the estimate of each probability is the corresponding relative frequency).

$H$ denotes the entropy and is defined as:

\[
H(\Omega) = -\sum_{k} P(\omega_k) \log P(\omega_k) = -\sum_{k} \frac{|\omega_k|}{N} \log \frac{|\omega_k|}{N} \qquad (2.9)
\]

\[
H(L) = -\sum_{j} P(l_j) \log P(l_j) = -\sum_{j} \frac{|l_j|}{N} \log \frac{|l_j|}{N} \qquad (2.10)
\]

where the second equality is once again derived from the maximum likelihood estimates of the probabilities.

$I(\Omega; L)$ measures the amount of information by which our knowledge about the cluster assignments increases when we know the respective labels. Maximum mutual information is reached for a clustering solution that perfectly recreates the class labels (e.g. when $K = N$). Similar to purity, large cardinalities are not penalized.

The normalization by the denominator $[H(\Omega) + H(L)]/2$ resolves this issue, since the entropy increases with the number of clusters. For instance, $H(\Omega)$ reaches its maximum of $\log N$ for $K = N$, which ensures that NMI is low for $K = N$. Because NMI is normalized, we can use it to compare clusterings with different numbers of clusters. The denominator is chosen in this particular form because $[H(\Omega) + H(L)]/2$ is a tight upper bound on $I(\Omega; L)$. Thus, the NMI always lies between 0 and 1.
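The following MATLAB sketch implements Eqs. (2.7)–(2.10) from the contingency table of clusters and labels; as before, idx and labels are assumed to be numeric vectors, and the function name is illustrative.

```matlab
function nmi = nmi_score(idx, labels)
% Normalized mutual information between a clustering and the known labels.
N   = numel(labels);
cnt = crosstab(idx, labels);             % |omega_k intersected with l_j|
Pkj = cnt / N;                           % joint probabilities
Pk  = sum(Pkj, 2);                       % cluster marginals
Pj  = sum(Pkj, 1);                       % label marginals

mask = Pkj > 0;                          % skip empty cells to avoid log(0)
PP   = Pk * Pj;                          % product of marginals (K-by-J)
I    = sum(Pkj(mask) .* log(Pkj(mask) ./ PP(mask)));        % Eq. (2.8)

Hk = -sum(Pk(Pk > 0) .* log(Pk(Pk > 0)));                   % Eq. (2.9)
Hj = -sum(Pj(Pj > 0) .* log(Pj(Pj > 0)));                   % Eq. (2.10)

nmi = I / ((Hk + Hj) / 2);                                  % Eq. (2.7)
end
```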

    2.6 Bootstrap

The bootstrap technique (see [29, 30] for more details) is an integral part of statistics. Essentially, it allows the estimation of the uncertainty about a given statistical property (such as a mean, variance or standard deviation).

This is achieved by measuring the desired property while sampling from an approximating distribution. One standard choice is the empirical distribution of the observed data. In the particular case where a set of observations can be assumed to be independent and identically distributed, this can be implemented by creating a number of new samples of equal size, each of which is obtained by random sampling with replacement from the original dataset.

A striking advantage of the bootstrap is its simplicity. It is straightforward to derive estimates of standard errors and confidence intervals. Using a bootstrap allows us to characterize the uncertainty of any validation statistic with regard to the population from which the subjects were drawn. By capturing this between-subjects (or random-effects) uncertainty, our results apply to the entire population, not just to the particular sample drawn from it. Hence, it is a pertinent way to assess and control the stability of the obtained results (in our case, the different clustering validation criteria).
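A minimal MATLAB sketch of such a bootstrap is given below for a single validation statistic, using the balanced_purity helper sketched in Section 2.5.1. The number of bootstrap samples, the data matrix X and the label vector labels are illustrative assumptions.

```matlab
% Bootstrap over samples (trials or subjects) for one validation statistic.
B    = 200;                               % number of bootstrap samples
n    = size(X, 1);
stat = zeros(B, 1);
for b = 1:B
    s       = randsample(n, n, true);     % resample with replacement
    idx     = kmeans(X(s, :), 2, 'Replicates', 10);
    stat(b) = balanced_purity(idx, labels(s));
end

ci          = prctile(stat, [2.5 97.5]);  % 95% bootstrap confidence interval
runningMean = cumsum(stat) ./ (1:B)';     % cumulative mean, as in the stability plots
```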

    2.7 Illustration of Clusters

In order to visualize our data sets, we have opted for an exploratory illustration technique called multidimensional scaling (MDS). This facilitates interpretation and serves as a sanity check of the obtained clustering results.

Multidimensional scaling (see [31] for an overview) encompasses a collection of methods which provide insight into the underlying structure of relations between entities. This technique allows a geometrical representation of these relations.

Per se, these methods belong to the more generic class of methods for multivariate data analysis. Any relation between a pair of entities which can be transformed into a proximity or dissimilarity measure can be considered as possible input for multidimensional scaling. The selection of the type of spatial representation can be considered the most important part of the modelling and is determined by the application context.

In this project, we have applied Classical Multidimensional Scaling (CMDS), also known as Principal Coordinates Analysis, to visualize the clustering solutions of both algorithms. This method takes a matrix of inter-point distances and creates a new constellation of points. The data set is reconstructed from a linear combination of all feature dimensions such that the Euclidean distances between the points approximately reproduce the original distance matrix.
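A minimal MATLAB sketch of this visualization step is shown below, projecting the feature space onto its first two principal coordinates and colouring points by cluster; X and idx are the illustrative variables used in the earlier sketches.

```matlab
% Classical MDS (principal coordinates) of the feature space.
D = squareform(pdist(X));                % Euclidean inter-point distance matrix
Y = cmdscale(D);                         % principal coordinates

gscatter(Y(:, 1), Y(:, 2), idx);         % first two coordinates, coloured by cluster
xlabel('MDS dimension 1');
ylabel('MDS dimension 2');
```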


    Chapter 3

    Results

    This chapter is entirely devoted to the results of our analysis.

First, we briefly review our two independent neuroimaging data sets. Then, we validate the performance of generative embedding using several different criteria. As an additional control, selected solutions are visualized to obtain a deeper understanding of the underlying structure. Finally, we examine the outcome with respect to its computational efficiency. Results are presented for both data sets.

    3.1 Application to LFP Data

Our first dataset was obtained from mice (see [1] for more details) in an earlier experiment. After the induction of anaesthesia and surgical preparation, the animals were placed in a stereotactic frame. After inserting a single-shank electrode with 16 channels into the barrel cortex, voltage traces were monitored from all recording sites (duration: 2 s). Local field potentials were obtained by passing the data through a band-pass filter (bandwidth: 1–200 Hz).

In every trial, one of two whiskers was stimulated by means of a quick flexure of a piezo actuator. The two neighbouring whiskers were chosen such that they generated reliable responses at the site of each recording (dataset A1: whiskers E1 and D3; dataset A2: whiskers C1 and C3; datasets A3–A4: whiskers D3 and …). The experiment involved a total of 600 samples for each mouse (see Fig. 3.1). The goal was to determine, based on the neuronal activity, which particular whisker had been twitched in each trial.

Data were collected from three adult male mice. In one of these, an additional session (with 100 trials) was conducted after the standard procedure described above. In this session, the actuator was placed very close to the whiskers but did not touch them, serving as a control condition to prevent experimental artifacts from interfering with the decoding performance.


Figure 3.1: LFP data (Experimental Design). This figure (as given in [1]) shows how the stimuli are delivered with the help of a piezo actuator. Local field potentials are recorded from the barrel cortex using a 16-site silicon probe.

    3.1.1 DCM for LFP

In this section, we provide a brief summary of the main modelling principles of DCM for LFP data (see [1] for an elaborate description).

The neural-mass model in DCM acts as the bottom layer within the model chain. It describes a set of n neuronal populations (defined by m states each) as a system of interacting elements and models their dynamics in the context of experimental perturbations.

At each time instant $t$, the state of the system is represented by a vector $x(t) \in \mathbb{R}^{n \cdot m}$. The evolution of the system over time is described by a set of differential equations that evolve the state vector and account for conduction delays among spatially distinct populations. The equations specify the rate of change of activity in each region (i.e., of each element in $x(t)$) as a function of three variables: the current state $x(t)$ itself, the strength of the experimental inputs $u(t)$ (e.g., sensory stimulation) and a set of time-invariant parameters $\theta$. More formally, the dynamics of the model are given by a function $F$ with $\frac{dx}{dt} = F(x(t), u(t), \theta)$.

The counterpart of the neuronal model is the forward model within DCM, which describes how (hidden) neuronal activity in individual regions generates (observed) measurements. The model expresses field potentials as a combination of activity in three local neuronal populations of every single brain region: excitatory pyramidal cells (60%), inhibitory interneurons (20%), and spiny stellate (or granular) cells (20%).


After inverting the model, we constructed a feature space from the single-region DCM by including the estimated posterior means of all intrinsic parameters $\theta$, as well as the posterior variances. This led to feature vectors in $\mathbb{R}^{14}$ for each animal.

    3.1.2 Clustering Validation

We investigate how well our algorithms have performed with respect to goodness of fit and in comparison to external benchmarks (here: labels). We thereby consider both standard algorithms: k-Means and Gaussian mixture models.

The figures below (Fig. 3.2 and 3.3) show two internal criteria each: for k-Means (Distortion, Davies-Bouldin Index) and for Gaussian mixture models (negative log likelihood, Bayesian information criterion). The mean values are illustrated along with the 95% confidence intervals for all animals.

The normalized Distortion¹ measure represents the cost function for k-Means. Its monotonically decreasing behaviour is self-explanatory, since the average intra-cluster distance decreases with an increasing number of clusters. The normalized Davies-Bouldin Index behaves in a similar manner². However, this monotonic descent is now scaled by the inter-cluster distance and the number of clusters.

The negative log likelihood serves as a cost function. Its steady descent is expected, since its negative counterpart, the log likelihood, is the objective function of the GMM scheme. A growing number of clusters facilitates the accommodation of data points; as a direct consequence, the model fit improves. The BIC weighs model fit against model complexity, and a lower BIC value represents a better model. The drop tells us that the minimum value is achieved at k = 10. However, this is only a local optimum restricted by our chosen window of cluster sizes [1, 10]. Thus, in this particular scenario, we cannot consider the BIC to be a very meaningful metric.

¹ The term normalized refers to the scaling of the values by the number of trials.
² Note that the DB Index is only computed from k = 2 onwards, as the inter-cluster metric does not exist for a single cluster.


Figure 3.2: Internal Validation. This figure illustrates the internal criteria for the outcome of k-Means (for up to 10 clusters) on the electrophysiological dataset.

Figure 3.3: Internal Validation. This figure shows the internal criteria for the outcome of GMM (for up to 10 clusters) on the electrophysiological dataset.

The two external quality metrics, Balanced Purity and Normalized Mutual Information, can be compared across individual subjects. From Figure 3.4, we observe a noticeably distinct evolution for our experimental subjects in contrast to the control animal, which exhibits only a slim advantage over the null baseline, in which the class labels are shuffled in random order (i.e. class membership is arbitrary) in every computation. For k-Means, the largest discrepancy in NMI values lies between animals 1 and 4; for GMM, between animals 3 and 4 (see the next subsection for a detailed discussion).


According to k-Means, the ideal cluster size is k = 2, while for GMM it lies between k = 3 and k = 5. Since the ground truth is known to be two, we learn that our selected algorithms are able to discover groups which are in line with the true structure. Therefore, we conclude that our clustering technique is sensitive, or well-tuned, to the input data.

Figure 3.4: External Validation. This figure compares the two external criteria for all four animals in k-Means and GMM (for up to 10 clusters).


    3.1.3 Visualization

We use CMDS (see Section 2.7) to visualize our clustering solutions graphically. We contrast the previously selected cluster range (k = 2, 3 and 4) for an experimental subject and the control animal.

From Figure 3.5 we see that the newly discovered structures for both animals are almost equally balanced between the whiskers for both k = 2 and k = 4. Hence, the purity remains the same (as seen in Figure 3.4). However, the NMI value decreases due to the two additional clusters. We conclude that k-Means favours two clusters for the experimental subject, which is in agreement with the number of whisker types.

Figure 3.5: MDS for k-Means. This figure compares the k-Means clustering solutions for k = 2 and k = 4 for an experimental animal (subject 1) and the control subject (subject 4).

Figure 3.6 looks at the GMM clustering solutions for k = 2 and k = 3 for two subjects. We see that the additional cluster in the control animal hardly affects the accuracy, whereas the majority of the members of the new blue grouping for subject 3 belong to one whisker class. Thus, the overall homogeneity rises, as seen in Figure 3.4.


Figure 3.6: MDS for Gaussian Mixture Models. This figure contrasts the GMM clustering solutions for k = 2 and k = 3 for an experimental animal (subject 3) and the control subject (subject 4).

    3.1.4 Computational Efficiency

All validation measures used in this study are subject to two sources of variance. First, they are influenced by within-trial (or within-subject) variability, due to the non-deterministic nature of the employed clustering algorithms. Second, they are subject to between-trial (or between-subject) uncertainty, due to the bootstrap procedure that we used to enable inference on the population from which the available trials (or subjects) were sampled.

Using simulations, we assessed how many algorithmic iterations we would need to obtain estimates that are sufficiently stable to allow a comparison of different models. In particular, we repeated our analysis (using resampled data) 100 times and analysed the evolution of the running mean (and running standard error) of the statistics of interest. We also visualize an error mark of 1% (relative to the final value), which tells us that from this point onwards our results are reliable with 99% accuracy.

The simulation plots given below (Figure 3.7) visualize the computational efficiency for the LFP data set, for the same cluster range as used for CMDS, on the most resource-intensive computations: one internal and one external criterion. This is depicted for k-Means clustering. We see that the normalized DBI shows no fluctuations at all for k = 2 (since both error marks are located at 0), but takes a while to converge for higher cluster numbers. The NMI is perfectly stable for two or three clusters and starts varying only after k = 4.

Figure 3.7: Criteria stability. The two plots show the cumulative mean estimates of the validation criteria over a total of 100 bootstrap samples. The validation is performed on k-Means.

    3.2 Application to fMRI Data

We used data from two groups of participants (patients with moderate aphasia vs. healthy controls) engaged in a simple speech-processing task (see [2] for details on the experiment).

The two groups of subjects consisted of 26 right-handed healthy participants with normal hearing, English as their first language and no neurological disease in their medical record (12 female; mean age 54.1 years; range 26–72 years); and 11 patients diagnosed with moderate aphasia due to stroke (1 female; mean age 66.1 years; range 45–90 years).

The patients' aphasia profiles were typified using the Comprehensive Aphasia Test [32]. Scores fell within the aphasic range for spoken and written word comprehension (single-word and sentence level), single-word repetition, and object naming. One must keep in mind that the lesions did not affect any of the temporal regions included in our analysis model (see Fig. 3.8).


Subjects were presented with two types of auditory stimuli: (i) normal speech and (ii) time-reversed speech. They were asked to make a gender judgement on each auditory stimulus via a brief button press.

Figure 3.8: Right Hemisphere of the Brain. This figure (as given in [33]) highlights the selected brain regions. Our analysis was performed on the activity of six non-lesioned thalamo-temporal regions, three from each hemisphere.

    3.2.1 DCM for fMRI

DCMs for fMRI data consist of two hierarchical layers [34]. The first layer is a neuronal model of the dynamics of interacting neuronal populations in the context of experimental perturbations. Experimental manipulations $u$ can enter the model in two different ways: they can evoke responses through direct influences on specific regions (e.g. sensory inputs), or they can modulate the strength of coupling among regions (e.g. task demands or learning). In this project, we have used the classical bilinear DCM for fMRI [3]:

\[
\frac{dz(t)}{dt} = f(z(t), \theta_n, u(t)) = \left( A + \sum_{j=1}^{J} u_j(t) \, B^{(j)} \right) z(t) + C \, u(t) \qquad (3.1)
\]

where $z(t)$ represents the neuronal state vector at time $t$, $A$ is a matrix of endogenous connection strengths, $B^{(j)}$ represents the additive change of these connection strengths induced by modulatory input $j$, and $C$ denotes the strengths of direct (driving) inputs. These neuronal parameters $\theta_n = (A, B^{(1)}, \ldots, B^{(J)}, C)$ are rate constants with units $s^{-1}$.
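To illustrate Eq. (3.1), the following MATLAB sketch integrates the bilinear neuronal model for a toy two-region system with a simple Euler scheme; the A, B and C matrices and the boxcar input are made-up values, not the parameters estimated in this study.

```matlab
% Forward simulation of the bilinear neuronal model in Eq. (3.1).
dt = 0.01; T = 10; steps = round(T / dt);
A = [-0.5  0.2;  0.3 -0.5];              % endogenous coupling (rate constants, 1/s)
B = [ 0.0  0.0;  0.4  0.0];              % modulatory change of coupling by the input
C = [ 0.8  0.0;  0.0  0.0];              % driving input strengths

u = zeros(steps, 1); u(100:300) = 1;     % boxcar stimulus
z = zeros(2, 1); Z = zeros(2, steps);
for t = 1:steps
    dz      = (A + u(t) * B) * z + C * [u(t); 0];
    z       = z + dt * dz;               % Euler update of the neuronal states
    Z(:, t) = z;
end
```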

The second layer of a DCM is a biophysically motivated forward model that describes how a given neuronal state translates into a measurement. This model has haemodynamic parameters $\theta_h$ and comprises a Gaussian measurement error $\epsilon$. The nonlinear operator $g(z(t), \theta_h)$ connects a neuronal state $z(t)$ to a predicted blood oxygen level dependent (BOLD) signal via changes in vasodilation, blood flow, blood volume, and deoxyhaemoglobin content (see [35] for details). This yields the observation equation

\[
x(t) = g(z(t), \theta_h) + \epsilon \qquad (3.2)
\]

The construction of the generative score space was based on the MAP estimates of the neuronal model parameters $\theta_n$. The resulting space contained 22 features in total.

    3.2.2 Regional Correlations

We compared generative embedding to a simple approach based on undirected regional correlations.

Given the time series (i.e. raw data) of the brain recordings, we calculated the spatial mean activity for each of the six regions of interest. In the next step, we computed the Pearson correlation coefficients, which measure the linear dependence between the time series of any two regions. This resulted in a feature vector in $\mathbb{R}^{15}$, which was fed into the discriminative clustering algorithms.
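A minimal MATLAB sketch of this feature construction is given below; ts stands for a matrix of regional mean time series and is a placeholder, not the study's actual data.

```matlab
% Correlation-based feature vector from six regional mean time series.
ts = randn(200, 6);                      % placeholder: T time points, 6 regions

R        = corr(ts);                     % 6-by-6 Pearson correlation matrix
featCorr = R(triu(true(6), 1));          % 15 unique off-diagonal entries as features
```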

    3.2.3 PCA Reduction

Another common method applied to high-dimensional data sets is Principal Component Analysis (PCA). This technique was used to condense the original data, comprising almost 4000 dimensions, to a 37-dimensional feature space of neuronal activity. The dimensionality was chosen so as to match the number of features in generative embedding, and the individual variance components were sorted in decreasing order.

In contrast to generative embedding, the PCA reduction captures the dominant linear variance in the data without providing a mechanistic view of how the data could have been generated.

    3.2.4 Clustering Validation

We first verify the performance of our model-based approach and compare it to the results of the two alternative approaches. This is done for both algorithms, k-Means as well as Gaussian mixture models.

The figure below (Fig. 3.9) illustrates the internal quality criteria. All measures are shown along with 95% confidence intervals. The Distortion measure, the Davies-Bouldin Index and the negative log likelihood follow a similar pattern as seen for the electrophysiological data. We recall the previously established conclusion that the model fit improves with the number of clusters and confirm this statement for our second data set. This time, however, we do attain an optimal number of clusters, namely k = 7 (see the plot for the BIC). We mark this area for further investigation (see the following subsections).

Figure 3.9: Internal Validation. This figure shows the internal criteria for validating the clustering solutions obtained by k-Means and GMM (for up to 30 clusters).

In a next step, the external criteria are compared across the alternative feature extraction approaches as well as against the null baseline (where the class labels are randomly permuted). This is shown in Figure 3.10.


The external criteria lie, by definition, within the interval [0, 1]. The Balanced purity starts at 0.5 (which can be seen as the equilibrium point, where all points are assigned to one single group) and gradually rises towards 1, where each point is assigned to its own cluster. Unlike the Balanced purity, the Normalized mutual information penalizes large cardinalities.

We observe a striking jump from k = 3 to the peak value at k = 4 for both criteria in k-Means, which we examine in more detail by visualizing the clustering solution (see the next subsection). In contrast to k-Means clustering, the external metrics for Gaussian mixture models evolve smoothly, without salient fluctuations.

Figure 3.10: Approach Comparison. This figure shows the external criteria across the different feature extraction modes for the k-Means and GMM solutions.


We derive two fundamental conclusions from these illustrations: (i) all techniques are consistently superior to their baseline variant that uses randomly permuted labels, and (ii) our model-based approach, generative embedding, performs noticeably better than the alternative methods based on regional correlations and PCA reduction.

    3.2.5 Visualization

We choose our critical cluster regions (as mentioned in the previous subsection) and visualize them for k-Means and Gaussian mixture models.

Figure 3.11 explains the striking jump we observed in Figure 3.10. The transition from k = 2 to k = 3 gives only a minor increase in homogeneity. In the next step, however, the red cluster (for k = 2) is split into two perfectly homogeneous clusters, each of which favours healthy controls. A new red cluster contains patients with only three exceptions, as seen in Figure 3.11. Hence, the Balanced purity instantly rises to 84%.

Figure 3.11: MDS for k-Means. This figure shows a graphical illustration of the k-Means clustering solutions for k = 2 and k = 4.

In contrast to k-Means, the Gaussian mixture models feature no remarkable improvement in cluster homogeneity in this area. Instead, we pick the region of interest to be where the BIC reaches its minimum. From Figure 3.12 we observe that the two clusters dominated by patients are almost identical for k = 5 and k = 6. The healthy controls in the lower half are, however, separated into two groups, which increases the model fit. The progression from k = 6 to k = 7 and from k = 7 to k = 10 yields a further change in model fit (another partition of the lower half, coloured in grey), which is, however, not compensated as well by the model complexity term (the second term of the BIC) as in the first transition. Therefore, k = 7 is our best feasible trade-off between fit and complexity in this scenario.


Figure 3.12: MDS for GMM. This figure shows a CMDS plot of the Gaussian mixture model clustering solutions for k = 5, 6, 7 and 10.

    3.2.6 Computational Efficiency

As before, our quality metrics are inspected with regard to the computational resources they require.

The iteration plots given below (Figure 3.13) illustrate the computational efficiency for the fMRI data set for the cluster range k = 2, 3 and 4, in the case of GMM, on two selected criteria.

We learn that the Normalized mutual information takes longer to converge, whereas the Balanced purity reaches its final value after roughly half of the total bootstrap iterations. One must note that the time and computational resources required for the LFP data set exceed those required for the fMRI data, simply because the former data set is larger than the latter.

The primary conclusion we derive from these simulations is that we do not need millions of iterations; a couple of hundred suffice to give us a reasonable estimate. This makes our approach considerably faster.


Figure 3.13: Criteria stability. The two plots show the cumulative mean estimates of the external validation criteria over a total of 200 bootstrap samples. Here, we look at the values for GMM.


    Chapter 4

    Discussion

In this chapter, we first give a brief rundown of our findings. We then take a closer look at the limitations which confine our approach, before closing this report with suggestions for further improvement and an outlook on more advanced analysis techniques.

    4.1 Summary of Results

Generative embedding has previously been shown to be a powerful technique for classifying neural systems by relating patterns of connectivity parameters to an external (clinical) label. However, it has been unknown whether generative embedding might also prove useful for discovering new structure in a group of data where the relationship between neural activity and external labels is not given beforehand.

In the course of this study, clustering was performed and validated on two independent data sets. The implementation in MATLAB has led to two key results.

First, it was shown that the baseline techniques applied to a set of experimental trials recorded using electrophysiology (LFP data) were able to detect plausible structure within the data which agreed with the ground truth to a fair extent. Second, when aiming to discover clusters in a group of subjects engaged in a passive speech-listening task, our model-based analysis scheme demonstrated a compelling advantage over conventional approaches, which do not exploit the discriminative information encoded in hidden physiological quantities.

    4.2 Limitations

We observe large error bars (see Section 3.2.4, Clustering Validation). Hence, our results come with a high level of uncertainty. From a technical perspective, it is the small amount of fMRI data which limits the reliability of our results. In order to make more accurate and dependable statements, we would have to increase the quantity of acquired data. However, it is organisationally difficult to recruit many patients (one must find them within the given time window of a study and obtain their consent), and it is also rather expensive to use an MRI scanner (around 400 CHF per hour).

A crucial algorithmic aspect is internal validation. In contrast to the Bayesian Information Criterion, we have not made use of any metric for k-Means which is capable of balancing model fit against model complexity. Our current verification targets only the goodness of fit and therefore does not give us the full picture required to pass a solid judgement.

From an even more fundamental perspective, the choice of a variable of interest represents a subtle challenge. As long as decoding restricts itself to experimental variables or presumed cognitive states, inference about information processing will be limited. For instance, if decision making is implemented by a series of processing stages, each representing a different decision-related quantity, then spatial localization of choice will not afford major insights into the nature of the neural computations.

    4.3 Future Work

Along with an increasing interest in the analysis of neuroimaging datasets, a rich variety of new methods has been proposed over the last few years. These methods have profited enormously from previous decades of machine learning research, though various field-specific lessons have been learnt as well. Here, we look at two recent techniques which address the problem from a Bayesian or information-theoretic point of view.

Approximation Set Coding (ASC) (see [36] for details) represents an information-theoretic approach to clustering. Under this view, the best solution is the one that optimally balances the competing objectives of informativeness and stability; this is the solution which extracts the most bits from the data. Informativeness is maximized with as many clusters as data points, whereas stability reaches its peak with a single cluster that comprises the entire data set.

Variational Bayes Gaussian Mixture Models (VB-GMM) (see [37] for details) are another approach to clustering, which uses the model evidence as the criterion for model selection. Similar to traditional GMM, the algorithm starts from arbitrary initialization parameters and iteratively converges to an optimum of the log model evidence (using the EM algorithm).

Though still at a preliminary stage, clustering of neuroimaging datasets is likely to continue to evolve rapidly in the coming years of brain research. Much is at stake: basic research on the mapping between structure and function on the one hand; applications in engineering, in legal contexts, and in medical diagnosis on the other. In the domain of spectrum disorders, for instance, one could decompose groups of patients with similar symptoms into pathophysiologically distinct subgroups.


    Appendix A

    MATLAB Implementation

This part of the appendix provides a short description of the main analysis and code scripts that were implemented in MATLAB for the purpose of this study. They are subdivided into the following categories: feature extraction, clustering, bootstrapping, visualization and operating on the cluster.

    A.1 Feature Extraction

Table A.1 shows the MATLAB scripts which load the data and the true labels from the respective directories for further processing of each data set.

Function          | Input                        | Output
load_elec_data    | subject (1, 2, 3 or 4)       | data matrix, labels vector
load_aphasic_data | mode (o, m or p for k-Means) | data matrix, labels vector

Table A.1: Feature Extraction

    A.2 Clustering

Table A.2 lists the scripts which perform the clustering and output the metrics used for evaluation.

Function                  Input                               Output
kmeans_on_dataset_data    k (cluster size), data, labels,     (Normalized) Distortion, (Normalized) DBI,
                          varargin (visualization flag)       Balanced purity, Random purity, NMI, Random NMI
gmm_on_dataset_data       k (cluster size), data, labels,     Negative Log Likelihood, BIC, Balanced purity,
                          varargin (visualization flag)       Random purity, NMI, Random NMI

    Table A.2: Clustering
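A hypothetical call, assuming the argument order and outputs listed in Table A.2; the actual scripts may differ in their exact signatures and return values.

    % Hypothetical usage of the clustering scripts listed in Table A.2.
    [data, labels] = load_aphasic_data('m');                  % one of the three feature modes
    [distortion, dbi] = kmeans_on_dataset_data(3, data, labels, true);   % k = 3, with plots
    [nll, bic]        = gmm_on_dataset_data(3, data, labels, false);     % k = 3, no plots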


    A.3 Bootstrap

The table below (Table A.3) lists the three main scripts for the bootstrap technique; a sketch of the underlying resampling follows the table.

Function                     Input                        Output
cumbootstraps                mean vector,                 running mean, upper and lower bound,
                             confidence interval          99% mark
bootstrap_on_aphasic_data    mode (o, m or p),            None
                             algo (kmeans or gmm)
bootstrap_on_elec_data       subject (1, 2, 3 or 4),      None
                             algo (kmeans or gmm)

    Table A.3: Bootstrap
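The resampling underneath these scripts follows the basic bootstrap recipe of [30]. The following is a minimal sketch of the idea; purity_fn is a hypothetical stand-in for the project's own scoring code, which clusters the resampled data and scores the result against the resampled labels.

    % Hypothetical sketch of the bootstrap used for the quality measures:
    % subjects are resampled with replacement, the score is re-evaluated,
    % and a percentile confidence interval is reported.
    nBoot  = 1000;
    n      = size(data, 1);
    scores = zeros(nBoot, 1);
    for b = 1:nBoot
        idx       = randsample(n, n, true);           % resample with replacement
        scores(b) = purity_fn(data(idx, :), labels(idx));
    end
    ci = prctile(scores, [0.5 99.5]);                 % 99% percentile interval
    fprintf('score = %.3f, 99%% CI [%.3f, %.3f]\n', mean(scores), ci(1), ci(2));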

    A.4 Visualization

Various measures have been visualized to understand, interpret and compare results. Table A.4 contains the plot and MDS scripts; a rough MDS sketch follows the table.

Function                   Input    Output
mds_plot                   None     None
plot_quality_measures      None     None
plot_allmodes              None     None
plot_allsubjects           None     None
plot_iteration_subplots    None     None

    Table A.4: Visualization
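For orientation, the following sketch illustrates what an MDS projection of the embedding might involve; the actual mds_plot script may differ in its details.

    % Hypothetical sketch of an MDS visualization of the generative embedding:
    % classical MDS on pairwise Euclidean distances, coloured by the true labels.
    D = pdist(data);                     % pairwise Euclidean distances
    Y = cmdscale(D);                     % classical multidimensional scaling
    gscatter(Y(:, 1), Y(:, 2), labels);  % scatter plot grouped by label
    xlabel('MDS dimension 1');
    ylabel('MDS dimension 2');
    title('Generative embedding in two dimensions');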

    A.5 Operating on Cluster

In order to speed up computation and save time, heavy computations were delegated to Nash. Table A.5 summarizes the scripts used to run code on the cluster.

Function                  Input    Output
run_elec_on_cluster       None     None
run_aphasic_on_cluster    None     None

    Table A.5: Cluster Operation


    Bibliography

[1] Kay H. Brodersen, F. Haiss, C.S. Ong, M. Tittgemeyer, F. Jung, J.M. Buhmann, B. Weber, and K.E. Stephan. Model-based feature construction for multivariate decoding. NeuroImage, 56:601–615, May 2011.

[2] Kay H. Brodersen, Thomas M. Schofield, Alexander P. Leff, Cheng Soon Ong, Ekaterina I. Lomakina, Joachim M. Buhmann, and Klaas E. Stephan. Generative embedding for model-based classification of fMRI data. PLoS Computational Biology, 7, June 2011.

[3] Lee M. Harrison, Will D. Penny, and Karl J. Friston. Dynamic causal modelling. NeuroImage, 19:1273–1302, August 2003.

[4] Klaas E. Stephan, Karl J. Friston, and Chris D. Frith. Dysconnection in schizophrenia: From abnormal synaptic plasticity to failures of self-monitoring. Schizophrenia Bulletin, 35:509–527, May 2009.

[5] B. Horwitz, K.J. Friston, and J.G. Taylor. Neural modeling and functional brain imaging: an overview. Neural Networks, 13:829–846, November 2000.

[6] Manuele Bicego, Vittorio Murino, and Mário A. T. Figueiredo. Similarity-based classification of sequences using hidden Markov models. Pattern Recognition, 37:2281–2291, December 2004.

[7] Tony Jebara, Risi Kondor, and Andrew Howard. Probability product kernels. Journal of Machine Learning Research, 5:819–844, December 2004.

[8] Matthias Hein and Olivier Bousquet. Hilbertian metrics and positive definite kernels on probability measures. Proceedings of AISTATS, 10:136–143, 2004.

[9] Marco Cuturi, Kenji Fukumizu, and Jean-Philippe Vert. Semigroup kernels on measures. Journal of Machine Learning Research, 6:1169–1198, December 2005.

[10] Anna Bosch, Andrew Zisserman, and Xavier Muñoz. Scene classification via pLSA. In Proceedings of the European Conference on Computer Vision (ECCV), LNCS 3954:517–530, 2006.

[11] A. Bosch, A. Zisserman, and X. Muñoz. Scene classification using a hybrid generative/discriminative approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(4):712–727, 2008.

[12] Manuele Bicego, Elżbieta Pękalska, David M. J. Tax, and Robert P. W. Duin. Component-based discriminative classification for hidden Markov models. Pattern Recognition, 42(11):2637–2648, 2009.


[13] Nathan Smith and Mahesan Niranjan. Data-dependent kernels in SVM classification of speech patterns. Sixth International Conference on Spoken Language Processing, 2001.

[14] Alex D. Holub, Max Welling, and Pietro Perona. Combining generative models and Fisher kernels for object recognition. Tenth IEEE International Conference on Computer Vision (ICCV'05), 1:136–143, 2005.

[15] Manuele Bicego, Marco Cristani, and Vittorio Murino. Clustering-based construction of hidden Markov models for generative kernels. In Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR), LNCS 5681:466–479, 2009.

[16] Thomas Hofmann. Learning the similarity of documents: An information-geometric approach to document retrieval and categorization. Neural Information Processing Systems, pages 914–920, 2000.

[17] Tommi S. Jaakkola and David Haussler. Exploiting generative models in discriminative classifiers. In Advances in Neural Information Processing Systems 11, pages 487–493. MIT Press, 1999.

[18] T.P. Minka. Discriminative models, not discriminative training. Technical Report TR-2005-144, Microsoft Research, Cambridge, 2005.

[19] Julia A. Lasserre, Christopher M. Bishop, and Thomas P. Minka. Principled hybrids of generative and discriminative models. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1:87–94, 2006.

[20] A. Perina, M. Cristani, U. Castellani, V. Murino, and N. Jojic. A hybrid generative/discriminative classification framework based on free-energy terms. IEEE 12th International Conference on Computer Vision, 1:2058–2065, 2009.

[21] M. Figueiredo, P. Aguiar, A. Martins, V. Murino, and M. Bicego. Information theoretical kernels for generative embeddings based on hidden Markov models. In Proceedings of the 2010 Joint IAPR International Conference on Structural, Syntactic, and Statistical Pattern Recognition, pages 463–472, Berlin, Heidelberg, 2010. Springer-Verlag.

[22] David Haussler. Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10. Available: http://www.cbse.ucsc.edu/sites/default/files/convolutions.pdf, July 1999.

[23] Klaas E. Stephan, Lars Kasper, Lee M. Harrison, Jean Daunizeau, Hanneke E.M. den Ouden, Michael Breakspear, and Karl J. Friston. Nonlinear dynamic causal models for fMRI. NeuroImage, 42:649–662, May 2008.

[24] Eero Castrén. Is mood chemistry? Nature Reviews Neuroscience, 6:241–246, March 2005.

[25] Karl Friston, Jeremie Mattout, Nelson Trujillo-Barreto, John Ashburner, and Will Penny. Variational free energy and the Laplace approximation. NeuroImage, 34:220–234, 2007.


[26] J.B. MacQueen. Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1:281–297, 1967.

[27] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[28] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.

[29] Peter Bühlmann and Martin Mächler. Computational statistics. Technical report, ETH, Zurich, February 2008.

[30] Bradley Efron. Bootstrap methods: another look at the jackknife. Annals of Statistics, 7(1):1–26, 1979.

[31] Ingwer Borg and Patrick J.F. Groenen. Modern Multidimensional Scaling. Springer, 2005.

[32] Kate Swinburn, Gillian Porter, and David Howard. Comprehensive Aphasia Test. Psychology Press, 2004.

[33] Alexander P. Leff, Thomas M. Schofield, Klaas E. Stephan, Jennifer T. Crinion, Karl J. Friston, and Cathy J. Price. The cortical dynamics of intelligible speech. Journal of Neuroscience, 28(49):13209–13215, 2008.

[34] Klaas E. Stephan, L. M. Harrison, S. J. Kiebel, O. David, W.D. Penny, and K.J. Friston. Dynamic causal models of neural system dynamics: Current state and future extensions. Journal of Biosciences, 32:129–144, June 2007.

[35] Klaas E. Stephan, N. Weiskopf, P. M. Drysdale, P. A. Robinson, and K. J. Friston. Comparing hemodynamic models with DCM. NeuroImage, 38:387–401, November 2007.

[36] Joachim M. Buhmann. Information theoretic model validation for clustering. IEEE International Symposium on Information Theory, 2010. (in press).

[37] W.D. Penny. Variational Bayes for d-dimensional Gaussian mixture models. Technical report, Wellcome Department of Cognitive Neurology, University College London, July 2001.