Transcript of “Topics in Machine Learning and Neuroscience”

Page 1: Topics in Machine Learning and Neuroscience

Aapo Hyvärinen

Dept of Computer Science and HIIT, University of Helsinki, Finland

Page 2: Outline of this talk

◮ Part I: Statistical models of responses in visual cortex.
  ◮ Independent Component Analysis / Sparse Coding
  ◮ Topographic ICA
  ◮ Multi-layer approach can predict properties beyond V1.

◮ Part II: Analysing resting-state activity in EEG/MEG
  ◮ Variants of ICA
  ◮ Testing components
  ◮ Causal analysis
  ◮ Non-stationarity of connectivity

Page 3: Modelling early visual processing

[Figure: the visual pathway from the retina via the LGN to V1]

From the eye to the primary visual cortex (V1)

Page 4: Theories of response properties of visual neurons

◮ Edge detection

◮ Joint localization in space and frequency

◮ Texture classification

◮ But: the above give only vague predictions.

◮ Here: Statistical-ecological approach (Barlow, 1972)
  ◮ What is important in a real environment?
  ◮ Natural images have statistical regularities.
  ◮ Can we “explain” receptive fields by basic statistical properties of natural images?

◮ Extremely relevant to image processing / engineering

Page 5: Linear statistical models of images

[Figure: an image patch written as s_1 · (basis patch 1) + s_2 · (basis patch 2) + · · · + s_k · (basis patch k)]

◮ Denote by I(x, y) the gray-scale values of pixels.

◮ Model as a linear sum of basis vectors:

  I(x, y) = ∑_i A_i(x, y) s_i    (1)

◮ What are the “best” basis vectors for natural images?

Page 6: Independent Component Analysis (Jutten and Herault, 1991)

◮ In the linear model

  I(x, y) = ∑_i A_i(x, y) s_i    (2)

◮ assume that
  ◮ the s_i are mutually statistically independent
  ◮ the s_i are non-Gaussian, e.g. sparse
  ◮ for simplicity: the number of A_i equals the number of pixels

◮ Then the underlying basis vectors A_i are identifiable (Comon, 1994).

Page 7: Sparsity

◮ A form of nongaussianity often encountered in natural signals

◮ A random variable is “active” only rarely

[Figure: sample time series of a Gaussian signal and a sparse signal, both plotted on the scale −5 to 5]

◮ Sparse coding: find features such that the coefficients are maximally sparse

◮ Deep result: For images, ICA is sparse coding.

Page 8: ICA / sparse coding of natural images

(Olshausen and Field, 1996; Bell and Sejnowski, 1997)

Using the FastICA algorithm (Hyvarinen, 1999)
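
As a concrete illustration of the kind of experiment this slide refers to, here is a minimal sketch (not the original code) of ICA / sparse coding of natural image patches with scikit-learn's FastICA; the `images` array is an assumed, user-supplied stack of grayscale natural images.

```python
# A minimal sketch (not the slides' exact pipeline): ICA / sparse coding of
# natural image patches with FastICA.  "images" is assumed to be an array of
# grayscale natural images, shape (n_images, H, W), loaded by the user.
import numpy as np
from sklearn.decomposition import FastICA

def extract_patches(images, patch_size=16, n_patches=50000, rng=None):
    """Sample random square patches and flatten them into rows."""
    rng = np.random.default_rng(rng)
    n_images, H, W = images.shape
    patches = np.empty((n_patches, patch_size * patch_size))
    for k in range(n_patches):
        i = rng.integers(n_images)
        r = rng.integers(H - patch_size)
        c = rng.integers(W - patch_size)
        patches[k] = images[i, r:r + patch_size, c:c + patch_size].ravel()
    return patches - patches.mean(axis=1, keepdims=True)  # remove DC per patch

def ica_basis(patches, n_components=64):
    """Estimate ICA basis vectors A_i (columns of the mixing matrix)."""
    ica = FastICA(n_components=n_components, whiten="unit-variance", max_iter=500)
    ica.fit(patches)                      # rows of patches = observations
    return ica.mixing_                    # shape: (patch_dim, n_components)

# Usage (with a hypothetical image array):
# A = ica_basis(extract_patches(images))
# Each column A[:, i].reshape(16, 16) is one learned basis vector, typically a
# localized, oriented, band-pass ("Gabor-like") patch.
```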

Page 9: ICA of natural images with colour

(Hoyer and Hyvarinen, 2000)

Page 10: Independent subspace analysis

◮ Components estimated from natural images are not really independent.

◮ The statistical structure is much more complicated (of course!).

◮ Assume the s_i can be divided into groups or subspaces (Cardoso, 1998), such that
  ◮ the s_i in the same group are dependent on each other
  ◮ dependencies between different groups are not allowed.

◮ Use the classic complex cell model: the norm of the projection onto a subspace

◮ Maximize independence / sparsity of the complex cell outputs (Hyvarinen and Hoyer, NeCo, 2000)

Page 11: Computation of energy features in ISA

[Figure: ISA feature computation. The input image I is passed through linear filters w_1, …, w_8 (dot products ⟨w_i, I⟩), each output is squared, and the squared outputs are summed (Σ) within each subspace to give the energy / complex-cell outputs.]
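
For readers who want the figure in formulas, the following is a minimal numerical sketch of the energy computation it depicts: linear filter responses ⟨w_i, I⟩ are squared and summed within each subspace. The filter matrix, the subspace size of 4, and the random test data are assumptions for illustration only.

```python
# A minimal sketch of the ISA energy feature in the figure above: square the
# linear filter outputs and sum them within each subspace.  W (the filters)
# and the grouping into subspaces of 4 are stand-ins for illustration.
import numpy as np

def isa_energies(I, W, subspace_size=4):
    """I: flattened image patch (d,); W: filter matrix (n_filters, d).
    Returns one energy per subspace: the sum of squared filter responses."""
    responses = W @ I                               # <w_i, I> for all filters
    squared = responses ** 2                        # (.)^2 stage
    n_subspaces = len(squared) // subspace_size
    return squared[: n_subspaces * subspace_size].reshape(
        n_subspaces, subspace_size).sum(axis=1)     # Σ within each group

# Example with random stand-ins for learned filters and a patch:
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 256))     # 8 filters on 16x16 patches
I = rng.standard_normal(256)
print(isa_energies(I, W))             # 2 complex-cell outputs (8 filters / 4)
```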

Page 12: Independent subspaces of natural image patches

Each group of 4 basis vectors corresponds to one complex cell.

Page 13: Topographic ICA (Hyvarinen and Hoyer, 2001)

◮ In the brain, response properties mostly change continuously when moving on the cortical surface.

◮ Cells (components) are arranged on a two-dimensional lattice

◮ Again, simple cell outputs are sparse, but not independent: dependence follows topography.

◮ Multi-layer version used by Google to find a cat detector

Page 14: Topographic ICA on natural image patches

Basis vectors (simple cell RFs) with spatial organization

Page 15: Digression: Generalizations lead to unnormalized models

◮ Generalize ISA and topographic ICA by estimating two layers (Osindero et al, 2006):

  p(x; W, M) = (1 / Z(W, M)) exp[ ∑_i G( ∑_j m_ij (w_j^T x)^2 ) ]    (3)

◮ This is an unnormalized model: the density p is known only up to a multiplicative constant

  Z(W, M) = ∫ exp[ ∑_i G( ∑_j m_ij (w_j^T x)^2 ) ] dx

  which cannot be computed in reasonable computing time

Page 16: Estimation of unnormalized statistical models

◮ Score matching (Hyvarinen, JMLR, 2005)
  ◮ Take the derivative of the model log-density w.r.t. x, so the partition function disappears
  ◮ Fit this derivative to the same derivative of the data density
  ◮ Easy to compute due to an integration-by-parts trick

◮ Noise-contrastive estimation (Gutmann and Hyvarinen, JMLR, 2012) (a minimal numerical sketch follows below)
  ◮ Learn to distinguish data from artificially generated noise
  ◮ Logistic regression learns the ratio of the pdf's of data and noise
  ◮ For a known noise pdf, we have in fact learnt the data pdf
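
Below is a minimal sketch of the noise-contrastive estimation idea on a toy problem (an illustration, not the JMLR paper's implementation): an unnormalized one-dimensional model is fitted by logistic regression against Gaussian noise with a known pdf, and the normalization constant is learned as an extra parameter.

```python
# A minimal sketch of noise-contrastive estimation (NCE): fit an unnormalized
# 1-D model  log p_m(x; mu, c) = -0.5*(x - mu)^2 + c  (c plays the role of the
# unknown log-normalizer) by logistic regression against Gaussian noise.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=5000)   # "data" from N(2, 1)
y = rng.normal(loc=0.0, scale=2.0, size=5000)   # noise from N(0, 2), pdf known

def log_model(u, mu, c):
    return -0.5 * (u - mu) ** 2 + c             # unnormalized log-density

def nce_loss(theta):
    mu, c = theta
    # G(u) = log p_model(u) - log p_noise(u); logistic regression on G
    g_x = log_model(x, mu, c) - norm.logpdf(x, 0.0, 2.0)
    g_y = log_model(y, mu, c) - norm.logpdf(y, 0.0, 2.0)
    # negative log-likelihood of the labels (data = 1, noise = 0)
    return np.mean(np.logaddexp(0.0, -g_x)) + np.mean(np.logaddexp(0.0, g_y))

mu_hat, c_hat = minimize(nce_loss, x0=[0.0, 0.0]).x
print(mu_hat, c_hat)   # mu_hat ≈ 2, c_hat ≈ log(1/sqrt(2*pi)) ≈ -0.92
```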

Page 17: Another unnormalized model: linear correlations between components

◮ Many methods force components to be strictly uncorrelated

◮ So, any remaining linear correlations cannot be observed

◮ However, the “true” components could be correlated even linearly

◮ In ongoing work (Sasaki et al, 2013, 2014) we learn correlated components

  − log p(x | W) = ∑_i E[|w_i^T x|] + ∑_{i,j} β_ij E[|w_i^T x − w_j^T x|] − log Z(β_ij)    (4)
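
As a small illustration of what Eq. (4) measures, the sketch below evaluates its data-dependent part (the terms without log Z) for a given W and coupling matrix β; the arrays used are random stand-ins, not data from the cited work.

```python
# A minimal sketch (my own illustration, not the authors' code): evaluate the
# data-dependent part of Eq. (4) for a given unmixing matrix W and coupling
# coefficients beta, on data X.
import numpy as np

def correlated_components_energy(X, W, beta):
    """X: data matrix (n_samples, d); W: (k, d) filter rows w_i; beta: (k, k).
    Returns sum_i E[|w_i^T x|] + sum_{i,j} beta_ij E[|w_i^T x - w_j^T x|]."""
    S = X @ W.T                                  # component outputs, (n, k)
    sparsity_term = np.mean(np.abs(S), axis=0).sum()
    diffs = S[:, :, None] - S[:, None, :]        # pairwise output differences
    coupling_term = (beta * np.mean(np.abs(diffs), axis=0)).sum()
    return sparsity_term + coupling_term

# Tiny usage example with random stand-ins:
rng = np.random.default_rng(0)
X = rng.laplace(size=(1000, 5))
W = rng.standard_normal((3, 5))
beta = np.full((3, 3), 0.1)
print(correlated_components_energy(X, W, beta))
```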

Page 18: Going towards V2

◮ Compute complex cell outputs for natural images

  [Figure: an example STIMULUS made of excitatory (+) and inhibitory (−) regions]

◮ Do ICA on complex cell outputs

◮ Emergence of corner detector cells (e)? (Hosoya & Hyvarinen, JNS, 2015)

Page 19: Conclusion of Part I

◮ Properties of visual neurons can be quantitatively modelled by statistical properties of natural images.

◮ Answers the “why” question, which is important in neuroscience

◮ Simple cell receptive fields can be learned by maximizing sparsity / independence of linear filters.

◮ By modelling dependencies between simple cell outputs we can model complex cells and topography

◮ Theoretical development on estimation of unnormalized models

Page 20: Outline of this talk

◮ Statistical models of responses in visual cortex.
  ◮ Independent Component Analysis / Sparse Coding
  ◮ Topographic ICA
  ◮ Multi-layer approach can predict properties beyond V1.

◮ Analysing resting-state activity in EEG/MEG
  ◮ Variants of ICA
  ◮ Testing components
  ◮ Causal analysis
  ◮ Non-stationarity of connectivity

Page 21: The brain at rest

◮ The subject’s brain is being measured while
  ◮ the subject has no task
  ◮ the subject receives no stimulation

◮ Measurements by
  ◮ functional magnetic resonance imaging (fMRI)
  ◮ electroencephalography (EEG)
  ◮ magnetoencephalography (MEG)

◮ Why is this data so interesting?
  ◮ Not dependent on subjective choices in experimental design (e.g. stimulation protocol, task)
  ◮ Emphasis on the internal dynamics, instead of responses to stimuli

Page 22: Is anything happening in the brain at rest?

◮ Some brain areas are actually more active at rest

◮ “Default-mode network(s)” in PET and fMRI (Raichle 2001)

◮ Brain activity is “intrinsic” instead of just responses to stimulation

◮ How to analyse the resting state in more detail?

(Raichle, 2010, based on Shulman et al, 1997)

Pages 23–24: ICA finds resting-state networks in fMRI

(Beckmann et al, 2005)

Figure panels: a) medial and b) lateral visual areas, c) auditory system, d) sensory-motor system, e) default-mode network, f) executive control, g) dorsal visual stream.

Very similar results are obtained if the subject is watching a movie!

Page 25: How about EEG and MEG?

◮ Very high temporal accuracy (millisecond scale)

◮ Not so high spatial accuracy (less than in fMRI)

◮ Spontaneous activity vs. evoked responses

◮ Typically characterized by oscillations, e.g. at around 10 Hz

◮ Up to 306 time series (signals), 10^4 … 10^5 time points.

◮ Information very different from fMRI

Pages 26–29: Different sparsities of EEG/MEG data

◮ ICA finds components by maximizing sparsity, but sparsity of what? Depends on preprocessing and representation

◮ Assume we do a wavelet or short-time Fourier transform

◮ We have different sparsities:

  Sparsity in time: temporally modulated [Figure: a slowly modulated envelope over a 10 000 s recording]

  Sparsity in space: localised on cortex

  Sparsity in frequency: narrow-band signals [Figure: a narrow-band oscillation]

Pages 30–31: Spectral sparsity: Fourier-ICA

◮ Problem: Rhythmic sources (oscillations) may not be sparse

  [Figure: modulation of oscillations, panels a)–g)]

◮ Solution: Perform ICA on short-time Fourier transforms (a minimal sketch follows below):
  ◮ Divide each channel into time windows, e.g. 1 sec long
  ◮ Fourier transform each window
  ◮ Joint sparsity in time and frequency

(NeuroImage, 2010)
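
The following is a minimal sketch of the short-time-Fourier preprocessing described in the bullets above, using NumPy and scikit-learn's FastICA on the magnitudes of the windowed spectra. It is only an illustration: the published Fourier-ICA works with the complex coefficients and has further details not captured here, and the `meg` array and sampling rate are assumptions.

```python
# Minimal sketch of Fourier-ICA-style preprocessing (an illustration, not the
# published implementation): short-time Fourier transform each channel, stack
# the windows, and run ICA so that sparsity is measured jointly in time and
# frequency.
import numpy as np
from sklearn.decomposition import FastICA

def stft_windows(x, fs, win_sec=1.0):
    """x: one channel (n_samples,). Returns |FFT| of consecutive windows,
    shape (n_windows, n_freqs)."""
    win = int(fs * win_sec)
    n_windows = len(x) // win
    segs = x[: n_windows * win].reshape(n_windows, win)
    segs = segs * np.hanning(win)                  # taper each window
    return np.abs(np.fft.rfft(segs, axis=1))

def fourier_ica(meg, fs, n_components=10):
    """meg: (n_channels, n_samples). ICA across channels, where each
    observation is one (window, frequency) bin."""
    spectra = np.stack([stft_windows(ch, fs) for ch in meg])   # (ch, win, f)
    n_ch = spectra.shape[0]
    X = spectra.reshape(n_ch, -1).T        # observations = time-frequency bins
    ica = FastICA(n_components=n_components, whiten="unit-variance")
    sources = ica.fit_transform(X)         # sparse time-frequency sources
    return sources, ica.mixing_            # mixing_ columns ≈ spatial patterns

# Usage with simulated data (a stand-in for real MEG):
rng = np.random.default_rng(0)
meg = rng.standard_normal((30, 60_000))   # 30 channels, 60 s at 1000 Hz
sources, patterns = fourier_ica(meg, fs=1000)
```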

Page 32: Spatial sparsity (spatial ICA)

◮ Images observed at different time points are linear sums of “source images”

  [Figure: each observed image equals a weighted sum a_t1 · (source image 1) + a_t2 · (source image 2) + … + a_tn · (source image n)]

◮ Reverses the roles of observations and variables

◮ Maximizes spatial sparsity alone

◮ Almost always used in fMRI
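
A minimal sketch of the role reversal described above, assuming a preprocessed fMRI data matrix `bold` of shape (time points × voxels): voxels are treated as the samples, so the independent components come out as spatial maps and the mixing matrix holds their time courses.

```python
# A minimal sketch of spatial ICA (an illustration): treat voxels as the
# "samples" and time points as the "variables", so the estimated independent
# components are spatial maps and the mixing matrix holds their time courses.
import numpy as np
from sklearn.decomposition import FastICA

def spatial_ica(bold, n_components=20):
    """bold: array (n_timepoints, n_voxels). Returns (maps, timecourses)."""
    ica = FastICA(n_components=n_components, whiten="unit-variance")
    maps = ica.fit_transform(bold.T)    # (n_voxels, K): independent spatial maps
    timecourses = ica.mixing_           # (n_timepoints, K): one course per map
    return maps.T, timecourses

# Usage with random stand-in data:
rng = np.random.default_rng(0)
bold = rng.standard_normal((200, 5000))          # 200 volumes, 5000 voxels
maps, tcs = spatial_ica(bold, n_components=5)    # maps: (5, 5000)
```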

Page 33: Spatial ICA in MEG

◮ Spatial ICA possible for MEG by projecting data on the cortex

◮ We combine this with short-time Fourier transforms

◮ Maximizes sparsity spatially and spectrally

◮ No assumption on temporal independence

(Ramkumar et al, Human Brain Mapping, 2012. Here, not resting data but with “naturalistic stimulation”)

Pages 34–35: Testing ICs: motivation

◮ ICA algorithms give a fixed number of components and do not tell which ones are reliable (statistically significant)

◮ How do we know that an estimated component is not just a random effect?

◮ Algorithmic artifacts are also possible (local minima)

◮ We develop a statistical test based on inter-subject consistency (a toy sketch follows below):
  ◮ Do ICA separately on several subjects
  ◮ A component is significant if it appears in two or more subjects in a sufficiently similar form
  ◮ We formulate a rigorous null hypothesis to quantify this idea

(NeuroImage, 2011)
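
To make the idea concrete, here is a toy sketch of inter-subject consistency testing; it is not the published testing procedure, and the null model (random orthogonal mixing matrices) and the similarity measure are simplifying assumptions of this illustration.

```python
# A toy illustration (not the published test) of testing ICs by inter-subject
# consistency: components whose spatial patterns are more similar across
# subjects than expected under a random null are flagged as consistent.
# `mixings` is an assumed list of per-subject mixing matrices (components in
# the columns).
import numpy as np

def cross_subject_similarities(mixings):
    """Max |correlation| of each component with any component of another subject."""
    sims = []
    for a, Ma in enumerate(mixings):
        for b, Mb in enumerate(mixings):
            if a >= b:
                continue
            Ca = (Ma - Ma.mean(0)) / Ma.std(0)
            Cb = (Mb - Mb.mean(0)) / Mb.std(0)
            corr = np.abs(Ca.T @ Cb) / Ma.shape[0]     # |corr| between columns
            sims.append(corr.max(axis=1))              # best match per component
    return np.concatenate(sims)

def null_threshold(n_dims, n_components, n_subjects, alpha=0.05, n_perm=200, rng=0):
    """Null: random orthogonal mixing matrices (no truly shared components)."""
    rng = np.random.default_rng(rng)
    null_sims = []
    for _ in range(n_perm):
        fake = [np.linalg.qr(rng.standard_normal((n_dims, n_components)))[0]
                for _ in range(n_subjects)]
        null_sims.append(cross_subject_similarities(fake).max())
    return np.quantile(null_sims, 1 - alpha)

# Usage: flag components whose best cross-subject similarity beats the null.
# thr = null_threshold(n_dims=64, n_components=10, n_subjects=5)
# consistent = cross_subject_similarities(mixings) > thr
```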

Page 36: Testing ICs: results

[Figure: two example significant ICs (“One IC”, “Another IC”), each shown with b) a minimum norm estimate, c) its Fourier spectrum (roughly 10–30 Hz), and d) its modulation by stimulation, as negative z-scores for the conditions beep, spch, v/hf, v/bd, tact]

Page 37: Causal analysis: Introduction

◮ Model connections between the measured variables

◮ Two fundamental approaches
  ◮ If the time-resolution of the measurements is fast enough, we can use autoregressive modelling (Granger causality)
  ◮ Otherwise, we need structural equation models

◮ If the measured variables are raw EEG/MEG, we should first localize sources

◮ After blind source separation, sources are uncorrelated ⇒ more meaningful to model dependencies of envelopes (amplitudes, variances)

Pages 38–39: Causal analysis: Structural equation models

◮ How does an externally imposed change in one variable affect the others?

  x_i = ∑_{j≠i} b_ij x_j + e_i

◮ Difficult to estimate, not simple regression
  ◮ Classic methods fail in general

◮ Can be estimated if (Shimizu et al., JMLR, 2006) (see the simulation sketch after this slide)
  1. the e_i(t) are mutually independent
  2. the e_i(t) are non-Gaussian, e.g. sparse
  3. the b_ij are acyclic: there is an ordering of the x_i in which all effects go “forward”

[Figure: an example acyclic network over x_1, …, x_7 with signed edge weights]
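
For concreteness, the sketch below generates data from exactly this kind of model: an acyclic linear SEM with independent, non-Gaussian disturbances, so that x = Bx + e can be solved as x = (I − B)^{-1} e. The matrix B and the sample size are made up for illustration.

```python
# A minimal sketch (my own illustration) of the SEM / LiNGAM data model above:
# generate data from an acyclic, linear SEM with non-Gaussian (Laplacian)
# disturbances.  Under the three conditions on the slide, B is identifiable
# from such data (e.g. by the ICA-based LiNGAM algorithm of Shimizu et al.).
import numpy as np

rng = np.random.default_rng(0)
n_vars, n_samples = 4, 10_000

# Strictly lower-triangular B in the causal ordering x1 -> x2 -> x3 -> x4
B = np.array([[0.0, 0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0, 0.0],
              [0.0, -0.5, 0.0, 0.0],
              [0.3, 0.0, 0.7, 0.0]])

E = rng.laplace(size=(n_vars, n_samples))   # non-Gaussian, independent e_i
X = np.linalg.solve(np.eye(n_vars) - B, E)  # x = B x + e  <=>  x = (I - B)^{-1} e

# X now follows the structural equation model; each column is one observation
# of (x_1, ..., x_4), ready for causal-discovery methods such as LiNGAM.
print(np.cov(X))
```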

Page 40: Causal analysis: Simple measures of direction

◮ The very simplest case: choose between regression models

  y = ρ x + d    (5)

  where d is independent of x, and symmetrically

  x = ρ y + e    (6)

◮ If the data is Gaussian we can estimate ρ = E[xy], BUT: both models have the same likelihood!

◮ For non-Gaussian data, approximate the log-likelihood ratio as

  R = ρ E[x g(y) − g(x) y]    (7)

  where g is a nonlinearity similar to those used in ICA (Hyvarinen & S.M. Smith, JMLR, 2013).

◮ Choose direction based on sign of R!
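
A minimal numerical sketch of this pairwise measure follows, with g = tanh as one ICA-type nonlinearity (an assumption) and standardized variables; the simulated cause-effect pair is only there to check the sign convention.

```python
# A minimal sketch of the pairwise direction measure in Eq. (7): for
# standardized, sparse (super-Gaussian) x and y, R > 0 is taken as evidence
# for x -> y and R < 0 for y -> x.  g = tanh is one ICA-style nonlinearity.
import numpy as np

def direction_measure(x, y, g=np.tanh):
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    rho = np.mean(x * y)                       # correlation of standardized data
    return rho * np.mean(x * g(y) - g(x) * y)  # R of Eq. (7)

# Quick check on simulated data where the ground truth is x -> y:
rng = np.random.default_rng(0)
x = rng.laplace(size=100_000)                  # non-Gaussian cause
y = 0.7 * x + rng.laplace(size=100_000)        # effect = 0.7*x + independent noise
print(direction_measure(x, y))                 # should be positive: infer x -> y
```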

Page 41: Sample of results on MEG

Black: positive influence, red: negative influence. Green: manually drawn grouping.

Here, using a GARCH model (Zhang and Hyvarinen, UAI 2010)

Page 42: Analysis of non-stationarity

◮ Analyse temporal changes (non-stationarity) in connectivity (correlations)

◮ Alternatively, variability over subjects

◮ Our approach: Find pairs of brain areas so that the connectivity between them is maximally changing.

◮ Related to a simple linear algebra problem which can be solved:

  max_{‖w‖ = ‖v‖ = 1, w^T v = 0}  w^T K v    (8)
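
One way to solve Eq. (8) in closed form, assuming K is symmetric (an assumption of this illustration; the slide does not spell out the cited work's exact formulation): the optimum mixes the eigenvectors of the largest and smallest eigenvalues of K, giving the value (λ_max − λ_min)/2.

```python
# A minimal sketch of solving Eq. (8) for symmetric K: the maximum of w^T K v
# over unit-norm, orthogonal w and v is (lambda_max - lambda_min) / 2,
# attained by mixing the extreme eigenvectors of K.
import numpy as np

def max_orthogonal_pair(K):
    """Return (w, v, value) maximizing w^T K v s.t. ||w||=||v||=1, w^T v=0."""
    eigvals, eigvecs = np.linalg.eigh(K)        # ascending eigenvalues
    u_min, u_max = eigvecs[:, 0], eigvecs[:, -1]
    w = (u_max + u_min) / np.sqrt(2)
    v = (u_max - u_min) / np.sqrt(2)
    return w, v, w @ K @ v                      # value = (l_max - l_min) / 2

# Quick check on a random symmetric matrix:
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
K = (A + A.T) / 2
w, v, val = max_orthogonal_pair(K)
lam = np.linalg.eigvalsh(K)
print(np.isclose(val, (lam[-1] - lam[0]) / 2), abs(w @ v) < 1e-12)
```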

Page 43: Maximally variable pair of components in fcMRI

[Figure: spatial patterns w and v of the maximally variable pair of components]

Connectivity of 177 ROIs for 103 subjects (Hyvarinen et al, submitted)

Pages 44–47: Conclusion of Part II

◮ Exploratory data analysis by ICA can give information about internal dynamics during rest, and
  ◮ activity not directly related to stimulation
  ◮ responses when stimulation is too complex

◮ We present two stages of analysis
  ◮ Finding sources by different variants of ICA
  ◮ Analyzing their effective connectivity:
    ◮ Non-Gaussian versions of SEM

◮ Significance tests are possible using intersubject consistency

◮ Non-stationarity of connectivity is a new target for machine learning

Pages 48–50: Future prospects

◮ Neuroscience is entering the Big Data era
  ◮ Human Connectome Project etc.

◮ Machine Learning can help Neuroscience in many ways

◮ Computational neuroscience and machine learning are approaching each other again
  ◮ Like in the 1980’s neural network boom?
  ◮ ... but this time, neuroscience data and computers are bigger

◮ Methods I develop are general, even though inspired by neuroscience

◮ Perhaps their main impact will be found elsewhere

Pages 51–52: Future prospects: Some concrete ideas

◮ Neurofeedback: with big data, big computers, good algorithms

◮ Modelling resting-state dynamics? Cf. Dyna

◮ Deep unsupervised learning: maybe like nonlinear ICA

◮ Gestalts, a new information-processing principle?

◮ Modelling variability in connectivity (previous slides)

◮ Grandiose (or lunatic?) vision:
  ◮ Can we do to neuroscience what deep learning did to AI?