Machine Learning on fMRI Data

Machine Learning on fMRI Data Rebecca Hutchinson January 28, 2008


Transcript of Machine Learning on fMRI Data

Page 1: Machine Learning  on fMRI Data

Machine Learning on fMRI Data

Rebecca Hutchinson

January 28, 2008

Page 2: Machine Learning  on fMRI Data

Professional Staff

Vladimir Cherkassky

PhD Students

Rebecca Hutchinson, Kai Min Chang

Neurosemantics Research Team

Marcel Just

Postdoctoral Fellows

Tom Mitchell

Mark Palatucci, Indra Rustandi, Vincente Malave, Francisco Pereira

Svetlana Shinkareva, Rob Mason

Page 3: Machine Learning  on fMRI Data

fMRI Data

[Figure: hemodynamic response following neural activity. x-axis: Time (seconds); y-axis: Signal Amplitude.]

Features: 10,000 voxels, imaged every second.

Training examples: 10-40 trials (task repetitions).

Page 4: Machine Learning  on fMRI Data
Page 5: Machine Learning  on fMRI Data

Brain Imaging and Machine Learning

ML Case study: high dimensional, sparse data

• "Learning to Decode Cognitive States from Brain Images," T.M. Mitchell, et al., Machine Learning, 57(1), pp. 145-175, 2004.

• "The Support Vector Decomposition Machine" F. Pereira, G. Gordon, ICML-2006.

• "Classification in Very High Dimensional Problems with Handfuls of Examples", M. Palatucci and T. Mitchell, ECML-2007

• Francisco Pereira PhD thesis topic

Page 6: Machine Learning  on fMRI Data

Brain Imaging and Machine Learning

ML Case study: learning models of individuals, of the population, and of individual variation

• "Training fMRI Classifiers to Discriminate Cognitive States across Multiple Subjects", X. Wang, R. Hutchinson, and T. M. Mitchell, NIPS 2003.

• "Classifying Multiple-Subject fMRI Data Using the Hierarchical Gaussian Naïve Bayes Classifier", Indrayana Rustandi, 13th Conference on Human Brain Mapping, June 2007.

• Indra Rustandi PhD thesis topic

Page 7: Machine Learning  on fMRI Data

Brain Imaging and Machine Learning

ML Case study: complex time series generated by hidden processes

• "Hidden Process Models", Rebecca Hutchinson, T. Mitchell, I. Rustandi, ICML-2006.

• "Learning to Identify Overlapping and Hidden Cognitive Processes from fMRI Data," R. Hutchinson, T.M. Mitchell, I. Rustandi, 11th Conference on Human Brain Mapping, 2005.

• Rebecca Hutchinson PhD thesis topic

Page 8: Machine Learning  on fMRI Data

Study: Pictures and Sentences

• Task: Decide whether sentence describes picture correctly, indicate with button press.

• 13 normal subjects, 40 trials per subject.

• Sentences and pictures describe 3 symbols: *, +, and $, using ‘above’, ‘below’, ‘not above’, ‘not below’.

• Images are acquired every 0.5 seconds.

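[Figure: trial timeline with markers at t=0, 4 sec., and 8 sec. Two stimulus orders are shown (Read Sentence then View Picture, and View Picture then Read Sentence), along with Fixation, Press Button, and Rest segments.]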

Page 9: Machine Learning  on fMRI Data

Goals for fMRI

• To track cognitive processes over time.
  – Estimate process hemodynamic responses.
  – Estimate process timings.

• Allowing processes that do not directly correspond to the stimuli timing is a key contribution of HPMs!

• To compare hypotheses of cognitive behavior.

Page 10: Machine Learning  on fMRI Data

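Processes of the HPM:

Process 1: ReadSentence
Response signature $W$: [figure: response time courses for voxels v1 and v2]
Duration $d$: 11 sec.
Offsets $\Omega$: {0, 1}
$P(\Omega)$: $\{\theta_0, \theta_1\}$

Process 2: ViewPicture
Response signature $W$: [figure: response time courses for voxels v1 and v2]
Duration $d$: 11 sec.
Offsets $\Omega$: {0, 1}
$P(\Omega)$: $\{\theta_0, \theta_1\}$

Input stimuli: sentence, picture

Timing landmarks $\lambda$: $\lambda_1$, $\lambda_2$

One configuration $c$ of process instances $i_1, i_2, \ldots, i_k$:

Process instance: $i_2$
Process $\pi$: 2
Timing landmark $\lambda$: $\lambda_2$
Offset $O$: 1 (Start time: $\lambda_2 + O$)

Predicted mean: [figure: summed process responses for voxels v1 and v2], plus noise $N(0, \sigma_1^2)$ for v1 and $N(0, \sigma_2^2)$ for v2.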
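As a rough illustration of the predicted mean on this slide, the sketch below adds each process instance's response signature into the time series at its start time (timing landmark plus offset). The array shapes, the toy signature values, and the function name `predicted_mean` are assumptions made for this example; they are not taken from the slides.

```python
import numpy as np

def predicted_mean(signatures, instances, n_timepoints):
    """Sum process responses linearly into a (T, V) predicted mean.

    signatures: dict mapping process ID -> array of shape (d, V)
    instances:  list of (process_id, landmark, offset) tuples, start >= 0
    """
    n_voxels = next(iter(signatures.values())).shape[1]
    mean = np.zeros((n_timepoints, n_voxels))
    for process_id, landmark, offset in instances:
        w = signatures[process_id]
        start = landmark + offset
        if start >= n_timepoints:
            continue  # instance starts after the window ends
        stop = min(start + w.shape[0], n_timepoints)
        # Responses that run past the end of the window are truncated.
        mean[start:stop] += w[: stop - start]
    return mean

# Toy example (hypothetical values): two processes, two voxels, duration of 3 images.
signatures = {
    1: np.array([[1.0, 0.0], [2.0, 1.0], [1.0, 0.0]]),
    2: np.array([[0.0, 1.0], [1.0, 2.0], [0.0, 1.0]]),
}
instances = [(1, 0, 0), (2, 1, 1)]  # (process ID, timing landmark, offset)
mu = predicted_mean(signatures, instances, n_timepoints=6)
```

In this toy case the two responses overlap at one time point, where their values simply add; observed data would then be this mean plus per-voxel Gaussian noise.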

Page 11: Machine Learning  on fMRI Data

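HPM Formalism

HPM = $\langle \Pi, C, \Sigma \rangle$

$\Pi = \langle \pi_1, \ldots, \pi_\Pi \rangle$, a set of processes (e.g. ReadSentence)
  $\pi = \langle W, d, \Omega, \Theta \rangle$, a process
    $W$ = response signature
    $d$ = process duration
    $\Omega$ = allowable offsets
    $\Theta$ = multinomial parameters over values in $\Omega$

$C = \langle c_1, \ldots, c_C \rangle$, a set of configurations
  $c = \langle i_1, \ldots, i_L \rangle$, a set of process instances
    $i = \langle \pi, \lambda, O \rangle$, a process instance (e.g. ReadSentence(S1))
      $\pi$ = process ID
      $\lambda$ = timing landmark (e.g. stimulus presentation of S1)
      $O$ = offset (takes values in $\Omega$)

$\Sigma = \langle \sigma_1^2, \ldots, \sigma_V^2 \rangle$, variance for each voxel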
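One way to read the formalism as a generative model, consistent with the "predicted mean plus Gaussian noise" example and with the assumption that process responses sum linearly, is sketched below. The notation $\mu_{t,v}(c)$ and the convention that a response signature is zero outside its duration are assumptions introduced here, not symbols from the slides.

```latex
Y_{t,v} \sim N\!\left( \mu_{t,v}(c),\ \sigma_v^2 \right),
\qquad
\mu_{t,v}(c) = \sum_{i \in c} W^{\pi(i)}_{\,t - (\lambda(i) + O(i)),\; v}
% Convention (assumed): W^{\pi}_{\tau, v} = 0 whenever \tau falls outside the duration d of process \pi.
```

Here $\pi(i)$, $\lambda(i)$, and $O(i)$ denote the process ID, timing landmark, and offset of instance $i$, so $\lambda(i) + O(i)$ is its start time.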

Page 12: Machine Learning  on fMRI Data

Encoding Experiment Design

Input stimuli: [figure: stimulus timeline]

Timing landmarks $\lambda$: $\lambda_1$, $\lambda_2$

Processes:
ReadSentence = 1
ViewPicture = 2
Decide = 3

Configurations 1 through 4: [figure: one panel per configuration]

Constraints Encoded:
$\pi(i_1) = \{1,2\}$
$\pi(i_2) = \{1,2\}$
$\pi(i_1) \neq \pi(i_2)$
$O(i_1) = 0$
$O(i_2) = 0$
$\pi(i_3) = 3$
$O(i_3) = \{1,2\}$
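The constraints on this slide can be turned into an explicit list of configurations by brute-force enumeration. The sketch below is one hypothetical way to do that; the dictionary layout and the function name `enumerate_configurations` are assumptions, and the order in which configurations are produced need not match the numbering on the slide.

```python
from itertools import product

# Process IDs from the slide: ReadSentence = 1, ViewPicture = 2, Decide = 3.
# Each instance lists its allowed process IDs and offsets.
allowed = {
    "i1": {"process": [1, 2], "offset": [0]},
    "i2": {"process": [1, 2], "offset": [0]},
    "i3": {"process": [3], "offset": [1, 2]},
}

def enumerate_configurations(allowed):
    """Return every assignment of (process, offset) per instance that satisfies the constraints."""
    names = sorted(allowed)
    choices = [
        [(p, o) for p in allowed[n]["process"] for o in allowed[n]["offset"]]
        for n in names
    ]
    configs = []
    for combo in product(*choices):
        config = dict(zip(names, combo))
        # Constraint from the slide: i1 and i2 must be different processes.
        if config["i1"][0] != config["i2"][0]:
            configs.append(config)
    return configs

configurations = enumerate_configurations(allowed)
for k, c in enumerate(configurations, start=1):
    print(k, c)
```

Two stimulus orders times two possible offsets for the Decide instance gives four configurations, which agrees with the four configurations shown on the slide.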

Page 13: Machine Learning  on fMRI Data

Inference

• Over configurations

• Choose the most likely configuration, where:

• C=configuration, Y=observed data, HPM=model
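The equation on this slide did not survive extraction. A standard way to choose the most likely configuration, assumed here rather than confirmed by the slides, is Bayes' rule: P(C=c | Y, HPM) is proportional to P(Y | C=c, HPM) times P(C=c | HPM). The sketch below scores each configuration with a per-voxel Gaussian likelihood; the function names and toy data are hypothetical.

```python
import numpy as np

def gaussian_log_likelihood(Y, mean, sigma2):
    """Log-likelihood of data Y (T, V) under independent Gaussians with per-voxel variance."""
    resid = Y - mean
    return float(np.sum(-0.5 * np.log(2 * np.pi * sigma2) - 0.5 * resid**2 / sigma2))

def configuration_posterior(Y, means, log_priors, sigma2):
    """Posterior over configurations, normalized in log space for numerical stability."""
    log_post = np.array([
        gaussian_log_likelihood(Y, m, sigma2) + lp
        for m, lp in zip(means, log_priors)
    ])
    log_post -= np.max(log_post)
    post = np.exp(log_post)
    return post / post.sum()

# Toy data: two candidate configurations with different predicted means.
rng = np.random.default_rng(0)
T, V = 20, 3
means = [np.zeros((T, V)), np.ones((T, V))]
sigma2 = np.full(V, 0.25)
Y = means[1] + rng.normal(scale=0.5, size=(T, V))
posterior = configuration_posterior(Y, means, np.log([0.5, 0.5]), sigma2)
best = int(np.argmax(posterior))
```

In a full HPM the predicted mean for each configuration would come from the response signatures and start times of its process instances, and the prior would come from the offset distributions.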

Page 14: Machine Learning  on fMRI Data

Learning

• Parameters to learn:
  – Response signature $W$ for each process
  – Timing distribution $\Theta$ for each process
  – Variance $\sigma_v^2$ for each voxel

• Expectation-Maximization (EM) algorithm.
  – Latent variable z is an indicator of which configuration is correct.
  – E step: estimate a probability distribution over z.
  – M step: update estimates of $W$ (using weighted least squares) and of $\Theta$ and $\sigma_v^2$ (using standard MLEs) based on the E step.
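Written out, the two steps might look like the following for a single window of data with one set of candidate configurations. This is a sketch under assumptions: $\gamma_c$, the design matrix $X_c$ implied by configuration $c$, and the count $n_{\pi,o}(c)$ of instances of process $\pi$ with offset $o$ in $c$ are notation introduced here, and the exact updates used by the authors may differ.

```latex
% E step: posterior weight of each configuration c
\gamma_c = P(z = c \mid Y, W^{old}, \Theta^{old}, \sigma^{old})

% M step: weighted least squares for each voxel v
W_v \leftarrow \arg\min_{W_v} \sum_{c} \gamma_c \, \lVert Y_v - X_c W_v \rVert^2

% M step: closed-form maximum-likelihood updates
\theta_{\pi,o} \leftarrow
  \frac{\sum_{c} \gamma_c \, n_{\pi,o}(c)}{\sum_{o'} \sum_{c} \gamma_c \, n_{\pi,o'}(c)}
\qquad
\sigma_v^2 \leftarrow \frac{1}{T} \sum_{c} \gamma_c \, \lVert Y_v - X_c W_v \rVert^2
```

Because the weights $\gamma_c$ sum to one, the variance update is a weighted average of the per-configuration residual variances.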

Page 15: Machine Learning  on fMRI Data

Cognitive Processes

Page 16: Machine Learning  on fMRI Data

Comparing Cognitive Models

5-fold cross-validation: Average data log-likelihood of held-out fold.
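The comparison procedure can be sketched as follows. To keep the example short, a per-feature Gaussian stands in for the model being evaluated; it is not an HPM, and the function names and toy data are assumptions. The point is only the mechanics: fit on four folds, score the held-out fold by log-likelihood, and average over folds.

```python
import numpy as np

def heldout_log_likelihood(train, test):
    """Fit a per-feature Gaussian on the training trials and score the held-out trials."""
    mu = train.mean(axis=0)
    sigma2 = train.var(axis=0) + 1e-6  # small floor avoids division by zero
    ll = -0.5 * np.log(2 * np.pi * sigma2) - 0.5 * (test - mu) ** 2 / sigma2
    return float(ll.sum())

def cross_validated_score(trials, n_folds=5, fit_and_score=heldout_log_likelihood):
    """Average held-out log-likelihood over n_folds contiguous folds of trials."""
    indices = np.arange(len(trials))
    scores = []
    for held_out in np.array_split(indices, n_folds):
        train_idx = np.setdiff1d(indices, held_out)
        scores.append(fit_and_score(trials[train_idx], trials[held_out]))
    return float(np.mean(scores))

rng = np.random.default_rng(0)
trials = rng.normal(size=(40, 8))  # toy data: 40 trials x 8 features
score = cross_validated_score(trials)
```

To compare two candidate models, one would pass a different `fit_and_score` function for each and prefer the model with the higher average held-out log-likelihood.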

Page 17: Machine Learning  on fMRI Data
Page 18: Machine Learning  on fMRI Data

Learned Temporal Response

[Figure: Process response time course for a single voxel of Subject K for process R under HPM13. x-axis: Images (each image is 0.5 seconds); y-axis: Process response parameter value.]

Page 19: Machine Learning  on fMRI Data

Learned Offset Distributions

Subject K

Page 20: Machine Learning  on fMRI Data

Related Work

• fMRI
  – General Linear Model (Dale99)
    • Must assume timing of process onset to estimate hemodynamic response.
  – Computer models of human cognition (Just99, Anderson04)
    • Predict fMRI data rather than learning parameters of processes from the data.

• Machine Learning
  – Classification of windows of fMRI data (Cox03, Haxby01, Mitchell04)
    • Does not typically model overlapping hemodynamic responses.
  – Dynamic Bayes Networks (Murphy02, Ghahramani97)
    • HPM assumptions/constraints are difficult to encode in DBNs.

Page 21: Machine Learning  on fMRI Data

Future Work

• Regularization for process responses:
  – Temporal smoothness
  – Spatial smoothness
  – Spatial sparsity
  – Spatial priors

• Reduce the computational complexity of the algorithms.

• Apply to open cognitive science problems.

Page 22: Machine Learning  on fMRI Data

Conclusions

• Take-away messages:
  – fMRI data is an interesting case study for a number of machine learning challenges.
  – HPMs are a probabilistic model for time series data generated by a latent collection of processes.
  – In the fMRI domain, HPMs can simultaneously estimate the hemodynamic response and localize the timing of cognitive processes.

Page 23: Machine Learning  on fMRI Data

References

John R. Anderson, Daniel Bothell, Michael D. Byrne, Scott Douglass, Christian Lebiere, and Yulin Qin. An integrated theory of the mind. Psychological Review, 111(4):1036–1060, 2004. http://act-r.psy.cmu.edu/about/.

David D. Cox and Robert L. Savoy. Functional magnetic resonance imaging (fMRI) "brain reading": detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage, 19:261–270, 2003.

Anders M. Dale. Optimal experimental design for event-related fMRI. Human Brain Mapping, 8:109–114, 1999.

Zoubin Ghahramani and Michael I. Jordan. Factorial hidden Markov models. Machine Learning, 29:245–275, 1997.

James V. Haxby, M. Ida Gobbini, Maura L. Furey, Alumit Ishai, Jennifer L. Schouten, and Pietro Pietrini. Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293:2425–2430, September 2001.

Marcel Adam Just, Patricia A. Carpenter, and Sashank Varma. Computational modeling of high-level cognition and brain function. Human Brain Mapping, 8:128–136, 1999. http://www.ccbi.cmu.edu/project 10modeling4CAPS.htm.

Tom M. Mitchell et al. Learning to decode cognitive states from brain images. Machine Learning, 57:145–175, 2004.

Kevin P. Murphy. Dynamic Bayesian networks. To appear in Probabilistic Graphical Models, M. Jordan, November 2002.

Page 24: Machine Learning  on fMRI Data
Page 25: Machine Learning  on fMRI Data

Simple Case: Known Timing

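[Figure: the data matrix Y (T rows, N columns) written as a binary design matrix times the stacked response signatures W(1), W(2), W(3), each with N columns. The design matrix has T rows and one block of columns per process (p1, p2, p3).]

First rows of the design matrix (column blocks separated by "|"):

1 0 0 | 0 0 0 | 0 0 0
0 1 0 | 1 0 0 | 1 0 0
0 0 1 | 0 1 0 | 0 1 0
0 0 0 | 0 0 1 | 0 0 1
…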
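When the timing is known, the system on this slide is an ordinary least-squares problem. The sketch below builds a binary design matrix from known start times and recovers the stacked response signatures with `np.linalg.lstsq`. The function name `design_matrix`, the 0-based indexing, and the toy sizes are assumptions; the later instances (starting at images 5, 8, and 11) are added here only so that every process can be estimated on its own.

```python
import numpy as np

def design_matrix(instances, n_processes, duration, n_timepoints):
    """Binary (T, P*d) matrix with one block of d columns per process.

    instances: list of (process_index, start_time) pairs, both 0-based.
    """
    X = np.zeros((n_timepoints, n_processes * duration))
    for process, start in instances:
        for k in range(duration):
            if 0 <= start + k < n_timepoints:
                X[start + k, process * duration + k] += 1.0
    return X

# The first three instances reproduce the matrix rows shown on the slide.
instances = [(0, 0), (1, 1), (2, 1), (0, 5), (1, 8), (2, 11)]
X = design_matrix(instances, n_processes=3, duration=3, n_timepoints=14)

rng = np.random.default_rng(0)
W_true = rng.normal(size=(9, 4))   # stacked W(1), W(2), W(3) for N = 4 voxels
Y = X @ W_true                     # noise-free toy data
W_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

If two processes always started at the same time, their column blocks would be identical and their signatures could not be separated; the extra instances avoid that degenerate case in this toy example.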

Page 26: Machine Learning  on fMRI Data

Challenge: Unknown Timing

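[Figure: the same linear system as in the known-timing case. The data matrix Y (T rows, N columns) equals a binary design matrix, with one block of columns per process (p1, p2, p3), times the stacked response signatures W(1), W(2), W(3).]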

Uncertainty about the processes essentially makes the design matrix a random variable.

Page 27: Machine Learning  on fMRI Data

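Uncertain Timings

• Design matrix models several choices for each time point.

[Figure: expanded design matrix with T' > T rows. Column blocks are labeled P, S, and D. Each row is annotated with its time point and the configurations that use it.]

Time | Design matrix row | Configurations for each row
t=1 | 1 0 0  0 0 0  0 0 0 | 3,4
t=1 | 0 0 0  1 0 0  0 0 0 | 1,2
t=2 | 0 1 0  0 0 0  0 0 0 | 3,4
t=2 | 0 0 0  0 1 0  0 0 0 | 1,2
… | … | …
t=18 | 0 0 1  0 0 0  0 1 0 | 3
t=18 | 0 0 1  0 0 0  0 0 1 | 4
t=18 | 0 0 0  0 0 1  0 1 0 | 1
t=18 | 0 0 0  0 0 1  0 0 1 | 2
… | … | …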

Page 28: Machine Learning  on fMRI Data

Uncertain Timings

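[Figure: Y equals the row-weighted expanded design matrix times W. Column blocks are labeled P, S, and D.]

Weight | Design matrix row | Configurations
e1 | 1 0 0  0 0 0  0 0 0 | 3,4
e2 | 0 0 0  1 0 0  0 0 0 | 1,2
e3 | 0 1 0  0 0 0  0 0 0 | 3,4
e4 | 0 0 0  0 1 0  0 0 0 | 1,2
… | … | …

$e_1 = P(C=3 \mid Y, W^{old}, \Theta^{old}, \sigma^{old}) + P(C=4 \mid Y, W^{old}, \Theta^{old}, \sigma^{old})$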

• Weight each row with probabilities from E-step.
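A minimal sketch of this weighting is given below, assuming one design matrix per configuration and one E-step probability per configuration. It stacks every configuration's rows and scales them by the square root of that configuration's weight, which gives the same solution as merging identical rows and summing their weights as the slide does. The function name and toy matrices are hypothetical.

```python
import numpy as np

def weighted_least_squares(design_matrices, weights, Y):
    """Solve min_W sum_c weights[c] * ||Y - X_c W||^2 by stacking scaled rows."""
    rows, targets = [], []
    for X_c, w in zip(design_matrices, weights):
        s = np.sqrt(w)
        rows.append(s * X_c)
        targets.append(s * Y)
    X_stacked = np.vstack(rows)   # (number of configurations) * T rows, so T' > T
    Y_stacked = np.vstack(targets)
    W, *_ = np.linalg.lstsq(X_stacked, Y_stacked, rcond=None)
    return W

# Toy example: two configurations that disagree about which parameter is active when.
X1 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
X2 = X1[:, ::-1].copy()
W_true = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
Y = X1 @ W_true

W_hard = weighted_least_squares([X1, X2], [1.0, 0.0], Y)  # all weight on configuration 1
W_soft = weighted_least_squares([X1, X2], [0.5, 0.5], Y)  # equal uncertainty
```

With all weight on the correct configuration the true parameters are recovered; with equal weights the estimate blends the two hypotheses, which is why the E-step probabilities matter.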

Page 29: Machine Learning  on fMRI Data

Our Approach

• Model of processes contains a probability distribution over when they can occur.

• Use prior knowledge about timing to limit the hypothesis space.

Page 30: Machine Learning  on fMRI Data

HPM Modeling Assumptions

• Model latent time series at process-level.

• Process instances share parameters based on their process types.

• Use prior knowledge from experiment design.

• Sum process responses linearly.

Page 31: Machine Learning  on fMRI Data

HPMs: the graphical model

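[Figure: graphical model. Nodes: Process Type $\pi$, Timing Landmark $\lambda$, Offset $O$, Start Time $s$, Configuration $c$, and data $Y_{t,v}$. Plates range over process instances $i_1, \ldots, i_k$ and over t=[1,T], v=[1,V]. A legend distinguishes observed from unobserved nodes.]

The set C of configurations constrains the joint distribution on $\{\pi(k), O(k)\}$ for all $k$.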

Page 32: Machine Learning  on fMRI Data

[Figure: response time courses over time t for Process 1 through Process P, and a window of observed data with dimensions d1 through dN.]

Prior knowledge:

An instance of Process 1 begins in this window.

An instance of Process P begins in this window.

An instance of either Process 1 OR Process P begins in this window.

There are a total of 6 processes in this window of data.

Page 33: Machine Learning  on fMRI Data

[Figure: response time courses over time t for Process 1 through Process P, and a window of observed data with dimensions d1 through dN.]

Process 1 timings: [figure]

Process P timings: [figure]

More questions:

- Can we learn the parameters of these processes from the data (even when we don’t know when they occur)?

- Would a different set of processes model the data better?

Page 34: Machine Learning  on fMRI Data

Approach and Assumptions

• Model latent time series at process-level.

• Processes contain probability distributions over when they occur.

• Process instances inherit parameters from their process types.

• Use prior knowledge from experiment design to limit complexity.

• Sum process responses linearly.