
Adaptive Design Optimization in Functional MRI Experiments

Thesis

Presented in Partial Fulfillment of the Requirements for the Degree Master of Arts in the Graduate School of The Ohio State University

By

Giwon Bahg, M.A.

Graduate Program in Department of Psychology

The Ohio State University

2018

Master’s Examination Committee:

Brandon M. Turner, Advisor

Jay I. Myung

Zhong-Lin Lu

© Copyright by

Giwon Bahg

2018

Abstract

Efficient data collection is one of the most important goals to be pursued in cognitive neuroimaging studies because of the exceptionally high cost of data acquisition. Design optimization methods have been developed in cognitive science to resolve this problem, but most of them lack generalizability because their functionality tends to rely on a specific type of cognitive model (e.g., psychometric functions) or research paradigm (e.g., task-to-region mapping). In addition, traditional optimal design methods fail to exploit neural and behavioral data simultaneously, which is essential for providing an integrative explanation of human cognition. As one of the possible solutions, we propose an implementation of Adaptive Design Optimization (ADO; Cavagnaro, Myung, Pitt, & Kujala, 2010) in model-based functional MRI (fMRI) experiments using a Joint Modeling Framework (B. M. Turner, Forstmann, et al., 2013). First, we introduce a general architecture of fMRI-based ADO and discuss practical considerations in real-world applications. Second, three simulation studies show that fMRI-based ADO estimates parameters more accurately and precisely than conventional, randomized experimental designs. Third, a real-time fMRI experiment validates the performance of fMRI-based ADO in the real-world setting. The result suggests that ADO performs better than randomized designs in terms of accuracy, but the unbalanced designs proposed by ADO may inflate the variability of trial-wise estimates of neural activation and therefore of model parameters. Lastly, we discuss the limitations, further developments, and applications of fMRI-based ADO.


Acknowledgments

Foremost, I would like to express my sincere gratitude to my advisor, Dr. Brandon Turner. He has always been supportive and open to my ideas for the past two years, generously sharing his time, knowledge and experience. His patience and guidance helped me throughout the research and the writing of this thesis.

I would also like to thank the rest of my thesis committee: Dr. Jay Myung for invaluable suggestions about developing ADO, and Dr. Zhong-Lin Lu for thoughtful advice regarding task designs and methodological issues in fMRI. I truly appreciate the inspiring comments and questions on my thesis during the defense session as well.

My sincere gratitude also goes to Dr. Per Sederberg for his valuable input and suggestions in developing real-time data processing components, Dr. Xiangrui Li for helpful advice in carrying out fMRI experiments, and Dr. Mark Pitt for important comments about improving ADO. I also appreciate the financial support of H. Dean and Susan Regis Gibson for this study.

I am grateful to my fellow labmates, James, Qingfang, Fiona, and Brendan, for all the discussions and support. I also thank Joonsuk Park and Sang-Ho Lee for helpful comments on understanding and practicing ADO.

Finally, I would especially like to thank my parents for all their encouragement and continuous support.


Vita

May 2018 - Present . . . . . Social and Behavioral Sciences Fellowship, The Ohio State University

August 2017 - May 2018 . . . . . Graduate Research Associate, The Ohio State University

August 2016 - August 2017 . . . . . University Fellowship, The Ohio State University

February 2015 . . . . . M.A. Psychology, Seoul National University, Republic of Korea

February 2013 . . . . . B.A. Psychology, Seoul National University, Republic of Korea

February 2013 . . . . . B.A. Philosophy, Seoul National University, Republic of Korea

Publications

Research Publications

M. F. Molly, M. Galdo, G. Bahg, Q. Liu, B. M. Turner “What’s in a Response Time?: On the Importance of Response Time Measures in Constraining Models of Context Effects”. In press at Decision.

J. J. Palestro, G. Bahg, P. B. Sederberg, Z.-L. Lu, M. Steyvers, B. M. Turner “A tutorial on joint models of neural and behavioral measures of cognition”. Journal of Mathematical Psychology, 84, 20-48.

Fields of Study

Major Field: Psychology


Table of Contents

Abstract

Acknowledgments

Vita

List of Tables

List of Figures

1. Introduction
    1.1 Optimal Experimental Design in fMRI Experiments
        1.1.1 Static Optimization Methods
        1.1.2 Adaptive Optimization Methods
    1.2 Bayesian Online Design Optimization in Behavioral Cognitive Science
        1.2.1 Early Applications in Psychophysics and Their Improvements
        1.2.2 Adaptive Design Optimization and Its Applications
    1.3 A Model-based Cognitive Neuroscience Approach
    1.4 Summary and Outline

2. Basic Concepts of Adaptive Design Optimization in fMRI experiments
    2.1 Joint Modeling Framework
    2.2 Adaptive Design Optimization
        2.2.1 The Mechanism
        2.2.2 Extension to the Neural Data
    2.3 Single-trial Neural Activation
        2.3.1 A General Linear Model with Stimulus-level Regressors
        2.3.2 Incremental Analysis and Flexibility of Estimates
    2.4 Dynamic Gridding
    2.5 Posterior Updating
    2.6 Scanning Protocol and Real-time Data Flow
    2.7 Practical Considerations
        2.7.1 Developing a Joint Model of Neural and Behavioral Data
        2.7.2 Discretizing a Continuous Space for Grid Search
        2.7.3 Including the Stimulus-wise Neural Activity: One-trial-lag ADO

3. Simulation Experiments
    3.1 A Proof-of-Concept Study: Contrast Discrimination
        3.1.1 Task
        3.1.2 Model
    3.2 Design and Procedure
        3.2.1 Simulation 1: Fixed Parameters
        3.2.2 Simulation 2: Randomly Generated Parameters
        3.2.3 Simulation 3: One-trial-lag ADO
        3.2.4 Procedure
    3.3 Results
        3.3.1 Simulation 1: Fixed Parameters
        3.3.2 Simulation 2: Randomly Generated Parameters
        3.3.3 Simulation 3: One-trial-lag ADO
    3.4 Discussion

4. Real-time fMRI Experiment
    4.1 Methods
        4.1.1 Participants
        4.1.2 Stimuli
        4.1.3 Design
        4.1.4 Real-time fMRI Procedure
    4.2 Offline Analysis: Parameter Estimation
        4.2.1 Posterior sampling
        4.2.2 Benchmark
        4.2.3 Determination of the Estimates
        4.2.4 Definition of the Distance from the Benchmark Estimate
    4.3 Results and Discussion

5. Discussion
    5.1 Limitations
    5.2 Further Developments and Practical Applications
    5.3 Conclusions

References

List of Tables

3.1 Default settings of Simulation 1.

3.2 A list of 30 parameter sets used in Simulation 2. Parameter values are rounded up to three decimal places.

3.3 Default settings in Simulation 2.

List of Figures

2.1 An illustration of the pipeline of a real-time fMRI-based ADO experiment.

2.2 An illustration of the application of dynamic gridding. A contour plot (black line) represents the kernel density estimates of a two-dimensional posterior density distribution based on hypothetical posterior samples. Blue “×” markers are the grid points initially defined. Red circles are the grid points updated by dynamic gridding based on singular value decomposition.

2.3 An illustration of the scanning protocol and data flow diagram in the ADO-based real-time fMRI experiment.

2.4 An illustration of the hemodynamic responses from the task and computational steps required within each trial. Dotted lines (red, blue, green) refer to hypothetical hemodynamic responses evoked by a stimulus within each trial, and a straight line (gray) shows the expected value of convolved hemodynamic responses. The squares below the x-axis specify the length of intervals required for each step.

2.5 An illustration of ‘one-trial-lag ADO’. Dotted lines (red, blue, green) refer to hypothetical hemodynamic responses evoked by a stimulus within each trial, and a straight line (gray) shows the expected value of convolved hemodynamic responses. The squares below the x-axis specify the length of intervals required for each step.

3.1 An illustration of the trial structure of the contrast discrimination task.

3.2 A graphical representation of the joint model for contrast discrimination. Each node in the model represents a variable. Filled circles are observed data (i.e., stimulus-wise neural activation estimates, behavioral responses), whereas empty circles are model parameters (i.e., b, Rmax, c50, δ), a design variable (i.e., cij), and their transformation (i.e., bij, pi). Double-line circles are deterministic variables, whereas single-line circles are stochastic variables. The outer and inner plates represent variables associated with each trial and stimulus, respectively.

3.3 The shape of the Naka-Rushton equation with (b, Rmax, c50, δ) = (0.05, 1, 0.35, 0.2). The x-axis refers to the contrast level, while the y-axis is the expected neural activation (i.e., single-trial beta estimates). The solid line is the expected neural activation from the three shape parameters (b, Rmax, c50). b and Rmax determine the lower and upper asymptotes of the graph, whereas c50 affects the slope of the graph. δ controls the width of the credible interval.

3.4 The design space of the contrast discrimination experiment. The space consists of 90 pairs of contrast levels for the first (x-axis) and second (y-axis) stimuli. The gray dots are individual designs that can be sampled during the experiment. The left and right plots represent the same design space in a linear scale and a logarithmic scale, respectively.

3.5 The scatter plots of the parameter sets used in Simulation 2. Black dots in each plot indicate the values of (b, Rmax) (upper left), (b, c50) (upper center), (b, δ) (upper right), (Rmax, c50) (lower left), (Rmax, δ) (lower center), and (c50, δ) (lower right).

3.6 Pooled root mean squared error (RMSEpooled; upper) and pooled posterior standard deviation (PSDpooled; lower) in Simulation 1. All the performance statistics (i.e., RMSEpooled, PSDpooled) are log-transformed. Red and black lines show the performance statistics from ADO designs and randomized designs, respectively. Solid lines represent the mean of the performance measures changing across trials. Dotted lines represent the 95% credible interval of the performance measures.

3.7 Root mean squared error (RMSEi; left) and posterior standard deviation (PSDi; right) for each parameter (b, Rmax, c50, δ from top to bottom) in Simulation 1. All the performance statistics (i.e., RMSEi, PSDi) are log-transformed. Red and black lines show the performance statistics from ADO designs and randomized designs, respectively. Solid lines represent the mean of the performance measures changing across trials. Dotted lines represent the 95% credible interval of the performance measures.

3.8 A trace plot of experimental designs from ADO (upper row) and randomized sampling (lower row) in Simulation 1. A sequence of 20 trials was segmented into four intervals (Trials 1-5, 6-10, 11-15, and 16-20 from left to right). The x-axis and y-axis of each subplot represent the contrast level of the first and second stimuli, respectively. Black dots represent individual design points. Shaded regions represent actually selected designs; more frequently selected designs have darker shades. The scale is intentionally omitted for simplicity. See Figure 3.4 for detailed information about the scale.

3.11 The scatter plot of log-transformed RMSE (left) and log-transformed PSD (right) in Simulation 2. The x-axis and y-axis refer to the value of performance statistics (i.e., RMSE, PSD) in the ADO experiments and randomized design experiments, respectively. Each trial is color-coded for visual clarity (Red: Trial 2, Orange: Trial 4, Green: Trial 8, Blue: Trial 13, Purple: Trial 20). Colored dots represent the performance statistics from individual simulations. Solid lines represent the 80% highest density regions.

3.12 The proportion of the experiments in which the performance of ADO exceeds that of randomized designs in Simulation 2. Points located in the shaded area are preferred. The accuracy and precision at each trial are represented as a red circle and a blue square, respectively.

3.15 The scatter plot of log-transformed RMSE (left) and log-transformed PSD (right) in Simulation 3. The x-axis and y-axis refer to the value of performance statistics (i.e., RMSE, PSD) in the ADO experiments and randomized design experiments, respectively. Each trial is color-coded for visual clarity (Red: Trial 2, Orange: Trial 4, Green: Trial 8, Blue: Trial 13, Purple: Trial 20). Colored dots represent the performance statistics from individual simulations. Solid lines represent the 80% highest density regions.

3.16 The proportion of the experiments in which the performance of ADO exceeds that of randomized designs in Simulation 3. Points located in the shaded area are preferred. The accuracy and precision at each trial are represented as a red circle and a blue square, respectively.

4.1 An illustration of the linear mask applied to a grating pattern. The black line shows the shape of the mask, while the red line describes the masked grating pattern obtained when crossing the center of the screen horizontally.

4.2 Examples of the grating stimuli used in the experiment. The contrast levels of the five stimuli are 0.01, 0.03, 0.1, 0.3, and 1 (from left to right).

4.3 Variability of the stimulus-wise neural activation. The scatter plot shows the contrast levels and associated stimulus-wise neural activation obtained in the ADO run of the third scanning session of Subject 1.

4.4 An illustration of the incremental parameter estimation. The gray shade represents the amount of data used for estimating parameters. When estimating parameters for comparing the performance of ADO and randomized design, we incrementally increase the amount of data so that we can compare how the parameter and corresponding posterior distribution change over trials. For evaluating the performance of each design, we set a benchmark estimate using all the data obtained from the ADO-based and randomized-design runs within a scanning session.

4.5 Robustness of the estimates. The plot shows the differences in posterior estimates obtained by two distinct posterior distributions from the same data (Subject 1, Session 3, ADO run). The estimates are obtained incrementally: location on the x-axis represents the number of trials used for obtaining the corresponding posterior estimates. The red, green, and blue points represent the four-dimensional MAP estimates (Duong, 2007), marginal one-dimensional MAP estimates, and marginal one-dimensional posterior mean values.

4.6 The scatter plot of the log-transformed distance measure between the MAP estimates of ADO (x-axis) and randomized designs (y-axis) from the benchmark MAP estimate. Each subplot represents the results from each subject. Each scanning session is color-coded differently for visual clarity. The points located in the shaded region represent the trials in which ADO allowed better accuracy than randomized designs.

4.7 The scatter plot of the log-transformed, pooled standard deviation of the posterior distribution from ADO (x-axis) and randomized designs (y-axis). Each subplot represents the results from each subject. Each scanning session is color-coded differently. The points located in the shaded region represent the trials in which ADO allowed better precision than randomized designs. The range of both axes in the plots was adjusted to (−3, 0) for visual clarity.

Chapter 1: Introduction

We start our discussion with a short review of optimal experimental design research in functional MRI experiments. Beyond the general linear modeling framework and localization-based research, Adaptive Design Optimization (ADO; Cavagnaro et al., 2010; Myung, Cavagnaro, & Pitt, 2013) is next introduced as an alternative design optimization method for model-based cognitive neuroscience experiments. With this model-based orientation, the need for design optimization methods is discussed based on modeling approaches incorporating both neural and behavioral measures simultaneously.

1.1 Optimal Experimental Design in fMRI Experiments

Functional magnetic resonance imaging (fMRI) has become one of the most important tools in cognitive science for investigating human brain activity because of its noninvasive nature and its reasonable spatial and temporal resolution. However, the cost of data acquisition in neuroimaging studies using fMRI is exceptionally high, especially when one hopes to scan special participants such as those within a clinical population or children. In addition, it is not easy to acquire good-quality data in fMRI experiments because blood-oxygenation-level-dependent (BOLD) responses, the target measurement of fMRI, are very noisy. The signal collected from fMRI scanning involves many sources of noise besides pure measurement error, such as scanner instability, thermal noise, and physiological and motion artifacts of a participant (Greve et al., 2013). Therefore, collecting maximally informative data has been one of the important methodological questions in fMRI studies, so that as much information as possible is obtained from a limited amount of data and resources.

Optimization of experimental design has been considered as a possible solution to this problem. In general, there are two different goals of design optimization for neuroscience experiments: (1) finding optimal event sequences and design parameters (e.g., interstimulus or intertrial interval) which maximize the informativeness of the data (Buracas & Boynton, 2002; Dale, 1999; Wager & Nichols, 2003), and (2) finding an optimal experimental design that clarifies the characteristics of a neural system (Daunizeau, Preuschoff, Friston, & Stephan, 2011; DiMattina, 2016; Lorenz et al., 2016; Sanchez et al., 2014).

There are two types of strategies in optimizing experimental design: static optimization and adaptive optimization (Ryan, Drovandi, McGree, & Pettitt, 2016). Static optimization methods aim to generate a complete experimental design expected to provide the highest-quality data prior to running experiments. Adaptive optimization methods provide optimal experimental settings (e.g., an experimental condition to be tested, a stimulus to be presented) for the next trial in the middle of the experiment, based on the history of previous designs and responses.


1.1.1 Static Optimization Methods

Static optimization methods have been frequently used in fMRI experiments. They focus on optimization of event sequences defined by a set of stimuli and interstimulus or intertrial intervals to attain (1) the maximum signal-to-noise ratio for detecting brain activation and (2) the best estimates of shape parameters of a hemodynamic response function (Holling, Maus, & van Breukelen, 2013).

Classic examples of static optimization methods in fMRI research are from Buracas and Boynton (2002) and Wager and Nichols (2003). Buracas and Boynton (2002) suggested an event sequence optimization method for maximizing estimation efficiency of hemodynamic responses in experiments using rapid event-related design. Estimation efficiency is defined as the inverse of estimator variance (Dale, 1999). Rather than generating a very large number of random event sequences and comparing their efficiency measures, their method proposes pseudorandom sequences termed “m-sequences” (or maximum length shift-register sequences). These sequences expose each event equally often and are nearly orthogonal to their own time-shifted versions, which reduces autocorrelation, while also reducing the computational burden of generating random sequences.
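The two properties attributed to m-sequences above (balanced exposure of events and near-orthogonality to time-shifted copies) can be checked numerically. The following is a minimal sketch, not the procedure of Buracas and Boynton (2002): it assumes a binary sequence produced by a 5-bit linear feedback shift register, and the function names `m_sequence` and `circular_autocorrelation`, the register length, and the tap positions are illustrative choices rather than settings taken from that work.

```python
def m_sequence(n_bits=5, taps=(5, 2)):
    """Binary maximum-length sequence from a linear feedback shift register.

    The taps (5, 2) are an assumed maximal-length choice for a 5-bit
    register, so the sequence repeats only after 2**5 - 1 = 31 steps.
    """
    state = [1] * n_bits                      # any non-zero seed works
    sequence = []
    for _ in range(2 ** n_bits - 1):
        sequence.append(state[-1])            # emit the last register bit
        feedback = 0
        for tap in taps:
            feedback ^= state[tap - 1]        # XOR of the tapped bits
        state = [feedback] + state[:-1]       # shift the register by one
    return sequence


def circular_autocorrelation(bits, shift):
    """Correlation between a +/-1 coded sequence and its circular shift."""
    signs = [1 - 2 * b for b in bits]
    n = len(signs)
    return sum(signs[i] * signs[(i + shift) % n] for i in range(n))


if __name__ == "__main__":
    seq = m_sequence()
    # Length of one period and the number of "event" (1) entries.
    print(len(seq), sum(seq))
    # Autocorrelation at shift 0 versus the first few non-zero shifts.
    print([circular_autocorrelation(seq, k) for k in range(4)])
```

In this binary case the sequence contains 16 ones and 15 zeros, and its autocorrelation equals the sequence length at shift 0 and -1 at every other shift, which is the sense in which an m-sequence is nearly orthogonal to its shifted versions. Experimental applications with more than two event types would require sequences over a larger alphabet, which this sketch does not cover.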

Wager and Nichols (2003) used genetic algorithms to find optimal event sequences and design parameters such as interstimulus interval (for a general review of genetic algorithms used in design optimization, see Lin, Anderson-Cook, Hamada, Moore, & Sitter, 2015). Given candidate experimental designs represented as vectors, genetic algorithms mimic evolutionary processes occurring at the genetic level: selection, crossover, and mutation. In short, candidates with the highest evaluation scores are selected (“selection”), then some parts of them are exchanged across the selected candidates (“crossover”) and sometimes randomly reassigned (“mutation”). By repeating this cycle, the proposals at each generation are expected to attain better quality measures (e.g., estimation efficiency) until the quality measure reaches its asymptotic limit. Moreover, this method could manage multiple optimization constraints such as contrast and estimation efficiency of hemodynamic responses.
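The selection, crossover, and mutation cycle described above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the algorithm of Wager and Nichols (2003): the fitness is taken to be estimation efficiency in the form 1 / trace((X'X)^-1) for a finite-impulse-response design matrix, a single objective is optimized, and all names (`fir_design_matrix`, `efficiency`, `genetic_search`) and settings (population size, mutation rate, number of lags, elitism) are hypothetical choices made for the example.

```python
import numpy as np


def fir_design_matrix(seq, n_conditions, n_lags):
    """Finite-impulse-response design: one column per (condition, lag)."""
    n = len(seq)
    X = np.zeros((n, n_conditions * n_lags))
    for c in range(n_conditions):
        onsets = (seq == c + 1).astype(float)   # 0 codes rest, 1..C conditions
        for lag in range(n_lags):
            X[lag:, c * n_lags + lag] = onsets[: n - lag]
    return X


def efficiency(seq, n_conditions=2, n_lags=4):
    """Assumed efficiency measure 1 / trace((X'X)^-1); 0 if X'X is singular."""
    X = fir_design_matrix(seq, n_conditions, n_lags)
    XtX = X.T @ X
    if np.linalg.matrix_rank(XtX) < XtX.shape[0]:
        return 0.0
    return 1.0 / np.trace(np.linalg.inv(XtX))


def genetic_search(n_events=60, pop_size=30, n_generations=40,
                   n_conditions=2, mutation_rate=0.02, seed=0):
    """Return the best event sequence found and the best fitness per generation."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, n_conditions + 1, size=(pop_size, n_events))
    history = []
    for _ in range(n_generations):
        fitness = np.array([efficiency(s, n_conditions) for s in pop])
        order = np.argsort(fitness)[::-1]
        history.append(fitness[order[0]])
        parents = pop[order[: pop_size // 2]]              # selection
        children = []
        while len(children) < pop_size - 1:
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_events)
            child = np.concatenate([a[:cut], b[cut:]])     # crossover
            flip = rng.random(n_events) < mutation_rate    # mutation
            child[flip] = rng.integers(0, n_conditions + 1, size=flip.sum())
            children.append(child)
        # Elitism (an added assumption): carry the best candidate over unchanged.
        pop = np.vstack([pop[order[0]][None, :]] + children)
    fitness = np.array([efficiency(s, n_conditions) for s in pop])
    history.append(fitness.max())
    return pop[np.argmax(fitness)], history


if __name__ == "__main__":
    best, history = genetic_search()
    print(history[0], history[-1])   # best efficiency at the start and at the end
```

Because the best candidate is carried over unchanged, the best fitness cannot decrease across generations, which mirrors the expectation in the text that quality measures improve until they level off. Handling several criteria at once, as the original method does, would require a weighted or multi-objective fitness that is not shown here.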

Although the classic methods discussed above have been popular and well developed, static optimization techniques have clear limitations. One problem is local optimality (Holling et al., 2013). It is practically impossible to measure parameters such as scanner noise and temporal autocorrelation before actually running experiments, although they are required for performing design optimization. Assuming fixed values for these parameters is a common strategy to manage this problem. As a result, however, the generated stimulus sequences depend on the values used in the algorithms. In other words, we can expect optimal performance of the generated event sequences only when the actual values of the set-up parameters are close to those considered in the optimization algorithms.

Another limitation of static optimization is its exclusive focus on linear models of fMRI, namely the general linear model (GLM) framework. The GLM is the most basic modeling approach to estimate brain activity and discriminate brain regions relevant to a specific task, condition, or even trial. However, we cannot formalize the complete brain-mind-behavior relationship solely based on the GLM framework. There are of course design optimization approaches specialized for different types of brain models. For example, Daunizeau et al. (2011) suggested a design optimization method for comparing dynamical models of brain function based on Dynamic Causal Modeling (DCM; Friston, Harrison, & Penny, 2003). However, the method of Daunizeau et al. (2011) has limited expandability (e.g., establishing additional connection to formal cognitive models) due to its strong dependency on DCM.

The third issue of static design optimization is that it lacks flexibility for individual or population-level differences. If individual differences in a specific cognitive mechanism must be considered (Koffarnus et al., 2017), the experimenter must choose a subset of experimental designs to gain maximal information efficiently. However, selection of the fine-tuned designs is not possible solely with static design optimization, which exploits the whole design space. Another concern is comparing normal and clinical populations in the same experimental settings. As different cognitive and/or response characteristics are expected to be considered (Grabowski et al., 2006), static optimization might not be an ideal approach by itself because it cannot differentiate distinct populations when generating a complete optimal design set.

1.1.2 Adaptive Optimization Methods

Adaptive optimization methods can be a good solution for the limitations of static optimization methods as they allow researchers to automatically adjust details of the experiment to meet the specific needs of individuals. In general, adaptive online stimulus optimization methods have focused on characterizing a neural system (DiMattina & Zhang, 2011; Lewi, Butera, & Paninski, 2009; Park, Weller, Horwitz, & Pillow, 2014). For a review of adaptive optimization methods in neurophysiology and sensory neuroscience, see DiMattina and Zhang (2013).

Adaptive optimization methods have rarely been applied in human neuroimaging (e.g., fMRI, EEG), and when they have been applied, they have not used neural data as an input to the optimization algorithm. For example, Grabowski et al. (2006) applied online optimization of interstimulus interval in a language production study. However, it used vocal sound from a participant for triggering the optimization algorithm. Hence, while the functional images were considered as a measurement of interest, they were not involved in any step of design optimization. Similarly, Koffarnus et al. (2017) implemented an experimental protocol that optimizes the stimulus set in a temporal discounting task to shorten scanning time and prevent problems caused by participant fatigue. However, their optimization method was not executed in the MR scanner. Rather, it relied on an out-of-scanner task to find the most informative stimulus set for an actual fMRI scanning session.

Compared to the previous examples, “The Automatic Neuroscientist” (Lorenz, Hampshire, & Leech, 2017; Lorenz et al., 2016) implemented the first adaptive optimization technique that uses functional images from the MR scanner. This method aims to provide a tool for systematic investigation of complex task-to-region relationships by finding a task or experimental design that evokes a specific target brain state. The method relies on Bayesian optimization with a Gaussian process prior (Snoek, Larochelle, & Adams, 2012) to model the brain state, and searches for a task or experimental design that minimizes the difference between the current and target brain state.

However, the localization paradigm investigates only one axis of cognitive neuroscience research. The ultimate goal of cognitive neuroscience is to provide an integrative explanation of how the mind and behavior originate from brain activity. Attempts to explain human cognition and behavior by relating them to localized brain activation cannot achieve this goal because they do not consider how the brain modulates cognitive and behavioral processes (Serences & Saproo, 2012).


Model-based cognitive neuroscience is a new area of research that pursues this goal by using mathematical and computational models to link neural and behavioral data. However, the method of Lorenz et al. (2016) is difficult to apply here because it was not developed for model-oriented research questions such as estimating parameters and comparing cognitive models, which are central to model-based cognitive neuroscience experiments. Therefore, new adaptive optimization methods need to be developed for model-oriented fMRI experiments.

1.2 Bayesian Online Design Optimization in Behavioral Cognitive Science

Unlike cognitive neuroimaging, adaptive optimization methods have a long history in experimental psychology, especially in psychophysics. Most of the early applications have focused on generating optimal stimuli for estimating parameters of a psychometric function (threshold and slope). The classic methods include nonparametric staircasing procedures and parameter estimation by sequential testing. For a general review of the adaptive experimental procedures in psychophysics, see Leek (2001).

Adaptive optimization methods in behavioral cognitive science experiments have the unique potential for developing online design optimization tools for neuroimaging because adaptive behavioral experiments are mostly model-based and so are somewhat automated. Therefore, here we provide a brief history of online design optimization methods used in behavioral experimental psychology, mainly focusing on Bayesian online design optimization methods. Then, Adaptive Design Optimization (ADO) will be introduced as one of the recently proposed Bayesian optimal design methodologies, as well as a method with broader applicability to more general, mathematical models of cognition. Lastly, the current status of the application of ADO in cognitive neuroscience will be discussed.

1.2.1 Early Applications in Psychophysics and Their Improvements

The first applications of Bayesian adaptive optimization methods in psychophysics

were QUEST (Watson & Pelli, 1983) and the Psi method (Kontsevich & Tyler, 1999).

QUEST was the first Bayesian adaptive method suggested for estimating the threshold

of a psychometric function. For generating the stimulus used in the next trial, QUEST

used the maximum a posteriori (MAP) estimate of the threshold obtained from the

history of stimuli and responses accumulated up to the current trial. Meanwhile, the Psi

method not only extended its optimization goal to estimation of both threshold and

slope parameters, but also used a different design proposal rule based on expected

entropy.

Recently, many new advances to the Psi method have been made. For exam-

ple, Kujala and Lukka (2006) suggested algorithms for faster computation of the

Psi method using the fast Fourier transform and a particle filter method. Prins (2013)

developed the Psi-marginal method, which selectively focuses on a subset of parameters

of psychometric functions. The motivation comes from the differing importance of

psychometric function parameters. Some parameters such as threshold and slope di-

rectly describe sensory mechanisms, whereas the lower and upper asymptotes of the

function are less relevant to the core sensory processing. Therefore, the optimization

algorithm should prioritize efficient estimation of threshold and slope over the asymptotes

when suggesting an optimal design. Hence, the Psi-marginal method refers to the

marginal posterior distribution of more important parameters so that the optimiza-

tion routine will consider those parameters first. Unlike the previous applications

of the Psi method focusing on parameter estimation, Cooke, Selen, van Beers, and

Medendorp (2017) extended the Psi method to the comparison of psychophysical models.

Recently, Bak and Pillow (2018) used the Psi method for multi-alternative psycho-

metric functions considering responses that are made by invalid response options or

made without attending to presented stimuli.

The use of Bayesian online design optimization methods is not limited to psycho-

metric functions. For example, Lesmes, Jeon, Lu, and Dosher (2006) and Lesmes,

Lu, Baek, and Albright (2010) proposed Bayesian adaptive methods for estimating

the threshold versus noise contrast function and the contrast sensitivity function, re-

spectively. However, all the applications discussed here tend to consider only one

specific class of cognitive models (i.e., psychometric functions, threshold versus noise

contrast function, contrast sensitivity function). Compared to the previous attempts,

Adaptive Design Optimization (Cavagnaro et al., 2010; Myung et al., 2013) considers

the problem of model selection, and in so doing, provides a more generalized tool for

design optimization.

1.2.2 Adaptive Design Optimization and Its Applications

Adaptive Design Optimization (ADO; Cavagnaro et al., 2010; Myung et al., 2013)

is another Bayesian design optimization method that uses mutual information as

its utility function. ADO was originally proposed as an online design optimization

tool for model comparison in cognitive science experiments. However, when only

considering one model, the method naturally reduces to an algorithm for optimizing

parameter estimation. In the context of parameter estimation, ADO proposes a design

for the next trial that maximizes information about the parameters, given the entire

history of stimuli used and responses acquired during the experiment. Essentially,

ADO requires a formal cognitive model that explicitly expresses a target cognitive

mechanism as a set of mathematical functions.

ADO has been applied to behavioral cognitive science experiments including mem-

ory retention (Cavagnaro, Pitt, & Myung, 2011), probability weighting for risky deci-

sion making (Cavagnaro, Pitt, Gonzalez, & Myung, 2013), and temporal discounting

(Cavagnaro, Aranovich, McClure, Pitt, & Myung, 2016). The goal of these previous

applications was to discriminate among a set of candidate models. ADO was used to

propose new experimental designs adaptively to facilitate the model discrimination.

ADO was extended with various functionalities after its development including

hierarchical ADO (HADO; Kim, Pitt, Lu, Steyvers, & Myung, 2014) and ADO using

dynamic programming (Kim, Pitt, Lu, & Myung, 2017). HADO exploits the Bayesian

nature of ADO by adding a hierarchical component to reflect the statistical depen-

dencies among model parameters from previously acquired data sets. This extension

enables ADO to incorporate information beyond individual participants and consider

group- or population-level information. Meanwhile, Kim et al. (2017) improved the

“near-sighted” performance of ADO (i.e., optimizing for only one trial ahead) by us-

ing dynamic programming (Bellman, 1957). The addition of dynamic programming

allowed ADO to consider sequences of trials rather than a single trial.

ADO or ADO-like approaches have recently been applied to neuroscientific prob-

lems (DiMattina, 2016; Sanchez et al., 2014; Sanchez, Lecaignard, Otman, Maby,

& Mattout, 2016), but they have yet to provide a compelling proof-of-concept result.

For example, DiMattina (2016) used adaptive stimulus generation, which has

ADO-like mechanisms, to compare contrast gain models. However, this application

neither modeled neural activity nor used neural data directly. Instead, he developed

an encoding-decoding model to map contrast stimuli to hypothesized neural responses

(encoding model) and then to behavioral responses (decoding model) so that ADO

can function only with behavioral response data. Therefore, this study involves the

neurophysiological model in ADO but does not use neural data for design optimiza-

tion. Sanchez et al. (2014) applied ADO to a single-trial EEG experiment using a

combination of perceptual learning models and an electrophysiological response model

to compare learning models. Later, Sanchez et al. (2016) extended their ADO application

to DCM. However, the implications of both studies are limited because the results

are based on simulation experiments, and the methods were not applied to real data. Also, these

studies do not consider behavioral responses in their model structure, although they

are also important measures of latent brain computations.

1.3 A Model-based Cognitive Neuroscience Approach

As reviewed above, current applications of online design optimization mainly rely

on either neural or behavioral data. The method of Lorenz et al. (2016) modeled the

task-specific brain state within an online design optimization algorithm. However, this

method is limited in that it can only answer research questions about brain-behavior

mapping. Although DiMattina (2016) and Sanchez et al. (2014, 2016) implement

online design optimization with neural models, they fail to consider a complete data

set acquired in neuroscience experiments – either neural or behavioral responses are

not introduced in the optimization method. In any case, ignoring either type of data

may result in a great loss of important information for making an inference about

cognitive mechanisms because brain and behavioral data – or neural and cognitive

models – can explain different aspects of cognition. Therefore, it is a goal worth

pursuing to develop an online design optimization algorithm that takes both sources of

data into consideration.

To optimize designs with respect to both neural and behavioral measures, we

require a computational model that makes predictions for both measures. Models

that consider both measures simultaneously are inspired by the work of David Marr,

who introduced the concept of levels of analysis (Marr, 1982). He emphasized that an

integrative explanation of a mental system can be developed by considering (1) what

the system aims to achieve (computational level), (2) how the system represents and

manipulates information (algorithmic level), and (3) how the mental system emerges

from the physical world (implementation level).

Recent approaches aim to formalize integrative models by bridging evidence from

cognitive neuroscience research and behavioral experiments. On the neuroimaging

side, a computational neuroimaging approach (Serences & Saproo, 2012; Wandell,

1999) has been developed to explain the modulation of brain activation as a function

of visual stimuli.[1] This approach especially focused on developing models for visual

systems, such as contrast response functions (Boynton, Demb, Glover, & Heeger,

1999) and population receptive field models (Wandell & Winawer, 2015). However,

[1] However, note that the name “computational neuroimaging” is frequently used to refer to a subset of “model-based cognitive neuroscience” approaches. This confusion occurs especially with research investigating neural correlates of cognitive model parameters (O’Reilly & Mars, 2011) or with “model-based neuroimaging” (O’Doherty, Hampton, & Kim, 2007), which uses parameter estimates of a cognitive model to generate an event-related regressor in the general linear model analysis (Dunne & O’Doherty, 2013).

as the goal of this research program is to develop formal models of neuroimaging data,

behavioral responses have been addressed by these models only implicitly, or

sometimes through simple psychophysics models (Boynton et al., 1999).

On the side of mathematical psychology, “model-based cognitive neuroscience”

(Forstmann & Wagenmakers, 2015; Forstmann, Wagenmakers, Eichele, Brown, &

Serences, 2011; B. M. Turner, Forstmann, Love, Palmeri, & Van Maanen, 2017) was

suggested to link neural evidence to cognitive models developed from behavioral experiments.

From Marr’s levels-of-analysis perspective, we consider cognitive models as

a tool for integrating neural and behavioral explanations of mind at the computational

and algorithmic levels (Love, 2015). This model-based approach claims that neural

and behavioral data benefit each other: for example, neural data analysis

can utilize mechanistic cognitive models to develop more systematic relationships be-

tween neural activation and behavioral data (Forstmann, Brown, Dutilh, Neumann,

& Wagenmakers, 2010). Cognitive models can also benefit from neural data because

neural data offer additional constraints for model discrimination (Mack, Preston, & Love,

2013).

Because this thesis focuses more on the cognitive level, the model-based cognitive

neuroscience approach is better suited to defining the use of neuroimaging data

within fMRI-based ADO experiments. Model-based cognitive neuroscience offers

various modeling strategies (B. M. Turner, Forstmann, et al., 2017), including “two-

stage” approaches that correlate estimates separately obtained from a cognitive model

and a neural activation model across trials (Rodriguez, Turner, Van Zandt, & Mc-

Clure, 2015; van Maanen et al., 2011). For our purpose of design optimization, how-

ever, we rely on a joint modeling approach based on a hierarchical Bayesian framework

(Palestro et al., 2018; B. M. Turner, Forstmann, et al., 2013; B. M. Turner, Rodriguez,

Norcia, McClure, & Steyvers, 2016; B. M. Turner, Van Maanen, & Forstmann, 2015).

One reason for using the joint modeling approach is that we can use both behavioral

and neural data simultaneously to constrain model parameters within a single model

structure, unlike in the “two-stage” strategies. More advantages of the joint modeling

approach will be discussed later.

1.4 Summary and Outline

For the remainder of this thesis, we will discuss how ADO can be implemented

in real-time fMRI experiments. Chapter 2 describes the concept of fMRI-based ADO

and the pipeline required for implementation. At the end of this chapter, we provide

practical guidelines for the use of ADO in fMRI experiments. Chapter 3 introduces

a model discussed throughout this thesis as a proof-of-concept. Based on this, simulation

studies will show that ADO estimates parameters more accurately and precisely than

conventional randomized designs. In Chapter 4, we run a real-time fMRI experiment

to verify that fMRI-based ADO works in a real-world setting. The result of this ex-

periment will show that the performance of ADO observed in Chapter 3 is roughly

replicated in the real-world experiment. Chapter 5 will summarize the result, discuss

limitations of the current study, and propose applications of fMRI-based ADO.

Chapter 2: Basic Concepts of Adaptive Design Optimization in fMRI Experiments

In this chapter, we introduce each component of fMRI-based ADO and discuss how they

communicate in an fMRI experiment (see Figure 2.1). The

fMRI-based ADO framework utilizes the joint model (Section 2.1) and ADO (Section

2.2) as its core components. For introducing the neural input, the ADO framework

also requires a linear model for estimating task-related neural activity (Section 2.3).

Supplementary components such as real-time updating of the grid space (Section 2.4)

and full posterior sampling (Section 2.5) are incorporated for computational efficiency

of the grid search method of ADO.

2.1 Joint Modeling Framework

In model-oriented research, we assume that formalized models such as computa-

tional algorithms or stochastic processes can express the mechanism of human cogni-

tion. Therefore, the main goals of a model-based experiment are twofold: to estimate

the model parameters and compare candidate models. The former aims to identify

the characteristics of a cognitive system by locating appropriate parameter values

using collected data. Meanwhile, the latter aims to identify the better formal

explanation of cognitive processes given the data. Whatever its purpose is, a model-based

study is initiated by developing a model.

[Figure 2.1 panel labels: BOLD responses over time; Joint Model; DE-MCMC; ADO, U(d) = ∫∫ u(d, Ω, y) p(y|Ω, d) p(Ω) dy dΩ; New Design]

Figure 2.1: An illustration of the pipeline of a real-time fMRI-based ADO experiment.

As introduced in Section 1.3, the joint modeling framework (Palestro et al., 2018; B. M. Turner,

Forstmann, et al., 2013; B. M. Turner et al., 2016, 2015) can be a good starting point

for cognitive neuroscience experiments with online design optimization. In addition

to enabling simultaneous use of neural and behavioral data within a single modeling

framework, the joint modeling framework allows model parameters to be constrained

by neural and behavioral data together via its hierarchical structure.

Joint models consist of three components: a neural submodel, a behavioral sub-

model, and a linking function. The neural submodel is defined to describe trial-wise

neural activity such as amplitude of hemodynamic responses and mean EEG signal. If

a researcher uses raw neural data (e.g., BOLD responses), conventional data analysis

methods for extracting single-trial neural activation estimates (e.g., a general linear

model in fMRI analysis) can work as a neural submodel. The behavioral submodel is

a counterpart of the neural submodel whose goal is to describe behavioral responses.

Traditional cognitive models that do not involve neural-level explanations usually

serve as a behavioral submodel. The linking function connects the neural activity

with the parameters of behavioral submodels.

Joint models are typically classified into two groups according to the linking func-

tion used in the model: the covariance joint model and the directed joint model

(Palestro et al., 2018). The covariance joint model assumes a “hypermodel” that con-

strains parameters in the neural and behavioral submodels by the covariance struc-

ture of submodel parameters. In this approach, a multivariate normal distribution or

factor-analytic linking function (B. M. Turner, Wang, & Merkel, 2017) can serve

as a hypermodel. Meanwhile, the directed joint model

assumes that submodel parameters are connected by direct transformation from one

submodel to the other. For example, the difference of neural activation evoked by

two different stimuli in a discrimination task (i.e., neural submodel parameters) can

serve as a drift rate parameter in the Wiener diffusion decision model (i.e., behavioral

submodel parameter) (Palestro et al., 2018). Although brain-to-behavior transforma-

tions may seem natural from a reductionist perspective, behavior-to-brain mapping

can be used to constrain estimates of neural activity using estimates of behavioral

submodel parameters (van Ravenzwaaij, Provost, & Brown, 2017).

When joint models are used for online design optimization, we must carefully

choose a linking function because covariance-based and directed linking functions

impose different computational burdens. The directed model usually does not assume

additional parameters when defining the transformation of submodel parameters.

However, the covariance model cannot avoid introducing new parameters such

as covariance coefficients, which results in increasing the dimensionality of the pa-

rameter space in ADO. Although factor-analysis-based covariance structure has been

suggested to limit the number of covariance parameters efficiently (B. M. Turner,

Wang, & Merkel, 2017), ADO still needs to consider more parameters for optimiza-

tion relative to the directed approach.

2.2 Adaptive Design Optimization

Here we introduce the details of ADO, and then discuss the strategy for extending

behavioral ADO to neural data.

2.2.1 The Mechanism

As we discussed in Section 1.2.2, ADO can be applied to both parameter estima-

tion and model comparison in real-time. However, it is worth noting that optimal

design for model comparison may not be the best for parameter estimation and vice

versa. In this thesis, we focus on the problem of parameter estimation.

ADO proposes an optimal design for upcoming trials by solving an optimization

problem

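$$d_{t+1} = \operatorname*{arg\,max}_{d} \, U(d) \tag{2.1}$$

$$U(d) = \int_{y \in Y} \int_{\theta \in \Theta} u(d, \theta, y)\, p(y \mid \theta, d)\, p(\theta \mid d)\, d\theta\, dy \tag{2.2}$$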

where d refers to candidate designs of an experiment. U(d) and u(d, θ, y) are real-

valued functions called global and local utility functions, respectively. A local utility

function u(d, θ, y) evaluates the utility or informativeness of a design d regarding a

model parameter set θ when a design d is used and a response y is anticipated in a

hypothetical experimental trial. The global utility U(d) is computed as an “average”

local utility by integrating the local utility over a parameter space Θ and a response

space Y. The final design proposal is made by selecting the design with the

highest global utility.

Although a posterior covariance matrix and the sum of squared errors are often

used as utility functions (Ryan et al., 2016), a standard implementation of ADO relies

on mutual information to evaluate the utility of each design. One of the merits of

using mutual information for design optimization is that mutual information is known

to perform well for both parameter estimation and model comparison. A global utility

function based on mutual information is

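$$U(d) = \int_{y^* \in Y} \int_{\theta \in \Theta} \log \frac{p(\theta \mid d_{1:t}, y_{1:t}, d, y^*)}{p(\theta \mid d_{1:t}, y_{1:t})}\, p(y_{1:t} \mid \theta, d_{1:t})\, p(\theta \mid d_{1:t})\, d\theta\, dy^* \tag{2.3}$$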

where d is a candidate design of interest, θ is a model parameter vector, y∗ is an

anticipated response at the (t + 1)-th trial, and d1:t and y1:t represent a series of

experimental designs and collected responses in the previous t trials, respectively.

Note that by the definition of mutual information, a local utility function in Equation

2.2 is

$$u(d, \theta, y^*) = \log \frac{p(\theta \mid d_{1:t}, y_{1:t}, d, y^*)}{p(\theta \mid d_{1:t}, y_{1:t})} \tag{2.4}$$

(Myung et al., 2013).

For the numerical integration over parameter and response spaces, the first im-

plementation of ADO used a sequential Monte Carlo technique (Amzal, Bois, Parent,

& Robert, 2006; Cavagnaro et al., 2010; Myung & Pitt, 2009). Meanwhile, Myung

et al. (2013) suggested a simpler integration strategy based on grid-based methods.

Myung et al.’s approach proceeds by first defining a number of grid points for each

dimension of design, parameter, and response spaces. Once the grids are defined over

an entire search space, ADO then evaluates local utilities (i.e., u(d, θ, y∗)) and joint

densities of θ and y1:t (i.e., p(y1:t|θ, d1:t)p(θ|d1:t)) for all grid points. A global utility

for a candidate design d is computed by taking a mean of weighted local utility values

sharing a target design d:

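$$U(d) \approx \frac{1}{n_d} \sum_{\theta, y^*} \log \frac{p(\theta \mid d_{1:t}, y_{1:t}, d, y^*)}{p(\theta \mid d_{1:t}, y_{1:t})}\, p(y_{1:t} \mid \theta, d_{1:t})\, p(\theta \mid d_{1:t}) \tag{2.5}$$

where $n_d$ is the total number of grid points assigned to a candidate design d.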
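The grid-based search can be illustrated with a minimal Python sketch. It is not the implementation used in this thesis: the one-parameter logistic response model, the grid ranges, and the function names `global_utility` and `update_posterior` are assumptions made only for illustration. The sketch also weights each local utility by the normalized current posterior and the predictive likelihood p(y*|θ, d), which is the textbook form of mutual information, so its bookkeeping differs from the 1/n_d average in Equation 2.5.

```python
import numpy as np

def global_utility(prior, lik1):
    """Mutual-information utility U(d) for every candidate design.

    prior : (n_theta,) normalized weights p(theta | d_1:t, y_1:t) on the grid
    lik1  : (n_design, n_theta) array of p(y* = 1 | theta, d)
    """
    lik = np.stack([1.0 - lik1, lik1], axis=-1)   # (n_design, n_theta, 2)
    marg = np.einsum("t,dty->dy", prior, lik)     # p(y* | d), averaged over theta
    eps = 1e-12                                   # guards against log(0)
    # Bayes' rule: log[p(theta | y*, d) / p(theta)] = log[p(y* | theta, d) / p(y* | d)]
    log_ratio = np.log(lik + eps) - np.log(marg[:, None, :] + eps)
    return np.einsum("t,dty,dty->d", prior, lik, log_ratio)

def update_posterior(prior, lik1, d_idx, y):
    """Grid-based Bayesian update after observing binary response y at design d_idx."""
    like = lik1[d_idx] if y == 1 else 1.0 - lik1[d_idx]
    post = prior * like
    return post / post.sum()

# Hypothetical setup: a threshold parameter theta and a stimulus intensity d.
theta_grid = np.linspace(-3.0, 3.0, 61)
design_grid = np.linspace(-3.0, 3.0, 25)
lik1 = 1.0 / (1.0 + np.exp(-(design_grid[:, None] - theta_grid[None, :])))
prior = np.full(theta_grid.size, 1.0 / theta_grid.size)

U = global_utility(prior, lik1)
d_idx = int(np.argmax(U))                 # Equation 2.1: pick the best design
d_next = design_grid[d_idx]
posterior = update_posterior(prior, lik1, d_idx, y=1)
```

In an experiment, the updated `posterior` would replace `prior` before the next call to `global_utility`, so that each proposal conditions on the full history of designs and responses.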

2.2.2 Extension to the Neural Data

Introducing neural data and its activation model does not change the definition

of the global utility function and the searching process. However, the dimension of

both parameter and response spaces increases because we have added neural

data and therefore need to incorporate the expected neural responses into ADO.

Ideally, a full joint model can allow ADO to use a raw BOLD time-series vector

N as its neural input. (From here, we will use y as a behavioral response vector.)

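Assuming a hierarchical joint model $\Omega = (\theta_{\text{hyper}}, \theta_{\text{neural}}, \theta_{\text{behavioral}})$, a global utility

function is defined by

$$U_{JM}(d) = \int\!\!\int\!\!\int u(d, \Omega, N, y)\, p(N, y \mid \Omega, d)\, p(\Omega \mid d)\, d\Omega\, dN\, dy. \tag{2.6}$$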

However, using the raw neural data is practically impossible within ADO due to

its high dimensionality. Equation 2.6 suggests that all data points in the time-series

vector N must be integrated over $\mathbb{R}^n$, where n is the length of the time-series vector.

However, new data are continuously added during the scan causing increases in the

dimension of the neural data space n. Considering that new functional image data are

updated every 1-3 seconds in typical fMRI experiments, only five minutes of scanning

forces ADO to integrate over at least $\mathbb{R}^{100}$.
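The lower bound follows from simple arithmetic, assuming that one functional volume is added per repetition time (TR) and writing T for the scan duration (both symbols are introduced here only for illustration):

```latex
$$n = \frac{T}{\mathrm{TR}}, \qquad T = 5 \times 60\ \text{s} = 300\ \text{s}, \qquad \mathrm{TR} \in [1, 3]\ \text{s} \;\Longrightarrow\; n \in [100, 300].$$
```

Even at the slowest sampling rate in this range, the neural response space has 100 dimensions after five minutes, and it grows by one dimension with every new volume.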

As an alternative, we can implement a global utility function based on a “limited”

version of the joint model structure using trial-wise neural activation estimates:

$$U_{LJM}(d) = \int\!\!\int\!\!\int u(d, \Omega, \beta, y)\, p(\beta_{1:t}, y_{1:t} \mid \Omega, d)\, p(\Omega)\, d\Omega\, d\beta\, dy \tag{2.7}$$

$$\beta_{1:t} = g(X, N) \tag{2.8}$$

where β is an estimate vector of stimulus- or trial-level neural activation, X is a design

matrix, and g(·) is an estimator of stimulus- or trial-wise neural activation β based on

X and N. When the limited joint model is used, we use single-trial neural activation

estimates that describe stimulus- or trial-wise brain activity as the neural input of ADO

instead of the raw neural data vector. By reducing the set of possible data points to

single-trial activation parameters, the computational burden of using ADO becomes

manageable once again. However, this reduction does come at the cost of inflated

uncertainty in the estimates of neural activation.

2.3 Single-trial Neural Activation

According to the discussion in Section 2.2.2, a method for estimating stimulus-

or trial-wise neural activation (so-called “single-trial neural activation”)

needs to be included in ADO to limit the computational load. Here, we will discuss

a general linear model for estimating single-trial neural activation.

2.3.1 A General Linear Model with Stimulus-level Regressors

The use of trial-wise neural activation estimates serves as a remedial strategy

for the high dimensionality problem of raw BOLD responses. To actually use the

single-trial activation estimates, fMRI-based ADO must include a component that

estimates neural activation amplitude evoked by each stimulus or trial so that the

neural estimates can be used for proposal generation.

The conventional approach to estimating single-trial activation is to perform a

general linear model (GLM) analysis – an application of multiple linear regression –

to fMRI data (Friston et al., 1995). A GLM uses a design matrix consisting of vectors

representing the onset times of events of interest (e.g., stimulus presentation, response

production) convolved with a hemodynamic response function. A typical approach is

to define condition-wise regressors for comparing the mean activation estimates across

conditions (for a more general introduction to this topic, see introductory textbooks

for fMRI data analysis such as Poldrack, Mumford, & Nichols, 2011).

However, when using ADO, GLM regressors must be defined at the stimulus

or trial level because we need information about the neural activity associated with each

stimulus. Conceptually, stimulus-level regressors can be easily made by setting the

onset vectors for each individual stimulus, not for each condition. This method is

often referred to as “beta-series regression” in the context of multivoxel analysis

(Rissman, Gazzaley, & D’Esposito, 2004). A beta-series GLM can be implemented in

a Bayesian framework (e.g., Palestro et al., 2018). However, full posterior estimation

is time consuming in real-time fMRI experiments due to the large number of single-

trial regressors or multiple BOLD response vectors. In our application, we will use

frequentist estimates to obtain trial-wise neural activation estimates efficiently. For

example, ordinary least squares estimates (LSEs) can be derived as:

$$\beta = (X^T X)^{-1} X^T N \tag{2.9}$$

where X is a design matrix, a superscript T indicates the transpose operation, and

N is a raw BOLD time-series vector.
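The estimator in Equation 2.9 can be sketched in a few lines of Python. This is an illustrative sketch, not the analysis pipeline of this thesis: the double-gamma response shape and its parameters, the intercept column, the simulated onsets, and the function names `hrf`, `beta_series_design`, and `ols_betas` are all assumptions. The sketch solves the least-squares problem with `numpy.linalg.lstsq`, which returns the same solution as Equation 2.9 when X has full column rank.

```python
import numpy as np
from math import gamma

def hrf(t, peak=6.0, under=16.0, ratio=1.0 / 6.0):
    """Double-gamma response shape (illustrative parameters), scaled to a peak of 1."""
    t = np.asarray(t, dtype=float)

    def gamma_pdf(shape):
        return t ** (shape - 1.0) * np.exp(-t) / gamma(shape)

    h = gamma_pdf(peak) - ratio * gamma_pdf(under)
    return h / h.max()

def beta_series_design(onsets, n_scans, tr):
    """One HRF-convolved regressor per trial, plus a constant (baseline) column."""
    kernel = hrf(np.arange(0.0, 32.0, tr))
    X = np.zeros((n_scans, len(onsets) + 1))
    for j, onset in enumerate(onsets):
        stick = np.zeros(n_scans)
        stick[int(round(onset / tr))] = 1.0          # event onset in scan units
        X[:, j] = np.convolve(stick, kernel)[:n_scans]
    X[:, -1] = 1.0
    return X

def ols_betas(X, N):
    """Least-squares solution of N = X beta; equals (X^T X)^{-1} X^T N for full-rank X."""
    beta, *_ = np.linalg.lstsq(X, N, rcond=None)
    return beta

# Simulated single-voxel example with 12 trials (hypothetical values).
rng = np.random.default_rng(0)
tr, n_scans = 2.0, 120
onsets = np.arange(10.0, 200.0, 16.0)
X = beta_series_design(onsets, n_scans, tr)
true_beta = np.append(rng.normal(1.0, 0.5, onsets.size), 100.0)
N = X @ true_beta + rng.normal(0.0, 0.05, n_scans)
beta_hat = ols_betas(X, N)                           # last entry is the baseline
```

Because each trial has its own column, the design matrix grows by one regressor per trial, which is the source of the estimation-time concern discussed in this section.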

Note that Equation 2.9 is one of the plausible estimators of $\beta_{1:t}$ (i.e., g(·) in Equation

2.8). As stimulus- or trial-level estimates are known to have large variability

(Abdulrahman & Henson, 2016; Mumford, Turner, Ashby, & Poldrack, 2012), varia-

tions of the stimulus-level GLM analysis such as iterative GLM analyses with nuisance

regressors (Mumford, Davis, & Poldrack, 2014; Mumford et al., 2012; B. O. Turner,

Mumford, Poldrack, & Ashby, 2012) can be used to control the variability. Also, re-

gression models with temporal autoregressive errors such as AR(p) and ARMA(1,1)

(Bullmore et al., 1996; Lindquist, 2008) can be used to control temporal autocorrela-

tion in the BOLD time-series. In any case, the time required for estimation is one of

the most important concerns because ADO requires fast computation of single-trial

neural activation in real-time.

2.3.2 Incremental Analysis and Flexibility of Estimates

Estimation of single-trial neural activation or “single-trial beta estimation” is nec-

essary at the end of every trial so that ADO can use the information to compute global

utility. However, this incremental procedure implies that the time-series data will be

continuously updated during an entire scanning session, and therefore estimates of

neural activation can change every trial.

The first option to handle the variability of single-trial neural estimates is to allow

ADO to update the neural estimates every trial. From this perspective, ADO must

use the best “data” – in this case, single-trial neural activation estimates – available

at each trial. Hence, ADO must refer to the new estimates, which become more accurate

and less variable as the experiment proceeds.

The second option is to block the updating of neural estimates included in ADO

during previous trials. In this case, neural activation estimates of previous stimuli

or trials will be fixed in further trials and new estimates for those trials will not be

used in ADO. In other words, only the estimates from a new trial will be used. This

approach ensures the stability of the ADO algorithm as the estimates of neural activity

remain constant once they have been estimated on a given trial.

In the simulation experiments (Chapter 3), we make an ideal assumption that

we always obtain perfect estimates of stimulus-wise neural activations. Therefore,

there is no need to consider the variability of neural estimates or to revise

previously obtained estimates. In the real-world experiments (Chapter 4), we choose

the first strategy that updates neural estimates to make ADO use the best information

available in each trial.

2.4 Dynamic Gridding

As discussed in Section 2.2.1, the current implementation of fMRI-based ADO re-

lies on the grid-based method to compute the global utility. For efficient performance

of ADO, we need to discretize both parameter and response spaces appropriately.

Theoretically, an obvious first choice is to define a dense grid over a broad range of

values in both parameter and response spaces. However, a tradeoff ensues between

the number of grid points and computational efficiency due to multidimensionality

of the grid space. Adding even one more grid point per dimension causes an

explosive increase in the total number of grid points: with k points in each of D dimensions, the search space contains k^D points. Hence, a uniformly

dense grid is not a practical solution.

Another disadvantage of the dense grid space is redundant grid points in low pos-

terior density regions. Global utility based on mutual information relies on posterior

densities obtained at each grid point. Joint posterior distributions of model parame-

ters will be constrained as the experiment proceeds, and therefore the number of grid

points with extremely small posterior density (i.e., p(θ|y1:t, d1:t)) will increase. In the end,

most of the grid points cannot contribute to generating new proposals due to small

posterior densities, which makes computation and aggregation of global utility values

inefficient.

One possible solution is to update the grid as the posterior distribution is updated.

This approach allows ADO computation to be affordable with limited computing

resources while achieving better efficiency. Implementation of this solution requires a

method for automatically adjusting the distribution of grids to capture a region with

high posterior density.

Here, we used a simple method based on singular value decomposition (SVD)

of a sample covariance matrix. The main idea is to decompose a sample covariance

matrix S into a rotation matrix and use its inverse to transform the posterior samples

to be orthogonally distributed. This approach is motivated by principal component

analysis (Johnson & Wichern, 2007): a sample covariance matrix can be decomposed

into three matrices

$$S = R C R^{-1}$$

where R is a matrix consisting of eigenvectors of S and C is a diagonal matrix with

corresponding eigenvalues. Because eigenvectors in R construct an orthogonal basis

explaining the largest variance of the posterior samples, we can use its inverse $R^{-1}$ as an

inverse-rotation function that maps the original posterior samples onto an orthogonal

principal component space without additional scaling. The marginal distributions of

transformed grid clusters are used to define new percentile-based grids. As a last

step, a rotation matrix R maps the newly defined grids onto the original space.
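The rotate, grid, and rotate-back steps can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code used in this thesis: the function name `dynamic_grid`, the 5th to 95th percentile range, the number of points per dimension, and the optional `valid` filter are hypothetical. The sketch uses `numpy.linalg.eigh`, because for a symmetric covariance matrix the eigendecomposition gives the same rotation as the SVD up to sign, and it centers the samples before rotating, which does not change the resulting grid.

```python
import numpy as np

def dynamic_grid(samples, n_points=5, lo=5.0, hi=95.0, valid=None):
    """Percentile-based grid aligned with the principal axes of posterior samples.

    samples : (n_samples, n_dims) posterior draws, with n_dims >= 2
    valid   : optional function mapping a (n_grid, n_dims) array to a boolean mask
    Returns an array of grid points in the original parameter space.
    """
    mean = samples.mean(axis=0)
    S = np.cov(samples, rowvar=False)            # sample covariance matrix
    _, R = np.linalg.eigh(S)                     # columns are eigenvectors; R^{-1} = R.T
    rotated = (samples - mean) @ R               # coordinates in principal component space
    q = np.linspace(lo, hi, n_points)
    axes = [np.percentile(rotated[:, k], q) for k in range(rotated.shape[1])]
    mesh = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    grid = mesh.reshape(-1, len(axes)) @ R.T + mean   # rotate back to the original space
    if valid is not None:
        grid = grid[valid(grid)]                 # drop points that violate constraints
    return grid

# Hypothetical correlated posterior over (theta_1, theta_2).
rng = np.random.default_rng(0)
samples = rng.multivariate_normal([1.5, 0.6], [[0.20, 0.09], [0.09, 0.05]], size=2000)
grid = dynamic_grid(samples, n_points=5)
positive_only = dynamic_grid(samples, n_points=5, valid=lambda g: g[:, 1] > 0)
```

The `valid` argument shows one way to discard grid points that fall outside the admissible parameter range, such as a negative standard deviation.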

Figure 2.2 shows a hypothetical example of dynamic gridding. A black contour

plot shows a joint posterior distribution of (θ1, θ2). Blue “×” markers represent the

grid points that are initially defined in the parameter space. Most of the initial grid

points do not cover the high-density region of the parameter space. However, the

newly updated grids (red circles) based on the correlational structure of the posterior

samples are successfully located within the high-density region. Adjusting grid points

allows ADO to investigate optimal design more efficiently because the grid space does

not incorporate non-informative regions of the parameter space.

[Figure 2.2 axes: θ1 (horizontal), θ2 (vertical). Legend: kernel density estimates of the posterior; initial grid space; updated grid space.]

Figure 2.2: An illustration of the application of dynamic gridding. A contour plot (black line) represents the kernel density estimates of a two-dimensional posterior density distribution based on hypothetical posterior samples. Blue “×” markers are the grid points initially defined. Red circles are the grid points updated by dynamic gridding based on singular value decomposition.

It is worth noting that this dynamic gridding method can sometimes generate

invalid grid points according to assumptions on the model parameters. For example,

the standard deviation of a normal distribution, say σ, is not allowed to have negative

values by its definition. However, the SVD-based dynamic gridding might allow

invalid grid points (i.e., σ < 0) by the shape of the joint posterior distribution and

constraints imposed on other model parameters. These invalid grid points must be

ignored in further steps.

2.5 Posterior Updating

Being inherently Bayesian, ADO evaluates posterior densities of grid points

every trial. However, full posterior estimation of model parameters may be required

in real-time for two reasons: diagnosis of the performance of ADO and dynamic

gridding. We discussed how a full posterior distribution can be used for dynamic

gridding in Section 2.4. In addition, we might need to check the degree to which

estimates are constrained in real-time depending on the application of ADO.

Here we use a differential evolution Markov chain Monte Carlo sampler (DE-MCMC; ter Braak, 2006; B. M. Turner, Sederberg, Brown, & Steyvers, 2013) for

posterior updating. Not only does DE-MCMC sample from posteriors with correlated parameters more efficiently, but it is also known to suffer less from autocorrelation in the sampling process than the conventional Metropolis-Hastings algorithm.
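To make the idea concrete, the sketch below shows the core DE-MCMC update: each chain proposes a move along the difference vector between two other randomly chosen chains. It is a simplified illustration, not the sampler used in the thesis. The scaling factor 2.38/sqrt(2d) is the default suggested by ter Braak (2006); the function name `de_mcmc`, the jitter width `eps`, and the toy target density are assumptions made for this example, and migration is omitted.

```python
import numpy as np

def de_mcmc(log_post, init, n_iter, rng, eps=1e-3):
    """Minimal DE-MCMC: proposals use the difference between two other chains."""
    chains = np.array(init, dtype=float)             # shape (n_chains, dim)
    n_chains, dim = chains.shape
    gamma = 2.38 / np.sqrt(2 * dim)                  # default scaling (ter Braak, 2006)
    logp = np.array([log_post(c) for c in chains])
    trace = np.empty((n_iter, n_chains, dim))
    for it in range(n_iter):
        for i in range(n_chains):
            others = np.delete(np.arange(n_chains), i)
            a, b = rng.choice(others, size=2, replace=False)
            proposal = (chains[i] + gamma * (chains[a] - chains[b])
                        + rng.uniform(-eps, eps, dim))   # small jitter
            logp_new = log_post(proposal)
            # Metropolis acceptance step on the log scale.
            if np.log(rng.uniform()) < logp_new - logp[i]:
                chains[i], logp[i] = proposal, logp_new
        trace[it] = chains
    return trace

# Toy target (an assumption for illustration): a 2-D standard normal.
rng = np.random.default_rng(1)
trace = de_mcmc(lambda x: -0.5 * np.sum(x**2), rng.normal(size=(12, 2)), 1500, rng)
```

Because the proposal direction is taken from the current population of chains, proposals automatically follow the correlational structure of the posterior.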


In our ADO application, initialization of the chains depends on the grids generated

for evaluating global utility. In detail, initial chains are selected by a multinomial sam-

pling with a choice probability vector constructed by normalizing posterior densities

of all grids in the search space:

\[ c_{i,t,1} \sim \mathrm{Multinomial}\big(\mathbf{p}^{(t)}\big). \]

$c_{i,t,1}$ is the i-th chain of the DE-MCMC sampler initialized after completing the t-th trial. A choice probability vector at trial t is defined as

\[ \mathbf{p}^{(t)} = \big[p_1^{(t)}, p_2^{(t)}, \cdots, p_J^{(t)}\big]^T \]

where

\[ p_j^{(t)} = \frac{f\big(\theta_j^{(t)} \mid y_{1:t}, d_{1:t}\big)}{\sum_{j'=1}^{J} f\big(\theta_{j'}^{(t)} \mid y_{1:t}, d_{1:t}\big)} \]

is the probability that the j-th grid is selected as an initial chain, $\theta_j^{(t)}$ is the j-th grid point in the search space at trial t, and J is the total number of grids.
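The initialization rule can be sketched as follows, assuming the posterior densities of the grid points are available on the log scale. The function name `init_chains` is hypothetical; invalid grid points can be given a log-density of negative infinity so that they are never selected.

```python
import numpy as np

def init_chains(grid, log_density, n_chains, rng):
    """Draw initial chain locations from grid points, weighted by posterior density."""
    log_density = np.asarray(log_density, dtype=float)
    # Subtract the maximum before exponentiating for numerical stability;
    # grid points with log-density -inf receive weight 0.
    weights = np.exp(log_density - np.max(log_density))
    p = weights / weights.sum()                       # choice probability vector p^(t)
    idx = rng.choice(len(grid), size=n_chains, p=p)   # multinomial selection of grid points
    return np.asarray(grid)[idx], p

# Three hypothetical grid points; the first one is invalid.
grid = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
chains, p = init_chains(grid, [-np.inf, 0.0, np.log(3.0)], 24, np.random.default_rng(0))
```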

However, poor initialization can make the sampled posterior distribution appear multimodal: most chains cluster in a high-density region while a small number of “outlier” chains remain in an extremely low-density region by chance. These outlier chains can distort the posterior distribution because DE-MCMC uses the difference vector between chains as information to update posterior samples. Migration (Hu & Tsui, 2005) is a reasonable remedy for this problem: it swaps the locations of outlier chains during the first few trials with a fixed probability.

Figure 2.3: An illustration of the scanning protocol and data flow diagram in the ADO-based real-time fMRI experiment. [Diagram labels: scanning protocol (structural scanning, EPI localizer, functional localizer, main task); subject space (structural image, anatomical mask, task-specific mask, averaged BOLD responses, single-trial beta estimates); standard space, e.g., MNI (structural image, anatomical mask, e.g., Brodmann area); inverse-registration.]

2.6 Scanning Protocol and Real-time Data Flow

Figure 2.3 describes a typical scanning procedure and the flow of data in ADO-

based real-time fMRI experiments. The experiment is mainly divided into three

stages: (1) acquisition of structural and functional localizer images, (2) inverse-

registration of anatomical masks in a standard space, and (3) data collection in the

main task.

The first stage aims to collect information required for producing a task-specific

mask in the subject-specific brain space. After completing set-up for online data

transfer from an MR scanner to a terminal computer, an experimenter needs to collect

structural images of a participant’s brain and acquire a regional localizer based on an

echo-planar imaging (EPI) sequence. The former constructs the basis of the subject

space, while the latter limits the region to be scanned in the functional localizer and

the main task. As the last step, the functional localizer task is performed to detect task-relevant voxels. The functional localizer mask can be defined by performing

a whole-brain GLM analysis with data from the localizer task and extracting voxels

that have test statistics (e.g., t-statistics) greater than a specific threshold.

In the second stage, an experimenter extracts the task-relevant subject-specific

mask using the data acquired from the first stage. We use a template structural image

defined in a standard brain space such as the MNI (Montreal Neurological Institute) atlas

(Grabner et al., 2006) as a reference. Once the experimenter collects the structural

image in the subject space, it is registered to the standard brain template to obtain

the transformation matrix that maps the subject space onto the standard space. The

inverse-transformation matrix is derived by taking an inverse of the transformation

matrix, and is used for mapping the anatomical masks in the standard space to


the subject space. When regions of interest (ROIs) must be constrained by masks

provided by standard anatomical atlases (e.g., Julich Histological Atlas; Eickhoff et

al., 2005), we can transform the standard masks to subject-specific masks by using

the inverse-transformation matrix. The conjunction between the inverse-transformed

anatomical mask and the functional localizer mask defines the task-relevant mask in

the subject space.

The task-specific mask enables one to obtain voxel-wise BOLD responses in real-

time during the main task. When an experimenter is interested in a specific ROI

defined by the task-relevant mask, a common approach is to average neural signals

from all voxels in the mask for running the GLM analysis for stimulus-wise neural

estimates. The stimulus-wise neural activation estimates serve as the neural inputs to ADO. Single-trial beta estimation is described in detail in Section 2.3.
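The mask operations in this section reduce to a few array operations once the images are in a common subject space. The sketch below is an illustration under simplifying assumptions: arrays stand in for already-resampled images, the threshold value is arbitrary, and all function names are hypothetical. In practice, registration and resampling of the masks are performed by neuroimaging software and are not reproduced here.

```python
import numpy as np

def inverse_affine(standard_from_subject):
    """Invert a 4x4 affine that maps subject space onto standard space."""
    return np.linalg.inv(standard_from_subject)

def task_relevant_mask(t_map, anat_mask_subject, t_threshold):
    """Conjunction of the functional localizer mask and the anatomical mask."""
    functional_mask = t_map > t_threshold             # voxels passing the localizer threshold
    return functional_mask & anat_mask_subject.astype(bool)

def roi_mean_timeseries(bold, mask):
    """Average the BOLD signal over all voxels in the mask; bold is (x, y, z, time)."""
    return bold[mask].mean(axis=0)

# Tiny hypothetical volume: 2 x 2 x 2 voxels, 3 time points.
t_map = np.zeros((2, 2, 2))
t_map[0, 0, 0] = t_map[1, 1, 1] = 5.0
anat = np.zeros((2, 2, 2), dtype=bool)
anat[0, 0, 0] = anat[0, 1, 0] = True
mask = task_relevant_mask(t_map, anat, t_threshold=3.1)
bold = np.arange(24, dtype=float).reshape(2, 2, 2, 3)
roi_signal = roi_mean_timeseries(bold, mask)
```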

2.7 Practical Considerations

In the practice of fMRI-based ADO, we need additional considerations in model

development and implementation of the computational framework. Here, we discuss

methodological and technical issues expected in the practice and possible remedies.

2.7.1 Developing a Joint Model of Neural and Behavioral Data

If our cognitive model does not involve a neural component, the most reasonable

practice is to run behavioral ADO (i.e., Equations 2.1 and 2.2) while collecting fMRI

data simultaneously without assuming any communication between ADO and fMRI


data. Afterward, we can conduct conventional offline data analyses with a joint model

using a hierarchical linking function.

However, this “post-hoc analysis” approach may not always be the best choice

for three reasons. Firstly, designing fMRI experiments calls for additional considera-

tions due to the characteristics of BOLD responses. For example, stimulus duration

and interstimulus interval are important in obtaining a better signal-to-noise ratio.

In addition, when we run an offline analysis, we should be aware that ADO experiments cannot have a balanced design by their nature. In particular, a condition-level offline GLM analysis may not provide the best result because the number of exposures to each condition is not balanced. Most importantly, a behavioral ADO experiment

incorporating neural data by offline data analysis does not serve its original goal:

offering the maximally informative design on the fly. ADO implemented in this post-

hoc strategy will not exploit the information provided by neural data in the data

collection and optimization procedures. We can imagine that the maximally infor-

mative experimental designs obtained by behavioral data only might not be the same

as those derived by both neural and behavioral data.

In short, to maximize the information obtained from both neural and behavioral

data, a joint model framework is strongly recommended rather than performing be-

havioral ADO and offline fMRI analysis separately. If reconfiguring the cognitive

model to involve neural components is difficult, a hierarchical approach could be the

simplest remedy to connect the neural and behavioral data.


2.7.2 Discretizing a Continuous Space for Grid Search

Every continuous parameter and data subspace must be discretized when using a

grid-based method for numerical integration. However, setting a reasonable range of

grid points may not be straightforward when we discretize the neural response space

because the range of neural estimates can vary according to stimulus settings (e.g.,

duration, flickering, interstimulus interval) and the hemodynamic response function used

in the model.

The easiest solution is to adaptively adjust the range of neural response grid by

using the minimum and maximum of neural estimates as anchors. Here, we should

note that extending the scope of the neural response space sacrifices grid resolution

if we assume a fixed number of grid points for each dimension. Conversely, if we increase the number of grid points in the neural response space to maintain the resolution, the computation time increases rapidly because the size of the response grid is the product of the numbers of grid points across its dimensions. Therefore, a pilot experiment is strongly recommended to identify appropriate grid settings.
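The trade-off between range and resolution can be illustrated with a short sketch. The function names are hypothetical, and the grid-size formula assumes two neural response dimensions and one binary behavioral response, as in the contrast discrimination example used later in this thesis.

```python
import numpy as np

def neural_response_grid(estimates, n_points=10):
    """Grid anchored at the minimum and maximum of the observed neural estimates."""
    lo, hi = float(np.min(estimates)), float(np.max(estimates))
    grid = np.linspace(lo, hi, n_points)
    spacing = (hi - lo) / (n_points - 1)              # grid resolution
    return grid, spacing

def response_grid_size(n_points):
    """Two neural dimensions times a binary behavioral response (an assumption)."""
    return n_points**2 * 2

# Widening the range with a fixed number of points coarsens the grid.
_, narrow = neural_response_grid([0.0, 0.4, 1.0])     # spacing 1/9
_, wide = neural_response_grid([-0.5, 0.4, 2.0])      # spacing 2.5/9
# Doubling the points per dimension quadruples the response grid.
sizes = response_grid_size(10), response_grid_size(20)
```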

2.7.3 Including the Stimulus-wise Neural Activity: One-trial-lag ADO

Let us assume that we are on the t-th trial of the experiment. Ideally, both neural

and behavioral data for the first t trials must be incorporated in ADO. However, we

should expect loss of the latest one-trial amount of neural data (i.e., stimulus-wise

neural activation estimates) in practice due to the slow-changing temporal profile

of the hemodynamic response. In detail, the hemodynamic response consists of an increasing period to a peak that takes 5-6 seconds, a decreasing period with an undershoot below a baseline activation, and a slow asymptotic recovery period. The total length of a hemodynamic response is usually up to 30 seconds. Hence, neural measures in fMRI experiments need to be collected for at least 5-6 seconds to characterize their peak intensity. However, a temporal lag of 5-6 seconds might be too long depending on stimulus presentation settings (i.e., stimulus duration, interstimulus/intertrial interval). In this case, we can collect a behavioral response but not a neural activation estimate at the end of the trial.

Figure 2.4: An illustration of the hemodynamic responses from the task and computational steps required within each trial (x-axis: time in seconds, 0-40). Dotted lines (red, blue, green) refer to hypothetical hemodynamic responses evoked by a stimulus within each trial, and a solid line (gray) shows the expected value of convolved hemodynamic responses. The rectangles below the x-axis specify the length of the intervals required for each step. [Diagram labels: BOLD responses for the first, second, and third trials; convolved BOLD responses; stimulus onset; stimulus duration; response interval; single-trial beta estimation + ADO; additional time required for single-trial beta estimation.]

Figure 2.4 illustrates a hypothetical experiment in which the stimulus duration is 2 seconds (i.e., a canonical hemodynamic response function is convolved with a boxcar function with 2-second duration) and the response interval is 4 seconds. In this example,

let us assume that the intertrial interval of 4 seconds still remains for single-trial

beta estimation and ADO computation. However, we cannot estimate single-trial

neural activation estimates immediately after the response interval is over because

a theoretically assumed hemodynamic response (broken lines) has not attained its

peak yet. We have to exhaust the intertrial interval (blank rectangles) just for getting

more neural data, unlike our original intention to exploit the interval for ADO. A new

trial then begins when we are ready to estimate single-trial neural activation of the

previous trial. Therefore, we can collect behavioral responses but not neural data –

single-trial neural activation estimates – at the end of each trial. In this situation, a

hemodynamic lag hinders finishing single-trial beta estimation and ADO computation

within each trial. As this problem is not a matter of computation speed, developing

efficient algorithms may not be helpful.
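The timing argument can be checked numerically. The sketch below convolves a 2-second boxcar with a double-gamma hemodynamic response function and locates the peak of the predicted response. The specific HRF (gamma shapes 6 and 16, undershoot ratio 1/6) is a commonly used canonical form and is an assumption here, since the exact HRF used in the thesis is not stated in this section.

```python
import math
import numpy as np

def gamma_pdf(t, shape):
    """Gamma density with unit scale, evaluated on an array of time points."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = np.exp((shape - 1) * np.log(t[pos]) - t[pos] - math.lgamma(shape))
    return out

def canonical_hrf(t):
    """Double-gamma HRF: a positive response minus a smaller, later undershoot."""
    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0

dt = 0.1                                              # time step in seconds
t = np.arange(0, 40, dt)
hrf = canonical_hrf(t)
boxcar = np.ones(int(round(2.0 / dt)))                # 2-second stimulus
bold = np.convolve(boxcar, hrf)[: t.size] * dt        # predicted BOLD response
peak_hrf = t[np.argmax(hrf)]                          # peak latency of the HRF itself
peak_bold = t[np.argmax(bold)]                        # peak latency after convolution
```

Under these assumptions the predicted response peaks several seconds after stimulus onset, so a single-trial estimate computed right after a short response interval would be based on a response that has not yet reached its peak.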

One possible solution for the loss of neural data is to use the neural and behavioral data obtained up to the (t − 1)-th trial to generate the optimal proposal for the (t + 1)-th trial, a strategy we refer to as ‘one-trial-lag ADO’. Figure 2.5 describes how one-trial-lag ADO works. For example, the first trial uses an ADO proposal that is derived from the prior distribution of model parameters, whereas the second trial uses randomly

generated designs since the neural estimates from the first trial are not available at

this point. During the second trial, single-trial neural activation of the first trial is

estimated and used together with behavioral data to compute the optimal design for

the third trial. Similarly at the third trial, ADO uses the data obtained by the second

trial (green blank rectangle) to generate the optimal proposal for the fourth trial.

We can of course simplify the implementation of one-trial-lag ADO using randomly generated designs for the first few trials, which is the strategy used in the real-world experiment (Chapter 4). However, we first tested the performance of the most ideal implementation.

Figure 2.5: An illustration of ‘one-trial-lag ADO’ (x-axis: time in seconds, 0-40). Dotted lines (red, blue, green) refer to hypothetical hemodynamic responses evoked by a stimulus within each trial, and a solid line (gray) shows the expected value of convolved hemodynamic responses. The rectangles below the x-axis specify the length of the intervals required for each step. [Diagram labels: Trial #1, Trial #2, Trial #3; (Prior); randomized design; (single-trial beta estimation +) ADO; design proposal; BOLD responses for the first, second, and third trials; convolved BOLD responses; stimulus onset; stimulus duration; response interval.]

One-trial-lag ADO relieves us from burdensome computational time when acquir-

ing single-trial beta estimates. Therefore, when single-trial beta estimates are the

main consideration in ADO, implementation of one-trial-lag ADO may be worth con-

sidering. The performance of one-trial-lag ADO is validated in Section 3.3.3.
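The bookkeeping of one-trial-lag ADO can be summarized by a small helper that reports, for each trial, where its design comes from and which trials' data are available. The function name and return labels are hypothetical; the rule itself follows the description above (prior-based ADO on the first trial, a randomized design on the second, and ADO with a one-trial lag afterward).

```python
def design_source(trial):
    """Return (source, last_trial_used) for the design of a 1-indexed trial."""
    if trial < 1:
        raise ValueError("trial numbers start at 1")
    if trial == 1:
        return ("ado_prior", 0)      # ADO based on the prior only
    if trial == 2:
        return ("random", 0)         # neural estimate of trial 1 is not ready yet
    return ("ado", trial - 2)        # data up to trial t-2 inform the design of trial t

schedule = [design_source(t) for t in range(1, 6)]
```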


Chapter 3: Simulation Experiments

In this chapter, we test the performance of fMRI-based ADO within a simulated

environment. First, we introduce the contrast discrimination task and develop a joint

model that explains the discrimination processes using both neural and behavioral

data. Next, we discuss simulation experiments for testing the performance of ADO

at different levels of generalizability. Lastly, we discuss the results of the simulations and their implications.

3.1 A Proof-of-Concept Study: Contrast Discrimination

In this section, we will introduce a task environment and propose a joint model

that explains decision processes embedded in the task. As a proof-of-concept study, we

selected a contrast discrimination task (Boynton et al., 1999) because formal models

describing neural activities for contrast stimuli have been well studied (DiMattina,

2016). Moreover, the task has been used in psychology and cognitive neuroscience

for studying human visual perception (Boynton et al., 1999) and the mechanism of

attention (Li, Lu, Tjan, Dosher, & Chu, 2008).

Figure 3.1: An illustration of the trial structure of the contrast discrimination task. Each trial proceeds as follows: fixation (1 second), Stimulus 1 (6 seconds, flickering at 4 Hz), interstimulus interval (6-10 seconds; mean: 8 seconds), Stimulus 2 (6 seconds, flickering at 4 Hz), fixation (1 second), and response (same duration as the interstimulus interval).

3.1.1 Task

A contrast discrimination task uses a Gabor patch as its stimulus. A Gabor patch is defined by a sinusoid multiplied by a two-dimensional Gaussian function on a two-dimensional space, and is represented as black-white stripes overlaid by a circular mask that gradually fades the stripes as the distance from the center of the patch increases. A higher contrast level means that the borders between black and white stripes are more distinct.
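As an illustration, a Gabor patch can be generated as a sinusoidal grating windowed by a Gaussian envelope. All parameter names and default values below (image size, number of cycles, envelope width) are hypothetical and are not the stimulus settings of the thesis.

```python
import numpy as np

def gabor_patch(size=128, contrast=0.5, cycles=6.0, sigma=0.15, phase=0.0):
    """Luminance image in [0, 1]: mean gray plus a Gaussian-windowed grating."""
    coords = np.linspace(-0.5, 0.5, size)
    x, y = np.meshgrid(coords, coords)
    grating = np.sin(2 * np.pi * cycles * x + phase)        # vertical stripes
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))      # Gaussian window
    return 0.5 + 0.5 * contrast * grating * envelope

high = gabor_patch(contrast=0.8)     # stripes span a wide luminance range
low = gabor_patch(contrast=0.2)      # stripes stay close to mean gray
```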

Figure 3.1 presents an example of the task structure of the contrast discrimination task based on a two-alternative forced-choice paradigm. Given two grating stimuli with different contrast levels, a participant is instructed to answer which of the two stimuli is of higher contrast. We assume that the two grating stimuli are presented consecutively because simultaneous presentation may complicate discriminating signals from each stimulus. Sequential presentation also allows a long enough interstimulus interval to avoid excessive superimposition of hemodynamic responses from the two stimuli. Note that we used sequential presentation in the real-world experiment (Chapter 4) as well.

3.1.2 Model

We propose a joint model that explains decision processes embedded in the con-

trast discrimination task. A neural submodel will explain how contrast level of a

grating annulus evokes neural activation in early visual cortex. Meanwhile, a be-

havioral submodel will generate a behavioral response from the neural activation

amplitudes. In addition, we assume that the neural activity will directly guide a

behavioral response (i.e., directed joint model).

Neural Submodel

First, we need to define a neural submodel that maps the contrast level of a

stimulus to the amplitude of single-trial neural activation level. Boynton et al. (1999)

found that the amplitude of BOLD responses tends to increase with the contrast level.

Here, we use the Naka-Rushton equation to model the relationship between contrast levels

and associated neural response (DiMattina, 2016; Li et al., 2008):

\[ \bar{\beta}(c) = b + \frac{R_{max}\, c^2}{c_{50}^2 + c^2} \tag{3.1} \]

where c ∈ (0, 1] is the contrast level, b is the baseline neural activation, Rmax is the maximum activation level achieved above the baseline b, and c50 is the contrast level which evokes half the maximum level of activation.
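Equation 3.1 translates directly into code. The sketch below uses a hypothetical function name and evaluates the equation at a few contrast levels; the parameter values are only an example.

```python
import numpy as np

def naka_rushton(c, b, r_max, c50):
    """Predicted neural activation for contrast level(s) c (Equation 3.1)."""
    c = np.asarray(c, dtype=float)
    return b + r_max * c**2 / (c50**2 + c**2)

# At c = c50 the response is halfway between the baseline b and b + r_max.
half = naka_rushton(0.35, b=0.05, r_max=1.0, c50=0.35)
curve = naka_rushton(np.linspace(0.01, 1.0, 50), b=0.05, r_max=1.0, c50=0.35)
```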

Given two consecutive stimuli with different contrast values $c_1$ and $c_2$, we assume that $\bar{\beta}_1(c_1)$ and $\bar{\beta}_2(c_2)$ predicted by the Naka-Rushton equation are compared and lead to the following decision scheme:

\[ \begin{cases} \text{Stimulus 1 has a higher contrast level} & \text{if } \bar{\beta}_1 > \bar{\beta}_2 \\ \text{Stimulus 2 has a higher contrast level} & \text{if } \bar{\beta}_1 < \bar{\beta}_2 \end{cases} \]

However, estimated neural activation for each stimulus may have noise and therefore

might not perfectly match the prediction of the neural submodel. This assumption is

reasonable because stimulus-wise beta estimates are known to have large variability

(Abdulrahman & Henson, 2016; Mumford et al., 2012). Hence, we treat the actual estimates of single-trial neural activation as samples from a Gaussian distribution with the prediction of the Naka-Rushton equation as its mean:

\[ \beta_i \sim N(\bar{\beta}_i, \sigma_\beta^2) \quad (i = 1, 2) \tag{3.2} \]

where i is the stimulus order index, and $N(\mu, \sigma^2)$ means a normal distribution with a mean $\mu$ and a standard deviation $\sigma$ (i.e., variance $\sigma^2$).

Behavioral Submodel

Our assumptions about the behavioral submodel are twofold: (1) the model must

be able to explain variability in behavioral responses such as response error, and (2)

the response must be made based on comparison between contrast levels of two grat-

ing stimuli. Thurstone (1927) provided the first statistical formalization of compara-

tive judgment serving both assumptions, which will be referred to as a Thurstonian

decision model henceforth. He assumed that when a stimulus φ is mapped onto a psychological scale as ψ, it is represented as a Gaussian distribution to reflect the uncertainty of the psychological effect:

\[ \psi_i \sim N(\bar{\psi}_i, s_i^2) \quad (i = 1, 2), \tag{3.3} \]

where i is the stimulus order, $\psi_i$ is the actual mental representation of a stimulus $\phi_i$, $\bar{\psi}_i$ is the (theoretically assumed) accurate representation of $\phi_i$, and $s_i$ is a standard deviation term for the stimulus i representing uncertainty of the mental representation. Comparing the intensities of two stimuli $\phi_1$ and $\phi_2$, variability in comparative judgment can be explained by a distribution of the difference between $\psi_1$ and $\psi_2$, the mental representations of the intensities of the two stimuli. Assuming that the two representations are independent, the difference of two Gaussian variables is again Gaussian with the variances added, so the difference distribution is

\[ \psi_2 - \psi_1 \sim N\Big(\bar{\psi}_2 - \bar{\psi}_1, \big(\sqrt{s_1^2 + s_2^2}\big)^2\Big). \]

The response probability p that a participant chooses the second stimulus as having higher contrast is therefore computed as

\[ p = 1 - \Phi^*\Big(0;\ \bar{\psi}_2 - \bar{\psi}_1, \big(\sqrt{s_1^2 + s_2^2}\big)^2\Big) = \int_0^\infty N\Big(x;\ \bar{\psi}_2 - \bar{\psi}_1, \big(\sqrt{s_1^2 + s_2^2}\big)^2\Big)\, dx \]

where $\Phi^*(\cdot\,; \mu, \sigma^2)$ is the cumulative distribution function of a Gaussian distribution with a mean $\mu$ and a standard deviation $\sigma$. Finally, a behavioral response y is generated from a Bernoulli distribution as

\[ y \sim \mathrm{Bernoulli}(p). \]

Linking Function

As the last step, we will connect the neural and behavioral model to ensure that

the behavioral decision process is informed by neural activation. For this purpose,

the distribution of single-trial neural activation in Equation 3.2 will substitute the

distribution of mental representation (i.e., Equation 3.3). We consider stimulus-wise neural activation estimates ($\beta_i$) as an internal representation of grating stimuli ($\psi_i$). The predictions from the Naka-Rushton equation ($\bar{\beta}_i$) substitute the “accurate” representation of the stimulus ($\bar{\psi}_i$). The neural representation $\beta_i$ is normally distributed with a mean $\bar{\beta}_i$ and a standard deviation $\sigma_\beta$; therefore, the standard deviation term in the Thurstonian model ($s_i$) is replaced by $\sigma_\beta$. Note that we use the same standard deviation across all stimulus intensities (i.e., contrast levels).

Hence, the linking function can be expressed as follows:

\[ \beta_2 - \beta_1 \sim N\Big(\bar{\beta}_2 - \bar{\beta}_1, \big(\sqrt{2}\sigma_\beta\big)^2\Big), \]

\[ p = 1 - \Phi^*\Big(0;\ \bar{\beta}_2 - \bar{\beta}_1, \big(\sqrt{2}\sigma_\beta\big)^2\Big) = \int_0^\infty N\Big(x;\ \bar{\beta}_2 - \bar{\beta}_1, \big(\sqrt{2}\sigma_\beta\big)^2\Big)\, dx \tag{3.4} \]

This linking function implies that if the neural activation level of the second stimulus

is higher than that of the first stimulus, the response probability p increases, which

means that the response supporting the second stimulus becomes more likely.
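The response probability in Equation 3.4 can be computed with the error function from the standard library. The sketch below uses hypothetical function names; `beta1` and `beta2` stand for the predicted activations of the first and second stimuli.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def p_choose_second(beta1, beta2, sigma_beta):
    """Probability of choosing the second stimulus (Equation 3.4)."""
    sd_diff = math.sqrt(2.0) * sigma_beta             # SD of the difference distribution
    return 1.0 - normal_cdf(0.0, beta2 - beta1, sd_diff)

p_equal = p_choose_second(0.5, 0.5, 0.1)              # no difference: chance level
p_clear = p_choose_second(0.3, 0.8, 0.2 / math.sqrt(2.0))
```

With equal predicted activations the probability is 0.5, and it approaches 1 as the predicted activation of the second stimulus exceeds that of the first by several standard deviations of the difference distribution.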

Likelihood Function

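Given a matrix of presented contrast levels $C_{k \times 2} = (c_{ij})$, a matrix of single-trial neural activation estimates $B_{k \times 2} = (\beta_{ij})$, and a behavioral response vector $\mathbf{y} = (y_i)$ ($i = 1, \cdots, k$, $j = 1, 2$), the likelihood function of the joint model is described as

\[ \bar{\beta}_{ij} = b + \frac{R_{max}\, c_{ij}^2}{c_{50}^2 + c_{ij}^2}, \]

\[ L(b, R_{max}, c_{50}, \delta \mid C, B, \mathbf{y}) = \prod_{i=1}^{k} \left[ \prod_{j=1}^{2} N\!\left(\beta_{ij};\ \bar{\beta}_{ij}, \Big(\frac{\delta}{\sqrt{2}}\Big)^2\right) \times \Big\{1 - \Phi^*\big(0;\ \bar{\beta}_{i2} - \bar{\beta}_{i1}, \delta^2\big)\Big\}^{y_i} \times \Phi^*\big(0;\ \bar{\beta}_{i2} - \bar{\beta}_{i1}, \delta^2\big)^{1 - y_i} \right] \tag{3.5} \]

where $\bar{\beta}_{ij}$ and $\beta_{ij}$ are the prediction and the actual observation of single-trial neural activation for the j-th stimulus of the i-th trial, and k is the total number of trials. Note that the standard deviation of the difference distribution (and therefore that of the single-trial neural activation distribution) is reparameterized (i.e., $\delta = \sqrt{2}\sigma_\beta$).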

This reparameterization is to stress that the variability in the difference distribution

explains the variability in a behavioral response.
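For numerical work it is usually more convenient to evaluate Equation 3.5 on the log scale. The following sketch is one way to do so; the function name, the argument layout, and the clipping constant that guards against log(0) are assumptions of this example.

```python
import math
import numpy as np

def log_likelihood(params, C, B, y):
    """Log of Equation 3.5 for designs C (k x 2), neural data B (k x 2), responses y (k)."""
    b, r_max, c50, delta = params
    C, B, y = np.asarray(C, float), np.asarray(B, float), np.asarray(y, float)
    pred = b + r_max * C**2 / (c50**2 + C**2)         # Naka-Rushton predictions
    sd = delta / math.sqrt(2.0)                       # SD of each single-trial estimate
    log_neural = -0.5 * np.log(2 * np.pi * sd**2) - (B - pred) ** 2 / (2 * sd**2)
    diff = pred[:, 1] - pred[:, 0]
    # 1 - Phi*(0; diff, delta^2) equals the standard normal CDF at diff / delta.
    p = np.array([0.5 * (1.0 + math.erf(d / (delta * math.sqrt(2.0)))) for d in diff])
    p = np.clip(p, 1e-12, 1 - 1e-12)                  # guard against log(0)
    log_behavioral = y * np.log(p) + (1 - y) * np.log(1 - p)
    return log_neural.sum() + log_behavioral.sum()
```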

Figure 3.2 presents a diagram of the joint model developed throughout this sec-

tion. Each node in the diagram represents a variable. Shaded circles are observed

data (i.e., stimulus-wise neural activation estimates, behavioral responses), whereas

empty circles are model parameters (i.e., b, Rmax, c50, δ), a design variable (i.e., cij),

and their transformation (i.e., βij, pi). Double-border and single-border circles repre-

sent deterministic and stochastic variables, respectively. The outer and inner plates

represent variables associated with each trial and stimulus, respectively.

3.2 Design and Procedure

Three simulations were performed to test the performance of ADO in a simulated

environment. In each simulation, an experiment collects the data for estimating parameters of the joint model introduced in Section 3.1.2:

\[ \beta_{ij} \sim N\!\left(\bar{\beta}_{ij}, \Big(\frac{\delta}{\sqrt{2}}\Big)^2\right) \quad \text{where } \bar{\beta}_{ij} = b + \frac{R_{max}\, c_{ij}^2}{c_{ij}^2 + c_{50}^2}, \]

\[ y_i \sim \mathrm{Bernoulli}\Big(1 - \Phi^*\big(0;\ \bar{\beta}_{i2} - \bar{\beta}_{i1}, \delta^2\big)\Big). \tag{3.6} \]

Here $i = 1, \cdots, 20$ is the trial index, and $j = 1, 2$ is the stimulus order. $(b, R_{max}, c_{50}, \delta)$ and $c_{ij} \in (0, 1]$ refer to a parameter set and a contrast level used for generating the prediction of a neural response $\bar{\beta}_{ij}$ from the Naka-Rushton equation. $\beta_{ij}$ is the single-trial neural activation level estimated from a raw BOLD time-series, and $y_i$ is a behavioral response. $\Phi^*(x; \mu, \sigma^2)$ is the cumulative distribution function of a normal distribution with mean $\mu$ and standard deviation $\sigma$.
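The data-generating process in Equation 3.6 can be sketched as follows. The function name is hypothetical, and the way the randomized designs are drawn here (uniformly, with replacement, from all pairs of distinct contrast levels) is an assumption made for illustration.

```python
import math
import numpy as np

def simulate_experiment(params, designs, rng):
    """Generate neural estimates B (k x 2) and responses y (k) from Equation 3.6."""
    b, r_max, c50, delta = params
    C = np.asarray(designs, dtype=float)
    pred = b + r_max * C**2 / (C**2 + c50**2)         # predicted activation per stimulus
    B = rng.normal(loc=pred, scale=delta / math.sqrt(2.0))
    diff = pred[:, 1] - pred[:, 0]
    p = np.array([0.5 * (1.0 + math.erf(d / (delta * math.sqrt(2.0)))) for d in diff])
    y = rng.binomial(1, p)                            # 1 = second stimulus chosen
    return B, y

rng = np.random.default_rng(2018)
levels = np.logspace(-2, 0, 10)                       # ten contrast levels in [0.01, 1]
pairs = np.array([(c1, c2) for c1 in levels for c2 in levels if c1 != c2])
designs = pairs[rng.integers(len(pairs), size=20)]    # 20 randomized trials
B, y = simulate_experiment((0.05, 1.0, 0.35, 0.2), designs, rng)
```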

Figure 3.2: A graphical representation of the joint model for contrast discrimination (plates: stimulus order j = 1, 2; trial i = 1, ⋯, T). Each node in the model represents a variable. Filled circles are observed data (i.e., stimulus-wise neural activation estimates, behavioral responses), whereas empty circles are model parameters (i.e., b, Rmax, c50, δ), a design variable (i.e., cij), and their transformation (i.e., $\bar{\beta}_{ij}$, pi). Double-line circles are deterministic variables, whereas single-line circles are stochastic variables. The outer and inner plates represent variables associated with each trial and stimulus, respectively.

The parameters of interest are three shape parameters of the Naka-Rushton equa-

tion (i.e., b, Rmax, c50) and a standard deviation of the difference distribution in the

Thurstonian decision model (i.e., δ). We compare the performance of ADO-based

experiments and randomized-design based experiments in terms of accuracy of pa-

rameter estimation and precision of the posterior distribution.

The first simulation tests the performance of ADO in terms of accuracy and preci-

sion of parameter estimation with a fixed “true” parameter set. The second simulation

extends the first simulation by using randomly generated parameter sets so that the

result of the first experiment can be generalized to any parameter settings. The third

simulation validates one-trial-lag ADO discussed in Section 2.7.3.

In the simulations, we assume that stimulus-level beta estimates are already acquired from averaged BOLD responses across voxels in the ROI. This assumption ensures that the only randomness in the data generation process is the one specified by the joint model built on the Naka-Rushton equation.

3.2.1 Simulation 1: Fixed Parameters

The purpose of the first simulation is to test the performance of fMRI-based ADO

with fixed “true” parameters. The target parameter set is (b, Rmax, c50, δ) =

(0.05, 1, 0.35, 0.2). Figure 3.3 describes the shape of the Naka-Rushton equation and

its 95% credible interval from the given parameter set. Considering the neural vari-

ability imposed by a normal distribution (Equation 3.4), we expect that 95% of the

single-trial beta estimates at a specific contrast level are located within the 95%

credible interval.
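As a worked example, suppose the interval is the central 95% region of the normal distribution of single-trial estimates, whose standard deviation is δ/√2 under the reparameterization above (this reading of the interval is an assumption). With δ = 0.2, the half-width and the resulting intervals at two contrast levels are:

```latex
\[
\sigma_\beta = \frac{\delta}{\sqrt{2}} = \frac{0.2}{\sqrt{2}} \approx 0.141,
\qquad 1.96\,\sigma_\beta \approx 0.277
\]
\[
c = c_{50} = 0.35:\quad
b + \frac{R_{max}}{2} = 0.05 + 0.5 = 0.55
\;\Rightarrow\; [0.273,\ 0.827]
\]
\[
c = 1:\quad
0.05 + \frac{1}{0.35^2 + 1} \approx 0.941
\;\Rightarrow\; [0.664,\ 1.218]
\]
```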

Figure 3.3: The shape of the Naka-Rushton equation with (b, Rmax, c50, δ) = (0.05, 1, 0.35, 0.2). The x-axis refers to the contrast level, while the y-axis is the expected neural activation (i.e., single-trial beta estimates). The solid line (mean activation function) is the expected neural activation from the three shape parameters (b, Rmax, c50), shown together with its 95% credible interval. b and Rmax determine the lower and upper asymptotes of the graph, whereas c50 affects the slope of the graph. δ controls the width of the credible interval.

Variable | Details
The number of experiments | 100
The number of trials | 20
Stimulus (rounded to 3 decimal places) | 0.010, 0.017, 0.028, 0.046, 0.077, 0.129, 0.215, 0.359, 0.599, 1.000
Model parameters | (b, Rmax, c50, δ) = (0.05, 1, 0.35, 0.2)
Prior: b | (-3, 5)
Prior: Rmax | (-3, 5)
Prior: c50 | (0, 1)
Prior: δ | (0.0001, 5)
Initial grid setting: b | -1, -0.7, -0.4, -0.1, 0.2
Initial grid setting: Rmax | 0.8, 0.9, 1, 1.1, 1.2
Initial grid setting: c50 | 0.25, 0.3625, 0.475, 0.5875, 0.7
Initial grid setting: δ | 0.25, 0.4375, 0.625, 0.8125, 1
Initial grid setting: Neural response | 0.00, 0.11, 0.22, 0.33, 0.44, 0.56, 0.67, 0.78, 0.89, 1.00
Grid size: Design space | 90 = 10^2 − 10
Grid size: Parameter space | 625 = 5^4
Grid size: Response space | 50 = 5^2 × 2
DE-MCMC: Chains | 24
DE-MCMC: Burn-in samples | 200
DE-MCMC: Valid posterior samples | 800
DE-MCMC: Migration probability | 0.1
Dynamic gridding: Method | Singular value decomposition
Dynamic gridding: Schedule | After every trial
Dynamic gridding: Percentile | (20%, 35%, 50%, 65%, 80%)

Table 3.1: Default settings of Simulation 1.

Table 3.1 summarizes the default settings for Simulation 1. We performed the contrast discrimination experiment 100 times. Each experiment consists of 20 trials. All parameters have uniform prior distributions as follows for evaluating posterior densities and global utilities:

\[ b \sim U(-3, 5), \quad R_{max} \sim U(-3, 5), \quad c_{50} \sim U(0, 1), \quad \delta \sim U(0.0001, 5). \]

Although we could implement different prior settings such as (truncated) normal

distributions for (b, Rmax, c50) and an inverse gamma distribution for δ, uniform priors

allow faster computation.

Ten contrast levels are defined and used in the experiment by logarithmically spac-

ing the interval [0.01, 1]. The size of the design space is the total number of contrast

combinations within a trial (i.e., 100 grid points); however, we exclude designs that

allow the two stimuli to have the same contrast level (i.e., 10 grid points). Therefore,

the total size of the design space is 90 grid points. Figure 3.4 illustrates the design

space represented in a linear scale (left) and a logarithmic scale (right).
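The design space described above can be reproduced in a few lines, assuming NumPy. Ten logarithmically spaced contrast levels are generated and all ordered pairs with two different levels are kept.

```python
import itertools
import numpy as np

levels = np.logspace(-2, 0, 10)                       # ten levels from 0.01 to 1
# Ordered pairs (first contrast, second contrast) with distinct levels.
designs = np.array(list(itertools.permutations(levels, 2)))
rounded = np.round(levels, 3)
n_designs = len(designs)                              # 10 * 10 - 10
```

Rounding the levels to three decimal places gives the stimulus values listed in Table 3.1.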

Initial grid points in the four-dimensional parameter space represent the initialization of the joint posterior distribution of parameters. Each parameter dimension (i.e., b, Rmax, c50, δ) has five grid points, totaling 625 grid points. Grid points in

each dimension are evenly spaced given the initial settings of the minimum and the

maximum (i.e., b ∈ [−1, 0.2], Rmax ∈ [0.8, 1.2], c50 ∈ [0.25, 0.7], δ ∈ [0.25, 1]). At the

end of every trial, a dynamic gridding procedure adaptively adjusts the distribution

of the parameter grid points.

Figure 3.4: The design space of the contrast discrimination experiment. The space consists of 90 pairs of contrast levels for the first (x-axis) and second (y-axis) stimuli. The gray dots are individual designs that can be sampled during the experiment. The left and right plots represent the same design space in a linear scale and a logarithmic scale, respectively.

A three-dimensional response space defines grid points for expected neural and

behavioral responses. The first two dimensions represent the expected neural activa-

tion (i.e., single-trial beta estimates) evoked by the first and second grating stimuli;

the third dimension represents the expected behavioral response. For each neural re-

sponse dimension, ten grid points were set by evenly dividing the interval [0, 1]. The

behavioral response dimension consists of two grid points (i.e., {0, 1}) as we assume

that the behavioral responses are Bernoulli trials.

3.2.2 Simulation 2: Randomly Generated Parameters

The second simulation is designed to test whether or not the performance of

ADO remains stable with various combinations of the “true” parameters. 30 sets

of parameters were generated by uniformly sampling from b ∈ [−0.2, 0.5], Rmax ∈

[1.0, 2.2], c50 ∈ [0.1, 0.6] and δ ∈ [0.2, 0.6]. As the performance of ADO may depend

on not only the true parameters but also randomly generated responses, we use each

parameter set 10 times to include the variability in the data generation process.

Therefore, Simulation 2 consists of 300 experiments in total. Table 3.2 and Figure

3.5 present a complete list and graphical illustrations of the parameter sets used in

Simulation 2.

We changed the initial grid settings for the parameter space by adjusting the minimum and maximum of the grid points for each dimension: b ∈ [−2, 2], Rmax ∈ [0.5, 3], c50 ∈ [0.05, 0.95], δ ∈ [0.001, 1.2]. The grid space for expected neural responses was also updated by evenly dividing the interval [0, 2] into 10 grid points. All other settings are the same as in Simulation 1. Table 3.3 provides a summary of the default settings in Simulation 2.
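As a quick check of these settings, the grids can be rebuilt from their bounds. The sketch below assumes five evenly spaced points per parameter dimension (as listed in Table 3.3) and uses a small hypothetical helper, `linspace`, written in plain Python.

```python
from itertools import product


def linspace(start, stop, num):
    """Return `num` evenly spaced values from start to stop, inclusive."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


# Lower and upper bounds of the initial parameter grid in Simulation 2.
PARAM_BOUNDS = {
    "b": (-2.0, 2.0),
    "Rmax": (0.5, 3.0),
    "c50": (0.05, 0.95),
    "delta": (0.001, 1.2),
}
N_POINTS = 5  # grid points per parameter dimension

param_axes = {
    name: linspace(low, high, N_POINTS)
    for name, (low, high) in PARAM_BOUNDS.items()
}
# The full grid is the Cartesian product of the four axes.
param_grid = list(product(*param_axes.values()))

# Grid for the expected neural response: [0, 2] divided into 10 points.
neural_axis = linspace(0.0, 2.0, 10)

print(len(param_grid))
print([round(v, 2) for v in neural_axis])
```

The Cartesian product of four five-point axes gives the 625 parameter grid points reported in Table 3.3.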

Figure 3.5: The scatter plots of the parameter sets used in Simulation 2. Black dots in each plot indicate the values of (b, Rmax) (upper left), (b, c50) (upper center), (b, δ) (upper right), (Rmax, c50) (lower left), (Rmax, δ) (lower center), and (c50, δ) (lower right).

Set   b        Rmax    c50     δ        Set   b        Rmax    c50     δ
 1    -0.654   1.551   0.423   0.487    16    -1.028   2.347   0.447   0.255
 2    -0.690   1.020   0.791   0.979    17     0.015   1.892   0.522   0.225
 3    -0.857   1.444   0.794   0.658    18    -1.072   1.781   0.178   0.812
 4     0.094   2.363   0.594   0.359    19    -1.068   1.719   0.644   0.390
 5     0.096   1.112   0.659   0.433    20    -0.152   1.284   0.471   0.992
 6    -0.469   1.167   0.509   0.327    21    -0.212   1.709   0.624   0.901
 7    -0.904   1.991   0.333   0.850    22    -0.164   1.240   0.727   0.918
 8     0.342   1.794   0.556   0.395    23    -1.015   1.086   0.620   0.835
 9    -1.025   1.310   0.168   0.606    24    -0.579   2.063   0.283   0.770
10    -1.344   2.184   0.301   0.400    25     0.071   1.814   0.300   0.348
11    -0.372   2.293   0.796   0.347    26    -1.284   1.420   0.869   0.306
12    -1.179   1.711   0.207   0.868    27    -0.849   1.625   0.577   0.390
13     0.009   2.352   0.713   0.610    28    -0.704   1.600   0.879   0.594
14    -1.145   2.008   0.387   0.736    29     0.172   1.039   0.429   0.223
15    -1.378   2.063   0.774   0.812    30    -0.353   1.321   0.432   0.651

Table 3.2: A list of 30 parameter sets used in Simulation 2. Parameter values are rounded to three decimal places.

3.2.3 Simulation 3: One-trial-lag ADO

The third simulation aims to validate the practicability of one-trial-lag ADO. As described in Section 2.7.3, the experiment starts with an ADO proposal derived from the prior distribution. However, the second trial uses a randomly generated design because of the time required to estimate the single-trial neural activation of the first trial. At the end of trial t (t ≥ 2), an optimal design for the (t + 1)-th trial is proposed based on the single-trial beta estimates and behavioral responses obtained up to the (t − 1)-th trial. Figure 2.5 visualizes the one-trial-lag ADO procedure used in Simulation 3.
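The trial-by-trial logic of one-trial-lag ADO can be summarized in a short sketch. The function and field names below are hypothetical and are not taken from the software used in this thesis; the sketch only records where the design of each trial comes from and which trials' data could inform it.

```python
def one_trial_lag_plan(n_trials):
    """For each trial, record the design source and the last trial whose
    data are available when that design is computed."""
    plan = []
    for trial in range(1, n_trials + 1):
        if trial == 1:
            # No data yet: ADO proposes a design from the prior alone.
            source, data_through = "ado_prior", 0
        elif trial == 2:
            # The neural estimate of trial 1 is not ready in time,
            # so a randomly generated design is used.
            source, data_through = "random", 0
        else:
            # The design for trial t + 1 uses data obtained up to trial t - 1,
            # so the design for this trial uses data up to trial - 2.
            source, data_through = "ado_posterior", trial - 2
        plan.append(
            {"trial": trial, "source": source, "data_through": data_through}
        )
    return plan


plan = one_trial_lag_plan(20)
for row in plan[:4]:
    print(row)
```

In this scheme every ADO proposal from the third trial onward is always one trial behind the data, which is the lag that Simulation 3 is designed to evaluate.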

Variable                                   Details
The number of experiments                  10 per parameter set
The number of trials                       20
Stimulus (rounded to 3 decimal places)     0.010, 0.017, 0.028, 0.046, 0.077, 0.129, 0.215, 0.359, 0.599, 1.000
Prior
    b                                      (-3, 5)
    Rmax                                   (-3, 5)
    c50                                    (0, 1)
    δ                                      (0.0001, 5)
Initial grid settings
    b                                      -2, -1, 0, 1, 2
    Rmax                                   0.5, 1.125, 1.75, 2.375, 3
    c50                                    0.05, 0.275, 0.5, 0.725, 0.95
    δ                                      0.001, 0.30075, 0.6005, 0.90025, 1.2
    Neural response                        0, 0.22, 0.44, 0.67, 0.89, 1.11, 1.33, 1.56, 1.78, 2
Grid size
    Design space                           90 = 10² − 10
    Parameter space                        625 = 5⁴
    Response space                         50 = 5² × 2
DE-MCMC
    Chains                                 24
    Burn-in samples                        200
    Valid posterior samples                800
    Migration probability                  0.1
Dynamic gridding
    Method                                 Singular value decomposition
    Schedule                               After every trial
    Percentile                             (20%, 35%, 50%, 65%, 80%)

Table 3.3: Default settings in Simulation 2

The basic settings, including the “true” parameter sets, follow Simulation 2 except for the schedule of dynamic gridding. In Simulation 3, we apply dynamic gridding only after the 4th, 8th, 12th, and 16th trials because the limited amount of data in one-trial-lag optimization might misguide grid updating. In the one-trial-lag ADO setting, the grid updating procedure does not exploit all of the available data because it relies on the joint posterior distribution estimated from data that lack the latest trial. Updating grid points every trial is risky because we cannot rule out the possibility that this “deficient” posterior distribution does not include the “true” parameters. A sparse dynamic gridding schedule is a reasonable choice here because it forces the grid-updating algorithm to wait until a sufficient amount of data is collected.

3.2.4 Procedure

At each trial, two single-trial beta estimates and one behavioral choice response corresponding to them were generated by the joint model discussed in Section 3.1.2 (Equation 3.6). The design was selected by the ADO proposal in the ADO-based experiments, and by uniform sampling without replacement from the design space (Figure 3.4) in the randomized-design experiments.

Once the neural and behavioral responses were acquired, posterior distributions were updated using the DE-MCMC sampler (B. M. Turner, Sederberg, et al., 2013). The 24 chains were initialized by multinomial sampling from the parameter grid points with probabilities proportional to the posterior densities of the grid points. We ran the algorithm for 1,000 iterations and discarded the first 200 steps as burn-in. Migration was applied during the first 101 iterations with a probability of 0.1.
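The chain initialization and burn-in steps can be illustrated with a minimal sketch. It does not implement the DE-MCMC sampler itself; the toy grid, the density values, and the function names are hypothetical and serve only to show multinomial sampling of starting points and the 200/800 split of each chain.

```python
import random


def initialize_chains(grid_points, posterior_density, n_chains, rng):
    """Draw starting values for the chains from the grid points, with
    probability proportional to each point's posterior density."""
    # choices() samples with replacement and accepts unnormalized weights.
    return rng.choices(grid_points, weights=posterior_density, k=n_chains)


def drop_burn_in(chain, n_burn_in):
    """Keep only the samples after the burn-in period."""
    return chain[n_burn_in:]


# Hypothetical toy grid of (b, Rmax, c50, delta) points and their
# unnormalized posterior densities.
grid = [(0.0, 1.0, 0.3, 0.5), (0.0, 1.5, 0.3, 0.5), (0.5, 1.5, 0.5, 0.5)]
density = [0.2, 0.7, 0.1]

rng = random.Random(0)  # arbitrary seed
starts = initialize_chains(grid, density, n_chains=24, rng=rng)
kept = drop_burn_in(list(range(1000)), n_burn_in=200)
print(len(starts), len(kept))
```

Starting the chains at high-density grid points means the sampler begins close to the region that the grid-based posterior already favors, which is presumably why a short burn-in of 200 iterations suffices here.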

Dynamic gridding adjusted the parameter grid space according to its predetermined schedule. The estimated joint posterior distribution was rotated to a four-dimensional orthogonal space by multiplying it by a rotation matrix obtained by singular value decomposition. New grid points were defined at the 20th, 35th, 50th, 65th, and 80th percentiles of the marginal distribution of each dimension. The newly constructed grid points in the orthogonal space were then mapped back to the original parameter space by multiplying them by the inverse rotation matrix.
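One way to write the rotation steps above in matrix form is given below. The notation is an assumption introduced here for illustration: X is the N × 4 matrix of posterior samples, μ is its vector of column means, and the samples are centered before the decomposition, which the text does not state explicitly.

```latex
% Assumed notation: X is the N x 4 matrix of posterior samples (one row per
% sample), \mu is its vector of column means, \mathbf{1} is a column of N ones.
\begin{align*}
X - \mathbf{1}\mu^{\top} &= U \Sigma V^{\top}
  && \text{(singular value decomposition)} \\
Z &= (X - \mathbf{1}\mu^{\top})\, V
  && \text{(rotation to the orthogonal space)} \\
g_{d,q} &= \text{$q$-th percentile of column $d$ of } Z,
  && q \in \{20, 35, 50, 65, 80\},\; d = 1, \dots, 4 \\
\theta^{*} &= \mu + V z^{*},
  && z^{*} = (g_{1,q_1}, g_{2,q_2}, g_{3,q_3}, g_{4,q_4})^{\top}
\end{align*}
```

Because V is orthogonal, its inverse is its transpose, so the last line undoes the rotation in the second line. Taking all combinations of the five percentile points across the four dimensions yields 5⁴ = 625 new grid points, which matches the parameter grid size in Table 3.3.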

We defined measures of accuracy and precision of posterior estimates by the root mean square error (RMSE) and the standard deviation (PSD) of the posterior distribution. Specifically, parameter-wise performance measures (RMSEi,t and PSDi,t) and pooled performance measures (RMSEpooled,t and PSDpooled,t) were calculated at each trial t as follows:

\begin{equation}
\begin{aligned}
\mathrm{RMSE}_{i,t} &= \sqrt{\frac{\sum_{k=201}^{1000} (x_{ijkt} - \theta_i)^2}{800}}, &
\mathrm{RMSE}_{\mathrm{pooled},t} &= \sqrt{\frac{1}{4} \sum_{i=1}^{4} \mathrm{RMSE}_{i,t}^2}, \\
\mathrm{PSD}_{i,t} &= \sqrt{\frac{\sum_{k=201}^{1000} (x_{ijkt} - \bar{x}_{i\cdot})^2}{800}}, &
\mathrm{PSD}_{\mathrm{pooled},t} &= \sqrt{\frac{1}{4} \sum_{i=1}^{4} \mathrm{PSD}_{i,t}^2}
\end{aligned}
\tag{3.7}
\end{equation}

where θ = (θ1, θ2, θ3, θ4) ≡ (b, Rmax, c50, δ) is a set of “true” parameters assumed in each simulation, xijkt is a value of the j-th chain of the DE-MCMC sampler for the parameter θi at the k-th iteration, and x̄i· is the mean of the posterior samples for θi.
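The measures in Equation 3.7 translate directly into code. The sketch below is a plain-Python illustration with hypothetical function names; it assumes the post-burn-in samples of one parameter have already been collected into a flat list, which sidesteps the question of how samples from different chains are combined.

```python
import math


def rmse(samples, true_value):
    """Root mean square error of posterior samples around the true value."""
    return math.sqrt(sum((x - true_value) ** 2 for x in samples) / len(samples))


def posterior_sd(samples):
    """Standard deviation of posterior samples around their own mean."""
    mean = sum(samples) / len(samples)
    return math.sqrt(sum((x - mean) ** 2 for x in samples) / len(samples))


def pooled(values):
    """Root of the mean of squared per-parameter measures."""
    return math.sqrt(sum(v ** 2 for v in values) / len(values))


# Hypothetical posterior samples for two parameters and their true values.
samples_by_param = {"b": [0.1, -0.1, 0.2, 0.0], "Rmax": [1.4, 1.6, 1.5, 1.7]}
true_values = {"b": 0.0, "Rmax": 1.5}

rmse_by_param = [rmse(s, true_values[p]) for p, s in samples_by_param.items()]
psd_by_param = [posterior_sd(s) for s in samples_by_param.values()]
print(pooled(rmse_by_param), pooled(psd_by_param))
```

The two measures differ only in their reference point: RMSE is centered on the "true" value and therefore also reflects bias, whereas PSD is centered on the posterior mean and reflects spread alone.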

3.3 Results

3.3.1 Simulation 1: Fixed Parameters

Figures 3.6 and 3.7 show the performance measures (i.e., RMSE, PSD) pooled across parameters (Figure 3.6) and computed for each parameter (Figure 3.7). In both plots, red lines refer to the performance measures from the ADO experiments (solid lines) and their 95% credible intervals (dotted lines), whereas black lines refer to those from the randomized-design experiments (solid lines) and their 95% credible intervals (dotted lines). Lower values are preferred for both RMSE and PSD because smaller RMSE and PSD mean higher accuracy and precision in parameter estimation.

Figure 3.6: Pooled root mean squared error (RMSEpooled; upper) and pooled posterior standard deviation (PSDpooled; lower) in Simulation 1. All the performance statistics (i.e., RMSEpooled, PSDpooled) are log-transformed. Red and black lines show the performance statistics from ADO designs and randomized designs, respectively. Solid lines represent the mean of the performance measures changing across trials. Dotted lines represent the 95% credible interval of the performance measures.

Figure 3.7: Root mean squared error (RMSEi; left) and posterior standard deviation (PSDi; right) for each parameter (b, Rmax, c50, δ from top to bottom) in Simulation 1. All the performance statistics (i.e., RMSEi, PSDi) are log-transformed. Red and black lines show the performance statistics from ADO designs and randomized designs, respectively. Solid lines represent the mean of the performance measures changing across trials. Dotted lines represent the 95% credible interval of the performance measures.

Figure 3.6 shows that ADO performs better than randomized designs in both accuracy (RMSE) and precision (PSD). Note that the upper bound of the 95% credible interval of the performance of ADO is, overall, close to or even lower than the mean performance of the randomized-design experiments. This result suggests that ADO experiments mostly show better accuracy and precision than the mean-level performance of the randomized designs.

Figure 3.7 illustrates how differently ADO handles the model parameters by showing parameter-wise performance measures. The performances of ADO and randomized designs do not differ clearly for b and δ. For b in particular, ADO shows a level of accuracy and precision similar to that of randomized designs for the first 4-5 trials, but is slightly outperformed by randomized designs from the 6th trial onward. In contrast, ADO performs better than randomized designs for Rmax and c50.

The performance of ADO discussed above can be explained by the sequential pattern of the design proposals. Figure 3.8 compares the designs proposed by ADO (upper row) and randomized sampling (lower row). A 20-trial experiment is divided into four subplots, each containing the design information of five trials (Trials 1-5, 6-10, 11-15, and 16-20 from left to right). The red (ADO) and gray (randomized designs) shades under each design point (black dots) represent how frequently a specific design was selected. A design point is given a darker shade if it is selected more frequently.

Figure 3.8: A trace plot of experimental designs from ADO (upper row) and randomized sampling (lower row) in Simulation 1. A sequence of 20 trials was segmented into four intervals (Trials 1-5, 6-10, 11-15, and 16-20 from left to right). The x-axis and y-axis of each subplot represent the contrast level of the first and second stimuli, respectively. Black dots represent individual design points. Shaded regions represent actually selected designs; more frequently selected designs have darker shades. The scale is intentionally omitted for simplicity. See Figure 3.4 for the detailed information about the scale.

In the randomized-design experiments (lower row), selected designs are distributed nearly uniformly over the design space without any regularity, and this pattern does not change as the experiment proceeds. However, in the ADO experiments (upper row), the distribution of selected designs is concentrated on a few points and this pattern changes over trials. For example, during the first five trials (the leftmost subplot in the upper row), ADO focuses on the lowest-contrast pairs (points in the lower left of the subplot) or the highest-contrast pairs (points in the upper right of the subplot). In fact, those designs are optimal selections for estimating b and Rmax. However, ADO gradually shifts its attention to the mid-range contrast values to estimate c50. Note that the “mid-range” contrast values that ADO searches the most (i.e., the darkest red shades in the two rightmost subplots) include the contrast levels of 0.215 or 0.359, as they are the contrast values closest to the predefined c50 = 0.35. At the end of the experiment, ADO continues selecting at least one mid-level contrast value to collect more information about c50 while frequently including either the lowest or the highest contrast level to improve the estimation of b and Rmax.
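Why extreme and mid-range contrasts are informative about different parameters can be seen from the shape of a Naka-Rushton contrast response function. The sketch below assumes the common form R(c) = b + Rmax · cⁿ / (cⁿ + c50ⁿ) with an exponent n = 2; the exponent and the values b = 0 and Rmax = 1 are illustrative assumptions, and the exact form of Equation 3.6 may differ. Only c50 = 0.35 is taken from the text.

```python
def naka_rushton(contrast, b, r_max, c50, n=2.0):
    """Expected neural response to a stimulus of the given contrast
    (assumed standard Naka-Rushton form; n = 2 is an assumption)."""
    numerator = contrast ** n
    return b + r_max * numerator / (numerator + c50 ** n)


# Illustrative parameters: b and r_max are hypothetical, c50 follows the text.
B, R_MAX, C50 = 0.0, 1.0, 0.35

for contrast in (0.010, 0.359, 1.000):
    response = naka_rushton(contrast, B, R_MAX, C50)
    print(contrast, round(response, 3))
```

Under these assumptions, the response at the lowest contrast is almost entirely determined by b, the response at the highest contrast is the one most sensitive to Rmax, and at a contrast equal to c50 the response sits exactly halfway, at b + Rmax/2, so contrasts near c50 carry the most information about where that midpoint lies.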

3.3.2 Simulation 2: Randomly Generated Parameters

As in Section 3.3.1, Figures 3.9 and 3.10 show the performance measures (i.e., RMSE, PSD) pooled across parameters and those computed for each parameter, respectively. As the performance statistics from 300 simulations with different “true” parameter sets are aggregated, the performances of ADO and randomized designs are less differentiated than in Simulation 1. However, ADO-based experiments still tend to perform better than randomized-design experiments in both mean accuracy (RMSE) and precision (PSD). The parameter-wise performances follow a pattern similar to that in Simulation 1: ADO outperforms randomized designs when estimating Rmax and c50, while showing a level of performance similar to that of randomized designs for δ and being overtaken by them for b.

Figure 3.9: Pooled root mean squared error (RMSEpooled; upper) and pooled posterior standard deviation (PSDpooled; lower) in Simulation 2. All the performance statistics (i.e., RMSEpooled, PSDpooled) are log-transformed. Red and black lines show the performance statistics from ADO designs and randomized designs, respectively. Solid lines represent the mean of the performance measures changing across trials. Dotted lines represent the 95% credible interval of the performance measures.

[Figure 3.10: panels for b, Rmax, c50, and δ; x-axis: Trial (1-20).]

Figure 3.11: The scatter plot of log-transformed RMSE (left) and log-transformed PSD (right) in Simulation 2. The x-axis and y-axis refer to the value of performance statistics (i.e., RMSE, PSD) in the ADO experiments and randomized design experiments. Each trial is color-coded for visual clarity (Red: Trial 2, Orange: Trial 4, Green: Trial 8, Blue: Trial 13, Purple: Trial 20). Colored dots represent the performance statistics from individual simulations. Solid lines represent the 80% highest density regions.

Because Simulation 2 includes 30 different “true” parameter sets, it is difficult to show the trace of design proposals for all simulations. Instead, Figure 3.11 presents the results of the 300 simulations individually as scatter plots. For each experiment, we compared the log-transformed RMSE (left) and PSD (right) obtained from the ADO experiment (x-axis) and from the randomized-design experiment (y-axis). The points are color-coded to differentiate the performance measures at different trials: dots colored in red, orange, green, blue, and purple represent the results after the 2nd, 4th, 8th, 13th, and 20th trials, respectively. Solid lines represent the 80% highest density regions obtained by two-dimensional kernel density estimation. Points located in the shaded region are preferred because there the performance measures obtained from ADO are smaller than those from randomized designs.

Figure 3.11 shows that more points are located within the shaded area, especially when the experiment is still in its earlier trials (i.e., the 4th and 8th trials). The performance of the randomized-design experiments improves as the experiment proceeds, and finally becomes similar to that of ADO at the end of the experiment (i.e., purple points are located near the gray dashed identity line). This result is expected because the accuracy and precision of parameter estimation improve as long as we keep collecting data, regardless of how optimal the designs are. Nevertheless, it is important that ADO makes data collection more efficient during the first few trials.

Figure 3.12: The proportion of the experiments in which the performance of ADO exceeds that of randomized designs in Simulation 2. Points located in the shaded area are preferred. The accuracy and precision at each trial are represented as a red circle and a blue square, respectively.

Figure 3.12 depicts the proportion of the experiments in which ADO performs better than randomized designs across trials (i.e., the proportion of the points located in the shaded area in Figure 3.11 for each trial). The performance of ADO reaches its peak around the third or fourth trial, and starts to decrease gradually after that. However, more than half of the ADO experiments perform better than the randomized-design experiments even at the 20th trial.
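The proportion plotted in Figure 3.12 is a simple paired comparison, sketched below. The function name and the example values are hypothetical, and the sketch assumes that each ADO experiment is paired with one randomized-design experiment, with lower values of the performance measure counting as better.

```python
def proportion_ado_wins(ado_scores, random_scores):
    """Fraction of paired experiments in which ADO has the smaller
    (i.e., better) performance measure."""
    if len(ado_scores) != len(random_scores):
        raise ValueError("Both lists must contain one score per experiment.")
    wins = sum(1 for a, r in zip(ado_scores, random_scores) if a < r)
    return wins / len(ado_scores)


# Hypothetical RMSE values from four paired experiments at a single trial.
ado_rmse = [0.20, 0.35, 0.15, 0.40]
random_rmse = [0.30, 0.30, 0.25, 0.45]
print(proportion_ado_wins(ado_rmse, random_rmse))  # 0.75
```

Computing this value separately for RMSE and PSD at each of the 20 trials gives the two series of points shown in the figure; a value above 0.5 means that ADO wins in more than half of the experiments.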

3.3.3 Simulation 3: One-trial-lag ADO

Figures 3.13 and 3.14 show the performance measures (i.e., RMSE, PSD) pooled across parameters and those computed for individual parameters, respectively. In short, the performance of ADO shows patterns similar to those in Simulation 2 (Figures 3.9 and 3.10): ADO outperforms randomized designs mainly because of its selective optimization of Rmax and c50.

Figures 3.15 and 3.16 summarize the results of 300 simulations with different “true” parameters in Simulation 3. The results are similar to those from Simulation 2: the performance of ADO reaches its peak around the third trial, and more than 70% of the ADO experiments outperform the randomized-design experiments for the next 4-5 trials. Although the proportion of the experiments in which ADO shows better performance gradually decreases as the number of trials increases, more than half of the ADO experiments still show better performance than the randomized-design experiments at the 20th trial. To summarize, the result of Simulation 3 reveals that the performance of one-trial-lag ADO is comparable to that of the ideal ADO implementation without lagging.

Figure 3.13: Pooled root mean squared error (RMSEpooled; upper) and pooled posterior standard deviation (PSDpooled; lower) in Simulation 3. All the performance statistics (i.e., RMSEpooled, PSDpooled) are log-transformed. Red and black lines show the performance statistics from ADO designs and randomized designs, respectively. Solid lines represent the mean of the performance measures changing across trials. Dotted lines represent the 95% credible interval of the performance measures.

[Figure 3.14: panels for b, Rmax, c50, and δ; x-axis: Trial (1-20).]

Figure 3.15: The scatter plot of log-transformed RMSE (left) and log-transformed PSD (right) in Simulation 3. The x-axis and y-axis refer to the value of performance statistics (i.e., RMSE, PSD) in the ADO experiments and randomized design experiments. Each trial is color-coded for visual clarity (Red: Trial 2, Orange: Trial 4, Green: Trial 8, Blue: Trial 13, Purple: Trial 20). Colored dots represent the performance statistics from individual simulations. Solid lines represent the 80% highest density regions.

Figure 3.16: The proportion of the experiments in which the performance of ADO exceeds that of randomized designs in Simulation 3. Points located in the shaded area are preferred. The accuracy and precision at each trial are represented as a red circle and a blue square, respectively.

3.4 Discussion

In this section, we first introduced the contrast discrimination task as a proof of concept, and developed a directed joint model combining the Naka-Rushton equation and a Thurstonian decision model as its neural and behavioral submodels. We then performed three simulation experiments to verify that ADO can generate design proposals that maximize information about the model parameters by incorporating neural and behavioral data in real time. For thorough verification, ADO was tested at three levels of difficulty: (1) with a fixed “true” parameter set, (2) with randomly generated “true” parameter sets, and (3) using one-trial-lag optimization.

Across all three simulations, ADO successfully proposed optimal designs and showed better performance than randomized designs both in accuracy (i.e., RMSE) and precision (i.e., PSD) because ADO trades off the importance of b against that of (Rmax, c50) in its optimization procedures. In contrast to randomized-design experiments, ADO decided that reducing uncertainty in Rmax and c50 is more “informative” than focusing on b. This strategy seems reasonable because Rmax and c50 are inherently less well constrained than b. By definition, b as a baseline parameter is implicitly conditioned to have a value near zero. However, estimation of Rmax is more difficult as there is no theoretical assumption on the upper bound for single-trial beta estimates. The difficulty in estimating Rmax carries over to c50 as well because c50 needs Rmax (and b) to be constrained first for estimation. From this relationship among the model parameters, we can interpret that ADO prioritized efficient estimation of Rmax while sacrificing accuracy and precision of b, as this parameter is easier to estimate than Rmax (and therefore c50). In short, ADO’s selective focus on experimental designs allows more accurate and precise parameter estimation.

The trace plot of the design proposals from ADO in Simulation 1 (Figure 3.8) shows how ADO handled the tradeoff among b, Rmax, and c50. When the experiment begins, ADO tries to acquire information about b and Rmax first by sampling the lowest-contrast or highest-contrast designs. However, it gradually explores mid-range contrast levels to constrain c50 and successfully identifies candidate contrast levels that are most likely to be c50.

The results from Simulations 2 and 3 suggest that the advantage of ADO-based experiments generalizes across various “true” parameter sets, to a limited schedule of dynamic gridding, and to the practical constraint of one-trial-lag optimization. These results offer practical implications for using fMRI-based ADO in the real world. For example, by covering a diverse range of parameter sets, our results provide assurance of a feasible level of performance even in real-world settings in which we have no knowledge of the “true” parameter values. The similarity of the ADO performances under full and reduced dynamic gridding schedules means that we can save both time and computational resources because we do not need to estimate a full joint posterior distribution on every trial. Finally, one-trial-lag optimization helps us set up more reasonable interstimulus/intertrial intervals in ADO-based experiments. If the performance of one-trial-lag ADO were not comparable to that of no-lag ADO, it would be better to wait until the neural activation level from a stimulus/trial becomes fully estimable, spending 20 seconds or more between stimuli and trials.

Chapter 4: Real-time fMRI Experiment

In this chapter, we test the performance of fMRI-based ADO within a real-time fMRI experiment. First, we describe how the grating stimuli were defined and generated in the experiment, and then discuss the designs of the functional localizer and contrast discrimination tasks. Second, we explain the real-time fMRI procedures for ADO-based runs in detail. Third, we discuss the methods used for evaluating the performance of ADO. Lastly, we discuss the results of the real-world experiment.

4.1 Methods

4.1.1 Participants

Four individuals participated in the experiment. Each participant had three two-hour sessions, each including 90 minutes of functional MR scanning. Two of the four participants were female, and the mean age of the participants was 24.75. All participants were recruited from The Ohio State University and provided informed consent. The study was approved by the Institutional Review Board of The Ohio State University.

4.1.2 Stimuli

All stimuli and instructions were generated by SMILE (State Machine Interface Library for Experiments; http://smile-docs.readthedocs.io/en/latest/), a Python library for programming psychological experiments, on a 2016 MacBook Pro. Each participant lay on the scanner bed and viewed the stimuli presented on a rear-projection screen in the coil. Stimuli were presented at eye level at a distance of 74 cm.

Figure 4.1: An illustration of the linear mask applied to a grating pattern. The black line shows the shape of the mask, while the red line describes the masked grating pattern obtained when crossing the center of the screen horizontally.

Each grating stimulus was generated with a spatial frequency of 3.06 cycles per degree, and was shaped as an annulus so that the grating pattern was not presented at the fovea. The radii of the external and internal circles were 14.52 degrees and 3.48 degrees in visual angle, respectively. In addition, a linear mask was applied to the annulus to allow gradual changes in stimulus intensity, which is depicted in Figure 4.1.

The black line describes the shape of the mask: the stimulus intensity increases from a distance of 1.74 degrees, reaches its maximum at a distance of 2.94 degrees, and fades gradually from a distance of 4.34 degrees from the center of the screen. The red wavy line shows the actual grating pattern after the mask is applied.
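The mask profile described above is a piecewise linear function of eccentricity, which can be sketched as follows. The breakpoints at 1.74, 2.94, and 4.34 degrees come from the text; the assumption that the mask falls to zero at 7.26 degrees (the outermost tick in Figure 4.1), as well as the function and constant names, are introduced here for illustration only.

```python
# Breakpoints in degrees of visual angle from the center of the screen.
INNER_EDGE = 1.74   # intensity starts to increase
PEAK_START = 2.94   # intensity reaches its maximum
FADE_START = 4.34   # intensity starts to fade
OUTER_EDGE = 7.26   # assumed point at which intensity returns to zero


def linear_mask(distance):
    """Mask intensity in [0, 1] at a given distance from the screen center."""
    r = abs(distance)
    if r <= INNER_EDGE or r >= OUTER_EDGE:
        return 0.0
    if r < PEAK_START:
        # Linear ramp up between the inner edge and the plateau.
        return (r - INNER_EDGE) / (PEAK_START - INNER_EDGE)
    if r <= FADE_START:
        return 1.0
    # Linear ramp down between the plateau and the outer edge.
    return (OUTER_EDGE - r) / (OUTER_EDGE - FADE_START)


for r in (0.0, 2.34, 3.5, 5.8, 7.26):
    print(r, round(linear_mask(r), 2))
```

Multiplying the raw grating pattern by this profile would keep the fovea blank and avoid sharp edges at the inner and outer borders of the annulus.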

Figure 4.2: Examples of the grating stimuli used in the experiment. The contrast levels of the five stimuli are 0.01, 0.03, 0.1, 0.3, and 1 (from left to right).

Contrast levels are defined in the interval [0, 1]. When the contrast level is 0, the

stimulus is completely flattened and shown as a gray plane. When the contrast level

is 1, the stimulus shows a fluctuating black-white stripe pattern. Figure 4.2 shows

examples of the grating stimuli used in the experiment. The contrast levels of the

five stimuli are 0.01, 0.03, 0.1, 0.3, and 1 from left to right. The figure illustrates that

the higher contrast level allows for better discriminability between the high-intensity

and low-intensity regions.

4.1.3 Design

Main task: Contrast Discrimination

The design of the main task follows the description in Section 3.1.1 and Figure 3.1. A participant was presented with two consecutive grating stimuli with different contrast levels and asked to maintain fixation on a white “+” marker located at the center of the screen. When the fixation marker changed to a response cue (a white “×” marker), the participant was asked to answer whether the first or the second stimulus was of higher contrast. The participant was given two 2-button response pads, one for each hand, and was instructed to use one button on each side to make a response. The response-button association rule alternated every session. For example, a participant was asked to use the button in the left box to respond that the first stimulus had a higher contrast level in one session, and to use the button in the right box to make the same response in the next session.

As in the simulation experiments, the contrast values are logarithmically spaced with 10 levels (i.e., 0.010, 0.017, 0.028, 0.046, 0.077, 0.129, 0.215, 0.359, 0.599, 1.000). We also restricted the design space such that no two stimuli had exactly the same contrast (see Figure 3.4 for a graphical illustration of the design space). Each run consisted of 20 trials. The design was randomly selected in the run without ADO, while ADO proposed the optimal design in the ADO-based run after the first three trials. The order of the run without ADO and the ADO-based run alternated every session.
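The design space described above can be reconstructed in a few lines. The sketch assumes that the ten contrast levels are evenly spaced on a base-10 logarithmic scale between 0.01 and 1, an assumption that reproduces the listed values after rounding; the variable names are hypothetical.

```python
from itertools import permutations

N_LEVELS = 10

# Ten contrast levels spaced evenly on a log10 scale between 0.01 and 1.
contrast_levels = [10 ** (-2 + 2 * k / (N_LEVELS - 1)) for k in range(N_LEVELS)]

# A design is an ordered pair (first stimulus, second stimulus) whose two
# contrast levels differ.
design_space = list(permutations(contrast_levels, 2))

print([round(c, 3) for c in contrast_levels])
print(len(design_space))
```

Excluding the ten pairs with identical contrasts from the 10 × 10 possibilities leaves the 90 designs shown in Figure 3.4.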

The run without ADO and the ADO-based run differ in the length of the intertrial interval. ADO requires time to calculate an optimal design at the end of every trial, and to adjust the parameter grids after the 4th, 8th, 12th, and 16th trials. Specifically, fMRI-based ADO in this experiment requires 6-8 seconds for proposing the optimal design and an additional 4-5 seconds for full posterior estimation and grid adjustment. Therefore, the intertrial interval used in the randomized-design run, which could be as short as 6 seconds, was not long enough for the ADO-based run. While the intertrial interval of the run without ADO was either 6, 8, or 10 seconds, that of the ADO-based run was extended by 4 seconds (i.e., 10, 12, or 14 seconds). The total length of the run without ADO was 624 seconds. The length of the ADO-based run varied slightly across sessions due to the computation time required for ADO and its subcomponents (i.e., full posterior sampling, dynamic gridding), but was approximately 15 minutes.

Functional Localizer

Before running the main task, we ran a functional localizer task to detect the voxels that robustly coactivate with the grating stimuli. The functional localizer task was based on a continuous carry-over design (Aguirre, 2007), which controls the order effect of the signal by considering all possible carry-over patterns from a stimulus pool. As we can expect that the order of stimuli affects the neural activation pattern, the continuous carry-over design can be used to detect voxels that share similar activation patterns and carry-over effects.

An experiment using the continuous carry-over design uses a fixed stimulus presentation order that realizes all possible configurations of carry-over patterns. Here, we recommend making the stimulus presentation settings as similar as possible to those of the main task. For example, we set the stimulus duration (6 seconds) and the mean interstimulus interval (8 seconds) as they were in the main task. However, generating all possible carry-over patterns from ten contrast levels made the task excessively long and therefore could have caused problems such as participant fatigue and scanner drift. Hence, we decided to use only five logarithmically spaced contrast levels that could approximate the contrast levels used in the main task (i.e., 0.01, 0.03, 0.1, 0.3, 1). The total length of the functional localizer task was 528 seconds.
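To give a concrete sense of what a presentation order covering all carry-over patterns looks like, the sketch below builds one generic example: a sequence in which every ordered pair of contrast levels, including immediate repeats, occurs exactly once in succession. It is found as an Eulerian circuit with Hierholzer's algorithm. This is an illustration only; it is not the sequence used in the experiment, it ignores any blank or null trials, and the function name is hypothetical.

```python
def carryover_sequence(labels):
    """Order stimuli so that every ordered pair of labels, including
    immediate repeats, occurs exactly once as consecutive items."""
    # Unused transitions out of each label (complete digraph with self-loops).
    remaining = {label: list(labels) for label in labels}
    stack = [labels[0]]
    circuit = []
    # Hierholzer's algorithm: follow unused transitions, backtrack when stuck.
    while stack:
        current = stack[-1]
        if remaining[current]:
            stack.append(remaining[current].pop())
        else:
            circuit.append(stack.pop())
    circuit.reverse()
    return circuit


contrasts = [0.01, 0.03, 0.1, 0.3, 1.0]
sequence = carryover_sequence(contrasts)
transitions = list(zip(sequence, sequence[1:]))
print(len(sequence), len(set(transitions)))
```

With five levels there are 25 ordered pairs, so the sequence contains 26 stimuli; with ten levels the same construction would need 101 stimuli, which illustrates why the full set of contrast levels would have made the localizer task excessively long.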

In the task, the participant was instructed to press a button when the current stimulus had the same contrast as the previous one while maintaining fixation at the center of the screen. However, the behavioral task itself served no further purpose; it was required only to help participants concentrate on the stimulus presentation.


4.1.4 Real-time fMRI Procedure

Preliminary Tasks

The participant went through a 30-minute briefing including informed consent,

safety screening, and a brief introduction to the experimental task. MRI scanning

was performed in the Center for Cognitive and Behavioral Brain Imaging at The Ohio

State University. A Siemens MAGNETOM Prisma 3T Magnetic Resonance Imaging

System was used with a 32-channel head coil.

First, the MPRAGE sequence was used to obtain the anatomical structure of the brain (1 × 1 × 1 mm³ resolution, inversion time = 950 msec, repetition time = 1900 msec, echo time = 4.44 msec, flip angle = 12 degrees, matrix size = 256 × 224 mm, 176 sagittal slices per slab; scan time = 6.5 minutes). As we hoped to constrain the ROI to the primary visual cortex (V1), the area to be scanned was then specified by covering Brodmann area 17 and most of the occipital lobe with a T2*-weighted EPI sequence (repetition time = 2000 msec, echo time = 28 msec, flip angle = 72 degrees, field of view = 200 × 200 mm, in-plane resolution = 2 × 2 mm, and 33 slices with 2-mm thickness), which is referred to as the EPI space henceforth for simplicity. All BOLD responses from the functional localizer task and the contrast discrimination task were obtained using the EPI sequence with the same settings.

We should mention that further analyses (i.e., detecting voxels of interest, real-time ADO computation, offline data analysis) used brain images without the preprocessing steps that are usually performed in offline analyses, such as spatial and temporal filtering, because these steps are time-consuming. The only exception is motion correction: the MR scanner used in this experiment offers functionality for prospective motion correction, that is, computational methods for reducing head motion artifacts during data acquisition (for a recent review of prospective motion correction, see Maclaren, Herbst, Speck, & Zaitsev, 2013).

Data preprocessing

We first carried out the functional localizer task to detect the voxels co-activating

with the presented grating stimuli. After the functional localizer task was complete,

the experimenter processed the anatomical data and the EPI localizer data for regis-

tration to the standard MNI space. However, the protocol encounters a compatibility

issue here because the MR scanner exports images as DICOM (Digital Imaging and

Communications in Medicine) files. Our further data preprocessing steps rely on FSL

(FMRIB software library; Smith et al., 2004), which requires images in the NIfTI-1 format. Therefore, we used a Python library dcmstack (http://dcmstack.readthedocs.io/en/v0.6.1/) to transform DICOM files into the NIfTI-1 format.

Once the images were reformatted, we registered the anatomical images in the

subject space to the standard MNI brain template with nonlinear warping using

FLIRT (Jenkinson, Bannister, Brady, & Smith, 2002; Jenkinson & Smith, 2001) and

FNIRT (Andersson & Jenkinson, 2007) in FSL. Next, we aligned the EPI localizer

images to the anatomical images using FLIRT. Using the linear and nonlinear warps obtained in the previous steps, we converted the mask for Brodmann area 17 provided by the Jülich histological atlas (Amunts, Malikovic, Mohlberg, Schormann, & Zilles, 2000; Eickhoff et al., 2005) to the EPI space. As these procedures usually take more than 7 minutes due to nonlinear registration, we asked the participant to practice the contrast discrimination task for (approximately) 6 minutes to learn the response-button mapping rule.

Determination of Voxels of Interest

The functional localizer task must detect voxels whose activation patterns are strongly associated with stimulus presentation in the task. To select target voxels for the main task, we performed a GLM analysis on all voxels in the EPI space using the data from the functional localizer task. The GLM design matrix used only one regressor representing the hemodynamic responses caused by all stimuli presented in the functional localizer task. This GLM analysis did not include any temporally autocorrelated noise in the model structure because fitting such a model to every voxel may be time-consuming.

Voxels of interest (VOIs) were determined by thresholding the t-statistic associated with the regression coefficient of the task-relevant regressor. The decision rule was as follows: If the number of voxels with t ≥ 5 was equal to or greater than 200, we used t ≥ 5 as the threshold. When this criterion was not met, we lowered the threshold to t ≥ 4. If 100 or more voxels passed the lowered threshold, we accepted the threshold t ≥ 4. If this criterion was not met either, we ran the functional localizer task one more time and repeated the analysis. If fewer than 100 voxels passed even in the second attempt, we used the threshold allowing the greatest number of voxels among four options (i.e., t ≥ 5 from the first run, t ≥ 4 from the first run, t ≥ 5 from the second run, and t ≥ 4 from the second run).
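The decision rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the experiment: the function names are hypothetical, the t-statistics are passed as plain lists, the second run is assumed to be evaluated with the same two criteria as the first, and ties in the final fallback are broken in favor of the option listed first.

```python
def count_at_least(t_values, threshold):
    """Number of voxels whose t-statistic reaches the threshold."""
    return sum(1 for t in t_values if t >= threshold)


def single_run_threshold(t_values):
    """Return 5.0 or 4.0 if the run meets a criterion, otherwise None."""
    if count_at_least(t_values, 5.0) >= 200:
        return 5.0
    if count_at_least(t_values, 4.0) >= 100:
        return 4.0
    return None


def choose_voi_threshold(run1_t, run2_t=None):
    """Return (run_index, threshold) following the rule described in the text."""
    threshold = single_run_threshold(run1_t)
    if threshold is not None:
        return (1, threshold)
    if run2_t is None:
        raise ValueError("First run met neither criterion; rerun the localizer.")
    threshold = single_run_threshold(run2_t)
    if threshold is not None:
        return (2, threshold)
    # Fallback: the option (run, threshold) that admits the most voxels.
    options = [(1, 5.0, run1_t), (1, 4.0, run1_t), (2, 5.0, run2_t), (2, 4.0, run2_t)]
    best = max(options, key=lambda opt: count_at_least(opt[2], opt[1]))
    return (best[0], best[1])


# Hypothetical example: neither run reaches 100 voxels at t >= 4.
example = choose_voi_threshold([4.5] * 50, [4.5] * 80)
print(example)
```

Because every voxel that passes t ≥ 5 also passes t ≥ 4, the fallback in this sketch effectively picks whichever run has more voxels at t ≥ 4.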

Finally, we derived the subject-specific, task-relevant mask specifying VOIs in V1 by taking the conjunction of the subject-specific V1 mask and the extracted task-relevant voxels. A Python library nilearn (Abraham et al., 2014) was used for formatting the final mask.


Contrast Discrimination Task

The contrast discrimination task was carried out after the processing of the mask was finished. The ADO-based run and the randomized-design run were each done once within a scanning session so that we could consider between-session variability of the neural signal. The order of the randomized-design run and the ADO-based run was reversed every session.

The randomized-design run follows the description in Section 4.1.3. In the ADO-based run, the first three trials are randomly proposed because the hemodynamic lag prevents immediate estimation of stimulus-wise neural activation (Section 2.7.3). From the third trial, ADO computed the global utility of candidate designs and proposed an optimal stimulus pair by the following procedure. First, we extracted the BOLD time series from the VOIs and averaged them. Then we estimated single-trial neural activation for each grating stimulus by fitting a GLM with a first-order temporal autocorrelation model for noise (GLM-AR(1)) to the data with a Python library statsmodels (Seabold & Perktold, 2010). Here, the AR(1) noise model assumes that the measurement noise at time t is correlated with the measurement noise at time t−1. Once we obtained the stimulus-wise estimates of neural activation, they were passed to ADO together with the behavioral responses to compute the optimal design of the next trial. After the 4th, 8th, 12th, and 16th trials, we sampled the joint posterior distribution using the DE-MCMC sampler (B. M. Turner, Sederberg, et al., 2013) for 1,000 iterations, and used the last 800 samples for dynamic gridding.
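For reference, a GLM with AR(1) noise is commonly written in the form below. The notation is an assumption introduced for this sketch and does not come from the equations of this thesis: $y_t$ is the averaged BOLD signal at scan $t$, $\mathbf{x}_t$ is the row of the design matrix holding the HRF-convolved trial regressors, $\boldsymbol{\beta}$ collects the stimulus-wise activation amplitudes, $\rho$ is the autocorrelation coefficient, and $\sigma^2$ is the innovation variance.

```latex
\begin{aligned}
y_t &= \mathbf{x}_t^{\top}\boldsymbol{\beta} + \varepsilon_t, \\
\varepsilon_t &= \rho\,\varepsilon_{t-1} + \eta_t,
\qquad \eta_t \sim \mathcal{N}(0, \sigma^2), \quad |\rho| < 1 .
\end{aligned}
```

Setting $\rho = 0$ recovers the ordinary GLM with independent noise, which corresponds to the simplification used for the functional localizer analysis.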

The total length of both ADO-based and randomized-design experiments is 20

trials. In other words, ADO used a simple stopping rule based on a fixed number of

trials (20 trials), as we need to control the amount of data for parameter estimation.
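The schedule of an ADO-based run can be summarized with a short Python sketch. It only encodes the numbers stated above (20 trials, three randomly proposed trials, gridding updates after trials 4, 8, 12, and 16, and 1,000 sampler iterations of which the last 800 are kept). The constant and function names are hypothetical, and the sketch assumes that ADO-proposed designs start with the fourth trial, which is one reading of the description above.

```python
N_TRIALS = 20                  # fixed stopping rule
N_RANDOM_TRIALS = 3            # trials proposed at random before ADO takes over
REGRID_AFTER = (4, 8, 12, 16)  # trials followed by posterior sampling and gridding
N_ITERATIONS = 1000            # DE-MCMC iterations per gridding update
N_KEPT = 800                   # samples retained for dynamic gridding


def plan_ado_run():
    """Return one dictionary per trial describing what happens on that trial."""
    plan = []
    for trial in range(1, N_TRIALS + 1):
        plan.append({
            "trial": trial,
            "design": "random" if trial <= N_RANDOM_TRIALS else "ado",
            "regrid_after": trial in REGRID_AFTER,
        })
    return plan


def samples_for_gridding(chain):
    """Keep the last N_KEPT draws of a chain, discarding the earlier ones."""
    return chain[-N_KEPT:]


plan = plan_ado_run()
for step in plan[:5]:
    print(step)
```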


4.2 Offline Analysis: Parameter Estimation

4.2.1 Posterior sampling

The performances of ADO and randomized designs were compared by offline pa-

rameter estimation with a complete data set. We first estimated stimulus-wise neural

activation levels of ADO-based and randomized-design experiments. After averag-

ing the extracted BOLD time-series from the VOIs, we fitted a GLM-AR(1) model

to estimate stimulus-wise neural activation parameters. Once the single-trial neural

estimates were acquired, the joint model parameters were finally estimated by the DE-

MCMC sampler with the stimulus-wise neural activation and behavioral responses as

the data.

In the parameter estimation step, we had to modify the DE-MCMC sampler

settings due to the quality of neural data associated with the mechanism of ADO. As

discussed with Figure 3.8, ADO tends to generate the same design repeatedly until

it gets enough information about the specific parameter, and then proposes distinct

patterns of the design to explore different model parameters. We found that the unbalanced design of ADO adds a substantial amount of variability to the stimulus-wise neural activation estimates and may make it difficult to obtain well-constrained posterior distributions.

Figure 4.3 shows an example of the variability in stimulus-wise neural activation in ADO-based experiments. The neural activation estimates are more variable when a specific contrast level is presented more frequently, which might make it harder to constrain model parameters with the regular sampling method. Therefore, we decided to use a “burn-in mode” of the DE-MCMC sampler that concentrates posterior samples in the high-density regions compared to the regular “sampling mode” (B. M. Turner & Sederberg, 2012), in addition to a high migration probability. Specifically, the DE-MCMC sampler was run in the “burn-in mode” for 3,000 iterations in total: the sampler used the first 2,000 iterations as a burn-in phase while applying migration at every iteration, and generated the valid posterior samples in the last 1,000 iterations.

Figure 4.3: Variability of the stimulus-wise neural activation. The scatter plot shows the contrast levels (x-axis, log(Contrast)) and the associated stimulus-wise neural activation (y-axis) obtained in the ADO run of the third scanning session of Subject 1.

Note that brain images from the ADO-based and randomized-design runs shared the same data preprocessing procedures to make the stimulus-wise activation estimates from both experiments comparable. We used the motion-corrected images exported directly from the MR scanner, and did not apply spatial and temporal filtering. The neural signal was extracted from the same VOI mask defined in Section 4.1.4 for ADO.

Also, as in the simulation experiments in Chapter 3, joint model parameters were estimated incrementally to capture the changes in the quality of the estimates over trials. In other words, we repeated the estimation process starting from the data for the first trial and adding one trial’s worth of neural and behavioral data at a time until we had used the data from all 20 trials.

4.2.2 Benchmark

Unlike the simulation study, we do not have a “true” parameter that serves as a benchmark to compare the performances of ADO and randomized design. Therefore, we decided to use the posterior estimate obtained by using all the data from both ADO-based and randomized-design runs within a session as a benchmark. We can justify this approach for two reasons: (1) the stimulus-wise neural activation estimates from ADO-based and randomized-design runs capture the neural activity of the same visual system, and (2) the uncertainty of model parameters will be most reduced by using all the available data. The variability of stimulus-wise neural activation estimates (Figure 4.3) may raise questions about the first assumption because ADO might cause more adaptation to repeatedly presented stimuli than randomized designs do (Krekelberg, Boynton, & van Wezel, 2006). However, we suggest that using the combined data is the most reasonable way to establish a standard for performance evaluation given the constraints in our data analysis.

Figure 4.4: An illustration of the incremental parameter estimation. The gray shade represents the amount of data used for estimating parameters. When estimating parameters for comparing the performance of ADO and randomized design, we incrementally increase the amount of data so that we can compare how the parameter and corresponding posterior distribution change over trials. For evaluating the performance of each design, we set a benchmark estimate using all the data obtained from the ADO-based and randomized-design runs within a scanning session.

Figure 4.4 describes the parameter estimation strategy we used for performance

evaluation. The orange and blue squares represent the neural and behavioral data

for each trial in ADO-based and randomized-design runs. The gray shades represent

the amount of data used for parameter estimation. Given the neural and behavioral


data for 20 trials from ADO and randomized-design runs, the parameters from ADO

and randomized designs are estimated separately by increasing the amount of data

in a trial-by-trial manner. Meanwhile, the benchmark estimate is obtained by using

all the data that are obtained within a scanning session.
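The data bookkeeping behind this strategy can be sketched in a few lines of Python. The sketch only shows which trials enter each estimation step; the model fitting itself is omitted. The function names and the tuple-based trial records are placeholders assumed for this illustration.

```python
def incremental_subsets(trials):
    """Return the data used at each step: the first 1, 2, ..., n trials."""
    return [trials[:n] for n in range(1, len(trials) + 1)]


def benchmark_data(ado_trials, rd_trials):
    """Pool all trials of both runs within a session for the benchmark fit."""
    return list(ado_trials) + list(rd_trials)


# Placeholder trial records; in practice each record would hold the design,
# the stimulus-wise neural activation estimates, and the behavioral response.
ado_trials = [("ADO", t) for t in range(1, 21)]
rd_trials = [("RD", t) for t in range(1, 21)]

ado_subsets = incremental_subsets(ado_trials)  # 20 separate fits for ADO
rd_subsets = incremental_subsets(rd_trials)    # 20 separate fits for the randomized design
benchmark = benchmark_data(ado_trials, rd_trials)
print(len(ado_subsets), len(rd_subsets), len(benchmark))
```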

4.2.3 Determination of the Estimates

Once the posterior samples from the ADO, randomized designs, and benchmark

setting were obtained, we computed the estimates to be used for performance eval-

uation. We originally intended to calculate a four-dimensional MAP estimate using

multidimensional kernel density estimation. However, the currently available meth-

ods (Duong, 2007; O’Brien, Kashinath, Cavanaugh, Collins, & O’Brien, 2016) either

required substantial computation time or were very susceptible to slight differences

in posterior samples.

Figure 4.5 shows an example of the robustness issue in multivariate kernel den-

sity estimation. From the data of the ADO-based run of the third scanning session

of Subject 1, we estimated two distinct posterior distributions by running the DE-

MCMC sampler with the same sampling strategy and sampler settings two times.

Based on these posterior distributions, we computed four-dimensional MAP estimates

by the multivariate kernel density estimation method of Duong (2007), marginal

one-dimensional MAP estimates using an Epanechnikov kernel, and marginal one-

dimensional posterior mean values from each posterior distribution. As the two

posterior distributions come from the same data, the estimates from the posterior

distributions are supposed to be very similar to each other. However, Figure 4.5 suggests that distinct posterior distributions from the same data may yield different estimates, depending on the type of posterior estimate that we use. The red, green,

and blue points indicate the difference of four-dimensional MAP estimates, marginal

one-dimensional MAP estimates, and marginal posterior mean values obtained from

the two posterior distributions from the same data. The four-dimensional joint MAP

estimates are less stable against the differences in posterior samples in that the dif-

ferences between the estimates are relatively larger, compared to the marginal MAP

estimates or posterior means.

Due to the susceptibility of four-dimensional MAP estimates, we decided to use the marginal one-dimensional MAP estimate instead. Although we could still consider using the posterior mean, MAP estimates seem to be more appropriate in this case because the sampling method we used (i.e., the burn-in mode of the DE-MCMC sampler) intentionally biases posterior samples toward high-density regions.
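A marginal one-dimensional MAP estimate with an Epanechnikov kernel can be approximated as in the following sketch, which evaluates a kernel density estimate on a grid and returns the grid point with the highest density. This is a minimal illustration rather than the implementation used here: the function names, the grid size, and the rule-of-thumb bandwidth are assumptions, and the four "chains" are simulated Gaussian draws that merely stand in for posterior samples.

```python
import math
import random


def epanechnikov_density(samples, x, bandwidth):
    """Kernel density estimate at x using the Epanechnikov kernel."""
    total = 0.0
    for s in samples:
        u = (x - s) / bandwidth
        if abs(u) < 1.0:
            total += 0.75 * (1.0 - u * u)
    return total / (len(samples) * bandwidth)


def marginal_map(samples, n_grid=200):
    """Grid point that maximizes the one-dimensional density estimate."""
    lo, hi = min(samples), max(samples)
    if lo == hi:  # degenerate case: all samples identical
        return lo
    n = len(samples)
    mean = sum(samples) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    bandwidth = 2.34 * sd * n ** (-0.2)  # rule-of-thumb choice (an assumption)
    grid = [lo + (hi - lo) * k / (n_grid - 1) for k in range(n_grid)]
    return max(grid, key=lambda x: epanechnikov_density(samples, x, bandwidth))


# Simulated stand-ins for the marginal posterior samples of four parameters.
rng = random.Random(1)
centers = (0.5, 2.0, 0.3, 1.0)
posterior = [[rng.gauss(mu, 0.5) for _ in range(1000)] for mu in centers]
map_estimate = [marginal_map(chain) for chain in posterior]
print(map_estimate)
```

Treating each parameter separately keeps the cost linear in the number of parameters, whereas a joint four-dimensional density estimate would have to be evaluated over a four-dimensional space.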

4.2.4 Definition of the Distance from the Benchmark Estimate

For comparing the performance, a measure of distance needs to be defined between the posterior estimate of ADO or randomized designs and the benchmark estimate. As in the simulation study, let us denote the MAP estimates from the ADO data, randomized-design data, and benchmark data as $\theta_{\mathrm{ADO}}$, $\theta_{\mathrm{RD}}$, and $\theta_{B}$, where $\theta = (\theta_1, \theta_2, \theta_3, \theta_4) \equiv (b, R_{\max}, c_{50}, \delta)$.

Figure 4.5: Robustness of the estimates. The plot shows the absolute differences in posterior estimates obtained by two distinct posterior distributions from the same data (Subject 1, Session 3, ADO run), with one panel for each parameter (b, Rmax, c50, and δ). The estimates are obtained incrementally: the location on the x-axis represents the number of trials used for obtaining the corresponding posterior estimates. The red, green, and blue points represent the four-dimensional MAP estimates (Duong, 2007), marginal one-dimensional MAP estimates, and marginal one-dimensional posterior mean values.

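We use the Euclidean distance between the MAP estimate of the experiment data and the benchmark estimate:

$$
D_{\mathrm{ADO}} = \sqrt{\sum_{i=1}^{4} \left(\theta_{\mathrm{ADO},i} - \theta_{B,i}\right)^{2}}, \qquad
D_{\mathrm{RD}} = \sqrt{\sum_{i=1}^{4} \left(\theta_{\mathrm{RD},i} - \theta_{B,i}\right)^{2}}
$$

where $\theta_{\mathrm{ADO},i}$, $\theta_{\mathrm{RD},i}$, and $\theta_{B,i}$ denote the marginal MAP estimate of $\theta_i$ ($i = 1, \dots, 4$) obtained from the ADO data, randomized-design data, and benchmark data, respectively.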
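The distance measure can be computed as in the following Python sketch. The function name and the numerical values of the estimates are hypothetical and serve only to illustrate the calculation; they are not results from the experiment.

```python
import math

PARAM_NAMES = ("b", "Rmax", "c50", "delta")


def distance_from_benchmark(theta, theta_benchmark):
    """Euclidean distance between two vectors of marginal MAP estimates."""
    if len(theta) != len(theta_benchmark):
        raise ValueError("Both estimates must have the same number of parameters.")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(theta, theta_benchmark)))


# Hypothetical estimates, ordered as in PARAM_NAMES.
theta_b = (1.0, 2.0, 0.0, 0.1)
theta_ado = (1.0, 2.0, 0.3, 0.5)
theta_rd = (1.0, 2.6, 0.0, 0.9)

d_ado = distance_from_benchmark(theta_ado, theta_b)
d_rd = distance_from_benchmark(theta_rd, theta_b)

# ADO counts as more accurate on a trial when its estimate is closer to the benchmark.
ado_more_accurate = d_ado < d_rd
print(d_ado, d_rd, ado_more_accurate)
```

Note that the parameters enter the distance on their original scales, so a parameter with a wide range can dominate the sum.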

4.3 Results and Discussion

Figure 4.6 illustrates the distance between the MAP estimate of each experiment (i.e., ADO versus randomized designs) and the benchmark estimates. The x-axis and y-axis represent the log-transformed distance measures from ADO (i.e., $D_{\mathrm{ADO}}$) and randomized designs (i.e., $D_{\mathrm{RD}}$), respectively. Each scanning session is color-coded differently for visual clarity. If the points are located within the shaded region, we consider that the ADO estimates are closer to the benchmark estimate than the randomized-design estimates are. In other words, we can interpret that the ADO estimates are “more accurate” than the randomized-design estimates.

The result shows that ADO tends to allow better accuracy than randomized designs, especially in Subjects 1 and 2. In Subject 3, the randomized-design estimates were more accurate than the ADO estimates in one out of three scanning sessions. In Subject 4, the randomized-design estimates were more accurate than the ADO estimates in one out of three scanning sessions (i.e., red points), while ADO and randomized designs converged to a comparable level of accuracy in another scanning session (i.e., blue points).

Figure 4.7 illustrates the pooled standard deviation of the posterior distribution of ADO (x-axis) and randomized designs (y-axis). The pooled standard deviation was defined as in the simulation study (i.e., Equation 3.7). Each scanning session is color-coded differently. If the points are located within the shaded region, we consider that the ADO estimates are better in precision than the randomized-design estimates. The range of both axes in the plots was adjusted to (−3, 0).

First, the pooled standard deviation tends to be very small in the first few trials, which were cut out of the plot for visual clarity. We interpret this tendency not as a computationally meaningful result but rather as a statistical artifact generated by the small amount of neural data. The large variability of the stimulus-wise neural activation estimates tends to make estimation of the Naka-Rushton equation difficult (i.e., allows less precision). However, when we have the neural data for only one trial, the data can limit the shape of the Naka-Rushton equation and allow high precision of the joint model parameters overall. This argument is justified because there exists only one set of Naka-Rushton parameters allowing the perfect fit that connects two stimulus-wise neural activation estimates from two distinct contrast levels. However, if we get the data for two or more trials, the variability of the stimulus-wise neural activation levels will not allow the perfect fit and will reduce the precision temporarily. Of course, precision will improve as we accumulate more data throughout the experiment.

Figure 4.6: The scatter plot of the log-transformed distance measure between the MAP estimates of ADO (x-axis, $\log(D_{\mathrm{ADO}})$) and randomized designs (y-axis, $\log(D_{\mathrm{RD}})$) from the benchmark MAP estimate. Each subplot represents the results from each subject (Subjects 1 to 4). Each scanning session is color-coded differently for visual clarity. The points located in the shaded region represent the trials in which ADO allowed better accuracy than randomized designs.

Figure 4.7: The scatter plot of the log-transformed, pooled standard deviation of the posterior distribution from ADO (x-axis) and randomized designs (y-axis). Each subplot represents the results from each subject (Subjects 1 to 4). Each scanning session is color-coded differently. The points located in the shaded region represent the trials in which ADO allowed better precision than randomized designs. The range of both axes in the plots was adjusted to (−3, 0) for visual clarity.

In Figure 4.7, the result tends to favor the randomized design in terms of the precision of model parameters because many points are located outside the shaded region. This result may be explained by the fact that repeated proposals of the same or very similar contrast levels cause larger variability in the neural activation level, and therefore fail to constrain the Naka-Rushton model parameters sufficiently.

In summary, the real-world fMRI-based ADO experiments allow more accurate

estimates than randomized experimental designs, as we expected from the result of

the simulation study. However, the precision of the ADO estimates fell short of our expectation relative to the randomized-design estimates. One possible explanation for the lower precision of the ADO-based estimates is that the variability of the neural data induced by the unbalanced designs of ADO made the estimates less precise.


Chapter 5: Discussion

We proposed a method for fMRI-based ADO that incorporates both neural and

behavioral data within the optimization routine (Chapter 2). Next, we verified the

performance of fMRI-based ADO through both simulation (Chapter 3) and real-

world, real-time fMRI experiments (Chapter 4). The simulation study showed that

fMRI-based ADO estimates the joint model parameters more accurately and precisely

than conventional randomized designs. We replicated this result with various “true”

parameter sets (Simulation 2) and the lag of one trial in optimization (Simulation

3). In particular, Simulation 1 showed the selective focus of ADO in searching and

evaluating candidate designs. In the real-world fMRI experiment, ADO tends to es-

timate parameters more accurately than randomized designs. However, the precision

of the posterior distribution was better in the randomized-design experiments than

the ADO-based experiments.

The unbalanced designs of ADO might explain low precision of ADO estimates

in that repeated design proposals might have inflated the variability of single-trial

neural estimates, which was carried over to the joint model parameters. There are

many possible task-irrelevant factors such as neural adaptation and scanner drift that

might have inflated the variability of the estimated stimulus-wise neural activation.

However, we did not find clear evidence that suggested either adaptation or irregular drift patterns in either the raw BOLD responses or the single-trial neural estimates. Therefore, we speculate that the inflated variability of single-trial neural estimates is attributable to the selective experimental proposals of ADO.

5.1 Limitations

This study was the first implementation of Bayesian adaptive optimization methods of experimental design using both neural and behavioral data with a real-world verification, and therefore leaves many methodological and technological questions to be addressed.

The first major issue is the interaction of the unbalanced design proposals of ADO and the variability of stimulus-wise neural activation estimates. As discussed above, the unbalanced designs that ADO constructs can amplify the variability of the neural input of ADO (i.e., single-trial neural estimates). The inflated variability of single-trial neural estimates can be carried over to the model parameters, and therefore hinder precise parameter estimation in ADO. This variability issue also forced us to use a different sampler setting (i.e., the burn-in mode) from the one used in the simulation studies so that we could sample from the high-density regions, which can weaken our claims about the performance of ADO.

One way to handle the variability of single-trial neural estimates is to revise our

model assumptions. Specifically, we assumed that (1) the difference between stimulus-

wise neural estimates determines the response probability in the Thurstonian decision

model, and (2) the variability of stimulus-wise neural estimates is constant across

contrast levels. We made these assumptions to simplify the joint model, but they may be too restrictive for capturing the dynamics of neural activity in our study.

For example, Boynton et al. (1999) pointed out that the assumption of identically

distributed noise may not be reasonable because the variability in the firing rate

tends to increase according to the mean firing rate. As BOLD responses are known

to be proportional to the mean firing rate (Heeger, Huk, Geisler, & Albrecht, 2000), we

can expect that the variability of the BOLD responses will increase according to the

amplitude of the BOLD responses. In this case, our assumption of constant variance

may not hold, and one potential solution is to develop a model of heteroscedasticity

in the stimulus-wise neural activations as a function of contrast levels.

There are several alternative strategies worth investigating in future work, such as

using different utility functions (Myung et al., 2013), planning the visitation sched-

ule (i.e., the order of parameters to be focused on in the optimization routine), and

focusing on the shape and uncertainty of the function to be estimated rather than

just parameters. However, applicability of these alternative optimization strategies

may depend on the assumptions and structure of the target model. Hence, simulation experiments can play an important role in testing how the different optimization strategies change the patterns of the design proposals of ADO and therefore the variability in the neural activation.

The second limitation of this study is our strategy for handling neural data. Al-

though we discussed that raw BOLD responses are not suitable in fMRI-based ADO

due to their dimensionality, we could consider alternative methods such as sequential

Monte Carlo methods (Cappe, Godsill, & Moulines, 2007). Also, within the GLM

framework, iterative trial-wise GLM with nuisance regressors (Mumford et al., 2014,


2012; B. O. Turner et al., 2012) could help control the variability of stimulus-wise

estimates.

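The third issue is the numerical precision of the current implementation of ADO. In Equation 2.1, we can decompose $p(\theta \mid d_{1:t}, y_{1:t})$ and $p(\theta \mid d_{1:t}, y_{1:t}, d, y^{*})$ as follows, where $\propto$ is used loosely to mean equality up to an additive constant that does not depend on $\theta$:

\begin{align}
\log p(\theta \mid d_{1:t}, y_{1:t}) &\propto \log \left[ p(\theta) \times p(y_{1:t} \mid \theta, d_{1:t}) \right] \notag \\
&= \log p(\theta) + \log p(y_{1:t} \mid \theta, d_{1:t}), \tag{5.1} \\
\log p(\theta \mid d_{1:t}, y_{1:t}, d, y^{*}) &\propto \log \left[ p(\theta) \times p(y_{1:t}, y^{*} \mid \theta, d_{1:t}, d) \right] \notag \\
&= \log p(\theta) + \log p(y_{1:t} \mid \theta, d_{1:t}) + \log p(y^{*} \mid \theta, d) \tag{5.2}
\end{align}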

where p(θ) is the prior density of θ, p(y1:t|θ, d1:t) is the likelihood of the current data

(i.e., y1:t), and p(y∗|θ, d) is the (anticipated) likelihood for the proposed design d

and expected response y∗. We can find that log p(θ) + log p(y1:t|θ, d1:t) is used in

Equation 2.1 multiple times: in addition to Equations 5.1 and 5.2, this term is equal

to log-transformed joint probability density of the parameter θ and response y1:t (i.e.,

p(y1:t|θ, d1:t)p(θ|d1:t) = p(y1:t, θ|d1:t)). Note that given the data set at trial t, there is

no need to compute log p(θ) + log p(y1:t|θ, d1:t) three times because it is a fixed value.

In the first version of fMRI-based ADO simulation, we evaluated log p(θ)+log p(y1:t|θ, d1:t)

repeatedly when exploring grid points in the parameter and response spaces, which

ended up with 40-60 seconds of ADO computation for each trial. However, to im-

prove the computation speed, we computed log p(θ) + log p(y1:t|θ, d1:t) only one time,

evaluated log p(y∗|θ, d) for each grid point in the parameter and response space, and

combined the two terms as in Equation 5.2. In exchange for the gain in speed, this computational trick causes infinitesimal numerical errors compared to the original method, which can be explained by floating-point arithmetic errors. This error may not matter when the posterior distribution is not constrained enough; however, the size of the error might increase in proportion to how tightly the posterior is constrained.
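The caching idea and the source of the small discrepancies can be illustrated with a toy example in Python. Everything in this sketch is an assumption made for illustration: the model is a one-parameter Gaussian regression rather than the joint model of this thesis, and the function names are hypothetical. The point is only that the term log p(θ) + log p(y1:t|θ, d1:t) is computed once per parameter value and then reused for every candidate design and anticipated response.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2


def log_prior(theta):
    return log_normal_pdf(theta, 0.0, 10.0)


def log_likelihood(y, theta, d):
    # Toy model: the response is normally distributed around theta * d.
    return log_normal_pdf(y, theta * d, 1.0)


def log_posterior_naive(theta, past, d_new, y_new):
    """Recompute every term for each candidate design and response."""
    total = log_prior(theta)
    for d, y in past:
        total += log_likelihood(y, theta, d)
    return total + log_likelihood(y_new, theta, d_new)


def make_cached_log_posterior(theta, past):
    """Compute the fixed term once and reuse it for all candidates."""
    fixed = log_prior(theta) + sum(log_likelihood(y, theta, d) for d, y in past)

    def log_posterior(d_new, y_new):
        return fixed + log_likelihood(y_new, theta, d_new)

    return log_posterior


past = [(1.0, 0.9), (2.0, 2.1), (0.5, 0.4)]  # hypothetical (design, response) pairs
theta = 1.0
cached = make_cached_log_posterior(theta, past)

naive_value = log_posterior_naive(theta, past, 1.5, 1.4)
cached_value = cached(1.5, 1.4)
# The two values agree up to rounding; any gap comes from the order of additions.
print(naive_value - cached_value)
# Floating-point addition is not associative, which is the source of such gaps.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))
```

In this toy version the two routes differ only in the order in which the same terms are added, so the results may differ in the last few bits at most.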

5.2 Further Developments and Practical Applications

One natural extension of the fMRI-based ADO is model comparison, for which

ADO was originally proposed (Cavagnaro et al., 2013, 2011). This application seems

promising because model-based cognitive neuroscience studies have shown that neural data can help compare competing cognitive models that can hardly be discriminated by behavioral data alone (e.g., Mack et al., 2013).

We can also consider incorporating multiple ROIs in our optimization routine. In

this case, developing an appropriate joint model may be a critical factor because one

ROI might be associated with multiple parameters or one model parameter might be

correlated with multiple ROIs.

We may extend fMRI-based ADO for compatibility with distributed activation

patterns over multiple voxels. In the contrast discrimination task, we assumed that

voxels in V1 share a similar activation pattern modeled by the Naka-Rushton equa-

tion. However, many cognitive activities are represented in the brain with distributed

activation patterns because underlying neurons may have different tuning preferences.

For example, Cox and Savoy (2003) trained a classifier model for object recognition using activation patterns across voxels in early visual cortex, rather than the averaged signal. As multivariate pattern analysis has become widespread in fMRI research, compatibility with multi-voxel neural signals can add more generalizability to fMRI-based ADO.


Practically, fMRI-based ADO has the potential for cognitive psychometric (van der

Maas, Molenaar, Maris, Kievit, & Borsboom, 2011) and computational psychiatric

(Wiecki, Poland, & Frank, 2015) settings by promoting efficient data collection. We do not yet have concrete examples of cognitive psychometric or computational psychiatric studies using both neural and behavioral data. However, considering the cost

for collecting fMRI data (especially from a clinical population), fMRI-based ADO

may produce more efficient model-based neuroimaging studies.

5.3 Conclusions

Until now, adaptive methods for design optimization in cognitive science have relied on either neural or behavioral data only. Specifically in neuroimaging, the first fully adaptive optimization methods (Lorenz et al., 2016) focused on task-to-region mapping based on the localization paradigm. In this thesis, we proposed an application of Adaptive Design Optimization (Cavagnaro et al., 2010) to model-based fMRI experiments that aims to provide more systematic explanations of the relationships among brain, mind, and behavior. In addition to driving more accurate data collection, fMRI-based ADO exploits both neural and behavioral data simultaneously with the joint modeling framework (B. M. Turner, Forstmann, et al., 2013). We hope that future work will control the variability of stimulus-wise neural estimates and improve the precision of estimates.

References

Abdulrahman, H., & Henson, R. N. (2016). Effect of trial-to-trial variability on optimal event-related fMRI design: Implications for Beta-series correlation and multi-voxel pattern analysis. NeuroImage, 125, 756–766.

Abraham, A., Pedregosa, F., Eickenberg, M., Gervais, P., Mueller, A., Kossaifi, J., . . . Varoquaux, G. (2014). Machine learning for neuroimaging with scikit-learn. Frontiers in Neuroinformatics, 8, 14. Retrieved from https://www.frontiersin.org/article/10.3389/fninf.2014.00014 doi: 10.3389/fninf.2014.00014

Aguirre, G. K. (2007). Continuous carry-over designs for fMRI. NeuroImage, 35(4), 1480–1494.

Amunts, K., Malikovic, A., Mohlberg, H., Schormann, T., & Zilles, K. (2000). Brodmann’s areas 17 and 18 brought into stereotaxic space – where and how variable? NeuroImage, 11(1), 66–84.

Amzal, B., Bois, F. Y., Parent, E., & Robert, C. P. (2006). Bayesian-optimal design via interacting particle systems. Journal of the American Statistical Association, 101(474), 773–785.

Andersson, J. L. R., & Jenkinson, M. (2007). Non-linear registration aka Spatial normalisation (FMRIB Technical Report TR07JA2). Retrieved from https://www.fmrib.ox.ac.uk/datasets/techrep/tr07ja2/tr07ja2.pdf

Bak, J. H., & Pillow, J. W. (2018). Adaptive stimulus selection for multi-alternative psychometric functions with lapses. bioRxiv, 260976.

Bellman, R. (1957). Dynamic programming (1st ed.). Princeton, NJ, USA: Princeton University Press.

Boynton, G. M., Demb, J. B., Glover, G. H., & Heeger, D. J. (1999). Neuronal basis of contrast discrimination. Vision Research, 39(2), 257–269.

Bullmore, E., Brammer, M., Williams, S. C., Rabe-Hesketh, S., Janot, N., David, A., . . . Sham, P. (1996). Statistical methods of estimation and inference for functional MR image analysis. Magnetic Resonance in Medicine, 35, 261–277.

Buracas, G. T., & Boynton, G. M. (2002). Efficient design of event-related fMRI experiments using M-sequences. NeuroImage, 16, 801–813.

Cappe, O., Godsill, S. J., & Moulines, E. (2007). An overview of existing methods and recent advances in sequential Monte Carlo. In (Vol. 95, pp. 899–924). IEEE.

Cavagnaro, D. R., Aranovich, G. J., McClure, S. M., Pitt, M. A., & Myung, J. I. (2016). On the functional form of temporal discounting: An optimized adaptive test. Journal of Risk and Uncertainty, 52, 233–254.

Cavagnaro, D. R., Myung, J. I., Pitt, M. A., & Kujala, J. V. (2010). Adaptive design optimization: A mutual information-based approach to model discrimination in cognitive science. Neural Computation, 22, 887–905.

Cavagnaro, D. R., Pitt, M. A., Gonzalez, R., & Myung, J. I. (2013). Discriminating among probability weighting functions using adaptive design optimization. Journal of Risk and Uncertainty, 47, 255–289.

Cavagnaro, D. R., Pitt, M. A., & Myung, J. I. (2011). Model discrimination through adaptive experimentation. Psychonomic Bulletin and Review, 18, 204–210.

Cooke, J. R. H., Selen, L. P. J., van Beers, R. J., & Medendorp, W. P. (2017). Bayesian adaptive stimulus selection for dissociating models of psychophysical data. bioRxiv, 220590.

Cox, D. D., & Savoy, R. L. (2003). Functional magnetic resonance imaging (fMRI) “brain reading”: detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage, 19(2), 261–270.

Dale, A. M. (1999). Optimal experimental design for event-related fMRI. Human Brain Mapping, 8, 109–114.

Daunizeau, J., Preuschoff, K., Friston, K., & Stephan, K. (2011). Optimizing experimental design for comparing models of brain function. PLoS Computational Biology, 7(11), e1002280. Retrieved from https://doi.org/10.1371/journal.pcbi.1002280

DiMattina, C. (2016). Comparing models of contrast gain using psychophysical experiments. Journal of Vision, 16, 1–18.

DiMattina, C., & Zhang, K. (2011). Active data collection for efficient estimation and comparison of nonlinear neural models. Neural Computation, 23(9), 2242–2288.

DiMattina, C., & Zhang, K. (2013). Adaptive stimulus optimization for sensory systems neuroscience. Frontiers in Neural Circuits, 7, 101.

Dunne, S., & O’Doherty, J. P. (2013). Insights from the application of computational neuroimaging to social neuroscience. Current Opinion in Neurobiology, 23(3), 387–392.

Duong, T. (2007). ks: Kernel density estimation and kernel discriminant analysis for multivariate data in R. Journal of Statistical Software, 21(7), 1–16.

Eickhoff, S. B., Stephan, K. E., Mohlberg, H., Grefkes, C., Fink, G. R., Amunts, K., & Zilles, K. (2005). A new SPM toolbox for combining probabilistic cytoarchitectonic maps and functional imaging data. NeuroImage, 25, 1325–1335.

Forstmann, B. U., Brown, S., Dutilh, G., Neumann, J., & Wagenmakers, E.-J. (2010). The neural substrate of prior information in perceptual decision making: a model-based analysis. Frontiers in Human Neuroscience, 4, 40.

Forstmann, B. U., & Wagenmakers, E.-J. (2015). Model-based cognitive neuroscience: A conceptual introduction. In An introduction to model-based cognitive neuroscience (pp. 139–156). Springer.

Forstmann, B. U., Wagenmakers, E.-J., Eichele, T., Brown, S., & Serences, J. T. (2011). Reciprocal relations between cognitive neuroscience and formal cognitive models: Opposites attract? Trends in Cognitive Sciences, 15(6), 272–279.

Friston, K. J., Harrison, L., & Penny, W. (2003). Dynamic causal modelling. NeuroImage, 19(4), 1273–1302.

Friston, K. J., Holmes, A. P., Poline, J., Grasby, P., Williams, S., Frackowiak, R. S., & Turner, R. (1995). Analysis of fMRI time-series revisited. NeuroImage, 2(1), 45–53.

Grabner, G., Janke, A. L., Budge, M. M., Smith, D., Pruessner, J., & Collins, D. L. (2006). Symmetric atlasing and model based segmentation: an application to the hippocampus in older adults. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 58–66).

Grabowski, T. J., Bauer, M. D., Foreman, D., Mehta, S., Eaton, B. L., Graves, W. W., . . . Bolinger, L. (2006). Adaptive pacing of visual stimulation for fMRI studies involving overt speech. NeuroImage, 29(3), 1023–1030. Retrieved from https://doi.org/10.1016/j.neuroimage.2005.08.064

Greve, D. N., Brown, G. G., Mueller, B. A., Glover, G., Liu, T. T., et al. (2013). A survey of the sources of noise in fMRI. Psychometrika, 78(3), 396–416.

Heeger, D. J., Huk, A. C., Geisler, W. S., & Albrecht, D. G. (2000). Spikes versus BOLD: what does neuroimaging tell us about neuronal activity? Nature Neuroscience, 3(7), 631.

Holling, H., Maus, B., & van Breukelen, G. J. P. (2013). Optimal design for functional magnetic resonance imaging experiments. Zeitschrift für Psychologie, 221, 174–189.

Hu, B., & Tsui, K.-W. (2005). Distributed evolutionary Monte Carlo with applications to Bayesian analysis (Technical Report Number 1112). Retrieved from http://www.stat.wisc.edu/techreports/tr1112.pdf

Jenkinson, M., Bannister, P., Brady, M., & Smith, S. (2002). Improved optimization for the robust and accurate linear registration and motion correction of brain images. NeuroImage, 17(2), 825–841.

Jenkinson, M., & Smith, S. (2001). A global optimisation method for robust affine registration of brain images. Medical Image Analysis, 5(2), 143–156.

Johnson, R. A., & Wichern, D. (2007). Applied Multivariate Statistical Analysis (6th ed.). Upper Saddle River, New Jersey: Pearson Prentice Hall.

Kim, W., Pitt, M. A., Lu, Z.-L., & Myung, J. I. (2017). Planning beyond the next trial in adaptive experiments: A dynamic programming approach. Cognitive Science, 41, 2234–2252.

Kim, W., Pitt, M. A., Lu, Z.-L., Steyvers, M., & Myung, J. I. (2014). A hierarchical adaptive approach to optimal experimental design. Neural Computation, 26, 2465–2492.

Koffarnus, M. N., Deshpande, H. U., Lisinski, J. M., Eklund, A., Bickel, W. K., & LaConte, S. M. (2017). An adaptive, individualized fMRI delay discounting procedure to increase flexibility and optimize scanner time. NeuroImage, 161, 56–66. Retrieved from https://doi.org/10.1016/j.neuroimage.2017.08.024

Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39(16), 2729–2737.

Krekelberg, B., Boynton, G. M., & van Wezel, R. J. (2006). Adaptation: from single cells to BOLD signals. Trends in Neurosciences, 29(5), 250–256.

Kujala, J. V., & Lukka, T. J. (2006). Bayesian adaptive estimation: The next dimension. Journal of Mathematical Psychology, 50(4), 369–389.

Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63(8), 1279–1292.

Lesmes, L. A., Jeon, S.-T., Lu, Z.-L., & Dosher, B. A. (2006). Bayesian adaptive estimation of threshold versus contrast external noise functions: The quick TvC method. Vision Research, 46(19), 3160–3176.

Lesmes, L. A., Lu, Z.-L., Baek, J., & Albright, T. D. (2010). Bayesian adaptive estimation of the contrast sensitivity function: The quick CSF method. Journal of Vision, 10(3), 17–17.

Lewi, J., Butera, R., & Paninski, L. (2009). Sequential optimal design of neurophysiology experiments. Neural Computation, 21(3), 619–687.

Li, X., Lu, Z.-L., Tjan, B. S., Dosher, B. A., & Chu, W. (2008). Blood oxygenation level-dependent contrast response functions identify mechanisms of covert attention in early visual areas. Proceedings of the National Academy of Sciences of the United States, 105, 6202–6207. Retrieved from https://doi.org/10.1073/pnas.0801390105

Lin, C. D., Anderson-Cook, C. M., Hamada, M. S., Moore, L. M., & Sitter, R. R. (2015). Using genetic algorithms to design experiments: A review. Quality and Reliability Engineering International, 31(2), 155–167.

Lindquist, M. A. (2008). The statistical analysis of fMRI data. Statistical Science, 23, 439–464.

Lorenz, R., Hampshire, A., & Leech, R. (2017). Neuroadaptive Bayesian optimization and hypothesis testing. Trends in Cognitive Sciences, 21(3), 155–167. Retrieved from https://doi.org/10.1016/j.tics.2017.01.006

Lorenz, R., Monti, R. P., Violante, I. R., Anagnostopoulos, C., Faisal, A. A., Montana, G., & Leech, R. (2016). The Automatic Neuroscientist: A framework for optimizing experimental design with closed-loop real-time fMRI. NeuroImage, 129, 320–334. Retrieved from https://doi.org/10.1016/j.neuroimage.2016.01.032

Love, B. C. (2015). The algorithmic level is the bridge between computation and brain. Topics in Cognitive Science, 7(2), 230–242.

Mack, M. L., Preston, A. R., & Love, B. C. (2013). Decoding the brain’s algorithm for categorization from its neural implementation. Current Biology, 23(20), 2023–2027.

Maclaren, J., Herbst, M., Speck, O., & Zaitsev, M. (2013). Prospective motion correction in brain imaging: A review. Magnetic Resonance in Medicine, 69(3), 621–636.

Marr, D. (1982). Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. New York: Freeman.

Mumford, J. A., Davis, T., & Poldrack, R. A. (2014). The impact of study design on pattern estimation for single-trial multivariate pattern analysis. NeuroImage, 103, 130–138.

Mumford, J. A., Turner, B. O., Ashby, F. G., & Poldrack, R. A. (2012). Deconvolving BOLD activation in event-related designs for multivoxel pattern classification analyses. NeuroImage, 59, 2636–2643.

Myung, J. I., Cavagnaro, D. R., & Pitt, M. A. (2013). A tutorial on adaptive design optimization. Journal of Mathematical Psychology, 57, 53–67.

Myung, J. I., & Pitt, M. A. (2009). Optimal experimental design for model discrimination. Psychological Review, 116(3), 499–518.

O’Brien, T. A., Kashinath, K., Cavanaugh, N. R., Collins, W. D., & O’Brien, J. P. (2016). A fast and objective multidimensional kernel density estimation method: fastKDE. Computational Statistics & Data Analysis, 101, 148–160.

O’Doherty, J. P., Hampton, A., & Kim, H. (2007). Model-based fMRI and its application to reward learning and decision making. Annals of the New York Academy of Sciences, 1104(1), 35–53.

O’Reilly, J. X., & Mars, R. B. (2011). Computational neuroimaging: localising Greek letters? Comment on Forstmann et al. Trends in Cognitive Sciences, 15(10), 450.

Palestro, J. J., Bahg, G., Sederberg, P. B., Lu, Z.-L., Steyvers, M., & Turner, B. M. (2018). A tutorial on joint models of neural and behavioral measures of cognition. Journal of Mathematical Psychology, 84, 20–48.

Park, M., Weller, J. P., Horwitz, G. D., & Pillow, J. W. (2014). Bayesian active learning of neural firing rate maps with transformed Gaussian process priors. Neural Computation, 26(8), 1519–1541.

Poldrack, R. A., Mumford, J. A., & Nichols, T. E. (2011). Handbook of Functional MRI Data Analysis. New York: Cambridge University Press.

Prins, N. (2013). The psi-marginal adaptive method: How to give nuisance parameters the attention they deserve (no more, no less). Journal of Vision, 13(7), 3–3. Retrieved from https://doi.org/10.1167/13.7.3

Rissman, J., Gazzaley, A., & D’Esposito, M. (2004). Measuring functional connectivity during distinct stages of a cognitive task. NeuroImage, 23, 752–763.

Rodriguez, C. A., Turner, B. M., Van Zandt, T., & McClure, S. M. (2015). The neural basis of value accumulation in intertemporal choice. European Journal of Neuroscience, 42(5), 2179–2189.

Ryan, E. G., Drovandi, C. C., McGree, J. M., & Pettitt, A. N. (2016). A review of modern computational algorithms for Bayesian optimal design. International Statistical Review, 84, 128–154. Retrieved from https://doi.org/10.1111/insr.12107

Sanchez, G., Daunizeau, J., Maby, E., Bertrand, O., Bompas, A., & Mattout, J. (2014). Toward a new application of real-time electrophysiology: online optimization of cognitive neurosciences hypothesis testing. Brain Sciences, 4(1), 49–72. Retrieved from https://doi.org/10.3390/brainsci4010049

Sanchez, G., Lecaignard, F., Otman, A., Maby, E., & Mattout, J. (2016). Active SAmpling Protocol (ASAP) to optimize individual neurocognitive hypothesis testing: A BCI-inspired dynamic experimental design. Frontiers in Human Neuroscience, 10, 347. Retrieved from https://www.frontiersin.org/article/10.3389/fnhum.2016.00347 doi: 10.3389/fnhum.2016.00347

Seabold, S., & Perktold, J. (2010). Statsmodels: Econometric and statistical modeling with Python. In Proceedings of the 9th Python in Science Conference (Vol. 57, p. 61).

Serences, J. T., & Saproo, S. (2012). Computational advances towards linking BOLD and behavior. Neuropsychologia, 50, 435–446. Retrieved from https://doi.org/10.1016/j.neuropsychologia.2011.07.013

Smith, S. M., Jenkinson, M., Woolrich, M. W., Beckmann, C. F., Behrens, T. E., Johansen-Berg, H., . . . Matthews, P. M. (2004). Advances in functional and structural MR image analysis and implementation as FSL. NeuroImage, 23, S208–S219.

Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems (pp. 2951–2959).

ter Braak, C. J. F. (2006). A Markov Chain Monte Carlo version of the genetic algorithm Differential Evolution: easy Bayesian computing for real parameter spaces. Statistics and Computing, 16, 239–249.

Thurstone, L. L. (1927). A law of comparative judgement. Psychological Review, 34, 278–286. Retrieved from https://doi.org/10.1037/h0070288

Turner, B. M., Forstmann, B. U., Love, B. C., Palmeri, T. J., & Van Maanen, L. (2017). Approaches to analysis in model-based cognitive neuroscience. Journal of Mathematical Psychology, 76, 65–79.

Turner, B. M., Forstmann, B. U., Wagenmakers, E.-J., Brown, S. D., Sederberg, P. B., & Steyvers, M. (2013). A Bayesian framework for simultaneously modeling neural and behavioral data. NeuroImage, 72, 193–206.

Turner, B. M., Rodriguez, C. A., Norcia, T. M., McClure, S. M., & Steyvers, M. (2016). Why more is better: A method for simultaneously modeling EEG, fMRI, and Behavior. NeuroImage, 128, 96–115.

Turner, B. M., & Sederberg, P. B. (2012). Approximate Bayesian computation with differential evolution. Journal of Mathematical Psychology, 56(5), 375–385.

Turner, B. M., Sederberg, P. B., Brown, S. D., & Steyvers, M. (2013). A method for efficiently sampling from distributions with correlated dimensions. Psychological Methods, 18, 368–384.

Turner, B. M., Van Maanen, L., & Forstmann, B. U. (2015). Combining Cognitive Abstractions with Neurophysiology: The Neural Drift Diffusion Model. Psychological Review, 122, 312–336.

Turner, B. M., Wang, T., & Merkel, E. (2017). Factor analysis linking functions for simultaneously modeling neural and behavioral data. NeuroImage, 153, 28–48.

Turner, B. O., Mumford, J. A., Poldrack, R. A., & Ashby, F. G. (2012). Spatiotemporal activity estimation for multivoxel pattern analysis with rapid event-related designs. NeuroImage, 62, 1429–1438.

van der Maas, H. L., Molenaar, D., Maris, G., Kievit, R. A., & Borsboom, D. (2011). Cognitive psychology meets psychometric theory: On the relation between process models for decision making and latent variable models for individual differences. Psychological Review, 118(2), 339.

van Maanen, L., Brown, S. D., Eichele, T., Wagenmakers, E.-J., Ho, T., Serences, J., & Forstmann, B. U. (2011). Neural correlates of trial-to-trial fluctuations in response caution. Journal of Neuroscience, 31(48), 17488–17495.

van Ravenzwaaij, D., Provost, A., & Brown, S. D. (2017). A Confirmatory Approach for Integrating Neural and Behavioral Data into a Single Model. Journal of Mathematical Psychology, 76, 131–141.

Wager, T. D., & Nichols, T. E. (2003). Optimization of experimental design in fMRI: A general framework using a genetic algorithm. NeuroImage, 18, 293–309.

Wandell, B. A. (1999). Computational neuroimaging of human visual cortex. Annual Review of Neuroscience, 22, 145–173.

Wandell, B. A., & Winawer, J. (2015). Computational neuroimaging and population receptive fields. Trends in Cognitive Sciences, 19(6), 349–357.

Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33(2), 113–120.

Wiecki, T. V., Poland, J., & Frank, M. J. (2015). Model-based cognitive neuroscience approaches to computational psychiatry: clustering and classification. Clinical Psychological Science, 3(3), 378–399.
