Adaptive Design Optimization in Functional MRI
Experiments
Thesis
Presented in Partial Fulfillment of the Requirements for the Degree Master of Arts in the Graduate School of The Ohio State University
By
Giwon Bahg, M.A.
Graduate Program in Department of Psychology
The Ohio State University
2018
Master’s Examination Committee:
Brandon M. Turner, Advisor
Jay I. Myung
Zhong-Lin Lu
© Copyright by
Giwon Bahg
2018
Abstract
Efficient data collection is one of the most important goals to be pursued in cog-
nitive neuroimaging studies because of the exceptionally high cost of data acqui-
sition. Design optimization methods have been developed in cognitive science to
resolve this problem, but most of them lack generalizability because their functionality
tends to rely on a specific type of cognitive model (e.g., psychometric functions)
or research paradigm (e.g., task-to-region mapping). In addition, traditional optimal
design methods fail to exploit neural and behavioral data simultaneously, which is
essential for providing an integrative explanation of human cognition. As one of the
possible solutions, we propose an implementation of Adaptive Design Optimization
(ADO; Cavagnaro, Myung, Pitt, & Kujala, 2010) in model-based functional MRI
(fMRI) experiments using a Joint Modeling Framework (B. M. Turner, Forstmann,
et al., 2013). First, we introduce a general architecture of fMRI-based ADO and
discuss practical considerations in real-world applications. Second, three simulation
studies show that fMRI-based ADO estimates parameters more accurately and pre-
cisely than conventional, randomized experimental designs. Third, a real-time fMRI
experiment validates the performance of fMRI-based ADO in a real-world setting.
The results suggest that ADO performs better than randomized designs in terms of
accuracy, but the unbalanced designs proposed by ADO may inflate the variability of
trial-wise estimates of neural activation and therefore model parameters. Lastly, we
discuss the limitations, further developments, and applications of fMRI-based ADO.
Acknowledgments
Foremost, I would like to express my sincere gratitude to my advisor, Dr. Brandon
Turner. He has always been supportive and open to my ideas for the past two years,
generously sharing his time, knowledge and experience. His patience and guidance
helped me throughout the research and the writing of this thesis.
I would also like to thank the rest of my thesis committee: Dr. Jay Myung for
invaluable suggestions about developing ADO, and Dr. Zhong-Lin Lu for thoughtful
advice regarding task designs and methodological issues in fMRI. I truly appreciate
the inspiring comments and questions on my thesis during the defense session as well.
My sincere gratitude also goes to Dr. Per Sederberg for his valuable input and
suggestions in developing real-time data processing components, Dr. Xiangrui Li for
helpful advice in carrying out fMRI experiments, and Dr. Mark Pitt for important
comments about improving ADO. I also appreciate the financial support of H. Dean
and Susan Regis Gibson for this study.
I am grateful to my fellow labmates, James, Qingfang, Fiona, and Brendan, for
all the discussions and support. I also thank Joonsuk Park and Sang-Ho Lee for
helpful comments on understanding and practicing ADO.
Finally, I would especially like to thank my parents for all their encouragement
and continuous support.
Vita
May 2018 - Present . . . . . Social and Behavioral Sciences Fellowship, The Ohio State University
August 2017 - May 2018 . . . . . Graduate Research Associate, The Ohio State University
August 2016 - August 2017 . . . . . University Fellowship, The Ohio State University
February 2015 . . . . . M.A. Psychology, Seoul National University, Republic of Korea
February 2013 . . . . . B.A. Psychology, Seoul National University, Republic of Korea
February 2013 . . . . . B.A. Philosophy, Seoul National University, Republic of Korea
Publications
Research Publications
M. F. Molly, M. Galdo, G. Bahg, Q. Liu, B. M. Turner “What’s in a Response Time?: On the Importance of Response Time Measures in Constraining Models of Context Effects”. In press at Decision.
J. J. Palestro, G. Bahg, P. B. Sederberg, Z.-L. Lu, M. Steyvers, B. M. Turner “A tutorial on joint models of neural and behavioral measures of cognition”. Journal of Mathematical Psychology, 84, 20-48.
Fields of Study
Major Field: Psychology
Table of Contents
Abstract
Acknowledgments
Vita
List of Tables
List of Figures
1. Introduction
    1.1 Optimal Experimental Design in fMRI Experiments
        1.1.1 Static Optimization Methods
        1.1.2 Adaptive Optimization Methods
    1.2 Bayesian Online Design Optimization in Behavioral Cognitive Science
        1.2.1 Early Applications in Psychophysics and Their Improvements
        1.2.2 Adaptive Design Optimization and Its Applications
    1.3 A Model-based Cognitive Neuroscience Approach
    1.4 Summary and Outline
2. Basic Concepts of Adaptive Design Optimization in fMRI experiments
    2.1 Joint Modeling Framework
    2.2 Adaptive Design Optimization
        2.2.1 The Mechanism
        2.2.2 Extension to the Neural Data
    2.3 Single-trial Neural Activation
        2.3.1 A General Linear Model with Stimulus-level Regressors
        2.3.2 Incremental Analysis and Flexibility of Estimates
    2.4 Dynamic Gridding
    2.5 Posterior Updating
    2.6 Scanning Protocol and Real-time Data Flow
    2.7 Practical Considerations
        2.7.1 Developing a Joint Model of Neural and Behavioral Data
        2.7.2 Discretizing a Continuous Space for Grid Search
        2.7.3 Including the Stimulus-wise Neural Activity: One-trial-lag ADO
3. Simulation Experiments
    3.1 A Proof-of-Concept Study: Contrast Discrimination
        3.1.1 Task
        3.1.2 Model
    3.2 Design and Procedure
        3.2.1 Simulation 1: Fixed Parameters
        3.2.2 Simulation 2: Randomly Generated Parameters
        3.2.3 Simulation 3: One-trial-lag ADO
        3.2.4 Procedure
    3.3 Results
        3.3.1 Simulation 1: Fixed Parameters
        3.3.2 Simulation 2: Randomly Generated Parameters
        3.3.3 Simulation 3: One-trial-lag ADO
    3.4 Discussion
4. Real-time fMRI Experiment
    4.1 Methods
        4.1.1 Participants
        4.1.2 Stimuli
        4.1.3 Design
        4.1.4 Real-time fMRI Procedure
    4.2 Offline Analysis: Parameter Estimation
        4.2.1 Posterior sampling
        4.2.2 Benchmark
        4.2.3 Determination of the Estimates
        4.2.4 Definition of the Distance from the Benchmark Estimate
    4.3 Results and Discussion
5. Discussion
    5.1 Limitations
    5.2 Further Developments and Practical Applications
    5.3 Conclusions
References
List of Tables
3.1 Default settings of Simulation 1.
3.2 A list of 30 parameter sets used in Simulation 2. Parameter values are rounded up to three decimal places.
3.3 Default settings in Simulation 2.
List of Figures
2.1 An illustration of the pipeline of a real-time fMRI-based ADO experiment.
2.2 An illustration of the application of dynamic gridding. A contour plot (black line) represents the kernel density estimates of a two-dimensional posterior density distribution based on hypothetical posterior samples. Blue “×” markers are the grid points initially defined. Red circles are the grid points updated by dynamic gridding based on singular value decomposition.
2.3 An illustration of the scanning protocol and data flow diagram in the ADO-based real-time fMRI experiment.
2.4 An illustration of the hemodynamic responses from the task and computational steps required within each trial. Dotted lines (red, blue, green) refer to hypothetical hemodynamic responses evoked by a stimulus within each trial, and a straight line (gray) shows the expected value of convolved hemodynamic responses. The squares below the x-axis specify the length of intervals required for each step.
2.5 An illustration of ‘one-trial-lag ADO’. Dotted lines (red, blue, green) refer to hypothetical hemodynamic responses evoked by a stimulus within each trial, and a straight line (gray) shows the expected value of convolved hemodynamic responses. The squares below the x-axis specify the length of intervals required for each step.
3.1 An illustration of the trial structure of the contrast discrimination task.
3.2 A graphical representation of the joint model for contrast discrimination. Each node in the model represents a variable. Filled circles are observed data (i.e., stimulus-wise neural activation estimates, behavioral responses), whereas empty circles are model parameters (i.e., b, Rmax, c50, δ), a design variable (i.e., cij), and their transformation (i.e., bij, pi). Double-line circles are deterministic variables, whereas single-line circles are stochastic variables. The outer and inner plates represent variables associated with each trial and stimulus, respectively.
3.3 The shape of the Naka-Rushton equation with (b, Rmax, c50, δ) = (0.05, 1, 0.35, 0.2). The x-axis refers to the contrast level, while the y-axis is expected neural activation (i.e., single-trial beta estimates). The solid line is the expected neural activation given the three shape parameters (b, Rmax, c50). b and Rmax determine the lower and upper asymptotes of the graph, whereas c50 affects the slope of the graph. δ controls the width of the credible interval.
3.4 The design space of the contrast discrimination experiment. The space consists of 90 pairs of contrast levels for the first (x-axis) and second (y-axis) stimuli. The gray dots are individual designs that can be sampled during the experiment. The left and right plots represent the same design space in a linear scale and a logarithmic scale, respectively.
3.5 The scatter plots of the parameter sets used in Simulation 2. Black dots in each plot indicate the values of (b, Rmax) (upper left), (b, c50) (upper center), (b, δ) (upper right), (Rmax, c50) (lower left), (Rmax, δ) (lower center), and (c50, δ) (lower right).
3.6 Pooled root mean squared error (RMSEpooled; upper) and pooled posterior standard deviation (PSDpooled; lower) in Simulation 1. All the performance statistics (i.e., RMSEpooled, PSDpooled) are log-transformed. Red and black lines show the performance statistics from ADO designs and randomized designs, respectively. Solid lines represent the mean of the performance measures changing across trials. Dotted lines represent the 95% credible interval of the performance measures.
3.7 Root mean squared error (RMSEi; left) and posterior standard deviation (PSDi; right) for each parameter (b, Rmax, c50, δ from top to bottom) in Simulation 1. All the performance statistics (i.e., RMSEi, PSDi) are log-transformed. Red and black lines show the performance statistics from ADO designs and randomized designs, respectively. Solid lines represent the mean of the performance measures changing across trials. Dotted lines represent the 95% credible interval of the performance measures.
3.8 A trace plot of experimental designs from ADO (upper row) and randomized sampling (lower row) in Simulation 1. A sequence of 20 trials was segmented into four intervals (Trials 1-5, 6-10, 11-15, and 16-20 from left to right). The x-axis and y-axis of each subplot represent the contrast level of the first and second stimuli, respectively. Black dots represent individual design points. Shaded regions represent the designs actually selected; more frequently selected designs have darker shades. The scale is intentionally omitted for simplicity. See Figure 3.4 for detailed information about the scale.
3.9 Pooled root mean squared error (RMSEpooled; upper) and pooled posterior standard deviation (PSDpooled; lower) in Simulation 2. All the performance statistics (i.e., RMSEpooled, PSDpooled) are log-transformed. Red and black lines show the performance statistics from ADO designs and randomized designs, respectively. Solid lines represent the mean of the performance measures changing across trials. Dotted lines represent the 95% credible interval of the performance measures.
3.10 Root mean squared error (RMSEi; left) and posterior standard deviation (PSDi; right) for each parameter (b, Rmax, c50, δ from top to bottom) in Simulation 2. All the performance statistics (i.e., RMSEi, PSDi) are log-transformed. Red and black lines show the performance statistics from ADO designs and randomized designs, respectively. Solid lines represent the mean of the performance measures changing across trials. Dotted lines represent the 95% credible interval of the performance measures.
3.11 The scatter plot of log-transformed RMSE (left) and log-transformed PSD (right) in Simulation 2. The x-axis and y-axis refer to the value of performance statistics (i.e., RMSE, PSD) in the ADO experiments and randomized design experiments, respectively. Each trial is color-coded for visual clarity (Red: Trial 2, Orange: Trial 4, Green: Trial 8, Blue: Trial 13, Purple: Trial 20). Colored dots represent the performance statistics from individual simulations. Solid lines represent the 80% highest density regions.
3.12 The proportion of the experiments in which ADO outperforms randomized designs in Simulation 2. Points located in the shaded area are preferred. The accuracy and precision at each trial are represented as a red circle and a blue square, respectively.
3.13 Pooled root mean squared error (RMSEpooled; upper) and pooled posterior standard deviation (PSDpooled; lower) in Simulation 3. All the performance statistics (i.e., RMSEpooled, PSDpooled) are log-transformed. Red and black lines show the performance statistics from ADO designs and randomized designs, respectively. Solid lines represent the mean of the performance measures changing across trials. Dotted lines represent the 95% credible interval of the performance measures.
3.14 Root mean squared error (RMSEi; left) and posterior standard deviation (PSDi; right) for each parameter (b, Rmax, c50, δ from top to bottom) in Simulation 3. All the performance statistics (i.e., RMSEi, PSDi) are log-transformed. Red and black lines show the performance statistics from ADO designs and randomized designs, respectively. Solid lines represent the mean of the performance measures changing across trials. Dotted lines represent the 95% credible interval of the performance measures.
3.15 The scatter plot of log-transformed RMSE (left) and log-transformed PSD (right) in Simulation 3. The x-axis and y-axis refer to the value of performance statistics (i.e., RMSE, PSD) in the ADO experiments and randomized design experiments, respectively. Each trial is color-coded for visual clarity (Red: Trial 2, Orange: Trial 4, Green: Trial 8, Blue: Trial 13, Purple: Trial 20). Colored dots represent the performance statistics from individual simulations. Solid lines represent the 80% highest density regions.
3.16 The proportion of the experiments in which ADO outperforms randomized designs in Simulation 3. Points located in the shaded area are preferred. The accuracy and precision at each trial are represented as a red circle and a blue square, respectively.
4.1 An illustration of the linear mask applied to a grating pattern. The black line shows the shape of the mask, while the red line describes the masked grating pattern obtained when crossing the center of the screen horizontally.
4.2 Examples of the grating stimuli used in the experiment. The contrast levels of the five stimuli are 0.01, 0.03, 0.1, 0.3, and 1 (from left to right).
4.3 Variability of the stimulus-wise neural activation. The scatter plot shows the contrast levels and associated stimulus-wise neural activation obtained at the ADO run of the third scanning session of Subject 1.
4.4 An illustration of the incremental parameter estimation. The gray shade represents the amount of data used for estimating parameters. When estimating parameters for comparing the performance of ADO and randomized design, we incrementally increase the amount of data so that we can compare how the parameter and corresponding posterior distribution change over trials. For evaluating the performance of each design, we set a benchmark estimate using all the data obtained from the ADO-based and randomized-design runs within a scanning session.
4.5 Robustness of the estimates. The plot shows the differences in posterior estimates obtained by two distinct posterior distributions from the same data (Subject 1, Session 3, ADO run). The estimates are obtained incrementally: location on the x-axis represents the number of trials used for obtaining the corresponding posterior estimates. The red, green, and blue points represent the four-dimensional MAP estimates (Duong, 2007), marginal one-dimensional MAP estimates, and marginal one-dimensional posterior mean values, respectively.
4.6 The scatter plot of the log-transformed distance measure between the MAP estimates of ADO (x-axis) and randomized designs (y-axis) from the benchmark MAP estimate. Each subplot represents the results from each subject. Each scanning session is color-coded differently for visual clarity. The points located in the shaded region represent the trials in which ADO achieved better accuracy than randomized designs.
4.7 The scatter plot of the log-transformed, pooled standard deviation of the posterior distribution from ADO (x-axis) and randomized designs (y-axis). Each subplot represents the results from each subject. Each scanning session is color-coded differently. The points located in the shaded region represent the trials in which ADO achieved better precision than randomized designs. The range of both axes in the plots was adjusted to (−3, 0) for visual clarity.
Chapter 1: Introduction
We start our discussion with a short review of optimal experimental design re-
search in functional MRI experiments. Beyond the general linear modeling framework
and localization-based research, Adaptive Design Optimization (ADO; Cavagnaro et
al., 2010; Myung, Cavagnaro, & Pitt, 2013) is next introduced as an alternative de-
sign optimization method for model-based cognitive neuroscience experiments. With
this model-based orientation, the need for design optimization methods is discussed
based on modeling approaches incorporating both neural and behavioral measures
simultaneously.
1.1 Optimal Experimental Design in fMRI Experiments
Functional magnetic resonance imaging (fMRI) has become one of the most important
tools in cognitive science for investigating human brain activity because of its
noninvasive nature and reasonable spatial and temporal resolution. However, the cost of data
acquisition in neuroimaging studies using fMRI is exceptionally high, especially when
one hopes to scan special participants such as those within a clinical population or
children. In addition, it is not easy to acquire good-quality data in fMRI experiments
as blood-oxygenation level dependent (BOLD) responses – the target measurement
of fMRI – are very noisy. The signal collected from fMRI scanning involves many
sources of noise besides pure measurement error, such as scanner instability, thermal
noise, and physiological and motion artifacts from the participant (Greve et al., 2013). Therefore,
collecting maximally informative data has been one of the important methodological
questions in fMRI studies for maximizing information from the limited amount of
data and resources.
Optimization of experimental design has been considered as a possible solution
to this problem. In general, there are two different goals of design optimization for
neuroscience experiments: (1) finding optimal event sequences and design parameters
(e.g., interstimulus or intertrial interval) which maximize the informativeness of the
data (Buracas & Boynton, 2002; Dale, 1999; Wager & Nichols, 2003), and (2) finding
an optimal experimental design that clarifies the characteristics of a neural system
(Daunizeau, Preuschoff, Friston, & Stephan, 2011; DiMattina, 2016; Lorenz et al.,
2016; Sanchez et al., 2014).
There are two types of strategies in optimizing experimental design: static op-
timization and adaptive optimization (Ryan, Drovandi, McGree, & Pettitt, 2016).
Static optimization methods aim to generate a complete experimental design ex-
pected to provide the highest-quality data prior to running experiments. Adaptive
optimization methods provide optimal experimental settings (e.g., an experimental
condition to be tested, a stimulus to be presented) for the next trial in the middle of
the experiment, based on the history of previous designs and responses.
1.1.1 Static Optimization Methods
Static optimization methods have been frequently used in fMRI experiments.
They focus on optimization of event sequences defined by a set of stimuli and in-
terstimulus or intertrial intervals to attain (1) the maximum signal-to-noise ratio
for detecting brain activation and (2) the best estimates of shape parameters of a
hemodynamic response function (Holling, Maus, & van Breukelen, 2013).
Classic examples of static optimization methods in fMRI research are from Buracas
and Boynton (2002) and Wager and Nichols (2003). Buracas and Boynton (2002) sug-
gested an event sequence optimization method for maximizing estimation efficiency of
hemodynamic responses in experiments using rapid event-related design. Estimation
efficiency is defined as the inverse of estimator variance (Dale, 1999). Rather than
generating a very large number of random event sequences and comparing their
efficiency measures, their method proposes pseudorandom sequences termed
“m-sequences” (or maximum length shift-register sequences). These sequences expose
each event equally often and are nearly orthogonal to their own time-shifted versions,
which reduces autocorrelation while avoiding the computational burden of generating
and evaluating many random sequences.
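To make the near-orthogonality property concrete, the following is a minimal sketch, not the procedure of Buracas and Boynton (2002), that generates a binary m-sequence with a linear feedback shift register and computes its circular autocorrelation. The function name `m_sequence`, the register length, and the tap positions are illustrative assumptions.

```python
import numpy as np

def m_sequence(taps, n_bits):
    """Binary m-sequence from a Fibonacci linear feedback shift register.

    taps: 1-indexed register positions XORed to form the feedback bit
          (assumed to correspond to a primitive polynomial over GF(2)).
    n_bits: register length; the sequence then has period 2**n_bits - 1.
    """
    state = [1] * n_bits                  # any nonzero start state works
    out = []
    for _ in range(2 ** n_bits - 1):
        out.append(state[-1])             # emit the last register bit
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]   # shift and insert the feedback bit
    return np.array(out)

# Taps (5, 3) on a 5-bit register give a maximal-length sequence of period 31.
seq = m_sequence(taps=(5, 3), n_bits=5)
signed = 2 * seq - 1                      # map {0, 1} to {-1, +1}

# Circular autocorrelation: N at lag 0 and -1 at every nonzero lag,
# i.e., the sequence is nearly orthogonal to its time-shifted versions.
autocorr = np.array(
    [np.dot(signed, np.roll(signed, k)) for k in range(len(signed))]
)
```

In this two-event toy case the flat off-peak autocorrelation is what keeps the shifted regressors of a design matrix close to uncorrelated; multi-condition designs would need sequences over a larger alphabet, which this sketch does not cover.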
Wager and Nichols (2003) used genetic algorithms to find optimal event sequences
and design parameters such as interstimulus interval (for a general review of genetic
algorithms used in design optimization, see Lin, Anderson-Cook, Hamada, Moore, &
Sitter, 2015). Given candidate experimental designs represented as vectors, genetic
algorithms mimic evolutionary processes occurring at the genetic level: selection,
crossover, and mutation. In short, candidates with the highest evaluation scores are
selected (“selection”), then parts of them are exchanged across the selected candidates
(“crossover”) and sometimes randomly reassigned (“mutation”). By repeating
this cycle, the proposals at each generation are expected to attain better quality
measures (e.g., estimation efficiency) until the quality measure reaches its asymptotic
limit. Moreover, this method could manage multiple optimization constraints such
as contrast and estimation efficiency of hemodynamic responses.
Although the classic methods discussed above have been popular and well de-
veloped, static optimization techniques have clear limitations. One problem is local
optimality (Holling et al., 2013). It is practically impossible to measure parameters
such as scanner noise and temporal autocorrelation before actually running experi-
ments, although they are required for performing design optimization. Assuming fixed
values for these parameters is a common strategy to manage this problem. However,
generated stimulus sequences depend on the values used in the algorithms as a result.
In other words, we can expect optimal performance of the generated event sequences
only when the actual values of the set-up parameters are close to those considered in
the optimization algorithms.
Another limitation of static optimization is its exclusive focus on linear models
of fMRI – a general linear model (GLM) framework. GLM is the most basic mod-
eling approach to estimate brain activity and discriminate brain regions relevant to
a specific task, condition, or even trial. However, we cannot formalize the complete
brain-mind-behavior relationship solely based on the GLM framework. There are of
course design optimization approaches specialized for different types of brain models.
For example, Daunizeau et al. (2011) suggested a design optimization method for
comparing dynamical models of brain function based on Dynamic Causal Modeling
(DCM; Friston, Harrison, & Penny, 2003). However, the method of Daunizeau et al.
(2011) has limited expandability (e.g., establishing additional connection to formal
cognitive models) due to its strong dependency on DCM.
The third issue of static design optimization is that it lacks flexibility for indi-
vidual or population-level differences. If individual differences in a specific cognitive
mechanism must be considered (Koffarnus et al., 2017), the experimenter must choose
a subset of experimental designs to gain maximal information efficiently. However,
selection of the fine-tuned designs is not possible solely with static design optimiza-
tion which exploits the whole design space. Another concern is comparing normal and
clinical populations in the same experimental settings. As different cognitive and/or
response characteristics are expected to be considered (Grabowski et al., 2006), static
optimization might not be an ideal approach by itself as it cannot differentiate distinct
populations when generating a complete optimal design set.
1.1.2 Adaptive Optimization Methods
Adaptive optimization methods can be a good solution for the limitations of static
optimization methods as they allow researchers to automatically adjust details of the
experiment to meet the specific needs of individuals. In general, adaptive online
stimulus optimization methods have focused on characterizing a neural system (Di-
Mattina & Zhang, 2011; Lewi, Butera, & Paninski, 2009; Park, Weller, Horwitz, &
Pillow, 2014). For a review of adaptive optimization methods in neurophysiology and
sensory neuroscience, see DiMattina and Zhang (2013).
Not only have adaptive optimization methods rarely been applied in human neu-
roimaging (e.g., fMRI, EEG), when they are applied, they have not used neural data
as an input to the optimization algorithm. For example, Grabowski et al. (2006)
applied online optimization of interstimulus interval in a language production study.
However, it used vocal sound from a participant to trigger the optimization
algorithm. Hence, while the functional images were the measurement of
interest, they were not involved in any step of design optimization. Similarly, Koffarnus
et al. (2017) implemented an experimental protocol that optimizes the stimulus
set in a temporal discounting task to shorten scanning time and prevent problems caused
by participant fatigue. However, their optimization method was not executed in the
MR scanner. Rather, it relies on an out-of-scanner task to find the most informative
stimulus set for an actual fMRI scanning.
Compared to the previous examples, “The Automatic Neuroscientist” (Lorenz,
Hampshire, & Leech, 2017; Lorenz et al., 2016) implemented the first adaptive opti-
mization technique that uses functional images from the MR scanner. This method
aims to provide a tool for systematic investigation of complex task-to-region rela-
tionship by finding a task or experimental design that evokes a specific target brain
state. The method relies on Bayesian optimization with a Gaussian process prior
(Snoek, Larochelle, & Adams, 2012) to model the brain state, and searches for a task
or experimental design that minimizes the difference between the current and target
brain state.
However, the localization paradigm investigates only one axis of cognitive neu-
roscience research. The ultimate goal of cognitive neuroscience is to provide an in-
tegrative explanation of how the mind and behavior originate from brain activity.
Attempts to explain human cognition and behavior by relating them to localized
brain activation cannot achieve this goal because they do not consider how the brain
modulates cognitive and behavioral processes (Serences & Saproo, 2012).
Model-based cognitive neuroscience is a new area of research that pursues this goal
by using mathematical and computational models to link neural and behavioral data.
However, the method of Lorenz et al. (2016) is difficult to apply here because it was
not developed for model-oriented research questions such as estimating parameters
or comparing cognitive models, which are central to model-based cognitive
neuroscience experiments. Therefore, new adaptive optimization methods need to be
developed for model-oriented fMRI experiments.
1.2 Bayesian Online Design Optimization in Behavioral Cognitive Science
Unlike cognitive neuroimaging, adaptive optimization methods have a long history
in experimental psychology, especially in psychophysics. Most of the early applica-
tions have focused on generating optimal stimuli for estimating parameters of a psy-
chometric function (threshold and slope). The classic methods include nonparametric
staircasing procedures and parameter estimation by sequential testing. For a general
review of the adaptive experimental procedures in psychophysics, see Leek (2001).
Adaptive optimization methods in behavioral cognitive science experiments have
the unique potential for developing online design optimization tools for neuroimaging
because adaptive behavioral experiments are mostly model-based and so are some-
what automated. Therefore, here we provide a brief history of online design opti-
mization methods used in behavioral experimental psychology, mainly focusing on
Bayesian online design optimization methods. Then, Adaptive Design Optimization
(ADO) will be introduced as one of the recently proposed Bayesian optimal design
methodologies, as well as a method with broader applicability to more general, math-
ematical models of cognition. Lastly, the current status of the application of ADO in
cognitive neuroscience will be discussed.
1.2.1 Early Applications in Psychophysics and Their Improvements
The first applications of Bayesian adaptive optimization methods in psychophysics
were QUEST (Watson & Pelli, 1983) and the Psi method (Kontsevich & Tyler, 1999).
QUEST was the first Bayesian adaptive method suggested for estimating the threshold
of a psychometric function. To generate the stimulus for the next trial, QUEST
used the maximum a posteriori (MAP) estimate of the threshold obtained from the
history of stimuli and responses accumulated up to the current trial. Meanwhile, the Psi
method not only extended its optimization goal to estimation of both threshold and
slope parameters, but also used a different design proposal rule based on expected
entropy.
Recently, many advances to the Psi method have been made. For example,
Kujala and Lukka (2006) suggested algorithms for faster computation of the
Psi method using the fast Fourier transform and particle filtering. Prins (2013)
developed the Psi-marginal method, which selectively focuses on a subset of the
parameters of psychometric functions. The motivation comes from the differing importance of
psychometric function parameters. Some parameters such as threshold and slope di-
rectly describe sensory mechanisms, whereas the lower and upper asymptotes of the
function are less relevant to the core sensory processing. Therefore, the optimization
algorithm should prioritize efficient estimation of threshold and slope over the asymptotes
when suggesting an optimal design. Hence, the Psi-marginal method refers to the
marginal posterior distribution of the more important parameters so that the
optimization routine will consider those parameters first. Unlike the previous
applications of the Psi method, which focus on parameter estimation, Cooke, Selen,
van Beers, and Medendorp (2017) extended the Psi method to the comparison of
psychophysical models. Recently, Bak and Pillow (2018) applied the Psi method to
multi-alternative psychometric functions, considering responses that are made with
invalid response options or made without attending to the presented stimuli.
The use of Bayesian online design optimization methods is not limited to psycho-
metric functions. For example, Lesmes, Jeon, Lu, and Dosher (2006) and Lesmes,
Lu, Baek, and Albright (2010) proposed Bayesian adaptive methods for estimating
the threshold versus noise contrast function and the contrast sensitivity function, re-
spectively. However, all the applications discussed here tend to consider only one
specific class of cognitive models (i.e., psychometric functions, threshold versus noise
contrast function, contrast sensitivity function). Compared to the previous attempts,
Adaptive Design Optimization (Cavagnaro et al., 2010; Myung et al., 2013) considers
the problem of model selection, and in so doing, provides a more generalized tool for
design optimization.
1.2.2 Adaptive Design Optimization and Its Applications
Adaptive Design Optimization (ADO; Cavagnaro et al., 2010; Myung et al., 2013)
is another Bayesian design optimization method that uses mutual information as
its utility function. ADO was originally proposed as an online design optimization
tool for model comparison in cognitive science experiments. However, when only
considering one model, the method naturally reduces to an algorithm for optimizing
parameter estimation. In the context of parameter estimation, ADO proposes a design
for the next trial that maximizes information about the parameters, given the entire
history of stimuli used and responses acquired during the experiment. Essentially,
ADO requires a formal cognitive model that explicitly expresses a target cognitive
mechanism as a set of mathematical functions.
ADO has been applied to behavioral cognitive science experiments including mem-
ory retention (Cavagnaro, Pitt, & Myung, 2011), probability weighting for risky deci-
sion making (Cavagnaro, Pitt, Gonzalez, & Myung, 2013), and temporal discounting
(Cavagnaro, Aranovich, McClure, Pitt, & Myung, 2016). The goal of these previous
applications was to discriminate among a set of candidate models. ADO was used to
propose new experimental designs adaptively to facilitate the model discrimination.
ADO was extended with various functionalities after its development including
hierarchical ADO (HADO; Kim, Pitt, Lu, Steyvers, & Myung, 2014) and ADO using
dynamic programming (Kim, Pitt, Lu, & Myung, 2017). HADO exploits the Bayesian
nature of ADO by adding a hierarchical component to reflect the statistical depen-
dencies among model parameters from previously acquired data sets. This extension
enables ADO to incorporate information beyond individual participants and consider
group- or population-level information. Meanwhile, Kim et al. (2017) improved the
“near-sighted” performance of ADO (i.e., optimizing for only one trial ahead) by us-
ing dynamic programming (Bellman, 1957). The addition of dynamic programming
allowed ADO to consider sequences of trials rather than a single trial.
ADO or ADO-like approaches have recently been applied to neuroscientific prob-
lems (DiMattina, 2016; Sanchez et al., 2014; Sanchez, Lecaignard, Otman, Maby,
& Mattout, 2016), but they have yet to provide a compelling proof-of-concept
result. For example, DiMattina (2016) used adaptive stimulus generation, which has
ADO-like mechanisms, to compare contrast gain models. However, this application
neither modeled neural activity nor used neural data directly. Instead, he developed
an encoding-decoding model to map contrast stimuli to hypothesized neural responses
(encoding model) and then to behavioral responses (decoding model) so that ADO
can function only with behavioral response data. Therefore, this study involves the
neurophysiological model in ADO but does not use neural data for design optimiza-
tion. Sanchez et al. (2014) applied ADO to a single-trial EEG experiment using a
combination of perceptual learning models and an electrophysiological response model
to compare learning models. Later, Sanchez et al. (2016) extended their ADO
application to dynamic causal modeling (DCM). However, the implications of both
studies are limited because the results are based on simulation experiments and the
method was not applied to real data. Also, these
studies do not consider behavioral responses in their model structure, although they
are also important measures of latent brain computations.
1.3 A Model-based Cognitive Neuroscience Approach
As reviewed above, current applications of online design optimization mainly rely
on either neural or behavioral data. The method of Lorenz et al. (2016) modeled the
task-specific brain state within an online design optimization algorithm. However, this
method is limited in that it can only answer research questions about brain-behavior
mapping. Although DiMattina (2016) and Sanchez et al. (2014, 2016) implement
online design optimization with neural models, they fail to consider a complete data
set acquired in neuroscience experiments – either neural or behavioral responses are
not introduced in the optimization method. In any case, ignoring either type of data
may result in a great loss of important information for making an inference about
cognitive mechanisms because brain and behavioral data – or neural and cognitive
models – can explain different aspects of cognition. Therefore, developing an
online design optimization algorithm that takes both sources of data into
consideration is a goal worth pursuing.
To optimize designs with respect to both neural and behavioral measures, we
require a computational model that makes predictions for both measures. Models
that consider both measures simultaneously are inspired by the work of David Marr,
who introduced the concept of levels of analysis (Marr, 1982). He emphasized that an
integrative explanation of a mental system can be developed by considering (1) what
the system aims to achieve (computational level), (2) how the system represents and
manipulates information (algorithmic level), and (3) how the mental system emerges
from the physical world (implementation level).
Recent approaches aim to formalize integrative models by bridging evidence from
cognitive neuroscience research and behavioral experiments. On the neuroimaging
side, a computational neuroimaging approach (Serences & Saproo, 2012; Wandell,
1999) has been developed to explain the modulation of brain activation as a function
of visual stimuli.1 This approach especially focused on developing models for visual
systems, such as contrast response functions (Boynton, Demb, Glover, & Heeger,
1999) and population receptive field models (Wandell & Winawer, 2015). However,
as the goal of this research program is to develop formal models of neuroimaging data,
understanding of behavioral responses via these models was attempted implicitly, or
sometimes by simple psychophysics models (Boynton et al., 1999).

1 However, note that the name “computational neuroimaging” is frequently used to refer to a subset of “model-based cognitive neuroscience” approaches. This confusion occurs especially with research investigating neural correlates of cognitive model parameters (O’Reilly & Mars, 2011) or with “model-based neuroimaging” (O’Doherty, Hampton, & Kim, 2007), which uses parameter estimates of a cognitive model to generate an event-related regressor in the general linear model analysis (Dunne & O’Doherty, 2013).

On the side of mathematical psychology, “model-based cognitive neuroscience”
(Forstmann & Wagenmakers, 2015; Forstmann, Wagenmakers, Eichele, Brown, &
Serences, 2011; B. M. Turner, Forstmann, Love, Palmeri, & Van Maanen, 2017) was
suggested to link neural evidence to cognitive models developed from behavioral
experiments. From Marr’s levels-of-analysis perspective, we consider cognitive models as
a tool for integrating neural and behavioral explanations of mind at the computational
and algorithmic levels (Love, 2015). This model-based approach claims that neural
and behavioral data can inform each other: for example, neural data analysis
can utilize mechanistic cognitive models to develop more systematic relationships be-
tween neural activation and behavioral data (Forstmann, Brown, Dutilh, Neumann,
& Wagenmakers, 2010). Cognitive models can also benefit from neural data because
neural data offer additional constraints for model discrimination (Mack, Preston, & Love,
2013).
As the focus of this thesis is given more to cognitive levels, the model-based cog-
nitive neuroscience approach serves better for defining the use of neuroimaging data
within fMRI-based ADO experiments. Model-based cognitive neuroscience offers
various modeling strategies (B. M. Turner, Forstmann, et al., 2017), including “two-
stage” approaches that correlate estimates separately obtained from a cognitive model
and a neural activation model across trials (Rodriguez, Turner, Van Zandt, & Mc-
Clure, 2015; van Maanen et al., 2011). For our purpose of design optimization, how-
ever, we rely on a joint modeling approach based on a hierarchical Bayesian framework
(Palestro et al., 2018; B. M. Turner, Forstmann, et al., 2013; B. M. Turner, Rodriguez,
Norcia, McClure, & Steyvers, 2016; B. M. Turner, Van Maanen, & Forstmann, 2015).
One reason for using the joint modeling approach is that we can use both behavioral
and neural data simultaneously to constrain model parameters within a single model
structure, unlike in the “two-stage” strategies. More advantages of the joint modeling
approach will be discussed later.
1.4 Summary and Outline
For the remainder of this thesis, we will discuss how ADO can be implemented
in real-time fMRI experiments. Chapter 2 describes the concept of fMRI-based ADO
and the pipeline required for implementation. At the end of this chapter, we provide
practical guidelines for the use of ADO in fMRI experiments. Chapter 3 introduces
a model discussed throughout this thesis as a proof of concept. Based on this model,
simulation studies will show that ADO estimates parameters more accurately and
precisely than conventional randomized designs. In Chapter 4, we run a real-time fMRI experiment
to verify that fMRI-based ADO works in a real-world setting. The result of this ex-
periment will show that the performance of ADO observed in Chapter 3 is roughly
replicated in the real-world experiment. Chapter 5 will summarize the result, discuss
limitations of the current study, and propose applications of fMRI-based ADO.
Chapter 2: Basic Concepts of Adaptive Design Optimization
in fMRI experiments
In this chapter, we introduce each component of fMRI-based ADO and discuss how the
components communicate in an fMRI experiment (see Figure 2.1). The
fMRI-based ADO framework utilizes the joint model (Section 2.1) and ADO (Section
2.2) as its core components. For introducing the neural input, the ADO framework
also requires a linear model for estimating task-related neural activity (Section 2.3).
Supplementary components such as real-time updating of the grid space (Section 2.4)
and full posterior sampling (Section 2.5) are incorporated for computational efficiency
of the grid search method of ADO.
2.1 Joint Modeling Framework
In model-oriented research, we assume that formalized models such as computa-
tional algorithms or stochastic processes can express the mechanism of human cogni-
tion. Therefore, the main goals of a model-based experiment are twofold: to estimate
the model parameters and compare candidate models. The former aims to identify
the characteristics of a cognitive system by locating appropriate parameter values
using collected data. Meanwhile, the latter aims to identify which formal
explanation of cognitive processes is better supported by the data. Whatever its
purpose is, a model-based study is initiated by developing a model.

Figure 2.1: An illustration of the pipeline of a real-time fMRI-based ADO experiment. (Diagram labels: BOLD responses over time; DE-MCMC; Joint Model; ADO, with U(d) = ∫∫ u(d, Ω, y) p(y|Ω, d) p(Ω) dy dΩ; New Design.)

As introduced in Section 1.3, the joint modeling framework (Palestro et al., 2018; B. M. Turner,
Forstmann, et al., 2013; B. M. Turner et al., 2016, 2015) can be a good starting point
for cognitive neuroscience experiments with online design optimization. In addition
to enabling simultaneous use of neural and behavioral data within a single modeling
framework, the joint modeling framework allows model parameters to be constrained
by neural and behavioral data together via its hierarchical structure.
Joint models consist of three components: a neural submodel, a behavioral sub-
model, and a linking function. The neural submodel is defined to describe trial-wise
neural activity such as the amplitude of hemodynamic responses or the mean EEG signal. If
a researcher uses raw neural data (e.g., BOLD responses), conventional data analysis
methods for extracting single-trial neural activation estimates (e.g., a general linear
model in fMRI analysis) can work as a neural submodel. The behavioral submodel is
a counterpart of the neural submodel whose goal is to describe behavioral responses.
Traditional cognitive models that do not involve neural-level explanations usually
serve as a behavioral submodel. The linking function connects the neural activity
with the parameters of behavioral submodels.
Joint models are typically classified into two groups according to the linking func-
tion used in the model: the covariance joint model and the directed joint model
(Palestro et al., 2018). The covariance joint model assumes a “hypermodel” that con-
strains parameters in the neural and behavioral submodels by the covariance struc-
ture of submodel parameters. In this approach, a multivariate normal distribution or
factor-analytic linking function (B. M. Turner, Wang, & Merkel, 2017) can be used
as representative examples of a hypermodel. Meanwhile, the directed joint model
assumes that submodel parameters are connected by direct transformation from one
submodel to the other. For example, the difference of neural activation evoked by
two different stimuli in a discrimination task (i.e., neural submodel parameters) can
serve as a drift rate parameter in the Wiener diffusion decision model (i.e., behavioral
submodel parameter) (Palestro et al., 2018). Although brain-to-behavior transforma-
tions may seem natural from a reductionist perspective, behavior-to-brain mapping
can be used to constrain estimates of neural activity using estimates of behavioral
submodel parameters (van Ravenzwaaij, Provost, & Brown, 2017).
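
As a rough illustration of a directed (brain-to-behavior) link, the sketch below maps the difference between two single-trial activation estimates onto a drift rate and then onto a choice probability. It is not the model of Palestro et al. (2018): the function names, the example activation values, and the boundary separation of 1.5 are hypothetical. The choice probability uses the standard closed form for a Wiener process with unit diffusion coefficient and an unbiased starting point.

```python
import numpy as np

def drift_from_betas(beta_a, beta_b):
    """Directed link (illustrative): drift rate = difference in neural activation."""
    return beta_a - beta_b

def prob_upper_boundary(drift, boundary_sep=1.5):
    """Probability that a Wiener diffusion process with unit diffusion coefficient,
    starting halfway between the boundaries, reaches the upper boundary first."""
    return 1.0 / (1.0 + np.exp(-drift * boundary_sep))

# Hypothetical single-trial activation estimates for stimuli A and B on three trials
betas = np.array([[1.2, 0.4],
                  [0.8, 0.8],
                  [0.3, 1.0]])
drifts = drift_from_betas(betas[:, 0], betas[:, 1])
p_choose_a = prob_upper_boundary(drifts)   # behavioral prediction driven by neural input
print(drifts, p_choose_a)
```

No parameter is introduced by the link itself here: the behavioral submodel's drift rate is fully determined by the neural submodel's estimates.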
When joint models are used for online design optimization, we must carefully
choose a linking function because covariance-based and directed linking functions
impose different computational burdens. The directed model usually does not
require additional parameters to define the transformation of submodel
parameters. However, the covariance model cannot avoid introducing new parameters such
as covariance coefficients, which results in increasing the dimensionality of the pa-
rameter space in ADO. Although factor-analysis-based covariance structure has been
suggested to limit the number of covariance parameters efficiently (B. M. Turner,
Wang, & Merkel, 2017), ADO still needs to consider more parameters for optimiza-
tion relative to the directed approach.
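
To make the difference in burden concrete, the following sketch counts the extra linking parameters under each approach. The counting rules are textbook ones and are assumptions here: a full covariance matrix over p submodel parameters has p(p + 1)/2 free entries (hypermodel means are not counted), and a factor-analytic structure with k factors has p·k loadings plus p unique variances (identifiability constraints ignored). The actual parameterization in B. M. Turner, Wang, and Merkel (2017) may differ.

```python
def n_link_parameters(n_submodel_params, link="directed", n_factors=1):
    """Number of extra parameters introduced by the linking function (illustrative)."""
    p = n_submodel_params
    if link == "directed":
        return 0                      # deterministic transformation, nothing to estimate
    if link == "covariance_full":
        return p * (p + 1) // 2       # variances and covariances of a p x p matrix
    if link == "covariance_factor":
        return p * n_factors + p      # factor loadings plus unique variances
    raise ValueError(f"unknown link type: {link}")

for link in ("directed", "covariance_full", "covariance_factor"):
    print(link, n_link_parameters(8, link))
```

With eight submodel parameters this gives 0, 36, and 16 extra dimensions, respectively, each of which would have to be gridded in ADO.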
2.2 Adaptive Design Optimization
Here we introduce the details of ADO, and then discuss the strategy for extending
behavioral ADO to neural data.
2.2.1 The Mechanism
As we discussed in Section 1.2.2, ADO can be applied to both parameter estima-
tion and model comparison in real-time. However, it is worth noting that optimal
design for model comparison may not be the best for parameter estimation and vice
versa. In this thesis, we focus on the problem of parameter estimation.
ADO proposes an optimal design for upcoming trials by solving an optimization
problem
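$$d_{t+1} = \operatorname*{argmax}_{d}\; U(d) \tag{2.1}$$

$$U(d) = \int_{y \in Y} \int_{\theta \in \Theta} u(d, \theta, y)\, p(y \mid \theta, d)\, p(\theta \mid d)\, d\theta\, dy \tag{2.2}$$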
where d refers to candidate designs of an experiment. U(d) and u(d, θ, y) are real-
valued functions called the global and local utility functions, respectively. The local
utility function u(d, θ, y) evaluates how informative a design d is about a model
parameter set θ when a response y is anticipated in a hypothetical experimental trial
using d. The global utility U(d) is computed as an “average” local utility by
integrating the local utility over a parameter space Θ and a response space Y. The
final design proposal is the design with the highest global utility.
Although a posterior covariance matrix and the sum of squared errors are often
used as utility functions (Ryan et al., 2016), a standard implementation of ADO relies
on mutual information to evaluate the utility of each design. One of the merits of
using mutual information for design optimization is that mutual information is known
to perform well for both parameter estimation and model comparison. A global utility
function based on mutual information is
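$$U(d) = \int_{y^* \in Y} \int_{\theta \in \Theta} \log \frac{p(\theta \mid d_{1:t}, y_{1:t}, d, y^*)}{p(\theta \mid d_{1:t}, y_{1:t})}\, p(y_{1:t} \mid \theta, d_{1:t})\, p(\theta \mid d_{1:t})\, d\theta\, dy^* \tag{2.3}$$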
where d is a candidate design of interest, θ is a model parameter vector, $y^*$ is an
anticipated response at the (t + 1)-th trial, and $d_{1:t}$ and $y_{1:t}$ represent a series of
experimental designs and collected responses in the previous t trials, respectively.
Note that by the definition of mutual information, a local utility function in Equation
2.2 is
$$u(d, \theta, y^*) = \log \frac{p(\theta \mid d_{1:t}, y_{1:t}, d, y^*)}{p(\theta \mid d_{1:t}, y_{1:t})} \tag{2.4}$$
(Myung et al., 2013).
For the numerical integration over parameter and response spaces, the first im-
plementation of ADO used a sequential Monte Carlo technique (Amzal, Bois, Parent,
& Robert, 2006; Cavagnaro et al., 2010; Myung & Pitt, 2009). Meanwhile, Myung
et al. (2013) suggested a simpler integration strategy based on grid-based methods.
Myung et al.’s approach proceeds by first defining a number of grid points for each
dimension of the design, parameter, and response spaces. Once the grids are defined
over the entire search space, ADO evaluates the local utilities (i.e., $u(d, \theta, y^*)$)
and the joint densities of θ and $y_{1:t}$ (i.e., $p(y_{1:t} \mid \theta, d_{1:t})\, p(\theta \mid d_{1:t})$) for all grid
points. The global utility for a candidate design d is computed as the mean of the
weighted local utility values that share the target design d:
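$$U(d) \approx \frac{1}{n_d} \sum_{\theta, y^*} \log \frac{p(\theta \mid d_{1:t}, y_{1:t}, d, y^*)}{p(\theta \mid d_{1:t}, y_{1:t})}\, p(y_{1:t} \mid \theta, d_{1:t})\, p(\theta \mid d_{1:t}) \tag{2.5}$$

where $n_d$ is the total number of grid points assigned to a candidate design d.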
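
The grid-based search can be sketched for a toy problem as follows. This is a minimal illustration under stated assumptions, not the implementation used in this thesis: the one-parameter logistic response model, the grid ranges, and all function names are hypothetical. In addition, the sketch weights each local utility by the normalized current posterior times the predictive likelihood of the anticipated response, which is the textbook form of mutual information, rather than by the unnormalized joint density and the 1/n_d factor of Equation 2.5.

```python
import numpy as np

def likelihood(y, theta, d):
    """p(y | theta, d) for a hypothetical binary-response model with threshold theta."""
    p1 = 1.0 / (1.0 + np.exp(-(d - theta)))
    return np.where(y == 1, p1, 1.0 - p1)

def global_utility(designs, thetas, responses, post):
    """Mutual information between theta and the anticipated response, per design."""
    # lik[i, j, k] = p(y_k | theta_j, d_i), evaluated on the full grid
    lik = likelihood(responses[None, None, :], thetas[None, :, None],
                     designs[:, None, None])
    marg = np.einsum("j,ijk->ik", post, lik)          # p(y_k | d_i)
    # By Bayes' rule, log(updated posterior / current posterior) = log(lik / marg)
    log_ratio = np.log(lik) - np.log(marg)[:, None, :]
    return np.einsum("j,ijk,ijk->i", post, lik, log_ratio)

def update_posterior(post, thetas, d, y):
    """Grid-based Bayesian update after observing response y under design d."""
    new = post * likelihood(np.asarray(y), thetas, d)
    return new / new.sum()

designs = np.linspace(-3, 3, 13)        # candidate designs
thetas = np.linspace(-3, 3, 61)         # parameter grid
responses = np.array([0, 1])            # response grid
post = np.full(thetas.size, 1.0 / thetas.size)   # flat prior on the grid

utilities = global_utility(designs, thetas, responses, post)
d_next = designs[np.argmax(utilities)]  # design proposal for the next trial
post = update_posterior(post, thetas, d_next, y=1)
```

In a running experiment the last three lines would be repeated every trial, with the observed response in place of the hard-coded `y=1`.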
2.2.2 Extension to the Neural Data
Introducing neural data and its activation model does not change the definition
of the global utility function or the search process. However, the dimensions of
both the parameter and response spaces increase because incorporating neural
data requires ADO to account for the expected neural responses as well.
Ideally, a full joint model can allow ADO to use a raw BOLD time-series vector
N as its neural input. (From here, we will use y as a behavioral response vector.)
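Assuming a hierarchical joint model $\Omega = (\theta_{\text{hyper}}, \theta_{\text{neural}}, \theta_{\text{behavioral}})$, a global utility
function is defined by

$$U_{JM}(d) = \int\!\!\int\!\!\int u(d, \Omega, N, y)\, p(N, y \mid \Omega, d)\, p(\Omega \mid d)\, d\Omega\, dN\, dy. \tag{2.6}$$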
However, using the raw neural data is practically impossible within ADO due to
its high dimensionality. Equation 2.6 implies that the time-series vector N must be
integrated over $\mathbb{R}^n$, where n is the length of the time-series vector.
Moreover, new data are continuously added during the scan, so the dimension n of the
neural data space keeps growing. Considering that a new functional image is acquired
every 1-3 seconds in typical fMRI experiments, only five minutes of scanning (300
seconds, i.e., 100 to 300 volumes) forces ADO to integrate over at least $\mathbb{R}^{100}$.
As an alternative, we can implement a global utility function based on a “limited”
version of the joint model structure using trial-wise neural activation estimates:
$$U_{LJM}(d) = \int\!\!\int\!\!\int u(d, \Omega, \beta, y)\, p(\beta_{1:t}, y_{1:t} \mid \Omega, d)\, p(\Omega)\, d\Omega\, d\beta\, dy \tag{2.7}$$

$$\beta_{1:t} = g(X, N) \tag{2.8}$$
where β is an estimate vector of stimulus- or trial-level neural activation, X is a design
matrix, and g(·) is an estimator of stimulus- or trial-wise neural activation β based on
X and N. When the limited joint model is used, we use single-trial neural activation
estimates that describe stimulus- or trial-wise brain activity as the neural input of ADO
instead of the raw neural data vector. By reducing the set of possible data points to
single-trial activation parameters, the computational burden of using ADO becomes
manageable once again. However, this reduction does come at the cost of inflated
uncertainty in the estimates of neural activation.
2.3 Single-trial Neural Activation
As discussed in Section 2.2.2, a method for estimating stimulus- or trial-wise
neural activation (so-called “single-trial neural activation”)
needs to be included in ADO to limit the computational load. Here, we will discuss
a general linear model for estimating single-trial neural activation.
2.3.1 A General Linear Model with Stimulus-level Regressors
The use of trial-wise neural activation estimates serves as a remedial strategy
for the high dimensionality problem of raw BOLD responses. To actually use the
single-trial activation estimates, fMRI-based ADO must include a component that
estimates neural activation amplitude evoked by each stimulus or trial so that the
neural estimates can be used for proposal generation.
The conventional approach to estimating single-trial activation is to perform a
general linear model (GLM) analysis – an application of multiple linear regression –
to fMRI data (Friston et al., 1995). A GLM uses a design matrix consisting of vectors
representing the onset times of events of interest (e.g., stimulus presentation, response
production) convolved with a hemodynamic response function. A typical approach is
to define condition-wise regressors for comparing the mean activation estimates across
conditions (for a more general introduction to this topic, see introductory textbooks
for fMRI data analysis such as Poldrack, Mumford, & Nichols, 2011).
However, when using ADO, GLM regressors must be defined at the stimulus or trial
level because we need information about the neural activity associated with each
stimulus. Conceptually, stimulus-level regressors can be easily made by setting the
onset vectors for each individual stimulus, not for each condition. This method is
often referred to as “beta-series regression” in the context of multivoxel analysis
(Rissman, Gazzaley, & D’Esposito, 2004). A beta-series GLM can be implemented in
a Bayesian framework (e.g., Palestro et al., 2018). However, full posterior estimation
is time consuming in real-time fMRI experiments due to the large number of single-
trial regressors or multiple BOLD response vectors. In our application, we will use
frequentist estimates to obtain trial-wise neural activation estimates efficiently. For
example, ordinary least squares estimates (LSEs) can be derived as:
$$\beta = (X^{T} X)^{-1} X^{T} N \tag{2.9}$$
where X is a design matrix, a superscript T indicates the transpose operation, and
N is a raw BOLD time-series vector.
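
A stimulus-level GLM of this kind can be sketched as follows. This is a minimal sketch, not the analysis pipeline of this thesis: the double-gamma-shaped HRF, the TR of 2 s, the onset times, the noise level, and all function names are assumptions chosen for illustration. `numpy.linalg.lstsq` is used in place of the explicit inverse in Equation 2.9; it returns the same least squares solution in a numerically more stable way.

```python
import numpy as np
from math import factorial

def hrf(t):
    """Double-gamma-shaped HRF (assumed form: peak near 5 s, undershoot near 15 s)."""
    t = np.asarray(t, dtype=float)
    peak = t**5 * np.exp(-t) / factorial(5)
    undershoot = t**15 * np.exp(-t) / factorial(15)
    return peak - undershoot / 6.0

def single_trial_design(onsets, n_scans, tr):
    """Design matrix with one HRF-convolved regressor per trial plus an intercept."""
    kernel = hrf(np.arange(0, 32, tr))
    X = np.zeros((n_scans, len(onsets) + 1))
    for j, onset in enumerate(onsets):
        stick = np.zeros(n_scans)
        stick[int(round(onset / tr))] = 1.0          # onset of trial j in scan units
        X[:, j] = np.convolve(stick, kernel)[:n_scans]
    X[:, -1] = 1.0                                   # intercept (baseline signal)
    return X

def ols_betas(X, N):
    """Least squares estimate of beta for the time series N given design matrix X."""
    beta, *_ = np.linalg.lstsq(X, N, rcond=None)
    return beta

rng = np.random.default_rng(0)
tr, n_scans = 2.0, 120
onsets = np.arange(10, 200, 20.0)                    # 10 trials, 20 s apart
true_beta = rng.normal(1.0, 0.5, size=len(onsets))   # simulated trial amplitudes
X = single_trial_design(onsets, n_scans, tr)
N = X @ np.append(true_beta, 100.0) + rng.normal(0.0, 0.05, n_scans)
beta_hat = ols_betas(X, N)[:-1]                      # drop the intercept estimate
```

Each entry of `beta_hat` is the estimated activation amplitude for one trial, which is the quantity passed to ADO as neural input.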
Note that Equation 2.9 is one of the plausible estimators of $\beta_{1:t}$ (i.e., g(·) in
Equation 2.8). As stimulus- or trial-level estimates are known to have large variability
(Abdulrahman & Henson, 2016; Mumford, Turner, Ashby, & Poldrack, 2012), varia-
tions of the stimulus-level GLM analysis such as iterative GLM analyses with nuisance
regressors (Mumford, Davis, & Poldrack, 2014; Mumford et al., 2012; B. O. Turner,
Mumford, Poldrack, & Ashby, 2012) can be used to control the variability. Also, re-
gression models with temporal autoregressive errors such as AR(p) and ARMA(1,1)
(Bullmore et al., 1996; Lindquist, 2008) can be used to control temporal autocorrela-
tion in the BOLD time-series. In any case, the time required for estimation is one of
the most important concerns because ADO requires fast computation of single-trial
neural activation in real-time.
2.3.2 Incremental Analysis and Flexibility of Estimates
Estimation of single-trial neural activation or “single-trial beta estimation” is nec-
essary at the end of every trial so that ADO can use the information to compute global
utility. However, this incremental procedure implies that the time-series data will be
continuously updated during an entire scanning session, and therefore estimates of
neural activation can change every trial.
The first option to handle the variability of single-trial neural estimates is to allow
ADO to update the neural estimates every trial. From this perspective, ADO must
use the best “data” – in this case, single-trial neural activation estimates – available
at each trial. Hence, ADO must refer to new estimates as they become more accurate
and less variable as the experiment moves on.
The second option is to block the updating of neural estimates included in ADO
during previous trials. In this case, neural activation estimates of previous stimuli
or trials will be fixed in further trials and new estimates for those trials will not be
used in ADO. In other words, only the estimates from a new trial will be used. This
approach ensures the stability of the ADO algorithm as the estimates of neural activity
remain constant once they have been estimated on a given trial.
In the simulation experiments (Chapter 3), we make an ideal assumption that
we always obtain perfect estimates of stimulus-wise neural activations. Therefore,
there is no need to consider the variability of neural estimates or to re-estimate
them as new data arrive. In the real-world experiments (Chapter 4), we choose
the first strategy that updates neural estimates to make ADO use the best information
available in each trial.
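
The two options can be contrasted in a small simulation. The sketch below is illustrative only: the bump-shaped regressors standing in for HRF-convolved responses, the number of scans per trial, the noise level, and the function names are all assumptions. After every trial the GLM is refitted to the scans acquired so far; the flag `update_past` decides whether earlier estimates are refreshed (first option) or frozen (second option).

```python
import numpy as np

def trial_regressors(n_trials, scans_per_trial=5, resp_len=8):
    """Hypothetical design: each trial evokes a response lasting `resp_len` scans,
    so the responses of neighbouring trials overlap in time."""
    n_scans = n_trials * scans_per_trial + resp_len
    shape = np.hanning(resp_len + 2)[1:-1]     # smooth bump standing in for an HRF
    X = np.zeros((n_scans, n_trials))
    for j in range(n_trials):
        start = j * scans_per_trial
        X[start:start + resp_len, j] = shape
    return X

def run_session(X, N, scans_per_trial=5, update_past=True):
    """Re-estimate single-trial betas at the end of every trial."""
    n_trials = X.shape[1]
    stored = np.empty(0)
    for t in range(1, n_trials + 1):
        n_avail = t * scans_per_trial          # scans acquired by the end of trial t
        beta_t, *_ = np.linalg.lstsq(X[:n_avail, :t], N[:n_avail], rcond=None)
        if update_past:
            stored = beta_t                    # option 1: refresh every estimate
        else:
            stored = np.append(stored, beta_t[-1])   # option 2: keep earlier estimates fixed
        # `stored` is what ADO would see when proposing the design for trial t + 1
    return stored

rng = np.random.default_rng(1)
X = trial_regressors(n_trials=12)
true_beta = rng.normal(1.0, 0.3, size=12)
N = X @ true_beta + rng.normal(0.0, 0.1, size=X.shape[0])
beta_updated = run_session(X, N, update_past=True)
beta_frozen = run_session(X, N, update_past=False)
```

With the second option, every stored estimate except the last is based on a response that was only partially observed when it was computed, which is the price paid for stability.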
2.4 Dynamic Gridding
As discussed in Section 2.2.1, the current implementation of fMRI-based ADO re-
lies on the grid-based method to compute the global utility. For efficient performance
of ADO, we need to discretize both parameter and response spaces appropriately.
Theoretically, an obvious first choice is to define a dense grid over a broad range of
values in both parameter and response spaces. However, a tradeoff ensues between
the number of grid points and computational efficiency due to multidimensionality
of the grid space. Adding only one more grid point per dimension results in an
explosive increase in the number of grid points in the entire search space. Hence,
simply using a dense grid is not a practical solution.
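
The growth is easy to quantify under the simplifying assumption that every dimension of the search space uses the same number of grid points (the function name and the choice of 10 versus 11 points are illustrative):

```python
def n_grid_points(points_per_dim, n_dims):
    """Total number of grid points in a full factorial grid."""
    return points_per_dim ** n_dims

for n_dims in (2, 4, 6):
    base = n_grid_points(10, n_dims)
    denser = n_grid_points(11, n_dims)
    print(f"{n_dims} dims: {base} -> {denser} (+{denser - base})")
```

In six dimensions, going from 10 to 11 points per dimension adds 771,561 grid points, each of which requires a likelihood and utility evaluation on every trial.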
Another disadvantage of the dense grid space is redundant grid points in low pos-
terior density regions. Global utility based on mutual information relies on posterior
densities obtained at each grid point. Joint posterior distributions of model parame-
ters will be constrained as the experiment proceeds, and therefore the number of grid
points with extremely small posterior density (i.e., p(y|θ, d)) will increase. In the end,
most of the grid points cannot contribute to generating new proposals due to small
posterior densities, which makes computation and aggregation of global utility values
inefficient.
One possible solution is to update the grid as the posterior distribution is updated.
This approach allows ADO computation to be affordable with limited computing
resources while achieving better efficiency. Implementation of this solution requires a
method for automatically adjusting the distribution of grids to capture a region with
high posterior density.
Here, we used a simple method based on singular value decomposition (SVD)
of a sample covariance matrix. The main idea is to decompose a sample covariance
matrix S into a rotation matrix and use its inverse to transform the posterior samples
to be orthogonally distributed. This approach is motivated by principal component
analysis (Johnson & Wichern, 2007): a sample covariance matrix can be decomposed
into three matrices
$$S = R C R^{-1}$$
where R is a matrix consisting of eigenvectors of S and C is a diagonal matrix with
corresponding eigenvalues. Because eigenvectors in R construct an orthogonal basis
explaining the largest variance of the posterior samples, we can use its inverse $R^{-1}$ as an
inverse-rotation function that maps the original posterior samples onto an orthogonal
principal component space without additional scaling. The marginal distributions of
transformed grid clusters are used to define new percentile-based grids. As a last
step, a rotation matrix R maps the newly defined grids onto the original space.
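
The steps above can be sketched as follows. This is a minimal sketch rather than the thesis implementation: the percentile levels (5th to 95th), the number of grid points per axis, and the function name are assumptions. `numpy.linalg.eigh` is used for the decomposition, since for a symmetric covariance matrix the eigendecomposition coincides with the SVD and R is orthogonal, so that $R^{-1} = R^{T}$.

```python
import numpy as np

def dynamic_grid(samples, n_points=5):
    """Re-grid a parameter space around posterior samples (rows are draws)."""
    S = np.cov(samples, rowvar=False)            # sample covariance matrix
    _, R = np.linalg.eigh(S)                     # columns of R: eigenvectors of S
    rotated = samples @ R                        # applies R^{-1} = R^T to every draw
    pct = np.linspace(5, 95, n_points)           # assumed percentile levels
    axes = [np.percentile(rotated[:, k], pct) for k in range(rotated.shape[1])]
    mesh = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    mesh = mesh.reshape(-1, len(axes))           # full factorial grid in rotated space
    return mesh @ R.T                            # map the grid back with R

# Hypothetical correlated posterior samples of (theta_1, theta_2)
rng = np.random.default_rng(0)
cov = 0.04 * np.array([[1.0, 0.8], [0.8, 1.0]])
samples = rng.multivariate_normal([1.5, 0.6], cov, size=2000)
grid = dynamic_grid(samples)                     # 25 grid points following the posterior
```

Because the percentiles are taken along the principal axes, the resulting grid is tilted and stretched in the same way as the posterior samples.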
Figure 2.2 shows a hypothetical example of dynamic gridding. A black contour
plot shows a joint posterior distribution of (θ1, θ2). Blue “×” markers represent the
grid points that are initially defined in the parameter space. Most of the initial grid
points do not cover the high-density region of the parameter space. However, the
newly updated grids (red circles) based on the correlational structure of the posterior
samples are successfully located within the high-density region. Adjusting grid points
allows ADO to investigate optimal design more efficiently because the grid space does
not incorporate non-informative regions of the parameter space.
Figure 2.2: An illustration of the application of dynamic gridding. A contour plot (black line) represents the kernel density estimates of a two-dimensional posterior density distribution based on hypothetical posterior samples. Blue “×” markers are the grid points initially defined. Red circles are the grid points updated by dynamic gridding based on singular value decomposition. [Axes: θ1 and θ2.]
It is worth noting that this dynamic gridding method can sometimes generate
invalid grid points according to assumptions on the model parameters. For example,
the standard deviation of a normal distribution, say σ, is not allowed to have negative
values by definition. However, SVD-based dynamic gridding might produce
invalid grid points (i.e., σ < 0) depending on the shape of the joint posterior distribution and
the constraints imposed on other model parameters. These invalid grid points must be
ignored in further steps.
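
A minimal sketch of discarding such invalid grid points is given below. The helper name `keep_valid`, the tuple layout of a grid point, and the example values are hypothetical.

```python
def keep_valid(grid_points, sigma_index):
    """Drop grid points whose standard-deviation parameter is negative."""
    return [point for point in grid_points if point[sigma_index] >= 0]

# Hypothetical grid points of (mu, sigma); the second one has sigma < 0.
grid_points = [(0.4, 0.10), (0.5, -0.02), (0.6, 0.05)]
valid_points = keep_valid(grid_points, sigma_index=1)
print(valid_points)
```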
2.5 Posterior Updating
Being inherently Bayesian, ADO evaluates the posterior densities of grid points
on every trial. However, full posterior estimation of model parameters may be required
in real-time for two reasons: diagnosis of the performance of ADO and dynamic
gridding. We discussed how a full posterior distribution can be used for dynamic
gridding in Section 2.4. In addition, we might need to check the degree to which
estimates are constrained in real-time depending on the application of ADO.
Here we use a differential evolution Markov chain Monte Carlo sampler (DE-MCMC;
ter Braak, 2006; B. M. Turner, Sederberg, Brown, & Steyvers, 2013) for
posterior updating. Not only does DE-MCMC sample from correlated posterior distributions
more efficiently, but it is also known to suffer less from autocorrelation of the
sampling process than the conventional Metropolis-Hastings algorithm.
In our ADO application, initialization of the chains depends on the grids generated
for evaluating global utility. In detail, initial chains are selected by a multinomial sam-
pling with a choice probability vector constructed by normalizing posterior densities
of all grids in the search space:
$$c_{i,t,1} \sim \mathrm{Multinomial}(\mathbf{p}^{(t)}).$$
Here $c_{i,t,1}$ is the i-th chain of the DE-MCMC sampler initialized after completing the t-th
trial. A choice probability vector at trial t is defined as
$$\mathbf{p}^{(t)} = [p_1^{(t)}, p_2^{(t)}, \cdots, p_J^{(t)}]^T$$
where
$$p_j^{(t)} = \frac{f(\theta_j^{(t)} \mid y_{1:t}, d_{1:t})}{\sum_{j=1}^{J} f(\theta_j^{(t)} \mid y_{1:t}, d_{1:t})}$$
is the probability that the j-th grid point is selected as an initial chain, $\theta_j^{(t)}$ is the j-th grid
point in the search space at trial t, and J is the total number of grid points.
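
The initialization rule above can be sketched with the standard library alone. The function name, the toy grid, and the density values are hypothetical; only the normalization of posterior densities into choice probabilities follows the text.

```python
import random

def initialize_chains(grid_points, posterior_densities, n_chains, rng):
    """Draw initial chain locations from the grid with probability p_j."""
    total = sum(posterior_densities)
    probabilities = [d / total for d in posterior_densities]  # p_j^(t)
    # Sampling with replacement: several chains may start at the same point.
    return rng.choices(grid_points, weights=probabilities, k=n_chains)

rng = random.Random(1)
grid_points = [(0.0, 1.0), (0.1, 1.1), (0.2, 1.2), (0.3, 1.3)]
densities = [0.02, 0.50, 0.48, 0.0]   # unnormalized posterior densities
chains = initialize_chains(grid_points, densities, n_chains=24, rng=rng)
print(len(chains))
```

Grid points with zero posterior density are never chosen, while points with very small but nonzero density can still be drawn occasionally.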
However, poor initialization can produce a spuriously multimodal posterior distribution
in that most chains are clustered in a high-density region while a small number
of “outlier” chains gather in an extremely low-density region by chance. These outlier
chains can affect the posterior distribution because DE-MCMC
uses the difference vector between chains as information to update posterior samples.
Migration (Hu & Tsui, 2005) could be a reasonable remedy for this problem
by swapping the locations of outlier chains during the first few trials with a fixed
probability.
Figure 2.3: An illustration of the scanning protocol and data flow diagram in the ADO-based real-time fMRI experiment.
2.6 Scanning Protocol and Real-time Data Flow
Figure 2.3 describes a typical scanning procedure and the flow of data in ADO-
based real-time fMRI experiments. The experiment is mainly divided into three
stages: (1) acquisition of structural and functional localizer images, (2) inverse-
registration of anatomical masks in a standard space, and (3) data collection in the
main task.
The first stage aims to collect information required for producing a task-specific
mask in the subject-specific brain space. After completing set-up for online data
transfer from an MR scanner to a terminal computer, an experimenter needs to collect
structural images of a participant’s brain and acquire a regional localizer based on an
echo-planar imaging (EPI) sequence. The former constructs the basis of the subject
space, while the latter limits the region to be scanned in the functional localizer and
the main tasks. The functional localizer task is performed to detect task-relevant
voxels as the last step. The functional localizer mask can be defined by performing
a whole-brain GLM analysis with data from the localizer task and extracting voxels
that have test statistics (e.g., t-statistics) greater than a specific threshold.
In the second stage, an experimenter extracts the task-relevant subject-specific
mask using the data acquired from the first stage. We use a template structural image
defined in a standard brain space such as the MNI (Montreal Neurological Institute) atlas
(Grabner et al., 2006) as a reference. Once the experimenter collects the structural
image in the subject space, it is registered to the standard brain template to obtain
the transformation matrix that maps the subject space onto the standard space. The
inverse-transformation matrix is derived by taking an inverse of the transformation
matrix, and is used for mapping the anatomical masks in the standard space to
the subject space. When regions of interest (ROIs) must be constrained by masks
provided by standard anatomical atlases (e.g., Julich Histological Atlas; Eickhoff et
al., 2005), we can transform the standard masks to subject-specific masks by using
the inverse-transformation matrix. The conjunction between the inverse-transformed
anatomical mask and the functional localizer mask defines the task-relevant mask in
the subject space.
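
As a rough illustration of the second stage, the sketch below inverts a transformation matrix and forms the conjunction of two masks. It assumes that the registration is a 4x4 affine matrix and that both masks are already sampled on the same subject-space voxels; the matrix entries and mask values are made up, and a real pipeline would rely on a registration tool and would also resample the mask image.

```python
import numpy as np

# Hypothetical affine mapping subject-space coordinates to standard space.
subject_to_standard = np.array([
    [1.10, 0.00, 0.00, -5.0],
    [0.00, 0.95, 0.00,  3.0],
    [0.00, 0.00, 1.05,  2.0],
    [0.00, 0.00, 0.00,  1.0],
])
# The inverse transformation maps standard space back to the subject space.
standard_to_subject = np.linalg.inv(subject_to_standard)

point_standard = np.array([10.0, 20.0, 30.0, 1.0])   # homogeneous coordinates
point_subject = standard_to_subject @ point_standard

# Toy masks over six voxels in the subject space.
anatomical_mask = np.array([1, 1, 1, 0, 0, 1], dtype=bool)
functional_mask = np.array([0, 1, 1, 1, 0, 0], dtype=bool)
task_mask = anatomical_mask & functional_mask        # conjunction
print(task_mask)
```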
The task-specific mask enables one to obtain voxel-wise BOLD responses in real-
time during the main task. When an experimenter is interested in a specific ROI
defined by the task-relevant mask, a common approach is to average neural signals
from all voxels in the mask for running the GLM analysis for stimulus-wise neural
estimates. The stimulus-wise neural activation estimates are considered as neural
inputs of ADO. The detailed description of single-trial beta estimation is discussed
in Section 2.3.
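
The averaging step can be sketched as below, with an ordinary least-squares fit standing in for the single-trial beta estimation of Section 2.3, whose exact procedure is not reproduced here. The design matrix, the noiseless synthetic data, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_scans, n_voxels = 60, 5

# Hypothetical design matrix: intercept plus two stimulus regressors.
design = np.column_stack([
    np.ones(n_scans),
    rng.random(n_scans),
    rng.random(n_scans),
])
true_betas = np.array([0.2, 1.0, 0.5])

# Synthetic BOLD data (scans x voxels); voxels 3-4 carry an unrelated offset.
bold = np.tile(design @ true_betas, (n_voxels, 1)).T
bold[:, 3:] += 10.0
task_mask = np.array([True, True, True, False, False])

# Average the signal over the voxels in the mask, then fit the GLM.
roi_signal = bold[:, task_mask].mean(axis=1)
betas, *_ = np.linalg.lstsq(design, roi_signal, rcond=None)
print(betas)
```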
2.7 Practical Considerations
In the practice of fMRI-based ADO, we need additional considerations in model
development and implementation of the computational framework. Here, we discuss
methodological and technical issues expected in the practice and possible remedies.
2.7.1 Developing a Joint Model of Neural and Behavioral Data
If our cognitive model does not involve a neural component, the most reasonable
practice is to run behavioral ADO (i.e., Equations 2.1 and 2.2) while collecting fMRI
data simultaneously without assuming any communication between ADO and fMRI
data. Afterward, we can conduct conventional offline data analyses with a joint model
using a hierarchical linking function.
However, this “post-hoc analysis” approach may not always be the best choice
for three reasons. Firstly, designing fMRI experiments calls for additional considera-
tions due to the characteristics of BOLD responses. For example, stimulus duration
and interstimulus interval are important in obtaining a better signal-to-noise ratio.
In addition, when we run an offline analysis, we should be aware that ADO experiments
cannot have a balanced design by their nature. In particular, a condition-level
offline GLM analysis may not provide the best result because the number of exposures
to each condition is not balanced. Most importantly, a behavioral ADO experiment
incorporating neural data by offline data analysis does not serve its original goal:
offering the maximally informative design on the fly. ADO implemented in this post-
hoc strategy will not exploit the information provided by neural data in the data
collection and optimization procedures. We can imagine that the maximally infor-
mative experimental designs obtained by behavioral data only might not be the same
as those derived by both neural and behavioral data.
In short, to maximize the information obtained from both neural and behavioral
data, a joint model framework is strongly recommended rather than performing be-
havioral ADO and offline fMRI analysis separately. If reconfiguring the cognitive
model to involve neural components is difficult, a hierarchical approach could be the
simplest remedy to connect the neural and behavioral data.
2.7.2 Discretizing a Continuous Space for Grid Search
Every continuous parameter and data subspace must be discretized when using a
grid-based method for numerical integration. However, setting a reasonable range of
grid points may not be straightforward when we discretize the neural response space
because the range of neural estimates can vary according to stimulus settings (e.g.,
duration, flickering, interstimulus interval) and hemodynamic response function used
in the model.
The easiest solution is to adaptively adjust the range of neural response grid by
using the minimum and maximum of neural estimates as anchors. Here, we should
note that extending the scope of the neural response space sacrifices grid resolution
if we assume a fixed number of grid points for each dimension. Conversely, if we
increase the number of grid points in the neural response space to maintain the resolution,
the computation time grows rapidly because the size of the response grid is the product
of the numbers of grid points in its dimensions. Therefore, a pilot experiment
is strongly recommended to identify appropriate grid settings.
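
The trade-off between range and resolution can be illustrated with a small sketch. The function name and the example estimates are hypothetical; the grid is anchored at the minimum and maximum of the neural estimates observed so far, with a fixed number of points.

```python
def neural_response_grid(beta_estimates, n_points=10):
    """Evenly spaced grid anchored at the observed minimum and maximum."""
    low, high = min(beta_estimates), max(beta_estimates)
    step = (high - low) / (n_points - 1)
    return [low + k * step for k in range(n_points)]

observed = [0.12, 0.45, 0.31, 0.98]
grid = neural_response_grid(observed)
# A new, larger estimate widens the range and coarsens the resolution.
wider = neural_response_grid(observed + [1.90])
print(grid[1] - grid[0], wider[1] - wider[0])
```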
2.7.3 Including the Stimulus-wise Neural Activity: One-trial-lag ADO
Let us assume that we are on the t-th trial of the experiment. Ideally, both neural
and behavioral data for the first t trials must be incorporated in ADO. However, we
should expect loss of the latest one-trial amount of neural data (i.e., stimulus-wise
neural activation estimates) in practice due to the slow-changing temporal profile
of the hemodynamic response. In detail, a hemodynamic response consists of a
rise to a peak that takes 5-6 seconds, a decline with an undershoot
below baseline activation, and a slow asymptotic recovery period. The
total length of a hemodynamic response is usually up to 30 seconds. Hence, neural
measures in fMRI experiments need to be collected for at least 5-6 seconds to
characterize their peak intensity. However, a temporal lag of 5-6 seconds might be too
long depending on stimulus presentation settings (i.e., stimulus duration,
interstimulus/intertrial interval). In this case, we can collect a behavioral response but not a
neural activation estimate at the end of the trial.
Figure 2.4: An illustration of the hemodynamic responses from the task and computational steps required within each trial. Dotted lines (red, blue, green) refer to hypothetical hemodynamic responses evoked by a stimulus within each trial, and a straight line (gray) shows the expected value of convolved hemodynamic responses. The squares below the x-axis specify the length of intervals required for each step (stimulus duration, response interval, single-trial beta estimation + ADO, and additional time required for single-trial beta estimation). The x-axis is time in seconds.
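
The timing argument can be made concrete with a small numerical sketch. It assumes a double-gamma hemodynamic response function with commonly used shape values (6 and 16, undershoot ratio 1/6); the thesis only refers to a canonical function, so these numbers are assumptions. The response to a 2-second stimulus is obtained by convolving this function with a boxcar.

```python
import math

def canonical_hrf(t):
    """Double-gamma HRF (assumed shapes 6 and 16, undershoot ratio 1/6)."""
    if t <= 0:
        return 0.0
    peak = t ** 5 * math.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0

dt = 0.1                  # time step in seconds
stimulus_duration = 2.0   # boxcar length in seconds

def predicted_bold(t):
    """Riemann-sum convolution of the HRF with a boxcar starting at t = 0."""
    n_steps = int(round(stimulus_duration / dt))
    return sum(canonical_hrf(t - k * dt) for k in range(n_steps)) * dt

times = [k * dt for k in range(400)]           # 0 to 40 seconds
response = [predicted_bold(t) for t in times]
peak_time = times[response.index(max(response))]
print(peak_time)   # seconds after stimulus onset at which the response peaks
```

Under these assumptions the predicted response peaks several seconds after stimulus onset, which is why the neural estimate of a trial may not be available when the trial ends.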
Figure 2.4 illustrates a hypothetical experiment in which the stimulus duration is 2
seconds (i.e., a canonical hemodynamic response function is convolved with a boxcar
function with 2-second duration) and the response interval is 4 seconds. In this example,
let us assume that the intertrial interval of 4 seconds still remains for single-trial
beta estimation and ADO computation. However, we cannot estimate single-trial
neural activation estimates immediately after the response interval is over because
a theoretically assumed hemodynamic response (broken lines) has not attained its
peak yet. We have to exhaust the intertrial interval (blank rectangles) just for getting
more neural data, unlike our original intention to exploit the interval for ADO. A new
trial then begins when we are ready to estimate single-trial neural activation of the
previous trial. Therefore, we can collect behavioral responses but not neural data –
single-trial neural activation estimates – at the end of each trial. In this situation, a
hemodynamic lag hinders finishing single-trial beta estimation and ADO computation
within each trial. As this problem is not a matter of computation speed, developing
efficient algorithms may not be helpful.
One possible solution for the loss of neural data is to use the neural and behavioral
data obtained by the (t − 1)-th trial to generate the optimal proposal for the (t + 1)-th
trial, a strategy we refer to as ‘one-trial-lag ADO’. Figure 2.5 describes how one-trial-lag
ADO works. For example, the first trial uses an ADO proposal that is derived by
the prior distribution of model parameters, whereas the second trial uses randomly
generated designs since the neural estimates from the first trial are not available at
this point. During the second trial, single-trial neural activation of the first trial is
estimated and used together with behavioral data to compute the optimal design for
the third trial. Similarly at the third trial, ADO uses the data obtained by the second
trial (green blank rectangle) to generate the optimal proposal for the fourth trial.
We can of course simplify the implementation of one-trial-lag ADO using randomly
generated designs for the first few trials, which is the strategy used in the real-world
experiment (Chapter 4). However, we first tested the performance of the most ideal
implementation.
Figure 2.5: An illustration of ‘one-trial-lag ADO’. Dotted lines (red, blue, green) refer to hypothetical hemodynamic responses evoked by a stimulus within each trial, and a straight line (gray) shows the expected value of convolved hemodynamic responses. The squares below the x-axis specify the length of intervals required for each step (stimulus duration, response interval, design proposal). The x-axis is time in seconds.
One-trial-lag ADO relieves us from burdensome computational time when acquir-
ing single-trial beta estimates. Therefore, when single-trial beta estimates are the
main consideration in ADO, implementation of one-trial-lag ADO may be worth con-
sidering. The performance of one-trial-lag ADO is validated in Section 3.3.3.
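
The scheduling rule of one-trial-lag ADO can be summarized in a short sketch. The function name and the returned labels are hypothetical; the rule itself follows the description above, in which the design for a trial is computed from the data available two trials earlier.

```python
def design_source(trial):
    """Describe how the design of a 1-indexed trial is generated."""
    if trial < 1:
        raise ValueError("trial numbers start at 1")
    if trial == 1:
        return "ADO (prior only)"
    if trial == 2:
        return "randomized design"
    # Neural estimates of trial (trial - 2) become available during
    # trial (trial - 1), in time to propose the design of this trial.
    return f"ADO (data through trial {trial - 2})"

schedule = [design_source(t) for t in range(1, 6)]
for t, source in enumerate(schedule, start=1):
    print(t, source)
```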
Chapter 3: Simulation Experiments
In this chapter, we test the performance of fMRI-based ADO within a simulated
environment. First, we introduce the contrast discrimination task and develop a joint
model that explains the discrimination processes using both neural and behavioral
data. Next, we discuss simulation experiments for testing the performance of ADO
at different levels of generalizability. Lastly, we discuss the results of the simulations and
their implications.
3.1 A Proof-of-Concept Study: Contrast Discrimination
In this section, we will introduce a task environment and propose a joint model
that explains decision processes embedded in the task. As a proof-of-concept study, we
selected a contrast discrimination task (Boynton et al., 1999) because formal models
describing neural activities for contrast stimuli have been well studied (DiMattina,
2016). Moreover, the task has been used in psychology and cognitive neuroscience
for studying human visual perception (Boynton et al., 1999) and the mechanism of
attention (Li, Lu, Tjan, Dosher, & Chu, 2008).
Figure 3.1: An illustration of the trial structure of the contrast discrimination task. Each trial consists of a fixation (1 second), Stimulus 1 (6 seconds, flickering at 4 Hz), an interstimulus interval (6-10 seconds; mean: 8 seconds), Stimulus 2 (6 seconds, flickering at 4 Hz), a fixation (1 second), and a response period (same duration as the interstimulus interval).
3.1.1 Task
A contrast discrimination task uses a Gabor patch as its stimulus. A Gabor
patch is defined by a sinusoidal grating multiplied by a two-dimensional Gaussian function
on a two-dimensional space, and appears as black-white stripes overlaid by a
circular mask that gradually fades as the distance from the center of the patch increases.
A higher contrast level means that the borders between black and white stripes
are clearer.
Figure 3.1 presents an example of the task structure of the contrast discrimination
task based on a two-alternative forced-choice paradigm. Given two grating stimuli
with different contrast levels, a participant is instructed to answer which of the two stimuli
is of higher contrast. We assume that the two grating stimuli are presented consecutively
because simultaneous presentation may complicate discriminating signals from
each stimulus. Sequential presentation also allows a long enough
interstimulus interval to avoid excessive superimposition of hemodynamic responses
from the two stimuli. Note that we used sequential presentation in the real-world
experiment (Chapter 4) as well.
3.1.2 Model
We propose a joint model that explains decision processes embedded in the con-
trast discrimination task. A neural submodel will explain how contrast level of a
grating annulus evokes neural activation in early visual cortex. Meanwhile, a be-
havioral submodel will generate a behavioral response from the neural activation
amplitudes. In addition, we assume that the neural activity will directly guide a
behavioral response (i.e., directed joint model).
Neural Submodel
First, we need to define a neural submodel that maps the contrast level of a
stimulus to the amplitude of single-trial neural activation level. Boynton et al. (1999)
found that the amplitude of BOLD responses tends to increase with the contrast level.
Here, we use Naka-Rushton equation to model the relationship between contrast levels
and associated neural response (DiMattina, 2016; Li et al., 2008):
$$\beta(c) = b + R_{\max} \frac{c^2}{c_{50}^2 + c^2} \tag{3.1}$$
where c ∈ (0, 1) is the contrast level, b is the baseline neural activation, Rmax is the
maximum activation level achieved above the baseline b and c50 is the contrast level
which evokes half the maximum level of activation.
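
A minimal sketch of Equation 3.1 is shown below. The function name is hypothetical, and the parameter values are those used later as the “true” parameters of Simulation 1.

```python
def naka_rushton(c, b, r_max, c50):
    """Predicted neural activation for contrast level c (Equation 3.1)."""
    return b + r_max * c ** 2 / (c50 ** 2 + c ** 2)

b, r_max, c50 = 0.05, 1.0, 0.35
at_c50 = naka_rushton(c50, b, r_max, c50)    # half of r_max above baseline
low = naka_rushton(0.01, b, r_max, c50)
high = naka_rushton(1.0, b, r_max, c50)
print(low, at_c50, high)
```

At c = c50 the prediction equals b + Rmax/2, which matches the interpretation of c50 as the contrast level evoking half of the maximum activation.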
Given two consecutive stimuli with different contrast values c1 and c2 , we assume
that β1(c1) and β2(c2) predicted by the Naka-Rushton equation are compared and
lead to the following decision scheme:
$$\begin{cases} \text{Stimulus 1 has a higher contrast level} & \text{if } \beta_1 > \beta_2 \\ \text{Stimulus 2 has a higher contrast level} & \text{if } \beta_2 > \beta_1. \end{cases}$$
However, estimated neural activation for each stimulus may have noise and therefore
might not perfectly match the prediction of the neural submodel. This assumption is
reasonable because stimulus-wise beta estimates are known to have large variability
(Abdulrahman & Henson, 2016; Mumford et al., 2012). Hence, we assume that actual
estimates of single-trial neural activation are considered as samples from a Gaussian
distribution with the prediction of the Naka-Rushton equation as its mean:
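$$\hat{\beta}_i \sim N(\beta_i, \sigma_\beta^2) \quad (i = 1, 2) \tag{3.2}$$
where $\hat{\beta}_i$ is the estimated single-trial neural activation for the i-th stimulus, $\beta_i$ is the
corresponding prediction of the Naka-Rushton equation, i is the stimulus order index, and
N(µ, σ²) means a normal distribution with a mean µ and a standard deviation σ.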
Behavioral Submodel
Our assumptions about the behavioral submodel are twofold: (1) the model must
be able to explain variability in behavioral responses such as response error, and (2)
the response must be made based on comparison between contrast levels of two grat-
ing stimuli. Thurstone (1927) provided the first statistical formalization of compara-
tive judgment serving both assumptions, which will be referred to as a Thurstonian
decision model henceforth. He assumed that when a stimulus φ is mapped onto a
psychological scale as ψ, it is represented as a Gaussian distribution to reflect the
uncertainty of the psychological effect:
$$\psi_i \sim N(\bar{\psi}_i, s_i^2) \quad (i = 1, 2), \tag{3.3}$$
where i is the stimulus order, $\psi_i$ is the actual mental representation of a stimulus φi,
$\bar{\psi}_i$ is the (theoretically assumed) accurate representation of φi, and si is a standard
deviation term for the stimulus i representing uncertainty of the mental representation.
Comparing the intensities of two stimuli φ1 and φ2, variability in comparative
judgment can be explained by a distribution of the difference between $\psi_1$ and $\psi_2$, the
mental representations of the intensities of the two stimuli. Because the difference of two
independent Gaussian variables is also Gaussian, the difference distribution is
$$\psi_2 - \psi_1 \sim N\left(\bar{\psi}_2 - \bar{\psi}_1, \left(\sqrt{s_1^2 + s_2^2}\right)^2\right).$$
The response probability p that a participant chooses the second stimulus as having
higher contrast is therefore computed as
$$p = 1 - \Phi^{*}\left(0; \bar{\psi}_2 - \bar{\psi}_1, \left(\sqrt{s_1^2 + s_2^2}\right)^2\right) = \int_0^\infty N\left(x; \bar{\psi}_2 - \bar{\psi}_1, \left(\sqrt{s_1^2 + s_2^2}\right)^2\right) dx$$
where $\Phi^{*}(\cdot; \mu, \sigma^2)$ is the cumulative distribution function of a Gaussian distribution
with a mean µ and a standard deviation σ. Finally, a behavioral response y is generated from
a Bernoulli distribution as
$$y \sim \mathrm{Bernoulli}(p).$$
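
The response probability of the Thurstonian decision model can be computed with the error function from the standard library. The function names and the example values below are hypothetical.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def prob_choose_second(mean1, mean2, s1, s2):
    """Probability that the second stimulus is judged as more intense."""
    sd_difference = math.sqrt(s1 ** 2 + s2 ** 2)
    return 1.0 - normal_cdf(0.0, mean2 - mean1, sd_difference)

p_equal = prob_choose_second(1.0, 1.0, 0.5, 0.5)    # identical stimuli
p_second = prob_choose_second(0.2, 0.8, 0.5, 0.5)   # second is stronger
print(p_equal, p_second)
```

When the two accurate representations are equal, the model predicts chance-level responding; the probability rises above 0.5 as the second representation exceeds the first.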
Linking Function
As the last step, we will connect the neural and behavioral model to ensure that
the behavioral decision process is informed by neural activation. For this purpose,
the distribution of single-trial neural activation in Equation 3.2 will substitute the
distribution of mental representation (i.e., Equation 3.3). We consider stimulus-wise
neural activation estimates ($\hat{\beta}_i$) as an internal representation of grating stimuli ($\psi_i$).
The predictions from the Naka-Rushton equation ($\beta_i$) substitute for the “accurate”
representation of the stimulus ($\bar{\psi}_i$). The neural representation $\hat{\beta}_i$ is normally distributed
with a mean $\beta_i$ and a standard deviation σβ; therefore, the standard deviation term
in the Thurstonian model (si) is replaced by σβ. Note that we use the same standard
deviation across all stimulus intensities (i.e., contrast levels).
Hence, the linking function can be expressed as follows:
$$\hat{\beta}_2 - \hat{\beta}_1 \sim N\left(\beta_2 - \beta_1, (\sqrt{2}\sigma_\beta)^2\right),$$
$$p = 1 - \Phi^{*}\left(0; \beta_2 - \beta_1, (\sqrt{2}\sigma_\beta)^2\right) = \int_0^\infty N\left(x; \beta_2 - \beta_1, (\sqrt{2}\sigma_\beta)^2\right) dx \tag{3.4}$$
This linking function implies that if the neural activation level of the second stimulus
is higher than that of the first stimulus, the response probability p increases, which
means that the response supporting the second stimulus becomes more likely.
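
This implication can be checked by standardizing Equation 3.4. The short derivation below introduces Φ for the standard normal cumulative distribution function, a symbol that is not used in the original text, and writes β1 and β2 for the Naka-Rushton predictions.

```latex
\begin{aligned}
p &= 1 - \Phi^{*}\!\left(0;\ \beta_2 - \beta_1,\ (\sqrt{2}\sigma_\beta)^2\right)
   = 1 - \Phi\!\left(\frac{0 - (\beta_2 - \beta_1)}{\sqrt{2}\sigma_\beta}\right) \\
  &= \Phi\!\left(\frac{\beta_2 - \beta_1}{\sqrt{2}\sigma_\beta}\right)
\end{aligned}
```

Since Φ is increasing and Φ(0) = 0.5, p exceeds 0.5 exactly when the predicted activation for the second stimulus is larger than that for the first, and p approaches 1 as the difference grows relative to the noise level.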
Likelihood Function
Given a matrix of presented contrast levels $C_{k \times 2} = [c_{ij}]$, a matrix of single-trial
neural activation estimates $B_{k \times 2} = [\hat{\beta}_{ij}]$, and a behavioral response vector $\mathbf{y} = [y_i]$
(i = 1, · · · , k, j = 1, 2), the likelihood function of the joint model is described as
$$\beta_{ij} = b + R_{\max} \frac{c_{ij}^2}{c_{50}^2 + c_{ij}^2},$$
$$L(b, R_{\max}, c_{50}, \delta \mid C, B, \mathbf{y}) = \prod_{i=1}^{k} \left[ \prod_{j=1}^{2} N\left(\hat{\beta}_{ij}; \beta_{ij}, \left(\frac{\delta}{\sqrt{2}}\right)^2\right) \times \left\{1 - \Phi^{*}(0; \beta_{i2} - \beta_{i1}, \delta^2)\right\}^{y_i} \times \Phi^{*}(0; \beta_{i2} - \beta_{i1}, \delta^2)^{1 - y_i} \right] \tag{3.5}$$
where $\beta_{ij}$ and $\hat{\beta}_{ij}$ are the prediction and the actual observation of single-trial neural
activation for the j-th stimulus of the i-th trial, and k is the total number of trials.
Note that the standard deviation of the difference distribution (and therefore that
of the single-trial neural activation distribution) is reparameterized (i.e., $\delta = \sqrt{2}\sigma_\beta$).
This reparameterization is to stress that the variability in the difference distribution
explains the variability in a behavioral response.
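
A log-scale version of Equation 3.5 can be sketched as follows. The function names, the data layout (lists of pairs), and the small clamp on p are assumptions made for illustration; the clamp is only a numerical safeguard against taking the logarithm of zero and is not part of the model.

```python
import math

def naka_rushton(c, b, r_max, c50):
    return b + r_max * c ** 2 / (c50 ** 2 + c ** 2)

def normal_logpdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)

def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def log_likelihood(params, contrasts, beta_hats, responses):
    """Log of Equation 3.5; y = 1 means the second stimulus was chosen."""
    b, r_max, c50, delta = params
    sd_beta = delta / math.sqrt(2.0)
    total = 0.0
    for (c1, c2), (obs1, obs2), y in zip(contrasts, beta_hats, responses):
        pred1 = naka_rushton(c1, b, r_max, c50)
        pred2 = naka_rushton(c2, b, r_max, c50)
        # Neural part: one normal density per stimulus.
        total += normal_logpdf(obs1, pred1, sd_beta)
        total += normal_logpdf(obs2, pred2, sd_beta)
        # Behavioral part: Bernoulli likelihood of the response.
        p = 1.0 - normal_cdf(0.0, pred2 - pred1, delta)
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

true_params = (0.05, 1.0, 0.35, 0.2)
contrasts = [(0.1, 0.6), (0.6, 0.1)]
beta_hats = [tuple(naka_rushton(c, 0.05, 1.0, 0.35) for c in pair)
             for pair in contrasts]
responses = [1, 0]
ll_true = log_likelihood(true_params, contrasts, beta_hats, responses)
ll_shifted = log_likelihood((1.05, 1.0, 0.35, 0.2), contrasts, beta_hats, responses)
print(ll_true, ll_shifted)
```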
Figure 3.2 presents a diagram of the joint model developed throughout this sec-
tion. Each node in the diagram represents a variable. Shaded circles are observed
data (i.e., stimulus-wise neural activation estimates, behavioral responses), whereas
empty circles are model parameters (i.e., b, Rmax, c50, δ), a design variable (i.e., cij),
and their transformation (i.e., βij, pi). Double-border and single-border circles repre-
sent deterministic and stochastic variables, respectively. The outer and inner plates
represent variables associated with each trial and stimulus, respectively.
3.2 Design and Procedure
Three simulations were performed to test the performance of ADO in a simulated
environment. In each simulation, an experiment collects the data for estimating
parameters of the joint model introduced in 3.1.2:
$$\hat{\beta}_{ij} \sim N\left(\beta_{ij}, \left(\frac{\delta}{\sqrt{2}}\right)^2\right) \quad \text{where } \beta_{ij} = b + R_{\max} \frac{c_{ij}^2}{c_{ij}^2 + c_{50}^2},$$
$$y_i \sim \mathrm{Bernoulli}\left(1 - \Phi^{*}(0; \beta_{i2} - \beta_{i1}, \delta^2)\right). \tag{3.6}$$
Here i = 1, · · · , 20 is the trial index, and j = 1, 2 is the stimulus order. (b, Rmax, c50, δ)
and cij ∈ (0, 1] refer to a parameter set and a contrast level used for generating the
prediction of a neural response $\beta_{ij}$ from the Naka-Rushton equation. $\hat{\beta}_{ij}$ is the
single-trial neural activation level estimated from a raw BOLD time-series, and yi is
a behavioral response. $\Phi^{*}(x; \mu, \sigma^2)$ is a cumulative distribution function of a normal
distribution with mean µ and standard deviation σ.
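
The data-generating process of Equation 3.6 can be simulated for a single trial as sketched below. The function names and the random seed are hypothetical, and the parameter values are the “true” parameters of Simulation 1.

```python
import math
import random

def naka_rushton(c, b, r_max, c50):
    return b + r_max * c ** 2 / (c50 ** 2 + c ** 2)

def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def simulate_trial(c1, c2, params, rng):
    """Generate two noisy neural estimates and one binary response."""
    b, r_max, c50, delta = params
    predictions = [naka_rushton(c, b, r_max, c50) for c in (c1, c2)]
    # Single-trial estimates scatter around the predictions with sd delta/sqrt(2).
    beta_hats = [rng.gauss(mean, delta / math.sqrt(2.0)) for mean in predictions]
    # The response probability depends on the predicted difference and delta.
    p = 1.0 - normal_cdf(0.0, predictions[1] - predictions[0], delta)
    y = 1 if rng.random() < p else 0
    return beta_hats, y

rng = random.Random(0)
params = (0.05, 1.0, 0.35, 0.2)
trials = [simulate_trial(0.1, 0.6, params, rng) for _ in range(2000)]
proportion_second = sum(y for _, y in trials) / len(trials)
print(proportion_second)
```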
Figure 3.2: A graphical representation of the joint model for contrast discrimination. Each node in the model represents a variable. Filled circles are observed data (i.e., stimulus-wise neural activation estimates, behavioral responses), whereas empty circles are model parameters (i.e., b, Rmax, c50, δ), a design variable (i.e., cij), and their transformation (i.e., βij, pi). Double-line circles are deterministic variables, whereas single-line circles are stochastic variables. The outer and inner plates represent variables associated with each trial (i = 1, · · · , T) and stimulus (j = 1, 2), respectively.
The parameters of interest are three shape parameters of the Naka-Rushton equa-
tion (i.e., b, Rmax, c50) and a standard deviation of the difference distribution in the
Thurstonian decision model (i.e., δ). We compare the performance of ADO-based
experiments and randomized-design based experiments in terms of accuracy of pa-
rameter estimation and precision of the posterior distribution.
The first simulation tests the performance of ADO in terms of accuracy and preci-
sion of parameter estimation with a fixed “true” parameter set. The second simulation
extends the first simulation by using randomly generated parameter sets so that the
result of the first experiment can be generalized to any parameter settings. The third
simulation validates one-trial-lag ADO discussed in Section 2.7.3.
In the simulations, we assume that stimulus-level beta estimates are already ac-
quired from averaged BOLD responses across voxels in the ROI. This assumption
is made to control the randomness of data generation processes only by the Naka-
Rushton equation.
3.2.1 Simulation 1: Fixed Parameters
The purpose of the first simulation is to test the performance of fMRI-based ADO
with a fixed set of “true” parameters. The target parameter set is (b, Rmax, c50, δ) =
(0.05, 1, 0.35, 0.2). Figure 3.3 describes the shape of the Naka-Rushton equation and
its 95% credible interval from the given parameter set. Considering the neural vari-
ability imposed by a normal distribution (Equation 3.4), we expect that 95% of the
single-trial beta estimates at a specific contrast level are located within the 95%
credible interval.
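
As a rough check of this expectation, the sketch below computes a central 95% interval around the Naka-Rushton prediction. It assumes that the interval is the mean plus or minus 1.96 standard deviations of a normal distribution with standard deviation δ/√2; how the interval in Figure 3.3 was actually computed is not stated, so this is an assumption, as is the function name.

```python
import math

def activation_interval(c, b, r_max, c50, delta, z=1.96):
    """Central interval of single-trial estimates at contrast level c."""
    mean = b + r_max * c ** 2 / (c50 ** 2 + c ** 2)
    half_width = z * delta / math.sqrt(2.0)
    return mean - half_width, mean + half_width

lower, upper = activation_interval(0.35, b=0.05, r_max=1.0, c50=0.35, delta=0.2)
print(lower, upper)
```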
Figure 3.3: The shape of the Naka-Rushton equation with (b, Rmax, c50, δ) = (0.05, 1, 0.35, 0.2). The x-axis refers to the contrast level, while the y-axis is expected neural activation (i.e., single-trial beta estimates). The solid line is the expected neural activation (mean activation function) from the three shape parameters (b, Rmax, c50). b and Rmax determine the lower and upper asymptotes of the graph, whereas c50 affects the slope of the graph. δ controls the width of the 95% credible interval.
Variable                                  Details
The number of experiments                 100
The number of trials                      20
Stimulus (rounded to 3 decimal places)    0.010, 0.017, 0.028, 0.046, 0.077, 0.129, 0.215, 0.359, 0.599, 1.000
Model parameters                          (b, Rmax, c50, δ) = (0.05, 1, 0.35, 0.2)
Prior
    b                                     (-3, 5)
    Rmax                                  (-3, 5)
    c50                                   (0, 1)
    δ                                     (0.0001, 5)
Initial grid setting
    b                                     -1, -0.7, -0.4, -0.1, 0.2
    Rmax                                  0.8, 0.9, 1, 1.1, 1.2
    c50                                   0.25, 0.3625, 0.475, 0.5875, 0.7
    δ                                     0.25, 0.4375, 0.625, 0.8125, 1
    Neural response                       0.00, 0.11, 0.22, 0.33, 0.44, 0.56, 0.67, 0.78, 0.89, 1.00
Grid size
    Design space                          90 = 10^2 − 10
    Parameter space                       625 = 5^4
    Response space                        50 = 5^2 × 2
DE-MCMC
    Chains                                24
    Burn-in samples                       200
    Valid posterior samples               800
    Migration probability                 0.1
Dynamic gridding
    Method                                Singular value decomposition
    Schedule                              After every trial
    Percentile                            (20%, 35%, 50%, 65%, 80%)
Table 3.1: Default settings of Simulation 1.
Table 3.1 summarizes the default settings for Simulation 1. We performed the
contrast discrimination experiment 100 times. Each experiment consists of 20 trials.
All parameters have uniform prior distributions as follows for evaluating posterior
densities and global utilities:
b ∼ U(−3, 5),
Rmax ∼ U(−3, 5),
c50 ∼ U(0, 1),
δ ∼ U(0.0001, 5).
Although we could implement different prior settings such as (truncated) normal
distributions for (b, Rmax, c50) and an inverse gamma distribution for δ, uniform priors
allow faster computation.
Ten contrast levels are defined and used in the experiment by logarithmically spac-
ing the interval [0.01, 1]. The size of the design space is the total number of contrast
combinations within a trial (i.e., 100 grid points); however, we exclude designs that
allow the two stimuli to have the same contrast level (i.e., 10 grid points). Therefore,
the total size of the design space is 90 grid points. Figure 3.4 illustrates the design
space represented in a linear scale (left) and a logarithmic scale (right).
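
The construction of the design space can be reproduced with a few lines of code. The variable names are hypothetical; the ten contrast levels and the exclusion of equal-contrast pairs follow the description above.

```python
import itertools

# Ten contrast levels spaced logarithmically on [0.01, 1].
contrasts = [10 ** (-2 + 2 * k / 9) for k in range(10)]

# All ordered pairs of (first, second) contrast with different levels.
designs = [(c1, c2)
           for c1, c2 in itertools.product(contrasts, repeat=2)
           if c1 != c2]

print([round(c, 3) for c in contrasts])
print(len(designs))
```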
Initial grid points in the four-dimensional parameter space represent the initialization
of the joint posterior distribution of parameters. Each parameter dimension
(i.e., b, Rmax, c50, δ) has five grid points, totaling 625 grid points. Grid points in
each dimension are evenly spaced given the initial settings of the minimum and the
maximum (i.e., b ∈ [−1, 0.2], Rmax ∈ [0.8, 1.2], c50 ∈ [0.25, 0.7], δ ∈ [0.25, 1]). At the
end of every trial, a dynamic gridding procedure adaptively adjusts the distribution
of the parameter grid points.
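
The initial parameter grid can be built as the Cartesian product of five evenly spaced values per dimension. The helper `linspace` and the dictionary layout are hypothetical; the ranges are the initial minimum and maximum settings given above.

```python
import itertools

def linspace(low, high, n):
    """n evenly spaced values from low to high, inclusive."""
    return [low + k * (high - low) / (n - 1) for k in range(n)]

ranges = {
    "b": (-1.0, 0.2),
    "r_max": (0.8, 1.2),
    "c50": (0.25, 0.7),
    "delta": (0.25, 1.0),
}
axes = [linspace(low, high, 5) for low, high in ranges.values()]
parameter_grid = list(itertools.product(*axes))   # one tuple per grid point
print(len(parameter_grid))
```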
Figure 3.4: The design space of the contrast discrimination experiment. The space consists of 90 pairs of contrast levels for the first (x-axis) and second (y-axis) stimuli. The gray dots are individual designs that can be sampled during the experiment. The left and right plots represent the same design space in a linear scale and a logarithmic scale, respectively.
A three-dimensional response space defines grid points for expected neural and
behavioral responses. The first two dimensions represent the expected neural activa-
tion (i.e., single-trial beta estimates) evoked by the first and second grating stimuli;
the third dimension represents the expected behavioral response. For each neural re-
sponse dimension, ten grid points were set by evenly dividing the interval [0, 1]. The
behavioral response dimension consists of two grid points (i.e., 0 and 1) as we assume
that the behavioral responses are Bernoulli trials.
3.2.2 Simulation 2: Randomly Generated Parameters
The second simulation is designed to test whether or not the performance of
ADO remains stable with various combinations of the “true” parameters. Thirty sets
of parameters were generated by uniformly sampling from b ∈ [−0.2, 0.5], Rmax ∈
[1.0, 2.2], c50 ∈ [0.1, 0.6] and δ ∈ [0.2, 0.6]. As the performance of ADO may depend
on not only the true parameters but also randomly generated responses, we use each
parameter set 10 times to include the variability in the data generation process.
Therefore, Simulation 2 consists of 300 experiments in total. Table 3.2 and Figure
3.5 present a complete list and graphical illustrations of the parameter sets used in
Simulation 2.
We changed the initial grid settings for the parameter space by adjusting the
minimum and maximum of grid points for each dimension: b ∈ [−2, 2], Rmax ∈ [0.5, 3],
c50 ∈ [0.05, 0.95], δ ∈ [0.001, 1.2]. The grid space for expected neural responses was
also updated by evenly dividing the interval [0, 2] into 10 grid points. All other
settings are the same as in Simulation 1. Table 3.3 summarizes the
default settings in Simulation 2.
Figure 3.5: The scatter plots of the parameter sets used in Simulation 2. Black dots
in each plot indicate the values of (b, Rmax) (upper left), (b, c50) (upper center),
(b, δ) (upper right), (Rmax, c50) (lower left), (Rmax, δ) (lower center), and
(c50, δ) (lower right).
Set       b    Rmax     c50       δ      Set       b    Rmax     c50       δ
  1  -0.654   1.551   0.423   0.487       16  -1.028   2.347   0.447   0.255
  2  -0.690   1.020   0.791   0.979       17   0.015   1.892   0.522   0.225
  3  -0.857   1.444   0.794   0.658       18  -1.072   1.781   0.178   0.812
  4   0.094   2.363   0.594   0.359       19  -1.068   1.719   0.644   0.390
  5   0.096   1.112   0.659   0.433       20  -0.152   1.284   0.471   0.992
  6  -0.469   1.167   0.509   0.327       21  -0.212   1.709   0.624   0.901
  7  -0.904   1.991   0.333   0.850       22  -0.164   1.240   0.727   0.918
  8   0.342   1.794   0.556   0.395       23  -1.015   1.086   0.620   0.835
  9  -1.025   1.310   0.168   0.606       24  -0.579   2.063   0.283   0.770
 10  -1.344   2.184   0.301   0.400       25   0.071   1.814   0.300   0.348
 11  -0.372   2.293   0.796   0.347       26  -1.284   1.420   0.869   0.306
 12  -1.179   1.711   0.207   0.868       27  -0.849   1.625   0.577   0.390
 13   0.009   2.352   0.713   0.610       28  -0.704   1.600   0.879   0.594
 14  -1.145   2.008   0.387   0.736       29   0.172   1.039   0.429   0.223
 15  -1.378   2.063   0.774   0.812       30  -0.353   1.321   0.432   0.651

Table 3.2: A list of 30 parameter sets used in Simulation 2. Parameter values are
rounded up to three decimal places.
3.2.3 Simulation 3: One-trial-lag ADO
The third simulation aims to validate the practicability of one-trial-lag ADO. As
described in Section 2.7.3, the experiment starts with an ADO proposal derived from
the prior distribution. However, the second trial uses a randomly generated design
because of the time required to estimate the single-trial neural activation from the
first trial. At the end of trial t (t ≥ 2), an optimal design for the (t + 1)-th trial
is proposed using the single-trial beta estimates and behavioral responses obtained up
to the (t−1)-th trial. Figure 2.5 visualizes the one-trial-lag ADO procedure
used in Simulation 3.
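The lagged schedule described above can be made explicit with a small helper. This is only an illustrative sketch: the function name and its return convention are hypothetical, and it encodes nothing beyond the three rules stated in the text.

```python
def design_source(trial):
    """Describe how the design for a given (1-indexed) trial is chosen.

    Returns a tuple (method, last_trial_used), where last_trial_used is the
    most recent trial whose data inform the proposal (0 means prior only,
    None means no model-based proposal is made).
    """
    if trial == 1:
        return ("ado", 0)        # proposal derived from the prior
    if trial == 2:
        return ("random", None)  # trial-1 estimates are not ready yet
    # The design for trial t + 1 (t >= 2) uses data up to trial t - 1,
    # i.e., the design for trial n uses data up to trial n - 2.
    return ("ado", trial - 2)


for trial in range(1, 6):
    print(trial, design_source(trial))
```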
Variable                                       Details
The number of experiments                      10 per parameter set
The number of trials                           20
Stimulus (rounded to 3 decimal places)         0.010, 0.017, 0.028, 0.046, 0.077,
                                               0.129, 0.215, 0.359, 0.599, 1.000
Prior                   b                      (-3, 5)
                        Rmax                   (-3, 5)
                        c50                    (0, 1)
                        δ                      (0.0001, 5)
Initial grid settings   b                      -2, -1, 0, 1, 2
                        Rmax                   0.5, 1.125, 1.75, 2.375, 3
                        c50                    0.05, 0.275, 0.5, 0.725, 0.95
                        δ                      0.001, 0.30075, 0.6005, 0.90025, 1.2
                        Neural response        0, 0.22, 0.44, 0.67, 0.89,
                                               1.11, 1.33, 1.56, 1.78, 2
Grid size               Design space           90 = 10^2 − 10
                        Parameter space        625 = 5^4
                        Response space         50 = 5^2 × 2
DE-MCMC                 Chains                 24
                        Burn-in samples        200
                        Valid posterior samples  800
                        Migration probability  0.1
Dynamic gridding        Method                 Singular value decomposition
                        Schedule               After every trial
                        Percentile             (20%, 35%, 50%, 65%, 80%)

Table 3.3: Default settings in Simulation 2
The basic settings, including the “true” parameter sets, follow Simulation 2 except
for the schedule of dynamic gridding. In Simulation 3, we apply dynamic gridding only
after the 4th, 8th, 12th, and 16th trials because the limited amount of data in
one-trial-lag optimization might misguide grid updating. In the one-trial-lag ADO
setting, the grid updating procedure does not exploit all the available data because
it relies on the joint posterior distribution estimated from data that lack the latest
trial. Updating grid points every trial is risky because we cannot rule out the
possibility that the “deficient” posterior distribution does not include the “true”
parameters. A sparse dynamic gridding schedule is a reasonable choice here because it
forces the grid-updating algorithm to wait until a sufficient amount of data is
collected.
3.2.4 Procedure
At each trial, two single-trial beta estimates and the corresponding behavioral choice
response were generated from the joint model discussed in Section 3.1.2 (Equation
3.6). The design was selected by the ADO proposal in the ADO-based experiments, and by
uniform sampling without replacement from the design space (Figure 3.4) in the
randomized-design experiments.
Once the neural and behavioral responses were acquired, posterior distributions
were updated using the DE-MCMC sampler (B. M. Turner, Sederberg, et al., 2013).
The 24 chains were initialized by multinomial sampling from the parameter grid points,
with probabilities proportional to the posterior densities of the grid points. We ran
the algorithm for 1,000 iterations and discarded the first 200 as burn-in. Migration
was applied during the first 101 iterations with a probability of 0.1.
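The chain initialization step can be sketched as follows. This is a minimal illustration rather than the original implementation: the function name `init_chains` is hypothetical, the densities in the toy example are random placeholders, and we assume sampling with replacement so that several chains may start from the same grid point.

```python
import numpy as np

def init_chains(grid_points, grid_density, n_chains=24, seed=None):
    """Draw starting values for the chains from the parameter grid.

    grid_points : (n_grid, n_params) array of grid coordinates.
    grid_density: (n_grid,) non-negative (unnormalized) posterior densities.
    """
    rng = np.random.default_rng(seed)
    probs = np.asarray(grid_density, dtype=float)
    probs = probs / probs.sum()          # normalize to sampling weights
    # Sampling with replacement: several chains may share a grid point.
    idx = rng.choice(len(grid_points), size=n_chains, replace=True, p=probs)
    return np.asarray(grid_points)[idx]


# Toy example: 625 grid points with placeholder densities.
rng = np.random.default_rng(0)
grid = rng.uniform(size=(625, 4))
density = rng.uniform(size=625)
starts = init_chains(grid, density, seed=1)
print(starts.shape)  # (24, 4)
```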
Dynamic gridding adjusted the parameter grid space according to its predetermined
schedule. The estimated joint posterior distribution was rotated to a
four-dimensional orthogonal space by multiplying it by a rotation matrix obtained by
singular value decomposition. New grid points were defined at the 20th, 35th, 50th,
65th, and 80th percentiles of the marginal distribution of each dimension. The newly
constructed grid points in the orthogonal space were mapped back to the original
parameter space by multiplying them by the inverse rotation matrix.
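The three steps of dynamic gridding (rotate, take percentiles, rotate back) can be sketched with NumPy. This is a sketch under stated assumptions, not the original code: the function name `dynamic_grid` is hypothetical, and centering the samples before the decomposition is our choice, since the text does not say whether the samples were centered.

```python
import itertools
import numpy as np

def dynamic_grid(samples, percentiles=(20, 35, 50, 65, 80)):
    """Rebuild a parameter grid from posterior samples.

    samples: (n_samples, n_params) array of posterior draws.
    Returns an array with len(percentiles) ** n_params rows.
    """
    samples = np.asarray(samples, dtype=float)
    center = samples.mean(axis=0)            # assumption: center first
    centered = samples - center

    # Rows of vt are orthonormal directions; vt.T rotates into that basis.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    rotated = centered @ vt.T

    # Percentiles of each marginal distribution in the rotated space.
    axes = np.percentile(rotated, percentiles, axis=0).T
    grid_rotated = np.array(list(itertools.product(*axes)))

    # vt is orthogonal, so the inverse rotation is multiplication by vt.
    return grid_rotated @ vt + center


# Toy posterior: 800 draws around an arbitrary parameter vector.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(
    mean=[0.0, 1.0, 0.4, 0.5], cov=0.01 * np.eye(4), size=800
)
new_grid = dynamic_grid(samples)
print(new_grid.shape)  # (625, 4)
```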
We defined measures of accuracy and precision of posterior estimates by root mean
square error (RMSE) and standard deviation (PSD) of the posterior distribution.
Specifically, parameter-wise performance measures (RMSEi,t and PSDi,t) and pooled
performance measures (RMSEpooled,t and PSDpooled,t) were calculated at each trial t
as follows:
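\[
\begin{aligned}
\mathrm{RMSE}_{i,t} &= \sqrt{\frac{\sum_{k=201}^{1000} (x_{ijkt} - \theta_i)^2}{800}}, \\
\mathrm{RMSE}_{\mathrm{pooled},t} &= \sqrt{\frac{1}{4} \sum_{i=1}^{4} \mathrm{RMSE}_{i,t}^{2}}, \\
\mathrm{PSD}_{i,t} &= \sqrt{\frac{\sum_{k=201}^{1000} (x_{ijkt} - \bar{x}_{i\cdot})^2}{800}}, \\
\mathrm{PSD}_{\mathrm{pooled},t} &= \sqrt{\frac{1}{4} \sum_{i=1}^{4} \mathrm{PSD}_{i,t}^{2}}
\end{aligned}
\tag{3.7}
\]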
where θ = (θ1, θ2, θ3, θ4) ≡ (b, Rmax, c50, δ) is a set of “true” parameters assumed in
each simulation, and xijkt is a value of the j-th chain of the DE-MCMC sampler for
the parameter θi at the k-th iteration.
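The performance measures can be computed directly from the retained posterior samples. The sketch below follows the structure of Equation 3.7 but treats the samples of one parameter as a flat array; how the 24 chains are combined is not spelled out in the text, so that part is left to the caller. The function names and the toy numbers are ours.

```python
import numpy as np

def rmse(samples, true_value):
    """Root mean square error of posterior samples around the true value."""
    samples = np.asarray(samples, dtype=float)
    return np.sqrt(np.mean((samples - true_value) ** 2))

def psd(samples):
    """Posterior standard deviation (divides by N, as in Equation 3.7)."""
    samples = np.asarray(samples, dtype=float)
    return np.sqrt(np.mean((samples - samples.mean()) ** 2))

def pooled(values):
    """Root mean square of the parameter-wise measures."""
    values = np.asarray(values, dtype=float)
    return np.sqrt(np.mean(values ** 2))

# Toy post-burn-in samples for two parameters (hypothetical numbers).
draws = {"b": [0.0, 0.2, 0.4], "Rmax": [1.0, 1.0, 1.3]}
truth = {"b": 0.2, "Rmax": 1.0}
rmse_i = [rmse(draws[name], truth[name]) for name in draws]
psd_i = [psd(draws[name]) for name in draws]
print(pooled(rmse_i), pooled(psd_i))
```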
3.3 Results
3.3.1 Simulation 1: Fixed Parameters
Figures 3.6 and 3.7 show the performance measures (i.e., RMSE, PSD) pooled
across parameters (Figure 3.6) and computed for each parameter (Figure 3.7), re-
spectively. In both plots, red lines refer to the performance measures from the ADO
experiments (solid lines) and their 95% credible interval (dotted lines), whereas black
lines refer to those from the randomized design experiments (solid lines) and their 95%
credible interval (dotted lines). Lower values are preferred in both RMSE and PSD
because smaller RMSE and PSD mean higher accuracy and precision in parameter
estimation.
Figure 3.6: Pooled root mean squared error (RMSEpooled; upper) and pooled posterior
standard deviation (PSDpooled; lower) in Simulation 1. All the performance statistics
(i.e., RMSEpooled, PSDpooled) are log-transformed. Red and black lines show the
performance statistics from ADO designs and randomized designs, respectively. Solid
lines represent the mean of the performance measures changing across trials. Dotted
lines represent 95% credible interval of the performance measures.

Figure 3.7: Root mean squared error (RMSEi; left) and posterior standard deviation
(PSDi; right) for each parameter (b, Rmax, c50, δ from top to bottom) in Simulation
1. All the performance statistics (i.e., RMSEi, PSDi) are log-transformed. Red
and black lines show the performance statistics from ADO designs and randomized
designs, respectively. Solid lines represent the mean of the performance measures
changing across trials. Dotted lines represent 95% credible interval of the
performance measures.
Figure 3.6 shows that ADO performs better than randomized designs in both
accuracy (RMSE) and precision (PSD). Note that the upper bound of the 95% credible
interval of the performance of ADO is overall close to, or even lower than, the mean
performance of the randomized-design experiments. This result suggests that ADO
experiments mostly show better accuracy and precision than the mean-level
performance of the randomized designs.
Figure 3.7 illustrates how differently ADO handles the model parameters by showing
parameter-wise performance measures. The performances of ADO and randomized
designs do not differ noticeably for b and δ. For b in particular, ADO shows a level
of accuracy and precision similar to that of randomized designs for the first 4-5
trials, but is slightly outperformed by randomized designs from the 6th trial. In
contrast, ADO performs better than randomized designs for Rmax and c50.
The performance of ADO discussed above can be explained by the sequential
pattern of the design proposals. Figure 3.8 compares the designs proposed by ADO
(upper row) and randomized sampling (lower row). A 20-trial experiment is divided
into four subplots, each containing the design information of five trials (Trials 1-5,
6-10, 11-15, and 16-20 from left to right). The shades colored in red (ADO) and gray
(randomized designs) under each design point (black dots) represent how frequently
a specific design was selected. A design point has a darker shade if it is selected
more frequently.
In the randomized-design experiments (lower row), selected designs are distributed
nearly uniformly over the design space without any regularity, and this pattern does
not change as the experiments proceed. However, in the ADO experiments (upper
row), the distribution of selected designs is concentrated on a few points and this
pattern changes over trials. For example, during the first five trials (the leftmost
subplot on the upper row), ADO focuses on the lowest-contrast pairs (points in the
lower left of the subplot) or the highest-contrast pairs (points in the upper right
of the subplot). In fact, those designs are optimal selections for estimating b and
Rmax. However, ADO gradually moves its attention to the mid-range contrast values
to estimate c50. Note that the “mid-range” contrast values that ADO searches the
most (i.e., the most reddish shades in the two rightmost subplots) include the
contrast levels of 0.215 or 0.359, as they are the contrast values that are closest to
the predefined c50 = 0.35. At the end of the experiment, ADO continues selecting at
least one mid-level contrast value to collect more information about c50 while
frequently including either the lowest or the highest contrast level to improve the
estimation of b and Rmax.

Figure 3.8: A trace plot of experimental designs from ADO (upper row) and randomized
sampling (lower row) in Simulation 1. A sequence of 20 trials was segmented into
four intervals (Trials 1-5, 6-10, 11-15, and 16-20 from left to right). The x-axis
and y-axis of each subplot represent the contrast level of the first and second
stimuli, respectively. Black dots represent individual design points. Shaded regions
represent actually selected designs; more frequently selected designs have darker
shades. The scale is intentionally omitted for simplicity. See Figure 3.4 for the
detailed information about the scale.
3.3.2 Simulation 2: Randomly Generated Parameters
As in Section 3.3.1, Figures 3.9 and 3.10 show the performance measures (i.e., RMSE,
PSD) pooled across parameters and those computed for each parameter, respectively.
As the performance statistics from 300 simulations with different “true” parameter
sets are aggregated, the performances of ADO and randomized designs are less
differentiated than in Simulation 1. However, ADO-based experiments still tend to
perform better than randomized-design experiments in both mean accuracy (RMSE) and
precision (PSD). The parameter-wise performances follow a pattern similar to that in
Simulation 1: ADO outperforms randomized designs when estimating Rmax and c50, while
showing a level of performance similar to that of randomized designs for δ and being
overtaken for b.
Figure 3.9: Pooled root mean squared error (RMSEpooled; upper) and pooled posterior
standard deviation (PSDpooled; lower) in Simulation 2. All the performance statistics
(i.e., RMSEpooled, PSDpooled) are log-transformed. Red and black lines show the
performance statistics from ADO designs and randomized designs, respectively. Solid
lines represent the mean of the performance measures changing across trials. Dotted
lines represent 95% credible interval of the performance measures.

Figure 3.10: Root mean squared error (RMSEi; left) and posterior standard deviation
(PSDi; right) for each parameter (b, Rmax, c50, δ from top to bottom) in Simulation
2. All the performance statistics (i.e., RMSEi, PSDi) are log-transformed. Red
and black lines show the performance statistics from ADO designs and randomized
designs, respectively. Solid lines represent the mean of the performance measures
changing across trials. Dotted lines represent 95% credible interval of the
performance measures.

Figure 3.11: The scatter plot of log-transformed RMSE (left) and log-transformed
PSD (right) in Simulation 2. The x-axis and y-axis refer to the value of performance
statistics (i.e., RMSE, PSD) in the ADO experiments and randomized design
experiments. Each trial is color-coded for visual clarity (Red: Trial 2, Orange:
Trial 4, Green: Trial 8, Blue: Trial 13, Purple: Trial 20). Colored dots represent
the performance statistics from individual simulations. Solid lines represent the
80% highest density regions.
Because Simulation 2 includes 30 different “true” parameter sets, it is difficult to
show the trace of design proposals for all simulations. Instead, Figure 3.11
illustrates the results from the 300 simulations individually as a scatter plot. For
each experiment, we compared the log-transformed RMSE (left) and PSD (right) obtained
from the ADO experiment (x-axis) and from the randomized-design experiment (y-axis).
Each trial is color-coded to differentiate the performance measures at different
trials: dots colored in red, orange, green, blue, and purple represent the results
after the 2nd, 4th, 8th, 13th, and 20th trials, respectively. Solid lines represent
80% credible regions obtained by two-dimensional kernel density estimation. Points
located in the shaded region are preferred because the performance measures obtained
from ADO are smaller than those from randomized designs.
Figure 3.11 shows that more points are located within the shaded area, especially
when the experiment is still in its earlier trials (i.e., the 4th and 8th trials). The
performance of the randomized-design experiments improves as the experiment proceeds,
and finally becomes similar to that of ADO at the end of the experiment (i.e.,
purple points are located near the gray dashed identity line). This result is expected
because the accuracy and precision of parameter estimation will improve as long as we
keep collecting data, regardless of how optimal the designs are. However, it is
important that ADO guides data collection more efficiently during the first few
trials.
Figure 3.12 depicts the proportion of the experiments in which ADO performs better
than randomized designs across trials (i.e., the proportion of the points located in
the shaded area in Figure 3.11 for each trial). The performance of ADO reaches its
peak around the third or fourth trial, and starts to decrease gradually after that.
However, more than half of the ADO experiments perform better than the
randomized-design experiments even at the 20th trial.

Figure 3.12: The proportion of the experiments in which the performance of ADO
exceeds that of randomized designs in Simulation 2. Points located at the shaded area
are preferred. The accuracy and precision at each trial are represented as a red
circle and a blue square, respectively.
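The proportion plotted in Figure 3.12 can be obtained by comparing the paired performance measures trial by trial. The sketch below assumes that each ADO experiment is paired with one randomized-design experiment (as in the scatter plot of Figure 3.11) and that lower values win; the function name and the toy values are hypothetical.

```python
import numpy as np

def proportion_ado_wins(ado, randomized):
    """Per-trial share of experiments in which ADO has the lower measure.

    ado, randomized: (n_experiments, n_trials) arrays of RMSE or PSD values.
    """
    ado = np.asarray(ado, dtype=float)
    randomized = np.asarray(randomized, dtype=float)
    return np.mean(ado < randomized, axis=0)

# Toy example: 4 experiments x 3 trials (hypothetical values).
ado = [[0.5, 0.4, 0.3], [0.6, 0.2, 0.3], [0.9, 0.5, 0.2], [0.4, 0.6, 0.4]]
rnd = [[0.7, 0.5, 0.3], [0.5, 0.4, 0.2], [1.0, 0.6, 0.4], [0.6, 0.5, 0.5]]
print(proportion_ado_wins(ado, rnd))
```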
3.3.3 Simulation 3: One-trial-lag ADO
Figures 3.13 and 3.14 show the performance measures (i.e., RMSE, PSD) pooled
across parameters and those computed for individual parameters, respectively. In
short, the performance of ADO shows patterns similar to those in Simulation 2 (Figures
3.9 and 3.10): ADO outperforms randomized designs mainly because of its selective
optimization of Rmax and c50.
Figures 3.15 and 3.16 summarize the results of 300 simulations with different
“true” parameters in Simulation 3. The result is similar to that from Simulation
2: the performance of ADO reaches its peak around the third trial, and more
than 70% of the ADO experiments outperform the randomized-design experiments
for the next 4-5 trials. Although the proportion of the experiments in which ADO
shows better performance gradually decreases as the number of trials increases, more
than half of the ADO experiments still show better performance than the
randomized-design experiments at the 20th trial. To summarize, the result of
Simulation 3 reveals that the performance of one-trial-lag ADO is comparable to that
of the ideal ADO implementation without lagging.
Figure 3.13: Pooled root mean squared error (RMSEpooled; upper) and pooled posterior
standard deviation (PSDpooled; lower) in Simulation 3. All the performance
statistics (i.e., RMSEpooled, PSDpooled) are log-transformed. Red and black lines
show the performance statistics from ADO designs and randomized designs,
respectively. Solid lines represent the mean of the performance measures changing
across trials. Dotted lines represent 95% credible interval of the performance
measures.

Figure 3.14: Root mean squared error (RMSEi; left) and posterior standard deviation
(PSDi; right) for each parameter (b, Rmax, c50, δ from top to bottom) in Simulation
3. All the performance statistics (i.e., RMSEi, PSDi) are log-transformed. Red
and black lines show the performance statistics from ADO designs and randomized
designs, respectively. Solid lines represent the mean of the performance measures
changing across trials. Dotted lines represent 95% credible interval of the
performance measures.

Figure 3.15: The scatter plot of log-transformed RMSE (left) and log-transformed
PSD (right) in Simulation 3. The x-axis and y-axis refer to the value of performance
statistics (i.e., RMSE, PSD) in the ADO experiments and randomized design
experiments. Each trial is color-coded for visual clarity (Red: Trial 2, Orange:
Trial 4, Green: Trial 8, Blue: Trial 13, Purple: Trial 20). Colored dots represent
the performance statistics from individual simulations. Solid lines represent the
80% highest density regions.

Figure 3.16: The proportion of the experiments in which the performance of ADO
exceeds that of randomized designs in Simulation 3. Points located at the shaded area
are preferred. The accuracy and precision at each trial are represented as a red
circle and a blue square, respectively.

3.4 Discussion
In this section, we first introduced the contrast discrimination task for a
proof-of-concept, and developed a directed joint model combining the Naka-Rushton
equation and a Thurstonian decision model as its neural and behavioral submodels. We
then performed three simulation experiments to verify that ADO can generate design
proposals that maximize information about the model parameters by incorporating
neural and behavioral data in real time. For thorough verification, ADO was tested
at three levels of difficulty: (1) with a fixed “true” parameter set, (2) with
randomly generated “true” parameter sets, and (3) using one-trial-lag optimization.
Across all three simulations, ADO successfully proposed optimal designs and
showed better performance than randomized designs in both accuracy (i.e., RMSE)
and precision (i.e., PSD) because ADO trades off the importance of b and (Rmax, c50)
in its optimization procedures. In contrast to randomized-design experiments, ADO
decided that reducing uncertainty in Rmax and c50 is more “informative” than
focusing on b. This strategy seems reasonable because Rmax and c50 are inherently
less well constrained than b. By definition, b as a baseline parameter is implicitly
conditioned to have a value near zero. However, estimation of Rmax is more difficult
as there is no theoretical assumption on the upper bound for single-trial beta
estimates. The difficulty in estimating Rmax carries over to c50 as well because c50
needs Rmax (and b) to be constrained first for estimation. From this relationship
among model parameters, we can interpret that ADO prioritized efficient estimation of
Rmax (and therefore c50) while sacrificing accuracy and precision of b, as b is
easier to estimate than Rmax. In short, ADO’s selective focus on experimental designs
allows more accurate and precise parameter estimation.
The trace plot of the design proposals from ADO in Simulation 1 (Figure 3.8)
shows how ADO handled the tradeoff among b, Rmax, and c50. When the experiment
begins, ADO tries to acquire information about b and Rmax first by sampling the
lowest-contrast or highest-contrast designs. However, it gradually explores mid-range
contrast levels to constrain c50 and successfully identifies candidate contrast levels
that are most likely to be c50.
The results from Simulations 2 and 3 suggest that the advantage of ADO-based
experiments generalizes to various “true” parameter sets, a limited schedule of
dynamic gridding, and the practical constraint of one-trial-lag optimization. These
results offer practical implications for using fMRI-based ADO in the real world. For
example, by covering a diverse range of parameter sets, our results provide assurance
of a feasible level of performance even in real-world settings in which we do not
have any knowledge about the “true” parameter values. The similarity of the ADO
performances under the full and reduced dynamic gridding schedules saves both time
and computational resources because we do not need to estimate a full joint posterior
distribution every trial. Finally, one-trial-lag optimization helps us set
up more reasonable interstimulus/intertrial intervals in ADO-based experiments. If
the performance of one-trial-lag ADO were not comparable to that of no-lag ADO, it
would be better to wait until the neural activation level from a stimulus/trial becomes
fully estimable, spending 20 seconds or more between stimuli and trials.
Chapter 4: Real-time fMRI Experiment
In this chapter, we test the performance of fMRI-based ADO within a real-time
fMRI experiment. First, we describe how a grating stimulus was defined and gener-
ated in the experiment, and then discuss the designs of the functional localizer and
contrast discrimination tasks. Second, we explain the real-time fMRI procedures for
ADO-based runs in detail. Third, we discuss the methods used for evaluating the
performance of ADO. Lastly, we discuss the result of the real-world experiment.
4.1 Methods
4.1.1 Participants
Four people participated in the experiment. Each participant had three
two-hour sessions, each including 90 minutes of functional MR scanning. Two of the
four participants were female, and the mean age of the participants was 24.75 years.
All participants were recruited from The Ohio State University and provided informed
consent. The study was approved by the Institutional Review Board of The Ohio State
University.
4.1.2 Stimuli
All stimuli and instructions were generated by SMILE (State Machine Interface
Library for Experiments; http://smile-docs.readthedocs.io/en/latest/), a Python
library for programming psychological experiments, on a MacBook Pro 2016. Each
participant lay on the scanner bed and viewed the stimuli presented on a
rear-projection screen in the coil. Stimuli were presented at eye level at a
distance of 74 cm.
Each grating stimulus was generated with a spatial frequency of 3.06 cycles per
degree and shaped as an annulus so that the grating pattern was not presented at the
fovea. The radii of the external and internal circles were 14.52 degrees and 3.48
degrees of visual angle, respectively. In addition, a linear mask was applied to the
annulus to allow gradual changes in stimulus intensity, as depicted in Figure 4.1.
In Figure 4.1, the black line describes the shape of the mask: the stimulus intensity
increases from a distance of 1.74 degrees, reaches its maximum at a distance of 2.94
degrees, and fades gradually from a distance of 4.34 degrees from the center of the
screen. The red wavy line shows the actual grating pattern after the mask is applied.

Figure 4.1: An illustration of the linear mask applied to a grating pattern (x-axis:
distance from the center of the screen in degrees; y-axis: stimulus intensity). The
black line shows the shape of the mask, while the red line describes the masked
grating pattern obtained when crossing the center of the screen horizontally.

Figure 4.2: Examples of the grating stimuli used in the experiment. The contrast
levels of the five stimuli are 0.01, 0.03, 0.1, 0.3, and 1 (from left to right).
Contrast levels are defined in the interval [0, 1]. When the contrast level is 0, the
stimulus is completely flattened and shown as a gray plane. When the contrast level
is 1, the stimulus shows a fluctuating black-white stripe pattern. Figure 4.2 shows
examples of the grating stimuli used in the experiment. The contrast levels of the
five stimuli are 0.01, 0.03, 0.1, 0.3, and 1 from left to right. The figure illustrates that
the higher contrast level allows for better discriminability between the high-intensity
and low-intensity regions.
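The effect of the contrast level on a grating can be illustrated with a one-dimensional luminance profile. This is only a sketch: the sinusoidal profile, the 0-to-1 luminance scale, and the function name are assumptions made for illustration, and the annulus shape and linear mask are left out.

```python
import numpy as np

def grating_profile(contrast, cycles_per_degree=3.06, extent_deg=2.0, n=2001):
    """Luminance along a horizontal line, scaled so 0 = black, 1 = white."""
    x = np.linspace(-extent_deg / 2, extent_deg / 2, n)  # position in degrees
    # Mean gray (0.5) plus a sinusoid whose amplitude scales with contrast.
    return 0.5 + 0.5 * contrast * np.sin(2 * np.pi * cycles_per_degree * x)

for c in (0.0, 0.01, 0.1, 1.0):
    profile = grating_profile(c)
    print(c, round(float(profile.min()), 3), round(float(profile.max()), 3))
```

With a contrast of 0 the profile stays at the mean gray level, and with a contrast of 1 it spans the full range from black to white.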
4.1.3 Design
Main task: Contrast Discrimination
The design of the main task follows the description in Section 3.1.1 and Figure 3.1.
A participant was presented with two consecutive grating stimuli with different
contrast levels and asked to maintain fixation on a white “+” marker located at the
center of the screen. When the fixation marker changed to a response cue (a white “×”
marker), the participant was asked to indicate whether the first or the second
stimulus had the higher contrast. The participant was given two 2-button response
pads, one for each hand, and was instructed to use one button on each side to make a
response. The response-button association rule alternated every session. For example,
a participant was asked to use the button in the left box to respond that the first
stimulus had the higher contrast level in one session, and to use the button in the
right box to make the same response in the next session.
As in the simulation experiments, the contrast values are logarithmically spaced
with 10 levels (i.e., 0.010, 0.017, 0.028, 0.046, 0.077, 0.129, 0.215, 0.359, 0.599, 1.000).
We also restricted the design space such that no two stimuli had exactly the same
contrast (see Figure 3.4 for a graphical illustration of the design space). Each run
consisted of 20 trials. The design was randomly selected in the run without ADO,
while ADO proposed the optimal design in the ADO-based run after the first three
trials. The order of the run without ADO and the ADO-based run alternated every
session.
The difference between the runs without and with ADO is the length of the
intertrial interval. ADO requires time to calculate an optimal design at the end
of every trial, and to adjust the parameter grids after the 4th, 8th, 12th, and 16th
trials. Specifically, fMRI-based ADO in this experiment requires 6-8 seconds to
propose the optimal design and an additional 4-5 seconds for full posterior estimation
and grid adjustment. Therefore, 6 seconds of the mean intertrial interval used in
the randomized-design experiment was not enough for the ADO-based run. While
the intertrial interval of the run without ADO was either 6, 8, or 10 seconds, that
of the ADO-based run was extended by 4 seconds (i.e., 10, 12, or 14 seconds). The
total length of the run without ADO was 624 seconds. The length of the ADO-based
run varied slightly across sessions due to the computation time required for
ADO and its subcomponents (i.e., full posterior sampling, adaptive gridding), but
it took approximately 15 minutes.
Functional Localizer
Before running the main task, we ran a functional localizer task to detect the
voxels that reliably co-activate with the grating stimuli. The functional localizer task
was based on a continuous carry-over design (Aguirre, 2007) that controls the order
effect of the signal by considering all possible carry-over patterns from a stimulus pool.
As we can expect the order of stimuli to affect the neural activation pattern, the
continuous carry-over design can be used to detect voxels that share similar activation
patterns and carry-over effects.
The experiment using the continuous carry-over design uses a fixed stimulus
presentation order that realizes all possible configurations of carry-over patterns. Here,
we recommend making the stimulus presentation settings as similar as possible to those
of the main task. For example, we set the stimulus duration (6 seconds) and the mean
interstimulus interval (8 seconds) as they were in the main task. However, generating all
possible carry-over patterns from ten contrast levels made the task excessively long
and therefore could have caused problems such as participant fatigue and
scanner drift. Hence, we decided to use only five logarithmically spaced contrast
levels that could approximate the contrast levels used in the main task (i.e., 0.01, 0.03, 0.1,
0.3, 1). The total length of the functional localizer task was 528 seconds.
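To make the idea of realizing all carry-over patterns concrete, the sketch below builds a stimulus order in which every ordered pair of contrast levels (previous, current), including repeats, occurs exactly once. It does so by tracing an Eulerian circuit through the complete directed graph on the conditions. This is only one way to obtain such first-order counterbalancing and is not necessarily the sequence used in the experiment; see Aguirre (2007) for the actual construction. The function name is hypothetical.

```python
from itertools import product


def carryover_sequence(n_conditions):
    """Return condition indices in which every ordered pair (previous, current),
    including repeats, occurs exactly once (Hierholzer's algorithm)."""
    # Unused outgoing edges of each node in the complete digraph with self-loops.
    remaining = {a: list(range(n_conditions)) for a in range(n_conditions)}
    stack, circuit = [0], []
    while stack:
        node = stack[-1]
        if remaining[node]:
            stack.append(remaining[node].pop())
        else:
            circuit.append(stack.pop())
    return circuit[::-1]


levels = [0.01, 0.03, 0.1, 0.3, 1]
order = carryover_sequence(len(levels))
stimuli = [levels[i] for i in order]
transitions = list(zip(order, order[1:]))
print(len(order), len(transitions))
```

With n conditions the sequence contains n² transitions, which shows why ten levels (100 transitions) lead to a much longer task than five levels (25 transitions).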
In the task, the participant was instructed to press a button when the current
stimulus had the same contrast as the previous one while maintaining fixation
at the center of the screen. However, the behavioral task served no other function; it was
required only to help participants concentrate on the stimulus presentation.
4.1.4 Real-time fMRI Procedure
Preliminary Tasks
The participant went through a 30-minute briefing including informed consent,
safety screening, and a brief introduction about the experimental task. MRI scanning
was performed in the Center for Cognitive and Behavioral Brain Imaging at The Ohio
State University. A Siemens MAGNETOM Prisma 3T Magnetic Resonance Imaging
System was used with a 32-channel head coil.
First, the MPRAGE sequence was used to obtain the anatomical structure of
the brain (1 × 1 × 1 mm³ resolution, inversion time = 950 msec, repetition time =
1900 msec, echo time = 4.44 msec, flip angle = 12 degrees, matrix size = 256 × 224
mm, 176 sagittal slices per slab; scan time = 6.5 minutes). As we hoped to constrain
the ROI to the primary visual cortex (V1), the area to be scanned was then specified
by covering Brodmann area 17 and most of the occipital lobe with a T2*-weighted
EPI sequence (repetition time = 2000 msec, echo time = 28 msec, flip angle = 72
degrees, field of view = 200 × 200 mm, in-plane resolution = 2 × 2 mm, and 33 slices
with 2-mm thickness), which is referred to as the EPI space henceforth for simplicity.
All BOLD responses from the functional localizer task and the contrast discrimination
task were obtained using the EPI sequence with the same settings.
We should mention that further analyses (i.e., detecting voxels of interest, real-time
ADO computation, offline data analysis) used brain images without the preprocessing
steps that are usually performed in offline analyses, such as spatial and temporal
filtering, because these steps are time-consuming. The only exception is motion
correction: the MR scanner used in this experiment offers functionality for prospective
motion correction, a set of computational methods for reducing head motion artifacts
during data acquisition (for a recent review of prospective motion correction, see
Maclaren, Herbst, Speck, & Zaitsev, 2013).
Data preprocessing
We first carried out the functional localizer task to detect the voxels co-activating
with the presented grating stimuli. After the functional localizer task was complete,
the experimenter processed the anatomical data and the EPI localizer data for regis-
tration to the standard MNI space. However, the protocol encounters a compatibility
issue here because the MR scanner exports images as DICOM (Digital Imaging and
Communications in Medicine) files. Our further data preprocessing steps rely on FSL
(FMRIB Software Library; Smith et al., 2004), which requires images in the NIfTI-1
format. Therefore, we used a Python library, dcmstack
(http://dcmstack.readthedocs.io/en/v0.6.1/), to transform DICOM files into the NIfTI-1 format.
Once the images were reformatted, we registered the anatomical images in the
subject space to the standard MNI brain template with nonlinear warping using
FLIRT (Jenkinson, Bannister, Brady, & Smith, 2002; Jenkinson & Smith, 2001) and
FNIRT (Andersson & Jenkinson, 2007) in FSL. Next, we aligned the EPI localizer
images to the anatomical images using FLIRT. By using the linear and nonlinear
warping obtained from the previous steps, we converted the mask for Brodmann area
17 provided by the Jülich histological atlas (Amunts, Malikovic, Mohlberg, Schormann,
& Zilles, 2000; Eickhoff et al., 2005) to the EPI space. As these procedures usually
take more than 7 minutes due to nonlinear registration, we asked the participant to
practice the contrast discrimination task for (approximately) 6 minutes to learn the
response-button mapping rule.
Determination of Voxels of Interest
The functional localizer task must detect voxels whose activation patterns are
strongly associated with stimulus presentation in the task. To select target voxels
for the main task, we performed a GLM analysis on all voxels in the EPI space using
the data from the functional localizer task. The GLM design matrix used only one
regressor representing the hemodynamic responses caused by all stimuli presented in the
functional localizer task. This GLM analysis did not include temporally autocorrelated
noise in the model structure because modeling it would have been time-consuming.
Voxels of interest (VOIs) were determined by thresholding the t-statistic associated
with the regression coefficient of the task-relevant regressor. The decision rule
is as follows: If the number of voxels with t ≥ 5 was equal to or greater than 200, we
used the threshold t = 5. However, when this criterion was not met, we adjusted
the threshold to t ≥ 4. If 100 or more voxels passed the adjusted threshold, we
accepted the threshold t = 4. If this criterion was not met either, we ran the functional
localizer task one more time and repeated the analysis. If the result did not yield
100 or more voxels even in the second attempt, we used the threshold allowing the
greatest number of voxels among four options (i.e., t ≥ 5 from the first run, t ≥ 4
from the first run, t ≥ 5 from the second run, and t ≥ 4 from the second run).
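The decision rule above can be sketched as a small function. The function names, the representation of each localizer run as a flat list of voxel-wise t-statistics, and the tie-breaking in the fallback step are assumptions made for illustration; the text does not specify how ties were handled.

```python
def count_at_least(t_values, threshold):
    """Number of voxels whose t-statistic is at or above the threshold."""
    return sum(1 for t in t_values if t >= threshold)


def passing_threshold(t_values):
    """Return 5 or 4 when the corresponding criterion is met, else None."""
    if count_at_least(t_values, 5) >= 200:
        return 5
    if count_at_least(t_values, 4) >= 100:
        return 4
    return None


def select_voi_threshold(t_run1, t_run2=None):
    """Return (run, threshold), or None when a second localizer run is needed."""
    threshold = passing_threshold(t_run1)
    if threshold is not None:
        return 1, threshold
    if t_run2 is None:
        return None  # the caller should rerun the localizer and call again
    threshold = passing_threshold(t_run2)
    if threshold is not None:
        return 2, threshold
    # Fallback: the option that keeps the greatest number of voxels.
    runs = {1: t_run1, 2: t_run2}
    options = [(run, thr) for run in (1, 2) for thr in (5, 4)]
    return max(options, key=lambda o: count_at_least(runs[o[0]], o[1]))
```

Because every voxel with t ≥ 5 also satisfies t ≥ 4, the fallback step will in practice select one of the two t ≥ 4 options unless the counts are tied.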
Finally, we derived the subject-specific, task-relevant mask specifying VOIs in V1
by taking the conjunction of the subject-specific V1 mask and the extracted task-relevant
voxels. A Python library, nilearn (Abraham et al., 2014), was used for formatting
the final mask.
Contrast Discrimination Task
The contrast discrimination task was carried out after the processing of the mask
was finished. The ADO-based run and the randomized-design run were each performed once
within a scanning session so that we could consider between-session variability of
the neural signal. The order of the randomized-design run and the ADO-based run was
reversed every session.
The randomized-design run follows the description in Section 4.1.3. In the ADO-based
run, the first three trials are randomly proposed because of the hemodynamic
lag that prevents immediate estimation of stimulus-wise neural activation
(Section 2.7.3). After the third trial, ADO computed the global utility of candidate
designs and proposed an optimal stimulus pair by the following procedure. First,
we extracted the BOLD time series from the VOIs and averaged them. Then we
estimated single-trial neural activation for each grating stimulus by fitting a GLM with
a first-order temporal autocorrelation model for noise (GLM-AR(1)) to the data
with the Python library statsmodels (Seabold & Perktold, 2010). Here, the AR(1) noise
model assumes that the measurement noise at time t is correlated with the measurement
noise at time t−1. Once we obtained the stimulus-wise estimates of neural activation,
they were put into ADO together with behavioral responses for computing the optimal
design of the next trial. After the 4th, 8th, 12th, and 16th trials, we sampled the
joint posterior distribution using the DE-MCMC sampler (B. M. Turner, Sederberg,
et al., 2013) for 1,000 iterations, and used the last 800 samples for dynamic gridding.
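For reference, a GLM with AR(1) noise of the kind described above can be written out as follows. The notation is introduced here for illustration and may differ from the thesis's own: y_t is the VOI-averaged BOLD signal at time t, x_t is the corresponding row of the design matrix, β collects the stimulus-wise activation coefficients, and ρ is the autocorrelation coefficient. The Gaussian form of the innovation η_t is the usual textbook assumption, not something stated in the text.

```latex
\begin{aligned}
y_t &= \mathbf{x}_t^{\top} \boldsymbol{\beta} + \varepsilon_t, \\
\varepsilon_t &= \rho\, \varepsilon_{t-1} + \eta_t,
\qquad \eta_t \sim \mathcal{N}(0, \sigma^2), \quad |\rho| < 1 .
\end{aligned}
```

Setting ρ = 0 recovers the ordinary GLM with independent noise, which is the simplification used for the functional localizer analysis.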
The total length of both ADO-based and randomized-design experiments is 20
trials. In other words, ADO used a simple stopping rule based on a fixed number of
trials (20 trials), as we need to control the amount of data for parameter estimation.
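The per-trial schedule of an ADO-based run described above can be summarized in a short sketch. The function name and the dictionary layout are hypothetical; only the numbers (20 trials, three random initial trials, grid adjustment after trials 4, 8, 12, and 16) come from the text.

```python
def ado_run_plan(n_trials=20, n_random=3, regrid_every=4):
    """Describe, for each trial, how its design is chosen and whether the
    parameter grid is adjusted after the trial."""
    plan = []
    for trial in range(1, n_trials + 1):
        plan.append({
            "trial": trial,
            "design": "random" if trial <= n_random else "ado",
            # Regridding follows trials 4, 8, 12, and 16, not the final trial.
            "regrid_after": trial % regrid_every == 0 and trial < n_trials,
        })
    return plan


plan = ado_run_plan()
print([p["trial"] for p in plan if p["regrid_after"]])
```

The fixed length of the plan reflects the simple stopping rule: the run ends after the 20th trial regardless of the state of the posterior.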
4.2 Offline Analysis: Parameter Estimation
4.2.1 Posterior sampling
The performances of ADO and randomized designs were compared by offline pa-
rameter estimation with a complete data set. We first estimated stimulus-wise neural
activation levels of ADO-based and randomized-design experiments. After averag-
ing the extracted BOLD time-series from the VOIs, we fitted a GLM-AR(1) model
to estimate stimulus-wise neural activation parameters. Once the single-trial neural
estimates were acquired, the joint model parameters were finally estimated by the DE-
MCMC sampler with the stimulus-wise neural activation and behavioral responses as
the data.
In the parameter estimation step, we had to modify the DE-MCMC sampler
settings due to the quality of neural data associated with the mechanism of ADO. As
discussed with Figure 3.8, ADO tends to generate the same design repeatedly until
it gets enough information about a specific parameter, and then proposes distinct
patterns of the design to explore different model parameters. We found that the
unbalanced design of ADO adds a significant amount of variability to the stimulus-wise
neural activation estimates and may induce difficulties in getting well-constrained
posterior distributions.
Figure 4.3 shows an example of the variability in stimulus-wise neural activation
in ADO-based experiments. The neural activation estimates are more variable when
a specific contrast level is presented more frequently, which might hinder constraining
model parameters when using the regular sampling method. Therefore, we
decided to use a “burn-in mode” of the DE-MCMC sampler that concentrates posterior
samples in the high-density regions compared to the regular “sampling mode”
(B. M. Turner & Sederberg, 2012), in addition to a high migration probability.
Specifically, the DE-MCMC sampler was run in the “burn-in mode” for 3,000 iterations
in total: the sampler used the first 2,000 iterations as a burn-in phase while applying
migration at every iteration, and generated the valid posterior samples during the last
1,000 iterations.
Figure 4.3: Variability of the stimulus-wise neural activation. The scatter plot shows the contrast levels (x-axis: log(Contrast)) and the associated stimulus-wise neural activation (y-axis) obtained in the ADO run of the third scanning session of Subject 1.
Note that brain images from the ADO-based and randomized-design runs shared
the same data preprocessing procedures to make the stimulus-wise activation esti-
mates from both experiments comparable. We used the motion-corrected images
exported directly from the MR scanner, and did not apply spatial and temporal
filtering. The neural signal was extracted from the same VOI mask defined in
Section 4.1.4 for ADO.
Also, as in the simulation experiments in Chapter 3, joint model parameters were
estimated incrementally to capture the changes in the quality of the estimates over trials.
In other words, we repeated the estimation process starting from the data for the first
trial and adding one trial's worth of neural and behavioral data at a time until all
20 trials of data were used.
4.2.2 Benchmark
Unlike the simulation study, we do not have a “true” parameter that serves as a
benchmark to compare the performances of ADO and randomized design. Therefore,
we decided to use the posterior estimate obtained by using all the data from both
ADO-based and randomized-design runs within a session as a benchmark. We can
justify this approach for two reasons: (1) the stimulus-wise neural activation estimates
from ADO-based and randomized-design runs capture the neural activity of the same
visual system, and (2) the uncertainty of model parameters will be most reduced
by using all the available data. The variability of stimulus-wise neural activation
estimates (Figure 4.3) may raise questions about the first assumption because ADO
might cause adaptation to repeatedly presented stimuli compared to randomized
designs (Krekelberg, Boynton, & van Wezel, 2006). However, we suggest that using the
combined data is the most reasonable way to establish a standard for performance
evaluation given the constraints in our data analysis.
Figure 4.4: An illustration of the incremental parameter estimation. The gray shade represents the amount of data used for estimating parameters. When estimating parameters for comparing the performance of ADO and randomized design, we incrementally increase the amount of data so that we can compare how the parameter and corresponding posterior distribution change over trials. For evaluating the performance of each design, we set a benchmark estimate using all the data obtained from the ADO-based and randomized-design runs within a scanning session.
Figure 4.4 describes the parameter estimation strategy we used for performance
evaluation. The orange and blue squares represent the neural and behavioral data
for each trial in ADO-based and randomized-design runs. The gray shades represent
the amount of data used for parameter estimation. Given the neural and behavioral
data for 20 trials from ADO and randomized-design runs, the parameters from ADO
and randomized designs are estimated separately by increasing the amount of data
in a trial-by-trial manner. Meanwhile, the benchmark estimate is obtained by using
all the data that are obtained within a scanning session.
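The data bookkeeping behind this strategy can be sketched as follows. The helper names are hypothetical, and each trial is represented by a placeholder string rather than real neural and behavioral measurements; the posterior sampling step that would consume each subset is omitted.

```python
def incremental_subsets(trials):
    """Yield the first t trials of a run for t = 1, ..., len(trials)."""
    for t in range(1, len(trials) + 1):
        yield trials[:t]


def benchmark_data(ado_trials, rd_trials):
    """Pool every trial from both runs of a scanning session."""
    return list(ado_trials) + list(rd_trials)


# Placeholder trial records standing in for (design, response, neural estimate).
ado_trials = ["ado_trial_%d" % t for t in range(1, 21)]
rd_trials = ["rd_trial_%d" % t for t in range(1, 21)]

ado_subsets = list(incremental_subsets(ado_trials))
rd_subsets = list(incremental_subsets(rd_trials))
pooled = benchmark_data(ado_trials, rd_trials)
print(len(ado_subsets), len(rd_subsets), len(pooled))
```

Each design therefore yields 20 nested data sets per session, while the benchmark is estimated once from the 40 pooled trials.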
4.2.3 Determination of the Estimates
Once the posterior samples from the ADO, randomized designs, and benchmark
setting were obtained, we computed the estimates to be used for performance eval-
uation. We originally intended to calculate a four-dimensional MAP estimate using
multidimensional kernel density estimation. However, the currently available meth-
ods (Duong, 2007; O’Brien, Kashinath, Cavanaugh, Collins, & O’Brien, 2016) either
required substantial computation time or were very susceptible to slight differences
in posterior samples.
Figure 4.5 shows an example of the robustness issue in multivariate kernel den-
sity estimation. From the data of the ADO-based run of the third scanning session
of Subject 1, we estimated two distinct posterior distributions by running the DE-
MCMC sampler with the same sampling strategy and sampler settings two times.
Based on these posterior distributions, we computed four-dimensional MAP estimates
by the multivariate kernel density estimation method of Duong (2007), marginal
one-dimensional MAP estimates using an Epanechnikov kernel, and marginal one-
dimensional posterior mean values from each posterior distribution. As the two
posterior distributions come from the same data, the estimates from the posterior
distributions are supposed to be very similar to each other. However, Figure 4.5
suggests that distinct posterior distributions from the same data may yield different
estimates depending on the type of posterior estimate that we use. The red, green,
and blue points indicate the differences in the four-dimensional MAP estimates, marginal
one-dimensional MAP estimates, and marginal posterior mean values obtained from
the two posterior distributions from the same data. The four-dimensional joint MAP
estimates are less stable against differences in posterior samples, in that the
differences between the estimates are relatively large compared to those of the marginal
MAP estimates or posterior means.
Due to the susceptibility of four-dimensional MAP estimates, we decided to use the
marginal one-dimensional MAP estimate instead. Although we could still consider
using the posterior mean, MAP estimates seem to be more appropriate in this case
because the sampling method we used (i.e., the burn-in mode of the DE-MCMC
sampler) intentionally biases posterior samples toward high-density regions.
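A marginal one-dimensional MAP estimate of this kind can be approximated by maximizing a kernel density estimate built with an Epanechnikov kernel. The sketch below is a minimal pure-Python illustration; the grid search, the default bandwidth (a normal-reference rule of thumb), and the function names are assumptions made here, not the settings used in the thesis.

```python
import statistics


def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def marginal_map(samples, n_grid=201, bandwidth=None):
    """Approximate the mode of one parameter's marginal posterior by
    maximizing a kernel density estimate over an evenly spaced grid."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb for this kernel (an assumption here).
        bandwidth = 2.34 * statistics.stdev(samples) * n ** (-0.2)
    lo, hi = min(samples), max(samples)
    grid = [lo + (hi - lo) * k / (n_grid - 1) for k in range(n_grid)]

    def density(x):
        total = sum(epanechnikov((x - s) / bandwidth) for s in samples)
        return total / (n * bandwidth)

    return max(grid, key=density)
```

Given posterior draws stored as tuples (b, Rmax, c50, δ), the four marginal estimates could be obtained with `[marginal_map([draw[i] for draw in draws]) for i in range(4)]`, where `draws` is a hypothetical list of samples.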
4.2.4 Definition of the Distance from the Benchmark Estimate
To compare the performance, we need to define a measure of distance between
the posterior estimate of ADO or randomized designs and the benchmark estimate.
As in the simulation study, let us denote the MAP estimates from
the ADO data, randomized-design data, and benchmark data as θ_ADO, θ_RD, and θ_B,
where θ = (θ_1, θ_2, θ_3, θ_4) ≡ (b, R_max, c_50, δ).
Figure 4.5: Robustness of the estimates. The plot shows the absolute differences in posterior estimates obtained from two distinct posterior distributions from the same data (Subject 1, Session 3, ADO run), with one panel for each parameter (b, Rmax, c50, and δ). The estimates are obtained incrementally: the location on the x-axis represents the number of trials used for obtaining the corresponding posterior estimates. The red, green, and blue points represent the four-dimensional MAP estimates (Duong, 2007), marginal one-dimensional MAP estimates, and marginal one-dimensional posterior mean values.
We use the Euclidean distance between the MAP estimate of the experiment data
and the benchmark estimate:
$$
D_{\mathrm{ADO}} = \sqrt{\sum_{i=1}^{4} \left(\theta_{\mathrm{ADO},i} - \theta_{B,i}\right)^{2}},
\qquad
D_{\mathrm{RD}} = \sqrt{\sum_{i=1}^{4} \left(\theta_{\mathrm{RD},i} - \theta_{B,i}\right)^{2}}
$$
where θ_ADO,i, θ_RD,i, and θ_B,i denote the marginal MAP estimates of θ_i (i = 1, ..., 4)
obtained from the ADO data, randomized-design data, and benchmark data,
respectively.
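The distance measure can be computed directly from two vectors of marginal MAP estimates, as in the following sketch. The function name is hypothetical, and the numbers in the example are made up purely to show the call; they are not results from the experiment.

```python
import math


def distance_from_benchmark(estimate, benchmark):
    """Euclidean distance between two parameter vectors (b, Rmax, c50, delta)."""
    if len(estimate) != len(benchmark):
        raise ValueError("estimates must have the same number of parameters")
    return math.sqrt(sum((e - b) ** 2 for e, b in zip(estimate, benchmark)))


# Made-up vectors for illustration only.
theta_design = (0.5, 1.2, 0.30, 0.8)
theta_benchmark = (0.4, 1.0, 0.25, 0.9)
print(distance_from_benchmark(theta_design, theta_benchmark))
```

Note that the four parameters enter the sum on their original scales, so a parameter with a larger numerical range contributes more to the distance.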
4.3 Results and Discussion
Figure 4.6 illustrates the distance between the MAP estimate of each experiment
(i.e., ADO versus randomized designs) and the benchmark estimates. The x-axis and
y-axis represent the log-transformed distance measures from ADO (i.e., DADO) and
randomized designs (i.e., DRD), respectively. Each scanning session is color-coded
differently for visual clarity. If the points are located within the shaded region, we
consider that the ADO estimates are closer to the benchmark estimate compared to
the randomized-design estimates. In other words, we can interpret that the ADO
estimates are “more accurate” than the randomized-design estimates.
The result shows that ADO tends to allow better accuracy than randomized designs,
especially in Subjects 1 and 2. In Subject 3, the randomized-design estimates
were more accurate than the ADO estimates in one out of three scanning sessions.
In Subject 4, the randomized-design estimates were more accurate than the ADO
estimates in one out of three scanning sessions (i.e., red points), while ADO and
randomized designs converged to a comparable level of accuracy in another scanning
session (i.e., blue points).
Figure 4.7 illustrates the pooled standard deviation of the posterior distribution of
ADO (x-axis) and randomized designs (y-axis). The pooled standard deviation was
defined as in the simulation study (i.e., Equation 3.7). Each scanning session is
color-coded differently. If the points are located within the shaded region, we consider
that the ADO estimates are better in precision than the randomized-design estimates.
The range of both axes in the plots was adjusted to (−3, 0).
First, the pooled standard deviation tends to be very small in the first few trials,
which were cut out of the plot for visual clarity. We interpret this tendency not as
a computationally meaningful result but rather as a statistical artifact generated by
the small amount of neural data. The large variability of the stimulus-wise neural
activation estimates tends to make estimation of the Naka-Rushton equation difficult
(i.e., allows less precision). However, when we have the neural data for only
one trial, the data can limit the shape of the Naka-Rushton equation and allow high
precision of the joint model parameters overall. This argument is justified because
there exists only one set of the Naka-Rushton parameters allowing the perfect fit that
connects two stimulus-wise neural activation estimates from two distinct contrast
levels. However, if we get the data for two or more trials, the variability of the
stimulus-wise neural activation levels will not allow the perfect fit and will reduce the
precision temporarily. Of course, precision will improve as we accumulate more data
throughout the experiment.
Figure 4.6: The scatter plot of the log-transformed distance measure between the MAP estimates of ADO (x-axis, log(D_ADO)) and randomized designs (y-axis, log(D_RD)) from the benchmark MAP estimate. Each subplot represents the results from each subject (Subjects 1 to 4). Each scanning session is color-coded differently for visual clarity, and the legend marks Trials 1, 10, and 20. The points located in the shaded region represent the trials in which ADO allowed better accuracy than randomized designs.
Figure 4.7: The scatter plot of the log-transformed, pooled standard deviation of the posterior distribution from ADO (x-axis, log(PSD_ADO,pooled)) and randomized designs (y-axis, log(PSD_RD,pooled)). Each subplot represents the results from each subject (Subjects 1 to 4). Each scanning session is color-coded differently, and the legend marks Trials 1, 10, and 20. The points located in the shaded region represent the trials in which ADO allowed better precision than randomized designs. The range of both axes in the plots was adjusted to (−3, 0) for visual clarity.
In Figure 4.7, the result tends to favor the randomized design in terms of the
precision of model parameters because many points are located outside the shaded
region. This result may be explained by the fact that repeated proposals of the same or
very similar contrast levels cause larger variability in the neural activation level, and
therefore fail to constrain the Naka-Rushton model parameters sufficiently.
In summary, the real-world fMRI-based ADO experiments allow more accurate
estimates than randomized experimental designs, as we expected from the result of
the simulation study. However, the precision of the ADO estimates did not meet our
expectation compared to the randomized-design estimates. One possible explanation
about the precision of ADO-based estimates is that the variability of the neural data
induced by the unbalanced designs of ADO made the estimates less precise.
Chapter 5: Discussion
We proposed a method for fMRI-based ADO that incorporates both neural and
behavioral data within the optimization routine (Chapter 2). Next, we verified the
performance of fMRI-based ADO through both simulation (Chapter 3) and real-
world, real-time fMRI experiments (Chapter 4). The simulation study showed that
fMRI-based ADO estimates the joint model parameters more accurately and precisely
than conventional randomized designs. We replicated this result with various “true”
parameter sets (Simulation 2) and the lag of one trial in optimization (Simulation
3). In particular, Simulation 1 showed the selective focus of ADO in searching and
evaluating candidate designs. In the real-world fMRI experiment, ADO tends to es-
timate parameters more accurately than randomized designs. However, the precision
of the posterior distribution was better in the randomized-design experiments than
the ADO-based experiments.
The unbalanced designs of ADO might explain the low precision of the ADO estimates,
in that repeated design proposals might have inflated the variability of single-trial
neural estimates, which was carried over to the joint model parameters. There are
many possible task-irrelevant factors, such as neural adaptation and scanner drift, that
might have inflated the variability of the estimated stimulus-wise neural activation.
However, we did not find clear evidence of either adaptation or irregular
drift patterns in either the raw BOLD responses or the single-trial neural estimates.
Therefore, we speculate that the inflated variability of single-trial neural estimates is
attributable to the selective experimental proposals of ADO.
5.1 Limitations
This study was the first implementation of Bayesian adaptive optimization methods
of experimental design using both neural and behavioral data with a real-world
verification, and therefore leaves many methodological and technological questions to
be addressed.
The first major issue is the interaction of the unbalanced design proposals of ADO
and the variability of stimulus-wise neural activation estimates. As discussed above,
ADO can amplify the variability of its own neural input (i.e., single-trial neural
estimates) because it constructs unbalanced designs.
The inflated variability of single-trial neural estimates can be carried over to the
model parameters, and therefore hinder precise parameter estimation in ADO. This
variability issue also forced us to use a different sampler setting (i.e., the burn-in
mode) compared to the simulation studies so that we could sample from the high-density
regions, which may weaken our claims about the performance of ADO.
One way to handle the variability of single-trial neural estimates is to revise our
model assumptions. Specifically, we assumed that (1) the difference between stimulus-
wise neural estimates determines the response probability in the Thurstonian decision
model, and (2) the variability of stimulus-wise neural estimates is constant across
contrast levels. We made these assumptions to simplify the joint model, but they may
be prohibitively restrictive for capturing the dynamics of neural activity in our study.
For example, Boynton et al. (1999) pointed out that the assumption of identically
distributed noise may not be reasonable because the variability in the firing rate
tends to increase according to the mean firing rate. As BOLD responses are known
to be proportional to the mean firing rate (Heeger, Huk, Geisler, & Albrecht, 2000), we
can expect that the variability of the BOLD responses will increase according to the
amplitude of the BOLD responses. In this case, our assumption of constant variance
may not hold, and one potential solution is to develop a model of heteroscedasticity
in the stimulus-wise neural activations as a function of contrast levels.
There are several alternative strategies worth investigating in future work, such as
using different utility functions (Myung et al., 2013), planning the visitation sched-
ule (i.e., the order of parameters to be focused on in the optimization routine), and
focusing on the shape and uncertainty of the function to be estimated rather than
just parameters. However, applicability of these alternative optimization strategies
may depend on the assumptions and structure of the target model. Hence, simulation
experiments can play an important role in testing how the different optimization
strategies change the patterns of the design proposals of ADO and, in turn,
the variability in the neural activation.
The second limitation of this study is our strategy for handling neural data.
Although we discussed that raw BOLD responses are not suitable for fMRI-based ADO
due to their dimensionality, we could consider alternative methods such as sequential
Monte Carlo methods (Cappé, Godsill, & Moulines, 2007). Also, within the GLM
framework, an iterative trial-wise GLM with nuisance regressors (Mumford et al., 2012,
2014; B. O. Turner et al., 2012) could help control the variability of stimulus-wise
estimates.
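The third issue is the numerical precision of the current implementation of ADO. In
Equation 2.1, we can decompose p(θ|d1:t, y1:t) and p(θ|d1:t, y1:t, d, y∗) as
$$
\begin{aligned}
\log p(\theta \mid d_{1:t}, y_{1:t}) &\propto \log \left[ p(\theta) \times p(y_{1:t} \mid \theta, d_{1:t}) \right] \\
&= \log p(\theta) + \log p(y_{1:t} \mid \theta, d_{1:t}),
\end{aligned}
\tag{5.1}
$$
$$
\begin{aligned}
\log p(\theta \mid d_{1:t}, y_{1:t}, d, y^{*}) &\propto \log \left[ p(\theta) \times p(y_{1:t}, y^{*} \mid \theta, d_{1:t}, d) \right] \\
&= \log p(\theta) + \log p(y_{1:t} \mid \theta, d_{1:t}) + \log p(y^{*} \mid \theta, d)
\end{aligned}
\tag{5.2}
$$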
where p(θ) is the prior density of θ, p(y1:t|θ, d1:t) is the likelihood of the current data
(i.e., y1:t), and p(y∗|θ, d) is the (anticipated) likelihood for the proposed design d
and expected response y∗. We can find that log p(θ) + log p(y1:t|θ, d1:t) is used in
Equation 2.1 multiple times: in addition to Equations 5.1 and 5.2, this term is equal
to log-transformed joint probability density of the parameter θ and response y1:t (i.e.,
p(y1:t|θ, d1:t)p(θ|d1:t) = p(y1:t, θ|d1:t)). Note that given the data set at trial t, there is
no need to compute log p(θ) + log p(y1:t|θ, d1:t) three times because it is a fixed value.
In the first version of the fMRI-based ADO simulation, we evaluated
log p(θ) + log p(y1:t|θ, d1:t) repeatedly when exploring grid points in the parameter
and response spaces, which resulted in 40-60 seconds of ADO computation for each
trial. To improve the computation speed, we instead computed
log p(θ) + log p(y1:t|θ, d1:t) only one time,
evaluated log p(y∗|θ, d) for each grid point in the parameter and response space, and
combined the two terms as in Equation 5.2. In exchange for the gain in speed, this
computational trick introduces very small numerical errors compared to the original
method, which can be explained by floating-point rounding errors. This error may not
matter when the posterior distribution is not constrained enough; however, the size of
the error might increase in proportion to how tightly the posterior is constrained.
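The caching trick and the rounding effect it can introduce are illustrated below with a deliberately simple stand-in model. The standard-normal prior, the logistic Bernoulli likelihood, the grids, and all function names are toy assumptions for illustration and are unrelated to the joint model used in the thesis.

```python
import math


def log_prior(theta):
    """Toy standard-normal log prior (an assumption for illustration)."""
    return -0.5 * theta * theta - 0.5 * math.log(2.0 * math.pi)


def log_lik(theta, d, y):
    """Toy Bernoulli log likelihood with P(y = 1) = logistic(theta * d)."""
    p = 1.0 / (1.0 + math.exp(-theta * d))
    return math.log(p) if y == 1 else math.log(1.0 - p)


def log_post_naive(theta, designs, responses, d_new, y_new):
    """Re-evaluate every term of Equation 5.2 for one candidate (d, y*)."""
    total = log_prior(theta)
    for d, y in zip(designs, responses):
        total += log_lik(theta, d, y)
    return total + log_lik(theta, d_new, y_new)


def log_post_cached(theta_grid, designs, responses, candidates):
    """Compute the shared term once per grid point, then add the new term."""
    table = {}
    for theta in theta_grid:
        shared = log_prior(theta) + sum(
            log_lik(theta, d, y) for d, y in zip(designs, responses)
        )
        for d_new, y_new in candidates:
            table[(theta, d_new, y_new)] = shared + log_lik(theta, d_new, y_new)
    return table


theta_grid = [-2.0 + 0.1 * k for k in range(41)]
designs, responses = [0.5, 1.0, 2.0, 1.5], [1, 0, 1, 1]
candidates = [(d, y) for d in (0.5, 1.0, 2.0) for y in (0, 1)]
cached = log_post_cached(theta_grid, designs, responses, candidates)
max_gap = max(
    abs(value - log_post_naive(theta, designs, responses, d, y))
    for (theta, d, y), value in cached.items()
)
print(max_gap)  # any gap comes only from the order of floating-point additions
```

The two routes add the same terms in a different order, so any discrepancy is limited to rounding in the last few bits, while the cached route evaluates the shared term once per grid point instead of once per candidate.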
5.2 Further Developments and Practical Applications
One natural extension of fMRI-based ADO is model comparison, for which
ADO was originally proposed (Cavagnaro et al., 2013, 2011). This application seems
promising because model-based cognitive neuroscience studies have shown that neural
data can help compare competing cognitive models that can hardly be
discriminated by behavioral data alone (e.g., Mack et al., 2013).
We can also consider incorporating multiple ROIs in our optimization routine. In
this case, developing an appropriate joint model may be a critical factor because one
ROI might be associated with multiple parameters or one model parameter might be
correlated with multiple ROIs.
We may extend fMRI-based ADO for compatibility with distributed activation
patterns over multiple voxels. In the contrast discrimination task, we assumed that
voxels in V1 share a similar activation pattern modeled by the Naka-Rushton equa-
tion. However, many cognitive activities are represented in the brain with distributed
activation patterns because underlying neurons may have different tuning preferences.
For example, Cox and Savoy (2003) trained a classifier model for object recognition
using activation patterns across voxels in early visual cortex, rather than the averaged
signal. As multivariate pattern analysis has become widespread in fMRI research,
compatibility with multi-voxel neural signals can add more generalizability to fMRI-based
ADO.
Practically, fMRI-based ADO has potential applications in cognitive psychometric (van der
Maas, Molenaar, Maris, Kievit, & Borsboom, 2011) and computational psychiatric
(Wiecki, Poland, & Frank, 2015) settings, where it can promote efficient data collection. We
are not yet aware of concrete examples of cognitive psychometric or computational psychiatric
studies that use both neural and behavioral data. However, considering the cost
of collecting fMRI data (especially from a clinical population), fMRI-based ADO
may make model-based neuroimaging studies in these areas more efficient.
5.3 Conclusions
To date, adaptive methods for design optimization in cognitive science have relied
on either neural or behavioral data alone. In neuroimaging specifically, the first
fully adaptive optimization methods (Lorenz et al., 2016) focused on task-to-region
mapping based on the localization paradigm. In this thesis, we proposed an application
of Adaptive Design Optimization (Cavagnaro et al., 2010) to model-based
fMRI experiments that aims to provide more systematic explanations linking brain,
mind, and behavior. In addition to driving more accurate data collection, fMRI-based
ADO exploits both neural and behavioral data simultaneously through the joint modeling
framework (B. M. Turner, Forstmann, et al., 2013). Future work should aim to
control the variability of stimulus-wise neural estimates and thereby improve the
precision of estimates.
References
Abdulrahman, H., & Henson, R. N. (2016). Effect of trial-to-trial variability on optimal event-related fMRI design: Implications for Beta-series correlation and multi-voxel pattern analysis. NeuroImage, 125, 756–766.
Abraham, A., Pedregosa, F., Eickenberg, M., Gervais, P., Mueller, A., Kossaifi, J., . . . Varoquaux, G. (2014). Machine learning for neuroimaging with scikit-learn. Frontiers in Neuroinformatics, 8, 14. Retrieved from https://www.frontiersin.org/article/10.3389/fninf.2014.00014 doi: 10.3389/fninf.2014.00014
Aguirre, G. K. (2007). Continuous carry-over designs for fMRI. NeuroImage, 35(4), 1480–1494.
Amunts, K., Malikovic, A., Mohlberg, H., Schormann, T., & Zilles, K. (2000). Brodmann’s areas 17 and 18 brought into stereotaxic space - where and how variable? NeuroImage, 11(1), 66–84.
Amzal, B., Bois, F. Y., Parent, E., & Robert, C. P. (2006). Bayesian-optimal design via interacting particle systems. Journal of the American Statistical Association, 101(474), 773–785.
Andersson, J. L. R., & Jenkinson, M. (2007). Non-linear registration aka Spatial normalisation (FMRIB Technical Report TR07JA2). Retrieved from https://www.fmrib.ox.ac.uk/datasets/techrep/tr07ja2/tr07ja2.pdf
Bak, J. H., & Pillow, J. W. (2018). Adaptive stimulus selection for multi-alternative psychometric functions with lapses. bioRxiv, 260976.
Bellman, R. (1957). Dynamic programming (1st ed.). Princeton, NJ, USA: Princeton University Press.
Boynton, G. M., Demb, J. B., Glover, G. H., & Heeger, D. J. (1999). Neuronal basis of contrast discrimination. Vision Research, 39(2), 257–269.
Bullmore, E., Brammer, M., Williams, S. C., Rabe-Hesketh, S., Janot, N., David, A., . . . Sham, P. (1996). Statistical methods of estimation and inference for functional MR image analysis. Magnetic Resonance in Medicine, 35, 261–277.
Buracas, G. T., & Boynton, G. M. (2002). Efficient design of event-related fMRI experiments using M-sequences. NeuroImage, 16, 801–813.
Cappe, O., Godsill, S. J., & Moulines, E. (2007). An overview of existing methods and recent advances in sequential Monte Carlo. Proceedings of the IEEE, 95, 899–924.
Cavagnaro, D. R., Aranovich, G. J., McClure, S. M., Pitt, M. A., & Myung, J. I. (2016). On the functional form of temporal discounting: An optimized adaptive test. Journal of Risk and Uncertainty, 52, 233–254.
Cavagnaro, D. R., Myung, J. I., Pitt, M. A., & Kujala, J. V. (2010). Adaptive design optimization: A mutual information-based approach to model discrimination in cognitive science. Neural Computation, 22, 887–905.
Cavagnaro, D. R., Pitt, M. A., Gonzalez, R., & Myung, J. I. (2013). Discriminating among probability weighting functions using adaptive design optimization. Journal of Risk and Uncertainty, 47, 255–289.
Cavagnaro, D. R., Pitt, M. A., & Myung, J. I. (2011). Model discrimination through adaptive experimentation. Psychonomic Bulletin and Review, 18, 204–210.
Cooke, J. R. H., Selen, L. P. J., van Beers, R. J., & Medendorp, W. P. (2017). Bayesian adaptive stimulus selection for dissociating models of psychophysical data. bioRxiv, 220590.
Cox, D. D., & Savoy, R. L. (2003). Functional magnetic resonance imaging (fMRI) "brain reading": detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage, 19(2), 261–270.
Dale, A. M. (1999). Optimal experimental design for event-related fMRI. Human Brain Mapping, 8, 109–114.
Daunizeau, J., Preuschoff, K., Friston, K., & Stephan, K. (2011). Optimizing experimental design for comparing models of brain function. PLoS Computational Biology, 7(11), e1002280. Retrieved from https://doi.org/10.1371/journal.pcbi.1002280
DiMattina, C. (2016). Comparing models of contrast gain using psychophysical experiments. Journal of Vision, 16, 1–18.
DiMattina, C., & Zhang, K. (2011). Active data collection for efficient estimation and comparison of nonlinear neural models. Neural Computation, 23(9), 2242–2288.
DiMattina, C., & Zhang, K. (2013). Adaptive stimulus optimization for sensory systems neuroscience. Frontiers in Neural Circuits, 7, 101.
Dunne, S., & O’Doherty, J. P. (2013). Insights from the application of computational neuroimaging to social neuroscience. Current Opinion in Neurobiology, 23(3), 387–392.
Duong, T. (2007). ks: Kernel density estimation and kernel discriminant analysis for multivariate data in R. Journal of Statistical Software, 21(7), 1–16.
Eickhoff, S. B., Stephan, K. E., Mohlberg, H., Grefkes, C., Fink, G. R., Amunts, K., & Zilles, K. (2005). A new SPM toolbox for combining probabilistic cytoarchitectonic maps and functional imaging data. NeuroImage, 25, 1325–1335.
Forstmann, B. U., Brown, S., Dutilh, G., Neumann, J., & Wagenmakers, E.-J. (2010). The neural substrate of prior information in perceptual decision making: a model-based analysis. Frontiers in Human Neuroscience, 4, 40.
Forstmann, B. U., & Wagenmakers, E.-J. (2015). Model-based cognitive neuroscience: A conceptual introduction. In An introduction to model-based cognitive neuroscience (pp. 139–156). Springer.
Forstmann, B. U., Wagenmakers, E.-J., Eichele, T., Brown, S., & Serences, J. T. (2011). Reciprocal relations between cognitive neuroscience and formal cognitive models: Opposites attract? Trends in Cognitive Sciences, 15(6), 272–279.
Friston, K. J., Harrison, L., & Penny, W. (2003). Dynamic causal modelling. NeuroImage, 19(4), 1273–1302.
Friston, K. J., Holmes, A. P., Poline, J., Grasby, P., Williams, S., Frackowiak, R. S., & Turner, R. (1995). Analysis of fMRI time-series revisited. NeuroImage, 2(1), 45–53.
Grabner, G., Janke, A. L., Budge, M. M., Smith, D., Pruessner, J., & Collins, D. L. (2006). Symmetric atlasing and model based segmentation: an application to the hippocampus in older adults. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 58–66).
Grabowski, T. J., Bauer, M. D., Foreman, D., Mehta, S., Eaton, B. L., Graves, W. W., . . . Bolinger, L. (2006). Adaptive pacing of visual stimulation for fMRI studies involving overt speech. NeuroImage, 29(3), 1023–1030. Retrieved from https://doi.org/10.1016/j.neuroimage.2005.08.064
Greve, D. N., Brown, G. G., Mueller, B. A., Glover, G., Liu, T. T., et al. (2013). A survey of the sources of noise in fMRI. Psychometrika, 78(3), 396–416.
Heeger, D. J., Huk, A. C., Geisler, W. S., & Albrecht, D. G. (2000). Spikes versus BOLD: what does neuroimaging tell us about neuronal activity? Nature Neuroscience, 3(7), 631.
Holling, H., Maus, B., & van Breukelen, G. J. P. (2013). Optimal design for functional magnetic resonance imaging experiments. Zeitschrift für Psychologie, 221, 174–189.
Hu, B., & Tsui, K.-W. (2005). Distributed evolutionary Monte Carlo with applications to Bayesian analysis (Technical Report Number 1112). Retrieved from http://www.stat.wisc.edu/techreports/tr1112.pdf
Jenkinson, M., Bannister, P., Brady, M., & Smith, S. (2002). Improved optimization for the robust and accurate linear registration and motion correction of brain images. NeuroImage, 17(2), 825–841.
Jenkinson, M., & Smith, S. (2001). A global optimisation method for robust affine registration of brain images. Medical Image Analysis, 5(2), 143–156.
Johnson, R. A., & Wichern, D. (2007). Applied Multivariate Statistical Analysis (6th ed.). Upper Saddle River, New Jersey: Pearson Prentice Hall.
Kim, W., Pitt, M. A., Lu, Z.-L., & Myung, J. I. (2017). Planning beyond the next trial in adaptive experiments: A dynamic programming approach. Cognitive Science, 41, 2234–2252.
Kim, W., Pitt, M. A., Lu, Z.-L., Steyvers, M., & Myung, J. I. (2014). A hierarchical adaptive approach to optimal experimental design. Neural Computation, 26, 2465–2492.
Koffarnus, M. N., Deshpande, H. U., Lisinski, J. M., Eklund, A., Bickel, W. K., & LaConte, S. M. (2017). An adaptive, individualized fMRI delay discounting procedure to increase flexibility and optimize scanner time. NeuroImage, 161, 56–66. Retrieved from https://doi.org/10.1016/j.neuroimage.2017.08.024
Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39(16), 2729–2737.
Krekelberg, B., Boynton, G. M., & van Wezel, R. J. (2006). Adaptation: from single cells to BOLD signals. Trends in Neurosciences, 29(5), 250–256.
Kujala, J. V., & Lukka, T. J. (2006). Bayesian adaptive estimation: The next dimension. Journal of Mathematical Psychology, 50(4), 369–389.
Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception & Psychophysics, 63(8), 1279–1292.
Lesmes, L. A., Jeon, S.-T., Lu, Z.-L., & Dosher, B. A. (2006). Bayesian adaptive estimation of threshold versus contrast external noise functions: The quick TvC method. Vision Research, 46(19), 3160–3176.
Lesmes, L. A., Lu, Z.-L., Baek, J., & Albright, T. D. (2010). Bayesian adaptive estimation of the contrast sensitivity function: The quick CSF method. Journal of Vision, 10(3), 17–17.
Lewi, J., Butera, R., & Paninski, L. (2009). Sequential optimal design of neurophysiology experiments. Neural Computation, 21(3), 619–687.
Li, X., Lu, Z.-L., Tjan, B. S., Dosher, B. A., & Chu, W. (2008). Blood oxygenation level-dependent contrast response functions identify mechanisms of covert attention in early visual areas. Proceedings of the National Academy of Sciences of the United States, 105, 6202–6207. Retrieved from https://doi.org/10.1073/pnas.0801390105
Lin, C. D., Anderson-Cook, C. M., Hamada, M. S., Moore, L. M., & Sitter, R. R. (2015). Using genetic algorithms to design experiments: A review. Quality and Reliability Engineering International, 31(2), 155–167.
Lindquist, M. A. (2008). The statistical analysis of fMRI data. Statistical Science, 23, 439–464.
Lorenz, R., Hampshire, A., & Leech, R. (2017). Neuroadaptive Bayesian optimization and hypothesis testing. Trends in Cognitive Sciences, 21(3), 155–167. Retrieved from https://doi.org/10.1016/j.tics.2017.01.006
Lorenz, R., Monti, R. P., Violante, I. R., Anagnostopoulos, C., Faisal, A. A., Montana, G., & Leech, R. (2016). The Automatic Neuroscientist: A framework for optimizing experimental design with closed-loop real-time fMRI. NeuroImage, 129, 320–334. Retrieved from https://doi.org/10.1016/j.neuroimage.2016.01.032
Love, B. C. (2015). The algorithmic level is the bridge between computation and brain. Topics in Cognitive Science, 7(2), 230–242.
Mack, M. L., Preston, A. R., & Love, B. C. (2013). Decoding the brain’s algorithm for categorization from its neural implementation. Current Biology, 23(20), 2023–2027.
Maclaren, J., Herbst, M., Speck, O., & Zaitsev, M. (2013). Prospective motion correction in brain imaging: A review. Magnetic Resonance in Medicine, 69(3), 621–636.
Marr, D. (1982). Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. New York: Freeman.
Mumford, J. A., Davis, T., & Poldrack, R. A. (2014). The impact of study design on pattern estimation for single-trial multivariate pattern analysis. NeuroImage, 103, 130–138.
Mumford, J. A., Turner, B. O., Ashby, F. G., & Poldrack, R. A. (2012). Deconvolving BOLD activation in event-related designs for multivoxel pattern classification analyses. NeuroImage, 59, 2636–2643.
Myung, J. I., Cavagnaro, D. R., & Pitt, M. A. (2013). A tutorial on adaptive design optimization. Journal of Mathematical Psychology, 57, 53–67.
Myung, J. I., & Pitt, M. A. (2009). Optimal experimental design for model discrimination. Psychological Review, 116(3), 499–518.
O’Brien, T. A., Kashinath, K., Cavanaugh, N. R., Collins, W. D., & O’Brien, J. P. (2016). A fast and objective multidimensional kernel density estimation method: fastKDE. Computational Statistics & Data Analysis, 101, 148–160.
O’Doherty, J. P., Hampton, A., & Kim, H. (2007). Model-based fMRI and its application to reward learning and decision making. Annals of the New York Academy of Sciences, 1104(1), 35–53.
O’Reilly, J. X., & Mars, R. B. (2011). Computational neuroimaging: localising Greek letters? Comment on Forstmann et al. Trends in Cognitive Sciences, 15(10), 450.
Palestro, J. J., Bahg, G., Sederberg, P. B., Lu, Z.-L., Steyvers, M., & Turner, B. M. (2018). A tutorial on joint models of neural and behavioral measures of cognition. Journal of Mathematical Psychology, 84, 20–48.
Park, M., Weller, J. P., Horwitz, G. D., & Pillow, J. W. (2014). Bayesian active learning of neural firing rate maps with transformed Gaussian process priors. Neural Computation, 26(8), 1519–1541.
Poldrack, R. A., Mumford, J. A., & Nichols, T. E. (2011). Handbook of Functional MRI Data Analysis. New York: Cambridge University Press.
Prins, N. (2013). The psi-marginal adaptive method: How to give nuisance parameters the attention they deserve (no more, no less). Journal of Vision, 13(7), 3–3. Retrieved from https://doi.org/10.1167/13.7.3
Rissman, J., Gazzaley, A., & D’Esposito, M. (2004). Measuring functional connectivity during distinct stages of a cognitive task. NeuroImage, 23, 752–763.
Rodriguez, C. A., Turner, B. M., Van Zandt, T., & McClure, S. M. (2015). The neural basis of value accumulation in intertemporal choice. European Journal of Neuroscience, 42(5), 2179–2189.
Ryan, E. G., Drovandi, C. C., McGree, J. M., & Pettitt, A. N. (2016). A review of modern computational algorithms for Bayesian optimal design. International Statistical Review, 84, 128–154. Retrieved from https://doi.org/10.1111/insr.12107
Sanchez, G., Daunizeau, J., Maby, E., Bertrand, O., Bompas, A., & Mattout, J. (2014). Toward a new application of real-time electrophysiology: online optimization of cognitive neurosciences hypothesis testing. Brain Sciences, 4(1), 49–72. Retrieved from https://doi.org/10.3390/brainsci4010049
Sanchez, G., Lecaignard, F., Otman, A., Maby, E., & Mattout, J. (2016). Active SAmpling Protocol (ASAP) to optimize individual neurocognitive hypothesis testing: A BCI-inspired dynamic experimental design. Frontiers in Human Neuroscience, 10, 347. Retrieved from https://www.frontiersin.org/article/10.3389/fnhum.2016.00347 doi: 10.3389/fnhum.2016.00347
Seabold, S., & Perktold, J. (2010). Statsmodels: Econometric and statistical modeling with Python. In Proceedings of the 9th Python in Science Conference (Vol. 57, p. 61).
Serences, J. T., & Saproo, S. (2012). Computational advances towards linking BOLD and behavior. Neuropsychologia, 50, 435–446. Retrieved from https://doi.org/10.1016/j.neuropsychologia.2011.07.013
Smith, S. M., Jenkinson, M., Woolrich, M. W., Beckmann, C. F., Behrens, T. E., Johansen-Berg, H., . . . Matthews, P. M. (2004). Advances in functional and structural MR image analysis and implementation as FSL. NeuroImage, 23, S208–S219.
Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems (pp. 2951–2959).
ter Braak, C. J. F. (2006). A Markov Chain Monte Carlo version of the genetic algorithm Differential Evolution: easy Bayesian computing for real parameter spaces. Statistics and Computing, 16, 239–249.
Thurstone, L. L. (1927). A law of comparative judgement. Psychological Review, 34, 278–286. Retrieved from https://doi.org/10.1037/h0070288
Turner, B. M., Forstmann, B. U., Love, B. C., Palmeri, T. J., & Van Maanen, L. (2017). Approaches to analysis in model-based cognitive neuroscience. Journal of Mathematical Psychology, 76, 65–79.
Turner, B. M., Forstmann, B. U., Wagenmakers, E.-J., Brown, S. D., Sederberg, P. B., & Steyvers, M. (2013). A Bayesian framework for simultaneously modeling neural and behavioral data. NeuroImage, 72, 193–206.
Turner, B. M., Rodriguez, C. A., Norcia, T. M., McClure, S. M., & Steyvers, M. (2016). Why more is better: A method for simultaneously modeling EEG, fMRI, and Behavior. NeuroImage, 128, 96–115.
Turner, B. M., & Sederberg, P. B. (2012). Approximate Bayesian computation with differential evolution. Journal of Mathematical Psychology, 56(5), 375–385.
Turner, B. M., Sederberg, P. B., Brown, S. D., & Steyvers, M. (2013). A method for efficiently sampling from distributions with correlated dimensions. Psychological Methods, 18, 368–384.
Turner, B. M., Van Maanen, L., & Forstmann, B. U. (2015). Combining Cognitive Abstractions with Neurophysiology: The Neural Drift Diffusion Model. Psychological Review, 122, 312–336.
Turner, B. M., Wang, T., & Merkel, E. (2017). Factor analysis linking functions for simultaneously modeling neural and behavioral data. NeuroImage, 153, 28–48.
Turner, B. O., Mumford, J. A., Poldrack, R. A., & Ashby, F. G. (2012). Spatiotemporal activity estimation for multivoxel pattern analysis with rapid event-related designs. NeuroImage, 62, 1429–1438.
van der Maas, H. L., Molenaar, D., Maris, G., Kievit, R. A., & Borsboom, D. (2011). Cognitive psychology meets psychometric theory: On the relation between process models for decision making and latent variable models for individual differences. Psychological Review, 118(2), 339.
van Maanen, L., Brown, S. D., Eichele, T., Wagenmakers, E.-J., Ho, T., Serences, J., & Forstmann, B. U. (2011). Neural correlates of trial-to-trial fluctuations in response caution. Journal of Neuroscience, 31(48), 17488–17495.
van Ravenzwaaij, D., Provost, A., & Brown, S. D. (2017). A Confirmatory Approach for Integrating Neural and Behavioral Data into a Single Model. Journal of Mathematical Psychology, 76, 131–141.
Wager, T. D., & Nichols, T. E. (2003). Optimization of experimental design in fMRI: A general framework using a genetic algorithm. NeuroImage, 18, 293–309.
Wandell, B. A. (1999). Computational neuroimaging of human visual cortex. Annual Review of Neuroscience, 22, 145–173.
Wandell, B. A., & Winawer, J. (2015). Computational neuroimaging and population receptive fields. Trends in Cognitive Sciences, 19(6), 349–357.
Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33(2), 113–120.
Wiecki, T. V., Poland, J., & Frank, M. J. (2015). Model-based cognitive neuroscience approaches to computational psychiatry: clustering and classification. Clinical Psychological Science, 3(3), 378–399.