

Biol. Cybern. 89, 426–438 (2003)
DOI 10.1007/s00422-003-0433-7
© Springer-Verlag 2003

Estimation of single-trial multicomponent ERPs: Differentially variable component analysis (dVCA)

Wilson Truccolo¹, Kevin H. Knuth², Ankoor Shah³,⁴, Steven L. Bressler⁵, Charles E. Schroeder³,⁴, Mingzhou Ding⁵

¹ Department of Neuroscience, Brown University, 190 Thayer Street, Providence, RI 02912, USA
² Computational Sciences Division, Code IC, NASA Ames Research Center, Moffett Field, CA 94035, USA
³ Department of Neuroscience, Albert Einstein College of Medicine, Bronx, NY 10461, USA
⁴ Cognitive Neuroscience and Schizophrenia Dept., Nathan S. Kline Institute, Orangeburg, NY 10962, USA
⁵ Center for Complex Systems and Brain Sciences, Florida Atlantic University, 777 Glades Road, Boca Raton, FL 33431, USA

Received: 27 January 2003 / Accepted: 4 September 2003 / Published online: 4 December 2003

Correspondence to: Mingzhou Ding (e-mail: [email protected])

Abstract. A Bayesian inference framework for estimating the parameters of single-trial, multicomponent, event-related potentials is presented. Single-trial recordings are modeled as the linear combination of ongoing activity and multicomponent waveforms that are relatively phase-locked to certain sensory or motor events. Each component is assumed to have a trial-invariant waveform with trial-dependent amplitude scaling factors and latency shifts. A Maximum a Posteriori solution of this model is implemented via an iterative algorithm from which the component's waveform, single-trial amplitude scaling factors and latency shifts are estimated. Multiple components can be derived from a single-channel recording based on their differential variability, an aspect in contrast with other component analysis techniques (e.g., independent component analysis) where the number of components estimated is equal to or smaller than the number of recording channels. Furthermore, we show that, by subtracting out the estimated single-trial components from each of the single-trial recordings, one can estimate the ongoing activity, thus providing additional information concerning task-related brain dynamics. We test this approach, which we name differentially variable component analysis (dVCA), on simulated data and apply it to an experimental dataset consisting of intracortically recorded local field potentials from monkeys performing a visuomotor pattern discrimination task.

1 Introduction

Relations between brain and behavior are often studied by recording event-related potentials (ERPs) over repeated presentations of a sensory stimulus or task performance. The obtained data are commonly modeled as the linear combination of ongoing activity and event-related components that are relatively phase-locked to the onset of the event. The ongoing activity includes signals that are not related to the event and possibly also signals that are induced by the event but that are not phase-locked to its onset. Based on the classic signal-plus-noise model, the event-related signals are extracted by averaging across the ensemble of trials, resulting in the average event-related potential (AERP). The main assumption of this approach is that an event-related signal exists that is invariant across trials. However, evidence amassed thus far indicates that this assumption is no longer tenable. More realistic models have been advanced where trial-to-trial amplitude and latency variability are taken into account (Woody 1967; Coppola et al. 1978; Truccolo et al. 2002a).

The existence of trial-to-trial variability in event-related activity implies that the simple ensemble mean is not the optimal estimator for event-related components. Woody (1967) proposed a method for estimating the latency of single-trial event-related activity by use of a matched filter. Following Woody's technique, averages can then be computed after adjusting for the estimated latency variability over trials. Subsequent work has extended Woody's idea by including amplitude variability and by placing the estimation of single-trial parameters in a maximum likelihood framework (Jaskowski and Verleger 1999; Lange et al. 1997; McGillem et al. 1985; Pham et al. 1987). Other techniques based on wavelet (Bartnik et al. 1992; Quiroga 2000; Quiroga and Garcia 2003) and clustering analysis (Haig et al. 1995) have also been introduced in the literature.

The goal of the present paper is threefold. First, starting from a Bayesian perspective, we propose a comprehensive framework that can serve as a theoretical guideline for modeling and estimating parameters of single-trial multicomponent ERPs. The ERPs are modeled as the linear combination of multiple components whose waveforms, single-trial latency shifts, and amplitude scaling factors are to be estimated based on the components' differential variability from trial to trial, a technique we entitle differentially variable component analysis (dVCA). In this regard, Woody's (Woody 1967) and related algorithms are seen as special cases implementing the maximum a posteriori solution of the posterior probability. Second, we emphasize that single-trial parameters such as amplitude scaling factors and latency shifts are themselves interesting physiological variables, providing essential information for understanding stages of information processing in the cortex (Nowak and Bullier 1997; Schmolesky et al. 1998; Schroeder et al. 1998). Especially interesting is the finding of differential variability for estimated components in the same or different channels. Third, recent work shows that the ongoing activity, which contains non-phase-locked signals, is the main source for the study of functional interdependences between neuronal populations (Truccolo et al. 2002a). Traditionally, ongoing activities have been obtained via subtraction of the AERP from single-trial recording. We demonstrate that a more appropriate way to extract the ongoing process is to perform single-trial estimation of event-related potential components and then subtract them from the single-trial time series.

The entire dVCA approach is first tested on simulated data and then applied to an experimental dataset consisting of intracortically recorded single-trial local field potentials (LFPs) in monkeys performing a GO/NO-GO visuomotor pattern discrimination task.

2 Theory

2.1 Model of single-trial LFPs

Our starting point is a generative model for the recorded single-trial LFPs. Such a model should capture at least four main properties of LFP recordings in the event-related paradigm: (a) the existence of signals that are relatively phase-locked to a specific event onset, (b) the trial-to-trial variability in amplitude and latency of the event-related signals, (c) the possibility that the event-related response may be the superposition of multiple components with differential variability in their single-trial amplitude scaling factors and latency shifts, and (d) the existence of signals that are not phase-locked to the event, including activity that is unrelated to the experiment (e.g., measurement noise) and activity that is induced by the event. As a first approximation, the waveforms of the event-related components are assumed to be invariant across trials. A model that incorporates all these features can be written as

$$x_r(t) = \sum_{n=1}^{N} a_{nr}\, s_n(t - \tau_{nr}) + \eta_r(t), \tag{2.1}$$

where $x_r(t)$ is the LFP recording from the $r$-th trial, $s_n(t)$ is the $n$-th event-related component waveform with a trial-to-trial variable amplitude scaling factor and latency shift given by $a_{nr}$ and $\tau_{nr}$, respectively, and $N$ is the total number of components. The process $\eta_r(t)$, henceforth referred to as the ongoing activity, includes all the non-phase-locked signals and is assumed to have a zero mean. From a physiological point of view we note that each component could correspond either to the activity of a distinct neural source or generator, or to the activity of the same source but at a different neural processing stage.
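The generative model of Eq. 2.1 can be illustrated with a short simulation. The following is a minimal sketch, not the authors' code: it assumes integer latency shifts in samples, zero padding at the record edges, and Gaussian white noise for the ongoing activity; the names `shift` and `simulate_trial` are hypothetical.

```python
import numpy as np

def shift(waveform, lag):
    """Return s(t - lag): the waveform delayed by `lag` samples, zero-padded."""
    out = np.zeros_like(waveform)
    T = len(waveform)
    if abs(lag) >= T:
        return out
    if lag >= 0:
        out[lag:] = waveform[:T - lag]
    else:
        out[:T + lag] = waveform[-lag:]
    return out

def simulate_trial(waveforms, amplitudes, latencies, noise_std, rng):
    """One trial of Eq. 2.1: sum_n a_n * s_n(t - tau_n) + eta(t).

    waveforms: (N, T) array; amplitudes, latencies: length-N sequences.
    The ongoing activity eta(t) is drawn as zero-mean white noise (an assumption).
    """
    T = waveforms.shape[1]
    x = rng.normal(0.0, noise_std, size=T)
    for s_n, a_n, tau_n in zip(waveforms, amplitudes, latencies):
        x += a_n * shift(s_n, int(tau_n))
    return x

# Example: two components with different single-trial amplitudes and latencies.
t = np.arange(100)
components = np.array([np.exp(-0.5 * ((t - 30) / 4.0) ** 2),
                       np.exp(-0.5 * ((t - 60) / 8.0) ** 2)])
trial = simulate_trial(components, [1.2, 0.7], [2, -3], 0.1, np.random.default_rng(0))
```

Each call produces one row of the trial ensemble $\{x_r(t)\}$; the differential variability exploited by dVCA comes from drawing the amplitudes and latencies independently for each component.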

2.2 Bayesian estimation framework

The problem of estimating the single-trial parameters $s_n(t)$, $a_{nr}$, and $\tau_{nr}$ is formulated from a Bayesian perspective. According to Bayes theorem, the posterior probability of model parameters $M$ given data $D$ and prior information $I$ can be written as

$$p(M \mid D, I) = \frac{p(D \mid M, I)\, p(M \mid I)}{p(D \mid I)}. \tag{2.2}$$

For the LFP model in Eq. 2.1, the posterior probability becomes

$$p(\{s_n(t)\}, \{a_{nr}\}, \{\tau_{nr}\}, \theta_\eta(t) \mid \{x_r(t)\}, I) = \frac{p(\{x_r(t)\} \mid \{s_n(t)\}, \{a_{nr}\}, \{\tau_{nr}\}, \theta_\eta(t), I)}{p(\{x_r(t)\} \mid I)} \times p(\{s_n(t)\}, \{a_{nr}\}, \{\tau_{nr}\}, \theta_\eta(t) \mid I), \tag{2.3}$$

where $\{\cdot\}$ refers to the set of parameters for all the components and the whole ensemble of trials, $\theta_\eta(t)$ denotes the parameters for the ongoing process, and $p(\{s_n(t)\}, \{a_{nr}\}, \{\tau_{nr}\}, \theta_\eta(t) \mid I)$ is the prior probability for the model parameters. For this additive model, the likelihood $p(\{x_r(t)\} \mid \{s_n(t)\}, \{a_{nr}\}, \{\tau_{nr}\}, \theta_\eta(t), I)$ turns out to be simply given by the probability model of the ongoing activity, i.e., $p(\eta(t) \mid I)$. In the absence of precise knowledge about the temporal structure of the ongoing activity, we assign $\eta(t)$ to be independent identically distributed with an (unknown) time-independent variance $\sigma_\eta^2$ and zero mean. In this way, Eq. 2.3 is rewritten as

$$p(\{s_n(t)\}, \{a_{nr}\}, \{\tau_{nr}\}, \sigma_\eta \mid \{x_r(t)\}, I) = \frac{p(\{\eta(t)\} \mid \sigma_\eta, I)\, p(\{s_n(t)\}, \{a_{nr}\}, \{\tau_{nr}\}, \sigma_\eta \mid I)}{p(\{x_r(t)\} \mid I)}. \tag{2.4}$$

Under the constraint of a given mean and $\sigma_\eta^2$ and following the principle of maximum entropy (Sivia 1996), a Gaussian density is assigned to the likelihood function. After dropping the normalization term $1/p(\{x_r(t)\} \mid I)$, the posterior can be rewritten as

$$p(\{s_n(t)\}, \{a_{nr}\}, \{\tau_{nr}\}, \sigma_\eta \mid \{x_r(t)\}, I) \propto p(\{s_n(t)\}, \{a_{nr}\}, \{\tau_{nr}\}, \sigma_\eta \mid I)\, (2\pi\sigma_\eta^2)^{-\frac{RT}{2}} \times \exp\left[ -\frac{\sum_{r=1}^{R} \sum_{t=1}^{T} \left( x_r(t) - \sum_{n=1}^{N} a_{nr}\, s_n(t - \tau_{nr}) \right)^2}{2\sigma_\eta^2} \right], \tag{2.5}$$

where $R$ is the total number of trials and $T$ is the total number of sampled data points in a given trial. For notational simplicity we have assumed the sampling interval to be unity in the above expression. In practice, the real time can be recovered by multiplying the integer time index $t$ by the sampling interval.

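In the absence of detailed knowledge about the parameters $a_{nr}$, $\tau_{nr}$, and $s_n(t)$, we take their prior distributions to be uniform, with appropriate cutoffs reflecting physiologically reasonable ranges of values. That is,

$$p(s_n(t) \mid I) = \text{const.}, \quad \forall n, \tag{2.6}$$

$$p(a_n \mid I) = \text{const.}, \quad \text{for } 0 < a_n \le a_{\max}, \quad \forall n, \tag{2.7}$$

$$p(\tau_n \mid I) = \text{const.}, \quad \text{for } \tau_{\min} < \tau_n \le \tau_{\max}, \quad \forall n. \tag{2.8}$$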

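Treating the variance of the ongoing activity as a nuisance parameter and assigning the Jeffreys prior $p(\sigma_\eta) = \sigma_\eta^{-1}$ (Sivia 1996), we marginalize the posterior over $\sigma_\eta$:

$$p(\{s_n(t)\}, \{a_{nr}\}, \{\tau_{nr}\} \mid \{x_r(t)\}, I) \propto p(\{s_n(t)\}, \{a_{nr}\}, \{\tau_{nr}\} \mid I) \int_{-\infty}^{\infty} (2\pi\sigma_\eta^2)^{-\frac{RT}{2}}\, \sigma_\eta^{-1} \times \exp\left( -\frac{1}{2\sigma_\eta^2} \sum_{r=1}^{R} \sum_{t=1}^{T} \left[ x_r(t) - \sum_{n=1}^{N} a_{nr}\, s_n(t - \tau_{nr}) \right]^2 \right) d\sigma_\eta \tag{2.9}$$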

and obtain

$$p(\{s_n(t)\}, \{a_{nr}\}, \{\tau_{nr}\} \mid \{x_r(t)\}, I) \propto p(\{s_n(t)\}, \{a_{nr}\}, \{\tau_{nr}\} \mid I)\, (2\pi)^{-\frac{RT}{2}}\, \Gamma\!\left(\frac{RT}{2}\right) \times \left( \sum_{r=1}^{R} \sum_{t=1}^{T} \left[ x_r(t) - \sum_{n=1}^{N} a_{nr}\, s_n(t - \tau_{nr}) \right]^2 \right)^{-\frac{RT}{2}}, \tag{2.10}$$

where $\Gamma(\cdot)$ is the gamma function.

The evaluation of the posterior probability and computation of its moments can be obtained via Markov Chain Monte Carlo (MCMC) methods. Although highly informative, these methods carry the disadvantage of being computationally intensive. Here, instead, we summarize the posterior density by seeking the maximum a posteriori (MAP) solution, i.e., a set of parameters that maximize the posterior probability. In the context of Eq. 2.2, the MAP solution for the model parameters $M$ is

$$\hat{M} = \arg\max_{M}\, [\, p(D \mid M, I)\, p(M \mid I)\, ] = \arg\max_{M}\, [\, \ln p(M \mid I) + \ln p(D \mid M, I)\, ]. \tag{2.11}$$

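Because waveforms, amplitude scaling factors, and latency shifts are being estimated simultaneously, the model is degenerate: for example, multiplying a component's waveform by a constant while dividing its amplitude scaling factors by the same constant leaves the model unchanged, as does shifting the waveform in time while offsetting all of its latency shifts in the opposite direction. We solve this problem by constraining the ensemble mean of the amplitude scaling factors and latency shifts of each component to equal one and zero, respectively.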
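One way to impose these constraints is to renormalize a component after its parameters have been estimated. The sketch below is an illustration under stated assumptions, not a procedure given in the paper: latencies are integers in samples, the mean latency is rounded to the nearest sample, and `shift` and `enforce_constraints` are hypothetical helper names.

```python
import numpy as np

def shift(waveform, lag):
    """Return s(t - lag): the waveform delayed by `lag` samples, zero-padded."""
    out = np.zeros_like(waveform)
    T = len(waveform)
    if abs(lag) >= T:
        return out
    if lag >= 0:
        out[lag:] = waveform[:T - lag]
    else:
        out[:T + lag] = waveform[-lag:]
    return out

def enforce_constraints(waveform, amplitudes, latencies):
    """Rescale and re-center one component so that mean(a) = 1 and mean(tau) is
    zero up to rounding, without changing any product a_r * s(t - tau_r)."""
    mean_amp = amplitudes.mean()
    amplitudes = amplitudes / mean_amp
    waveform = waveform * mean_amp          # absorbs the amplitude scale
    mean_lag = int(np.round(latencies.mean()))
    latencies = latencies - mean_lag        # integer lags stay integer
    waveform = shift(waveform, mean_lag)    # absorbs the common delay
    return waveform, amplitudes, latencies
```

Because the common scale and delay are moved into the waveform, the modeled single-trial signals are unchanged as long as the shifted waveform stays inside the analysis window.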

Intuition about the characteristics of the MAP solution can be gained by examining the partial derivatives of the logarithm of the posterior probability with respect to each of the model parameters. This leads to a practical and simple estimation algorithm. In what follows, the time $t$ assumes discrete values corresponding to digital sampling. Let

$$Q = \sum_{r=1}^{R} \sum_{t=1}^{T} \left[ x_r(t) - \sum_{n=1}^{N} a_{nr}\, s_n(t - \tau_{nr}) \right]^2. \tag{2.12}$$

Let $P$ represent the posterior probability in Eq. 2.10. Then its logarithm can be simply written as

$$\ln P = -\frac{RT}{2} \ln Q + \text{const}. \tag{2.13}$$
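The step from the marginalization integral (Eq. 2.9) to the closed form (Eq. 2.10) and then to Eq. 2.13 can be checked with a standard gamma-function integral. The sketch below assumes the integration effectively runs over $\sigma_\eta > 0$, the range on which the Jeffreys prior is defined, and uses the substitution $u = Q/(2\sigma_\eta^2)$ with $Q$ the sum of squared residuals of Eq. 2.12; the intermediate constants are worked out here and are not taken from the paper.

```latex
\begin{align*}
\int_{0}^{\infty} (2\pi\sigma_\eta^2)^{-\frac{RT}{2}}\, \sigma_\eta^{-1}\,
  e^{-Q/(2\sigma_\eta^2)}\, d\sigma_\eta
&= (2\pi)^{-\frac{RT}{2}} \int_{0}^{\infty} \sigma_\eta^{-(RT+1)}\,
  e^{-Q/(2\sigma_\eta^2)}\, d\sigma_\eta \\
&= (2\pi)^{-\frac{RT}{2}}\, \frac{1}{2} \left(\frac{Q}{2}\right)^{-\frac{RT}{2}}
  \int_{0}^{\infty} u^{\frac{RT}{2}-1} e^{-u}\, du \\
&= \frac{1}{2}\, \pi^{-\frac{RT}{2}}\, \Gamma\!\left(\frac{RT}{2}\right) Q^{-\frac{RT}{2}} .
\end{align*}
% With uniform priors, every factor except Q^{-RT/2} is constant in the model
% parameters, so  ln P = -(RT/2) ln Q + const,  in agreement with Eq. 2.13.
```

The result differs from the prefactor written in Eq. 2.10 only by a parameter-independent constant, which the proportionality sign absorbs. Maximizing the posterior is therefore equivalent to minimizing $Q$.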

For the partial derivatives we use $j$, $p$, and $q$ to denote specific index values for the generic running indices $n$, $r$, and $t$, respectively. The first partial derivative with respect to $s_j(q)$ is

$$\frac{\partial \ln P}{\partial s_j(q)} = -\frac{RT}{2}\, Q^{-1}\, \frac{\partial Q}{\partial s_j(q)}, \tag{2.14}$$

where

$$\frac{\partial Q}{\partial s_j(q)} = -2 \sum_{r=1}^{R} \left[ W a_{jr} - (a_{jr})^2 s_j(q) \right] \tag{2.15}$$

and

$$W = x_r(q + \tau_{jr}) - \sum_{\substack{n=1 \\ n \neq j}}^{N} a_{nr}\, s_n(q - \tau_{nr} + \tau_{jr}). \tag{2.16}$$

Setting $\partial Q / \partial s_j(q) = 0$ gives

$$\hat{s}_j(q) = \frac{\sum_{r=1}^{R} W a_{jr}}{\sum_{r=1}^{R} (a_{jr})^2}, \tag{2.17}$$

with $\hat{s}_j(q)$ denoting the estimated parameter. The above equation does not have a closed-form solution since the right-hand side depends on the other estimated parameters. However, intuition about the type of solution can be obtained by examining the term $W$. Basically, this term involves the following two elements: (a) the data are shifted according to the latency shift of the estimated component, i.e., $x_r(q + \tau_{jr})$; and (b) the other scaled and time-shifted components, i.e., $a_{nr}\, s_n(q - \tau_{nr} + \tau_{jr})$, for $n \neq j$ are subtracted from the data. The properly scaled residuals, where the scaling is given by the term $a_{jr}$, are then averaged across trials.
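The waveform estimate of Eq. 2.17 amounts to subtracting the other components, realigning each trial by the component's latency, and forming an amplitude-weighted average. The following is a minimal sketch under the assumptions of integer latencies and zero padding at the edges; `shift` and `update_waveform` are hypothetical names, not the authors' code.

```python
import numpy as np

def shift(waveform, lag):
    """Return s(t - lag): the waveform delayed by `lag` samples, zero-padded."""
    out = np.zeros_like(waveform)
    T = len(waveform)
    if abs(lag) >= T:
        return out
    if lag >= 0:
        out[lag:] = waveform[:T - lag]
    else:
        out[:T + lag] = waveform[-lag:]
    return out

def update_waveform(x, waveforms, amplitudes, latencies, j):
    """Eq. 2.17 for component j.

    x: (R, T) trials; waveforms: (N, T); amplitudes, latencies: (N, R).
    """
    R, T = x.shape
    N = waveforms.shape[0]
    numerator = np.zeros(T)
    for r in range(R):
        residual = x[r].astype(float).copy()
        for n in range(N):
            if n != j:  # remove the other scaled, time-shifted components
                residual -= amplitudes[n, r] * shift(waveforms[n], int(latencies[n, r]))
        # W(q) = residual(q + tau_jr): undo the latency shift of component j
        W = shift(residual, -int(latencies[j, r]))
        numerator += amplitudes[j, r] * W
    return numerator / np.sum(amplitudes[j] ** 2)
```

Note that the current estimate of component $j$ itself never enters the right-hand side; only its amplitudes and latencies do.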

Similarly, we obtain the estimate for the amplitude scaling factors $a_{nr}$:

$$\hat{a}_{jp} = \frac{\sum_{t=1}^{T} U V}{\sum_{t=1}^{T} V^2}, \tag{2.18}$$

where $U = x_p(t) - \sum_{n=1,\, n \neq j}^{N} a_{np}\, s_n(t - \tau_{np})$ and $V = s_j(t - \tau_{jp})$. Notice that the formula derived for $\hat{a}_{jp}$ is related to a matched filter solution. That is, $\hat{a}_{jp}$ is given by projecting $U$, which is the data after removing the contribution from the other scaled and time-shifted components, onto $V$, which is the current component under estimation.
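The projection in Eq. 2.18 is a one-line least-squares fit per trial. The sketch below is illustrative only and assumes integer latencies and zero-padded shifts; `shift` and `update_amplitude` are hypothetical names.

```python
import numpy as np

def shift(waveform, lag):
    """Return s(t - lag): the waveform delayed by `lag` samples, zero-padded."""
    out = np.zeros_like(waveform)
    T = len(waveform)
    if abs(lag) >= T:
        return out
    if lag >= 0:
        out[lag:] = waveform[:T - lag]
    else:
        out[:T + lag] = waveform[-lag:]
    return out

def update_amplitude(x_p, waveforms, amplitudes_p, latencies_p, j):
    """Eq. 2.18 for component j on a single trial p.

    x_p: (T,) trial; waveforms: (N, T); amplitudes_p, latencies_p: (N,).
    """
    U = x_p.astype(float).copy()
    for n in range(len(waveforms)):
        if n != j:  # data minus the other scaled, time-shifted components
            U -= amplitudes_p[n] * shift(waveforms[n], int(latencies_p[n]))
    V = shift(waveforms[j], int(latencies_p[j]))  # component j at its latency
    return np.dot(U, V) / np.dot(V, V)
```

In a noise-free trial where the other components are known exactly, the function returns the true scaling factor of component $j$.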

For the latency shift parameter, setting $\partial Q / \partial \tau_{jp} = 0$ leads to the following equation:

$$2 \sum_{t=1}^{T} \left[ \left( x_p(t) - \sum_{\substack{n=1 \\ n \neq j}}^{N} a_{np}\, s_n(t - \tau_{np}) \right) a_{jp}\, s'_j(t - \tau_{jp}) - a_{jp}^2\, s'_j(t - \tau_{jp})\, s_j(t - \tau_{jp}) \right] = 0, \tag{2.19}$$

where $s'_j(t - \tau_{jp})$ is the time derivative of $s_j(t - \tau_{jp})$. The solution for $\tau_{jp}$ is more difficult since $\tau$ appears in the argument of the waveform function. Again, intuition can be gained by directly examining the condition for the maximization of the logarithm of the posterior, which is equivalent to the minimization of the term $Q$ in Eq. 2.12. Expansion of this term results in

$$\sum_{r=1}^{R} \sum_{t=1}^{T} \left[ x_r^2(t) + \left[ \sum_{n=1}^{N} a_{nr}\, s_n(t - \tau_{nr}) \right]^2 - 2 x_r(t) \sum_{n=1}^{N} a_{nr}\, s_n(t - \tau_{nr}) \right]. \tag{2.20}$$

As $\tau_{jp}$ is varied, only the cross terms of trial $p$ that involve the $j$-th component are relevant for the minimization of Eq. 2.12 (as long as the event-related components $s_n(t)$ can be considered zero outside some time interval $(t_0, t_f)$, so that the summed square of the shifted component does not depend on the shift). Thus the optimal parameter $\tau_{jp}$ is found by maximizing

$$\rho(\tau) = \sum_{t=1}^{T} a_{jp}\, s_j(t - \tau) \left[ x_p(t) - \sum_{\substack{n=1 \\ n \neq j}}^{N} a_{np}\, s_n(t - \tau_{np}) \right], \tag{2.21}$$

which, if properly normalized, is just the cross correlation between the estimated component and the data after the contributions from the other components have been subtracted out. Thus

$$\hat{\tau}_{jp} = \arg\max_{\tau}\, \rho(\tau). \tag{2.22}$$

This result corresponds to Woody's matched filter algorithm for latency estimation (Woody 1967).
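The latency estimate of Eqs. 2.21 and 2.22 can be sketched as an exhaustive search over integer lags. This is a minimal illustration, assuming a symmetric search window of `max_lag` samples and zero-padded shifts; the names `shift` and `estimate_latency` are hypothetical.

```python
import numpy as np

def shift(waveform, lag):
    """Return s(t - lag): the waveform delayed by `lag` samples, zero-padded."""
    out = np.zeros_like(waveform)
    T = len(waveform)
    if abs(lag) >= T:
        return out
    if lag >= 0:
        out[lag:] = waveform[:T - lag]
    else:
        out[:T + lag] = waveform[-lag:]
    return out

def estimate_latency(x_p, waveforms, amplitudes_p, latencies_p, j, max_lag):
    """Return the integer lag in [-max_lag, max_lag] that maximizes rho (Eq. 2.21)."""
    U = x_p.astype(float).copy()
    for n in range(len(waveforms)):
        if n != j:  # subtract the other components at their current latencies
            U -= amplitudes_p[n] * shift(waveforms[n], int(latencies_p[n]))
    lags = np.arange(-max_lag, max_lag + 1)
    rho = [amplitudes_p[j] * np.dot(shift(waveforms[j], int(lag)), U) for lag in lags]
    return int(lags[int(np.argmax(rho))])
```

Restricting the lag range plays the same role as the search intervals described for the experimental data: it reduces the chance that the component locks onto unrelated ongoing activity.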

2.3 Algorithm implementation

Given that the partial derivatives and the Hessian matrix of the posterior probability are readily available, techniques such as conjugate gradients and quasi-Newton trust region methods can be employed in the search for the MAP solution. At each iteration step, the optimization can be carried out in stages, i.e., first with respect to the latency shifts, then the waveforms, and finally the amplitude scaling factors. These techniques are much less computationally intensive than MCMC but still require multiple searches starting at different initial conditions or stochastic annealing to avoid local maxima.

Fig. 1. Synthetic data generated to evaluate the performance of the dVCA algorithm (see text for details)

However, the analysis in the previous section suggests a simpler heuristic algorithm. After an initial guess, at each iteration step the parameters for all components are updated in sequence as mentioned before: first the latency shifts, then the waveforms, and finally the amplitude scaling factors. Specifically, let $s_j^m(t)$, $a_{jr}^m$, $\tau_{jr}^m$ denote the estimated values of the parameters in the $m$-th iteration. Also, let the latency shifts assume only discrete integer values, with unit corresponding to the sampling interval. To avoid degeneracy in the model, the averages of the amplitude scaling factors and latency shift values in each iteration are constrained to $\langle a_{jr}^m \rangle_r = 1$ and $\langle \tau_{jr}^m \rangle_r = 0$. In this way, if there is no trial-to-trial variability both in amplitude and latency, the superposition of the estimated component waveforms should equal the AERP. For a single channel dataset $\{x_r(t)\}$, the algorithm consists of the following steps:

(0) At $m = 0$, the initial guesses for the amplitude scaling factors and latency shifts are set to $a_{jr}^0 = 1$, $\tau_{jr}^0 = 0$, $\forall j, r$. For simplicity, the decision on the number of components $N$ is based on the inspection of the AERP (see Figs. 1 and 4). Similarly, $N$ nonoverlapping segments of the AERP are taken as the initial guesses for the $N$ components' waveforms $s_j^0(t)$. After this initialization, each iteration consists of four steps:

(1) For all trials, estimate the single-trial latency shifts for one component at a time, starting with the first and proceeding up to the $N$-th component, according to $\tau_{jr}^{m+1} = \arg\max_{\tau} \rho^m(\tau)$. Reexpressed in time units, the estimated latency shift is simply $\tau_{jr} \Delta t$, where $\Delta t$ is the sampling interval. Given an approximate knowledge of where in time the component is expected to happen, an interval for the search of the optimal latency shift $\tau_{jr}$ can be stipulated. In this way, the possibility that the component matches by chance the waveform of unrelated ongoing activity is diminished. In the application to experimental data in Sect. 4, the following intervals for the search of the optimal latency shift were employed: [−30 ms, 30 ms] for early components and [−60 ms, 60 ms] for later components.

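(2) Estimate the waveforms according to

$$s_j^{m+1}(t) = \frac{\sum_{r=1}^{R} W a_{jr}^m}{\sum_{r=1}^{R} (a_{jr}^m)^2},$$

with

$$W = x_r(t + \tau_{jr}^{m+1}) - \sum_{\substack{n=1 \\ n \neq j}}^{N} a_{nr}^m\, s_n^m(t - \tau_{nr}^{m+1} + \tau_{jr}^{m+1}).$$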

(3) For all the trials and components, estimate the amplitude scaling factors according to

$$a_{jr}^{m+1} = \frac{\sum_{t=1}^{T} U V}{\sum_{t=1}^{T} V^2},$$

with $U = x_r(t) - \sum_{n=1\,(n \neq j)}^{N} a_{nr}^m\, s_n^{m+1}(t - \tau_{nr}^{m+1})$ and $V = s_j^{m+1}(t - \tau_{jr}^{m+1})$.

(4) Increment the iteration index: $m = m + 1$; repeat (1) through (3) for $M$ iterations.
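Steps (0) through (4) can be put together in a compact loop. The code below is a hedged sketch rather than the authors' implementation: it assumes integer latencies, zero-padded shifts, a symmetric search window `max_lag`, sequential latency updates, and a particular way of enforcing the mean constraints (subtracting the rounded mean latency and rescaling amplitudes); all function names are hypothetical.

```python
import numpy as np

def shift(w, lag):
    """Return w(t - lag), zero-padded at the edges."""
    out = np.zeros_like(w)
    T = len(w)
    if abs(lag) >= T:
        return out
    if lag >= 0:
        out[lag:] = w[:T - lag]
    else:
        out[:T + lag] = w[-lag:]
    return out

def residual_without(x_r, s, a_r, tau_r, j):
    """Trial data minus all scaled, shifted components except component j."""
    u = x_r.astype(float).copy()
    for n in range(len(s)):
        if n != j:
            u -= a_r[n] * shift(s[n], int(tau_r[n]))
    return u

def dvca(x, s0, max_lag, n_iter=2):
    """x: (R, T) single-channel trials; s0: (N, T) initial waveform guesses."""
    R, T = x.shape
    N = s0.shape[0]
    s = s0.astype(float).copy()
    a = np.ones((N, R))                  # step (0): a = 1, tau = 0
    tau = np.zeros((N, R), dtype=int)
    lags = np.arange(-max_lag, max_lag + 1)
    for _ in range(n_iter):
        # Step (1): latency shifts, one component at a time
        for j in range(N):
            for r in range(R):
                u = residual_without(x[r], s, a[:, r], tau[:, r], j)
                rho = [a[j, r] * np.dot(shift(s[j], int(l)), u) for l in lags]
                tau[j, r] = lags[int(np.argmax(rho))]
            tau[j] -= int(np.round(tau[j].mean()))   # mean latency ~ 0
        # Step (2): waveforms, using the previous iteration's waveforms for n != j
        s_new = np.empty_like(s)
        for j in range(N):
            num = np.zeros(T)
            for r in range(R):
                u = residual_without(x[r], s, a[:, r], tau[:, r], j)
                num += a[j, r] * shift(u, -int(tau[j, r]))
            s_new[j] = num / np.sum(a[j] ** 2)
        s = s_new
        # Step (3): amplitudes, using the previous iteration's amplitudes for n != j
        a_new = np.empty_like(a)
        for j in range(N):
            for r in range(R):
                u = residual_without(x[r], s, a[:, r], tau[:, r], j)
                v = shift(s[j], int(tau[j, r]))
                a_new[j, r] = np.dot(u, v) / np.dot(v, v)
        a = a_new
        for j in range(N):               # mean amplitude = 1
            mean_a = a[j].mean()
            a[j] /= mean_a
            s[j] *= mean_a
    return s, a, tau
```

A typical call would pass nonoverlapping segments of the AERP as `s0`, as described in step (0). The ongoing activity can then be estimated by subtracting the reconstructed components from each trial.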

We note that the differential variability of the components on a trial-by-trial basis is the foundation of the estimation technique. We thus term our algorithm differentially variable component analysis (dVCA). For the experimental data employed here, where the overlap between the components is not large and the signal-to-noise ratio (SNR) is high, two iterations already give reasonable results. The above algorithm is especially well suited for single-trial analysis of LFPs, e.g., intracortically recorded potentials, where the spatial overlap of different and distant sources is not severe as in noninvasive recordings like EEG and MEG. However, if we suspect that the components might be highly overlapped or that a more automatic way to specify the number of components and initialize their waveforms is called for, the algorithm can be improved with the following consideration.

The algorithm is initialized with a single component whose waveform can be taken to be a short segment around the largest fluctuation of the AERP. Single-trial amplitude scaling factors and latency shifts for this component are initialized as before, and the same sequence of steps (1–4) is performed until the estimated parameters converge (given a tolerance criterion) or a maximum number of iterations is reached. Then the estimated single-trial scaled and translated component is subtracted from the corresponding single-trial data, resulting in residual data. The ensemble average of these residual data is computed, leading to a new AERP. A new component is introduced into the estimation if the largest fluctuation of this new AERP happens to be outside of a confidence interval (e.g., $\pm 2\sigma(t)/\sqrt{R}$, where $\sigma(t)$ is the ensemble standard deviation of the residual). The initial value of this new component is taken to be the segment around this largest fluctuation. The whole procedure is repeated until the ensemble average of the residual data no longer shows any significant structure, i.e., fluctuations outside the confidence interval. Besides the use of a confidence interval as mentioned above, a model selection approach using information criteria (e.g., Akaike's information criterion) for the selection of the number of components could also be explored.
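The stopping rule described above can be sketched as a simple test on the residual ensemble. The function below is an illustration, assuming the confidence interval is centered on zero and the ensemble standard deviation is computed with `ddof=1`; the name `needs_new_component` is hypothetical.

```python
import numpy as np

def needs_new_component(residuals):
    """residuals: (R, T) single-trial data after subtracting estimated components.

    Returns (flag, peak): flag is True when the largest fluctuation of the
    residual ensemble average lies outside +/- 2*sigma(t)/sqrt(R); peak is the
    sample index of that fluctuation, around which a new component could be
    initialized.
    """
    R = residuals.shape[0]
    mean = residuals.mean(axis=0)                      # new AERP of the residuals
    half_width = 2.0 * residuals.std(axis=0, ddof=1) / np.sqrt(R)
    peak = int(np.argmax(np.abs(mean)))
    return bool(np.abs(mean[peak]) > half_width[peak]), peak
```

The outer loop would call this after each converged fit, add a component initialized from a segment around `peak` when the flag is set, and stop otherwise.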

3 Application to simulated data

Simulations were used to evaluate the robustness of the dVCA in the presence of noise. The synthetic data mimicked local electric field potentials containing two independent, event-related components as shown in Fig. 1 (top panel). The components have Gaussian waveforms motivated by those found in the experimental data presented in the next section. A single-trial time series was produced by a linear summation of component 1, component 2, and the ongoing noise process that was assumed to have a 1/f spectrum with standard deviation denoted by $\sigma_{\text{noise}}$. (For white noise ongoing processes we obtained similar results.) Trial-to-trial variations in each component were introduced by multiplying the component with an amplitude scaling factor and by shifting it in time with a specific latency shift. Single-trial amplitude scaling factors for component 1 were varied as a lognormal distribution with sample mean $\mu_{\text{amp}} = 1.0$ and sample standard deviation $\sigma_{\text{amp}} = 1.0$. For component 2, the amplitude scaling factors had a sample mean $\mu_{\text{amp}} = 1.0$ and a sample standard deviation of $\sigma_{\text{amp}} = 1.5$. Single-trial latency shifts for component 1 were varied as a normal distribution with sample mean $\mu_{\text{lat}} = 0.0$ and sample standard deviation $\sigma_{\text{lat}} = 10.0$ ms. For component 2, the latency shifts had a sample mean $\mu_{\text{lat}} = 0.0$ and sample standard deviation $\sigma_{\text{lat}} = 25.0$ ms.
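Drawing single-trial parameters with a prescribed mean and standard deviation requires converting those moments into the parameters of the underlying normal distribution in the lognormal case. The sketch below shows one way to do this; it sets population moments, so the sample moments of any finite draw will differ slightly from the targets. The function names and the rounding of latencies to whole samples are assumptions.

```python
import numpy as np

def lognormal_params(mean, std):
    """Parameters (mu, sigma) of the underlying normal distribution such that
    the lognormal variable has the requested mean and standard deviation."""
    sigma2 = np.log(1.0 + (std / mean) ** 2)
    mu = np.log(mean) - 0.5 * sigma2
    return mu, np.sqrt(sigma2)

def draw_trial_parameters(n_trials, amp_mean, amp_std, lat_std_ms, fs_hz, rng):
    """Draw lognormal amplitude factors and zero-mean normal latency shifts.

    Latencies are converted from milliseconds to integer samples at rate fs_hz.
    """
    mu, sigma = lognormal_params(amp_mean, amp_std)
    amplitudes = rng.lognormal(mu, sigma, size=n_trials)
    latencies_ms = rng.normal(0.0, lat_std_ms, size=n_trials)
    latency_samples = np.round(latencies_ms * fs_hz / 1000.0).astype(int)
    return amplitudes, latency_samples

# Example with the values quoted for component 2 (222 trials at 200 Hz).
amps, lags = draw_trial_parameters(222, 1.0, 1.5, 25.0, 200.0, np.random.default_rng(1))
```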

Twelve synthetic datasets were generated, each consisting of 222 simulated trials sampled at 200 Hz and corresponding to a given ongoing noise level $\sigma_{\text{noise}}$. Specifically, the signal-to-noise ratios (SNR) of the datasets were {−34.6574, −27.7259, −20.7944, −13.8629, −9.8083, −6.9315, −2.8768, 0, 6.9315, 13.8629, 20.7944, 27.7259} dB with respect to component 2, where the SNR was defined as:

$$\text{SNR}_{\text{component}} = 20 \log_{10} \frac{\sigma_{\text{component}}}{\sigma_{\text{noise}}}\ \text{dB}. \tag{3.1}$$

Here $\sigma_{\text{component}}$ was calculated as the square root of the variance of the component across time. It was critical to assign SNRs separately for each component in each experiment since the standard deviations of the two components differ.
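Eq. 3.1 and its inverse (choosing a noise level for a target SNR) are straightforward to compute. A minimal sketch with hypothetical function names:

```python
import numpy as np

def snr_db(component, noise_std):
    """Eq. 3.1: 20*log10(sigma_component / sigma_noise), with sigma_component
    taken as the standard deviation of the waveform across time."""
    return 20.0 * np.log10(np.std(component) / noise_std)

def noise_std_for_snr(component, target_db):
    """Noise standard deviation that yields the requested SNR for this component."""
    return np.std(component) / 10.0 ** (target_db / 20.0)
```

Because the two components have different standard deviations across time, the same noise level corresponds to a different SNR for each of them.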

Examples of single-trial time series are shown in the middle panel of Fig. 1 where the SNR with respect to component 2 is approximately −7 dB. It is clear that component 2 is not readily visible in these single-trial time series and that component 1 is not immediately obvious except in trial 222.

The dVCA algorithm was applied to each of the synthetic datasets as explained above in Sect. 2.3. The initial estimates for the component waveforms were taken as the AERP between nonoverlapping time periods and as zero elsewhere. Inspection of the AERP in each of the synthetic datasets revealed that the first peak, defined as component 1, occurred between 65 and 125 ms while the second peak, defined as component 2, occurred between 140 and 235 ms (see Fig. 1 bottom panel). Initial estimates of the single-trial amplitude scaling factors were initialized to 1.0, and those for the single-trial latency shifts were set to 0 ms. After initialization, dVCA was applied to each synthetic dataset until the estimated component waveforms varied less than 1% between iterations or until 15 iterations of the algorithm were performed. Performance of the algorithm was evaluated by comparing the estimated waveforms, amplitude scaling factors, and latency shifts with their original counterparts. The accuracies of estimated waveforms were measured by calculating the fractional root mean square (RMS) error for the $j$-th component as follows:

$$E^{j}_{\text{wave}} = \frac{\sqrt{\sum_{t=1}^{T} \left( s_j(t) - \hat{s}_j(t) \right)^2}}{\sqrt{\sum_{t=1}^{T} \left( s_j(t) \right)^2}}, \tag{3.2}$$

where $T$ is the total number of time points, $s_j(t)$ is the original component waveform, and $\hat{s}_j(t)$ is the estimated component waveform. As shown in the top panel of Fig. 2, the error in the estimated waveforms decreases as the SNR increases for both components. A waveform error of less than 25% is achieved at around −10 dB and is below 10% at a SNR of 0 dB for each component.
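The fractional RMS error of Eq. 3.2 is a one-line computation. A minimal sketch, with a hypothetical function name:

```python
import numpy as np

def fractional_rms_error(s_true, s_est):
    """Eq. 3.2: RMS of the estimation error relative to the RMS of the original."""
    return np.sqrt(np.sum((s_true - s_est) ** 2)) / np.sqrt(np.sum(s_true ** 2))
```

A value of 0 means a perfect reconstruction, and an all-zero estimate gives an error of 1 (100%).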

The accuracy of estimated amplitude scaling factors and latency shifts was evaluated by examining the distribution of the differences between estimated and original values. Figure 2 (middle and lower panels) shows the results of comparing the estimated values and the original values for the two components across various SNR levels. The horizontal black lines denote one standard deviation of the original latency shift and amplitude scaling factor values with respect to their means. The broken lines represent one standard deviation of the differences between the estimated values and the original values. We note that, when these broken lines fall within the black horizontal lines, it means that the trial-by-trial estimated latency shifts and amplitude scaling factors are better representations of the true values than the ensemble or grand average. For amplitude estimation, trial-by-trial estimations produced better results than the grand average for all the SNR levels examined (middle panels). For latency estimation, we achieved this result for SNR above −11 dB (lower panels). The dashed lines in the middle and lower panels are two standard deviations of the differences between the estimated values and the original values.

Fig. 2. Accuracy of the dVCA estimates of component waveforms, amplitude scaling factors, and latency shifts across various signal-to-noise ratios (see text for details)

For two datasets with very different SNRs we directly examined the correlation between the estimated and the original values for waveform, amplitude scaling factor, and latency shift. The results are shown in Fig. 3. When the SNR is approximately 14 dB (high SNR, left column) with respect to component 2, the estimated waveforms of components 1 and 2 are an almost perfect replication of the original waveforms, as shown in the top left panel of Fig. 3. Similarly, the estimated amplitude scaling factors are nearly perfectly correlated with the original values ($r^2_{\text{component 1}} = 0.99$; $r^2_{\text{component 2}} = 0.99$) (middle left panel). Good results are also achieved for latency shifts ($r^2_{\text{component 1}} = 0.52$; $r^2_{\text{component 2}} = 0.70$) (lower left panel). For the second dataset, the SNR of the data with respect to component 2 was approximately −21 dB (low SNR, right column). The estimated waveforms capture the major features of the original components but also contain some error. The estimated amplitude scaling factors are still well correlated with the original values ($r^2_{\text{component 1}} = 0.77$; $r^2_{\text{component 2}} = 0.85$), but the latency shifts are no longer well correlated with the original values ($r^2_{\text{component 1}} = 0.04$; $r^2_{\text{component 2}} = 0.10$). We note that the SNR in this case is extremely small. It is rather remarkable that the waveform and amplitude scaling factor estimations are still able to produce reasonable results.

Application of the dVCA algorithm to synthetic data demonstrates that the technique is robust in estimating the parameters of the two components from a single channel of data. Our results suggest that the method works effectively for all the component parameters for SNRs down to approximately −11 dB. For the waveforms of the components and their amplitude scaling factors, the method works effectively for SNRs down to −21 dB. As the SNR decreases, the dVCA estimates of the latency shifts are most affected. It is worth noting that the simulation here used two components that have a temporal overlap of about 35 ms. This is rather substantial given that each component has a length of about 60 to 70 ms. For situations with less component overlap, the dVCA technique is expected to work even more effectively. In summary, the results illustrated here demonstrate that dVCA can be a valuable and accurate tool for identifying multiple components from a single channel of data and that its application to real data shown below, where the components do not appear to overlap temporally, should yield reasonable results.

4 Application to experimental LFP data

The experimental dataset was collected by Dr. Richard Nakamura in the Laboratory of Neuropsychology at the U.S. National Institute of Mental Health. Visual evoked responses in two macaque monkeys were sampled at 200 Hz from chronically implanted surface-to-depth bipolar electrodes at 15 and 14 cortical sites in the hemisphere contralateral to the hand executing the response. Prior to estimation of the single-trial parameters, the time series were resampled (linear interpolation) to 1 kHz. The monkeys performed a GO/NO-GO visual pattern discrimination task. The prestimulus stage began when the monkey, while viewing a computer screen, depressed a hand lever with the preferred hand. Following a random interval from 0.5 to 1.2 s, a visual stimulus appeared for 100 ms on the screen. Two types of visual patterns were presented: four dots arranged as a DIAMOND or as a LINE. The monkeys were rewarded for lifting the hand in response (GO) to one pattern type or for maintaining pressure (NO-GO) to the other pattern type. The contingency between stimulus pattern and response type was reversed on successive sessions. For the applications in this paper, two ensemble datasets were employed: one with 222 GO response trials to the DIAMOND pattern and another consisting of the 100 GO response trials to the LINE pattern that also had the fastest response times. Detailed descriptions of the experiment and data preprocessing have been presented elsewhere (Bressler et al. 1993; Ding et al. 2000).
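The resampling step mentioned above (200 Hz to 1 kHz by linear interpolation) can be sketched with NumPy's `np.interp`. The function name and the choice to keep the first and last original samples as the endpoints of the new time axis are assumptions of this example, not details taken from the study.

```python
import numpy as np

def resample_linear(x, fs_in=200.0, fs_out=1000.0):
    """Linearly interpolate a single-trial time series onto a finer time grid."""
    x = np.asarray(x, dtype=float)
    t_in = np.arange(x.size) / fs_in                    # original sample times (s)
    n_out = int(round((x.size - 1) * fs_out / fs_in)) + 1
    t_out = np.arange(n_out) / fs_out                   # new sample times (s)
    return t_out, np.interp(t_out, t_in, x)
```

With the default rates, each 5-ms interval between original samples is filled with four interpolated points, so a trial of `n` samples becomes `5 * (n - 1) + 1` samples.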

Examples of the estimation framework covering two main applications to the analysis of LFP data are presented. First, the algorithm is applied to LFPs from three diverse cortical sites: striate, parietal, and somatosensory (arm area), illustrating the capability and versatility of the dVCA algorithm to recover single-trial parameters. Second, by subtracting out the event-related phase-locked potentials from the single-trial time series we estimate the ongoing activities. From the ongoing activity we study the event-related modulation of power spectra as a function of time.

4.1 Differential variability of event-related phase-locked components

Fig. 3. Cortical sites (from top to bottom: striate, parietal, and somatosensory)

Figure 4 shows the AERPs from three very diverse cortical sites: striate, parietal, and somatosensory (arm area). Inspection of the AERPs suggests the presence of two main components. The time intervals for the components are delineated by dashed vertical lines. Figures 5, 6, and 7 show the results of applying the dVCA estimation algorithm to the two components from each of the three cortical areas. The left (right) columns of the three figures are the reconstructed waveforms (top), histograms of the estimated amplitude scaling factors (middle), and histograms of the estimated latency shifts (bottom) of the first (second) component. We sometimes also refer to the first and second components as the early and later components. It is interesting to note that the amplitude scaling factor histograms and latency shift histograms (middle and bottom plots) all show a single-peaked distribution, suggesting that the estimation is capturing the trial-to-trial variations of the underlying events that are relatively phase-locked to the stimulus onset.

Two observations are in order. First, for all the cortical sites, the variability in both amplitude scaling factor and latency shift is larger for the later component (right column in Figs. 5, 6, and 7) in comparison with the earlier one (left column in Figs. 5, 6, and 7), as indicated by the broader latency shift distributions. This by itself argues for the existence of differential variability of the event-related phase-locked components. Although the AERP may suggest a monolithic structure, the differential variability entails that the event-related components should be treated individually. Second, the variability both in amplitude scaling factor and latency shift increases as the cortical site becomes more distant from the primary visual cortex, suggesting that other intervening processes contribute to the increased variability.

Fig. 4. Event-related phase-locked components and AERPs (see text for details)

Fig. 5. Estimated components and estimated parameter distributions for the striate channel. Ensemble size: 222 trials. Two iterations of the algorithm were performed

Given the trial-to-trial variability of latency shifts and amplitude scaling factors, one is naturally led to investigate their relations to the variability of behavioral variables (e.g., relationships between response time and latency shift or amplitude scaling) and also to examine the relations between the components' parameters (i.e., between amplitude scaling factors and latency shifts). Here we focus on the relationships between the single-trial response time (RT) and component latency shifts. Scatter plots in Fig. 8 show that a significant correlation appears for both components at the parietal site (middle plots), although it is much higher for the second one, indicating that this component is more related to the motor phase of the task than to the sensory phase [$r^2$ values of 0.09 and 0.25 for the first and second components ($p < 0.001$); the two coefficients were significantly different ($p < 0.02$)]. Indeed, the mean response time is around 260 ms after stimulus onset, which is about the same time as the peak of the second parietal component. A much stronger correlation with RT is observed for the components at the somatosensory site (top plots) [$r^2$ values of 0.60 and 0.62 ($p < 0.001$) for the first and second components, respectively]. The first component actually precedes the single-trial RT, indicating that this component possibly arises as an early activation from frontal cortex (motor or premotor). No significant correlations are observed for the striate site ($p > 0.05$) (bottom plots).
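The $r^2$ values quoted above are squared correlation coefficients between single-trial RT and single-trial latency shift. A minimal sketch of that computation is given below; the function name is hypothetical, and the significance tests reported in the text are not reproduced here.

```python
import numpy as np

def r_squared(response_times, latency_shifts):
    """Squared Pearson correlation between single-trial RTs and latency shifts."""
    response_times = np.asarray(response_times, dtype=float)
    latency_shifts = np.asarray(latency_shifts, dtype=float)
    # np.corrcoef returns the 2 x 2 correlation matrix; take the off-diagonal term.
    r = np.corrcoef(response_times, latency_shifts)[0, 1]
    return r ** 2
```

Each argument holds one value per trial, in the same trial order.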

To provide supporting evidence for the above correlation results, we sorted the trials according to RT and plotted the single-trial time series as a grayscale-coded "raster plot" in Fig. 9. The RT is superimposed on the raster plots as the black curve. It is quite apparent that for the striate site (top plot) little relation exists between RT and the component latency. For the somatosensory site (bottom plot), the latency for both components clearly tracks the RT curve.
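The trial ordering behind such a raster plot can be sketched as below. This is an illustrative helper with a hypothetical name; the plotting itself and the amplitude normalization mentioned in the caption of Fig. 9 are left out.

```python
import numpy as np

def sort_trials_by_rt(trials, response_times):
    """Reorder single-trial time series (rows) by increasing response time."""
    trials = np.asarray(trials, dtype=float)
    response_times = np.asarray(response_times, dtype=float)
    order = np.argsort(response_times, kind="stable")  # ties keep original order
    return trials[order], response_times[order]
```

The sorted array can be displayed as an image with one row per trial, and the sorted RTs overlaid as a curve.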

Another useful application of single-trial parameter estimation is in the investigation of cortical processes at different cortical sites that appear, from examination of their AERPs, to be related. The AERPs alone, however, cannot reveal whether transient phase-locked processes are related. An illustrative case is presented for the striate and the somatosensory channels. As seen in Fig. 4, the second component at the striate site peaks at about the same time as the first component at the somatosensory site, just before the mean response time (260 ms). However, the single-trial latency shifts for the somatosensory component are highly correlated with RT, while there is no significant correlation for the striate site (Fig. 8). Further corroboration of this result is obtained by visual inspection of the single-trial waveforms sorted according to the respective single-trial RTs (Fig. 9). It is clear that, although the striate and somatosensory AERPs exhibit components at similar times, indicated by the temporal coincidence of peaks or large fluctuations, the activation in the somatosensory cortex systematically tracks the RT, whereas the striate activation seems to behave independently. This indicates that the activations measured by these potentials are related to different and possibly largely independent processes.

4.2 Separation of ongoing activity and event-related phase-locked components

Single-trial LFP time series can stem from many distinct cortical processes. In early sensory cortices, for example, phase-locked transient signals are thought to arise mainly from the response to thalamic inputs, while modulations observed in ongoing activity signals are thought to represent changes in effective connectivity or other intrinsic parameters of the cortical networks (Pfurtscheller and Lopes da Silva 1999). An important step in the analysis of LFPs recorded in the event-related paradigm is then to decompose the time series into these two main signals. Once the decomposition is achieved, the analysis proceeds by investigating the temporal dynamics in each of the decomposed signals separately.

Fig. 6. Estimated components and estimated parameter distributions for the parietal site. The same conventions as in Fig. 5 are employed. Ensemble size: 222 trials (see text for details)

Fig. 7. Estimated components and estimated parameter distributions for the somatosensory site. The same conventions as in Fig. 5 are employed. Ensemble size: 222 trials (see text for details)

The model of event-related recordings presented in Sect. 2 implies that simple subtraction of the AERP from each single-trial time series does not result in the desired decomposition due to the trial-to-trial variability of event-related activity. The problems resulting from this simple-minded decomposition have been analyzed in Truccolo et al. (2002a). We reproduce some basic arguments here for completeness. For simplicity, consider the case of a single component with trial-to-trial variability in amplitude scaling factors only:

\[
x_r(t) = a_r s(t) + \eta_r(t) . \tag{4.1}
\]

The ensemble average (AERP) based on this model becomes $\langle x_r(t) \rangle_r = \langle a_r \rangle s(t)$. After the AERP is subtracted from each single trial, the residual time series is given by

\[
\begin{aligned}
\xi_r(t) &= a_r s(t) - \langle x_r(t) \rangle_r + \eta_r(t) \\
&= (a_r - \langle a_r \rangle) s(t) + \eta_r(t) \\
&= S_r(t) + \eta_r(t) ,
\end{aligned} \tag{4.2}
\]

which contains two components: the ongoing activity $\eta_r(t)$ and a component related to the AERP waveform

\[
S_r(t) = (a_r - \langle a_r \rangle) s(t) . \tag{4.3}
\]

Thus, a phase-locked component still remains in the residual LFP time series even after the AERP is subtracted from each trial. Consequently, the variance of $x_r(t)$ over the ensemble of trials at a given time $t$ becomes

\[
\sigma_x^2(t) = \langle [\xi_r(t)]^2 \rangle_r = \langle [S_r(t)]^2 \rangle_r + \langle [\eta_r(t)]^2 \rangle_r = \sigma_S^2(t) + \sigma_\eta^2(t) , \tag{4.4}
\]

which is also clearly modulated by the time dependence of $\sigma_S^2(t)$. Similarly, the power spectral density time function of the residual, $\langle |\xi_r(f, t)|^2 \rangle$, computed in a sliding time window centered at time $t$ will also exhibit temporal modulations at frequencies characteristic of the AERP waveform. Specifically, since $s(t)$ is often oscillatory with a very distinct main frequency, the quantity $\langle |\xi_r(f, t)|^2 \rangle$ at this frequency will be modulated and should also exhibit a significant increase during the event-related response time period. In other words, even if the ongoing activity is not changing significantly in time, a significant transient change in the ensemble variance and power time functions will be observed, with the time course of this transient being related to $s(t)$ and to the shape of the AERP. Another important possibility, considered below, arises when large transients resulting from the remnant phase-locked component mask important changes in the ongoing activity.
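The argument leading to Eq. 4.4 can be checked numerically with a small simulation. The sketch below is illustrative only: the waveform, the amplitude distribution, and the use of stationary white noise for the ongoing activity are all assumptions chosen for the example, and the function name is hypothetical.

```python
import numpy as np

def residual_variance_after_aerp_subtraction(n_trials=4000, n_samples=300, seed=0):
    """Simulate Eq. 4.1, subtract the AERP, and compare with Eq. 4.4."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples)
    # Hypothetical waveform s(t): Gaussian-windowed oscillation, peak value 2.
    s = 2.0 * np.exp(-0.5 * ((t - 150) / 20.0) ** 2) * np.cos(2 * np.pi * (t - 150) / 80.0)
    a = rng.normal(1.0, 0.5, size=n_trials)                  # amplitude scaling factors
    eta = rng.normal(0.0, 1.0, size=(n_trials, n_samples))   # stationary ongoing activity
    x = a[:, None] * s[None, :] + eta                        # Eq. 4.1
    xi = x - x.mean(axis=0)                                  # AERP-subtracted residual, Eq. 4.2
    observed = xi.var(axis=0)                                # ensemble variance at each time
    predicted = a.var() * s ** 2 + 1.0                       # sigma_S^2(t) + sigma_eta^2, Eq. 4.4
    return s, observed, predicted
```

Although the simulated ongoing activity has constant unit variance, the residual variance rises to roughly twice that level near the waveform peak, which is the spurious event-related modulation described above.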

Fig. 8. Correlation between response time (RT) and single-trial latency shift for each estimated component in three cortical sites (see text for details)

Fig. 9. Single-trial latencies and RT. The amplitudes are denoted by the grayscale and have been normalized for illustration purposes

Therefore, rather than simply subtracting the AERP, a better solution for the decomposition is obtained when the single-trial phase-locked components are estimated and subtracted from each single-trial time series. The ongoing activity, $\eta_r(t)$, is then obtained as

\[
\eta_r(t) = x_r(t) - \sum_{n=1}^{N} a_{nr}\, s_n(t - \tau_{nr}) . \tag{4.5}
\]
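Equation 4.5 can be sketched in code as follows. This is a minimal illustration, not the authors' implementation: latency shifts are assumed to be whole numbers of samples (a shift in ms would first be converted using the sampling rate), shifted waveforms are zero-padded at the edges, and all function and argument names are hypothetical.

```python
import numpy as np

def shift_waveform(w, lag):
    """Return w(t - lag) with zero padding; lag is in samples (positive = later)."""
    w = np.asarray(w, dtype=float)
    out = np.zeros_like(w)
    if lag > 0:
        out[lag:] = w[:-lag]
    elif lag < 0:
        out[:lag] = w[-lag:]
    else:
        out[:] = w
    return out

def estimate_ongoing_activity(x, waveforms, amplitudes, lags):
    """Subtract every scaled, shifted component from every trial (Eq. 4.5).

    x: (n_trials, n_samples); waveforms: (n_components, n_samples);
    amplitudes and lags: (n_components, n_trials).
    """
    eta = np.array(x, dtype=float)          # copy so the input is left untouched
    amplitudes = np.asarray(amplitudes, dtype=float)
    lags = np.asarray(lags)
    n_components, n_trials = amplitudes.shape
    for n in range(n_components):
        for r in range(n_trials):
            eta[r] -= amplitudes[n, r] * shift_waveform(waveforms[n], int(lags[n, r]))
    return eta
```

When the waveforms, amplitudes, and lags are exact, the function returns the ongoing activity exactly; with estimated parameters, the residual also carries the estimation error.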

We now illustrate this approach by looking at the time course of neural oscillations observed from a frontal site (superior principal) for one of the monkeys. The 100 trials corresponding to the fastest response times to the GO response (stimulus LINE) were employed. This choice was made primarily to highlight the modulation of ongoing activity between the pre- and postsensory activation periods.

Figure 10 (top plot) shows the AERP's waveform with a characteristic frequency around 13 Hz and a main peak at approximately 130 ms following stimulus onset. To contrast the performance of the ongoing process estimation in Eq. 4.5 with that of the simple AERP subtraction method, we first compute the power spectrum time function on the residual time series that is obtained by subtracting out the AERP from each single-trial time series. The spectrum is derived by fitting a fifth-order autoregressive model to the residual time series using a 50-ms-long analysis window that is shifted one data point at a time (Ding et al. 2000) through the trial. As shown by the thin solid line in the bottom plot of Fig. 10, there is a clear peak in the spectrum at 19 Hz for the time window centered at 50 ms. This oscillatory phenomenon is also characteristic of the whole prestimulus period. Analysis done in Liang et al. (2002) implicated the 19-Hz oscillation corresponding to this peak as possibly related to anticipatory attention. To rule out the possibility that this oscillation was linked to reward anticipation or motor preparation, it was necessary to study the time course of the oscillation. The solid line in the middle plot of Fig. 10 is the power at 19 Hz as a function of time. It shows no decline as the trial progresses. But the small peak seen around 130 ms poststimulus (coinciding with the main peak of the AERP) and the time course of the power at 13 Hz (dashed line in the middle plot of Fig. 10), peaking also at 130 ms, suggest that the persistence of the power at 19 Hz is an artifact due to the interference from the evoked activity that is not cleanly removed by the simple subtraction of the AERP.
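A sliding-window autoregressive spectrum of the kind described above can be sketched as follows. This is a simplified univariate Yule-Walker fit that pools the autocovariance over trials within each window; it is not the adaptive multivariate procedure of Ding et al. (2000), and the function names and the zero-mean assumption on the input are choices made for this example.

```python
import numpy as np

def fit_ar_yule_walker(window, order=5):
    """Fit x[t] = sum_k coefs[k-1] * x[t-k] + e[t] to zero-mean trials.

    window has shape (n_trials, n_samples); the autocovariance at each lag
    is averaged over all trials and all available sample pairs.
    """
    window = np.asarray(window, dtype=float)
    n = window.shape[1]
    acov = np.array([np.mean(window[:, :n - k] * window[:, k:])
                     for k in range(order + 1)])
    lag_index = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    coefs = np.linalg.solve(acov[lag_index], acov[1:])   # Yule-Walker equations
    noise_var = acov[0] - coefs @ acov[1:]               # innovation variance
    return coefs, noise_var

def ar_spectrum(coefs, noise_var, freqs_hz, fs_hz):
    """AR power spectrum (up to a constant factor) at the requested frequencies."""
    k = np.arange(1, coefs.size + 1)
    phase = np.exp(-2j * np.pi * np.outer(freqs_hz, k) / fs_hz)
    return noise_var / np.abs(1.0 - phase @ coefs) ** 2

def sliding_ar_power(residuals, fs_hz, freqs_hz, window_len, order=5):
    """Spectrum in a window of window_len samples shifted one sample at a time."""
    residuals = np.asarray(residuals, dtype=float)
    centers, power = [], []
    for start in range(residuals.shape[1] - window_len + 1):
        segment = residuals[:, start:start + window_len]
        coefs, noise_var = fit_ar_yule_walker(segment, order)
        power.append(ar_spectrum(coefs, noise_var, freqs_hz, fs_hz))
        centers.append((start + (window_len - 1) / 2.0) / fs_hz)
    return np.array(centers), np.array(power)
```

Here `residuals` would hold either the AERP-subtracted trials or the ongoing activity from Eq. 4.5; a 50-ms window corresponds to 10 samples at 200 Hz. Very short windows give noisy autocovariance estimates, so the result depends on having many trials.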

To address this possibility, we estimate the evoked activity on a single-trial basis. A single component is modeled with the initial guess for its waveform obtained from the AERP around 130 ms. The final estimated component's waveform preserves the main shape of the AERP, including its characteristic oscillation around 13 Hz. The power spectrum of the ongoing activity obtained from Eq. 4.5 is then computed. A single time snapshot for the power computed from a window at 160 ms is shown in Fig. 10 (bottom plot). Interestingly, the power peak at 13 Hz is no longer present, and the 19-Hz ongoing activity is reduced to one third of the presensory activation level. Based on these findings it is concluded that the prestimulus oscillation in the beta frequency range is not related to motor preparation or reward but to anticipatory attention.

5 Discussion

The main contribution of this paper is to present, from a Bayesian perspective, a unified framework for the estimation of single-trial ERP parameters where all the assumptions in the inference process are made explicit. The resulting algorithm is referred to as differentially variable component analysis (dVCA). The framework makes it easy to analyze and relate different algorithms. For example, Woody's matched filter method is seen as a way of implementing the MAP solution for the posterior probability under specific assumptions. The dVCA approach is thoroughly tested on simulated data and then applied to the analysis of LFP data from behaving monkeys, where we demonstrate the benefits of single-trial parameters and the proper extraction of the ongoing process in delineating the physiological meaning of the various aspects of LFP recordings. Our experience shows that the dVCA implemented here works well for the estimation of relatively low-frequency and large-amplitude event-related components. High-frequency components (e.g., evoked gamma bursts) may represent a challenge to this heuristic algorithm because of their low SNR and the larger impact of trial-to-trial latency variability when the recordings are obtained from broad field potentials.

Fig. 10. Discriminating event-related transients in ongoing activity from transients in phase-locked components. The middle plot shows the power at 13 and 19 Hz as a function of time computed from the AERP-subtracted residual time series. Bottom plot: power spectra computed from a short window centered at 50 and 160 ms (dashed curve: spectrum from AERP-subtracted time series; thick curve: spectrum from the time series after the single-trial phase-locked component had been subtracted)

The framework deals mainly with the analysis of recordings from a single channel. This channel-by-channel strategy is reasonable for multichannel recordings where the separation between electrodes is sufficiently large that different channels are not measuring the same generators. However, for recordings made with densely packed sensors such as in high-density scalp EEG recordings, the incorporation of multiple channels into the component estimation process becomes important. A multichannel implementation within this same framework has been briefly described by Truccolo et al. (2002b) and Knuth et al. (2001) and used to perform a detailed investigation of intralaminar cortical interactions (Shah et al. 2001). It is worth noting that many currently used component separation techniques, including cluster analysis, factor analysis, PCA, and ICA (Bell and Sejnowski 1995; Jung et al. 1999), can be seen as special cases of the multichannel multicomponent Bayesian approach depending on variations of the underlying model, the choice of the prior probabilities, and the parameters to be estimated (Knuth 1999; Hinton and Ghahramani 1997; Knuth et al. 2002; Knuth et al., manuscript in preparation).

Improvements in the proposed framework are expected to come especially in three main aspects. First, the current LFP model assumes the linear superposition of event-related phase-locked components and ongoing activities. Future experimental evidence may result in better models, allowing for nonlinear effects as well as for more informative priors for the estimated parameters. Second, the current approach models the ongoing activity as white noise. This is clearly an oversimplification. There are some important considerations regarding improvements in this respect, though. To begin with, we note that the inclusion of temporal correlations of ongoing activity does not change the location of the modes of the posterior probability. (This partly explains why in Sect. 3 we are able to get good estimation results even though the ongoing process is modeled as 1/f noise.) It only affects their width, i.e., the spread of the probability, and by consequence the variance of the estimators. In this way the MAP solution remains the same regardless of whether the ongoing activity is assumed to be white or temporally correlated. Furthermore, the inclusion of the autocorrelation function or any other higher-order statistics in the time domain is very cumbersome. A better solution would be to rewrite the generative model in the spectral domain (Brillinger 1975) and use the power spectrum of the preevent LFP as an estimate for the ongoing activity. However, based on our experience and the considerations of Sect. 4.2, the ongoing activity may be highly nonstationary during the transition from pre- to postevent segments. Future research could consider the alternative of including the power spectrum of the ongoing activity as a parameter to be simultaneously estimated. That would also reduce the risk of overfitting the data, in other words spurious "locking" of the estimated components' waveforms to the ongoing activity.

Finally, a third aspect for improvement refers to the algorithmic implementation itself. The choice of the number of event-related components and how to model each component is a key aspect of the algorithm. The solution implemented in this work depends largely on inspection of the AERP waveform and is somewhat ad hoc. An improvement in this aspect may be obtained by taking explicit advantage of differential variability for amplitude scaling factors and latency shifts between different components (Sect. 4.1). The alternative for an automatic and more principled way of selecting the number of components suggested in Sect. 2.3 makes use of such differential variability as a source of separability for the component analysis. This approach is currently under investigation in the context of a multichannel multicomponent framework (Knuth et al., manuscript in preparation).

Acknowledgements. This work was supported by NIMH (MH64204 and MH42900), NSF (IBN0090717), ONR (N000149910062), T32M07288, NARSAD Young Investigator Award (KHK), NASA IDU/IS/CICT Program, and the NASA Aerospace Technology Enterprise.

References

Bartnik EA, Blinowska KJ, Durka PJ (1992) Single evoked-potential reconstruction by means of wavelet transform. Biol Cybern 67: 175–181

Bell AJ, Sejnowski TJ (1995) An information maximization approach to blind separation and blind deconvolution. Neural Comput 7(6): 1129–1159

Bressler SL, Coppola R, Nakamura R (1993) Episodic multiregional cortical coherence at multiple frequencies during visual task performance. Nature 366: 153–156

Brillinger DR (1975) Time series analysis: data analysis and theory. Holt, New York

Coppola R, Tabor R, Buchsbaum MS (1978) Signal to noise ratio and response variability measurements in single trial evoked potentials. Electroencephalogr Clin Neurophysiol 44: 214–222

Ding M, Bressler SL, Yang W, Liang H (2000) Short-window spectral analysis of cortical event-related potentials by adaptive multivariate autoregressive modeling: data preprocessing, model validation, and variability assessment. Biol Cybern 83: 35–45

Haig AR, Gordon E, Rogers G, Anderson J (1995) Classification of single-trial ERP sub-types: application of globally optimal vector quantization using simulated annealing. Clin Neurophysiol 94: 288–297

Hinton GE, Ghahramani Z (1997) Generative models for discovering sparse distributed representations. Philosoph Trans R Soc Lond Ser B Biol Sci 352(1358): 1177–1190

Jaskowski P, Verleger R (1999) Amplitude and latencies of single-trial ERP's estimated by a maximum-likelihood method. IEEE Trans Biomed Eng 46(8): 987–993

Jung T-P, Makeig S, Westerfield M, Townsend J, Courchesne E, Sejnowski TJ (1999) Analyzing and visualizing single-trial event-related potentials. In: Proceedings of the conference on advances in neural information processing systems (NIPS 1999), Denver, 30 November–2 December 1999, pp 118–124

Knuth KH (1999) A Bayesian approach to source separation. In: Cardoso JF, Jutten C, Loubaton P (eds) Proceedings of the 1st international workshop on independent component analysis and signal separation (ICA'99), Aussois, France, 11–15 January 1999, pp 283–288

Knuth KH, Truccolo WA, Bressler SL, Ding M (2001) Separation of multiple evoked components using differential amplitude and latency variability. In: Proceedings of the 3rd international conference on independent component analysis and blind signal separation, San Diego, 9–12 December 2001, pp 463–468

Knuth KH, Clanton ST, Shah AS, Truccolo WA, Ding M, Bressler SL, Trejo LJ, Schroeder CE (2002) Multiple component event-related potential (mcERP) estimation. In: Abstracts of the Society for Neuroscience annual meeting, San Diego, 2–7 November 2002, vol 28, Prog no 506.4

Liang H, Bressler SL, Ding M, Truccolo WA, Nakamura R (2002) Synchronized activity in prefrontal cortex during anticipation of visuomotor processing. Neuroreport 13(6): 2011–2015

Lange DH, Pratt H, Inbar GF (1997) Modeling and estimation of single evoked brain potential components. IEEE Trans Biomed Eng 44(9): 791–799

McGillem CD, Aunon JI, Pomalaza CA (1985) Improved waveform estimation procedures for event-related potentials. IEEE Trans Biomed Eng 32(6): 371–379

Nowak LG, Bullier J (1997) The timing of information transfer in the visual system. In: Rockland KS (ed) Cerebral cortex, vol 12. Plenum Press, New York, pp 205–241

Pfurtscheller G, Lopes da Silva FH (1999) Event-related EEG/MEG synchronization and desynchronization: basic principles. Clin Neurophysiol 110: 1842–1857

Pham DT, Mocks J, Kohler W, Gasser T (1987) Variable latencies of noisy signals: estimation and testing in brain potential data. Biometrika 74(3): 525–533

Quiroga RQ (2000) Obtaining single stimulus evoked potentials with wavelet denoising. Physica D 145(3–4): 278–292

Quiroga RQ, Garcia H (2003) Single-trial event-related potentials with wavelet denoising. Clin Neurophysiol 114(2): 376–390

Schmolesky MT, Wang Y, Hanes DP, Thompson KG, Leutgeb S, Schall JD, Leventhal AG (1998) Signal timing across the macaque visual system. J Neurophysiol 79: 3272–3278

Schroeder CE, Mehta AD, Givre SJ (1998) A spatiotemporal profile of visual system activation revealed by current source density analysis in the awake macaque. Cerebral Cortex 8: 575–592

Shah AS, Knuth KH, Mehta AD, Fu KG, Johnston TA, Dias EC, Truccolo WA, Ding M, Bressler SL, Schroeder CE (2001) Functional connectivity between visual structures in behaving monkeys. In: Abstracts of the Society for Neuroscience annual meeting, San Diego, 10–15 November 2001, vol 27, Prog no 721.3

Sivia DS (1996) Data analysis: a Bayesian tutorial. Oxford University Press, New York

Truccolo WA, Ding M, Knuth KH, Nakamura R, Bressler SL (2002a) Trial-to-trial variability of cortical evoked responses: implications for the analysis of functional connectivity. Clin Neurophysiol 113: 206–226

Truccolo WA, Knuth KH, Bressler SL, Ding M (2002b) Bayesian analysis of single trial cortical event-related components. In: Fry RL (ed) Bayesian inference and Maximum entropy and Bayesian methods in science and engineering. In: Proceedings of AIP conference, Baltimore, MD, 4–9 August 2001, 617: 64–73. American Institute of Physics, Melville, NY

Woody CD (1967) Characterization of an adaptive filter for the analysis of variable latency neuroelectric signals. Med Biol Eng 5: 539–553
