Analyzing Brain Signals by Combinatorial Optimization

Analyzing Brain Signals
by Combinatorial Optimization
Justin Dauwels
LIDS, MIT
Amari Research Unit, Brain Science Institute, RIKEN
December 1, 2008
Quantifying statistical interdependence of point processes
Application to spike data and EEG
Topics
• Mathematical problem: similarity of multiple point processes
• Motivation/application: early diagnosis of Alzheimer’s disease from EEG signals
• Along the way: spike synchrony
Collaborators: François Vialatte*, Theophane Weber+, and Andrzej Cichocki* (*RIKEN, +MIT)
Financial Support
Alzheimer's disease
• Mild (early stage): becomes less energetic or spontaneous; noticeable cognitive deficits; still independent (able to compensate)
• Moderate (middle stage): mental abilities decline; personality changes; becomes dependent on caregivers
• Severe (late stage): complete deterioration of the personality; loss of control over bodily functions; total dependence on caregivers
Apathy
Memory (forgetting relatives)
Evolution of the disease (stages)
One disease, many symptoms
Video sources: Alzheimer society
• 2 to 5 years before: mild cognitive impairment (MCI); 6 to 25% progress to Alzheimer’s
memory, language, executive functions, apraxia, apathy, agnosia, etc.
• 2% to 5% of people over 65 years old
• Up to 20% of people over 80
Jeong 2004 (Nature)
EEG data
GOAL: Diagnosis of MCI based on EEG
• EEG is a relatively simple and inexpensive technology
• Early diagnosis: medication more effective, more time to prepare future care of patient, etc.
Overview
Alzheimer’s Disease (AD)
decrease in EEG synchrony
Similarity of Point Processes
• Two 1-dim point processes
• Two multi-dim point processes
• Multiple multi-dim point processes
Numerical Results
Conclusion
Alzheimer's disease
Inside glimpse: abnormal EEG
• AD vs. MCI (Hogan et al., 2003; Jiang et al., 2005)
• AD vs. Control (Herrmann and Demiralp, 2005; Yagyu et al., 1997; Stam et al., 2002; Babiloni et al., 2006)
• MCI vs. mild AD (Babiloni et al., 2006)
Decrease of synchrony
Brain “slowdown”
slow rhythms (0.5-8 Hz), fast rhythms (8-30 Hz)
(Babiloni et al., 2004; Besthorn et al., 1997; Jelic et al., 1996; Jeong, 2004; Dierks et al., 1993)
Images: www.cerebromente.org.br
EEG system: inexpensive, mobile, useful for screening
focus of this project
Spontaneous (scalp) EEG
[Figure: EEG signal x(t), amplitude vs. t (sec); Fourier power |X(f)|^2 vs. f (Hz); time-frequency map |X(t,f)|^2 (wavelet transform)]
Time-frequency patterns (“bumps”)
Sparse representation: bump model
Bumps
Sparse representation
F. Vialatte et al., “A machine learning approach to the analysis of time-frequency maps and its application to neural dynamics”, Neural Networks (2007).
Time-frequency map: 10^4 to 10^5 coefficients
Bump model: about 10^2 parameters
[Figure: time-frequency maps and the fitted bump model; axes t (sec) and f (Hz)]
Assumptions:
1. time-frequency map is a suitable representation
2. oscillatory bursts (“bumps”) convey key information
Similarity of bump models
How “similar” are n ≥ 2 bump models?
Similarity of multiple multidimensional point processes
with and
“point” / ”event”
Two one-dimensional point processes
[Figure: two point processes x and x’ on the time axis t]
How synchronous/similar?
Classical methods for continuous time series fail (e.g., cross-correlation)
Two aspects of synchrony
Analogy: waiting for a train
• Train may not arrive (e.g., mechanical problem) = Event reliability
• Train may or may not be on time = Timing precision
Two 1-dim point processes
• Review of Spike Synchrony Measures
• Surrogate Spike Data
• Spike Trains from Morris-Lecar Neuron
• Conclusion
Spike Synchrony Measures
• van Rossum distance (mixed)
• Schreiber et al. similarity measure (mixed)
• Hunter-Milton similarity measure (mixed)
• Victor-Purpura distance metric (event reliability)
• Event synchronization (mixed)
• Stochastic event synchrony (timing precision and event reliability)
Van Rossum distance measure
• Spikes convolved with exponential or Gaussian function
→ spike trains converted into time series s(t) and s’(t)
• Squared distance between s(t) and s’(t)
• If x = x’, we have D_R = 0
• Time constant τ_R
[Figure: spike trains x and x’; kernel time constant τ_R]
van Rossum M.C.W., 2001. A novel spike distance. Neural Computation 13, 751–63.
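The convolve-then-compare recipe can be sketched in a few lines. This is a minimal sketch, not the implementation used for the talk: it assumes a causal exponential kernel exp(-t/τ_R) and the normalization D_R² = (1/τ_R) ∫ (s(t) − s’(t))² dt, for which the integral has a closed form, so no time discretization is needed. The function name `van_rossum_sq_distance` is hypothetical.

```python
import math

def van_rossum_sq_distance(x, x_prime, tau):
    """Squared van Rossum distance between two spike-time lists.

    Assumes each spike is convolved with a causal exponential kernel
    exp(-t / tau) and that the squared distance is normalized by 1 / tau.
    Under those assumptions the integral of two shifted kernels equals
    (tau / 2) * exp(-|a - b| / tau), which gives the sums below.
    """
    def cross(a, b):
        # Sum of kernel overlaps over all spike pairs (up to the factor 1/2).
        return sum(math.exp(-abs(u - v) / tau) for u in a for v in b)

    return 0.5 * (cross(x, x) + cross(x_prime, x_prime) - 2.0 * cross(x, x_prime))
```

With this normalization, identical trains give 0, and a single unmatched spike contributes 1/2; two isolated spikes much farther apart than `tau` contribute about 1.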
Schreiber et al. similarity measure
• Spikes convolved with exponential or Gaussian function
→ spike trains converted into time series s(t) and s’(t)
• Correlation between s(t) and s’(t)
• If x = x’, we have S_S = 1
• Time constant τ_S
Schreiber S., Fellous J.M., Whitmer J.H., Tiesinga P.H.E., and Sejnowski T.J., 2003. A new correlation-based measure of spike timing reliability. Neurocomputing 52, 925–931.
[Figure: spike trains x and x’]
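As a rough illustration of the correlation step, the sketch below assumes a Gaussian kernel whose standard deviation `sigma` plays the role of the time constant τ_S, and takes "correlation" to mean the normalized inner product of s(t) and s’(t) over the whole time axis. Both choices are assumptions, and `schreiber_similarity` is a hypothetical name. For Gaussian kernels the inner product of two shifted kernels is proportional to exp(−(a − b)² / (4σ²)), and the constant cancels in the ratio.

```python
import math

def schreiber_similarity(x, x_prime, sigma):
    """Normalized inner product of two Gaussian-smoothed spike trains."""
    def inner(a, b):
        # Overlap of Gaussian kernels centred at every pair of spikes.
        return sum(math.exp(-(u - v) ** 2 / (4.0 * sigma ** 2)) for u in a for v in b)

    norm = math.sqrt(inner(x, x) * inner(x_prime, x_prime))
    if norm == 0.0:
        raise ValueError("both spike trains must contain at least one spike")
    return inner(x, x_prime) / norm
```

Identical trains give 1; trains whose spikes are separated by many multiples of `sigma` give a value close to 0.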
Victor-Purpura distance measure
• Minimal cost D_V of transforming x into x’
• Basic operations
  • event insertion/deletion: cost = 1
  • event movement: cost proportional to distance (constant C_V)
• If x = x’, we have D_V = 0
• Time constant τ_V = 1/C_V
[Figure: example transformation with one DELETION and one INSERTION]
Victor J. D. and Purpura K. P., 1997. Metric-space analysis of spike trains: theory, algorithms, and application. Network: Comput. Neural Systems 8(17), 127–164.
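The minimal transformation cost can be computed with an edit-distance style dynamic program. The sketch below is one standard way to do it, assuming sorted spike times; the function name `victor_purpura_distance` and the argument name `cost_per_time` (standing for C_V) are illustrative.

```python
def victor_purpura_distance(x, x_prime, cost_per_time):
    """Minimal cost of transforming spike train x into x_prime.

    Insertion and deletion cost 1 each; moving a spike by dt costs
    cost_per_time * |dt|. Both lists are assumed to be sorted.
    """
    n, m = len(x), len(x_prime)
    # g[i][j] = cost of transforming the first i spikes of x
    # into the first j spikes of x_prime.
    g = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        g[i][0] = float(i)          # delete i spikes
    for j in range(1, m + 1):
        g[0][j] = float(j)          # insert j spikes
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            move = cost_per_time * abs(x[i - 1] - x_prime[j - 1])
            g[i][j] = min(g[i - 1][j] + 1.0,       # delete x[i-1]
                          g[i][j - 1] + 1.0,       # insert x_prime[j-1]
                          g[i - 1][j - 1] + move)  # move x[i-1] onto x_prime[j-1]
    return g[n][m]
```

Moving a spike is only cheaper than deleting and re-inserting it when the shift is below 2/C_V, which is why τ_V = 1/C_V acts as the time scale of the measure.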
Stochastic Event Synchrony
• x and x’ synchronous if identical apart from
  • delay
  • little timing jitter
  • few deletions/insertions
• based on generative statistical model
[Figure: point processes x and x’ generated from a hidden process v]
Dauwels J., Vialatte F., Rutkowski T., and Cichocki A., 2007. Measuring neural synchrony by message passing, NIPS 20, in press.
Stochastic Event Synchrony
[Figure: hidden process v on [0, T0] and its two observations x and x’ on [0, T0], each shifted by δt/2 in opposite directions; events without a counterpart are marked non-coincident]
Stochastic event synchrony (SES): delay δt, jitter st, non-coincidence ρ
Stochastic Event Synchrony
Marginalizing over v:
[Figure: hidden process v and observations x and x’ on [0, T0], annotated with the generative model]
• geometric prior for length of the hidden process
• events i.u.d. in [0, T0]
• x and x’: Gaussian offsets with mean ±δt/2 (opposite signs for x and x’) and variance st/2
• i.i.d. deletions with probability pd (for both x and x’)
• events without a counterpart in the other process are non-coincident
Probabilistic inference
PROBLEM: Given 2 point processes x and x’, compute ρ and θ = (δt, st)
APPROACH: $(j^*, j'^*, \theta^*) = \arg\max_{j, j', \theta} \log p(x, x', j, j', \theta)$
SOLUTION: Coordinate descent
DYNAMIC PROGRAMMING: $(j^{(i+1)}, j'^{(i+1)}) = \arg\max_{j, j'} \log p(x, x', j, j', \theta^{(i)})$
PARAMETER ESTIMATION: $\theta^{(i+1)} = \arg\max_{\theta} \log p(x, x', j^{(i+1)}, j'^{(i+1)}, \theta)$
[Figure: alignment grid of x1, ..., x6 against x’1, ..., x’6. Legend: xk non-coincident; x’k’ non-coincident; (xk, x’k’) coincident pair]
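The alternation between alignment and parameter estimation can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' algorithm: a coincident pair is charged the negative log of a Gaussian density on its offset, every non-coincident event is charged a fixed `penalty` (a hypothetical stand-in for the model's non-coincidence cost, whose exact expression is not given on the slides), and `s_init` and `s_floor` are hypothetical tuning values. The names `align` and `ses_1d` are also hypothetical.

```python
import math

def align(x, xp, delta, s, penalty):
    """Order-preserving alignment of sorted event lists by dynamic programming."""
    n, m = len(x), len(xp)

    def pair_cost(i, j):
        # Negative log Gaussian density of the offset xp[j] - x[i] (assumed form).
        return 0.5 * ((xp[j] - x[i] - delta) ** 2 / s + math.log(2 * math.pi * s))

    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * penalty
    for j in range(1, m + 1):
        cost[0][j] = j * penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(cost[i - 1][j] + penalty,      # x[i-1] non-coincident
                             cost[i][j - 1] + penalty,      # xp[j-1] non-coincident
                             cost[i - 1][j - 1] + pair_cost(i - 1, j - 1))
    # Backtrack to recover the coincident pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if cost[i][j] == cost[i - 1][j - 1] + pair_cost(i - 1, j - 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + penalty:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]

def ses_1d(x, xp, s_init, penalty=4.0, s_floor=1e-6, max_iter=50):
    """Coordinate descent: align, then re-estimate delay and jitter."""
    delta, s, pairs = 0.0, s_init, []
    for _ in range(max_iter):
        new_pairs = align(x, xp, delta, s, penalty)
        if new_pairs == pairs or not new_pairs:
            pairs = new_pairs
            break
        pairs = new_pairs
        offsets = [xp[j] - x[i] for i, j in pairs]
        delta = sum(offsets) / len(offsets)                       # average offset
        s = max(sum((d - delta) ** 2 for d in offsets) / len(offsets), s_floor)
    total = len(x) + len(xp)
    rho = (total - 2 * len(pairs)) / total if total else 0.0      # non-coincident fraction
    return rho, delta, s
```

The variance floor `s_floor` only guards against a degenerate zero-variance estimate when all matched offsets are equal; it is a numerical convenience of this sketch, not part of the model.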
Surrogate Data
• pd = 0, 0.1, …, 0.4 (deletion probability)
• δt = 0, 25, and 50 ms (delay)
• σt = 10, 30, and 50 ms (timing jitter)
• length of hidden sequence = 40/(1 − pd)
• expected length of x and x’ = 40
E{S} computed over 10,000 pairs
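A pair of surrogate spike trains with these settings could be drawn roughly as follows. This is a sketch under assumptions: the observation window `t0` (in ms) is not stated on the slide and is a hypothetical value, each observed train is shifted by half the delay in opposite directions with jitter variance σt²/2, and the function name `surrogate_pair` is illustrative.

```python
import math
import random

def surrogate_pair(p_d, delta_t, sigma_t, n_expected=40, t0=4000.0, rng=None):
    """Draw one pair (x, x') of surrogate spike trains from a hidden sequence.

    p_d      deletion probability
    delta_t  delay between x and x' (ms)
    sigma_t  standard deviation of the timing jitter between x and x' (ms)
    t0       length of the observation window (ms); hypothetical value
    """
    rng = rng or random.Random()
    n_hidden = round(n_expected / (1.0 - p_d))
    hidden = [rng.uniform(0.0, t0) for _ in range(n_hidden)]

    def observe(sign):
        events = []
        for v in hidden:
            if rng.random() < p_d:
                continue  # event deleted in this observation
            # Each observation carries half the delay and half the jitter variance.
            events.append(v + rng.gauss(sign * delta_t / 2.0, sigma_t / math.sqrt(2.0)))
        return sorted(events)

    return observe(-1), observe(+1)
```

Averaging a synchrony measure over many such pairs, for each combination of `p_d`, `delta_t`, and `sigma_t`, would give curves of the kind summarized as E{S}.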
Surrogate Data: Results
• E{D_R} increases with pd and σt
→ D_R cannot distinguish timing dispersion from event reliability (likewise all measures except SES and D_V)
• E{D_V} increases with pd, practically independent of σt
→ D_V is a measure of event reliability
• Curves shown ONLY for δt = 0 ms; measures strongly depend on lag
[Figure: Victor-Purpura measure D_V and van Rossum measure D_R for δt = 0; similar for S_S, S_H, S_Q]
Surrogate Data: Results for SES
• E{σt} increases with σt, practically independent of pd
→ σt measure for timing dispersion
• E{ρ} increases with pd, practically independent of σt
→ ρ measure for event reliability
• Curves for δt = 0, 25, and 50 ms practically coincident
Morris-Lecar Neurons
• Simple neuron model
• Exhibits behavior of Type I & II neurons (saddle-node/Hopf bifurcation)
• Input current: baseline + sinusoid + Gaussian noise
• Membrane potential
[Figure: membrane potential over 5 trials for Type I and Type II neurons, with spiking threshold]
Type I: high reliability, large timing dispersion; jitter st = (15 ms)^2, non-coincidence ρ = 3%
Type II: low reliability, small timing dispersion; jitter st = (3 ms)^2, non-coincidence ρ = 27%
Morris-Lecar Neurons (2)
[Figure: spike trains over 50 trials]
Morris-Lecar Neurons: Results
• Small τ: Type II has larger similarity than Type I (dispersion in Type I)
• Large τ: Type I has larger similarity than Type II (dropouts in Type II)
• Observation: similarity depends on time constant τ → similarity FUNCTION S(τ)
• SES AUTOMATICALLY selects st
Conclusion
• Similarity of pairs of spike trains: timing precision and reliability
• Comparison of various spike synchrony measures
• Most measures not able to separate the two aspects of synchrony
• Exception: Victor-Purpura and Stochastic Event Synchrony
• Victor-Purpura: event reliability
• SES: both timing precision and event reliability
• Most measures depend on time constant, to be chosen by user
• Exception: Event Synchronization and SES
• Most measures sensitive to lags between the two spike trains
• Exception: SES
• Future work: application to neurophysiological recordings
Similarity of two bump models...
... by matching bumps
• Bumps in one model, but NOT in other → fraction of “noncoincident” bumps ρ
• Bumps in both models, but with offset → Average time offset δt (delay) → Timing jitter with variance st
→ Average frequency offset δf → Frequency jitter with variance sf
PROBLEM: Given two bump models, compute (ρ, δt, st, δf, sf )
Stochastic Event Synchrony (SES) = (ρ, δt, st, δf, sf )
Generative model
Generate bump model (hidden)
• geometric prior for number of bumps
• bumps are uniformly distributed in rectangle
• amplitude, width (in t and f) all i.i.d.
Generate two “noisy” observations
• offset between hidden and observed bump = Gaussian random vector with mean ( ±δt /2, ±δf /2) covariance diag(st/2, sf /2)
• amplitude, width (in t and f) all i.i.d.
• “deletion” with probability pd
[Figure: hidden bump model yhidden and its observations y and y’, offset by (±δt/2, ±δf/2)]
Dauwels J., Vialatte F., Rutkowski T., and Cichocki A., 2007. Measuring neural synchrony by message passing, NIPS 20, in press.
Summary
MATCHING → max-product
ESTIMATION → closed-form
PROBLEM: Given two bump models, compute (ρ, δt, st, δf, sf)
APPROACH: $(c^*, \theta^*) = \arg\max_{c, \theta} \log p(y, y', c, \theta)$
SOLUTION: Coordinate descent
$c^{(i+1)} = \arg\max_{c} \log p(y, y', c, \theta^{(i)})$
$\theta^{(i+1)} = \arg\max_{\theta} \log p(y, y', c^{(i+1)}, \theta)$
Dauwels J., Vialatte F., Rutkowski T., and Cichocki A., 2007. Measuring neural synchrony by message passing, NIPS 20, in press.
Average synchrony
1. Group electrodes in regions
2. Bump model for each region
3. SES for each pair of models
4. Average the SES parameters
Beyond pairwise interactions
Pairwise similarity Multivariate similarity
Similarity of multiple bump models
[Figure: bump models y1, y2, y3, y4, y5 and clusters of matched bumps]
Constraint: in each cluster at most one bump from each signal
Models similar if
• few deletions/large clusters
• little jitter
Dauwels J., Vialatte F., Weber T., and Cichocki A., Analyzing Brain Signals by Combinatorial Optimization, Allerton 2008.
Generative model
Generate bump model (hidden)
• geometric prior for number n of bumps
• bumps are uniformly distributed in rectangle
• amplitude, width (in t and f) all i.i.d.
Generate M “noisy” observations
• offset between hidden and observed bump = Gaussian random vector with mean ( δt,m /2, δf,m /2) covariance diag(st,m/2, sf,m /2)
• amplitude, width (in t and f) all i.i.d.
• “deletion” with probability pd
[Figure: hidden bump model yhidden and observations y1, y2, y3, y4, y5]
Parameters: θ = (δt,m, δf,m, st,m, sf,m, pc)
pc(i) = p(cluster size = i | y) (i = 1, 2, …, M)
Dauwels J., Vialatte F., Weber T. and Cichocki. Analyzing Brain Signals by Combinatorial Optimization, Allerton 2008.
Probabilistic inference
PROBLEM: Given M bump models, compute θ = (δt,m, δf,m, st,m, sf,m, pc)
APPROACH: $(b^*, \theta^*) = \arg\max_{b, \theta} \log p(y, y', b, \theta)$
SOLUTION: Coordinate descent
CLUSTERING (integer program): $b^{(i+1)} = \arg\max_{b} \log p(y, y', b, \theta^{(i)})$
ESTIMATION OF PARAMETERS: $\theta^{(i+1)} = \arg\max_{\theta} \log p(y, y', b^{(i+1)}, \theta)$
Integer programming methods (e.g., LP relaxation)
• IP with 10,000 variables solved in about 1 s
• CPLEX: commercial toolbox for solving IPs (combines several algorithms)
Dauwels J., Vialatte F., Weber T. and Cichocki. Analyzing Brain Signals by Combinatorial Optimization, Allerton 2008.
EEG Data
EEG data provided by Prof. T. Musha
• EEG of 22 Mild Cognitive Impairment (MCI) patients and 38 age-matched control subjects (CTR), recorded at rest with eyes closed → spontaneous EEG
• All 22 MCI patients later developed Alzheimer’s disease (AD)
• Electrodes located on 21 sites according to the 10-20 international system
• Electrodes grouped into 5 zones (reduces number of pairs); 1 bump model per zone
• Band-pass filtered between 4 and 30 Hz
Similarity measures
• Correlation and coherence
• Granger causality (linear system): DTF, ffDTF, dDTF, PDC, PC, ...
• Phase synchrony: compare instantaneous phases (wavelet/Hilbert transform)
• State-space-based measures: sync likelihood, S-estimator, S-H-N indices, ...
• Information-theoretic measures: KL divergence, Jensen-Shannon divergence, ...
[Figure: no phase locking vs. phase locking; time and frequency panels]
Sensitivity (average synchrony)
[Figure: p-values per family of measures: Granger, Info. Theor., State Space, Phase, SES, Corr/Coh]
Mann-Whitney test: small p-value suggests large difference in statistics of both groups
Significant differences for ffDTF and SES (more unmatched bumps, but same amount of jitter)
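For readers unfamiliar with the test, a minimal sketch of the Mann-Whitney U statistic is given below. It is illustrative only: the p-value uses the normal approximation without tie or continuity correction, which is an assumption of this sketch and not necessarily what was used for the reported results, and the function name `mann_whitney_u` is hypothetical. The two inputs would be the values of one synchrony measure for the two groups (for example MCI and CTR).

```python
import math

def mann_whitney_u(a, b):
    """Return (U, two-sided p-value) for two samples a and b.

    The p-value uses the normal approximation without tie or
    continuity correction, so it is only a rough guide for small samples.
    """
    n_a, n_b = len(a), len(b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    # Assign ranks 1..N, averaging ranks within groups of tied values.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2.0 + 1.0
        i = j + 1
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    u = min(u_a, n_a * n_b - u_a)
    mean = n_a * n_b / 2.0
    sd = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p
```

A small U (and hence a small p-value) means that the values of one group tend to rank below those of the other.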
Classification (biSES)
• Clear separation, but not yet useful as diagnostic tool
• Additional indicators needed (fMRI, MEG, DTI, ...)
• Can be used for screening population (inexpensive, simple, fast)
[Figure: classification scatter plot with ffDTF on one axis; about 85% correctly classified]
Correlations
Strong (anti-)correlations: “families” of sync measures
Conclusions
• Measure for similarity of point processes
• Key idea: matching of events
• Applications: spiking synchrony (surrogate data/Morris-Lecar neuron); EEG synchrony of MCI patients
• SES makes it possible to distinguish event reliability from timing precision
• About 85-90% of MCI vs. healthy subjects correctly classified; perhaps useful for screening a large population
• Future work: combination with other modalities (MEG, fMRI, ...); integration of biophysical models; alternative inference techniques (variations on max-product, Monte Carlo)
References + software
References
• Quantifying Statistical Interdependence by Message Passing on Graphs, Part I: One-Dimensional Point Processes, Neural Computation (under revision)
• Quantifying Statistical Interdependence by Message Passing on Graphs, Part II: Multi-Dimensional Point Processes, Neural Computation (under revision)
• Quantifying Statistical Interdependence by Message Passing on Graphs, Part III: Multivariate Approach, Neural Computation (in preparation)
• A Comparative Study of Synchrony Measures for the Early Diagnosis of Alzheimer's Disease Based on EEG, NeuroImage (under revision)
• On the Early Diagnosis of Alzheimer's Disease Based on EEG, Current Alzheimer’s Research (in preparation, invited review)
• Measuring Neural Synchrony by Message Passing, NIPS 2007
• Analyzing Brain Signals by Combinatorial Optimization, Allerton 2008
Software
• MATLAB implementation of the synchrony measures
• MATLAB Toolbox for bump modelling
Summary
Similarity of multiple multidimensional point processes
Step 1: TWO ONE-dimensional point processes → dynamic programming
Step 2: TWO MULTI-dimensional point processes → max-product / LP relaxation / Edmonds-Karp
Step 3: MULTIPLE MULTI-dimensional point processes → integer programming
Estimation
• Deltas: average offset; sigmas: variance of offset
• Simple closed-form expressions
• Conjugate prior: artificial observations
Large-scale synchrony
Apparently, all brain regions affected...
Alzheimer's disease
Outside glimpse: the future (prevalence)
[Figure: projected number of sufferers in millions. USA, 1980 to 2050 (Hebert et al. 2003); world, 2000, 2030, and 2050, developed vs. developing countries (Wimo et al. 2003)]
• 2% to 5% of people over 65 years old
• Up to 20% of people over 80
Jeong 2004 (Nature)
Ongoing and future work
Algorithms
• alternative inference techniques (e.g., MCMC, linear programming)
• time-dependent (Gaussian processes)
• multivariate (T. Weber)
Applications
• Fluctuations of EEG synchrony
  • caused by auditory stimuli and music (T. Rutkowski)
  • caused by visual stimuli (F. Vialatte)
  • yoga professionals (F. Vialatte)
  • professional shogi players (RIKEN & Fujitsu)
  • Brain-Computer Interfaces (T. Rutkowski)
• Spike data from interacting monkeys (N. Fujii)
• Calcium propagation in glia cells (N. Nakata)
• Neural growth (Y. Tsukada & Y. Sakumura)
• ...
Fitting bump models
[Figure: signal and bump at initialisation, during adaptation (gradient method), and after adaptation]
F. Vialatte et al. “A machine learning approach to the analysis of timefrequency maps and its application to neural dynamics”, Neural Networks (2007).
Boxplots
SURPRISE! No increase in jitter, but significantly less matched activity!
Physiological interpretation
• neural assemblies more localized?
• harder to establish large-scale synchrony?
Generative model
Generate bump model (hidden)
• geometric prior for number n of bumps: $p(n) = (1 - \lambda S)(\lambda S)^n$
• bumps are uniformly distributed in rectangle
• amplitude, width (in t and f) all i.i.d.
Generate two “noisy” observations
• offset between hidden and observed bump = Gaussian random vector with mean ( ±δt /2, ±δf /2) covariance diag(st/2, sf /2)
• amplitude, width (in t and f) all i.i.d.
• “deletion” with probability pd
[Figure: hidden bump model yhidden and its observations y and y’, offset by (±δt/2, ±δf/2)]
Easily extendable to more than 2 observations…
Probabilistic inference
PROBLEM: Given two bump models, compute (ρspur, δt, st, δf, sf)
APPROACH: $(c^*, \theta^*) = \arg\max_{c, \theta} \log p(y, y', c, \theta)$
SOLUTION: Coordinate descent
MATCHING: $c^{(i+1)} = \arg\max_{c} \log p(y, y', c, \theta^{(i)})$
POINT ESTIMATION: $\theta^{(i+1)} = \arg\max_{\theta} \log p(y, y', c^{(i+1)}, \theta)$
Comparing EEG signal rhythms?
PROBLEM I:
• Signals of 3 seconds sampled at 100 Hz (300 samples)
• Time-frequency representation of one signal = about 25,000 coefficients
• 2 signals: numerous neighboring pixels
Comparing EEG signal rhythms? (2)
PROBLEM II:
• Shifts in time-frequency!
[Figure: time-frequency maps of 2 signals, with one pixel highlighted]
Classification (multiSES)
[Figure: classification scatter plots; axis labels: average cluster size, average bump freq, average bump width, ffDTF; about 90% and about 85% correctly classified]
Similarity of bump models...
How “similar” or “synchronous” are two bump models?
Signatures of local synchrony
[Figure: time-frequency patterns (“bumps”); axes t (sec) and f (Hz)]
EEG stems from thousands of neurons; bump if neurons are phase-locked = local synchrony
Alzheimer's disease
Inside glimpse: brain atrophy
Video source: P. Thompson, J. Neuroscience, 2003
Images: Jannis Productions (R. Fredenburg; S. Jannis)
amyloid plaques and neurofibrillary tangles
Video source: Alzheimer society
POINT ESTIMATION: $\theta^{(i+1)} = \arg\max_{\theta} \log p(y, y', c^{(i+1)}, \theta)$
• Uniform prior p(θ): δt, δf = average offset; st, sf = variance of offset
• Conjugate prior p(θ): still closed-form expression
• Other kind of prior p(θ): numerical optimization (gradient method)
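For the uniform-prior case, the closed-form update amounts to sample means and variances of the offsets between matched bumps. The sketch below assumes each bump is reduced to a (t, f) centre and that a matching is given as a list of index pairs; the function name `estimate_parameters` and the dictionary keys are hypothetical, and ρ is taken to be the fraction of bumps left unmatched.

```python
from statistics import fmean, pvariance

def estimate_parameters(y, y_prime, pairs):
    """Closed-form point estimates from matched bump pairs (uniform prior).

    y, y_prime  lists of bump centres as (t, f) tuples
    pairs       list of (k, k_prime) index pairs of matched bumps (non-empty)
    """
    dt = [y_prime[kp][0] - y[k][0] for k, kp in pairs]
    df = [y_prime[kp][1] - y[k][1] for k, kp in pairs]
    total = len(y) + len(y_prime)
    return {
        "delta_t": fmean(dt),   # average time offset
        "s_t": pvariance(dt),   # variance of time offset
        "delta_f": fmean(df),   # average frequency offset
        "s_f": pvariance(df),   # variance of frequency offset
        "rho": (total - 2 * len(pairs)) / total,  # fraction of unmatched bumps
    }
```

`pvariance` divides by the number of pairs, which corresponds to the maximum-likelihood variance; a conjugate prior would add its "artificial observations" to these sums.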
Probabilistic inference
MATCHING: $c^{(i+1)} = \arg\max_{c} \log p(y, y', c, \theta^{(i)})$
ALGORITHMS
• Polynomial-time algorithms give optimal solution(s) (Edmonds-Karp and auction algorithm)
• Linear programming relaxation: extreme points of LP polytope are integral
• Max-product algorithm gives optimal solution if unique [Bayati et al. (2005), Sanghavi (2007)]
EQUIVALENT to (imperfect) bipartite max-weight matching problem
$c^{(i+1)} = \arg\max_{c} \log p(y, y', c, \theta^{(i)}) = \arg\max_{c} \sum_{kk'} w_{kk'}^{(i)} c_{kk'}$
s.t. $\sum_{k'} c_{kk'} \le 1$, $\sum_{k} c_{kk'} \le 1$, and $c_{kk'} \in \{0, 1\}$
Imperfect matching: not necessarily perfect; find heaviest set of disjoint edges
$p(y, y', c, \theta) \propto I(c)\, p_\theta(\theta) \prod_{kk'} \left( N(t'_{k'} - t_k; \delta_t, s_{t,kk'})\, N(f'_{k'} - f_k; \delta_f, s_{f,kk'})\, \beta^{-2} \right)^{c_{kk'}}$
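To make the matching step concrete, the sketch below solves a tiny imperfect max-weight bipartite matching by exhaustive search with memoization. It is only a reference illustration for a handful of bumps (its cost grows exponentially with the size of the second model), not one of the polynomial-time algorithms named above. The weight formula is an assumption: it uses squared offsets scaled by st and sf plus the constant −2 log β, with signs chosen so that close bumps receive large weights. The function names are hypothetical.

```python
import math
from functools import lru_cache

def match_weights(y, y_prime, delta_t, s_t, delta_f, s_f, beta):
    """Weights w[k][k'] for bump centres given as (t, f) tuples (assumed form)."""
    return [[-((tp - t - delta_t) ** 2 / s_t + (fp - f - delta_f) ** 2 / s_f)
             - 2.0 * math.log(beta)
             for (tp, fp) in y_prime]
            for (t, f) in y]

def max_weight_matching(w):
    """Heaviest set of disjoint edges; bumps may stay unmatched.

    Returns (total weight, tuple of (k, k') pairs). Exhaustive search over
    subsets of the second model, so suitable for small examples only.
    """
    n = len(w)
    m = len(w[0]) if w else 0

    @lru_cache(maxsize=None)
    def best(k, used):
        if k == n:
            return 0.0, ()
        value, pairs = best(k + 1, used)            # leave bump k unmatched
        for kp in range(m):
            if used & (1 << kp):
                continue                            # k' already matched
            sub_value, sub_pairs = best(k + 1, used | (1 << kp))
            if w[k][kp] + sub_value > value:
                value, pairs = w[k][kp] + sub_value, ((k, kp),) + sub_pairs
        return value, pairs

    return best(0, 0)
```

Because an edge is only added when it increases the total weight, pairs with negative weight are left unmatched, which is how β acts as a knob on the number of matches.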
Max-product algorithm
MATCHING: $c^{(i+1)} = \arg\max_{c} \log p(y, y', c, \theta^{(i)})$
[Figure: factor graph of the generative model, conditioned on θ, with messages μ↑ and μ↓]
Max-product algorithm (2)
• Iteratively compute messages
• At convergence, compute marginals p(ckk’) = μ↓(ckk’) μ↓(ckk’) μ↑(ckk’)
• Decisions: $c^*_{kk'} = \arg\max_{c_{kk'}} p(c_{kk'})$
Generative model (2)
• Binary variables ckk’: ckk’ = 1 if k and k’ are observations of the same hidden bump, else ckk’ = 0 (e.g., cii’ = 1, cij’ = 0)
• Constraints: $b_k = \sum_{k'} c_{kk'}$ and $b_{k'} = \sum_{k} c_{kk'}$ are binary (“matching constraints”)
• Generative model p(y, y’, yhidden, c, δt, δf, st, sf) (symmetric in y and y’)
• Eliminate yhidden: $p(y, y', c, \theta) = \int p(y, y', y_{hidden}, c, \theta)\, dy_{hidden}$ → offset is Gaussian RV with mean (δt, δf) and covariance diag(st, sf)
• Probabilistic inference: $(c^*, \theta^*) = \arg\max_{c, \theta} \log p(y, y', c, \theta)$
[Figure: bump models y and y’ with bumps i, i’, and j’, offset by (±δt/2, ±δf/2)]
• Bumps in one model, but NOT in other → fraction of “spurious” bumps ρspur
• Bumps in both models, but with offset → Average time offset δt (delay) → Timing jitter with variance st
→ Average frequency offset δf → Frequency jitter with variance sf
PROBLEM: Given two bump models, compute (ρspur, δt, st, δf, sf )
APPROACH: $(c^*, \theta^*) = \arg\max_{c, \theta} \log p(y, y', c, \theta)$
Summary
Objective function
• Logarithm of model: $\log p(y, y', c, \theta) = \sum_{kk'} w_{kk'} c_{kk'} + \log I(c) + \log p_\theta(\theta) + \gamma$
$w_{kk'} = -\left( \frac{(t'_{k'} - t_k - \delta_t)^2}{s_t} + \frac{(f'_{k'} - f_k - \delta_f)^2}{s_f} \right) - 2 \log \beta$
$\beta = p_d (\lambda / V)^{1/2}$
(the term in parentheses is the Euclidean distance between bump centers)
• Large wkk’ if: a) bumps are close, b) small pd, c) few bumps per volume element
• No need to specify pd, λ, and V; they only appear through β = knob to control # matches
[Figure: bump models y and y’ with bumps i, i’, and j’, offset by (±δt/2, ±δf/2)]
Distance measures
wkk’ = 1/st,kk’ (t k’ – tk – δt)2 + 1/sf,kk’ (f k’ – fk– δf)2 + 2 log β
st,kk’ = (Δtk + Δt’k) st sf,kk’ = (Δfk + Δf’k) sf
Scaling
NonEuclidean
$p(y, y', c, \theta) \propto I(c)\, p_\theta(\theta) \prod_{kk'} \left( N(t'_{k'} - t_k; \delta_t, s_{t,kk'})\, N(f'_{k'} - f_k; \delta_f, s_{f,kk'})\, \beta^{-2} \right)^{c_{kk'}}$
Generative model
Expect bumps to appear at about same frequency, but delayed
Frequency shift requires nonlinear transformation, less likely than delay
Prior for parameters
Conjugate priors for st and sf (scaled inverse chi-squared)
Improper prior for δt and δf: p(δt) = 1 = p(δf)
[Figure: CTR vs. MCI]
Preliminary results for multivariate model: linear combination of pc
[Figure: point sets X and Y]
$\min_{x \in X,\, y \in Y} d(x, y)$
Generative model
Generate bump model (hidden)
• geometric prior for number n of bumps: $p(n) = (1 - \lambda S)(\lambda S)^n$
• bumps are uniformly distributed in rectangle
• amplitude, width (in t and f) all i.i.d.
Generate M “noisy” observations
• offset between hidden and observed bump = Gaussian random vector with mean ( δt,m /2, δf,m /2) covariance diag(st,m/2, sf,m /2)
• amplitude, width (in t and f) all i.i.d.
• “deletion” with probability pd
(other prior pc0 for cluster size)
[Figure: hidden bump model yhidden and observations y1, y2, y3, y4, y5]
Parameters: θ = (δt,m, δf,m, st,m, sf,m, pc)
pc(i) = p(cluster size = i | y) (i = 1, 2, …, M)
(Hebb 1949, Fuster 1997)
Stimuli Consolidation Stimulus
Voice Face Voice
Role of local synchrony
Assembly activation Hebbian consolidationAssembly recall
Probabilistic inference
PROBLEM: Given M bump models, compute θ = (δt,m, δf,m, st,m, sf,m, pc)
APPROACH: $(c^*, \theta^*) = \arg\max_{c, \theta} \log p(y, y', c, \theta)$
SOLUTION: Coordinate descent
CLUSTERING (IP or MP): $c^{(i+1)} = \arg\max_{c} \log p(y, y', c, \theta^{(i)})$
POINT ESTIMATION: $\theta^{(i+1)} = \arg\max_{\theta} \log p(y, y', c^{(i+1)}, \theta)$
Integer program
• Max-product algorithm (MP) on sparse graph
• Integer programming methods (e.g., LP relaxation)
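Fourier transform
[Figure: signal decomposed into components 1, 2, 3 along the frequency axis, from low frequency to high frequency]
Windowed Fourier transform
[Figure: Fourier basis functions * window function = windowed basis functions; the windowed Fourier transform maps the signal onto the (t, f) plane]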
Overview
Alzheimer’s Disease (AD): decrease in EEG synchrony
Synchrony measure in time-frequency domain
• Pairs of EEG signals
• Collections of EEG signals
Numerical Results
Conclusion
Generative model (2)
Cost function
unit cost of noncoincident event
unit cost of coincident pair
Model
Surrogate Data: Results (2)
• SS depends on δt
• likewise other S except SES
MATCHING: $c^{(i+1)} = \arg\max_{c} \log p(y, y', c, \theta^{(i)})$
ALGORITHMS
• Polynomial-time algorithms give optimal solution(s) (Edmonds-Karp and auction algorithm)
• Linear programming relaxation: gives optimal solution if unique [Sanghavi (2007)]
• Max-product algorithm gives optimal solution if unique [Bayati et al. (2005), Sanghavi (2007)]
EQUIVALENT to (imperfect) bipartite max-weight matching problem
$c^{(i+1)} = \arg\max_{c} \log p(y, y', c, \theta^{(i)}) = \arg\max_{c} \sum_{kk'} w_{kk'}^{(i)} c_{kk'}$
s.t. $\sum_{k'} c_{kk'} \le 1$, $\sum_{k} c_{kk'} \le 1$, and $c_{kk'} \in \{0, 1\}$
Probabilistic inference (2)
Imperfect matching: not necessarily perfect; find heaviest set of disjoint edges
Max-product algorithm
[Figure: factor graph with messages μ↑ and μ↓]
• At convergence, compute marginals p(ckk’) = μ↓(ckk’) μ↓(ckk’) μ↑(ckk’)
• Decisions: $c^*_{kk'} = \arg\max_{c_{kk'}} p(c_{kk'})$ (optimal if solution unique)
Exemplar-based formulation
[Figure: hidden bump model yhidden and observations y1, y2, y3, y4, y5]
• Exemplars = identical copies of hidden bumps = cluster “center”
• Other bumps in cluster = non-identical copies of exemplars
• Is event an exemplar?
• If not, which exemplar is it associated with?
• Several constraints
Integer program
Exemplar-based formulation: IP
• Binary variables
• Integer program: LINEAR objective function/constraints
• Equivalent to k-dim matching: for k = 2 in P, but for k > 2 NP-hard!