Markovian (and conceivably causal) representations of stochastic processes
Cosma Shalizi
Statistics Dept., Carnegie Mellon University & Santa Fe Institute
10 July 2010, UAI
Outline: Set-Up · Optimality Properties · Reconstruction · Causality · References

Notation and setting
“State”
In classical physics or dynamical systems, "state" is the present variable which fixes all future observables
Back away from determinism: state determines the distribution of future observables
Would like a small state
Would like the state to be well-behaved, e.g. homogeneous Markov
Try to construct states by constructing predictions
Cosma Shalizi Markovian Representations
Notation etc.
Upper-case letters are random variables, lower-case their realizations
Stochastic process . . . , X_{-1}, X_0, X_1, X_2, . . .
X_s^t = (X_s, X_{s+1}, . . . , X_{t-1}, X_t)
Past up to and including t is X_{-∞}^t, future is X_{t+1}^∞
Discrete time is not required but cuts down on measure theory
Making a Prediction
Look at X_{-∞}^t, make a guess about X_{t+1}^∞
Most general guess is a probability distribution
Only ever attend to selected aspects of X_{-∞}^t: mean, variance, phase of 1st three Fourier modes, . . .
∴ guess is a function or statistic of X_{-∞}^t
What’s a good statistic to use?
Predictive Sufficiency
For any statistic σ,
I[X_{t+1}^∞; X_{-∞}^t] ≥ I[X_{t+1}^∞; σ(X_{-∞}^t)]
σ is predictively sufficient iff
I[X_{t+1}^∞; X_{-∞}^t] = I[X_{t+1}^∞; σ(X_{-∞}^t)]
Sufficient statistics retain all predictive information in the data
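A toy illustration of predictive sufficiency (hypothetical code, not from the slides): for a binary first-order Markov chain, the last symbol is a sufficient statistic of the past, so plug-in mutual-information estimates show the inequality above holding with equality for σ = last symbol, and strictly for a lossier statistic. All names here are mine.

```python
import random
from collections import Counter
from math import log2

def mutual_info(pairs):
    """Plug-in estimate of I[A; B] in bits from a list of (a, b) samples."""
    n = len(pairs)
    pab = Counter(pairs)
    pa = Counter(a for a, _ in pairs)
    pb = Counter(b for _, b in pairs)
    return sum((c / n) * log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
               for (a, b), c in pab.items())

random.seed(0)
# Binary Markov chain with Pr(X_{t+1} = X_t) = 0.9
xs = [0]
for _ in range(200_000):
    xs.append(xs[-1] if random.random() < 0.9 else 1 - xs[-1])

# Truncate the "past" to a two-symbol window; sigma = last symbol alone
ts = range(1, len(xs) - 1)
i_window = mutual_info([((xs[t - 1], xs[t]), xs[t + 1]) for t in ts])
i_last = mutual_info([(xs[t], xs[t + 1]) for t in ts])
i_older = mutual_info([(xs[t - 1], xs[t + 1]) for t in ts])

# The last symbol loses nothing; the older symbol alone loses information
print(i_window, i_last, i_older)
```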
Why Care About Sufficiency?
Optimal strategy, under any loss function, only needs a sufficient statistic (Blackwell & Girshick)
Strategies using insufficient statistics can generally be improved (Blackwell & Rao)
Excuse for not worrying about particular loss functions
"Causal" States
(Crutchfield and Young, 1989)
Histories a and b are equivalent iff
Pr(X_{t+1}^∞ | X_{-∞}^t = a) = Pr(X_{t+1}^∞ | X_{-∞}^t = b)
[a] ≡ all histories equivalent to a
The statistic of interest, the causal state, is
ε(x_{-∞}^t) = [x_{-∞}^t]
Set S_t = ε(x_{-∞}^{t-1})
A state is an equivalence class of histories and a distribution over future events
IID = 1 state, periodic = p states
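A minimal sketch of the equivalence-classing (hypothetical code, not from the talk): for a period-2 alternating process, group the length-3 histories the process can produce by their conditional distribution of the next symbol; two classes emerge, matching "periodic = p states".

```python
from collections import defaultdict
from itertools import product

# Period-2 process with random phase: the sequence strictly alternates 0, 1, 0, 1, ...
def next_symbol_dist(history):
    """Conditional distribution (Pr[next = 0], Pr[next = 1]) given a history."""
    return (1.0, 0.0) if history[-1] == 1 else (0.0, 1.0)

# Histories the process can actually produce: strictly alternating strings
histories = [h for h in product((0, 1), repeat=3)
             if all(h[i] != h[i + 1] for i in range(len(h) - 1))]

causal_states = defaultdict(list)  # equivalence classes of histories
for h in histories:
    causal_states[next_symbol_dist(h)].append(h)

print(len(causal_states))  # 2 causal states for a period-2 process
```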
[Figure: set of histories, color-coded by conditional distribution of futures]
[Figure: partitioning histories into causal states]
Optimality Properties: Sufficiency · Markov Properties · Minimalities
Sufficiency
(Shalizi and Crutchfield, 2001)
I[X_{t+1}^∞; X_{-∞}^t] = I[X_{t+1}^∞; ε(X_{-∞}^t)]
because
Pr(X_{t+1}^∞ | S_t = ε(x_{-∞}^t))
= ∫_{y ∈ [x_{-∞}^t]} Pr(X_{t+1}^∞ | X_{-∞}^t = y) Pr(X_{-∞}^t = y | S_t = ε(x_{-∞}^t)) dy
= Pr(X_{t+1}^∞ | X_{-∞}^t = x_{-∞}^t)
[Figure: a non-sufficient partition of histories]
[Figure: effect of insufficiency on predictive distributions]
Markov Properties
Future observations are independent of the past given the causal state:
X_{t+1}^∞ ⊥⊥ X_{-∞}^t | S_{t+1}
by sufficiency:
Pr(X_{t+1}^∞ | X_{-∞}^t = x_{-∞}^t, S_{t+1} = ε(x_{-∞}^t))
= Pr(X_{t+1}^∞ | X_{-∞}^t = x_{-∞}^t)
= Pr(X_{t+1}^∞ | S_{t+1} = ε(x_{-∞}^t))
Recursive Updating/Deterministic Transitions
Recursive transitions for states:
ε(x_{-∞}^{t+1}) = T(ε(x_{-∞}^t), x_{t+1})
Automata theory: "deterministic transitions" (even though there are probabilities)
In continuous time:
ε(x_{-∞}^{t+h}) = T(ε(x_{-∞}^t), x_t^{t+h})
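The point of deterministic transitions is that tracking the state is a table lookup: no re-reading of the whole history. A hypothetical sketch for a period-2 alternating process (state names and transition table are mine, for illustration):

```python
# T maps (current state, newest observation) -> next state
T = {("just_saw_1", 0): "just_saw_0",
     ("just_saw_0", 1): "just_saw_1"}

def filter_states(state, observations):
    """Recursively update the causal state: s_{t+1} = T(s_t, x_{t+1})."""
    trajectory = [state]
    for x in observations:
        state = T[(state, x)]  # constant-time update, history never re-scanned
        trajectory.append(state)
    return trajectory

traj = filter_states("just_saw_1", [0, 1, 0, 1])
print(traj[-1])  # each new state depends only on the old state and the new symbol
```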
If u ∼ v, then for any future event F and single observation a,
Pr(X_{t+1}^∞ ∈ aF | X_{-∞}^t = u) = Pr(X_{t+1}^∞ ∈ aF | X_{-∞}^t = v)
Pr(X_{t+1} = a, X_{t+2}^∞ ∈ F | X_{-∞}^t = u) = Pr(X_{t+1} = a, X_{t+2}^∞ ∈ F | X_{-∞}^t = v)
Pr(X_{t+2}^∞ ∈ F | X_{-∞}^{t+1} = ua) Pr(X_{t+1} = a | X_{-∞}^t = u) = Pr(X_{t+2}^∞ ∈ F | X_{-∞}^{t+1} = va) Pr(X_{t+1} = a | X_{-∞}^t = v)
Pr(X_{t+2}^∞ ∈ F | X_{-∞}^{t+1} = ua) = Pr(X_{t+2}^∞ ∈ F | X_{-∞}^{t+1} = va)
∴ ua ∼ va
(same for continuous values or time, but need more measure theory)
Causal States are Markovian
S_{t+1}^∞ ⊥⊥ S_{-∞}^{t-1} | S_t
because
S_{t+1}^∞ = T(S_t, X_t^∞)
and
X_t^∞ ⊥⊥ {X_{-∞}^{t-1}, S_{-∞}^{t-1}} | S_t
Also, the transitions are homogeneous
Minimality
ε is minimal sufficient
= can be computed from any other sufficient statistic
= for any sufficient η, there exists a function g such that
ε(X_{-∞}^t) = g(η(X_{-∞}^t))
Therefore, if η is sufficient,
I[ε(X_{-∞}^t); X_{-∞}^t] ≤ I[η(X_{-∞}^t); X_{-∞}^t]
[Figure: a sufficient, but not minimal, partition of histories]
[Figure: a partition coarser than the causal states, but not sufficient]
Uniqueness
There is really no other minimal sufficient statistic
If η is minimal, there is an h such that
η = h(ε) a.s.
but ε = g(η) (a.s.), so
g(h(ε)) = ε
h(g(η)) = η
∴ ε and η partition histories in the same way (a.s.)
Minimal stochasticity
If R_t = η(X_{-∞}^{t-1}) is also sufficient, then
H[R_{t+1} | R_t] ≥ H[S_{t+1} | S_t]
∴ the predictive states are the closest we get to a deterministic model without losing predictive power
Entropy Rate
h_1 ≡ lim_{n→∞} H[X_n | X_1^{n-1}] = lim_{n→∞} H[X_n | S_n] = H[X_1 | S_1]
so the predictive states let us calculate the entropy rate
and do source coding
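A sketch of the computation (hypothetical two-state machine, not from the slides): given the states' stationary distribution π and per-state emission distributions, h_1 = H[X_1 | S_1] = Σ_s π_s H[X | S = s].

```python
from math import log2

def entropy(dist):
    """Shannon entropy in bits of a distribution given as {value: prob}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Hypothetical machine: stationary state distribution and emission distributions
pi = {"a": 0.5, "b": 0.5}
emit = {"a": {0: 0.8, 1: 0.2}, "b": {0: 0.3, 1: 0.7}}

h1 = sum(pi[s] * entropy(emit[s]) for s in pi)  # H[X_1 | S_1], bits per symbol
print(round(h1, 4))
```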
Minimal Markovian Representation
The observed process (X_t) is non-Markovian and ugly
But it is generated from a homogeneous Markov process (S_t)
After minimization, this representation is (essentially) unique
Smaller Markovian representations can exist, but then we always have distributions over those states . . .
. . . and those distributions correspond to predictive states
What Sort of Markov Model?
Common-or-garden HMM:
S_{t+1} ⊥⊥ X_t | S_t
But here
S_{t+1} = T(S_t, X_t)
This is a chain with complete connections (Onicescu and Mihoc, 1935; Iosifescu and Grigorescu, 1990)
[Figure: graphical models for an HMM and for a chain with complete connections (CCC), each over states S_1, S_2, S_3 and observations X_1, X_2, X_3]
Example of a CCC: Even Process
[State diagram: state 1 emits A with probability 0.5 and stays in 1, or emits B with probability 0.5 and moves to 2; state 2 emits B with probability 1.0 and returns to 1]
Blocks of As of any length, separated by even-length blocks of Bs
Not Markov at any order
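A quick empirical check of "not Markov at any order" at order 1 (a sketch; the function names are mine): sample from the two-state machine and compare Pr(next = B | AB) with Pr(next = B | BB). If the observations were first-order Markov these would be equal; here the first is exactly 1 and the second is about 2/3.

```python
import random

random.seed(42)

def sample_even_process(n):
    """State 1: emit A (p=0.5, stay) or B (p=0.5, go to 2); state 2: emit B, go to 1."""
    state, out = 1, []
    for _ in range(n):
        if state == 2 or random.random() >= 0.5:
            out.append("B")
            state = 1 if state == 2 else 2
        else:
            out.append("A")
    return "".join(out)

s = sample_even_process(400_000)

def p_next_is_b(context):
    """Empirical Pr(next symbol = B | preceding symbols = context)."""
    spots = [i for i in range(len(context), len(s)) if s[i - len(context):i] == context]
    return sum(s[i] == "B" for i in spots) / len(spots)

p_ab, p_bb = p_next_is_b("AB"), p_next_is_b("BB")
print(p_ab, p_bb)  # unequal, so the last symbol alone does not determine the next
```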
Inventions
Statistical relevance basis (Salmon, 1971, 1984)
Measure-theoretic prediction process (Knight, 1975, 1992)
Forecasting/true measure complexity (Grassberger, 1986)
Causal states, ε-machine (Crutchfield and Young, 1989)
Observable operator model (Jaeger, 2000)
Predictive state representations (Littman et al., 2002)
Sufficient posterior representation (Langford et al., 2009)
How Broad Are These Results?
Knight (1975, 1992) gave the most general constructions:
Non-stationary X_t, continuous time (but discrete works as a special case)
X_t with values in a Lusin space (= image of a complete separable metrizable space under a measurable bijection)
S_t is a homogeneous strong Markov process with deterministic updating
S_t has cadlag sample paths (in appropriate topologies on infinite-dimensional distributions)
A Cousin: The Information Bottleneck
(Tishby et al., 1999)
For inputs X and outputs Y, fix β > 0 and find η(X), the bottleneck variable, minimizing
I[η(X); X] − β I[η(X); Y]
i.e., each bit of predictive information is worth β bits of memory
Predictive sufficiency comes as β → ∞, when we are unwilling to lose any predictive power
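A brute-force sketch of the bottleneck trade-off on a tiny example (entirely hypothetical, not from the talk): X is uniform on {0, 1, 2, 3}, Y is the parity of X passed through 10% binary noise; searching all labelings of X for the minimizer of I[η(X); X] − β I[η(X); Y] at large β recovers the parity partition, which is exactly the sufficient statistic for Y.

```python
from itertools import product
from math import log2

# Joint distribution: X uniform on {0,1,2,3}; Y = parity of X, flipped w.p. 0.1
p_xy = {(x, y): 0.25 * (0.9 if y == x % 2 else 0.1)
        for x in range(4) for y in (0, 1)}

def mutual_info(joint):
    """I[A; B] in bits from a joint distribution {(a, b): prob}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * log2(p / (pa[a] * pb[b])) for (a, b), p in joint.items() if p > 0)

def ib_objective(labels, beta):
    """labels[x] is the bottleneck cluster of x; returns I[T;X] - beta * I[T;Y]."""
    p_tx, p_ty = {}, {}
    for (x, y), p in p_xy.items():
        t = labels[x]
        p_tx[(t, x)] = p_tx.get((t, x), 0.0) + p
        p_ty[(t, y)] = p_ty.get((t, y), 0.0) + p
    return mutual_info(p_tx) - beta * mutual_info(p_ty)

beta = 10.0
best = min(product(range(4), repeat=4), key=lambda lab: ib_objective(lab, beta))
print(best)  # groups {0, 2} vs {1, 3}: keep the parity, forget everything else
```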
Extension 1: Input-Output
(Littman et al., 2002; Shalizi, 2001, ch. 7)
System output (X_t), input (Y_t)
Histories x_{-∞}^t, y_{-∞}^t have distributions of output x_{t+1} for each further input y_{t+1}
Equivalence-class these distributions and enforce recursive updating
Internal states of the system; not trying to predict future inputs
Extension 2: Space and Time
(Shalizi, 2003; Shalizi et al., 2004, 2006; Jänicke et al., 2007)
Dynamic random field X(r, t)
Past cone: points in space-time which could matter to X(r, t)
Future cone: points in space-time for which X(r, t) could matter
[Figure: past and future light cones]
Equivalence-class past-cone configurations by conditional distributions over future cones
S(r, t) is a Markov field
Minimal sufficiency, recursive updating, etc., all go through
Statistical Complexity
Definition (Grassberger, 1986; Crutchfield and Young, 1989)
C ≡ I[ε(X_{-∞}^t); X_{-∞}^t] is the statistical forecasting complexity of the process
= amount of information about the past needed for optimal prediction
= H[ε(X_{-∞}^t)] for discrete causal states
= expected algorithmic sophistication (Gács et al., 2001)
= log(period) for periodic processes
= log(geometric mean(recurrence time)) for stationary processes
A property of the process, not of the learning problem
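Back-of-the-envelope for the even process of the earlier slide (my arithmetic, not the slides'): its two causal states have stationary distribution (2/3, 1/3), which follows from the machine's transition probabilities, so C = H[2/3, 1/3] ≈ 0.918 bits while the entropy rate is only 2/3 bit per symbol: the process needs more memory than the randomness it emits.

```python
from math import log2

# Stationary distribution of the two causal states: solves pi = pi P for the
# even-process machine (state 1 goes to 1 or 2 with prob 1/2 each; 2 goes to 1)
pi = (2 / 3, 1 / 3)

C = -sum(p * log2(p) for p in pi)  # statistical complexity H[S], in bits
h = pi[0] * 1.0 + pi[1] * 0.0      # entropy rate: state 1 tosses a fair coin
print(round(C, 3), round(h, 3))
```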
Connecting to Data

Everything so far has been math/probability
(The Oracle tells us the infinite-dimensional distribution of X)
Can we do some statistics and find the states?
Two senses of “find”: learn in a fixed model vs. discover the right model
Learning

Given states and transitions (ε, T) and a realization x^n_1
Estimate Pr(X_{t+1} = x | S_t = s)
Just estimation for stochastic processes
Easier than ordinary HMMs because S_t is a function of the trajectory
Exponential families in the all-discrete case, very tractable
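Because ε assigns each history to a known state, fitting the transition probabilities is just counting. A sketch of the all-discrete case (the toy `eps` below, which uses only the last symbol, is my stand-in for a given causal-state map):

```python
from collections import Counter, defaultdict

def estimate_emissions(xs, eps):
    """MLE of Pr(X_{t+1} = x | S_t = s), where eps maps the observed
    history x_1..x_t to the state s_t (all-discrete case)."""
    counts = defaultdict(Counter)
    for t in range(len(xs) - 1):
        s = eps(xs[: t + 1])          # state after seeing x_1..x_t
        counts[s][xs[t + 1]] += 1     # symbol emitted next from s
    return {s: {x: n / sum(c.values()) for x, n in c.items()}
            for s, c in counts.items()}

# Toy state map: the state is just the last symbol (order-1 Markov chain).
probs = estimate_emissions("ABABBABAB", eps=lambda h: h[-1])
print(probs["A"]["B"])  # 1.0: in this sample, A is always followed by B
```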
Discovery

Given x^n_1
Estimate ε, T, and Pr(X_{t+1} = x | S_t = s)
Inspiration: “geometry from a time series” in nonlinear dynamics
Conditional independence testing reminiscent of the PC algorithm
Function learning approach (Langford et al., 2009)
Nobody seems to have tried non-parametric Bayes
“Geometry from a Time Series”

Deterministic dynamical system with state z_t on a smooth manifold of dimension m, z_{t+1} = f(z_t)
Only identified up to a smooth, invertible change of coordinates (a diffeomorphism)
Observe a time series of a single smooth, instantaneous function of the state, x_t = g(z_t)
Set s_t = (x_t, x_{t−1}, ..., x_{t−k+1})
Generically, if k ≥ 2m + 1, then z_t = φ(s_t)
φ is smooth and invertible
φ commutes with time evolution, φ(s_{t+1}) = f(φ(s_t))
Regressing s_{t+1} on s_t gives φ^{−1} ◦ f ◦ φ
Idea due to Packard et al. (1980) and Takens (1981); modern review in Kantz and Schreiber (2004)
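The delay-coordinate construction above is easy to sketch; here the logistic map stands in for the deterministic system, and all names are mine:

```python
def delay_embed(x, k):
    """Delay vectors s_t = (x_t, x_{t-1}, ..., x_{t-k+1}),
    one per t = k-1, ..., len(x)-1."""
    return [tuple(x[t - i] for i in range(k)) for t in range(k - 1, len(x))]

# Observable from a deterministic system: the logistic map z_{t+1} = 4z(1-z),
# observed directly (g = identity), so m = 1 and k = 3 >= 2m + 1 suffices.
x = [0.3]
for _ in range(99):
    x.append(4 * x[-1] * (1 - x[-1]))

S = delay_embed(x, k=3)
print(len(S), S[0])  # 98 delay vectors; S[0] = (x_2, x_1, x_0)
```

Regressing the next delay vector on the current one would then recover the conjugated dynamics described above.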
CSSR: Causal State Splitting Reconstruction

Key observation: recursion + one-step-ahead predictive sufficiency ⇒ general predictive sufficiency
Get the next-step distribution right by independence testing
Then make the states recursive
Assumes discrete observations, discrete time, finite causal states
Paper: Shalizi and Klinkner (2004); C++ code, http://bactra.org/CSSR/
One-Step Ahead Prediction

Start with all histories in the same state
Given the current partition of histories into states, test whether going one step further back into the past changes the next-step conditional distribution
Use a hypothesis test to control the false-positive rate
If yes, split that cell of the partition, but see if it matches an existing distribution
Must allow this merging or else lose minimality
If no match, add a new cell to the partition
Stop when no more divisions can be made or a maximum history length Λ is reached
For consistency, Λ < (log n)/h^{1+ι} for some ι
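One refinement pass can be sketched as follows; for brevity a fixed total-variation threshold stands in for the hypothesis test the real algorithm uses, and every name here is mine, not CSSR's:

```python
def split_pass(states, next_dist, threshold=0.1):
    """One CSSR-style splitting/merging pass (sketch).
    states: current partition of histories (list of sets).
    next_dist(h): empirical next-symbol distribution after history h.
    Histories whose distributions agree within `threshold` in total
    variation share a state; anything else founds a new state."""
    def tv(p, q):  # total-variation distance between two finite dists
        keys = set(p) | set(q)
        return 0.5 * sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys)

    new_states, reps = [], []
    for cell in states:
        for h in cell:
            d = next_dist(h)
            for i, r in enumerate(reps):
                if tv(d, r) <= threshold:   # matches an existing state
                    new_states[i].add(h)
                    break
            else:                           # no match: open a new state
                new_states.append({h})
                reps.append(d)
    return new_states

# Toy distributions: "B" predicts differently, so it gets its own state.
dists = {"A": {"A": 0.5, "B": 0.5}, "BB": {"A": 0.5, "B": 0.5}, "B": {"B": 1.0}}
parts = split_pass([set(dists)], dists.get)
print(len(parts))  # 2
```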
Ensuring Recursive Transitions

Need to determinize a probabilistic automaton
Several ways of doing this; technical and not worth going into here
Trickiest part of the algorithm, and it can influence the finite-sample behavior
Convergence

S = true causal state structure
S_n = structure reconstructed from n data points
Assume: finite # of states, every state has a finite history, long enough histories used, plus technicalities:

Pr(S_n ≠ S) → 0

D = true distribution, D_n = inferred distribution
Error scales like independent samples:

E[‖D_n − D‖_TV] = O(n^{−1/2})
Handwaving

Empirical conditional distributions for histories converge (large deviations principle for Markov chains)
Histories in the same state become harder to accidentally separate
Histories in different states become harder to confuse
Each state’s predictive distribution converges at rate O(n^{−1/2}) (from the LDP again, taking a mixture)
Example: The Even Process

[State diagram: two states, 1 and 2, with transitions labeled A | 0.5, B | 0.5, and B | 1.0]
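Assuming the standard even-process machine that the diagram depicts (state 1 emits A or B with probability 1/2, where emitting B forces a visit to state 2, which emits B with certainty), a sampler is a few lines; the names are mine:

```python
import random

def sample_even_process(n, seed=0):
    """Draw n symbols from the even process: maximal blocks of B's
    between A's have even length."""
    rng = random.Random(seed)
    state, out = 1, []
    for _ in range(n):
        if state == 1 and rng.random() < 0.5:
            out.append("A")                 # A | 0.5, stay in state 1
        elif state == 1:
            out.append("B"); state = 2      # B | 0.5, go to state 2
        else:
            out.append("B"); state = 1      # B | 1.0, back to state 1
    return "".join(out)

xs = sample_even_process(1000)
# Every completed block of B's has even length:
print(all(len(b) % 2 == 0 for b in xs.split("A")[:-1]))  # True
```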
[Reconstructed state diagram: two states, 1 and 2, with transitions labeled A | 0.497, B | 0.503, and B | 1.000]

reconstruction with Λ = 3, n = 1000, α = 0.005
Some Uses

Geomagnetic fluctuations (Clarke et al., 2003)
Natural language processing (Padró and Padró, 2005a,b,c, 2007a,b)
Anomaly detection (Friedlander et al., 2003a,b; Ray, 2004)
Information sharing in networks (Klinkner et al., 2006; Shalizi et al., 2007)
Social media propagation (Cointet et al., 2007)
Neural spike train analysis (Haslinger et al., 2010)
About “Causal”

The term “causal states” was introduced by Crutchfield and Young (1989), without too much precision
All about probabilistic prediction, not counterfactuals
(selecting sub-ensembles of naturally-occurring trajectories, not enforcing certain trajectories)
Still, those screening-off properties are really suggestive
Back to Physics

(Shalizi and Moore, 2003)
Assume: microscopic state Z_t ∈ Z, with an evolution operator f
Assume: micro-states support counterfactuals
Assume: we never get to see Z_t; instead we deal with X_t = γ(Z_t)
The X_t are coarse-grained, macroscopic variables
Each macrovariable gives a partition Γ of Z
Sequences of X_t values refine Γ:

Γ^(T) = ⋀_{t=1}^{T} f^{−t} Γ

ε partitions the histories of X
∴ ε joins cells of Γ^(∞)
∴ ε induces a partition ∆ of Z
This is a new, Markovian coarse-grained variable
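On a finite micro-state space the refinement Γ^(T) = ⋀_{t=1}^{T} f^{−t} Γ is just the common refinement of the preimage partitions; a toy sketch (the four-point space, the map f, and the partition Γ are invented for illustration):

```python
def refine(Z, f, Gamma, T):
    """Common refinement of f^{-t}(Gamma), t = 1..T: micro-states z, z'
    share a cell iff f^t(z) and f^t(z') land in the same cell of Gamma
    for every t <= T."""
    def cell(z):                      # index of z's cell in Gamma
        return next(i for i, c in enumerate(Gamma) if z in c)
    def f_t(z, t):                    # t-fold iterate of f
        for _ in range(t):
            z = f(z)
        return z
    signature = {z: tuple(cell(f_t(z, t)) for t in range(1, T + 1)) for z in Z}
    cells = {}
    for z, sig in signature.items():
        cells.setdefault(sig, set()).add(z)
    return list(cells.values())

Z = [0, 1, 2, 3]
f = lambda z: (z + 1) % 4             # rotation on four micro-states
Gamma = [{0, 1}, {2, 3}]              # a 2-cell coarse-graining
print(refine(Z, f, Gamma, T=1))       # 2 cells: {0, 3} and {1, 2}
print(len(refine(Z, f, Gamma, T=2)))  # 4: refined down to singletons
```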
Connecting to Causality

Interventions moving z from one cell of ∆ to another change the distribution of X^∞_{t+1}
Changing z inside a cell of ∆ might still make a difference
“There must be at least this much structure”
Summary

Your stochastic process has a unique, minimal Markovian representation
This representation has nice predictive properties
Can reconstruct from sample data in some cases . . .
. . . and a lot more could be done in this line
The Markov states have the right screening-off properties for causal models
Clarke, Richard W., Mervyn P. Freeman and Nicholas W. Watkins (2003). “Application of Computational Mechanics to the Analysis of Natural Data: An Example in Geomagnetism.” Physical Review E, 67: 0126203. URL http://arxiv.org/abs/cond-mat/0110228.
Cointet, Jean-Philippe, Emmanuel Faure and Camille Roth (2007). “Intertemporal topic correlations in online media.” In Proceedings of the International Conference on Weblogs and Social Media [ICWSM]. Boulder, CO, USA. URL http://camille.roth.free.fr/travaux/cointetfaureroth-icwsm-cr4p.pdf.
Crutchfield, James P. and Karl Young (1989). “Inferring Statistical Complexity.” Physical Review Letters, 63: 105–108. URL http://www.santafe.edu/~cmg/compmech/pubs/ISCTitlePage.htm.
Friedlander, David S., Shashi Phoha and Richard Brooks (2003a). “Determination of Vehicle Behavior based on Distributed Sensor Network Data.” In Advanced Signal Processing Algorithms, Architectures, and Implementations XIII (Franklin T. Luk, ed.), vol. 5205 of Proceedings of the SPIE. Bellingham, WA: SPIE. Presented at SPIE’s 48th Annual Meeting, 3–8 August 2003, San Diego, CA.
Friedlander, David S., Isanu Chattopadhayay, Asok Ray, Shashi Phoha and Noah Jacobson (2003b). “Anomaly Prediction in Mechanical System Using Symbolic Dynamics.” In Proceedings of the 2003 American Control Conference, Denver, CO, 4–6 June 2003.
Gács, Péter, John T. Tromp and Paul M. B. Vitanyi (2001). “Algorithmic Statistics.” IEEE Transactions on Information Theory, 47: 2443–2463. URL http://arxiv.org/abs/math.PR/0006233.
Grassberger, Peter (1986). “Toward a Quantitative Theory of Self-Generated Complexity.” International Journal of Theoretical Physics, 25: 907–938.
Haslinger, Robert, Kristina Lisa Klinkner and Cosma Rohilla Shalizi (2010). “The Computational Structure of Spike Trains.” Neural Computation, 22: 121–157. URL http://arxiv.org/abs/1001.0036. doi:10.1162/neco.2009.12-07-678.
Iosifescu, Marius and Serban Grigorescu (1990). Dependence with Complete Connections and Its Applications. Cambridge, England: Cambridge University Press. Revised paperback printing, 2009.
Jaeger, Herbert (2000). “Observable Operator Models for Discrete Stochastic Time Series.” Neural Computation, 12: 1371–1398. URL http://www.faculty.iu-bremen.de/hjaeger/pubs/oom_neco00.pdf.
Jänicke, Heike, Alexander Wiebel, Gerik Scheuermann and Wolfgang Kollmann (2007). “Multifield Visualization Using Local Statistical Complexity.” IEEE Transactions on Visualization and Computer Graphics, 13: 1384–1391. URL http://www.informatik.uni-leipzig.de/bsv/Jaenicke/Papers/vis07.pdf. doi:10.1109/TVCG.2007.70615.
Kantz, Holger and Thomas Schreiber (2004). Nonlinear Time Series Analysis. Cambridge, England: Cambridge University Press, 2nd edn.
Klinkner, Kristina Lisa, Cosma Rohilla Shalizi and Marcelo F. Camperi (2006). “Measuring Shared Information and Coordinated Activity in Neuronal Networks.” In Advances in Neural Information Processing Systems 18 (NIPS 2005) (Yair Weiss and Bernhard Schölkopf and John C. Platt, eds.), pp. 667–674. Cambridge, Massachusetts: MIT Press. URL http://arxiv.org/abs/q-bio.NC/0506009.
Knight, Frank B. (1975). “A Predictive View of Continuous Time Processes.” Annals of Probability, 3: 573–596. URL http://projecteuclid.org/euclid.aop/1176996302.
— (1992). Foundations of the Prediction Process. Oxford: Clarendon Press.
Langford, John, Ruslan Salakhutdinov and Tong Zhang (2009). “Learning Nonlinear Dynamic Models.” Electronic preprint. URL http://arxiv.org/abs/0905.3369.
Littman, Michael L., Richard S. Sutton and Satinder Singh (2002). “Predictive Representations of State.” In Advances in Neural Information Processing Systems 14 (NIPS 2001) (Thomas G. Dietterich and Suzanna Becker and Zoubin Ghahramani, eds.), pp. 1555–1561. Cambridge, Massachusetts: MIT Press. URL http://www.eecs.umich.edu/~baveja/Papers/psr.pdf.
Onicescu, Octav and Gheorghe Mihoc (1935). “Sur les chaînes de variables statistiques.” Comptes Rendus de l’Académie des Sciences de Paris, 200: 511–512.
Packard, Norman H., James P. Crutchfield, J. Doyne Farmer and Robert S. Shaw (1980). “Geometry from a Time Series.” Physical Review Letters, 45: 712–716.
Padró, Muntsa and Lluís Padró (2005a). “Applying a Finite Automata Acquisition Algorithm to Named Entity Recognition.” In Proceedings of the 5th International Workshop on Finite-State Methods and Natural Language Processing (FSMNLP’05). URL http://www.lsi.upc.edu/~nlp/papers/2005/fsmnlp05-pp.pdf.
— (2005b). “Approaching Sequential NLP Tasks with an Automata Acquisition Algorithm.” In Proceedings of the International Conference on Recent Advances in NLP (RANLP’05). URL http://www.lsi.upc.edu/~nlp/papers/2005/ranlp05-pp.pdf.
— (2005c). “A Named Entity Recognition System Based on a Finite Automata Acquisition Algorithm.” Procesamiento del Lenguaje Natural, 35: 319–326. URL http://www.lsi.upc.edu/~nlp/papers/2005/sepln05-pp.pdf.
— (2007a). “ME-CSSR: an Extension of CSSR using Maximum Entropy Models.” In Proceedings of Finite State Methods for Natural Language Processing (FSMNLP) 2007. URL http://www.lsi.upc.edu/%7Enlp/papers/2007/fsmnlp07-pp.pdf.
— (2007b). “Studying CSSR Algorithm Applicability on NLP Tasks.” Procesamiento del Lenguaje Natural, 39: 89–96. URL http://www.lsi.upc.edu/%7Enlp/papers/2007/sepln07-pp.pdf.
Ray, Asok (2004). “Symbolic dynamic analysis of complex systems for anomaly detection.” Signal Processing, 84: 1115–1130.
Salmon, Wesley C. (1971). Statistical Explanation and Statistical Relevance. Pittsburgh: University of Pittsburgh Press. With contributions by Richard C. Jeffrey and James G. Greeno.
— (1984). Scientific Explanation and the Causal Structure of the World. Princeton: Princeton University Press.
Shalizi, Cosma Rohilla (2001). Causal Architecture, Complexity and Self-Organization in Time Series and Cellular Automata. Ph.D. thesis, University of Wisconsin-Madison. URL http://bactra.org/thesis/.
— (2003). “Optimal Nonlinear Prediction of Random Fields on Networks.” Discrete Mathematics and Theoretical Computer Science, AB(DMCS): 11–30. URL http://arxiv.org/abs/math.PR/0305160.
Shalizi, Cosma Rohilla, Marcelo F. Camperi and Kristina Lisa Klinkner (2007). “Discovering Functional Communities in Dynamical Networks.” In Statistical Network Analysis: Models, Issues, and New Directions (Edo Airoldi and David M. Blei and Stephen E. Fienberg and Anna Goldenberg and Eric P. Xing and Alice X. Zheng, eds.), vol. 4503 of Lecture Notes in Computer Science, pp. 140–157. New York: Springer-Verlag. URL http://arxiv.org/abs/q-bio.NC/0609008.
Shalizi, Cosma Rohilla and James P. Crutchfield (2001). “Computational Mechanics: Pattern and Prediction, Structure and Simplicity.” Journal of Statistical Physics, 104: 817–879. URL http://arxiv.org/abs/cond-mat/9907176.
Shalizi, Cosma Rohilla, Robert Haslinger, Jean-Baptiste Rouquier, Kristina Lisa Klinkner and Cristopher Moore (2006). “Automatic Filters for the Detection of Coherent Structure in Spatiotemporal Systems.” Physical Review E, 73: 036104. URL http://arxiv.org/abs/nlin.CG/0508001.
Shalizi, Cosma Rohilla and Kristina Lisa Klinkner (2004). “Blind Construction of Optimal Nonlinear Recursive Predictors for Discrete Sequences.” In Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference (UAI 2004) (Max Chickering and Joseph Y. Halpern, eds.), pp. 504–511. Arlington, Virginia: AUAI Press. URL http://arxiv.org/abs/cs.LG/0406011.
Shalizi, Cosma Rohilla, Kristina Lisa Klinkner and Robert Haslinger (2004). “Quantifying Self-Organization with Optimal Predictors.” Physical Review Letters, 93: 118701. URL http://arxiv.org/abs/nlin.AO/0409024.
Shalizi, Cosma Rohilla and Cristopher Moore (2003). “What Is a Macrostate? From Subjective Measurements to Objective Dynamics.” Electronic pre-print. URL http://arxiv.org/abs/cond-mat/0303625.
Takens, Floris (1981). “Detecting Strange Attractors in Fluid Turbulence.” In Symposium on Dynamical Systems and Turbulence (D. A. Rand and L. S. Young, eds.), pp. 366–381. Berlin: Springer-Verlag.
Tishby, Naftali, Fernando C. Pereira and William Bialek (1999). “The Information Bottleneck Method.” In Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (B. Hajek and R. S. Sreenivas, eds.), pp. 368–377. Urbana, Illinois: University of Illinois Press. URL http://arxiv.org/abs/physics/0004057.