Alexander Marx, Jilles Vreeken
Telling Cause from Effect using MDL-based Local and Global Regression
November 20, 2017
Question of the Day
How can we infer the causal direction between two univariate numeric random variables 𝑋 and 𝑌, and give a reliable confidence measure?
How this is currently done
[Figure: plot of 𝑌 against 𝑋, both axes from 0 to 1]
Additive Noise Models (ANMs)
If 𝑋 → 𝑌, there exists a function 𝑓 such that
$Y = f(X) + N$, with $N \perp\!\!\!\perp X$,
but not for the inverse direction.
Problems
- need for a powerful independence test
- independence often holds (to some extent) in both directions
- 𝑝-values are sensitive to sample size
How we do it
We build on the Algorithmic Markov Condition
Formally, if 𝑋 → 𝑌, then
$K(P(X)) + K(P(Y \mid X)) \le K(P(Y)) + K(P(X \mid Y))$
where 𝐾(𝑥) denotes the Kolmogorov complexity of 𝑥, which is the length of the shortest program that outputs 𝑥 and halts.
(Janzing and Schölkopf 2010)
Informally: “Simpler to describe cause, and then effect given cause, than if we do so vice versa.”
The Minimum Description Length Principle
Approximate 𝐾 with 𝐿, using two-part MDL
Given a model class ℳ, the best model 𝑀 ∈ ℳ for data 𝐷 is the 𝑀 that minimizes the total encoded cost.
$L(D) = \min_{M \in \mathcal{M}} \big( L(M) + L(D \mid M) \big)$
where $L(D)$ is the cost to encode 𝐷 and 𝑀, $L(M)$ is the complexity of 𝑀, and $L(D \mid M)$ is the complexity of 𝐷 given 𝑀.
MDL avoids overfitting by considering both the fit of the model and the model complexity.
MDL as a Practical Instantiation
If 𝑋 → 𝑌, we have that
$\frac{L(X) + L(Y \mid X)}{L(X) + L(Y)} < \frac{L(Y) + L(X \mid Y)}{L(X) + L(Y)}$
Encode using two-part MDL
- restrict the model class to regression functions
- that is, find the function that minimizes the complexity of the model and of the data given the model
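The comparison of the two normalized costs can be sketched in code. This is a minimal illustration, assuming the four code lengths (in bits) have already been computed by some encoder; the function names and the example numbers are hypothetical and not taken from the slides.

```python
def normalized_scores(l_x, l_y, l_y_given_x, l_x_given_y):
    """Return the normalized costs for X -> Y and Y -> X (inputs in bits)."""
    denom = l_x + l_y
    score_xy = (l_x + l_y_given_x) / denom
    score_yx = (l_y + l_x_given_y) / denom
    return score_xy, score_yx


def infer_direction(l_x, l_y, l_y_given_x, l_x_given_y):
    """Pick the direction with the smaller normalized cost."""
    score_xy, score_yx = normalized_scores(l_x, l_y, l_y_given_x, l_x_given_y)
    if score_xy < score_yx:
        return "X -> Y"
    if score_yx < score_xy:
        return "Y -> X"
    return "undecided"


# Made-up code lengths: Y is much cheaper to describe given X than vice versa.
print(infer_direction(l_x=1000, l_y=1000, l_y_given_x=600, l_x_given_y=900))
```

Both scores share the denominator 𝐿(𝑋) + 𝐿(𝑌), so the decision depends only on which numerator is smaller; the normalization matters for comparing scores across data sets.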
How to encode a function
Encode $L(Y \mid X) = L(f) + L(Y \mid f, X)$
- $Y = f(X) + N$
- 𝑓 is a regression function
- 𝑓 minimizes the two-part cost
Data given model, $L(Y \mid f, X) = L(N)$
- encode the noise assuming a normal distribution
Model, $L(f)$
- encode the type (linear, quadratic, exponential, …)
- encode the parameters 𝛼, 𝛽, 𝛾, …
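A minimal sketch of choosing a function by two-part cost, assuming least-squares fits on simple basis expansions. The fixed 32 bits per parameter, the data resolution of 0.01, and all names below are illustrative assumptions, not the encoding used by the authors.

```python
import math
import numpy as np

# Hypothetical basis expansions, one per function type.
FUNCTION_TYPES = {
    "linear": lambda x: np.column_stack([np.ones_like(x), x]),
    "quadratic": lambda x: np.column_stack([np.ones_like(x), x, x ** 2]),
    "exponential": lambda x: np.column_stack([np.ones_like(x), np.exp(x)]),
}

BITS_PER_PARAMETER = 32.0  # assumption: fixed-precision parameter cost
RESOLUTION = 0.01          # assumption: data recorded to two decimals


def gaussian_cost_bits(residuals, resolution=RESOLUTION):
    """Bits to encode residuals under a zero-mean normal with ML variance."""
    n = len(residuals)
    # Floor the variance at the squared resolution to avoid log(0).
    variance = max(float(np.mean(residuals ** 2)), resolution ** 2)
    density_bits = n / 2 * (1 / math.log(2) + math.log2(2 * math.pi * variance))
    return density_bits - n * math.log2(resolution)


def best_function(x, y):
    """Return (type name, total bits) minimizing L(f) + L(Y | f, X)."""
    best = None
    for name, basis in FUNCTION_TYPES.items():
        design = basis(x)
        params, *_ = np.linalg.lstsq(design, y, rcond=None)
        residuals = y - design @ params
        # Model cost: which type, plus a fixed cost per parameter.
        model_bits = math.log2(len(FUNCTION_TYPES)) + BITS_PER_PARAMETER * len(params)
        total = model_bits + gaussian_cost_bits(residuals)
        if best is None or total < best[1]:
            best = (name, total)
    return best
```

A more complex type pays for its extra parameter in the model cost and is selected only if it saves at least as many bits when encoding the noise.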
[Figure: plot of 𝑌 against 𝑋, both axes from 0 to 1]
Non-determinism
[Figure: left panel, 𝑌 against 𝑋 (both axes from 0 to 1); right panel, the 𝑦ᵢ associated to a single 𝑥, plotted against 𝑥̃ from −1 to 1]
Setup
- global trend function
- local functions restricted to the same type (we assume the noise follows the same distribution)
Computation
- for 𝑥 ∈ 𝑋 with count(𝑥) = 𝑐
- create 𝑥̃, a sequence of 𝑐 equally spaced values on an interval symmetric around 0
- Fit(sort(𝑦ᵢ mapped to 𝑥), 𝑥̃)
Encoding
- encode the type and parameters of the functions
- encode the mapping of local functions to 𝑥 ∈ 𝑋
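The computation step can be sketched as follows. This is an illustration under stated assumptions: the interval bound `half_width` is a hypothetical parameter (the bound on the slide is not recoverable), a polynomial fit stands in for the unspecified function type, and `local_fit` is not a name from the slides.

```python
import numpy as np


def local_fit(x, y, value, half_width=1.0, degree=1):
    """Fit a polynomial to the sorted y values observed at x == value.

    The c sorted y values are paired with c equally spaced points on
    [-half_width, half_width], which play the role of the new axis x-tilde.
    """
    y_local = np.sort(y[x == value])
    c = len(y_local)
    x_tilde = np.linspace(-half_width, half_width, c)
    coeffs = np.polyfit(x_tilde, y_local, degree)  # highest degree first
    residuals = y_local - np.polyval(coeffs, x_tilde)
    return x_tilde, coeffs, residuals


# Five observations share x = 0.5; their y values are fitted locally.
x = np.array([0.5, 0.5, 0.5, 0.5, 0.5, 0.1])
y = np.array([3.0, 1.0, 2.0, 5.0, 4.0, 9.0])
x_tilde, coeffs, residuals = local_fit(x, y, value=0.5)
```

Sorting the 𝑦 values first turns the spread of 𝑌 at a single 𝑥 into a monotone curve over 𝑥̃, which a simple function can then describe.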
Slope: compute $L(Y \mid X)$
F ← ∅
f_g ← fit global function and add f_g to F
for each function type t do
    F_t ← F
    for x ∈ X with count(x) above the frequency threshold do
        f_l ← fit local function on x̃ of x
        if adding f_l to F_t reduces overall costs then
            F_t ← F_t ∪ {f_l}
        end
    end
    F ← min(F, F_t)
end
return costs of Y given F and X
Confidence and Significance
How certain are we?
$\mathbb{C} = \left| L(X \to Y) - L(Y \to X) \right|$, with $L(X \to Y) = \frac{L(X) + L(Y \mid X)}{L(X) + L(Y)}$ and $L(Y \to X) = \frac{L(Y) + L(X \mid Y)}{L(X) + L(Y)}$
- the higher, the more certain
- robust w.r.t. sample size
But is a given inference significant?
- we can use the no-hypercompression inequality to test significance
$P\big(L_0(X) - L(X) \ge k\big) \le 2^{-k}$
- our null hypothesis $L_0$ is that 𝑋 and 𝑌 are only correlated
(for details on the no-hypercompression inequality see Grünwald, 2007)
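A minimal sketch of both quantities, assuming the normalized scores and the code lengths under the null and under the chosen model are already available in bits. How the null code length is computed is not specified here, so it is passed in as an input; the function names are hypothetical, and the confidence is reported as a magnitude.

```python
def confidence(score_xy, score_yx):
    """Magnitude of the difference between the two normalized scores."""
    return abs(score_xy - score_yx)


def p_value_bound(l_null_bits, l_model_bits):
    """No-hypercompression bound: compressing k bits better has P <= 2^-k."""
    k = l_null_bits - l_model_bits
    if k <= 0:
        return 1.0  # no compression gain, nothing to claim
    return 2.0 ** (-k)


def is_significant(l_null_bits, l_model_bits, alpha=0.001):
    """True if the bound on the probability does not exceed alpha."""
    return p_value_bound(l_null_bits, l_model_bits) <= alpha


# A gain of 10 bits gives a bound of 2^-10, just below alpha = 0.001.
print(is_significant(l_null_bits=1010, l_model_bits=1000))
```

Because the bound halves with every bit gained, a threshold of 𝛼 = 0.001 corresponds to a gain of roughly 10 bits over the null model.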
Confidence Robustness
[Figure: confidence against number of data points (100, 250, 500, 1000), one panel each for RESIT (HSIC independence), IGCI (entropy), and SLOPE (compression)]
Synthetic data generated with ANMs
$Y = f(X) + N$
[Figure: accuracy in % of RESIT, IGCI, and SLOPE per generating model (uu, ug, un, gu, gg, gn, bu, bg, bn, pu, pg, pn); model groups: Uniform, Gaussian, Binomial, Poisson]
Performance on Benchmark Data (Tübingen, 97 univariate numeric cause-effect pairs)
Inferences of state-of-the-art algorithms, ordered by confidence values.
SLOPE is 85% accurate with 𝛼 = 0.001
Conclusions
We considered causal inference on univariate numeric data
- we propose an MDL score for local and global regression
- reliable confidence score, independent of sample size
- significance test based on no-hypercompression
- very good performance on synthetic data, outclassing the state of the art on benchmark data
Future: consider multivariate and mixed-type data, and causal graphs
Thank you!
Runtimes
[Figure: runtime in seconds (log scale, 10^1 to 10^6) of IGCI, SLOPE, RESIT, ANM, and CURE]
Slope only Deterministic
[Figure: % of correct decisions against decision rate (both axes from 0 to 1) for SLOPE, SLOPE_D, CURE, RESIT, IGCI, and ANM]
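Formulas
Encoding a function
$L(f) = \sum_{\phi \in f} \left( L_{\mathbb{N}}(s) + L_{\mathbb{N}}(\phi \cdot 10^{s}) \right)$
Encoding the model
$L(F) = L_{\mathbb{N}}(|F|) + \log \binom{|X| - 1}{|F_l| - 1} + 2 \log |\mathcal{F}| + L(f_g) + \sum_{f_l \in F_l} L(f_l)$
Encoding the data given the model
$L(Y \mid F, X) = \sum_{f \in F} \left( \frac{n_f}{2} \left( \frac{1}{\ln 2} + \log 2\pi\sigma^2 \right) - n_f \log \tau \right)$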
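A sketch of the cost of encoding a function's parameters, under several assumptions: 𝐿ℕ is taken to be Rissanen's universal code for integers 𝑛 ≥ 1, the precision 𝑠 is a fixed hypothetical setting rather than chosen per parameter, the shifted value is rounded up and clamped to at least 1, and the sign of a parameter is ignored. None of these choices are confirmed by the slides.

```python
import math

# Normalizing constant of the universal integer code (about 2.865064).
LOG2_C0 = math.log2(2.865064)


def l_nat(n):
    """Universal code length in bits for an integer n >= 1.

    Sums log2(n) + log2(log2(n)) + ... over the positive terms only.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    bits = LOG2_C0
    term = math.log2(n)
    while term > 0:
        bits += term
        term = math.log2(term)
    return bits


def parameter_cost(phi, precision):
    """L_N(s) + L_N(phi * 10^s) for one parameter, with s = precision >= 1."""
    shifted = max(1, math.ceil(abs(phi) * 10 ** precision))
    return l_nat(precision) + l_nat(shifted)


def function_cost(params, precision=3):
    """Sum of the parameter costs of one function."""
    return sum(parameter_cost(phi, precision) for phi in params)


# Cost in bits of a linear function with two parameters, three digits each.
print(function_cost([0.5, 2.0]))
```

Larger or more precise parameters shift to larger integers and therefore cost more bits, which is how the encoding penalizes complex functions.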