Post on 22-Feb-2022
Attribution definition
[1] Sundararajan et al. "Axiomatic attribution for deep networks." ICML, 2017. 2
Attribution explanation
β’ A branch of semantic explanations
β’ Inferring contribution score of each individual feature
Label: cat
Input image
Predict Interpret
Attribution Heatmap
Definition 1. For a pre-trained model π, an attribution of prediction at
input π = [π₯1, . . . , π₯π] is a vector π = [π1, . . . , ππ], where ππ is the
contribution of π₯π to the prediction π(π).
3
Existing attribution methods
Many attribution methods are proposed recently.
Input sample
Gradient
*Input
Occlusion
DeepLIFTπ-LRP
Integrated
Grads
Expected
Gradients......
β’ Sensitivity
β’ Perturbation
β’ Layerwise decomposition
β’ Averaging gradients
Different formulations
Various heuristics
4
Problems of attribution explanation
The attrbution problem is not well-defined
β’ The definition is uninformative for how to assign the contribution
Many attribution methods are based on different heurisitcs
β’ Few theoretical foundations
β’ No mutuality among existing methods
β’ Difficult to compare theoretically
5
Contributions of this paper
β’ We propose a Taylor attribution framework, which offers a theoretical
formulation to the attribution problem.
β’ Fourteen mainstream attribution methods with different formulations are
unified into the proposed framework by theoretical reformulations.
β’ We propose principles for a reasonable attribution, and assess the fairness
of existing attribution methods.
Deng et al. A Unified Taylor Framework for Revisiting Attribution Methods, AAAI, 2021. Deng et al. A General Taylor Framework for Unifying and Revisiting Attribution Methods. in arXiv:2105.13841
6
Contributions of this paper
β’ We propose a Taylor attribution framework, which offers a theoretical
formulation for how to assign contribution.
β’ Fourteen mainstream attribution methods are unified into the proposed
Taylor framework by theoretical reformulations.
β’ We propose principles for a reasonable attribution, and assess the
fairness of existing attribution methods.
Deng et al. A Unified Taylor Framework for Revisiting Attribution Methods, AAAI, 2021. Deng et al. A General Taylor Framework for Unifying and Revisiting Attribution Methods. in arXiv:2105.13841
7
Attribution problem statement
Input: pre-trained model π, input sample π, and baseline π (no signal state)
Output: attribution vector π
Corresponds to π£(π) β π£(β )
However, there are infinite possible cases for such decomposition.
Which decomposition is reasonable?
Deng et al. A Unified Taylor Framework for Revisiting Attribution Methods, AAAI, 2021. Deng et al. A General Taylor Framework for Unifying and Revisiting Attribution Methods. in arXiv:2105.13841
π(π) β π(π) = π1 +. . . +ππ
Many attribution methods aim to distribute the outcome of π (w.r.t the
baseline π) to each feature,
8
Taylor attribution framework
Challengesβ’ DNN Model π is too complex to analyze
Basic ideaβ’ Taylor Theroem: If π(π) is infinitely differentiable, then π(π) β π(π)
can be approximated by a Taylor expansion function
β’ The Taylor expansion function can be explictly divided into independent
and interactive parts
β’ Then the attribution can be expressed as a function of Taylor independent
and interaction terms
Deng et al. A Unified Taylor Framework for Revisiting Attribution Methods, AAAI, 2021. Deng et al. A General Taylor Framework for Unifying and Revisiting Attribution Methods. in arXiv:2105.13841
Second-order Taylor attribution
Second-order Taylor expansion
9
π(π) β π(π) =
π
ππ₯π βπ +1
2
π
π
ππ₯ππ₯π βπβπ + πΊ (π)
π(π) β π(π) =
π
ππ₯π βπ +1
2
π
ππ₯π2 βπ2 +
1
2
πβ π
ππ₯ππ₯π βπβπ + πΊ (π)
All high-order
independent terms, π»ππΈ
All high-order
interaction terms π°(πΊ)
All first-order
terms, π»ππΆ
Divide the expansion into first-order, high-order independent and interaction terms
Attribution vector can be expressed as a function of the three type terms
ππ = π(πππΌ , ππ
πΎ,
πΌ(π))ππ = ππππππππ π( π(π) β π(π))
Connections with related work
in Game theory
10
β’ Connection to Shapley Taylor interaction index [1]
Shapely Taylor interaction index π±π(πΊ) measures Taylor interactions of subsets
with at most π players.
When π = π, i.e., consider interactions of all subsets,
[1 ] Sundararajan, et al. The shapley taylor interaction index. ICML, 2020.
π±π(πΊ) = π°(πΊ), βπΊ
β’ πΌ(π) is a special case of Shapley Taylor interaction index.
11
Contributions of this paper
β’ We propose a Taylor attribution framework, which offers a theoretical
formulation for the attribution problem.
β’ We prove that, Fourteen attribution methods with different formulas can
be unified into the proposed Taylor attribution framework.
β’ We propose principles for a reasonable attribution, and assess the
fairness of existing attribution methods.
Deng et al. A Unified Taylor Framework for Revisiting Attribution Methods, AAAI, 2021. Deng et al. A General Taylor Framework for Unifying and Revisiting Attribution Methods. in arXiv:2105.13841
Unifying attribution maps of fourteen
methods by interactions
12
Attribution maps of Fourteen methods are unified into the Taylor attribution framework.
Specifically, they can expressed as a weighted sum of the three type terms.
ππ = πΆπ π»ππΆ + πΈπ π»π
πΈ+
π
πππΊπ°(πΊ) πΆπ, πΈπ, ππ
πΊ are the coefficients
Unifying attribution maps of fourteen
methods by interactions: GradientΓInput
13Deng et al. A Unified Taylor Framework for Revisiting Attribution Methods, AAAI, 2021. Deng et al. A General Taylor Framework for Unifying and Revisiting Attribution Methods. in arXiv:2105.13841
Intuition. Gradient*Input produces attribution maps with improved sharpnessοΌby multipling the gradients with the input.
Unification. GradientΓInput can be unified into
Taylor attribution framework.
Reformulation. In GradientΓInput, the corresponding coefficients are,
πΌπ = 1,
πΎπ = 0,
πππ = 0, β π
ππ = πΆπ π»ππΆ + πΈπ π»π
πΈ+
π
πππΊπ°(πΊ)
First-order terms, π»ππΆ
high-order independent terms, π»ππΈ
high-order interaction terms, π°(πΊ)
β’ Only assigns the first-order terms
Unifying attribution maps of fourteen
methods by interactions: Ξ΅-LRP
14
Intuition. It produces attribution maps by distributing the output in proportion
according to the input. It conducts in a layer-wise manner.
Unification. Ξ΅-LRP can be unified into the
Taylor attribution framework.
Reformulation. In Ξ΅-LRP, if relu is used as activation function, the corresponding
coefficients are [1],
πΌπ = 1,
πΎπ = 0,
πππ = 0, β π
[1] Ancona, Marco, et al. Towards better understanding of gradient-based attribution methods for deep neural networks. ICLR, 2018.
First-order terms, π»ππΆ
high-order independent terms, π»ππΈ
high-order interaction terms, π°(πΊ)
ππ = πΆπ π»ππΆ + πΈπ π»π
πΈ+
π
πππΊπ°(πΊ)
β’ Only assigns the first-order terms when relu is applied.
Unifying attribution maps of fourteen
methods by interactions: GradCAM
15
Intuition. GradCAM conducts global average pooling to the gradients, then
perform a linear combination
Unification. GradCAM can be unified into
Taylor attribution framework.
Reformulation. Define the global average pooled features as πΉ. Consider π(π) =β(πΉ). Then in GradCAM, the corresponding coefficients of function π are,
πΌπ = 1,
πΎπ = 0,
πππ = 0, β π
First-order terms, π»ππΆ
high-order independent terms, π»ππΈ
high-order interaction terms, π°(πΊ)
ππ = πΆπ π»ππΆ + πΈπ π»π
πΈ+
π
πππΊπ°(πΊ)
β’ Assigns the first-order terms of function β.
Unifying attribution maps of fourteen
methods by interactions: Occlusion-1&patch
16
Intuition. Occlude one pixel/patch, and observe how the prediction changes.
Unification. Occlusion-1 & Occlusion-patch can
be unified into Taylor framework.
Reformulation. In Occlusion-1, the corresponding coefficients are,
πΌπ = 1,
πΎπ = 1,
πππ = 1, ππ π β π
πππ = 0, ππ π β π
First-order terms, π»ππΆ
high-order independent terms, π»ππΈ
high-order interaction terms, π°(πΊ)
ππ = πΆπ π»ππΆ + πΈπ π»π
πΈ+
π
πππΊπ°(πΊ)
β’ assigns first-order, high-order independent terms of π₯π, and all interactions involving π₯π.
Unifying attribution maps of fourteen
methods by interactions: Shapley value
17
Intuition. Shapley value obtains the attribution map by averaging the marginal
contribution of π₯π to coalition π over all possible coalitions involving π₯π.
Unification. Shapley value can be unified into
Taylor attribution framework.
Reformulation. In Shapley value, the corresponding coefficients are,
πΌπ = 1,
πΎπ = 1,
πππ = 1/|π|, ππ π β π
πππ = 0, ππ π β π
First-order terms, π»ππΆ
high-order independent terms, π»ππΈ
high-order interaction terms, π°(πΊ)
ππ = πΆπ π»ππΆ + πΈπ π»π
πΈ+
π
πππΊπ°(πΊ)
β’ assigns first-order, independent terms of π₯π, and 1/|S| proportion of interactions involving π₯π.
Unifying attribution maps of fourteen
methods by interactions: Integrated Grads
18
Intuition. It produces the attribution map by integrating the gradients along a
straight line from baseline π to input π.
Unification. Integrated Gradients can be unified
into Taylor attribution framework.
Reformulation. In Integrated Gradients, the corresponding coefficients are,
πΌπ = 1,
πΎπ = 1,
πππ(π) = ππ/πΎ, ππ π β π, π = [π1, . . . , ππ],
πππ = 0, ππ π β π
β’ assigns first-order,independent terms of π₯π, and ππ/π² proportion of interaction terms ππππππ
ππ . . . ππππ to π₯π.
β’ For example, π(π) = π₯1π₯23 + π₯1
2π₯2π₯32 , then π2 =
3
4π₯1π₯2
3 +1
5π₯12π₯2π₯3
2 .
First-order terms, π»ππΆ
high-order independent terms, π»ππΈ
high-order interaction terms, π°(πΊ)
ππ = πΆπ π»ππΆ + πΈπ π»π
πΈ+
π
πππΊπ°(πΊ)
πΎ = π1+. . . +ππ
Unifying attribution maps of fourteen
methods by interactions: DeepLIFT Rescale
19
Intuition. DeepLIFT propogates the output difference in proportion according to
the input difference. Such propogation proceeds in a layer-wise manner.
Unification. DeepLIFT Rescale can be unified
into Taylor attribution framework.
Reformulation. Consider the attribution at π layer. If ππ(π) = π(ππ»π + π), then
in DeepLIFT Rescale, the corresponding coefficients are,
πΌπ = 1,
πΎπ = 1,
πππ(π) = ππ/πΎ, ππ π β π, π = [π1, . . . , ππ]
πππ = 0, ππ π β π
First-order terms, π»ππΆ
high-order independent terms, π»ππΈ
high-order interaction terms, π°(πΊ)
ππ = πΆπ π»ππΆ + πΈπ π»π
πΈ+
π
πππΊπ°(πΊ)
πΎ = π1+. . . +ππ
β’ Shares the same coefficients as Integrated gradients at each layer.
Unifying attribution maps of fourteen
methods by interactions: Deep Taylor
20
Intuition. proceeds in a layer-wise manner. It propgates all relevances to the
features with positive weight.
Unification. Deep Taylor can be unified into the framework.
Reformulation. Define π+ = {π|π€ππ β₯ 0} and πβ = {π|π€ππ < 0}, where π€ππ is the
parameters at π layer. In Deep Taylor, for features in π+, the coefs are,
First-order terms, π»ππΆ
high-order independent terms, π»ππΈ
high-order interaction terms, π°(πΊ)
ππ = πΆπ π»ππΆ + πΈπ π»π
πΈ+ππ
πΊπ°(πΊ)
πΎ = π1+. . . +ππ
β’ Noted that the interactions among features in πβ are assigned to features in π+.
πΌπ = 1,
πΎπ = 1,
πππ(π) = ππ/πΎ, ππ π β π, π = [π1, . . . , ππ]
πππ = 0, ππ π β π, π β πβ
πππ = π§ππ
+/π§π , ππ π β π, π β πβ
Unifying attribution maps of fourteen
methods by interactions: LRP-πΆπ·
21
Intuition. propgates πΆ times relevances to the features with positive weight, and
π· times to the features with negative weight.
Unification. LRP-πΌπ½ can be unified into the
Taylor attribution framework.
Reformulation. Define π+ = {π|π€ππ β₯ 0} and πβ = {π|π€ππ < 0}, where π€ππ is the
parameters at π layer. In LRP-πΌπ½, for features in π+, the coefficients are,
First-order terms, π»ππΆ
high-order independent terms, π»ππΈ
high-order interaction terms, π°(πΊ)
ππ = πΆπ π»ππΆ + πΈπ π»π
πΈ+
π
πππΊπ°(πΊ)
β’ The coefficients are Ξ± times the coefficients in Deep Taylor attribution.
πΌπ = πΌ,
πΎπ = πΌ,
πππ(π) = πΌ ππ/πΎ ,
ππ π β π, π = [π1, . . . , ππ]
πππ = 0, ππ π β π, π β πβ
πππ = πΌ π§ππ
+/π§π , ππ π β π, π β πβ
πΎ = π1+. . . +ππ
Unifying attribution maps of fourteen
methods by interactions: Expected Attribution
22
Intuition. proposed to reduce the probability that attribution is dominated by a
specific baseline, which averages the attributions over multiple baselines.
Unification. Combining Eq.(1) with the previous reformulations, Expected
Attributions can be unified into the Taylor attribution framework.
β’ For example, Expected Gradients, Expected DeepLIFT, and Deep Shapley.
ππππ₯π
= ΰΆ±π(π)πππππ ππππ (1)
where πππππ ππ is the attribution obtained by basic methods.
ππ = πΆπ π»ππΆ + πΈπ π»π
πΈ+
π
πππΊπ°(πΊ)
23
Contributions of this paper
β’ We propose a Taylor attribution framework, which offers a theoretical
formulation for the attribution problem.
β’ We prove that, Fourteen attribution methods with different formula can be
unified into the proposed Taylor attribution framework.
β’ We propose principles for a reasonable attribution, and assess the
fairness of existing attribution methods.
Deng et al. A Unified Taylor Framework for Revisiting Attribution Methods, AAAI, 2021. Deng et al. A General Taylor Framework for Unifying and Revisiting Attribution Methods. in arXiv:2105.13841
Principles for a reasonable attribution
24Deng et al. A Unified Taylor Framework for Revisiting Attribution Methods, AAAI, 2021. Deng et al. A General Taylor Framework for Unifying and Revisiting Attribution Methods. in arXiv:2105.13841
β’ We proved that, attribution maps of fourteen methods can be unified as the
following form:
ππ = πΆπ π»ππΆ + πΈπ π»π
πΈ+
π
πππΊ π°(πΊ)
How to define a reasonable attribution map?
Principles for a reasonable attribution
25
β’ The first-order terms of π₯π should be all assigned to π₯π. β’ The high-order independent terms of π₯π should be all assigned to π₯π. β’ Only Interactions of π involving π₯π , should be assigned to π₯π.
πΆπ = π, πΈπ = π,
πππΊ > π, ππ π β πΊ
πππΊ = π, ππ π β πΊ
Principle 1:
ππ = πΆπ π»ππΆ + πΈπ π»π
πΈ+
π
πππΊ π°(πΊ)
ππ
ππππ +ππ = π
Principles for a reasonable attribution
26
β’ Interactions of any coalition π should be all distributed to the players in π.
π βπΊ
πππΊ = π, βπΊ
Principle 2:
ππ = πΆπ π»ππΆ + πΈπ π»π
πΈ+
π
πππΊ π°(πΊ)
ππ
ππππ +ππ = π
Assessing the fairness of existing
attribution methods
27
For example, Shapley value well satisfies the two principles.
πΌπ = 1, πΎπ = 1,
πππ = 1/|π|, ππ π β π
πππ = 0, ππ π β π
β’ Interactions of π are evenly assigned to the players in π.
β’ In this sense, Shapley value is a fair attribution.
For example, Occlusion-1 satisfies principle 1, doesnβt satisfy principle 2.
πΌπ = 1, πΎπ = 1,
πππ = 1, ππ π β π
πππ = 0, ππ π β π
π βπΊ
πππΊ = |πΊ|, βπΊ
β’ Interactions of π are repeatedly assigned to each player.
These principles can be applied to assess the fairness of exsiting methods.
+1
2 +1
3=