Information Theory in Medical Imaging
Norbert Schuff, Ph.D. Center for Imaging of Neurodegenerative Diseases
Objectives
• Learn how to quantify information
• Understand the use of information theory
in imaging
Topics
I. Measures of information
II. Applications to Imaging
I. Co-registration
II. Feature selection
III. Model-free analysis
III. Time Series Analysis (e.g. BOLD, EEG)
I. Functional connectivity
II. Signal complexity and unpredictability
Information and Uncertainty
Calculations are based on the assumption that all symbols have an equal chance of appearing. The uncertainty about which character a random generator emits next is log2(M) for an alphabet of M symbols:
• 3 symbols: log2(3) bits
• 2 symbols: log2(2) = 1 bit
• Both combined (6 symbols): log2(6) = log2(3) + log2(2)
• 1 symbol: log2(1) = 0, i.e. no uncertainty
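The additivity of uncertainty for independent sources can be checked numerically; a minimal sketch (not from the slides) using Python's math module:

```python
import math

def uncertainty(m: int) -> float:
    """Uncertainty, in bits, of a source with m equally likely symbols."""
    return math.log2(m)

# Combining two independent sources multiplies the alphabet sizes,
# so the uncertainties add: log2(3 * 2) = log2(3) + log2(2).
combined = uncertainty(3 * 2)
```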
More On Uncertainty
Three symbols, but one appears infrequently. Which char comes next?
M symbols = 3
Code length = 9
Odds:
p(A) = 3/9
p(B) = 4/9
p(@) = 2/9
Weighted log-odds, e.g. of A: (3/9)·log2(3/9)
Shannon Entropy
Summing the weighted log-odds over all M symbols gives the entropy:
H = − Σ_{i=1}^{M} p_i log2(p_i)
Example: Entropy Of mRNA
…ACGTAACACCGCACCTG
Assume “A” occurs in 50% of all instances, “C” in 25%, and “G” and “T” each in 12.5%:
p_A, p_C, p_G, p_T = 1/2, 1/4, 1/8, 1/8
−log2(p): 1, 2, 3, 3
Entropy: H = (1/2)·1 + (1/4)·2 + (1/8)·3 + (1/8)·3 = 1.75 bits
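The slide's arithmetic can be reproduced with a short plug-in estimator; a sketch in Python (the helper name `shannon_entropy` is ours, not from the slides):

```python
import math
from collections import Counter

def shannon_entropy(seq):
    """Plug-in estimate of H = -sum p_i * log2(p_i) from symbol counts."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())

# A sequence matching the slide's probabilities:
# p(A) = 1/2, p(C) = 1/4, p(G) = p(T) = 1/8
H = shannon_entropy("AAAACCGT")  # 1.75 bits
```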
Concept Of Entropy
•Shannon entropy was formulated by Claude Shannon
•American mathematician
•Founded information theory with one landmark paper in 1948
•Worked with Alan Turing on cryptography during WWII
•History of Entropy
•1854 – Rudolf Clausius, German physicist, developed the theory of heat
•1876 – Willard Gibbs, American physicist, used it for the theory of energy
•1877 – Ludwig Boltzmann, Austrian physicist, formulated the statistical theory of thermodynamics
•1879 – Gibbs re-formulated entropy in terms of statistical mechanics
•1948 – Shannon
Claude Shannon
1916-2001
Three Interpretations of Entropy
• The uncertainty in the outcome of an event
– Systems with frequent events have lower entropy than systems with infrequent events.
• The amount of surprise (information)
– Observing an infrequent event is a greater surprise, i.e. carries more information and thus contributes more entropy, than a frequent event.
• The dispersion in the probability distribution
– A uniform image has a less dispersed histogram and thus lower entropy than a heterogeneous image.
Various Formulations Of Entropy
Names
Shannon
Renyi
Aczel
Aczel
Aczel
Varma
Varma
Kapur
Havrda
Arimoto
Sharma
Sharma
Various Formulations Of Entropy II
Names
Taneja
Sharma
Sharma
Ferreri
Sant'Anna
Sant'Anna
Picard
Picard
Picard
Picard
Picard
Entropy In Medical Imaging
[Figure: entropy maps and histograms of a brain image; H = 6.0 bits for a single acquisition (N = 1), H = 4.0 after averaging N = 40 acquisitions; brain-only histogram and co-variance map with H = 3.9 and H = 10.1]
Effect Of Noise On Entropy
[Figure: entropy maps at decreasing signal-to-noise levels, SNR = 20, 2, and 0.2; entropy increases with increasing noise]
Uses Of Information Theory In Imaging
• Entropy measures of mixed distributions
• Applications:
– Image registration
– Quantification of image quality
– Model-free analysis
Image Registration
• Define a transform T that maps one image onto another image such that some measure of overlap is maximized.
– Use entropy as a means for determining maximal overlap
[Figure: registration of CT and MRI images]
Entropy In Image Registration
• Define the joint probability distribution of the images as a metric for the degree of image registration.
• Computation:
– Use a 2-D histogram, with each axis denoting the frequency of intensity values in one image
– Build the 2-D histogram by incrementing each cell each time there is an intensity “co-occurrence” between the two images
– For registered images, we expect the 2-D histogram to be highly focused; mis-registration leads to greater dispersion of the 2-D histogram
– Since entropy measures the degree of histogram dispersion, we can use entropy to quantify image registration
– Formally, we minimize the joint entropy between images A and B
[Figure: joint 2-D histogram of images A and B]
Joint Entropy
Joint entropy (entropy of mixed probability distributions):
Definition:
• H(X,Y) = H(X) + H(Y) – MI(X;Y)
Properties:
• H(X,Y) = H(X) + H(Y), i.e. MI(X;Y) = 0, only if X and Y are completely independent.
• Image registration can be guided by minimizing H(X,Y)
H(X,Y) = − Σ_{x_i∈X} Σ_{y_i∈Y} p(x_i, y_i) log p(x_i, y_i)
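A sketch of this computation in Python (assuming NumPy is available; the image data below is synthetic, not from the slides):

```python
import numpy as np

def joint_entropy(img_a, img_b, bins=32):
    """H(A,B) from the 2-D intensity co-occurrence histogram."""
    hist, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                     # convention: 0 * log 0 = 0
    return -np.sum(p * np.log2(p))

# A registered pair (an image with itself) gives a focused 2-D histogram;
# shifting one copy disperses the histogram and raises H(A,B).
rng = np.random.default_rng(0)
img = rng.random((64, 64))
h_aligned = joint_entropy(img, img)
h_shifted = joint_entropy(img, np.roll(img, 5, axis=0))
```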
[Venn diagram: H(X), H(Y), and their overlap MI(X;Y)]
Example: Joint entropy of the 2-D histogram for rotation of an image with respect to itself by 0, 2, 5, and 10 degrees (entire rotation series):
Rotation (degrees):  0    2    5    10
H(X,Y):              5.5  7.1  9.6  14.1
Mutual Information
Definition:
MI(X,Y) = H(Y) - H(Y|X) = H(X) - H(X|Y)
This is the amount by which the uncertainty of X is reduced if Y is known or vice versa.
MI(X,Y) = H(X) + H(Y) - H(X,Y)
For fixed marginal entropies, maximizing MI(X,Y) is equivalent to minimizing the joint entropy H(X,Y)
Properties:
• MI = 0 indicates independence
• Image registration can be guided by maximizing MI
[Venn diagram: MI(X;Y) as the overlap of H(X) and H(Y)]
MI(X,Y) = Σ_{x_i∈X} Σ_{y_i∈Y} p(x_i, y_i) log [ p(x_i, y_i) / (p(x_i) p(y_i)) ]
Other Properties of Mutual Information
• MI is symmetric: MI(X,Y) = MI(Y,X)
• MI(X,X) = H(X)
• MI(X,Y) <= H(X) or H(Y)
– Information that each image contains about the other image
cannot be greater than the total information in each image.
• MI(X,Y) >= 0
– Cannot increase uncertainty in X by knowing Y
• MI(X,Y) = 0 only if X and Y are independent
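These identities can be verified on a small joint probability table; a sketch assuming NumPy (the distributions below are illustrative):

```python
import numpy as np

def entropy(p):
    """H(p) in bits; zero-probability entries are ignored."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(joint):
    """MI(X,Y) = H(X) + H(Y) - H(X,Y) for a joint probability table."""
    h_x = entropy(joint.sum(axis=1))
    h_y = entropy(joint.sum(axis=0))
    return h_x + h_y - entropy(joint.ravel())

p_x = np.array([0.5, 0.25, 0.125, 0.125])          # H(X) = 1.75 bits
mi_self = mutual_information(np.diag(p_x))         # MI(X,X) = H(X)
p_y = np.array([0.5, 0.5])
mi_indep = mutual_information(np.outer(p_x, p_y))  # independent -> MI = 0
```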
Mutual Information vs. Joint Entropy
• Mutual information has the advantage over joint entropy that the entropy of each distribution is included separately.
Conditional Entropy
Let X be given. What can be gained from measuring Y?
H(Y|X) = − Σ_{x_i∈X} p(x_i) Σ_{y_i∈Y} p(y_i|x_i) log p(y_i|x_i)
H(Y|X) = − Σ_{x_i∈X, y_i∈Y} p(x_i, y_i) log [ p(x_i, y_i) / p(x_i) ]
Definition:
• H(Y|X) = H(X,Y) – H(X)
• H(X|Y) = H(X,Y) – H(Y)
Properties Of Conditional Entropy
• Quantifies the information gained from Y when X is given
• Note: H(X|Y) is not equal to H(Y|X)
• H(X|Y) = 0 only if X is completely determined by Y; otherwise H(X|Y) and H(Y|X) > 0
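The chain-rule definitions and the asymmetry can be checked on a small joint table; a sketch assuming NumPy (the numbers are illustrative):

```python
import numpy as np

def entropy(p):
    """H(p) in bits; zero-probability entries are ignored."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Joint table p(x, y); rows index x, columns index y.
joint = np.array([[0.50, 0.25],
                  [0.00, 0.25]])
h_xy = entropy(joint.ravel())
h_x = entropy(joint.sum(axis=1))   # marginal of x
h_y = entropy(joint.sum(axis=0))   # marginal of y
h_y_given_x = h_xy - h_x           # H(Y|X) = H(X,Y) - H(X)
h_x_given_y = h_xy - h_y           # H(X|Y) = H(X,Y) - H(Y)
```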
Relative Entropy
A non-symmetric measure of the difference between distributions p(X) and q(X):
D(P||Q) = Σ_i p(x_i) log [ p(x_i) / q(x_i) ]
• D(P||Q) is also known as the Kullback-Leibler divergence DKL
• Since D(P||Q) is non-symmetric, it is not a true metric
• But a symmetric definition exists:
• sKL = ½ [ D(P||Q) + D(Q||P) ]
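A minimal sketch of D(P||Q) and its symmetrized variant in Python (the distributions are illustrative):

```python
import math

def kl_divergence(p, q):
    """D(P||Q) = sum p_i * log2(p_i / q_i); requires q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def symmetric_kl(p, q):
    """sKL = 0.5 * [D(P||Q) + D(Q||P)]"""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

p = [0.50, 0.25, 0.25]
q = [0.25, 0.50, 0.25]
```

Note that sKL restores symmetry but still violates the triangle inequality, so it is not a true metric either.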
Uses Of Relative Entropy
How much noisier is image A compared to image B?
[Figure: entropy maps and histograms of images A and B at increasing noise levels, with the divergence sKL* of B from A]
*sKL stands for the symmetrized relative entropy, based on the Kullback-Leibler divergence
Uses Of Relative Entropy
Detect movements & image corruptions
[Plot: sKL vs. series number, #1, #2, … #13]
Model-Free Analysis With Relative Entropy
[Figure: three simulated time series a(t), b(t), c(t) from regions A, B, and C]
Example: connectivity patterns in resting-state fMRI
fMRI Analysis Of Functional Connectivity
corr(A,B) = [1/(M−1)] Σ_{i=1}^{M} (a_i − ā)(b_i − b̄) / [ std(a) · std(b) ]
[Figure: three simulated time series a(t), b(t), c(t) from regions A, B, and C]
Individual correlation matrices
[Figure: brain-region × brain-region correlation matrices for a PD patient and a control]
A Model-Free Analysis
Conventional (model-based): compare correlation matrices between patients and controls region by region, e.g. using t-tests.
Alternative (model-free): compare the overall structure of the correlation matrices between patients and controls using relative entropy.
[Figure: brain-region × brain-region correlation matrices for a PD patient and a control]
Transformation Into Eigen-Space
[Figure: region × region correlation matrices for a PD patient and a control, transformed into eigenvector × eigenvector representations]
Eigenspace representation, e.g. principal components analysis
Eigenvalue Distribution
Each trajectory represents a subject. Red = Parkinson’s; blue = control.
[Plot: magnitude of eigenvalue vs. ranking]
Relative Entropy Of Eigenvalue Distributions
Each point represents the relative entropy of a subject’s eigenvalues relative to the eigenvalues of uniform connectivity. Red = Parkinson’s; blue = control.
Image Feature Selection Using Information Theory
Objective
Learn how to use information theory for
determining imaging features
Mutual Information based Feature Selection Method
• We test a feature’s ability to separate two classes.
• Let X be the feature vector, e.g. co-occurrences of intensities
• Y is the classification, e.g. particular intensities
• How do we maximize the separability between the classes?
Mutual Information based Feature Selection
• MI tests a feature’s ability to separate two classes.
– Based on definition 3) for mutual information
– Here X is the feature vector and Y is the classification
• Note that X is continuous while Y is discrete
– By maximizing MI(X,Y), we maximize the separability of the feature
• Note this method only tests each feature individually
MI(X,Y) = Σ_{x_i∈X} Σ_{y_i∈Y} p(x_i, y_i) log [ p(x_i, y_i) / (p(x_i) p(y_i)) ]
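A sketch of the per-feature test for discrete data (the toy features and labels are hypothetical; a plug-in MI estimator in Python):

```python
import math
from collections import Counter

def mi_discrete(xs, ys):
    """Plug-in MI(X;Y) in bits for paired discrete observations."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in Counter(zip(xs, ys)).items())

labels    = [0, 0, 0, 0, 1, 1, 1, 1]   # class assignment
feature_a = [0, 0, 0, 1, 1, 1, 1, 0]   # mostly tracks the labels
feature_b = [0, 1, 0, 1, 0, 1, 0, 1]   # independent of the labels
mi_a = mi_discrete(feature_a, labels)  # high: feature separates the classes
mi_b = mi_discrete(feature_b, labels)  # zero: feature is uninformative
```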
Joint Mutual Information based Feature Selection Method
• Joint MI tests a feature’s independence from all other features
• Two implementations:
– 1) Compute all individual MIs and sort from high to low
– 2) Test the joint MI of the current feature while keeping all others
• Keep the features with the lowest JMI (implies independence)
• Implement by selecting features that maximize the joint MI criterion
Information Theory Of Time Series Signals
Objective
Learn how to use information theory for
quantifying functional brain signals
How much information do fMRI fluctuations carry?
[Figure: three simulated time series a(t), b(t), c(t) from regions A, B, and C]
Time-series = a sequence of time ordered observations of some system
Limits of conventional correlation analysis:
1. assumes a deterministic system
2. does not quantify information, e.g. surprise
Mutual Information
• Suppose one system stochastically generates events:
• Probability of events:
– Observed p(xn) , e.g. 01101111
– New p(xn+1) , e.g. 011011110
– MI is not effective at predicting future events from current data,
because it is symmetric, M(X,Y)=M(Y,X)
– MI does not indicate in which direction information flows
MI = Σ_n p(x_{n+1}, x_n) log [ p(x_{n+1}, x_n) / (p(x_{n+1}) p(x_n)) ]
Conditional Entropy • Suppose one system stochastically generates events:
• Probability of events:
– Observed p(xn) , e.g. 01101111
– New p(xn+1) , e.g. 011011110
Define the entropy rate for one of the systems:
H(x_{n+1} | x_n) = − Σ_n p(x_{n+1}, x_n) log p(x_{n+1} | x_n)
– This provides an information rate
– However, this works only for one system
– Relative entropy cannot be used to measure information flow between two systems, e.g. two brain regions
Transfer Entropy • Suppose two connected systems stochastically generate events:
• Probability of events:
– Observed p(xn), e.g. 01101111
– Observed p(yn), e.g. 10010000
– New p(xn+1), e.g. 011011110
– New p(yn+1), e.g. 100100000
Entropy rate for one of the systems, now conditioned on both:
H(x_{n+1} | x_n, y_n) = − Σ_n p(x_{n+1}, x_n, y_n) log p(x_{n+1} | x_n, y_n)
Transfer entropy between the systems, e.g. y→x:
T(y→x) = Σ_n p(x_{n+1}, x_n, y_n) log [ p(x_{n+1} | x_n, y_n) / p(x_{n+1} | x_n) ]
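A plug-in sketch for two binary sequences in Python (the coupled pair below is synthetic: x copies y at a one-step lag, so information flows y→x):

```python
import math
from collections import Counter

def transfer_entropy(x, y):
    """Plug-in T(y->x) in bits: sum over (x_{n+1}, x_n, y_n) of
    p * log2[ p(x_{n+1} | x_n, y_n) / p(x_{n+1} | x_n) ]."""
    triples = list(zip(x[1:], x[:-1], y[:-1]))         # (x1, x0, y0)
    n = len(triples)
    c_full = Counter(triples)
    c_xy = Counter((x0, y0) for _, x0, y0 in triples)  # counts of (x0, y0)
    c_xx = Counter((x1, x0) for x1, x0, _ in triples)  # counts of (x1, x0)
    c_x = Counter(x0 for _, x0, _ in triples)          # counts of x0
    return sum((c / n) * math.log2((c / c_xy[(x0, y0)]) /
                                   (c_xx[(x1, x0)] / c_x[x0]))
               for (x1, x0, y0), c in c_full.items())

y = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0]
x = [0] + y[:-1]                  # x copies y with a one-step delay
te_yx = transfer_entropy(x, y)    # large: y drives x
te_xy = transfer_entropy(y, x)    # smaller: x adds little about y's future
```

Unlike MI, the estimate is directional: swapping the arguments gives a different answer, which is exactly what is needed to quantify information flow.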
Application EEG
J Comput Neurosci. 2011 February; 30(1): 45–67.
Published online 2010 August 13. doi: 10.1007/s10827-010-0262-3
Finger lifting
[Plot: block entropy H(L) vs. block length L]
Block Entropy H(L): uncertainty in identifying fluctuation patterns.
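A sketch of a block-entropy estimator over sliding windows in Python (the periodic test signal is illustrative):

```python
import math
from collections import Counter

def block_entropy(seq, L):
    """H(L): entropy of the distribution of length-L sliding blocks."""
    blocks = [tuple(seq[i:i + L]) for i in range(len(seq) - L + 1)]
    n = len(blocks)
    return -sum((c / n) * math.log2(c / n) for c in Counter(blocks).values())

# A strictly periodic signal generates only two distinct blocks at any L,
# so H(L) stays near 1 bit instead of growing with the block length.
periodic = [0, 1] * 32
h1 = block_entropy(periodic, 1)   # exactly 1 bit
h4 = block_entropy(periodic, 4)   # still about 1 bit
```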
Quantification of Uncertainty in fMRI using Entropy
[Figure: three simulated time series a(t), b(t), c(t) from regions A, B, and C]
Pr(s_L): probability distribution over sliding blocks of length L
H(L): bits of information
Summary
Metric | Description
Entropy | Uncertainty associated with observations
Mutual Information | Mutual dependence of random observations, i.e. their shared information
Conditional entropy | Uncertainty associated with multiple sets of observations, when one of them is known
Relative entropy (aka Kullback-Leibler divergence) | Difference between two distributions
Transfer Entropy | Rate of information flow from one time series to another
Block Entropy | Uncertainty across sets (blocks) of symbols