
Information Theory in Medical Imaging

Norbert Schuff, Ph.D. Center for Imaging of Neurodegenerative Diseases

Norbert.schuff@ucsf.edu

Objectives

• Learn how to quantify information

• Understand the use of information theory in imaging

Topics

I. Measures of information
II. Applications to imaging
   i. Co-registration
   ii. Feature selection
   iii. Model-free analysis
III. Time series analysis (e.g. BOLD, EEG)
   i. Functional connectivity
   ii. Signal complexity and unpredictability

Information and Uncertainty

Calculations are based on the assumption that all symbols have an equal chance of appearing. A random generator emits symbols; which character comes next? The change in uncertainty is the log of the number of equally likely possibilities:

• Three symbols: $-\log(1/3) = \log(3)$
• Two symbols: $-\log(1/2) = \log(2)$
• Both combined: $\log(3) + \log(2) = \log(6)$
• One symbol: $\log(1) = 0$ (no uncertainty)

More On Uncertainty

Three symbols, but one appears infrequently. Which character comes next?

• Number of symbols: M = 3
• Code length = 9
• Odds: p(A) = 3/9, p(B) = 4/9, p(@) = 2/9
• Weighted log-odds, e.g. of A: $(3/9)\log_2(3/9)$

Shannon entropy:

$$H = -\sum_{i=1}^{M} p_i \log_2 p_i = \sum_{i=1}^{M} p_i \log_2 \frac{1}{p_i}$$

Example: Entropy Of mRNA

…ACGTAACACCGCACCTG

Assume “A” occurs in 50% of all instances, “C” = 25%, “G” and “T” each = 12.5%:

$$p_A = \frac{1}{2};\; p_C = \frac{1}{4};\; p_G = \frac{1}{8};\; p_T = \frac{1}{8}$$

Log-odds: $-\log_2 p_i = 1;\, 2;\, 3;\, 3$

$$H = \frac{1}{2}\cdot 1 + \frac{1}{4}\cdot 2 + \frac{1}{8}\cdot 3 + \frac{1}{8}\cdot 3 = 1.75 \text{ bits}$$
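As a quick check, here is a minimal Python sketch of the entropy formula; it reproduces the 1.75 bits of the example above:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Probabilities from the mRNA example: A = 1/2, C = 1/4, G = T = 1/8
print(shannon_entropy([0.5, 0.25, 0.125, 0.125]))  # -> 1.75
```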

Concept Of Entropy

•Shannon Entropy formulated by Claude Shannon

•American mathematician

•Founded information theory with one landmark paper in 1948

•Worked with Alan Turing on cryptography during WWII

•History of Entropy

•1854 – Rudolf Clausius, German physicist, developed theory of heat

•1876 – Willard Gibbs, American physicist, used it for the theory of energy

•1877 – Ludwig Boltzmann, Austrian physicist, gave entropy a statistical interpretation in thermodynamics

•1879 – Gibbs re-formulated entropy in terms of statistical mechanics

•1948 - Shannon

Claude Shannon (1916-2001)

Three Interpretations of Entropy

• The uncertainty in the outcome of an event
  – Systems dominated by frequent events have lower entropy than systems with many infrequent events.
• The amount of surprise (information)
  – Observing an infrequent event is a greater surprise, i.e. carries more information and thus contributes more entropy, than observing a frequent event.
• The dispersion in the probability distribution
  – A uniform image has a less dispersed histogram and thus lower entropy than a heterogeneous image.

Various Formulations Of Entropy

Names: Shannon, Rényi, Aczél (three variants), Varma (two variants), Kapur, Havrda, Arimoto, Sharma (two variants)

Various Formulations Of Entropy II

Names: Taneja, Sharma (two variants), Ferreri, Sant'Anna (two variants), Picard (five variants)

[The defining formulas appeared as tables in the original slides and are not preserved in this transcript.]

Entropy In Medical Imaging

[Figure: entropy maps and histograms. Single acquisition (N = 1): H = 6.0 bits; after averaging N = 40 acquisitions: H = 4.0 bits. Histogram of the brain only: H = 3.9 bits; co-variance map: H = 10.1 bits.]

Effect Of Noise On Entropy

[Figure: entropy maps at decreasing signal-to-noise levels (SNR = 20, 2, 0.2). Entropy increases with increasing noise.]

Uses Of Information Theory In Imaging

• Entropy measures of mixed distributions
• Applications:
  – Image registration
  – Quantification of image quality
  – Model-free analysis

Image Registration

• Define a transform T that maps one image onto another image such that some measure of overlap is maximized.
  – Use entropy as a means for determining maximal overlap.

[Figure: CT and MRI images before and after registration.]

Entropy In Image Registration

• Define the joint probability distribution of the images as a metric for the degree of image registration.
• Computation:
  – Use a 2-D histogram, with each axis denoting the frequency of intensity values in one of the images.
  – Build the 2-D histogram by incrementing each cell each time there is an intensity “co-occurrence” between the two images.
  – For registered images, we expect the 2-D histogram to be highly focused; mis-registration leads to greater dispersion of the 2-D histogram.
  – Since entropy measures the degree of histogram dispersion, we can use entropy to quantify image registration.
  – Formally, we minimize the joint entropy between images A and B.

[Figure: joint 2-D histogram of images A and B.]
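A minimal sketch of this computation, assuming two equally sized grayscale arrays and an arbitrary choice of 32 bins:

```python
import numpy as np

def joint_entropy(img_a, img_b, bins=32):
    """Joint entropy H(A,B) in bits, from the 2-D intensity co-occurrence histogram."""
    hist, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    p = hist / hist.sum()          # joint probability distribution
    p = p[p > 0]                   # skip empty cells (0 log 0 = 0)
    return -np.sum(p * np.log2(p))
```

A registration loop would apply candidate transforms T to one image and keep the transform yielding the smallest joint entropy.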

Joint Entropy

Joint entropy (entropy of mixed probability distributions):

$$H(X,Y) = -\sum_{x_i \in X} \sum_{y_i \in Y} p(x_i, y_i)\, \log p(x_i, y_i)$$

Definition:

• H(X,Y) = H(X) + H(Y) – MI(X;Y)

Properties:

• H(X,Y) = H(X) + H(Y), i.e. MI(X;Y) = 0, only if X and Y are completely independent.
• Image registration can be guided by minimizing H(X,Y).

[Figure: Venn diagram of H(X), H(Y), and MI(X;Y).]

Example: Joint entropy of the 2-D histogram for rotation of an image with respect to itself by 0, 2, 5, and 10 degrees (entire rotation series):

Rotation (degrees):   0     2     5     10
H(X,Y) (bits):        5.5   7.1   9.6   14.1

Mutual Information

Definition:

MI(X,Y) = H(Y) - H(Y|X) = H(X) - H(X|Y)

This is the amount by which the uncertainty of X is reduced if Y is known or vice versa.

MI(X,Y) = H(X) + H(Y) - H(X,Y)

Maximizing MI(X,Y) is equivalent to minimizing the joint entropy H(X,Y) (for fixed marginal entropies).

Properties:

• MI = 0 indicates independence

• Image registration can be guided by maximizing MI

[Figure: Venn diagram.]

$$MI(X,Y) = \sum_{x_i \in X,\; y_i \in Y} p(x_i, y_i)\, \log \frac{p(x_i, y_i)}{p(x_i)\, p(y_i)}$$
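The same histogram machinery gives a sketch of MI via MI(X,Y) = H(X) + H(Y) - H(X,Y), again with an assumed bin count of 32:

```python
import numpy as np

def mutual_information(img_a, img_b, bins=32):
    """MI(A,B) = H(A) + H(B) - H(A,B), estimated from intensity histograms."""
    hist, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    p_ab = hist / hist.sum()
    p_a = p_ab.sum(axis=1)   # marginal distribution of image A
    p_b = p_ab.sum(axis=0)   # marginal distribution of image B

    def h(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    return h(p_a) + h(p_b) - h(p_ab)
```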

Other Properties of Mutual Information

• MI is symmetric: MI(X,Y) = MI(Y,X)
• MI(X,X) = H(X)
• MI(X,Y) ≤ H(X) and MI(X,Y) ≤ H(Y)
  – The information that each image contains about the other image cannot be greater than the total information in each image.
• MI(X,Y) ≥ 0
  – Knowing Y cannot increase the uncertainty in X.
• MI(X,Y) = 0 only if X and Y are independent

Mutual Information vs. Joint Entropy

• Mutual information has the advantage over joint entropy that the entropy of each distribution is included separately.

[Figure: Venn diagram.]

Conditional Entropy

Let X be given. What can be gained from measuring Y?

$$H(Y|X) = -\sum_{x_i \in X} p(x_i) \sum_{y_i \in Y} p(y_i \mid x_i)\, \log p(y_i \mid x_i) = -\sum_{x_i,\, y_i} p(x_i, y_i)\, \log p(y_i \mid x_i)$$

Definition:

• H(Y|X) = H(X,Y) – H(X)
• H(X|Y) = H(X,Y) – H(Y)
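Using the definition H(B|A) = H(A,B) - H(A), a minimal histogram-based sketch under the same assumptions as the examples above (grayscale arrays, 32 bins):

```python
import numpy as np

def conditional_entropy(img_a, img_b, bins=32):
    """H(B|A) = H(A,B) - H(A), estimated from intensity histograms."""
    hist, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    p_ab = hist / hist.sum()
    p_a = p_ab.sum(axis=1)   # marginal distribution of image A

    def h(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    return h(p_ab) - h(p_a)
```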

Properties Of Conditional Entropy

• Quantifies the information gained from Y when X is given
• Note: H(X|Y) is not equal to H(Y|X)
• H(X|Y) = 0 only if X is completely determined by Y; otherwise H(X|Y) and H(Y|X) > 0

Relative Entropy

A non-symmetric measure of the difference between distributions p(X) and q(X):

$$D(P\|Q) = \sum_i p(x_i)\, \log \frac{p(x_i)}{q(x_i)}$$

• D(P||Q) is also known as the Kullback-Leibler divergence D_KL
• Since D(P||Q) is non-symmetric, it is not a true metric
• But a symmetric definition exists:
  sKL = ½ [D(P||Q) + D(Q||P)]
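A small sketch of both quantities, assuming p and q are arrays over the same bins and that q > 0 wherever p > 0:

```python
import numpy as np

def kl_divergence(p, q):
    """D(P||Q) in bits; assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

def symmetric_kl(p, q):
    """sKL = 0.5 * [D(P||Q) + D(Q||P)]."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
```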

Uses Of Relative Entropy

How much noisier is image A compared to image B?

[Figure: maps and histograms of images A and B; the divergence sKL* of B from A grows with increasing noise level.]

*sKL stands for symmetrized relative entropy, i.e. the symmetric form of the Kullback-Leibler divergence.

Uses Of Relative Entropy

Detect movements and image corruptions.

[Figure: sKL plotted against series number (#1, #2, … #5, #6, … #13).]

Model-Free Analysis With Relative Entropy

[Figure: three time series a(t), b(t), c(t) recorded from regions A, B, and C.]

Example: connectivity patterns in resting-state fMRI

fMRI Analysis Of Functional Connectivity

$$\mathrm{corr}(A,B) = \frac{1}{M} \sum_{i=1}^{M} \frac{(a_i - \bar{a})(b_i - \bar{b})}{\mathrm{std}(a)\,\mathrm{std}(b)}$$

[Figure: three time series a(t), b(t), c(t) recorded from regions A, B, and C.]
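A sketch of the corresponding computation, assuming each row of a 2-D array holds one region's BOLD time series (the data below are hypothetical placeholders):

```python
import numpy as np

# ts: (n_regions, n_timepoints), one BOLD time series per row
rng = np.random.default_rng(0)
ts = rng.standard_normal((90, 140))   # hypothetical example data

corr = np.corrcoef(ts)                # (n_regions, n_regions) correlation matrix
```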

Individual correlation matrices

[Figure: correlation matrices (brain regions × brain regions) for a PD patient and a control.]

A Model-Free Analysis

Conventional (model-based): compare correlation matrices between patients and controls region by region, e.g. using t-tests.

Alternative (model-free): compare the overall structure of the correlation matrices between patients and controls using relative entropy.

[Figure: correlation matrices (brain regions × brain regions) for a PD patient and a control.]

Transformation Into Eigen-Space

[Figure: correlation matrices (regions × regions) for a PD patient and a control, transformed into eigenvector × eigenvector representations.]

Eigenspace representation, e.g. principal components analysis
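A minimal sketch of the transformation, assuming a symmetric correlation matrix such as the corr matrix sketched above:

```python
import numpy as np

def eigen_spectrum(corr):
    """Eigenvalues of a symmetric correlation matrix, ranked by magnitude."""
    vals = np.linalg.eigvalsh(corr)   # eigvalsh assumes a symmetric matrix; ascending order
    return vals[::-1]                 # rank from largest to smallest
```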

Eigenvalue Distribution

[Figure: magnitude of eigenvalue vs. ranking. Each trajectory represents a subject; red = Parkinson's, blue = control.]

Relative Entropy Of Eigenvalue Distributions

[Figure: each point represents the relative entropy of a subject's eigenvalues relative to the eigenvalues of uniform connectivity; red = Parkinson's, blue = control.]
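A sketch of one way to compute such a score, assuming the eigenvalues are normalized to sum to one so they can be treated as a distribution, and assuming a flat spectrum as the "uniform connectivity" reference (the slides' exact reference distribution is not specified in this transcript):

```python
import numpy as np

def eigen_relative_entropy(corr):
    """KL divergence (bits) of a subject's normalized eigenvalue spectrum
    from a flat reference spectrum."""
    vals = np.linalg.eigvalsh(corr)[::-1]
    p = np.clip(vals, 1e-12, None)       # guard against tiny negative/zero eigenvalues
    p = p / p.sum()                      # normalize to a probability distribution
    q = np.full_like(p, 1.0 / p.size)    # flat (uniform) reference spectrum
    return np.sum(p * np.log2(p / q))
```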

Image Feature Selection Using Information Theory

Objective

Learn how to use information theory for determining imaging features

Mutual Information based Feature Selection Method

• We test the ability to separate two classes (features).
• Let X be the feature vector, e.g. co-occurrences of intensities.
• Y is the classification, e.g. particular intensities.
• How do we maximize the separability between the classes?

Mutual Information based Feature Selection

• MI tests a feature's ability to separate two classes.
  – Based on definition 3) of mutual information:

$$MI(X,Y) = \sum_{x_i \in X,\; y_i \in Y} p(x_i, y_i)\, \log \frac{p(x_i, y_i)}{p(x_i)\, p(y_i)}$$

  – Here X is the feature vector and Y is the classification.
• Note that X is continuous while Y is discrete.
  – By maximizing MI(X,Y), we maximize the separability of the feature.
• Note this method only tests each feature individually.
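A toy sketch of scoring one continuous feature against a discrete class label by discretizing the feature into histogram bins (function name and bin count are illustrative choices):

```python
import numpy as np

def mi_feature_score(x, y, bins=16):
    """MI (bits) between a continuous feature x and integer class labels y,
    estimated by discretizing x into histogram bins."""
    x, y = np.asarray(x), np.asarray(y)
    edges = np.histogram_bin_edges(x, bins=bins)
    x_binned = np.digitize(x, edges[1:-1])        # bin index 0..bins-1 per sample
    joint = np.zeros((bins, int(y.max()) + 1))
    for xb, yb in zip(x_binned, y):
        joint[xb, yb] += 1                        # co-occurrence counts
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask]))
```

Features would then be ranked by this score, keeping the highest-scoring ones.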

Joint Mutual Information based Feature Selection Method

• Joint MI tests a feature's independence from all other features.
• Two implementations:
  – 1) Compute all individual MIs and sort from high to low.
  – 2) Test the joint MI of the current feature while keeping all others.
• Keep the features with the lowest JMI (implies independence).
• Implement by selecting features that maximize: [formula not preserved in this transcript]

Information Theory Of Time Series Signals

Objective

Learn how to use information theory for quantifying functional brain signals

How much information do fMRI fluctuations carry?

[Figure: three time series a(t), b(t), c(t) recorded from regions A, B, and C.]

Time series = a sequence of time-ordered observations of some system.

Limits of conventional correlation analysis:

1. assumes a deterministic system
2. does not quantify information, e.g. surprise

Mutual Information

• Suppose one system stochastically generates events:
• Probability of events:
  – Observed p(x_n), e.g. 01101111
  – New p(x_{n+1}), e.g. 011011110

$$MI = \sum_n p(x_{n+1}, x_n)\, \log \frac{p(x_{n+1}, x_n)}{p(x_{n+1})\, p(x_n)}$$

• MI is not effective at predicting future events from current data, because it is symmetric: MI(X,Y) = MI(Y,X)
• MI does not indicate in which direction information flows

Conditional Entropy

• Suppose one system stochastically generates events:
• Probability of events:
  – Observed p(x_n), e.g. 01101111
  – New p(x_{n+1}), e.g. 011011110

Define the entropy rate for one of the systems:

$$H(x_{n+1} \mid x_n) = -\sum_n p(x_{n+1}, x_n)\, \log p(x_{n+1} \mid x_n)$$

– This provides an information rate
– However, it works for one system only
– Relative entropy cannot be used to measure information flow between two systems, e.g. two brain regions

Transfer Entropy

• Suppose two connected systems stochastically generate events:
• Probability of events:
  – Observed p(x_n), e.g. 01101111
  – Observed p(y_n), e.g. 10010000
  – New p(x_{n+1}), e.g. 011011110
  – New p(y_{n+1}), e.g. 100100000

Entropy rate of one system, conditioned on the other:

$$H(x_{n+1} \mid x_n, y_n) = -\sum_n p(x_{n+1}, x_n, y_n)\, \log p(x_{n+1} \mid x_n, y_n)$$

Transfer entropy (entropy rate between the systems, e.g. y → x):

$$T_{y \to x} = \sum_n p(x_{n+1}, x_n, y_n)\, \log \frac{p(x_{n+1} \mid x_n, y_n)}{p(x_{n+1} \mid x_n)}$$
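A plug-in sketch for two binary symbol sequences, estimating the probabilities from empirical frequencies (not bias-corrected; real fMRI or EEG signals would first be symbolized, e.g. by thresholding):

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y):
    """Transfer entropy T(y -> x) in bits for two symbol sequences."""
    triples = Counter(zip(x[1:], x[:-1], y[:-1]))    # (x_{n+1}, x_n, y_n)
    pairs_xy = Counter(zip(x[:-1], y[:-1]))          # (x_n, y_n)
    pairs_xx = Counter(zip(x[1:], x[:-1]))           # (x_{n+1}, x_n)
    singles = Counter(x[:-1])                        # x_n
    n = len(x) - 1
    te = 0.0
    for (x1, x0, y0), c in triples.items():
        p_joint = c / n                              # p(x_{n+1}, x_n, y_n)
        p_cond_xy = c / pairs_xy[(x0, y0)]           # p(x_{n+1} | x_n, y_n)
        p_cond_x = pairs_xx[(x1, x0)] / singles[x0]  # p(x_{n+1} | x_n)
        te += p_joint * np.log2(p_cond_xy / p_cond_x)
    return te
```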

[Figure: block entropy H(L) vs. block length L.]

Block Entropy H(L): uncertainty in identifying fluctuation patterns.

Quantification of Uncertainty in fMRI using Entropy

[Figure: three fMRI time series (A, B, C). A sliding block of length L passes over each series; Pr(s_L) is the probability distribution of the patterns s_L over block length L, and H(L) gives the bits of information.]
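A sketch of the block entropy estimate, assuming the continuous signal is first symbolized (here by thresholding at the median, an illustrative choice):

```python
import numpy as np
from collections import Counter

def block_entropy(symbols, L):
    """Block entropy H(L) in bits: Shannon entropy of the distribution of
    length-L patterns seen by sliding a window over the symbol sequence."""
    blocks = [tuple(symbols[i:i + L]) for i in range(len(symbols) - L + 1)]
    counts = np.array(list(Counter(blocks).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Hypothetical example: binarize a time series around its median
rng = np.random.default_rng(0)
ts = rng.standard_normal(140)
bits = (ts > np.median(ts)).astype(int)
print(block_entropy(bits, L=3))
```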

Summary

• Entropy: uncertainty associated with observations
• Mutual information: mutual dependence of random observations, i.e. their shared information
• Conditional entropy: uncertainty associated with multiple sets of observations when one of them is known
• Relative entropy (aka Kullback-Leibler divergence): difference between two distributions
• Transfer entropy: rate of information flow from one time series to another
• Block entropy: uncertainty associated with sets (blocks) of symbols