Transcript of "Information Theory in Medical Imaging"

Page 1:

Information Theory in Medical Imaging

Norbert Schuff, Ph.D.
Center for Imaging of Neurodegenerative Diseases
[email protected]

Page 2:

Objectives

• Learn how to quantify information
• Understand the use of information theory in imaging

Page 3:

Topics

I. Measures of information
II. Applications to imaging
   i. Co-registration
   ii. Feature selection
   iii. Model-free analysis
III. Time series analysis (e.g. BOLD, EEG)
   i. Functional connectivity
   ii. Signal complexity and unpredictability

Page 4:

Information and Uncertainty

Random generator: which character comes next?

Calculations are based on the assumption that all symbols have an equal chance of appearing. The change in uncertainty when one symbol is observed is the negative log of its probability:

• 3 equally likely symbols: −log(1/3) = log(3)
• 2 equally likely symbols: −log(1/2) = log(2)
• Both generators combined: log(3) + log(2) = log(6)
• 1 symbol only: log(1) = 0 (no uncertainty)
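As a quick numerical check (not from the original slides), a minimal Python sketch of these uncertainty values, using base-2 logarithms so the results are in bits:

```python
import math

def surprise(num_symbols: int) -> float:
    """Change in uncertainty, in bits, when one of `num_symbols`
    equally likely symbols is observed: -log2(1/M) = log2(M)."""
    return math.log2(num_symbols)

print(surprise(3))                # log2(3) ≈ 1.585 bits
print(surprise(2))                # log2(2) = 1 bit
print(surprise(3) + surprise(2))  # both generators combined: log2(6) ≈ 2.585 bits
print(surprise(1))                # log2(1) = 0: a single symbol carries no uncertainty
```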

Page 5:

More On Uncertainty

Three symbols, but one appears infrequently. Which character comes next?

M symbols = 3
Code length = 9

Odds:
p(A) = 3/9
p(B) = 4/9
p(@) = 2/9

Weighted log-odds, e.g. of A: (3/9)·log₂(3/9)

Shannon Entropy (M symbols with probabilities p_i):

H = -\sum_{i=1}^{M} p_i \log_2 p_i = \sum_{i=1}^{M} p_i \log_2 \frac{1}{p_i}

Page 6:

Example: Entropy Of mRNA

…ACGTAACACCGCACCTG

Assume "A" occurs in 50% of all instances, "C" = 25%, "G" and "T" each = 12.5%:

p_A = 1/2;  p_C = 1/4;  p_G = 1/8;  p_T = 1/8

Log-odds (−log₂ p): 1; 2; 3; 3

Entropy: H = (1/2)·1 + (1/4)·2 + (1/8)·3 + (1/8)·3 = 1.75 bits
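To make the arithmetic concrete, here is a small Python sketch (illustrative, not from the slides) that reproduces the 1.75-bit result from the assumed symbol probabilities:

```python
import math

def shannon_entropy(probs):
    """H = -sum_i p_i * log2(p_i), in bits; terms with p_i = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Probabilities assumed on the slide: A = 1/2, C = 1/4, G = T = 1/8
p = {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125}
print(shannon_entropy(p.values()))  # 1.75 bits
```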

Page 7:

Concept Of Entropy

• Shannon entropy formulated by Claude Shannon (1916–2001)
  – American mathematician
  – Founded information theory with one landmark paper in 1948
  – Worked with Alan Turing on cryptography during WWII
• History of entropy
  – 1854 – Rudolf Clausius, German physicist, developed theory of heat
  – 1876 – Willard Gibbs, American physicist, used it for theory of energy
  – 1877 – Ludwig Boltzmann, Austrian physicist, formulated theory of thermodynamics
  – 1879 – Gibbs re-formulated entropy in terms of statistical mechanics
  – 1948 – Shannon

Page 8:

Three Interpretations of Entropy

• The uncertainty in the outcome of an event
  – Systems with frequent events have lower entropy than systems with infrequent events.
• The amount of surprise (information)
  – Observing an infrequent event is a greater surprise, i.e. it carries more information and thus higher entropy, than observing a frequent event.
• The dispersion in the probability distribution
  – A uniform image has a less dispersed histogram and thus lower entropy than a heterogeneous image.

Page 9:

Various Formulations Of Entropy

Names: Shannon, Renyi, Aczel, Aczel, Aczel, Varma, Varma, Kapur, Hadra, Arimoto, Sharma, Sharma

Page 10:

Various Formulations Of Entropy II

Names: Taneja, Sharma, Sharma, Ferreri, Sant'Anna, Sant'Anna, Picard, Picard, Picard, Picard, Picard

Page 11:

Entropy In Medical Imaging

[Figure: entropy maps and histograms. Single image (N = 1): H = 6.0 bits; average of N = 40 images: H = 4.0; histograms of the whole image and of the brain only: H = 3.9; co-variance map: H = 10.1]

Page 12:

Effect Of Noise On Entropy

[Figure: entropy and entropy maps at signal-to-noise levels SNR = 20, 2, and 0.2, illustrating the effect of increasing noise on entropy]

Page 13:

Uses Of Information Theory In Imaging

• Entropy measures of mixed distributions
• Applications:
  – Image registration
  – Quantification of image quality
  – Model-free analysis

Page 14:

Image Registration

• Define a transform T that maps one image onto another image such that some measure of overlap is maximized.
  – Use entropy as a means for determining maximal overlap.

[Figure: CT and MRI image pair before and after registration]

Page 15:

Entropy In Image Registration

• Define the joint probability distribution of the images as a metric for the degree of image registration.
• Computation:
  – Use a 2-D histogram, with each axis designating the intensity values of one image.
  – Build the 2-D histogram by incrementing a cell each time there is an intensity "co-occurrence" between the two images.
  – For registered images, we expect the 2-D histogram to be highly focused. Mis-registration will lead to greater dispersion of the 2-D histogram.
  – Since entropy measures the degree of histogram dispersion, we can use entropy to quantify image registration.
  – Formally, we minimize the joint entropy between images A and B.

[Figure: joint 2-D histogram of images A and B]

Page 16:

Joint Entropy

Joint entropy (entropy of mixed probability distributions):

Definition:

H(X,Y) = -\sum_{x_i \in X} \sum_{y_i \in Y} p(x_i, y_i) \log p(x_i, y_i)

• H(X,Y) = H(X) + H(Y) – MI(X;Y)

Properties:

• H(X,Y) = H(X) + H(Y), i.e. MI(X;Y) = 0, only if X and Y are completely independent.
• Image registration can be guided by minimizing H(X,Y).

[Venn diagram of H(X), H(Y), and MI(X;Y)]
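A minimal sketch of how the joint entropy of two images could be estimated from their 2-D intensity histogram, assuming NumPy arrays of identical shape (illustrative code, not the author's implementation):

```python
import numpy as np

def joint_entropy(img_a: np.ndarray, img_b: np.ndarray, bins: int = 32) -> float:
    """H(A,B) in bits, estimated from the joint 2-D intensity histogram."""
    hist, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    p = hist / hist.sum()   # joint probabilities p(a, b)
    p = p[p > 0]            # empty cells contribute nothing (0 * log 0 := 0)
    return float(-np.sum(p * np.log2(p)))

# Aligned images give a focused 2-D histogram (low H);
# scrambling one image disperses the histogram (higher H).
rng = np.random.default_rng(0)
a = rng.normal(size=(64, 64))
b_aligned = a + 0.05 * rng.normal(size=a.shape)
b_scrambled = rng.permutation(b_aligned.ravel()).reshape(a.shape)
print(joint_entropy(a, b_aligned))    # smaller
print(joint_entropy(a, b_scrambled))  # larger
```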

Page 17:

Example: Joint entropy of the 2-D histogram for rotation of an image with respect to itself by 0, 2, 5, and 10 degrees

Page 18:

Page 19:

Page 20:

Entire Rotation Series

Rotation (degrees):   0     2     5     10
H(X,Y):               5.5   7.1   9.6   14.1

Page 21:

Mutual Information

Definition:

MI(X,Y) = H(Y) - H(Y|X) = H(X) - H(X|Y)

This is the amount by which the uncertainty of X is reduced if Y is known, or vice versa.

MI(X,Y) = H(X) + H(Y) - H(X,Y)

Maximizing MI(X,Y) is equivalent to minimizing the joint entropy H(X,Y) (for fixed marginal entropies H(X) and H(Y)).

MI(X,Y) = \sum_{x_i \in X, \, y_i \in Y} p(x_i, y_i) \log \frac{p(x_i, y_i)}{p(x_i)\, p(y_i)}

Properties:

• MI = 0 indicates independence.
• Image registration can be guided by maximizing MI.

[Venn diagram of H(X), H(Y), and MI(X;Y)]
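The same histogram machinery gives mutual information via MI(X,Y) = H(X) + H(Y) - H(X,Y). A hedged NumPy sketch with toy data (illustrative, not the registration code behind these slides):

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits of a probability array (zero entries are ignored)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(img_a, img_b, bins=32):
    """MI(A,B) = H(A) + H(B) - H(A,B), in bits, from the joint intensity histogram."""
    joint, _, _ = np.histogram2d(np.ravel(img_a), np.ravel(img_b), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1)  # marginal p(a)
    py = pxy.sum(axis=0)  # marginal p(b)
    return entropy_bits(px) + entropy_bits(py) - entropy_bits(pxy)

rng = np.random.default_rng(1)
a = rng.normal(size=(64, 64))
print(mutual_information(a, a))                         # MI(X,X) = H(X)
print(mutual_information(a, rng.normal(size=a.shape)))  # near 0 for independent images
```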

Page 22:

Other Properties of Mutual Information

• MI is symmetric: MI(X,Y) = MI(Y,X)
• MI(X,X) = H(X)
• MI(X,Y) ≤ H(X) and MI(X,Y) ≤ H(Y)
  – The information that each image contains about the other image cannot be greater than the total information in each image.
• MI(X,Y) ≥ 0
  – Knowing Y cannot increase the uncertainty in X.
• MI(X,Y) = 0 only if X and Y are independent.

Page 23:

Mutual Information vs. Joint Entropy

• Mutual information has the advantage over joint entropy that the entropy of each distribution is included separately.

[Venn diagram of H(X), H(Y), and MI(X;Y)]

Page 24:

Conditional Entropy

Let X be given. What can be gained from measuring Y?

H(Y|X) = -\sum_{x_i \in X} p(x_i) \sum_{y_i \in Y} p(y_i | x_i) \log p(y_i | x_i)

H(Y|X) = -\sum_{x_i \in X, \, y_i \in Y} p(x_i, y_i) \log p(y_i | x_i)

Definition:

• H(Y|X) = H(X,Y) – H(X)
• H(X|Y) = H(X,Y) – H(Y)
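A small worked check of these identities on a toy joint distribution (the values are illustrative, not from the slides):

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits of a (possibly 2-D) probability array."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# A toy joint distribution p(x, y): rows index x, columns index y.
pxy = np.array([[0.30, 0.10, 0.10],
                [0.05, 0.25, 0.20]])

H_xy = entropy_bits(pxy)
H_x = entropy_bits(pxy.sum(axis=1))  # marginal p(x)
H_y = entropy_bits(pxy.sum(axis=0))  # marginal p(y)

print(H_xy - H_x)  # H(Y|X)
print(H_xy - H_y)  # H(X|Y), generally different from H(Y|X)
```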

Page 25:

Properties Of Conditional Entropy

• Quantifies the information gained from Y when X is given.
• Note: H(X|Y) is not equal to H(Y|X).
• H(X|Y) = 0 only if X is completely determined by Y; otherwise H(X|Y) and H(Y|X) > 0.

Page 26:

Relative Entropy

A non-symmetric measure of the difference between distributions p(X) and q(X):

D(P||Q) = \sum_i p(x_i) \log \frac{p(x_i)}{q(x_i)}

• D(P||Q) is also known as the Kullback-Leibler divergence D_KL.
• Since D(P||Q) is non-symmetric, it is not a true metric.
• But a symmetric definition exists:
  sKL = ½ [ D(P||Q) + D(Q||P) ]
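A minimal NumPy sketch of D(P||Q) and its symmetrized form sKL, assuming two discrete distributions with q > 0 wherever p > 0 (illustrative code, not from the slides):

```python
import numpy as np

def kl_divergence(p, q):
    """D(P||Q) = sum_i p_i * log2(p_i / q_i), in bits."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def symmetric_kl(p, q):
    """sKL = 0.5 * [D(P||Q) + D(Q||P)]."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

p = np.array([0.5, 0.25, 0.125, 0.125])
q = np.array([0.25, 0.25, 0.25, 0.25])
print(kl_divergence(p, q), kl_divergence(q, p))  # not equal: D is non-symmetric
print(symmetric_kl(p, q))
```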

Page 27:

Uses Of Relative Entropy

How much noisier is image A compared to image B?

[Figure: maps and histograms of images A and B at increasing noise levels, and the divergence (sKL*) of B from A]

*sKL stands for symmetrical relative entropy, aka Kullback-Leibler divergence

Page 28:

Uses Of Relative Entropy

Detect movements & image corruptions

[Figure: sKL plotted against series number for images #1, #2, … #5, #6, … #13]

Page 29:

Model-Free Analysis With Relative Entropy

[Figure: three fMRI time series a(t), b(t), c(t) from regions A, B, and C plotted against time]

Example: connectivity patterns in resting-state fMRI

Page 30:

fMRI Analysis Of Functional Connectivity

corr(A,B) = \frac{1}{M} \sum_{i=1}^{M} \frac{(a_i - \bar{a})(b_i - \bar{b})}{\mathrm{std}(a)\,\mathrm{std}(b)}

[Figure: three fMRI time series a(t), b(t), c(t) from regions A, B, and C]

Individual correlation matrices (brain regions × brain regions): PD patient vs. control
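A hedged sketch of how such a regional correlation (connectivity) matrix might be computed with NumPy, assuming a (time points × regions) data array; the variable names and toy data are illustrative only:

```python
import numpy as np

def connectivity_matrix(timeseries: np.ndarray) -> np.ndarray:
    """Pearson correlation matrix (regions x regions) from an array of shape
    (M time points, N regions): the corr(A,B) formula applied to every pair."""
    return np.corrcoef(timeseries, rowvar=False)

rng = np.random.default_rng(2)
data = rng.normal(size=(140, 10))   # 140 time points, 10 "regions"
data[:, 1] += 0.8 * data[:, 0]      # make regions 0 and 1 correlated
R = connectivity_matrix(data)
print(R.shape)                      # (10, 10)
print(round(float(R[0, 1]), 2))     # clearly positive correlation
```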

Page 31:

A Model-Free Analysis

Conventional (model-based): compare correlation matrices between patients and controls region by region, e.g. using t-tests.

Alternative (model-free): compare the overall structure of the correlation matrices between patients and controls using relative entropy.

[Figure: correlation matrices (brain regions × brain regions) for a PD patient and a control]

Page 32:

Transformation Into Eigen-Space

[Figure: correlation matrices (regions × regions) for a PD patient and a control, transformed into an eigenspace representation (eigenvectors × eigenvectors)]

Eigenspace representation, e.g. principal components analysis

Page 33:

Eigenvalue Distribution

[Figure: magnitude of eigenvalue vs. ranking; each trajectory represents a subject. Red = Parkinson's; blue = control]

Page 34:

Relative Entropy Of Eigenvalue Distributions

[Figure: each point represents the relative entropy of a subject's eigenvalues relative to the eigenvalues of uniform connectivity. Red = Parkinson's; blue = control]
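Pages 32 to 34 can be sketched in a few lines of NumPy: eigendecompose the correlation matrix, normalize the eigenvalues into a distribution, and take the relative entropy against the flat spectrum expected for uniform connectivity. This is an illustrative reading of the slides, not the authors' analysis code:

```python
import numpy as np

def eigenvalue_relative_entropy(corr_matrix: np.ndarray) -> float:
    """KL divergence (bits) of the normalized eigenvalue spectrum of a
    correlation matrix from a flat spectrum (uniform connectivity)."""
    evals = np.linalg.eigvalsh(corr_matrix)   # eigenvalues of the symmetric matrix
    evals = np.clip(evals, 0.0, None)         # guard against tiny negative values
    p = evals / evals.sum()                   # eigenvalue "distribution"
    q = np.full_like(p, 1.0 / p.size)         # uniform reference
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

rng = np.random.default_rng(3)
data = rng.normal(size=(140, 10))             # toy (time points x regions) data
R = np.corrcoef(data, rowvar=False)
print(eigenvalue_relative_entropy(R))         # larger value = less uniform connectivity
```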

Page 35:

Image Feature Selection Using Information Theory

Objective

Learn how to use information theory for determining imaging features

Page 36:

Mutual Information based Feature Selection Method

• We test a feature's ability to separate two classes.
• Let X be the feature vector, e.g. co-occurrences of intensities.
• Y is the classification, e.g. particular intensities.
• How do we maximize the separability between the classes?

Page 37:

Mutual Information based Feature Selection

• MI tests a feature's ability to separate two classes.
  – Based on definition 3) for mutual information:

    MI(X,Y) = \sum_{x_i \in X, \, y_i \in Y} p(x_i, y_i) \log \frac{p(x_i, y_i)}{p(x_i)\, p(y_i)}

  – Here X is the feature vector and Y is the classification.
  – Note that X is continuous while Y is discrete.
  – By maximizing MI(X,Y), we maximize the separability of the feature.
• Note: this method only tests each feature individually.
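One way to score features by this criterion is to bin the continuous feature, form its joint histogram with the discrete class labels, and evaluate the MI sum above. The following NumPy sketch is illustrative; the function name and toy data are assumptions, not from the slides:

```python
import numpy as np

def mi_feature_score(x: np.ndarray, y: np.ndarray, bins: int = 16) -> float:
    """MI(X,Y) in bits between a continuous feature x and discrete class labels y,
    estimated by binning x and forming the joint histogram with y."""
    edges = np.histogram_bin_edges(x, bins=bins)
    x_binned = np.digitize(x, edges[1:-1])          # bin indices 0 .. bins-1
    classes = np.unique(y)
    joint = np.zeros((bins, classes.size))
    for j, c in enumerate(classes):
        joint[:, j] = np.bincount(x_binned[y == c], minlength=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)             # p(x bin)
    py = pxy.sum(axis=0, keepdims=True)             # p(class)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log2(pxy[mask] / (px @ py)[mask])))

# An informative feature scores higher MI with the class than pure noise.
rng = np.random.default_rng(4)
y = rng.integers(0, 2, size=500)                    # two classes
informative = y + 0.5 * rng.normal(size=500)        # tracks the class
noise = rng.normal(size=500)                        # unrelated to the class
print(mi_feature_score(informative, y) > mi_feature_score(noise, y))  # True
```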

Page 38:

Joint Mutual Information based Feature Selection Method

• Joint MI tests a feature's independence from all other features.
• Two implementations:
  1) Compute all individual MIs and sort from high to low.
  2) Test the joint MI of the current feature while keeping all others.
• Keep the features with the lowest joint MI (implies independence).
• Implement by selecting features that maximize:

Page 39:

Information Theory Of Time Series Signals

Objective

Learn how to use information theory for quantifying functional brain signals

Page 40:

How Much Information Do fMRI Fluctuations Carry?

[Figure: three fMRI time series a(t), b(t), c(t) from regions A, B, and C]

Time series = a sequence of time-ordered observations of some system.

Limits of conventional correlation analysis:
1. Based on a deterministic system
2. Does not quantify information, e.g. surprise

Page 41:

Mutual Information

• Suppose one system stochastically generates events.
• Probability of events:
  – Observed p(x_n), e.g. 01101111
  – New p(x_{n+1}), e.g. 011011110

MI = \sum_{n} p(x_{n+1}, x_n) \log \frac{p(x_{n+1}, x_n)}{p(x_{n+1})\, p(x_n)}

• MI is not effective at predicting future events from current data, because it is symmetric: MI(X,Y) = MI(Y,X).
• MI does not indicate in which direction information flows.

Page 42:

Conditional Entropy

• Suppose one system stochastically generates events.
• Probability of events:
  – Observed p(x_n), e.g. 01101111
  – New p(x_{n+1}), e.g. 011011110

Define an entropy rate for one of the systems:

H(x_{n+1} | x_n) = -\sum_{n} p(x_{n+1}, x_n) \log p(x_{n+1} | x_n)

• This provides an information rate.
• However, it works only for one system.
• Relative entropy cannot be used to measure information flow between two systems, e.g. two brain regions.

Page 43:

Transfer Entropy

• Suppose two connected systems stochastically generate events.
• Probability of events:
  – Observed p(x_n), e.g. 01101111
  – Observed p(y_n), e.g. 10010000
  – New p(x_{n+1}), e.g. 011011110
  – New p(y_{n+1}), e.g. 100100000

Define the entropy rate for one of the systems, conditioned on both:

H(x_{n+1} | x_n, y_n) = -\sum_{n} p(x_{n+1}, x_n, y_n) \log p(x_{n+1} | x_n, y_n)

Entropy rate between the systems, e.g. y → x (transfer entropy):

H_{y \to x} = \sum_{n} p(x_{n+1}, x_n, y_n) \log \frac{p(x_{n+1} | x_n, y_n)}{p(x_{n+1} | x_n)}
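A plug-in estimate of this transfer entropy for two discrete (e.g. binarized) sequences can be written directly from the formula. The sketch below is illustrative; the toy sequences and helper names are assumptions rather than material from the slides:

```python
import numpy as np
from collections import Counter

def transfer_entropy(x: np.ndarray, y: np.ndarray) -> float:
    """Transfer entropy T_{y->x} in bits for two discrete sequences, using
    plug-in probability estimates: how much knowing y_n improves the
    prediction of x_{n+1} beyond knowing x_n alone."""
    triples = Counter(zip(x[1:], x[:-1], y[:-1]))   # (x_{n+1}, x_n, y_n)
    pairs_xy = Counter(zip(x[:-1], y[:-1]))         # (x_n, y_n)
    pairs_xx = Counter(zip(x[1:], x[:-1]))          # (x_{n+1}, x_n)
    singles = Counter(x[:-1])                       # x_n
    n = len(x) - 1

    te = 0.0
    for (x1, x0, y0), c in triples.items():
        p_joint = c / n
        p_x1_given_x0y0 = c / pairs_xy[(x0, y0)]
        p_x1_given_x0 = pairs_xx[(x1, x0)] / singles[x0]
        te += p_joint * np.log2(p_x1_given_x0y0 / p_x1_given_x0)
    return te

# Toy example: x copies y with a one-step delay, so information flows y -> x.
rng = np.random.default_rng(5)
y = rng.integers(0, 2, size=5000)
x = np.empty_like(y)
x[0] = 0
x[1:] = y[:-1]                                      # x_{n+1} = y_n
print(round(transfer_entropy(x, y), 2))             # ~1 bit per step (y -> x)
print(round(transfer_entropy(y, x), 2))             # ~0 bits (x -> y)
```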

Page 46:

Quantification of Uncertainty in fMRI Using Entropy

Block entropy H(L): uncertainty in identifying fluctuation patterns.

[Figure: fMRI time series a(t), b(t), c(t); a sliding block of length L moves along each series; Pr(s_L) is the probability distribution over blocks of length L; H(L) gives the bits of information as a function of block length L]
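A short sketch of block entropy for a binarized time series: slide a block of length L along the sequence and compute the Shannon entropy of the observed patterns. The toy signal and the binarization around the median are assumptions for illustration, not the slide's analysis pipeline:

```python
import numpy as np
from collections import Counter

def block_entropy(symbols, L: int) -> float:
    """Block entropy H(L) in bits: Shannon entropy of the distribution
    Pr(s_L) of patterns seen in a sliding block of length L."""
    blocks = [tuple(symbols[i:i + L]) for i in range(len(symbols) - L + 1)]
    counts = np.array(list(Counter(blocks).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

# Toy example: binarize an fMRI-like signal around its median, then compute
# H(L) for increasing block lengths (cf. the H(L)-vs-L plot on this slide).
rng = np.random.default_rng(6)
signal = np.cumsum(rng.normal(size=140))            # slowly varying series
binary = (signal > np.median(signal)).astype(int)
for L in range(1, 6):
    print(L, round(block_entropy(binary, L), 2))    # H(L) is non-decreasing in L
```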

Page 47:

Summary

Metric                              Description
Entropy                             Uncertainty associated with observations
Mutual information                  Mutual dependence of random observations, i.e. the information they share
Conditional entropy                 Uncertainty associated with multiple sets of observations when one of them is known
Relative entropy                    Difference between two distributions
(aka Kullback-Leibler divergence)
Transfer entropy                    Rate of information flow from one time series to another
Block entropy                       Rate of information flow across sets of symbols