Statistics of natural images


Page 1: Statistics of natural images

Statistics of natural images

May 30, 2010. Ofer Bartal, Alon Faktor

Page 2: Statistics of natural images

Outline

• Motivation
• Classical statistical models
• New MRF model approach
• Learning the models
• Applications and results

Page 3: Statistics of natural images

Motivation

• Big variance in appearance
• Can we even dream of modeling this?

Page 4: Statistics of natural images

Motivation

• Main questions:
  – Do all natural images obey some common “rules”?
  – How can one find these “rules”?
  – How can these “rules” be used for computer vision tasks?

Page 5: Statistics of natural images

Motivation

• Why bother to model at all?

• “Noise” and uncertainty

• A model helps choose the “best” possible answer

• Let's see some examples

[Diagram: natural image model]

Page 6: Statistics of natural images

Noise-blur removal

• Consider the classical de-convolution problem

• It can be formulated as a linear set of equations:

$$y = Hx + n$$

where $y$ is the observed blurred image, $H$ the blur (convolution) matrix, $x$ the unknown sharp image, and $n$ the noise (see the sketch below).
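To make the linear-system view concrete, here is a minimal sketch (mine, not from the slides; the kernel, signal size, and noise level are assumptions) that builds $H$ as a Toeplitz matrix for a 1-D signal and simulates $y = Hx + n$:

```python
# A hedged 1-D illustration of y = Hx + n: blur as a Toeplitz matrix.
import numpy as np
from scipy.linalg import toeplitz

npix = 64
rng = np.random.default_rng(0)
x = rng.random(npix)                           # the unknown "sharp" signal

h = np.array([0.25, 0.5, 0.25])                # assumed blur kernel
col = np.zeros(npix); col[0], col[1] = h[1], h[2]
row = np.zeros(npix); row[0], row[1] = h[1], h[0]
H = toeplitz(col, row)                         # convolution written as a matrix

sigma = 0.01
y = H @ x + sigma * rng.standard_normal(npix)  # the observed degraded signal
```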

Page 7: Statistics of natural images

Noise-blur removal

[Figure: the system $y = Hx + n$ drawn as a matrix equation, with the image $x$ unknown.]

Page 8: Statistics of natural images

Inpainting

$$y = Ax + n$$

where $x$ is the unknown complete image and $A$ is an identity matrix with some rows deleted.

Missing rows of the identity matrix correspond to missing pixels (an under-determined system).
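A minimal sketch of this operator (the sizes and missing-pixel indices below are hypothetical): $A$ is just the identity with the missing pixels' rows removed.

```python
# A keeps only the observed pixels of the flattened image x.
import numpy as np

npix = 10
missing = np.array([2, 3, 7])                 # hypothetical missing pixels
observed = np.setdiff1d(np.arange(npix), missing)

A = np.eye(npix)[observed]                    # identity minus the missing rows
x = np.arange(npix, dtype=float)              # toy image
y = A @ x                                     # y simply drops the missing pixels
print(A.shape, y)                             # (7, 10) [0. 1. 4. 5. 6. 8. 9.]
```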

Page 9: Statistics of natural images

Motivation

• Problems:
  – Unknown noise
  – H may be singular (de-convolution)
  – H may be under-determined (inpainting)

• So there can be many solutions. How can we find the “right” one?

Page 10: Statistics of natural images

Motivation

• Goal: estimate $x$
  – Assume:
    • A prior model of natural images: $P_x(x)$
    • A prior model of the noise: $P_n(n)$
  – Use the MAP estimator to find $x$:

$$x^* = \arg\max_x P(x \mid y) = \arg\max_x P(y \mid x)\,P(x) = \arg\max_x P_n(y - Hx)\,P_x(x)$$

Page 11: Statistics of natural images

Energy Minimization problem

• The MAP problem can be reformulated as an energy minimization:

$$\hat{x} = \arg\min_x \; E_{\text{data}}(y \mid x) + E_{\text{prior}}(x)$$

Page 12: Statistics of natural images

Classical models

• Smoothness prior (a model of image gradients)
  – Gaussian prior (an LS problem)
  – L1 prior and sparse priors (an IRLS problem)

[Figure: image gradient]

Page 13: Statistics of natural images

Gaussian Priors

• Assume:
  – A Gaussian prior on the gradients of $x$:  $p(x) = \frac{1}{Z}\exp\!\left(-\frac{\|\nabla x\|^2}{2\sigma^2}\right)$
  – Gaussian noise:  $n \sim N(0, \sigma_n^2)$

• Under these assumptions, the MAP problem

$$x^* = \arg\max_x P_n(y - Hx)\,P_x(x)$$

reduces to a least-squares problem:

$$x^* = \arg\min_x \; x^T T x - 2 b^T x$$
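A minimal sketch of why this is a least-squares problem (assuming the 1-D setup above and a finite-difference gradient operator $D$; the names `gaussian_map` and `lam` are mine): the MAP estimate is the solution of a single linear system.

```python
# MAP with Gaussian noise + Gaussian gradient prior = regularized LS.
import numpy as np

def gaussian_map(H, y, lam):
    """Solve (H^T H + lam * D^T D) x = H^T y."""
    n = H.shape[1]
    D = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]   # (Dx)_i = x_{i+1} - x_i
    return np.linalg.solve(H.T @ H + lam * D.T @ D, H.T @ y)

# e.g. pure de-noising (H = identity):
y = np.random.default_rng(0).random(32)
x_hat = gaussian_map(np.eye(32), y, lam=5.0)
```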

Page 14: Statistics of natural images

Non-Gaussian Priors

• Empirical results: image gradients have a non-Gaussian, heavy-tailed distribution

• We assume an L1 or sparse prior
• We solve it by IRLS, iteratively re-weighted least squares (see the sketch below)
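A minimal IRLS sketch under the same 1-D assumptions (the exponent `p`, `lam`, and iteration count are illustrative choices, not the presenters' values): each iteration re-weights the gradients from the previous iterate and solves a weighted LS problem.

```python
# IRLS for a |grad x|^p prior with p < 2.
import numpy as np

def irls(H, y, lam=0.1, p=0.8, iters=30, eps=1e-6):
    n = H.shape[1]
    D = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]     # finite-difference gradients
    x = H.T @ y                                  # crude initialization
    for _ in range(iters):
        w = (np.abs(D @ x) + eps) ** (p - 2)     # re-weight; eps avoids 1/0
        x = np.linalg.solve(H.T @ H + lam * D.T @ (w[:, None] * D), H.T @ y)
    return x

x_hat = irls(np.eye(32), np.random.default_rng(0).random(32))
```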

Page 15: Statistics of natural images

De-convolution Results

[Figure: blurred image, result with a Gaussian prior, result with a sparse prior]

Good results on simple images

Page 16: Statistics of natural images

De-noising Results

[Figure: noisy image and de-noising result]

Poor results on real natural images

Page 17: Statistics of natural images

Classical models – Pros and Cons

• Advantages:
  – Simple and easy to implement

• Disadvantages:
  – Too heuristic
  – Captures only one property: smoothness
  – Biased towards totally smooth images:

[Figure: comparing the prior probability P of a natural image and of a totally smooth image]

Page 18: Statistics of natural images

Going Beyond Classical Models

[Figure: three histograms of the number of similar patches (in log10 scale) vs. probability, for different images.]

Page 19: Statistics of natural images

Modern Approach

• The model is based on image properties
• Properties are chosen using an image dataset

• Questions:
  1. What types of properties? Responses to linear filters.
  2. How do we find good properties? Either a pre-determined bank, or learn them from data.
  3. How should we combine the properties into one distribution? We will see how.

Page 20: Statistics of natural images

Mathematical framework

• Want: a model $p(I)$ of the real distribution $f(I)$.
• Computationally hard:
  – A 100x100-pixel image has 10,000 variables
• We can explicitly model only a few dimensions at a time

[Figure: arrow = a viewpoint onto a few dimensions]

Page 21: Statistics of natural images

Mathematical framework

• A viewpoint is a response to a linear filter
• A distribution over these responses is a marginal of the real distribution $f(I)$
• (Marginal = a distribution over a subset of variables)

[Figure: arrow = a marginal of $f(I)$]

Page 22: Statistics of natural images

Mathematical framework

• If $p(I)$ and $f(I)$ have the same marginal distributions over linear filter responses, then $p(I) = f(I)$ (a proposition by Zhu and Mumford)

• “Hope”: if we choose K “good” filters, then $p(I)$ and $f(I)$ will be “close”.

How do we measure “close”?

Page 23: Statistics of natural images

Distance between distributions

• Kullback-Leibler divergence:

$$KL\big(f(I),\, p(I; S, \Lambda)\big) = E_f\big[\log f(I)\big] - E_f\big[\log p(I; S, \Lambda)\big]$$

• Problem: $f(I)$ is unknown
• Proposition: use instead

$$\widetilde{KL}\big(f(I),\, p(I; S, \Lambda)\big) = \big\langle \log p(I; S, \Lambda) \big\rangle_P - \big\langle \log p(I; S, \Lambda) \big\rangle_X$$

where $\langle\cdot\rangle_P$ averages over samples from the model and $\langle\cdot\rangle_X$ over the observed images $X$.

• This measures the fit of the model to the observations

Page 24: Statistics of natural images

Illustration

[Figure: the two averages $\langle \log p(I;S,\Lambda)\rangle_P$ and $\langle \log p(I;S,\Lambda)\rangle_X$, with the gap between them equal to $\widetilde{KL}$.]

Page 25: Statistics of natural images

Getting synthesized images

• Get synthesized images by sampling the learned model $p$

• Sample using Markov Chain Monte Carlo (MCMC)

• Drawback: the learning process is slow
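As an illustration of the sampling step, here is a minimal Metropolis-style MCMC sketch (my stand-in, not the presenters' sampler; single-pixel Gaussian proposals and the toy energy are assumptions):

```python
# Metropolis sampling: propose a single-pixel perturbation, accept with
# probability min(1, exp(E_old - E_new)).
import numpy as np

def metropolis(energy, shape, steps=5000, step_size=0.1, seed=0):
    rng = np.random.default_rng(seed)
    I = rng.random(shape)
    e = energy(I)
    for _ in range(steps):
        i, j = rng.integers(shape[0]), rng.integers(shape[1])
        old = I[i, j]
        I[i, j] = old + step_size * rng.standard_normal()
        e_new = energy(I)
        if rng.random() < np.exp(min(0.0, e - e_new)):
            e = e_new                       # accept the move
        else:
            I[i, j] = old                   # reject: restore the pixel
    return I

# Example with a simple smoothness energy standing in for the learned model:
smooth = lambda I: np.abs(np.diff(I, axis=0)).sum() + np.abs(np.diff(I, axis=1)).sum()
sample = metropolis(smooth, (16, 16))
```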

Page 26: Statistics of natural images

Our model P(I) – A MRF

• MRF = Markov Random Field
• An MRF is based on a graph G = (V, E):
  – V: pixels
  – E: edges between pixels that affect each other

• Our distribution is the MRF:

$$p(I) = \frac{1}{Z} \exp\Big(-\sum_{c \,\in\, \text{Cliques}} U_c(I_c)\Big)$$

Page 27: Statistics of natural images

Simple grid MRF

• Here, the cliques are edges
• Every pixel belongs to 4 cliques

Page 28: Statistics of natural images

MRF

• We limit ourselves to:

  – Cliques of a fixed size (overlapping patches)

  – The same potential $U$ for all cliques

• We get:

$$U_c(I_c) = \sum_{k=1}^{K} \lambda^{(k)}\big(F^{(k)T} I_c\big)$$

$$p(I) = \frac{1}{Z} \exp\Big(-\sum_{c=1}^{C} \sum_{k=1}^{K} \lambda^{(k)}\big(F^{(k)T} I_c\big)\Big)$$

Page 29: Statistics of natural images

MRF simulation

[Figure: simulation of the MRF $p(I) = \frac{1}{Z}\exp\big(-\sum_{c=1}^{C}\sum_{k=1}^{K} \lambda^{(k)}(F^{(k)T} I_c)\big)$]

Page 30: Statistics of natural images

Histogram simulation

[Figure: $H_n^{\text{obs}}(I)$ – the histogram of a marginal]

Page 31: Statistics of natural images

MRF

• In terms of convolutions:

• Denote the set of potential functions: $\Lambda = \{\lambda^{(1)}, \ldots, \lambda^{(K)}\}$

• Denote the set of filters: $S = \{F^{(1)}, \ldots, F^{(K)}\}$

$$p(I; S, \Lambda) = \frac{1}{Z} \exp\Big(-\sum_{k=1}^{K} \sum_{x,y} \lambda^{(k)}\big((F^{(k)} * I)(x, y)\big)\Big)$$
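A minimal sketch of this convolutional energy (the function and variable names are mine, and the example filters and L1 potential are illustrative, not the learned ones):

```python
# Energy of the filter-based MRF: sum lam_k over each filter's response map.
import numpy as np
from scipy.signal import convolve2d

def mrf_energy(I, filters, potentials):
    """filters: list of 2-D kernels F_k; potentials: list of scalar funcs lam_k."""
    return sum(
        lam(convolve2d(I, F, mode='valid')).sum()   # sum_{x,y} lam_k((F_k * I)(x,y))
        for F, lam in zip(filters, potentials)
    )

# Example: two gradient filters with an L1 ("sparse-like") potential.
filters = [np.array([[1.0, -1.0]]), np.array([[1.0], [-1.0]])]
potentials = [np.abs, np.abs]
print(mrf_energy(np.random.default_rng(0).random((16, 16)), filters, potentials))
```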

Page 32: Statistics of natural images

MRF - A simple example

• Cliques of size 1
• Pixels are i.i.d., distributed by the grayscale histogram

[Figure: grayscale histogram]

Drawback: the cliques are too small

Page 33: Statistics of natural images

MRF - Another simple example

• Clique = the whole image
• Result: a uniform distribution over the images in the dataset

Drawback: the cliques are too big

Page 34: Statistics of natural images

Revisiting classical models

• Actually, the classical model is a pairwise MRF:

$$p(I) = \frac{1}{Z} \exp\Big(-\sum_{x,y} \big[\lambda\big(\nabla_x I(x,y)\big) + \lambda\big(\nabla_y I(x,y)\big)\big]\Big)$$

• It has cliques of size 2

• It has only 2 linear filters => 2 marginals

• There is no guarantee that $p(I)$ will be close to $f(I)$

Page 35: Statistics of natural images

Zhu and Mumford’s approach (1997)

• We want to find K “good” filters
• Strategy:
  – Start off with a bank B of possible filters
  – Choose the subset $S \subseteq B$, $|S| = K$, that minimizes the distance between $p(I)$ and $f(I)$
  – For computational reasons, choose the filters one by one using a greedy method

Page 36: Statistics of natural images

Choosing the next filter

• $AIG(\beta)$ = the difference between the model $p(I)$ and the data, from the viewpoint of marginal $\beta$

• $AIF(\beta)$ = the difference between different images in the dataset, from the viewpoint of marginal $\beta$

$$IC(\beta) = AIG(\beta) - AIF(\beta)$$

$$AIG(\beta) = \frac{1}{2}\,\Big\| \frac{1}{M} \sum_{n=1}^{M} H_n^{obs}(I) - E_{p(I;S,\Lambda)}\big[H(I)\big] \Big\|$$

$$AIF(\beta) = \frac{1}{2M} \sum_{n=1}^{M} \big\| H_n^{obs}(I) - \bar{H}^{obs}(I) \big\|$$

Page 37: Statistics of natural images

Algorithm – Filter selection

[Flowchart: for every filter in the bank, compute $IC$; pick the filter with the maximal $IC$ ($\arg\max$), add it to the model, and learn its potential $\lambda$.]

Page 38: Statistics of natural images

Learning the potentials

[Flowchart: initialize the potentials $\lambda$; repeat: update the model, calculate $IC(\beta)$, and update $\lambda$ (using maximum entropy on $p$).]

Page 39: Statistics of natural images

The bank of filters

• Filter types:
  – Intensity filter (1x1)
  – Isotropic filters: Laplacian of Gaussian (LG)
  – Directional filters: Gabor (Gcos, Gsin)

• Computation at different scales – an image pyramid

[Figure: Laplacian of Gaussian and Gabor filters]

Page 40: Statistics of natural images

Running example of the algorithm – Experiment I

Use only small filters

Page 41: Statistics of natural images

Results

All learned potentials have a diffusive nature

[Figure: the learned potentials $\lambda^{(k)}$ of $p(I) = \frac{1}{Z}\exp\big(-\sum_{c=1}^{C}\sum_{k=1}^{K} \lambda^{(k)}(F^{(k)T} I_c)\big)$]

Page 42: Statistics of natural images

Running example of the algorithm – Experiment II

• Only gradient filters, at different scales
• Small filters -> diffusive potentials (as expected)
• Surprisingly, large filters -> reactive potentials

[Figure: diffusive vs. reactive potentials]

Page 43: Statistics of natural images

Examples of the synthesized images

[Figure: synthesized images from Experiment I and Experiment II]

The Experiment II image is more “natural” because it has some regions with sharp boundaries.

Page 44: Statistics of natural images

Outline

• We have seen:
  – MRF models
  – Selection of filters from a bank
  – Learning potentials

• Now:
  – Data-driven filters
  – Analytic results for simple potentials
  – Making sense of the results
  – Applications

Page 45: Statistics of natural images

Roth and Black’s approach

• Filters: learned from data (rather than chosen from a bank)
• Potentials: learned parametrically (rather than a-parametrically)
• Filters and potentials are learned together

Page 46: Statistics of natural images

Motivation – model of natural patches

• Why learn the filters from data?
• Inspiration from models of natural patches:
  – Sparse coding
  – Component analysis
  – Product of experts

Page 47: Statistics of natural images

Motivation – Sparse Coding of patches

• Goal: find a set of filters $\{F_i\}$ such that

$$\text{patch} = \sum_{i=1}^{N} a_i F_i, \qquad a_i = \langle \text{patch},\, F_i \rangle \ \text{are sparse}$$

• Learn the $F_i$ from a database of natural patches

• Only a few filters should fire on a given patch
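A minimal sparse-coding sketch using scikit-learn's `DictionaryLearning` as a stand-in (not the method in the slides; the patch size, number of components, and sparsity level are assumptions, and the random image stands in for a natural-image database):

```python
# Learn a dictionary of patch filters so each patch is a sparse combination.
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d

rng = np.random.default_rng(0)
image = rng.random((64, 64))                      # stand-in for a natural image
patches = extract_patches_2d(image, (8, 8), max_patches=500, random_state=0)
X = patches.reshape(len(patches), -1)
X -= X.mean(axis=1, keepdims=True)                # remove the DC component

dl = DictionaryLearning(n_components=32, transform_algorithm='omp',
                        transform_n_nonzero_coefs=5, random_state=0)
codes = dl.fit(X).transform(X)                    # sparse coefficients a_i
print(np.mean(codes != 0, axis=1).mean())         # fraction of active filters
```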

Page 48: Statistics of natural images

Motivation – Component analysis

• Learn the filters $F_i$ by component analysis:
  – PCA
  – ICA

• This results in “filter-like” components:
  – PCA: the first components look like contrast filters
  – ICA: the components look like Gabor filters

Page 49: Statistics of natural images

PCA results

[Figure: PCA components, ordered from high to low variance]

Page 50: Statistics of natural images

ICA results

• The filter responses $F_i^T x$ are independent
• We can derive a model for patches:

$$P(x) = \prod_{i=1}^{n} p_i\big(F_i^T x\big)$$

where $p_i$ is the density of the i-th response.

Page 51: Statistics of natural images

Motivation – Product of experts

• A more sophisticated model for natural patches:

$$p_{POE}(x; \Theta) = \frac{1}{Z(\Theta)} \prod_{i=1}^{K} \phi_{st}\big(F_i^T x;\, \alpha_i\big), \qquad \phi_{st}(z; \alpha) = \Big(1 + \frac{z^2}{2}\Big)^{-\alpha}$$

• Training by MLE => “intuitive” filters:

[Figure: learned filters resembling texture and contrast detectors]

Page 52: Statistics of natural images

Fields of Experts (FOE)

• An extension of POE from patches to full images:

$$p_{FOE}(I; \Theta) = \frac{1}{Z(\Theta)} \exp\Big(\sum_{c=1}^{C} \sum_{i=1}^{K} \log \phi_{st}\big(F_i^T I_c;\, \alpha_i\big)\Big)$$

$$E_{FOE}(I; \Theta) = -\sum_{c=1}^{C} \sum_{i=1}^{K} \log \phi_{st}\big(F_i^T I_c;\, \alpha_i\big)$$

Roth S., Black M. J. Fields of Experts. IJCV, 2009.
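A minimal sketch of this energy (function and variable names are mine): each filter's clique responses $F_i^T I_c$ are computed in one convolution, and the Student-t log-potentials are summed over all cliques.

```python
# FOE energy with Student-t experts: -log phi_st(z) = alpha * log(1 + z^2/2).
import numpy as np
from scipy.signal import convolve2d

def foe_energy(I, filters, alphas):
    e = 0.0
    for F, alpha in zip(filters, alphas):
        z = convolve2d(I, F, mode='valid')          # all clique responses F_i^T I_c
        e += alpha * np.log1p(0.5 * z ** 2).sum()   # sum of -log phi_st over cliques
    return e
```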

Page 53: Statistics of natural images

The experts

• Student-t experts:

$$\phi_{st}(z; \alpha) = \Big(1 + \frac{z^2}{2}\Big)^{-\alpha}$$

[Figure: the Student-t expert $\phi_{st}(z)$]

Page 54: Statistics of natural images

Meaning of $\alpha$

• A higher $\alpha_i$ means:
  – High responses are punished more severely
  – The filter effectively has a higher weight

$$p_{FOE}(I; \Theta) = \frac{1}{Z(\Theta)} \exp\Big(-\sum_{c=1}^{C} \sum_{i=1}^{K} \alpha_i\, g\big(F_i^T I_c\big)\Big), \qquad g(z) = \log\Big(1 + \frac{z^2}{2}\Big)$$

Page 55: Statistics of natural images

Learning the model

[Flowchart: initialize the filters $F_1, \ldots, F_K$ randomly; repeat: draw samples by MCMC, compare $\langle \log p(I;S,\Lambda)\rangle$ on model samples vs. data ($\widetilde{KL}$), and update the parameters.]

Page 56: Statistics of natural images

Results of learning FOE

The learned filters aren't “intuitive”

[Figure: the learned FOE filters $F_i$]

Page 57: Statistics of natural images

So far…

• Filters: chosen from a bank
• Potentials: learned a-parametrically
  – Small filters -> diffusive potentials
  – Large filters -> reactive potentials (non-intuitive?)

Page 58: Statistics of natural images

So far…

• Filters: learned from a database (non-intuitive?)
• Potentials: learned parametrically

Page 59: Statistics of natural images

What now?

• Revisiting POE and FOE with Gaussian potentials
• The relation to non-Gaussian potentials
• Making sense of the previous results

Weiss Y., Freeman W. T. What makes a good model of natural images? CVPR, 2007.

Page 60: Statistics of natural images

Gaussian POE

$$p_{GPOE}(x; F) = \frac{1}{Z(F)} \exp\Big(-\frac{1}{2} \sum_{i=1}^{K} \big(F_i^T x\big)^2\Big)$$

$$\ln p_{GPOE}(x; F) = -\frac{1}{2} \sum_{i=1}^{K} \big(F_i^T x\big)^2 - \ln Z(F)$$

$$F^*_{ML} = \arg\min_F \; \Big\langle \sum_{i=1}^{K} \big(F_i^T x\big)^2 \Big\rangle + \ln Z(F)$$

Page 61: Statistics of natural images

Gaussian POE

• Claim: $Z$ is constant for any set of K orthonormal vectors, so the ML problem becomes

$$F^* = \arg\min_{F\ \text{orthonormal}} \; \Big\langle \sum_{i=1}^{K} \big(F_i^T x\big)^2 \Big\rangle$$

• This has an analytic solution – the K minor components of the data
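A stand-in illustration of that analytic solution (names and data are mine): take the eigenvectors of the patch covariance with the smallest eigenvalues.

```python
# Minor components = eigenvectors of the data covariance with the
# smallest eigenvalues (the opposite end of the spectrum from PCA).
import numpy as np

def minor_components(X, K):
    """X: one flattened patch per row."""
    cov = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)    # eigh returns eigenvalues ascending
    return vecs[:, :K]                  # columns = the K minor components

filters = minor_components(np.random.default_rng(0).random((500, 64)), K=8)
```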

Page 62: Statistics of natural images

Results – example of learned filters

• Non-intuitive, high-frequency filters
• Reminder – PCA

[Figure: learned filters, compared with PCA components from high to low variance]

Page 63: Statistics of natural images

Gaussian FOE

$$p_{GFOE}(I; \{F_i\}) = \frac{1}{Z(\{F_i\})} \exp\Big(-\frac{1}{2} \sum_{i=1}^{K} \sum_{c=1}^{C} \big(F_i^T I_c\big)^2\Big)$$

• In the Fourier domain (writing $\mathcal{F}\{\cdot\}(\omega)$ for the transform):

$$\sum_{c} \big(F_i^T I_c\big)^2 = \sum_{x,y} \big(F_i * I\big)^2(x, y) = \sum_{\omega} \big|\mathcal{F}\{F_i\}(\omega)\big|^2 \,\big|\mathcal{F}\{I\}(\omega)\big|^2$$

$$\ln Z(\{F_i\}) = -\frac{1}{2} \sum_{\omega} \ln \sum_{i=1}^{K} \big|\mathcal{F}\{F_i\}(\omega)\big|^2 + \text{const}$$

Page 64: Statistics of natural images

Gaussian FOE

$$F^*_{ML} = \arg\min_F \; \sum_{i=1}^{K} \sum_{c} \big\langle (F_i^T I_c)^2 \big\rangle + \ln Z(F)$$

• Writing $G_i = \mathcal{F}\{F_i\}$:

$$\{G^*_i\} = \arg\min_{\{G_i\}} \; \sum_{\omega} \Big(\sum_{i} |G_i(\omega)|^2\Big) \big\langle |\mathcal{F}\{I\}(\omega)|^2 \big\rangle + \ln Z$$

with the solution

$$\sum_{i=1}^{K} \big|G^*_i(\omega)\big|^2 = \frac{1}{\big\langle |\mathcal{F}\{I\}(\omega)|^2 \big\rangle}$$

Page 65: Statistics of natural images

Gaussian FOE

• The optimal filters $F^*_i$ satisfy:

$$\sum_{i=1}^{K} \big|\mathcal{F}\{F^*_i\}(\omega)\big|^2 = \frac{1}{\big\langle |\mathcal{F}\{I\}(\omega)|^2 \big\rangle}$$

• Since natural images have little energy at high frequencies, it follows that the optimal filters have high frequencies.

Page 66: Statistics of natural images

Gaussian Scale Mixture (GSM)

• Non-Gaussian potentials -> modeled by a GSM

• The properties of the GFOE also hold for GSMs

Page 67: Statistics of natural images

Revisiting FOE

• The Student-t expert can be fit by a GSM
• So the learned filters have the property

$$\sum_{i=1}^{K} \big|\mathcal{F}\{F_i\}(\omega)\big|^2 = \frac{1}{\big\langle |\mathcal{F}\{I\}(\omega)|^2 \big\rangle}$$

i.e. they are high-frequency filters.

[Figure: power spectra of a natural image vs. the Roth and Black filters]

Page 68: Statistics of natural images

Learning FOE with fixed filters

The algorithm prefers high-frequency filters

Page 69: Statistics of natural images

Conclusion

• For Gaussian potentials and GSMs: learning => high-frequency filters

• There is experimental evidence for this phenomenon

• Maybe there is a “logic” behind this non-intuitive result?

Page 70: Statistics of natural images

Making Sense of results

• Criterion for “good” filters for patches: they rarely fire on natural images, and fire frequently on all other images

[Figure: histograms of filter responses on patches from natural images vs. white noise]

Page 71: Statistics of natural images

Making Sense of results

• An image is modeled by what you don't expect to find in it

• This is satisfied by the classical prior of smooth gradients

• But why limit ourselves to intuitive filters?
• Maybe non-intuitive filters can do better…

Page 72: Statistics of natural images

Revisiting diffusive and reactive potentials

[Figure: response histograms of diffusive and reactive filters on patches from natural images vs. white noise]

Page 73: Statistics of natural images

Inference

• We have learned a model
• We can use it for inference problems:
  – Corrupted information
  – Missing information

• Exact inference – loopy BP
• Approximate inference – gradient-based optimization

Page 74: Statistics of natural images

Belief Propagation

• The observed data $y_i$ is incorporated into the model by connecting each hidden pixel $x_i$ to its observation $y_i$

[Figure: a grid MRF with hidden pixels $x_i$ and observations $y_i$]

Page 75: Statistics of natural images

Belief Propagation

• A message-passing algorithm

• Exact only on tree MRFs
• Efficient only on pairwise MRFs

Page 76: Statistics of natural images

Alternative by Roth and Black

• Reminder:

$$I_{MAP} = \arg\min_I \; -\log P(\hat{I} \mid I) - \log P(I)$$

(the first term is the uncertainty/noise model, the second the learned model)

• Approximate inference by gradient-based optimization:

$$I^{(t+1)} = I^{(t)} - \eta\, \nabla_I E\big(I; \hat{I}\big)$$

• Advantage: low computational cost
• Drawback: finds only a local minimum if the problem is not convex

Page 77: Statistics of natural images

Partition function

• The MAP objective is

$$\arg\min_I \; -\log P(\hat{I} \mid I) + \log Z(\{F_i, \alpha_i\}) - \sum_{x,y} \sum_{i=1}^{n} \log \phi\big((F_i * I)(x, y);\, \alpha_i\big)$$

• $\log Z$ doesn't depend on $I$, so we get

$$\arg\min_I \; -\log P(\hat{I} \mid I) - \sum_{x,y} \sum_{i=1}^{n} \log \phi\big((F_i * I)(x, y);\, \alpha_i\big)$$

=> No need to estimate the partition function

Page 78: Statistics of natural images

The gradient step

• How do we differentiate the prior term?
• By a mathematical “trick” we get:

$$\nabla_I \sum_{x,y} \sum_{i=1}^{N} \log \phi\big((F_i * I)(x, y);\, \alpha_i\big) = \sum_{i=1}^{N} \bar{F}_i * \psi_i\big(F_i * I\big)$$

where $\psi_i(z) = \frac{\partial}{\partial z} \log \phi(z; \alpha_i)$ and $\bar{F}_i$ is the filter $F_i$ rotated by 180 degrees.

Page 79: Statistics of natural images

De-noising

• Assume Gaussian noise:

$$P(\hat{I} \mid I) \propto \exp\Big(-\frac{1}{2\sigma^2} \|\hat{I} - I\|^2\Big)$$

• So the gradient step is:

$$I^{(t+1)} = I^{(t)} + \eta \Big[ \frac{1}{\sigma^2} \big(\hat{I} - I^{(t)}\big) + \sum_{i=1}^{N} \bar{F}_i * \psi_i\big(F_i * I^{(t)}\big) \Big]$$
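A minimal gradient-ascent de-noising sketch of this step (the step size, iteration count, and example filters are assumptions, not the presenters' settings; for the Student-t expert, $\psi(z) = -\alpha z / (1 + z^2/2)$):

```python
# One function implementing the de-noising gradient step above.
import numpy as np
from scipy.signal import convolve2d

def denoise(y, filters, alphas, sigma=0.1, eta=0.05, steps=200):
    I = y.copy()
    for _ in range(steps):
        grad = (y - I) / sigma ** 2                    # data (noise-model) term
        for F, alpha in zip(filters, alphas):
            z = convolve2d(I, F, mode='same')
            psi = -alpha * z / (1.0 + 0.5 * z ** 2)    # d log phi_st / dz
            grad += convolve2d(psi, F[::-1, ::-1], mode='same')  # F-bar * psi
        I = I + eta * grad                             # ascend the log-posterior
    return I

# Example with two assumed gradient filters:
filters = [np.array([[1.0, -1.0]]), np.array([[1.0], [-1.0]])]
rng = np.random.default_rng(0)
clean = np.zeros((32, 32)); clean[:, 16:] = 1.0
I_hat = denoise(clean + 0.1 * rng.standard_normal(clean.shape), filters, [1.0, 1.0])
```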

Page 80: Statistics of natural images

Results

Page 81: Statistics of natural images

Results

Page 82: Statistics of natural images

Results

[Figure: original; noisy (20.29dB); FOE (28.72dB); Portilla, wavelets (28.9dB; state of the art); non-local means (28.21dB); standard non-linear diffusion (27.18dB; general prior)]

Page 83: Statistics of natural images

Results on the Berkeley database

[Figure: output PSNR vs. input PSNR, at low and high noise levels, for the Wiener filter, non-linear diffusion, FOE, and two Portilla variants.]

Page 84: Statistics of natural images

How many 3x3 filters to take?

[Figure: de-noising performance vs. number of 3x3 filters]

Performance starts saturating at around 8 filters.

Page 85: Statistics of natural images

Dependence on size and shape of clique

What is the best filter?

Page 86: Statistics of natural images

Inpainting - Reminder

$$y = Ax + n$$

[Figure: observed masked image $y$ and unknown complete image $x$]

Problem: pixels outside the mask can change during optimization
Solution: constrain them

Page 87: Statistics of natural images

Inpainting

• Assume the pixels outside the mask M don't change

• So the gradient step is:

$$I^{(t+1)} = I^{(t)} + \eta\, M \odot \Big[ \sum_{i=1}^{N} \bar{F}_i * \psi_i\big(F_i * I^{(t)}\big) \Big]$$

where M is a 0-1 mask that is 1 on the pixels we want to inpaint.

[Figure: the 0-1 mask and the image we want to inpaint]
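The same sketch as for de-noising, adapted to inpainting (assumptions as before): the prior-gradient step is simply multiplied by M, so observed pixels never change.

```python
# Masked prior-gradient step for inpainting.
import numpy as np
from scipy.signal import convolve2d

def inpaint(y, M, filters, alphas, eta=0.05, steps=500):
    I = y.copy()
    for _ in range(steps):
        grad = np.zeros_like(I)
        for F, alpha in zip(filters, alphas):
            z = convolve2d(I, F, mode='same')
            grad += convolve2d(-alpha * z / (1 + 0.5 * z ** 2),   # psi(F * I)
                               F[::-1, ::-1], mode='same')        # F-bar * psi
        I = I + eta * M * grad        # pixels outside the mask never change
    return I
```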

Page 88: Statistics of natural images

Results

Page 89: Statistics of natural images

Results

Page 90: Statistics of natural images

Results

[Figure: inpainting results, FOE vs. Bertalmio]

        FOE        Bertalmio
PSNR    29.06dB    27.56dB
SSIM    0.9371     0.9167

Page 91: Statistics of natural images

Pros and Cons

• Performs well on narrow scratches or small holes (even if they cover most of the image)

• Is not able to fill large holes
• Is not designed to handle textures

Page 92: Statistics of natural images

Thank you for listening…