CS448f: Image Processing For Photography and Vision
description
Transcript of CS448f: Image Processing For Photography and Vision
![Page 1: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/1.jpg)
CS448f: Image Processing For Photography and Vision
Lecture 2
![Page 2: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/2.jpg)
Today:
• More about ImageStack• Sampling and Reconstruction• Assignment 1
![Page 3: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/3.jpg)
ImageStack
• A collection of image processing routines
• Each routine bundled into an Operation class– void help()– void parse(vector<string> args)– Image apply(Window im, … some parameters …)
![Page 4: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/4.jpg)
ImageStack Types
• Window:– A 4D volume of floats, with each scanline contiguous in
memory.
class Window {int frames, width, height, channels;float *operator()(int t, int x, int y);void sample(float t, float x, float y, float *result)
};
![Page 5: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/5.jpg)
ImageStack Types
• Image:– A subclass of Window that is completely contiguous
in memory– Manages its own memory via reference counting (so
you can make cheap copies)
class Image : public Window {Image copy();
};
![Page 6: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/6.jpg)
Image and Windows
Image
Window Regular Upcasting
Window(Window) new reference to the same dataWindow(Window, int, int ...) Selecting a subregion
Image(Image) new reference to same dataImage.copy() copies the data
Image(Window) copies data into new Image
![Page 7: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/7.jpg)
4 Way to Use ImageStack
• Command line• As a library• By extending it• By modifying it
![Page 8: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/8.jpg)
Fun things you can do with ImageStack
• ImageStack –help• ImageStack –load input.jpg –save output.png• ImageStack –load input.jpg –display• ImageStack –load a.jpg –load b.jpg –add –save c.jpg• ImageStack –load a.jpg –loop 10 – –scale 1.5 –display• ImageStack –load a.jpg –eval “(val > 0.5) ? val : 0”• ImageStack –load a.jpg –resample width/2 height/2• ... all sorts of other stuff
![Page 9: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/9.jpg)
Where to get it:
• The course website• http://cs448f.stanford.edu/imagestack.html
![Page 10: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/10.jpg)
float *operator()(int t, int x, int y)
void sample(float t, float x, float y, float *result);
Sampling and Reconstruction
![Page 11: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/11.jpg)
Why resample?
• Making an image larger:
![Page 12: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/12.jpg)
Why resample?
• Making an image smaller:
![Page 13: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/13.jpg)
Why resample?
• Rotating an image:
![Page 14: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/14.jpg)
Why resample?
• Warping an image (useful for 3D graphics):
![Page 15: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/15.jpg)
Enlarging images
• We need an interpolation scheme to make up the new pixel values
• (applet)• Interpolation = Convolution
![Page 16: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/16.jpg)
![Page 17: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/17.jpg)
![Page 18: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/18.jpg)
![Page 19: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/19.jpg)
![Page 20: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/20.jpg)
What makes an interpolation good?
• Well… let’s look at the difference between the one that looks nice and the one that looks bad…
![Page 21: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/21.jpg)
![Page 22: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/22.jpg)
Fourier Space
• An image is a vector• The Fourier transform is a change of basis– i.e. an orthogonal transform
• Each Fourier basis vector is something like this:
![Page 23: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/23.jpg)
Fourier Space
• The Fourier transform expresses an image as a sum of waves of different frequencies
• This is useful, because our artifacts are confined to high frequencies
• In fact, we probably don’t want ANY frequencies that high in our output – isn’t that what it means to be smooth?
![Page 24: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/24.jpg)
Deconstructing Sampling
• We get our output by making a grid of spikes that take on the input values s:
S[0]S[1] S[2] S[3] S[4]
S[5]
S[6]
S[7] S[8]S[9]
![Page 25: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/25.jpg)
Deconstructing Sampling
• Then evaluating some filter f at each output location x:
x
S[0]S[1] S[2] S[3] S[4]
S[5]
S[6]
S[7] S[8]S[9]
for (i = 1; i < 7; i++) output[x] += f(x-i)*s[i];
![Page 26: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/26.jpg)
Alternatively
• Start with the spikes
S[0]S[1] S[2] S[3] S[4]
S[5]
S[6]
S[7] S[8]S[9]
![Page 27: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/27.jpg)
Alternatively
• Convolve with the filter f
S[0]S[1] S[2] S[3] S[4]
S[5]
S[6]
S[7] S[8]S[9]
![Page 28: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/28.jpg)
Alternatively
• And evaluate the result at x
S[0]S[1] S[2] S[3] S[4]
S[5]
S[6]
S[7] S[8]S[9]
xfor (i = 1; i < 7; i++) output[x] += s[i]*f(i-x);
![Page 29: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/29.jpg)
They’re the same
• Method 1:
• Method 2:
• f is symmetric, so f(x-i) = f(i-x)for (i = 1; i < 7; i++) output[x] += f(x-i)*s[i];
for (i = 1; i < 7; i++) output[x] += s[i]*f(i-x);
![Page 30: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/30.jpg)
Start with the (unknown) nice smooth desired result R
![Page 31: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/31.jpg)
Multiply by an impulse train T
![Page 32: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/32.jpg)
Now you have the known sampled signal R.T
![Page 33: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/33.jpg)
Convolve with your filter fNow you have (R.T)*f
![Page 34: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/34.jpg)
And get your desired result RR = (R.T)*f
![Page 35: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/35.jpg)
Therefore
• Let’s pick f to make (R.T)*f = R• In other words, convolution by f should undo
multiplication by T• Also, we know R is smooth– has no high frequencies
![Page 36: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/36.jpg)
Meanwhile, in Fourier space…
• Let’s pick f’ to make (R’*T’).f = R’• In other words, multiplication by f’ should
undo convolution by T’• Also, we know R’ is zero above some point– has no high frequencies
![Page 37: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/37.jpg)
T vs T’
• Turns out, the Fourier transform of an impulse train is another impulse train (with the inverse spacing)
• R’*T’:
![Page 38: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/38.jpg)
T vs T’
• All we need to do is pick an f’ that gets rid of the extra copies:
• (R’*T’).f’:
![Page 39: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/39.jpg)
A good f’
• Preserves all the frequencies we care about• Discards the rest• Allows us to resample as many times as we
like without losing information• (((((R’*T’).f’)*T’.f’)*T’.f’)*T’.f’) = R’
![Page 40: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/40.jpg)
How do our contenders match up?
Linear
![Page 41: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/41.jpg)
How do our contenders match up?
Cubic
![Page 42: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/42.jpg)
How do our contenders match up?
Lanczos 3 = sinc(x)*sinc(x/3)
![Page 43: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/43.jpg)
How do our contenders match up?
Linear
![Page 44: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/44.jpg)
How do our contenders match up?
Cubic
![Page 45: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/45.jpg)
How do our contenders match up?
Lanczos 3
![Page 46: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/46.jpg)
Lanczos 3
![Page 47: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/47.jpg)
Sinc - The perfect result?
![Page 48: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/48.jpg)
A good f’
• Should throw out high-frequency junk• Should maintain the low frequencies• Should not introduce ringing• Should be fast to evaluate• Lanczos is a pretty good compromise• Window::sample(...);• Window::sampleLinear(...);
![Page 49: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/49.jpg)
Inverse Warping
• If I want to transform an image by some rotation R, then at each output pixel x, place a filter at R-1(x) over the input.
• In general warping is done by– Computing the inverse of the desired warp– For every pixel in the output• Sample the input at the inverse warped location
![Page 50: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/50.jpg)
Forward Warping (splatting)
• Some warps are hard to invert, so...• Add an extra weight channel to the output• For every pixel x in the input– Compute the location y in the output– For each pixel under the footprint of the filter• Compute the filter value w• Add (w.r w.g w.b w) to the value stored at y
• Do a pass through the output, dividing the first n channels by the last channel
![Page 51: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/51.jpg)
Be careful sizing the filter
• If you want to enlarge an image, the filter should be sized according to the input grid
• If you want to shrink an image, the filter should be sized according to the output grid of pixels– Think of it as enlarging an image in reverse– You don’t want to keep ALL the frequencies when
shrinking an image, in fact, you’re trying to throw most of them out
![Page 52: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/52.jpg)
![Page 53: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/53.jpg)
Rotation
• Ok, let’s use the lanczos filter I love so much to rotate an image:
![Page 54: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/54.jpg)
Original
![Page 55: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/55.jpg)
Rotated by 30 degrees 12 times
![Page 56: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/56.jpg)
Rotated by 10 degrees 36 times
![Page 57: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/57.jpg)
Rotated by 5 degrees 72 times
![Page 58: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/58.jpg)
Rotated by 1 degree 360 times
![Page 59: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/59.jpg)
What went wrong?
![Page 60: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/60.jpg)
Your mission: Make rotate better
• Make it accurate and fast• First we’ll check it’s plausible:ImageStack -load a.jpg -rotate <something> -display
• Then we’ll time it:ImageStack -load a.jpg -time --loop 360 ---rotate 1
• Then we’ll see how accurate it is:for ((i=0;i<360;i++)); do
ImageStack -load im.png -rotate 1 -save im.pngdoneImageStack -load orig.png -crop width/4 height/4 width/2
height/2 -load im.png -crop width/4 height/4 width/2 height/2 -subtract -rms
![Page 61: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/61.jpg)
Targets:
• RMS must be < 0.07• Speed must be at least as fast as -rotate
• My solution has RMS ~ 0.05• Speed ~ 50% faster than -rotate (No SSE)
• Prizes for the fastest algorithm that meets the RMS requirement, most accurate algorithm that meets the speed requirement
![Page 62: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/62.jpg)
Grade:
• 20% for having a clean readable algorithm• 20% for correctness• 20% for being faster than -rotate• 40% for being more accurate than -rotate
![Page 63: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/63.jpg)
Due:
• Email your modified Geometry.cpp (and whatever other files you modified) in a zip file to us by midnight on Thu Oct 1– [email protected]
![Page 64: CS448f: Image Processing For Photography and Vision](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816954550346895de0feb5/html5/thumbnails/64.jpg)
Finally, Check out this paper:
• Image Upsampling via Imposed Edge Statistics• http://www.cs.huji.ac.il/~raananf/projects/upsampling/upsampling.html