Recent progress in computational photography using deep learning
Greg Slabaugh
1 Oct 2020
Note: This presentation contains a slide with flashing imagery.
AI, AI, and more AI
What is AI?
Merriam-Webster
“A branch of computer science dealing with the simulation of intelligent behavior in computers.”
English Oxford Living Dictionary
“The theory and development of computer systems able to perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.”
AI… coming to (or already in) a device near you
My journey in AI
Siemens Corporate Research
Medicsight
City, University of London
Huawei
Strong vs weak AI
Strong AI
• Consciousness
• Ability to make judgements, plan and communicate; self-awareness
• Also known as Artificial General Intelligence (AGI)
Weak AI
• Focuses on a specific task
• No self-awareness
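The AI taxonomy (according to Greg)
[Diagram: AI divides into Weak and Strong; Weak AI into Machine Learning and Other; Machine Learning into Deep and Traditional; Deep learning into CNN and Other (DBN). Learning paradigms: Supervised, Unsupervised, Reinforcement.]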
What is machine learning?
• Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed.
Models that are learned from data
Supervised learning
• Labelled data
• Learn a mapping between inputs and outputs
• Example: face detection
Unsupervised learning
• No labels
• Computer groups similar data to discover hidden patterns
• Example: “People who bought X also bought Y”
Reinforcement learning
• Dynamic environment
• Computer gets feedback and learns to “win”
• Example: ML playing Atari 2600 games
Neural networks
Learning
Going deep
AlexNet (2012)
• AlexNet, a type of Convolutional Neural Network (CNN), won the 2012 ImageNet challenge by a large margin (15.4% top-5 error, compared to 26.2% for the runner-up). This precipitated a surge of interest in deep learning techniques. AlexNet learns how to represent images using abstract features extracted by learned filters.
Deep learning is a class of machine learning algorithms that use multiple layers of nonlinear
processing for feature extraction and transformation. Each successive layer uses the output from
the previous layer as input.
In deep learning, features are learned, rather than engineered. This is also known as representation
learning, as the network learns representations of the data customised to the task.
Traditional machine learning
Deep learning
Representation learning
Key components
1. Convolution. This filters an image. The weights for the filter are learned.
2. ReLU. This applies a non-linear transformation to the data. This way, the CNN can find a non-linear mapping between the inputs and outputs.
3. Pooling. This combines adjacent pixels in a filtered output. This results in abstraction: the CNN learns more “high level” features (e.g. face, instead of edges).
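The three components above can be sketched in NumPy for a single-channel image. This is a minimal illustration, assuming one 2-D input and one hand-picked 3x3 edge filter; in a real CNN the filter weights are learned, and (as in most frameworks) the “convolution” is implemented as cross-correlation.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D cross-correlation (what CNN frameworks call convolution)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the kh x kw neighbourhood under the filter
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Element-wise non-linearity: negative responses are set to zero."""
    return np.maximum(x, 0.0)

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; rows/columns that don't fill a window are dropped."""
    h = x.shape[0] // size * size
    w = x.shape[1] // size * size
    x = x[:h, :w]
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

# A 6x6 image with a vertical edge (dark left half, bright right half)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked vertical-edge filter; a CNN would learn weights like these
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

features = max_pool2d(relu(conv2d(image, kernel)))  # every pooled value is 3.0
```

The filter responds only where the window straddles the edge; pooling then keeps that response while halving the resolution, which is the “abstraction” step in miniature.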
Common operations
4. Dense (fully connected) layers. These layers connect all inputs to all outputs through weights. In doing so, they lose spatial information but look at the data holistically.
5. Skip connections. Using a skip connection, data (feature maps) are passed over parts of a network. This helps in back-propagating gradients.
6. Batch normalisation. Batch normalisation applies normalisation at hidden layers. It takes the output of the previous layer, subtracts the batch mean and divides by the batch standard deviation. The normalised values are then rescaled and shifted using learned weights.
7. Down/Upsampling. Downsampling reduces, and upsampling increases, the resolution of a feature map or image.
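The batch normalisation step described above can be written in a few lines. This is a sketch of the training-mode forward pass only, assuming a (batch, features) array; frameworks additionally keep running averages of the mean and variance for use at inference time, and in CNNs the statistics are computed per channel over the batch and spatial positions.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalisation for a (batch, features) array.

    Each feature is normalised with the mean and variance computed over the
    batch, then rescaled by the learned parameters gamma and beta.
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance per feature
    return gamma * x_hat + beta              # learned scale and shift

rng = np.random.default_rng(0)
activations = rng.normal(loc=5.0, scale=3.0, size=(64, 4))  # 64 samples, 4 features

# With gamma = 1 and beta = 0 the output is simply the standardised batch
out = batch_norm_forward(activations, gamma=np.ones(4), beta=np.zeros(4))
# A learned gamma/beta can move the distribution somewhere else entirely
shifted = batch_norm_forward(activations, gamma=2.0 * np.ones(4), beta=3.0 * np.ones(4))
```

The small `eps` guards against division by zero when a feature is constant across the batch.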
In computer vision, one typically sees convolutional neural networks (CNNs) applied to images. Convolution
is well suited to take advantage of spatially correlated data common to images. One may see recurrent
architectures for temporal data (e.g. videos).
A deep network can be characterised by:
• The architecture, which describes the layers of processing that transform inputs to outputs. A CNN that
outputs a label is a classifier, and one that outputs a continuous variable is a regressor.
• The loss, which is a mathematical representation of the error produced by the network. During training,
weights are adjusted by back-propagating gradients through the network to minimise the loss.
• The training, including the optimisation strategy and data used.
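The interplay of loss and training can be seen in a stripped-down case: a single linear layer with a softmax output (softmax regression) rather than a deep network, trained by plain gradient descent on a cross-entropy loss. The toy data, learning rate and number of steps are arbitrary choices for illustration; a deep network applies the same idea, with back-propagation carrying the gradient through many layers.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean categorical cross-entropy for integer class labels."""
    n = labels.shape[0]
    return -np.mean(np.log(probs[np.arange(n), labels] + 1e-12))

rng = np.random.default_rng(0)
# Toy 2-D data: two Gaussian blobs, one per class
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)), rng.normal(2.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

W = np.zeros((2, 2))  # weights: (features, classes)
b = np.zeros(2)       # biases
lr = 0.1
losses = []
for step in range(100):
    probs = softmax(X @ W + b)              # forward pass
    losses.append(cross_entropy(probs, y))  # the loss measures the error
    # Gradient of the mean loss w.r.t. the logits is (probs - one_hot(y)) / n
    grad_logits = probs.copy()
    grad_logits[np.arange(len(y)), y] -= 1.0
    grad_logits /= len(y)
    # Back-propagate to the parameters and take a gradient-descent step
    W -= lr * (X.T @ grad_logits)
    b -= lr * grad_logits.sum(axis=0)

accuracy = np.mean((X @ W + b).argmax(axis=1) == y)
```

With all-zero weights the model starts at a loss of ln 2 (equal probability for both classes), and each step moves the weights in the direction that reduces the loss.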
UNet
Characterising a CNN
Deep learning frameworks
Making it easy…
# Import libraries and modules
import numpy as np
np.random.seed(123)  # for reproducibility

from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.utils import to_categorical
from keras.datasets import mnist

# Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Preprocess input data: add a channel axis (channels-last) and scale pixels to [0, 1]
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1).astype('float32') / 255
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1).astype('float32') / 255

# Preprocess class labels: one-hot encode the 10 digit classes
Y_train = to_categorical(y_train, 10)
Y_test = to_categorical(y_test, 10)

# Define model architecture
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

# Compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Fit model on training data
model.fit(X_train, Y_train, batch_size=32, epochs=10, verbose=1)

# Evaluate model on test data: returns [loss, accuracy]
score = model.evaluate(X_test, Y_test, verbose=0)
Why is deep learning so… trendy?
Recently there has been a surge of research and commercial interest in deep learning, driven by:
1. Large datasets (e.g. ImageNet)
2. New algorithms, toolkits (e.g. TensorFlow, PyTorch) and available code (GitHub)
3. Graphics Processing Units (GPUs)
NVIDIA GeForce RTX 2080 Ti with 4352 CUDA cores
CNNs in computer vision
Semantic segmentation
Object detection
Image classification
Super-resolution
Pose estimation
Image restoration
Computational photography
• Computational photography uses digital computation, rather than optical processes, in the capture
and processing of images.
• This can be done to
◦ Improve image quality
◦ Reduce cost
◦ Reduce size of camera elements
• More broadly, one can also consider image processing effects
Why is this interesting?
https://www.dxomark.com/smartphones-vs-cameras-closing-the-gap-on-image-quality/
Google Pixel 3
Sony a7R III
We’re taking a lot of photos…
https://focus.mylio.com/tech-today/how-many-photos-will-be-taken-in-2020
Huawei P40 Pro+ features
A traditional ISP (image signal processor) contains a large number of image processing stages that transform the raw data acquired by the image sensor into a high-quality JPEG image. A simplified example of a traditional ISP is shown below. An ISP is normally implemented in a specialised ASIC.
Can one use Deep Learning in the ISP?
[Figure: Traditional ISP pipeline, from RAW to JPG. Hardware: optics / sensors. Stages: black level correction, raw noise reduction, auto white balance, demosaicing, camera color matrix, dynamic range compression, gamma correction, tone mapping, RGB denoising, sharpening, devignette.]
Automatic White Balance (AWB)
• AWB, or colour constancy, applies a colour correction to an image, to make the image appear as
if it were taken under an achromatic light source.
• This is achieved by estimating the illumination in the scene, and then compensating for it.
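The estimate-then-compensate idea can be sketched with a classic non-learned baseline. The gray-world assumption used here (the average scene colour is neutral) is a stand-in chosen for illustration, not the method discussed in this talk; the compensation is a von Kries-style per-channel gain. The synthetic scene and the warm illuminant values are made up.

```python
import numpy as np

def gray_world_illuminant(image):
    """Estimate the scene illuminant as the mean RGB value (gray-world assumption)."""
    rgb = image.reshape(-1, 3).mean(axis=0)
    return rgb / np.linalg.norm(rgb)  # only the illuminant's colour (direction) matters

def apply_white_balance(image, illuminant):
    """Von Kries-style correction: divide each channel by the illuminant's value."""
    gains = illuminant[1] / illuminant  # normalise so the green channel is unchanged
    return image * gains

rng = np.random.default_rng(0)
scene = rng.uniform(0.2, 0.8, size=(32, 32, 3))  # synthetic scene, roughly neutral on average
true_illuminant = np.array([1.0, 0.8, 0.5])      # hypothetical warm (reddish) light
captured = scene * true_illuminant               # image as seen under that light

estimate = gray_world_illuminant(captured)
balanced = apply_white_balance(captured, estimate)
```

After correction the average colour of the image is neutral (equal channel means). Gray-world fails when a scene is dominated by one colour, which is one reason learned estimators are attractive.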
Deep learning for AWB
• Regression problem: given an uncorrected image,
estimate (and apply) the colour correction
• Bianco et al., “Color Constancy Using CNNs,” CVPRW 2015
Ill-posed problem
• The problem is ill-posed. Given a single image where the scene and the illumination are unknown, multiple solutions are possible.
• Who remembers “The dress” from 2015?
Multi-hypothesis approach
1. Create a set of N candidate illuminants
2. Correct the image with each candidate, forming N hypothesized corrected images
3. Classify each corrected image on how well it is white balanced – producing a weight for each image
4. Produce a weighted average solution
5. Apply correction
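The five steps above can be sketched end to end. In the published method a CNN classifier scores each corrected image; here that classifier is replaced by a hypothetical hand-written function, `score_white_balance`, which rewards corrected images whose average colour is close to grey. The candidate grid, the score scale factor (1000) and the synthetic data are illustrative assumptions, not values from the paper.

```python
import numpy as np

def correct(image, illuminant):
    """Divide out a candidate illuminant (per-channel gains, green kept fixed)."""
    return image * (illuminant[1] / illuminant)

def score_white_balance(image):
    """Hypothetical stand-in for the learned classifier: higher when the
    corrected image's channel means are closer to neutral grey."""
    means = image.reshape(-1, 3).mean(axis=0)
    chroma = means / means.sum()
    return -np.sum((chroma - 1.0 / 3.0) ** 2) * 1000.0

def multi_hypothesis_awb(image, candidates):
    # Steps 1-2: correct the image with every candidate illuminant
    hypotheses = [correct(image, c) for c in candidates]
    # Step 3: score each hypothesis and turn the scores into weights (softmax)
    scores = np.array([score_white_balance(h) for h in hypotheses])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Step 4: weighted average of the candidate illuminants
    illuminant = weights @ candidates
    # Step 5: apply the final correction
    return correct(image, illuminant), illuminant, weights

# Step 1 input: a grid of N = 121 candidate illuminants (green fixed at 1)
grid = np.linspace(0.5, 1.5, 11)
candidates = np.array([[r, 1.0, b] for r in grid for b in grid])

rng = np.random.default_rng(0)
scene = rng.uniform(0.2, 0.8, size=(32, 32, 3))
true_illuminant = np.array([1.0, 0.8, 0.5])  # hypothetical warm light
balanced, estimate, weights = multi_hypothesis_awb(scene * true_illuminant, candidates)
```

Because the answer is a weighted average, the estimate can fall between grid points; the true illuminant in this example is not one of the candidates.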
Results
Cube dataset: 1707 images captured with a Canon 550D camera
Advantages / disadvantages of this approach
Advantages
• Classifier solves a camera-agnostic question (how well white-balanced is the image?)
• Scene illuminants can be combined across cameras
• Can apply the method in a training-free way to new cameras
• State-of-the-art performance
Disadvantages
• Requires inference N times. However, the images can be very small.
• Assumes a single illuminant. Future work: handle multi-illuminant case.
Moire patterns
• Moire patterns occur when two patterns interfere with each other
• Aliasing results from high frequencies masquerading as low frequencies
• Moire patterns are sensitive to movement!
https://en.wikipedia.org/wiki/File:Moir%C3%A9.gif
https://steemit.com/art/@ztwin/moire-gifs-
Moire in digital photography
• In digital photography, Moire patterns degrade image quality.
• Why does this happen? A camera sensor samples incoming light on a set
of pixels. Frequencies above the Nyquist limit cannot be captured properly
by the sensor, resulting in aliasing.
◦ In scenario 1, the subpixel layout of the LCD elements produces
uncapturable frequencies
◦ In scenario 2, the scene itself contains very high frequencies
Scenario 1: Photography of digital screens
Scenario 2: Photography of high-frequency patterns
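Aliasing above the Nyquist limit can be demonstrated in one dimension. This sketch simplifies the 2-D sensor to a 1-D sampler; the sampling rate and pattern frequency are arbitrary illustrative values. A pattern at 9 cycles per unit, sampled 10 times per unit (Nyquist limit 5), produces exactly the same samples as a pattern at 1 cycle per unit.

```python
import numpy as np

def alias_frequency(f, fs):
    """Apparent frequency of a pattern at frequency f after sampling at rate fs."""
    return abs(f - fs * np.round(f / fs))

fs = 10.0        # sampling rate: 10 samples per unit length (Nyquist limit = 5 cycles/unit)
f_signal = 9.0   # a pattern at 9 cycles/unit, above the Nyquist limit
n = 100          # 10 units of signal
x = np.arange(n) / fs
samples = np.cos(2 * np.pi * f_signal * x)

# Find the dominant frequency in the sampled data
spectrum = np.abs(np.fft.rfft(samples))
freqs = np.fft.rfftfreq(n, d=1 / fs)
apparent = freqs[np.argmax(spectrum)]  # the 9 cycles/unit pattern shows up at 1 cycle/unit
```

Once sampled, the two patterns are indistinguishable, so the false low-frequency pattern cannot be removed by simple filtering after capture; that is what makes Moire hard to undo.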
Demoire
• The demoire problem seeks to remove the Moire corruption.
• This is challenging as Moire patterns have a widely varying appearance, including different frequency components.
[Figure: Wavelet decomposition: differences]
WDNet
• Wavelet DemoireNet (WDNet) is a CNN that transforms an image to the wavelet domain where it is
processed using two branches:
◦ Dense branch is based on DenseNet and models fine details
◦ Dilation branch uses dilated convolutions to look at the data more coarsely
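To see what “transforming an image to the wavelet domain” means, here is a single-level 2-D Haar transform. The Haar wavelet is chosen for simplicity; this sketch does not claim it matches the exact wavelet or decomposition depth used in WDNet. It splits an image into a half-resolution approximation band and three detail bands, and is exactly invertible.

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2-D Haar wavelet transform of an array with even height and width.

    Returns the low-frequency approximation (ll) and three detail bands
    (lh, hl, hh), each at half resolution.
    """
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    ll = (a + b + c + d) / 2  # local average: coarse structure
    lh = (a - b + c - d) / 2  # differences between columns: vertical edges/stripes
    hl = (a + b - c - d) / 2  # differences between rows: horizontal edges/stripes
    hh = (a - b - c + d) / 2  # diagonal differences
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2 (the transform is orthonormal, so it is lossless)."""
    h, w = ll.shape
    img = np.empty((2 * h, 2 * w))
    img[0::2, 0::2] = (ll + lh + hl + hh) / 2
    img[0::2, 1::2] = (ll - lh + hl - hh) / 2
    img[1::2, 0::2] = (ll + lh - hl - hh) / 2
    img[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return img

# Fine vertical stripes (alternating columns), a crude stand-in for a Moire-like pattern
stripes = np.zeros((8, 8))
stripes[:, ::2] = 1.0
ll, lh, hl, hh = haar_dwt2(stripes)  # the stripe energy lands in the lh band only
```

Separating frequency bands like this lets a network treat high-frequency patterns differently from the underlying image content.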
DenseNet? Dilated convolution?
• DenseNet is composed of denseblocks.
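• Layers are densely connected through concatenation (rather than the residual additions used in ResNet).
• Each layer receives as input the feature maps of all preceding layers in the block.
• Dilated convolution spaces the filter taps apart by a dilation rate, skipping input points in between.
• This increases the receptive field without adding parameters.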
• The output looks at the data more globally.
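Both ideas can be checked with a little arithmetic. The sketch below is a 1-D dilated convolution, a receptive-field calculator for stacked stride-1 convolutions, and the channel bookkeeping of a DenseNet-style block. The kernel sizes, dilation rates, growth rate and channel counts are illustrative values, not WDNet's configuration.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation=1):
    """Valid-mode 1-D dilated cross-correlation.

    The kernel taps are spaced `dilation` samples apart, so a 3-tap kernel
    with dilation 2 covers 5 input samples while still having 3 weights.
    """
    k = len(kernel)
    span = (k - 1) * dilation + 1  # input samples covered by one output
    out = np.zeros(len(x) - span + 1)
    for i in range(len(out)):
        taps = x[i:i + span:dilation]  # every `dilation`-th sample in the window
        out[i] = np.dot(taps, kernel)
    return out

def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of stride-1 convolutions with the given dilation rates."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

def dense_block_channels(in_channels, growth_rate, num_layers):
    """Input channels seen by each layer of a DenseNet-style block, followed by the
    block's output channels: every layer concatenates all preceding feature maps
    and contributes `growth_rate` new ones."""
    return [in_channels + i * growth_rate for i in range(num_layers + 1)]

x = np.arange(10.0)
y = dilated_conv1d(x, np.array([1.0, 0.0, -1.0]), dilation=2)  # x[i] - x[i+4]
```

Three stacked 3x3 convolutions see 7 pixels across, while the same three layers with dilations 1, 2 and 4 see 15, with no extra weights. In the dense block, the number of input channels grows linearly with depth because nothing is discarded.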
Results
Results
Ablation study: Importance of wavelet processing
Image enhancement using curve layers
• Photoshop / Lightroom allow users to adjust global image properties through the use of curves
Can we build a neural network to do this automatically?
Example: adjusting brightness
CURL
• We recently introduced neural CURve Layers (CURL), which learn and apply curve adjustments to an image. CURL has the following features:
◦ Curves are piecewise linear
◦ Curves can flexibly map different image attributes (brightness, saturation, colour)
◦ Different colour spaces (RGB, HSV, LAB) supported
◦ Fully differentiable and trained end-to-end
◦ Predicted curves are intuitive and can be user adjusted
◦ State-of-the-art performance
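The generic idea of a piecewise linear curve can be sketched as a set of knot values at evenly spaced input positions, with linear interpolation in between. This is an illustration only: how CURL predicts its knots, which attributes and colour spaces each curve acts on, and how curves are combined are described in reference 6, not reproduced here. The knot values below are hypothetical.

```python
import numpy as np

def apply_curve(channel, knots):
    """Apply a piecewise linear curve to values in [0, 1].

    `knots` holds the curve's output at M evenly spaced input positions
    0, 1/(M-1), ..., 1; values in between are linearly interpolated.
    """
    xs = np.linspace(0.0, 1.0, len(knots))
    return np.interp(channel, xs, knots)

# Hypothetical knot values, e.g. as a network might predict: a brightening curve
brighten = np.array([0.0, 0.35, 0.6, 0.8, 0.92, 1.0])
identity = np.linspace(0.0, 1.0, 6)  # the "do nothing" curve

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(4, 4, 3))
brightened = apply_curve(image, brighten)
unchanged = apply_curve(image, identity)
```

Each output is a linear function of the two neighbouring knot values, so gradients flow to the knots and the curve can be learned end to end. The knots also remain readable numbers that a user can nudge by hand.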
CURL methodology
• Architecture
• Loss
Results
Deep learning limitations
• Typically requires large datasets
• Methods described in this talk also require labelled data
• Algorithms are complex
• Slow to train (but fast at test time)
• Difficult to interpret results (Explainable AI)
• Black-box
• Biologically inspired, but don’t capture the biological mechanisms of the brain
• Limited theoretical understanding
• Hyper-parameters
Deep learning and AI
Hype, or hope?
References
1. U-Net: Convolutional Networks for Biomedical Image Segmentation, Olaf Ronneberger, Philipp Fischer, Thomas Brox, MICCAI 2015
2. Densely Connected Convolutional Networks, Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger, CVPR 2017
3. Multi-Scale Context Aggregation by Dilated Convolutions, Fisher Yu, Vladlen Koltun, ICLR 2016
4. A Multi-Hypothesis Approach to Color Constancy, Daniel Hernandez-Juarez, Sarah Parisot, Benjamin Busam, Ales Leonardis, Gregory Slabaugh, Steven McDonagh, CVPR 2020
5. Wavelet-Based Dual-Branch Neural Network for Image Demoireing, Lin Liu, Jianzhuang Liu, Shanxin Yuan, Gregory Slabaugh, Ales Leonardis, Wengang Zhou, Qi Tian, ECCV 2020
6. CURL: Neural Curve Layers for Global Image Enhancement, Sean Moran, Steven McDonagh, Greg Slabaugh, submitted to ICPR 2020