Recent progress in computational photography using deep ...


Recent progress in computational photography using deep learning

Greg Slabaugh

1 Oct 2020

Note: This presentation contains a slide with flashing imagery.

AI, AI, and more AI

What is AI?

Merriam-Webster

“A branch of computer science dealing with the simulation of intelligent behavior in computers.”

English Oxford Living Dictionary

“The theory and development of computer systems able to perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.”

AI… coming to (or already in) a device near you

My journey in AI

Siemens Corporate Research

Medicsight

City, University of London

Huawei

Strong vs weak AI

Strong AI

• Consciousness

• Ability to make judgements, plan, communicate, self-awareness

• Also known as Artificial General Intelligence (AGI)

Weak AI

• Focuses on a specific task

• No self-awareness

The AI taxonomy (according to Greg)

[Figure: AI taxonomy diagram with nodes AI; Weak / Strong; Machine Learning / Other; Deep / Traditional; CNN / Other (DBN); Supervised / Unsupervised / Reinforcement]

What is machine learning?

• Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed.

Models that are learned from data

Supervised learning

• Labelled data

• Learn a mapping between inputs and outputs

• Example: face detection

Unsupervised learning

• No labels

• Computer groups similar data to discover hidden patterns

• Example: “People who bought X also bought Y”

Reinforcement learning

• Dynamic environment

• Computer gets feedback and learns to “win”

• Example: ML playing Atari 2600 games

Neural networks

Learning

Going deep

AlexNet (2012)

• AlexNet, a type of Convolutional Neural Network (CNN), won the ImageNet challenge by a large margin (15.4% top-5 error, compared to 26.2% for the runner-up). This precipitated a swell of interest in Deep Learning techniques. AlexNet learns to represent images using abstract features extracted by learned filters.

Deep learning is a class of machine learning algorithms that use multiple layers of nonlinear processing for feature extraction and transformation. Each successive layer uses the output from the previous layer as input.

In deep learning, features are learned, rather than engineered. This is also known as representation learning, as the network learns representations of the data customised to the task.
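As a minimal illustration of layers feeding into one another, the NumPy sketch below chains three fully connected layers with ReLU non-linearities. The layer sizes and the random weights are arbitrary assumptions; in a trained network the weights would be learned from data rather than drawn at random.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x):
    # Element-wise non-linearity applied after each hidden layer
    return np.maximum(x, 0.0)


# Hypothetical layer sizes: 8 inputs -> 16 -> 16 -> 4 outputs
sizes = [8, 16, 16, 4]
weights = [rng.standard_normal((m, n)) * 0.1 for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]


def forward(x):
    # Each layer takes the previous layer's output as its input
    h = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = h @ W + b
        if i < len(weights) - 1:  # keep the final layer linear
            h = relu(h)
    return h


x = rng.standard_normal((2, 8))  # a batch of two 8-dimensional inputs
y = forward(x)
print(y.shape)  # (2, 4)
```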

Representation learning

[Figure: traditional machine learning vs deep learning]

Key components

1. Convolution. This filters an image. The weights for the filter are learned.

2. ReLU. This applies a non-linear transformation to the data. This way, the CNN can find a non-linear mapping between the inputs and outputs.

3. Pooling. This combines adjacent pixels in a filtered output. This results in abstraction: the CNN learns more “high level” features (e.g. faces instead of edges).
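The three key components can be sketched in plain NumPy on a toy single-channel image. This is an illustrative sketch only: the 2x2 edge filter is hand-set here (a CNN would learn it), and, as in most deep learning frameworks, the "convolution" is implemented as cross-correlation.

```python
import numpy as np


def conv2d(image, kernel):
    # 'Valid' 2D convolution (strictly cross-correlation, as in CNN frameworks)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    # Non-linear transformation: negative responses are set to zero
    return np.maximum(x, 0.0)


def max_pool(x, size=2):
    # Non-overlapping max pooling; rows/cols that don't fill a window are dropped
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))


# Toy 6x6 image containing a vertical edge, and a hand-set 2x2 edge filter
image = np.zeros((6, 6))
image[:, 3:] = 1.0
kernel = np.array([[-1.0, 1.0]] * 2)  # in a CNN these weights would be learned

features = max_pool(relu(conv2d(image, kernel)))
print(features)  # the edge response survives pooling at half resolution
```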

Common operations

4. Dense (fully connected) layers. These layers connect all inputs to all outputs through weights. In doing so, they lose spatial information but look at the data holistically.

5. Skip connections. Using a skip connection, data (feature maps) are passed over parts of a network. This helps in back-propagating gradients.

6. Batch normalisation. Batch normalisation applies normalisation at hidden layers. It takes the output of the previous layer, subtracts the batch mean and divides by the batch standard deviation. The result is then rescaled and shifted using learned weights.

7. Down/Upsampling. Downsampling decreases the resolution of a feature map or image; upsampling increases it.
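Batch normalisation (item 6) is simple enough to write out directly. The sketch below assumes a (batch, features) layout and shows the training-time computation using batch statistics; at inference, frameworks typically substitute running averages of the mean and variance, which are not shown. The small eps term avoids division by zero, and the input activations are made up.

```python
import numpy as np


def batch_norm(x, gamma, beta, eps=1e-5):
    # x has shape (batch, features); statistics are computed per feature over the batch
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
    # Learned scale (gamma) and shift (beta) are applied to the normalised values
    return gamma * x_hat + beta


rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))  # hypothetical activations
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(6), y.std(axis=0).round(3))  # ~0 and ~1 per feature
```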

In computer vision, one typically sees convolutional neural networks (CNNs) applied to images. Convolution is well suited to take advantage of the spatially correlated data common to images. One may see recurrent architectures for temporal data (e.g. videos).

A deep network can be characterised by:

• The architecture, which describes the layers of processing that transform inputs to outputs. A CNN that outputs a label is a classifier, and one that outputs a continuous variable is a regressor.

• The loss, which is a mathematical representation of the error produced by the network. During training, weights are adjusted by back-propagating gradients through the network to minimise the loss.

• The training, including the optimisation strategy and data used.
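To make the loss and training bullets concrete, here is a minimal sketch of gradient descent on a one-layer classifier (logistic regression) with a binary cross-entropy loss. The synthetic data, learning rate and step count are assumptions chosen for illustration. With a single layer the gradient can be written out by hand; a deep network obtains the same kind of gradient by back-propagating through every layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, linearly separable 2D data: the label is 1 when x0 + x1 > 0
X = rng.standard_normal((200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)
b = 0.0
lr = 0.5  # hypothetical learning rate


def predict_proba(X, w, b):
    # Sigmoid output: probability of class 1
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


def bce_loss(p, y, eps=1e-12):
    # Binary cross-entropy: the loss being minimised
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))


initial_loss = bce_loss(predict_proba(X, w, b), y)
for _ in range(200):
    p = predict_proba(X, w, b)
    # Gradient of the mean cross-entropy with respect to w and b
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    # Step against the gradient to reduce the loss
    w -= lr * grad_w
    b -= lr * grad_b

final_loss = bce_loss(predict_proba(X, w, b), y)
accuracy = np.mean((predict_proba(X, w, b) > 0.5) == y)
print(initial_loss, final_loss, accuracy)
```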

UNet

Characterising a CNN

Deep learning frameworks

Making it easy…

# Import libraries and modules
from tensorflow import keras
from tensorflow.keras import layers

keras.utils.set_random_seed(123)  # seeds Python, NumPy and TensorFlow for reproducibility

# Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()

# Preprocess input data: add a channel axis (channels-last layout) and scale to [0, 1]
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1).astype('float32') / 255
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1).astype('float32') / 255

# Preprocess class labels: one-hot encode the 10 digit classes
Y_train = keras.utils.to_categorical(y_train, 10)
Y_test = keras.utils.to_categorical(y_test, 10)

# Define model architecture
model = keras.Sequential()
model.add(keras.Input(shape=(28, 28, 1)))
model.add(layers.Conv2D(32, (3, 3), activation='relu'))
model.add(layers.Conv2D(32, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))
model.add(layers.Dropout(0.25))

model.add(layers.Flatten())
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(10, activation='softmax'))

# Compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Fit model on training data
model.fit(X_train, Y_train, batch_size=32, epochs=10, verbose=1)

# Evaluate model on test data: score is [test loss, test accuracy]
score = model.evaluate(X_test, Y_test, verbose=0)

Why is deep learning so… trendy?

Recently there has been a surge of research and commercial interest in Deep Learning, driven by:

1. Large datasets (e.g. ImageNet)

2. New algorithms, toolkits (e.g. TensorFlow, PyTorch) and available code (GitHub)

3. Graphics Processing Units (GPUs)

NVIDIA GeForce RTX 2080 Ti with 4352 CUDA cores

CNNs in computer vision

Semantic segmentation

Object detection

Image classification

Super-resolution

Pose estimation

Image restoration

Computational photography

• Computational photography uses digital computation, rather than optical processes, in the capture and processing of images.

• This can be done to

◦ Improve image quality

◦ Reduce cost

◦ Reduce size of camera elements

• More broadly, one can also consider image processing effects

Why is this interesting?

https://www.dxomark.com/smartphones-vs-cameras-closing-the-gap-on-image-quality/

Google Pixel 3 / Sony a7R III

We’re taking a lot of photos…

https://focus.mylio.com/tech-today/how-many-photos-will-be-taken-in-2020


Huawei P40 Pro+ features

A traditional image signal processor (ISP) contains many stages of image processing algorithms that transform the raw data acquired by the image sensor into a high-quality JPG image. A simplified example of a traditional ISP is shown below. An ISP is normally implemented in a specialized ASIC.

Can one use Deep Learning in the ISP?

Optics / Sensors (hardware) → RAW → Black level correction → Raw noise reduction → Auto white balance → Demosaicing → Camera color matrix → Dynamic range compression → Gamma correction → Tone mapping → RGB denoising → Sharpening → Devignette → JPG

Traditional ISP pipeline
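A few of the stages above can be sketched as simple array operations. This toy version is an illustration under stated assumptions, not a production ISP: it starts from an image that has already been demosaiced, skips the denoising, tone mapping and sharpening stages, and uses hypothetical black/white levels, white balance gains and an identity colour matrix.

```python
import numpy as np


def black_level_correction(raw, black_level, white_level):
    # Subtract the sensor's black offset and normalise to [0, 1]
    return np.clip((raw - black_level) / (white_level - black_level), 0.0, 1.0)


def white_balance(rgb, gains):
    # Per-channel gains (a diagonal colour correction)
    return np.clip(rgb * gains, 0.0, 1.0)


def color_matrix(rgb, ccm):
    # 3x3 camera-to-output colour transform applied to every pixel
    return np.clip(rgb @ ccm.T, 0.0, 1.0)


def gamma_correction(rgb, gamma=2.2):
    # Simple power-law encoding (real ISPs typically use sRGB or tuned curves)
    return rgb ** (1.0 / gamma)


def to_8bit(rgb):
    return np.round(rgb * 255).astype(np.uint8)


# Hypothetical 10-bit values for a tiny 2x2 image that has ALREADY been demosaiced
raw = np.array([[[64, 64, 64], [1023, 1023, 1023]],
                [[300, 500, 400], [600, 800, 700]]], dtype=float)
gains = np.array([2.0, 1.0, 1.5])  # hypothetical AWB gains (R, G, B)
ccm = np.eye(3)                    # hypothetical colour matrix (identity here)

img = black_level_correction(raw, black_level=64, white_level=1023)
img = white_balance(img, gains)
img = color_matrix(img, ccm)
out = to_8bit(gamma_correction(img))
print(out)
```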

Automatic White Balance (AWB)

• AWB, or colour constancy, applies a colour correction to an image to make it appear as if it were taken under an achromatic light source.

• This is achieved by estimating the illumination in the scene and then compensating for it.
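The estimate-then-compensate idea can be illustrated with the classical gray-world method, a non-learned technique swapped in here purely for illustration (it is not the deep learning approach discussed next). It assumes the average scene colour is grey, so the mean RGB value estimates the illuminant, and then applies a per-channel (diagonal) correction. The synthetic scene and illuminant are hypothetical.

```python
import numpy as np


def gray_world_illuminant(img):
    # Gray-world assumption: average scene reflectance is achromatic, so the
    # mean RGB of the image is proportional to the illuminant colour.
    est = img.reshape(-1, 3).mean(axis=0)
    return est / np.linalg.norm(est)


def correct(img, illuminant):
    # Diagonal (von Kries-style) correction: divide out the illuminant per
    # channel, scaled so that the green channel is left unchanged.
    gains = illuminant[1] / illuminant
    return np.clip(img * gains, 0.0, 1.0)


rng = np.random.default_rng(0)
scene = rng.uniform(0.2, 0.6, size=(32, 32, 3))  # hypothetical reflectances
true_light = np.array([1.0, 0.8, 0.5])           # hypothetical warm illuminant
observed = scene * true_light

est = gray_world_illuminant(observed)
balanced = correct(observed, est)
print(est)
print(balanced.reshape(-1, 3).mean(axis=0))  # channel means are now equal
```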

Deep learning for AWB

• Regression problem: given an uncorrected image, estimate (and apply) the colour correction

• Bianco et al., “Color Constancy Using CNNs,” CVPRW 2015

Ill-posed problem

• The problem is ill-posed. Given a single image where the scene and the illumination are unknown, multiple solutions are possible.

• Who remembers “The dress” from 2015?

Multi-hypothesis approach


1. Create a set of N candidate illuminants

2. Correct the image with each candidate, forming N hypothesized corrected images

3. Classify each corrected image on how well it is white balanced – producing a weight for each image

4. Produce a weighted average solution

5. Apply correction
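The five steps can be sketched as follows. This is not the authors' implementation: the learned white-balance classifier is replaced by a hypothetical hand-written score (how close the corrected image's mean colour is to grey), and turning scores into weights with a temperature-scaled softmax is an assumption of this sketch. The candidate illuminants and the test scene are likewise made up.

```python
import numpy as np


def correct(img, illuminant):
    # Diagonal per-channel correction; the green channel is kept fixed
    return img * (illuminant[1] / illuminant)


def score_white_balance(img):
    # HYPOTHETICAL stand-in for the learned classifier: higher when the
    # corrected image's mean colour is closer to neutral grey.
    m = img.reshape(-1, 3).mean(axis=0)
    m = m / m.sum()
    return -np.sum((m - 1.0 / 3.0) ** 2)


def softmax(z, temperature):
    z = (z - z.max()) / temperature  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def multi_hypothesis_awb(img, candidates, temperature=1e-3):
    # Steps 1-2: correct the image with each of the N candidate illuminants
    hypotheses = [correct(img, c) for c in candidates]
    # Step 3: score each corrected image and turn the scores into weights
    scores = np.array([score_white_balance(h) for h in hypotheses])
    weights = softmax(scores, temperature)
    # Step 4: weighted average of the candidate illuminants
    illuminant = weights @ candidates
    # Step 5: apply the final correction
    return correct(img, illuminant), illuminant, weights


rng = np.random.default_rng(1)
scene = rng.uniform(0.2, 0.6, size=(32, 32, 3))  # hypothetical scene
true_light = np.array([1.0, 0.8, 0.5])           # hypothetical illuminant
img = scene * true_light
candidates = np.array([                          # hypothetical candidate set
    [1.0, 0.8, 0.5],
    [0.5, 0.8, 1.0],
    [0.8, 0.8, 0.8],
    [1.0, 0.6, 0.3],
])
corrected, illuminant, weights = multi_hypothesis_awb(img, candidates)
print(weights.round(3), illuminant.round(3))
```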



Results

Cube dataset: 1707 images captured with a Canon 550D camera

Advantages / disadvantages of this approach

Advantages

• Classifier solves a camera-agnostic question (how well white balanced is the image?)

• Scene illuminants can be combined across cameras

• Can apply the method in a training-free way to new cameras

• State-of-the-art performance

Disadvantages

• Requires inference N times (once per candidate). However, the images can be very small.

• Assumes a single illuminant. Future work: handle the multi-illuminant case.

Moire patterns


• Moire patterns occur when two patterns interfere with each other

• Aliasing results from high frequencies masquerading as low frequencies

• Moire patterns are sensitive to movement!

https://en.wikipedia.org/wiki/File:Moir%C3%A9.gif

https://steemit.com/art/@ztwin/moire-gifs-


Moire in digital photography

• In digital photography, Moire patterns degrade image quality.

• Why does this happen? A camera sensor samples incoming light on a set of pixels. Frequencies above the Nyquist limit cannot be captured properly by the sensor, resulting in aliasing.

◦ In scenario 1, the subpixel layout of the LCD elements produces uncapturable frequencies

◦ In scenario 2, the scene itself contains very high frequencies

Scenario 1: Photography of digital screens

Scenario 2: Photography of high frequency patterns
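Aliasing can be reproduced in one dimension with a few lines of NumPy. With a sampling rate of one sample per unit, the Nyquist limit is 0.5 cycles per sample; a cosine at 0.9 cycles per sample produces the same samples as one at 0.1 cycles per sample, so the high frequency masquerades as a low one. The specific frequencies are arbitrary choices for illustration.

```python
import numpy as np

n_samples = 100
n = np.arange(n_samples)

f_true = 0.9   # cycles per sample: above the Nyquist limit of 0.5
f_alias = 0.1  # 1 - f_true: the frequency it masquerades as

x_true = np.cos(2 * np.pi * f_true * n)
x_alias = np.cos(2 * np.pi * f_alias * n)

# The sampled signals are numerically identical
print(np.allclose(x_true, x_alias))  # True

# The spectrum of the high-frequency signal peaks at the alias frequency
spectrum = np.abs(np.fft.rfft(x_true))
peak_bin = int(np.argmax(spectrum))
print(peak_bin / n_samples)  # 0.1
```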

Demoire

• The demoire problem seeks to remove the Moire corruption.

• This is challenging as Moire patterns have a widely varying appearance, including different frequency components.

Wavelet decomposition: differences

WDNet

• Wavelet DemoireNet (WDNet) is a CNN that transforms an image to the wavelet domain, where it is processed using two branches:

◦ Dense branch is based on DenseNet and models fine details

◦ Dilation branch uses dilated convolutions to look at the data more coarsely
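The wavelet domain can be illustrated with a single-level 2D Haar transform, which splits an image into a low-frequency approximation and three detail bands. Which wavelet and how many decomposition levels WDNet uses is not stated here, so the Haar choice is an assumption of this sketch; band naming conventions (LH vs HL) also vary between sources.

```python
import numpy as np


def haar_dwt2(img):
    # Single-level orthonormal 2D Haar transform; height and width must be even
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    ll = (a + b + c + d) / 2  # low-frequency approximation
    lh = (a - b + c - d) / 2  # left-right differences (vertical structure)
    hl = (a + b - c - d) / 2  # top-bottom differences (horizontal structure)
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    # Exact inverse of haar_dwt2
    h, w = ll.shape
    img = np.empty((2 * h, 2 * w))
    img[0::2, 0::2] = (ll + lh + hl + hh) / 2
    img[0::2, 1::2] = (ll - lh + hl - hh) / 2
    img[1::2, 0::2] = (ll + lh - hl - hh) / 2
    img[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return img


# A 4x4 image of alternating columns: a fine, high-frequency pattern
stripes = np.tile([0.0, 1.0], (4, 2))
ll, lh, hl, hh = haar_dwt2(stripes)
print(ll, lh, hl, hh, sep="\n")  # the alternation lands entirely in the LH band
```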

DenseNet? Dilated convolution?

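• DenseNet is composed of dense blocks.

• Within a block, layers are densely connected: feature maps are concatenated, rather than summed as in residual connections.

• Each layer receives as input the outputs of all previous layers.

• Dilated convolution spaces the filter taps apart by a dilation rate, skipping the input points in between.

• This increases the receptive field without adding parameters.

• The output looks at the data more globally.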
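Both ideas can be sketched in NumPy. The first part shows a 1D dilated convolution and how stacking dilation rates 1, 2 and 4 grows the receptive field; the second shows DenseNet-style connectivity, where each layer sees the concatenation of all earlier feature maps. The stand-in layers (a linear map plus ReLU rather than convolutions), layer widths, growth rate and random weights are hypothetical choices for illustration.

```python
import numpy as np


def dilated_conv1d(x, kernel, rate):
    # 'Valid' 1D dilated convolution (cross-correlation, as in CNN frameworks):
    # kernel taps are `rate` samples apart, so the samples in between are skipped.
    k = len(kernel)
    span = (k - 1) * rate + 1  # input samples covered by one output sample
    out = np.array([
        sum(kernel[t] * x[i + t * rate] for t in range(k))
        for i in range(len(x) - span + 1)
    ])
    return out, span


def stacked_receptive_field(k, rates):
    # Receptive field of a stack of stride-1 dilated layers with kernel size k
    return 1 + sum((k - 1) * r for r in rates)


signal = np.arange(10, dtype=float)
kernel = np.ones(3)
for rate in (1, 2, 4):
    y, span = dilated_conv1d(signal, kernel, rate)
    print(rate, span, y)
print(stacked_receptive_field(3, (1, 2, 4)))  # 15


def dense_block(x, layers):
    # DenseNet-style connectivity: each layer receives the concatenation of the
    # block input and every earlier layer's output along the channel axis.
    features = [x]
    for layer in layers:
        features.append(layer(np.concatenate(features, axis=-1)))
    return np.concatenate(features, axis=-1)


rng = np.random.default_rng(0)
c0, growth, n_layers = 8, 4, 3  # hypothetical input width and growth rate
dense_weights = [rng.standard_normal((c0 + i * growth, growth)) for i in range(n_layers)]
dense_layers = [lambda f, W=W: np.maximum(f @ W, 0.0) for W in dense_weights]
out = dense_block(rng.standard_normal((5, c0)), dense_layers)
print(out.shape)  # (5, 20)
```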

Results


Ablation study: Importance of wavelet processing


Image enhancement using curve layers

• Photoshop / Lightroom allow users to adjust global image properties through the use of curves

Can we build a neural network to do this automatically?

Example: adjusting brightness

CURL

• We recently introduced neural CURve Layers (CURL), which learns and applies curve adjustments to an image. CURL has the following features:

◦ Curves are piecewise linear

◦ Curves can flexibly map different image attributes (brightness, saturation, colour)

◦ Different colour spaces (RGB, HSV, LAB) supported

◦ Fully differentiable and trained end-to-end

◦ Predicted curves are intuitive and can be user adjusted

◦ State-of-the-art performance
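Applying a piecewise linear curve to one image channel is a one-liner with np.interp. This is a minimal sketch assuming K knots uniformly spaced on [0, 1]; in CURL the curve parameters are predicted by the network, whereas here the knot values are hand-set, and CURL's exact curve parameterisation may differ. For a fixed input value the output is a linear function of the two neighbouring knot values, which is why gradients can flow back to the curve parameters during end-to-end training.

```python
import numpy as np


def apply_curve(channel, knots):
    # Piecewise linear curve: K output values at K uniformly spaced inputs in [0, 1];
    # inputs between knots are linearly interpolated.
    xs = np.linspace(0.0, 1.0, len(knots))
    return np.interp(channel, xs, knots)


# Hypothetical brightening curve with 5 knots (the identity curve would be [0, .25, .5, .75, 1])
knots = np.array([0.0, 0.4, 0.65, 0.85, 1.0])
luminance = np.array([0.0, 0.125, 0.25, 0.5, 1.0])
print(apply_curve(luminance, knots))  # values 0, 0.2, 0.4, 0.65, 1.0
```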

CURL methodology


• Architecture

• Loss

Results


Deep learning limitations

• Typically requires large datasets

• Methods described in this talk also require labelled data

• Algorithms are complex

• Slow to train (but fast at test time)

• Difficult to interpret results (Explainable AI)

• Black-box

• Biologically inspired, but don’t capture the biological mechanisms of the brain

• Limited theoretical understanding

• Many hyper-parameters to tune

Deep learning and AI

Hype, or hope?

References

1. U-Net: Convolutional Networks for Biomedical Image Segmentation, Olaf Ronneberger, Philipp Fischer, Thomas Brox, MICCAI 2015. https://arxiv.org/pdf/1505.04597.pdf

2. Densely Connected Convolutional Networks, Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger, CVPR 2017. https://openaccess.thecvf.com/content_cvpr_2017/papers/Huang_Densely_Connected_Convolutional_CVPR_2017_paper.pdf

3. Multi-Scale Context Aggregation by Dilated Convolutions, Fisher Yu, Vladlen Koltun, ICLR 2016. https://arxiv.org/abs/1511.07122

4. A Multi-Hypothesis Approach to Color Constancy, Daniel Hernandez-Juarez, Sarah Parisot, Benjamin Busam, Ales Leonardis, Gregory Slabaugh, Steven McDonagh, CVPR 2020. http://openaccess.thecvf.com/content_CVPR_2020/papers/Hernandez-Juarez_A_Multi-Hypothesis_Approach_to_Color_Constancy_CVPR_2020_paper.pdf

5. Wavelet-Based Dual-Branch Neural Network for Image Demoireing, Lin Liu, Jianzhuang Liu, Shanxin Yuan, Gregory Slabaugh, Ales Leonardis, Wengang Zhou, Qi Tian, ECCV 2020. https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123580086.pdf

6. CURL: Neural Curve Layers for Global Image Enhancement, Sean Moran, Steven McDonagh, Greg Slabaugh, Submitted to ICPR 2020. https://arxiv.org/pdf/1911.13175.pdf

Contact: [email protected]
