
CNN FOR LICENSE PLATE MOTION DEBLURRING

Pavel Svoboda, Michal Hradiš, Lukáš Maršík, Pavel Zemčík

Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic

This research is supported by the ARTEMIS joint undertaking under grant agreement no 621439 (ALMARVI). This work was supported by The Ministry of Education, Youth and Sports from the Large Infrastructures for Research, Experimental Development and Innovations project “IT4Innovations National Supercomputing Center – LM2015070”.

[Figure: Enhancement CNN. Input image (c=3), a chain of Conv.+ReLU layers (c=128; filter labels 19 x 19 and 3 x 3 x 128), output image.]

1. Image restoration

Degradation - linear blur and noise

Inverse problem - MAP estimate? Not convex, too simple image priors.

How to handle saturation, non-uniform blur, compression, non-linearities, non-Gaussian noise?

Our approach (Who needs k?)

It is relatively easy to generate artificially degraded images. The CNN is trained on artificial data for a specific viewpoint and image content.

2. Training data

We generate data for a specific camera: blur lengths and orientations, license plate orientations and scales.

It is easy to include complex degradations, including non-linear sharpening filters and compression.

3. Architecture

The same as for text deblurring (Hradiš et al., 2015)

• 15 convolution+ReLU layers (last is linear)
• Inputs and outputs are RGB images
• Outputs MAP estimates of the original images
• No padding, crops 25px borders
• 2M weights (9 MB), 2 Mflops per pixel
• Processes 1Mpx image in ~4s on GTX 780
• One LP in ~0.2s

4. Training

• Stochastic gradient descent with momentum
• L2 loss as objective function
• Initialization with uniform distribution
• 400K iterations
• Minibatch 54 crops 66x66 (output 16x16)
• 3 days on GTX 980
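As a rough illustration of the training recipe listed on the poster (stochastic gradient descent with momentum, L2 loss, uniform initialization), the sketch below fits a tiny linear model instead of the 15-layer CNN. The learning rate, momentum value, initialization range, batch size, and toy data are all assumptions made for this example; the poster does not state them.

```python
import random

def l2_loss_and_grad(w, batch):
    """Mean squared error of the linear model y = w . x and its gradient."""
    n = len(batch)
    loss, grad = 0.0, [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        loss += err * err / n
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / n
    return loss, grad

def train(data, dim, iterations=3000, batch_size=8, lr=0.02, momentum=0.9, seed=0):
    """SGD with (heavy-ball) momentum; all hyperparameters are assumed values."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in range(dim)]  # uniform init, range assumed
    v = [0.0] * dim                                    # velocity (momentum buffer)
    for _ in range(iterations):
        batch = rng.sample(data, batch_size)           # random minibatch
        _, grad = l2_loss_and_grad(w, batch)
        v = [momentum * vi - lr * gi for vi, gi in zip(v, grad)]
        w = [wi + vi for wi, vi in zip(w, v)]
    return w

def make_toy_data(n=64, seed=1):
    """Noise-free samples from a hypothetical 3-weight linear model."""
    rng = random.Random(seed)
    true_w = [0.5, -1.0, 2.0]
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        data.append((x, sum(t * xi for t, xi in zip(true_w, x))))
    return true_w, data

true_w, data = make_toy_data()
print(train(data, dim=3), "target:", true_w)
```

In the actual setup the minibatch would hold 54 crops of 66x66 pixels and the loss would compare the 16x16 network output with the matching sharp crop.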

[Figure: example license plates: Original, Blind L0, CNN-L15]

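[Figure: CNN-L15 layer diagram. Filter sizes shown: 19 x 19 (one layer), 1 x 1 (six layers), 3 x 3 (two layers), 5 x 5 (four layers), 7 x 7 (two layers). Channel counts shown: 128, 320, 320, 320, 128, 128, 512, 256, 64, 128, 128, 128, 128, 128. The layer order is not recoverable from the extracted text.]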
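The filter sizes in the layer diagram can be checked against the numbers quoted elsewhere on the poster (25px cropped borders, 66x66 crops giving 16x16 outputs). The sketch below is plain arithmetic; it assumes stride-1 convolutions without padding, and the function names are made up for this example. The timing estimate simply rescales the quoted ~4s per 1Mpx and ignores any fixed overhead.

```python
# Filter sizes as listed in the layer diagram; their order does not
# matter for the border arithmetic below.
FILTER_SIZES = [19] + [1] * 6 + [3] * 2 + [5] * 4 + [7] * 2

def border_loss(filter_sizes):
    """Total pixels lost per dimension by unpadded stride-1 convolutions."""
    return sum(k - 1 for k in filter_sizes)

def output_size(input_size, filter_sizes):
    """Side length of the output for a square input of side input_size."""
    return input_size - border_loss(filter_sizes)

def seconds_for_pixels(pixels, seconds_per_megapixel=4.0):
    """Processing time scaled linearly from the quoted ~4 s per 1 Mpx."""
    return pixels / 1e6 * seconds_per_megapixel

print(len(FILTER_SIZES))               # number of layers
print(border_loss(FILTER_SIZES) // 2)  # border cropped on each side
print(output_size(66, FILTER_SIZES))   # output side for a 66x66 training crop
print(seconds_for_pixels(264 * 128))   # rough time for one 264x128 LP crop
```

The sum gives 50 lost pixels per dimension, i.e. 25 per side, and a 66x66 crop maps to 16x16, which agrees with the poster. The scaled time for a 264x128 crop is about 0.14 s, the same order as the quoted ~0.2 s per license plate.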

[Plot: Blur length. PSNR [dB] (y-axis, 12-32) vs. test data length [pixel] (x-axis, 1-19); legend values: 5, 9, 13, 17.]

Results

Evaluation on real images

• 721 images, longer exposure (6-12 ms)
• Manually annotated LP characters
• Evaluating OCR accuracy
• Existing LP-specific OCR engine
• Baseline blind L0-regularized deconvolution (Pan et al., 2014)
• Blur orientation range 50°

Two static cameras, blur directions 37°-57° and 59°-79°

[Figure: Motion blur range (length-direction)]

LP crops 264x128px

140K sharp training/validation images
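The training-data recipe described on the poster (sharp license plate crops degraded by camera-specific motion blur and noise) can be sketched as below. This is a minimal pure-Python illustration on a grayscale image stored as a list of rows with values in [0, 1]; the line-rasterization scheme, edge clamping, clipping, noise level, and the sampled length range are assumptions, and only the 37°-57° direction range is taken from the poster.

```python
import math
import random

def motion_kernel(length, angle_deg):
    """Approximate normalized line kernel for linear motion blur."""
    size = int(math.ceil(length)) | 1          # odd size that covers the line
    c = size // 2
    kernel = [[0.0] * size for _ in range(size)]
    theta = math.radians(angle_deg)
    samples = 4 * size                          # oversample the line segment
    for i in range(samples):
        t = (i / (samples - 1) - 0.5) * (length - 1)
        x = int(round(c + t * math.cos(theta)))
        y = int(round(c - t * math.sin(theta)))
        kernel[y][x] += 1.0
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]

def blur(image, kernel):
    """Filter the image with the kernel, clamping coordinates at the edges."""
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky, row in enumerate(kernel):
                for kx, kv in enumerate(row):
                    if kv:
                        yy = min(max(y + ky - r, 0), h - 1)
                        xx = min(max(x + kx - r, 0), w - 1)
                        acc += kv * image[yy][xx]
            out[y][x] = acc
    return out

def degrade(sharp, length, angle_deg, noise_sigma, rng):
    """Blur, add Gaussian noise, and clip to [0, 1] (a crude saturation model)."""
    blurred = blur(sharp, motion_kernel(length, angle_deg))
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, noise_sigma))) for v in row]
            for row in blurred]

def random_training_pair(sharp, rng):
    """Return (degraded, sharp) with randomly sampled blur parameters."""
    length = rng.uniform(1.0, 19.0)             # assumed length range [pixel]
    angle = rng.uniform(37.0, 57.0)             # direction range of one camera
    return degrade(sharp, length, angle, noise_sigma=0.02, rng=rng), sharp

rng = random.Random(0)
sharp = [[1.0 if 3 <= x <= 5 else 0.0 for x in range(12)] for _ in range(8)]
degraded, target = random_training_pair(sharp, rng)
print(len(degraded), len(degraded[0]))
```

Further degradations mentioned on the poster, such as sharpening filters or compression, would be applied as extra steps inside `degrade`.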

[Plot: Blur direction. PSNR [dB] (y-axis, 16-30) vs. test data direction [degree] (x-axis, 0-90); legend values: 10, 20, 40, 60, 90, 130, 180.]

[Figure: example license plates, pairs labelled Input and CNN]

[Plot: Real images OCR. Accuracy (y-axis, 0.6-1) vs. trained models range [pixel] (x-axis, 9-23); curves: CNN-L15, Blind L0, Original, Blind L0 per license plate.]

• Error improved from 37% to 9%
• L0 deconvolution only improved character error down to 23%
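The poster does not say how the character error is computed. One common choice, assumed here, is the edit distance between the annotated and recognized plate strings divided by the number of annotated characters. The sketch below implements that definition and the relative error reduction implied by the quoted figures; the plate strings are invented for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

def character_error_rate(pairs):
    """Total edit distance over total annotated characters."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    chars = sum(len(ref) for ref, _ in pairs)
    return errors / chars

def relative_error_reduction(before, after):
    """Fraction of the original error that was removed."""
    return (before - after) / before

# Hypothetical (annotation, OCR output) pairs.
pairs = [("1AB2345", "1AB2845"), ("7C89012", "7C89012")]
print(character_error_rate(pairs))
print(relative_error_reduction(0.37, 0.09))  # CNN, from the quoted 37% -> 9%
print(relative_error_reduction(0.37, 0.23))  # blind L0, from 37% -> 23%
```

With the quoted numbers, the CNN removes roughly three quarters of the character errors, while the blind L0 baseline removes a little over a third.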

[Plot: JPEG compression. OCR accuracy (y-axis, 0.6-1) vs. JPEG compression (x-axis, 5-95); curves: CNN, Input. Example images at Q 5, Q 25, Q 95.]

Networks were trained and tested for different blur lengths

Networks were trained and tested for different blur orientation ranges
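The blur-length and blur-direction plots report reconstruction quality as PSNR in dB. A minimal sketch of that metric follows, assuming grayscale images stored as lists of rows with values in [0, 1] and a comparison of the restored image against the sharp original; the poster does not state the peak value or how color channels are handled.

```python
import math

def psnr(reference, restored, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(reference, restored)
             for a, b in zip(row_a, row_b)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

reference = [[0.0, 0.0], [0.0, 0.0]]
restored = [[0.1, 0.1], [0.1, 0.1]]
print(psnr(reference, restored))
```

A uniform error of 0.1 on a unit peak gives a mean squared error of 0.01, so the example evaluates to about 20 dB.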

Robustness to JPEG compression

We tested a network trained for motion blur on images with additional JPEG compression to assess robustness to additional degradations. The network was able to maintain OCR accuracy down to quality 25.

Download from www.fit.vutbr.cz/~ihradis/CNN-Deblur

References

• Michal Hradiš, Jan Kotera, Pavel Zemčík, and Filip Šroubek, “Convolutional neural networks for direct text deblurring,” in BMVC, 2015.
• Jinshan Pan, Zhe Hu, Zhixun Su, and Ming-Hsuan Yang, “Deblurring Text Images via L0-Regularized Intensity and Gradient Prior,” in CVPR, 2014.

We explore direct blind deconvolution with convolutional neural networks on images from a real-life traffic surveillance system, where the blur kernels are partially constrained. The neural networks trained on artificially generated data provide superior reconstruction quality compared to traditional blind deconvolution methods. Custom CNNs can be easily trained for other specific applications or camera configurations (image content and blur sizes and types).
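For reference, the "linear blur and noise" degradation and the MAP estimate mentioned on the poster are usually written as below. This is the standard textbook formulation, not a transcription of the poster: only the kernel symbol k appears in the source, while y (observed image), x (sharp image), n (noise), \sigma (noise standard deviation), and p(x) (image prior) are notation assumed here, and the second equality further assumes i.i.d. Gaussian noise and a known kernel.

```latex
y = k * x + n,
\qquad
\hat{x} = \arg\max_{x} \, p(x \mid y)
        = \arg\min_{x} \; \frac{1}{2\sigma^{2}} \lVert k * x - y \rVert_2^2 - \log p(x)
```

In the blind setting k is unknown as well and has to be estimated jointly with x, which is the step the direct CNN approach skips.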

[Figure: training scheme. A sharp image is degraded by blur + noise into a blurred image; the CNN (input image c=3, Conv.+ReLU layers c=128, filter labels 19 x 19 and 3 x 3 x 128, output image) is trained on these pairs.]