A Compressive Sensing Approach To Object-based Surveillance Video Coding

A Compressive Sensing Approach To Object-based Surveillance Video Coding

Divya Venkatraman and Anamitra MakurICASSP 2009

2

Introduction Analysis of Residual Object Error Proposed CS Based Video Coding

◦ Direct CS◦ DCT based CS◦ Wavelet based CS

Comparitive Study Conclusions

Outline

3

Compressive Sensing aims to measure sparse and compressible signals at close to their intrinsic information rate than Nyquist rate

CS for indoor surveillance video coding◦ To efficiently compress the long hours of video

Introduction

4

Compressive Sensing◦ is original signal◦ is K-sparse under ◦ ,

measurement matrix ◦ Reconstruct signal by linear program

IntroductionNx R

, x S S N N

M Ny x S My R

1ˆ arg min s.t.

NS RS S y S

5

Identification of objects in motion through segmentation is essential for surveillance and in such a video codec◦ An arbitrary shaped moving object is described

using two features: texture and shape

The aim of this work is to explore CS on motion-compensated object error ◦ Represent the texture information for arbitrary-

shaped objects

Introduction

6

Shadow-less object segmentation is achieved using frame ratio pixels, edge maps and morphological operation based correction [7]

Analysis of Residual Object Error

[7] D. Venkatraman and A. Makur, “Shadow-less Segmentation of Moving Humans from Surveillance video”, 10th Intl Conf Control Automtn. Robotics Vision (ICARCV), pp. 1317-1322, Dec 2008.

7

Flow chart [7]


8

Object-based motion estimation using the sum of squared differences (SSD)

◦ : motion vector of j th object◦ W : search window◦ Mj : binary object mask


2

1arg minSSDj j n j n j

u W j

u M f u x f x

SSDju

9

◦ : current reconstructed frame◦ : background image

The motion compensation object error for each object j in nth frame


1ˆ SSDn bg j j j n j j

j

f I M M f u x

n̂fbgI

ˆj j n nM f f

10

Two distinct characteristics of the object error are inherent sparsity and arbitrary shape

The object error has sparse representation◦ Object error with significant values only along the

boundary of the object


11

The few variations of the proposed CS based video coding framework

Proposed CS Based Video Coding

12

The number of measurementsis a sharp threshold for successful reconstruction

The sparsity ratio of object error array varies across video frames in a sequence and also between objects in a single frame

The number of measurements is calculated as

◦ where is termed as measurement index


~ 2 log /M K N K

/N K

max , log /M rK rK N Kr

13

The random measurement vector has Gaussian density with the distribution peaking at mean zero ◦ because of the sparse nature of the object error

signal In a fixed camera indoor gray-level

surveillance video the background frame is constant except for occasional illumination changes

CAVIER [12] averaged over 100 frames and with multiple objects in a frame


y

[12] “Caviar: Context aware vision using image-based active recognition,” http://homepages.inf.ed.ac.uk/rbf/CAVIAR/.

14

For robust scenario, the background may be statistically modeled and shared between the encoder and decoder periodically

The dimension of each object is approximately 2% of the total frame area

Object-PSNR of the segmented mask instead of the frame PSNR◦ PSNR from the background area is not relevant in

surveillance video compression performance


15

The CS measurements may be quantized using uniform or non-uniform quantizer

Table 1 compares the bitrate-vs-object-PSNR for uniform quantizer (Δ=8) and optimum Lloyd-Max non-uniform Gaussian quantizer [10] for quantization level L=8


[10] S.P. Lloyd, "Least Squares Quantization in PCM," IEEE Trans. on Information Theory, pp. 129–137, Vol. IT-28, 1982.

16

Arithmetic coding performed after uniform quantization exploits the probability distribution and hence gives better coding performance

But given the non-stationarity of surveillance video, a non-uniform Gaussian adaptive quantizer, adapting to the variance of the distribution, is likely to be more robust


17

The following are the transmitted parameters for each video frame in a CS framework1. Motion vectors for different objects in the video

frame2. Location (top left coordinate and size of the

bounding box) and shape (chain coded boundary through entropy coding) of the objects

3. Dimensions of the measurement matrix for each object

4. CS coefficients (and transform coefficients)5. Variance of the CS measurements for adaptive

quantizer


18

Direct CS coding of object error DCT based CS coding of object error Wavelet based CS coding of object error


19

To the object error, a lower threshold is applied in order to construct a sparse matrix by eliminating insignificant values and converting them to zero

The R-D curve for a fixed L and increasing r give a better object-PSNR for a corresponding bitrate◦ r around 2.5 gives optimal performance

Direct CS coding of object error

20

Direct CS coding of object error

21

DCT is applied on the object error and the coefficients are thresholded to construct a sparse matrix◦ The threshold Th used is a percentage of the absolute

maximum value of the transform coefficients Whole-object DCT CS

◦ Since the object size in the frame is small DCT of the whole error block is taken

Block-wise DCT CS◦ The object error is divided into 8x8 sub blocks, and

smaller sub blocks at the boundary if required and separate DCT is applied for each block

DCT based CS coding of object error

22

DCT based CS coding of object error DCT CS coders peform worse than the direct CS The block-wise DCT CS is better than the whole-

object DCT CS

23

Investigate a few variations of wavelet based CS coding scheme

1. Regular wavelet CS◦ Daubechies 9/7 biorthogonal wavelet◦ Two levels of wavelet decomposition◦ A threshold is then applied to create a single

sparse array by merging all bands and CS is applied

Wavelet based CS coding of object error

24

2. Multiscale wavelet CS◦ For better representation of the wavelet coefficients,

wavelet bands at different levels are no longer merged but treated separately

◦ The scaling coefficients are uniformly quantized using a step size Δ (and entropy coded) and transmitted separately

◦ The number of CS measurements is determined as a ratio α of the size of the object error M ＝ α N

◦ Two levels are used, 1/4th and 3/4th of the measurements are used for the second level and first level wavelet coefficients

◦ A threshold is used before CS measurements


25

3. Hybrid wavelet CS◦ The scaling coefficients are separately quantized

and transmitted as in the multiscale wavelet CS◦ Independent assignment of CS measurements at

different levels of wavelet decomposition is avoided

◦ It is better to compress a sparse signal as a whole rather than in parts

◦ All wavelet coefficients (except the scaling coefficients) are merged together before thresholding and CS measurements


26

Performance of wavelet schemes Regular wavelet CS

27

Performance of wavelet schemes Multiscale wavelet CS

28

The hybrid wavelet CS coder performs better than the multiscale wavelet CS at all bitrates and the regular wavelet CS at higher bitrates

For the hybrid wavelet CS, experiments are performed by varying the quantization step size Δ for the scaling coefficients

Performance of wavelet schemes

29

Δ =8 is a good choice◦ object error is high pass and hence the scaling

coefficients have less energy and a large Δ may be used


30

Performance of wavelet schemes Hybrid wavelet CS

31

Hybrid wavelet CS◦ Smaller r (=2,2.5), smaller L (=4) and larger Th

(=10%) are better at low bitrates (≦8000 bits per frame)

◦ Larger r (=3), larger L (=16,32) and smaller Th (=5%) is preferable at higher bitrates (≧10000 bits per frame)


32

The optimal operation curves for the different techniques adapting parameter values for r, L, Δ and Th

Comparitive Study

33

CS works better with DWT◦ wavelet transform gives better energy

compaction than DCT

At low bitrates, the wavelet CS works better than direct CS◦ wavelet transform creates a sparser array for CS

than the original object error

Comparitive Study

34

Regular wavelet CS performs well initially but becomes worse than the hybrid scheme after 35 dB PSNR

At low bitrate, scaling coefficients are also sparse and hence well presented by regular wavelet CS

Comparitive Study

35

Hybrid scheme performs better that multiscale wavelet as the CS measurements are applied to all the wavelet bands merged together unlike the multiscale CS

Beyond 10k bits/frame direct CS perfoms better than other schemes

In conclusion, wavelet CS is best at low rates, while direct CS is best at higher rates

Comparitive Study

36

The bitrate variation is moderate for both the schemes except after frame 90◦ when the two objects of the video merge into one

object The typical PSNR variation

◦ direct CS : 29.87dB to 40.74dB◦ hybrid wavelet CS : 27.4 dB to 39.5dB

Comparitive Study

37

Comparitive Study

38

CS for surveillance video coding is discussed using different techniques and compared

The hybrid wavelet CS is found to work better at lower bitrates and direct CS for higher bitrates

In this work we target coding only the texture of the object error with shape explicitly coded using chain code

In future, we aim to revive the object shape using implicit shape coding at the decoder

Conclusions

A Compressive Sensing Approach To Object-based Surveillance Video Coding

Documents

Transcript of A Compressive Sensing Approach To Object-based Surveillance Video Coding