arXiv:1302.3446v2 [stat.AP] 15 Feb 2013

ADAPTIVE TEMPORAL COMPRESSIVE SENSING FOR VIDEO

Xin Yuan, Jianbo Yang, Patrick Llull, Xuejun Liao, Guillermo Sapiro, David J. Brady and Lawrence Carin

Department of Electrical and Computer Engineering, Duke University, Durham, NC, 27708, USA

ABSTRACT

This paper introduces the concept of adaptive temporal compressive sensing (CS) for video. We propose a CS algorithm to adapt the compression ratio based on the scene's temporal complexity, computed from the compressed data, without compromising the quality of the reconstructed video. The temporal adaptivity is manifested by manipulating the integration time of the camera, opening the possibility of real-time implementation. The proposed algorithm is a generalized temporal CS approach that can be incorporated with a diverse set of existing hardware systems.

Index Terms— Video compressive sensing, temporal compressive sensing ratio design, temporal superresolution, adaptive temporal compressive sensing, real-time implementation.

1. INTRODUCTION

Video compressive sensing (CS), a new application of CS, has recently been investigated to capture high-speed videos at a low frame rate by means of temporal compression [1, 2, 3].¹ A commonality of these video CS systems is the use of per-pixel modulation during one integration time-period, to overcome the spatio-temporal resolution trade-off in video capture. As a consequence of active [1, 2] and passive [3] pixel-level coding strategies (see Fig. 1), it is possible to uniquely modulate several temporal frames of a continuous video stream within the timescale of a single integration period of the video camera (using a conventional camera). This permits these novel imaging architectures to maintain high resolution in both the spatial and the temporal domains. Each captured frame of the camera is a coded temporal linear combination of the underlying high-speed video frames. After acquisition, the high-speed videos are reconstructed by various CS inversion algorithms [9, 10, 11].

These hardware systems were originally designed for fixed temporal compression ratios. The correlation in time between video frames can vary, depending on the detailed time dependence of the scene being imaged. For example, a scene monitored by a surveillance camera may have significant temporal variability during the day, but at night there may be extended time windows with no or limited changes. Therefore, adapting the temporal compression ratio based on the captured scene is important, not only to maintain a high-quality reconstruction, but also to save power, memory, and related resources.

¹ Significant work in spatial compression has been demonstrated with a single-pixel camera [4, 5, 6, 7]. Unfortunately, this hardware cannot decrease the sampling frame rate, and therefore has not been applied in temporal CS. [8] achieved compressive temporal superresolution for time-varying periodic scenes by exploiting their Fourier sparsity.


Fig. 1. Illustration of the coding mechanisms within the Coded Aperture Compressive Temporal Imaging (CACTI) system [3]. The first row shows N_F high-speed temporal frames of the source datacube video; the second row depicts the mask with which each frame is multiplied (black is zero, white is one). In CACTI, the same code (mask) is shifted (from left to right) to constitute a series of frame-dependent codes. Finally, the CACTI measurement of these N_F frames is the sum of the coded frames, as shown at the bottom-right.
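As a concrete illustration of this forward model, the following minimal sketch simulates the coded capture of Fig. 1. The array shapes, the random binary mask, and the circular one-pixel-per-frame horizontal shift are simplifying assumptions made here; the hardware in [1, 2, 3] realizes the per-frame modulation in different ways.

```python
import numpy as np

def cacti_measurement(video, mask):
    """Coded capture: y = sum_k shift(mask, k) * x_k (Hadamard product per frame).

    video: (H, W, N_F) array of high-speed frames x_k
    mask:  (H, W) binary code; frame k sees the mask shifted k pixels to the right
    """
    H, W, n_frames = video.shape
    y = np.zeros((H, W))
    for k in range(n_frames):
        # Circular shift for simplicity; the physical mask is translated.
        coded = np.roll(mask, shift=k, axis=1)
        y += coded * video[:, :, k]  # Hadamard product, then temporal sum
    return y

# Example: collapse N_F = 8 frames of a 64x64 video into one coded measurement
video = np.random.rand(64, 64, 8)
mask = (np.random.rand(64, 64) > 0.5).astype(float)
y = cacti_measurement(video, mask)
```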


We introduce the concept of adaptive temporal compressive sensing to realize a CS video system that adapts to the complexity of the scene under test. Since each of the aforementioned cameras involves similar integration over a time window, in which N_F high-speed video frames are modulated/coded, we propose to adapt this time window (the integration time N_F) to change the temporal compression ratio as a function of the complexity of the data. Specifically, we adaptively determine the number of frames N_F collapsed to one measurement, using motion estimation in the compressed domain.²

The algorithm for adaptive temporal CS can be incorporated with a diverse range of existing video CS systems (not only the imaging architectures in [1, 2, 3] but also flutter-shutter cameras [22, 23]), to implement real-time temporal adaptation. Furthermore, thanks to the availability of hardware for simple motion estimation [24], the proposed algorithm can be readily implemented in these cameras.

² Studies have shown that improved performance can be achieved when projection matrices are designed to adapt to the underlying signal of interest [12, 13, 14, 15, 16]. However, none of these methods was developed for video temporal CS. The adaptive CS ratio for video has been investigated in [17, 18, 19, 20, 21]: each frame of the video to be captured is partitioned into several blocks based on the estimated motion, and each block is assigned a different CS ratio. Though a novel idea, it is difficult to employ in real cameras, since it is hard to sample different regions (blocks) of the scene at different framerates with an off-the-shelf camera. In contrast, the method presented in this paper can be readily incorporated with various existing hardware systems.



2. PROPOSED METHOD

The underlying principle of the proposed method is to determine the temporal compression ratio N_F based on the motion of the scene being sensed. In the following, we propose to estimate the motion of the objects within the scene, to adapt the compression ratio for effective video capture.


Fig. 2. Basic principle of block-matching. Search all the P × P blocks in the window of frame B to find the one best matched with the block in frame A, and use this to compute the block motion.

2.1. Block-Matching Motion Estimation

The block-matching method considered here has been employed in a variety of video codecs ranging from MPEG-1/H.261 to MPEG-4/H.263 [24, 25, 26]. Diverse algorithms [25] have investigated the block-matching concept shown in Fig. 2. The key steps of the block-matching method are reviewed as follows: i) partition frame A (e.g., the previous frame) into P × P (pixel) blocks; ii) pre-define a window size M × M (pixels); iii) search all the P × P blocks in the M × M window in frame B (e.g., the current frame) around the selected block in frame A; and iv) find the best-matching block in the window according to some metric, such as mean squared error, and use it to compute the block motion; a minimal sketch of these steps is given below. We demonstrate adaptive compression ratios based on this motion estimated from reconstructed video frames in Section 3.
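The following is a minimal full-search sketch of steps i)-iv), assuming grayscale frames stored as NumPy arrays and mean squared error as the matching metric; a practical implementation would use a faster search pattern such as the cross-diamond algorithm [27].

```python
import numpy as np

def block_motion(frame_a, frame_b, P=16, M=40):
    """Full-search block matching: per-block (dy, dx) from frame_a to frame_b.

    P: block size (pixels); M: search-window size (pixels), as in the text.
    """
    H, W = frame_a.shape
    r = (M - P) // 2                      # search radius around each block
    motion = {}
    for by in range(0, H - P + 1, P):     # step i): partition frame A into blocks
        for bx in range(0, W - P + 1, P):
            block = frame_a[by:by + P, bx:bx + P]
            best, best_err = (0, 0), np.inf
            for dy in range(-r, r + 1):   # step iii): scan the M x M window
                for dx in range(-r, r + 1):
                    y0, x0 = by + dy, bx + dx
                    if 0 <= y0 <= H - P and 0 <= x0 <= W - P:
                        cand = frame_b[y0:y0 + P, x0:x0 + P]
                        err = np.mean((block - cand) ** 2)  # step iv): MSE metric
                        if err < best_err:
                            best_err, best = err, (dy, dx)
            motion[(by, bx)] = best
    return motion
```

Dividing the largest block displacement by the number of high-speed frames spanned by the two compared frames gives the velocity V (pixels/frame) used in the remainder of this section.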

Estimating motion in high-speed dynamic scenes via the block-matching method on the reconstructed video (after signal recovery) is computationally infeasible given current reconstruction times at even modest compression ratios. Hence, we aim to compute the adaptation of N_F based directly on the raw (compressed) measurements, without the intermediate step of reconstruction. The following section proposes a method to estimate motion solely from the low-framerate, coded measurements of the camera.

2.2. Real-Time Block-Matching Motion Estimation

Estimating motion from the camera-captured data requires the motion to be observable without reconstructing the video frames from the measurement. Fig. 3 presents the underlying principle of the real-time block-matching motion estimation approach.


Fig. 3. Real-time motion estimation by block-matching.

From this figure, it is apparent that the scene's motion is observable within the time-integrated coding structure. This property lets us employ the block-matching method directly on the raw measurements (frames A and B in Fig. 3) to estimate the scene's motion. Adapting the compression ratio N_F online is feasible due to the computational simplicity of this method.


Fig. 4. Segmentation of foreground and background by motion estimation from compressed measurements. Left: the original measurement; middle: the background blocks, with foreground blocks shown in black; right: the foreground blocks, with background blocks shown in black. Note that the aim of this work is to estimate the motion, not segmentation; this preliminary segmentation helps us localize the moving parts of the scene. A 16 × 16 (P = 16) block size is used, and the window size is defined as 40 × 40 (M = 40). The cross-diamond search algorithm [27] has been used to generate this figure and the subsequent results in Section 3.

By thresholding the motion estimated for each block of the measurement, we can also roughly segment the scene into foreground and background (Fig. 4); a minimal sketch is given below. Notably, we adapt N_F solely based on the estimated motion velocity V (pixels/frame) of the fastest-moving blocks of the scene.
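Given the per-block motion vectors (e.g., from the block-matching sketch in Section 2.1), a rough foreground mask follows by thresholding the per-block motion magnitude; the threshold value below is an illustrative assumption, not a parameter specified in the text.

```python
import numpy as np

def segment_foreground(motion, threshold=1.0):
    """Label a block as foreground (True) when its motion magnitude (pixels)
    exceeds `threshold`; `motion` maps (by, bx) -> (dy, dx)."""
    return {pos: np.hypot(dy, dx) > threshold
            for pos, (dy, dx) in motion.items()}
```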

Intuitively, the compression ratio required to faithfully reconstruct the scene's motion is inversely proportional to the detected velocity V, e.g., N_F = C/V, where C is a constant that depends on the scene. In practice, we simply apply a look-up table to (discretely) adapt N_F appropriately with few computations; see Table 1 for an example. Since good hardware exists for motion estimation [24], the proposed method can be implemented in real time.

It is worth noting that the estimated motion, and hence the N_F selected based on the present measurement, is used for the upcoming frames. We assume consistent motion across adjacent frames of the video. Sudden changes in the motion will result in a delay of one integration time in the N_F adaptation; simulation results in Fig. 5 verify this point. We can of course put an upper bound on N_F.

[Fig. 5 graphic: (a) PSNR, adaptive N_F, and estimated velocity vs. high-speed frames; (b) measurement, frames 99-106; (c) measurement, frames 187-202; (d) measurement, frames 295-302.]

Fig. 5. (a) Reconstruction PSNR (dB), adaptive N_F (frames), and velocities (pixels/frame) estimated from the original video and from the measurements, all plotted against frame number. (b-d) Measurements with vehicles at different velocities.

[Fig. 6 graphic: (a) ground truth, frames 1-4 shown as examples; (b) 16 selected reconstructed frames: #1-#4, #121-#124, #201-#204, #301-#304.]

Fig. 6. Selected reconstructed frames (b), based on the adaptive N_F presented in Fig. 5. Frames 1 to 4 in (a) are shown as examples of the ground truth.

V      [0, 0.5)   [0.5, 1)   [1, 2)   [2, 3)   [3, 7)   ≥ 7
N_F       16         12         8        6        4       2

Table 1. Relationship between the velocity V (pixels/frame) of the foreground and the compression ratio N_F (frames).
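Table 1 translates directly into a small look-up function. A sketch, with the bin edges and N_F values taken verbatim from the table:

```python
def adapt_nf(V):
    """Map estimated velocity V (pixels/frame) to compression ratio N_F (frames),
    following Table 1."""
    for edge, nf in [(0.5, 16), (1.0, 12), (2.0, 8), (3.0, 6), (7.0, 4)]:
        if V < edge:
            return nf
    return 2  # V >= 7

# e.g., adapt_nf(0.0) -> 16 for a static scene; adapt_nf(7.5) -> 2
```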


Fig. 7. Reconstruction PSNR (dB), adaptive N_F (frames), and velocities (pixels/frame) estimated from the original and reconstructed video frames, all plotted against frame number.

3. EXPERIMENTAL RESULTS

From [3], we have found (based on extensive simulations) that shifting a fixed mask is as good as using the more sophisticated time-evolving codes used in [1, 2]. For convenience (but not necessity), the subsequent results use a shifted mask to modulate the high-speed video frames.

3.1. Example 1: Synthetic Traffic Video

We illustrate the adaptive compression ratio framework on a traffic video [28] that has 360 frames. We artificially vary the foreground velocity of this video to evaluate the proposed method's performance for motion estimation and N_F adaptation. Frames 1-120 (Fig. 5(b)) and 241-336 (Fig. 5(d)) run at the originally-captured framerate; we freeze the scene between frames 121-240 (Fig. 5(c)). The generalized alternating projection (GAP) algorithm [11] is used for the reconstructions.

Table 1 provides the compression ratio N_F corresponding to several scene velocities V. This look-up table (learned from training data³) seeks to maintain a constant reconstruction peak signal-to-noise ratio (PSNR) of 22 dB. Fig. 5 presents the real-time motion estimation results using simulated low-framerate coded exposures of the traffic video with an initial compression ratio N_F = 6. After a short fluctuation, the estimated velocity of the scene becomes constant; N_F accordingly stabilizes at 8. When the vehicles freeze, the block-matching algorithm senses zero change in the pixel positions and updates N_F to 16. N_F returns to 8 upon continuing video playback at normal speed. We can also observe the consistency of the velocities estimated from the original video and from the compressed measurements in Fig. 5(a). Sudden changes in the video's framerate (and hence the motion velocity V) are reflected in short fluctuations of the PSNR (for one time-integration period) in Fig. 5(a).

³ We use other traffic videos playing at different velocities (different framerates) to learn this table. The main steps are as follows: i) generate videos with different motion velocities by changing the framerate; ii) estimate the motion velocities V of the generated videos; iii) modulate the generated videos with shifting masks and constitute measurements with diverse N_F; iv) reconstruct the videos with GAP [11] from these compressed measurements and calculate the PSNR of the reconstructed videos; and v) build the relation between the estimated velocities V and N_F that maintains a constant PSNR (around 22 dB). A hedged sketch of this procedure is given below.
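In the sketch below, the reconstruct argument stands in for the GAP solver [11] (supplied by the caller), cacti_measurement is the forward-model sketch from Section 1, and the candidate N_F grid is assumed to match Table 1.

```python
import numpy as np

def psnr(x, x_hat, peak=1.0):
    """Peak signal-to-noise ratio in dB, assuming intensities in [0, peak]."""
    return 10.0 * np.log10(peak ** 2 / np.mean((x - x_hat) ** 2))

def learn_lookup_table(training_videos, mask, reconstruct,
                       target_psnr=22.0, candidates=(16, 12, 8, 6, 4, 2)):
    """training_videos: list of (video, V) pairs, V from step ii).
    reconstruct(y, mask, nf) is a CS solver such as GAP [11]."""
    table = []
    for video, V in training_videos:
        chosen = candidates[-1]
        for nf in candidates:                              # try larger N_F first
            y = cacti_measurement(video[:, :, :nf], mask)  # step iii)
            x_hat = reconstruct(y, mask, nf)               # step iv)
            if psnr(video[:, :, :nf], x_hat) >= target_psnr:
                chosen = nf           # largest N_F still meeting ~22 dB
                break
        table.append((V, chosen))     # step v): (velocity, N_F) pairs
    return table
```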

[Fig. 8 graphic: (a) PSNR, adaptive N_F, and estimated velocity vs. high-speed frames; (b) measurement, frames 97-112 (N_F = 16); (c) measurement, frames 237-244 (N_F = 8); (d) measurement, frames 539-544 (N_F = 6); (e) reconstructed frames 539-544 with adaptive N_F, per-frame PSNR 29.62-31.14 dB; (f) reconstructed frames 539-544 with nonadaptive N_F = 10, per-frame PSNR 26.79-27.98 dB.]

Fig. 8. Motion estimation and adaptive N_F from the measurements. (a) Reconstruction PSNR (dB), adaptive N_F (frames) (average adaptive N_F = 10.12), and velocities (pixels/frame) estimated from the measurements, all plotted against frame number. (b-d) Measurements when there is nothing, one person, and a couple moving inside the scene; adapted N_F = 16, 8, and 6, respectively. (e) Reconstructed frames 539-544 from the measurement in (d) with adaptive N_F. (f) Reconstructed frames 539-544 with nonadaptive (constant) N_F = 10.

The average PSNR of the reconstructed frames in Fig. 5 is 21.8 dB, very close to our expectation (22 dB). Fig. 6 presents several reconstructed frames based on the adaptive N_F in Fig. 5.

We additionally evaluate the block-matching algorithm's performance by applying it to the reconstructed frames. Fig. 7 demonstrates behavior similar to the phenomena shown in Fig. 5. This confirms that it is unnecessary to reconstruct each measurement prior to updating N_F.

3.2. Example 2: Realistic Surveillance Video

Fig. 8 implements adaptive N_F on video data captured in front of a shop [29]. Table 1 is again used for this example.

The first 189 frames of this video (Fig. 8(b)) are stationary; nothing is moving within the scene. As seen before, since V = 0, the compression ratio remains at N_F = 16. After the 189th frame, different people begin to walk in and out of the video area (Fig. 8(c-d)). The compression ratio N_F is adapted between 6 and 16 according to the estimated velocity. When one person walks into the shop (Fig. 8(c)), the compression ratio drops (N_F = 8). This results in a better-posed reconstruction of the underlying video frames. When a couple walks in front of the shop (Fig. 8(d)), N_F drops further to 6. The corresponding measurement and reconstructed frames are shown in Fig. 8(d,e).

This video takes a total of 67 adaptive measurements to capture and reconstruct 678 high-speed video frames, achieving a mean compression ratio N_F ≈ 10.12. To demonstrate the utility of adapting N_F based on the sensed data, we compare adaptive reconstructions to those obtained when N_F is fixed at or near its expected value. Fig. 8(f) shows reconstructed frames 539-544 when fixing N_F = 10. Comparing part (e) with part (f), we notice that adapting N_F provides approximately 3 dB higher reconstruction quality (average PSNR = 30.65 dB) than fixing N_F near its expected value (average PSNR = 27.54 dB). These improvements are most noticeable whenever there is motion within the scene, and they demonstrate the potency of temporal compression ratio adaptation in realistic applications.

4. CONCLUSION

We have introduced the concept of adaptive temporal compressive sensing for video and demonstrated a real-time method to adapt the temporal compression ratio for video compressive sensing. By estimating the motion of objects within the scene, we determine how many measurements are necessary to ensure a reasonably well-conditioned estimation of high-speed motion from lower-framerate measurements.

A block-matching algorithm estimates the scene's motion directly from the compressed measurements to obviate real-time reconstruction, thereby significantly reducing the required real-time computational resources. Simulation results have verified the efficacy of the proposed adaptation algorithm. Future work will seek to embed this real-time framework into the hardware prototype.

5. REFERENCES

[1] Y. Hitomi, J. Gu, M. Gupta, T. Mitsunaga, and S. K. Nayar, "Video from a single coded exposure photograph using a learned over-complete dictionary," IEEE International Conference on Computer Vision (ICCV), pp. 287–294, November 2011.

[2] D. Reddy, A. Veeraraghavan, and R. Chellappa, "P2C2: Programmable pixel compressive camera for high speed imaging," IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 329–336, June 2011.

[3] P. Llull, X. Liao, X. Yuan, J. Yang, D. Kittle, L. Carin, G. Sapiro, and D. J. Brady, "Coded aperture compressive temporal imaging," submitted to Optics Express, available online: arXiv:647139.

[4] A. C. Sankaranarayanan, P. K. Turaga, R. G. Baraniuk, and R. Chellappa, "Compressive acquisition of dynamic scenes," 11th European Conference on Computer Vision, Part I, pp. 129–142, September 2010.

[5] A. C. Sankaranarayanan, C. Studer, and R. G. Baraniuk, "CS-MUVI: Video compressive sensing for spatial-multiplexing cameras," IEEE International Conference on Computational Photography, pp. 1–10, April 2012.

[6] M. B. Wakin, J. N. Laska, M. F. Duarte, D. Baron, S. Sarvotham, D. Takhar, K. F. Kelly, and R. G. Baraniuk, "Compressive imaging for video representation and coding," Proceedings of the Picture Coding Symposium, pp. 1–6, April 2006.

[7] M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk, "Single-pixel imaging via compressive sampling," IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 83–91, March 2008.

[8] A. Veeraraghavan, D. Reddy, and R. Raskar, "Coded strobing photography: Compressive sensing of high speed periodic videos," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 4, pp. 671–686, April 2011.

[9] J. A. Tropp and A. C. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Transactions on Information Theory, vol. 53, no. 12, pp. 4655–4666, December 2007.

[10] J. M. Bioucas-Dias and M. A. T. Figueiredo, "A new TwIST: Two-step iterative shrinkage/thresholding algorithms for image restoration," IEEE Transactions on Image Processing, vol. 16, no. 12, pp. 2992–3004, December 2007.

[11] X. Liao, H. Li, and L. Carin, "Generalized alternating projection for weighted-ℓ2,1 minimization with applications to model-based compressive sensing," preprint submitted to SIAM Journal on Imaging Sciences, 2012.

[12] M. Elad, "Optimized projections for compressed sensing," IEEE Transactions on Signal Processing, vol. 55, no. 12, pp. 5695–5702, December 2007.

[13] S. Ji, Y. Xue, and L. Carin, "Bayesian compressive sensing," IEEE Transactions on Signal Processing, vol. 56, no. 6, pp. 2346–2356, June 2008.

[14] W. Carson, M. Chen, M. Rodrigues, R. Calderbank, and L. Carin, "Communications inspired projection design with application to compressive sensing," to appear in SIAM Journal on Imaging Sciences, accepted 2012.

[15] J. M. Duarte-Carvajalino, G. Yu, L. Carin, and G. Sapiro, "Task-driven adaptive statistical compressive sensing of Gaussian mixture models," IEEE Transactions on Signal Processing, vol. 61, no. 3, pp. 585–600, February 2013.

[16] L. Zelnik-Manor, K. Rosenblum, and Y. C. Eldar, "Sensing matrix optimization for block-sparse decoding," IEEE Transactions on Signal Processing, vol. 59, no. 9, pp. 4300–4312, September 2011.

[17] Z. Liu, H. V. Zhao, and A. Y. Elezzabi, "Block-based adaptive compressed sensing for video," IEEE International Conference on Image Processing, pp. 1649–1652, 2010.

[18] Z. Liu, A. Y. Elezzabi, and H. V. Zhao, "Maximum frame rate video acquisition using adaptive compressed sensing," IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, no. 11, pp. 1704–1718, November 2011.

[19] J. Y. Park and M. B. Wakin, "A multiscale framework for compressive sensing of video," Proceedings of the Picture Coding Symposium, pp. 1–4, May 2009.

[20] M. Azghani, A. Aghagolzadeh, and M. Aghagolzadeh, "Compressed video sensing using adaptive sampling rate," International Symposium on Telecommunications, pp. 710–714, 2010.

[21] J. E. Fowler, S. Mun, and E. W. Tramel, "Block-based compressed sensing of images and video," Foundations and Trends in Signal Processing, vol. 4, no. 4, pp. 297–416, 2012.

[22] R. Raskar, A. Agrawal, and J. Tumblin, "Coded exposure photography: motion deblurring using fluttered shutter," ACM Transactions on Graphics, vol. 25, no. 3, p. 795, 2006.

[23] Y. Tendero, J.-M. Morel, and B. Rouge, "The flutter shutter paradox," to appear in SIAM Journal on Imaging Sciences, pp. 1–33, 2013.

[24] C.-H. Hsieh and T.-P. Lin, "VLSI architecture for block-matching motion estimation algorithm," IEEE Transactions on Circuits and Systems for Video Technology, vol. 2, no. 2, pp. 169–175, June 1992.

[25] M. Ezhilarasan and P. Thambidurai, "Simplified block matching algorithm for fast motion estimation in video compression," Journal of Computer Science, vol. 4, no. 4, pp. 282–289, 2008.

[26] D. J. Le Gall, "The MPEG video compression algorithm," Signal Processing: Image Communication, vol. 4, no. 2, pp. 129–140, April 1992.

[27] C.-H. Cheung and L.-M. Po, "A novel cross-diamond search algorithm for fast block motion estimation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 12, pp. 1168–1177, December 2002.

[28] http://projects.cwi.nl/dyntex/database_pro.html

[29] http://i21www.ira.uka.de/image_sequences/