
Optical Engineering 49(9), 097006 (September 2010)

Robust high-definition video watermarking based on self-synchronizing signals against composite distortions

Tae-Woo Oh
Min-Jeong Lee
Kyung-Su Kim
Heung-Kyu Lee
Korea Advanced Institute of Science and Technology
Department of Computer Science
Daejeon, 305-701, Republic of Korea
E-mail: [email protected]

Hae-Yeoun Lee
Kumoh National Institute of Technology
School of Computer and Software Engineering
Daejeon, 305-701, Republic of Korea

Abstract. This paper proposes a high-definition video watermarking method robust against compound geometrical attacks by camcorder capture as well as common video processing such as frame-rate conversion and transcoding. Unlike traditional watermarking systems using the original reference pattern, the proposed method exploits the reference pattern estimated from the watermarked video. Since the reference pattern and watermark pattern are embedded in the same spatial position of each frame, the two patterns estimated from the embedded video are always in spatial sync. Thus, the watermark information can be simply detected without an additional synchronizing step, even if geometrical distortions happen to the marked video. Also, the problem of misestimation of the reference pattern caused by temporal video clipping is solved using the proposed two-pass detector. Extensive experiments prove that the proposed method is robust against various temporal and spatial distortions. © 2010 Society of Photo-Optical Instrumentation Engineers. [DOI: 10.1117/1.3488053]

Subject terms: digital rights management; video watermarking; geometrical attack; camcorder capture; self-synchronizing signal.

Paper 100195RR received Mar. 11, 2010; revised manuscript received Jul. 25, 2010; accepted for publication Jul. 28, 2010; published online Sep. 17, 2010.

1 Introduction

The qualitative and quantitative growth of the digital video industry is largely based on the development of network technology and the spread of various display devices such as high-resolution LCD TVs, portable multimedia players, and high-performance cell phones. At the same time, pirates copy and distribute digital video contents ever more easily and faithfully. Since these illegal acts cause great financial loss to content providers and the market, the role of digital rights management (DRM) systems is important. Video watermarking as a DRM technique has come into the spotlight as a way to prevent illegal copying and distribution of digital contents. The principle of video watermarking is to embed a secret signal into original video content. The inserted watermark signal represents copyright information, which typically indicates the provider or owner. The watermark embedded in individual copies of the video allows illegally reproduced copies to be traced back to the receiver from which they originated.

A watermarked video undergoes various manipulations to suit different multimedia applications or devices: frame-rate conversion, bit-rate change, scaling, transcoding to various formats, and cropping. The inserted watermark can be removed or destroyed by these manipulations, so they are treated as attacks against the watermarked video. In particular, since geometrical distortions such as rotation, scaling, translation, and projection destroy the synchronization needed for watermark detection, they are the most difficult attacks to handle in a blind watermarking system. In recent times, the spread of high-performance camcorders has made illegal video capture easy.
When a video is captured by a camcorder, mixed geometrical distortions are applied to the captured video in combination with digital-to-analog (DA) and analog-to-digital (AD) conversion. This makes the spatial synchronization problem more difficult.

For still images, there have been various watermarking proposals for robustness against geometrical attacks. These schemes are classified into four categories: embedding in a geometrically invariant domain,1,2 inserting synchronization marks such as templates,3,4 exploiting periodic watermark patterns,5-7 and using image features such as corners, edges, texture, and the segmented object.8-11 However, these schemes are not suited for high-definition (HD) video content, because of their high computational complexity and limited robustness against compound geometrical and temporal attacks.

There are also some video watermarking systems that solve the spatial synchronization problem under geometrical attacks. Leest et al.12 proposed a video watermarking scheme using the mean luminance value of each frame on the temporal axis as a domain insensitive to spatial desynchronization. The watermark is embedded by increasing or decreasing the mean luminance and detected by estimating (after temporally lowpass filtering) the mean luminance of the watermarked video frames. Although this approach is invariant under geometrical attacks, it is susceptible to temporal distortions such as frame dropping and frame-rate change. Lee et al.13 proposed a method based on the local autocorrelation function (LACF) to handle perspective distortion as well as affine transformations.



The method estimates the geometrical distortion parameters by using the LACF. The watermark is detected after restoring the watermark pattern by exploiting the parameters. However, it is difficult to use peaks of the LACF, because the peaks are not clear after several signal-processing attacks. Also, this scheme is still not robust to mixed geometrical distortions. Coria et al.14 embedded the watermark pattern into the dual-tree complex wavelet transform (DT CWT) domain, which has perfect reconstruction, shift invariance, and good directional selectivity. In this scheme, the DT CWT coefficients of the watermark pattern are embedded into the DT CWT coefficients of the level-3 and level-4 subbands of the host content. Although the embedded watermark is robust to some mild geometrical attacks such as weak scaling, cropping, and rotation, the scheme still has a problem with relatively severe or compound geometrical attacks.

In this paper, we propose a practical video watermarking scheme robust to mixed geometrical attacks as well as common video-processing attacks such as frame-rate conversion and transcoding to various formats. Unlike traditional video watermarking systems using the original reference pattern, the proposed method exploits the estimated reference pattern for detection based on correlation with watermark patterns. The reference pattern is consecutively embedded by the spread-spectrum method15,16 in the preset frames of a video and estimated directly from the watermarked video. Since the estimated reference pattern is always self-synchronized with watermark patterns, the detector simply identifies the watermark elements without an additional spatial synchronization step. In the proposed method, the problem of misestimation of the reference pattern caused by temporal desynchronization is solved by a two-pass detector.

Fig. 1 Traditional watermarking system versus proposed watermarking system: (a) the detection method of the traditional watermarking system, (b) detection by the traditional watermarking system for a geometrically attacked frame, and (c) detection by the proposed watermarking system for a geometrically attacked frame.

The paper is organized as follows. In Sec. 2, we describe the problems of traditional methods and introduce the basic idea of the proposed method. In Sec. 3, we explain the embedding procedure and the detection procedure, including the one-pass detector for temporally synchronized videos and the two-pass detector for temporally desynchronized videos. Section 4 analyzes the error probability of our method. Experimental results are presented in Sec. 5, and Sec. 6 concludes.

2 Problem Statement and Basic Idea

The correlation-based traditional watermarking systems13,17,18 verify the watermark information through the correlation between the estimated watermark pattern and the original reference pattern, as shown in Fig. 1(a). If the watermarked frame is geometrically distorted, however, the estimated watermark pattern becomes spatially asynchronous with the reference pattern. This makes it impossible to get the correct correlation results, as described in Fig. 1(b).

We propose a new concept for a watermarking system to solve the problem caused by geometrical attacks. Unlike the traditional watermarking systems, the proposed method uses the estimated reference pattern instead of the original reference pattern. The proposed method estimates the reference pattern as well as the watermark pattern from the watermarked video. In the embedding process, the reference pattern is additively embedded in the spatial domain of video frames. Also, the watermark pattern is embedded in other frames in the same way. Since the spatial coordinates of both patterns in the different frames are the same, the two patterns estimated from the watermarked video are always in spatial sync. If the watermarked video is geometrically attacked, the reference pattern and the watermark pattern are equally distorted, because all frames of the geometrically attacked video generally go through the same spatial distortion. Thus, the proposed method can calculate the correlation between the two patterns without an additional synchronization step, as shown in Fig. 1(c). We call the pair of patterns a self-synchronizing signal.

The performance of the proposed method depends on the accuracy of the estimated reference pattern. In general, the reference pattern estimated from one frame is not similar to the original reference pattern. This imprecise reference pattern decreases the performance of the proposed method. To solve this problem, we introduce a redundant embedding and cumulative detecting system.17,18 This system repetitively embeds the pattern in consecutive frames and estimates it cumulatively. The longer the accumulation interval of the estimated pattern is, the more precisely the embedded pattern is picked up. Figure 2 shows an experimental result confirming that expectation. The graph represents the results of the normalized correlation between the original embedded pattern and the pattern estimated cumulatively from an embedded video using an adaptive Wiener filter.19 As shown in Fig. 2, redundantly accumulating the estimated patterns makes it possible to obtain a more exact reference pattern.

Fig. 2 The result of normalized correlation between the embedded pattern and the cumulatively estimated pattern.

Fig. 3 The embedding outline of the reference pattern and watermark pattern (white square: frame; gray square: embedded pattern).
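To make the effect summarized in Fig. 2 concrete, the following minimal simulation (not the authors' code) accumulates per-frame Wiener-filter residuals of synthetic marked frames and reports the normalized correlation with the embedded pattern; the frame model, the embedding strength of 2.0, and the use of scipy.signal.wiener are illustrative assumptions.

```python
import numpy as np
from scipy.signal import wiener

rng = np.random.default_rng(0)
M = 64
pattern = rng.standard_normal((M, M))        # embedded reference pattern n

def estimate_from_frame(frame):
    # Wiener-denoise the frame and keep the residual as the pattern estimate
    return frame - wiener(frame, mysize=3)

def normalized_correlation(a, b):
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))

accumulated = np.zeros((M, M))
for k in range(1, 301):                      # 300 consecutive marked "frames"
    host = rng.normal(128.0, 20.0, (M, M))   # stand-in for frame content
    marked = host + 2.0 * pattern            # additive embedding of n
    accumulated += estimate_from_frame(marked)
    if k in (1, 10, 100, 300):
        print(k, round(normalized_correlation(accumulated, pattern), 3))
```

The correlation grows steadily with the accumulation length, which is the behavior Fig. 2 reports for the watermarked video.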

3 Watermarking System

3.1 The Basic Description of the Proposed Watermarking Scheme

3.1.1 The design of the watermark

In our method, a watermark sequence signifies copyright information and consists of 1's and −1's, which are generated by a secret-key-dependent pseudorandom number generator based on a standard normal distribution. Each 1 or −1 is called a watermark element and is represented by a pattern having positive or negative correlation with the reference pattern. We denote the reference pattern by n; it is a 2-D pattern that follows a standard normal distribution. If the watermark element to be inserted is 1, the watermark 2-D pattern is embedded as n, to have a positive relation with the reference pattern. If −1 is inserted, the watermark pattern is −n, to have the negative relation. Figure 3 shows the temporal order of the frames in which the reference pattern and the watermark pattern are embedded. Since our scheme is based on repetitive embedding and cumulative estimating, as mentioned in Sec. 2, the reference pattern and watermark patterns are repetitively embedded in the succeeding frames during the predefined intervals. The insertion intervals (in seconds) of the reference pattern and watermark pattern are denoted by u and v, respectively.



The reference pattern n is embedded in the spatial domain of temporally consecutive frames during the u-second interval. Then, a watermark pattern is embedded during the other predefined time interval, v seconds. The watermark pattern is renewed every v seconds according to the watermark elements. One watermarking section consists of the reference pattern interval and all the watermark pattern intervals. In a video having a long running time, the watermarking section is periodically repeated. The details of the watermarking process are described in Secs. 3.2 and 3.3.

3.1.2 The payload and region of interest of watermarking

Figure 3 shows the case including one pattern for each frame. The payload of watermark elements can be increased by inserting several patterns in each frame. Figure 4 represents the regions of interest (ROIs) for the cases having four and sixteen patterns per frame. A white square means the luminance frame of a video, and the gray and black squares are the embedding and detecting ROIs of the frame, respectively. In the case of four patterns per frame, the pattern n is preferentially embedded as the reference pattern in the four ROIs of each frame for u seconds. Then, if the watermark elements to be embedded are 1, −1, −1, 1, −1, 1, 1, 1, each frame for the first v seconds includes watermark patterns n, −n, −n, n, and for the next v seconds includes watermark patterns −n, n, n, n in the embedding ROIs. The order of the ROIs is the raster-scan order.

In the proposed method, only a part of the embedded pattern is exploited for the detection. In other words, the detecting ROI is smaller than the embedding ROI, as shown in Fig. 4. In each ROI, since the reference pattern and watermark patterns are either positively or negatively correlated and are embedded at the same spatial coordinates in different frames, we can correctly obtain the correlation result in spite of using only part of the pattern. This structure of the ROIs makes it possible to get stable results even if geometrical distortions occur and to reduce the detection processing time. Figures 4(b) and 4(d) show that the part of the embedded pattern is stably preserved in the detection ROI even if the watermarked frame is geometrically distorted.

Fig. 4 Embedding ROIs (gray squares) and detecting ROIs (black squares): (a) the case of four patterns per frame, (b) mixed geometrical distortion of (a), (c) the case of sixteen patterns per frame, and (d) mixed geometrical distortion of (c).

3.2 Watermark Embedding

Figure 5 illustrates the proposed embedding procedure. The input of the process is the luminance channel of the video frames. Before embedding, the intervals u and v of the reference pattern and watermark pattern should be determined. If the intervals are increased, the cumulatively estimated patterns become more exact, but the payload length of the watermark sequence may be decreased. The details of the embedding procedure are as follows.

Fig. 5 Watermark embedding procedure. (The number of patterns per frame is 4.)

Step 1: the generation of the watermark sequence. A secret-key-dependent watermark sequence W = {w(t) | w(t) ∈ {−1, 1}, t = 0, …, l − 1}, where l is the number of watermark elements, is produced as the watermark information to be embedded in a video. The binary watermark sequence W is randomly generated by a pseudorandom number generator following a standard normal distribution N(0, 1). Thus, the proportions of 1 and −1 in the generated sequence W are similar.
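A minimal sketch of this step (not the authors' implementation): the ±1 values are obtained by taking the signs of draws from a key-seeded standard-normal generator, so the two values occur in roughly equal proportion. The function name and the use of NumPy's Generator are our assumptions.

```python
import numpy as np

def generate_watermark_sequence(secret_key: int, l: int) -> np.ndarray:
    """Key-dependent binary watermark sequence W with elements in {-1, +1}."""
    rng = np.random.default_rng(secret_key)
    # draws follow N(0, 1); their signs give a roughly balanced +/-1 sequence
    return np.where(rng.standard_normal(l) >= 0.0, 1, -1).astype(np.int8)

W = generate_watermark_sequence(secret_key=2010, l=1024)  # 1024 elements as in Sec. 5
print(W[:8], float(W.mean()))
```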

Step 2: the generation of the basic pattern. A basic pattern n to be used as the reference and watermark patterns is based on a 2-D rectangular signal, which is of size M × M and follows a standard normal distribution.


The height and the width of the 2-D signal are extended by a factor β, and consequently the basic pattern n of size βM × βM is generated. The extension of the 2-D pattern enhances the power of the low frequencies in the signal, so that it improves the robustness against DA and AD conversion distortion as well as various signal-processing attacks.
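The paper does not state how the extension is implemented; the sketch below (ours, not the authors') uses simple pixel replication, one straightforward way to shift energy toward low frequencies, and chooses M = 64 and β = 4 so that the resulting pattern matches the 256 × 256 embedding ROI used in Sec. 5.

```python
import numpy as np

def generate_basic_pattern(secret_key: int, M: int = 64, beta: int = 4) -> np.ndarray:
    """M x M standard-normal signal, extended by beta in each dimension by
    pixel replication, yielding a (beta*M) x (beta*M) basic pattern n whose
    energy is concentrated at lower spatial frequencies."""
    rng = np.random.default_rng(secret_key)
    base = rng.standard_normal((M, M))
    return np.kron(base, np.ones((beta, beta)))

n = generate_basic_pattern(secret_key=2010)
print(n.shape)   # (256, 256)
```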

Step 3: the insertion of the reference patterns after perceptual modeling. In this step, the reference patterns are embedded into the ROIs of the frame for u seconds, the predefined interval. If the number of patterns per frame is p, then p reference patterns are respectively embedded into the ROIs. In Fig. 5, p is 4. Since the insertion of the patterns should not lower the perceptual quality, a perceptual modeling process is performed. The perceptual constraint has a bearing on the embedding strength because of the trade-off between the robustness and the imperceptibility of the pattern. We calculate a local weighting factor α, which controls the insertion strength, by using a perceptual masking model based on the human visual system (HVS). The proposed method uses a local weighting factor based on the noise visibility function (NVF).20 The local weighting factor is given by

\alpha_k(i, j, m) = [1 - \mathrm{nvf}_k(i, j, m)] \cdot S_0 + \mathrm{nvf}_k(i, j, m) \cdot S_1,  (1)

where S_0 is the upper bound of visibility in edged and textured regions, and S_1 is the lower bound in flat and smooth regions. The variables (i, j) and k are the spatial coordinates within each embedding ROI and the index of the frame, respectively. The index of each ROI in raster-scan order is denoted by m. When the number of patterns per frame is p, we have 0 ≤ m ≤ p − 1. The NVF is given by

\mathrm{nvf}_k(i, j, m) = \frac{1}{1 + (D/\sigma_{k,\max}^2) \cdot \sigma_k^2(i, j, m)},  (2)

where σ_k²(·) is the local variance calculated from pixel values in a sliding 3×3 window, σ²_{k,max} is the maximum of the local variances, and D is a scaling constant empirically set to 150. The p reference patterns are embedded in an additive way into the frame luminance ROIs, after modulating the basic pattern n by the NVF-based local weighting factor. The embedding process of each reference pattern is represented by

I_k'(i, j, m) = I_k(i, j, m) + \alpha_k(i, j, m) \cdot n(i, j),  (3)

where I_k(·, ·, m) is the m'th ROI in the k'th frame, and I_k'(·, ·, m) is the watermarked ROI.
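The following sketch implements Eqs. (1)-(3) under our reading of them: the 3 × 3 local statistics are computed with a uniform filter, D = 150 and (S0, S1) = (6.5, 1.0) are the values given in the text and Sec. 5, and the helper names are ours, not the authors'.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_variance(roi: np.ndarray, size: int = 3) -> np.ndarray:
    mean = uniform_filter(roi, size)
    mean_sq = uniform_filter(roi * roi, size)
    return np.maximum(mean_sq - mean * mean, 0.0)

def nvf(roi: np.ndarray, D: float = 150.0) -> np.ndarray:
    var = local_variance(roi)                                # sigma_k^2(i, j, m)
    return 1.0 / (1.0 + (D / max(var.max(), 1e-9)) * var)    # Eq. (2)

def embed_reference(roi: np.ndarray, n: np.ndarray,
                    S0: float = 6.5, S1: float = 1.0) -> np.ndarray:
    v = nvf(roi)
    alpha = (1.0 - v) * S0 + v * S1    # Eq. (1): strong in texture, weak in flat areas
    return roi + alpha * n             # Eq. (3): additive spatial embedding

# example: mark one 256 x 256 luminance ROI with a basic pattern n
rng = np.random.default_rng(1)
roi = rng.normal(128.0, 25.0, (256, 256))
n = np.kron(rng.standard_normal((64, 64)), np.ones((4, 4)))
marked_roi = embed_reference(roi, n)
```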

Step 4: the insertion of the watermark patterns after perceptual modeling. After embedding the reference patterns, the watermark patterns are embedded according to the element values of the watermark sequence W. If the watermark element to be embedded is 1, the basic pattern n is exploited; otherwise, −n is exploited. The ROIs of the watermark patterns are equal to those of the reference patterns. Like the embedding of the reference pattern, the perceptual modeling process is first performed on each ROI to get the local weighting factor. Then, the pattern modulated by the local weighting factor is added to each ROI of the frame as follows:

I_k'(i, j, m) = I_k(i, j, m) + \alpha_k(i, j, m) \cdot n(i, j) \cdot w(t),  (4)

where t is the index of the watermark element, t = ⌊[k mod(ux + lvx) − ux]/(vx)⌋ · p + m, and x is the frame rate (frames per second) of the video to be watermarked. This process is performed for v seconds, and then the process regarding the next p watermark elements is repeated every v seconds until the end of the watermark sequence.


After the insertion of all the watermark elements is done, step 3 and step 4 are performed again to embed the reference patterns and the watermark patterns in the next watermarking section. The u, v, and p values are sent to the detector as side information after the embedding process is finished.
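As a sketch of how an embedder or detector can locate itself inside the periodic watermarking section, the helper below maps a frame index to either the reference interval or the index of the first watermark element carried by that frame (its p ROIs then carry elements t, t+1, …, t+p−1 in raster-scan order). It is our illustration, not the authors' code; the section length u + (l/p)·v seconds follows the worked example in Sec. 5 (40 + (1024/16)·2 = 168 s).

```python
def watermark_element_index(k: int, u: float, v: float, p: int, l: int, x: float):
    """Return None while frame k lies in the reference-pattern interval,
    otherwise the index t of the first watermark element in that frame."""
    section_frames = int(round((u + (l / p) * v) * x))   # one watermarking section
    k_in_section = k % section_frames
    if k_in_section < u * x:                             # reference interval
        return None
    interval = int((k_in_section - u * x) // (v * x))    # cf. Eq. (4)
    return interval * p                                  # t for ROI m = 0

# parameters of Sec. 5: u = 40 s, v = 2 s, p = 16, l = 1024 elements, 30 frames/s
for k in (0, 1199, 1200, 1260, 5039):
    print(k, watermark_element_index(k, u=40, v=2, p=16, l=1024, x=30))
```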

3.3 Watermark Detection

We propose two watermark detectors. First, a one-pass detector is used in the case of a temporally unimpaired video. The one-pass detection is performed in real time. If a watermarked video is temporally clipped by a video editing tool or camcorder capture, the exact reference pattern may not be obtained by the one-pass algorithm, and consequently the watermark detection may fail because of the loss of temporal sync. To solve this problem, a two-pass detector is proposed.

3.3.1 One-pass detector

Figure 6 depicts the process of the one-pass detector. The input ROIs of this procedure are the partial areas of the watermarked ROIs of the frame luminance channel, as mentioned in Sec. 3.1.2. The detector knows the pattern intervals (u and v) and the number of patterns per frame (p) from the side information. The detection procedure is as follows.

Fig. 6 One-pass watermark detection procedure. (The number of patterns per frame is 4.)

Step 1: the estimation of the reference patterns. The reference patterns are estimated during the reference-pattern interval, u seconds. If the number of patterns per frame is p, then p reference patterns are respectively estimated from the ROIs. In Fig. 6, p is 4. First, a denoising process is performed on the ROIs in every frame. Since the reference patterns are embedded like invisible noise by the additive method, we apply a denoising filter to estimate the part of the embedded reference pattern by calculating the difference between the detection ROI and its denoising-filtered result. In this paper, an adaptive Wiener filter19 is used as the denoising filter. The estimated pattern S' is calculated by

S_k'(i', j', m) = \frac{s^2}{\sigma_k^2(i', j', m) + s^2}\,[I_k(i', j', m) - \mu_k(i', j', m)],  (5)

where I_k(·, ·, m) is the m'th detection ROI in the k'th frame of the watermarked video, which has gone through a corrupted channel, and 0 ≤ m ≤ p − 1. The functions μ_k(·) and σ_k²(·) are the local mean and the local variance in a 3×3 sliding window, respectively; s² is the noise variance, but it is replaced with the mean value of σ_k²(·) because the actual value is not available in the detection.21 The pattern S_k'(·, ·, m) estimated from each ROI of the frame is accumulated during the reference interval (u seconds) to increase the accuracy of the estimated patterns, as mentioned in Sec. 2. This accumulation is performed in each ROI and produces the p reference patterns E_r(·, ·, 0), …, E_r(·, ·, p − 1).
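A compact sketch of the Eq. (5) estimator and the accumulation over an interval, assuming 3 × 3 local statistics from a uniform filter and the mean local variance as the noise power s², as stated in the text; this is an illustration rather than the authors' implementation.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def estimate_pattern(detect_roi: np.ndarray) -> np.ndarray:
    """Per-frame pattern estimate S'_k of Eq. (5) for one detection ROI."""
    mu = uniform_filter(detect_roi, 3)                                  # local mean
    var = np.maximum(uniform_filter(detect_roi ** 2, 3) - mu ** 2, 0.0) # local variance
    s2 = var.mean()                                                     # stand-in for the noise variance
    return (s2 / (var + s2)) * (detect_roi - mu)

def accumulate(rois) -> np.ndarray:
    """Accumulate per-frame estimates over an interval (reference or watermark)."""
    acc = np.zeros_like(rois[0], dtype=float)
    for roi in rois:
        acc += estimate_pattern(roi)
    return acc   # E_r(., ., m) or E_w(., ., m) for one ROI index m
```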

Step 2: the estimation of the watermark patterns. After obtaining the reference patterns, the process for the watermark patterns is started. Like the estimates of the reference patterns, the watermark patterns E_w(·, ·, m), where 0 ≤ m ≤ p − 1, are obtained by accumulating the estimated patterns for v seconds. The estimation is performed for every frame by the adaptive Wiener filter, too. The estimating ROIs of the watermark patterns are the same as those of the reference patterns.

Step 3: the determination of the watermark elements. In order to determine whether the estimated watermark pattern represents 1 or −1, the normalized correlation between E_r(·, ·, m) and E_w(·, ·, m) is calculated as follows:


NC_t = \frac{\sum_{i'=0}^{H-1} \sum_{j'=0}^{L-1} E_r(i', j', m)\, E_w(i', j', m)}{\left[\sum_{i'=0}^{H-1} \sum_{j'=0}^{L-1} E_r(i', j', m)^2\right]^{1/2} \left[\sum_{i'=0}^{H-1} \sum_{j'=0}^{L-1} E_w(i', j', m)^2\right]^{1/2}},  (6)

where L and H denote the width and height of each ROI square, respectively. Also, t is the index of the watermark element, and t = ⌊[k mod(ux' + lvx') − ux']/(vx')⌋ · p + m, where l and x' are the number of inserted watermark elements and the frame rate of the video to be detected, respectively. If the normalized correlation value is larger or smaller than a preset threshold, the watermark element is determined as follows:

w'(t) = \begin{cases} -1 & \text{if } NC_t \le -\tau_1, \\ 1 & \text{if } NC_t \ge \tau_1, \\ 0 & \text{otherwise}, \end{cases}  (7)

where w'(·) is the determined watermark element and τ_1 is the preset threshold depending on the false-positive error rate. This step outputs p watermark elements.
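A direct transcription of Eqs. (6) and (7); the default τ_1 = 0.089 is the value derived in Sec. 4 for a 64 × 64 (N = 4096) detection ROI and a 10⁻⁸ per-element false-positive rate. Function names are ours.

```python
import numpy as np

def normalized_correlation(Er: np.ndarray, Ew: np.ndarray) -> float:
    """Eq. (6): normalized correlation of the two estimated patterns."""
    return float(np.sum(Er * Ew) /
                 (np.sqrt(np.sum(Er ** 2)) * np.sqrt(np.sum(Ew ** 2))))

def decide_element(nc: float, tau1: float = 0.089) -> int:
    """Eq. (7): map the correlation value to a watermark element in {-1, 0, 1}."""
    if nc >= tau1:
        return 1
    if nc <= -tau1:
        return -1
    return 0
```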

Step 4: enhancing the accuracy of the reference patterns. According to the determined watermark elements w'(·), the accuracy of the reference patterns to be correlated with the watermark patterns of the next watermark pattern interval can be improved by accumulation, after controlling the sign of E_w(·, ·, m), by the following equation:

E_r^{\mathrm{next}}(i', j', m) = \begin{cases} E_r(i', j', m) + E_w(i', j', m) & \text{if } w'(t) = 1, \\ E_r(i', j', m) - E_w(i', j', m) & \text{if } w'(t) = -1, \\ E_r(i', j', m) & \text{if } w'(t) = 0, \end{cases}  (8)

where 0 ≤ m ≤ p − 1. Since the watermark patterns are designed to have positive or negative correlation with the reference patterns, this step works efficiently.

After this step, steps 2 to 4 are performed again for the next p watermark elements in the next frames for v seconds. These operations are repeated until one obtains the entire watermark sequence W', which is the result of concatenating the determined watermark elements.
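Because the decided element is −1, 0, or 1, the three cases of the step-4 update in Eq. (8) collapse into a single signed accumulation; a one-line sketch (ours):

```python
def refine_reference(Er, Ew, w_t: int):
    """Eq. (8): fold the decided watermark estimate back into the reference;
    w_t in {-1, 0, 1} subtracts, skips, or adds the contribution accordingly."""
    return Er + w_t * Ew
```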

Step 5: the verification of the watermark sequence. When we obtain the entire watermark sequence W', the detector finally decides whether watermark information exists or not. The final decision is performed by normalized cross correlation between the original watermark sequence W and the extracted watermark sequence W'. This process is carried out with reduced time complexity by

\mathrm{NCC} = \frac{\mathrm{IFFT}\{\mathrm{FFT}(W) \cdot \mathrm{FFT}(W')^{*}\}}{\|W\| \cdot \|W'\|},  (9)

where * denotes complex conjugation. If the maximum value of the normalized cross correlation exceeds an adaptive threshold τ_2, we conclude that watermark information has been inserted into the video. The analysis of the error probability with respect to the thresholds τ_1 and τ_2 is described in Sec. 4.
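A sketch of the Eq. (9) verification via the FFT (our code, not the authors'); the adaptive threshold is modeled here as the mean plus 5.61 standard deviations of the NCC output, following the reconstruction of Sec. 4.

```python
import numpy as np

def circular_ncc(W, W_ext) -> np.ndarray:
    """Eq. (9): circular cross-correlation of W and W' computed with the FFT,
    normalized by the product of the sequence norms."""
    W = np.asarray(W, dtype=float)
    W_ext = np.asarray(W_ext, dtype=float)
    corr = np.real(np.fft.ifft(np.fft.fft(W) * np.conj(np.fft.fft(W_ext))))
    return corr / (np.linalg.norm(W) * np.linalg.norm(W_ext))

def watermark_present(W, W_ext, k_sigma: float = 5.61) -> bool:
    ncc = circular_ncc(W, W_ext)
    tau2 = ncc.mean() + k_sigma * ncc.std()   # adaptive threshold (cf. Sec. 4)
    return bool(ncc.max() > tau2)
```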

3.3.2 Two-pass detector

If a watermarked video is partially clipped, the one-pass detector may be confused and fail to detect, because the temporally inconsistent sync of the detection intervals leads to inaccurate estimation of the reference pattern. The two-pass detector solves this problem. Figure 7 describes its algorithm.

Fig. 7 Two-pass watermark detection procedure. (The number of patterns per frame is 4.)

First pass. In the first pass, the patterns estimated from all frames of a clipped video are accumulated separately in each detecting ROI in order to obtain the p reference patterns. Since the inserted watermark sequence, composed of 1's and −1's, is based on a key-dependent pseudorandom-number generator following a standard normal distribution, the proportions of the n and −n patterns embedded in the watermark pattern intervals are similar, and consequently the summation of the patterns estimated in each ROI of those intervals is nearly zero. In other words, the patterns estimated from the watermark pattern intervals offset each other, and only the patterns estimated from the reference pattern interval remain. Thus, when the number of patterns per frame is p, we can obtain the p reference patterns by simply accumulating all the patterns estimated from each ROI in all the frames of the clipped video. The process for the reference pattern of the m'th ROI can be mathematically represented by Eq. (10), where n is the number of all the frames in the clipped video, and it is divided into a and b, which are the numbers of frames in the reference and watermark pattern intervals, respectively. The functions S_r and S_w respectively denote the patterns estimated from the reference and watermark pattern intervals by the adaptive Wiener filter. Also, b is divided into b_1 and b_2, which are respectively the numbers of the positive patterns and the negative patterns in the watermark pattern intervals. The first pass produces the p reference patterns E_r(·, ·, 0), …, E_r(·, ·, p − 1).
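A sketch of the first pass for one ROI index m (our illustration): every frame of the clipped video is run through the same per-frame estimator used in Sec. 3.3.1, e.g. the Eq. (5) residual, and the estimates are simply summed, so the ±n watermark contributions largely cancel and the reference pattern survives.

```python
import numpy as np

def first_pass_reference(detect_rois_all_frames, estimate) -> np.ndarray:
    """Accumulate per-frame estimates of ROI m over the whole clipped video.

    `estimate` is the per-frame estimator (e.g. the Eq. (5) Wiener residual);
    the result approximates E_r(., ., m) because the +n and -n watermark
    intervals appear in nearly equal numbers and offset each other."""
    acc = np.zeros_like(detect_rois_all_frames[0], dtype=float)
    for roi in detect_rois_all_frames:
        acc += estimate(roi)
    return acc
```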

Second pass. The second pass is performed again from the starting position of the video clip. Since the reference patterns are obtained in the first pass, the determination of the watermark elements is repetitively performed in this pass. The process is the same as in step 3 of the one-pass detector. The only difference is that the determined elements may include the results (consecutive 1's) obtained from the reference interval. After the determination, step 4 of the one-pass detector is also carried out in order to enhance the accuracy of the reference patterns. When the entire watermark sequence W' is obtained from the clipped video, the verification of the watermark sequence is performed, based on cross correlation, as in step 5 of the one-pass detector. If the clipped video contains several watermarking sections, the result of the cross correlation is several periodic peaks.

In the two-pass detector, since the estimated reference patterns in the first pass are always self-synchronized with the estimated watermark patterns in the second pass, the detector is robust against geometrical distortions, like the one-pass detector. Real-time use of the two-pass detector, however, is impossible because it scans the video twice. Thus, in a practical detection scenario, the one-pass detector is preferentially exploited in real time. If the one-pass detector fails to detect due to the temporal sync problem, the two-pass detector is utilized to compensate for the failure.

4 Error Probability Analysis

Since the proposed detector is correlation-based, the error probability of detection depends on the selected threshold. In order to minimize the watermark detection error, the threshold should be chosen cautiously. There are two kinds of errors: false-positive errors and false-negative errors. A false-positive error occurs when the watermark detector indicates the presence of a watermark in a video without a watermark, and a false-negative error occurs when the detector fails to retrieve the inserted watermark from a watermarked video. It is common to select the threshold on the basis of the false-positive error, because the analysis of the false-negative error is very difficult owing to the variety of attacks on watermarked videos.

In the proposed method, we exploit two correlation operations: the normalized correlation for the watermark elements and the normalized cross correlation for the final watermark sequence. First, in order to compute the error probability for the determination of a watermark element, an approximate Gaussian method22 is employed. This scheme assumes that the distribution of the normalized correlation values obtained from unwatermarked videos follows a Gaussian model. In order to estimate the distribution of the normalized correlation results of Eq. (6), we tested our detector on unwatermarked videos. The result is plotted in Fig. 8(a), and it closely follows a Gaussian distribution model. Since the proposed detector decides whether the embedded watermark element is 1 or −1, the detector has two opportunities to obtain a false positive. Thus, we can approximate the error probability as

P_{fp1} \approx \int_{-\infty}^{-\tau_{nc}\sqrt{N}} \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2}\right) dx + \int_{\tau_{nc}\sqrt{N}}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2}\right) dx = 2 \int_{\tau_{nc}\sqrt{N}}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2}\right) dx = \frac{2}{\sqrt{\pi}} \int_{\tau_{nc}\sqrt{N}/\sqrt{2}}^{\infty} \exp(-x^2)\, dx = \mathrm{erfc}\!\left(\frac{\tau_{nc}\sqrt{N}}{\sqrt{2}}\right),  (11)

where τ_nc is the normalized correlation threshold and N is the 1-D size of the watermark pattern. When N is 4096 and τ_nc is 0.089, the false-positive error probability of the watermark element detector is about 10⁻⁸.
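The closed form of Eq. (11) can be checked numerically; for the parameters quoted in the text, the probability indeed comes out near 10⁻⁸.

```python
import math

def p_fp_element(tau_nc: float, N: int) -> float:
    """Eq. (11): per-element false-positive probability for threshold tau_nc
    and an N-sample detection pattern (here N = 64 * 64 = 4096)."""
    return math.erfc(tau_nc * math.sqrt(N) / math.sqrt(2.0))

print(p_fp_element(0.089, 4096))   # about 1.2e-8
```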

The second correlation is performed for the verification of the watermark sequence by Eq. (9). Since the normalized cross correlation is the set of the inner products of two random-variable vectors with circular shifts, the result has a Gaussian distribution model by the central limit theorem. Figure 8(b) is the distribution of the normalized cross correlation between a reference sequence and a watermark sequence estimated from an unwatermarked video. The figure shows that the result follows the Gaussian distribution. Unlike the first normalized correlation, whose threshold is fixed, the threshold in the second case is obtained adaptively from the output of the normalized cross correlation. By the Gaussian distribution model, the false-positive error probability can be approximately calculated by


P_{fp2} \approx \int_{\tau_{ncc} = \mu_{ncc} + \lambda_{ncc}\sigma_{ncc}}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma_{ncc}} \exp\!\left[-\frac{(x - \mu_{ncc})^2}{2\sigma_{ncc}^2}\right] dx = \int_{\lambda_{ncc}}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2}\right) dx = \frac{1}{\sqrt{\pi}} \int_{\lambda_{ncc}/\sqrt{2}}^{\infty} \exp(-x^2)\, dx = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{\lambda_{ncc}}{\sqrt{2}}\right),  (12)

where μ_ncc and σ_ncc are the mean and the standard deviation of the normalized cross-correlation results, respectively, and λ_ncc sets the adaptive threshold τ_ncc = μ_ncc + λ_ncc · σ_ncc. When λ_ncc is set to 5.61, the false-positive error probability of the watermark detection is about 10⁻⁸.
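Likewise for Eq. (12): with the threshold placed 5.61 standard deviations above the mean of the cross-correlation output, the false-positive probability evaluates to roughly 10⁻⁸ (symbol names follow the reconstruction above).

```python
import math

def p_fp_sequence(lambda_ncc: float) -> float:
    """Eq. (12): false-positive probability of the sequence verification when
    the adaptive threshold is mu_ncc + lambda_ncc * sigma_ncc."""
    return 0.5 * math.erfc(lambda_ncc / math.sqrt(2.0))

print(p_fp_sequence(5.61))   # about 1.0e-8
```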

Fig. 8 Normalized correlation distribution of unmarked videos: (a) first correlation result, and (b) second correlation result.

5 Experimental Results

To evaluate the fidelity, robustness, and real-time performance of the proposed method, several experimental tests were performed on four full-HD-resolution (1920×1080) videos, 5 min in length and 30 frames/s in frame rate, from various genres as shown in Fig. 9. Lee et al.'s scheme,13 Leest et al.'s scheme,12 and Coria et al.'s scheme14 were chosen for performance comparison. A watermark sequence having 1024 elements was embedded by each of these methods. In our method, the intervals u and v were 40 and 2 s, respectively. The embedding ROI size for one pattern was 256×256, and consequently the size of the combined ROIs was 1024×1024, because 16 patterns per frame (p = 16) were embedded, as shown in Fig. 4(c). The size of the detecting ROI was 64×64 per pattern. One watermarking section lasted 40 + (1024/16)×2 = 168 s. The parameters S_0 and S_1 in Eq. (1) were set to 6.5 and 1.0, respectively. In Lee et al.'s scheme, the entire embedding ROI size was the same as in our method, because the basic watermark pattern of size 128×128 was tiled eight times vertically and horizontally. Also, we set the interval of a watermark element to 5 frames in Leest et al.'s scheme. In Coria et al.'s scheme, the ROI size was the same as in our method, and the 128×128 watermark pattern was inserted after its one-level DT CWT. The patterns were renewed every 2 s.

Fig. 9 Test videos: (a) drama, (b) music show, (c) sports, and (d) action movie.

5.1 Fidelity Test

We used the average peak signal-to-noise ratio (PSNR) and the average structural similarity (SSIM)23 as objective measures. After the insertion of a watermark sequence, the average PSNR and SSIM values in the embedding region are 44.7 dB and 0.9987 for the drama, 44.3 dB and 0.9942 for the music show, 43.8 dB and 0.9931 for the sports, and 45.2 dB and 0.9955 for the action movie. In addition to the objective analysis, we carried out a subjective quality test by using the method described in Lubin et al.'s paper.24 Five expert observers, who are familiar with our scheme and able to detect visual artifacts caused by embedding a watermark, participated in this experiment.


Each trial consisted of two presentations of the same video, once with and once without the watermark. The videos were displayed on a Samsung PAVV 650 LCD TV in a room of 20-lx brightness. The subjects observed the displayed videos at a distance of three times the screen height. After four trials, the observers reported which of the two videos was the watermarked one. In this experiment, no observer could reliably distinguish between the marked video and the unmarked one.

5.2 Real-Time Performance Test

Since the proposed method mainly targets high-definition videos, real-time processing is very important. We implemented the proposed method using Intel multimedia extension (MMX) technology and the Intel integrated performance primitives (IPP) library, and tested it on an Intel Core 2 Quad CPU (2.5 GHz, 4-Gbyte RAM). In general, the three subfunctions for decoding video streams, embedding or detecting watermarks, and displaying the video should be processed within 0.03 s/frame at the 30-frame/s video rate for a real-time watermarking system.25 Table 1 shows the average processing-time results for each frame of the full-HD test videos. These results prove that the embedding and one-pass detecting processes have real-time performance. Although it is impossible for the two-pass detector to run in real time because of the double scan of the marked video, each pass of it is fast enough.

Table 1 Result of computational complexity test.

5.3 Robustness Test

We compared the robustness of our method with that of the other three schemes.12-14 For fairness of the robustness test, we matched the PSNR results of those schemes to the PSNR results of our method in Sec. 5.1. The resolution, frame rate, and format of the watermarked videos were full HD (1920×1080), 30 frames/s, and MPEG-2 at bit rates between 16.3 and 17.2 Mbit/s. We focused on the robustness of our watermarking system against mixed geometrical distortions as well as single geometrical distortions. Also, since these distortions can happen in practice with additional DA and AD conversion attacks by camcorder capture, we performed the robustness test on videos geometrically transformed by tripod-camcorder capture as well as by artificial manipulations.


In addition, we dealt with attacks that video contents commonly undergo, such as frame-rate conversion and transcoding to the XviD MPEG-4 format with an average bit rate of 7.0 Mbit/s. The peak value of the normalized cross-correlation results was used as the criterion for the performance evaluation.

5.3.1 One-pass detector test

The bar graphs in Fig. 10 show the experimental results for various attacks on each test video. The x axis of the graphs represents the kind of attack. "No attack" means that no manipulation is applied after watermarking. The other attacks were conducted on the "No attack" video. "Mixed geometrical attack 1" is the mixture of cropping to 4:3 ratio, horizontal 6-deg projection, and scaling to 640×480. "Mixed geometrical attack 2" is the mixture of translation by 40 pixels in the right and bottom directions, rotation by 7 deg, and scaling to 640×480. Still cuts of the camcorder-capture-attacked videos are shown in Fig. 11. These videos underwent DA and AD conversion attacks as well as nonlinear geometrical distortions. The y axis represents the maximum peak of the normalized cross-correlation between the inserted watermark sequence and the extracted watermark sequence. The black tag on each bar is the threshold for P_fp = 10⁻⁸. This allows us to be confident that the specified error rate is not exceeded.

The results in Fig. 10 show that Lee et al.'s scheme is robust to single attacks but succumbs to compound attacks, including mixed geometrical attacks and camcorder capture attacks. The reason for this failure is that the estimation of the geometrical distortion by the LACF becomes difficult because those attacks break the periodicity of the repetitively tiled watermark pattern. Also, the method shows low correlation results in the sports video even for single geometrical attacks. The Wiener filter used for denoising has relatively low performance in severely textured regions such as the lawn, so that the efficiency of the LACF degrades greatly.


Fig. 10 The cross-correlation results for the one-pass detector against various attacks: (a) drama, (b) music show, (c) sports, and (d) action movie.


The correlation results of Leest et al.'s scheme are all lower than those of the proposed method. The method is relatively vulnerable to videos that have frequent changes of illumination, such as the music show, or that have many scene changes, such as the action movie. In other words, Leest et al.'s scheme reveals its drawbacks in videos whose mean luminance changes frequently and greatly. Also, the method is not able to detect the existence of the inserted watermark in the case of the frame-rate conversion attack, because its detector misjudges the temporal position of each watermark element owing to the periodic frame reduction caused by the attack. In order to detect the inserted watermark, the detector would have to restore the frame rate of the attacked video to the original one. However, that requires extra work, and the detector may not even know the original frame rate.

Coria et al.'s scheme fails in most geometrical attacks, except in part for the case of rotation by 7 deg. The DT CWT domain is invariant to mild geometrical attacks and lossy compression, but very susceptible to severe geometrical distortions, including camcorder capture attacks.

The normalized cross-correlation results of the proposed one-pass detector greatly exceeded the threshold in all cases of attacks. It showed especially better performance than the other methods in the cases of geometrical attacks, including camcorder capture attacks. Our method displays slightly worse results than Lee et al.'s scheme, which uses an original reference pattern, in some nongeometrical cases, because the reference pattern estimated by our method is not exactly the same as the original reference signal; but it has no problem identifying the inserted watermark sequence. In the proposed method, the results for the mixed geometrical attacks and camcorder capture attacks are worse than those for the other attacks, because some detection ROIs may fall outside the inserted pattern region in the case of such severe geometrical attacks. We can solve this problem by enlarging the embedding region for the patterns. The experimental results prove that the proposed method is indeed robust against various attacks, irrespective of the genre of the video contents, and is superior to the other methods with regard to the identification of the inserted watermark sequence.

Fig. 11 Still cuts of camcorder-captured videos: (a) the case of camcorder capture attack 1 (drama), and (b) the case of camcorder capture attack 2 (music show).

5.3.2 Two-pass detector test

The one-pass detector has the problem that it may fail to detect a watermark because of temporal desynchronization in the case of a clipped video where part of a reference-pattern interval is removed. The two-pass detector is employed to solve this problem, as mentioned in Sec. 3.3.2. To prove the performance of the two-pass detector, clipped versions of the previously attacked videos were tested. They were each clipped by 40 s at the beginning and at the end, and consequently had a 3-min 40-s running time. Figure 12 shows the experimental results using the two-pass detector. In all cases, the watermark is stably extracted with a high correlation value, although the values are lower than for the one-pass detector because of the relatively imperfect estimation of the reference patterns.

Fig. 12 The cross-correlation results for the two-pass detector against various attacks.

We also tested our method on a clipped video that contains several watermarking sections. If such a clipped video is processed by the two-pass detector, periodic peaks appear. We embedded a watermark into a 50-min video and captured an arbitrary part of the video for 15 min with a tripod-mounted camcorder. Then the captured video was processed by the two-pass detector. The result is shown in Fig. 13. This figure clearly shows periodic peaks, which are high enough to identify the embedded watermark sequences, and the distance between the peaks is the same as the total number of elements calculated in the reference interval and one watermark sequence interval of a watermarking section.


The seven peaks in the result show that the partially captured video includes seven watermarking sections.

Fig. 13 The result of two-pass detection on a video partly captured by a tripod-mounted camcorder.

6 Conclusion

In this paper, a robust high-definition video watermarking scheme has been presented. We focused on robustness against mixed geometrical attacks by a tripod-mounted camcorder or editing tools as well as common signal-processing attacks. The watermarking method was based on a spatially self-synchronizing reference pattern obtained from the watermarked video. The self-synchronizing signal made it possible for the detector to identify the inserted watermark elements without an additional synchronization step. Also, embedding the positive watermark pattern and the negative watermark pattern in a similar ratio made it possible to exploit a two-pass detector. The presented two-pass detector solved the problem of temporal asynchrony caused by clipping a watermarked video. Experimental results proved that the presented method is robust against temporally and spatially compounded transformation attacks and shows satisfactory performance irrespective of the genre of video contents.

Acknowledgements

This research is supported by the Ministry of Culture, Sports and Tourism (MCST) and the Korea Creative Content Agency (KOCCA) in the Culture Technology (CT) Research & Development Program 2009.

References

1. J. J. K. Ó Ruanaidh and T. Pun, "Rotation, scale and translation invariant spread spectrum digital image watermarking," Signal Process. 66, 303–317 (1998).
2. C. Y. Lin, M. Wu, J. A. Bloom, I. J. Cox, M. L. Miller, and Y. M. Lui, "Rotation, scale, and translation resilient watermarking for images," IEEE Trans. Image Process. 10, 767–782 (2001).
3. S. Pereira and T. Pun, "Robust template matching for affine resistant image watermarks," IEEE Trans. Image Process. 9, 1123–1129 (2000).
4. D. Delannay and B. Macq, "A method for hiding synchronization marks in scale and rotation resilient watermarking schemes," Proc. SPIE 4675, 548–554 (2002).
5. M. Kutter, "Watermarking resisting to translation, rotation, and scaling," Proc. SPIE 3528, 423–431 (1998).
6. D. Delannay and B. Macq, "Generalized 2-D cyclic patterns for secret watermark generation," in Proc. IEEE Conf. on Image Processing, Vol. 2, pp. 77–80 (2000).
7. S. Voloshynovskiy, F. Deguillaume, and T. Pun, "Multibit digital watermarking robust against local nonlinear geometrical distortions," in Proc. IEEE Conf. on Image Processing, Vol. 3, pp. 999–1002 (2001).
8. P. Bas, J.-M. Chassery, and B. Macq, "Geometrically invariant watermarking using feature points," IEEE Trans. Image Process. 11, 1014–1028 (2002).
9. C. W. Tang and H. M. Hang, "A feature-based robust digital image watermarking scheme," IEEE Trans. Signal Process. 51, 950–958 (2003).
10. M. Alghoniemy and A. H. Tewfik, "Geometric distortions correction in image watermarking," Proc. SPIE 3971, 82–89 (2000).
11. M. Y. Wu, J. H. Lee, and Y. K. Ho, "Object watermarking scheme based on resynchronization and shape subdivision," Opt. Eng. 47, 077003 (2007).
12. A. van Leest, J. Haitsma, and T. Kalker, "On digital cinema and watermarking," Proc. SPIE 5020, 526–535 (2003).
13. M. J. Lee, K. S. Kim, T. W. Oh, H. Y. Lee, and H. K. Lee, "Improved watermark synchronization based on local autocorrelation function," J. Electron. Imaging 18, 023008 (2009).
14. L. E. Coria, M. R. Pickering, P. Nasiopoulos, and R. K. Ward, "A video watermarking scheme based on the dual-tree complex wavelet transform," IEEE Trans. Inf. Forensics Security 3, 466–474 (2008).
15. I. J. Cox, J. Kilian, T. Leighton, and T. Shamoon, "Secure spread spectrum watermarking for multimedia," IEEE Trans. Image Process. 6, 1673–1687 (1997).
16. F. Hartung, J. K. Su, and B. Girod, "Spread spectrum watermarking: malicious attacks and counterattacks," Proc. SPIE 3657, 147–158 (1999).
17. J. Haitsma, T. Kalker, G. Depovere, and M. Maes, "A video watermarking system for broadcast monitoring," Proc. SPIE 3657, 103–112 (1999).
18. T. Yamada, S. Tezuka, I. Echizen, Y. Fujii, and H. Yoshiura, "Use of statistically adaptive accumulation to improve video watermark detection," Trans. Inf. Process. Soc. Jpn. 47, 2440–2453 (2006).
19. I. G. Karybali and K. Berberidis, "Efficient spatial image watermarking via new perceptual masking and blind detection schemes," IEEE Trans. Inf. Forensics Security 1, 256–274 (2006).
20. S. Voloshynovskiy, A. Herrigel, N. Baumgaertner, and T. Pun, "A stochastic approach to content adaptive digital image watermarking," in Proc. 3rd Int. Workshop on Information Hiding, pp. 211–236, ACM (1999).
21. J. Lim, Two-Dimensional Signal and Image Processing, Prentice-Hall (1990).
22. M. L. Miller and J. A. Bloom, "Computing the probability of false watermark detection," in Proc. 3rd Int. Workshop on Information Hiding, pp. 146–158, ACM (1999).
23. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Trans. Image Process. 13, 600–612 (2004).


24. J. Lubin, J. Bloom, and H. Cheng, "Robust, content-dependent, high-fidelity watermark for tracking in digital cinema," Proc. SPIE 5020, 536–545 (2003).
25. I. K. Kang, D. H. Im, Y. H. Suh, and H. K. Lee, "Real-time watermark embedding for high resolution video watermarking," in Proc. Int. Workshop on Security, pp. 227–238 (2006).

Tae-Woo Oh received his BS degree in computer engineering from Ajou University, Korea, in 2007, and his MS degree in computer science from Korea Advanced Institute of Science and Technology (KAIST) in 2009. He is currently working toward his PhD degree in the Multimedia Computing Laboratory, Department of Electrical Engineering and Computer Science, KAIST. His research interests include image and video watermarking, video processing, and multimedia forensics.

Min-Jeong Lee received a BS degree in computer engineering from Kyungpook National University, Korea, in 2006, and an MS degree in computer science from Korea Advanced Institute of Science and Technology (KAIST) in 2008. She is currently pursuing a PhD degree in the Multimedia Computing Laboratory, Department of Electrical Engineering and Computer Science, KAIST. Her research interests are focused on image and video watermarking, with particular attention to multimedia forensics, and on information security.

Kyung-Su Kim received his BS degree in computer engineering from Inha University, Incheon, Republic of Korea, in 2005, and his MS and PhD degrees, both in computer science, from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea, in 2007 and 2010, respectively. He is now with the Network Security Research Team, KT Network R&D Lab., Daejeon, Republic of Korea. His research interests include image and video watermarking and fingerprinting, error concealment methods, information security, multimedia signal processing, multimedia communications, and network security.


Heung-Kyu Lee received a BS degree in electronic engineering from Seoul National University, Seoul, Republic of Korea, in 1978, and MS and PhD degrees in computer science from Korea Advanced Institute of Science and Technology (KAIST), Republic of Korea, in 1981 and 1984, respectively. Since 1986, he has been a professor in the Department of Computer Science, KAIST. His major interests are digital watermarking, digital fingerprinting, and digital rights management.

Hae-Yeoun Lee received his MS and PhD degrees in computer science from Korea Advanced Institute of Science and Technology (KAIST), Republic of Korea, in 1997 and 2006, respectively. From 2006 to 2007, he was a postdoctoral researcher at Weill Medical College, Cornell University, USA. He is now with Kumoh National Institute of Technology, Republic of Korea. His major interests are digital watermarking, image processing, remote sensing, and digital rights management.
