
Digital Signal Processing 48 (2016) 188–200


Hiding data in compressive sensed measurements: A conditionally reversible data hiding scheme for compressively sensed measurements

Mehmet Yamaç a,∗, Çağatay Dikici b,a, Bülent Sankur a

a Boğaziçi University, Electrical and Electronics Engineering, Bebek 34342, Istanbul, Turkey
b Imagination Technologies, WD4 8LZ London, United Kingdom

Article history: Available online 28 September 2015

Keywords: Compressive sensing; Information hiding; Steganography; Data hiding

Abstract

Most of the signals we encounter in real-life applications have low information content; in other words, they can be well approximated by sparse signals in a proper basis. The compressive sensing (CS) framework exploits this fact and attempts to represent signals using far fewer measurements than conventional acquisition systems. While CS acquisition is linear, the reconstruction of the signal from its sparse samples is nonlinear and complex. The sparse nature of the signal leaves enough room for an additional data sequence to be inserted and exactly recovered along with the reconstructed signal. In this study, we propose to linearly embed and hide data in compressively sensed signals and to nonlinearly reconstruct both of them using a deflationary approach. We investigate the embedding capacity as a function of signal sparsity and signal compression, as well as the noise sensitivity of the proposed algorithm.

© 2015 Elsevier Inc. All rights reserved.

1. Introduction

The Nyquist–Shannon sampling theorem states that a continuous-time bandlimited signal must be sampled uniformly at a rate at least twice its bandwidth, yielding N samples, in order to be exactly reconstructed. However, a great majority of the signals that we encounter in practical applications exhibit a rapid decay when expressed in an appropriate basis. In fact, this is the idea that has given birth to most of the lossy compression techniques, such as JPEG [1], JPEG 2000 [2], etc. Many transform-coding-based compression techniques like JPEG keep only the large coefficients, which constitute most of the signal energy, and discard the small ones. Although these techniques are widespread and standard in applications, one may need to look beyond the Nyquist–Shannon scheme in niche applications.

Compressive Sensing (CS) has attracted considerable attention since its first introduction by Donoho [3] and Candes et al. [4]. This new paradigm, in contrast to conventional data acquisition systems, attempts to sense signals using far fewer measurements than conventional methods. Roughly speaking, CS tries to combine the acquisition and compression processes into one step. This sensing strategy enables significantly lower data rates and computation costs at the sensing end.


On the other hand, signal recovery in the compressive sensing framework is generally achieved by non-linear reconstruction methods that are relatively costly. Therefore, the computational complexity is shifted from the encoder to the decoder site, which is especially convenient whenever economies in energy and computational effort are needed at the acquisition site. For instance, the CS framework has received attention in Wireless Sensor Networks, which require low-cost data acquisition. Recently, Mamaghanian et al. [5] have proposed a wireless body sensor network (WBSN) system in which sensors sample ECG signals using compressive sensing and then transmit these measurements to a remote monitoring center over a wireless channel. Xiang et al. [6] propose a compressive sensing video scheme over a wireless channel. Their system is equipped with a single-pixel camera, an extreme example of compressive sensing built by Takhar et al. [7].

There are cases where one wants to embed metadata on compressively sensed measurements. For example, in a WBSN application, embedding of the patient's information may be required. Attaching patient metadata to biosensory measurements using a data hiding scheme would enhance the application. Recall that data hiding is the set of techniques for embedding information in a media signal imperceptibly; it is often used for copyright protection, data indexing, image captioning, or even for hidden communication in military applications [8–10].

The problem of data hiding in the compressive sensing framework has been addressed in [11–15]. Sheikh and Baraniuk [11] have studied the data embedding problem in the transform domain (e.g., DCT for images) by assuming sparsity of the host signal, that is, the image. In the data embedding part, they first obtain a sparse representation of the cover signal of interest based on some convenient transform and hard-threshold these coefficients. Then, they embed the hidden data by spreading it out onto the sparse coefficients. The marked image is then obtained by using the inverse DCT transform before transmission/storage. In the decoding part, they jointly decode both the sparse DCT coefficients and the embedded data using $\ell_1$ minimization and linear decoding [16]. Using similar logic, Zhang et al. [12] have exploited compressive sensing in a content reconstruction problem. Recently, Delpha et al. [13] have proposed an informed data hiding scheme where some data, itself known to be sparse, is hidden additively on the non-zero coefficients of the sparse representation of the signal. The rest of the procedure is similar to that in [11], except that they use Costa's quantization-based data hiding scheme [17] to obtain the sparse data to be secretly embedded. If one already has a sparse representation of the signal of interest, these approaches are convenient strategies before storage or transmission.

Apart from these works, Valenzise et al. [14] have proposed a CS-based algorithm that identifies and localizes forgeries. Patsakis et al. [15] have used compressive sensing to detect the existence of stego content in images.

In this study we introduce a data hiding scheme that embeds additional information directly onto the CS measurements. This enables data hiding while sensing. Our proposed scheme differs from those in [11–13] in two important aspects. First, data hiding is realized during compressive sensing. More explicitly, we do not use compressive sensing for data hiding, but propose a data hiding method for CS measurements, that is, a scheme where the hidden data is carried only by the CS measurements. Second, the hidden data co-exists with the cover signal only in the compressed form, so that, when the compressively sensed cover data is recovered, the hidden data is not only recovered but also simultaneously removed from it.

Finally, motivated by the need for real-time implementation, we further develop our method into an embedding scheme that achieves fast joint signal reconstruction and embedded data recovery. A preliminary version of this work was presented at EUSIPCO [18]. That version addressed only, as a proof of concept, the small-signal case, and did not elaborate on the theoretical limits for exact recovery conditions, as given in Lemma 2 and Theorem 3.

2. Compressive sensing basics

In this section, we provide a brief overview of the CS framework and point out theorems relevant to the data hiding problem. Recent theoretical results related to the stability of reconstruction methods are also discussed, in view of the necessity of recovering the embedded data exactly and the document, i.e., the carrier message, within a tolerable error bound.

Let $S \in \mathbb{R}^N$ be our signal of interest and $\Psi$ be a basis in which $S$ has a unique representation $S = \Psi x$. We also assume that the elements of the coefficient vector $x$ are arranged in descending order of magnitude, i.e., $|x|_{(1)} \ge |x|_{(2)} \ge \ldots \ge |x|_{(N)}$. If this basis $\Psi$ is properly chosen according to the class of the signal of interest $S \in \mathbb{R}^N$, then these sorted magnitudes terminate at the $k$th term for $k$-sparse signals, and decay to zero rapidly, often according to a power law, if the signal $S$ is compressible [19]. In other words, if $N - k$ of the coefficients of $x$ are negligibly small, then the signal $x$ is called compressible or approximately $k$-sparse (typically $N \gg k$) with respect to the sparsifying basis $\Psi$. A signal $x$ is termed strictly $k$-sparse if it has at most $k$ non-zero coefficients, i.e., $\|x\|_{\ell_0^N} \le k$, where $\|\cdot\|_{\ell_p^N}$ represents the $\ell_p$-norm over $N$ terms. In this work we assume the signal of interest $S$ possesses a unique and strictly $k$-sparse representation in an orthonormal basis $\Psi$. A discussion on how to extend the scheme to approximately sparse signals takes place in the Discussion section.

In the CS framework, the signal is linearly sensed by taking $m \ll N$ measurements,

$$y = \Phi S = \Phi \Psi x = A x, \tag{1}$$

where $\Phi$ is an $m \times N$ measurement operator and $A = \Phi\Psi$. We would like to reconstruct $x$ from the measurements $y$. However, (1) is an underdetermined system of linear equations that has infinitely many solutions under the assumption that $A$ is full row rank. We need one or more constraints to achieve a unique solution. Under the assumption that $x$ is $k$-sparse, one can choose the sparsest solution from among the infinitely many candidates. This can be cast as

$$(P_0):\ \min_x \|x\|_{\ell_0^N} \quad \text{subject to} \quad Ax = y. \tag{2}$$

Donoho et al. [20] showed that a unique solution to $P_0$ can be achieved if $m \ge 2k$. However, $\ell_0$-norm minimization requires combinatorial search and is NP-hard. Although there are alternative ways to overcome this hurdle [21], such as greedy algorithms [22,23], we focus on convex relaxation, since, despite some advances in recent years, the theoretical analysis of the conditions for a guaranteed solution with greedy methods is still shaky [24]. Most greedy algorithms either do not have any theoretical guarantee or offer weaker theoretical bounds compared to convex optimization approaches [25]. $P_0$ can be relaxed to $P_1$ [26] as follows:

$$(P_1):\ \min_x \|x\|_{\ell_1^N} \quad \text{subject to} \quad Ax = y. \tag{3}$$
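In the noise-free, exactly sparse setting, $(P_1)$ is equivalent to a linear program. The following minimal sketch (an illustration using SciPy's generic LP solver, not the solver employed later in the paper) uses the standard split $x = u - v$ with $u, v \ge 0$:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    # Solve (P1): min ||x||_1  s.t.  Ax = y, via the LP split x = u - v,
    # u, v >= 0, so that ||x||_1 = sum(u) + sum(v).
    m, N = A.shape
    c = np.ones(2 * N)                       # objective: sum(u) + sum(v)
    A_eq = np.hstack([A, -A])                # A(u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * N))
    u, v = res.x[:N], res.x[N:]
    return u - v
```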

It has been shown that $\ell_1$ minimization is exact in the noise-free, exactly sparse case if the measurement matrix $A$ obeys the following Restricted Isometry Property of order $2k$ [27].

Definition 1 (Restricted Isometry Property). Let $A$ be an $m \times N$ matrix and let $\delta_k \in (0, 1)$ be the smallest quantity such that

$$(1 - \delta_k)\|x\|^2_{\ell_2^N} \le \|Ax\|^2_{\ell_2^m} \le (1 + \delta_k)\|x\|^2_{\ell_2^N} \tag{4}$$

for all $k$-sparse signals $x \in \mathbb{R}^N$. Then the matrix $A$ satisfies the Restricted Isometry Property (RIP) of order $k$ with restricted isometry constant $\delta_k(A)$.
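Certifying RIP exactly is combinatorial, but the concentration it rests on can be probed empirically. The sketch below (an illustration, not a certificate of $\delta_k$) samples random $k$-sparse vectors and records how far $\|Ax\|_2^2/\|x\|_2^2$ strays from 1 for a Gaussian $A$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, N, k = 128, 512, 10
A = rng.standard_normal((m, N)) / np.sqrt(m)   # A_ij ~ N(0, 1/m)

ratios = []
for _ in range(2000):
    x = np.zeros(N)
    x[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
    ratios.append(np.linalg.norm(A @ x) ** 2 / np.linalg.norm(x) ** 2)

# Largest observed deviation of ||Ax||^2 / ||x||^2 from 1 over the samples;
# a lower bound on the true delta_k, which maximizes over all k-sparse x.
print(max(1 - min(ratios), max(ratios) - 1))
```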

In real-life conditions, we can expect the measurements $y$ to be corrupted by a random noise pattern $z$, so that the received signal becomes $y_n = Ax + z$. Thus, a reconstruction method is expected to be stable in the presence of noise. Stability of the solution implies that a small change in the measurement vector should not lead to substantial changes in the recovered signal. Mathematically speaking, for the perturbed measurements $y_n = Ax + z$, a stable solution $\hat{x}$ would yield

$$\|x - \hat{x}\|_{\ell_2^N} \le \kappa \|z\|_{\ell_2^m} \tag{5}$$

with a small positive constant $\kappa$, where $\hat{x}$ is the reconstructed signal.

If the measurements are corrupted by some noise process with bounded energy such that $\|z\|_{\ell_2^m} \le \epsilon$ [28,29], then $(P_1)$ can be relaxed to

$$(P_1^\epsilon):\ \min_x \|x\|_{\ell_1^N} \quad \text{s.t.} \quad \|y - Ax\|_{\ell_2^m} \le \epsilon, \tag{6}$$

under certain conditions discussed below.


Table 1. List of variables used in this work.

N — Number of uncompressed signal samples
m — Number of compressed signal samples (m < N; typically m ≪ N)
k — Sparsity of a strictly sparse signal (k < m; e.g., m > 2k)
M — Length of the embedded data bit sequence (M < m)
p — Dimension of the null space of the embedded-data encoding matrix (p = m − M)
δk(A) — Restricted isometry constant of order k of a measurement matrix A
σG² — Additive white Gaussian noise variance
γ — Factor in the Chernoff bound limiting the uncertainty of the reconstruction error for a sampling matrix satisfying the RIP and under AWGN (0 < γ < 1)
υ² — Variance of the embedded data; for binary data υ² = Ma², proportional to the square of the embedding strength a and the number of embedded bits
C0 — A constant depending on δ2k(A) and δ2k(FA) (Lemma 2)
C(ς) — A positive constant depending on ς and the distribution of the entries of the matrix A
C3 — A positive constant defining the pre-estimation error bound
C1, C2 — Positive constants depending on δ(A) (Theorem 2)

The Restricted Isometry Property implies the stability of $(P_1^\epsilon)$. Although we will focus on strictly sparse signals, RIP also implies stability when dealing with approximately sparse, i.e., compressible, signals. The implication is that RIP requires all $m \times k$ submatrices of $A$ to be nearly orthonormal. The most recent result on sufficient conditions to guarantee stable recovery of $k$-sparse signals is $\delta_{2k} < \sqrt{2} - 1$ [27]. However, efforts to improve the sufficiency bound continue [30].

The following stability condition for $(P_1^\epsilon)$, which is a consequence of the RIP condition, will be useful in the sequel.

Theorem 1. (See [27].) Assume that an $m \times N$ matrix $A$ satisfies RIP of order $2k$ with $\delta_{2k} < \sqrt{2} - 1$, and the observations are contaminated by additive noise with total power $\epsilon$. Suppose also that the signal of interest $x$ is strictly $k$-sparse. Then the solution $\hat{x}$ of $P_1^\epsilon$ (bounded noise energy case) satisfies the inequality

$$\|x - \hat{x}\|_{\ell_2^N} \le \frac{4\sqrt{1 + \delta_{2k}(A)}}{1 - (1 + \sqrt{2})\,\delta_{2k}(A)}\,\epsilon. \tag{7}$$

In the case where the measurements $y = Ax$ are corrupted by Additive White Gaussian Noise (AWGN) rather than bounded noise, Theorem 1 yields the following lemma, itself a refined version of Corollary 1.1 of [24, p. 32].

Lemma 1. Suppose that an $m \times N$ measurement matrix $A$ satisfies the RIP of order $2k$ with $\delta_{2k} < \sqrt{2} - 1$. Suppose also the measurements are of the form $y = Ax + z$, where $z$ is a noise pattern with i.i.d. elements $z_i \sim \mathcal{N}(0, \sigma_G^2)$. Then, with $\epsilon = (1 + \gamma)\sqrt{m}\,\sigma_G$, the solution $\hat{x}$ of $(P_1^\epsilon)$ satisfies

$$\|x - \hat{x}\|_{\ell_2^N} \le \frac{4\sqrt{1 + \delta_{2k}(A)}}{1 - (1 + \sqrt{2})\,\delta_{2k}(A)}\,(1 + \gamma)\sqrt{m}\,\sigma_G \tag{8}$$

with probability at least $1 - \exp\big(-\frac{3m}{4}\gamma^2\big)$, where $0 < \gamma < 1$.
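To get a feel for the probability in Lemma 1 (with illustrative numbers, not taken from the paper): for $m = 256$ and $\gamma = 0.5$, the failure probability is at most

$$\exp\Big(-\frac{3m}{4}\gamma^2\Big) = \exp(-192 \times 0.25) = \exp(-48) \approx 1.4 \times 10^{-21},$$

so the event $\|z\|_{\ell_2^m} \le 1.5\sqrt{m}\,\sigma_G$ is essentially certain.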

Proof. Let $z_1, \ldots, z_m$ be i.i.d. Gaussian random variables with zero mean and variance $\sigma_G^2$. Then for every $z = (z_1, \ldots, z_m) \in \mathbb{R}^m$ and $0 < \gamma < 1$, the Chernoff inequality readily gives

$$\Pr\big(\|z\|_{\ell_2^m} \ge (1 + \gamma)\sqrt{m}\,\sigma_G\big) \le \exp\Big(-\frac{3m}{4}\gamma^2\Big). \tag{9}$$

Lemma 1 is then obtained by applying Theorem 1. □

3. Information hiding on compressively sensed measurements

We pursue direct embedding on the compressively sensed measurements in lieu of the conventional but devious way of first reconstructing the signal from its sparse samples, inserting the additional data, and re-sampling it compressively. Direct data embedding on the compressive samples has the double advantage of being practical and efficient, as well as enjoying the secrecy inherent in compressive sampling through the sensing matrix $A$.

In the sequel we will investigate the questions of whether it is possible to embed additional information directly onto CS measurements and to reconstruct both the signal and its payload (the embedded signal) without loss of information, and, if so, of finding the embedding capacity and signal-to-noise ratio limits.

Variables used in this work are listed in Table 1.

3.1. Data embedding

Let $w \in \{+a, -a\}^M$ be a binary data sequence of length $M$. One of the simplest ways of embedding this data onto the compressively sensed measurements $y = Ax$ is to encode it linearly and add the encoded message directly onto the measurements:

$$y_w = Ax + Bw, \tag{10}$$

where $y_w$ is the marked cover and $B$ is a full-rank $m \times M$ ($m \ge M$) encoding matrix. We impose the embedding power constraint $\|Bw\|_{\ell_2^m} \le P_E$ in order to remain within the quantization range of the signal, that is, to avoid having to increase the number of bits due to an extended signal range.
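A minimal numerical sketch of this embedding step, assuming NumPy; the dimensions, the random orthogonal construction of the annihilator $F$ (used later in Section 3.3), and the choice of $B$ as a basis of its null space are illustrative, not prescribed by the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N, m, M = 1024, 256, 32       # signal length, measurements, payload bits
k, a = m // 6, 0.05           # sparsity level and embedding strength

# k-sparse signal: random support, Gaussian amplitudes, unit norm
x = np.zeros(N)
x[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
x /= np.linalg.norm(x)

A = rng.standard_normal((m, N)) / np.sqrt(m)   # A_ij ~ N(0, 1/m)

# Annihilator F (p x m) with orthogonal rows of squared norm m/p,
# and encoding matrix B (m x M) spanning the null space of F.
p = m - M
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))   # random orthogonal basis
F = Q[:, :p].T * np.sqrt(m / p)
B = Q[:, p:]                                        # F @ B == 0

w = a * rng.choice([-1.0, 1.0], M)   # binary payload in {+a, -a}^M
y_w = A @ x + B @ w                  # marked measurements, eq. (10)
```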

Considering also the additive noise effect during sampling and transmission/storage processes, the received noisy message at the decoder can be expressed as

$$y_n = y_w + z = Ax + Bw + z, \tag{11}$$

where $z$ is an unknown noise pattern. The noise vector $z$ may be either uniformly bounded noise, such as quantization error [28], or Gaussian noise. Although we consider only these two noise models in this study, our method can easily be modified in the presence of impulsive noise, where sporadic noise bursts can have very large energy content. This type of noise attack was studied in our earlier work [18].

3.2. Data hiding scenario

The embedding scheme described in Eqs. (10) and (11) can be used for data hiding. It is understood that the matrices $A$ (or the measurement matrix $\Phi$) and $B$ are privately available to the sender and receiver (and not available to malicious users). The method of generating the matrices $A$ and $B$ will have a role in the characteristics of the system. In the following sections, we discuss two different ways of obtaining $A$ and $B$: using random matrices, as in Section 3.4, and using a subset of orthonormal bases, as in Section 4.

In the sequel we discuss the security aspects of the data hiding scheme. For a malicious user, estimating the matrices $A$ and $B$ in our joint sampling and embedding scheme is practically infeasible. The security property of CS measurements $y$, where $A$ is determined via a secret key, is studied in [31–33]. Rachlin et al. [31] investigate the possibility of reconstructing $x$ using only $y$, without knowing $A$. They argue that even if CS does not achieve Shannon's perfect secrecy [34], reconstruction of $x$ using only $y$ is not possible in polynomial time. Orsdemir et al. [32] discuss privacy protection when $y$ is exposed to additive noise, i.e., $y_z = y + z$. They show that it is computationally infeasible for an adversary to estimate the matrix $A$ using brute-force and structured attacks. Kun Liu et al. [35] use random projections to preserve security in a data mining application. Wenjun Lu et al. [36] propose privacy-preserving image retrieval systems, also using random projections, and show that these systems are secure against Ciphertext-Only and Known-Plaintext Attacks. Our scheme enjoys a similar security property, as all these models are based on the scenario of compressive sampling via random measurements. Recent studies [37,38] also show that privacy can be preserved while using partial transforms, i.e., where the rows of the measurement matrix $\Phi$ are chosen from a subset of the bases of some transform, such as the Hadamard or fractional Fourier transform.

Furthermore, our embedding scheme possesses not only the security provided by $A$, but also the security due to the linear coding itself, i.e., $Bw$. Since we have already discussed the security provided by compressive sampling, we now discuss the linear encoding part. Let $y'$ be the vector of encoded data to be embedded (such as metadata), i.e., $y' = Bw$. Assume that an adversary wants to estimate the bit sequence $w$ from $y'$ without knowing the encoding matrix $B$. It has been shown that strong privacy can be preserved by adding noise of sufficient magnitude to the elements of the vector $y'$ [39]. Dwork et al. [40] investigate the bound $\rho$ on the fraction of errors on $y'$ that is needed to preserve strong privacy. Their results show that when the fraction of errors exceeds an upper bound $\rho^*$, LP decoding will fail almost surely. They found this bound $\rho^*$ to be $\approx 0.239$ when the elements of $B$ are generated from the Gaussian distribution.

In our system, we add the encoded data to the compressively sensed measurements $y$, i.e., $y_w = y + y'$. In the first scenario, an adversary may try to estimate the matrix $A$ and the signal $x$, and then $w$. This scenario corresponds to the security-preserving problem in the CS encoding framework discussed above. Alternatively, a malicious user may attempt to estimate the entries of $w$ directly. This adversary will fail to recover the embedded signal $w$ from $y_w$ when $|\{i : |y_i| \ge \alpha\}| \ge m \times \rho^*$ [40], where $\alpha$ is a sufficiently large positive constant. In view of the above, our system is secure against this attack if $\|y'\| / \|y\|$ is small enough.

All in all, we can argue that it is computationally infeasible for an attacker to estimate the embedded data or the signal $x$, provided the bounds $\alpha$ and $\rho^*$ are satisfied.

Even if the attacker cannot estimate the secret keys $A$ and $B$, the attacker can still try to destroy the communication. For instance, the marked cover signal $y_w$ can be exposed to additive noise attacks such as AWGN or uniform noise attacks. Alternatively, a malicious user may completely destroy some of the measurements [16]; this kind of attack can be modeled as a sparse attack [16]. All these cases correspond to the additive noise model given in (11).

3.3. Joint signal reconstruction and embedded data recovery

For the exact recovery of both the signal $x$ and the embedded data $w$ from the noisy measurements $y_n$, we have two requirements: (i) the estimate $\hat{w}$ of the embedded data must be exact; (ii) the variance of the estimate $\hat{x}$ of the signal must not exceed the uncertainty level given in Lemma 1 (or Theorem 1 for the bounded-noise-energy case). It is easy to see that if (i) is satisfied, then (ii) is also satisfied using $\ell_1$ minimization. The recovery of the signal and of the embedded data proceeds in two tiers. First, we estimate $x$ disregarding the component $w$. After having subtracted the estimate of the $x$ component from $y_n$, the estimate of the embedded data $w$ can be obtained straightforwardly by solving an overdetermined system of equations. One can then perform a second deflation step by removing the embedded signal from Eq. (11), in order to achieve an improved estimate of the signal $x$. This situation is in fact analogous to reversible data hiding schemes [41–43]. Reversible data hiding, as commonly used in the literature, means that the digital cover, e.g., a digital image, is reconstructed exactly in the receiver, that is, bit by bit, after the extraction of the embedded data. Our case is very similar in that the compressively sensed cover and the embedded data are both recovered under certain conditions. Therefore, our data hiding and reconstruction scheme is a conditionally reversible data hiding method, since recovery is conditioned on the payload and the compression ratio, as will be discussed later. The details of the algorithm are as follows:

We start by constructing a $p \times m$ matrix $F$ which is the left annihilator of the embedded-data encoding matrix, i.e., $FB = 0$. We apply the matrix $F$ to the noisy measurement vector $y_n$ and obtain

$$\tilde{y} = F y_n = F(Ax + Bw + z) = FAx + Fz, \tag{12}$$

where $p = m - M$. Let the annihilator matrix $F$ also have orthogonal rows and satisfy the row-norm condition $\|F_i\|^2_{\ell_2^m} = m/p$, where $F_i$ denote the rows of the matrix $F$ for $i = 1, \ldots, p$. We would like to mention that neither the row orthogonality nor the row-norm constraint is necessary, but both provide mathematical convenience, as clarified in the sequel. At this stage one has an underdetermined system of equations with noise pattern $Fz$. Then, the pre-estimate (or first-tier estimate) $\tilde{x}$ of $x$ can be found via

$$\tilde{x} = \arg\min_x \|x\|_{\ell_1^N} \quad \text{s.t.} \quad \|\tilde{y} - FAx\|_{\ell_2^p} \le \epsilon. \tag{13}$$

After finding the pre-estimate $\tilde{x}$ of $x$, computing the pre-estimate $\tilde{w}$ of $w$ is possible using least squares,

$$\tilde{w} = (B^T B)^{-1} B^T (y_n - A\tilde{x}). \tag{14}$$

At this point, the final estimate $\hat{w}$ is obtained as

$$\hat{w}_i = a \cdot \mathrm{sgn}(\tilde{w}_i), \tag{15}$$

and the final estimate $\hat{x}$ of $x$ can then be achieved via

$$\hat{x} = \arg\min_x \|x\|_{\ell_1^N} \quad \text{s.t.} \quad \|(y_n - B\hat{w}) - Ax\|_{\ell_2^m} \le \epsilon. \tag{16}$$
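A compact sketch of this two-tier (deflationary) recovery, continuing the notation of the embedding sketch above; the $\ell_1$ steps (13) and (16) are delegated to a generic basis-pursuit-denoising routine `l1_solver(M, b, eps)` returning $\arg\min \|x\|_1$ s.t. $\|b - Mx\|_2 \le \epsilon$, which is an assumed interface rather than the paper's own solver:

```python
import numpy as np

def recover(y_n, A, B, F, a, eps, l1_solver):
    """Two-tier recovery of (x, w) from marked noisy measurements
    y_n = A x + B w + z, following steps (12)-(16)."""
    # (12) Annihilate the payload: F B = 0, so F y_n = F A x + F z.
    y_tilde = F @ y_n
    # (13) Pre-estimate x from the payload-free underdetermined system.
    x_pre = l1_solver(F @ A, y_tilde, eps)
    # (14) Least-squares pre-estimate of w from the residual.
    w_pre = np.linalg.lstsq(B, y_n - A @ x_pre, rcond=None)[0]
    # (15) Threshold onto the binary alphabet {+a, -a}.
    w_hat = a * np.sign(w_pre)
    # (16) Deflate the recovered payload and re-estimate x.
    x_hat = l1_solver(A, y_n - B @ w_hat, eps)
    return x_hat, w_hat
```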

As discussed in Section 2, a satisfactory reconstruction algorithm is expected to be stable in the presence of noise. The structures of the measurement matrix $A$ and the annihilator matrix $F$ both affect the stability of reconstruction. Based solely on the upper bounds of the restricted isometry constants of the matrices $A$ and $FA$ and the additive noise variance, a bound on the estimation error of the signal can be set as follows.

Lemma 2. Let $A$ be an $m \times N$ matrix satisfying RIP of order $2k$ with $\delta_{2k}(A) < \sqrt{2} - 1$, where $\delta_{2k}(A)$ is the restricted isometry constant of the matrix $A$. Let also $F$ be a $p \times m$ matrix with orthogonal rows such that $\|F_i\|^2_{\ell_2^m} = m/p$, where $F_i$ denote the rows of the matrix $F$. Suppose the matrix $FA$ satisfies the RIP of order $2k$ with $\delta_{2k}(FA) < \sqrt{2} - 1$. Let the marked cover have the form $y_n = Ax + Bw + z$ such that $z_i \sim \mathcal{N}(0, \sigma_G^2)$ and $\|w\|_{\ell_2^M} = \upsilon$. Then the proposed algorithm recovers the $M$ bits of embedded data exactly with probability

$$1 - \exp\Big(-\frac{3p}{4}\gamma^2\Big), \tag{17}$$


provided that

$$M \le \frac{\upsilon^2}{C_0^2\,\epsilon^2}. \tag{18}$$

Furthermore, it also approximates $x$ with a bounded error,

$$\|x - \hat{x}\|_{\ell_2^N} \le \frac{4\sqrt{1 + \delta_{2k}(A)}}{1 - (1 + \sqrt{2})\,\delta_{2k}(A)}\,(1 + \gamma)\sqrt{m}\,\sigma_G, \tag{19}$$

where $C_0$ is a positive constant depending on $\delta_{2k}(A)$ and $\delta_{2k}(FA)$, and $\epsilon = (1 + \gamma)\sqrt{m}\,\sigma_G$.

The proof of the Lemma is in Appendix A.

3.4. Compressive sampling via random measurements

In Lemma 2 above, one can observe that the number of reliably embeddable bits depends on the structures of the matrices $A$ and $F$ via the restricted isometry constants $\delta_{2k}(A)$ and $\delta_{2k}(FA)$, which in turn determine the denominator term $C_0$.

In this section we first recall from the literature the RIP conditions of random matrices, and then prove that the $FA$ matrix also satisfies the RIP condition, thus guaranteeing the conditions of Lemma 1.

Consider the random measurement process, effected through random matrices, which yields near-optimal embedding [44]. A simple proof of the fact that random matrices satisfy RIP of order $k$ with high probability was given by Baraniuk et al. [45], based on the well-known Johnson–Lindenstrauss Lemma [46]. Given a set $D$ of points in $\mathbb{R}^N$, the Johnson–Lindenstrauss Lemma states that one can find a map that embeds these points into a lower-dimensional Euclidean space $\mathbb{R}^m$ while preserving the relative distances between any two of these points. The key idea used in [45] is the concentration of measure inequality for random matrices.

Corollary 1. (See [45].) Given any arbitrary fixed vector $x \in \mathbb{R}^N$, one can choose a probability distribution $F$ such that an $m \times N$ matrix $A$ can be constructed whose entries are drawn from $F$, satisfying

$$\mathbb{E}\big(\|Ax\|^2_{\ell_2^m}\big) = \|x\|^2_{\ell_2^N} \tag{20}$$

and

$$\Pr\Big((1 - \varsigma)\|x\|^2_{\ell_2^N} \le \|Ax\|^2_{\ell_2^m} \le (1 + \varsigma)\|x\|^2_{\ell_2^N}\Big) \ge 1 - 2\exp(-mC(\varsigma)) \tag{21}$$

for $0 < \varsigma$, where $C(\varsigma)$ is a positive constant depending on $\varsigma$ and the distribution $F$.

For instance, a matrix $A$ whose entries are independently drawn from the Gaussian distribution $A_{i,j} \sim \mathcal{N}(0, \frac{1}{m})$ results in $C(\varsigma) = \varsigma^2/4 - \varsigma^3/6$. In the literature, much effort has been spent on finding $C(\varsigma)$. For example, Achlioptas [47] proved that $C(\varsigma) = \varsigma^2/4 - \varsigma^3/6$ also holds for the following distributions:

$$A_{i,j} = \begin{cases} +\frac{1}{\sqrt{m}} & \text{with probability } 1/2, \\ -\frac{1}{\sqrt{m}} & \text{with probability } 1/2, \end{cases} \tag{22}$$

$$A_{i,j} = \sqrt{3} \times \begin{cases} +\frac{1}{\sqrt{m}} & \text{with probability } 1/6, \\ 0 & \text{with probability } 2/3, \\ -\frac{1}{\sqrt{m}} & \text{with probability } 1/6. \end{cases} \tag{23}$$
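A small sketch of samplers for these three admissible ensembles (Gaussian, the Rademacher ensemble of (22), and the sparse ensemble of (23)), assuming NumPy:

```python
import numpy as np

def gaussian_matrix(m, N, rng):
    # A_ij ~ N(0, 1/m)
    return rng.standard_normal((m, N)) / np.sqrt(m)

def rademacher_matrix(m, N, rng):
    # Eq. (22): +-1/sqrt(m), each with probability 1/2
    return rng.choice([-1.0, 1.0], size=(m, N)) / np.sqrt(m)

def sparse_matrix(m, N, rng):
    # Eq. (23): sqrt(3/m) * {+1 w.p. 1/6, 0 w.p. 2/3, -1 w.p. 1/6}
    vals = rng.choice([1.0, 0.0, -1.0], size=(m, N), p=[1/6, 2/3, 1/6])
    return np.sqrt(3.0 / m) * vals

rng = np.random.default_rng(0)
A = sparse_matrix(128, 1024, rng)
print(np.mean(A**2) * 128)   # entry-wise second moment ~ 1/m, so this is ~ 1
```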

Input: $y_n$, $A$, $B$; determine $\epsilon$.
1. Apply $F$ to $y_n$: $\tilde{y} = F y_n$.
2. Estimate $\tilde{x}$: $\tilde{x} = \arg\min_x \|x\|_{\ell_1^N}$ s.t. $\|\tilde{y} - FAx\|_{\ell_2^p} \le \epsilon$.
3. Estimate $\tilde{w}$: $\tilde{w} = (B^T B)^{-1} B^T (y_n - A\tilde{x})$.
4. Threshold $\tilde{w}$: $\hat{w}_i = a \cdot \mathrm{sgn}(\tilde{w}_i)$.
5. Estimate $\hat{x}$: $\hat{x} = \arg\min_x \|x\|_{\ell_1^N}$ s.t. $\|(y_n - B\hat{w}) - Ax\|_{\ell_2^m} \le \epsilon$.
Output: $\hat{x}$, $\hat{w}$.

Fig. 1. Algorithm 1: The reconstruction algorithm for the signal and the embedded data from noisy, compressively encoded samples.

Theorem 2. (See [45].) Let $A$ be the $m \times N$ measurement matrix that satisfies the inequality given in (21) with $\delta(A) := 2\varsigma$. If

$$m \ge C_1 k \log(N/k), \tag{24}$$

then $A$ satisfies the Restricted Isometry Property of order $k$ with probability $\ge 1 - 2\exp(-C_2 m)$, where $C_1$ and $C_2$ are constants depending on $\delta(A)$.

Having presented the results from the literature on random measurement matrices that satisfy the RIP condition, we can now turn our attention to the $FA$ matrix, as it occurs in Algorithm 1. The question is whether the matrix $A$ that satisfies the RIP condition of order $2k$, premultiplied by the annihilator matrix $F$, yields a matrix $FA$ still satisfying the RIP of order $2k$.

Lemma 3. Let $A$ be an $m \times N$ matrix with elements $A_{i,j}$ drawn i.i.d. according to $\mathcal{N}(0, \frac{1}{m})$, and let $F$ be a $p \times m$ matrix with orthogonal rows such that $\|F_i\|^2_{\ell_2^m} = m/p$, where $F_i$ denote the rows of the matrix $F$ for $i = 1, \ldots, p$. If

$$p \ge O(k \log(N/k)), \tag{25}$$

then the matrix $FA$ satisfies the Restricted Isometry Property of order $k$ with probability $\ge 1 - 2\exp(-O(p))$.

Proof. Let $A_{C(j)}$ denote the columns of the matrix $A$ and let $F_i$ denote the rows of the matrix $F$. Then the elements of the matrix $FA$, namely $Z_{i,j} = \langle F_i, A_{C(j)} \rangle$, are independent Gaussian random variables. We will prove only that $\mathbb{E}(Z_{i,j}^2) = \frac{1}{p}$. It can be calculated via

$$\mathbb{E}(Z_{i,j}^2) = \mathbb{E}\big(\langle F_i, A_{C(j)} \rangle^2\big) = \mathbb{E}\Big(\sum_{l=1}^{m} F_{i,l}^2 A_{l,j}^2\Big) = \sum_{l=1}^{m} F_{i,l}^2\,\mathbb{E}(A_{l,j}^2) = \frac{1}{m}\sum_{l=1}^{m} F_{i,l}^2 = \frac{1}{m}\|F_i\|^2_{\ell_2^m} = \frac{1}{p}. \tag{26}$$

We can therefore use Theorem 2 to complete the proof of this lemma. □
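A quick empirical check of this variance computation, reusing the illustrative $F$ and $A$ constructions from the earlier sketches:

```python
import numpy as np

rng = np.random.default_rng(1)
m, N, p = 256, 1024, 192
A = rng.standard_normal((m, N)) / np.sqrt(m)     # A_ij ~ N(0, 1/m)
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))
F = Q[:, :p].T * np.sqrt(m / p)                  # orthogonal rows, ||F_i||^2 = m/p
Z = F @ A
print(Z.var(), 1 / p)   # sample variance of the entries of FA is ~ 1/p
```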

This lemma states that the null space of the embedding matrix $B$ should be large, in fact at least as large as the lower bound of the sparse reconstruction theorem. It also implies that reconstruction of $x$ is possible within the tolerable error bound given in Lemma 1, or Theorem 1 for the bounded-noise-energy case, using $m \ge O(k \log(N/k))$ measurements. Furthermore, when the noise is AWGN, we can give the following stability condition, which sets a limit on the number of embeddable bits $M$.

Fig. 2. Performance results of the algorithms for the case in which measurements are corrupted by AWGN, for sparsity level k = m/6. (a) Prob(ŵ ≠ w) values for M. (b) E{‖x − x̂‖₂} values for M.

Theorem 3. Let $w \in \{+a, -a\}^M$ be the data sequence. Consider the measurement matrix $A$ and the annihilator matrix $F$ given in Lemma 3. Suppose the marked measurements are of the form $y_n = Ax + Bw + z$ such that

$z_i \sim \mathcal{N}(0, \sigma_G^2)$. For the particular setting $\epsilon = 2\sqrt{m}\,\sigma_G$, the proposed algorithm approximates $x$ with the bounded error given in (19) with $\gamma = 1$, and it also recovers the $M$ bits of embedded data exactly with probability

$$\Pr(\hat{w} = w) \ge \Big(1 - \exp\Big(-\beta_1 \frac{a^2}{16\,C_3^2\,\sigma_G^2}\Big)\Big) P_1, \tag{27}$$

where

$$P_1 \ge 1 - \exp\Big(-\frac{3p}{4}\Big), \qquad C_3 = \frac{4\sqrt{1 + \delta_{2k}(FA)}}{1 - (1 + \sqrt{2})\,\delta_{2k}(FA)}, \tag{28}$$

provided that

$$M \le \exp\Big(\beta_0 \frac{a^2}{16\,C_3^2\,\sigma_G^2}\Big), \tag{29}$$

for any $0 < \beta_0 < 1$ and $\beta_1 = 1 - \beta_0$.

In Section 2, we showed in Lemma 1 and Theorem 1 that reconstruction of the signal of interest $x$ depends on the restricted isometry constant of $A$. Furthermore, Theorem 3 directly states that the embedding capacity depends on the restricted isometry constant of $FA$ and on the signal-to-noise ratio. Therefore, one should set the matrices $F$ and $A$ in such a way that both $FA$ and $A$ have small restricted isometry constants. In Lemma 3, we demonstrated that it is possible to construct such an $FA$ with the desired restricted isometry constant. Indeed, one can choose the matrix $F$ randomly from any distribution, as long as the distribution of $FA$ satisfies the concentration of measure inequality given in Corollary 1. For example, it would be inappropriate to choose both $F$ and $A$ from the Gaussian distribution, since the distribution of the product matrix $FA$ would then be heavy-tailed [48], and this would increase the number of measurements $m$ needed to satisfy the stability condition under the same error bound, compared to Gaussian (or sub-Gaussian) measurements. To satisfy Lemma 3, one can choose the annihilator matrix $F$ to be deterministic, such as an orthonormal basis or a dictionary. We also choose the matrix $A$ to be Gaussian, so that the product matrix $FA$ is Gaussian as well; as a result, we can easily show that both $A$ and $FA$ satisfy the RIP, as shown in Lemma 3. By an appropriate theoretical analysis, one can also choose alternative setups for $F$ and $FA$.

3.5. Simulation results for random measurements

In this section, we present simulated performance results of the proposed decoding algorithms for the case where the marked measurements are corrupted by AWGN. We experiment with the following parameters:

• The synthetic k-sparse signal, x, consists of N = 1024 samples and has unit norm.

• The signal sparsity is k = m/6; that is, the sparsity of the signal x is adjusted to one sixth of the number of measurements, m.

• The $M$-long payload (embedded data) $w$ is generated with $\|w\|_{\ell_2} = \|Ax\|_{\ell_2}/4$, so that the EDR (embedded data-to-document ratio) is 0.25, or −6 dB. The $M$-long payload is additively combined (embedded) with the $m$ samples of the signal.

• The measurement matrix $A$ is produced as a Gaussian random matrix with $A_{i,j} \sim \mathcal{N}(0, 1/m)$.

• We consider additive white Gaussian noise contaminating the signal at 32 dB signal-to-noise ratio (SNR). SNR in dB is defined as $20\log_{10}\big(\|Ax + Bw\|_2 / \|z\|_2\big)$.

• The annihilator matrix $F$ is generated with orthogonal rows, with $\|F_i\|^2_{\ell_2^m} = m/p$, where $F_i$ denote the rows of the matrix $F$ for $i = 1, \ldots, p$ and $p = m - M$. Consequently, the columns of the encoding matrix $B$ ($m \times M$) are obtained so as to span the null space of $F$.

• The $k$-sparse synthetic signal $x$ is generated in the following way: (i) the indices of the $k$ non-zero elements of $x$ are randomly chosen as a subset of $\{1, 2, \ldots, N\}$; (ii) the values of the non-zero elements are drawn from a Gaussian distribution, and the coefficient amplitudes are normalized to unit norm, i.e., $\|x\|_{\ell_2} = 1$.

Each experiment is conducted 1000 times, and the corresponding $\Pr(\hat{w} \ne w)$ and $E\{\|x - \hat{x}\|_2\}$ values are reported. Performance results are compared in terms of both the embedding rate $M/m$ and the measurement rate $m/N$. We utilize the $\ell_1$-magic solver [49] to implement the $\ell_1$-minimization parts. Notice that the abscissa in Fig. 2 is the compression rate $m/N$, the ratio of compressive measurements to the original signal length, while the ordinate is $M/m$, the ratio of the number of embedded bits to the number of measurements. The ordinate is in reverse order, that is, for decreasing values of the embedding rate or increasing embedded-data footprint.

The better performances are observed in the upper right corner of the quadrant, that is, not surprisingly, for smaller compression ratios ($m/N \to 1$) and for smaller embedded data payloads ($M/N \to 0$).

This experiment was also conducted for sparsity levels $k = m/5$ and $k = m/4$, for which the performance results are shown in Fig. 3 and Fig. 4, respectively. As stated in Lemma 3, the number of measurements sufficient for reliable reconstruction depends on the sparsity level $k$. Comparing Fig. 2 with Fig. 3, we observe no considerable change in either the probability of exact recovery of the embedded data or the mean squared reconstruction error as we increase the sparsity level from $k = m/6$ to $k = m/5$.


Fig. 3. Performance results of the algorithms for the case in which measurements are corrupted by AWGN, for sparsity level k = m/5. (a) Prob(ŵ ≠ w) values for M. (b) E{‖x − x̂‖₂} values for M.

Fig. 4. Performance results of the algorithms for the case in which measurements are corrupted by AWGN, for sparsity level k = m/4. (a) Prob(ŵ ≠ w) values for M. (b) E{‖x − x̂‖₂} values for M.

However, when we increase the sparsity level to $k = m/4$, we see a noticeable increase in the probability of error in exact recovery of the embedded data and in the mean squared reconstruction error, as shown in Fig. 4. Indeed, the number of sufficient measurements is given empirically as $m \ge 4k$ by many researchers [50] in the classical compressive sensing literature. This phase transition, depending on the sparsity level $k$, is also observed empirically at about $4k$ in our experiments.

4. Data hiding for real-time large scale signals

The proposed data hiding scheme of Section 3, though correct in theory, may not be practical from the implementation point of view. In fact, the computational complexity and memory requirements of the signal reconstruction and embedded-data decoding parts are cumbersome when working with random matrices. For example, consider a 512 × 512 pixel image to be sensed. Assume that we take 90 000 measurements from the $N = 512^2$ length vectorized image ($m/N = 0.36$). In this case, the measurement matrix will have size 90 000 × 262 144, which requires more than 80 gigabytes of storage. Indeed, most of the popular sparse reconstruction algorithms require the computation of the transpose of the measurement matrix $\Phi$ several times. This brings a computational burden to the system when measurement matrices are of such large sizes. Furthermore, the computational complexity of signal reconstruction significantly increases when we insert additional data on the CS measurements. In the following section, we present a more implementable approach for sparse signal recovery using incoherent measurements based on a fast implementable transform.

4.1. Incoherent measurements

In the CS literature, one early approach to surmounting the efficiency problem of sparse signal recovery is to pick a sensing matrix consisting of randomly chosen rows from an orthonormal basis, such as the Fourier basis [4]. Although constructing such a sensing scheme leads to fast implementations in the sensing and reconstruction parts, theoretically guaranteeing its RIP is rather challenging [51]. In such cases, it is preferable to deal with an easily computable guarantee condition such as “coherence”.

To clarify the concept of coherence, let us recall the reconstruction problem in the typical compressive sensing setup. For sampling, consider a special form of the measurement matrix given in Section 3.3. Mathematically speaking, let $\Omega_1 \subset \{1, 2, 3, \ldots, N\}$ be a subset of indices indexing the locations of the rows chosen from an orthonormal basis $\Phi$, with $|\Omega_1| = m$. In this case, the measurement vector $y$ will be

$$y = \Phi_{\Omega_1} S = \Phi_{\Omega_1} \psi x = Ax, \tag{30}$$

where $\Phi_{\Omega_1}$ is the measurement matrix consisting of the rows picked from $\Phi$ indexed by $\Omega_1$. If the locations of the non-zero coefficients are unknown, sensing individual coefficients of the vector $x$ may not be a proper strategy (that is in fact equivalent to setting the matrix $A$ to the identity), since this sampling method requires $m = N$ independent measurements to collect sufficient information about the signal. Therefore, for a good sensing scheme, each element of a row of the measurement matrix must yield a sensing residual from every sample of the signal in the inner-product operation. In other words, the rows of the matrix $\Phi$ should be as flat as possible in the $\psi$ space. This amounts to the fact that the rows of $A$ should not be sparse or compressible. More formally, let $U = \Phi\psi$ be an $N \times N$ matrix and define a functional $\mu(U)$ as


$$\mu(U) = \max_{i,j} |U_{i,j}|, \tag{31}$$

which is used to loosely quantify how distributed the rows of $U$ are [52]. Since $U_{i,j} = \langle \Phi_i^T, \psi_j \rangle$, the parameter $\mu(U)$ is formally defined as follows:

Definition 2 (Mutual coherence). (See [53].) The mutual coherence between the sampling basis $\Phi$ and the sparsifying basis $\psi$ (both unit norm) is defined as

$$\mu(\Phi, \psi) = \max_{1 \le i, j \le N} \big|\langle \Phi_i^T, \psi_j \rangle\big|. \tag{32}$$

By elementary linear algebra, it can be seen that $\frac{1}{\sqrt{N}} \le \mu(\Phi, \psi) \le 1$. As discussed above, the compressive sensing framework requires small $\mu(\Phi, \psi)$, in other words incoherent basis pairs. The pair of spikes and sinusoids can be given as an example of an incoherent pair [53]. Indeed, this pairing of basis sets is maximally incoherent, since $\mu(\text{Spikes}, \text{Sinusoids}) = \frac{1}{\sqrt{N}}$. Candes et al. [52] have given a lower bound on the number of measurements in terms of the coherence between the sparsifying basis and the sampling basis:

Theorem 4. (See [52].) Given a fixed $S \in \mathbb{R}^N$ that has a $k$-sparse representation $x$ in a basis $\psi$, pick a subset $\Omega_1$ of the measurement domain from the sampling basis $\Phi$, with $|\Omega_1| = m$. If

$$m \ge C \cdot \mu^2(\Phi, \psi) \cdot k \cdot \log N, \tag{33}$$

where $C$ is a positive constant, then the reconstruction $\hat{x}$ of the signal $x$ is exact via

$$\hat{x} = \arg\min_x \|x\|_{\ell_1^N} \quad \text{s.t.} \quad y = \Phi_{\Omega_1} \psi x \tag{34}$$

with overwhelming probability.

4.2. Noiselets as representation basis

It is known that the wavelet basis, a sparsifying basis, is incoherent with noiselets [54], which constitutes another instance of an incoherent basis pair. Noiselets are noise-like functions that are completely uncompressible under wavelet decompositions; therefore, the noiselet basis is maximally incoherent with the wavelet basis. The coherence between noiselets and Haar wavelets is $\sqrt{2/N}$ [55], and the coherence of noiselets with Daubechies D4 and D8 wavelets is, respectively, $\sqrt{2.2/N}$ and $\sqrt{2.9/N}$ [50]. Besides being incoherent with wavelets, noiselets also have fast implementations: just like the wavelet decomposition, the noiselet decomposition can be computed by multiscale iteration, and it can be accomplished in $O(N \log N)$ time. These properties make noiselets a good choice for a sensing basis, especially for large-size applications. As an example of large-size signals, we consider the $N = 512 \times 512$ image shown in Fig. 5. This image was transformed into the wavelet domain, the largest 5% of all coefficients were kept with the remaining ones zeroed out, and the image was then reconstructed by taking the inverse wavelet transform. This constitutes our synthetically constructed $k$-sparse signal, $k \le 0.05 \cdot N = 13\,107$. To constitute a measurement matrix $\Phi_{\Omega_1}$, $m = 0.4 \times N$ rows are randomly chosen from the rows of the noiselet basis (real part). It is observed that the reconstruction $\hat{x}$ of the wavelet coefficients is exact using (34). The reconstructed image is also shown in Fig. 5.

4.3. The fast data hiding scheme

Having presented noiselets as a sampling basis with low complexity, we now proceed to show how to adapt the embedding scheme of Section 3.3.

Fig. 5. Signal recovery from noiselet measurements. (a) N = 512² length, k = 0.05N = 13 107-sparse synthetic image. (b) Reconstructed image from m = 0.4 × N noiselet measurements; the image is recovered with PSNR > 81 dB. (c) Reconstructed image in the presence of AWGN at 32 dB SNR; the image is recovered with PSNR > 53 dB.

Since the least-squares solution for embedded-data recovery also requires storing and computing the transpose of the $m \times M$ encoding matrix, it becomes computationally ineffective when $m$ is large. Fortunately, an approach similar to the fast sparse reconstruction method presented in Section 4.1 can also be applied to embedded-data recovery.

The idea is as follows. Choose the encoding matrix as a subset of columns of some other orthonormal matrix $\Theta$ ($m \times m$); preferentially, this orthonormal basis is a fast computable transform such as the discrete Hartley transform (DHT) or the discrete cosine transform (DCT). The columns are picked with a subset $\Omega_2 \subset \{1, 2, 3, \ldots, m\}$ of indices indexing the locations of the chosen columns, with $|\Omega_2| = M$; the resulting matrix is denoted $\Theta_{\Omega_2}$ ($m \times M$). Therefore, the $m$-long marked cover (marked measurements) in (10) becomes

$$y_w = \Phi_{\Omega_1} S + \Theta_{\Omega_2} w, \tag{35}$$

where $S$ is the $N$-length signal to be sensed and $w$ is the $M$-length data that we wish to embed.

In the reconstruction stage, the noisy measurements $y_n = y_w + z$ of the signal and the embedded signal are received. In Algorithm 1 we had first chosen a candidate annihilator matrix $F$, then constructed the encoding matrix $B$ from the null space of $F$. The procedure continued with finding the first-tier estimate $\tilde{x}$ of $x$.

Suppose that $\Omega_3 \subset \Omega_2^c$, where $\Omega_2^c = \{1, 2, 3, \ldots, m\} \setminus \Omega_2$. Let $F_{\Omega_3}$ be the $p \times m$ matrix whose rows are chosen from the columns of the orthonormal basis $\Theta$ indexed by $\Omega_3$. Then it is apparent that the matrix $F_{\Omega_3}$ is a left annihilator of the matrix $\Theta_{\Omega_2}$, i.e., $F_{\Omega_3} \Theta_{\Omega_2} = 0$. In other words, one may use a subset of the columns of a basis $\Theta$ as the encoding matrix, and another subset of the unused columns as the rows of the annihilator matrix.

On the other hand, instead of finding the pseudoinverse of the encoding matrix and applying it to the residual $y_n - A\tilde{x}$ as in (14), we can simply apply the inverse of the fast computable transform $\Theta$ and take the $M$ coefficients indexed by $\Omega_2$ from the resulting coefficients.
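A minimal sketch of this index-based construction, assuming SciPy's orthonormal DCT; the index choices for $\Omega_2$ and $\Omega_3$ are illustrative, and the annihilator rows here have unit norm rather than the $\sqrt{m/p}$ scaling of Section 3.3 (the scaling affects only the noise-variance bookkeeping):

```python
import numpy as np
from scipy.fft import dct, idct

m, M = 4096, 512
p = m - M
Omega3 = np.arange(p)        # first p DCT indices -> annihilator rows
Omega2 = np.arange(p, m)     # remaining M indices -> encoding columns

def encode(w):
    # Theta_{Omega2} w without forming the matrix:
    # place w at Omega2 and apply the inverse DCT.
    c = np.zeros(m)
    c[Omega2] = w
    return idct(c, norm='ortho')

def annihilate(y_n):
    # F_{Omega3} y_n: forward DCT, keep the coefficients indexed by Omega3.
    return dct(y_n, norm='ortho')[Omega3]

def decode_payload(residual, a):
    # Replaces the pseudoinverse of (14): DCT of (y_n - A x_tilde),
    # pick the Omega2 coefficients, and threshold as in (15).
    return a * np.sign(dct(residual, norm='ortho')[Omega2])
```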

In a nutshell, the advantage of constructing the measurement matrix, the encoding matrix, and the annihilator matrix all from orthonormal bases is that (i) it requires low-complexity sensing and reconstruction processes, and (ii) the encoder and decoder do not need to store whole matrices; knowledge of $\Omega_1$, $\Omega_2$, and $\Omega_3$ is sufficient for reconstruction.

As an example, we choose the encoding basis $\Theta$ to be the discrete cosine transform (DCT) and the sensing basis $\Phi$ to be the noiselet basis. The observation vector is obtained from the 512 × 512 synthetic Lena image shown in Fig. 5. We fix the number of noiselet measurements to $m = 0.4 \times N$. A 64 × 64 metadata image, shown in Fig. 6, is converted into an $M = 64 \times 64 \times 8 = 32\,768$-long binary stream.


Fig. 6. Joint signal reconstruction and embedded data recovery in the presence of AWGN at 32 dB SNR. (a) 512 × 512 synthetic Lena with sparsity level k = 0.05N. (b) A 64 × 64 metadata image. (c) Reconstructed image from m = 0.4N noiselet measurements; the image is recovered with average PSNR 53.58 dB over 100 trials. (d) Reconstruction of the metadata image; exact recovery is observed in each trial.

This $M$-long data is encoded using the DCT with $\|w\|_{\ell_2} = \|\Phi_{\Omega_1} S\|_{\ell_2}/4$, so that the EDR (embedded data-to-document ratio) is −6 dB. The marked measurements are also corrupted with noise at 32 dB signal-to-noise ratio (SNR). The synthetic image is recovered with an average PSNR of 53.58 dB over 100 trials. Exact recovery of the metadata image is observed in every trial. While joint embedding and sensing take 0.1 seconds using MATLAB on a 2.8 GHz, 8 GB RAM computer, decoding is done in 6.5 seconds: 3.25 seconds to recover the hidden message and an additional 3.25 seconds to recover the cover message.

4.4. Simulation results for incoherent measurements

In this section, we present simulated performance results of the proposed fast data hiding scheme. We experiment with the following parameters:

• We use the synthetically sparsified version of the 512 × 512 Lena image; thus $S \in \mathbb{R}^N$, $N = 512^2$, with sparsity level $k = 0.05 \times N$.

• The measurement matrix $\Phi_{\Omega_1}$ is randomly chosen from the rows of the noiselet basis. Real-valued “dragon” noiselets, as implemented in [56], are used.

• The $M$-long data $w$ is generated with −6 dB EDR.
• The rows of the annihilator matrix $F_{\Omega_3}$ are chosen from the columns of the $m \times m$ DCT matrix with $\Omega_3 = \{1, 2, \ldots, p\}$, where $p = m - M$. The columns of the encoding matrix $\Theta_{\Omega_2}$ are then chosen from the remaining columns.

• We consider Gaussian noise $z$ at 32 dB signal-to-noise ratio (SNR). SNR in dB is defined as $20\log_{10}\big(\|\Phi_{\Omega_1} S + \Theta_{\Omega_2} w\|_2 / \|z\|_2\big)$.

• The estimated wavelet coefficients $\hat{x}$ and the estimated embedded data $\hat{w}$ are found using the proposed algorithm in Fig. 1. Reconstruction of the image is then achieved via $\hat{S} = \psi \hat{x}$, where $\psi$ is the inverse wavelet transform.

Each experiment is conducted 250 times, and the corresponding $\Pr(\hat{w} \ne w)$ and PSNR values are reported. $E\{\|x - \hat{x}\|_2 / \|x\|_2\}$ values are also recorded for better comparison with Section 3.5, where $x$ is the vector of $k = 0.05 \times N$ sparse wavelet coefficients of the synthetic image before compressive sensing and data embedding, and $\hat{x}$ is the vector of wavelet coefficients of the reconstructed image. Gradient Projection for Sparse Reconstruction (GPSR) [57] is used to implement the $\ell_1$ minimization part. We use the MATLAB toolbox WaveLab850 [58] to implement the coiflet-2 wavelet. As was done in Section 3.5 for random modulation matrices, performance results for the incoherent structured modulation matrices are given in Fig. 7, in terms of both the embedding rate $M/m$ and the measurement rate $m/N$. Comparing Fig. 2(a) with Fig. 7(a), one can observe that the poor reconstruction region for the embedded data is shifted towards the $M/m$ axis for $m/N \le 0.3$. This is because the wavelet coefficients of compressible images exhibit a rapid decay when sorted in descending magnitude order. The $\ell_1$ minimization algorithm we use cannot recover the wavelet coefficients with small magnitudes when the number of measurements falls below a limit, e.g., $m/N \le 0.3$. The non-recovered coefficients act as an additional noise source in the decoding of the hidden message $w$. Notice that this kind of rapid decay does not occur when we synthesize a sparse signal with random magnitudes at random positions, as in Section 3.5.

We have repeated this experiment with two other images: (i) the 512 × 512 Kid image at the same sparsity level, for which the experimental results are shown in Fig. 8; and (ii) the 256 × 256 Cameraman image set to the sparsity level $k = 0.1 \times N$ (Fig. 9). As expected, we observe a small increase in both the probability of error in exact recovery of the embedded data and the mean squared reconstruction error as the sparsity level is increased from $k/N = 0.05$ to $k/N = 0.1$.

5. Discussion and conclusion

In this work, we have presented a practical data hiding scheme in which data is directly spread over the compressively sensed measurements via an encoding matrix. The hidden data coexists only with the compressively sensed samples of the cover signal, and when the cover signal is recovered, the hidden data is also removed. A typical application case of this scheme is sensor networks in general, and wireless body sensor networks in particular. The proposed scheme can operate with random modulation matrices $A$ and $B$, as well as when these matrices are chosen structurally, e.g., noiselets and wavelets, which enables a fast implementation for large-size data.

The reconstruction method jointly recovers both the signal and the embedded data. Since both the cover data (sparse samples) and the embedded data are exactly recovered under certain noise, payload and sparsity conditions, the proposed method can be qualified as conditionally reversible data hiding. The theoretical guarantee for the joint recovery of the cover and hidden data has been investigated, and it was shown to depend on the restricted isometry constants of the two modulation matrices $A$ and $FA$. Experimental results have confirmed the recoverability of the cover signal and the hidden data. Trade-offs between the embedding strength ($a$), the hidden data payload ($M$) and hence the embedding rate ($M/m$), the sparsity of the cover signal ($k$), and the sampling compression ($m/N$) have been discussed.



Fig. 7. Performance results of the algorithms for the case in which measurements are corrupted by AWGN, for the 512 × 512 Lena image with sparsity level $k = 0.05 \times N$. (a) $\mathrm{Prob}(\hat{w} \neq w)$ values for $M$. (b) $E\{\|\hat{x} - x\|_2 / \|x\|_2\}$ values for $M$. (c) PSNR values for $M$.

Fig. 8. Performance results of the algorithms for the case in which measurements are corrupted by AWGN, for the 512 × 512 synthetic Kid image. (a) $N = 512^2$-length, $k = 0.05 \times N = 13107$-sparse synthetic Kid image. (b) $\mathrm{Prob}(\hat{w} \neq w)$ values for $M$. (c) $E\{\|\hat{x} - x\|_2 / \|x\|_2\}$ values for $M$. (d) PSNR values for $M$.



Fig. 9. Performance results of the algorithms for the case in which measurements are corrupted by AWGN, for the 256 × 256 synthetic Cameraman image. (a) $N = 256^2$-length, $k = 0.1 \times N$-sparse synthetic Cameraman image. (b) $\mathrm{Prob}(\hat{w} \neq w)$ values for $M$. (c) $E\{\|\hat{x} - x\|_2 / \|x\|_2\}$ values for $M$. (d) PSNR values for $M$.

Two possible avenues to extend this work are the following. First, the $\ell_1$ minimization method used in this work could be replaced by greedy methods [22,23] such as orthogonal matching pursuit (a minimal sketch follows below); however, the techniques needed to establish performance guarantee conditions would differ markedly from those used for convex relaxation methods. Second, in this study we have considered strictly sparse signals; in the case of approximately sparse, i.e., compressible, signals, one should also take into account the approximation error in addition to the uncertainty incurred by the noise. The performance would then depend on how well the compressible signal can be approximated by a $k$-sparse vector [27].
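For concreteness, a minimal orthogonal matching pursuit routine in the spirit of the greedy methods of [22,23] is sketched below. This is an illustrative stand-in written for this text (the function name and interface are ours), not a solver used in the experiments.

```python
import numpy as np

def omp(Phi, y, k):
    """Greedy recovery of a k-sparse x from y ~ Phi @ x (minimal OMP)."""
    residual, support = y.astype(float).copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        support.append(int(np.argmax(np.abs(Phi.T @ residual))))
        # Re-fit all selected atoms by least squares, update the residual.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x = np.zeros(Phi.shape[1])
    x[support] = coef
    return x
```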

Appendix A. Proofs

Proof of Lemma 2. Define the vector $f = Fz$; then the elements of this vector are $f_i = \langle F_i, z \rangle$, where $F_i$ denotes the $i$th row of the $p \times m$ matrix $F$. Recall that $F$ has orthogonal rows and that $z$ is an $m \times 1$ vector consisting of i.i.d. Gaussian random variables with $z_i \sim \mathcal{N}(0, \sigma_G^2)$. It follows that $f_i$ has distribution $\mathcal{N}(0, \frac{m}{p}\sigma_G^2)$, since using independence we get

$$
\begin{aligned}
E(f_i^2) &= E\big(\langle F_i, z \rangle^2\big) = E\Big(\Big(\sum_{j=1}^{m} F_{i,j}\, z_j\Big)^{2}\Big) \\
&= E\Big(\sum_{j=1}^{m} (F_{i,j}\, z_j)^2 + \sum_{l=1}^{m} \sum_{\substack{t=1 \\ t \neq l}}^{m} z_l z_t\, F_{i,l} F_{i,t}\Big) \\
&= \sum_{j=1}^{m} F_{i,j}^2\, E(z_j^2) + \sum_{l=1}^{m} \sum_{\substack{t=1 \\ t \neq l}}^{m} E(z_l z_t)\, F_{i,l} F_{i,t} \\
&= \sigma_G^2 \sum_{j=1}^{m} F_{i,j}^2 = \sigma_G^2\, \|F_i\|_{\ell_2^m}^2 = \frac{m}{p}\,\sigma_G^2,
\end{aligned}
\tag{A.1}
$$

where we use the fact that $E(z_l z_t) = 0$ for $t \neq l$. Then, by using the inequality given in (9), it is easy to show that

$$
\Pr\Big(\|Fz\|_{\ell_2^p} \geq (1+\gamma)\sqrt{m}\,\sigma_G\Big) \leq e^{-\frac{3p}{4}\gamma^2}.
\tag{A.2}
$$
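As an informal numerical check of the tail bound (A.2), the sketch below (with hypothetical parameters, and DCT rows rescaled so that $\|F_i\|_2^2 = m/p$ as assumed in (A.1)) compares the empirical tail probability against $e^{-3p\gamma^2/4}$:

```python
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(1)
m, p, sigma_G, gamma, trials = 256, 192, 1.0, 0.1, 20000

# Rows of an orthonormal DCT, rescaled so that ||F_i||^2 = m/p as in (A.1).
D = dct(np.eye(m), axis=0, norm='ortho')
F = np.sqrt(m / p) * D[:p, :]

Z = sigma_G * rng.standard_normal((trials, m))
tail = np.mean(np.linalg.norm(Z @ F.T, axis=1)
               >= (1 + gamma) * np.sqrt(m) * sigma_G)
print(f'empirical tail {tail:.1e}  <=  bound {np.exp(-0.75 * p * gamma**2):.1e}')
```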

Using Lemma 1, we obtain that the solution $\hat{x}$ to (13) obeys

$$\|\hat{x} - x\|_{\ell_2^N} \leq C_3\,\varepsilon \tag{A.3}$$

with probability at least $1 - e^{-\frac{3p}{4}\gamma^2}$.

Next we bound the remaining error before the least squares solution to (14); that is, we bound $\|y_n - A\hat{x} - Bw\|_{\ell_2^m}$. Using $y_n = Ax + Bw + z$, we get

$$\|y_n - A\hat{x} - Bw\|_{\ell_2^m} = \|A(x - \hat{x}) + z\|_{\ell_2^m} \leq \|A(x - \hat{x})\|_{\ell_2^m} + \|z\|_{\ell_2^m} \leq \Big(C_3\big(\sqrt{1 + \delta_{2k}(A)} + C_4\big) + 1\Big)\,\varepsilon, \tag{A.4}$$

where the last inequality follows from the triangle inequality. Let us define $e_r = A(x - \hat{x}) + z$. Then the solution $\hat{w}$ to (14) obeys

$$\hat{w} - w = (B^T B)^{-1} B^T e_r,$$

and since the eigenvalues of $B^T B$ are well behaved, we have

$$\|\hat{w} - w\|_{\ell_2^M} \approx \|B^T e_r\|_{\ell_2^M} \approx \|e_r\|_{\ell_2^M} \leq C_0\,\varepsilon, \tag{A.5}$$

where $C_0 = C_3(\sqrt{1 + \delta_{2k}(A)} + C_4) + 1$. The term $\sqrt{1 + \delta_{2k}(A)}$ comes from the upper bound of the RIP of $A$. One may object that the vector $h = x - \hat{x}$ is not necessarily $2k$-sparse, since the solution $\hat{x}$ to (15) is erroneous and not $k$-sparse. It is known, however, that $h$ is approximately sparse and does not severely violate the upper bound $\sqrt{1 + \delta_{2k}(A)}$; we use the constant $C_4$ to represent this possible violation without computing its exact value.

We note that the uncertainty in the estimate $\hat{w}$ given in equation (A.5) corresponds to the worst-case scenario, i.e., the situation in which nothing is known about the error $e_r$ except its norm bound $\|e_r\|_{\ell_2^M} \leq C_0\varepsilon$. Under this assumption, equation (A.5) implies that

$$\|\hat{w} - w\|_{\ell_\infty^M} \leq \|\hat{w} - w\|_{\ell_2^M} \leq C_0\,\varepsilon. \tag{A.6}$$

Therefore we can conclude that, if

$$|w_i| \geq C_0\,\varepsilon, \tag{A.7}$$

the solution $\hat{w}$ to (15) is exact with full probability. Considering $\|w\|_{\ell_2^M} = \upsilon$, inequality (A.7) turns into

$$|w_i| = \frac{\upsilon}{\sqrt{M}} \geq C_0\,\varepsilon, \tag{A.8}$$

which gives equation (18). Using Lemma 1, the solution $\hat{x}$ to (16) obeys

$$\|\hat{x} - x\|_{\ell_2^N} \leq \frac{4\sqrt{1 + \delta_{2k}}}{1 - (1 + \sqrt{2})\,\delta_{2k}}\,(1 + \gamma)\sqrt{m}\,\sigma_G,$$

which completes the proof. □

Proof of Theorem 3. The proof proceeds in three steps:

(i) Let $\hat{x}$ be the solution to (13). By setting $\varepsilon = (1 + \gamma)\sqrt{m}\,\sigma_G$ and using Lemma 1 and (A.2), $\hat{x}$ satisfies

$$\|\hat{x} - x\|_{\ell_2^N} \leq C_3\,\varepsilon \tag{A.9}$$

with probability $P_0 \geq 1 - \exp(-\frac{3p}{4}\gamma^2)$, where

$$C_3 = \frac{4\sqrt{1 + \delta_{2k}(FA)}}{1 - (1 + \sqrt{2})\,\delta_{2k}(FA)}. \tag{A.10}$$

(ii) Define the remaining error before the least squares solution in (14) as $e_r = e + z$, where $e = A(x - \hat{x})$. Recall that the entries $A_{i,j}$ are drawn i.i.d. according to $\mathcal{N}(0, \frac{1}{m})$. Given $\|\hat{x} - x\|_{\ell_2^N} \leq C_3\varepsilon$, $e$ is a Gaussian random vector whose elements $e_i$ have variance $\sigma^2 \leq \frac{C_3^2\varepsilon^2}{m}$. Using the independence of $e$ and $z$, one can conclude that the elements of the Gaussian random vector $e_r$ have variance $\sigma_r^2 \leq 2\max\{\sigma^2, \sigma_G^2\} \leq 2\,\frac{C_3^2\varepsilon^2}{m}$. Since the matrix $B$ has orthonormal columns, the solution $\hat{w}$ of (14) satisfies

$$\hat{w} - w = B^T e_r = e_{r_2} \sim \mathcal{N}\big(0, \sigma_{e_r}^2\big). \tag{A.11}$$

Then the probability of error in (15) can be bounded as

$$\mathrm{Prob}\big(\hat{w}_i \neq w_i \,\big|\, \{\|\hat{x} - x\|_{\ell_2^N} \leq C_3\varepsilon\}\big) = \mathrm{Prob}\big(|e_{r_2}^i| > a\big) \leq \exp\Big(-\frac{a^2}{2\sigma_r^2}\Big) \leq \exp\Big(-\frac{a^2 m}{4 C_3^2 \varepsilon^2}\Big), \tag{A.12}$$

where $e_{r_2}^i$ is the $i$th component of $e_{r_2}$. Using the union bound, we get

$$\mathrm{Prob}\big(\hat{w} \neq w \,\big|\, \{\|\hat{x} - x\|_{\ell_2^N} \leq C_3\varepsilon\}\big) = \mathrm{Prob}\big(\|e_{r_2}\|_{\ell_\infty^M} > a\big) \leq M\,\exp\Big(-\frac{a^2 m}{4 C_3^2 \varepsilon^2}\Big) = \exp\Big(\log(M) - \frac{a^2 m}{4 C_3^2 \varepsilon^2}\Big). \tag{A.13}$$

(iii) Combining (i) and (ii), the probability of exact embedded data recovery of Algorithm 1 can be bounded as

$$\mathrm{Prob}(\hat{w} = w) \geq \Big(1 - \exp\Big(\log(M) - \frac{a^2 m}{4 C_3^2 \varepsilon^2}\Big)\Big)\, P_0. \tag{A.14}$$

For the special case $\gamma = 1$, i.e., $\varepsilon = 2\sqrt{m}\,\sigma_G$, by choosing

$$M \leq \exp\Big(\beta_0\,\frac{a^2 m}{4 C_3^2 \varepsilon^2}\Big) = \exp\Big(\beta_0\,\frac{a^2}{16\, C_3^2 \sigma_G^2}\Big) \tag{A.15}$$

for any $0 < \beta_0 < 1$, we get

$$\mathrm{Prob}(\hat{w} = w) \geq \Big(1 - \exp\Big(-\beta_1\,\frac{a^2}{16\, C_3^2 \sigma_G^2}\Big)\Big)\, P_1, \tag{A.16}$$

where

$$P_1 \geq 1 - \exp\Big(-\frac{3p}{4}\Big), \qquad \beta_1 = 1 - \beta_0. \tag{A.17} \;\square$$
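To get a feel for the payload bound (A.15), the short sketch below evaluates it for some hypothetical values of the embedding strength $a$, the noise level $\sigma_G$, the RIP constant $\delta_{2k}(FA)$, and $\beta_0$; none of these numbers are taken from the experiments.

```python
import numpy as np

# Hypothetical values, for illustration only.
a, sigma_G, delta_2k, beta0 = 0.5, 0.005, 0.2, 0.9

# C3 as in (A.10); payload bound as in (A.15) with gamma = 1.
C3 = 4 * np.sqrt(1 + delta_2k) / (1 - (1 + np.sqrt(2)) * delta_2k)
M_max = np.exp(beta0 * a**2 / (16 * C3**2 * sigma_G**2))
print(f'C3 = {C3:.2f}, payload bound: M <= {M_max:.3g}')
```

As expected from (A.15), the admissible payload grows rapidly as the embedding strength $a$ increases or the noise level $\sigma_G$ decreases.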

References

[1] W.B. Pennebaker, J.L. Mitchell, JPEG Still Image Data Compression Standard, 1st edition, Kluwer Academic Publishers, Norwell, MA, USA, 1992.

[2] G. Wallace, The JPEG still picture compression standard, IEEE Trans. Consum. Electron. 38 (1) (1992) xviii–xxxiv, http://dx.doi.org/10.1109/30.125072.

[3] D. Donoho, Compressed sensing, IEEE Trans. Inf. Theory 52 (4) (2006) 1289–1306, http://dx.doi.org/10.1109/TIT.2006.871582.

[4] E.J. Candès, J. Romberg, T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inf. Theory 52 (2) (2006) 489–509, http://dx.doi.org/10.1109/TIT.2005.862083.

[5] H. Mamaghanian, N. Khaled, D. Atienza, P. Vandergheynst, Compressed sensing for real-time energy-efficient ECG compression on wireless body sensor nodes, IEEE Trans. Biomed. Eng. 58 (9) (2011) 2456–2466, http://dx.doi.org/10.1109/TBME.2011.2156795.

[6] S. Xiang, L. Cai, Transmission control for compressive sensing video over wireless channel, IEEE Trans. Wirel. Commun. 12 (3) (2013) 1429–1437, http://dx.doi.org/10.1109/TWC.2013.012313.121150.

[7] D. Takhar, J.N. Laska, M.B. Wakin, M.F. Duarte, D. Baron, S. Sarvotham, K.F. Kelly, R.G. Baraniuk, A new compressive imaging camera architecture using optical-domain compression, in: Electronic Imaging 2006, International Society for Optics and Photonics, 2006, p. 606509.

[8] I.J. Cox, M.L. Miller, J.A. Bloom, Digital Watermarking and Steganography, The Morgan Kaufmann Series in Multimedia Information and Systems, Morgan Kaufmann Publishers, Burlington (Mass.), 2008.

[9] M. Barni, F. Bartolini, Watermarking Systems Engineering: Enabling Digital Assets Security and Other Applications, CRC Press, 2004.

[10] F.A. Petitcolas, R. Anderson, M. Kuhn, Information hiding—a survey, Proc. IEEE 87 (7) (1999) 1062–1078, http://dx.doi.org/10.1109/5.771065.

[11] M. Sheikh, R. Baraniuk, Blind error-free detection of transform-domain watermarks, in: IEEE International Conference on Image Processing, vol. 5, ICIP 2007, 2007, pp. V-453–V-456.

[12] X. Zhang, Z. Qian, Y. Ren, G. Feng, Watermarking with flexible self-recovery quality based on compressive sensing and compositive reconstruction, IEEE Trans. Inf. Forensics Secur. 6 (4) (2011) 1223–1232, http://dx.doi.org/10.1109/TIFS.2011.2159208.

[13] C. Delpha, S. Hijazi, R. Boyer, A compressive sensing based quantized watermarking scheme with statistical transparency constraint, in: Digital-Forensics and Watermarking, in: Lecture Notes in Computer Science, vol. 8389, Springer, Berlin, Heidelberg, 2014, pp. 409–422.

[14] G. Valenzise, M. Tagliasacchi, S. Tubaro, G. Cancelli, M. Barni, A compressive-sensing based watermarking scheme for sparse image tampering identification, in: 16th IEEE International Conference on Image Processing, ICIP 2009, 2009, pp. 1265–1268.

[15] C. Patsakis, N. Aroukatos, LSB and DCT steganographic detection using com-pressive sensing, J. Inf. Hiding Multimed. Signal Process. 5 (1) (2014) 20–32.

[16] E.J. Candès, T. Tao, Decoding by linear programming, IEEE Trans. Inf. Theory 51 (12) (2005) 4203–4215, http://dx.doi.org/10.1109/TIT.2005.858979.

[17] J. Eggers, R. Bauml, R. Tzschoppe, B. Girod, Scalar Costa scheme for information embedding, IEEE Trans. Signal Process. 51 (4) (2003) 1003–1019, http://dx.doi.org/10.1109/TSP.2003.809366.

[18] M. Yamac, C. Dikici, B. Sankur, Robust watermarking of compressive sensed measurements under impulsive and Gaussian attacks, in: Proceedings of the 21st European Signal Processing Conference, EUSIPCO 2013, 2013, pp. 1–5.



[19] S. Mallat, A Wavelet Tour of Signal Processing: The Sparse Way, 3rd edition, Academic Press, 2008.

[20] D.L. Donoho, M. Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via �1 minimization, Proc. Natl. Acad. Sci. USA 100 (2003) 2197–2202, http://dx.doi.org/10.1073/pnas.0437847100.

[21] J. Tropp, S. Wright, Computational methods for sparse solution of linear inverse problems, Proc. IEEE 98 (6) (2010) 948–958, http://dx.doi.org/10.1109/JPROC.2010.2044010.

[22] T.T. Cai, L. Wang, Orthogonal matching pursuit for sparse signal recovery with noise, IEEE Trans. Inf. Theory 57 (7) (2011) 4680–4688, http://dx.doi.org/10.1109/TIT.2011.2146090.

[23] J. Wang, B. Shim, On the recovery limit of sparse signals using orthogonal matching pursuit, IEEE Trans. Signal Process. 60 (9) (2012) 4973–4976, http://dx.doi.org/10.1109/TSP.2012.2203124.

[24] M.A. Davenport, M.F. Duarte, Y.C. Eldar, G. Kutyniok, Introduction to Compressed Sensing, Cambridge University Press, 2012.

[25] H. Rauhut, On the impossibility of uniform sparse reconstruction using greedy methods, Sampl. Theory Signal Image Process. 7 (2) (2008) 197–215.

[26] J.A. Tropp, Just relax: convex programming methods for identifying sparse signals in noise, IEEE Trans. Inf. Theory 52 (3) (2006) 1030–1051, http://dx.doi.org/10.1109/TIT.2005.864420.

[27] E.J. Candès, The restricted isometry property and its implications for compressed sensing, C. R. Math. 346 (9) (2008) 589–592, http://dx.doi.org/10.1016/j.crma.2008.03.014.

[28] E.J. Candès, J.K. Romberg, T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Commun. Pure Appl. Math. 59 (8) (2006) 1207–1223, http://dx.doi.org/10.1002/cpa.20124.

[29] J. Haupt, R. Nowak, Signal reconstruction from noisy random projections, IEEE Trans. Inf. Theory 52 (9) (2006) 4036–4048, http://dx.doi.org/10.1109/TIT.2006.880031.

[30] S. Foucart, A note on guaranteed sparse recovery via �1-minimization, Appl. Comput. Harmon. Anal. 29 (1) (2010) 97–103, http://dx.doi.org/10.1016/j.acha.2009.10.004.

[31] Y. Rachlin, D. Baron, The secrecy of compressed sensing measurements, in: 46th Annual Allerton Conference on Communication, Control, and Computing, 2008, 2008, pp. 813–817.

[32] A. Orsdemir, H. Altun, G. Sharma, M. Bocko, On the security and robustness of encryption via compressed sensing, in: Military Communications Conference 2008, MILCOM 2008, IEEE, 2008, pp. 1–7.

[33] R. Huang, K. Rhee, S. Uchida, A parallel image encryption method based on compressive sensing, Multimed. Tools Appl. 72 (1) (2014) 71–93, http://dx.doi.org/10.1007/s11042-012-1337-0.

[34] C.E. Shannon, Communication theory of secrecy systems, Bell Syst. Tech. J. 28 (4) (1949) 656–715, http://dx.doi.org/10.1002/j.1538-7305.1949.tb00928.x.

[35] K. Liu, H. Kargupta, J. Ryan, Random projection-based multiplicative data perturbation for privacy preserving distributed data mining, IEEE Trans. Knowl. Data Eng. 18 (1) (2006) 92–106, http://dx.doi.org/10.1109/TKDE.2006.14.

[36] W. Lu, A. Varna, A. Swaminathan, M. Wu, Secure image retrieval through feature protection, in: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2009, 2009, pp. 1533–1536.

[37] X. Liu, W. Mei, H. Du, Optical image encryption based on compressive sensing and chaos in the fractional Fourier domain, J. Mod. Opt. 61 (19) (2014) 1570–1577, http://dx.doi.org/10.1080/09500340.2014.946565.

[38] N. Zhou, H. Li, D. Wang, S. Pan, Z. Zhou, Image compression and encryption scheme based on 2D compressive sensing and fractional Mellin transform, Opt. Commun. 343 (2015) 10–21, http://dx.doi.org/10.1016/j.optcom.2014.12.084.

[39] C. Dwork, F. McSherry, K. Nissim, A. Smith, Calibrating noise to sensitivity in private data analysis, in: Theory of Cryptography, Springer, 2006, pp. 265–284.

[40] C. Dwork, F. McSherry, K. Talwar, The price of privacy and the limits of LP decoding, in: Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, STOC '07, ACM, New York, NY, USA, 2007, pp. 85–94.

[41] S. Lee, C.D. Yoo, T. Kalker, Reversible image watermarking based on integer-to-integer wavelet transform, IEEE Trans. Inf. Forensics Secur. 2 (2007) 321–330, http://dx.doi.org/10.1109/TIFS.2007.905146.

[42] W. Hong, T.-S. Chen, H.-Y. Wu, An improved reversible data hiding in encrypted images using side match, IEEE Signal Process. Lett. 19 (4) (2012) 199–202, http://dx.doi.org/10.1109/LSP.2012.2187334.

[43] G. Coatrieux, W. Pan, N. Cuppens-Boulahia, F. Cuppens, C. Roux, Reversible watermarking based on invariant image classification and dynamic histogram shifting, IEEE Trans. Inf. Forensics Secur. 8 (1) (2013) 111–120, http://dx.doi.org/10.1109/TIFS.2012.2224108.

[44] D.L. Donoho, For most large underdetermined systems of linear equations the minimal �1-norm solution is also the sparsest solution, Commun. Pure Appl. Math. 59 (6) (2006) 797–829, http://dx.doi.org/10.1002/cpa.20132.

[45] R. Baraniuk, M. Davenport, R. DeVore, M. Wakin, A simple proof of the restricted isometry property for random matrices, Constr. Approx. 28 (3) (2008) 253–263, http://dx.doi.org/10.1007/s00365-007-9003-x.

[46] W. Johnson, J. Lindenstrauss, Extensions of Lipschitz mappings into a Hilbert space, in: Conference in Modern Analysis and Probability, New Haven, Conn., 1982, in: Contemporary Mathematics, vol. 26, American Mathematical Society, 1984, pp. 189–206.

[47] D. Achlioptas, Database-friendly random projections: Johnson–Lindenstrauss with binary coins, J. Comput. Syst. Sci. 66 (4) (2003) 671–687, http://dx.doi.org/10.1016/s0022-0000(03)00025-4.

[48] B.G. Ivanoff, N. Weber, Tail probabilities for weighted sums of products of normal random variables, Bull. Aust. Math. Soc. 58 (02) (1998) 239–244, http://dx.doi.org/10.1017/S0004972700032214.

[49] E.J. Candès, J. Romberg, l1-magic: recovery of sparse signals via convex programming, www.acm.caltech.edu/l1magic/downloads/l1magic.pdf.

[50] E.J. Candès, M.B. Wakin, An introduction to compressive sampling, IEEE Signal Process. Mag. 25 (2) (2008) 21–30, http://dx.doi.org/10.1109/MSP.2007.914731.

[51] E. Candès, Y. Plan, A probabilistic and RIPless theory of compressed sensing, IEEE Trans. Inf. Theory 57 (11) (2011) 7235–7254, http://dx.doi.org/10.1109/TIT.2011.2161794.

[52] E.J. Candès, J. Romberg, Sparsity and incoherence in compressive sampling, Inverse Probl. 23 (3) (2007) 969, http://dx.doi.org/10.1088/0266-5611/23/3/008.

[53] D.L. Donoho, X. Huo, Uncertainty principles and ideal atomic decomposition, IEEE Trans. Inf. Theory 47 (7) (2001) 2845–2862, http://dx.doi.org/10.1109/18.959265.

[54] R. Coifman, F. Geshwind, Y. Meyer, Noiselets, Appl. Comput. Harmon. Anal. 10 (1) (2001) 27–44, http://dx.doi.org/10.1006/acha.2000.0313.

[55] T. Tuma, P. Hurley, On the incoherence of noiselet and Haar bases, in: International Conference on Sampling Theory and Applications, SAMPTA'09, 2009.

[56] J. Romberg, Imaging via compressive sampling, IEEE Signal Process. Mag. 25 (2) (2008) 14–20, http://dx.doi.org/10.1109/MSP.2007.914729.

[57] M.A. Figueiredo, R.D. Nowak, S.J. Wright, Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems, IEEE J. Sel. Top. Signal Process. 1 (4) (2007) 586–597.

[58] D. Donoho, A. Maleki, M. Shahram, WaveLab850, a software toolkit for time-frequency analysis.

Mehmet Yamaç received the B.S. degree in Electrical and Electronics Engineering from Anadolu University, Eskisehir, Turkey, in 2009, and the M.S. degree in Electrical and Electronics Engineering from Bogaziçi University, Istanbul, Turkey, in 2014. He is currently a Ph.D. candidate in Electrical and Electronics Engineering at Bogaziçi University and a teaching assistant. His research interests are computer and machine vision, machine learning and compressive sensing.

Çagatay Dikici received his B.S. (2001) and M.S. (2004) degrees in electrical and electronics engineering from Bogaziçi University, Istanbul, Turkey, and his Ph.D. (2007) degree in computer science from INSA de Lyon, France. Currently he works as a research engineer at Imagination Technologies, United Kingdom. His research interests include information theory, watermarking, compression and computer vision.

Bülent Sankur is presently at Bogazici University in the Department of Electrical-Electronic Engineering. His research interests are in the areas of digital signal processing, security and biometry, cognition and multimedia systems. He has served as a consultant in several industrial and government projects and has been involved in various European framework and/or bilateral projects. He has held visiting positions at the University of Ottawa, Technical University of Delft, and Ecole Nationale Supérieure des Télécommunications, Paris. He was the chairman of EUSIPCO'05: The European Conference on Signal Processing, as well as technical chairman of ICASSP'00. Dr. Sankur is presently an associate editor of IEEE Trans. on Image Processing, Journal of Image and Video Processing, and IET Biometrics.