Reversible Data Hiding Based on Histogram Modification of SMVQ Indices


638 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 5, NO. 4, DECEMBER 2010

Reversible Data Hiding Based on Histogram Modification of SMVQ Indices

Jiann-Der Lee, Member, IEEE, Yaw-Hwang Chiou, and Jing-Ming Guo, Senior Member, IEEE

Abstract—This work presents a novel reversible data-hiding scheme that embeds secret data into a transformed image and achieves lossless reconstruction of vector quantization (VQ) indices. The VQ compressed image is modified by the side-matched VQ scheme to yield a transformed image. Distribution of the transformed image is employed to achieve high embedding capacity and a low bit rate. Moreover, three configurations, under-hiding, normal-hiding, and over-hiding schemes, are utilized to improve the proposed scheme further for various applications. Experimental results demonstrate that the proposed scheme significantly enhances the compression ratio and embedding capacity. Experimental results also show that the proposed scheme achieves the best performance among approaches in the literature in terms of the compression ratio and embedding capacity.

Index Terms—Image compression, lossless data hiding, lossless recovery, vector quantization (VQ).

I. INTRODUCTION

RAPID advances in the Internet allow users to communicate and exchange information daily. Hence, massive amounts of digital information, e.g., digital images, video, and audio, are transmitted over the Internet, subsequently incurring security problems such as interception, modification, and montage. Thus, ensuring that information exchanged over the Internet remains safe and secure has become extremely important.

Data hiding is an important means of embedding secret data into a cover image with minimal perceptual degradation. To avoid raising the attention of a third party to an embedded image, the quality of the embedded image must be close to that of the original cover image. Generally, information-hiding techniques can be classified into two categories, namely, reversible information-hiding schemes [1]–[5] and irreversible information-hiding schemes [6]–[8]. However, for some applications, such as those for military, legal literature, medicine, and fine artwork, the original cover image must be recoverable to maintain content integrity. Only secret data can be extracted with irreversible information-hiding schemes and no restoration of cover images is made. Conversely, secret data in reversible information-hiding schemes are extracted and cover images can be restored completely.

Manuscript received March 23, 2010; revised July 16, 2010; accepted August 04, 2010. Date of publication August 16, 2010; date of current version November 17, 2010. This work was supported in part by the National Science Council, Republic of China, Taiwan, under Grant NSC98-2220-E-182-002 and Grant NSC 99-2631-H-011-001. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Adnan Alattar.

J.-D. Lee and Y.-H. Chiou are with the Department of Electrical Engineering, Chang Gung University, Tao-Yuan 333, Taiwan (e-mail: [email protected]).

J.-M. Guo is with the Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei 106, Taiwan (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIFS.2010.2066971

Information-hiding techniques can be performed in three domains, i.e., the spatial domain [5], [9]–[11], frequency domain [12]–[17], and compressed domain. Information-hiding techniques in the spatial domain modify pixel values in a cover image. The least-significant bit (LSB) modification method, which uses fixed LSBs in each pixel to embed a secret message, is the most common and easily employed method for hiding a message within an image. In the frequency domain, a cover image is transformed using a frequency-oriented mechanism (e.g., DCT, FFT, DWT, and IWT). The transformed coefficients of sub-bands to which the human visual system (HVS) is less sensitive are modified to embed secret messages. Researchers have recently begun work on data-hiding techniques in the compressed domain. In 2002, Jo and Kim [22] proposed a data-hiding technique based on vector quantization (VQ). They paired two extremely similar codewords and assigned them to two different groups. One group hides secret bit 0, and the other hides secret bit 1. The codewords that cannot be paired are organized into a third group, which cannot hide secret data. Although the process for data hiding and extraction in this scheme is easily employed, embedding capacity is unsatisfactory and the scheme is irreversible. In 2006, Shie et al. developed an adaptive data-hiding scheme based on VQ-compressed code [6]. In this scheme, image blocks are classified as embeddable and unembeddable blocks based on variances and side-match distortions (SMDs) of blocks. The embeddable blocks are employed to hide secret data, and unembeddable blocks remain unchanged. Although this scheme has a high embedding capacity, the quality of the reconstructed image decreases as the quantity of secret data increases. Additionally, this scheme is an irreversible information-hiding scheme.

During 2006–2009, Chang et al. created various reversible data-hiding schemes on the VQ-compressed domain [1]–[4], [24]. In [24], a side-matched VQ-based (SMVQ) reversible data-hiding scheme was proposed, where secret data is embedded and the codeword is modified by two codewords with different proportions. Finally, the embedded result is transmitted without an index table. In [1], a secret data-hiding scheme was proposed based on the search-order coding compression method of VQ indices. Search-order codes (SOCs) and original index values (OIVs) are used to hide secret bit 0 or 1. Although low capacity is the principal shortcoming of this scheme, the embedded result will not be distorted. In [3], a VQ-based data-hiding scheme with recovery capability was developed. First, in their 2-bit embedding method, codewords in the original VQ codebook are sorted in descending order based on referred counts. The sorted codebook is then partitioned into six clusters. Clusters with the two largest referred counts are employed to hide secret data. The other clusters are used only for image reconstruction. Since indicators are added in front of most encoded indices and only two clusters can be used to hide secret data, this scheme cannot obtain a satisfactory embedding capacity and bit rate (BR). In [4], secret data are hidden in a compressed image based on the principle that neighboring blocks in an image are highly correlated, and frequently occurring indices are encoded by short codes and rare symbols are encoded by long codes. In this scheme, a 1-bit indicator is always required for each index and only 1-bit secret data can be embedded in each index. This scheme cannot obtain a large embedding capacity. Shie and Lin [18] employed SOCs to encode each index of an index table in raster scan order. An encoded index is denoted as an SOC or OIV codes and is replaced with a -bits code or remains unchanged. The embedding capacity of this scheme is significantly better than that of other schemes. However, the BR of this scheme is also larger than that of VQ. In [19], Yang and Lin extended the scheme developed by Chang et al. [3], in which the VQ codebook is divided into clusters and half of these clusters are used to embed secret data, where is the size of secret data embedded into each VQ index.

1556-6013/$26.00 © 2010 IEEE

Based on the special distribution of a transformed image and inspired by the scheme developed by Chang et al. [3], this work restores a compressed image and increases embedding capacity using a novel reversible data-hiding scheme in the SMVQ-compressed domain. Compared with previous approaches [3], [19], which sort the VQ codebook in descending order based on referred counts, the proposed scheme employs SMVQ to sort codebooks. The main problems in previous schemes [3], [19] are low embedding capacity and high BR. To solve these problems, the proposed scheme applies SMVQ to the index table to generate a transformed image. Indices in the transformed image are modified for camouflage. During the extraction phase, the proposed algorithm recovers the original transformed image and extracts secret data; SMVQ is then applied to restore the original VQ index. As confirmed by experimental results, the proposed scheme outperforms other approaches [3], [18], [19] in terms of embedding capacity and BR.

The remainder of this paper is organized as follows. Section II presents related work to briefly introduce the concept of VQ and SMVQ theory, and reviews the schemes developed by Chang et al. [3], and Yang and Lin [19]. Section III describes the proposed scheme in detail. Section IV gives experimental results and performance comparisons of the proposed scheme and existing approaches. Finally, Section V presents conclusions.

II. RELATED WORK

A. Brief Concept of VQ

VQ initially involves constructing a codebook from a set of training images; the elements in the codebook are called codewords. Generally, the LBG algorithm [20] is employed to yield the desired codebook. With the generated codebook, each block in an image is encoded with the index of the nearest codeword, such that total storage space for an image is minimized. To measure the similarity of a block, $X = (x_1, x_2, \ldots, x_k)$, and the nearest codeword, $cw_i = (cw_{i1}, cw_{i2}, \ldots, cw_{ik})$, where $cw_i$ is the $i$th codeword in a codebook, the squared Euclidean distance is generally used

$$d(X, cw_i) = \sum_{j=1}^{k} (x_j - cw_{ij})^2 \tag{1}$$

Fig. 1. Example of SMVQ.

When the encoding process is complete, each block is represented only by the index of its nearest codeword. This representation significantly reduces overall storage space. To decode an image, table lookup is executed using the same codebook as that used by the encoder.
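As a minimal sketch of the nearest-codeword search and table-lookup decoding described above, the following Python code encodes blocks against a toy codebook. The function names (`squared_distance`, `vq_encode`, `vq_decode`) and the flat-list block layout are illustrative assumptions, not part of the original scheme, and codebook training with the LBG algorithm is omitted.

```python
def squared_distance(block, codeword):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - c) ** 2 for x, c in zip(block, codeword))


def vq_encode(blocks, codebook):
    """Map every block to the index of its nearest codeword."""
    indices = []
    for block in blocks:
        # min() keeps the first index on ties, so encoding is deterministic.
        best = min(range(len(codebook)),
                   key=lambda i: squared_distance(block, codebook[i]))
        indices.append(best)
    return indices


def vq_decode(indices, codebook):
    """Table lookup: replace every index by a copy of its codeword."""
    return [list(codebook[i]) for i in indices]


# Toy data (assumed for illustration): three 4-dimensional codewords.
codebook = [[0, 0, 0, 0], [10, 10, 10, 10], [20, 20, 20, 20]]
blocks = [[1, 0, 2, 1], [19, 22, 18, 20], [9, 11, 10, 12]]
indices = vq_encode(blocks, codebook)
reconstructed = vq_decode(indices, codebook)
```

Decoding is lossy with respect to the pixels, since each block is replaced by its codeword, but the index table itself is what the reversible scheme must later restore exactly.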

B. SMVQ

Kim first proposed SMVQ for image coding [21]. The high correlation among neighboring blocks is the key feature exploited in SMVQ, where the state codebook is generated according to a side-match prediction. A side-match prediction assumes that pixel values on either side of the boundary between neighboring blocks are equal. The blocks in SMVQ located in the first row and first column of an image are encoded using conventional VQ, and the remaining blocks are predicted using their neighboring encoded blocks. Fig. 1 shows an example of the relationships among an encoding block $X$, its upper neighboring block $U$, and its left neighboring block $L$. This work denotes the border vector and side vector of block $X$ as

and

).

The squared Euclidean distance is utilized to measure the SMD between block $X$ (predicted by blocks $U$ and $L$) and a codeword $cw$ in the codebook, as follows:

(2)

Thus, the SMVQ sorts codebook CB according to SMDs, and the first codewords in codebook CB are selected to form state codebook SC, in which SMDs are smallest for block $X$. The SMVQ then picks the closest codeword from the state codebook for block $X$ to encode block $X$. Because the size of the state codebook is smaller than that of the codebook, the size of the index is reduced to improve coding gain.
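The side-match idea can be sketched as follows. This is an illustration under stated assumptions rather than the exact formulation used in this work: the distortion compares the bottom row of the upper block and the right column of the left block with the top row and left column of each candidate codeword, which is one common SMVQ variant. The names `side_match_distortion` and `state_codebook` are hypothetical.

```python
def side_match_distortion(upper, left, codeword):
    """Assumed SMD variant for n x n blocks given as lists of rows.

    Sums squared differences between the bottom row of `upper` and the top
    row of `codeword`, and between the right column of `left` and the left
    column of `codeword`.
    """
    n = len(codeword)
    top = sum((upper[n - 1][j] - codeword[0][j]) ** 2 for j in range(n))
    side = sum((left[i][n - 1] - codeword[i][0]) ** 2 for i in range(n))
    return top + side


def state_codebook(upper, left, codebook, size):
    """Indices of the `size` codewords with the smallest SMD.

    sorted() is stable, so equal distortions keep codebook order; encoder
    and decoder must share such a tie-breaking rule.
    """
    order = sorted(range(len(codebook)),
                   key=lambda i: side_match_distortion(upper, left, codebook[i]))
    return order[:size]
```

With `size` equal to the codebook size, `state_codebook` returns a full permutation of the codebook, which is the setting the proposed scheme relies on later.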

C. Scheme Developed by Chang et al. [3]

The VQ codebook is sorted using referred counts to obtain a sorted codebook. For the case that embeds two secret bits into each index, the indices corresponding to the first one-third of codewords in the sorted codebook are called embeddable indices and the remaining indices are called unembeddable indices. The embeddable indices are classified into two clusters, cluster and cluster . The unembeddable indices are partitioned into four clusters, cluster cluster cluster , and cluster . Cluster and cluster with relatively higher referred counts are utilized to embed secret data. The other clusters are employed to reconstruct the image. As the number of embeddable indices in a compressed image increases, embedding capacity increases. One embeddable index must cooperate with four unembeddable indices, each of which belongs to a different cluster during an embedding task.

No indicator is required during the encoding process when an encoding index belongs to cluster cluster and secret data or [ or ] are utilized; otherwise, one indicator is required. Additionally, one indicator is required when an encoding index belongs to cluster or cluster , and no indicator is required when an encoding index belongs to cluster or cluster . The length of the encoding stream of an embeddable index is bits, where is codebook size when the encoded index is followed by an indicator; otherwise (without indicator), the length of the encoding stream of an embeddable index is bits. The length of the encoding stream of an unembeddable index is bits, where is the size of a cluster when the encoded index is followed by an indicator; otherwise (without indicator), the length of the encoding stream of the unembeddable index is bits.

The data-hiding and encoding processes in the scheme developed by Chang et al. [3] are defined in (3)–(10), where is the encoding index; is the encoded index; denotes 2-bit secret data; is the cluster size; location is the position of index in a cluster; value is the location th value in cluster, and is the concatenation operation

(3)

(4)

(5)

(6)

(7)

(8)

(9)

(10)

As mentioned, the scheme designed by Chang et al. [3] has the following characteristics that affect embedding capacity. 1) The number of embeddable indices determines embedding capacity, meaning that embedding capacity increases as the number of embeddable indices increases. 2) To achieve two-bit embedding, one embeddable index must cooperate with four unembeddable indices. Thus, embedding capacity is increased when the number of unembeddable indices used to encode an embeddable index increases. A tradeoff always exists between the number of embeddable indices and unembeddable indices. 3) Indicators 1 and 0 are required for the encoding process. Moreover, the size of an indicator is the same as that of an index. As the number of added indicators increases, code size increases. 4) The embeddable indices of the sorted codebook may not be compatible with all images.

D. Scheme Developed by Yang and Lin [19]

To improve the approach developed by Chang et al., Yang and Lin [19] sorted a VQ codebook using referred counts and divided the codebook into four clusters; the first half of these clusters with the highest referred counts are utilized to embed secret data. All unused index values are employed as additional indicators to reduce coding length. Additionally, Yang and Lin proposed a strategy for exchanging indexes in the sorted codebook to increase embedding capacity. This strategy is used because the sorted codebook obtained in advance from training images may not suit the processed cover images. Thus, some seldom-referred codewords are in the front half of the codebook, and some frequently referred codewords are in the second half of the codebook. Consequently, they exchange the seldom-referred codewords in the front half of the codebook with the frequently referred codewords in the second half of the codebook to obtain a new sorted codebook with a high capacity for data hiding.

Moreover, the approach generated by Yang and Lin combines reversible SMVQ hiding with reversible VQ hiding to enhance compression rate. Specifically, for an encoding block $X$, a state codebook with codewords is constructed using its upper neighboring block $U$ and its left neighboring block $L$. If encoding block $X$ is located in state codebook , block $X$ is encoded by reversible SMVQ hiding; otherwise, block $X$ is encoded by reversible VQ hiding. Since both VQ and SMVQ hiding are utilized, an additional 1-bit indicator for each encoding index is required to determine which one is adopted during the encoding process. However, encoding length can be reduced to bits via SMVQ hiding. Compared to the technique developed by Chang et al., the approach generated by Yang and Lin has higher embedding capacity and superior compression rate.


Fig. 2. Histogram of compressed image “Lena.”

Fig. 3. Histogram of transformed image “Lena.”

III. PROPOSED SCHEME

A grayscale cover image of size is partitioned into nonoverlapped blocks of size , meaning that blocks can be obtained with a cover image. Each block of the image can be encoded using VQ and, thus, each block is represented with an index of the nearest codeword in the codebook. A compressed image consists of the indices, also called compression code. The compression code length increases with codebook size. The SMVQ can be further applied to a compressed image to generate a transformed image. To reconstruct a compressed image perfectly, the size of the state codebook and that of the codebook are first set as equal. Next, the codebook is sorted by SMD between the encoding block and codewords in the codebook. Finally, all codewords in the sorted codebook are selected to generate the state codebook. For each index of a compressed image, a transformed index, i.e., the position of the corresponding codeword in the state codebook, can be obtained likewise.
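The reversibility of this transformation can be illustrated with a short sketch. It assumes the side-match distortion variant that compares adjacent block borders, square blocks stored as lists of rows, and stable sorting to break ties; the names `smd`, `ranking`, `transform`, and `inverse_transform` are hypothetical. Because the state codebook is the full sorted codebook and depends only on already-known neighboring indices, the decoder can rebuild the same ordering and recover every VQ index.

```python
def smd(codeword, upper=None, left=None):
    """Assumed side-match distortion; a missing neighbor contributes 0."""
    n = len(codeword)
    d = 0
    if upper is not None:
        d += sum((upper[n - 1][j] - codeword[0][j]) ** 2 for j in range(n))
    if left is not None:
        d += sum((left[i][n - 1] - codeword[i][0]) ** 2 for i in range(n))
    return d


def ranking(vq, r, c, codebook):
    """Codeword indices sorted by SMD for position (r, c) (stable on ties)."""
    upper = codebook[vq[r - 1][c]] if r > 0 else None
    left = codebook[vq[r][c - 1]] if c > 0 else None
    return sorted(range(len(codebook)),
                  key=lambda i: smd(codebook[i], upper, left))


def transform(vq, codebook):
    """VQ index table -> transformed index table (rank in sorted codebook)."""
    out = []
    for r, row in enumerate(vq):
        out_row = []
        for c, idx in enumerate(row):
            if r == 0 and c == 0:
                out_row.append(idx)  # the first index stays unchanged
            else:
                out_row.append(ranking(vq, r, c, codebook).index(idx))
        out.append(out_row)
    return out


def inverse_transform(tr, codebook):
    """Transformed index table -> original VQ index table, in raster order."""
    vq = [[0] * len(row) for row in tr]
    for r, row in enumerate(tr):
        for c, t in enumerate(row):
            if r == 0 and c == 0:
                vq[r][c] = t
            else:
                # Neighbors above and to the left are already restored.
                vq[r][c] = ranking(vq, r, c, codebook)[t]
    return vq
```

On smooth image regions the true codeword tends to rank near the top of the sorted codebook, which is why transformed indices cluster around zero.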

Figs. 2 and 3 show histograms of the “Lena” compressed result and corresponding transformed result, respectively, with a codebook of size 256. Most transformed indices are distributed around zero. Fig. 4 shows the relationships among an image consisting of pixels, a compressed image consisting of compression indices, a transformed image consisting of transformed indices, and an encoded image consisting of encoded indices.

Fig. 4. Relationships among image, compressed image, transformed image, and encoded image.

TABLE I
INDICATOR FOR EACH PORTION

A. Classification of Indices

The histogram of a transformed image is typically concentrated around zero. A transformed image is generally dominated by the transformed indices with the smallest values (Fig. 3). The indices in a transformed image are first categorized into three portions, i.e., portion 1, portion 2, and portion 3, with distinct sizes for hiding secret data. Indices in portion 1 have the smallest values, while indices in portion 3 have the largest values. Suppose the size of a state codebook is , and is identical to the size of the VQ codebook. For and , portion 1 has indices with values between and ; portion 2 has indices with values between and ; and portion 3 has indices with values between and . Indices in portion 1 are utilized to hide secret data and reconstruct an image; indices in portion 2 are employed to compress the code stream and reconstruct an image; and indices in portion 3 are only used for image reconstruction. Compared to the scheme developed by Chang et al. [3], which uses referred counts to sort the codebook, the proposed scheme uses the distribution of indices to categorize transformed indices into the three portions. As a result, a large number of transformed indices fall in portion 1.

Because transformed indices are classified into the three portions, some leading codes (indicators) are required to label different portions (Table I).

B. Encoding

Suppose the size of secret data is bits. During the encoding process, an encoding index belonging to portion 1 is denoted by and the encoded index is denoted by . Then, the is expressed in (11), where ** indicates an exponential operator

(11)

Fig. 5. Example of the proposed method of embedding 2-bit secret data.

TABLE II
HIDING SPACE AND ENCODING VALUES FOR VARIOUS NUMBERS OF EMBEDDING INDICES

Fig. 5 shows an example of the proposed method for embedding 2-bit secret data into each transformed index belonging to portion 1. The embedding function (Fig. 5) can be realized when in (11) is set at 2. Suppose the size of the encoding space for each embeddable transformed index is bits. To restore the index from an encoded index, the hiding space of each index is reduced to floor . Table II shows the hiding space and encoding values when is set to 7 with 1, 2, 4, and 8 indices. For transformed index 0, the hiding space is 4 bits and encoding values are 0–15 when the number of embeddable indices is 8. A large value in can yield a large number of embeddable indices. Accordingly, the hiding space of each index is reduced significantly.
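The arithmetic behind the embedding and the hiding space can be sketched as below. The sketch assumes that the embedding multiplies the transformed index by a power of two and adds the secret value, which is consistent with the `**` operator mentioned above, with the `mod 2 **` step in the extraction procedure, and with the Table II example in the text. The function and parameter names are hypothetical.

```python
import math
from typing import Tuple


def hiding_space(n_bits: int, num_embeddable: int) -> int:
    """Secret bits left per index when n_bits must also identify the index."""
    return math.floor(n_bits - math.log2(num_embeddable))


def embed(index: int, secret: int, m: int) -> int:
    """Assumed embedding: shift the index left by m bits and add the secret."""
    if not 0 <= secret < 2 ** m:
        raise ValueError("secret must fit in m bits")
    return index * 2 ** m + secret


def extract(encoded: int, m: int) -> Tuple[int, int]:
    """Recover (index, secret) from an encoded value produced by embed()."""
    return encoded // 2 ** m, encoded % 2 ** m


# With a 7-bit encoding space and 8 embeddable indices, 4 bits remain for
# secret data, so index 0 occupies encoded values 0..15.
m = hiding_space(7, 8)
values_for_index_0 = [embed(0, s, m) for s in range(2 ** m)]
```

The same trade-off appears in the other Table II settings: halving the number of embeddable indices frees one more bit of hiding space per index.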

An encoding index belonging to portion 2 is denoted by , and the encoded index is denoted by . Then, is expressed as

(12)

(12-1)

In the proposed scheme, indices belonging to portion 2 are utilized to compress the code stream and for image reconstruction. Thus, and are set to zero in (12), denoted by (12-1). Portion 2 has indices. To restore the transformed index, bits are utilized to represent the encoded index. The coding gain can be further enhanced if the size of the encoded index is bits.

Similarly, an encoding index belonging to portion 3 is denoted by and the encoded index is denoted by . Then, is expressed as

(13)

(13-1)

In the proposed scheme, indices belonging to portion 3 are simply employed for image reconstruction. Thus, and in (13) are set to zero, denoted by (13-1). The transformed index can be restored when the encoded index is represented by bits.

In the transformation phase (a transformed index is acquired by applying SMVQ to the VQ compressed stream), the compression index located in both the first column and the first row of the compressed image remains unchanged. Indices associated with the first row (column), except for the first one, simply use the left (up) index to calculate the distortion of the SMVQ. During the encoding phase, a transformed index can be encoded by (11), (12-1), or (13-1). Furthermore, data hiding can be applied to cover images using three procedures under various demands, i.e., over hiding, normal hiding, and under hiding. For an index belonging to portion 1, the index in over hiding is encoded by more than bits and can hide additional information, where is the size of the VQ codebook. Meanwhile, the BR is also increased. During normal hiding, an index can be encoded by bits and can hide more information. In this configuration, BR is less than or equal to that of VQ. During under hiding, an index can be encoded with bits and relatively less information can be hidden. In this case, BR is markedly decreased.

The encoding process of the proposed method is summarized as follows.

Input: A grayscale cover image of size , a codebook CB of size , and a secret data stream.

Output: The code stream in binary form .

Step 1. Compress cover image using VQ to obtain compressed image of size .

Step 2. Transform compressed image using SMVQ to obtain transformed image of size .

Step 3. Read the next transformed index, denoted as , from the transformed image in the raster scan order.

Step 4. If belongs to portion 1, then

Step 4.1. (secret data), where S2D denotes the -bit data from the secret data stream,

Step 4.2. is encoded using (11), where is the concatenation operation and D2B is the decimal to binary operation.

Step 5. If belongs to portion 2, then is encoded using (12-1), indicator 2 D2B .

Step 6. If belongs to portion 3, then is encoded using (13-1), indicator 3 D2B .

Step 7. Repeat Steps 3–6 until all transformed indices in the transformed image are processed.

Step 8. Output the code stream .

Fig. 6. Example of the encoding process to hide 3-bit secret data. (a) Image map. (b) Secret data. (c) Compressed image. (d) Transformed image. (e) Encoded image.
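Steps 3 to 8 can be sketched as a small encoder over an already transformed index sequence. Several details below are assumptions made only for illustration, because the corresponding symbols are not legible in this copy: the indicator values (`"0"`, `"10"`, `"11"`), the portion sizes `p1` and `p2`, the offset subtraction for portion 2, the unmodified index for portion 3, and zero-padding when the secret stream runs out. The names `encode`, `bits_needed`, and `to_bits` are hypothetical, and `m` is assumed to be at least 1.

```python
# Assumed prefix-free indicators: the text fixes only that one bit is read
# first and that a second bit separates portions 2 and 3.
INDICATOR = {1: "0", 2: "10", 3: "11"}


def bits_needed(count):
    """Bits required to address `count` distinct values."""
    return (count - 1).bit_length()


def to_bits(value, width):
    """Fixed-width binary string (the D2B operation); empty when width is 0."""
    return format(value, "b").zfill(width) if width else ""


def encode(transformed, secret_bits, n, p1, p2, m):
    """Encode transformed indices, hiding m secret bits per portion-1 index.

    n: codebook size; p1, p2: assumed sizes of portions 1 and 2.
    """
    stream = []
    pos = 0
    for t in transformed:
        if t < p1:                                  # portion 1: hide data
            chunk = secret_bits[pos:pos + m].ljust(m, "0")
            pos += m
            e = t * 2 ** m + int(chunk, 2)
            stream.append(INDICATOR[1] + to_bits(e, bits_needed(p1) + m))
        elif t < p1 + p2:                           # portion 2: short code
            stream.append(INDICATOR[2] + to_bits(t - p1, bits_needed(p2)))
        else:                                       # portion 3: full index
            stream.append(INDICATOR[3] + to_bits(t, bits_needed(n)))
    return "".join(stream)


# Toy run loosely modeled on the 3-bit example: codebook size 32.
code = encode([0, 1, 2, 31], "101011", n=32, p1=2, p2=4, m=3)
```

Under these assumptions a portion-1 index costs 1 + 1 + 3 = 5 bits while carrying 3 secret bits, which equals the 5 bits a plain VQ index would need for a codebook of size 32.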

During the encoding process, the length of the encoded result of an index is , or for the indices belonging to portion 1, portion 2, or portion 3, respectively. Fig. 6 shows an example of the encoding process for hiding 3-bit secret data. Let be indicator 1, be indicator 2, and be indicator 3, where , and , and a transformed index is obtained using SMVQ with a state codebook of size 32 (identical to the size of the VQ codebook). For each image (Fig. 6), let to be the indices located in the first row and to be the indices located in the second row, and so on. Initially, the transformed image is acquired by applying SMVQ to the compressed image. For the transformed image, has a value of 0 and belongs to portion 1. Secret data are . According to (11), a leading code followed by encoding code is presented as the encoded result for . Notably, has a value of 1, and belongs to portion 1. The secret data are . A leading code followed by encoding code is the encoded result for . Notably, has a value of 2 and belongs to portion 2. According to (12-1), a leading code followed by encoding code are output as the encoded result for . Similarly, has a value of 31 and belongs to portion 3. According to (13-1), a leading code followed by encoding code are the encoded result for .

C. Decoding and Extraction

Let , , and represent the encoded index (excluding the indicator), restored index, and restored , respectively. For an encoded index belonging to portion 1, and can be derived by (14) and (15), respectively,

(14)

(15)

For an encoded index belonging to portion 2, can be derived by

(16)

For an encoded index belonging to portion 3, can be derived by

(17)

The decoding and extraction process of the proposed methodis summarized as follows.

Input: The code stream in binary form .

Output: The reconstructed cover image of size , and retrieved secret data.

Step 1. Set .

Step 2. Read the indicator (1-bit) from code stream , .

Step 3. If indicator , then read the next bits from code stream , , mod 2 ** , , where B2D is the binary-to-decimal operation.

Step 4. If indicator , then read the next 1 bit,

Step 4.1. If read bit , then read the next bits, B2D , , .

Step 4.2. If read bit , then read the next bits, B2D , .

Step 5. Repeat Steps 2–4 until all bits in the code stream are processed.

Step 6. Output the reconstructed compressed image and secret data.
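The decoding steps above can be illustrated with a minimal round-trip sketch. Because the indicator values and (14)–(17) are not legible in this text, the sketch rests on stated assumptions rather than on the authors' exact definitions: the indicators are taken to be "0" (portion 1), "10" (portion 2), and "11" (portion 3); a portion-1 payload is taken to be index × 2^k + secret, so the secret is recovered with a mod 2^k operation as in Step 3; a portion-2 payload is taken to be the index offset by the number of portion-1 indices; and a portion-3 payload is taken to be the raw index. All function and parameter names are hypothetical.

```python
def to_bits(value, width):
    """Fixed-width binary string; empty when width is 0."""
    return format(value, "b").zfill(width) if width else ""


def b2d(bits):
    """Binary-to-decimal (B2D) conversion; an empty string maps to 0."""
    return int(bits, 2) if bits else 0


def widths(n1, n2, codebook_size):
    """Bit widths for portions 1-3; all three sizes are assumed powers of two."""
    return n1.bit_length() - 1, n2.bit_length() - 1, codebook_size.bit_length() - 1


def encode(indices, secret, n1, n2, codebook_size, k):
    """Hide k secret bits in every portion-1 index (ASSUMED layout, see lead-in)."""
    b1, b2, b3 = widths(n1, n2, codebook_size)
    out, pos = [], 0
    for idx in indices:
        if idx < n1:                                  # portion 1: carries secret bits
            chunk = secret[pos:pos + k].ljust(k, "0")  # pad if the secret runs out
            pos += k
            out.append("0" + to_bits(idx * 2 ** k + b2d(chunk), b1 + k))
        elif idx < n1 + n2:                           # portion 2: short code, no secret
            out.append("10" + to_bits(idx - n1, b2))
        else:                                         # portion 3: full-length index
            out.append("11" + to_bits(idx, b3))
    return "".join(out)


def decode(stream, n1, n2, codebook_size, k):
    """Recover the transformed indices and the secret bits from the code stream."""
    b1, b2, b3 = widths(n1, n2, codebook_size)
    indices, secret, pos = [], [], 0
    while pos < len(stream):
        if stream[pos] == "0":                        # Step 3: portion 1
            value = b2d(stream[pos + 1:pos + 1 + b1 + k])
            pos += 1 + b1 + k
            indices.append(value >> k)
            secret.append(to_bits(value % 2 ** k, k))
        elif stream[pos + 1] == "0":                  # Step 4.1: portion 2
            indices.append(b2d(stream[pos + 2:pos + 2 + b2]) + n1)
            pos += 2 + b2
        else:                                         # Step 4.2: portion 3
            indices.append(b2d(stream[pos + 2:pos + 2 + b3]))
            pos += 2 + b3
    return indices, "".join(secret)


# Indices 0, 1, 2, and 31 with a codebook of size 32 and 3-bit hiding, as in the example.
stream = encode([0, 1, 2, 31], "101011", n1=2, n2=8, codebook_size=32, k=3)
restored, secret = decode(stream, n1=2, n2=8, codebook_size=32, k=3)
```

Under these assumptions, the portion-1 and portion-2 codes in the example each occupy 5 bits, the length of a plain index for a codebook of size 32, while the portion-3 code is 2 bits longer.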

Fig. 7 shows an example of the decoding process. According to the proposed algorithm, an indicator can be identified from code stream . At the receiver, the decoder can extract embedded secret data and reconstruct the original compressed image as follows. Initially, the next indicator is read from code stream . The first indicator is recognized as and, thus, the corresponding encoded index belongs to portion 1. Then, the next bits are read and converted into a decimal value. The original transformed index, 0, is recovered by (14), and secret data, , are restored by (15). Similarly, the second indicator is recognized as , and the corresponding encoded index belongs to portion 1. Then, the next bits are read and converted into a decimal value. The original transformed index, 1, is recovered by (14), and secret data, , are restored by (15). Next, the third indicator is recognized as , and the corresponding encoded index belongs to portion 2. Then, the next bits are read and converted into a


Fig. 7. Example of the decoding process.

decimal value. The original transformed index, 2, is recovered by (16). The eighth indicator is recognized as and, thus, the corresponding encoded index belongs to portion 3. Then, the next (32) bits are read and converted into a decimal value. The original transformed index, 31, is recovered by (17). Finally, the compressed image is reconstructed by applying SMVQ to the restored transformed image.

IV. EXPERIMENTAL RESULTS

Experiments are conducted to assess the performance of the proposed method. Various test images sized 512 × 512 [23] (Fig. 8) are employed as training images or cover images. The codebooks with sizes of 128, 256, 512, and 1024, with 16-dimensional codewords, were trained by the LBG algorithm using the images “Airplane,” “Boat,” “Lena,” “Peppers,” and “Sailboat.” These codebooks are utilized to evaluate the compression performance of the proposed scheme and schemes in the literature. In total, 16 384 compression codes (indices) are created when a 512 × 512 image is encoded using VQ with 16-dimensional codewords, because each index represents 16 pixels (512 × 512 / 16 = 16 384).
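The index count can be checked with a few lines of arithmetic. The sketch below also lists the size of a plain fixed-length VQ code stream, in which each index takes log2(N) bits for a codebook of size N; the function names are hypothetical and the codebook sizes are assumed to be powers of two.

```python
def vq_index_count(height, width, codeword_dim):
    """Number of VQ indices when each index covers codeword_dim pixels."""
    return (height * width) // codeword_dim


def vq_code_size_bits(height, width, codeword_dim, codebook_size):
    """Size of a fixed-length VQ code stream: log2(codebook_size) bits per index."""
    bits_per_index = codebook_size.bit_length() - 1  # exact log2 for powers of two
    return vq_index_count(height, width, codeword_dim) * bits_per_index


count = vq_index_count(512, 512, 16)
sizes = {n: vq_code_size_bits(512, 512, 16, n) for n in (128, 256, 512, 1024)}
print(count, sizes)
```

These fixed-length sizes give the baseline against which the code size of a data-hiding scheme can be compared.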

The secret data in the experiment are binary (0 and 1) and are generated by a pseudorandom number generator. If a high level of security is required, secret data can be encrypted prior to embedding using well-known cryptographic methods such as DES or RSA. Since the proposed scheme provides lossless recovery, the quality of an embedded image is not discussed here. Instead, this work focuses on embedding capacity (EC) results and overall code size. The BR is defined in (18), where is the size of . For a compression method, a small BR indicates good compression performance and vice versa

(18)
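As a rough illustration of how a BR figure is obtained, the sketch below assumes that (18) divides the size of the code stream in bits by the number of pixels in the image, giving bits per pixel. The body of (18) is not legible in this text, so this form, the function names, and the default image size are assumptions.

```python
def bit_rate(code_stream_bits, height=512, width=512):
    """ASSUMED form of (18): code-stream size in bits divided by the pixel count."""
    return code_stream_bits / (height * width)


def plain_vq_bit_rate(codebook_size, codeword_dim=16):
    """BR of fixed-length VQ under the same assumption: log2(N) bits per block."""
    return (codebook_size.bit_length() - 1) / codeword_dim


for n in (128, 256, 512, 1024):
    print(n, plain_vq_bit_rate(n))
```

Under this assumption, a scheme whose code stream is smaller than 16 384 × log2(N) bits has a BR below that of plain VQ with the same codebook.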

Fig. 9 shows the performance of the proposed algorithm (normal hiding) using different codebook sizes and different numbers of indices in portion 1 and portion 2 for the image “Lena.” The encoded size of an index in portion 1 is bits, where is the codebook size. Thus, and for codebooks of sizes 128, 256, 512, and 1024, respectively. For the codebook size of 128, the largest EC is 52 180 bits when the number of indices in portion 1 is 2 and that in portion 2 is 8 [Fig. 9(a)]. Figs. 9(b)–(d) show simulation results for the other codebook sizes. The largest ECs are 56 315, 58 200, and 59 330 bits for codebook sizes of 256, 512, and 1024, respectively. Additionally, variation in ECs

Fig. 8. Test images. (a) Airplane. (b) Lena. (c) Peppers. (d) Boat. (e) Sailboat. (f) Tiffany. (g) House. (h) Elaine.

of the image is 7 kbits for different codebook sizes. Fig. 10 shows the performance of the proposed algorithm (normal hiding) with different codebook sizes and different numbers of indices in portion 1 and portion 2 for the image “House.” The largest embedding capacities are 41 970, 44 335, 45 090, and 45 640 bits for codebook sizes of 128, 256, 512, and 1024, respectively. Variation in embedding capacities of the image is 4 kbits for various codebook sizes. Moreover, all BRs of the two images are lower than the traditional BRs of VQ. The numbers of indices in portion 1 and portion 2 are denoted as and , respectively. The above two images, “Lena” and “House,” have the largest EC and a smaller BR when the number of indices in portion 1, the number of indices in portion 2, and the codebook size are (2, 8, 128), (4, 16, 256), (8, 32, 512), and (16, 64, 1024). The EC may not increase when the number of indices in portion 1 increases, because increasing the number of indices in portion 1 reduces the hiding space of each index in portion 1.
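The reported capacities can be converted to bits per index, and the trade-off between the number of portion-1 indices and the hiding space can be made concrete. The sketch below uses only the EC values quoted in the text. The hiding-space formula is an assumption, not the authors' stated definition: it supposes that, in normal hiding, a portion-1 code has the same total length as a plain index, log2(N) bits, of which 1 bit is the indicator and log2 of the number of portion-1 indices identifies the index. All names are hypothetical.

```python
INDEX_COUNT = 16_384  # indices per 512 x 512 image, as stated in the text

# Largest ECs (bits) reported in the text, keyed by codebook size.
LARGEST_EC = {
    "Lena": {128: 52_180, 256: 56_315, 512: 58_200, 1024: 59_330},
    "House": {128: 41_970, 256: 44_335, 512: 45_090, 1024: 45_640},
}
# Number of portion-1 indices in the best configuration for each codebook size.
BEST_N1 = {128: 2, 256: 4, 512: 8, 1024: 16}


def hiding_space(codebook_size, n1):
    """ASSUMED: log2(N) total bits, minus 1 indicator bit, minus log2(n1) index bits."""
    return (codebook_size.bit_length() - 1) - 1 - (n1.bit_length() - 1)


def ec_per_index(ec_bits):
    """Average embedding capacity in bits per index."""
    return ec_bits / INDEX_COUNT


for image, by_size in LARGEST_EC.items():
    for size, ec in by_size.items():
        k = hiding_space(size, BEST_N1[size])
        print(image, size, round(ec_per_index(ec), 2), k, ec // k)
```

Under this assumption, the hiding space is 5 bits in all four best configurations, and every reported EC is a whole multiple of 5. That is consistent with the assumption but does not confirm it. Doubling the number of portion-1 indices for a fixed codebook size removes one bit of hiding space per index, which is why a larger portion 1 does not necessarily raise the EC.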

Table III shows simulation results (normal hiding) for various test images. These simulation results are obtained when the number of indices in portion 1, the number of indices in portion 2, and the codebook size are (2, 8, 128), (4, 16, 256), (8, 32, 512), and (16, 64, 1024). For each codebook size, the BR is smaller than the BR of VQ. For codebooks of sizes 128, 256, and 512, a larger codebook yields a higher EC. The EC differs only slightly between codebook sizes 512 and 1024. When codebooks are sized 128, 256, 512, or 1024,


Fig. 9. Performance of the proposed algorithm (normal hiding) based on different codebooks and different numbers of indices in portion 1 and portion 2 for image “Lena.” (a) Codebook size: 128. (b) Codebook size: 256. (c) Codebook size: 512. (d) Codebook size: 1024.

the average EC is 3.0, 3.3, 3.4, and 3.3 bits/index, respectively. Because the indices in portion 1 support a high EC and the indices in portion 2 support a high compression rate, the proposed scheme provides both a high EC and high compression efficiency. Tables IV and V present simulation results for test images for the cases of under hiding and over hiding, respectively. For the case of under hiding, the hiding space of each index is

Fig. 10. Performance of the proposed algorithm (normal hiding) based on different codebooks and different numbers of indices in portion 1 and portion 2 for image “House.” (a) Codebook size: 128. (b) Codebook size: 256. (c) Codebook size: 512. (d) Codebook size: 1024.

reduced and a small BR and low EC are then obtained. Conversely, for the case of over hiding, the hiding space of each index is increased and a large BR and high EC are acquired.

To demonstrate the superiority of the proposed scheme, those developed by Chang et al. [3], Yang and Lin [19] VQ SMVQ , and Shie and Lin [18] are used for comparison. Herein, two codebook sizes, 256 and 512, are employed in


TABLE III. SIMULATION RESULTS (NORMAL HIDING) FOR VARIOUS TEST IMAGES

TABLE IV. SIMULATION RESULTS (UNDER HIDING) FOR VARIOUS TEST IMAGES

experiments. Tables VI and VII show comparison results. The left and middle of Table VI show the EC and BR of the schemes developed by Chang et al. [3] and Yang and Lin [19] with 1 and 2 bits of secret data embedded into each index, respectively. The right of Table VI shows the EC and BR of the proposed scheme (under hiding) with indices encoded by , or bits. The EC and BR of the proposed scheme, with indices encoded by , are superior to those of the other schemes with 2 bits of secret data embedded into each index. In particular, all ECs of the proposed scheme are much higher than those of the other schemes.

TABLE V. SIMULATION RESULTS (OVER HIDING) FOR VARIOUS TEST IMAGES

Moreover, a small EC and low BR are required occasionally. The proposed scheme, with indices encoded by or bits, and the other schemes with 1 bit of secret data embedded into each index, satisfy this requirement (Table VI). In this case, the EC and BR of the proposed scheme are still superior to those of existing schemes. The proposed scheme performs best, followed by that of Yang and Lin, and then that of Chang et al. (Table VI). As mentioned, when a low EC is required, the proposed scheme with under hiding is satisfactory for data hiding, and normal hiding and over hiding are not needed.

Generally, a small BR and high EC are associated with good performance for data-hiding and compression systems. The proposed scheme (encoded space is bits) with different codebook sizes can meet the requirement of supporting a very high EC with a slightly increased BR. Table VII shows the two schemes with high ECs for codebook sizes 256 and 512. The left of Table VII shows the EC and BR of the scheme developed by Shie and Lin [18], which encoded an SOC with bits and embedded bits of secret data into each index value. The right of Table VII shows the EC and BR of the proposed scheme (over hiding) with encoded bits . Because the indices in portion 1 support a high EC and the indices in portion 2 support a high compression rate, the proposed scheme, which employs transformed images as cover objects to embed secret data, can obtain a high EC and good compression efficiency simultaneously. Each EC and BR of the proposed scheme is superior to that of the scheme developed by Shie and Lin [18], and each EC of the proposed scheme is much larger. Notably, when a high EC is required, over hiding with the proposed scheme is necessary, and the cover image will have a high BR.


TABLE VI. EC AND BR OF CHANG ET AL.’S, YANG AND LIN’S, AND THE PROPOSED (UNDER HIDING) SCHEMES

TABLE VII. PERFORMANCE COMPARISON OF THE PROPOSED SCHEME (OVER HIDING) AND SHIE AND LIN’S SCHEME

Compared with the scheme of Chang et al., which embeds 1 bit of secret data into each index value, the proposed algorithm increases the EC by 33% to 64%. Similarly, compared with the scheme by Chang et al., which embeds 2 bits of secret data into each index value, the proposed algorithm increases the EC by 26% to 48%. Compared with the scheme by Yang and Lin, which embeds 1 bit of secret data into each index value, the proposed algorithm increases the EC by 14% to 59%. Similarly, compared with the scheme by Yang and Lin, which embeds 2 bits of secret data into each index value, the proposed algorithm increases the EC by 13% to 40%. Additionally, compared with the approach by Shie and Lin, the proposed algorithm increases the EC by 1% to 36%.

V. CONCLUSION

This work presents a novel reversible data-hiding scheme based on the distribution of the transformed image obtained by applying SMVQ to a VQ compressed image. The proposed scheme can embed secret data into a transformed image and restore the VQ compressed image after secret data extraction. Since the indices in portion 1 support a high EC and the indices in portion 2 support high coding gain, the proposed scheme has a very high EC and compression efficiency. Additionally, the proposed scheme is more efficient than existing schemes under various codebook sizes for the eight test images.

REFERENCES

[1] C.-C. Chang, G.-M. Chen, and M.-H. Lin, “Information hiding based on search-order coding for VQ indices,” Pattern Recognit. Lett., vol. 25, no. 11, pp. 1253–1261, 2004.

[2] C.-C. Chang, Y.-P. Hsieh, and C.-Y. Lin, “Lossless data embedding with high embedding capacity based on declustering for VQ-compressed codes,” IEEE Trans. Inf. Forensics Security, vol. 2, no. 3, pp. 341–349, Sep. 2007.

[3] C.-C. Chang, W.-C. Wu, and Y.-C. Hu, “Lossless recovery of a VQ index table with embedded secret data,” J. Vis. Commun. Image Represent., vol. 18, no. 3, pp. 207–216, 2007.


[4] C.-C. Chang, T. D. Kieu, and Y.-C. Chou, “Reversible information hiding for VQ indices based on locally adaptive coding,” J. Vis. Commun. Image Represent., vol. 20, no. 1, pp. 57–64, 2009.

[5] J. Tian, “Reversible data embedding using a difference expansion,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 8, pp. 890–896, Aug. 2003.

[6] S.-C. Shie, S. D. Lin, and C.-M. Fang, “Adaptive data hiding based on SMVQ prediction,” IEICE Trans. Inf. Syst., vol. E89-D, no. 1, pp. 358–362, 2006.

[7] M.-N. Wu, C.-C. Lin, and C.-C. Chang, “An embedding technique based upon block prediction,” J. Syst. Software, vol. 81, no. 9, pp. 1505–1516, 2008.

[8] W.-C. Du and W.-J. Hsu, “Adaptive data hiding based on VQ compressed images,” Proc. Inst. Elect. Eng., Vision, Image and Signal Processing, vol. 150, no. 4, pp. 233–238, Aug. 2003.

[9] C.-L. Tsai, H.-F. Chiang, K.-C. Fan, and C.-D. Chung, “Reversible data hiding and lossless reconstruction of binary images using pair-wise logical computation mechanism,” Pattern Recognit., vol. 38, no. 11, pp. 1993–2006, 2005.

[10] Z. Ni, Y.-Q. Shi, N. Ansari, and W. Su, “Reversible data hiding,” IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 3, pp. 354–362, Mar. 2006.

[11] C.-K. Chan and L. M. Cheng, “Hiding data in images by simple LSB substitution,” Pattern Recognit., vol. 37, no. 3, pp. 469–474, 2004.

[12] C.-T. Hsu and J.-L. Wu, “Hidden digital watermarks in images,” IEEE Trans. Image Process., vol. 8, no. 1, pp. 58–68, Jan. 1999.

[13] G. Langelaar and R. Lagendijk, “Optimal differential energy watermarking of DCT encoded images and video,” IEEE Trans. Image Process., vol. 10, no. 1, pp. 148–158, Jan. 2001.

[14] Y. Wang, J. Doherty, and R. Van Dyck, “A wavelet-based watermarking algorithm for ownership verification of digital images,” IEEE Trans. Image Process., vol. 11, no. 2, pp. 77–88, Feb. 2002.

[15] S. Mallat, “A theory for multiresolution signal decomposition: The wavelet representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, no. 7, pp. 674–693, Jul. 1989.

[16] G. Xuan, J. Zhu, J. Chen, Y. Shi, Z. Ni, and W. Su, “Distortionless data hiding based on integer wavelet transform,” Electron. Lett., vol. 38, no. 25, pp. 1646–1648, Dec. 2002.

[17] G. Xuan, Y. Shi, Z. Ni, J. Chen, C. Yang, Y. Zhen, and J. Zheng, “High capacity lossless data hiding based on integer wavelet transform,” in Proc. 2004 Int. Symp. Circuits and Systems, May 2004, vol. 2, pp. II-29–32.

[18] S.-C. Shie and S. D. Lin, “Data hiding based on compressed VQ indices of images,” Comput. Standards Interfaces, vol. 31, no. 6, pp. 1143–1149, 2009.

[19] C.-H. Yang and Y.-C. Lin, “Reversible data hiding of a VQ index table based on referred counts,” J. Vis. Commun. Image Represent., vol. 20, no. 6, pp. 399–407, 2009.

[20] Y. Linde, A. Buzo, and R. Gray, “An algorithm for vector quantizer design,” IEEE Trans. Commun., vol. 28, no. 1, pp. 84–95, Jan. 1980.

[21] T. Kim, “Side match and overlap match vector quantizers for images,” IEEE Trans. Image Process., vol. 1, no. 2, pp. 170–185, Apr. 1992.

[22] M. Jo and H. D. Kim, “A digital image watermarking scheme based on vector quantization,” IEICE Trans. Inf. Syst., vol. E85-D, pp. 1054–1056, 2002.

[23] Image Database [Online]. Available: http://sipi.usc.edu/database

[24] C.-C. Chang, W.-L. Tai, and C.-C. Lin, “A reversible data hiding scheme based on side match vector quantization,” IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 10, pp. 1301–1308, Oct. 2006.

Jiann-Der Lee (A’06–M’10) was born in Tainan, Taiwan, in 1961. He received the B.S., M.S., and Ph.D. degrees from the Department of Electrical Engineering, National Cheng Kung University, Taiwan, in 1984, 1988, and 1992, respectively.

He is currently a full professor with the Department of Electrical Engineering, Chang Gung University, Tao-Yuan, Taiwan. His current research interests include image processing, pattern recognition, computer vision, consumer electronics, and VLSI CAD design.

Dr. Lee is a member of IAPR, and is listed in the Who’s Who in the World and Who’s Who in Finance and Industry. He has received a number of investigator awards (e.g., from the National Science Council, Taiwan, and Acer Foundations, Taiwan), the Excellent Teaching Award, 2002, and the Excellent Research Award, 2003, from Chang Gung University.

Yaw-Hwang Chiou was born in Taichung, Taiwan, in 1958. He received the B.S. degree from the Department of Electronic Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, in 1984, and the M.S. degree from the Department of Electronic Engineering, Chung Yuan Christian University, Taoyuan, Taiwan, in 1992. He is currently pursuing the Ph.D. degree at the Department of Electrical Engineering, Chang Gung University, Tao-Yuan, Taiwan.

From 1992 to 1998, he was a lecturer with the Department of Electronic Engineering, Vanung University, Taiwan. His research interests concern image coding and digital watermarking.

Jing-Ming Guo (M’05–SM’10) was born in Kaohsiung, Taiwan, on November 19, 1972. He received the B.S.E.E. and M.S.E.E. degrees from National Central University, Taoyuan, Taiwan, in 1995 and 1997, respectively, and the Ph.D. degree from the Institute of Communication Engineering, National Taiwan University, Taipei, Taiwan, in 2004.

From 1998 to 1999, he was an Information Technique Officer with the Chinese Army. From 2003 to 2004, he was granted the National Science Council scholarship for advanced research from the Department of Electrical and Computer Engineering, University of California, Santa Barbara. He is currently an Associate Professor with the Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei. His research interests include multimedia signal processing, multimedia security, digital halftoning, and digital watermarking.

Dr. Guo is a member of the IEEE Signal Processing Society. He received the Excellence Teaching Award in 2009, the Research Excellence Award in 2008, the Acer Dragon Thesis Award in 2005, the Outstanding Paper Awards from IPPR, Computer Vision and Graphic Image Processing in 2005 and 2006, and the Outstanding Faculty Award in 2002 and 2003.