On the security of binary arithmetic coding based on ...xinwenfu/paper/Journals/16_MTA_75_8... ·...

14
Multimed Tools Appl (2016) 75:4245–4258 DOI 10.1007/s11042-015-2468-x On the security of binary arithmetic coding based on interval shrinking Tao Xiang · Jianglin Sun · Xinwen Fu Received: 15 May 2014 / Revised: 4 December 2014 / Accepted: 12 January 2015 / Published online: 29 January 2015 © Springer Science+Business Media New York 2015 Abstract Arithmetic coding (AC) is often chosen for joint compression and encryption. In this paper, we conduct a careful analysis of secure binary AC schemes based on interval shrinking regarding its security. We first give theoretical conditions for our attacks, then under these conditions we are able to recover the plaintext using traditional AC decoder without knowing any shrinking information. The attacks are feasible to uni-shrinking and symmetric bi-shrinking in different AC models. Extensive simulations are carried out and experimental results have validated our theoretical analysis. Effective remedy is also given in the end. Keywords Arithmetic coding · Joint compression and encryption · Cryptanalysis · Interval shrinking 1 Introduction Joint compression and encryption is an effective and practical way to protect large volume of data, especially multimedia data. With the advances of hardware and software, the vol- ume of multimedia data increases dramatically, and compression is a requirement to save bandwidth in transmission and space in storage. Meanwhile, encryption is often involved for security consideration. Generally speaking, there are three encryption strategies when com- pression is concerned. The first strategy is encryption before compression, and it has been widely investigated in multimedia encryption, e.g. data scrambling in the spatial domain of images and videos. The main problem of this strategy is encryption usually degrades com- T. Xiang () · J. Sun College of Computer Science, Chongqing University, Chongqing 400044, China e-mail: [email protected] X. Fu Department of Computer Science, University of Massachusetts Lowell, Lowell, MA 01854, USA

Transcript of On the security of binary arithmetic coding based on ...xinwenfu/paper/Journals/16_MTA_75_8... ·...

Page 1: On the security of binary arithmetic coding based on ...xinwenfu/paper/Journals/16_MTA_75_8... · Arithmetic coding (AC) is a prevalent choice to design joint compression and encryption

Multimed Tools Appl (2016) 75:4245–4258DOI 10.1007/s11042-015-2468-x

On the security of binary arithmetic coding basedon interval shrinking

Tao Xiang · Jianglin Sun ·Xinwen Fu

Received: 15 May 2014 / Revised: 4 December 2014 / Accepted: 12 January 2015 /Published online: 29 January 2015© Springer Science+Business Media New York 2015

Abstract Arithmetic coding (AC) is often chosen for joint compression and encryption. Inthis paper, we conduct a careful analysis of secure binary AC schemes based on intervalshrinking regarding its security. We first give theoretical conditions for our attacks, thenunder these conditions we are able to recover the plaintext using traditional AC decoderwithout knowing any shrinking information. The attacks are feasible to uni-shrinking andsymmetric bi-shrinking in different AC models. Extensive simulations are carried out andexperimental results have validated our theoretical analysis. Effective remedy is also givenin the end.

Keywords Arithmetic coding · Joint compression and encryption · Cryptanalysis ·Interval shrinking

1 Introduction

Joint compression and encryption is an effective and practical way to protect large volumeof data, especially multimedia data. With the advances of hardware and software, the vol-ume of multimedia data increases dramatically, and compression is a requirement to savebandwidth in transmission and space in storage. Meanwhile, encryption is often involved forsecurity consideration. Generally speaking, there are three encryption strategies when com-pression is concerned. The first strategy is encryption before compression, and it has beenwidely investigated in multimedia encryption, e.g. data scrambling in the spatial domain ofimages and videos. The main problem of this strategy is encryption usually degrades com-

T. Xiang (�) · J. SunCollege of Computer Science, Chongqing University, Chongqing 400044, Chinae-mail: [email protected]

X. FuDepartment of Computer Science, University of Massachusetts Lowell, Lowell, MA 01854, USA

Page 2: On the security of binary arithmetic coding based on ...xinwenfu/paper/Journals/16_MTA_75_8... · Arithmetic coding (AC) is a prevalent choice to design joint compression and encryption

4246 Multimed Tools Appl (2016) 75:4245–4258

pression performance dramatically because the encrypted data is hard to be compressed, andthus it is not adoptable in practice. The second strategy is encryption after compression, andit is feasible in applications because the compression performance is intact, e.g. encryptingcompressed multimedia files using traditional encryption algorithms such as AES. However,it still faces various challenges. For example, the encrypted files may not keep format com-patibility because traditional encryptions are usually operated on binary streams withoutconsideration of any format specifications; the synchronization between compression andencryption is also a problem in real-time transmission as the encryption speed must matchthe throughput of compression or vice versa. The last strategy is joint compression andencryption, which can do two operations in a single step and meet the requirement of secu-rity and efficiency simultaneously. The superiorities of joint compression and encryptionover other two strategies are evident: 1) the complexity in computation and implementationcan be saved; 2) the problem of synchronization between compression and encryption iseliminated; 3) the compression performance can be well maintained even after encryption;4) the compressed file can be format compliant so that both plain and secure decoders workwell on the encrypted data and not exception will be induced. For these reasons, joint com-pression and encryption are adopted in many schemes and systems protecting multimediadata [1, 5, 17, 24, 31, 32].

Arithmetic coding (AC) is a prevalent choice to design joint compression and encryptionsystem. Researchers have explored joint encryption and compression by taking a vari-ety of compression algorithms into consideration, such as Huffman coding [26, 31], AC[28], quadtree coding [5], and embedded zerotree wavelet algorithm (EZW) [5]. AC iswidely used to design cryptosystems not only for its superiority in compression perfor-mance, but also for its desirable characteristics in cryptography. AC has been developedover several decades and gained its prevalence after Witten et al. gave an implementationin finite precision [29]. Binary AC has been adopted in multimedia compression stan-dards, such as MQ coding in JPEG2000 [8] and context-adaptive binary arithmetic coding(CABAC) in H.264 [1]. AC is optimal in that it can be arbitrarily close to the entropyof the probability distributions, thus it is better than encodings such as Huffman cod-ing, which are restricted to integral bits per character. Another feature of AC, valuablefrom the viewpoint of cryptography, is that if the decoder loses its place in the trans-mitted bit-stream, it is very difficult for it to regain synchronization [23]. By this virtue,the encoded stream cannot be easily recovered if some parameters or AC processes arekept secret. Various secure AC schemes have been proposed for protecting images andvideos in the literature [1, 4, 7, 8, 10, 15, 16, 19–21, 27, 28, 30], but many of them arefound to be insecure [2, 3, 6, 9, 12, 13, 18, 25, 33–36]. Therefore, it is of great impor-tance for practical deployment to investigate the security of encryption schemes basedon AC.

In this paper, we analyze the security of secure binary arithmetic coding schemesbased on interval shrinking and propose attacks on different shrinking strategies. Ourcontributions can be summarized as follows: 1) We formalize interval shrinking strate-gies into uni-shrinking and bi-shrinking, and give the conditions with their proofswhere shrinking cannot guarantee the security of encoded code stream and it isdecodable by using traditional AC decoder without any knowledge about the shrink-ing. 2) We present efficient attacks on both uni-shrinking and symmetric bi-shrinking,and our attacks work for many secure binary AC schemes as long as the consid-ered shrinking strategies are adopted. 3) We conduct extensive experiments to validatethe feasibility of our attacks. 4) An effective remedy is given to resist the proposedattacks.

Page 3: On the security of binary arithmetic coding based on ...xinwenfu/paper/Journals/16_MTA_75_8... · Arithmetic coding (AC) is a prevalent choice to design joint compression and encryption

Multimed Tools Appl (2016) 75:4245–4258 4247

This rest of this paper is organized as follows. Section 2 gives a brief introduction onrelated work. Section 3 gives a brief introduction about binary AC and the basic principleof secure binary AC based on interval shrinking. Security analysis of secure binary ACschemes based on interval shrinking is given in Section 4. Effective remedy to improve thesecurity of binary AC encryption schemes is also provided in Section 5. Section 6 concludesthe paper.

2 Related work

A great deal of research has been done on secure AC. In [28], Witten and Cleary pointedout the potential of affording privacy by text compression and gave some initial con-sideration of joint compression and encryption on adaptive AC. Bergen and Hogan theninvestigated the security of static AC [2] and adaptive AC [3] respectively, and found thatAC is not secure by itself. Cleary et al. also gave some theoretical analysis on inferringthe probability of binary symbols for static model AC [6], and showed that for a cho-sen plaintext attack, a sequence of w + 2 symbols is sufficient to uniquely determine thesymbol probability represented in w-bit. Irvine et al. showed that AC, as a symmetriccryptosystem, can be reduced naturally to the NP-complete subset sum problem [9]. [3]presented a flooding attack on adaptive AC. The attack is not capable of explicitly discov-ering the key. It is aimed at taking control of the model and reducing it to a manageableform. Later, [18] proposed an adaptive brute-force attack to recover the key under adap-tive model. However, if the codec can be re-initialized regularly, it will make the attackin [3] and [18] infeasible. In [20] and [19], Liu et al. presented a secure adaptive ACresisting Bergen-Hogan attack by 1) using random numbers as the initial frequency countsfor every symbol, 2) selecting the initial interval randomly, 3) substituting the first 16-bitoutput of the encoder, and 4) shrinking the current interval by a random secret string cycli-cally. However, the interval shrinking operation weakens the compression performance by2%.

In [10], the decoding process of AC is considered as repetition of Bernoulli shift map,and data encryption is achieved by controlling the piecewise linear maps by a secret key inthree kinds of approaches, i.e. perturbation method, switching method, and source extensionmethod. In [8], Grangetto et al. proposed a randomized binary AC (RAC) for JPEG2000.The basic idea is to use a pseudo-random number generator (PRNG) to control the swap-ping of the intervals of the least probable symbol (LPS) and the most probable symbol(MPS). The efficiency of this approach is expected to be worse than that of the standardapproach (i.e. using a keystream to mask the compressed data) since the length of thekeystream is longer than the compressed data [11]. Recently, Katti et al. presented a the-oretical definition of security for encryption schemes that modify compression schemesfor encryption in [12], and proved that RAC does not satisfy the indistinguishability underciphertext-only attacks [14]. A binary AC with key-based interval splitting (ISAC) wasproposed in [27], which breaks the continuity requirement of the interval for a symbol intraditional AC. However, the scheme is found to be vulnerable to known plaintext and cho-sen plaintext attacks [11] and [15]. [15] also suggested an enhanced version, called thesecure AC (SAC), by introducing permutation steps at the input and output. Zhou et al.addressed the security problem of SAC under a chosen ciphertext attack in [35] and [36],and found that the codeword permutation step could be removed by recovering the keyvectors used in the codeword permutation step. Sun et al. also found it is vulnerable tochosen-plaintext [25].

Page 4: On the security of binary arithmetic coding based on ...xinwenfu/paper/Journals/16_MTA_75_8... · Arithmetic coding (AC) is a prevalent choice to design joint compression and encryption

4248 Multimed Tools Appl (2016) 75:4245–4258

There are also some researches on secure AC utilizing chaotic cryptography. Bose andPathak integrated a variable model AC with a coupled chaotic system for designing anencryption scheme [4]. Zhou and Au gave their comments on [4] in [33]. Mi et al. pro-posed an arithmetic coder whose mapping intervals are changed irregularly according to akeystream derived from chaotic map and plaintext [21]. Li and Zhang presented a secure ACscheme based on nonlinear dynamic filter (NDF) with changeable coefficients [16]. Wonget al. found that iterating a skew tent map reversely is equivalent to AC, and proposed asimultaneous compression and encryption scheme in which the chaotic map model for AC isdetermined by a secret key. Moreover, the compressed sequence is masked by a pseudoran-dom keystream generated by another chaotic map [30]. Duan et al. introduced a randomizedAC scheme based on order-1 Markov model that achieves encryption by scrambling thesymbol’s order in the model and choosing the relevant order’s probability randomly [7].

From the above review, we can find that many secure AC schemes are developed inad hoc, and the definition of security model is often absent. For this reason, the secu-rity analysis of existing secure AC schemes are also scheme-specific. In [12] and [13],the authors gave a security definition, i.e. indistinguishability, of encryption scheme usingAC and found the corresponding attacks. In this paper, from another aspect, we theoreti-cally formalize and analyze the process of interval shrinking, and find attacks on a classof encryption schemes based on interval shrinking. Meanwhile, our attacks are simple andefficient.

3 Secure binary AC based on interval shrinking

3.1 Binary AC

The basic principle of AC is using a decimal fraction between [0, 1) to represent thesource message, which is also called as floating-point AC. When coding alphabet is binary,it is called binary AC. Because binary AC is often adopted for practical application inmultimedia compression, we focus on binary AC in this paper. Figure 1 illustrates theprocess of encoding a message M = aba where p(a) = 0.75, p(b) = 0.25. The encod-ing is conducted in a recursive way. Final interval [0.296875, 0.4375) is assigned to themessage M , and any decimal fraction within that interval can be used to denote the mes-sage. The decoding task is performed by the dual procedure. Please refer to [29] fordetail.

With the increasing message length, the length of assigned interval to representthat message will decrease. Thus, it needs higher precision to denote a decimalfraction within that interval, which may not be satisfied when it is implementedin finite precision. Another problem is that the whole code stream is only avail-able when the entire encoding process finishes. Integer AC is proposed to solvethese problems in [29], and it was further optimized in [22]. The integer AC uti-lizes integers to present intervals and introduces renormalization to support incrementaltransmission.

Another merit of AC is that it separates the statistical model and the coder [29]. Thisoffers great flexibility in compression as well as programming. According to the incorpo-rated models, AC can be classified into: 1) static AC, in which the model is predeterminedand fixed for all source messages; 2) content-based AC, whose model is collected in advancefrom the source to be encoded; and 3) adaptive AC, that updates the model based on thesymbol being encoded and uses the updated model to encode the next symbol on the fly.

Page 5: On the security of binary arithmetic coding based on ...xinwenfu/paper/Journals/16_MTA_75_8... · Arithmetic coding (AC) is a prevalent choice to design joint compression and encryption

Multimed Tools Appl (2016) 75:4245–4258 4249

a

b

0

0.25

1

a

0.4375

ab

1

0.25

0.4375

a

b

0.25

0.4375

aba

a

b 0.296875

Fig. 1 An example of encoding aba using binary AC

We can use fixed model here to include static model and content-based model since theyare both constant during the coding process.

3.2 Secure binary AC based on interval shrinking

Interval shrinking is usually utilized in AC to design encryption schemes [20] and [19]. It isfound in [20] that AC is sensitive to the end points of the current interval, and the decoderis unable to work if at some place these values maintained by the encoder and decoder haveonly very slight differences. For this reason, the status of the decoder should be alwaysidentical to that of encoder to rebuild the original message. If current interval of encoder issecretly shrunk, the decoder will lose synchronization and be unable to recover the sourcemessage exactly.

Generally speaking, the interval can be shrunk by 1) changing upper bound, 2) changinglower bound, and 3) changing upper and lower bounds. We call the strategy as uni-shrinkingwhen shrinking is done at one end of the interval, and bi-shrinking when it is done onboth ends of the interval. In Witten et al.’s implementation [29], shrinking could be done,as shown in Fig. 2, by decreasing the value of high or/and increasing the value of low.For analytic clarity, we split each coding iteration with interval shrinking into two steps:a traditional AC iteration and a subsequent shrinking operation. xn denotes the intervalpartition point between two symbols in traditional AC in the nth iteration, and yn is theinterval partition point in AC with interval shrinking in nth iteration.

4 Security analysis

It is clear from Fig. 2 that the partitioned intervals for source symbols after interval shrinkingare misaligned to those of traditional AC, and some intervals after shrinking are partiallyoverlapped with others belonging to different symbols in traditional AC (marked in boldlines in Fig. 2). If the final output falls into these overlapped ranges, the decoder is unableto recover the source message correctly without knowing the secret shrinking information.On the other side, it is found in Fig. 2 that for some symbols (a in Fig. 2b and b in Fig.2a and c), the intervals after shrinking are subintervals of those before shrinking, and nooverlapping exists. That means if the encoder output is within these intervals, the code

Page 6: On the security of binary arithmetic coding based on ...xinwenfu/paper/Journals/16_MTA_75_8... · Arithmetic coding (AC) is a prevalent choice to design joint compression and encryption

4250 Multimed Tools Appl (2016) 75:4245–4258

p(a)

p(b)

p'(b)p(b)

p(a)p'(a)

p( )

xnyn

high

low(a)

p(a)

p(b)

p'(b)

p(b)

p(a)p'(a)

p( )

xnyn

high

low(b)

p(a)

p(b)

p'(b)p(b)

p(a)p'(a)

p( l)

p( h)

xn yn

low

high

(c)

Fig. 2 Interval shrinking. a Decreasing high; b Increasing low; c Decreasing high and increasing low

stream is decodable even without knowing the shrinking information. In order to analyzethe security in this situation, we give the following two theorems.

Theorem 1 If xn ≥ yn for 1 ≤ n ≤ N , a homogeneous string MbN = bb . . . b

︸ ︷︷ ︸

N

encoded by

AC with interval shrinking can be decoded correctly by a traditional arithmetic decoder.

Proof First, no matter which shrinking strategy is adopted, interval shrinking does notchange the affiliation between intervals in each iteration. That is, the shrunk current intervalis still a subinterval of previous intervals. Second, if xn ≥ yn for 1 ≤ n ≤ N , the intervalassigned to symbol b by arithmetic encoder with interval shrinking falls into a subintervalof that by traditional arithmetic encoder. This means the output after interval shrinking isalso a legal representation of Mb

N = bb . . . b︸ ︷︷ ︸

N

in traditional AC. Based on these two points,

If xn ≥ yn for 1 ≤ n ≤ N , the string MbN encoded by AC with interval shrinking can be

decoded correctly by a traditional arithmetic decoder.

Theorem 2 If xn ≤ yn for 1 ≤ n ≤ N , a homogeneous string MaN = aa . . . a

︸ ︷︷ ︸

N

encoded by

AC with interval shrinking can be decoded correctly by a traditional arithmetic decoder.

Proof Along the same lines as the preceding proof.

We now use the above theorems to analyze the security of uni-shrinking and bi-shrinking.

4.1 Uni-shrinking

In the uni-shrinking strategy, if the plain message is elaborately chosen, it is easy to recoverthe encrypted code stream with the traditional arithmetic decoder, no matter how the symbolfrequency is distributed or how much the interval is shrunk. More specifically, in Fig. 2a, theinterval of symbol b after shrinking is a subinterval of the interval before shrinking. If thenext symbol to be encoded is b, a traditional arithmetic decoder is capable of recovering theoriginal source message ab. Since the coding procedure of AC is iterative, it is easy to prove

Page 7: On the security of binary arithmetic coding based on ...xinwenfu/paper/Journals/16_MTA_75_8... · Arithmetic coding (AC) is a prevalent choice to design joint compression and encryption

Multimed Tools Appl (2016) 75:4245–4258 4251

that xn ≥ yn for n ≥ 1 in this scenario. For this reason, if we construct a string satisfyingthe format of M = ∗bb . . ., where ∗ denotes a legal leading symbol, the whole string M

can be recovered without any knowledge of interval shrinking according to Theorem 1.Two points should be noted here: 1) The conclusion here is for the fixed model. When theadaptive model is considered, Mb = bb . . . still works. 2) Fig. 2 shows the situation ofinterval shrinking after encoding current symbol in each iteration. If the interval is shrunkfirst in each iteration, the proposed attack is still feasible for Mb. The discussion in Fig. 2bis very similar to that of Fig. 2a, and choosing a string M = ∗aa . . . or Ma = aa . . . canachieve the same purpose.

Based on the discussion above, we summarize the attack for uni-shrinking strategy asfollows. Assume m is the source symbol located on the opposite end of shrinking direction,e.g. m = b in Fig. 2a, if we construct a source message of the following format:

M ={ ∗mm . . . , fixed model,

mm . . . , adaptive model(1)

then M can be correctly decoded with traditional AC if it is encoded by one of the uni-shrinking strategies. The attack procedures can also be described in Algorithm 1.

To validate this theoretical analysis, we implement the secure binary AC algorithm withinterval shrinking based on Witten et al.’s version [29] (we include message length informa-tion at the head of code stream for simplicity and do not consider the terminating symbol).In static and adaptive models, the models are initialized with equal frequencies for all sym-bols. The experimental results are shown in Table 1. The formats of source messages arechosen by (1) in different scenarios, and the decoded messages are always identical with thesource messages, which is consistent with our theoretical conclusion.

4.2 Bi-shrinking

When shrinking is done on both ends of interval (depicted in Fig. 2c), several situationsshould be considered. We classify the bi-shrinking strategy into symmetric bi-shrinking andunsymmetric bi-shrinking based on whether the shrinking ratios at two ends are equal or notin each iteration, and focus on the symmetric bi-shrinking situation here for simplicity andit can be generalized to the unsymmetric situation. We analyze the security of symmetricbi-shrinking in floating-point AC and discuss its extension in integer AC. Suppose Mb

n =bb . . . b︸ ︷︷ ︸

n

(n ≥ 1) is encoded by bi-shrinking, the interval partit ion points xn and yn can be

formulated as below:⎧

xn = ln + (hn − ln)p(b)

yn = ln + (hn − ln)p(εl)

+(hn − ln)(1 − p(ε))p(b)

(2)

Page 8: On the security of binary arithmetic coding based on ...xinwenfu/paper/Journals/16_MTA_75_8... · Arithmetic coding (AC) is a prevalent choice to design joint compression and encryption

4252 Multimed Tools Appl (2016) 75:4245–4258

Table 1 Experimental results on attacking encryption schemes based on uni-shrinking

Strategy Model Source message (M) Code stream (C) Decoded message (M ′)

Uni-shrinking

Staticbbbbbbbbbb 000000000001 bbbbbbbbbb

(Decreasing high)

abbbbbbbbb 100000000001 abbbbbbbbb

Content-basedbbbbbbbbbb 001 bbbbbbbbbb

abbbbbbbbb 110110 abbbbbbbbb

Adaptive bbbbbbbbbb 01101 bbbbbbbbbb

Uni-shrinking

Staticaaaaaaaaaa 111111111101 aaaaaaaaaa

(Increasing low)

baaaaaaaaa 011111111101 baaaaaaaaa

Content-basedaaaaaaaaaa 101 aaaaaaaaaa

baaaaaaaaa 001001 baaaaaaaaa

Adaptive aaaaaaaaaa 11110 aaaaaaaaaa

where [ln, hn) is the coding interval in nth iteration, p(εh) and p(εl) are the shrinking ratioson both ends, and p(ε) = p(εh) + p(εl).

We can easily solve (2) and get⎧

⎪⎨

⎪⎩

xn = l0 + (h0 − l0)pn+1(b)

yn = l0 + (h0 − l0)p(b)p(εl)1−pn(b)(1−p(ε))n

1−p(b)(1−p(ε))

+(h0 − l0)pn+1(b)(1 − p(ε))n

(3)

where [l0, h0) is the initial coding interval.In the floating-point AC, if symmetric bi-shrinking is considered, (3) is simplified to (4)

since h0 = 1, l0 = 0, and p(εh) = p(εl) = r .{

xn = pn+1(b)

yn = p(b)r1−pn(b)(1−2r)n

1−p(b)(1−2r)+ pn+1(b)(1 − 2r)n

(4)

1020

3040

50

00.2

0.40.6

0.81

−0.1

0

0.1

0.2

0.3

0.4

iteration (n)p(b)

x n − y

n

Fig. 3 The difference between xn and yn in symmetric bi-shrinking when r = 0.01

Page 9: On the security of binary arithmetic coding based on ...xinwenfu/paper/Journals/16_MTA_75_8... · Arithmetic coding (AC) is a prevalent choice to design joint compression and encryption

Multimed Tools Appl (2016) 75:4245–4258 4253

1 2 3 4 5 6 7 8 9 100

0.002

0.004

0.006

0.008

0.01

0.012

iteration (n)

part

ition

poi

nt

xn (AC)

yn (AC with equal bi−shrinking)

(a)

1 2 3 4 5 6 7 8 9 100

0.05

0.1

0.15

0.2

0.25

iteration (n)

part

ition

poi

nt

xn (AC)

yn (AC with equal bi−shrinking)

(b)

2 4 6 8 10 12 14 16 18 200.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

iteration (n)

part

ition

poi

nt

xn (AC)

yn (AC with equal bi−shrinking)

(c)

20 40 60 80 100 120 1400.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

iteration (n)

part

ition

poi

nt

xn (AC)

yn (AC with equal bi−shrinking)

(d)

2 4 6 8 10 12 14 16 18 200.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

iteration (n)

part

ition

poi

nt

xn (AC)

yn (AC with equal bi−shrinking)

(e)

Fig. 4 The trajectories of xn and yn in symmetric bi-shrinking. a p(b) = 0.1, r = 0.01; b p(b) = 0.5, r =0.01; c p(b) = 0.9, r = 0.01; d p(b) = 0.99, r = 0.01; e p(b) = 0.9, r = 0.1

From Theorem 1, the security of shrinking strategy lies in the values of xn and yn. Givenr = 0.01, the difference between xn and yn in symmetric bi-shrinking is plotted in Fig.3 with respect to different values of p(b). It is found that the larger value p(b) takes, thehigher probability we have to recover Mb

N using the traditional arithmetic decoder. Figure4a-d give a clearer view of the trajectories of xn and yn for different p(b). When p(b) islarge enough, say 0.9, the condition in Theorem 1, i.e. xn ≥ yn, could be satisfied with a

Page 10: On the security of binary arithmetic coding based on ...xinwenfu/paper/Journals/16_MTA_75_8... · Arithmetic coding (AC) is a prevalent choice to design joint compression and encryption

4254 Multimed Tools Appl (2016) 75:4245–4258

proper N while 1 ≤ n ≤ N . Meanwhile, the larger p(b), the larger N , e.g. N = 10 whenp(b) = 0.9 (Fig. 4c) and N = 100 when p(b) = 0.99 (Fig. 4d). Another observation isthat under the same value of p(b), N increases with decreasing r , e.g. N = 8 when r = 0.1(Fig. 4e) while N = 10 when r = 0.01 (Fig. 4c). This result is self-explaining becauseless shrinking will reduce the probability of ambiguity in decoding, when r = 0 the schemeturns to be a traditional AC and no security can be provided therein.

The above analysis about symmetric bi-shrinking is based on the encoding of homoge-neous strings containing b, and similar results can be derived if Ma

n = aa . . . a︸ ︷︷ ︸

n

is considered.

When randomness is introduced, the value of r is usually generated by a PRNG and variesin each iteration. In this case, we assume r is a random number in [0, R] (i.e. r ∈ [0, R]).The above conclusions still hold by substituting r with R in the above equations, except thatwe may get a greater N to satisfy the conditions in Theorem 1 and 2, which benefits ourcryptanalysis.

The experimental results are listed in Table 2. We set R = 0.01 and generate thevalue of r between [0, R] by rand function in the C programming language. As statedabove, when R is given, the selection or the randomness performance of PRNG mayonly make the encoded output slightly different but will not affect the correctness ofdecoded messages within the length of N . In the static model, the symbol frequenciesare set manually to meet the requirement. In content-based and adaptive models, thesymbol frequencies are initialized as equal by setting all frequency counters to 1 andthe model is built based on the source messages or updated on the fly during encod-ing. As can be seen from Table 2, all the messages encrypted by AC with symmetricbi-shrinking are correctly recovered by traditional arithmetic decoder. It is found in theexperiments where the attack has good performance on the adaptive model, becausethe probability of MSB keeps increasing between consecutive two halving points and

Table 2 Experimental results on attacking encryption schemes based on symmetric bi-shrinking when R =0.01

Model Initial Probability Source message (M) Code stream (C) Decoded message (M ′)

Staticp(b) = 0.9 bbbbbbbbbb 001 bbbbbbbbbb

p(b) = 0.1 aaaaaaaaaa 110 aaaaaaaaaa

Content-basedp(b) = 0.5 bbbbbbbbbb 001 bbbbbbbbbb

p(b) = 0.5 aaaaaaaaaa 101 aaaaaaaaaa

Adaptivep(b) = 0.5 bbbbbbbbbb 01110 bbbbbbbbbb

p(b) = 0.5 aaaaaaaaaa 11110 aaaaaaaaaa

Page 11: On the security of binary arithmetic coding based on ...xinwenfu/paper/Journals/16_MTA_75_8... · Arithmetic coding (AC) is a prevalent choice to design joint compression and encryption

Multimed Tools Appl (2016) 75:4245–4258 4255

this gives traditional arithmetic decoder the capability of recovering longer messagecorrectly.

It should be noted that for the integer AC, there may be some errors at the end of recov-ered messages of length N , especially when N is large. In integer implementation such asWitten et al.’s version [29], the magnitude of interval shrinking sr in symmetric bi-shrinkingis usually done by

sr = (high − low)r (5)

However, the changes of range (high − low) are not monotonic as the renormal-ization enlarges the current interval. Therefore, the value of sr may be greater thanour expectation even if a threshold R is given to r . For example, suppose high =(168)10 = (10101000)2, low = (128)10 = (10000000)2, and R = 0.01, we canget sr ≤ 40. If renormalization is carried on, high = (80)10 = (01010000)2,low = (0)10 = (00000000)2, then sr ≤ 80 with the same R. Since the magni-tude of interval shrinking may increase, N satisfying the condition in Theorem 1 or 2will decrease, and the length of correctly decoded messages will be shorter than ourexpectation.

5 Remedy

In this section, we propose an effective remedy to enhance the security of binary AC encryp-tion schemes based on interval shrinking. An intuitive and effective way is applying abit-wise XOR operation on the output of AC code stream [34]. Specifically, a PRNG witha secret seed is adopted to generate a sequence of keystream, and the keystream is thenused to mask the output of AC code stream by bit-wise XOR operation. It is formulatedas below:

C = C ⊕ K (6)

where K is the keystream generated by a PRNG. Because XOR operation confuses the codestream, we cannot get any interval assignment information from it. In this case, Decodingthe masked code stream with traditional AC or secure AC in the absence of secret seed willlead to errors.

With this remedy, we need to generate a keystream whose length is equivalent to thelength of AC code stream, so the coding speed may be affected. To alleviate the negativeimpact on coding efficiency, we can selectively mask part of AC code steam rather thanevery single bit of AC code stream, say every few bits. By doing so, we can get a goodtradeoff between security and efficiency.

6 Conclusion

In this paper, we analyze the security of secure binary arithmetic coding (AC) schemesbased on interval shrinking. We formalize the interval shrinking strategies into uni-shrinking and symmetric bi-shrinking. Then we give the conditions with their proofsunder which interval shrinking is insecure. Based on this classification and conditions,

Page 12: On the security of binary arithmetic coding based on ...xinwenfu/paper/Journals/16_MTA_75_8... · Arithmetic coding (AC) is a prevalent choice to design joint compression and encryption

4256 Multimed Tools Appl (2016) 75:4245–4258

we analyze the security of uni-shrinking and symmetric bi-shrinking in static, content-based, and adaptive AC models, present attacks are to recover the plain message usinga traditional arithmetic decoder. An effective improvement to resist the attacks is alsoprovided.

Acknowledgements The work in this paper was supported by the Program for New Century ExcellentTalents in University (No. NCET-12-0589) and the National Natural Science Foundation of China (No.61302161).

References

1. Asghar MN, Ghanbari M, Fleury M, Reed MJ (2014) Sufficient encryption based on entropy codingsyntax elements of H.264/SVC. Multimedia Tools and Applications. doi:10.1007/s11042-014-2160-6

2. Bergen HA, Hogan JM (1992) Data security in a fixed-model arithmetic coding compression algorithm.Computers & Security 11:445–461

3. Bergen HA, Hogan JM (1993) A chosen plaintext attack on an adaptive arithmetic coding compressionalgorithm. Computers & Security 12:157–167

4. Bose R, Pathak S (2006) A novel compression and encryption scheme using variable model arith-metic coding and coupled chaotic system. IEEE Transactions on Circuits and Systems I: Regular Papers53(4):848–857

5. Cheng H, Li X (2000) Partial encryption of compressed images and videos. IEEE Transactions on SignalProcessing 48(8):2439–2451

6. Cleary JG, Irvine SA (1995) lngrid Rinsma-Melchert: On the insecurity of arithmetic coding. Computers& Security 14:167–180

7. Duan L, Liao X, Xiang T (2011) A secure arithmetic coding based on Markov model. CommunNonlinear Sci Numer Simul 16(6):2554–2562

8. Grangetto M, Magli E, Olmo G (2006) Multimedia selective encryption by means of randomizedarithmetic coding. IEEE Trans Multimed 8(5):905–917

9. Irvine SA, Cleary JG, Rinsma-Melchert I (1995) The subset sum problem and arithmetic coding. In:Working Papers, vol 95, pp 1–12. University of Waikato, Department of Computer Science

10. Ishibashi H, Tanaka K (2001) Data encryption scheme with extended arithmetic coding. In: SPIE Pro-ceedings of Mathematics of Data/Image Coding, Compression, and Encryption IV, with Applications,pp 222–233

11. Jakimoski G, Subbalakshmi KP (2008) Cryptanalysis of some multimedia encryption schemes. IEEETrans Multimed 10(3):330–338

12. Katti RS, Srinivasan SK, Vosoughi A (2011) On the security of randomized arithmetic codesagainst ciphertext-only attacks. IEEE Transactions on Information Forensics and Security 6(1):19–27

13. Katti RS, Vosoughi A (2012) On the security of key-based interval splitting arithmetic coding withrespect to message indistinguishability. IEEE Transactions on Information Forensics and Security7(3):895–903

14. Katz J, Lindell Y (2007) Introduction to Modern Cryptography. Chapman & Hall/CRC, London15. Kim H, Wen J, Villasenor JD (2007) Secure arithmetic coding. IEEE Transactions on Signal Processing

55(5):2263–227216. Li H, Zhang J (2009) A secure and efficient entropy coding based on arithmetic coding. Commun

Nonlinear Sci Numer Simul 14:4304–431817. Lian S, Sun J, Liu G, Wang Z (2008) Efficient video encryption scheme based on advanced video coding.

Multimedia Tools and Applications 38(1):75–8918. Lim J, Boyd C, Dawson E (1997) Cryptanalysis of adaptive arithmetic coding encryption schemes. In:

Proceedings of the Second Australasian Conference on Information Security and Privacy (ACISP’97),vol 1270, pp 216–227

19. Liu X, Farrell P, Boyd C (1999) A unified code. In: Cryptography and coding: 7th IMA conference(IMA - Crypto & Coding’99), vol 1746, pp 84–93

20. Liu X, Farrell PG, Boyd C (1997) Resisting the Bergen-Hogan attack on adaptive arithmeticcoding. In: Proceedings of the 6th IMA International Conference on Cryptography and Coding,pp 199–208

Page 13: On the security of binary arithmetic coding based on ...xinwenfu/paper/Journals/16_MTA_75_8... · Arithmetic coding (AC) is a prevalent choice to design joint compression and encryption

Multimed Tools Appl (2016) 75:4245–4258 4257

21. Mi B, Liao X, Chen Y (2008) A novel chaotic encryption scheme based on arithmetic coding. Chaos,Solitons and Fractals 38:1523–1531

22. Moffat A, Neal RM, Witten IH (1998) Arithmetic coding revisited. ACM Trans Inf Syst 16(3):256–294

23. Moo PW, Wu X (1999) Resynchronization properties of arithmetic coding. In: Proceedings of DataCompression Conference (DCC’99), pp 545–549

24. Ou SC, Chung HY, Sung WT (2006) Improving the compression and encryption of images using FPGA-based cryptosystems. Multimedia Tools and Applications 28(1):5–22

25. Sun HM, Wang KH, Ting WC (2009) On the security of the secure arithmetic code. IEEE Transactionson Information Forensics and Security 4(4):781–789

26. Varalakshmi LM, Florence SG (2013) An enhanced encryption algorithm for video based on multipleHuffman tables. Multimedia Tools and Applications 64(3):717–729

27. Wen J, Kim H, Villasenor JD (2006) Binary arithmetic coding with key-based interval splitting. IEEESignal Processing Letters 13(2):69–72

28. Witten IH, Cleary JG (1988) On the privacy afforded by adaptive text compression. Computers &Security 7:397–408

29. Witten IH, Neal RM, Cleary JG (1987) Arithmetic coding for data compression. Commun ACM30(6):520–540

30. Wong KW, Lin Q, Chen J (2010) Simultaneous arithmetic coding and encryption using chaotic maps.IEEE Transactions on Circuits and Systems Part II: Express Briefs 57(2):146–150

31. Wu CP, Kuo CCJ (2005) Design of integrated multimedia compression and encryption systems. IEEETrans Multimed 7(5):828–839

32. Zhang M, Tong X (2014) A new algorithm of image compression and encryption based on spatiotempo-ral cross chaotic system. Multimedia Tools and Applications. doi:10.1007/s11042-014-2227-4

33. Zhou J, Au OC (2008) Comments on a novel compression and encryption scheme using variable modelarithmetic coding and coupled chaotic system. IEEE Transactions on Circuits and Systems I: RegularPapers 55(10):3368–3369

34. Zhou J, Au OC, Fan X, Wong PH (2008) Joint security and performance enhancement for securearithmetic coding. In: 15th IEEE International Conference on Image Processing 2008 (ICIP’08), pp3120–3123

35. Zhou J, Au OC, Wong PH, Fan X (2008) Cryptanalysis of secure arithmetic coding. In: IEEEInternational Conference on Acoustics, Speech and Signal Processing 2008 (ICASSP’08), pp 1769–1772

36. Zhou J, Au OC, Wong PHW (2009) Adaptive chosen-ciphertext attack on secure arithmetic coding.IEEE Transactions on Signal Processing 57(5):1825–1838

Tao Xiang received the B.S., M.S. and Ph.D. degrees in Computer Science from Chongqing Univer-sity, China, in 2003, 2005, and 2008, respectively. He is currently an Associate Professor of ChongqingUniversity. His research interests include multimedia security, chaotic cryptography, and particle swarmoptimization.

Page 14: On the security of binary arithmetic coding based on ...xinwenfu/paper/Journals/16_MTA_75_8... · Arithmetic coding (AC) is a prevalent choice to design joint compression and encryption

4258 Multimed Tools Appl (2016) 75:4245–4258

Jianglin Sun received his B.S., M.S. degrees in Computer Science from Chongqing University, China, in2005 and 2010, respectively. He is now a Ph.D. candidate in Chongqing University. His research interestsinclude cryptography and multimedia security.

Xinwen Fu is an Associate Professor in the Department of Computer Science, University of MassachusettsLowell. He received B.S. (1995) and M.S. (1998) in Electrical Engineering from Xi’an Jiaotong University,China and University of Science and Technology of China respectively. He obtained Ph.D. (2005) in Com-puter Engineering from Texas A&M University. Dr. Fu’s current research interests are in network securityand privacy, network forensics, computer forensics, information assurance, system reliability and networkingQoS.