Retrieval of Bitmap Compression History

8/8/2019 Retrieval of Bitmap Compression History

http://slidepdf.com/reader/full/retrieval-of-bitmap-compression-history 1/6

(IJCSIS) International Journal of Computer Science and Information Security,

Vol. 8 No. 8, 2010

Retrieval of Bitmap Compression History

Salma Hamdy, Haytham El-Messiry, Mohamed Roushdy, Essam Kahlifa

Faculty of Computer and Information Sciences

Ain Shams UniversityCairo, Egypt

{s.hamdy, hmessiry, mroushdy, esskhalifa}@cis.asu.edu.eg

Abstract — The histogram of Discrete Cosine Transform

coefficients contains information on the compression parameters

for JPEGs and previously JPEG compressed bitmaps. In this

paper we extend the work in [1] to identify previously

compressed bitmaps and estimate the quantization table that was

used for compression, from the peaks of the histogram of DCT

coefficients. This can help in establishing bitmap compression

history which is particularly useful in applications like image

authentication, JPEG artifact removal, and JPEG recompression

with less distortion. Furthermore, the estimated table calculatesdistortion measures to classify the bitmap as genuine or forged.

The method shows good average estimation accuracy of around

92.88% against MLE and autocorrelation methods. In addition,

because bitmaps do not experience data loss, detecting

inconsistencies becomes easier. Detection performance resulted in

an average false negative rate of 3.81% and 2.26% for two

distortion measures, respectively.

Keywords: Digital image forensics; forgery detection; compression history; Quantization tables.

I. INTRODUCTION

Although JPEG images are the most widely used imageformat, sometimes images are saved in an uncompressed raster

form (bmp, tiff), and in most situations, no knowledge of

previous processing is available. Some applications are

required to receive images as bitmaps with instructions for

rendering at a particular size and without further information.

The image may have been processed and perhaps compressed

with contain severe compression artifacts. Hence, it is useful

to determine the bitmap history; whether the image has ever

been compressed using the JPEG standard and to know what

quantization tables were used. Most of the artifact removal

algorithms [2-9] require the knowledge of the quantization

table to estimate the amount of distortion caused by

quantization and avoid over-blurring. In other applications,

knowing the quantization table can help in avoiding further

distortion when recompressing the image. Some methods try

to identify bitmap compression history using Maximum

Likelihood Estimation (MLE) [10-11] or by modeling the

distribution of quantized DCT coefficients, like the use of

Benford’s law [12], or modeling acquisition devices [13].

Furthermore, due to the nature of digital media and the

advanced digital image processing techniques, digital images

may be altered and redistributed very easily forming a rising

threat in the public domain. Hence, ensuring that media

content is credible and has not been altered is becoming an

important issue governmental security and commercial

applications. As a result, research is being conducted for

developing authentication methods and tamper detection

techniques. Usually JPEG compression introduces blocking

artifacts and hence one of the standard passive approaches is

to use inconsistencies in these blocking fingerprints as a

reliable indicator of possible tampering [14]. These can also beused to determine what method of forgery was used.

In this paper we are interested in the authenticity of the

image. We extend the work in [1] to bitmaps and use the

proposed method for identifying previously compressed

bitmaps and estimating the quantization table that was used.

The estimated table is then used to determine if the mage was

forged or not by calculating distortion measures.

In section 2 we study the histogram of DCT AC

coefficients of bitmaps and show how it differs for previously

JPEG compressed bitmaps. We then validate that without

modeling rounding errors or calculating prior probabilities,

quantization steps of previously compressed bitmaps can still

be determined straightforward from the peaks of the

approximated histograms of DCT coefficients. Results are

discussed in section 3. Section 4 is for conclusions.

II. HISTOGRAM OF DCT COEFFICIENTS IN BITMAPS

We studied in [1] the histogram of quantized DCT

coefficients and showed how it can be used to estimate

quantization steps. Here, we study uncompressed images and

validate that the approximated histogram of DCT coefficients

can be used to determine compression history. Bitmap image

means no data loss and hence all what is required to build an

informative histogram is expected to be present in the

coefficients histograms.

The first step is to decide if the test image was previously

compressed because if the image was an originaluncompressed there is no compression data to extract. When

the image is decided to have a compression history, the next

step is to estimate that history. For grayscale image,

compression history mainly means its quantization table which

will be the focus of this paper. For color image, this is

extended to estimating color plane compression parameters

that includes subsampling and associated interpolation.

141 http://sites.google.com/site/ijcsis/ISSN 1947-5500




Vol. 8 No. 8, 2010

Fig. 1(b) shows the approximated histogram H *

of DCT

coefficient at position (3,3) of the luminance channel of an

uncompressed Lena image and the histogram of the image

after being JPEG compressed with quality factor 80. It is clearthat the latter contains periodic patterns that are not present in

the uncompressed version. It was observed that the coefficient

is very likely to have been quantized with a step of this

periodic [15]. Now if that JPEG was stored in a bitmap

uncompressed form, we expect the DCT coefficients to have

the same behavior because nothing is lost during this format

change. This is evident in Fig. 1(d) which shows an identical

histogram to the one in Fig. 1(c). Hence, similar to the

argument in [1], if we closely observe the histogram of H *(i,j)

outside the main lobe, we notice that the maximum peak

occurs at a value that is equal to the quantization step used to

quantize X q(i,j). This observation applies to most low

frequency AC coefficients. Fig. 2(a) and (b) show |H|, theabsolute histograms of DCT coefficients for Lena of Fig. 1(a)

at frequencies (3,3) and (3,4), respectively. As for high

frequencies, the maximum occurred at a value matching Q(i,j)

when |X *(i,j)|>B, (Fig. 2 (c) and (d)), where B is as follows:

u,v

)jπ v(.

)iπ u( c(u) c(v).

B(i,j)(i,j) X (i,j) X Γ q

*

16

12cos

16

12cos50

(1)

where X q(i,j) is the quantized coefficient, and X *(i,j) is the

approximated quantized coefficient, Γ is the round off error,

and

otherwise

for

c 1

0 21

)(

See [1, 11].

Sometimes we do not have enough information to

determine Q(i,j) for high frequencies (i,j). This happens when

the histogram outside the main lobe decays rapidly to zero

showing no periodic structure. This reflects the small or zero

value of the coefficient. At such cases, it can be useful to

estimate as many of the low frequencies and then search

through lookup tables for a matching standard table.

Estimating the quantization table of a bitmap can help

determine part of its compression history. If all (or most of) of the low frequency steps were estimated to be ones, we can

conclude that the image did not go through previous

compression. High frequencies may bias because they have

very low contribution and do not provide a good estimate.

Moreover, this method works well also for uncompressed or

lossless compressed tiff images. Fig. 3(d) shows the 96.7%

correctly estimated Q table using the above method of a tiff

image taken from UCID [16]. The X’s mark the

“undetermined” coefficients.

Now for verifying the authenticity of the image, we use the

same distortion measures we used in [1]. The average

distortion measure is calculated as a function of the

remainders of DCT coefficients with respect to the original Q

matrix:

8

1

8

1

),(),,(mod1

i j

jiQ ji D B (2)

where D(i,j) and Q(i,j) are the DCT coefficient and the

corresponding quantization table entry at position (i,j),

respectively. An image block having a large average distortion

value indicates that it is very different from what it should be

and is likely to belong to a forged image. Averaged over the

entire image, this measure can be used for making a decision

about authenticity of the image.

In addition, the JPEG 8×8 “blocking effect” is somehow

still present in the uncompressed version and hence blockingartifact measure, BAM [14], can be used to give an estimate of

the distortion of the image. It is computed from the Q table as:

8

1

8

1),(

),( ),(),()(2

i j jiQ

ji Dround jiQ ji Dn B (3)

where B(n) is the estimated blocking artifact for the nth

block.

(a) Lena image (b) Uncompressed

(c) JPEG compressed Q(3,3)=6 (d) Previously compressed bmp

Fig. 1. Histograms of X *(3,3).

(a) (b)

(c) (d)

Fig. 2. (a) | X *(3,3)| where Hmax occurs at Q(3,3)=6. (b) | X *(3,4)| where Hmax

occurs at Q(3,4) = 10 (c) | X *(5,4)| where Hmax occurs at Q(5,4)=22. (d) |X *(7,5)| where Hmax occurs at Q(7,5) = 41.

142 http://sites.google.com/site/ijcsis/

ISSN 1947-5500




Vol. 8 No. 8, 2010

III. EXPERIMENTAL RESUTLS AND DISCUSSION

A. Estimation Accuracy

Our testing image set consisted of 550 images collected

from different sources (more than five camera models), in

addition to some from the public domain Uncompressed Color

Image Database (UCID), which provides a benchmark for

image processing analysis [16]. Each of these images was

compressed with different quality factors, [60, 70, 80, and 90].

Again, each of these was uncompressed and resaved as

bitmap. This yielded 550×4 = 2,200 untouched images. For

each quality factor group, an image’s histogram of DCT

coefficients at one certain frequency was generated and used

to determine the corresponding quantization step at that

frequency according to section 2. This was repeated for all the64 histograms of DCT coefficients. The resulting quantization

table was compared to the quality factor’s known table and the

percentage of correctly estimated coefficients was recorded.

Also, the estimated table was used in equations (2) and (3) to

determine the image’s average distortion and blocking artifact

measures, respectively. These values were recorded and used

later to set a threshold value for distinguishing forgeries from

untouched images.

Table 1 shows the accuracy of estimating all 64 entries

using the proposed method for each quality factor averaged

over the whole set. It exhibits a similar behavior to JPEG

images; as quality factor increases, estimation accuracy

increases steadily with an expected drop for quality factors

higher than 90 as the periodic structure becomes less

prominent and the bumps are no longer separate enough .

Overall, we can see that the estimation accuracy is higher than

that of JPEG images [1]. We anticipate that because lossy

compression tends to lessen available data to make a better

estimate. Average estimation time for all 64 entries of imagesof size 640×480 for different QFs was 52.7 seconds.

Estimating Q using MLE methods [10-11] is based on

searching for all possible Q(i,j) for each DCT coefficient over

the whole image which can be computationally exhaustive for

large size files. Another method [12] proposed a logarithmic

law and argued that the distribution of the first digit of DCT

coefficients follows that generalized Benford’s law. The

method is based on re-compressing the test image with several

quality factors and fitting the distribution of DCT coefficients

of each version to the proposed law. The QF of the version

having the least fitting artifact is chosen and its corresponding

Q table is the desired one. Of course the above methods can

only estimate standard compression tables. Although it may beaccurate, it is time consuming. Plus it fails when the re-

compression quantization step is an integer multiple of the

original compression step size. Another method [17] tends to

calculate the autocorrelation function of the histogram of DCT

coefficients. The displacement corresponding to the peak

closest to the peak at zero is the value of Q(i,j) given that the

peak is higher than the mean value of the autocorrelation

function. The method eventually uses a hybrid approach; the

low frequency coefficients are determined directly from the

autocorrelation function, while the higher-frequency ones are

estimated by matching the estimated part to standard JPEG

tables scaled by a factor of s, which is determined from the

known coefficients.

Table 2 shows the estimation accuracy while Table 3

shows estimation time, for the different mentioned methods

against ours. Note that accuracy was calculated for directly

estimating only the first nine AC coefficients without

matching. This is due to the methods failing to estimate high

frequency coefficients as most of them are quantized to zero.

On the other hand, the listed time is for estimating the nine

coefficients and then retrieving the whole matching table from

JPEG standard lookup tables. Maximum peak is faster than

5 4 3 2 1 1 1 1

4 1 1 1 1 10 10 10

1 1 1 1 1 10 10 10

1 1 1 1 1 10 10 10

1 1 1 1 14 12 12 12

1 1 1 1 12 13 11 11

1 1 1 1 13 11 12 11

1 1 1 1 13 12 12 12

(a) Test image (b) Estimated Q for uncompressed version (most low frequencies are ones).

3 4 4 6 10 16 20 24

5 5 6 8 10 23 24 22

6 5 6 10 16 23 28 22

6 7 9 12 20 35 32 25

7 9 15 22 27 44 41 31

10 14 22 26 32 42 45 37

20 26 31 35 41 48 47 X

29 37 38 39 45 40 X X

3 0 0 0 0 0 0

0 0 0 0 0 0 0

0 0 0 0 0 0 0

0 0 0 0 0 0 0

0 0 0 0 0 0 0

0 0 0 0 0 0 0

0 0 0 0 0 1 X

0 0 0 0 0 X X

(c) Estimated Q for previously compressed version with QF = 80. (d) Difference between (c) and original table for QF=80.

Fig. 3. Estimating Q table for original and previously compressed tif image.

TABLE I. PERCENTAGE OF CORRECTLY ESTIMATED COEFFICIENTS

FOR SEVERLA QFS

QF 60 70 80 90

BMP 82.07% 84.80% 87.44% 89.44%

JPEG[1] 72.03% 76.99% 82.36% 88.26%


ISSN 1947-5500




Vol. 8 No. 8, 2010

TABLE II. ESTIMATION ACCURCAY FOR THE FIRSY 3×3 AC

COEFFICIENTS FOR SEVERAL QFS

QFMethod

50 60 70 80 90 100 Avg.Acc.

MLE 75.31 83.10 90.31 96.34 93.83 59.5 83.06

Benford 99.08 87.59 80.82 93.81 59.47 31.53 75.38

Auto. 48.94 50.37 63.71 81.43 65.37 57.50 61.22

Max.Peak 97.93 97.07 99.01 97.67 89.57 76.04 92.88

statistical modeling and nearly as fast as that autocorrelation

method. However, average accuracy of our method is farhigher. MLE is reliable with 83% accuracy but with more than

double the time. Benford’s law based method has an accuracy

of 75 % but is the worst in time because recompressing the

image and calculating distributions for each compressed

version may become time consuming for larger images.

Images used in the experiments were of size 640×480.

B. Forfery Detection

From the untouched previously compressed bitmap image

set, we selected 500 images for each quality factor, each of

which was subjected to four common forgeries; cropping,

rotation, composition, and brightness changes. Cropping

forgeries were done by deleting some columns and rows from

the original image. An image was rotated by 270o for rotationforgeries Copy-paste forgeries were done by randomly

copying a block of pixels from an arbitrary image and placing

it in the original image. Random values were added to every

pixel of the image to simulate brightness change. The resulting

fake images were then stored in their uncompressed form for a

total of (500×4) × 4 = 8,000 images. Next, the quantization

table for each of these images was estimated as above and

used to calculate the image’s average distortion, (2), and the

blocking artifact, (3), measures, respectively.

Fig. 4(a) and (b) show values of the average distortion

measure and blocking artifact measure, respectively. The

scattered dots represent 500 untouched images (averaged for

all quality factors for each image) while the cross marksrepresent 500 images from the forged dataset. As the figure

shows, values from forged images tend to cluster higher than

those from untampered images. We tested the distortion

measure for untouched images against several threshold values

and calculated the corresponding false positive rate FPR (the

number of untouched images declared as tampered), An ideal

case would be a threshold giving zero false positive. However,we had to take into account the false negatives (the number of

tampered images declared as untampered) that may occur

when testing for forgeries. Hence, we require a threshold value

keeping both FPR and the FNR low. For average distortion

measure, we selected a value that gave FPR of 10.8% and a

lower FNR as possible for the different types of forgeries for

average distortion. The horizontal line marks this threshold τ =

50. Similarly, we selected the BAM’s threshold to be τ = 40,

with a corresponding FPR of 5.6%. Table 4 shows the false

negative rate (FNR) for the different forgeries at different

quality factors for bitmaps and JPEGs. As expected, as QF

increases, a better estimate of the quantization matrix of the

original untampered image is obtained, and as a result theerror percentage decreases. Notice how the values drop than

those for JPEG file. Notice also that detection of cropping is

possible when the cropping process breaks the natural JPEG

grid, that it, the removed rows or columns do not fall in line

with the 8×8 blocking. Similarly, when the pasted part fails to

fit perfectly into the original JPEG compressed image, the

distortion metric exceeds the detection threshold, and a

possible composite is declared. Fig. 5 shows examples of

composites. The resulting distortion measures for each

composite are shown in left panel. The dark parts denote low

distortion whereas brighter parts indicate high distortion

values. Notice the highest values correspond to the alien part

and hence mark the forged area.

TABLE III. ESTIMATION TIME IN SECONDS FOR THE FIRSY 3×3 AC

COEFFICIENTS FOR SEVERAL QFS

QFMethod

50 60 70 80 90 100

MLE 38.73 37.33 37.44 37.36 37.32 34.14

Benford 59.95 58.67 58.70 58.72 58.38 80.04

Auto 9.23 11.11 11.10 11.12 11.24 8.96

Max.Peak 11.27 11.29 11.30 11.30 11.30 11.56

TABLE IV. FORGERY DETECTION ERROR RATES FOR BITMAPS AND JPEGS

Distortion Measure Original Cropping Rotation Compositing Brightness

Average JPEG 12.6% 9.2% 7.55% 8.6% 6.45%

BMP 10.8% 3.9% 4.45% 2.0% 4.9%

BAM JPEG 6.8% 3.3% 5.95% 3.15% 5.0%

BMP 5.6% 1.05% 3.05% 1.25% 3.7%

(a) Average distortion measure (b) Blocking artifact measure

Fig. 4. Distortion measures for untouched and tampered images.


ISSN 1947-5500




Vol. 8 No. 8, 2010

IV. CONCLUSIONS

The method discussed in this paper is based on using theapproximated histogram of DCT coefficients of bitmaps forextracting the image’s compression history; its quantizationtable. Also the extracted table is used to expose imageforgeries. The method proved to have practically highestimation accuracy when tested on a large set of image fromdifferent sources compared to other statistical approaches.

Moreover, estimation times proved to be faster than statisticalmethods while maintaining very good accuracy for lowerfrequencies. Experimental results also showed thatperformance for bitmaps surpasses that of JPEGs because of their lossy nature but on the other hand, it takes more time toprocess a bitmap.

REFERENCES

[1] Hamdy S., El-Messiry H., Roushdy M. I., Kahlifa M. E., “Forgery

detection in JPEG compressed images”, JAR -Unpublished, 2010.

[2] Rosenholtz R., Zakhor A., “Iterative procedures for reduction of

blocking effects in transform image coding,” IEEE Trans. Circuits Syst.

Video Technol., vol. 2, pp. 91 – 94, Mar. 1992.[3] Fan Z., Eschbach R., “JEPG decompression with reduced artifacts,”

Proc. IS&T/SPIE Symp. Electronic Imaging: Image and VideoCompression, San Jose, CA, Feb. 1994.

[4] Fan Z., and F. Li, “Reducing artifacts in JPEG decompression by

segmentation and smoothing,” Proc. IEEE Int. Conf. Image Processing,

vol. II, 1996, pp. 17 – 20.

[5] Tan K. T., Ghanbari M., “Blockiness detection for MPEG -2-codedvideo,” IEEE Signal Process. Lett ., vol. 7, pp. 213 – 215, Aug. 2000.

[6] Minami S., Zakhor A., “An optimization approach for removing

blocking eff ects in transform coding,” IEEE Trans. Circuits Syst. Video

Technol., vol. 5, pp. 74 – 82, Apr. 1995.

[7] Yang Y., N Galatsanos. P., Katsaggelos A. K., “Regularized

reconstruction to reduce blocking artifacts of block discrete cosinetransform compressed images,” IEEE Trans. Circuits Syst. Video

Technol., vol. 3, pp. 421 – 432, Dec. 1993.

[8] Luo J., Chen C.W., Parker K. J., Huang T. S., “Artifact reduction in lowbit rate dct- based image compression,” IEEE Trans. Image Process., vol.

5, pp. 1363 – 1368, 1996.

[9] Chou J., Crouse M., Ramchandran K., “A simple algorithm for removing

blocking artifacts in block-transform coded images,” IEEE Signal

Process. Lett ., vol. 5, pp. 33 – 35, 1998.[10] Fan Z., de Queiroz R. L., “Maximum likelihood estimation of jpeg

quantization table in the identification of bitmap compression history”,

in Proc. Int. Conf. Image Process. ’00, 10-13 Sept. 2000, 1: 948 – 951.

[11] Fan Z., de Queiroz R. L., “Identification of bitmap compression history: jpeg detection and quantizer estimation”, in IEEE Trans. Image

Process., 12(2): 230 – 235, February 2003.

[12] Fu D., Shi Y.Q., Su W., “A generalized benford's law for jpegcoefficients and its applications in image forensics”, in Proc. SPIE

Secur., Steganography, and Watermarking of Multimed. Contents IX ,

vol. 6505, pp. 1L1-1L11, 2007.[13] Swaminathan A., Wu M., Ray Liu K. J., “Digital image forensics via

intrinsic fingerprints”, IEEE Trans. Inf. Forensics Secur., 3(1): 101-117,

March 2008.

[14] Ye S., Sun Q., Chang E.-C., “Detection digital image forgeries by

measuring inconsistencies in blocking artifacts”, in Proc. IEEE Int.

Conf. Multimed. and Expo., July, 2007, pp. 12-15.[15] J. Fridrich, M. Goljan, and R. Du, "Steganalysis based on JPEG

compatibility", SPIE Multimedia Systems and Applications, vol. 4518,

Denver, CO, pp. 275-280, Aug. 2001.[16] Schaefer G., Stich M., “UCID – An Uncompressed Color Image

Database”, School of Computing and Mathematics, Technical. Report,

Nottingham Trent University, U.K., 2003. [17] Petkov A., Cottier S., “Image quality estimation for jpeg-compressed

images without the original image”, EE398 Projects - Image and Video

Compression, Stanford University, March 2008.


ISSN 1947-5500




Vol. 8 No. 8, 2010

(a) Three composite bitmap images.

(b) Distortion measure for the three images in (a).

Fig. 5. Distortion measures for some composite bitmap images. The left panel represents the average distortion measure while the right panel represents the

blocking artifact measure.


ISSN 1947-5500

Retrieval of Bitmap Compression History

Documents

Transcript of Retrieval of Bitmap Compression History