Retrieval of Bitmap Compression History
Transcript of Retrieval of Bitmap Compression History
8/8/2019 Retrieval of Bitmap Compression History
http://slidepdf.com/reader/full/retrieval-of-bitmap-compression-history 1/6
(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8 No. 8, 2010
Retrieval of Bitmap Compression History
Salma Hamdy, Haytham El-Messiry, Mohamed Roushdy, Essam Kahlifa
Faculty of Computer and Information Sciences
Ain Shams UniversityCairo, Egypt
{s.hamdy, hmessiry, mroushdy, esskhalifa}@cis.asu.edu.eg
Abstract — The histogram of Discrete Cosine Transform
coefficients contains information on the compression parameters
for JPEGs and previously JPEG compressed bitmaps. In this
paper we extend the work in [1] to identify previously
compressed bitmaps and estimate the quantization table that was
used for compression, from the peaks of the histogram of DCT
coefficients. This can help in establishing bitmap compression
history which is particularly useful in applications like image
authentication, JPEG artifact removal, and JPEG recompression
with less distortion. Furthermore, the estimated table calculatesdistortion measures to classify the bitmap as genuine or forged.
The method shows good average estimation accuracy of around
92.88% against MLE and autocorrelation methods. In addition,
because bitmaps do not experience data loss, detecting
inconsistencies becomes easier. Detection performance resulted in
an average false negative rate of 3.81% and 2.26% for two
distortion measures, respectively.
Keywords: Digital image forensics; forgery detection; compression history; Quantization tables.
I. INTRODUCTION
Although JPEG images are the most widely used imageformat, sometimes images are saved in an uncompressed raster
form (bmp, tiff), and in most situations, no knowledge of
previous processing is available. Some applications are
required to receive images as bitmaps with instructions for
rendering at a particular size and without further information.
The image may have been processed and perhaps compressed
with contain severe compression artifacts. Hence, it is useful
to determine the bitmap history; whether the image has ever
been compressed using the JPEG standard and to know what
quantization tables were used. Most of the artifact removal
algorithms [2-9] require the knowledge of the quantization
table to estimate the amount of distortion caused by
quantization and avoid over-blurring. In other applications,
knowing the quantization table can help in avoiding further
distortion when recompressing the image. Some methods try
to identify bitmap compression history using Maximum
Likelihood Estimation (MLE) [10-11] or by modeling the
distribution of quantized DCT coefficients, like the use of
Benford’s law [12], or modeling acquisition devices [13].
Furthermore, due to the nature of digital media and the
advanced digital image processing techniques, digital images
may be altered and redistributed very easily forming a rising
threat in the public domain. Hence, ensuring that media
content is credible and has not been altered is becoming an
important issue governmental security and commercial
applications. As a result, research is being conducted for
developing authentication methods and tamper detection
techniques. Usually JPEG compression introduces blocking
artifacts and hence one of the standard passive approaches is
to use inconsistencies in these blocking fingerprints as a
reliable indicator of possible tampering [14]. These can also beused to determine what method of forgery was used.
In this paper we are interested in the authenticity of the
image. We extend the work in [1] to bitmaps and use the
proposed method for identifying previously compressed
bitmaps and estimating the quantization table that was used.
The estimated table is then used to determine if the mage was
forged or not by calculating distortion measures.
In section 2 we study the histogram of DCT AC
coefficients of bitmaps and show how it differs for previously
JPEG compressed bitmaps. We then validate that without
modeling rounding errors or calculating prior probabilities,
quantization steps of previously compressed bitmaps can still
be determined straightforward from the peaks of the
approximated histograms of DCT coefficients. Results are
discussed in section 3. Section 4 is for conclusions.
II. HISTOGRAM OF DCT COEFFICIENTS IN BITMAPS
We studied in [1] the histogram of quantized DCT
coefficients and showed how it can be used to estimate
quantization steps. Here, we study uncompressed images and
validate that the approximated histogram of DCT coefficients
can be used to determine compression history. Bitmap image
means no data loss and hence all what is required to build an
informative histogram is expected to be present in the
coefficients histograms.
The first step is to decide if the test image was previously
compressed because if the image was an originaluncompressed there is no compression data to extract. When
the image is decided to have a compression history, the next
step is to estimate that history. For grayscale image,
compression history mainly means its quantization table which
will be the focus of this paper. For color image, this is
extended to estimating color plane compression parameters
that includes subsampling and associated interpolation.
141 http://sites.google.com/site/ijcsis/ISSN 1947-5500
8/8/2019 Retrieval of Bitmap Compression History
http://slidepdf.com/reader/full/retrieval-of-bitmap-compression-history 2/6
(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8 No. 8, 2010
Fig. 1(b) shows the approximated histogram H *
of DCT
coefficient at position (3,3) of the luminance channel of an
uncompressed Lena image and the histogram of the image
after being JPEG compressed with quality factor 80. It is clearthat the latter contains periodic patterns that are not present in
the uncompressed version. It was observed that the coefficient
is very likely to have been quantized with a step of this
periodic [15]. Now if that JPEG was stored in a bitmap
uncompressed form, we expect the DCT coefficients to have
the same behavior because nothing is lost during this format
change. This is evident in Fig. 1(d) which shows an identical
histogram to the one in Fig. 1(c). Hence, similar to the
argument in [1], if we closely observe the histogram of H *(i,j)
outside the main lobe, we notice that the maximum peak
occurs at a value that is equal to the quantization step used to
quantize X q(i,j). This observation applies to most low
frequency AC coefficients. Fig. 2(a) and (b) show |H|, theabsolute histograms of DCT coefficients for Lena of Fig. 1(a)
at frequencies (3,3) and (3,4), respectively. As for high
frequencies, the maximum occurred at a value matching Q(i,j)
when |X *(i,j)|>B, (Fig. 2 (c) and (d)), where B is as follows:
u,v
)jπ v(.
)iπ u( c(u) c(v).
B(i,j)(i,j) X (i,j) X Γ q
*
16
12cos
16
12cos50
(1)
where X q(i,j) is the quantized coefficient, and X *(i,j) is the
approximated quantized coefficient, Γ is the round off error,
and
otherwise
for
c 1
0 21
)(
See [1, 11].
Sometimes we do not have enough information to
determine Q(i,j) for high frequencies (i,j). This happens when
the histogram outside the main lobe decays rapidly to zero
showing no periodic structure. This reflects the small or zero
value of the coefficient. At such cases, it can be useful to
estimate as many of the low frequencies and then search
through lookup tables for a matching standard table.
Estimating the quantization table of a bitmap can help
determine part of its compression history. If all (or most of) of the low frequency steps were estimated to be ones, we can
conclude that the image did not go through previous
compression. High frequencies may bias because they have
very low contribution and do not provide a good estimate.
Moreover, this method works well also for uncompressed or
lossless compressed tiff images. Fig. 3(d) shows the 96.7%
correctly estimated Q table using the above method of a tiff
image taken from UCID [16]. The X’s mark the
“undetermined” coefficients.
Now for verifying the authenticity of the image, we use the
same distortion measures we used in [1]. The average
distortion measure is calculated as a function of the
remainders of DCT coefficients with respect to the original Q
matrix:
8
1
8
1
),(),,(mod1
i j
jiQ ji D B (2)
where D(i,j) and Q(i,j) are the DCT coefficient and the
corresponding quantization table entry at position (i,j),
respectively. An image block having a large average distortion
value indicates that it is very different from what it should be
and is likely to belong to a forged image. Averaged over the
entire image, this measure can be used for making a decision
about authenticity of the image.
In addition, the JPEG 8×8 “blocking effect” is somehow
still present in the uncompressed version and hence blockingartifact measure, BAM [14], can be used to give an estimate of
the distortion of the image. It is computed from the Q table as:
8
1
8
1),(
),( ),(),()(2
i j jiQ
ji Dround jiQ ji Dn B (3)
where B(n) is the estimated blocking artifact for the nth
block.
(a) Lena image (b) Uncompressed
(c) JPEG compressed Q(3,3)=6 (d) Previously compressed bmp
Fig. 1. Histograms of X *(3,3).
(a) (b)
(c) (d)
Fig. 2. (a) | X *(3,3)| where Hmax occurs at Q(3,3)=6. (b) | X *(3,4)| where Hmax
occurs at Q(3,4) = 10 (c) | X *(5,4)| where Hmax occurs at Q(5,4)=22. (d) |X *(7,5)| where Hmax occurs at Q(7,5) = 41.
142 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
8/8/2019 Retrieval of Bitmap Compression History
http://slidepdf.com/reader/full/retrieval-of-bitmap-compression-history 3/6
(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8 No. 8, 2010
III. EXPERIMENTAL RESUTLS AND DISCUSSION
A. Estimation Accuracy
Our testing image set consisted of 550 images collected
from different sources (more than five camera models), in
addition to some from the public domain Uncompressed Color
Image Database (UCID), which provides a benchmark for
image processing analysis [16]. Each of these images was
compressed with different quality factors, [60, 70, 80, and 90].
Again, each of these was uncompressed and resaved as
bitmap. This yielded 550×4 = 2,200 untouched images. For
each quality factor group, an image’s histogram of DCT
coefficients at one certain frequency was generated and used
to determine the corresponding quantization step at that
frequency according to section 2. This was repeated for all the64 histograms of DCT coefficients. The resulting quantization
table was compared to the quality factor’s known table and the
percentage of correctly estimated coefficients was recorded.
Also, the estimated table was used in equations (2) and (3) to
determine the image’s average distortion and blocking artifact
measures, respectively. These values were recorded and used
later to set a threshold value for distinguishing forgeries from
untouched images.
Table 1 shows the accuracy of estimating all 64 entries
using the proposed method for each quality factor averaged
over the whole set. It exhibits a similar behavior to JPEG
images; as quality factor increases, estimation accuracy
increases steadily with an expected drop for quality factors
higher than 90 as the periodic structure becomes less
prominent and the bumps are no longer separate enough .
Overall, we can see that the estimation accuracy is higher than
that of JPEG images [1]. We anticipate that because lossy
compression tends to lessen available data to make a better
estimate. Average estimation time for all 64 entries of imagesof size 640×480 for different QFs was 52.7 seconds.
Estimating Q using MLE methods [10-11] is based on
searching for all possible Q(i,j) for each DCT coefficient over
the whole image which can be computationally exhaustive for
large size files. Another method [12] proposed a logarithmic
law and argued that the distribution of the first digit of DCT
coefficients follows that generalized Benford’s law. The
method is based on re-compressing the test image with several
quality factors and fitting the distribution of DCT coefficients
of each version to the proposed law. The QF of the version
having the least fitting artifact is chosen and its corresponding
Q table is the desired one. Of course the above methods can
only estimate standard compression tables. Although it may beaccurate, it is time consuming. Plus it fails when the re-
compression quantization step is an integer multiple of the
original compression step size. Another method [17] tends to
calculate the autocorrelation function of the histogram of DCT
coefficients. The displacement corresponding to the peak
closest to the peak at zero is the value of Q(i,j) given that the
peak is higher than the mean value of the autocorrelation
function. The method eventually uses a hybrid approach; the
low frequency coefficients are determined directly from the
autocorrelation function, while the higher-frequency ones are
estimated by matching the estimated part to standard JPEG
tables scaled by a factor of s, which is determined from the
known coefficients.
Table 2 shows the estimation accuracy while Table 3
shows estimation time, for the different mentioned methods
against ours. Note that accuracy was calculated for directly
estimating only the first nine AC coefficients without
matching. This is due to the methods failing to estimate high
frequency coefficients as most of them are quantized to zero.
On the other hand, the listed time is for estimating the nine
coefficients and then retrieving the whole matching table from
JPEG standard lookup tables. Maximum peak is faster than
5 4 3 2 1 1 1 1
4 1 1 1 1 10 10 10
1 1 1 1 1 10 10 10
1 1 1 1 1 10 10 10
1 1 1 1 14 12 12 12
1 1 1 1 12 13 11 11
1 1 1 1 13 11 12 11
1 1 1 1 13 12 12 12
(a) Test image (b) Estimated Q for uncompressed version (most low frequencies are ones).
3 4 4 6 10 16 20 24
5 5 6 8 10 23 24 22
6 5 6 10 16 23 28 22
6 7 9 12 20 35 32 25
7 9 15 22 27 44 41 31
10 14 22 26 32 42 45 37
20 26 31 35 41 48 47 X
29 37 38 39 45 40 X X
3 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 1 X
0 0 0 0 0 X X
(c) Estimated Q for previously compressed version with QF = 80. (d) Difference between (c) and original table for QF=80.
Fig. 3. Estimating Q table for original and previously compressed tif image.
TABLE I. PERCENTAGE OF CORRECTLY ESTIMATED COEFFICIENTS
FOR SEVERLA QFS
QF 60 70 80 90
BMP 82.07% 84.80% 87.44% 89.44%
JPEG[1] 72.03% 76.99% 82.36% 88.26%
143 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
8/8/2019 Retrieval of Bitmap Compression History
http://slidepdf.com/reader/full/retrieval-of-bitmap-compression-history 4/6
(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8 No. 8, 2010
TABLE II. ESTIMATION ACCURCAY FOR THE FIRSY 3×3 AC
COEFFICIENTS FOR SEVERAL QFS
QFMethod
50 60 70 80 90 100 Avg.Acc.
MLE 75.31 83.10 90.31 96.34 93.83 59.5 83.06
Benford 99.08 87.59 80.82 93.81 59.47 31.53 75.38
Auto. 48.94 50.37 63.71 81.43 65.37 57.50 61.22
Max.Peak 97.93 97.07 99.01 97.67 89.57 76.04 92.88
statistical modeling and nearly as fast as that autocorrelation
method. However, average accuracy of our method is farhigher. MLE is reliable with 83% accuracy but with more than
double the time. Benford’s law based method has an accuracy
of 75 % but is the worst in time because recompressing the
image and calculating distributions for each compressed
version may become time consuming for larger images.
Images used in the experiments were of size 640×480.
B. Forfery Detection
From the untouched previously compressed bitmap image
set, we selected 500 images for each quality factor, each of
which was subjected to four common forgeries; cropping,
rotation, composition, and brightness changes. Cropping
forgeries were done by deleting some columns and rows from
the original image. An image was rotated by 270o for rotationforgeries Copy-paste forgeries were done by randomly
copying a block of pixels from an arbitrary image and placing
it in the original image. Random values were added to every
pixel of the image to simulate brightness change. The resulting
fake images were then stored in their uncompressed form for a
total of (500×4) × 4 = 8,000 images. Next, the quantization
table for each of these images was estimated as above and
used to calculate the image’s average distortion, (2), and the
blocking artifact, (3), measures, respectively.
Fig. 4(a) and (b) show values of the average distortion
measure and blocking artifact measure, respectively. The
scattered dots represent 500 untouched images (averaged for
all quality factors for each image) while the cross marksrepresent 500 images from the forged dataset. As the figure
shows, values from forged images tend to cluster higher than
those from untampered images. We tested the distortion
measure for untouched images against several threshold values
and calculated the corresponding false positive rate FPR (the
number of untouched images declared as tampered), An ideal
case would be a threshold giving zero false positive. However,we had to take into account the false negatives (the number of
tampered images declared as untampered) that may occur
when testing for forgeries. Hence, we require a threshold value
keeping both FPR and the FNR low. For average distortion
measure, we selected a value that gave FPR of 10.8% and a
lower FNR as possible for the different types of forgeries for
average distortion. The horizontal line marks this threshold τ =
50. Similarly, we selected the BAM’s threshold to be τ = 40,
with a corresponding FPR of 5.6%. Table 4 shows the false
negative rate (FNR) for the different forgeries at different
quality factors for bitmaps and JPEGs. As expected, as QF
increases, a better estimate of the quantization matrix of the
original untampered image is obtained, and as a result theerror percentage decreases. Notice how the values drop than
those for JPEG file. Notice also that detection of cropping is
possible when the cropping process breaks the natural JPEG
grid, that it, the removed rows or columns do not fall in line
with the 8×8 blocking. Similarly, when the pasted part fails to
fit perfectly into the original JPEG compressed image, the
distortion metric exceeds the detection threshold, and a
possible composite is declared. Fig. 5 shows examples of
composites. The resulting distortion measures for each
composite are shown in left panel. The dark parts denote low
distortion whereas brighter parts indicate high distortion
values. Notice the highest values correspond to the alien part
and hence mark the forged area.
TABLE III. ESTIMATION TIME IN SECONDS FOR THE FIRSY 3×3 AC
COEFFICIENTS FOR SEVERAL QFS
QFMethod
50 60 70 80 90 100
MLE 38.73 37.33 37.44 37.36 37.32 34.14
Benford 59.95 58.67 58.70 58.72 58.38 80.04
Auto 9.23 11.11 11.10 11.12 11.24 8.96
Max.Peak 11.27 11.29 11.30 11.30 11.30 11.56
TABLE IV. FORGERY DETECTION ERROR RATES FOR BITMAPS AND JPEGS
Distortion Measure Original Cropping Rotation Compositing Brightness
Average JPEG 12.6% 9.2% 7.55% 8.6% 6.45%
BMP 10.8% 3.9% 4.45% 2.0% 4.9%
BAM JPEG 6.8% 3.3% 5.95% 3.15% 5.0%
BMP 5.6% 1.05% 3.05% 1.25% 3.7%
(a) Average distortion measure (b) Blocking artifact measure
Fig. 4. Distortion measures for untouched and tampered images.
144 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
8/8/2019 Retrieval of Bitmap Compression History
http://slidepdf.com/reader/full/retrieval-of-bitmap-compression-history 5/6
(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8 No. 8, 2010
IV. CONCLUSIONS
The method discussed in this paper is based on using theapproximated histogram of DCT coefficients of bitmaps forextracting the image’s compression history; its quantizationtable. Also the extracted table is used to expose imageforgeries. The method proved to have practically highestimation accuracy when tested on a large set of image fromdifferent sources compared to other statistical approaches.
Moreover, estimation times proved to be faster than statisticalmethods while maintaining very good accuracy for lowerfrequencies. Experimental results also showed thatperformance for bitmaps surpasses that of JPEGs because of their lossy nature but on the other hand, it takes more time toprocess a bitmap.
REFERENCES
[1] Hamdy S., El-Messiry H., Roushdy M. I., Kahlifa M. E., “Forgery
detection in JPEG compressed images”, JAR -Unpublished, 2010.
[2] Rosenholtz R., Zakhor A., “Iterative procedures for reduction of
blocking effects in transform image coding,” IEEE Trans. Circuits Syst.
Video Technol., vol. 2, pp. 91 – 94, Mar. 1992.[3] Fan Z., Eschbach R., “JEPG decompression with reduced artifacts,”
Proc. IS&T/SPIE Symp. Electronic Imaging: Image and VideoCompression, San Jose, CA, Feb. 1994.
[4] Fan Z., and F. Li, “Reducing artifacts in JPEG decompression by
segmentation and smoothing,” Proc. IEEE Int. Conf. Image Processing,
vol. II, 1996, pp. 17 – 20.
[5] Tan K. T., Ghanbari M., “Blockiness detection for MPEG -2-codedvideo,” IEEE Signal Process. Lett ., vol. 7, pp. 213 – 215, Aug. 2000.
[6] Minami S., Zakhor A., “An optimization approach for removing
blocking eff ects in transform coding,” IEEE Trans. Circuits Syst. Video
Technol., vol. 5, pp. 74 – 82, Apr. 1995.
[7] Yang Y., N Galatsanos. P., Katsaggelos A. K., “Regularized
reconstruction to reduce blocking artifacts of block discrete cosinetransform compressed images,” IEEE Trans. Circuits Syst. Video
Technol., vol. 3, pp. 421 – 432, Dec. 1993.
[8] Luo J., Chen C.W., Parker K. J., Huang T. S., “Artifact reduction in lowbit rate dct- based image compression,” IEEE Trans. Image Process., vol.
5, pp. 1363 – 1368, 1996.
[9] Chou J., Crouse M., Ramchandran K., “A simple algorithm for removing
blocking artifacts in block-transform coded images,” IEEE Signal
Process. Lett ., vol. 5, pp. 33 – 35, 1998.[10] Fan Z., de Queiroz R. L., “Maximum likelihood estimation of jpeg
quantization table in the identification of bitmap compression history”,
in Proc. Int. Conf. Image Process. ’00, 10-13 Sept. 2000, 1: 948 – 951.
[11] Fan Z., de Queiroz R. L., “Identification of bitmap compression history: jpeg detection and quantizer estimation”, in IEEE Trans. Image
Process., 12(2): 230 – 235, February 2003.
[12] Fu D., Shi Y.Q., Su W., “A generalized benford's law for jpegcoefficients and its applications in image forensics”, in Proc. SPIE
Secur., Steganography, and Watermarking of Multimed. Contents IX ,
vol. 6505, pp. 1L1-1L11, 2007.[13] Swaminathan A., Wu M., Ray Liu K. J., “Digital image forensics via
intrinsic fingerprints”, IEEE Trans. Inf. Forensics Secur., 3(1): 101-117,
March 2008.
[14] Ye S., Sun Q., Chang E.-C., “Detection digital image forgeries by
measuring inconsistencies in blocking artifacts”, in Proc. IEEE Int.
Conf. Multimed. and Expo., July, 2007, pp. 12-15.[15] J. Fridrich, M. Goljan, and R. Du, "Steganalysis based on JPEG
compatibility", SPIE Multimedia Systems and Applications, vol. 4518,
Denver, CO, pp. 275-280, Aug. 2001.[16] Schaefer G., Stich M., “UCID – An Uncompressed Color Image
Database”, School of Computing and Mathematics, Technical. Report,
Nottingham Trent University, U.K., 2003. [17] Petkov A., Cottier S., “Image quality estimation for jpeg-compressed
images without the original image”, EE398 Projects - Image and Video
Compression, Stanford University, March 2008.
145 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
8/8/2019 Retrieval of Bitmap Compression History
http://slidepdf.com/reader/full/retrieval-of-bitmap-compression-history 6/6
(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8 No. 8, 2010
(a) Three composite bitmap images.
(b) Distortion measure for the three images in (a).
Fig. 5. Distortion measures for some composite bitmap images. The left panel represents the average distortion measure while the right panel represents the
blocking artifact measure.
146 http://sites.google.com/site/ijcsis/
ISSN 1947-5500