SUBMITTED TO IEEE TRANS. IMAGE PROCESSING DEC. 2010 1
Scanned Document Compression Using a Block-based Hybrid Video Codec

Alexandre Zaghetto, Member, IEEE, and Ricardo L. de Queiroz, Senior Member, IEEE
Abstract—This paper proposes a hybrid pattern-matching/transform-based compression method for scanned documents. The idea is to use regular video interframe prediction as a pattern matching algorithm that can be applied to document coding. We show that this interpretation may generate residual data that can be efficiently compressed by a transform-based encoder. The efficiency of this approach is demonstrated using H.264/AVC as a high-quality single- and multi-page document compressor. The proposed method, called Advanced Document Coding (ADC), uses segments of the originally independent scanned pages of a document to create a video sequence, which is then encoded through regular H.264/AVC. The encoding performance is unrivaled. Results show that ADC outperforms AVC-I (H.264/AVC operating in pure intra mode) and JPEG2000 by up to 2.7 dB and 6.2 dB, respectively. Superior subjective quality is also achieved.
Index Terms—Scanned document compression, advanced document coding, pattern matching, H.264/AVC.
I. INTRODUCTION
COMPRESSION of scanned documents can be tricky. The
scanned document is either compressed as a continuous-
tone picture, or it is binarized before compression. The binary
document can then be compressed using any available two-
level lossless compression algorithm (such as JBIG [1] and
JBIG2 [2]), or it may undergo character recognition [3].
Binarization may cause strong degradation to object contours
and textures, such that, whenever possible, continuous-tone
compression is preferred [4]. In single/multi-page document
compression, each page may be individually encoded by
some continuous-tone image compression algorithm, such as
JPEG [5] or JPEG2000 [6], [7]. Multi-layer approaches such
as the mixed raster content (MRC) imaging model [8]–[12]
are also challenged by soft edges in scanned documents, often
requiring pre- and post-processing [13].
Natural text along a document typically presents repetitive
symbols such that dictionary-based compression methods be-
come very efficient. For continuous-tone imagery, the recur-
rence of similar patterns is illustrated in Fig. 1. Nevertheless,
an efficient dictionary-based encoder relying on continuous-
tone pattern matching is not that trivial. We propose an encoder
that explores such a recurrence through the use of pattern-
matching predictors and efficient transform encoding of the
residual data.
Copyright (c) 2013 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected]. The final version of record is available at http://dx.doi.org/10.1109/TIP.2013.2251641
The authors are with the Department of Computer Science, Universidade de Brasilia, Brazil, e-mail: [email protected], [email protected].
Fig. 1. Digitized books usually present recurrent patterns across different pages and across regions of the same page.
It is important to place our proposal within the proper
scenario. Three premises are assumed. Firstly, we want to
avoid complex multi-coder schemes such as MRC. Secondly,
the decoder should be as standard as possible. Since we are
dealing with scanned compound documents (mixed pictures
and text), natural image encoders, such as JPEG2000, are the
most adequate. Non-standard encoders, based on fractals [14]–[16], texture prediction [17], [18], template matching [19] or multiscale pattern recurrence [20], [21], are good options but fall outside the scope of what is being proposed. Thirdly, one
should provide high quality reconstructed versions of scanned
documents. This is especially important if rare books of
historical value must be digitally stored, thus discarding optical
character recognition (OCR) and token-based methods [2],
[10]. In summary, we want a standard single coder approach
that operates on natural images and delivers high-quality
reconstructed compound documents.
The proposed coder makes heavy use of the H.264/AVC
standard video coder [22]. H.264/AVC has been well explained
in the literature [23]–[27]. H.264/AVC leads to substantial
performance improvement when compared to other existing
standards [25], [28], such as MPEG-2 [29] and H.263 [30].
Among such improvements we can mention [22], [31]: in-
terframe variable block size prediction; arbitrary reference
frames; quarter-pel motion estimation; intraframe macroblock
prediction; context-adaptive binary arithmetic coding; and in-
loop deblocking filter. Results point to at least a factor of
two improvement over previous standards. The many cod-
ing advances brought into H.264/AVC not only set a new
benchmark for video compression, but they also make it a
formidable compressor for still images [32]. The intraframe
macroblock prediction, combined with the context-adaptive
binary arithmetic coding (CABAC) [33], turns H.264/AVC into a powerful still image compression method (i.e., operating on a video sequence composed of a single frame). We refer
to this coder as AVC-I. Gains of the AVC-I over JPEG2000
are typically in the order of 0.25 dB to 0.5 dB in PSNR
for pictorial images [31], [32], [34]. For compound images
(mixture of text and picture) [4], the PSNR gains are more
substantial, even surpassing the mark of 3 dB improvement
over JPEG2000, in some cases [31].
The hypothesis presented in this paper is that a scanned document encoder that employs state-of-the-art video coding techniques and generates an H.264/AVC-decodable bit-stream yields the best rate-distortion performance compared to other continuous-tone still image compressors.¹
II. THE PROPOSED METHOD AND ITS IMPLEMENTATION
USING AVC
The proposed document coder has a generic concept and an
implementation based on a stock H.264/AVC video coder. We
now describe the desired features and how one can implement
them using AVC. The generic description may help the reader
to adapt other video coders for that purpose or to develop
non-standard-based (proprietary) variations.
A. Block-based pattern matching
The encoder is based on pattern matching. The document
image is segmented into blocks of pixels. Each block is
matched to an existing pattern in a dictionary which is popu-
lated by the previous contents of the same document. In order
to do that, we partition the scanned document, which may be made of one or more scanned pages of H×W pixels, into Np sub-pages or frames of (H/√Np) × (W/√Np) pixels each. Hence, a scanned book may be decomposed into many frames. Figure 2 illustrates the page pre-processing (partition) algorithm, while Fig. 3 shows an example of a frame sequence built from a 3-page set.
Blocks have, for example, 16×16 pixels, and each one is matched to an existing pattern in a previous frame. In this way, the previous frames make up a dynamic dictionary of patterns to look up when encoding the present frame, which is continuously
being updated as more frames are encoded. Once a match is
found, the matching pattern is used to predict the block and
the prediction error (residue) is encoded along with the frame
number and position where the match was found (reference
vector). A block can be partitioned into smaller blocks to ease
prediction at the cost of spending more bits to encode reference
vectors. Figure 4 illustrates the effect of using the pattern
matching prediction algorithm. Figures 4 (a) and (b) show
examples of a reference and a current text area, respectively.
Figures 4 (c), (e) and (g) represent the predictions of the
current text using 16×16, 8×8 and 4×4-pixel block partitions.
Figures 4 (d), (f) and (h) are the corresponding residual data.
¹Preliminary results of the proposed method over multi-page text-only documents have been presented at a conference [35].
Fig. 2. A document page is partitioned into segments (labeled 0, 1, 2, ... in raster order). Each one is considered a frame and can be sequentially encoded.

Fig. 3. Example of a frame sequence, built from a 3-page set, Np = 4 frames/page. Frames 1 to 4, 5 to 8, and 9 to 12 are built from pages 1, 2 and 3, respectively.
Notice that the 4×4-pixel prediction generates a lower-energy residual when compared to the 16×16 and 8×8 predictions; however, it requires encoding more reference vectors.
In this context, video coders often use motion estimation
techniques which are essentially the same as pattern matching.
H.264/AVC is capable of partitioning macroblocks of 16×16 pixels into any valid combination of blocks of 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4 pixels. Our algorithm is then to feed the document frames as video frames into AVC, since its motion estimation algorithm will take care of the pattern matching search for us. However, motion estimation algorithms typically take advantage of the fact that video content at the same position in neighboring frames is highly correlated. Since this is not our case, it is advisable to make the search window cover as much of the reference frames as possible, or the whole frame, in order to enrich the dictionary and to remove the spatial dependency.
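A minimal sketch of the exhaustive (full-search) pattern matching described above, using the sum of absolute differences (SAD) as the matching criterion. This is an illustrative toy, not the actual encoder's search:

```python
import numpy as np

def full_search(block, ref, top, left, search_range):
    """Exhaustive SAD search for `block` in reference frame `ref`.

    Searches displacements within +/- search_range around (top, left)
    and returns the best reference vector (dy, dx) and its SAD.
    """
    bh, bw = block.shape
    best = (None, None, np.inf)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bh > ref.shape[0] or x + bw > ref.shape[1]:
                continue                      # candidate falls outside the frame
            cand = ref[y:y + bh, x:x + bw]
            sad = np.abs(block.astype(int) - cand.astype(int)).sum()
            if sad < best[2]:
                best = (dy, dx, sad)
    return best

ref = np.zeros((64, 64), dtype=np.uint8)
ref[20:36, 24:40] = 200                       # a "glyph" in the reference frame
block = ref[20:36, 24:40].copy()              # the same glyph in the current frame
dy, dx, sad = full_search(block, ref, 16, 16, 16)   # exact match: (4, 8, 0)
```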
Fig. 4. Illustration of approximate pattern matching using interframe prediction: (a) reference text; (b) current text; (c), (e) and (g) predicted text (block size: 16×16, 8×8 and 4×4 pixels, respectively); and (d), (f) and (h) prediction residue (block size: 16×16, 8×8 and 4×4 pixels, respectively). Each zoomed image patch has 178×178 pixels.
B. Inter- and intra-frame prediction
AVC also allows for intra-frame prediction, in which a block
(partitioned or not) can be predicted from neighboring blocks
by means of directional extrapolation of the border pixels.
The decision whether or not to use intra-frame prediction is typically based on rate-distortion optimization (RDO), and we use RDO in all our simulations. However, AVC does not allow for
in-frame motion vectors (IFMV), but many variations using
such a feature and other sophisticated methods of intra-frame
prediction do exist [19]. Apparently, HEVC will also support
IFMV [36]. Breaking up the pages into frames allows for some
intra-document prediction similar to IFMV, yet using a stock
video coder. Furthermore, the information derived from IFMV is typically very small compared to all compressed data, such that the advantage should not be very relevant. Because of that, we do not use IFMV.
Another issue is random access to different book pages. In order to get to a book page, we are forced to decompress all the frames it uses as reference. So, if random access is an issue, we suggest periodically using no-reference frames, i.e., frames in which inter-frame prediction is not allowed, relying on pure intra-frame prediction/extrapolation.
In our encoder, using the AVC structure, each block-
partitioning combination and prediction mode is tested and the
best one is picked through RDO. With RDO within AVC, in
the k-th configuration test in a macroblock, AVC computes the
rate Rk (bits spent to encode the block) and distortion Dk (sum
of absolute differences - SAD) achieved by reconstructing the
block. One picks the block partition method that minimizes
Jk = Rk + λDk.
The process is then repeated for every macroblock. As usual,
λ controls compression ratios and is varied to find the RD
curves in our simulations.
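The mode selection above can be sketched as follows. The candidate rates and distortions are hypothetical per-macroblock measurements for illustration only, not values from the paper:

```python
def rdo_select(candidates, lam):
    """Pick the coding configuration minimizing J_k = R_k + lambda * D_k.

    `candidates` maps a mode name to its (rate_bits, sad) pair, as would be
    produced by trial-encoding each block partition / prediction mode.
    """
    return min(candidates, key=lambda k: candidates[k][0] + lam * candidates[k][1])

# Hypothetical measurements: finer partitions lower the SAD but spend
# more bits on reference vectors.
candidates = {
    "16x16": (120, 900),
    "8x8":   (210, 400),
    "4x4":   (520, 150),
}
best = rdo_select(candidates, lam=0.5)   # -> "8x8" (J = 210 + 0.5*400 = 410)
```

Varying `lam` traces out the rate-distortion trade-off, which is how the RD curves in the experiments are obtained.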
C. Residual coding
The residual macroblock, i.e., the prediction error, is transformed using four 8×8-pixel discrete cosine transforms (DCT), or an integer approximation thereof. The transformed blocks are
quantized and encoded using arithmetic coding. H.264/AVC
uses an integer transform with similar properties as the DCT
and the resulting transformed coefficients are quantized and
entropy encoded using CABAC.
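A floating-point sketch of the transform-and-quantize step on one 8×8 residual block, using an orthonormal DCT with a uniform quantizer. The actual H.264/AVC integer transform and quantizer differ in detail; this only illustrates the principle:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n (rows = frequencies)."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] *= np.sqrt(1.0 / n)
    m[1:] *= np.sqrt(2.0 / n)
    return m

def transform_quantize(residual, qstep):
    """2-D 8x8 DCT of a residual block followed by uniform quantization."""
    d = dct_matrix(8)
    coeff = d @ residual @ d.T           # separable 2-D DCT
    return np.round(coeff / qstep).astype(int)

# A flat residual concentrates all energy in the DC coefficient.
residual = np.ones((8, 8)) * 4
levels = transform_quantize(residual, qstep=2)   # DC level 16, rest 0
```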
D. Compound documents and region classification
Compound document compression usually segments the
image into regions and classifies each one as containing text
and graphics or images (or halftones, for instance). Once
a region is classified, it can be encoded using a proper
algorithm. This approach is driven by objects such as text characters, so that regions of the image are labeled based on an estimate of their contents. Our method, however, is driven
by the compression itself. Rather than only testing pattern-
matching-based prediction for every block partition, we also
test prediction by extrapolating neighboring blocks, as in "intra-prediction" in H.264/AVC. The RD-optimized selection of the best prediction assures that the best option is picked.
Text and graphics regions tend to contain recurrent patterns and will often be encoded using patterns from previous regions, while pictorial regions may resort to intra-prediction. In this sense,
segmentation is embedded in the encoding process. In fact, the block prediction and RDO may have the same effect as a segmentation map, even though they benefit the compression process rather than the true identification of image contents.

Fig. 5. Configuration parameters that have greater influence on the encoder performance: Rf (number of reference frames) and Sr (search range).
E. Encoder and decoder summary
In our concept, the frames are fed into AVC in sequence
just like in a regular video coder. Because of that relation to
AVC, we refer to the proposed method as advanced document
coding, or, simply, ADC. In a nutshell, ADC operation can
be summarized as (i) break the book pages into frames; (ii)
feed all frames to H.264/AVC resulting in an AVC-compatible
stream; (iii) decode the bit stream; and (iv) assemble the
decoded frames into the final document book pages.
In order to work well, H.264/AVC should operate in the "High" profile, following an IPPP... framework. The encoder should periodically insert no-reference I-frames in case random access is desired. RDO should be turned on. Motion estimation should be set to full search over a window that is as large as possible. Note that other video coders, such as HEVC and MPEG-2, will also work, even though achieving different performance levels due to their different sophistication levels.
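As an illustration, assuming an ffmpeg build with libx264, the recommended settings might be assembled as below. The flag names come from the ffmpeg/x264 documentation, not from the paper, and the file names are hypothetical:

```python
# Illustrative ffmpeg/libx264 invocation mirroring the recommended ADC
# settings: High profile, IPPP... structure with a periodic I-frame for
# random access, 5 reference frames, exhaustive search, large search range.
cmd = [
    "ffmpeg",
    "-framerate", "1", "-i", "frame_%03d.png",   # one image per document frame
    "-c:v", "libx264",
    "-profile:v", "high",
    "-g", "16",                                  # I-frame every 16 frames
    "-x264opts", "ref=5:me=esa:merange=32:bframes=0",
    "adc_stream.264",
]
command_line = " ".join(cmd)
```

Here `me=esa` requests exhaustive (full) search, `merange` sets the search range and `ref` the number of reference frames; `bframes=0` keeps the IPPP... structure.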
III. EXPERIMENTAL RESULTS
In our tests, different page sets are compressed using
JPEG2000, AVC-I (H.264/AVC operating in pure intra mode)
and the proposed ADC. The reason we chose JPEG2000 and
AVC-I for comparison is that these are the most suitable
standards that would meet the three premises presented in
Section I.
Distortion metrics based on visual models such as Structural
Similarity (SSIM) [37] and Video Quality Metric (VQM) [38]
have been extensively tested for pictorial content. However,
they are unproven for text and graphics, which rely more on
resolution than on number of gray levels. Readability is very
important and some alternative metrics such as OCR efficiency
are considered. A good objective metric to reflect subjective
perception of text has not been well explored yet. Hence, we
opted to stick to the traditional PSNR as a distortion metric.
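For reference, the PSNR computed from the global mean square error, as used throughout the experiments, is simply:

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB from the global mean square error over all pixels."""
    mse = np.mean((original.astype(float) - reconstructed.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy example: a flat 16x16 patch with a single-pixel error of 10 levels.
orig = np.full((16, 16), 128, dtype=np.uint8)
rec = orig.copy()
rec[0, 0] = 138
value = psnr(orig, rec)   # about 52.2 dB
```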
In JPEG2000 and AVC-I compression, the pages are sepa-
rately encoded. As for ADC, the first frame of the sequence
is encoded as an I-frame (only intraframe prediction modes
are used) and all the remaining frames are encoded as P-
frames (in addition to intraframe prediction, only past frames are used as reference by the interframe prediction).

Fig. 6. Comparison between JPEG2000, AVC-I and the proposed ADC, for different combinations of search range (Sr) and number of reference frames (Rf), for test sequence "guita" (Np = 4 frames/page): (a) Sr = 32 and Rf = 1, 3, 5; (b) Sr = 8, 16, 32 and Rf = 5.

We also
considered that each page may be segmented into Np = 4
frames, Np = 16 frames, or not segmented at all (Np = 1,
for multi-page documents only). Two configuration parameters
have greater influence on the encoder performance. One is
the number of reference frames, Rf , the other is the search
range, Sr, as illustrated in Fig. 5. Initially, we evaluated the
effect of choosing different values for Sr and Rf . Figure 6
shows PSNR plots comparing JPEG2000, AVC-I and ADC
(Np = 4 frames/page), for different combinations of Sr and
Rf . The PSNR was calculated using the global mean square
error (MSE). The higher the Sr and Rf values, the better the
rate-distortion performance. In particular, for Sr = 32 pixels
and Rf = 5 frames, ADC outperforms AVC-I by more than 2
dB and JPEG2000 by more than 5 dB, at 0.5 bit/pixel (bpp).
Our test set is composed of 18 documents divided into the
TABLE I
AVERAGE OBJECTIVE (PSNR IN DB) IMPROVEMENT OVER EXISTING STANDARDS FOR THE 4 DOCUMENT TEST SETS.

Document Set    0     1     2     3
JPEG2000      6.26  5.61  4.47  2.41
AVC-I         2.75  2.58  1.56  0.89

following four classes²:
• Class 0: 6 multi-page text-only documents;
• Class 1: 6 single-page text-only documents;
• Class 2: 3 multi-page compound documents; and
• Class 3: 3 single-page compound documents.
Figure 7 illustrates one example of each class. Examples of PSNR plots are shown in Fig. 8. Figure 9 shows the average PSNR improvement of ADC over JPEG2000 and AVC-I for each of the document class test sets. In all cases, JPEG2000 and AVC-I are objectively outperformed, considering Sr = 32, Rf = 5 and Np = 16. Figure 10 (a) shows a zoomed part of the original "cerrado" sequence. Its encoded and reconstructed versions using AVC-I, JPEG2000 and ADC, at approximately 0.25 bits/pixel, are shown in Figs. 10 (b), (c) and (d), respectively. ADC also yields superior subjective quality. As a reference, in Table I we present the gains of the proposed method over AVC-I and JPEG2000 using Bjontegaard's method [39] applied to the curves in Fig. 9. As one can see from the table and from the graphs, the gains are very substantial.
Our software-based tests, using the popular and efficient x264 implementation of the H.264/AVC standard and the Kakadu implementation of JPEG2000, indicate that ADC (AVC running with 5 reference frames, slowest search, 32×32 window, in IPPPP... mode) is nearly 10× slower than JPEG2000. x264 on an Intel Core i7 platform has been shown to encode RGB video at a rate of 3 to 30 million pixels per second (Mps). The variation is due to the various x264 settings that affect speed and quality. Of course, in order to do that, the system would have to be dedicated to the task, as an appliance. A scanned letter-sized (8.5×11 in) page at 600 pixels per inch (ppi) yields about 33 million pixels. Hence, we can expect a page compression speed roughly in the order of 5 to 50 pages per minute (ppm). This page rate may be acceptable for many on-the-fly applications and is definitely reasonable for off-line compression of books and the like. A rigorous complexity study of the encoding algorithms presented here is beyond the scope of this paper.
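The page-rate estimate above can be reproduced with a quick back-of-the-envelope calculation:

```python
# Throughput check: x264 at 3-30 Mpixels/s versus a letter page
# (8.5 x 11 in) scanned at 600 ppi.
pixels_per_page = 8.5 * 11 * 600 ** 2        # ~33.7 million pixels per page
ppm_low = 3e6 * 60 / pixels_per_page         # pages/min at 3 Mpixels/s
ppm_high = 30e6 * 60 / pixels_per_page       # pages/min at 30 Mpixels/s
```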
IV. CONCLUSIONS
In this paper, we presented a pattern-matching/transform-based encoder for scanned documents named ADC. The reason we decided to use H.264/AVC tools to implement the proposed method is that its interframe prediction scheme, allied with RDO, yields an efficient pattern matching algorithm.
²The entire test set is available at http://image.unb.br/queiroz/testset
In addition, the intraframe prediction, the DCT-based trans-
form and the CABAC contribute to improve the encoding
efficiency.
In essence, our work can be summarized as splitting the document pages into segments, forming frames, and feeding the frames to AVC. Despite the simplicity of the idea, the performance for scanned documents is unrivaled, to our knowledge. Results show that ADC objectively outperforms AVC-I and JPEG2000 by up to 2.7 dB and 6.2 dB, respectively, with more significant gains observed for multi-page text-only documents. Furthermore, the encoder outputs documents with superior subjective quality. Replacing H.264/AVC by HEVC in ADC would yield even larger gains.
Fig. 7. Examples of documents used in our experiments: (a) class 0: multi-page text-only documents; (b) class 1: single-page text-only documents; (c) class 2: multi-page compound documents; and (d) class 3: single-page compound documents.

REFERENCES

[1] JBIG, "Information Technology - Coded Representation of Picture and Audio Information - Progressive Bi-level Image Compression. ITU-T Recommendation T.82," Mar. 1993.
[2] JBIG2, "Information Technology - Coded Representation of Picture and Audio Information - Lossy/Lossless Coding of Bi-level Images. ITU-T Recommendation T.88," Mar. 2000.
[3] S. Mori, C. Y. Suen, and K. Yamamoto, "Historical review of OCR research and development," Proceedings of the IEEE, vol. 80, no. 7, pp. 1029–1058, Jul. 1992.
[4] R. L. de Queiroz, "Compressing Compound Documents," in The Document and Image Compression Handbook, M. Barni, Ed., Marcel-Dekker, USA, 2005.
[5] W. B. Pennebaker and J. L. Mitchell, JPEG Still Image Data Compression Standard, Chapman and Hall, 1993.
[6] JPEG, "Information Technology - JPEG2000 Image Coding System - Part 1: Core Coding System. ISO/IEC 15444-1," 2000.
[7] D. S. Taubman and M. W. Marcellin, JPEG2000: Image Compression Fundamentals, Standards and Practice, Kluwer Academic, USA, 2002.
[8] MRC, "Mixed Raster Content (MRC). ITU-T Recommendation T.44," 1999.
[9] R. L. de Queiroz, R. Buckley, and M. Xu, "Mixed Raster Content (MRC) model for compound image compression," Proc. SPIE Visual Communications and Image Processing, vol. 3653, pp. 1106–1117, Jan. 1999.
[10] P. Haffner, P. G. Howard, P. Simard, Y. Bengio, and Y. LeCun, "High quality document image compression with DjVu," Journal of Electronic Imaging, vol. 7, pp. 410–425, 1998.
[11] G. Feng and C. A. Bouman, "High-quality MRC document coding," IEEE Trans. on Image Processing, vol. 15, no. 10, pp. 3152–3169, Oct. 2006.
[12] A. Zaghetto, R. L. de Queiroz, and D. Mukherjee, "MRC compression of compound documents using threshold segmentation, iterative data-filling and H.264/AVC-INTRA," Proc. Indian Conference on Computer Vision, Graphics and Image Processing, Dec. 2008.
[13] A. Zaghetto and R. L. de Queiroz, "Improved layer processing for MRC compression of scanned documents," Proc. IEEE Intl. Conference on Image Processing, pp. 1993–1996, Nov. 2009.
[14] E. Walach and E. Karnin, "A fractal-based approach to image compression," Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing, vol. 11, pp. 529–532, Apr. 1986.
[15] S. M. Kocsis, "Fractal-based image compression," Proc. 23rd Asilomar Conf. on Signals, Systems and Computers, vol. 1, pp. 177–181, 1989.
[16] A. Wakatani, "Improvement of adaptive fractal image coding on GPUs," Proc. IEEE Intl. Conf. on Consumer Electronics, pp. 255–256, Jan. 2012.
[17] D. C. Garcia and R. L. de Queiroz, "Least-squares directional intra prediction in H.264/AVC," IEEE Signal Processing Letters, vol. 17, no. 10, pp. 831–834, Oct. 2010.
[18] L. Liu, Y. Liu, and E. Delp, "Enhanced intra prediction using context-adaptive linear prediction," Proc. PCS 2007 - Picture Coding Symp., Nov. 2007.
[19] C. Lan, J. Xu, F. Wu, and G. Shi, "Intra frame coding with template matching prediction and adaptive transform," Proc. IEEE Intl. Conf. Image Processing, pp. 1221–1224, Sep. 2010.
[20] E. B. Lima Filho, E. A. B. da Silva, M. B. de Carvalho, and F. S. Pinage, "Universal image compression using multiscale recurrent patterns with adaptive probability model," IEEE Trans. on Image Processing, vol. 17, no. 4, pp. 512–527, Apr. 2008.
[21] N. Francisco, N. Rodrigues, E. da Silva, M. de Carvalho, S. de Faria, and V. da Silva, "Scanned compound document encoding using multiscale recurrent patterns," IEEE Trans. on Image Processing, vol. 19, no. 10, pp. 2712–2724, Apr. 2010.
[22] JVT, "Advanced Video Coding for Generic Audiovisual Services. ITU-T Recommendation H.264," Nov. 2007.
[23] I. E. G. Richardson, H.264 and MPEG-4 Video Compression, Wiley, USA, 2003.
[24] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560–576, July 2003.
[25] T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, and G. J. Sullivan, "Rate-constrained coder control and comparison of video coding standards," IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 688–703, July 2003.
[26] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stockhammer, and T. Wedi, "Video coding with H.264/AVC: Tools, performance, and complexity," IEEE Circuits and Systems Magazine, vol. 4, no. 1, pp. 7–28, Mar. 2004.
[27] G. J. Sullivan, P. Topiwala, and A. Luthra, "The H.264/AVC Advanced Video Coding standard: Overview and introduction to the fidelity range extensions," Proc. SPIE Conference on Applications of Digital Image Processing XXVII, Special Session on Advances in the New Emerging Standard: H.264/AVC, vol. 5558, pp. 53–74, Aug. 2004.
[28] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," Proc. Intl. Conf. on Multimedia and Expo, vol. 1, pp. 345–348, July 2003.
[29] B. G. Haskell, A. Puri, and A. N. Netravali, Digital Video: An Introduction to MPEG-2, Chapman and Hall, USA, 1997.
[30] ITU-T, "Video Coding for Low Bit Rate Communication. ITU-T Recommendation H.263," Version 1: Nov. 1995; Version 2: Jan. 1998; Version 3: Nov. 2000.
[31] R. L. de Queiroz, R. S. Ortis, A. Zaghetto, and T. A. Fonseca, "Fringe benefits of the H.264/AVC," Proc. International Telecommunications Symposium, pp. 208–212, Sep. 2006.
[32] D. Marpe, V. George, and T. Wiegand, "Performance comparison of intra-only H.264/AVC and JPEG2000 for a set of monochrome ISO/IEC test images," Contribution to JVT ISO/IEC MPEG and ITU-T VCEG, Doc. JVT M-014, Oct. 2004.
[33] D. Marpe, H. Schwarz, and T. Wiegand, "Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard," IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 620–636, July 2003.
[34] A. Al, B. P. Rao, S. S. Kudva, S. Babu, D. Sumam, and A. V. Rao, "Quality and complexity comparison of H.264 intra mode with JPEG2000 and JPEG," Proc. IEEE International Conference on Image Processing, vol. 1, pp. 24–27, Oct. 2004.
[35] A. Zaghetto and R. L. de Queiroz, "High-quality scanned book compression using pattern matching," Proc. IEEE International Conference on Image Processing, pp. 26–29, Sep. 2010.
[36] K. Ugur et al., "High performance, low complexity video coding and the emerging HEVC standard," IEEE Trans. on Circuits and Systems for Video Technology, vol. 20, no. 12, pp. 1688–1697, Dec. 2010.
[37] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, Apr. 2004.
[38] M. H. Pinson and S. Wolf, "A new standardized method for objectively measuring video quality," IEEE Transactions on Broadcasting, vol. 50, no. 3, pp. 312–322, Sept. 2004.
[39] G. Bjontegaard, "Calculation of average PSNR differences between RD-curves," presented at the 13th VCEG Meeting, Doc. VCEG-M33, Austin, TX, Apr. 2001.
Alexandre Zaghetto received the Engineer degree in 2002 from the Federal University of Rio de Janeiro, Rio de Janeiro, Brazil, the M.Sc. degree in 2004 from the University of Brasilia, Brasilia, Brazil, and the Ph.D. degree in 2009, also from the University of Brasilia, all in Electrical Engineering. In 2009, he became Associate Professor at the Computer Science Department at the University of Brasilia. His main research interests are in image and video processing, compound document coding, biometrics, fuzzy logic and artificial neural networks.
Fig. 8. Examples of PSNR plots for: (a) class 0 (multi-page, text-only), document "guita" (number of pages: 2, size: 1568×1024 pixels); (b) class 1 (single-page, text-only), page 2 of document "guita" (1568×1024 pixels); (c) class 2 (multi-page, compound), document "paper" (number of pages: 4, size: 2304×1632 pixels); and (d) class 3 (single-page, compound), document "carta" (2152×1632 pixels). Search range and number of reference frames are Sr = 32 and Rf = 5, respectively. Each panel compares ADC (16, 4 and, where applicable, 1 frame/page) with AVC-I and JPEG2000.
Ricardo L. de Queiroz received the Engineer degree from Universidade de Brasilia, Brazil, in 1987, the M.Sc. degree from Universidade Estadual de Campinas, Brazil, in 1990, and the Ph.D. degree from The University of Texas at Arlington, in 1994, all in Electrical Engineering.

In 1990-1991, he was with the DSP research group at Universidade de Brasilia as a research associate. He joined Xerox Corp. in 1994, where he was a member of the research staff until 2002. In 2000-2001 he was also an Adjunct Faculty at the Rochester Institute of Technology. He joined the Electrical Engineering Department at Universidade de Brasilia in 2003. In 2010, he became a Full Professor at the Computer Science Department at Universidade de Brasilia. Dr. de Queiroz has published over 150 articles in journals and conferences and contributed chapters to books as well. He also holds 46 issued patents. He is an elected member of the IEEE Signal Processing Society's Multimedia Signal Processing (MMSP) Technical Committee and a former member of the Image, Video and Multidimensional Signal Processing (IVMSP) Technical Committee. He is a past editor for the EURASIP Journal on Image and Video Processing, IEEE Signal Processing Letters, IEEE Transactions on Image Processing, and IEEE Transactions on Circuits and Systems for Video Technology. He has been appointed an IEEE Signal Processing Society Distinguished Lecturer for the 2011-2012 term.

Dr. de Queiroz has been actively involved with the Rochester chapter of the IEEE Signal Processing Society, where he served as Chair and organized the Western New York Image Processing Workshop from its inception until 2001. He is now helping organize IEEE SPS Chapters in Brazil and just founded the Brasilia IEEE SPS Chapter. He was the General Chair of ISCAS'2011 and MMSP'2009, and is the General Chair of SBrT'2012. He was also part of the organizing committee of ICIP'2002. His research interests include image and video compression, multirate signal processing, and color imaging. Dr. de Queiroz is a Senior Member of IEEE, a member of the Brazilian Telecommunications Society and of the Brazilian Society of Television Engineers.
[Figure 9: four rate–distortion plots, average PSNR (dB) versus bitrate (bpp), comparing ADC (16 frames/page), AVC-I, and JPEG2000, one panel per document class.]

Fig. 9. Comparison of ADC against JPEG2000 and AVC-I in terms of PSNR averaged for documents in: (a) class 0; (b) class 1; (c) class 2; and (d) class 3.
[Figure 10: four image crops arranged as panels (a)–(d).]

Fig. 10. Subjective comparison among coders: (a) zoomed part of the “cerrado” sequence; reconstructed versions using (b) AVC-I, (c) JPEG2000, and (d) ADC, at approximately 0.25 bits/pixel. ADC yields superior subjective quality.