
IEIE Transactions on Smart Processing and Computing, vol. 3, no. 1, February 2014 http://dx.doi.org/10.5573/IEIESPC.2014.3.1.1


Low-latency SAO Architecture and its SIMD Optimization for HEVC Decoder

Yong-Hwan Kim1,2, Dong-Hyeok Kim2, Joo-Young Yi2, and Je-Woo Kim2

1 Smart Media Research Center, Korea Electronics Technology Institute / Seoul, 121-835, Korea [email protected] 2 Multimedia IP Research Center, Korea Electronics Technology Institute / Seongnam-si, Gyeonggi-do 483-816, Korea

{kdh007, jyyi, jwkim}@keti.re.kr

* Corresponding Author: Yong-Hwan Kim

Received October 20, 2013; Revised October 31, 2013; Accepted November 15, 2013; Published February 28, 2014

* Regular Paper

Abstract: This paper proposes a low-latency Sample Adaptive Offset (SAO) filter architecture and its Single Instruction Multiple Data (SIMD) optimization scheme to achieve fast High Efficiency Video Coding (HEVC) decoding in a multi-core environment. According to the HEVC standard and its Test Model (HM), the SAO operation is performed only at the picture level. Most real-time decoders, however, execute their sub-modules on a Coding Tree Unit (CTU) basis to reduce the latency and memory bandwidth. The proposed low-latency SAO architecture has the following advantages over picture-based SAO: 1) significantly lower memory requirements, and 2) a low-latency property enabling efficient pipelined multi-core decoding. In addition, SIMD optimization can reduce the SAO filtering time significantly. The simulation results showed that the proposed low-latency SAO architecture, despite its much lower memory usage, yields a decoding time similar to that of picture-based SAO in single-core decoding. Furthermore, the SIMD optimization scheme speeds up SAO filtering by approximately 509% and increases the total decoding speed by approximately 7% compared to the existing look-up table approach of HM.

Keywords: HEVC, SAO, Low-latency, Multi-core, SIMD

1. Introduction

High Efficiency Video Coding (HEVC), also known as the H.265 video codec, is the latest video compression standard, developed by the Joint Collaborative Team on Video Coding (JCT-VC), which was established by the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG) [1, 2]. HEVC is expected to achieve up to a 50% bit-rate reduction at the same visual quality relative to the former Advanced Video Coding (AVC/H.264) standard [3].

In HEVC, pictures are divided uniformly into square blocks called Coding Tree Units (CTUs), which are similar to the macroblocks used in earlier standards. These CTUs are divided further in a quadtree structure to form Coding Units (CUs), which are the basic processing units. In-loop filtering in HEVC consists of two stages: the first is the deblocking filter (DF) and the second is the Sample Adaptive Offset (SAO) filter, as shown in Fig. 1.

The SAO is a tool newly adopted in HEVC to enhance both the subjective and objective quality [4]. The DF reduces blocking artifacts, and the SAO reduces both ringing and banding artifacts. In addition, the SAO can enhance edge sharpness or smoothness.

According to the HEVC standard and its Test Model (HM), the SAO operation is performed only at the picture level [1, 5]. That is, SAO filtering is an inherently picture-based operation because it requires deblocked pixels from all eight of its neighboring CTUs (left, above, right, below, above-left, above-right, below-left, and below-right). Most real-time decoders, however, execute their sub-modules on a CTU basis to reduce the latency and memory bandwidth. In addition, multi-core decoding is widely used because decoding of large pictures, such as UHD (3840x2160), is not feasible on a single core. Therefore, a CTU-based, low-latency SAO filtering architecture is essential for utilizing pipelined parallel decoding efficiently. G. B. Parveen et al. proposed a low-latency SAO encoding architecture for a multi-core encoding environment, where method B appears to be a good trade-off between the latency and compression ratio [6]. P. N. Subramanya et al. proposed a low-latency SAO decoding architecture for a multi-core HW decoding environment, where method C provides CTU-based low-latency SAO filtering [7]. Method C has two drawbacks: 1) the side information overhead is high because it processes the pixel data from four CTUs simultaneously, and 2) it is less efficient in Single Instruction Multiple Data (SIMD) optimization, which is indispensable to all SW codecs, due to its many conditional branches and data alignment problems [8].

This paper presents a low-latency CTU-based SAO filtering architecture that enables efficiently pipelined parallel decoding and effective SIMD optimization. In addition, a SIMD optimization scheme for SAO filtering is proposed to reduce the SAO filtering time significantly.

This paper is organized as follows. Section 2 gives a SAO overview and explains the existing picture-based SAO filtering architecture. The proposed algorithms are presented in detail in Section 3. Section 4 reports the experimental results and compares them with the existing algorithm, and Section 5 concludes the paper.

2. SAO Overview

SAO is a non-linear amplitude mapping filter that operates on deblocked pixels [1, 2, 4]. Each CTU can have one of three types of SAO filtering: (1) no SAO filtering, (2) Band Offset (BO), and (3) Edge Offset (EO). Both BO and EO add a certain offset value to a sample; the offset value of BO is chosen from the received lookup table by the sample magnitude, whereas the offset value of EO is chosen from the received lookup table by an edge direction and gradient.

2.1 Band Offset (BO)

BO classifies all pixels of a CTU into multiple bands, where each band contains pixels within the same intensity range. For example, 8-bit pixel intensities (0-255) are divided into 32 fixed bands, each 8 sample values wide. Only four consecutive bands and their offsets are selected and signaled to a decoder, as shown in Fig. 2. Note that BO does not use the pixels of the neighboring CTUs, and the four band offsets can be wrapped around. The BO filtering operation is expressed as Eq. (1).

m = src[x][y] >> (bitDepth - 5),
dst[x][y] = Clip3(0, (1 << bitDepth) - 1, src[x][y] + BoOffsetTab[m])    (1)

where Clip3(min, max, a) = (a > max) ? max : ((a < min) ? min : a). src[x][y] and dst[x][y] represent a deblocked pixel and a SAO-filtered pixel, respectively, m denotes a quantized pixel (i.e., the band index), and BoOffsetTab[0..31] holds the four BO offset values and 28 zero values. x and y are pixel coordinates with ranges x = 0..CtbWidth-1 and y = 0..CtbHeight-1, where CtbWidth and CtbHeight represent the Coding Tree Block (CTB) width and height in pixel units, respectively.
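For reference, Eq. (1) can be written as the following scalar C sketch; the buffer conventions (src/dst pointers and a row stride) are illustrative assumptions, not the decoder's actual interface.

#include <stdint.h>

static inline int Clip3(int min, int max, int a) {
    return (a > max) ? max : ((a < min) ? min : a);
}

/* Scalar BO filtering of one CTB per Eq. (1).
 * BoOffsetTab[0..31] holds the four signaled band offsets and 28 zeros. */
static void BoFilterCtb(const uint8_t *src, uint8_t *dst, int stride,
                        int width, int height, int bitDepth,
                        const int8_t BoOffsetTab[32]) {
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int m = src[x] >> (bitDepth - 5);   /* band index, 0..31 */
            dst[x] = (uint8_t)Clip3(0, (1 << bitDepth) - 1,
                                    src[x] + BoOffsetTab[m]);
        }
        src += stride;
        dst += stride;
    }
}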

2.2 Edge Offset (EO)

EO uses the neighboring pixels, including those of neighboring CTUs, to compute edge directional information. Fig. 3 shows the four EO classes that specify the edge direction, where c represents the current pixel and a and b represent the two neighboring pixels. Note that for a CTU, one EO class and four edge offset values according to the pre-defined categories are signaled to a decoder.

Given an EO class, each sample is classified into one of five categories, as shown in Table 1, where category 0 means no SAO filtering.

m = Sign(src[x][y] - src[x-1][y]),
n = Sign(src[x][y] - src[x+1][y]),    (2)
dst[x][y] = Clip3(0, (1 << bitDepth) - 1, src[x][y] + EoOffsetTab[m + n + 2])

Fig. 1. Block diagram of the HEVC decoder.

Fig. 2. Band offsets.

Fig. 3. Four EO classes: class 0 (horizontal, 0°), class 1 (vertical, 90°), class 2 (135° diagonal), and class 3 (45° diagonal).

Table 1. EO sample classification rule.

Category (Edge Type)  Condition                               Description
0                     c == a && c == b                        No SAO filtering
1                     c < a && c < b                          Local minima
2                     (c < a && c == b) || (c == a && c < b)  Negative edge
3                     (c > a && c == b) || (c == a && c > b)  Positive edge
4                     c > a && c > b                          Local maxima

Eq. (2) shows the EO class 0 filtering equations, where Sign(a) = (a > 0) ? 1 : ((a == 0) ? 0 : -1) and EoOffsetTab[0..4] holds five EO offset values, including one zero value. The filtering equations for the other EO classes can be derived easily by adjusting the x and y coordinates according to the angles in Fig. 3.
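Similarly, a minimal scalar C sketch of the EO class 0 filtering of Eq. (2), reusing Clip3() from the previous sketch, is shown below; it assumes the input row already contains the left and right neighbor pixels (provided by the D buffer of Section 3), and the buffer conventions are again illustrative.

static inline int Sign(int a) { return (a > 0) ? 1 : ((a == 0) ? 0 : -1); }

/* Scalar EO class 0 (horizontal) filtering of one CTB per Eq. (2).
 * EoOffsetTab[0..4] holds the five category offsets (index 2 is zero).
 * The caller must guarantee that src[-1] and src[width] are valid. */
static void EoClass0FilterCtb(const uint8_t *src, uint8_t *dst, int stride,
                              int width, int height, int bitDepth,
                              const int8_t EoOffsetTab[5]) {
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int m = Sign(src[x] - src[x - 1]);   /* left neighbor  */
            int n = Sign(src[x] - src[x + 1]);   /* right neighbor */
            dst[x] = (uint8_t)Clip3(0, (1 << bitDepth) - 1,
                                    src[x] + EoOffsetTab[m + n + 2]);
        }
        src += stride;
        dst += stride;
    }
}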

2.3 Picture-based SAO Architecture

A picture-based SAO architecture is the default method of performing SAO filtering after decoding and deblocking a picture in the HEVC standard and HM [1, 5]. Fig. 4 shows a flowchart of the architecture, where saoPicBuf is the backup buffer for retaining the original deblocked pixels. The srcPicBuf holds the original deblocked pixels and, as a result, the SAO-filtered pixels. Unlike the DF, SAO filtering always uses the original deblocked pixels, not the previously SAO-filtered pixels; this is why the picture-sized backup buffer is required. iCtu represents the CTU index and iSizeInCtu the total number of CTUs in a picture. The saoPicBuf is the input and the srcPicBuf is the output of the SaoFilterCTU_HM() function, which filters the current CTU pixels by selectively referencing the neighboring CTUs.

After normal SAO filtering of a picture using iterative CTU-based filtering, the PCM samples and losslessly coded samples are restored by the RestorePcmSamplesPic() function.

This architecture has two major drawbacks: 1) high latency, which makes it difficult to achieve efficiently pipelined parallel decoding; and 2) large memory requirements due to the picture backup buffer (i.e., saoPicBuf).

3. The Proposed Scheme

To overcome the drawbacks of the picture-based SAO architecture, a low-latency CTU-based SAO architecture was designed, which also takes the following two points into account: 1) a SIMD-friendly structure, and 2) easy implementation covering various Tile and slice combinations. Finally, SIMD optimization of SAO filtering is proposed to significantly reduce the SAO filtering time itself.

3.1 Low-latency SAO Architecture

Unlike the HM, which executes the DF on a picture basis, we assume that the DF is performed on a CTU basis. Basically, the SAO of the current CTU can begin filtering after its below-right CTU has been decoded and deblocked, because it requires the deblocked pixels of the eight neighboring CTUs. For example, after deblocking CTU #10, the SAO of CTU #1 can be started, as shown in Fig. 5. The proposed architecture follows this basic nature of the SAO precisely, unlike previous work [7]. That is, SAO filtering is performed on all pixels of a CTU after deblocking its below-right CTU. Note that the last-column CTUs are processed immediately after their left CTUs, and the last CTU row is processed separately after deblocking the last CTU of a picture. Fig. 6 shows the filtering order in the case of the Tile structure. Fig. 7 presents a flowchart of the proposed architecture, where iCtuX and iCtuY represent the x and y coordinates of a CTU, respectively, and iWidthInCtu and iHeightInCtu are the numbers of CTUs in a CTU row and column, respectively. The proposed architecture has the following three steps: 1) process the above-left CTU if available, 2) process the above CTU if the current CTU is the right-most CTU, and 3) process the last CTU row if the current CTU is the last one in the picture; a sketch of this trigger logic follows below. Note that Fig. 7 explains the SAO filtering order of Figs. 5 and 6.
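A minimal C sketch of this trigger logic, called once per just-deblocked CTU, is given below; SaoFilterCTU() and the CTU-count variables follow the paper's naming, while the driver function itself is an illustrative assumption (Tile boundaries, as in Fig. 6, would need additional availability checks).

extern void SaoFilterCTU(int iCtuX, int iCtuY);

/* Called after CTU (iCtuX, iCtuY) has been decoded and deblocked;
 * triggers SAO on every CTU whose eight neighbors are now available,
 * following the order of Figs. 5 and 7. */
static void OnCtuDeblocked(int iCtuX, int iCtuY,
                           int iWidthInCtu, int iHeightInCtu) {
    /* (1) the above-left CTU now has all eight neighbors deblocked */
    if (iCtuX > 0 && iCtuY > 0)
        SaoFilterCTU(iCtuX - 1, iCtuY - 1);

    /* (2) at the right picture border, the above CTU is also ready */
    if (iCtuX == iWidthInCtu - 1 && iCtuY > 0)
        SaoFilterCTU(iCtuX, iCtuY - 1);

    /* (3) after the last CTU of the picture, flush the last CTU row */
    if (iCtuX == iWidthInCtu - 1 && iCtuY == iHeightInCtu - 1) {
        for (int x = 0; x < iWidthInCtu; x++)
            SaoFilterCTU(x, iCtuY);
    }
}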

To reduce the backup buffer overhead of the picture-based SAO and to enable low-latency SAO filtering, a CTU-based SAO buffer structure was designed. The proposed CTU-based SAO architecture requires four types of buffers, as shown in Fig. 8 and Table 2.

Fig. 4. Flowchart of the picture-based SAO architecture.

Fig. 5. Low-latency SAO filtering order without a Tile (sNM denotes SAO filtering order N after the M-th CTU is decoded and deblocked).

Fig. 6. Low-latency SAO filtering order with four Tiles.


That is, the original deblocked pixels of the current CTU, namely all the internal bottom-most (B1) pixels, an external right-bottom (B2) pixel, the internal right-most (C1) pixels, an external below-right (C2) pixel, and the internal bottom-right pixel (A), should be stored for SAO filtering of the other CTUs. Note that the A pixel should be stored in a separate buffer for the below-right CTU, because the same pixel in the B and C buffers is overwritten when the below and right CTUs are processed. In Table 2, iCtbWidthY and iCtbWidthC represent the Y and Cb/Cr CTB widths, respectively.

For example, assume a UHD video (3840x2160) stream, where typically iCtbWidthY = 64 and iCtbWidthC = 32. The picture-based SAO requires a buffer of size 3840 x 2160 x 1.5 = 11.9 Mbytes. The proposed low-latency CTU-based SAO architecture requires buffers totaling only about 24.3 Kbytes (Table 2), which amounts to only 0.2% of the picture-based SAO.
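The buffer sizes of Table 2 can be verified with a small C sketch (our own illustration, assuming 8-bit samples and 4:2:0 chroma):

#include <stdio.h>

int main(void) {
    const int iCtbWidthY = 64, iCtbWidthC = 32;
    const int iWidthInCtu  = (3840 + iCtbWidthY - 1) / iCtbWidthY; /* 60 */
    const int iHeightInCtu = (2160 + iCtbWidthY - 1) / iCtbWidthY; /* 34 */

    /* Y + Cb + Cr sizes per Table 2 (one byte per stored sample) */
    long a = 3L * iWidthInCtu * (iHeightInCtu - 1);
    long b = (long)iWidthInCtu * (iCtbWidthY + 1)
           + 2L * iWidthInCtu * (iCtbWidthC + 1);
    long c = (long)iHeightInCtu * (iCtbWidthY + 1)
           + 2L * iHeightInCtu * (iCtbWidthC + 1);
    long d = (long)(iCtbWidthY + 2) * (iCtbWidthY + 2)
           + 2L * (iCtbWidthC + 2) * (iCtbWidthC + 2);

    printf("total = %.1f Kbytes\n", (a + b + c + d) / 1024.0); /* ~24.3 */
    return 0;
}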

Fig. 9 presents a flowchart of the proposed SAO filtering algorithm using the four buffers of Table 2; this is the SaoFilterCTU() of Fig. 7. In Fig. 9, only Y filtering is shown for simplicity. The BackupPixels() function stores some pixels of the current, right, and below CTUs before SAO filtering, using the A, B, and C buffers. The SaoBlockCopy() function copies into the D buffer the pixels of: (a) the current CTU from the srcPicBuf buffer, (b) the right and below CTUs from the srcPicBuf buffer if available, (c) the above-left CTU from the A buffer, (d) the above and above-right CTUs from the B buffer, and (e) the left and below-left CTUs from the C buffer.

Fig. 7. Flowchart of the low-latency SAO filtering architecture.

Fig. 8. CTU boundary pixels: (a) neighboring pixels required for filtering the current CTU, (b) the current CTU's pixels (A, B1, B2, C1, and C2) to be stored for filtering the other CTUs.

Table 2. Four buffers for the proposed SAO architecture.

Buffer          Size                                    Description
A               Y:  iWidthInCtu * (iHeightInCtu-1)      An above-left pixel
                Cb: the same as above
                Cr: the same as above
B (B1 and B2)   Y:  iWidthInCtu * (iCtbWidthY+1)        Above and above-right pixels [line buffer]
                Cb: iWidthInCtu * (iCtbWidthC+1)
                Cr: the same as above
C (C1 and C2)   Y:  iHeightInCtu * (iCtbWidthY+1)       Left and below-left pixels [line buffer]
                Cb: iHeightInCtu * (iCtbWidthC+1)
                Cr: the same as above
D               Y:  (iCtbWidthY+2) * (iCtbWidthY+2)     Current CTU buffer including all neighboring pixels
                Cb: (iCtbWidthC+2) * (iCtbWidthC+2)
                Cr: the same as above

Fig. 9. Flowchart of SaoFilterCTU().


The SaoFilterCtb() function filters all pixels of the current CTU, with the D buffer as input and the srcPicBuf buffer as output, using Eq. (1) or (2) according to the SAO type and EO class.

The PCM and lossless samples are restored on a CTU basis: the RestorePcmSamplesCtb() function selectively restores the original samples in a CTU.

Figs. 10 and 11 show the pseudo code of the BackupPixels() and SaoBlockCopy() functions, respectively. The blkWidth and blkHeight of Figs. 9 and 11 represent the width and height of the Coding Block (CB) to be filtered, respectively, which can differ from iCtbWidth in the case of the right-most and bottom-most CTUs.

Picture-based pipelined parallel decoding [9] is possible using the proposed low-latency CTU-based SAO architecture, as shown in Fig. 12. For example, decoding of picture 2 (core 2) can start immediately after the second CTU row of picture 1 (core 1) has been SAO-filtered. In addition, decoding of picture 3 (core 3) can start immediately after the second CTU row of picture 2 (core 2) has been SAO-filtered. Therefore, in this example, three cores decode the pictures simultaneously in a pipelined manner. This property is very efficient for large-picture decoding, such as UHD video, because the decoding workload is distributed over the multi-core CPU. Note that the thread synchronization unit between the cores can be a single CTU, multiple CTUs, or a CTU row; a sketch of such synchronization follows below.
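A possible CTU-row synchronization between cores can be sketched with C11 atomics, as shown below. This is purely illustrative: the two-row lead mirrors the example above and assumes the vertical reference range is bounded accordingly, and the names (saoRowsDone, MAX_PICS) are our own.

#include <stdatomic.h>

#define MAX_PICS 16   /* hypothetical decoded-picture window */

/* Per-picture count of CTU rows whose SAO filtering is complete. */
static atomic_int saoRowsDone[MAX_PICS];

/* Producer: the core decoding picture 'pic' calls this after the SAO
 * of each CTU row finishes (CTU-row synchronization granularity). */
static void OnSaoRowDone(int pic)
{
    atomic_fetch_add(&saoRowsDone[pic], 1);
}

/* Consumer: before decoding CTU row r of the next picture, wait until
 * the reference picture has SAO-filtered rows 0..r+1, matching the
 * two-CTU-row lead of the example above. */
static void WaitForReference(int refPic, int r, int iHeightInCtu)
{
    int need = (r + 2 < iHeightInCtu) ? (r + 2) : iHeightInCtu;
    while (atomic_load(&saoRowsDone[refPic]) < need)
        ;  /* busy-wait; a real decoder would use a condition variable */
}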

3.2 SIMD Optimization Scheme

Eq. (2) was optimized using SIMD instructions [8]. In the proposed low-latency SAO architecture, all pixels of the current CTU are filtered.

Fig. 13 shows the SIMD pseudo code. The proposed scheme is composed of seven steps: 1) load the left 16 pixels, right 16 pixels, and current 16 pixels, 2) convert the 8-bit pixels to 16-bit pixels, 3) calculate the edge type per pixel using Table 1, 4) look up EoOffsetTab[0..15], 5) add the pixels and offset values, 6) save 16 pixels to the destination buffer, and 7) conditionally restore pixels. The CALC_ETYPE() macro calculates 8 edge types simultaneously using Table 1. Sixteen table lookup operations are performed at once using the _mm_shuffle_epi8() instruction, as shown in Fig. 14. Note that the blkWidth value is assumed to be a multiple of 16 for simplicity. SIMD optimization of the remaining EO classes 1-3 can be designed easily by adjusting step 1 and properly handling the conditional pixel restoration (i.e., step 7).

If ((iCtbY+1) < iHeightInCtu) {
    (1) Store bottom line pixels of current CTU to B_buffer[iCtbX][] (B1);
    If (iCtbX < (iWidthInCtu-1)) {
        (2) Store an external right-bottom pixel of current CTU to B_buffer[iCtbX][iCtbWidth] (B2);
        (3) Store an internal bottom-right pixel of current CTU to A_buffer[iCtbY][iCtbX];
    }
}
If (iCtbX < (iWidthInCtu-1)) {
    (4) Store internal right-most pixels of current CTU to C_buffer[iCtbY][] (C1);
    If ((iCtbY+1) < iHeightInCtu) {
        (5) Store an external below-right pixel of current CTU to C_buffer[iCtbY][iCtbWidth] (C2);
    }
}

Fig. 10. Pseudo code of BackupPixels().

If (above CTU is available) {
    If (above-left CTU is available) {
        (1) Copy A_buffer[iCtbY-1][iCtbX-1] to D_buffer[-1][-1];
    }
    (2) Copy B_buffer[iCtbX][] to D_buffer[-1][] (B1);
}
If (above-right CTU is available) {
    (3) Copy B_buffer[iCtbX][blkWidth] to D_buffer[-1][blkWidth] (B2);
}
If (left CTU is available) {
    (4) Copy C_buffer[iCtbY][] to D_buffer[][-1] (C1);
    If (below-left CTU is available) {
        (5) Copy C_buffer[iCtbY][blkHeight] to D_buffer[blkHeight][-1] (C2);
    }
}
(6) Copy current CTU's pixels to D_buffer[][];
If (iCtbX < (iWidthInCtu-1)) {
    (7) Copy right CTU's left pixels to D_buffer[][blkWidth-1];
}
If (below CTU is available) {
    (8) Copy below CTU's top pixels to D_buffer[blkHeight][];
}

Fig. 11. Pseudo code of SaoBlockCopy().

Fig. 12. Picture-based pipelined parallel decoding architecture.


Fig. 15 presents the BO filtering optimization of Eq. (1) using SIMD instructions. The proposed scheme is composed of seven steps: 1) prepare a new band offset table, in which index zero holds the starting band offset, 2) load 16 pixels and convert them to 16-bit pixels, 3) quantize the pixels using Eq. (1), 4) reduce the quantized pixel range and push the unwanted pixel values above 0x80, 5) look up the new band offset table, 6) add the pixels and offset values, and 7) save 16 pixels to the destination buffer.

#define CALC_ETYPE(srcA, srcC, srcB) { \
    xmm3 = _mm_cmpgt_epi16(srcC, srcA); \
    xmm4 = _mm_cmpgt_epi16(srcC, srcB); \
    srcA = _mm_cmplt_epi16(srcC, srcA); \
    srcB = _mm_cmplt_epi16(srcC, srcB); \
    xmm3 = _mm_srli_epi16(xmm3, 15); \
    xmm4 = _mm_srli_epi16(xmm4, 15); \
    xmm3 = _mm_or_si128(xmm3, srcA); \
    xmm4 = _mm_or_si128(xmm4, srcB); \
    srcA = _mm_add_epi16(xmm3, xmm4); /* srcA => edge_type: -2, -1, 0, 1, or 2 */ \
}

blkWidth16 = blkWidth >> 4;
xmm7 = _mm_set1_epi8(2);              // [2...2]
xmm8 = _mm_load_si128(EoOffsetTab);
for (y = 0; y < blkHeight; y++) {
    for (x = 0; x < blkWidth16; x++) {
        // (1) load left, current, right pixels
        xmm1 = _mm_loadu_si128(srcBlk-1);
        xmm0 = _mm_srli_si128(xmm1, 1);
        xmm2 = _mm_srli_si128(xmm1, 2);
        // (2) 16-bit conversion
        xmm1 = _mm_cvtepu8_epi16(xmm1);
        xmm0 = _mm_cvtepu8_epi16(xmm0);
        xmm2 = _mm_cvtepu8_epi16(xmm2);
        // (3) calculation of 8 edge types
        CALC_ETYPE(xmm1, xmm0, xmm2);
        xmm5 = _mm_loadu_si128(srcBlk-1 + 8);
        xmm6 = _mm_srli_si128(xmm5, 1);
        xmm2 = _mm_srli_si128(xmm5, 2);
        xmm5 = _mm_cvtepu8_epi16(xmm5);
        xmm6 = _mm_cvtepu8_epi16(xmm6);
        xmm2 = _mm_cvtepu8_epi16(xmm2);
        CALC_ETYPE(xmm5, xmm6, xmm2);
        // (4) 16 edge types and table lookup
        xmm1 = _mm_packs_epi16(xmm1, xmm5);
        xmm1 = _mm_add_epi8(xmm1, xmm7);
        xmm2 = _mm_shuffle_epi8(xmm8, xmm1);
        // (5) addition of pixels and offsets
        xmm3 = _mm_cvtepi8_epi16(xmm2);
        xmm4 = _mm_cvtepi8_epi16(_mm_srli_si128(xmm2, 8));
        xmm0 = _mm_add_epi16(xmm0, xmm3);
        xmm6 = _mm_add_epi16(xmm6, xmm4);
        // (6) save 16 pixels to destination buffer
        xmm0 = _mm_packus_epi16(xmm0, xmm6);
        _mm_store_si128(dstBlk, xmm0);
        srcBlk += 16; dstBlk += 16;
    }
    srcBlk -= blkWidth; dstBlk -= blkWidth;
    // (7) conditional pixel restoration
    If (left CTU is not available)  dstBlk[0] = srcBlk[0];
    If (right CTU is not available) dstBlk[blkWidth-1] = srcBlk[blkWidth-1];
    srcBlk += srcStride; dstBlk += dstStride;
}

Fig. 13. SIMD pseudo code of SaoFilterCtb() in the case of EO class 0.

Fig. 14. Illustration of the xmm2 = _mm_shuffle_epi8(xmm8, xmm1) operation in EO filtering.

If (iBandStartPos <= 28) {
    shiftBit = bitDepth - 5;
    blkWidth16 = blkWidth >> 4;
    // (1) Prepare new band offset table
    Ofs16[16] = {0};
    Ofs16[0] = BoOffsetTab[iBandStartPos+0];
    Ofs16[1] = BoOffsetTab[iBandStartPos+1];
    Ofs16[2] = BoOffsetTab[iBandStartPos+2];
    Ofs16[3] = BoOffsetTab[iBandStartPos+3];
    xmm7 = _mm_set1_epi16(iBandStartPos);
    xmm6 = _mm_set1_epi16(iBandStartPos+4);
    xmm8 = _mm_load_si128(Ofs16);
    for (y = 0; y < blkHeight; y++) {
        for (x = 0; x < blkWidth16; x++) {
            // (2) load pixels and 16-bit conversion
            xmm0 = _mm_cvtepu8_epi16(*(srcBlk));
            xmm1 = _mm_cvtepu8_epi16(*(srcBlk+8));
            // (3) quantize pixels
            xmm2 = _mm_srli_epi16(xmm0, shiftBit);
            xmm3 = _mm_srli_epi16(xmm1, shiftBit);
            // (4) manipulation of quantized pixels
            xmm4 = _mm_subs_epi16(xmm2, xmm7);
            xmm5 = _mm_subs_epi16(xmm3, xmm7);
            xmm2 = _mm_cmpgt_epi16(xmm2, xmm6);
            xmm3 = _mm_cmpgt_epi16(xmm3, xmm6);
            xmm2 = _mm_or_si128(xmm2, xmm4);
            xmm3 = _mm_or_si128(xmm3, xmm5);
            // (5) table lookup
            xmm2 = _mm_packs_epi16(xmm2, xmm3);
            xmm2 = _mm_shuffle_epi8(xmm8, xmm2);
            // (6) addition of pixels and offsets
            xmm3 = _mm_cvtepi8_epi16(xmm2);
            xmm4 = _mm_cvtepi8_epi16(_mm_srli_si128(xmm2, 8));
            xmm0 = _mm_add_epi16(xmm0, xmm3);
            xmm1 = _mm_add_epi16(xmm1, xmm4);
            // (7) save 16 pixels to destination buffer
            xmm0 = _mm_packus_epi16(xmm0, xmm1);
            _mm_store_si128(dstBlk, xmm0);
            srcBlk += 16; dstBlk += 16;
        }
        srcBlk += (srcStride - blkWidth);
        dstBlk += (dstStride - blkWidth);
    }
} Else {
    Fall back to default HM code
}

Fig. 15. SIMD pseudo code of SaoFilterCtb() in the case of BO.


Note that, in step 4, the quantized pixels are manipulated to exploit the property of the _mm_shuffle_epi8() instruction, which returns 0 if an index value is greater than or equal to 0x80, as shown in Fig. 16. In Fig. 15, iBandStartPos represents the first band offset position, which is signaled to a decoder. Note that if iBandStartPos has a value greater than 28, the band offsets are wrapped around; in such a case, the default table lookup code of HM is used.
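This zeroing property of _mm_shuffle_epi8() can be verified in isolation with a small test (our own illustration):

#include <stdio.h>
#include <tmmintrin.h>   /* SSSE3: _mm_shuffle_epi8 */

int main(void) {
    __m128i table = _mm_setr_epi8(10, 11, 12, 13, 0, 0, 0, 0,
                                  0, 0, 0, 0, 0, 0, 0, 0);
    /* Any index byte with its high bit set (>= 0x80) forces the output
     * byte to zero, which maps out-of-range bands to a zero offset. */
    __m128i idx = _mm_setr_epi8(0, 1, 2, 3, (char)0x80, (char)0x85,
                                0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
    __m128i r = _mm_shuffle_epi8(table, idx);
    unsigned char out[16];
    _mm_storeu_si128((__m128i *)out, r);
    printf("%d %d %d %d %d %d\n", out[0], out[1], out[2],
           out[3], out[4], out[5]);   /* prints: 10 11 12 13 0 0 */
    return 0;
}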

4. Performance Evaluation

For the simulation, the proposed algorithm was applied to a sub-optimized Main profile HEVC decoder, which heavily uses SIMD instructions. JCT-VC test sequences were used and encoded by HM12.1. Each experiment was performed four times in a row on the workstation described in the footnote below. The purpose of the first run was to load the program code and the bitstream into the disk cache and at least partially into the L2 cache. The median value of the three other runs is reported. We disabled the rendering of the video and the writing of the output files, and also subtracted the file reading time from the total time to minimize the effect of I/O operations on the execution time. The decoding time was measured using the QueryPerformanceCounter() function, which is the most precise timer function in the system. All the experiments were performed by exchanging only the SAO architecture and SAO SIMD functions in the sub-optimized HEVC decoder.

Table 3 lists the performance of the proposed low-latency CTU-based SAO architecture in the single-core decoding environment. In the table, two streams (People on street, Traffic) are cropped UHD (2560x1600, 30Hz) sequences and the others are Full-HD (1920x1080, 60Hz) sequences. The SAO ratio was calculated using Eq. (3):

SAO ratio = (NumSAO_Y * 4 + NumSAO_C * 2) / (NumCTU * 6) * 100    (3)

where NumSAO_Y and NumSAO_C represent the numbers of SAO-filtered Y CTBs and Cb/Cr CTBs over all pictures, respectively, and NumCTU is the total number of CTUs over all pictures. dT represents the delta time between PicSAO and CtuSAO, calculated using Eq. (4):

1 Two Xeon [email protected] (six-core), 16GB DDR3 RAM, and Windows 7 SP1

dT = (CtuSAO - PicSAO) / PicSAO * 100    (4)

As shown in Table 3, the proposed low-latency CTU-based SAO architecture has a speed similar to that of the picture-based SAO architecture, in spite of the additional backup and copy operations. Note that the decoding speeds in Table 3 include the proposed SAO SIMD scheme.

Table 4 lists the performance of the SAO SIMD scheme for two sequences, where CTL represents the C table lookup code of HM and SIMD represents the proposed SAO SIMD scheme. The proposed SIMD scheme is approximately 509% faster on average than that of HM.

Table 5 lists the total decoding speed of the decoder with the CTL as well as the proposed SIMD SAO scheme, combined with the proposed low-latency SAO architecture.

In Table 5, the speed-up gain (i.e., dT) varies considerably with the SAO ratio; generally, a high SAO ratio results in a high speed-up. In terms of total decoding time, the proposed SIMD SAO scheme is faster than the HM SAO scheme by approximately 6.86% on average.

Although the proposed low-latency CTU-based SAO architecture was not tested in a multi-core HEVC decoder, previous results show a significant decoding speed-up of up to 295% using four cores [9].

Fig. 16. Illustration of the xmm2 = _mm_shuffle_epi8(xmm8, xmm1) operation in BO filtering.

Table 3. Total decoding speed comparison between picture-based SAO (PicSAO) and the proposed CTU-based SAO (CtuSAO).

Sequences          Bitrate [kbps]   SAO ratio [%]   PicSAO [fps]   CtuSAO [fps]   dT [%]
People on street   8390.2           18.6            14.56          14.56           0.00
Traffic            3054.5           11.5            26.10          25.89          -0.80
BQTerrace          11560.8          32.9            34.33          34.57           0.70
BasketballDrive    23017.9          52.1            22.67          22.58          -0.40
ParkScene          3912.7           9.4             41.28          41.30           0.05
Tennis             9072.7           49.4            38.80          38.69          -0.28
Average            9834.8           28.98           29.62          29.60          -0.12

Table 4. SAO decoding speed comparison between the HM's table lookup SAO and the proposed SIMD SAO.

Sequences   SAO type     CTL [sec]   SIMD [sec]   Gain [%]
BQTerrace   EO class 0   0.389       0.079        392.4
            EO class 1   1.331       0.231        476.2
            EO class 2   0.046       0.027        70.4
            EO class 3   0.257       0.052        394.2
            BO           0.157       0.026        503.8
Tennis      EO class 0   0.152       0.021        623.8
            EO class 1   0.141       0.022        540.9
            EO class 2   0.305       0.038        702.6
            EO class 3   0.186       0.016        1062.5
            BO           0.030       0.007        328.6
Average     -            0.299       0.052        509.5


Furthermore, the decoding speed-up ratio can be greater than that previous result for two reasons: 1) the final HEVC standard omits the Adaptive Loop Filter (ALF), which had increased the inter-frame synchronization latency, and 2) the proposed SIMD optimization of SAO filtering further reduces the latency. The proposed algorithm has two advantages over a picture-based SAO architecture: 1) significantly less memory is required, as shown in Table 2, and 2) the low-latency property enables efficient multi-core decoding. In addition, the proposed SIMD optimization scheme is well suited to the low-latency CTU-based SAO architecture.

The sub-optimized Main profile HEVC decoder with the proposed algorithms passed all the HEVC conformance bitstreams [10], except those of the Main10 profile, as well as 223 proprietary test bitstreams covering various Tile and slice combinations.

5. Conclusion

This paper proposed a low-latency CTU-based SAO architecture and a SAO filtering optimization scheme using SIMD instructions, which can be used for real-time decoding of 4K video in a multi-core environment. The proposed architecture showed a speed similar to that of the existing scheme in a single-core decoding environment. The architecture has two advantages over the existing picture-based SAO architecture: 1) significantly lower memory requirements, and 2) a low-latency property enabling efficient multi-core decoding. In addition, the proposed architecture is better suited to SIMD optimization than the existing CTU-based SAO filtering architecture in a SW codec. The proposed SIMD scheme for SAO filtering significantly sped up all SAO filtering classes, by approximately 509% on average. Although the proposed algorithm was simulated in the decoder, the proposed methods can also be applied to an HEVC encoder without modification.

Acknowledgement

This study was supported by a grant from the Seoul R&BD program (SS110004M0229111).

References

[1] ITU-T Rec. H.265, High Efficiency Video Coding, ITU-T, March 2013.

[2] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the High Efficiency Video Coding (HEVC) standard," IEEE Trans. CSVT, Vol. 22, No. 12, pp. 1649-1668, December 2012.

[3] J.-R. Ohm, G. J. Sullivan, H. Schwarz, T. K. Tan, and T. Wiegand, "Comparison of the coding efficiency of video coding standards, including High Efficiency Video Coding (HEVC)," IEEE Trans. CSVT, Vol. 22, No. 12, pp. 1669-1684, December 2012.

[4] C.-M. Fu, et al., "Sample adaptive offset in the HEVC standard," IEEE Trans. CSVT, Vol. 22, No. 12, pp. 1755-1764, December 2012.

[5] JCT-VC, HEVC Test Model (HM) reference software 12.1.

[6] G. B. Parveen and R. Adireddy, "Analysis and approximation of SAO estimation for CTU-level HEVC encoder," Proc. Int. Conf. VCIP, November 2013.

[7] P. N. Subramanya, R. Adireddy, and D. Anand, "SAO in CTU decoding loop for HEVC video decoder," Proc. Int. Conf. Signal Processing and Communication, December 2013.

[8] Intel, Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2, June 2013.

[9] J.-Y. Yi, Y.-H. Kim, J. Park, and J.-W. Kim, "Implementation of HEVC decoder S/W using frame-based multi-threading method," Proc. ITC-CSCC, Sapporo, Japan, July 2012.

[10] T. Suzuki, G. Sullivan, and W. Wan, HEVC conformance draft 5, JCTVC-O1004, 15th meeting, Geneva, CH, October 2013.

Yonghwan Kim received his B.S. and M.S. degrees in electrical engineering from Chung-Ang University, Seoul, Korea, in 1996 and 1998, respectively, and a Ph.D. degree in image engineering from Chung-Ang University, Seoul, Korea, in 2008. From 1999 to 2001, he worked for SungJin C&C, Seoul, Korea, where he optimized the MPEG-1/2 Video CODEC for DVRs. Since 2001, he has worked for the Korea Electronics Technology Institute (KETI), Seongnam, Korea. He is currently a managerial researcher in the Multimedia IP Research Center, KETI. His current research interests are in the area of 2D and 3D video coding, including HEVC, SHVC, and RGB video coding, and its implementation.

Table 5. Total decoding speed comparison between HM's table lookup SAO and the proposed SIMD SAO.

Sequences          Bitrate [kbps]   SAO ratio [%]   CTL [fps]   SIMD [fps]   dT [%]
People on street   8390.2           18.6            14.02       14.56         3.85
Traffic            3054.5           11.5            24.91       25.89         3.93
BQTerrace          11560.8          32.9            32.07       34.57         7.80
BasketballDrive    23017.9          52.1            20.92       22.58         7.93
ParkScene          3912.7           9.4             40.10       41.30         2.99
Tennis             9072.7           49.4            33.74       38.69        14.67
Average            9834.8           28.98           27.63       29.60         6.86


Donghyeok Kim received his B.S. and M.S. degrees in the Department of Information and Communication Engineering, Dongguk University, Seoul, South Korea, in 2008 and 2010, respectively. In 2014, he joined the Multimedia IP Research Center of the Korea Electronics Technology Institute (KETI), Seongnam, Korea. His research interests include High Efficiency Video Coding (HEVC), filter banks and wavelets, and image processing.

Jooyoung Yi received her B.S. and M.S. degrees in electronic engineering from Chonbuk National University, Jeonju, Korea, in 2005 and 2007, respectively. In 2007, she joined the Multimedia IP Research Center of the Korea Electronics Technology Institute (KETI), Seongnam, Korea. She is currently involved in the development of video codecs such as HEVC and SHVC.

Jewoo Kim received his B.S. and M.S. degrees in control and instrument engineering from the University of Seoul, Seoul, Korea, in 1997 and 1999, respectively. Since 1999, he has worked for the Korea Electronics Technology Institute (KETI), Seongnam, Korea. He is currently a managerial researcher in the Multimedia IP Research Center, KETI. He has been involved in various projects, including a multi-view 3D video system, a video transcoding system, and a UHD recording system. His current interests are in the area of UHD broadcasting and realistic media processing, including HEVC video coding, UHD contents production systems, and audio signal processing, and their implementation.

Copyrights © 2014 The Institute of Electronics and Information Engineers


IEIE Transactions on Smart Processing and Computing, vol. 3, no. 1, February 2014 http://dx.doi.org/10.5573/IEIESPC.2014.3.1.10


Measurement Coding for Compressive Sensing of Color Images

Khanh Quoc Dinh, Chien Van Trinh, Viet Anh Nguyen, Younghyeon Park, and Byeungwoo Jeon

College of Information and Communication Engineering, Sungkyunkwan University / Suwon, Korea {dqkhanh, trinhchien, vietanh, neversky, bjeon}@skku.edu

* Corresponding Author: Byeungwoo Jeon

Received October 20, 2013; Revised October 31, 2013; Accepted November 15, 2013; Published February 28, 2014

* Regular Paper

Abstract: From the perspective of reducing the sampling cost of color images at high resolution, block-based compressive sensing (CS) has attracted considerable attention as a promising alternative to conventional Nyquist/Shannon sampling. On the other hand, for storing/transmitting applications, CS requires a very efficient way of representing the measurement data in terms of data volume. This paper addresses this problem by developing a measurement-coding method with the proposed customized Huffman coding. In addition, by noting the difference in visual importance between the luma and chroma channels, this paper proposes measurement coding in YCbCr space rather than in conventional RGB color space for better rate allocation. Furthermore, as the proper use of the image property of smoothness improves CS recovery, this paper proposes the integration of a low pass filter into the CS recovery of color images, which is based on block-based ℓ2,0-norm minimization. The proposed coding scheme shows considerable gain compared to conventional measurement coding.

Keywords: Compressive sensing, Color image, Measurement coding, Color space conversion

1. Introduction

The high cost of conventional sampling at the Nyquist/Shannon rate [1, 2] for high-resolution images has highlighted the need for an alternative sampling scheme. Compressive sensing (CS) [3-6] is one such possibility. For a length-N image vector x (i.e., with N pixels), CS acquires only M measurements, which form a measurement vector y through a linear projection as in Eq. (1):

y = Φx,    (1)

where Φ is called a measurement matrix, which determines the linear projection of the signal x to its measurement vector y. For the recovery of the vector x, the measurement matrix Φ needs to satisfy certain conditions, such as the restricted isometry property [4]. The ratio r = M/N, which is commonly called the subrate or measurement rate, shows how effective the CS scheme is in terms of sensing cost compared to the conventional Nyquist/Shannon rate.

The measurement matrix Φ has M × N elements, whose size becomes tremendous in the case of high-resolution images. If sensing is performed on a frame basis, N refers to the number of pixels of all three channels in a picture. For example, for 512 × 512 color images with 3 color channels at r = 0.5, the number of rows of the measurement matrix is 0.5 × 3 × 512 × 512 = 393,216. Even under the typical practice of processing each channel separately, this size is still quite high. Therefore, instead of frame-based sensing, block-based sensing per channel is a more practical approach [7-10]. Each channel of an image is divided into multiple blocks with a size of B × B. The blocks in each channel are sensed with a small measurement matrix Φ_B of size M × N, where N = B × B and M = r × N. In this way, the size requirement of the measurement matrix decreases considerably, and it can be stored more easily at the sensing part.
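In this block-based setting, the sensing of each block reduces to a dense matrix-vector product y_k = Φ_B x_k per channel; a minimal C sketch (row-major layout and names are our assumptions):

/* Block-based CS sensing of one channel: a B*B block x_k (N = B*B
 * samples) is projected to M measurements by the shared matrix Phi_B
 * (M x N, row-major). For r = 0.5 and B = 32: N = 1024, M = 512. */
static void SenseBlock(const double *PhiB, const double *xk,
                       double *yk, int M, int N) {
    for (int i = 0; i < M; i++) {
        double acc = 0.0;
        for (int j = 0; j < N; j++)
            acc += PhiB[i * N + j] * xk[j];   /* y = Phi_B * x */
        yk[i] = acc;
    }
}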

In addition to the size issue of the measurement matrix, the measurement data need to be stored or transmitted, which raises a new problem of how to encode those measurements efficiently; this is called measurement coding (MC). For grayscale images, some studies based on the DPCM concept [11-13] have already shown that, under an appropriate configuration (quantization step, subrate, and measurement matrix selection), their coding performance can be comparable to conventional JPEG compression. Motivated by these studies, this paper examines a measurement data-encoding method for color images using the DPCM scheme.

To make the MC more practical, it should be equipped with an entropy coding scheme that represents the residual measurements efficiently using a minimum number of bits. In this sense, similar to JPEG [14, 15], this study utilized Huffman coding with a CS-friendly Huffman table synchronization between the encoder and decoder. Noting that the residual measurements obtained by applying DPCM to the measurement data follow a Laplacian distribution [12], this study developed a method that can easily generate a replica of the Huffman table at the decoder upon receiving only the Laplacian parameters (i.e., mean and variance).

Most imaging devices acquire data in RGB space; therefore, the CS measurements of the color signal are also likely to be from RGB space. The YCbCr space treats the luma (Y) channel differently from the chroma (Cb and Cr) channels [16]. More coding error is bearable in the chroma channels than in the luma channel, and this is exploited widely in conventional image/video coding. For example, JPEG encodes a color image with different quantization tables for the luma and chroma channels [14]. Another example is color subsampling, more commonly referred to as the 4:2:2 or 4:2:0 format [16]. Therefore, this paper proposes sensing in RGB space, to support a wider range of acquisition applications, but encoding the measurements in YCbCr space to achieve better coding performance.

This paper is organized as follows. Section 2 briefly presents the background on the conventional block-based CS of color images and MC for grayscale images, and then presents the proposed MC for compressively sensed color images with color conversion in the measurement domain from RGB to YCbCr space. Section 3 explains the proposed improved CS recovery. Sections 4 and 5 report the experimental results and the conclusion, respectively.

2. Proposed measurement coding for color images

Owing to the large size of (color) images, it is normally desirable to sense images in a block-based manner, in which each color channel is first divided into multiple non-overlapping blocks of size B × B, and the blocks are sensed independently by the small measurement matrix Φ_B [7-10]. Equivalently, the kth block x^k = [(x_R^k)^T (x_G^k)^T (x_B^k)^T]^T of a color image x is sensed as:

[y_R^k]   [Φ_B          ] [x_R^k]
[y_G^k] = [     Φ_B     ] [x_G^k]    (2)
[y_B^k]   [          Φ_B] [x_B^k]

In this way, despite sensing the whole image of three color channels, only one small block measurement matrix Φ_B is stored at the sensing part.

2.1 Improved measurement coding

2.1.1 Directional measurement coding

In a block-based CS of grayscale and color images, the individual measurement data inside a block have no correlation with each other because of the random nature of the measurement matrix. On the other hand, the measurement blocks associated with the same projection (i.e., the same measurement matrix Φ_B) may have a high correlation on a block-by-block basis. This was verified in other studies [11-13] by measuring the correlation in grayscale images, which is normally larger than 0.8. This paper shows that, in RGB color images, the measurements of adjacent blocks associated with the same projection also have a strong correlation. Table 1 lists the average correlation coefficient of the measurement vector y of a block with that of its upper neighbor y_p, calculated using Eq. (3).

ρ = E{ (y - ȳ)^T (y_p - ȳ_p) / ( ||y - ȳ||_2 ||y_p - ȳ_p||_2 ) }    (3)

where ȳ and ȳ_p are the mean values of y and y_p, respectively. The correlation in the measurement domain is larger than 0.8 for all color channels of the two test images, Lena and Peppers.
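Eq. (3) translates directly into code; the following C sketch (our own) computes ρ for a single block pair, whereas Table 1 reports the value averaged over all blocks and simulations:

#include <math.h>

/* Correlation coefficient of Eq. (3) between a measurement block y
 * and its upper neighbor yp, each of length M. */
static double BlockCorrelation(const double *y, const double *yp, int M) {
    double my = 0.0, myp = 0.0;
    for (int i = 0; i < M; i++) { my += y[i]; myp += yp[i]; }
    my /= M; myp /= M;

    double num = 0.0, ny = 0.0, nyp = 0.0;
    for (int i = 0; i < M; i++) {
        num += (y[i] - my) * (yp[i] - myp);   /* inner product term */
        ny  += (y[i] - my) * (y[i] - my);     /* squared norm of y  */
        nyp += (yp[i] - myp) * (yp[i] - myp); /* squared norm of yp */
    }
    return num / (sqrt(ny) * sqrt(nyp));
}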

The strong correlation in the measurement domain makes it natural to apply the directional MC [12, 13] to color images to represent their measurements effectively. As shown in Fig. 1, for each color channel, the MC searches for the best predictor (in terms of residual energy) of a block y from its set of neighboring blocks (SNB): the west, north, north-west, and north-east blocks. The matching criterion based on the residual energy is:

y_pred = argmin_{ŷ ∈ SNB} || y - ŷ ||^2    (4)

Table 1. Correlation coefficient ρ between the blocks in the measurement domain of RGB images (B = 32, averaged over 5 simulations with 5 Gaussian random measurement matrices).

            Lena                      Peppers
Sub-rate    R       G       B         R       G       B
0.1         0.967   0.873   0.956     0.954   0.804   0.803
0.2         0.974   0.887   0.962     0.959   0.818   0.817
0.3         0.973   0.885   0.960     0.960   0.817   0.816


Note that the strong correlation among the measurements of the neighboring blocks allows a very small residual energy; hence, the entropy of the residual signal is also expected to be very small. Therefore, for color images, the measurement vectors of all blocks can be encoded effectively in RGB space using directional coding [12, 13]. Fig. 2 shows the directional MC for each color channel, where the residual signal is quantized and subjected to an entropy coding process.

2.1.2 Entropy coding - Huffman coding

In this paper, following the design direction of pursuing low encoding complexity instead of time-consuming arithmetic coding or even context-adaptive arithmetic coding, Huffman coding was used for its optimality in symbol-by-symbol coding. On the other hand, two issues need to be addressed before Huffman coding can be applied to the measurement data of color images: how to effectively generate a Huffman table, and how to efficiently notify the decoder of the table.

Most residual signals in image/video processing, including the measurement residual y_res (the residual signal of the three color channels obtained by the DPCM process), follow a Laplacian distribution [12, 26]. Suppose that Y_r is a scalar element of the residual signal vector y_res obtained from the DPCM process (note that it is quantized before Huffman coding, as shown in Fig. 2). Then Y_r has a Laplacian probability distribution with a mean of μ and a variance of 2/α² as follows:

Pr(Y_r = t) = (α/2) e^{-α|t-μ|}    (5)

As a result, Eq. (5) gives the probability of the event Y_r = μ as:

Pr(Y_r = μ) = α/2    (6)

From Eq. (6), α is approximated by α = 2 × f(μ), where f(μ) is the estimated probability of a residual measurement being identical to the mean value of the quantized residual. The probabilities of the other values are calculated using Eq. (5). Based on the estimated probabilities, the Huffman table can be constructed. Fig. 3 shows a decent match between the approximated Laplacian distribution and the actual histogram of the quantized measurement residual of the R channel of Lena. This suggests that Huffman coding with the estimated probabilities can be expected to work well with the directional MC. Furthermore, upon notification of only α, μ, and the range of the residual y_res, a decoder can generate an exact replica of the Huffman table used in the encoder for its entropy decoding process.
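The decoder-side table regeneration can be sketched as follows in C; both sides evaluate Eqs. (5) and (6) over the signaled residual range and then build the Huffman table from the resulting probabilities with the standard algorithm. The function name and the renormalization step are our own illustrative choices:

#include <math.h>

/* Rebuild the symbol probabilities of the quantized residual from the
 * signaled Laplacian parameters (alpha, mu) and residual range
 * [tMin, tMax], per Eqs. (5) and (6). Running the same routine at the
 * encoder and the decoder lets both sides build identical Huffman
 * tables from prob[]. */
static void LaplacianProbs(double alpha, double mu,
                           int tMin, int tMax, double *prob)
{
    double sum = 0.0;
    for (int t = tMin; t <= tMax; t++) {
        prob[t - tMin] = 0.5 * alpha * exp(-alpha * fabs((double)t - mu));
        sum += prob[t - tMin];
    }
    /* renormalize over the finite signaled range */
    for (int t = tMin; t <= tMax; t++)
        prob[t - tMin] /= sum;
}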

2.1.3 Proposed RGB-space sensing and YCbCr-space measurement coding

In the previous section, a directional MC scheme was presented for color images in RGB color space. However, YCbCr space, rather than RGB, is known to be more effective in reducing color redundancy, and it is used widely in conventional image/video coding such as JPEG and HEVC [14-16]. In those coding frameworks, more error is allowed in the chroma channels (Cb and Cr) to save a considerable bitrate while guaranteeing equivalent perceptual quality. To apply that property to the MC of color images, this paper proposes a simple way to convert the measurements from RGB to YCbCr space for encoding, without any change to the sensing process.

2.2 Color-space conversion in measurement domain

Fig. 2. DPCM scheme of measurement coding.

Fig. 3. Histogram of the residual measurement ("hist. of residual") and its approximated Laplacian distribution ("appr. Laplacian dist.") (R channel of the Lena image, Gaussian random measurement matrix, r = 0.3, quantization step of 4).

Fig. 1. Directional measurement prediction.


In the spatial domain, the color-space conversion of the kth block from RGB to YCbCr is calculated easily by a linear transformation [17-19] as follows:

x_Y^k  = a11 x_R^k + a12 x_G^k + a13 x_B^k
x_Cb^k = x_128 + a21 x_R^k + a22 x_G^k + a23 x_B^k    (7)
x_Cr^k = x_128 + a31 x_R^k + a32 x_G^k + a33 x_B^k

where x_128 is a vector of the same size as x_R^k, but with all its values equal to 128. In addition, a11 = 0.299, a12 = 0.587, a13 = 0.114, a21 = -0.168736, a22 = -0.331264, a23 = a31 = 0.5, a32 = -0.418688, and a33 = -0.081312. Owing to the linearity of the projections in Eqs. (2) and (7), the measurement vector y_YCbCr^k of the kth block in YCbCr space can be calculated from the measurement vector y_RGB^k in RGB space as follows:

y_Y^k  = a11 y_R^k + a12 y_G^k + a13 y_B^k
y_Cb^k = y_128 + a21 y_R^k + a22 y_G^k + a23 y_B^k    (8)
y_Cr^k = y_128 + a31 y_R^k + a32 y_G^k + a33 y_B^k

where y_128 denotes the measurement vector of x_128, i.e., y_128 = Φ_B x_128. Therefore, the color-space conversion in the spatial domain is equivalent to the same conversion in the measurement domain, and the MC can be performed in YCbCr space to take advantage of the color decomposition nature of YCbCr. Table 2 lists the spatial correlation of Eq. (3) in the measurement domain of YCbCr space. Obviously, the correlation is also high, resulting in good performance of the directional MC described in the previous section. This paper used the same MC as in RGB space for YCbCr space. After reconstructing the measurements in YCbCr space, the recovered YCbCr image x_YCbCr^k is converted back to the recovered image in RGB space, x_RGB^k, by:

x_R^k = x_Y^k + b13 (x_Cr^k - x_128)
x_G^k = x_Y^k + b22 (x_Cb^k - x_128) + b23 (x_Cr^k - x_128)    (9)
x_B^k = x_Y^k + b32 (x_Cb^k - x_128)

where k is the index of the block being processed, and b13 = 1.402, b22 = -0.34414, b23 = -0.71414, and b32 = 1.772.
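Eq. (8) amounts to a per-measurement linear combination; a minimal C sketch (our own, with the a-coefficients of Eq. (7) inlined and y_128 assumed precomputed as Φ_B x_128):

/* Convert block measurements from RGB to YCbCr space per Eq. (8).
 * yR/yG/yB are the RGB measurement vectors of one block (length M),
 * y128 is the measurement of the all-128 block. Buffer conventions
 * are illustrative assumptions. */
static void MeasurementRgbToYcbcr(const double *yR, const double *yG,
                                  const double *yB, const double *y128,
                                  double *yY, double *yCb, double *yCr,
                                  int M) {
    for (int i = 0; i < M; i++) {
        yY[i]  =           0.299    * yR[i] + 0.587    * yG[i] + 0.114    * yB[i];
        yCb[i] = y128[i] - 0.168736 * yR[i] - 0.331264 * yG[i] + 0.5      * yB[i];
        yCr[i] = y128[i] + 0.5      * yR[i] - 0.418688 * yG[i] - 0.081312 * yB[i];
    }
}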

2.3 Bit allocation for the YCbCr channels

As discussed in the previous subsection on the different importance of the luma and chroma channels in human perception of color images, this paper proposes coding each channel differently. The luma (Y) channel should receive the most care, because any loss in the Y channel is easily recognized by viewers; therefore, a small quantization step is assigned to the MC of the Y channel, while a large quantization step is used for the chroma channels. The small step yields less error in the reconstructed measurements of the Y channel, resulting in good recovery, while the large quantization step for the chroma channels saves bits. Note that the relative insensitivity of the human eye to color makes the relatively larger coding errors in the chroma channels less perceptible. As a result, this HVS-based tradeoff between the bit amount and the recovered quality in YCbCr space provides better rate-distortion performance.

3. Improved recovery of the block-based CS of color images

The block-based CS (BCS) is very important not only for sampling cost reduction but also for effective MC [12]. Accordingly, a corresponding recovery method is needed. For color images, there is an effective recovery method called smoothed ℓ2,0-norm minimization [20]. In addition, among the block-based recovery methods for grayscale images, BCS-SPL (Block Compressed Sensing with Smoothed Projected Landweber reconstruction) [8] has high recovery performance arising from its smoothness pursuit. This paper applies the concepts of those recovery methods [8, 20] to the context of a block-based CS of color images.

Because applying the work reported by Nagesh and Li [20] directly to independent block-by-block recovery may degrade the quality of the recovered image considerably by generating discontinuities at the block boundaries [21-23], the same structure as BCS-SPL [8] was used in the present study to recover all the blocks in parallel with the help of a low pass filter, which reduces the discontinuity at the block boundaries. Besides addressing that discontinuity, smoothness is a very important property of image signals, not only grayscale but also color; the low pass filter also helps in pursuing that property. Similar to Mun and Fowler [8], a Wiener filter with a window size of 3 × 3 was applied as a low pass filter at the end of each iteration of the ℓ2,0-norm minimization [20]. Table 3 lists the improved recovery method for color images step by step. Details of the smoothed ℓ2,0-norm minimization are reported elsewhere [20].

Finally, the proposed MC scheme for color images is composed of the following phases: sensing using Eq. (2); color space conversion using Eq. (8); coding and decoding as described in Section 2; recovery as in Table 3; and color space conversion back using Eq. (9), as shown in the flowchart (Fig. 4).

Table 2. Correlation coefficient ρ between the blocks in the measurement domain of the YCbCr images (B = 32, averaged over 5 simulations with a Gaussian random measurement matrix).

            Lena                      Peppers
Sub-rate    Y       Cb      Cr        Y       Cb      Cr
0.1         0.934   0.997   0.998     0.900   0.987   0.986
0.2         0.942   0.997   0.998     0.909   0.988   0.988
0.3         0.941   0.997   0.998     0.908   0.988   0.988


4. Experimental results

The proposed methods were verified with four 512x512 color images: Lena, Peppers, Mandrill, and Jet (Fig. 5). The sensing block size was 32x32, a tradeoff between MC performance and CS recovery performance. A uniform quantizer was used in MC, with a quantization step of 4 for the RGB channels and steps of 4/8/8 for the Y, Cb, and Cr channels, respectively. The entries of the measurement matrix follow an i.i.d. Gaussian random distribution. Owing to the random nature of the measurement matrix, 5 simulations were performed for each test configuration and the results were averaged for a more reliable performance evaluation.
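A minimal sketch of this block-based Gaussian sensing setup is given below; the helper name and the use of an unnormalized i.i.d. Gaussian matrix are assumptions for illustration.

    import numpy as np

    B = 32                                   # block size used in the experiments
    subrate = 0.2                            # example subrate
    m = int(round(subrate * B * B))          # measurements per block

    rng = np.random.default_rng()
    Phi_B = rng.standard_normal((m, B * B))  # i.i.d. Gaussian entries

    def sense_blocks(channel, Phi_B, B=32):
        """Apply the same block measurement matrix to every BxB block of
        one color channel (channel: HxW array, H and W multiples of B)."""
        H, W = channel.shape
        blocks = (channel[r:r + B, c:c + B].reshape(-1)
                  for r in range(0, H, B) for c in range(0, W, B))
        return np.stack([Phi_B @ b for b in blocks])   # shape (num_blocks, m)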

As discussed in Section 2, the proposed MC with Huffman coding can be used in either the RGB or the YCbCr space. Table 4 lists the schemes tested in this study, which differ from each other either in the color space in which the directional MC (with Huffman coding) is performed or in their recoveries. Because the sensing and coding parts of Schemes 2 and 3 are identical, so are their bit-per-pixel values.

(a) Coding performance of MC in YCbCr space

Encoding the luma (Y) channel differently from the chroma (Cb and Cr) channels helps save a large number of bits, while degrading the quality of the recovered images only slightly.

The percentage of saved bits was calculated using the following equation:

    BitSaving = (bpp_RGB - bpp_YCbCr) / bpp_RGB x 100%,    (10)

where bpp_RGB and bpp_YCbCr are the bpp (bits per pixel) of MC performed in the RGB and YCbCr spaces, respectively. Note that the bpp counts all three color channels together.
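As a worked instance of Eq. (10), take the average Lena figures from Table 5 (this is only a sanity check on the reported numbers, not new data):

    BitSaving = (3.767 - 2.483) / 3.767 x 100% ≈ 34.1%,

which is consistent with the 34.111% average reported in Table 5 (Table 5 averages the four per-subrate savings rather than taking the ratio of the average bpp values).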

Table 3. Improved CS recovery for color images with a Wiener filter.

Input: Φ_B, transform matrix Ψ, y, σ_min, maximum iteration i_max, initial x_0
Output: x
Equivalent matrix: A = diag(Φ_B, 3) diag(Ψ, 3)
for each k-th block: s_k = diag(Ψ, 3) x_0; end
Initialize σ, i = 0
while σ > σ_min and i < i_max do
    for each k-th block, with j indexing its elements:
        1. Calculate Δs_k(j) = s_k(j) exp(-s_k(j)^2 / (2σ^2))
        2. Update s_k* = s_k - μ Δs_k
        3. Project s_k = s_k* - A^T (A A^T)^{-1} (A s_k* - y_k)
    end for
    Perform Wiener filtering with window [3, 3]
    Update σ; i = i + 1
end while
for each k-th block: x_k = diag(Ψ, 3)^T s_k; end
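The improved recovery of Table 3 can be sketched in Python as below for a single channel (the paper stacks the three color channels via diag(., 3)); the step size mu, the σ schedule, and the block helper functions are illustrative assumptions, and scipy's wiener implements the 3x3 low pass filtering:

    import numpy as np
    from scipy.signal import wiener

    def image_to_blocks(img, B):
        H, W = img.shape
        return [img[r:r + B, c:c + B].reshape(-1)
                for r in range(0, H, B) for c in range(0, W, B)]

    def blocks_to_image(blocks, shape, B):
        H, W = shape
        img = np.empty(shape)
        it = iter(blocks)
        for r in range(0, H, B):
            for c in range(0, W, B):
                img[r:r + B, c:c + B] = next(it).reshape(B, B)
        return img

    def improved_sl0_recovery(Phi_B, Psi, y_blocks, img_shape, B=32,
                              sigma_min=0.01, i_max=30, mu=2.0, decay=0.7):
        """Sketch of Table 3 for a single channel: smoothed-l0 descent per
        block, projection onto the measurement constraint, and a 3x3
        Wiener filter on the assembled image after every iteration."""
        A = Phi_B @ Psi                       # equivalent per-block matrix
        AAt_inv = np.linalg.inv(A @ A.T)
        s = [A.T @ (AAt_inv @ y) for y in y_blocks]   # min-norm initialisation
        sigma = 2.0 * max(np.abs(sk).max() for sk in s)
        i = 0
        while sigma > sigma_min and i < i_max:
            for k, y in enumerate(y_blocks):
                delta = s[k] * np.exp(-s[k] ** 2 / (2.0 * sigma ** 2))
                sk = s[k] - mu * delta                      # steepest descent
                s[k] = sk - A.T @ (AAt_inv @ (A @ sk - y))  # project to A s = y
            img = blocks_to_image([Psi @ sk for sk in s], img_shape, B)
            img = wiener(img, mysize=3)                     # smoothness pursuit
            s = [Psi.T @ b for b in image_to_blocks(img, B)]  # assumes orthonormal Psi
            sigma *= decay
            i += 1
        return blocks_to_image([Psi @ sk for sk in s], img_shape, B)

Filtering the reassembled image, rather than each block separately, is what suppresses the block-boundary discontinuities discussed in Section 3.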

Fig. 4. Flowchart of the proposed coding scheme for CS of color images.

Fig. 5. Test images: (a) Lena, (b) Peppers, (c) Mandrill, (d) Jet.

Table 4. Three test schemes.

Scheme     Color space for MC   Recovery
Scheme 1   RGB                  Block-by-block ℓ0-norm minimization [20] without a Wiener filter
Scheme 2   YCbCr                Same as Scheme 1
Scheme 3   YCbCr                The proposed improved CS recovery in Section 3


Table 5 shows that the mean bpp of Jet, averaged over the test subrates, was reduced from 3.942 to 2.387 when YCbCr is used instead of RGB space for MC (compare Schemes 2 and 1); that is, 39.190% of the bits were saved. For Peppers, the bit saving was 26.779% (bpp changes from 4.017 to 2.941). The other images show bit reductions of 28.956% (Mandrill) and 34.111% (Lena).

(b) Performance of Huffman coding

The performance of the proposed Huffman coding for MC was evaluated in terms of the difference between the entropy of the quantized residual measurements per pixel (denoted epp) and the number of bits actually used by Huffman coding per pixel (denoted bpp), over all three color channels. This difference was expressed as an overhead percentage, ovhe:

    ovhe = (bpp - epp) / epp x 100%.    (11)

The ovhe values in Table 6 show that the proposed Huffman coding incurs at most a 3.837% rate overhead relative to the entropy, occurring for Lena with MC in YCbCr space at a subrate of 0.4.

Table 5. Rate-distortion performance comparison (bpp vs. PSNR or FSIMc).
(bpp: bits per pixel over all color channels, including all bit overhead; BD-PSNR2-1: Bjontegaard difference in PSNR [24] between Scheme 2 and Scheme 1 (anchor); the "+" sign means Scheme 2 is better; the last column, D - C, is the PSNR gain of Scheme 3 over Scheme 2.)

                   Scheme 1 (MC in RGB)      Scheme 2 (MC in YCbCr)    Scheme 3          Bit saving     BD-PSNR2-1   D - C
Image    Subrate   bpp(A)  PSNR    FSIMc     bpp(B)  PSNR(C)  FSIMc    PSNR(D)  FSIMc    (B vs. A, %)   [dB]         [dB]
Lena     0.1       1.499   18.867  0.784     0.986   15.862   0.779    23.248   0.887    34.223                      7.386
         0.2       3.019   26.792  0.927     1.991   26.588   0.926    28.847   0.946    34.051                      2.259
         0.3       4.518   28.933  0.958     2.977   28.531   0.957    30.307   0.964    34.108                      1.776
         0.4       6.033   30.730  0.974     3.978   30.056   0.973    31.393   0.974    34.063                      1.337
         avg.      3.767   26.331  0.911     2.483   25.260   0.909    28.449   0.943    34.111         +2.764       3.190
Peppers  0.1       1.598   18.230  0.765     1.170   16.466   0.778    23.008   0.888    26.783                      6.542
         0.2       3.220   26.331  0.913     2.358   26.045   0.911    28.793   0.950    26.770                      2.748
         0.3       4.817   28.681  0.950     3.526   28.202   0.948    30.274   0.968    26.801                      2.072
         0.4       6.431   30.288  0.968     4.710   29.586   0.967    31.226   0.978    26.761                      1.640
         avg.      4.017   25.883  0.899     2.941   25.075   0.901    28.325   0.946    26.779         +2.077       3.251
Mandrill 0.1       1.682   16.100  0.739     1.194   14.598   0.728    18.661   0.788    29.013                      4.063
         0.2       3.389   19.403  0.834     2.408   19.362   0.833    20.570   0.855    28.947                      1.208
         0.3       5.077   20.369  0.877     3.613   20.305   0.876    21.282   0.888    28.836                      0.977
         0.4       6.780   21.422  0.906     4.812   21.324   0.906    21.989   0.911    29.027                      0.665
         avg.      4.232   19.324  0.839     3.007   18.897   0.834    20.626   0.861    28.956         +1.032       1.728
Jet      0.1       1.545   13.939  0.654     0.944   14.302   0.697    21.777   0.834    38.900                      7.475
         0.2       3.123   25.466  0.876     1.918   25.284   0.875    27.845   0.919    38.585                      2.561
         0.3       4.669   28.324  0.925     2.853   27.928   0.924    29.756   0.946    38.895                      1.828
         0.4       6.431   30.288  0.968     3.834   29.861   0.953    31.265   0.962    40.383                      1.404
         avg.      3.942   24.504  0.856     2.387   24.344   0.862    27.661   0.915    39.190         +4.849       3.317

Table 6. Bit overhead (%) of Huffman coding.

(a) Measurement coding in RGB space

               Lena                       Peppers
subrate    bpp     epp     ovhe       bpp     epp     ovhe
0.1       1.499   1.479   1.352      1.598   1.592   0.377
0.2       3.019   2.975   1.479      3.220   3.202   0.562
0.3       4.518   4.450   1.528      4.817   4.788   0.606
0.4       6.033   5.941   1.549      6.431   6.393   0.594
avg                       1.477                      0.535

(b) Measurement coding in YCbCr space

               Lena                       Peppers
subrate    bpp     epp     ovhe       bpp     epp     ovhe
0.1       0.986   0.954   3.354      1.170   1.149   1.828
0.2       1.991   1.919   3.752      2.358   2.311   2.034
0.3       2.977   2.868   3.801      3.526   3.453   2.114
0.4       3.978   3.831   3.837      4.710   4.611   2.147
avg                       3.686                      2.031


(c) Reconstruction performance in YCbCr space

Although many bits can be saved by coding in the YCbCr domain instead of RGB, the PSNR (averaged over all three channels) of the recovered images can decrease. Table 5 shows that, on average (see the "avg." rows of the corresponding test images), the degradations (i.e., the PSNR differences between Scheme 2 and Scheme 1) were 1.071 dB, 0.808 dB, 0.426 dB, and 0.160 dB for Lena, Peppers, Mandrill, and Jet, respectively.

(d) Rate-distortion performance of MC in YCbCr space

Considering both the bit saving and the PSNR degradation, the BD-PSNR [24] was calculated to evaluate the effectiveness of Scheme 2 over Scheme 1 (anchor) in the overall rate-distortion sense. As shown in Table 5, the BD-PSNR between Schemes 2 and 1 (i.e., BD-PSNR2-1) was up to +4.849 dB for the Jet image, and still +1.032 dB for the highly detailed Mandrill image. Thus, Scheme 2 is significantly better than Scheme 1 in terms of overall rate-distortion performance.

From a quality assessment viewpoint, it is well known that PSNR does not correlate closely with the HVS (i.e., with the quality that humans actually perceive). Among the many alternatives to PSNR, the feature similarity index for color images (FSIMc) [25] matches the perceptual judgment of viewers well. Note that the bit saving of MC in YCbCr space in Scheme 2 arises from the large quantization steps configured for the chroma channels Cb and Cr, which might not play the most significant role in perceived quality. Therefore, FSIMc was also measured. Table 5 shows that despite the large number of saved bits (e.g., 39.190% for the Jet image), FSIMc changes little between Schemes 1 and 2. For the Lena image, FSIMc decreases only from 0.911 (Scheme 1) to 0.909 (Scheme 2). Similarly, Mandrill shows only a 0.005 degradation in FSIMc, whereas for Jet and Peppers, FSIMc even increases by 0.006 and 0.002, respectively, from Scheme 1 to Scheme 2.

(e) Performance of the improved recovery

Note that with the embedded Wiener filter, the recovery pursues smoothness, which is a very important property of image signals (irrespective of RGB or YCbCr space). This prior information, integrated with a CS recovery method, helps improve the quality of the recovered images, particularly for smooth images such as Peppers and Jet. In Table 5, the gain of Scheme 3 over Scheme 2 was, on average, 3.251 dB for Peppers and 3.317 dB for Jet. A smaller average gain of 1.728 dB was observed for a detailed image like Mandrill. For Lena, the gain was 3.190 dB by the same calculation.

At very low subrates, such as 0.1 or 0.2, the conventional recovery method [20] cannot perform well for two reasons: the few measurements carry little information, and they also contain quantization errors. For example, at a subrate of 0.1, the method of Nagesh and Li [20] recovered image qualities of 15.8 dB for Lena and 16.4 dB for Peppers. With the help of a low pass filter, in contrast, the recovery exploits prior information about the smoothness of images, which is especially helpful when the recovery has little information about the images at a low subrate. Therefore, the proposed recovery provides much better quality, i.e., 23.2 dB and 23.0 dB for Lena and Peppers, corresponding to gains of 7.3 dB and 6.5 dB, respectively.

At a high subrate, the recovery has much more information about the images to be recovered, so the prior information on smoothness becomes less important. The proposed recovery then provides gains of only 1.3 dB and 1.6 dB for Lena and Peppers.

(f) Complexity analysis

The time consumption of the measurement encoder and of the proposed recovery was assessed on a computer with an Intel Core i5-2500 3.30 GHz CPU, 4 GB RAM, and the Windows 7 32-bit operating system.

Encoding complexity: the encoding time of measurement encoding in RGB space was compared with that in YCbCr space. Owing to the large quantization step for the chroma channels, coding in YCbCr space consumes less time than coding in RGB space, as shown in Table 7(a).

Recovery complexity: the time consumed by the conventional recovery [20] was compared with that of the proposed recovery. Supported by the prior information of smoothness, the proposed recovery tends to converge faster despite the additional low pass filtering step. Accordingly, its time consumption is smaller than that of the recovery in [20]. Table 7(b) presents an example for Lena.

5. Concluding remarks

This paper proposed a measurement coding method equipped with Huffman coding for color images.

Table 7. Time consumption for Lena, averaged over 5 simulations.

(a) Encoding complexity

                       subrate
Color space     0.1     0.2     0.3     0.4
RGB             9.9    25.1    35.8    46.7
YCbCr           6.2    14.1    22.1    25.5

(b) Recovery complexity

                          subrate
Recovery            0.1      0.2      0.3      0.4
[20]               164.2    227.7    241.9    252.6
Proposed recovery  154.3    185.9    234.5    227.0


The method is novel in sensing in RGB space while encoding the measurements in YCbCr space. In this way, the sensing part remains widely usable, because no changes are needed relative to conventional CS, while a more efficient MC is achieved because the bit budget can be better balanced according to the visual importance of the luma and chroma channels. For a complete framework, the recovery for CS of color images with a smoothness pursuit was also improved. The simulation results confirmed the effectiveness and efficiency of the proposed framework.

Acknowledgement

This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIP) (No. 2011-001-7578).

References

[1] C. E. Shannon, "Communication in the presence of noise," Proc. Institute of Radio Engineers, vol. 37(1), pp. 10-21, 1949. Article (CrossRef Link)
[2] H. Nyquist, "Certain topics in telegraph transmission theory," Trans. AIEE, vol. 47, pp. 617-644, 1928. Article (CrossRef Link)
[3] D. L. Donoho, "Compressed sensing," IEEE Trans. on Inform. Theory, vol. 52(4), pp. 1289-1306, Apr. 2006. Article (CrossRef Link)
[4] E. Candes and T. Tao, "Near-optimal signal recovery from random projections and universal encoding strategies," IEEE Trans. Inf. Theory, vol. 52(12), pp. 5406-5425, 2006. Article (CrossRef Link)
[5] H. Lee, S. Park, and S. Park, "Introduction to Compressive Sensing," The Magazine of the IEEK, vol. 38(1), pp. 19-30, 2011. Article (CrossRef Link)
[6] Y. M. Cho, "Compressive Sensing - Mathematical Principles and Practical Implications," The Magazine of the IEEK, vol. 38(1), pp. 31-43, 2011. Article (CrossRef Link)
[7] L. Gan, "Block compressed sensing of natural images," Proc. Intern. Conf. on Digital Signal Process., pp. 403-406, UK, 2007. Article (CrossRef Link)
[8] S. Mun and J. E. Fowler, "Block compressed sensing of images using directional transforms," Proc. IEEE Intern. Conf. on Image Process. (ICIP), pp. 3021-3024, 2009. Article (CrossRef Link)
[9] J. Xu, J. Ma, D. Zhang, Y. D. Zhang, and S. Lin, "Improved total variation minimization method for compressive sensing by intra-prediction," Signal Process., vol. 92(11), pp. 2614-2623, 2012. Article (CrossRef Link)
[10] J. Bigot, C. Boyer, and P. Weiss, "An analysis of block sampling strategies in compressed sensing," arXiv:1305.4446 [cs.IT], 2013. Article (CrossRef Link)
[11] S. Mun and J. E. Fowler, "DPCM for quantized block-based compressive sensing of images," Proc. European Signal Process. Conf., pp. 1424-1428, Romania, 2012. Article (CrossRef Link)
[12] K. Q. Dinh, H. J. Shim, and B. Jeon, "Measurement coding for compressive imaging based on structured measurement matrix," Proc. IEEE Intern. Conf. on Image Process. (ICIP), pp. 10-13, 2013. Article (CrossRef Link)
[13] J. Zhang, D. Zhao, and F. Jiang, "Spatially directional predictive coding for block-based compressive sensing of natural images," Proc. IEEE Intern. Conf. on Image Process. (ICIP), pp. 1021-1025, 2013. Article (CrossRef Link)
[14] W. B. Pennebaker and J. L. Mitchell, JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, 1993. Article (CrossRef Link)
[15] S. Shin, H. Go, H. Park, and B. Jeon, "DPCM-Based Image Pre-Analyzer and Quantization Method for Controlling the JPEG File Size," Proc. IEEK Conf., vol. 28(2), pp. 561-564, 2005. Article (CrossRef Link)
[16] G. J. Sullivan, J. Ohm, W. Han, and T. Wiegand, "Overview of the High Efficiency Video Coding (HEVC) standard," IEEE Trans. Circuits and Systems for Video Technology, vol. 22(12), pp. 1649-1668, 2012. Article (CrossRef Link)
[17] K. N. Plataniotis and A. N. Venetsanopoulos, Color Image Processing and Applications, Springer, Berlin, 2000. Article (CrossRef Link)
[18] H. Ahn, H. Jeong, J. Ha, K. Kim, and B. Kang, "Improvement Structure of RGB to YCbCr Conversion Block of ISP Platform in Mobile Phones," Proc. Intern. SoC Design Conf., pp. 448-451, 2009. Article (CrossRef Link)
[19] A. M. D. Zahangir and H. J. Lee, "A comparative study of different color space for paddy disease segmentation," The Institute of Electronics Engineers of Korea - Signal Processing, vol. 3, pp. 90-98, 2001. Article (CrossRef Link)
[20] P. Nagesh and B. Li, "Compressive imaging of color images," Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), pp. 1261-1264, 2009. Article (CrossRef Link)
[21] G. Coluccia, D. Valsesia, and E. Magli, "Smoothness-Constrained Image Recovery from Block-based Random Projections," Proc. IEEE Intern. Workshop on Multimedia Signal Process., pp. 129-134, Italy, 2013. Article (CrossRef Link)
[22] H. T. Kung and S. J. Tarsa, "Partitioned compressive sensing with neighbor-weighted decoding," Proc. IEEE Military Comm. Conf., pp. 149-156, USA, 2011. Article (CrossRef Link)
[23] K. Q. Dinh, H. J. Shim, and B. Jeon, "Deblocking filter for artifact reduction in distributed compressive video sensing," Proc. IEEE Visual Comm. and Image Process., pp. 1-5, USA, 2012. Article (CrossRef Link)
[24] G. Bjontegaard, "Calculation of average PSNR differences between RD-curves," VCEG-M33, 2001. Article (CrossRef Link)
[25] L. Zhang, L. Zhang, X. Mou, and D. Zhang, "FSIM: a feature similarity index for image quality assessment," IEEE Trans. Image Process., vol. 20(8), pp. 2378-2386, 2011. Article (CrossRef Link)
[26] C. Brites, J. Ascenso, and F. Pereira, "Studying temporal correlation noise modeling for pixel based Wyner-Ziv video coding," Proc. of IEEE Int. Conf. on Image Processing, pp. 273-276, USA, 2006. Article (CrossRef Link)

Khanh Quoc Dinh received his B.S. degree in Electronics and Telecommunications from Hanoi University of Science and Technology, Hanoi, Vietnam, in 2010, and an M.S. degree in Electrical and Computer Engineering from Sungkyunkwan University, Suwon, Korea, in 2012. He is currently a Ph.D. candidate in the Digital Media Laboratory at Sungkyunkwan University. His research interests include video compression and compressive sensing.

Chien Van Trinh received his B.S. degree in Electronics and Telecommunications from Hanoi University of Science and Technology, Hanoi, Vietnam, in 2012. He is currently a Master's student in the Digital Media Laboratory at Sungkyunkwan University. His research interest is compressive sensing.

Viet Anh Nguyen received his B.S. degree in Electronics and Telecommunications from Hanoi University of Science and Technology, Hanoi, Vietnam, in 2011, and an M.S. degree in Electrical and Computer Engineering from Sungkyunkwan University, Suwon, Korea, in 2013. He is currently a Ph.D. candidate in the Digital Media Laboratory at Sungkyunkwan University. His research interests include video compression and compressive sensing.

Younghyeon Park received his B.S. degree in Electronic and Electrical Engineering from Sungkyunkwan University, Suwon, Korea, in 2011. He is currently a Ph.D. candidate in the Digital Media Laboratory at Sungkyunkwan University. His research interests include video compression and compressed sensing.

Byeungwoo Jeon received his BS degree in 1985 and an MS degree in 1987 from the Department of Electronics Engineering, Seoul National University, Seoul, Korea. He received his PhD degree in 1992 from the School of Electrical Engineering at Purdue University, Indiana, United States. From 1993 to 1997, he was in the Signal Processing Laboratory at Samsung Electronics in Korea, where he worked on video compression algorithms, designing digital broadcasting satellite receivers, and other MPEG-related research for multimedia applications. Since September 1997, he has been with the faculty of the School of Information and Communication Engineering, Sungkyunkwan University, Korea, where he is currently a professor. His research interests include multimedia signal processing, video compression, statistical pattern recognition, and remote sensing.

Copyrights © 2014 The Institute of Electronics and Information Engineers


IEIE Transactions on Smart Processing and Computing, vol. 3, no. 1, February 2014 http://dx.doi.org/10.5573/IEIESPC.2014.3.1.19


IEIE Transactions on Smart Processing and Computing

Multi-Resolution Kronecker Compressive Sensing

Thuong Nguyen Canh1, Khanh Dinh Quoc2, and Byeungwoo Jeon3

Department of Electronic and Electrical Engineering, Sungkyunkwan University / Suwon, South Korea {ngcthuong, kqdinh, bjeon}@skku.edu

* Corresponding Author: Byeungwoo Jeon

Received October 20, 2013; Revised October 31, 2013; Accepted November 15, 2013; Published February 28, 2014

* Short Paper

Abstract: From the perspective of reducing the sampling cost of color images at high resolution, block-based compressive sensing (CS) has attracted considerable attention as a promising alternative to conventional Nyquist/Shannon sampling. On the other hand, for storing/transmitting applications, CS requires a very efficient way of representing the measurement data in terms of data volume. This paper addresses this problem by developing a measurement-coding method with the proposed customized Huffman coding. In addition, by noting the difference in visual importance between the luma and chroma channels, this paper proposes measurement coding in YCbCr space rather than in conventional RGB color space for better rate allocation. Furthermore, as the proper use of the image property in pursuing smoothness improves the CS recovery, this paper proposes the integration of a low pass filter into the CS recovery of color images, which is the block-based ℓ0-norm minimization. The proposed coding scheme shows considerable gain compared to conventional measurement coding.

Keywords: Compressive sensing, Color image, Measurement coding, Color space conversion

1. Introduction

Recently, compressive sensing (CS) [1] has attracted considerable attention for its capability of simultaneous sampling and compression. CS allows a signal to be reconstructed from a much smaller number of measurements by relying on the sparsity of the signal in some sparse domain (e.g., DCT, DWT, or the gradient domain). For multidimensional signals (e.g., image or video), however, frame-based CS has practical difficulties, such as high computational complexity and large memory requirements arising from the large number of measurements. In this regard, a block-based approach [2, 3] was developed, but it misses the global characteristics of images while preserving the local ones. Duarte et al. introduced a Kronecker compressive sensing (KCS) scheme [4] that senses data in the frame-based manner but reduces the complexity considerably using a Kronecker product.

A key challenge of CS towards practical applications is reducing the computational complexity of reconstruction. In general, the higher the image resolution, the larger the computational complexity CS requires. A partial solution is a multi-resolution sensing framework that senses multi-resolution measurements and first reconstructs a low resolution (LR) image; a high resolution (HR) image is reconstructed later using a more powerful reconstruction when sufficient computational resources are available. This scheme has the added feature of providing a fast preview for real-time compressive image/video through a low-cost reconstruction of the LR image [11-13]. It can also benefit many image processing tasks, such as image classification and object detection, with little sacrifice in accuracy. For example, an initial object detection can be obtained from a low-cost reconstructed LR image/video and then refined from a reconstructed HR image/video.

CS cannot provide a high quality reconstruction if its subrate is too low. Because CS takes a much smaller number of measurements via random projection, it easily misses important signal features, and the loss of high frequency components leads to heavy staircase artifacts. Therefore, CS recovery has difficulty reconstructing accurate HR images at a very low subrate. Conventional image/video compression faces a similar problem when a very high compression ratio is required for an HR image.


One possible remedy is to down-sample the image/video sequences before compression and to use a super resolution (SR) technique to up-sample the decompressed signal at the decoder [8]. SR is an image processing technique that can generate an HR image from a single LR image or a set of LR images [8]. This scheme can be used for quality-bitrate control of the reconstructed HR signal in spatially scalable image/video coding.

One of the widely used CS reconstruction methods is the total variation (TV) technique [5-7], which achieves good CS recovery performance while preserving image edges well. At a very low subrate such as 0.05, however, it has very poor recovery performance, as depicted in Fig. 1(a). Therefore, for a given low subrate, instead of sensing the original resolution (high resolution, HR) image, it might be better to apply the same sensing to its spatially down-sampled (low resolution, LR) version and to generate the HR image by up-sampling the CS-reconstructed LR image. The super resolution (SR) technique can be used for the up-sampling.

Motivated by this, the aim of this study is to sense an LR image and to utilize SR to achieve a better HR image without increasing the number of measurements. Fig. 1 illustrates this possibility. Fig. 1(a) shows a CS-reconstructed original resolution image of 512x512 (i.e., the HR image) at a low subrate of 0.05. The equivalent subrate of 0.2 (= 0.05 x 4) is used to produce an LR image (256x256), and the reconstructed HR images shown in Figs. 1(b) and (c) are generated by up-sampling the LR image via bi-cubic interpolation [16] (for the algorithm names in Fig. 1, see Table 1, explained later). As shown in Figs. 1(b) and (c), in comparison with Fig. 1(a), the SR-assisted reconstruction produces higher quality.

This paper proposes a multi-resolution compressive sensing framework that allows images to be reconstructed at various resolutions. By exploiting the relationship between the measurements of the LR and HR images, we propose an HR sensing matrix that shares the same measurement vector with the sensing of the LR image. The proposed sensing scheme corresponds to sensing the spatially down-sampled (LR) image at a high subrate; that is, it senses the LR image using the same number of measurements originally associated with the HR image. The HR image is finally reconstructed from the LR image. The reconstructed images are further refined to remove staircase artifacts using a BM3D [9] filter in post-processing, as reported in [10]. The proposed multi-resolution Kronecker compressive sensing scheme is simulated to verify its efficiency over the conventional KCS in both PSNR and perceptual quality.

The rest of this paper is organized as follows. Section 2 presents the background of compressive sensing. The proposed multi-resolution sensing framework is described in Section 3. Numerical experiments are presented in Section 4, and the paper is concluded in Section 5.

2. Background

This section first introduces the background of compressive sensing and then presents some related work.

2.1 Compressive Sensing

An emerging signal processing technique, CS allows signals to be acquired at a much smaller sampling rate than the Shannon/Nyquist rate via random projection. CS theory states that for a natural image f ∈ R^{n^2 x 1}, which is sparse in a selected domain specified by a sparsifying matrix Ψ, i.e., f = Ψα, it is possible to reduce the sampling cost by taking a much smaller number of measurements y ∈ R^{m^2 x 1}, where y = Φf, and later reconstructing the signal f by solving the following optimization problem:

    α̂ = arg min_α ||α||_p  subject to  y = ΦΨα,  f̂ = Ψα̂,    (1)

where m^2 << n^2, the notation ||.||_p denotes the p-norm with p normally set to either 0 or 1, and Φ ∈ R^{m^2 x n^2} is a sensing matrix that satisfies the restricted isometry property [1].

Fig. 1. Recovered Lena image (512x512) from compressive sensing at a subrate of 0.05 (see Table 1): (a) TV [6], 25.26 dB; (b) SRTV, 27.06 dB; (c) SRTV+BM3D2, 28.96 dB.


To reduce the complexity caused by the large sensing matrices of multi-dimensional signals, Duarte and Baraniuk [4] presented Kronecker compressive sensing, which jointly models the sensing matrix for each signal dimension. For a 2D signal F ∈ R^{n x n}, the sensing matrix is given as Φ = R ⊗ G, where ⊗ denotes the Kronecker product, and R and G^T ∈ R^{m x n} represent the sensing matrices for each dimension. Therefore, the CS measurement is rewritten as Y = RFG, where y = vect(Y) is the vectorized version of the measurement matrix Y. The measurement constraint is

    y = Φ f = (R ⊗ G) f,  with f = vect(F).
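As a quick numerical sanity check of this separability (a toy numpy example, not from the paper): with column-major vectorization the identity vec(RFG) = (G^T ⊗ R) vec(F) holds, so the exact ordering of the Kronecker factors depends on the vectorization convention; either way, the two small per-dimension products avoid ever forming the large Kronecker matrix explicitly.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 8, 4                       # toy sizes: F is n x n, Y is m x m
    F = rng.standard_normal((n, n))
    R = rng.standard_normal((m, n))   # row (vertical) sensing matrix
    G = rng.standard_normal((n, m))   # column (horizontal) sensing matrix

    # Separable KCS measurement: two small matrix products
    Y = R @ F @ G

    # Equivalent single-matrix form on the vectorized signal
    # (column-major vec: vec(RFG) = (G^T kron R) vec(F))
    f = F.reshape(-1, order="F")
    Phi = np.kron(G.T, R)             # (m*m) x (n*n) sensing matrix
    y = Phi @ f

    assert np.allclose(y, Y.reshape(-1, order="F"))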

Under the KCS framework, the optimization problem formulated as total variation (TV)-based CS recovery [5-7] can be solved for the reconstructed signal:

    F̂ = arg min_F  γ TV(F) + (μ/2) ||RFG - Y||_2^2,    (2)

where μ and γ are constant parameters, and the anisotropic and isotropic total variation of a 2D discrete image are given as

    TV_aniso(F) = ||∇_x F||_1 + ||∇_y F||_1,
    TV_iso(F)   = Σ_{i,j} sqrt( (∇_x F)_{i,j}^2 + (∇_y F)_{i,j}^2 ),

where ∇_x and ∇_y denote the gradient operators in the horizontal and vertical directions. Using the split Bregman technique [5], Eq. (2) can be solved more easily by substituting V = F, D_x = ∇_x F, and D_y = ∇_y F, and adding the Bregman variables B_x, B_y, and W, as follows:

    min over F, V, D_x, D_y of
        |D_x| + |D_y| + (μ/2) ||R V G - Y||_2^2
        + (λ/2) ||D_x - ∇_x F - B_x||_2^2
        + (λ/2) ||D_y - ∇_y F - B_y||_2^2
        + (ν/2) ||V - F - W||_2^2,    (3)

where λ and ν are constant parameters. Eq. (3) can be split further into sub-problems over F, V, D_x, and D_y, which can be solved via eigen-decomposition and a shrinkage function [6].
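The two TV definitions translate directly into numpy; the forward-difference discretization of ∇_x and ∇_y below is an assumption, since the paper does not specify its gradient operators.

    import numpy as np

    def tv_norms(F):
        """Anisotropic and isotropic total variation of a 2D image,
        using forward differences as the gradient operators."""
        dx = np.diff(F, axis=1)          # horizontal gradient, nabla_x F
        dy = np.diff(F, axis=0)          # vertical gradient, nabla_y F
        tv_aniso = np.abs(dx).sum() + np.abs(dy).sum()
        # pad so the two gradient fields align for the isotropic sum
        dx2 = np.zeros_like(F); dx2[:, :-1] = dx
        dy2 = np.zeros_like(F); dy2[:-1, :] = dy
        tv_iso = np.sqrt(dx2 ** 2 + dy2 ** 2).sum()
        return tv_aniso, tv_iso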

2.2 Related Work

The problem of multi-resolution CS has attracted considerable attention recently. Park et al. presented a multiscale framework for compressive video sensing [17], which can obtain LR and HR reconstructed images at the recovery side; however, it requires compressive measurements sampled at multiple scales for each video frame. Towards practical video compressive sensing, Baraniuk et al. proposed a dual scale sensing matrix (DSS) in the CS-MUVI framework [12], which can generate an efficiently computable low-resolution video preview. To reduce the computational complexity further, Goldstein et al. [13] proposed a new multi-resolution framework based on the STOne transform. In addition, the DSS was further exploited to provide spatially scalable compressive sensing [11]. These algorithms were designed for a single-pixel camera imaging system [14], in which the elements of the sensing matrices are chosen as either +1 or -1 for easier and faster implementation. In the present approach, the HR sensing matrix is created from the LR sensing matrix, which can be generated arbitrarily, like any other sensing matrix (e.g., a random Gaussian sensing matrix).

While related works [11-13, 17] were proposed for a normal CS framework, the proposed sensing scheme was significantly different in that it was under the KCS framework [4]. Moreover, while the approach [11, 12] sacrifices the performance of the recovered LR to have multi-resolution capability, in contrast to this approach, the proposed method reconstructs LR at a high reconstruction quality and then uses the SR technique to improve the HR quality. An approach of using SR to improve performance for exploiting predictive coding in spatially scalable compressive imaging has been reported [11], but it senses the HR and LR measurements separately and reconstructs them independently. In contrast, the proposed algorithm in this paper shares the same measurements between HR and LR, and reconstructs the HR and LR images jointly.

3. Proposed Multi-Resolution Kronecker Compressive Sensing

In this section, we first present the relationship between the sensing matrices of the high and low resolution images and outline the proposed multi-resolution sensing matrices. The proposed reconstruction is given in the latter part of the section.

3.1 Multi-resolution CS Acquisition

As mentioned in [13], multi-resolution CS is desirable for enabling a fast preview of an image/video. Unfortunately, the conventional KCS framework [4] does not support multi-resolution measurements. In general, KCS measurements of the same image at different resolutions (HR and LR) are obtained by:

    Y_HR = R_HR F_HR G_HR,    (4)
    Y_LR = R_LR F_LR G_LR,    (5)

where Y_HR and Y_LR denote the measurement matrices of the HR image F_HR ∈ R^{2n x 2n} and the LR image F_LR ∈ R^{n x n}, obtained by the corresponding sensing matrices R_LR, G_LR^T ∈ R^{m x n} and R_HR, G_HR^T ∈ R^{m x 2n}, respectively.

Fig. 2. Relationship between the HR and LR sensing matrices.



To support the proposed multi-resolution measurement (that is, the same measurements shared by HR and LR), it is important to design the sensing matrices carefully so that the images can be reconstructed at different resolutions from the same set of measurements. In addition, it is better to design the sensing matrices to be fully compatible with the conventional KCS, without any modification of the sensing and recovery parts. Therefore, the relationship between the HR and LR sensing matrices should be investigated carefully. For that purpose, the LR image is obtained from the HR image via a down-sampling operation:

    F_LR = D_S F_HR D_S^T,    (6)

where D_S is a down-sampling matrix, F_LR is a bi-linear down-sampled version of F_HR, and (.)^T stands for the transpose operator. By setting Y_LR = Y_HR, the HR sensing matrix can be derived from the LR one:

    R_HR = R_LR D_S,  G_HR = D_S^T G_LR.    (7)

Fig. 2 shows the relationship between the two sensing matrices. Using this HR sensing matrix (called LSM), the LR image and HR image can be reconstructed from the same set of measurements. Note that using the HR sensing matrix in (7) is equivalent to sensing the LR image at a high subrate; the high frequency components (i.e., textures) are therefore discarded at the sensing side. Keeping the same number of measurements, if the subrate of sensing the HR image is r, then the subrate for the LR image is 2^2 x r = 4r. As a result, the proposed sensing matrix prefers a subrate smaller than 0.25, because at r = 0.25 the subrate of the LR image already reaches 1; otherwise, the additional measurements will be wasted.
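Eqs. (6) and (7) can be checked numerically. In the sketch below, the pair-averaging matrix is only an illustrative stand-in for the paper's bi-linear down-sampling operator D_S, and all sizes are toy values:

    import numpy as np

    def downsampling_matrix(n):
        """D_S in R^{n x 2n}: averages adjacent sample pairs (a simple
        stand-in for the paper's bi-linear down-sampling operator)."""
        D = np.zeros((n, 2 * n))
        for i in range(n):
            D[i, 2 * i] = 0.5
            D[i, 2 * i + 1] = 0.5
        return D

    rng = np.random.default_rng(1)
    n, m = 128, 64                      # LR size n, per-dimension measurements m
    R_lr = rng.standard_normal((m, n))  # LR sensing matrices (i.i.d. Gaussian)
    G_lr = rng.standard_normal((n, m))

    D = downsampling_matrix(n)
    R_hr = R_lr @ D                     # Eq. (7): R_HR = R_LR D_S
    G_hr = D.T @ G_lr                   #          G_HR = D_S^T G_LR

    F_hr = rng.standard_normal((2 * n, 2 * n))
    F_lr = D @ F_hr @ D.T               # Eq. (6): down-sampled image

    # The HR and LR sensing paths yield identical measurements (Y_HR = Y_LR)
    Y_hr = R_hr @ F_hr @ G_hr
    Y_lr = R_lr @ F_lr @ G_lr
    assert np.allclose(Y_hr, Y_lr)

The assertion holds by construction: R_HR F_HR G_HR = R_LR (D_S F_HR D_S^T) G_LR = R_LR F_LR G_LR, which is exactly the shared-measurement property the proposed acquisition relies on.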

3.2 Multi-Resolution CS Reconstruction

Because the HR and LR image sensing is designed to share the same measurements, reconstructing the HR and LR images is straightforward using TV [6] without modification; this is the conventional sensing and recovery method (called TV in Table 1). In this paper, by contrast, the LR image is reconstructed first from the measurements Y_LR using the sensing matrices R_LR, G_LR, and an SR technique, such as bi-cubic interpolation [16], is simply applied to the LR image to generate the HR image; this is denoted Super-Resolution-assisted Total Variation reconstruction (SRTV) in Table 1. Thanks to the SR technique, some details can be obtained in the reconstructed HR image, and a better SR algorithm is expected to yield higher performance.

Both the LR and HR images contain significant staircase artifacts. This drawback is overcome by post-processing [10] built on BM3D [9]. By reconstructing the residual image through iterative filtering, the staircase artifacts can be removed effectively owing to the structure-preserving property of the BM3D filter. The details of the algorithm are presented in Table 2. The structural similarity (SSIM) [15] between two consecutive iterations is selected as the stopping criterion because the aim is to preserve nonlocal structure. When the BM3D post-processing is applied after reconstructing the LR image and the SR technique is then applied, the algorithm is called SRTV+BM3D. Moreover, because the HR sensing matrices (R_HR, G_HR) can be obtained using (7), the reconstructed HR image of SRTV+BM3D can be refined further by BM3D post-processing; this scheme is called SRTV+BM3D2 in Table 1. By iteratively removing the staircase artifacts in both the low and high resolution images, the proposed SRTV+BM3D2 is expected to achieve the highest reconstruction performance.
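A schematic rendering of this post-processing loop is sketched below, in the spirit of [10] rather than as its exact algorithm: the crude data-consistency step, the use of the third-party PyPI bm3d package (any structure-preserving denoiser could stand in), and the SSIM-based stopping test are all assumptions; the tol and σ values follow Section 4.1.

    import numpy as np
    from skimage.metrics import structural_similarity as ssim
    import bm3d   # third-party package exposing bm3d.bm3d(image, sigma_psd)

    def bm3d_post_process(x0, R, G, Y, sigma=10 / 255.0, tol=0.002, max_iter=20):
        """Alternate BM3D denoising with a measurement-consistency step,
        stopping when SSIM between consecutive iterates saturates.
        Assumes images scaled to [0, 1] (hence sigma = 10/255)."""
        x = x0.copy()
        prev = x.copy()
        for _ in range(max_iter):
            x = bm3d.bm3d(x, sigma_psd=sigma)      # remove staircase artifacts
            residual = Y - R @ x @ G               # measurement-domain residual
            x = x + R.T @ residual @ G.T           # crude data-consistency update
            s = ssim(x, prev, data_range=float(x.max() - x.min()))
            if 1.0 - s < tol:                      # SSIM-based stopping criterion
                break
            prev = x.copy()
        return x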

4. Experimental Results

In this section, the effectiveness of the proposed SRTV and its variants, SRTV+BM3D and SRTV+BM3D2, is validated by comparing their objective and subjective performance with TV [6], as listed in Table 1.

Table 1. Description of the reconstruction algorithms.

Algorithm       Description
TV [6]          Conventional TV recovery [6] based on KCS.
SRTV*           SR-assisted TV reconstruction: recover LR by TV [6], then use SR to obtain HR.
SRTV+BM3D*      Post-process the recovered LR image by BM3D before using SR to obtain HR.
SRTV+BM3D2*     Dual BM3D post-processing for SRTV: after recovery by SRTV+BM3D, apply the BM3D post-filter again to HR.

(* Proposed)

Table 2. Description of the post processing algorithm [10].



4.1 Parameter Setting for the Experiment

For an original (high) resolution image of size 2n x 2n (n = 256), the Kronecker compressive sensing measurements were obtained by R_HR and G_HR of size ⌈2n√r⌉ x 2n, where r denotes the intended subrate of the HR image and ⌈.⌉ stands for the ceiling operator. The HR compressive measurements are identical to the LR compressive measurements generated, in the proposed approach, from the Gaussian matrices R_LR and G_LR of size ⌈2n√r⌉ x n at subrate 4r. The reconstruction parameters were set to λ = 0.5, ν = 0.05, and μ = 1 for all the recovery algorithms and the residual reconstructions as well. The stopping criterion was 4 x 10^-5 for the TV reconstruction, and tol < 0.002 with σ = 10 for the BM3D post-processing. All results were obtained by averaging five simulations.

Table 3 compares the PSNR and structural similarity index (SSIM) [15] values for test images of size 512x512 at subrates from 0.05 to 0.25. Fig. 3 presents all the test images. The proposed algorithms were compared with TVAL3 [18] with a block size of 64 and BCS-SPL-DDWT [2] with a block size of 32, because frame-based CS could not be used due to out-of-memory problems. The experimental environment was a computer with an Intel(R) Core(TM) i5 (3.3 GHz) CPU and 4 GB memory, running Windows 7 and Matlab 2012b.

Table 3. Performance comparison of various algorithms in PSNR and SSIM.

                   BCS-SPL [2]    TVAL3 [18]     TV [6]         SRTV*          SRTV+BM3D*     SRTV+BM3D2*
Image     Subrate  PSNR   SSIM    PSNR   SSIM    PSNR   SSIM    PSNR   SSIM    PSNR   SSIM    PSNR   SSIM
Lena      0.05     25.38  0.739   22.78  0.616   25.23  0.705   27.06  0.756   28.10  0.786   28.87  0.777
          0.10     28.03  0.804   25.80  0.706   28.06  0.770   30.14  0.829   30.88  0.846   32.21  0.838
          0.15     29.89  0.842   28.08  0.773   29.71  0.810   32.14  0.874   32.42  0.876   34.06  0.873
          0.20     31.35  0.864   29.73  0.820   31.06  0.839   33.55  0.903   33.31  0.892   35.15  0.897
          0.25     32.46  0.887   31.21  0.854   32.19  0.860   34.45  0.924   33.78  0.901   35.77  0.914
Barbara   0.05     21.58  0.580   20.46  0.487   20.85  0.498   22.42  0.582   22.99  0.619   22.94  0.740
          0.10     22.57  0.640   21.98  0.562   22.30  0.555   23.84  0.665   24.43  0.707   24.64  0.839
          0.15     23.37  0.685   22.86  0.818   23.18  0.597   24.64  0.725   25.01  0.756   25.39  0.899
          0.20     24.16  0.722   23.53  0.666   23.97  0.638   25.20  0.775   25.25  0.779   25.62  0.929
          0.25     24.90  0.755   24.12  0.706   24.90  0.673   25.36  0.809   25.36  0.790   25.69  0.944
Peppers   0.05     25.66  0.736   22.01  0.585   25.25  0.707   26.79  0.761   27.79  0.787   28.64  0.815
          0.10     28.96  0.796   25.40  0.675   28.59  0.771   29.50  0.825   30.06  0.833   31.10  0.868
          0.15     30.78  0.827   27.86  0.742   30.41  0.804   30.69  0.856   30.88  0.852   31.97  0.891
          0.20     32.02  0.848   29.83  0.791   31.85  0.805   31.43  0.875   31.30  0.863   32.42  0.906
          0.25     32.95  0.864   31.57  0.829   32.91  0.849   31.86  0.894   31.52  0.869   32.68  0.917
Cameraman 0.05     23.03  0.750   21.88  0.634   24.93  0.747   26.59  0.794   27.88  0.826   28.83  0.910
          0.10     25.97  0.826   25.16  0.754   28.11  0.822   30.38  0.880   31.41  0.896   33.03  0.953
          0.15     28.41  0.873   27.51  0.823   30.33  0.865   32.97  0.928   33.34  0.928   35.36  0.969
          0.20     30.43  0.904   29.52  0.873   32.08  0.895   34.83  0.956   34.61  0.947   37.01  0.980
          0.25     32.12  0.925   31.08  0.903   33.48  0.913   36.40  0.975   35.39  0.957   38.15  0.988
Goldhill  0.05     24.27  0.598   23.18  0.535   23.99  0.559   25.38  0.612   25.77  0.623   26.06  0.662
          0.10     26.91  0.676   25.67  0.630   26.12  0.642   27.78  0.712   28.02  0.724   28.62  0.760
          0.15     28.04  0.726   27.16  0.693   27.34  0.694   29.46  0.780   29.47  0.770   30.29  0.816
          0.20     28.90  0.761   28.39  0.744   28.38  0.738   30.75  0.831   30.48  0.822   31.45  0.854
          0.25     29.67  0.792   29.37  0.782   29.29  0.772   31.64  0.865   31.11  0.840   32.19  0.881
Boats     0.05     22.97  0.586   21.46  0.515   22.53  0.547   23.98  0.604   24.39  0.621   24.74  0.642
          0.10     25.25  0.665   24.02  0.613   24.75  0.628   26.49  0.708   26.94  0.724   27.72  0.720
          0.15     26.65  0.717   25.63  0.680   26.16  0.684   28.26  0.781   28.52  0.786   29.59  0.776
          0.20     27.77  0.756   26.98  0.731   27.40  0.729   29.48  0.832   29.43  0.822   30.59  0.811
          0.25     28.68  0.787   28.22  0.773   28.46  0.766   30.21  0.862   29.87  0.840   31.04  0.829
Average            27.44  0.764   26.08  0.710   27.46  0.729   29.12  0.806   29.32  0.809   30.39  0.853

(* Proposed)


4.2 Results and Discussions

As shown in Table 3, the SRTV, SRTV+BM3D, and SRTV+BM3D2 algorithms outperformed the conventional methods, TV [6], BCS-SPL-DDWT [2], and TVAL3 [18]. Moreover, SRTV+BM3D2 showed the best performance in most cases. The results confirm that the proposed idea of sensing the LR image at a high subrate performs better than conventional CS. Even with a simple SR (e.g., bi-cubic interpolation) applied to the reconstructed LR image, the SRTV algorithm still achieved a 1.5 dB gain on average over the conventional KCS employing TV [6]. With the structure-preserving BM3D post-processing of the LR image, SRTV+BM3D gives an additional gain of 0.3 dB. By building the HR sensing matrix as in (7), BM3D post-processing can be applied twice, to both the LR and HR images, in the proposed SRTV+BM3D2, which offers up to a 2.7 dB gain (for the Cameraman image at subrate 0.25) over the single application of the BM3D post-filter in SRTV+BM3D. In particular, the proposed SRTV+BM3D2 algorithm demonstrated the best PSNR performance on most test images, with mean gains of 2.94 dB and 1 dB over TV [6] and SRTV+BM3D, respectively.

Because only an LR image is measured and a simple SR technique (e.g., bi-cubic interpolation) is used to generate the HR image, the HR image often suffers from a loss of image details or texture (i.e., loss of high frequency components). In addition, the SR technique is reported to generate a smooth HR image and is quite effective in up-sampling a smoothed image [8]. Therefore, the proposed method performs best on very smooth images, such as Peppers, Cameraman, and Goldhill. The conventional KCS cannot capture and recover high frequency components (e.g., edges or details) well if it takes a very small number of measurements (i.e., at a very low subrate). Therefore, the image reconstructed by conventional CS at a very low subrate loses the high frequency components and has very low quality, as visualized in Fig. 1.

In effect, some high frequency information is discarded by the proposed HR sensing matrix, which corresponds to sensing the LR image at a high subrate. Because CS reconstruction works well at a high subrate and the SR technique can provide some level of detail, the SRTV algorithm, even without post-processing, can still improve the performance on images with complex texture, such as Lena and Boats. Obviously, higher performance is achieved with SRTV+BM3D and SRTV+BM3D2 by exploiting the structures of the image via BM3D. The proposed algorithm can reconstruct both the LR and HR images with high quality, as shown in Fig. 4 (Cameraman image at subrate 0.1).

Table 4 lists the reconstruction times of the LR and HR images for the various reconstruction algorithms. The reconstructed LR images could be obtained within approximately 3 seconds (SRTV, without BM3D) or 14 seconds (SRTV+BM3D and SRTV+BM3D2, with BM3D). The computational cost of reconstructing the HR images ranged from about 4 sec for SRTV to 16 sec for SRTV+BM3D, and 85 sec for SRTV+BM3D2, including the LR reconstruction time. Therefore, an appropriate reconstruction method can be chosen based on the computational capability of the decoder. As shown in Table 5, a very high quality LR image can be obtained using the proposed framework.

In addition, the same conclusion can be drawn in terms of the visual quality of the reconstruction algorithms, as depicted in Fig. 5 for the Lena image.

Table 4. Reconstruction time (sec) of the LR and HR Lena image at subrates 0.05 and 0.15.

                 HR reconstruction     LR reconstruction
Algorithm          0.05      0.15        0.05      0.15
BCS-SPL [2]       51.65     24.98         -         -
TVAL3 [18]        18.36     37.43         -         -
TV [6]            38.40     26.02         -         -
SRTV*              3.27      3.08        3.26      3.07
SRTV+BM3D*        15.02     14.83       14.99     14.82
SRTV+BM3D2*       85.12     85.70       15.08     14.96

(* Proposed)

Table 5. Performance of the LR image with/without BM3D post-processing, in PSNR (dB).
(w/o BM3D corresponds to SRTV; with BM3D corresponds to SRTV+BM3D and SRTV+BM3D2.)

                                 Subrate
Image       Algorithm      0.05    0.10    0.15    0.20    0.25
Lena        w/o BM3D      28.32   33.01   37.32   42.57   64.78
            with BM3D     30.08   34.65   38.63   43.24   52.38
Barbara     w/o BM3D      24.81   27.94   31.35   36.81   57.56
            with BM3D     26.03   30.33   35.07   40.47   51.39
Peppers     w/o BM3D      29.00   34.36   38.66   43.54   65.86
            with BM3D     31.32   36.16   39.72   43.82   52.46
Cameraman   w/o BM3D      28.11   33.59   37.96   42.81   63.99
            with BM3D     29.73   34.73   38.52   42.92   52.67
Goldhill    w/o BM3D      26.73   30.49   34.15   38.92   64.43
            with BM3D     27.46   31.13   34.46   39.09   50.29
Boats       w/o BM3D      25.58   29.90   34.25   39.91   62.57
            with BM3D     26.24   30.61   34.89   40.06   50.81

Fig. 3. Test images (512x512), from left to right and top to bottom: Lena, Barbara, Peppers, Cameraman, Goldhill, and Boats.


Obviously, the proposed algorithm produced the best visual quality, with clearer textured regions (e.g., Lena's hair and hat) and fewer staircase artifacts. However, the reconstructed HR image loses some high frequency information (e.g., the details of Lena's hat, and blurring in the hair region). Moreover, a simple SR technique such as bi-cubic interpolation also smooths the edges in the HR image. Therefore, some high frequency information, such as edges and fine detail, is lost in the reconstructed HR image. In general, higher recovery performance of the HR image is expected from a better-performing SR method; the task of better preserving such details is left for future work.

5. Conclusion

This paper proposed a novel multi-resolution Kronecker compressive sensing scheme that allows simple spatially scalable compressive imaging. The proposed scheme not only provides a high quality low resolution image but also significantly improves the reconstruction performance of the high resolution image, particularly with a small number of measurements.

Fig. 4. Reconstructed images by SRTV+BM3D2 at subrate 0.1: low resolution (256x256) and high resolution (512x512).

Fig. 5. Visual quality comparison of several reconstruction algorithms at subrate 0.1: Original; TV [6], 28.06 dB; SRTV, 30.15 dB; SRTV+BM3D, 30.88 dB; SRTV+BM3D2, 32.31 dB.


A future study will extend this sampling scheme further to obtain a scalable compressive sensing framework and to exploit predictive coding between the base and enhancement layers.

Acknowledgement

This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIP) (No. 2011-001-7578).

References

[1] D. Donoho, "Compressed sensing," IEEE Trans. Info. Theory, vol. 52, no. 4, pp. 1289-1306, 2006. Article (CrossRef Link)
[2] S. Mun and J. E. Fowler, "Block compressed sensing of images using directional transforms," in Proc. IEEE Intern. Conf. on Image Process. (ICIP), pp. 3021-3024, USA, 2009. Article (CrossRef Link)
[3] K. Q. Dinh, H. J. Shim, and B. Jeon, "Measurement coding for compressive imaging based on structured measurement matrix," in Proc. IEEE Intern. Conf. on Image Process. (ICIP), pp. 10-13, 2013. Article (CrossRef Link)
[4] M. Duarte and R. Baraniuk, "Kronecker compressive sensing," IEEE Trans. Image Process., vol. 21, no. 2, pp. 494-504, 2012. Article (CrossRef Link)
[5] T. Goldstein and S. Osher, "The split Bregman method for L1 regularized problems," SIAM J. on Imaging Sci., vol. 2, no. 2, pp. 323-343, 2009. Article (CrossRef Link)
[6] S. Shishkin, H. Wang, and G. Hagen, "Total variation minimization with separable sensing operator," in Proc. Conf. on Image and Signal Process. (ICISP), pp. 86-93, 2010. Article (CrossRef Link)
[7] T. N. Canh, K. Q. Dinh, and B. Jeon, "Total variation for Kronecker compressive sensing with new regularization," in Proc. Pic. Coding Symp. (PCS), pp. 261-264, 2013. Article (CrossRef Link)
[8] A. K. Katsaggelos, R. Molina, and J. Mateos, Super Resolution of Images and Video, Morgan & Claypool, 2007. Article (CrossRef Link)
[9] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, "Image denoising by sparse 3D transform-domain collaborative filtering," IEEE Trans. Image Process., vol. 16, no. 8, pp. 2080-2095, 2007. Article (CrossRef Link)
[10] Y. Kim, H. Oh, and A. Bilgin, "Video compressed sensing using iterative self-similarity modeling and residual reconstruction," J. of Electron. Imaging, vol. 22, no. 2, p. 021005, 2013. Article (CrossRef Link)
[11] D. Valsesia and E. Magli, "Spatially scalable compressed image sensing with hybrid transform and inter-layer prediction model," in IEEE Intern. Workshop on Multimedia Signal Process. (MMSP), pp. 373-378, 2013. Article (CrossRef Link)
[12] A. Sankaranarayanan, C. Studer, and R. Baraniuk, "CS-MUVI: Video compressive sensing for spatial-multiplexing cameras," in IEEE Intern. Conf. Computational Photography (ICCP), pp. 1-10, Apr. 2012. Article (CrossRef Link)
[13] T. Goldstein, L. Xu, K. F. Kelly, and R. G. Baraniuk, "The STOne transform: multi-resolution image enhancement and real-time compressive video," arXiv:1311.3405, 2013. Article (CrossRef Link)
[14] M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk, "Single-pixel imaging via compressive sampling," IEEE Signal Process. Mag., vol. 25, pp. 83-91, 2008. Article (CrossRef Link)
[15] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, "Image quality assessment: From error measurement to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600-612, 2004. Article (CrossRef Link)
[16] R. G. Keys, "Cubic convolution interpolation for digital image processing," IEEE Trans. Acoustics, Speech, Signal Process., vol. 29, no. 6, pp. 1153-1160, 1981. Article (CrossRef Link)
[17] J. Y. Park and M. B. Wakin, "A multiscale framework for compressive sensing of video," in Proc. of Pict. Coding Symp. (PCS), pp. 1-4, 2009. Article (CrossRef Link)
[18] C. Li, W. Yin, and Y. Zhang, "An efficient augmented Lagrangian method with applications to total variation minimization," Comput. Optimization and Applications, vol. 56, no. 3, pp. 507-530, 2013. Article (CrossRef Link)

Thuong Nguyen Canh received his B.S. degree in Electronics and Telecommunications from Hanoi University of Science and Technology, Hanoi, Vietnam, in 2012. He is currently a master's student in Electrical and Computer Engineering at Sungkyunkwan University, Suwon, Korea. His current research involves image/video compression and compressed sensing.

Khanh Quoc Dinh received his B.S. degree in Electronics and Telecommunications from Hanoi University of Science and Technology, Hanoi, Vietnam, in 2010, and an M.S. degree in Electrical and Computer Engineering from Sungkyunkwan University, Suwon, Korea, in 2012. He is currently a Ph.D. candidate in the Digital Media Laboratory at Sungkyunkwan University. His research interests include video compression and compressive sensing.


Byeungwoo Jeon received his BS degree in 1985 and an MS degree in 1987 from the Department of Electronics Engineering, Seoul National University, Seoul, Korea. He received his PhD degree in 1992 from the School of Electrical Engineering at Purdue University, Indiana, United States. From 1993 to 1997, he was in the Signal Processing Laboratory at Samsung Electronics in Korea, where he worked on video compression algorithms, designing digital broadcasting satellite receivers, and other MPEG-related research for multimedia applications. Since September 1997, he has been with the faculty of the School of Electronic and Electrical Engineering, Sungkyunkwan University, Korea, where he is currently a professor. His research interests include multimedia signal processing, video compression, statistical pattern recognition, and remote sensing.

Copyrights © 2014 The Institute of Electronics and Information Engineers


IEIE Transactions on Smart Processing and Computing, vol. 3, no. 1, February 2014 http://dx.doi.org/10.5573/IEIESPC.2014.3.1.28


IEIE Transactions on Smart Processing and Computing

Study on Changes in Shape of Denatured Area in Skull-mimicking Materials Using Focused Ultrasound Sonication

JeongHwa Min1, JuYoung Kim2, HyunDu Jung2, JaeYoung Kim2, SiCheol Noh3, and HeungHo Choi1,2

1 Department of Medical Imaging Science, Inje University / Gimhae, 621-749, Korea [email protected]
2 Department of Biomedical Engineering, Inje University / Gimhae, 621-749, Korea {kjy96, flowerdu87, kjy10}@bse.inje.ac.kr, [email protected]
3 Department of Radiological Science, International University of Korea / Jinju, 660-759, Korea [email protected]

* Corresponding Author: HeungHo Choi

Received October 20, 2013; Revised October 31, 2013; Accepted November 15, 2013; Published February 28, 2014

* Extended from a Conference: Preliminary results of this paper were presented at the ICEIC 2014. This paper has been accepted by the editorial board through the regular reviewing process that confirms the original contribution.

Abstract: Recently, ultrasound therapy has become a new and effective treatment for many brain diseases. Accordingly, skull-mimicking phantoms have been developed to simulate the human skull and brain tissue and to enable further research into ultrasound therapy. In this study, the suitability of various skull-mimicking materials (HDPE, POM C, and acrylic) for studies of brain-tumor treatment was evaluated using focused ultrasound. The acoustic properties of the three synthetic resins were measured, and the skull-mimicking materials were then combined with an egg-white phantom to observe the differences in ultrasound beam distortion according to the type of material. High-density polyethylene was found to be suitable as a skull-mimicking material because its acoustic properties and denatured-area shape were close to those of the skull. Based on this evaluation, a skull-mimicking phantom with a multi-layer structure was produced, making it possible to predict the denaturation in a skull exposed to focused ultrasound. This will be useful for developing therapeutic protocols for a range of brain diseases in the future.

Keywords: Focused ultrasound, Skull-mimicking material, Acoustic properties

1. Introduction

Ultrasound has been used in surgical therapeutic procedures for tumors associated with prostate cancer, liver cancer, pancreatic cancer, rectal cancer, kidney cancer, and breast cancer. In the case of brain tumors, however, ultrasound is strongly attenuated by the skull, and brain damage near the skull can result from the high temperatures caused by energy loss in the skull. An ultrasound beam can also be distorted by the skull's high sound speed and variations in its thickness [1]. In addition, the technique of removing a piece of skull prior to sonication carries a high risk of complications. On the other hand, the recent development of new phased arrays and compensation methods based on computed tomography (CT) imaging has enabled effective energy transfer through an intact skull [2] without damaging the surrounding tissue. Therefore, the utility of non-invasive ultrasound surgery for the treatment of areas deep inside the brain has attracted considerable attention. In addition, brain tumor treatment is now possible using high-intensity pulsed ultrasound. Moreover, ultrasound can be used to open the blood–brain barrier, to activate drug delivery, and to treat functional disorders. Therefore, the possibility of treating brain diseases with ultrasound has increased, and noninvasive focused ultrasound for treating brain disorders is attracting increasing attention. If the market for brain treatments involving ultrasound expands, the development of human head phantoms will become one of the main goals in the field of therapeutic ultrasound.

Hard tissues include cortical bone, trabecular bone, and dental hard tissues. The skull contains both trabecular bone and cortical bone. Therefore, it is important to determine the appropriate ranges of acoustic properties to use in therapeutic ultrasound, because cortical bone has a relatively high sound velocity and attenuation coefficient. Cortical bone has a relatively isotropic and dense structure; therefore, epoxy resin, acrylic, carbon-fiber plastic, and similar materials can be used as mimic materials [3, 4]. Before clinical treatment, many studies have been conducted to evaluate the heating characteristics of focused ultrasound, with the aim of developing various brain models and focused ultrasound techniques. On the other hand, there have been few studies of a comprehensive skull-mimicking phantom that allows a performance evaluation and the establishment of a treatment protocol using actual ultrasound therapy equipment. Considering the potential of therapeutic ultrasound technology and the increasing number of patients receiving treatment, a quantitative phantom for such biologically realistic performance evaluation is urgently needed. In this study, to manufacture a skull-mimicking phantom, the acoustic and physical properties of the skull were examined, and appropriate skull mimic materials were investigated. The suitability of three synthetic resins (HDPE, acrylic resin, POM C) as skull-mimicking materials was evaluated by measuring their acoustic properties. In addition, the thermal denaturation at the rear of each skull-mimicking material was observed using a tissue-mimicking material (TMM) phantom based on egg whites.

2. Materials and Methods

2.1 Skull-mimicking Materials

The skull contains both trabecular bone and cortical bone. Therefore, a skull mimic material must have acoustic and thermal properties similar to those of the hard tissues (cortical bone, trabecular bone, dental hard tissues) found in the skull. The mimic material should be able to withstand long-term use in repeated experiments. In this study, the physical properties and acoustic properties of various polymeric synthetic resins were considered to develop a skull-mimicking phantom for the treatment of brain disease. On the other hand, the shape and structural characteristics were not considered.

Fig. 1 shows the polymeric synthetic resin specimens used in this study, and Table 1 lists the physical properties of the specimens. Polyacetal (POM C) is used mainly for orthopedic implants and artificial hip joints.

High-density polyethylene (HDPE) has been used as a substitute material for bone in many studies [5, 6]. Acrylic resin is used as a simple alternative bone material.

The general features of the specimens used in this study are as follows:

① POM C: The material is extruded from a homopolymer acetal material or copolymer. Therefore, it has high crystallinity, fatigue resistance, rub resistance, and machinability.
② HDPE: The hardness is excellent even if the temperature rises rapidly. HDPE also has excellent chemical resistance and processability, and is avirulent.
③ Acrylic resin: The material has a high colorless transparency, as well as good workability and formability.

2.2 Evaluation of Acoustic Properties

Three skull mimic materials were used in this study: POM C, HDPE, and acrylic resin. Each of the samples had a width of 50 mm, a length of 50 mm, and a thickness of 6 mm. To measure the acoustic properties, a single 3.5-MHz transducer was used to send and receive the ultrasonic signals. A digital oscilloscope (WaveRunner 6100 model, Lecroy Co., USA) was used to acquire and save the signal, and an ultrasonic pulser/receiver (MKPR-1030 model, MKC KOREA Co., Korea) was used together with a function generator. The sound velocity was measured using a pitch-catch method, instead of the echo-range method often used in non-destructive testing (Figs. 2 and 3). A single 1-MHz oscillator was used to generate an ultrasound pulse, and the signal was received using another single oscillator.
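As a rough illustration of the time-of-flight computation behind such a pitch-catch measurement, the following Python sketch applies the substitution method (sample inserted into a known water path); the water sound speed and the measured arrival-time advance are assumed example values, not data from the paper.

```python
import numpy as np

def sound_velocity_pitch_catch(d, c_water, dt):
    """Substitution method for a through-transmission (pitch-catch) setup:
    with the sample inserted in the water path, the pulse arrives earlier by
    dt = d/c_water - d/c_sample, so 1/c_sample = 1/c_water - dt/d."""
    return 1.0 / (1.0 / c_water - dt / d)

d = 6e-3           # sample thickness, m (6 mm, as in the paper)
c_water = 1480.0   # sound speed in water, m/s (assumed)
dt = 1.86e-6       # measured arrival-time advance, s (illustrative value)
c = sound_velocity_pitch_catch(d, c_water, dt)
print(f"sample sound speed: {c:.0f} m/s")   # about 2,700 m/s for these inputs
```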

Fig. 1. Skull-mimicking materials: from left, HDPE, POM C, acrylic resin.

Table 1. Properties of the materials.

Components                                      HDPE    POM C      Acrylic resin  ABS resin
Density (g/cm³)                                 0.94    1.41       1.19           1.04
Thermal properties
  Melting point (°C)                            120     165        -              -
  Heat conductivity (W/(K·m))                   -       0.31       -              0.25
  Heat deflection temperature (°C)              47      105        96             97
Mechanical properties
  Tensile strength (MPa)                        23      68         72             41
  1/2/5% deflection compression strength (MPa)  29      19/35/67   -              -


The attenuation coefficient was measured using the spectral difference method. In the spectral difference method, the amplitude of the reflected signal is obtained in the frequency domain using a fast Fourier transform (FFT); after connecting the peak points at the center frequencies of the signals and obtaining a slope, the attenuation coefficient is determined (Figs. 4 and 5). The density was calculated using the mass and volume of the material, and the acoustic impedance was calculated using the sound speed and density.
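The spectral difference fit and the impedance relation Z = ρc can be sketched in Python as follows; the echo-mode loss model, the fitting band, and the function names are illustrative assumptions (only the HDPE density and sound speed measured later, in Table 2, are taken from the paper).

```python
import numpy as np

def attenuation_spectral_difference(sig_ref, sig_sample, fs, d_cm, fmin, fmax):
    """Spectral difference method for an echo-mode measurement: the dB ratio
    of the reference and sample spectra over the two-way path 2*d_cm is fit
    with a line against frequency (MHz); the slope is the attenuation
    coefficient in dB/cm/MHz."""
    f = np.fft.rfftfreq(sig_ref.size, 1 / fs) / 1e6      # frequency axis, MHz
    A_ref = np.abs(np.fft.rfft(sig_ref))
    A_sam = np.abs(np.fft.rfft(sig_sample))
    band = (f >= fmin) & (f <= fmax)
    loss_db = 20 * np.log10(A_ref[band] / A_sam[band]) / (2 * d_cm)  # dB/cm
    slope, _ = np.polyfit(f[band], loss_db, 1)
    return slope                                         # dB/cm/MHz

# Acoustic impedance from density and sound speed: Z = rho * c
rho, c = 958.0, 2713.1       # HDPE: 0.958 g/cm^3 and 2,713.1 m/s (Table 2)
Z = rho * c
print(f"Z = {Z / 1e6:.3f} x 10^6 kg/m^2/s")  # ~2.6, close to the 2.581 in Table 2
```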

2.3 Observation of the Changes in Focusing Using an Egg White Phantom

The shape distortions and propagation patterns at the rear sides of the skull-mimicking materials during focused ultrasound exposure were compared.

Fig. 2. Schematic diagram of pitch-catch method.

Fig. 3. Sound velocity calculation using the pitch-catch method.

Fig. 4. Schematic diagram of the echo-range method.

Fig. 5. Reflected signal for the attenuation coefficient calculation using the echo-range method: in water (top), in phantom (bottom).


The egg-white phantom, which has thermal and electrical properties similar to those of biological tissue, can effectively visualize increases in temperature. A skull-mimicking phantom and a 30% egg-white phantom were used. The center frequency of the transducer (H-101, Sonic Concepts Co.) was 1.1 MHz, and 60-s sonications were delivered at acoustic intensities between 20 and 90 W/cm² with 3.5 Vp-p. The levels of thermal denaturation were compared according to the presence or absence of a material similar to the skull. The focal point was formed within the egg-white phantom to compare the denaturation sizes and sites. A digital camera (D90, Nikon, Japan) was used for real-time recording and to observe the time of the initial denaturation. Fig. 6 shows the denaturation observation experiment.

3. Results

3.1 Evaluation of the Acoustic Properties

In this study, the acoustic properties of three synthetic resins were measured to evaluate their suitability as skull-mimicking materials. Table 2 lists the acoustic properties of the skull-mimicking materials. The sound velocity of a real human skull, which the materials should emulate, is 2,740-4,300 m/s; the attenuation coefficient is 4.6-12.5 dB/cm-MHz; and the acoustic impedance is 7.8 × 10⁶ kg/m²/s, as reported elsewhere [7]. Among the skull-mimicking materials tested, the sound speed was highest for HDPE (2,713 m/s) and lowest for POM C, and the measured speeds were generally lower than those reported in the literature for the skull. The attenuation coefficient was highest for POM C (7.239 dB/cm-MHz), which was similar to the attenuation coefficient of the human skull; its density and acoustic impedance were also the highest. The acrylic resin had an attenuation coefficient similar to that of a skull, but that of HDPE was not within the reported range. The acoustic impedances of all the materials tested showed no significant differences.

3.2 Observation of Changes in Focusing by Using Egg White Phantom

This study compared the shapes of the areas of denaturation at the rear sides of the three synthetic resins during focused ultrasound exposure. A single egg-white phantom not combined with a skull phantom was used as the control group. In the control group, the denaturation changed from an oval shape to a tadpole shape as the intensity was increased (20-90 W/cm²), and the head and tail lengths of the denatured area increased. In the case of POM C, cigar- and oval-shaped denaturation was generally observed, and no consistent pattern emerged as the intensity increased. The denaturation area observed for HDPE had a cigar shape at 20-30 W/cm² and a tadpole shape at 40-90 W/cm². HDPE appeared to be most similar to the control group, apart from a delay in the time to first denaturation; therefore, this skull-mimicking material appears to be suitable. The acrylic resin was excluded as a candidate skull-mimicking material because of surface deformation caused by repeated exposure. Fig. 7(a) shows the changes in the denaturation shape of the egg-white phantom of the control group according to the intensity. Fig. 7(b) shows how an increase in intensity changed the denaturation of the egg-white phantom at the back of the POM C, and Fig. 7(c) shows the same for HDPE. Fig. 8 presents the form of the denaturation in the phantom structure.

Figs. 9 and 10 show the distance from the surface, and the lateral length of denaturation according to the intensity, respectively. The distance from the surface decreased gradually and the lateral length increased with increasing intensity. These results suggest that when the site of the focal point is close to the transducer, the energy transferred to the focal point increases with increasing acoustic intensity.

4. Discussion and Conclusion

This study examined the acoustic and physical properties of the skull reported in the literature, and proposed suitable synthetic skull-mimicking materials. The thermal resistance of the skull-mimicking materials was evaluated using ultrasound.

Fig. 6. Experimental setup for denaturation observation.

Table 2. Acoustic properties of the skull-mimicking materials.

Acoustic parameters                   *Skull        HDPE     POM C    Acrylic
Sound velocity (m/s)                  2,740-4,300   2,713.1  2,485.6  2,549.7
Attenuation coefficient (dB/cm-MHz)   4.6-12.5      2.948    7.239    4.308
Density (g/cm³)                       -             0.958    1.407    1.214
Acoustic impedance (×10⁶ kg/m²/s)     7.8           2.581    3.481    3.096


Fig. 7. Denaturation of an egg white phantom: (a) Control, (b) POM C, (c) HDPE.


Acrylic resin was excluded as a candidate skull-mimicking material due to surface deformation caused by repeated exposure. In the evaluation of the acoustic properties, POM C was found to have the highest attenuation coefficient among the skull mimic materials, but its denaturation was difficult to verify as the intensity increased; therefore, it was also excluded. HDPE had a relatively high attenuation coefficient, sound velocity, and acoustic impedance, and considering that its density was the most similar to that of the skull, it appeared to be suitable for a skull-mimicking phantom. Its disadvantage is an actual sound velocity lower than that of the skull (approximately 85%-90%). Nevertheless, HDPE caused the least distortion of the ultrasound beam, and its denaturation was the closest to the original form; therefore, HDPE could be a useful skull-mimicking material for evaluating ultrasound brain surgery. In this study, a skull-mimicking phantom with a multi-layer structure was produced, and the denaturation in the skull caused by focused ultrasound was predicted. This will be useful for the development of therapeutic protocols for a variety of brain diseases in the future.

Acknowledgement

This study was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (No. 2012R1A1A2043564).

References

[1] Kullervo Hynynen et al., "Pre-clinical testing of a phased array ultrasound system for MRI-guided noninvasive surgery of the brain - A primate study", European Journal of Radiology, 59, pp. 149-156, April 2006. Article (CrossRef Link)

[2] Kullervo Hynynen et al., "Demonstration of potential noninvasive ultrasound brain therapy through an intact skull", Ultrasound in Med. & Biol., Vol. 24, No. 2, pp. 275-283, October 1997. Article (CrossRef Link)

[3] A. J. Clarke, J. A. Evans, J. G. Truscott, et al., "A phantom for quantitative ultrasound of trabecular bone", Phys. Med. Biol., 1994, 39: 1677-1687. Article (CrossRef Link)

[4] A. Tatarinov, I. Pontaga and U. Vilks, "Modeling the influence of mineral content and porosity on ultrasound parameters in bone by using synthetic phantoms", Mechanics of Composite Materials, 1999, 35. Article (CrossRef Link)

[5] Gilbert J. Vella, Victor F. Humphrey, Francis A. Duck, et al., "Ultrasound-induced heating in a foetal skull bone phantom and its dependence on beam width and perfusion", Ultrasound in Med. & Biol., 2003, 29: 779-788. Article (CrossRef Link)

[6] Gilbert J. Vella, Victor F. Humphrey, Francis A. Duck, "The cooling effect of liquid flow on the focused ultrasound-induced heating in a simulated foetal brain", Ultrasound in Med. & Biol., 2003, 29: 1193-1204. Article (CrossRef Link)

[7] Martin O. Culjat, David Goldenberg, Priyamvada Tewari, et al., "A review of tissue substitutes for ultrasound imaging", Ultrasound in Med. & Biol., 2010, 36: 861-873. Article (CrossRef Link)

Fig. 8. Form of a tadpole denaturation in the phantom: (a) Width, (b) Head, (c) Tail, (d) Full-length, (e) Distance from the surface.

Fig. 9. Distance from the surface of denaturation.

Fig. 10. Width of denaturation.


Jeong Hwa Min received her B.S. degree in Biomedical Engineering from Inje University, Gimhae, Korea, in 2012. She is currently pursuing an M.S. degree in Medical Imaging Science at Inje University. Her research interests include the development of phantoms related to the field of medical ultrasound and biological signal processing.

Ju Young Kim received her B.S. and M.S. degrees in Biomedical Engineering from Inje University, Gimhae, Korea, in 2000 and 2002, respectively. She is currently pursuing a Ph.D. degree in Biomedical Engineering at Inje University. Her research interests include medical ultrasound, rehabilitation, and biological signal processing.

Hyun Du Jung received his B.S. degree in Biomedical Engineering from Inje University, Gimhae, Korea, in 2012. He is currently pursuing an M.S. degree in Biomedical Engineering at Inje University. His research interests include medical ultrasound, biological signal processing, and cavitation monitoring.

Jae Young Kim received her B.S. degree in Biomedical Engineering from Inje University, Gimhae, Korea, in 2014. She is currently pursuing an M.S. degree in Biomedical Engineering at Inje University. Her research interests include the development of phantoms related to the field of medical ultrasound and biological signal processing.

Si Cheol Noh is an Assistant Professor of Radiological Science at International University of Korea, Jinju, Korea. He received his B.S., M.S., and Ph.D. degrees in Biomedical Engineering at Inje University, Korea, in 2000, 2002, and 2011, respectively. His research interests include medical ultrasound, ultrasonic cavitation, and applications of nonionizing radiation.

Heung Ho Choi is a Professor of Biomedical Engineering at Inje University, Gimhae, Korea. He received his B.S., M.S., and Ph.D. degrees in Electronic Engineering from Inha University, Korea, in 1984, 1986, and 1991, respectively. In 1993, he was an exchange professor at the University of Tokyo, Japan. He served as an editor of the Korean Society of Medical & Biological Engineering, and has served or is currently serving on the editorial and technical advisory committees of many journals, conferences, and companies in biomedical engineering. His research interests include medical ultrasound, health care, rehabilitation, and biological signal processing, in connection with which he has filed many patent applications. He has also published several books on medical ultrasound and has received a number of paper awards.

Copyrights © 2014 The Institute of Electronics and Information Engineers


IEIE Transactions on Smart Processing and Computing, vol. 3, no. 1, February 2014 http://dx.doi.org/10.5573/IEIESPC.2014.3.1.35


IEIE Transactions on Smart Processing and Computing

Modified MMSE Estimator based on Non-Linearly Spaced Pilots for OFDM Systems

Latif Ullah Khan

Department of Electrical Engineering, University of Engineering & Technology / Peshawar, Pakistan [email protected]

* Corresponding Author: Latif Ullah Khan

Received October 20, 2013; Revised October 31, 2013; Accepted November 15, 2013; Published February 28, 2014

* Short Paper

Abstract: This paper proposes a Modified Minimum Mean Square Error (M-MMSE) estimator for an Orthogonal Frequency Division Multiplexing (OFDM) system over a fast fading Rayleigh channel. The proposed M-MMSE estimator considers the effects of the efficient placement of pilots based on the channel energy distribution: the pilot symbols are placed in a non-linear manner according to the density of the channel energy. A comparative analysis of the MMSE estimator for a comb-type pilot arrangement and the M-MMSE estimator for the proposed pilot insertion scheme revealed a significant performance improvement of the M-MMSE estimator over the MMSE estimator.

Keywords: OFDM, Modified MMSE estimator, Comb-type channel estimation

1. Introduction

Orthogonal Frequency Division Multiplexing (OFDM) is a spectrally efficient multicarrier digital communication technique adopted by newly developed wireless communication standards. 802.11-based wireless Local Area Networks (WLANs), power line communication, and Long Term Evolution (LTE) use OFDM. Efficient use of the spectrum is due to the overlapping nature of the orthogonal subcarriers in the frequency domain. Channel estimation and equalization in OFDM are used to cancel out the impairments introduced by the multipath fading channel. The pilot arrangement in the OFDM symbol determines the performance of the channel estimation technique used to estimate the channel impulse response at the pilot tones. The Least Square (LS) estimator and the Minimum Mean Square Error (MMSE) estimator can be used to estimate the channel impulse response at the pilot tones. The MMSE estimator outperforms the LS estimator at the cost of high computational complexity [1]. A zero-forcing (ZF) estimator was used to estimate the channel frequency response at the pilot tones to cancel out the Inter-Carrier Interference (ICI) due to the Doppler shifts induced by the fading channel [2]. The performance of the ZF estimator is degraded compared to the MMSE estimator, but the MMSE estimator achieves its improved performance at the cost of the computational complexity associated with matrix inversion and the use of prior channel statistics.

Non-overlapping and overlapping techniques were used to partition the channel autocorrelation matrix into sub-matrices to reduce the complexity associated with the linear MMSE estimator [3]. The resulting simplified MMSE estimator outperformed the LS estimator but showed degraded performance compared to the MMSE estimator. The MMSE estimator was also realized using a DFT-based estimator [4]. Estimation of the noise variance and the channel autocorrelation matrix was performed using the channel impulse response estimated in the DFT-based channel estimation. A performance comparison of this estimator with the LS and MMSE estimators showed improved performance over the LS estimator but degraded performance compared to the MMSE estimator. On the other hand, its computational complexity was lower than that of the MMSE estimator because it does not require a power delay profile for its operation. Previous studies discussed the design of 2-D pilot patterns for frequency-selective fading channels [5-7]. The disadvantage of 2-D pilot patterns is their high complexity compared to 1-D pilot patterns, which makes them unattractive for practical implementation [8].


This paper proposes a novel Non-Linearly Spaced Pilot Insertion (NSPI) scheme based on the efficient placement of the pilots according to the channel energy for the OFDM system over the fast fading Rayleigh channel. The proposed NSPI scheme uses the channel energy distribution to insert the pilots into the OFDM symbol in a non-linear manner. Modifications to the MMSE estimator for the proposed NSPI scheme are also discussed. Comparative analysis of the Modified MMSE (M-MMSE) estimator for the NSPI scheme with the MMSE estimator for a comb-type pilot arrangement revealed significant performance improvement.

The remainder of the paper is organized as follows. Section 2 introduces the OFDM system model. MMSE estimation for a comb-type pilot arrangement is discussed in Section 3. Section 4 describes the proposed channel estimation algorithm. The simulation results are presented in Section 5, and the paper is concluded in Section 6.

Notations: Matrices are denoted by boldface italic uppercase letters and vectors are denoted by lowercase italic letters. Superscripts H and T are used to denote the Hermitian transpose and transpose, respectively.

2. OFDM System Model

In the OFDM system shown in Fig. 1, the input bit sequence is first mapped using the digital modulation scheme (such as BPSK) to yield the frequency-domain sequence of symbols. The frequency-domain sequence of symbols is divided into blocks with a size equal to the number of data subcarriers per OFDM symbol for both the comb-type channel estimation and the proposed NSPI scheme. The pilot symbols are inserted into each group of symbols at the dedicated subcarriers after converting the serialized sequence into a parallelized sequence for the comb-type and proposed NSPI schemes. Let $X^{(i)}(k)$, $k = 0, 1, 2, \dots, N-1$, denote the $i$th group of symbols. The conversion of the frequency-domain sequence of symbols into the time domain is performed using the Inverse Discrete Fourier Transform (IDFT). The $i$th group of symbols in the time domain can be expressed as

$$x^{(i)}(n) = \frac{1}{N} \sum_{k=0}^{N-1} X^{(i)}(k)\, e^{j 2\pi k n / N}, \quad n = 0, 1, \dots, N-1 \qquad (1)$$

The cyclic prefix of length $L_{CP}$ is added to the OFDM symbol to cancel the ISI, yielding the cyclic-prefixed $i$th OFDM symbol $x^{(i)}_{CP}$. Finally, the cyclic-prefixed $i$th group of the OFDM symbol is passed through the fading channel in the presence of AWGN:

$$y^{(i)}_{CP} = x^{(i)}_{CP} \otimes h^{(i)} + w^{(i)} \qquad (2)$$

where $w^{(i)}$ and $h^{(i)}$ are the AWGN noise vector and the time-domain channel impulse response vector, respectively, and $\otimes$ denotes convolution. At the receiver side, the cyclic prefix is removed from the received signal to cancel out the ISI: the first $L_{CP}$ symbols are dropped from $y^{(i)}_{CP}$ to yield $y^{(i)}$. The signal is then converted to the frequency domain using the DFT:

$$r^{(i)}(k) = \sum_{n=0}^{N-1} y^{(i)}(n)\, e^{-j 2\pi k n / N}, \quad k = 0, 1, \dots, N-1 \qquad (3)$$

Channel estimation is carried out after converting the signal from the time domain to the frequency domain. Finally, after equalization, the signal is demapped to yield the output bits.
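To make the chain above concrete, the following Python/NumPy sketch walks one OFDM symbol through Eqs. (1)-(3); N = 64 and channel order L = 8 follow Table 1 of this paper, while the cyclic-prefix length, the SNR, and the pilot-free BPSK framing are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L_CP, L = 64, 16, 8   # subcarriers; CP length (assumed); channel order (Table 1)

# BPSK mapping: bit 0 -> +1, bit 1 -> -1
bits = rng.integers(0, 2, N)
X = 1.0 - 2.0 * bits

# Eq. (1): IDFT (np.fft.ifft already includes the 1/N factor)
x = np.fft.ifft(X)

# Cyclic prefix: prepend the last L_CP time-domain samples
x_cp = np.concatenate([x[-L_CP:], x])

# Eq. (2): L-tap Rayleigh channel (unit total power) plus AWGN
h = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) * np.sqrt(1 / (2 * L))
y_cp = np.convolve(x_cp, h)[: N + L_CP]
snr_db = 20.0
sigma2 = np.mean(np.abs(x_cp) ** 2) * 10 ** (-snr_db / 10)
y_cp += np.sqrt(sigma2 / 2) * (rng.standard_normal(y_cp.shape)
                               + 1j * rng.standard_normal(y_cp.shape))

# CP removal and Eq. (3): DFT back to the frequency domain
r = np.fft.fft(y_cp[L_CP:])

# Because L_CP >= L, r(k) = H(k) X(k) + W(k); equalize with the true H as a check
H = np.fft.fft(h, N)
bits_hat = (np.real(r / H) < 0).astype(int)
print("bit errors:", np.sum(bits_hat != bits))
```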

3. MMSE Estimator Based on the Comb-Type Pilot Arrangement

A comb-type channel estimation is used for fast fading channels, where the variations in the channel impulse response are rapid. Fig. 2 shows the arrangement of pilots in comb-type and block-type channel estimation. Let $r_P^{(i)}$ represent the received signal vector for the pilot symbols in the frequency domain for the $i$th OFDM symbol (the superscript is dropped below). Assuming that the time-domain channel vector $h_P$ is Gaussian and uncorrelated with the noise, its MMSE estimate is given by [9]:

$$\hat{h}_P = R_{h_P r_P} R_{r_P r_P}^{-1} r_P \qquad (4)$$

where $F_N$ is the $P \times P$ DFT matrix evaluated at the pilot tones, $R_{h_P r_P}$ is the cross-covariance matrix of the received signal vector $r_P$ and the time-domain channel vector $h_P$ for the pilot symbols, and $R_{r_P r_P}$ is the auto-covariance matrix of the received signal vector $r_P$. $R_{HH}$ is the auto-covariance matrix of the time-domain channel vector $h_P$, and $\sigma^2$ is the variance of the noise $w_P$. $R_{HH}$ and $\sigma^2$ are quantities that are known. Eq. (4) can be re-written as

Fig. 1. OFDM System Model.


$$\hat{h}_P = R_{HH} D^H \left( D R_{HH} D^H + \sigma^2 I_P \right)^{-1} r_P \qquad (5)$$

where $D = X_P F_N$, $X_P$ is the diagonal matrix of the transmitted pilot symbols, and $I_P$ is the $P \times P$ identity matrix.

The estimated channel frequency response vector $\hat{h}_{CE}$ for all subcarriers was obtained using an interpolation technique. In this study, linear, low-pass, and cubic spline one-dimensional interpolation techniques were used. The reason for considering one-dimensional interpolation techniques is their low complexity compared to two-dimensional interpolation techniques [8].

4. Modified MMSE Estimator based on Proposed NSPI Scheme

This section describes the modifications to the MMSE estimator needed for the proposed NSPI scheme. The channel energy is distributed in the frequency domain but concentrated in the time domain [4]. No energy leakage is observed in the case of sample-spaced channels, where all the impulses are located at integer multiples of the system sampling rate [10]. Figs. 3 and 4 present the channel energy distribution in the frequency domain and time domain, respectively, for the channel orders L = 8 and L = 16. Figs. 3 and 4 show that the energy distribution of the channel is non-uniform in the frequency domain: the energy lies mainly in the beginning and end subcarriers. The significant subcarriers (highlighted in Figs. 3 and 4) have more energy than the less significant subcarriers (the subcarriers in the middle). Figs. 3 and 4 also show that the locations of the significant subcarriers differ for channels with different orders. This non-uniform distribution of channel energy motivates the insertion of pilots in a non-linear manner. The proposed NSPI scheme considers the non-uniform distribution of the channel energy for the non-linear placement of the pilot symbols in the frequency domain. Let $i_d$ and $i_p$ be vectors denoting the indices of the data and pilot subcarriers, respectively.

The estimated channel frequency response can be expressed as

$$\hat{h}_{CE} = F\, \hat{h}_{M-P} \qquad (6)$$

where $F$ is the $N \times L$ DFT matrix and $\hat{h}_{M-P}$ is the channel impulse response estimated in the time domain by the M-MMSE estimator at the pilot subcarriers for the proposed NSPI scheme. Define a matrix $Q_P$ containing the columns of $F_P$ corresponding to the pilot locations. The M-MMSE estimate of the channel in the time domain is given by

$$\hat{h}_{M-P} = R_{h_{M-P} r_P} R_{r_P r_P}^{-1} r_P \qquad (7)$$

where $R_{h_{M-P} h_{M-P}}$ is the auto-covariance matrix of the time-domain channel vector $h_{M-P}$, $\sigma^2$ is the variance of the noise $w_P$, $R_{r_P r_P}$ is the auto-covariance matrix of the received signal vector $r_P$ for the proposed NSPI scheme, and $R_{h_{M-P} r_P}$ is the cross-covariance matrix of the received signal vector $r_P$ and the time-domain channel vector $h_{M-P}$ for the proposed NSPI scheme.

Fig. 2. Pilot arrangement in the comb-type and block-type channel estimation.

Fig. 3. Channel energy distribution for the sample-spaced channel with L=8 and N=64: (a) IDFT(H), (b) |H|; the significant-energy subcarriers are highlighted.

Fig. 4. Channel energy distribution for the sample-spaced channel with L=16 and N=64: (a) IDFT(H), (b) |H|; the significant-energy subcarriers are highlighted.


Putting the value of $\hat{h}_{M-P}$ in Eq. (6) yields the channel frequency response at all subcarriers.
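Because the exact pilot-index vector of Eq. (9) is not reproduced in this transcript, the sketch below only illustrates the idea of the NSPI scheme: pilots are packed densely into the significant-energy subcarriers at the two band edges and sparsely in the middle, $Q_P$ is built from the entries of the DFT matrix at those locations, and the M-MMSE estimate of Eq. (7) is mapped to all subcarriers through Eq. (6). The specific edge-weighted placement rule used here is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, L, P = 64, 8, 16

# Non-linearly spaced pilots (illustrative rule, not Eq. (9)): dense at the
# band edges where the channel energy concentrates, sparse in the middle.
edge = np.array([0, 1, 2, 3, 4, 5, N - 6, N - 5, N - 4, N - 3, N - 2, N - 1])
middle = np.linspace(8, N - 9, P - edge.size).astype(int)
pilot_idx = np.sort(np.concatenate([edge, middle]))

# Q_P: entries of the N x L DFT matrix F at the pilot locations
F = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(L)) / N)
Q_P = F[pilot_idx, :]

# Simulated received pilots r_P = X_P Q_P h + w_P (all-ones BPSK pilots)
h = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) * np.sqrt(1 / (2 * L))
sigma2 = 1e-2
r_P = Q_P @ h + np.sqrt(sigma2 / 2) * (rng.standard_normal(P)
                                       + 1j * rng.standard_normal(P))

# Eq. (7): M-MMSE estimate of the time-domain channel from the pilot tones
R_hh = np.eye(L) / L                    # assumed constant power delay profile
h_mp = R_hh @ Q_P.conj().T @ np.linalg.solve(
    Q_P @ R_hh @ Q_P.conj().T + sigma2 * np.eye(P), r_P)

# Eq. (6): channel frequency response at all subcarriers
H_hat = F @ h_mp
print("MSE over all subcarriers:", np.mean(np.abs(H_hat - np.fft.fft(h, N)) ** 2))
```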

5. Simulation Results

This section presents the results of the MATLAB® simulation used to evaluate the performance of the OFDM system for the M-MMSE and MMSE estimators. Table 1 lists the values of the parameters used for the performance evaluation. The channel used for the performance evaluation consisted of $L$ independent taps with a Gaussian distribution and zero mean. Constant and exponential power delay profiles were used for the performance evaluation. The variance of each tap in the case of the exponential power delay profile can be expressed as

$$\sigma_l^2 = C\, e^{-l/L}, \quad l = 0, 1, \dots, L-1 \qquad (8)$$

where $C$ is a normalization constant.
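A short sketch of generating Rayleigh taps with the exponential profile of Eq. (8); choosing the normalization constant so that the total tap power is unity is an assumption of this example.

```python
import numpy as np

def exp_pdp_taps(L, rng):
    """L Rayleigh taps with an exponential power delay profile,
    sigma_l^2 = C * exp(-l / L), normalized here to unit total power."""
    var = np.exp(-np.arange(L) / L)
    var /= var.sum()                    # normalization constant C
    g = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2)
    return np.sqrt(var) * g

rng = np.random.default_rng(3)
h = exp_pdp_taps(8, rng)
print(np.abs(h) ** 2)                   # tap powers decay roughly exponentially
```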

The vector containing the indices of the pilots for channel order 8 is expressed as

(9)

The legends 'Low Pass', 'Linear', 'Spline', 'Proposed', 'C', and 'E' refer to the low-pass interpolation, linear interpolation, spline interpolation, proposed channel estimation algorithm, constant power delay profile, and exponential power delay profile, respectively. Fig. 5 shows the performance of the OFDM system for the M-MMSE and MMSE estimators. The performance of the M-MMSE estimator for the proposed NSPI scheme improved significantly over the MMSE estimator for the comb-type pilot arrangement using the low-pass, spline, and linear interpolation techniques. The performance of the proposed channel estimation algorithm had the same form for both types of power delay profiles, but the performance of the MMSE estimator with the different one-dimensional interpolation techniques was slightly better for the exponential power delay profile than for the constant power delay profile because the higher channel taps are less significant in the case of the exponential power delay profile.
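For reference, a compact Monte-Carlo sweep of the kind behind Fig. 5 can be sketched as follows; it simulates only the comb-type MMSE estimator with linear interpolation over a constant-profile channel, and the trial count, pilot values, and SNR grid are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
N, L, P, trials = 64, 8, 16, 200
pilot_idx = np.arange(0, N, N // P)
data_idx = np.setdiff1d(np.arange(N), pilot_idx)
k_all = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(k_all, np.arange(L)) / N)
R_HH, D = np.eye(L) / L, F[pilot_idx, :]        # all-ones pilots -> X_P = I

for snr_db in (0, 10, 20, 30):
    errs = total = 0
    sigma2 = 10 ** (-snr_db / 10)
    for _ in range(trials):
        bits = rng.integers(0, 2, data_idx.size)
        X = np.ones(N, complex)
        X[data_idx] = 1.0 - 2.0 * bits          # BPSK data, all-ones pilots
        h = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) * np.sqrt(1 / (2 * L))
        H = np.fft.fft(h, N)
        w = np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
        r = H * X + w                           # frequency-domain model after CP removal
        # Eq. (5) at the pilot tones, then linear interpolation to all subcarriers
        h_hat = R_HH @ D.conj().T @ np.linalg.solve(
            D @ R_HH @ D.conj().T + sigma2 * np.eye(P), r[pilot_idx])
        H_P = D @ h_hat
        H_hat = (np.interp(k_all, pilot_idx, H_P.real)
                 + 1j * np.interp(k_all, pilot_idx, H_P.imag))
        bits_hat = (np.real(r[data_idx] / H_hat[data_idx]) < 0).astype(int)
        errs += np.sum(bits_hat != bits)
        total += bits.size
    print(f"SNR {snr_db:2d} dB  BER {errs / total:.4f}")
```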

6. Conclusions

A novel NSPI scheme was proposed for OFDM systems over fast fading Rayleigh channels. Modifications were also proposed to make the MMSE estimator work with the proposed NSPI scheme. The proposed channel estimation algorithm significantly improved the performance of the OFDM system compared to the MMSE estimator for the comb-type pilot arrangement. Future studies should examine whether the proposed channel estimation algorithm can be made dynamic, so that the pilot positions can be changed with the variations in the channel.

References

[1] Zeeshan Sabir, M. Arif Wahla and M. Inayatullah Babar, OFDM, Turbo Codes and Improved Channel Estimation - A Magical Combination. ISBN: 9783639326505, Germany: VDM Verlag Publishers, Jan. 2011. Article (CrossRef Link)

[2] Khan, L.U.; Khan, G.M.; Mahmud, S.A.; Sabir, Z., "Comparison of Three Interpolation Techniques in Comb-Type Pilot-Assisted Channel Coded OFDM System," Advanced Information Networking and Applications Workshops (WAINA), 2013 27th International Conference on, pp. 977-981, 25-28 March 2013. Article (CrossRef Link)

[3] M. Noh, Y. Lee, and H. Park, "Low Complexity LMMSE Channel Estimation for OFDM," IEE Proceedings - Communications, vol. 153, no. 5, pp. 645-650, 2006. Article (CrossRef Link)

[4] Jie Ma; Hua Yu; Shouyin Liu, "The MMSE Channel Estimation Based on DFT for OFDM System," Wireless Communications, Networking and Mobile Computing, 2009. WiCom '09. 5th International Conference on, pp. 1-4, 24-26 Sept. 2009. Article (CrossRef Link)

Table 1. Simulation parameters.

Parameters               Values
Number of subcarriers    64
Channel estimation       MMSE and M-MMSE
Channel                  Fast fading Rayleigh
Modulation scheme        BPSK
Channel order, L         8

Fig. 5. BER performance of the OFDM system for the BPSK modulation scheme (BER versus SNR in dB; curves: MMSE + Low Pass, MMSE + Linear, MMSE + Spline, and Proposed, each for the constant (C) and exponential (E) power delay profiles).


[5] F. Said and H. Aghvami, "Linear two dimensional pilot assisted channel estimation for OFDM systems," in IEEE Conf. Telecommunications, Edinburgh, Scotland, Apr. 1998, pp. 32-36. Article (CrossRef Link)

[6] J. K. Moon and S. I. Choi, "Performance of channel estimation methods for OFDM systems in a multipath fading channels," IEEE Trans. Consum. Electron., vol. 46, no. 1, pp. 161-170, Feb. 2000. Article (CrossRef Link)

[7] P. Hoeher, S. Kaiser, and P. Robertson, "Two-dimensional pilot-symbol-aided channel estimation by Wiener filtering," in Proc. Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), Munich, Germany, Apr. 1997, pp. 1845-1848. Article (CrossRef Link)

[8] Yushi Shen and Ed Martinez, "Channel Estimation in OFDM Systems", Freescale Semiconductor, AN3059, 2006, www.freescale.com, August 2008. Article (CrossRef Link)

[9] Ozdemir, M.K.; Arslan, H., "Channel estimation for wireless OFDM systems," Communications Surveys & Tutorials, IEEE, vol. 9, no. 2, pp. 18-48, Second Quarter 2007. Article (CrossRef Link)

[10] Ahmad Chini, Multicarrier Modulation in Frequency Selective Fading Channels, Ph.D. Thesis, Carleton University, Ottawa, Canada, 1994. Article (CrossRef Link)

Latif Ullah Khan is currently working as a Lecturer in the Electrical Engineering Department, University of Engineering & Technology Peshawar, Pakistan. He received his B.Sc degree in Electrical Engineering from the University of Engineering & Technology Peshawar and is currently pursuing his M.Sc degree in Electrical Engineering at the same university. He has more than 20 publications in reputable engineering conferences and journals. He is a member of IEEE USA and won the best paper award at the 15th IEEE International Conference on Advanced Communication Technology (ICACT-2013) in South Korea. His research interests include channel estimation in OFDM, channel coding, intelligent system design, and routing protocols in Mobile Ad-hoc Networks (MANETs).

Copyrights © 2014 The Institute of Electronics and Information Engineers
