[IEEE 2013 8th Iranian Conference on Machine Vision and Image Processing (MVIP) - Zanjan, Iran...


Predictive Three Step Search (PTSS) algorithm for motion estimation

Hadi Amirpour
Department of Electrical Engineering, K.N. Toosi University of Technology, Tehran, Iran
[email protected]

Dr. Amir Mousavinia
Department of Electrical Engineering, K.N. Toosi University of Technology, Tehran, Iran
[email protected]

Nakisa Shamsi
Department of Electrical Engineering, K.N. Toosi University of Technology, Tehran, Iran
[email protected]

Abstract—Motion estimation is a vital task in video compression, and many algorithms have been proposed to reduce its computational complexity. The conventional Full Search (FS) algorithm evaluates every candidate block in the search window, which yields the best PSNR among block matching methods but at a heavy computational cost. The Three Step Search (TSS) algorithm, which narrows the search space step by step, is used in many applications for its simplicity and effectiveness. The PTSS algorithm proposed in this paper reduces the number of searched blocks further by using motion information obtained from neighboring blocks. Simulation results show approximately a 20% speed improvement with the same or slightly improved PSNR in comparison to TSS.

Keywords—video compression; motion estimation; block matching; Three step search; Predictive.

I. INTRODUCTION

Motion estimation is an important part of many video compression techniques. Video streams are now widely used in many applications, and video compression can dramatically decrease the required transmission bandwidth or storage space. Roughly 50% to 90% of the computational load of video compression is due to the motion estimation algorithm [1]. Motion estimation therefore plays an important role in video compression standards such as MPEG and H.26X. Video compression techniques remove spatial, temporal, and statistical redundancies in a video sequence, and estimating motion vectors is how the temporal redundancy between frames is removed. Among the various methods used to estimate motion vectors, block matching motion estimation algorithms are very popular because of their simplicity [2].

In block matching motion estimation, each current frame is divided into non-overlapping square blocks of size N×N, as shown in Fig. 1. For a block in the current frame, called the current block, a search window of P×P blocks is defined around the corresponding location in the reference frame, which is the previous frame. The algorithm then looks for the best match for the current block inside this search window. The candidate block with the minimum distortion is chosen as the best matched block. Many matching criteria have been proposed to measure the distortion between two blocks [1]; the Sum of Absolute Differences (SAD) is the one most commonly used. For two N×N blocks A and B, SAD is defined as:

$$\mathrm{SAD}(A, B) = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left| A(i, j) - B(i, j) \right| \tag{1}$$
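As a minimal sketch of the SAD criterion in (1), assuming blocks are stored as equally sized lists of rows of integer sample values (the function name `sad` and the sample data are illustrative, not taken from the paper):

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


# Two hypothetical 2x2 blocks; the per-sample differences are 1, 0, 4 and 4.
A = [[10, 12], [14, 16]]
B = [[11, 12], [10, 20]]
print(sad(A, B))  # 1 + 0 + 4 + 4 = 9
```

A SAD of zero means the two blocks are identical; larger values indicate a poorer match.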

Suppose that block B, located at (u, v), is the best match found for block A, located at (i, j). The estimated motion vector associated with block A is:

MV = (i - u, j - v).

Motion vectors are calculated in this way for every block in the current frame. In video compression techniques such as MPEG-4, the reference frame and the motion vectors of the next frame are used to predict that frame. Here, the current frame is reconstructed from the reference frame and the estimated motion vectors, and the PSNR between the reconstructed frame and the current frame measures the quality of the reconstruction. The PSNR is calculated by:

$$\mathrm{PSNR} = 10 \log_{10}\!\left( \frac{(2^n - 1)^2}{\mathrm{MSE}} \right) \tag{2}$$

Fig. 1. Dividing the current frame into non-overlapping blocks

978-1-4673-6184-2/13/$31.00 ©2013 IEEE

$$\mathrm{MSE} = \frac{1}{m \cdot n} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \bigl( f_c(i, j) - f_r(i, j) \bigr)^2 \tag{3}$$

where $(2^n - 1)^2$ is the square of the highest possible signal value in the image and $n$ is the number of bits per image sample. In (3), $f_c$ is the current frame, $f_r$ is the reference frame, and $(m, n)$ is the size of the video frames (here $n$ denotes a frame dimension, not the bit depth used in (2)).
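The PSNR and MSE definitions in (2) and (3) can be sketched as follows. This is an illustration only: frames are assumed to be lists of rows, and the parameter name `bits` replaces the paper's $n$ for the sample bit depth so that it does not collide with the frame dimension.

```python
import math


def mse(current, reference):
    """Mean squared error between two equally sized frames, as in (3)."""
    m, n = len(current), len(current[0])  # frame size (rows, columns)
    total = sum(
        (c - r) ** 2
        for row_c, row_r in zip(current, reference)
        for c, r in zip(row_c, row_r)
    )
    return total / (m * n)


def psnr(current, reference, bits=8):
    """PSNR in dB, as in (2); `bits` is the number of bits per sample."""
    err = mse(current, reference)
    if err == 0:
        return math.inf  # identical frames: PSNR is unbounded
    peak_squared = (2 ** bits - 1) ** 2
    return 10 * math.log10(peak_squared / err)


# Hypothetical 2x2 frames differing by +1 and -1 in two samples.
current = [[100, 100], [100, 100]]
reference = [[101, 99], [100, 100]]
print(mse(current, reference))   # (1 + 1) / 4 = 0.5
print(psnr(current, reference))  # 10 * log10(255**2 / 0.5)
```

The guard for a zero MSE is a design choice of this sketch; the paper does not discuss that case.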

The Full Search algorithm is the traditional way to obtain motion vectors. It exhaustively evaluates every candidate block in the search area to find the best matched block (Fig. 2). It always yields the best PSNR, but at a high computational cost. Two groups of approaches have been proposed to reduce this cost. The first group is lossless: it obtains the same PSNR as Full Search with lower computational overhead. The Successive Elimination Algorithm (SEA) [3], the Multilevel Successive Elimination Algorithm (MSEA) [4], and Partial Distortion Elimination (PDE) [5], among others, belong to this category. The second group consists of lossy fast algorithms such as TSS [6], Four Step Search (FSS) [7], New Three Step Search (NTSS) [8], and Cross Search (CS) [9]. These restrict the number of searched blocks by relying on the unimodal error surface assumption, under which the distortion increases monotonically as the search position moves away from the global minimum (the best matched block).
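A minimal sketch of exhaustive block matching, under stated assumptions: frames are lists of rows, `(bx, by)` is the top-left column and row of the current block, `p` is the maximum displacement in each direction (so `p = 7` gives a 15×15 window), and candidates that fall outside the frame are skipped. The helper names are hypothetical. The function returns the displacement `(dx, dy)` of the best match relative to the block position; under the paper's convention MV = (i - u, j - v), the motion vector has the opposite sign.

```python
import random


def sad_at(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block of
    `ref` displaced by (dx, dy); None if that block leaves the frame."""
    x0, y0 = bx + dx, by + dy
    if x0 < 0 or y0 < 0 or y0 + n > len(ref) or x0 + n > len(ref[0]):
        return None
    return sum(
        abs(cur[by + r][bx + c] - ref[y0 + r][x0 + c])
        for r in range(n)
        for c in range(n)
    )


def full_search(cur, ref, bx, by, n, p=7):
    """Evaluate every displacement in [-p, p] x [-p, p]; keep the first minimum."""
    best, best_cost, points = None, None, 0
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            cost = sad_at(cur, ref, bx, by, dx, dy, n)
            if cost is None:
                continue
            points += 1
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost, points


# Synthetic test data: a random reference frame and a current frame whose
# content at (x, y) is the reference content at (x - 3, y + 2), with wrap-around.
rng = random.Random(0)
ref = [[rng.randrange(256) for _ in range(24)] for _ in range(24)]
cur = [[ref[(y + 2) % 24][(x - 3) % 24] for x in range(24)] for y in range(24)]
print(full_search(cur, ref, bx=8, by=8, n=4))
```

For a block away from the frame border, all (2p + 1)² = 225 candidates are evaluated, which is the cost the fast algorithms try to avoid.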

The motion vectors already determined for spatially and/or temporally neighboring blocks are very useful for the initial estimate of the current block's motion vector, and the idea has been used in many algorithms [10-12]. In algorithms such as MVFAST [13] and PMVFAST [14], the motion vectors of neighboring blocks are used to choose the shape of the next search pattern, either LDSP¹ or SDSP². In algorithms such as ARPS [15] and fast ARPS [16], they are used to determine the size of the search pattern. In the proposed algorithm, these vectors are used to predict the search area in the first step of the TSS algorithm, eliminating redundant search blocks.

Fig. 2. Full search

Section II introduces the TSS algorithm. Section III explains the proposed algorithm (PTSS), Section IV presents the simulation results, and Section V concludes the paper.

II. THREE STEP SEARCH ALGORITHM

This algorithm was introduced by Koga et al. in 1981 [6]. It is one of the earliest attempts at fast block matching motion estimation, and it became very popular because of its simplicity, robustness, and near-optimal performance. The steps of the algorithm are as follows:

• Find the block in the reference frame that corresponds to the current block of the current frame and define the search window around it. With the step size set to 4 (S=4), calculate the SAD value for nine blocks: the central block and the eight blocks located around it at distance S, as shown in Fig. 3. The block with the minimum SAD is the best match of step 1 and becomes the center block for the next step.

• Halve the step size (S=2) and calculate SAD for the eight blocks at a distance of S from the center. As in the previous step, choose the block with the minimum SAD as the best match and use it as the center block for the next step.

• Halve the step size again (S=1) and calculate SAD for the eight blocks neighboring the center block. The block with the minimum SAD is chosen as the best matched block and the algorithm terminates.

An example of the TSS algorithm is shown in Fig. 3. The gray blocks have the minimum SAD in each step and become the center of the next step's search. The algorithm searches nine blocks in step 1 and eight blocks in each of steps 2 and 3, for a total of 25 blocks. For a 15×15 search window, the full search algorithm calculates 225 SAD values for the current block, compared with only 25 in the TSS algorithm. Hence, the speed-up of TSS is 225/25 = 9 in this example.
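The three steps above can be sketched as follows. This is an illustrative implementation under assumptions not stated in the paper: frames are lists of rows, candidates outside the frame are skipped, ties keep the earlier candidate, and the result is the displacement `(dx, dy)` of the best match relative to the block position. The helper names are hypothetical.

```python
import random


def sad_at(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block of
    `ref` displaced by (dx, dy); None if that block leaves the frame."""
    x0, y0 = bx + dx, by + dy
    if x0 < 0 or y0 < 0 or y0 + n > len(ref) or x0 + n > len(ref[0]):
        return None
    return sum(
        abs(cur[by + r][bx + c] - ref[y0 + r][x0 + c])
        for r in range(n)
        for c in range(n)
    )


def three_step_search(cur, ref, bx, by, n, first_step=4):
    """TSS with step sizes 4, 2, 1; returns (displacement, SAD, points searched)."""
    best, best_cost = (0, 0), sad_at(cur, ref, bx, by, 0, 0, n)
    points = 1  # the centre block of step 1
    step = first_step
    while step >= 1:
        cx, cy = best  # the centre is fixed while its eight neighbours are tested
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                if ox == 0 and oy == 0:
                    continue  # centre cost is already known
                cost = sad_at(cur, ref, bx, by, cx + ox, cy + oy, n)
                if cost is None:
                    continue
                points += 1
                if cost < best_cost:
                    best, best_cost = (cx + ox, cy + oy), cost
        step //= 2
    return best, best_cost, points


# Synthetic data: the current block at (8, 8) equals the reference block
# four columns to its right, so the true displacement is (4, 0).
rng = random.Random(0)
ref = [[rng.randrange(256) for _ in range(24)] for _ in range(24)]
cur = [[ref[y][min(x + 4, 23)] for x in range(24)] for y in range(24)]
print(three_step_search(cur, ref, bx=8, by=8, n=4))
```

The returned point count is 9 + 8 + 8 = 25 whenever no candidate falls outside the frame. On random content like this, TSS is only guaranteed to find the true displacement when it lies on the path the steps follow, which is the local-minimum risk that the unimodal error surface assumption glosses over.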

¹ Large Diamond Search Pattern. ² Small Diamond Search Pattern.


Fig. 3. Three step search algorithm: (a) step 1, (b) step 2, and (c) step 3

III. PREDICTIVE THREE STEP SEARCH

Because motion vectors usually change slowly within a neighborhood, the motion vectors of adjacent blocks have nearly the same direction in most cases. Consider the blocks shown in Fig. 4: if the motion vectors of the neighboring blocks point mostly to the right, the motion vector of the current block is likely to point to the right as well. This is the main idea of the PTSS algorithm, which uses the motion vector of the left adjacent block to restrict the search area in the first step of the TSS algorithm. In that first step it evaluates only 1 or 4 blocks, compared with 9 in the TSS algorithm. For example, if the motion vector of the current block's left neighbor points above-right, PTSS searches only the 4 blocks shown in Fig. 5(d). The steps of the proposed algorithm are as follows:

1) Set the step size to S=4. According to the motion vector of the current block's left neighbor, select one of the patterns shown in Fig. 5. The best match found becomes the center of the next step's search. If the motion vector is (0, 0), the pattern of Fig. 5(i) is selected and SAD is calculated only for the center block, which is then taken as the best match of this step.

2) Halve the step size (S=2) and calculate SAD for the eight blocks at a distance of S from the center. As in the previous step, the block with the minimum SAD becomes the center block of the next step.

3) Set the step size to 1 (S=1) and calculate SAD for the eight blocks neighboring the center block. The block with the minimum SAD is the best matched block.
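Step 1 of the procedure above can be sketched as a function that maps the left neighbor's motion direction to a set of candidate offsets. The exact block layouts of Fig. 5 cannot be recovered from the text, so the sets below are an assumption: four points on the side the neighbor's motion points to (always including the center), or the center alone for zero motion. The function name and the convention that the input is a displacement `(dx, dy)` with negative `dy` meaning "up" are also hypothetical.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def ptss_step1_candidates(left_mv, step=4):
    """Candidate offsets for step 1, chosen from the left neighbour's motion.

    ASSUMED layouts (Fig. 5 is not available): 1 point for zero motion,
    otherwise 4 points in the direction of the neighbour's motion.
    """
    sx, sy = sign(left_mv[0]), sign(left_mv[1])
    if sx == 0 and sy == 0:
        return [(0, 0)]  # zero motion: centre only
    if sx != 0 and sy != 0:
        # Diagonal motion: centre plus the three points of that quadrant.
        return [(0, 0), (sx * step, 0), (0, sy * step), (sx * step, sy * step)]
    if sy == 0:
        # Horizontal motion: centre plus the three points on that side.
        return [(0, 0), (sx * step, -step), (sx * step, 0), (sx * step, step)]
    # Vertical motion: centre plus the three points above or below.
    return [(0, 0), (-step, sy * step), (0, sy * step), (step, sy * step)]


# Left neighbour moved right and up: only the upper-right quadrant is searched.
print(ptss_step1_candidates((5, -2)))  # [(0, 0), (4, 0), (0, -4), (4, -4)]
print(ptss_step1_candidates((0, 0)))   # [(0, 0)]
```

Since steps 2 and 3 are unchanged from TSS, this gives at most 4 + 8 + 8 = 20 SAD evaluations per block, or 1 + 8 + 8 = 17 when the left neighbor is stationary, compared with 25 for TSS.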

Fig. 4. Neighboring blocks of the current block

Fig. 5. Different types of search patterns in step 1 of PTSS, panels (a) to (i)


As a further modification, a threshold can be applied to the SAD between the current block and its corresponding (co-located) block in the reference frame to decide whether to continue the search or terminate it. This can also increase the speed of the algorithm considerably.

After this SAD value is calculated, it is compared with the threshold. If it is less than the threshold, the two blocks are considered similar enough and the algorithm stops, taking (0, 0) as the motion vector. Since most blocks are almost stationary between two adjacent frames, the threshold test saves a large share of the computation and makes the resulting algorithm very fast.
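The early-termination test described above can be sketched as follows. The function names are illustrative; the threshold values in the example are the ones reported in Section IV (512 for 16×16 blocks and 50 for 5×5 blocks).

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def is_stationary(cur_block, ref_block, threshold):
    """True when the co-located blocks are similar enough to stop the
    search immediately and use (0, 0) as the motion vector."""
    return sad(cur_block, ref_block) < threshold


# Average absolute difference per sample that each reported threshold allows.
for size, threshold in ((16, 512), (5, 50)):
    print(size, threshold / (size * size))  # 2.0 in both cases

# Hypothetical flat 16x16 blocks that differ by 1 in every sample: SAD = 256.
cur_block = [[100] * 16 for _ in range(16)]
ref_block = [[101] * 16 for _ in range(16)]
print(is_stationary(cur_block, ref_block, 512))  # True, since 256 < 512
```

Both reported thresholds correspond to an average absolute difference of 2 per sample, so the test behaves consistently across the two block sizes.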

IV. EXPERIMENTAL RESULTS

Four video sequences at 30 fps are used to compare TSS with PTSS. Two sequences, “Flower” (listed as “garden” in the tables) and “Tennis”, are in SIF format (352×240), the “Foreman” sequence is in CIF format (352×288), and “Miss America” is 360×288. The first 100 frames of each sequence are used in the simulation, with SAD as the distortion measure. Block sizes of 5×5 and 16×16 are used, and the search window size is 15×15 for both.

We compared PTSS with TSS in terms of PSNR and the average number of blocks each algorithm searches. Because PTSS reduces the number of blocks searched in step 1 of TSS, the PSNR would be expected to decrease. However, in some cases the average PSNR increases: the information obtained from the left block's motion vector sometimes prevents the search from falling into a local minimum. PTSS is faster than TSS because it searches 1 or 4 blocks in step 1, whereas TSS searches 9; the other steps search the same number of blocks.

To evaluate the effect of the threshold applied to the current block and its corresponding block in the reference frame, we used a threshold of 512 for the 16×16 block size and 50 for the 5×5 block size. Tables I and III summarize the PSNR results for TSS and PTSS with the two block sizes. Tables II and IV show the average number of blocks that TSS and PTSS search to find the best matched block.

TABLE I. AVERAGE PSNR RESULTS IN dB (BLOCK SIZE: 5×5)

| Sequence | TSS | PTSS | PTSS with threshold |
| --- | --- | --- | --- |
| Foreman | 34.30 | 34.42 | 34.40 |
| Garden | 20.59 | 21.67 | 21.67 |
| Tennis | 26.66 | 26.99 | 26.99 |
| Miss America | 38.26 | 38.34 | 38.09 |

TABLE II. AVERAGE SEARCH POINTS FOR EACH BLOCK (BLOCK SIZE: 5×5)

| Sequence | TSS | PTSS | PTSS with threshold |
| --- | --- | --- | --- |
| Foreman | 24.47 | 19.08 | 13.10 |
| Garden | 24.44 | 19.50 | 16.89 |
| Tennis | 24.44 | 18.66 | 16.68 |
| Miss America | 24.51 | 19.37 | 11.75 |

TABLE III. AVERAGE PSNR RESULTS IN dB (BLOCK SIZE: 16×16)

| Sequence | TSS | PTSS | PTSS with threshold |
| --- | --- | --- | --- |
| Foreman | 32.57 | 32.61 | 32.58 |
| Garden | 22.20 | 22.37 | 22.37 |
| Tennis | 25.05 | 24.98 | 24.98 |
| Miss America | 37.24 | 37.26 | 37.21 |

TABLE IV. AVERAGE SEARCH POINTS FOR EACH BLOCK (BLOCK SIZE: 16×16)

| Sequence | TSS | PTSS | PTSS with threshold |
| --- | --- | --- | --- |
| Foreman | 23.29 | 18.20 | 15.04 |
| Garden | 23.24 | 18.60 | 16.99 |
| Tennis | 23.24 | 17.76 | 16.53 |
| Miss America | 23.45 | 18.48 | 12.33 |
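The relative savings implied by Tables II and IV can be recomputed with a few lines of arithmetic. The sketch below only restates the published averages; the dictionary layout and the function name `reduction` are illustrative.

```python
# Average search points per block, copied from Tables II and IV:
# (TSS, PTSS, PTSS with threshold).
search_points = {
    "5x5": {
        "foreman": (24.47, 19.08, 13.10),
        "garden": (24.44, 19.50, 16.89),
        "tennis": (24.44, 18.66, 16.68),
        "miss america": (24.51, 19.37, 11.75),
    },
    "16x16": {
        "foreman": (23.29, 18.20, 15.04),
        "garden": (23.24, 18.60, 16.99),
        "tennis": (23.24, 17.76, 16.53),
        "miss america": (23.45, 18.48, 12.33),
    },
}


def reduction(baseline, value):
    """Percentage reduction of `value` relative to `baseline`."""
    return 100.0 * (baseline - value) / baseline


for block_size, rows in search_points.items():
    for name, (tss, ptss, ptss_thr) in rows.items():
        print(
            f"{block_size:>5} {name:<12} "
            f"PTSS {reduction(tss, ptss):5.1f}%  "
            f"PTSS with threshold {reduction(tss, ptss_thr):5.1f}%"
        )
```

With these figures, PTSS searches about 20% to 24% fewer points than TSS, and PTSS with the threshold about 27% to 52% fewer, depending on the sequence and block size.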

V. CONCLUSIONS

In this paper, the Predictive Three Step Search (PTSS) algorithm was introduced. PTSS is similar to the conventional TSS algorithm, but it reduces the number of search points in the first step of TSS by using the motion vector of the left adjacent block. The result is a fast and reliable algorithm: in the reported experiments, the average number of search points per block drops by approximately 20% or more. The information from the left block's motion vector also helps to prevent the search from falling into local minima, which slightly increases the PSNR in most cases.


[1] Y.-W. Huang, C.-Y. Chen, C.-H. Tsai, C.-F. Shen, and L.-G. Chen, "Survey on block matching motion estimation algorithms and architectures with new results," J. VLSI Signal Process., vol. 42, no. 3, pp. 297-320, 2006.

[2] S.I.A. Pandian, G.J. Bala, and B.A. George, "A study on block matching algorithms for motion estimation," International Journal on Computer Science and Engineering, vol. 3, no. 1, pp. 34-44, 2011.

[3] W. Li and E. Salari, "Successive Elimination Algorithm for Motion Estimation," IEEE Trans. Image Processing, vol. 4, no. 1, pp. 105-107, 1995.

[4] X.Q. Gao, C.J. Duanmu, and C.R. Zou, "A Multilevel Successive Elimination Algorithm for Block Matching Motion Estimation," IEEE Trans. Image Processing, vol. 9, no. 3, pp. 501-504, 2000.

[5] Digital Video Coding Group, ITU-T recommendation H.263 software implementation, Telenor R&D, 1995.

[6] T. Koga, K. Iinuma, A. Hirano, Y. Iijima, and T. Ishiguro, "Motion compensated interframe coding for video conferencing," in Proc. Nat. Telecommun. Conf., 1981, pp. C9.6.1-C9.6.5.

[7] L.M. Po and W.C. Ma, "A Novel Four-step Search Algorithm for Fast Block Motion Estimation," IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 3, pp. 313-317, 1996.

[8] R. Li, B. Zeng, and M.L. Liou, "A New Three-step Search Algorithm for Block Motion Estimation," IEEE Trans. Circuits Syst. Video Technol., vol. 4, no. 4, pp. 438-442, Aug. 1994.

[9] M. Ghanbari, "The Cross Search Algorithm for Motion Estimation," IEEE Trans. Commun., vol. 38, no. 7, pp. 950-953, 1990.

[10] C.H. Hsieh, P.C. Lu, J.S. Shyn, and E.H. Lu, "Motion Estimation Algorithm using Interblock Correlation," IEE Electron. Lett., vol. 26, no. 5, pp. 276-277, 1990.

[11] S. Zafar, Y.Q. Zhang, and J.S. Baras, "Predictive Block-Matching Motion Estimation for TV Coding - Part I: Inter-block Prediction," IEEE Trans. Broadcast., vol. 37, no. 3, pp. 97-101, 1991.

[12] Y.Q. Zhang and S. Zafar, "Predictive Block-matching Motion Estimation for TV Coding - Part II: Inter-frame Prediction," IEEE Trans. Broadcast., vol. 37, no. 3, pp. 102-105, 1991.

[13] K.K. Ma and P.I. Hosur, "Performance Report of Motion Vector Field Adaptive Search Technique (MVFAST)," in ISO/IEC JTC1/SC29/WG11 MPEG99/m5851, Noordwijkerhout, NL, Mar. 2000.

[14] M. Tourapis, O.C. Au, and M.L. Liou, "Fast Block-matching Motion Estimation using Prediction Motion Vector Field Adaptive Search Technique (PMVFAST)," Noordwijkerhout, Netherlands, 2000.

[15] Y. Nie and K.-K. Ma, "Adaptive Rood Pattern Search for Fast Block-Matching Motion Estimation," IEEE Trans. Image Processing, vol. 11, no. 12, pp. 1442-1448, Dec. 2002.

[16] B.-G. Kim, S.-T. Kim, S.-K. Song, and P.-S. Mah, "Fast-adaptive rood pattern search for block motion estimation," Electron. Lett., vol. 41, no. 16, pp. 900-902, 2005.
