Download - Chapter 8: Fast Convolutionpeople.ece.umn.edu/users/parhi/SLIDES/chap8.pdf · Chap. 8 2 Chapter 8 Fast Convolution • Introduction • Cook-Toom Algorithm and Modified Cook-Toom

Chapter 8: Fast Convolution

Keshab K. Parhi

2Chap. 8

Chapter 8 Fast Convolution

• Introduction• Cook-Toom Algorithm and Modified Cook-Toom

Algorithm• Winograd Algorithm and Modified Winograd Algorithm• Iterated Convolution• Cyclic Convolution• Design of Fast Convolution Algorithm by Inspection

3Chap. 8

Introduction• Fast Convolution: implementation of convolution algorithm using fewer

multiplication operations by algorithmic strength reduction

• Algorithmic Strength Reduction: Number of strong operations (such asmultiplication operations) is reduced at the expense of an increase in thenumber of weak operations (such as addition operations). These are best suitedfor implementation using either programmable or dedicated hardware

• Example: Reducing the multiplication complexity in complex numbermultiplication:

– Assume (a+jb)(c+dj)=e+jf, it can be expressed using the matrix form, whichrequires 4 multiplications and 2 additions:

– However, the number of multiplications can be reduced to 3 at the expense of 3extra additions by using:

⋅

−=

ba

cddc

fe

−++=+−+−=−

)()()()(

baddcbbcadbaddcabdac

4Chap. 8

– Rewrite it into matrix form, its coefficient matrix can be decomposed asthe product of a 2X3(C), a 3X3(H)and a 3X2(D) matrix:

• Where C is a post-addition matrix (requires 2 additions), D is a pre-additionmatrix (requires 1 addition), and H is a diagonal matrix (requires 2 additions toget its diagonal elements)

– So, the arithmetic complexity is reduced to 3 multiplications and 3additions (not including the additions in H matrix)

• In this chapter we will discuss two well-known approaches to the design offast short-length convolution algorithms: the Cook-Toom algorithm (basedon Lagrange Interpolation) and the Winograd Algorithm (based on theChinese remainder theorem)

xDHCba

d

dcdc

fe

s ⋅⋅⋅=

⋅

−

⋅

+

−⋅

=

=

11

1001

00

0000

110101

5Chap. 8

Cook-Toom Algorithm

• A linear convolution algorithm for polynomial multiplication based onthe Lagrange Interpolation Theorem

• Lagrange Interpolation Theorem:

Let nββ ,....,0 be a set of 1+n distinct points, and let )( if β , for i

= 0, 1, …, n be given. There is exactly one polynomial )( pf of degree n or less

that has value )( if β when evaluated at iβ for i = 0, 1, …, n. It is given by:

∏

∏∑

≠

≠

= −

−=

ijji

ijjn

ii

pfpf

)(

)()()(

0 ββ

ββ

6Chap. 8

• The application of Lagrange interpolation theorem into linearconvolution

Consider an N-point sequence { }110 ,...,, −= Nhhhh

and an L-point sequence { }110 ,...,, −= Lxxxx . The linear

convolution of h and x can be expressed in terms of polynomial

multiplication as follows: )()()( pxphps ⋅= where

011

1 ...)( hphphph NN +++= −

−

011

1 ...)( xpxpxpx LL +++= −

−

012

2 ...)( spspsps NLNL +++= −+

−+

The output polynomial )( ps has degree 2−+ NL and has1−+ NL

different points.

7Chap. 8

• (continued)

)( ps can be uniquely determined by its values at 1−+ NL

different points. Let { }210 ,...,, −+ NLβββ be 1−+ NL

different real numbers. If )( is β for { }2,...,1,0 −+= NLi are

known, then )( ps can be computed using the Lagrange interpolationtheorem as:

∏∏

∑≠

≠−+

= −

−=

ijji

ijjNL

ii

psps

)(

)()()(

2

0 ββ

ββ

It can be proved that this equation is the unique solution to compute linear

convolution for )( ps given the values of )( is β , for

{ }2,...,1,0 −+= NLi .

8Chap. 8

• Cook-Toom Algorithm (Algorithm Description)

• Algorithm Complexity– The goal of the fast-convolution algorithm is to reduce the multiplication

complexity. So, if βi `s (i=0,1,…,L+N-2) are chosen properly, thecomputation in step-2 involves some additions and multiplications bysmall constants

– The multiplications are only used in step-3 to compute s(βi). So, onlyL+N-1 multiplications are needed

1. Choose 1−+ NL different real numbers 210 ,, −+⋅⋅⋅ NLβββ

2. Compute )( ih β and )( ix β , for { }2,,1,0 −+⋅⋅⋅= NLi

3. Compute )()()( iii xhs βββ ⋅= , for { }2,,1,0 −+⋅⋅⋅= NLi

4. Compute )( ps by using ∏∏

∑≠

≠−+

= −

−=

ijji

ijjNL

ii

psps

)(

)()()(

2

0 ββ

ββ

9Chap. 8

– By Cook-Toom algorithm, the number of multiplications is reduced fromO(LN) to L+N-1 at the expense of an increase in the number of additions

– An adder has much less area and computation time than a multiplier. So,the Cook-Toom algorithm can lead to large savings in hardware (VLSI)complexity and generate computationally efficient implementation

• Example-1: (Example 8.2.1, p.230) Construct a 2X2 convolution algorithm usingCook-Toom algorithm with β={0,1,-1}– Write 2X2 convolution in polynomial multiplication form as

s(p)=h(p)x(p), where

– Direct implementation, which requires 4 multiplications and 1 additions,can be expressed in matrix form as follows:

2210

1010

)(

)()(

pspssps

pxxpxphhph

++=

+=+=

⋅

=

1

0

1

01

0

2

1

0

0

0

xx

hhh

h

sss

10Chap. 8

• Example-1 (continued)– Next we use C-T algorithm to get an efficient convolution implementation

with reduced multiplication number

– Then, s(β0), s(β1), and s(β2) are calculated, by using 3 multiplications, as

– From the Lagrange Interpolation theorem, we get:

1021022

1011011

00000

)(,)(,2)(,)(,1

)(,)(,0

xxxhhhxxxhhh

xxhh

−=−==+=+==

===

ββββββ

βββ

)()()()()()()()()( 222111000 βββββββββ xhsxhsxhs ===

22

10

210

2210

1202

10

2101

101

2010

210

)2

)()()(()2

)()(()(

))(())((

)2(

))(())(()(

))(())(()()(

sppss

ssspssps

pps

ppsppsps

++=

++−+−+=

−−−−

+

−−−−+

−−−−=

ββββββ

ββββββ

β

βββββββ

βββββββ

11Chap. 8

• Example-1 (continued)– The preceding computation leads to the following matrix form

– The computation is carried out as follows (5 additions, 3 multiplications)

⋅

−⋅

−+⋅

−−=

⋅

⋅

−−=

⋅

−−=

1

0

10

10

0

2

1

0

2

1

0

2

1

0

2

1

0

111101

2)(0002)(000

111110

001

)()()(

2)(0002)(000)(

111110

001

2)(2)(

)(

111110

001

xx

hhhh

h

xxx

hh

h

ss

s

ss

s

βββ

ββ

β

ββ

β

210221100

222111000

10210100

102

10100

,,.4,,.3,,.2

2,

2,.1

SSSsSSsSsXHSXHSXHS

xxXxxXxX

hhHhhHhH

++−=−=====

−=+==

−=+== (pre-computed)

12Chap. 8

– (Continued): Therefore, this algorithm needs 3 multiplications and 5additions (ignoring the additions in the pre-computation ), i.e., the numberof multiplications is reduced by 1 at the expense of 4 extra additions

– Example-2, please see Example 8.2.2 of Textbook (p.231)

• Comments– Some additions in the preaddition or postaddition matrices can be

shared. So, when we count the number of additions, we only count oneinstead of two or three.

– If we take h0, h1 as the FIR filter coefficients and take x0, x1 as the signal(data) sequence, then the terms H0, H1 need not be recomputed eachtime the filter is used. They can be precomputed once offline and stored.So, we ignore these computations when counting the number ofoperations

– From Example-1, We can understand the Cook-Toom algorithm as amatrix decomposition. In general, a convolution can be expressed inmatrix-vector forms as

⋅

=

1

0

1

01

0

2

1

0

0

0

xx

hhh

h

sss

xTs ⋅=or

13Chap. 8

– Generally, the equation can be expressed as

• Where C is the postaddition matrix, D is the preaddition matrix, and H is adiagonal matrix with Hi, i = 0, 1, …, L+N-2 on the main diagonal.

– Since T=CHD, it implies that the Cook-Toom algorithm provides a wayto factorize the convolution matrix T into multiplication of 1 postadditionmatrix C, 1 diagonal matrix H and 1 preaddition matrix D, such that thetotal number of multiplications is determined only by the non-zeroelements on the main diagonal of the diagonal matrix H

– Although the number of multiplications is reduced, the number ofadditions has increased. The Cook-Toom algorithm can be modified inorder to further reduce the number of additions

xDHCxTs ⋅⋅⋅=⋅=

14Chap. 8

Modified Cook-Toom Algorithm

• The Cook-Toom algorithm is used to further reduce the number ofaddition operations in linear convolutions

• Now consider the modified Cook-Toom Algorithm

Define 2

2)()(' −+−+−= NL

NL pSpsps . Notice that the

degree of )(ps is 2−+ NL and 2−+NLS is its highest order

coefficient. Therefore the degree of )(' ps is 3−+ NL .

15Chap. 8

• Modified Cook-Toom Algorithm

1. Choose 2−+ NL different real numbers 310 ,, −+⋅⋅⋅ NLβββ

2. Compute )( ih β and )( ix β , for { }3,,1,0 −+⋅⋅⋅= NLi

3. Compute )()()( iii xhs βββ ⋅= , for { }3,,1,0 −+⋅⋅⋅= NLi

4. Compute 2

2)()(' −+−+−= NL

iNLii sss βββ , for { }3,,1,0 −+⋅⋅⋅= NLi

5. Compute )(' ps by using ∏∏

∑≠

≠−+

= −

−=

ijji

ijjNL

ii

psps

)(

)()(')('

2

0 ββ

ββ

6. Compute 2

2)(')( −+−++= NL

NL pspsps

16Chap. 8

• Example-3 (Example 8.2.3, p.234) Derive a 2X2 convolution algorithm using themodified Cook-Toom algorithm with β={0,-1}

– and

• Which requires 2 multiplications (not counting the h1x1

multiplication)– Apply the Lagrange interpolation algorithm, we get:

Consider the Lagrange interpolation for 2

11)()(' pxhpsps −= at

{ }1,0 10 −== ββ .

First, find 2

11)()()(' iiii xhxhs ββββ −=

1011011

00000

)(,)(,1)(,)(,0

xxxhhhxxhh−=−=−=

===βββ

βββ

1110102

111111

002

011000

))(()()()('

)()()('

xhxxhhxhxhs

xhxhxhs

−−−=−=

=−=

ββββ

ββββ

))(')('()('

)()(

)(')()(

)(')('

100

01

01

10

10

ββββββ

ββββ

β

ssps

ps

psps

−+=−−

+−−

=

17Chap. 8

• Example-3 (cont’d)– Therefore,– Finally, we have the matrix-form expression:

– Notice that

– Therefore:

2210

211)(')( pspsspxhpsps ++=+=

⋅

−=

11

1

0

2

1

0

)('

)('

100011

001

xhs

s

ss

s

ββ

⋅

−=

11

1

0

11

1

0

)(

)(

100110

001

)('

)('

xhs

s

xhs

s

ββ

ββ

⋅

−⋅

−⋅

−=

⋅

−⋅

−=

1

0

1

10

0

11

1

0

2

1

0

1011

01

000000

100111001

)()(

100110

001

100011001

xx

hhh

h

xhss

sss

ββ

18Chap. 8

• Example-3 (cont’d)– The computation is carried out as the follows:

– The total number of operations are 3 multiplications and 3 additions.Compared with the convolution algorithm in Example-1, the number ofaddition operations has been reduced by 2 while the number ofmultiplications remains the same.

• Example-4 (Example 8.2.4, p. 236 of Textbook)• Conclusion: The Cook-Toom Algorithm is efficient as measured by the

number of multiplications. However, as the size of the problem increases, it isnot efficient because the number of additions increases greatly if β takesvalues other than {0, ±1, ±2, ±4}. This may result in complicated pre-additionand post-addition matrices. For large-size problems, the Winograd algorithm ismore efficient.

22210100

222111000

1210100

1210100

,,.4,,.3,,.2,,.1

SsSSSsSsXHSXHSXHS

xXxxXxXhHhhHhH

=+−=====

=−===−== (pre-computed)

19Chap. 8

Winograd Algorithm• The Winograd short convolution algorithm: based on the CRT (Chinese

Remainder Theorem) ---It’s possible to uniquely determine a nonnegative integer givenonly its remainder with respect to the given moduli, provided that the moduli are relativelyprime and the integer is known to be smaller than the product of the moduli

• Theorem: CRT for Integers

Given [ ]cRcimi = (represents the remainder when c is divided by im ), for

ki ,...,1,0= , where im are moduli and are relatively prime, then

MMNcck

iiii mod

0

= ∑

=, where ∏ =

=k

i imM0 , ii mMM = ,

and iN is the solution of 1),( ==+ iiiiii mMGCDmnMN , provided

that Mc <≤0

20Chap. 8

• Theorem: CRT for Polynomials

• Example-5 (Example 8.3.1, p.239): using the CRT for integer, Choosemoduli m0=3, m1=4, m2=5. Then , and .Then:

– where and are obtained using the Euclidean GCD algorithm. Giventhat the integer c satisfying , let .

Given [ ])()()( )()( pcpRpc im

i = , for i=0, 1, …,k, where )()( pm i are

relatively prime, then )(mod)()()()(0

)()()( pMpMpNpcpck

i

iii

= ∑

=

, where

∏ ==

k

ii pmpM

0)( )()( , )()()( )()( pmpMpM ii = , and )()( pN i is the solution of

1))(),(()()()()( )()()()()()( ==+ pmpMGCDpmpnpMpN iiiiii Provided that the degree of )( pc is less than the degree of )( pM

60210 == mmmM ii mMM =

15)5(12)2(,12,514)4(15)1(,15,41)3(720)1(,20,3

22

11

00

=+−===+−===+−==

MmMmMm

iN inMc <≤0 [ ]cRc imi =

21Chap. 8

• Example-5 (cont’d)– The integer c can be calculated as

– For c=17,

• CRT for polynomials: The remainder of a polynomial with regard to modulus , where , can be evaluated by substituting by in the polynomial

• Example-6 (Example 8.3.2, pp239)

60mod)241520(mod 2100

cccMMNcck

iiii ∗−∗−∗−=

= ∑

=

2)17(,1)17(,2)17( 524130 ====== RcRcRc( ) 1760mod10360mod)224115220( =−=∗−∗−∗−=c

)(pfpi + 1))(deg( −≤ ipf ip)( pf−

[ ][ ]

[ ] 5253)2(5535).(5353)2(5535).(

195)2(3)2(5535).(

22

22

222

2

2

−−=++−−=++−=++−=++

=+−+−=++

++

+

+

xxxxxRcxxxxRb

xxRa

xx

x

x

22Chap. 8

• Winograd Algorithm– 1. Choose a polynomial with degree higher than the degree of

and factor it into k+1 relatively prime polynomials with realcoefficients, i.e.,

– 2. Let . Use the Euclidean GCD algorithm tosolve for .

– 3. Compute:

– 4. Compute:

– 5. Compute by using:

)()()()( )()1()0( pmpmpmpm k⋅⋅⋅=)()( pxph

)()()( )()( pmpmpM ii =1)()()()( )()()()( =+ pmpnpMpN iiii )()( pN i

kiforpmpxpxpmphph iiii

,,1,0)(mod)()(),(mod)()( )()()()(

⋅⋅⋅===

kiforpmpxphps iiii ,,1,0),(mod)()()( )()()()( ⋅⋅⋅==

∑=

=k

i

iiii pmpMpNpsps0

)()()()( )(mod)()()()(

)( pm

)( ps

23Chap. 8

• Example-7 (Example 8.3.3, p.240) Consider a 2X3 linear convolution as inExample 8.2.2. Construct an efficient realization using Winograd algorithmwith

– Let:– Construct the following table using the relationships

and

– Compute residues from :

)1)(1()( 2 +−= ppppm1)(,1)(,)( 2)2()1()0( +=−== ppmppmppm

1)()()()( )()()()( =+ pmpnpMpN iiii)()()( )()( pmpmpM ii =

2,1,0=ifor

i )()( pm i )()( pM i )()( pn i )()( pN i 0 p 123 −+− ppp 12 +− pp 1− 1 1−p pp +3 ( )22

21 ++− pp 2

1

2 12 +p pp −2 ( )221 −− p ( )12

1 −p 2

21010 )(,)( pxpxxpxphhph ++=+=

pxxxpxphhphxxxpxhhph

xpxhph

120)2(10)2(

210)1(

10)1(

0)0(

0)0(

)()(,)()(,)(

)(,)(

+−=+=++=+=

==

24Chap. 8

• Example-7 (cont’d)

– Notice, we need 1 multiplication for , 1 for , and 4 for– However it can be further reduced to 3 multiplications as shown below:

– Then:

psspxxhxhxhxxh

ppxxxphhps

sxxxhhpssxhps

)2(1

)2(02011011200

12010)2(

)1(021010

)1()0(000

)0(

))(()(

)1mod()))((()(

))(()(,)(

2

+=−++−−=

++−+=

=+++===

)()0( ps )()1( ps )()2( ps

−

−+⋅

+−⋅

−

−=

1

20

210

10

10

0

)2(1

)2(0

0000

00

011

101

xxx

xxx

hhhh

h

s

s

[ ])mod(

)2()()1)((

)(mod)()()()(

234

232

)(32

)(23)0(

2

0

)()()()(

)2()1(

pppp

ppppppppps

pmpMpNpsps

pSpS

i

iiii

−+−

+−+++−+−−=

= ∑=

25Chap. 8

• Example-7 (cont’d)– Substitute into to obtain the following

table

– Therefore, we have

)(),(),( )2()1()0( pspsps )( ps0p 1p 2p 3p

)0(0s )0(

0s− )0(0s )0(

0s− 0 )1(

021 s 0 )1(

021 s

0 )2(02

1 s )2(0s− )2(

021 s

0 )2(12

1 s 0 )2(12

1 s−

⋅

−−−

−=

)2(12

1

)2(02

1

)1(02

1

)0(0

3

2

1

0

1111

02011111

0001

ss

ss

sss

s

26Chap. 8

• Example-7 (cont’d)– Notice that

– So, finally we have:

−

−+++

⋅

⋅

−=

+

−

+

1

20

210

210

0

2

2

2

2

0

)2(12

1

)2(02

1

)1(02

1

)0(0

10

01

0

10

0000

000000000000

0000

01100

101000001000001

xxx

xxxxxx

xh

ss

ss

hh

hh

h

hh

⋅

−−⋅

⋅

−−−

−−−

=

+

−

+

2

1

0

2

2

2

2

0

3

2

1

0

010

101111

111001

0000

000000000000

0000

11001

2020111211

00001

10

01

0

10

xxx

h

sss

s

hh

hh

h

hh

27Chap. 8

• Example-7 (cont’d)– In this example, the Winograd convolution algorithm requires 5

multiplications and 11 additions compared with 6 multiplications and 2additions for direct implementation

• Notes:– The number of multiplications in Winograd algorithm is highly dependent

on the degree of each . Therefore, the degree of m(p) should be assmall as possible.

– More efficient form (or a modified version) of the Winograd algorithmcan be obtained by letting deg[m(p)]=deg[s(p)] and applying the CRT to

)()( pm i

)()()(' 11 pmxhpsps LN −−−=

28Chap. 8

Modified Winograd Algorithm

– 1. Choose a polynomial with degree equal to the degree of and factor it into k+1 relatively prime polynomials with real coefficients,i.e.,

– 2. Let , use the Euclidean GCD algorithm tosolve for .

– 3. Compute:

– 4. Compute:– 5. Compute by using:

– 6. Compute

)( pm )( ps

)()()()( )()1()0( pmpmpmpm k⋅⋅⋅=)()()( )()( pmpmpM ii =

1)()()()( )()()()( =+ pmpnpMpN iiii )()( pN i

kiforpmpxpxpmphph iiii

,,1,0)(mod)()(),(mod)()( )()()()(

⋅⋅⋅===

kiforpmpxphps iiii ,,1,0),(mod)()()(' )()()()( ⋅⋅⋅==)(' ps

∑=

=k

i

iiii pmpMpNpsps0

)()()()( )(mod)()()(')('

)()(')( 11 pmxhpsps LN −−+=

29Chap. 8

• Example-8 (Example 8.3.4, p.243 ): Construct a 2X3 convolution algorithmusing modified Winograd algorithm with m(p)=p(p-1)(p+1)

– Let– Construct the following table using the relationships

and

– Compute residues from :

1)(,1)(,)( )2()1()0( +=−== ppmppmppm)()()( )()( pmpmpM ii =

1)()()()( )()()()( =+ pmpnpMpN iiii

i )()( pm i )()( pM i )()( pn i )()( pN i 0 p 12 −p p 1−

1 1−p pp +2 ( )221 +− p 2

1 2 1+p pp −2 ( )22

1 −− p 21

221010 )(,)( pxpxxpxphhph ++=+=

210)2(10)2(

210)1(

10)1(

0)0(

0)0(

)(,)()(,)(

)(,)(

xxxpxhhphxxxpxhhph

xpxhph

+−=−=++=+=

==

))(()('

,))(()(',)('

21010)2(

21010)1(

00)0(

xxxhhps

xxxhhpsxhps

+−−=

+++==

30Chap. 8

• Example-8 (cont’d)– Since the degree of is equal to 1, is a polynomial of

degree 0 (a constant). Therefore, we have:

– The algorithm can be written in matrix form as:

)()( pm i )(' )( ps i

[ ])()'()('

)()()()1('

)()(')(

213

2'

2')0(2

212)2('

2)1(')0(

321

22'2

2'2)0(

21

)2()1(

)2()1(

xhpspxhps

ppxhppppps

pmxhpsps

ssss

SS

+++−+−−+=

−+−++++−−=

+=

⋅

−−−

=

21

2'2'

)0(

3

2

1

0

)2(

)1(

'

100001111110

0001

xh

s

ssss

s

s

31Chap. 8

• Example-8 (cont’d)– (matrix form)

– Conclusion: this algorithm requires 4 multiplications and 7 additions

⋅

−⋅

⋅

−−−

=

−

+

2

1

0

1

2

2

0

3

2

1

0

100111111001

000000000000

100001111110

0001

10

10

xxx

h

h

ssss

hh

hh

32Chap. 8

Iterated Convolution

• Iterated convolution algorithm: makes use of efficient short-lengthconvolution algorithms iteratively to build long convolutions

• Does not achieve minimal multiplication complexity, but achieves agood balance between multiplications and addition complexity

• Iterated Convolution Algorithm (Description)– 1. Decompose the long convolution into several levels of short

convolutions– 2. Construct fast convolution algorithms for short convolutions– 3. Use the short convolution algorithms to iteratively (hierarchically)

implement the long convolution– Note: the order of short convolutions in the decomposition affects the

complexity of the derived long convolution

33Chap. 8

• Example-9 (Example 8.4.1, pp.245): Construct a 4X4 linear convolutionalgorithm using 2X2 short convolution

– Let and

– First, we need to decompose the 4X4 convolution into a 2X2 convolution– Define

– Then, we have:

33

2210

33

2210 )(,)( pxpxpxxpxphphphhph +++=+++=

)()()( pxphps =

pxxpxpxxpx

phhphphhph

321100

321100

)(',)('

)(',)('

+=+=

+=+=

qpxpxqpxpxeippxpxpx

qphphqphpheipphphph

)(')('),()(.,.,)(')(')(

)(')('),()(.,.,)(')(')(

102

10

102

10

+==+=

+==+=

[ ] [ ][ ]

),()(')(')('

)(')(')(')(')(')(')(')('

)(')(')(')('),(),()()()(

2210

211011000

1010

qpsqpsqpsps

qpxphqpxphpxphpxph

qpxpxqphphqpxqphpxphps

=++=

+++=

+⋅+===

34Chap. 8

• Example-9 (cont’d)– Therefore, the 4X4 convolution is decomposed into two levels of nested

2X2 convolutions– Let us start from the first convolution , we have:

– We have the following expression for the third convolution:

– For the second convolution, we get the following expression:

)(')(')(' 000 pxphps ⋅=

[ ]110010102

1100

10100000

)()(

)()('')(')('

xhxhxxhhppxhxh

pxxphhxhpxph

−−+⋅+++=

+⋅+=⋅≡⋅

[ ]332232322

3322

323211112

)()(

)()('')(')(')('

xhxhxxhhppxhxh

pxxphhxhpxphps

−−+⋅+++=

+⋅+=⋅≡⋅=

[ ]11001010

011001101

'''')''()''(

'''')(')(')(')(')('

xhxhxxhh

xhxhpxphpxphps

⋅−⋅−+⋅+=

⋅+⋅≡⋅+⋅=

: addition: multiplication

35Chap. 8

• Example-9 (Cont’d)• For , we have the following expression:

– If we rewrite the three convolutions as the following expressions, then we can getthe following table (see the next page):

[ ])''()''( 1010 xxhh +⋅+

[ ] [ ]

)]()()()()()[(

)()()()(

)()()()()''()''(

31312020

32103210

31312

2020

312031201010

xxhhxxhhxxxxhhhhp

xxhhpxxhh

xxpxxhhphhxxhh

+⋅+−+⋅+−+++⋅++++

+⋅+++⋅+=

+++⋅+++=+⋅+

( ) ( ) 32

211010

32

2111

32

2100

''''

''

''

cppccxxhh

bppbbxh

appaaxh

++≡+⋅+

++≡

++≡

This requires 9 multiplications and 11 additions

36Chap. 8


– Therefore, the total number of operations used in this 4X4 iteratedconvolution algorithm is 9 multiplications and 19 additions

0p 1p 2p 3p 4p 5p 6p

1a 2a 3a 1b 2b 3b 1c 2c 3c

1b− 2b− 3b−

1a− 2a− 3a−

Total 8 additions here

37Chap. 8

Cyclic Convolution• Cyclic convolution: also known as circular convolution• Let the filter coefficients be , and the data

sequence be .– The cyclic convolution can be expressed as

– The output samples are given by

• where denotes

• The cyclic convolution can be computed as a linear convolutionreduced by modulo . (Notice that there are 2n-1 different outputsamples for this linear convolution). Alternatively, the cyclicconvolution can be computed using CRT with , whichis much simpler.

{ }110 ,,, −⋅⋅⋅= nhhhh{ }110 ,,, −⋅⋅⋅= nxxxx

[ ] )1mod()()()( −⋅=Ο= nn ppxphxhps

( )( )∑−

=− −⋅⋅⋅==

1

0

1,,1,0,n

kkkii nixhs

( ) nki mod−( )( )ki −

1−np

1)( −= nppm

38Chap. 8

• Example-10 (Example 8.5.1, p.246) Construct a 4X4 cyclic convolutionalgorithm using CRT with– Let– Let– Get the following table using the relationships

and

– Compute the residues

)1)(1)(1(1)( 24 ++−=−= pppppm3

32

2103

32

210 )(,)( pxpxpxxpxphphphhph +++=+++=1)(,1)(,1)( 2)2()1()0( +=+=−= ppmppmppm

)()()( )()( pmpmpM ii =1)()()()( )()()()( =+ pmpnpMpN iiii

i )()( pm i )()( pM i )()( pn i )()( pN i 0 1−p 123 −++ ppp )32( 2

41 ++− pp 4

1

1 1+p 123 −+− ppp ( )32241 +− pp 4

1−

2 12 +p 12 −p 21 2

1−

( ) ( ) phhphhhhphhhhhhphhhhhhph

)2(1

)2(03120

)2(

)1(03210

)1(

)0(03210

)0(

)(,)(,)(

+=−+−==−+−==+++=

39Chap. 8


– Since

– or in matrix-form

– Computations so far require 5 multiplications

( ) ( ) pxxpxxxxpxxxxxxpxxxxxxpx

)2(1

)2(03120

)2(

)1(03210

)1(

)0(03210

)0(

)(,)(,)(

+=−+−==−+−==+++=

[ ]( ) ( ))2(

0)2(

1)2(

1)2(

0)2(

1)2(

1)2(

0)2(

0

2)2()2()2(1

)2(0

)2(

)1(0

)1(0

)1(0

)1()1()1(

)0(0

)0(0

)0(0

)0()0()0(

)1mod()()()(,)()()(,)()()(

xhxhpxhxh

ppxphpsspssxhpxphpssxhpxphps

++⋅−⋅=

+⋅=+==⋅=⋅==⋅=⋅=

( ) ( )( ) ( ) ,

,)2(

0)2(

0)2(

1)2(

1)2(

0)2(

0)2(

0)2(

1)2(

1)2(

0)2(

1

)2(1

)2(1

)2(0

)2(1

)2(0

)2(0

)2(1

)2(1

)2(0

)2(0

)2(0

xhhxxhxhxhsxhhxxhxhxhs

−++=+=+−+=−=

+⋅

+−⋅

−=

)2(1

)2(0

)2(1

)2(0

)2(1

)2(0

)2(0

)2(1

)2(0

)2(1

)2(0

000000

011101

xx

xx

hhhh

h

ss

: multiplication

40Chap. 8

• Example-10 (cont’d)– Then

– So, we have

[ ] ( )( ) ( ) ( )

( ))2(12

1)1(04

1)0(04

13

2442

244244

21)2(

121)2(

041)1(

041)0(

0

2

0

)()()()(

)2(0

)1(0

)0(0

)2(1

)1(0

)0(0

)2(0

)1(0

)0(0

222323

)()()(

)(mod)()()()(

sssp

pp

pssss

pmpMpNpsps

sssssssss

pppppppp

i

iiii

−−+

−+++−+++=

⋅+++=

=

−−

−−

−−+−+++

=∑

⋅

−−−

−=

)2(12

1

)2(02

1

)1(04

1

)0(04

1

3

2

1

0

1011011110110111

ssss

ssss

41Chap. 8

• Example-10 (cont’d)– Notice that:

( )( )

+⋅

−−

⋅

−=

)2(1

)2(0

)2(1

)2(0

)1(0

)0(0

)2(1

)2(02

1

)2(0

)2(12

1

)2(02

1

)1(04

1

)0(04

1

)2(12

1

)2(02

1

)1(04

1

)0(04

1

00000000000000000000

0110010100

0001000001

xx

xxxx

hhhh

hh

h

ssss

42Chap. 8

• Example-10 (cont’d)– Therefore, we have

⋅

−−

−−−−

⋅

⋅

−−−−

−−

=

−−+

−++−

−

−+−

+++

3

2

1

0

2

2

2

4

4

3

2

1

0

1010010111111111

1111

00000000000000000000

01111101110111110111

3210

3210

20

3210

3210

xxxx

ssss

hhhh

hhhh

hh

hhhh

hhhh

43Chap. 8

• Example-10 (cont’d)– This algorithm requires 5 multiplications and 15 additions– The direct implementation requires 16 multiplications and 12 additions

(see the following matrix-form. Notice that the cyclic convolution matrixis a circulant matrix)

• An efficient cyclic convolution algorithm can often be easily extendedto construct efficient linear convolution

• Example-11 (Example 8.5.2, p.249) Construct a 3X3 linear convolutionusing 4X4 cyclic convolution algorithm

⋅

=

3

2

1

0

0123

3012

2301

1230

3

2

1

0

xxxx

hhhhhhhhhhhhhhhh

ssss

44Chap. 8

• Example-11 (cont’d)– Let the 3-point coefficient sequence be , and the 3-point

data sequence be– First extend them to 4-point sequences as:

– Then the 3X3 linear convolution of h and x is

– The 4X4 cyclic convolution of h and x, i.e. , is:

{ }210 ,, hhhh ={ }210 ,, xxxx =

{ } { }0,,,,0,,, 210210 xxxxhhhh ==

+++

+=⋅

22

2112

201102

1001

00

xhxhxh

xhxhxhxhxh

xh

xh

xh 4Ο

+++

++

=Ο

2112

201102

1001

2200

4

xhxhxhxhxh

xhxhxhxh

xh

45Chap. 8

• Example-11 (cont’d)– Therefore, we have– Using the result of Example-10 for , the following convolution

algorithm for 3X3 linear convolution is obtained:

)1()()()( 422 −+Ο=⋅= pxhxhpxphps n

xh 4Ο

⋅

−−−−

−−−

=

100000

001111010111

001111110111

4

3

2

1

0

s

ss

ss

(continued on the next page)

46Chap. 8


⋅

−−

−

⋅

⋅

−+

++−

−

+−

++

2

1

0

2

2

2

2

4

4

100010101111

111111

000000000000000000000000000000

210

210

20

210

210

xxx

h

hhh

hhh

hh

hhh

hhh

47Chap. 8

• Example-11 (cont’d)– So, this algorithm requires 6 multiplications and 16 additions

• Comments:– In general, an efficient linear convolution can be used to obtain an

efficient cyclic convolution algorithm. Conversely, an efficientcyclic convolution algorithm can be used to derive an efficientlinear convolution algorithm

48Chap. 8

Design of fast convolution algorithm byinspection• When the Cook-Toom or the Winograd algorithms can not generate an

efficient algorithm, sometimes a clever factorization by inspection maygenerate a better algorithm

• Example-12 (Example 8.6.1, p.250) Construct a 3X3 fast convolutionalgorithm by inspection– The 3X3 linear convolution can be written as follows, which requires 9

multiplications and 4 additions

+++

+=

22

2112

201102

1001

00

4

3

2

1

0

xhxhxh

xhxhxhxhxh

xh

sssss

49Chap. 8

• Example-12 (cont’d)– Using the following identities:

– The 3X3 linear convolution can be written as:

( ) ( )( )( )

( )( ) 2211212121123

22110020202011022

1100101010011

xhxhxxhhxhxhs

xhxhxhxxhhxhxhxhs

xhxhxxhhxhxhs

−−++=+=−+−++=++=

−−+⋅+=⋅+=

⋅

−−−−

−−=

000100

100110010111

001011000001

4

3

2

1

0

s

ss

ss

(continued on the next page)

50Chap. 8


– Conclusion: This algorithm, which can not be obtained by usingthe Cook-Toom or the Winograd algorithms, requires 6multiplications and 10 additions

⋅

⋅

++

+⋅

2

1

0

21

20

10

2

1

0

110101011100010001

000000000000000000000000000000

xxx

hhhh

hhh

hh