Chapter 8: Fast Convolution
Keshab K. Parhi
2Chap. 8
Chapter 8 Fast Convolution
• Introduction• Cook-Toom Algorithm and Modified Cook-Toom
Algorithm• Winograd Algorithm and Modified Winograd Algorithm• Iterated Convolution• Cyclic Convolution• Design of Fast Convolution Algorithm by Inspection
3Chap. 8
Introduction• Fast Convolution: implementation of convolution algorithm using fewer
multiplication operations by algorithmic strength reduction
• Algorithmic Strength Reduction: Number of strong operations (such asmultiplication operations) is reduced at the expense of an increase in thenumber of weak operations (such as addition operations). These are best suitedfor implementation using either programmable or dedicated hardware
• Example: Reducing the multiplication complexity in complex numbermultiplication:
– Assume (a+jb)(c+dj)=e+jf, it can be expressed using the matrix form, whichrequires 4 multiplications and 2 additions:
– However, the number of multiplications can be reduced to 3 at the expense of 3extra additions by using:
⋅
−=
ba
cddc
fe
−++=+−+−=−
)()()()(
baddcbbcadbaddcabdac
4Chap. 8
– Rewrite it into matrix form, its coefficient matrix can be decomposed asthe product of a 2X3(C), a 3X3(H)and a 3X2(D) matrix:
• Where C is a post-addition matrix (requires 2 additions), D is a pre-additionmatrix (requires 1 addition), and H is a diagonal matrix (requires 2 additions toget its diagonal elements)
– So, the arithmetic complexity is reduced to 3 multiplications and 3additions (not including the additions in H matrix)
• In this chapter we will discuss two well-known approaches to the design offast short-length convolution algorithms: the Cook-Toom algorithm (basedon Lagrange Interpolation) and the Winograd Algorithm (based on theChinese remainder theorem)
xDHCba
d
dcdc
fe
s ⋅⋅⋅=
⋅
−
⋅
+
−⋅
=
=
11
1001
00
0000
110101
5Chap. 8
Cook-Toom Algorithm
• A linear convolution algorithm for polynomial multiplication based onthe Lagrange Interpolation Theorem
• Lagrange Interpolation Theorem:
Let nββ ,....,0 be a set of 1+n distinct points, and let )( if β , for i
= 0, 1, …, n be given. There is exactly one polynomial )( pf of degree n or less
that has value )( if β when evaluated at iβ for i = 0, 1, …, n. It is given by:
∏
∏∑
≠
≠
= −
−=
ijji
ijjn
ii
pfpf
)(
)()()(
0 ββ
ββ
6Chap. 8
• The application of Lagrange interpolation theorem into linearconvolution
Consider an N-point sequence { }110 ,...,, −= Nhhhh
and an L-point sequence { }110 ,...,, −= Lxxxx . The linear
convolution of h and x can be expressed in terms of polynomial
multiplication as follows: )()()( pxphps ⋅= where
011
1 ...)( hphphph NN +++= −
−
011
1 ...)( xpxpxpx LL +++= −
−
012
2 ...)( spspsps NLNL +++= −+
−+
The output polynomial )( ps has degree 2−+ NL and has1−+ NL
different points.
7Chap. 8
• (continued)
)( ps can be uniquely determined by its values at 1−+ NL
different points. Let { }210 ,...,, −+ NLβββ be 1−+ NL
different real numbers. If )( is β for { }2,...,1,0 −+= NLi are
known, then )( ps can be computed using the Lagrange interpolationtheorem as:
∏∏
∑≠
≠−+
= −
−=
ijji
ijjNL
ii
psps
)(
)()()(
2
0 ββ
ββ
It can be proved that this equation is the unique solution to compute linear
convolution for )( ps given the values of )( is β , for
{ }2,...,1,0 −+= NLi .
8Chap. 8
• Cook-Toom Algorithm (Algorithm Description)
• Algorithm Complexity– The goal of the fast-convolution algorithm is to reduce the multiplication
complexity. So, if βi `s (i=0,1,…,L+N-2) are chosen properly, thecomputation in step-2 involves some additions and multiplications bysmall constants
– The multiplications are only used in step-3 to compute s(βi). So, onlyL+N-1 multiplications are needed
1. Choose 1−+ NL different real numbers 210 ,, −+⋅⋅⋅ NLβββ
2. Compute )( ih β and )( ix β , for { }2,,1,0 −+⋅⋅⋅= NLi
3. Compute )()()( iii xhs βββ ⋅= , for { }2,,1,0 −+⋅⋅⋅= NLi
4. Compute )( ps by using ∏∏
∑≠
≠−+
= −
−=
ijji
ijjNL
ii
psps
)(
)()()(
2
0 ββ
ββ
9Chap. 8
– By Cook-Toom algorithm, the number of multiplications is reduced fromO(LN) to L+N-1 at the expense of an increase in the number of additions
– An adder has much less area and computation time than a multiplier. So,the Cook-Toom algorithm can lead to large savings in hardware (VLSI)complexity and generate computationally efficient implementation
• Example-1: (Example 8.2.1, p.230) Construct a 2X2 convolution algorithm usingCook-Toom algorithm with β={0,1,-1}– Write 2X2 convolution in polynomial multiplication form as
s(p)=h(p)x(p), where
– Direct implementation, which requires 4 multiplications and 1 additions,can be expressed in matrix form as follows:
2210
1010
)(
)()(
pspssps
pxxpxphhph
++=
+=+=
⋅
=
1
0
1
01
0
2
1
0
0
0
xx
hhh
h
sss
10Chap. 8
• Example-1 (continued)– Next we use C-T algorithm to get an efficient convolution implementation
with reduced multiplication number
– Then, s(β0), s(β1), and s(β2) are calculated, by using 3 multiplications, as
– From the Lagrange Interpolation theorem, we get:
1021022
1011011
00000
)(,)(,2)(,)(,1
)(,)(,0
xxxhhhxxxhhh
xxhh
−=−==+=+==
===
ββββββ
βββ
)()()()()()()()()( 222111000 βββββββββ xhsxhsxhs ===
22
10
210
2210
1202
10
2101
101
2010
210
)2
)()()(()2
)()(()(
))(())((
)2(
))(())(()(
))(())(()()(
sppss
ssspssps
pps
ppsppsps
++=
++−+−+=
−−−−
+
−−−−+
−−−−=
ββββββ
ββββββ
β
βββββββ
βββββββ
11Chap. 8
• Example-1 (continued)– The preceding computation leads to the following matrix form
– The computation is carried out as follows (5 additions, 3 multiplications)
⋅
−⋅
−+⋅
−−=
⋅
⋅
−−=
⋅
−−=
1
0
10
10
0
2
1
0
2
1
0
2
1
0
2
1
0
111101
2)(0002)(000
111110
001
)()()(
2)(0002)(000)(
111110
001
2)(2)(
)(
111110
001
xx
hhhh
h
xxx
hh
h
ss
s
ss
s
βββ
ββ
β
ββ
β
210221100
222111000
10210100
102
10100
,,.4,,.3,,.2
2,
2,.1
SSSsSSsSsXHSXHSXHS
xxXxxXxX
hhHhhHhH
++−=−=====
−=+==
−=+== (pre-computed)
12Chap. 8
– (Continued): Therefore, this algorithm needs 3 multiplications and 5additions (ignoring the additions in the pre-computation ), i.e., the numberof multiplications is reduced by 1 at the expense of 4 extra additions
– Example-2, please see Example 8.2.2 of Textbook (p.231)
• Comments– Some additions in the preaddition or postaddition matrices can be
shared. So, when we count the number of additions, we only count oneinstead of two or three.
– If we take h0, h1 as the FIR filter coefficients and take x0, x1 as the signal(data) sequence, then the terms H0, H1 need not be recomputed eachtime the filter is used. They can be precomputed once offline and stored.So, we ignore these computations when counting the number ofoperations
– From Example-1, We can understand the Cook-Toom algorithm as amatrix decomposition. In general, a convolution can be expressed inmatrix-vector forms as
⋅
=
1
0
1
01
0
2
1
0
0
0
xx
hhh
h
sss
xTs ⋅=or
13Chap. 8
– Generally, the equation can be expressed as
• Where C is the postaddition matrix, D is the preaddition matrix, and H is adiagonal matrix with Hi, i = 0, 1, …, L+N-2 on the main diagonal.
– Since T=CHD, it implies that the Cook-Toom algorithm provides a wayto factorize the convolution matrix T into multiplication of 1 postadditionmatrix C, 1 diagonal matrix H and 1 preaddition matrix D, such that thetotal number of multiplications is determined only by the non-zeroelements on the main diagonal of the diagonal matrix H
– Although the number of multiplications is reduced, the number ofadditions has increased. The Cook-Toom algorithm can be modified inorder to further reduce the number of additions
xDHCxTs ⋅⋅⋅=⋅=
14Chap. 8
Modified Cook-Toom Algorithm
• The Cook-Toom algorithm is used to further reduce the number ofaddition operations in linear convolutions
• Now consider the modified Cook-Toom Algorithm
Define 2
2)()(' −+−+−= NL
NL pSpsps . Notice that the
degree of )(ps is 2−+ NL and 2−+NLS is its highest order
coefficient. Therefore the degree of )(' ps is 3−+ NL .
15Chap. 8
• Modified Cook-Toom Algorithm
1. Choose 2−+ NL different real numbers 310 ,, −+⋅⋅⋅ NLβββ
2. Compute )( ih β and )( ix β , for { }3,,1,0 −+⋅⋅⋅= NLi
3. Compute )()()( iii xhs βββ ⋅= , for { }3,,1,0 −+⋅⋅⋅= NLi
4. Compute 2
2)()(' −+−+−= NL
iNLii sss βββ , for { }3,,1,0 −+⋅⋅⋅= NLi
5. Compute )(' ps by using ∏∏
∑≠
≠−+
= −
−=
ijji
ijjNL
ii
psps
)(
)()(')('
2
0 ββ
ββ
6. Compute 2
2)(')( −+−++= NL
NL pspsps
16Chap. 8
• Example-3 (Example 8.2.3, p.234) Derive a 2X2 convolution algorithm using themodified Cook-Toom algorithm with β={0,-1}
– and
• Which requires 2 multiplications (not counting the h1x1
multiplication)– Apply the Lagrange interpolation algorithm, we get:
Consider the Lagrange interpolation for 2
11)()(' pxhpsps −= at
{ }1,0 10 −== ββ .
First, find 2
11)()()(' iiii xhxhs ββββ −=
1011011
00000
)(,)(,1)(,)(,0
xxxhhhxxhh−=−=−=
===βββ
βββ
1110102
111111
002
011000
))(()()()('
)()()('
xhxxhhxhxhs
xhxhxhs
−−−=−=
=−=
ββββ
ββββ
))(')('()('
)()(
)(')()(
)(')('
100
01
01
10
10
ββββββ
ββββ
β
ssps
ps
psps
−+=−−
+−−
=
17Chap. 8
• Example-3 (cont’d)– Therefore,– Finally, we have the matrix-form expression:
– Notice that
– Therefore:
2210
211)(')( pspsspxhpsps ++=+=
⋅
−=
11
1
0
2
1
0
)('
)('
100011
001
xhs
s
ss
s
ββ
⋅
−=
11
1
0
11
1
0
)(
)(
100110
001
)('
)('
xhs
s
xhs
s
ββ
ββ
⋅
−⋅
−⋅
−=
⋅
−⋅
−=
1
0
1
10
0
11
1
0
2
1
0
1011
01
000000
100111001
)()(
100110
001
100011001
xx
hhh
h
xhss
sss
ββ
18Chap. 8
• Example-3 (cont’d)– The computation is carried out as the follows:
– The total number of operations are 3 multiplications and 3 additions.Compared with the convolution algorithm in Example-1, the number ofaddition operations has been reduced by 2 while the number ofmultiplications remains the same.
• Example-4 (Example 8.2.4, p. 236 of Textbook)• Conclusion: The Cook-Toom Algorithm is efficient as measured by the
number of multiplications. However, as the size of the problem increases, it isnot efficient because the number of additions increases greatly if β takesvalues other than {0, ±1, ±2, ±4}. This may result in complicated pre-additionand post-addition matrices. For large-size problems, the Winograd algorithm ismore efficient.
22210100
222111000
1210100
1210100
,,.4,,.3,,.2,,.1
SsSSSsSsXHSXHSXHS
xXxxXxXhHhhHhH
=+−=====
=−===−== (pre-computed)
19Chap. 8
Winograd Algorithm• The Winograd short convolution algorithm: based on the CRT (Chinese
Remainder Theorem) ---It’s possible to uniquely determine a nonnegative integer givenonly its remainder with respect to the given moduli, provided that the moduli are relativelyprime and the integer is known to be smaller than the product of the moduli
• Theorem: CRT for Integers
Given [ ]cRcimi = (represents the remainder when c is divided by im ), for
ki ,...,1,0= , where im are moduli and are relatively prime, then
MMNcck
iiii mod
0
= ∑
=, where ∏ =
=k
i imM0 , ii mMM = ,
and iN is the solution of 1),( ==+ iiiiii mMGCDmnMN , provided
that Mc <≤0
20Chap. 8
• Theorem: CRT for Polynomials
• Example-5 (Example 8.3.1, p.239): using the CRT for integer, Choosemoduli m0=3, m1=4, m2=5. Then , and .Then:
– where and are obtained using the Euclidean GCD algorithm. Giventhat the integer c satisfying , let .
Given [ ])()()( )()( pcpRpc im
i = , for i=0, 1, …,k, where )()( pm i are
relatively prime, then )(mod)()()()(0
)()()( pMpMpNpcpck
i
iii
= ∑
=
, where
∏ ==
k
ii pmpM
0)( )()( , )()()( )()( pmpMpM ii = , and )()( pN i is the solution of
1))(),(()()()()( )()()()()()( ==+ pmpMGCDpmpnpMpN iiiiii Provided that the degree of )( pc is less than the degree of )( pM
60210 == mmmM ii mMM =
15)5(12)2(,12,514)4(15)1(,15,41)3(720)1(,20,3
22
11
00
=+−===+−===+−==
MmMmMm
iN inMc <≤0 [ ]cRc imi =
21Chap. 8
• Example-5 (cont’d)– The integer c can be calculated as
– For c=17,
• CRT for polynomials: The remainder of a polynomial with regard to modulus , where , can be evaluated by substituting by in the polynomial
• Example-6 (Example 8.3.2, pp239)
60mod)241520(mod 2100
cccMMNcck
iiii ∗−∗−∗−=
= ∑
=
2)17(,1)17(,2)17( 524130 ====== RcRcRc( ) 1760mod10360mod)224115220( =−=∗−∗−∗−=c
)(pfpi + 1))(deg( −≤ ipf ip)( pf−
[ ][ ]
[ ] 5253)2(5535).(5353)2(5535).(
195)2(3)2(5535).(
22
22
222
2
2
−−=++−−=++−=++−=++
=+−+−=++
++
+
+
xxxxxRcxxxxRb
xxRa
xx
x
x
22Chap. 8
• Winograd Algorithm– 1. Choose a polynomial with degree higher than the degree of
and factor it into k+1 relatively prime polynomials with realcoefficients, i.e.,
– 2. Let . Use the Euclidean GCD algorithm tosolve for .
– 3. Compute:
– 4. Compute:
– 5. Compute by using:
)()()()( )()1()0( pmpmpmpm k⋅⋅⋅=)()( pxph
)()()( )()( pmpmpM ii =1)()()()( )()()()( =+ pmpnpMpN iiii )()( pN i
kiforpmpxpxpmphph iiii
,,1,0)(mod)()(),(mod)()( )()()()(
⋅⋅⋅===
kiforpmpxphps iiii ,,1,0),(mod)()()( )()()()( ⋅⋅⋅==
∑=
=k
i
iiii pmpMpNpsps0
)()()()( )(mod)()()()(
)( pm
)( ps
23Chap. 8
• Example-7 (Example 8.3.3, p.240) Consider a 2X3 linear convolution as inExample 8.2.2. Construct an efficient realization using Winograd algorithmwith
– Let:– Construct the following table using the relationships
and
– Compute residues from :
)1)(1()( 2 +−= ppppm1)(,1)(,)( 2)2()1()0( +=−== ppmppmppm
1)()()()( )()()()( =+ pmpnpMpN iiii)()()( )()( pmpmpM ii =
2,1,0=ifor
i )()( pm i )()( pM i )()( pn i )()( pN i 0 p 123 −+− ppp 12 +− pp 1− 1 1−p pp +3 ( )22
21 ++− pp 2
1
2 12 +p pp −2 ( )221 −− p ( )12
1 −p 2
21010 )(,)( pxpxxpxphhph ++=+=
pxxxpxphhphxxxpxhhph
xpxhph
120)2(10)2(
210)1(
10)1(
0)0(
0)0(
)()(,)()(,)(
)(,)(
+−=+=++=+=
==
24Chap. 8
• Example-7 (cont’d)
– Notice, we need 1 multiplication for , 1 for , and 4 for– However it can be further reduced to 3 multiplications as shown below:
– Then:
psspxxhxhxhxxh
ppxxxphhps
sxxxhhpssxhps
)2(1
)2(02011011200
12010)2(
)1(021010
)1()0(000
)0(
))(()(
)1mod()))((()(
))(()(,)(
2
+=−++−−=
++−+=
=+++===
)()0( ps )()1( ps )()2( ps
−
−+⋅
+−⋅
−
−=
1
20
210
10
10
0
)2(1
)2(0
0000
00
011
101
xxx
xxx
hhhh
h
s
s
[ ])mod(
)2()()1)((
)(mod)()()()(
234
232
)(32
)(23)0(
2
0
)()()()(
)2()1(
pppp
ppppppppps
pmpMpNpsps
pSpS
i
iiii
−+−
+−+++−+−−=
= ∑=
25Chap. 8
• Example-7 (cont’d)– Substitute into to obtain the following
table
– Therefore, we have
)(),(),( )2()1()0( pspsps )( ps0p 1p 2p 3p
)0(0s )0(
0s− )0(0s )0(
0s− 0 )1(
021 s 0 )1(
021 s
0 )2(02
1 s )2(0s− )2(
021 s
0 )2(12
1 s 0 )2(12
1 s−
⋅
−−−
−=
)2(12
1
)2(02
1
)1(02
1
)0(0
3
2
1
0
1111
02011111
0001
ss
ss
sss
s
26Chap. 8
• Example-7 (cont’d)– Notice that
– So, finally we have:
−
−+++
⋅
⋅
−=
+
−
+
1
20
210
210
0
2
2
2
2
0
)2(12
1
)2(02
1
)1(02
1
)0(0
10
01
0
10
0000
000000000000
0000
01100
101000001000001
xxx
xxxxxx
xh
ss
ss
hh
hh
h
hh
⋅
−−⋅
⋅
−−−
−−−
=
+
−
+
2
1
0
2
2
2
2
0
3
2
1
0
010
101111
111001
0000
000000000000
0000
11001
2020111211
00001
10
01
0
10
xxx
h
sss
s
hh
hh
h
hh
27Chap. 8
• Example-7 (cont’d)– In this example, the Winograd convolution algorithm requires 5
multiplications and 11 additions compared with 6 multiplications and 2additions for direct implementation
• Notes:– The number of multiplications in Winograd algorithm is highly dependent
on the degree of each . Therefore, the degree of m(p) should be assmall as possible.
– More efficient form (or a modified version) of the Winograd algorithmcan be obtained by letting deg[m(p)]=deg[s(p)] and applying the CRT to
)()( pm i
)()()(' 11 pmxhpsps LN −−−=
28Chap. 8
Modified Winograd Algorithm
– 1. Choose a polynomial with degree equal to the degree of and factor it into k+1 relatively prime polynomials with real coefficients,i.e.,
– 2. Let , use the Euclidean GCD algorithm tosolve for .
– 3. Compute:
– 4. Compute:– 5. Compute by using:
– 6. Compute
)( pm )( ps
)()()()( )()1()0( pmpmpmpm k⋅⋅⋅=)()()( )()( pmpmpM ii =
1)()()()( )()()()( =+ pmpnpMpN iiii )()( pN i
kiforpmpxpxpmphph iiii
,,1,0)(mod)()(),(mod)()( )()()()(
⋅⋅⋅===
kiforpmpxphps iiii ,,1,0),(mod)()()(' )()()()( ⋅⋅⋅==)(' ps
∑=
=k
i
iiii pmpMpNpsps0
)()()()( )(mod)()()(')('
)()(')( 11 pmxhpsps LN −−+=
29Chap. 8
• Example-8 (Example 8.3.4, p.243 ): Construct a 2X3 convolution algorithmusing modified Winograd algorithm with m(p)=p(p-1)(p+1)
– Let– Construct the following table using the relationships
and
– Compute residues from :
1)(,1)(,)( )2()1()0( +=−== ppmppmppm)()()( )()( pmpmpM ii =
1)()()()( )()()()( =+ pmpnpMpN iiii
i )()( pm i )()( pM i )()( pn i )()( pN i 0 p 12 −p p 1−
1 1−p pp +2 ( )221 +− p 2
1 2 1+p pp −2 ( )22
1 −− p 21
221010 )(,)( pxpxxpxphhph ++=+=
210)2(10)2(
210)1(
10)1(
0)0(
0)0(
)(,)()(,)(
)(,)(
xxxpxhhphxxxpxhhph
xpxhph
+−=−=++=+=
==
))(()('
,))(()(',)('
21010)2(
21010)1(
00)0(
xxxhhps
xxxhhpsxhps
+−−=
+++==
30Chap. 8
• Example-8 (cont’d)– Since the degree of is equal to 1, is a polynomial of
degree 0 (a constant). Therefore, we have:
– The algorithm can be written in matrix form as:
)()( pm i )(' )( ps i
[ ])()'()('
)()()()1('
)()(')(
213
2'
2')0(2
212)2('
2)1(')0(
321
22'2
2'2)0(
21
)2()1(
)2()1(
xhpspxhps
ppxhppppps
pmxhpsps
ssss
SS
+++−+−−+=
−+−++++−−=
+=
⋅
−−−
=
21
2'2'
)0(
3
2
1
0
)2(
)1(
'
100001111110
0001
xh
s
ssss
s
s
31Chap. 8
• Example-8 (cont’d)– (matrix form)
– Conclusion: this algorithm requires 4 multiplications and 7 additions
⋅
−⋅
⋅
−−−
=
−
+
2
1
0
1
2
2
0
3
2
1
0
100111111001
000000000000
100001111110
0001
10
10
xxx
h
h
ssss
hh
hh
32Chap. 8
Iterated Convolution
• Iterated convolution algorithm: makes use of efficient short-lengthconvolution algorithms iteratively to build long convolutions
• Does not achieve minimal multiplication complexity, but achieves agood balance between multiplications and addition complexity
• Iterated Convolution Algorithm (Description)– 1. Decompose the long convolution into several levels of short
convolutions– 2. Construct fast convolution algorithms for short convolutions– 3. Use the short convolution algorithms to iteratively (hierarchically)
implement the long convolution– Note: the order of short convolutions in the decomposition affects the
complexity of the derived long convolution
33Chap. 8
• Example-9 (Example 8.4.1, pp.245): Construct a 4X4 linear convolutionalgorithm using 2X2 short convolution
– Let and
– First, we need to decompose the 4X4 convolution into a 2X2 convolution– Define
– Then, we have:
33
2210
33
2210 )(,)( pxpxpxxpxphphphhph +++=+++=
)()()( pxphps =
pxxpxpxxpx
phhphphhph
321100
321100
)(',)('
)(',)('
+=+=
+=+=
qpxpxqpxpxeippxpxpx
qphphqphpheipphphph
)(')('),()(.,.,)(')(')(
)(')('),()(.,.,)(')(')(
102
10
102
10
+==+=
+==+=
[ ] [ ][ ]
),()(')(')('
)(')(')(')(')(')(')(')('
)(')(')(')('),(),()()()(
2210
211011000
1010
qpsqpsqpsps
qpxphqpxphpxphpxph
qpxpxqphphqpxqphpxphps
=++=
+++=
+⋅+===
34Chap. 8
• Example-9 (cont’d)– Therefore, the 4X4 convolution is decomposed into two levels of nested
2X2 convolutions– Let us start from the first convolution , we have:
– We have the following expression for the third convolution:
– For the second convolution, we get the following expression:
)(')(')(' 000 pxphps ⋅=
[ ]110010102
1100
10100000
)()(
)()('')(')('
xhxhxxhhppxhxh
pxxphhxhpxph
−−+⋅+++=
+⋅+=⋅≡⋅
[ ]332232322
3322
323211112
)()(
)()('')(')(')('
xhxhxxhhppxhxh
pxxphhxhpxphps
−−+⋅+++=
+⋅+=⋅≡⋅=
[ ]11001010
011001101
'''')''()''(
'''')(')(')(')(')('
xhxhxxhh
xhxhpxphpxphps
⋅−⋅−+⋅+=
⋅+⋅≡⋅+⋅=
: addition: multiplication
35Chap. 8
• Example-9 (Cont’d)• For , we have the following expression:
– If we rewrite the three convolutions as the following expressions, then we can getthe following table (see the next page):
[ ])''()''( 1010 xxhh +⋅+
[ ] [ ]
)]()()()()()[(
)()()()(
)()()()()''()''(
31312020
32103210
31312
2020
312031201010
xxhhxxhhxxxxhhhhp
xxhhpxxhh
xxpxxhhphhxxhh
+⋅+−+⋅+−+++⋅++++
+⋅+++⋅+=
+++⋅+++=+⋅+
( ) ( ) 32
211010
32
2111
32
2100
''''
''
''
cppccxxhh
bppbbxh
appaaxh
++≡+⋅+
++≡
++≡
This requires 9 multiplications and 11 additions
36Chap. 8
• Example-9 (cont’d)
– Therefore, the total number of operations used in this 4X4 iteratedconvolution algorithm is 9 multiplications and 19 additions
0p 1p 2p 3p 4p 5p 6p
1a 2a 3a 1b 2b 3b 1c 2c 3c
1b− 2b− 3b−
1a− 2a− 3a−
Total 8 additions here
37Chap. 8
Cyclic Convolution• Cyclic convolution: also known as circular convolution• Let the filter coefficients be , and the data
sequence be .– The cyclic convolution can be expressed as
– The output samples are given by
• where denotes
• The cyclic convolution can be computed as a linear convolutionreduced by modulo . (Notice that there are 2n-1 different outputsamples for this linear convolution). Alternatively, the cyclicconvolution can be computed using CRT with , whichis much simpler.
{ }110 ,,, −⋅⋅⋅= nhhhh{ }110 ,,, −⋅⋅⋅= nxxxx
[ ] )1mod()()()( −⋅=Ο= nn ppxphxhps
( )( )∑−
=− −⋅⋅⋅==
1
0
1,,1,0,n
kkkii nixhs
( ) nki mod−( )( )ki −
1−np
1)( −= nppm
38Chap. 8
• Example-10 (Example 8.5.1, p.246) Construct a 4X4 cyclic convolutionalgorithm using CRT with– Let– Let– Get the following table using the relationships
and
– Compute the residues
)1)(1)(1(1)( 24 ++−=−= pppppm3
32
2103
32
210 )(,)( pxpxpxxpxphphphhph +++=+++=1)(,1)(,1)( 2)2()1()0( +=+=−= ppmppmppm
)()()( )()( pmpmpM ii =1)()()()( )()()()( =+ pmpnpMpN iiii
i )()( pm i )()( pM i )()( pn i )()( pN i 0 1−p 123 −++ ppp )32( 2
41 ++− pp 4
1
1 1+p 123 −+− ppp ( )32241 +− pp 4
1−
2 12 +p 12 −p 21 2
1−
( ) ( ) phhphhhhphhhhhhphhhhhhph
)2(1
)2(03120
)2(
)1(03210
)1(
)0(03210
)0(
)(,)(,)(
+=−+−==−+−==+++=
39Chap. 8
• Example-10 (cont’d)
– Since
– or in matrix-form
– Computations so far require 5 multiplications
( ) ( ) pxxpxxxxpxxxxxxpxxxxxxpx
)2(1
)2(03120
)2(
)1(03210
)1(
)0(03210
)0(
)(,)(,)(
+=−+−==−+−==+++=
[ ]( ) ( ))2(
0)2(
1)2(
1)2(
0)2(
1)2(
1)2(
0)2(
0
2)2()2()2(1
)2(0
)2(
)1(0
)1(0
)1(0
)1()1()1(
)0(0
)0(0
)0(0
)0()0()0(
)1mod()()()(,)()()(,)()()(
xhxhpxhxh
ppxphpsspssxhpxphpssxhpxphps
++⋅−⋅=
+⋅=+==⋅=⋅==⋅=⋅=
( ) ( )( ) ( ) ,
,)2(
0)2(
0)2(
1)2(
1)2(
0)2(
0)2(
0)2(
1)2(
1)2(
0)2(
1
)2(1
)2(1
)2(0
)2(1
)2(0
)2(0
)2(1
)2(1
)2(0
)2(0
)2(0
xhhxxhxhxhsxhhxxhxhxhs
−++=+=+−+=−=
+⋅
+−⋅
−=
)2(1
)2(0
)2(1
)2(0
)2(1
)2(0
)2(0
)2(1
)2(0
)2(1
)2(0
000000
011101
xx
xx
hhhh
h
ss
: multiplication
40Chap. 8
• Example-10 (cont’d)– Then
– So, we have
[ ] ( )( ) ( ) ( )
( ))2(12
1)1(04
1)0(04
13
2442
244244
21)2(
121)2(
041)1(
041)0(
0
2
0
)()()()(
)2(0
)1(0
)0(0
)2(1
)1(0
)0(0
)2(0
)1(0
)0(0
222323
)()()(
)(mod)()()()(
sssp
pp
pssss
pmpMpNpsps
sssssssss
pppppppp
i
iiii
−−+
−+++−+++=
⋅+++=
=
−−
−−
−−+−+++
=∑
⋅
−−−
−=
)2(12
1
)2(02
1
)1(04
1
)0(04
1
3
2
1
0
1011011110110111
ssss
ssss
41Chap. 8
• Example-10 (cont’d)– Notice that:
( )( )
+⋅
−−
⋅
−=
)2(1
)2(0
)2(1
)2(0
)1(0
)0(0
)2(1
)2(02
1
)2(0
)2(12
1
)2(02
1
)1(04
1
)0(04
1
)2(12
1
)2(02
1
)1(04
1
)0(04
1
00000000000000000000
0110010100
0001000001
xx
xxxx
hhhh
hh
h
ssss
42Chap. 8
• Example-10 (cont’d)– Therefore, we have
⋅
−−
−−−−
⋅
⋅
−−−−
−−
=
−−+
−++−
−
−+−
+++
3
2
1
0
2
2
2
4
4
3
2
1
0
1010010111111111
1111
00000000000000000000
01111101110111110111
3210
3210
20
3210
3210
xxxx
ssss
hhhh
hhhh
hh
hhhh
hhhh
43Chap. 8
• Example-10 (cont’d)– This algorithm requires 5 multiplications and 15 additions– The direct implementation requires 16 multiplications and 12 additions
(see the following matrix-form. Notice that the cyclic convolution matrixis a circulant matrix)
• An efficient cyclic convolution algorithm can often be easily extendedto construct efficient linear convolution
• Example-11 (Example 8.5.2, p.249) Construct a 3X3 linear convolutionusing 4X4 cyclic convolution algorithm
⋅
=
3
2
1
0
0123
3012
2301
1230
3
2
1
0
xxxx
hhhhhhhhhhhhhhhh
ssss
44Chap. 8
• Example-11 (cont’d)– Let the 3-point coefficient sequence be , and the 3-point
data sequence be– First extend them to 4-point sequences as:
– Then the 3X3 linear convolution of h and x is
– The 4X4 cyclic convolution of h and x, i.e. , is:
{ }210 ,, hhhh ={ }210 ,, xxxx =
{ } { }0,,,,0,,, 210210 xxxxhhhh ==
+++
+=⋅
22
2112
201102
1001
00
xhxhxh
xhxhxhxhxh
xh
xh
xh 4Ο
+++
++
=Ο
2112
201102
1001
2200
4
xhxhxhxhxh
xhxhxhxh
xh
45Chap. 8
• Example-11 (cont’d)– Therefore, we have– Using the result of Example-10 for , the following convolution
algorithm for 3X3 linear convolution is obtained:
)1()()()( 422 −+Ο=⋅= pxhxhpxphps n
xh 4Ο
⋅
−−−−
−−−
=
100000
001111010111
001111110111
4
3
2
1
0
s
ss
ss
(continued on the next page)
46Chap. 8
• Example-11 (cont’d)
⋅
−−
−
⋅
⋅
−+
++−
−
+−
++
2
1
0
2
2
2
2
4
4
100010101111
111111
000000000000000000000000000000
210
210
20
210
210
xxx
h
hhh
hhh
hh
hhh
hhh
47Chap. 8
• Example-11 (cont’d)– So, this algorithm requires 6 multiplications and 16 additions
• Comments:– In general, an efficient linear convolution can be used to obtain an
efficient cyclic convolution algorithm. Conversely, an efficientcyclic convolution algorithm can be used to derive an efficientlinear convolution algorithm
48Chap. 8
Design of fast convolution algorithm byinspection• When the Cook-Toom or the Winograd algorithms can not generate an
efficient algorithm, sometimes a clever factorization by inspection maygenerate a better algorithm
• Example-12 (Example 8.6.1, p.250) Construct a 3X3 fast convolutionalgorithm by inspection– The 3X3 linear convolution can be written as follows, which requires 9
multiplications and 4 additions
+++
+=
22
2112
201102
1001
00
4
3
2
1
0
xhxhxh
xhxhxhxhxh
xh
sssss
49Chap. 8
• Example-12 (cont’d)– Using the following identities:
– The 3X3 linear convolution can be written as:
( ) ( )( )( )
( )( ) 2211212121123
22110020202011022
1100101010011
xhxhxxhhxhxhs
xhxhxhxxhhxhxhxhs
xhxhxxhhxhxhs
−−++=+=−+−++=++=
−−+⋅+=⋅+=
⋅
−−−−
−−=
000100
100110010111
001011000001
4
3
2
1
0
s
ss
ss
(continued on the next page)
50Chap. 8
• Example-12 (cont’d)
– Conclusion: This algorithm, which can not be obtained by usingthe Cook-Toom or the Winograd algorithms, requires 6multiplications and 10 additions
⋅
⋅
++
+⋅
2
1
0
21
20
10
2
1
0
110101011100010001
000000000000000000000000000000
xxx
hhhh
hhh
hh
Top Related