[IEEE 2011 6th International Conference on Broadband and Biomedical Communications (IB2Com) -...

Abstract—This paper presents a novel FPGA-based method to implement a repeated square-and-multiply algorithm in polynomial rings. The repeated square-and-multiply algorithm for exponentiation is discussed and constructed for a general polynomial $f(x)$. From that, an algorithm for the special case $f(x) = x^n + 1$ is also constructed and described. Simulation and implementation results using an FPGA are provided and discussed.

Index Terms— CMG- Cyclic Multiplicative Group, MG- Multiplicative Group, LCC- Local Cyclic Code, FPGA.

I. INTRODUCTION

Coding theory has been around for a long time and is applied widely in many fields of everyday life, particularly in information and communication. Coding theory research is split into three main directions: source coding, channel coding (which provides the capacity to detect and correct errors) and cryptography [1],[2]. Almost all error-correcting codes are constructed according to Shannon's second theorem [1],[4]. Methods for constructing these codes include combinatorial, geometric and algebraic-structure methods.

However, the development of error-correcting codes is essentially tied to the application of algebraic structures. The algebraic method has several advantages: the algebraic structure is very clear, coding and decoding can be implemented easily, and code words are unique. Codes with clear error-correcting characteristics can be constructed and realized easily by using algebraic structures.

Operations commonly used when constructing codes in algebraic structures include multiplication, division, exponentiation and the modulo operation [5],[6]. These operations are usually realized with registers and rotation.

This paper proposes an FPGA-based implementation method for repeated square-and-multiply polynomials. This method is fundamental to realizing cyclic coders and is applied widely in the construction of coders and ciphers: cyclic coders, local cyclic coders, AES ciphers, RSA, Chor-Rivest, Merkle-Hellman, ElGamal, Rabin [3],[7],[8],[9].

The paper is structured as follows. Section II describes two repeated square-and-multiply algorithms for exponentiation. Section III proposes an FPGA-based implementation method for these algorithms. Section IV presents the implementation results together with the necessary discussion. Section V concludes the paper.

II. REPEATED SQUARE-AND-MULTIPLY ALGORITHM FOR EXPONENTIATION

A. Repeated square-and-multiply algorithm for exponentiation in $\mathbb{Z}_n$

The repeated square-and-multiply algorithm for exponentiation in $\mathbb{Z}_n$ [3] is as follows.

INPUT: $a \in \mathbb{Z}_n$ and an integer $0 \le k < n$ whose binary representation is $k = \sum_{i=0}^{t} k_i 2^i$.

OUTPUT: $a^k \bmod n$.

1. Set $b \leftarrow 1$. If $k = 0$ then return ($b$).
2. Set $A \leftarrow a$.
3. If $k_0 = 1$ then set $b \leftarrow a$.
4. For $i$ from 1 to $t$ do the following:
   4.1 Set $A \leftarrow A^2 \bmod n$.
   4.2 If $k_i = 1$ then set $b \leftarrow A \cdot b \bmod n$.
5. Return ($b$).

Example 1: $4^5 \bmod 13 = 10$. Here $a = 4$, $n = 13$, $k = 5 = (101)_2$, $t = 2$.

$i$    $k_i$    $A$    $b$
0      1        4      4
1      0        3      4
2      1        9      10
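The steps of the algorithm in Section II.A can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation; the function name `mod_exp` is a hypothetical choice, and the bits $k_i$ are read by shifting `k` right instead of indexing a stored binary expansion.

```python
def mod_exp(a: int, k: int, n: int) -> int:
    """Right-to-left repeated square-and-multiply: a**k mod n.

    Hypothetical helper name; follows steps 1-5 of Section II.A.
    """
    b = 1                      # step 1
    if k == 0:
        return b
    A = a % n                  # step 2
    if k & 1:                  # step 3: k_0 = 1
        b = A
    k >>= 1
    while k:                   # step 4: i = 1 .. t
        A = (A * A) % n        # step 4.1: square
        if k & 1:              # step 4.2: multiply when k_i = 1
            b = (A * b) % n
        k >>= 1
    return b                   # step 5


if __name__ == "__main__":
    print(mod_exp(4, 5, 13))   # Example 1
```

Running the script on the inputs of Example 1 reproduces the final value of $b$ in the table.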

An FPGA-based Implementation for Repeated Square-and-Multiply Polynomials

Hieu T. Nguyen¹, Minh N. Nguyen¹, Cuong L. Nguyen², Edhem Custovic³

¹ Department of Electronic Engineering, PTIT University, Hanoi, Vietnam
² Department of Telecommunication and Electronic Engineering, EPU University, Hanoi, Vietnam
³ Department of Electronic Engineering, Latrobe University, Melbourne, Australia


Proceedings of the 6th International Conference on Broadband Communications & Biomedical Applications, November 21 - 24, 2011, Melbourne, Australia

173 IB2COM 2011

B. Repeated square-and-multiply algorithm for exponentiation in $\mathbb{F}_{p^m}$

The repeated square-and-multiply algorithm for exponentiation in $\mathbb{F}_{p^m}$ [3] is as follows.

INPUT: $g(x) \in \mathbb{F}_{p^m}$ and an integer $0 \le k < p^m - 1$ whose binary representation is $k = \sum_{i=0}^{t} k_i 2^i$. (The field $\mathbb{F}_{p^m}$ is represented as $\mathbb{Z}_p[x]/(f(x))$, where $f(x) \in \mathbb{Z}_p[x]$ is an irreducible polynomial of degree $m$ over $\mathbb{Z}_p$.)

OUTPUT: $g(x)^k \bmod f(x)$.

1. Set $s(x) \leftarrow 1$. If $k = 0$ then return ($s(x)$).
2. Set $G(x) \leftarrow g(x)$.
3. If $k_0 = 1$ then set $s(x) \leftarrow g(x)$.
4. For $i$ from 1 to $t$ do the following:
   4.1 Set $G(x) \leftarrow G(x)^2 \bmod f(x)$.
   4.2 If $k_i = 1$ then set $s(x) \leftarrow G(x) \cdot s(x) \bmod f(x)$.
5. Return ($s(x)$).

Example 2: Consider $(x^4 + x + 1)^{11} \bmod (x^5 + x^2 + 1)$.

Here $g(x) = x^4 + x + 1$, $f(x) = x^5 + x^2 + 1$, $k = 11 = (1011)_2$, $t = 3$.

The results from the algorithm are as follows:

$i$    $k_i$    $G(x)$             $s(x)$
0      1        $x^4 + x + 1$      $x^4 + x + 1$
1      1        $x^3$              $x^3 + x^2$
2      0        $x^3 + x$          $x^3 + x^2$
3      1        $x^3 + x^2 + x$    $x$
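The algorithm of Section II.B can be sketched in Python for the binary case $p = 2$, which is the case used in the examples. This is an illustrative sketch under stated assumptions: polynomials are stored as integers whose bit $i$ is the coefficient of $x^i$, and the helper names `gf2_mul`, `gf2_mod` and `gf2_pow_mod` are hypothetical, not taken from the paper.

```python
def gf2_mul(a: int, b: int) -> int:
    """Carry-less product of two polynomials over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a             # add (XOR) the shifted copy of a
        a <<= 1
        b >>= 1
    return r


def gf2_mod(a: int, f: int) -> int:
    """Remainder of a modulo f over GF(2), by shift-and-XOR division."""
    m = f.bit_length() - 1     # degree of f
    while a.bit_length() - 1 >= m:
        a ^= f << (a.bit_length() - 1 - m)
    return a


def gf2_pow_mod(g: int, k: int, f: int) -> int:
    """g(x)**k mod f(x) over GF(2), following steps 1-5 of Section II.B."""
    s = 1
    if k == 0:
        return s
    G = gf2_mod(g, f)
    if k & 1:                  # k_0 = 1
        s = G
    k >>= 1
    while k:
        G = gf2_mod(gf2_mul(G, G), f)        # step 4.1
        if k & 1:
            s = gf2_mod(gf2_mul(G, s), f)    # step 4.2
        k >>= 1
    return s


if __name__ == "__main__":
    # Example 2: g = x^4 + x + 1, f = x^5 + x^2 + 1, k = 11
    print(bin(gf2_pow_mod(0b10011, 11, 0b100101)))
```

For the inputs of Example 2 the returned integer has only bit 1 set, which corresponds to the polynomial $x$ in the last row of the table.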

III. PROPOSED FPGA-BASED IMPLEMENTATION METHOD FOR REPEATED SQUARE-AND-MULTIPLY POLYNOMIALS

A. Construct an algorithm to apply for general f(x)

From the algorithm studied above, an algorithm diagram for realization on an FPGA is constructed, as shown in Figure 1. In this diagram, the algorithm is implemented in two main steps, which are then repeated:

- Multiplication: This step is carried out like the multiplication of two ordinary binary numbers. The output is g_d, with g_d <= gx*g_gen. Specifically, the (n-1) low-weight bits of the gx array are multiplied by the g_gen array (an array that contains (n-1) bits), and the result g_d is an array that contains (2n-2) bits.

- Modulo: This step uses division, carried out like the division of two ordinary binary numbers, in order from the highest-weight bit to the lowest-weight bit. One array stores the dividend and another array stores the divisor. The divisor is rotated left until it is aligned with the highest-weight 1 bit of the dividend, and an XOR operation is then performed. The result of each XOR operation (between the dividend and the rotated divisor) replaces the contents of the dividend array. Division halts when the weight of the highest 1 bit in the dividend array is less than the weight of the highest 1 bit in the divisor array; in particular, division stops after (n-2) clock cycles. The remainder is the value of the (n-1) lowest-weight bits stored in the dividend array.

- Repeat to the k-th power: The multiplication and modulo operations are repeated k times, so that the multiplication and modulo are performed for the k-th power.
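The multiply, modulo and repeat steps described above can be modelled in software as follows. This is a behavioural sketch, not the FPGA design itself: the signal names gx, g_gen, g_mode and g_d come from the paper, while the function names and the choice to start from gx = g_gen and run k-1 multiply-and-reduce passes are assumptions made for illustration. Bit strings are written highest-weight bit first.

```python
def poly_mul(gx: int, g_gen: int) -> int:
    """Multiplication step: g_d = gx * g_gen using shifts and XOR."""
    g_d = 0
    i = 0
    while g_gen >> i:
        if (g_gen >> i) & 1:
            g_d ^= gx << i
        i += 1
    return g_d


def poly_mod(dividend: int, divisor: int) -> int:
    """Modulo step: long division from the highest-weight bit downward.

    The divisor is shifted so that its leading 1 lines up with the
    current leading 1 of the dividend, then XORed into the dividend.
    """
    top = divisor.bit_length()
    for pos in range(dividend.bit_length(), top - 1, -1):
        if (dividend >> (pos - 1)) & 1:
            dividend ^= divisor << (pos - top)
    return dividend


def power_by_repeated_multiply(g_gen: str, g_mode: str, k: int) -> str:
    """Assumed driver loop: returns g_gen**k mod g_mode as a bit string (k >= 1)."""
    width = len(g_gen)
    gen = int(g_gen, 2)
    mode = int(g_mode, 2)
    gx = poly_mod(gen, mode)
    for _ in range(k - 1):     # one multiply + modulo pass per extra power
        gx = poly_mod(poly_mul(gx, gen), mode)
    return format(gx, f"0{width}b")


if __name__ == "__main__":
    print(power_by_repeated_multiply("011010111", "1000001011", 22))
```

With the g_gen and g_mode values used in Section IV, the sketch can be compared directly against the g_out column of Table II.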

B. Construct algorithm for $f(x) = x^n + 1$

For a polynomial of the form $f(x) = x^n + 1$, the algorithm proposed above is implemented as shown in the diagram in Figure 2. In this diagram, it can be seen that the multiplication and modulo steps of the general algorithm are replaced by rotation and XOR operations.
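The reason rotation suffices is that multiplying by $x^i$ modulo $x^n + 1$ moves every coefficient $i$ places cyclically, so the product with g_gen is the XOR of rotated copies of gx. A minimal software sketch of this idea follows. The signal names gx, gy, g_int and g_gen follow Figure 2; the function names, the loop bounds and the starting value g_int = g_gen with k-1 passes are assumptions for illustration and are not taken from the flowchart.

```python
def rol(word: int, i: int, n: int) -> int:
    """Rotate an n-bit word left by i positions."""
    i %= n
    mask = (1 << n) - 1
    return ((word << i) | (word >> (n - i))) & mask


def cyclic_mul(gx: int, g_gen: int, n: int) -> int:
    """gx * g_gen modulo x**n + 1 over GF(2): XOR of rotated copies of gx."""
    gy = 0
    for i in range(n):
        if (g_gen >> i) & 1:       # g_gen(i) = 1
            gy ^= rol(gx, i, n)    # gy <= gy XOR (gx ROL i)
    return gy


def cyclic_power(g_gen: str, k: int) -> str:
    """Assumed driver loop: g_gen**k modulo x**n + 1 as a bit string (k >= 1)."""
    n = len(g_gen)
    gen = int(g_gen, 2)
    g_int = gen
    for _ in range(k - 1):
        g_int = cyclic_mul(g_int, gen, n)
    return format(g_int, f"0{n}b")


if __name__ == "__main__":
    for k in (1, 2, 7, 63):
        print(k, cyclic_power("010101101", k))
```

The printed values can be checked against the corresponding rows of Table IV.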

IV. TESTING AND RESULTS

Example 3: Consider $(x^7 + x^6 + x^4 + x^2 + x + 1)^k \bmod (x^9 + x^3 + x + 1)$.

Using the algorithm in Section II.B with $g(x) = x^7 + x^6 + x^4 + x^2 + x + 1$, $f(x) = x^9 + x^3 + x + 1$ and $k = 22 = (10110)_2$, $t = 4$, the results are displayed in Table I.

TABLE I. RESULT OF $(x^7 + x^6 + x^4 + x^2 + x + 1)^k \bmod (x^9 + x^3 + x + 1)$

$i$    $k_i$    $G(x)$                             $s(x)$
0      0        $x^7 + x^6 + x^4 + x^2 + x + 1$    $1$
1      1        $x^5 + x^3 + x^2 + 1$              $x^5 + x^3 + x^2 + 1$
2      1        $x^6 + x^2 + x + 1$                $x^8 + x^7 + x^5 + x^2$
3      0        $x^6 + x^3 + x^2 + 1$              $x^8 + x^7 + x^5 + x^2$
4      1        $x^3 + 1$                          $x^7 + x^5 + x^4 + x^3 + x^2 + x$

Table I shows the steps performed to compute $(x^7 + x^6 + x^4 + x^2 + x + 1)^{22} \bmod (x^9 + x^3 + x + 1)$ using the algorithm in Section II.B. It can be seen that the result in this case is $s(x) = x^7 + x^5 + x^4 + x^3 + x^2 + x$.
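As a check on the first squaring step of Table I, the row $i = 1$ can be reproduced by hand. With coefficients in $\mathbb{Z}_2$ the cross terms of a square cancel, so squaring doubles each exponent; the reduction then uses $x^9 \equiv x^3 + x + 1$, which follows from $f(x)$. The intermediate congruences for $x^{12}$ and $x^{14}$ below are derived here and do not appear in the original text.

```latex
\begin{align*}
G(x)^2 &= x^{14} + x^{12} + x^{8} + x^{4} + x^{2} + 1 \\
x^{12} &\equiv x^{3}\,(x^{3} + x + 1) = x^{6} + x^{4} + x^{3} \pmod{f(x)} \\
x^{14} &\equiv x^{5}\,(x^{3} + x + 1) = x^{8} + x^{6} + x^{5} \pmod{f(x)} \\
G(x)^2 &\equiv (x^{8} + x^{6} + x^{5}) + (x^{6} + x^{4} + x^{3}) + x^{8} + x^{4} + x^{2} + 1 \\
       &= x^{5} + x^{3} + x^{2} + 1 \pmod{f(x)}
\end{align*}
```

The remaining rows follow the same pattern, with step 4.2 applied whenever $k_i = 1$.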



Figure 1. Diagram of repeated square-and-multiply polynomials algorithm on FPGA


[Figure 2: flowchart from Begin to End. Declarations: inputs g_gen (array of n bits) and k (power); output g_out (array of n bits); signals gx, gy, g_xt, g_int, g00 (arrays of n bits), with g00 <= (others => '0') and gy <= (others => '0'). Assignment boxes: g_int <= g_gen; t <= 0; i <= 0; gx <= g_int; g_xt <= gx ROL i; g_xt <= g00; gy <= gy XOR g_xt; i <= i+1; g_int <= gy; t <= t + 1; g_out <= g_int. Decision boxes (Yes/No): g_gen(i) = 1; i = n-1; t = k.]

Figure 2. Diagram of repeated square-and-multiply polynomials algorithm on FPGA for $f(x) = x^n + 1$

Applying the algorithm in Figure 1 with g_gen = “011010111” and g_mode = “1000001011” on a Xilinx xc3s1200e FPGA platform, for $k = 1, \ldots, 63$, yields the following results:

TABLE II. RESULTS OF $(x^7 + x^6 + x^4 + x^2 + x + 1)^k \bmod (x^9 + x^3 + x + 1)$, $k = 1, \ldots, 63$

k     g_out        k     g_out
1     011010111    33    111101001
2     000101101    34    101001010
3     110111010    35    011011011
4     001000111    36    111011111
5     100010000    37    001100110
6     110100100    38    101011000
7     101010001    39    000111111
8     001001101    40    101011110
9     000011011    41    011000110
10    010010110    42    110110000
11    111000100    43    101001100
12    011110000    44    000100010
13    010011100    45    100110001
14    011001111    46    111101100
15    111000010    47    011001010
16    000001001    48    001000010
17    001110010    49    010010000
18    101000101    50    100111101
19    001010000    51    000011110
20    001110100    52    100010110
21    110111100    53    101011101
22    010111110    54    110111111
23    111111110    55    111000111
24    000101110    56    110001001
25    011000011    57    011101011
26    000110000    58    000001010
27    111010101    59    100001011
28    101101101    60    100110010
29    001101010    61    010010101
30    010101010    62    010111101
31    111100011    63    010000111
32    001000001

Figure 3 displays the simulation results of the implementation. Table II lists the results of performing the operation $(x^7 + x^6 + x^4 + x^2 + x + 1)^k \bmod (x^9 + x^3 + x + 1)$ on the FPGA using the algorithm in Section III.A. The output g_out is calculated for 63 different values of k ($k = 1, \ldots, 63$). From this table, it can be seen that when k = 22 the output is g_out = “010111110”, that is, g_out = $x^7 + x^5 + x^4 + x^3 + x^2 + x$. This simulation result is identical to the calculation result obtained using the algorithm in Section II.B.
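The correspondence between a g_out bit string and its polynomial can be made explicit with a small conversion helper. This is an illustrative sketch; the function name `bits_to_poly` is hypothetical, and it assumes the leftmost character holds the highest-weight coefficient, as in the k = 22 reading above.

```python
def bits_to_poly(bits: str) -> str:
    """Render a bit string (highest-weight bit first) as a polynomial in x."""
    n = len(bits)
    terms = []
    for pos, bit in enumerate(bits):
        if bit == "1":
            e = n - 1 - pos            # exponent carried by this position
            if e == 0:
                terms.append("1")
            elif e == 1:
                terms.append("x")
            else:
                terms.append(f"x^{e}")
    return " + ".join(terms) if terms else "0"


if __name__ == "__main__":
    print(bits_to_poly("010111110"))   # g_out for k = 22 in Table II
    print(bits_to_poly("011010111"))   # g_gen
```

Applied to g_gen, the helper returns the polynomial $g(x)$ of Example 3, which confirms the bit ordering assumed here.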

TABLE III. REQUIRED LOGIC RESOURCES FOR $(x^7 + x^6 + x^4 + x^2 + x + 1)^k \bmod (x^9 + x^3 + x + 1)$, $k = 1, \ldots, 63$

Logic utilization             Used    Utilization
Number of Slices              4085    47%
Number of Slice Flip Flops    576     3%
Number of 4 input LUTs        7801    44%
Number of bonded IOBs         34      17%
Number of GCLKs               1       4%

Example 4: Find the multiplicative group with generator element $\alpha = (x^7 + x^5 + x^3 + x^2 + 1)^i$ over the polynomial ring modulo $x^9 + 1$.



Remark: The multiplicative group with generator element $\alpha = (x^7 + x^5 + x^3 + x^2 + 1)^i$ over the polynomial ring modulo $x^9 + 1$ has order 63 [11].

Applying the algorithm in Figure 2 to the ring modulo $x^9 + 1$ with g_gen = “010101101” and $k = 1, \ldots, 63$, and implementing this design on an FPGA (IC xc3s500e), yields the following results:

TABLE IV. RESULT OF $(x^7 + x^5 + x^3 + x^2 + 1)^k \bmod (x^9 + 1)$, $k = 1, \ldots, 63$

k     g_out        k     g_out        k     g_out
1     010101101    22    101101010    43    101010101
2     001110011    23    110011001    44    011001110
3     111011101    24    011101111    45    101111011
4     100001111    25    001111100    46    111100001
5     101111110    26    111110101    47    110101111
6     111111001    27    111001111    48    001111111
7     000010000    28    010000000    49    000000010
8     011010101    29    010101011    50    101011010
9     100110011    30    110011100    51    011100110
10    111011110    31    011110111    52    110111011
11    011111000    32    111000011    53    000011111
12    111101011    33    101011111    54    011111101
13    110011111    34    011111110    55    111110011
14    100000000    35    000000100    56    000100000
15    101010110    36    010110101    57    110101010
16    100111001    37    111001100    58    001100111
17    111101110    38    101110111    59    110111101
18    110000111    39    000111110    60    111110000
19    010111111    40    111111010    61    111010111
20    111111100    41    111100111    62    100111111
21    000001000    42    001000000    63    000000001

Figure 4 displays the simulation results of the implementation. Table IV contains the outputs of the algorithm of Section III.B on the FPGA. The table lists the 63 outputs (i.e., the maximum order of the generator polynomial $\alpha = (x^7 + x^5 + x^3 + x^2 + 1)^i$ over the polynomial ring modulo $x^9 + 1$) in the cyclic multiplicative group A. It can be seen that all results are identical to the results obtained using the algorithm in Section II.B.
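The order stated in the Remark can be checked in software by multiplying by the generator until the identity “000000001” appears. The sketch below is an assumption-laden illustration, not part of the paper: the function name `cyclic_group` and the `limit` safeguard are hypothetical, and the multiplication modulo $x^n + 1$ is done by XORing cyclic rotations.

```python
def cyclic_group(g_gen: str, limit: int = 4096) -> list:
    """List g_gen**1, g_gen**2, ... modulo x**n + 1 until the identity is reached.

    Raises ValueError if no power equals 1 within `limit` steps, which
    happens when the element is not invertible in the ring.
    """
    n = len(g_gen)
    mask = (1 << n) - 1
    gen = int(g_gen, 2)

    def mul(a: int, b: int) -> int:
        acc = 0
        for i in range(n):
            if (b >> i) & 1:
                # rotate a left by i bits: multiplication by x**i mod x**n + 1
                acc ^= ((a << i) | (a >> (n - i))) & mask
        return acc

    elements = []
    power = gen
    for _ in range(limit):
        elements.append(format(power, f"0{n}b"))
        if power == 1:
            return elements
        power = mul(power, gen)
    raise ValueError("identity not reached within the limit")


if __name__ == "__main__":
    group = cyclic_group("010101101")
    print(len(group), group[-1])
```

The length of the returned list is the order of the generator, and its entries can be compared row by row with Table IV.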

TABLE V. REQUIRED LOGIC RESOURCES FOR $(x^7 + x^5 + x^3 + x^2 + 1)^k \bmod (x^9 + 1)$, $k = 1, \ldots, 63$

Logic utilization             Used    Utilization
Number of Slices              1985    42%
Number of Slice Flip Flops    558     5%
Number of 4 input LUTs        3645    39%
Number of bonded IOBs         25      13%
Number of GCLKs               1       4%

From the simulation and synthesis results, it can be seen that the results obtained from both algorithms are correct. Most current low-cost FPGA devices, such as the Xilinx Spartan 3E, have enough resources to implement these algorithms. From the synthesis tables and the FPGA configuration, it can be seen that the implementation for the polynomial ring with $f(x) = x^n + 1$ needs fewer resources than the implementation for a general polynomial $f(x)$.

Figure 3. Simulation result of the FPGA implementation of $(x^7 + x^6 + x^4 + x^2 + x + 1)^k \bmod (x^9 + x^3 + x + 1)$



Figure 4. Simulation results of the FPGA implementation of $(x^7 + x^5 + x^3 + x^2 + 1)^k \bmod (x^9 + 1)$, $k = 1, \ldots, 63$

V. CONCLUSION

In this paper, an FPGA-based repeated square-and-multiply algorithm for polynomials in polynomial rings has been presented. The algorithm assists the realization of coding and decoding processes because it can be implemented on an FPGA. It can also speed up the calculation of cyclic multiplicative groups in polynomial rings, which are used for building local cyclic codes.

REFERENCES

[1]. Todd K.Moon (2005), Error Correction Coding, Wiley-Interscience, a John Wiley & Sons, Inc., Publication.

[2]. Robert H. Morelos-Zaragoza (2006), The art of Error Correcting Coding, John Wiley & Sons, LTD, ISBNs: 0-470-84782-4.

[3]. Menezes A. J., van Oorschot P. C. (1998), Handbook of Applied Cryptography, CRC Press.

[4]. Claude E. Shannon and Warren Weaver (1963), The Mathematical Theory of Communication, University of Illinois Press, ISBN 0-252-72548-4.

[5]. Richard E. Blahut (2003). Algebraic Codes for Data Transmission. Cambridge University Press. ISBN 0521553741.

[6]. Joan Daemen, Luc Van Linden, René Govaerts and Joos Vandewalle, Propagation Properties of Multiplication Modulo $2^n - 1$, Proceedings of the 13th Symposium on Information Theory in the Benelux, Werkgemeenschap voor Informatie en Communicatietheorie, pp. 111-118, 1992.

[7]. Jeroen M. Doumen, Some Applications of Coding Theory in Cryptography, Technische Universiteit Eindhoven, 2003. Proefschrift. ISBN 90-386-0702-4.

[8]. Schneier, Bruce (1996). Applied Cryptography: Protocols, Algorithms, and Source Code in C, Second Edition (2nd ed.). Wiley. ISBN 978-0471117094.

[9]. Hars, L.: Fast truncated multiplication for cryptographic applications. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp. 211-225. Springer, Heidelberg (2005)

[10]. Nguyen Binh, Le Dinh Thich, System cyclic codes constructing from local cyclic codes on polynomial rings, International Conference on Advanced Technologies for Communications, Vietnam, Oct. 2008.

[11]. Nguyen Binh, Le Dinh Thich, The orders of polynomials and algorithms for defining order of polynomial over polynomial ring, 5th Vietnam Conference on Automation (5th VICA), Hanoi, Vietnam, Oct. 2002.

