Study of Extended Euclidean and Itoh-Tsujii Algorithms in $GF(2^m)$ using
polynomial bases
by
Fan Zhou
B.Eng., Zhejiang University, 2013
A Report Submitted in Partial Fulfillment of the
Requirements for the Degree of
MASTER OF ENGINEERING
in the Department of Electrical and Computer Engineering
© Fan Zhou, 2018
University of Victoria
All rights reserved. This report may not be reproduced in whole or in part, by
photocopying or other means, without the permission of the author.
Supervisory Committee
Dr. Fayez Gebali, Supervisor
(Department of Electrical and Computer Engineering)
Dr. Watheq El-Kharashi, Departmental Member
(Department of Electrical and Computer Engineering)
ABSTRACT
Finite field arithmetic is important for the field of information security. The inversion
operation consumes most of the time and resources among all finite field arithmetic
operations. In this report, two main classes of algorithms for inversion are studied.
The first class consists of Extended Euclidean based inverters. The Extended Euclidean
Algorithm is an extension of the Euclidean algorithm, which computes the greatest common
divisor. The other class of inverters is based on Fermat’s little theorem. This class
of inverters is also called multiplicative based inverters because, in these algorithms,
the inversion is performed by a sequence of multiplications and squarings. This report
presents a literature review of inversion algorithms and implements a multiplicative
based inverter and an Extended Euclidean based inverter in MATLAB. The experimental
results show that inverters based on the Extended Euclidean Algorithm are more
efficient than inverters based on Fermat’s little theorem.
Contents
Supervisory Committee
Abstract
Table of Contents
List of Tables
List of Figures
List of Acronyms
Acknowledgements
Dedication
1 Introduction
1.1 Background
1.2 Preliminaries: Binary Finite Field Arithmetic
1.3 Related Work
1.4 Project Contributions
1.5 Report Organization
2 Extended Euclidean algorithm
2.1 Extended Euclidean algorithm
2.2 An example of Extended Euclidean algorithm
3 Itoh-Tsujii algorithm
3.1 Inversion based on Fermat’s little theorem
3.2 Itoh-Tsujii algorithm
3.3 An example of Itoh-Tsujii algorithm
4 MATLAB Implementation
4.1 MATLAB results
4.2 Analysis and comparison
5 Conclusion
Appendix A
Bibliography
List of Tables
Table 2.1 An example of binary polynomial division
Table 2.2 An example of EEA
Table 3.1 Inverse of $a \in GF(2^{233})$ using an addition chain [1]
Table 4.1 Execution time of EEA and Itoh-Tsujii Algorithms on a quad-core processor
Table 4.2 Execution time of EEA and Itoh-Tsujii Algorithms on a dual-core processor
List of Figures
Figure 1.1 ECC Arithmetic Architecture
Figure 3.1 Flowchart of Itoh-Tsujii Algorithm
List of Acronyms
EEA Extended Euclidean Algorithm
ECC Elliptic Curve Cryptography
FLT Fermat’s Little Theorem
GCD Greatest Common Divisor
SM Scalar Multiplication
VLSI Very Large Scale Integration
ACKNOWLEDGEMENTS
I would like to thank my supervisor, Dr. Gebali, who provided me with valuable guidance
and advice throughout my graduate studies. Besides my supervisor, I would like to
thank Ibrahim Hazmi for helping me improve my project. My gratitude also goes to
my parents and my roommate, who constantly support me when I am in need.
DEDICATION
To my parents
Chapter 1
Introduction
1.1 Background
Elliptic Curve Cryptography (ECC) is a public-key cryptosystem based on the algebraic
structure of elliptic curves over finite fields, which can be used to create faster
and more efficient cryptographic schemes.
The computations involved in implementing ECC cryptosystems form a pyramid of four
levels of operations. Finite field (modular) arithmetic is the foundation of the
pyramid, as it is the basic building block of elliptic curve point addition and point
doubling. Scalar multiplication (SM), which is used by all ECC cryptographic
protocols, is performed by repeating point addition and point doubling operations.
Figure 1.1 illustrates the arithmetic architecture of the SM computational process.
An elliptic curve $E(K)$ over a field $K$ is defined by an equation [2]:
$$y^2 + a_1xy + a_3y = x^3 + a_2x^2 + a_4x + a_6 \qquad (1.1)$$
where $a_1, a_2, a_3, a_4, a_6 \in K$, and the discriminant of $E$ satisfies $\Delta \neq 0$. In the
binary field, $E(K)$ can be simplified to:
$$y^2 + xy = x^3 + ax^2 + b \qquad (1.2)$$
where $a, b \in K$.
Figure 1.1: ECC Arithmetic Architecture [3]
1.2 Preliminaries: Binary Finite Field Arithmetic
The finite field $GF(2^m)$ of order $2^m$ is called a binary finite field. An element
$a(x) \in GF(2^m)$ can be expressed as a binary polynomial of degree at most $m-1$ [2]:
$$a(x) = a_{m-1}x^{m-1} + a_{m-2}x^{m-2} + \cdots + a_2x^2 + a_1x + a_0 \qquad (1.3)$$
where $a_i = 0$ or $1$.
A polynomial $f(x)$ of degree $m$ with coefficients in $GF(2)$ is said to be irreducible
over $GF(2)$ if there do not exist two polynomials $g(x)$ and $h(x)$ of positive degree
less than $m$, with coefficients in $GF(2)$, such that $f(x) = g(x)h(x)$. In polynomial
arithmetic, the coefficients $a_i$ of each polynomial are either 0 or 1, and an
irreducible polynomial $f(x)$ is used to reduce the result of any operation whose
degree is greater than $m-1$. For instance, the field $GF(2^5)$ can be defined using the
irreducible polynomial $f(x) = x^5 + x^2 + 1$.
Computing point multiplication requires point doubling and point addition, which
can be implemented using four basic operations, namely, addition, subtraction, mul-
tiplication and division.
Addition and subtraction in binary fields are performed by adding or subtracting the
corresponding coefficients of the two polynomials and reducing each result modulo 2.
For instance, let $a(x) = a_{m-1}x^{m-1} + \cdots + a_1x + a_0$, $b(x) = b_{m-1}x^{m-1} + \cdots + b_1x + b_0$ and
$c(x) = a(x) + b(x) = c_{m-1}x^{m-1} + \cdots + c_1x + c_0$. If $a_k$, $b_k$ and $c_k$ are the coefficients
of $a(x)$, $b(x)$ and $c(x)$ respectively, then:
$$c_k = (a_k + b_k) \bmod 2 \qquad (1.4)$$
Since subtraction modulo 2 gives the same result as addition, both operations reduce
to a bitwise XOR of the coefficient vectors. The computational complexity of addition
and subtraction in binary fields is therefore usually neglected.
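As a small illustration of Equation 1.4, the sketch below (written in Python, not the MATLAB used in Appendix A) stores a polynomial as an integer whose bit k holds the coefficient of x^k, so that coefficient-wise addition modulo 2 becomes one bitwise XOR. The function name gf2m_add and the integer encoding are assumptions made for this example only.

```python
def gf2m_add(a: int, b: int) -> int:
    """Add (or subtract) two GF(2^m) elements; bit k is the coefficient of x^k."""
    return a ^ b  # XOR adds every pair of coefficients modulo 2


# (x^4 + x^2 + 1) + (x^4 + x^3 + x) = x^3 + x^2 + x + 1
total = gf2m_add(0b10101, 0b11010)
print(bin(total))  # 0b1111
```

Adding the same operand a second time returns the original value, which is why subtraction needs no separate routine.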
Multiplication in a finite field is multiplication modulo an irreducible polynomial. Let
$a(x)$ and $b(x)$ be elements of $GF(2^m)$ and let their modular product $c(x)$ also
be an element of the field. $c(x)$ can be computed in two steps: first the polynomial
product of the two operands $a(x)$ and $b(x)$ is formed, and then a modular
reduction step is applied using the irreducible polynomial $f(x)$. Then, we have:
$$c(x) = a(x) \cdot b(x) \bmod f(x) \qquad (1.5)$$
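The two steps described above (polynomial product, then reduction by f(x)) can be sketched in Python as follows. This is an illustrative sketch and not the func_poly_mult routine used in Appendix A; the function names and the integer encoding (bit k holds the coefficient of x^k) are assumptions of the example.

```python
def carryless_product(a: int, b: int) -> int:
    """Polynomial product over GF(2), without any reduction."""
    product = 0
    while b:
        if b & 1:          # current coefficient of b is 1
            product ^= a   # add the matching shifted copy of a
        a <<= 1
        b >>= 1
    return product


def reduce_mod(c: int, f: int) -> int:
    """Remainder of c(x) divided by f(x) over GF(2)."""
    deg_f = f.bit_length() - 1
    while c.bit_length() - 1 >= deg_f:
        c ^= f << (c.bit_length() - 1 - deg_f)  # cancel the leading term
    return c


def gf2m_mul(a: int, b: int, f: int) -> int:
    """Equation 1.5: c(x) = a(x) * b(x) mod f(x)."""
    return reduce_mod(carryless_product(a, b), f)


F = 0b100101             # f(x) = x^5 + x^2 + 1
A, B = 0b10101, 0b11010  # x^4 + x^2 + 1 and x^4 + x^3 + x
print(bin(carryless_product(A, B)))  # 0b111010010
print(gf2m_mul(A, B, F))             # 1
```

The second printed value is 1 because these two operands are inverses of each other in $GF(2^5)$, the same pair used in the MATLAB example of Chapter 4.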
A great deal of work has been done on inversion in finite fields, since inversion is
the most time-consuming of the four basic operations. The inverse of a nonzero
polynomial $a(x)$ in $GF(2^m)$ is the polynomial $a^{-1}(x)$ in $GF(2^m)$ such that:
$$a(x) \cdot a^{-1}(x) \bmod f(x) = 1 \qquad (1.6)$$
Inversion algorithms can be classified into two main categories: algorithms based on
the Extended Euclidean Algorithm and algorithms based on Fermat’s Little Theorem.
These two classes are discussed in Chapters 2 and 3.
1.3 Related Work
Several Extended Euclidean based inversion algorithms have been proposed in the
literature [3-5]. In [4], a class of bit-serial unidirectional systolic architectures
for inversion and division in polynomial basis was proposed. The authors also
presented a variant of the Extended Euclidean Algorithm (EEA) optimized for
unidirectional systolization with no carry propagation structure. This design also
introduces a simpler distributed counter structure that is suitable for applications
where the field dimension may be large or variable. Yan [5] presents two-dimensional
systolic architectures for inversion based on a modified Extended Euclidean algorithm.
The new architecture uses a distributed control mechanism for a variety of field sizes
and is suitable for Very Large Scale Integration (VLSI) implementation. In comparison
to similar architectures, these architectures have smaller critical path delays and
considerably lower hardware cost. An optimized inversion algorithm that is well
suited to hardware was proposed in [6]. A two-dimensional and a one-dimensional
multiplication/inversion systolic architecture were implemented, and both are well
suited to the elliptic curve arithmetic unit required in elliptic curve cryptography.
For the Itoh-Tsujii inversion algorithm in $GF(2^m)$, Rebeiro [7] proposed a
modification called the quad-Itoh-Tsujii algorithm, which was implemented on
field-programmable gate-array (FPGA) platforms. The adapted algorithm requires
shorter addition chains and reduces the number of clock cycles significantly by using
a parallel architecture. A modified Itoh-Tsujii algorithm for inversion with
polynomial basis was proposed in [8]. An optimal addition chain was used for
inversion, and the operation time was reduced by computing part of the
multiplications and squarings in parallel. Their inversion architecture with a
digit-serial multiplier experimentally obtained a 61% timing improvement and used 69%
fewer resources on average than previous designs with normal basis. Another parallel
version of the Itoh-Tsujii algorithm was proposed in [9]. It uses a special class of
irreducible trinomials, namely $P(x) = x^m + x^k + 1$, to achieve its best performance.
This special class of irreducible trinomials reduces the computational complexity and
yields a 30% timing improvement on average compared to the standard version of the
algorithm. In [10], a high-performance and high-speed FPGA implementation of the
polynomial basis Itoh-Tsujii algorithm (ITA) over $GF(2^m)$ generated by irreducible
trinomials and pentanomials was proposed. The structures are built from efficient
digit-serial multiplier and $k$-times squarer blocks, where $k$ is a small positive
integer. Their design reports an improvement over other implementations of the
polynomial basis Itoh-Tsujii inversion algorithm.
1.4 Project Contributions
This project aims to find an effective algorithm for performing inversion. The
contributions of this project are as follows:
1) A literature review of finite field arithmetic and of related work on inversion
algorithms.
2) An introduction to the Extended Euclidean algorithm and the Itoh-Tsujii algorithm.
3) An implementation of the Extended Euclidean algorithm and the Itoh-Tsujii algorithm in MATLAB.
4) A comparison of the Extended Euclidean algorithm and the Itoh-Tsujii algorithm.
1.5 Report Organization
This report is organized as follows. Chapter 2 introduces the Extended Euclidean
based algorithm in the binary field $GF(2^m)$ using a polynomial basis. An inverter
based on Fermat’s little theorem is presented in Chapter 3. The MATLAB implementation
of these two algorithms is described in Chapter 4. Chapter 5 concludes the report.
Chapter 2
Extended Euclidean algorithm
The Euclidean algorithm calculates the greatest common divisor (GCD) of two integers.
It makes use of the facts that $GCD(m, n) = GCD(m - n, n)$ and $GCD(m, 0) = m$,
and simply repeats the subtraction until $n$ is zero. A more efficient way of doing
this is to use
$$GCD(m, n) = GCD(n, m \bmod n), \qquad (2.1)$$
and repeat until $n$ is zero. For example, to calculate GCD(38, 8), we write
GCD(38, 8) = GCD(8, 6) = GCD(6, 2) = GCD(2, 0) = 2. Since the modulo operation is
essentially repeated subtraction, this is very much the same algorithm, but several
subtractions are done at once. In the case of prime fields, a great number of variants
of the Euclidean algorithm have been developed for use in cryptographic applications,
as in [11]. The Extended Euclidean Algorithm may also be used to find the
multiplicative inverse of polynomials over $GF(2^m)$.
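The integer recurrence of Equation 2.1 can be traced with a few lines of Python. This is only an illustrative sketch; the helper name gcd_steps is an assumption of the example, and it records every (m, n) pair so that the chain GCD(38, 8) = GCD(8, 6) = GCD(6, 2) = GCD(2, 0) becomes visible.

```python
def gcd_steps(m: int, n: int) -> list:
    """Return every (m, n) pair visited by GCD(m, n) = GCD(n, m mod n)."""
    steps = [(m, n)]
    while n != 0:
        m, n = n, m % n
        steps.append((m, n))
    return steps


steps = gcd_steps(38, 8)
print(steps)         # [(38, 8), (8, 6), (6, 2), (2, 0)]
print(steps[-1][0])  # 2
```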
2.1 Extended Euclidean algorithm
Let $f(x)$ be the irreducible polynomial of degree $m$ that defines $GF(2^m)$. Also, let
$a(x)$ be the polynomial representation of a nonzero field element in this basis. Since
$f(x)$ is irreducible and $\deg a(x) < \deg f(x)$ holds, $a(x)$ and $f(x)$ are relatively
prime. If we initiate the Euclidean algorithm with $f(x)$ and $a(x)$, then the extended
algorithm generates two polynomials, $s(x)$ and $t(x)$, with degrees $\deg s(x) < m$ and
$\deg t(x) < m - 1$. These polynomials satisfy [12]:
$$a(x)s(x) + f(x)t(x) = GCD(a(x), f(x)). \qquad (2.2)$$
Since $f(x)$ is irreducible, we have $GCD(a(x), f(x)) = 1$. Hence, we find that
$a(x)s(x) + f(x)t(x) = 1$. In $GF(2^m)$, arithmetic is performed modulo $f(x)$, so
$f(x)t(x) \equiv 0$ and the identity simplifies to $a(x)s(x) \equiv 1 \pmod{f(x)}$. The inverse
element $a^{-1}(x)$ therefore has the polynomial representation $s(x)$, and we can use the
EEA for inversion in $GF(2^m)$ using a polynomial basis.
Algorithms 2.1 and 2.2 show the EEA algorithm.
Algorithm 2.1 Binary Polynomial Division (PolyDivide) [13]
Input: Polynomial a(x) of degree m − 1 & f(x) of degree m.
Output: r & q.
1: a ← a(x), f ← f(x), r ← 1, q ← 1
2: while (f_deg ≥ a_deg) do
3:   a ← a << (f_deg − a_deg)
4:   r ← a ⊕ f
5:   if (r_deg ≥ a_deg) then
6:     q ← (q << (f_deg − r_deg)) + 1
7:   else
8:     q ← (q << (f_deg − a_deg))
9:   end if
10:  f ← r
11: end while
Algorithm 2.2 Extended Euclidean Algorithm (EEA) [13]
Input: Polynomial a(x) of degree m − 1 & f(x) of degree m.
Output: a(x)^{-1} mod f(x).
a ← a(x), f ← f(x), t ← 0, s ← 1.
while r ≠ 0 (gcd ≠ 1) do
  Perform PolyDivide to find r & q (f = a × q + r)
  f ← a, a ← r
  t ← s, s ← t − q × s
end while
Algorithm 2.1 is the binary division algorithm. The loop from line 2 to line 11
computes the remainder and the quotient when $f(x)$ is divided by $a(x)$. Throughout,
$a_{deg}$ is the degree $d_a$ of the original, unshifted $a(x)$. In line 3, we first left
shift $a(x)$ by $f_{deg} - a_{deg}$, and in line 4 we XOR the shifted $a(x)$ with $f(x)$ to
obtain $r(x)$. In lines 5 to 9, we compare the degree of $r(x)$ with $d_a$. If the degree
of $r(x)$ is greater than or equal to $d_a$, we left shift $q(x)$ by $f_{deg} - r_{deg}$ and set
the last bit of $q(x)$ to 1. If the degree of $r(x)$ is smaller than $d_a$, we left shift
$q(x)$ by $f_{deg} - a_{deg}$. We then assign the value of $r(x)$ to $f(x)$ in line 10. This
process is repeated until the degree of $f(x)$ is smaller than $d_a$.
For instance, let $f(x)$ = 1111010000 and $a(x)$ = 11011, where $d_a$ = 4 is the degree of
the initial value of $a(x)$, $d_f$ is the degree of $f(x)$, $d_r$ is the degree of $r(x)$,
$d_{fa} = d_f - d_a$, $d_{fr} = d_f - d_r$ and $q(x)$ is the quotient. As shown in Table 2.1, the
initial value of $d_{fa}$ is 5, so we left shift $a(x)$ by 5 and the new value of $a(x)$ is
1101100000. The value of $r(x)$ is computed as $a(x)$ XOR $f(x)$. Since $d_r$ = 7 is
greater than $d_a$ and $d_{fr}$ is 2, we left shift $q(x)$ by 2 and set the last bit to 1.
After the third iteration, $d_r$ is 2, which is smaller than 4, so the division stops
with the quotient 101100 and the remainder 100.
Table 2.1: An example of binary polynomial division
Iteration   a(x)         r(x)       dfa   dfr   q(x)
0           11011        1          5     -     1
1           1101100000   10110000   3     2     101
2           11011000     1101000    2     1     1011
3           1101100      100        -     -     101100
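The division in Table 2.1 can be reproduced with a short Python sketch. It uses the usual shift-and-XOR long division, setting one quotient bit per cancelled leading term, and is not a line-by-line transcription of Algorithm 2.1 or of func_divide in Appendix A. The name poly_divmod and the integer encoding of polynomials are assumptions of the example.

```python
def poly_divmod(f: int, a: int) -> tuple:
    """Divide f(x) by a(x) over GF(2); return (quotient, remainder)."""
    if a == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    deg_a = a.bit_length() - 1
    q, r = 0, f
    while r.bit_length() - 1 >= deg_a:
        shift = r.bit_length() - 1 - deg_a  # align leading terms
        q |= 1 << shift                     # record the quotient term x^shift
        r ^= a << shift                     # subtract (XOR) the shifted divisor
    return q, r


q, r = poly_divmod(0b1111010000, 0b11011)
print(bin(q), bin(r))  # 0b101100 0b100
```

The three loop iterations use shifts of 5, 3 and 2, which correspond to the three shifted values of a(x) listed in the table.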
Algorithm 2.2 uses the remainder and the quotient produced by Algorithm 2.1 to
iteratively compute the coefficient $s$, which is the inverse of $a$. The computational
complexity of the EEA is $O(m^2)$ [10].
The proof of Algorithm 2.2 relies on the fact that for two nonzero polynomials $a$ and
$f$, the EEA produces a pair of polynomials $(s, t)$ such that:
$$ft + as = GCD(f, a). \qquad (2.3)$$
If we replace $f$ and $a$ by $a$ and $(f \bmod a)$, and let $t$ and $s$ in the following equation
denote the previous values of $t$ and $s$, we have:
$$at + (f \bmod a)s = GCD(a, f \bmod a) \qquad (2.4)$$
From Equation 2.1, it follows that:
$$GCD(f, a) = GCD(a, f \bmod a). \qquad (2.5)$$
By Equation 2.5, the right-hand sides of Equations 2.3 and 2.4 are equal, so we can
write:
$$ft + as = at + (f \bmod a)s \qquad (2.6)$$
Since the Euclidean division of $f$ by $a$ may be written $f = aq + r$ with
$r = f \bmod a = f - aq$, we rearrange Equation 2.6 as:
$$ft + as = at + (f \bmod a)s = at + (f - aq)s = fs + a(t - qs)$$
Hence, matching the coefficients of $f$ and $a$ (new values on the left, previous values
on the right):
$$t = s, \qquad s = t - qs \qquad (2.7)$$
With this recurrence, the new value of $s$, which is the output of Algorithm 2.2, is
computed directly from the previous values of $s$ and $t$ by the formula $s = t - qs$;
over $GF(2)$ the subtraction is an XOR. After iteratively computing the coefficient $s$
using the quotient obtained in Algorithm 2.1, we obtain the value of $s$ that is the
inverse of $a$.
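A compact Python sketch of this recurrence is given below. It is an illustration under stated assumptions and not the func_EEA code of Appendix A: polynomials are integers with bit k holding the coefficient of x^k, the helper names are hypothetical, and the loop runs until the remainder is zero and then returns the coefficient that belongs to the last nonzero remainder.

```python
def poly_divmod(f: int, a: int) -> tuple:
    """Divide f(x) by a(x) over GF(2); return (quotient, remainder)."""
    deg_a = a.bit_length() - 1
    q, r = 0, f
    while r.bit_length() - 1 >= deg_a:
        shift = r.bit_length() - 1 - deg_a
        q |= 1 << shift
        r ^= a << shift
    return q, r


def carryless_product(a: int, b: int) -> int:
    """Polynomial product over GF(2), without reduction."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        b >>= 1
    return product


def gf2m_inverse(a: int, f: int) -> int:
    """Inverse of a(x) modulo f(x) using t <- s, s <- t - q*s (Equation 2.7)."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    r0, r1 = f, a   # current dividend and divisor
    t, s = 0, 1     # invariant: r0 = t*a and r1 = s*a (mod f)
    while r1 != 0:
        q, r = poly_divmod(r0, r1)
        r0, r1 = r1, r
        t, s = s, t ^ carryless_product(q, s)  # subtraction is XOR in GF(2)
    if r0 != 1:
        raise ValueError("a(x) and f(x) are not relatively prime")
    return t


print(bin(gf2m_inverse(0b10101, 0b100101)))     # 0b11010
print(hex(gf2m_inverse(0x10000E2, 0x2000009)))  # 0x54ed9e
```

The first call uses the $GF(2^5)$ operands from Chapter 4, and the second uses the 25-bit operands of Section 2.2.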
2.2 An example of Extended Euclidean algorithm
Table 2.2 shows an example of how the EEA works, where $m = 25$,
$f(x) = x^{25} + x^3 + 1$ = x"2000009" and $a(x) = x^{24} + x^7 + x^6 + x^5 + x$ = x"10000E2".
The variables in this example are displayed in hexadecimal representation.
All variables are initialized as follows: r = 1, q = 1, t = 0, s = 1.
Each variable is computed as follows:
f(i) = a(i − 1)
a(i) = r(i − 1)
r(i) and q(i) are calculated by applying Algorithm 2.1 to f(i) and a(i).
t(i) = s(i − 1)
s(i) = t(i − 1) ⊕ (q(i) × s(i − 1)).
At the fifth iteration, r is 1, and the value of r in the next iteration would be 0.
Thus the value of s in this iteration is the inverse of a(x) mod f(x).
As shown in Table 2.2, the multiplicative inverse of x"10000E2" modulo x"2000009" is
x"054ED9E".
It# a f r q t s
0 10000E2 2000009 1 1 0 11 1CD 10000E2 1CD 2 1 22 D0 1CD D0 1B8CA 2 371953 6D D0 6D 2 37195 6E3284 A 6D A 2 6E328 EB7C55 1 A 1 E EB7C5 54ED9E
11
Chapter 3
Itoh-Tsujii algorithm
3.1 Inversion based on Fermat’s little theorem
The simplest dividers based on Fermat’s little theorem are also known as
multiplicative based dividers because, with Fermat’s little theorem, the division is
performed by a sequence of squarings and multiplications.
The Itoh-Tsujii algorithm, which is based on Fermat’s little theorem, was originally
proposed in [14] for the normal basis representation. Since its publication, however,
several improvements and variations have been reported in [6-8], showing that it
can also be used with other field representations such as the polynomial representation.
To compute an inverse using the normal basis representation, basis conversion between
polynomial and normal bases is needed at the beginning and end of the operation. The
algorithm that converts polynomial bases to normal bases is complicated and
computationally expensive, which slows down inversion in normal bases [15].
Binary field multiplication using the normal basis representation is more complicated
and more costly in time and implementation area than multiplication using
polynomial bases [16]. Therefore, normal bases are competitive only when very few
multiplications are required [17]. The Itoh-Tsujii algorithm computes the
multiplicative inverse using a series of multiplications and squarings. Although
squaring in normal bases is computed by a cyclic shift of the binary representation
[18], the higher computational complexity of multiplication reduces the overall
efficiency.
Therefore, this project implements the Itoh-Tsujii algorithm using a polynomial basis
and compares its performance with that of the EEA in the same basis.
Let $p$ be a prime and let $a$ be an integer satisfying $GCD(a, p) = 1$. Then:
$$a^{p-1} \equiv 1 \pmod p \qquad (3.1)$$
or
$$a \times a^{p-2} \equiv 1 \pmod p \qquad (3.2)$$
Hence we can conclude that the inverse of any nonzero element $a$ of $GF(p)$ is $a^{p-2}$.
For example, in $GF(5)$, let $a = 3$. Then the inverse of 3 over $GF(5)$ is
$3^{5-2} = 27 \equiv 2 \pmod 5$, and indeed $2 \times 3 = 6 \equiv 1 \pmod 5$.
Extending this technique to $GF(2^m)$, we can write $a^{2^m-1} = a \times a^{2^m-2} = 1$.
Hence:
$$a^{-1} = a^{2^m-2}. \qquad (3.3)$$
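The prime-field case of Equation 3.2 is easy to check numerically. The Python sketch below is illustrative only: the function name inverse_mod_prime is an assumption of the example, and it relies on Python's built-in three-argument pow for modular exponentiation.

```python
def inverse_mod_prime(a: int, p: int) -> int:
    """Inverse of a in GF(p) by Fermat's little theorem: a^(p-2) mod p."""
    if a % p == 0:
        raise ZeroDivisionError("zero has no inverse")
    return pow(a, p - 2, p)


inv = inverse_mod_prime(3, 5)
print(inv)            # 2
print((3 * inv) % 5)  # 1
```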
3.2 Itoh-Tsujii algorithm
The Itoh-Tsujii algorithm is based on Fermat’s little theorem, by which the inverse of
a nonzero element $a \in GF(2^m)$ is computed as $a^{-1} = a^{2^m-2}$.
A straightforward implementation of Equation 3.3 requires $m - 2$ multiplications and
$m - 1$ squarings. The Itoh-Tsujii algorithm reduces the number of multiplications to
$\lfloor \log_2(m-1) \rfloor + HW(m-1) - 1$, where $HW(m-1)$ is the Hamming weight of the
binary representation of $m - 1$, while the number of required squarings is $m - 1$ [10].
This remarkable saving in the number of multiplications is based on the observation
from [19] that the inverse can be rewritten as:
$$a^{-1} = [S_{m-1}(a)]^2 \qquad (3.4)$$
where $S_k(a) = a^{2^k-1} \in GF(2^m)$ and $k \in \mathbb{N}$. Let $k$, $j$ be two positive integers. Then, the
element $S_{k+j} \in GF(2^m)$ can be expressed as:
$$S_{k+j} = (S_k)^{2^j} \cdot S_j = (S_j)^{2^k} \cdot S_k \qquad (3.5)$$
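For readers who want the intermediate step, the identities in Equations 3.4 and 3.5 follow directly from the definition $S_k(a) = a^{2^k-1}$ by adding exponents. The short derivation below is a worked check added for illustration, using only the symbols already defined above.

```latex
\[
\begin{aligned}
(S_k)^{2^j} \cdot S_j
  &= \left(a^{2^k-1}\right)^{2^j} \cdot a^{2^j-1} \\
  &= a^{2^{k+j}-2^j} \cdot a^{2^j-1} \\
  &= a^{2^{k+j}-1} = S_{k+j}, \\[4pt]
\left[S_{m-1}(a)\right]^2
  &= a^{2\,(2^{m-1}-1)} = a^{2^m-2} = a^{-1}.
\end{aligned}
\]
```

Exchanging the roles of $k$ and $j$ gives the second form of Equation 3.5.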
In the Itoh-Tsujii algorithm, an addition chain is used to reduce the number of
multiplications required and to perform this field exponentiation more efficiently.
An addition chain for an integer such as $m - 1$ is a sequence of positive integers
$C = \{c_1, c_2, \cdots, c_t\}$ with $t$ elements, in which each element after the first is
the sum of two earlier elements. Algorithm 3.1 and the flowchart in Figure 3.1 show
how to compute the addition chain. Given $f(x)$ of degree $m$, we have $c_1 = 1$ and
$c_t = m - 1$. If $c_i$ is even, $c_{i-1} = c_i/2$, and if $c_i$ is odd, $c_{i-1} = c_i - 1$.
Hence, to compute $a^{-1}$, we use Equation 3.3 and an addition chain constructed using
Algorithm 3.1 to obtain $S_{m-1}(a) = a^{2^{m-1}-1}$.
The Itoh-Tsujii algorithm is given in Algorithm 3.2 and its flowchart is shown in
Figure 3.2. By Equation 3.4, we can compute the inverse of $a$ by squaring
$S_{m-1}(a)$. Therefore, Algorithm 3.2 iteratively computes the $S_i$ coefficients in the
order stipulated by the addition chain. In the final iteration, after having computed
the coefficient $S_t = a^{2^{m-1}-1}$, the algorithm returns the required multiplicative
inverse by performing a regular field squaring, namely $S_t^2 = a^{2^m-2} = a^{-1}$. The
inverse of $a$ is $S_t^2$.
It has been shown that the maximum number of multiplications in this method is $t$
and the required number of squarings is $m - 1$, where $t$ is the step-length of
the addition chain for $m - 1$ [9].
The advantage of inversion algorithms based on Fermat’s little theorem is that they
can be implemented using only multiplication and squaring. This eliminates the need
for any extra components, such as dividers.
Algorithm 3.1 Finding the addition chain
Input: Polynomial a(x) of degree m − 1 & f(x) of degree m.
Output: An addition chain C of length t.
i ← 1, C(i) ← m − 1.
while C(i) > 1 do
  if C(i) mod 2 == 0 then
    C(i + 1) ← C(i)/2
  else
    C(i + 1) ← C(i) − 1
  end if
  i ← i + 1
end while
Algorithm 3.2 Itoh-Tsujii Algorithm [19]
Input: Polynomial a(x) of degree m − 1 & f(x) of degree m.
Output: a(x)^{-1} mod f(x).
$S_0 \leftarrow a(x)$.
for i from 1 to t do
  $S_i = (S_{i_1})^{2^{C_{i_2}}} \times S_{i_2}$
end for
$a^{-1}(x) \leftarrow (S_t)^2$
Figure 3.1: Flowchart of Itoh-Tsujii Algorithm
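The following Python sketch puts Algorithms 3.1 and 3.2 together. It is an illustration under stated assumptions and not the main_inv code of Appendix A: polynomials are integers with bit k holding the coefficient of x^k, all function names are hypothetical, m is assumed to be at least 2, and the intermediate values $S_k = a^{2^k-1}$ are kept in a dictionary indexed by k.

```python
def gf2m_mul(a: int, b: int, f: int) -> int:
    """Multiply a and b modulo f over GF(2); a and b must have degree < m."""
    m = f.bit_length() - 1
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1                # multiply a by x
        if (a >> m) & 1:       # degree reached m: reduce with f
            a ^= f
    return result


def square_n_times(a: int, n: int, f: int) -> int:
    """Compute a^(2^n) by n successive field squarings."""
    for _ in range(n):
        a = gf2m_mul(a, a, f)
    return a


def addition_chain(n: int) -> list:
    """Chain from 1 up to n built by halving even values and decrementing odd ones."""
    chain = [n]
    while chain[-1] > 1:
        c = chain[-1]
        chain.append(c // 2 if c % 2 == 0 else c - 1)
    return chain[::-1]


def itoh_tsujii_inverse(a: int, f: int) -> int:
    """Inverse of a in GF(2^m) as (S_{m-1})^2, with S_k = a^(2^k - 1)."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    m = f.bit_length() - 1
    chain = addition_chain(m - 1)
    s = {1: a}
    for prev, cur in zip(chain, chain[1:]):
        step = cur - prev      # either prev (doubling step) or 1
        s[cur] = gf2m_mul(square_n_times(s[prev], step, f), s[step], f)
    return gf2m_mul(s[m - 1], s[m - 1], f)


print(bin(itoh_tsujii_inverse(0b10101, 0b100101)))     # 0b11010
print(hex(itoh_tsujii_inverse(0x10000E2, 0x2000009)))  # 0x54ed9e
```

Each chain step applies Equation 3.5 once, so the number of multiplications equals the number of steps in the chain, while the squarings inside square_n_times plus the final squaring add up to m − 1.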
3.3 An example of Itoh-Tsujii algorithm
The inversion operation in $GF(2^{233})$ is illustrated with an example.
To calculate $a^{-1}$ in $GF(2^{233})$ with $m = 233$, we use the addition chain
C = {C(t), · · · , C(2), C(1)}
with t elements.
From Algorithm 3.1, we have C(1) = m − 1 = 232. Since C(1) = 232 is an even
number, C(2) = C(1)/2 = 116. If C(i) is odd, C(i + 1) follows the rule
C(i + 1) = C(i) − 1.
Therefore, we obtain the addition chain with length t = 10:
C = {1, 2, 3, 6, 7, 14, 28, 29, 58, 116, 232}.
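The chain for m = 233 can be regenerated with a few lines of Python. This is an illustrative sketch of Algorithm 3.1 with hypothetical function names; it also compares the number of chain steps with the multiplication count $\lfloor \log_2(m-1) \rfloor + HW(m-1) - 1$ quoted in Section 3.2.

```python
def addition_chain(n: int) -> list:
    """Algorithm 3.1: halve even values, decrement odd ones, then reverse."""
    chain = [n]
    while chain[-1] > 1:
        c = chain[-1]
        chain.append(c // 2 if c % 2 == 0 else c - 1)
    return chain[::-1]


def multiplication_count(n: int) -> int:
    """floor(log2(n)) + HW(n) - 1 for a positive integer n."""
    return (n.bit_length() - 1) + bin(n).count("1") - 1


chain = addition_chain(233 - 1)
print(chain)                          # [1, 2, 3, 6, 7, 14, 28, 29, 58, 116, 232]
print(len(chain) - 1)                 # 10
print(multiplication_count(233 - 1))  # 10
```

Both counts agree with the length t = 10 stated above, since every halving step accounts for one bit position of 232 and every decrement for one of its set bits beyond the leading one.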
Table 3.1: Inverse of $a \in GF(2^{233})$ using an addition chain [1]
Step   S_{V_i}(a)   S_{V_j+U_k}(a)    Exponentiation
1      S_1(a)       -                 a
2      S_2(a)       S_{1+1}(a)        (S_1)^{2^1} S_1 = a^{2^2-1}
3      S_3(a)       S_{2+1}(a)        (S_2)^{2^1} S_1 = a^{2^3-1}
4      S_6(a)       S_{3+3}(a)        (S_3)^{2^3} S_3 = a^{2^6-1}
5      S_7(a)       S_{6+1}(a)        (S_6)^{2^1} S_1 = a^{2^7-1}
6      S_14(a)      S_{7+7}(a)        (S_7)^{2^7} S_7 = a^{2^14-1}
7      S_28(a)      S_{14+14}(a)      (S_14)^{2^14} S_14 = a^{2^28-1}
8      S_29(a)      S_{28+1}(a)       (S_28)^{2^1} S_1 = a^{2^29-1}
9      S_58(a)      S_{29+29}(a)      (S_29)^{2^29} S_29 = a^{2^58-1}
10     S_116(a)     S_{58+58}(a)      (S_58)^{2^58} S_58 = a^{2^116-1}
11     S_232(a)     S_{116+116}(a)    (S_116)^{2^116} S_116 = a^{2^232-1}
The computational process is illustrated in Table 3.1, where the $V_i$ are the integers
in the addition chain, $V_j = V_{i-1}$ and $V_i = V_j + U_k$.
From Equations 3.4 and 3.5, we have:
$$S_{V_j+U_k} = (S_{V_j})^{2^{U_k}} \cdot S_{U_k}, \quad \text{where } S_{V_i} = a^{2^{V_i}-1}.$$
Thus, we can rewrite $S_{V_i}(a)$ as:
$$S_{V_i}(a) = S_{V_j+U_k} = S_{V_{i-1}+U_k} = (S_{V_j})^{2^{U_k}} S_{U_k} = a^{2^{V_i}-1}.$$
As shown in Figure 3.2, we obtain the value of $S_{232}$, and the inverse of $a$ is $(S_{232})^2$.
It may be noted that the Itoh-Tsujii algorithm for the field $GF(2^m)$ requires a large
number of squarings, which reduces its efficiency.
Chapter 4
MATLAB Implementation
4.1 MATLAB results
Inverters based on the EEA and Itoh-Tsujii algorithms are implemented in MATLAB
as shown in Appendix A. In MATLAB, a polynomial is represented as a vector of its
coefficients. For instance, to calculate the inverse of $a(x) = x^4 + x^2 + 1$ with the
irreducible polynomial $f(x) = x^5 + x^2 + 1$ in $GF(2^5)$, we input a(x)=[(MSB) 1 0 1 0 1 (LSB)]
and f(x)=[(MSB) 1 0 0 1 0 1 (LSB)]. After running the MATLAB code, the
results of the EEA and Itoh-Tsujii algorithms are both [(MSB) 1 1 0 1 0 (LSB)], which
implies that the inverse $a^{-1}(x)$ of $a(x)$ is $x^4 + x^3 + x$.
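To relate the MSB-first coefficient vectors above to a more compact encoding, the Python sketch below converts such a vector to an integer and back. It is not part of the report's MATLAB code; the function names and the choice of integer bit masks are assumptions made for illustration.

```python
def bits_to_int(bits: list) -> int:
    """Pack an MSB-first coefficient vector into an integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def int_to_bits(value: int, width: int) -> list:
    """Unpack an integer into an MSB-first coefficient vector of the given width."""
    return [(value >> k) & 1 for k in range(width - 1, -1, -1)]


a = bits_to_int([1, 0, 1, 0, 1])     # a(x) = x^4 + x^2 + 1
f = bits_to_int([1, 0, 0, 1, 0, 1])  # f(x) = x^5 + x^2 + 1
print(bin(a), bin(f))                # 0b10101 0b100101
print(int_to_bits(0b11010, 5))       # [1, 1, 0, 1, 0]
```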
To compare the performance of inverters based on the EEA and Itoh-Tsujii algorithms,
the timeit function and the stopwatch timer functions tic and toc are used to measure
how long the MATLAB code of each algorithm takes to run.
Tables 4.1 and 4.2 present the MATLAB execution time of the EEA and Itoh-Tsujii
algorithms on two different processors. The platform used for Table 4.1 is a Dell
Optiplex 9020 computer with a 4th generation Intel Core-i7-4790 3.6 GHz quad-core
processor and 16 GB of RAM. The platform used for Table 4.2 is a Sony SVF142171SCW
computer with a 3rd generation Intel Core-i5-3337 1.8 GHz dual-core processor and 8
GB of RAM. The results show that both algorithms run faster on the quad-core
processor than on the dual-core processor.
As the key length increases, the execution time of the EEA increases within an
acceptable range. Based on these results, the performance of the EEA based inverters
implemented in MATLAB is considered promising.
The execution time of the Itoh-Tsujii algorithm also increases as the key size $m$
increases. For $m$ smaller than 23, the Itoh-Tsujii algorithm performs efficiently.
However, when the key size becomes greater than 23, the Itoh-Tsujii algorithm is not
as efficient as the EEA: at $m = 31$ on the quad-core processor it takes 1007 ms,
compared with 18 ms for the EEA.
Table 4.1: Execution time of EEA and Itoh-Tsujii Algorithms on a quad-core processor
m                       7     20    23    25    27    31
EEA time (ms)           2     11    14    15    16    18
Itoh-Tsujii time (ms)   3.2   28    70    160   356   1007
Table 4.2: Execution time of EEA and Itoh-Tsujii Algorithms on a dual-core processor
m                       7     20    23    25    27    31
EEA time (ms)           3     25    29    33    34    35
Itoh-Tsujii time (ms)   12    49    118   375   845   2345
4.2 Analysis and comparison
For an efficient implementation of ECC, it is very important to carry out finite field
operations faster and with fewer resources. The inversion operation consumes most
of the time and resources. Therefore, the speed of inversion has a great impact on
the computation time of ECC. Both the EEA and the Itoh-Tsujii algorithm are effective
ways of achieving fast inversion.
The Extended Euclidean Algorithm finds the inverse in binary fields using repeated
binary polynomial division operations. Since performing the division is
time-consuming, the EEA replaces the division with shifts and subtractions, which can
be implemented efficiently.
The Itoh-Tsujii algorithm performs inversion by a sequence of multiplications and
squarings. In order to reduce the number of multiplications, an addition chain can be
used to carry out the computation of the multiplicative inverse. With the addition
chain, the Itoh-Tsujii algorithm computes the inverse in less time using a recursive
re-arrangement of finite field operations.
In relation to speed, EEA based inverters yield an efficient way to compute inverses in
the binary field since they mainly use shifts and subtractions. The Itoh-Tsujii
algorithm has a higher computational cost than the EEA since it requires
many multiplication and squaring operations to compute an inverse. Therefore, EEA
based inverters take less computational work in polynomial bases. The results reveal
that both are efficient for key sizes smaller than 23, but when the key
size becomes greater than 23, EEA based inverters are much faster.
Chapter 5
Conclusion
Finite field arithmetic is used in a variety of applications, including coding theory
and cryptography. Compared to other arithmetic operations in finite fields, inversion
is the most time-consuming operation. Efficient implementation of inversion
is therefore a challenging problem. In general, the most common methods to
compute inverses are based on the Itoh-Tsujii algorithm and the EEA.
The Euclidean Algorithm is a procedure for finding the greatest common
divisor of any two positive integers. The EEA is an extension of the Euclidean Algorithm
that computes the greatest common divisor and finds the multiplicative inverse using
repeated division operations.
The Itoh-Tsujii Algorithm is based on Fermat’s little theorem. This algorithm
performs the inversion by a series of multiplications and squarings. In order to reduce
the number of multiplications, the Itoh-Tsujii Algorithm uses an addition chain to
perform inversion more efficiently.
To perform inversion in finite fields, some other schemes have been proposed, such as
Wiener-Hopf equation based inverters. Morii [20] proved that solving the discrete
time Wiener-Hopf equation is equivalent to performing division over finite fields. The
hardware efficiency of these inverters is not comparable with that of Itoh-Tsujii and
Extended Euclidean based inverters.
This report provides a literature review of inverters based on the EEA and the
Itoh-Tsujii Algorithm. These two common classes of inverters, which are widely used
for cryptographic purposes, are illustrated with examples. The MATLAB implementation
of the EEA and the Itoh-Tsujii Algorithm has been presented in this report. The
execution times measured in MATLAB show that the EEA is more efficient than the
Itoh-Tsujii algorithm in polynomial bases.
For future work, the optimization of inverters based on the Itoh-Tsujii Algorithm
for large key sizes might be a reasonable starting point. A parallel version
of the Itoh-Tsujii algorithm or an optimal addition chain might be useful to
speed up performance.
Appendix A
function [q,r]=func_divide(A,F)
% F is divided by A
% obtain the quotient q and the remainder r
% func_poly_degree and deletezeros are helper functions not listed in this appendix
q=1;
r=1;
da=func_poly_degree(A);
df=func_poly_degree(F);
if da>df
    q=0;
    r=F;
elseif da==0
    q=F;
    r=0;
else
    while (df>=da)
        if r==0
            break
        end
        % Calculate the remainder
        dfa=df-da;
        B=[A,zeros(1,dfa)]; % left shift A by amount of dfa
        r=xor(B,F);
        r=deletezeros(r);
        % Calculate the quotient
        dfr=func_poly_degree(F)-func_poly_degree(r);
        if(func_poly_degree(r)>=da)
            q=[q,zeros(1,dfr)];
            q(end)=1;
        else
            q=[q,zeros(1,dfa)];
        end
        F=r;
        df=func_poly_degree(F);
    end
end
end
function inv_a=func_EEA(A,F)
% Extended Euclidean algorithm
% Finds the inverse of an element A in F_{2^m}.
% F is the irreducible field polynomial
FF=F;
AA=A;
g1=0;
g2=1;
r=1;
C=F;
while 1
    [q,r]=func_divide(A,F);
    if r==0
        break
    end
    g3=g1;
    g1=g2;
    B=func_poly_mult(g2,q);
    delta=func_poly_degree(B)-func_poly_degree(g3);
    g3=[zeros(1,delta),g3]; % pad g3 with leading zeros to the length of B
    g2=xor(B,g3);
    F=A;
    A=r;
end
inv_a=g2; % The inverse of A
% test if A multiplied by the inverse of A equals 1
mul=func_poly_mult(inv_a,AA);
[q,r]=func_divide(FF,mul); % mod F
if r==1
    fprintf('\n inv_a*a=1, the answer is correct\n')
end
end
function inv_a=main_inv(A,F)
% Itoh-Tsujii Algorithm
m=length(F)-1;
i=1;
b=cell(1,m-1); % b{k} holds A^(2^k-1); sized for the largest index m-1
% Generate addition chain c(i)
c(i)=m-1;
while c(i)>1
    if (mod(c(i),2)==0)
        c(i+1)=c(i)/2;
        i=i+1;
    else
        c(i+1)=c(i)-1;
        i=i+1;
    end
end
b{c(i)}=A;
while i>1
    l= c(i-1)-c(i);
    p=func_square( b{c(i)} , l,F );
    b{c(i-1)} = func_poly_mult(p,b{l},m,F);
    i=i-1;
end
% inverse of A is the square of b{m-1}
inv_a=func_poly_mult(b{c(i)},b{c(i)},m,F);
% test if A multiplied by the inverse of A (inv_a) equals 1
p=func_poly_mult(inv_a,A,m,F); % p=a*a^(-1)
flag = find(p~=0, 1); % index of the first nonzero coefficient
p = p(flag:end); % remove leading zeros
if(p==1)
    fprintf('\n inv_a*a=1, the answer is correct\n')
end
end
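The listings above call four helper routines (func_poly_degree, deletezeros, func_poly_mult and func_square) that are not listed in this appendix. The following is a minimal sketch of what they might look like. The bodies are assumptions inferred from how the helpers are called, not the code used for the reported timings: polynomials are taken to be binary row vectors with the highest-degree coefficient first, the zero polynomial is represented by a single 0, func_poly_mult is assumed to reduce modulo F only when it is called with four arguments, and func_square(P,l,F) is assumed to return P raised to the power 2^l modulo F.

```matlab
function d = func_poly_degree(P)
% Assumed helper: degree of P; the zero polynomial is given degree 0.
idx = find(P, 1);                 % position of the leading 1
if isempty(idx)
    d = 0;
else
    d = numel(P) - idx;
end
end

function P = deletezeros(P)
% Assumed helper: strip leading zeros, keeping a single 0 for zero.
idx = find(P, 1);
if isempty(idx)
    P = 0;
else
    P = P(idx:end);
end
end

function C = func_poly_mult(A, B, m, F)
% Assumed helper: product over GF(2); reduced modulo F if F is given.
% The argument m is accepted only to match the call sites; it is unused.
C = mod(conv(double(A), double(B)), 2);  % carry-less product
C = deletezeros(C);
if nargin == 4
    [~, C] = func_divide(F, C);          % remainder of C divided by F
end
end

function P = func_square(P, l, F)
% Assumed helper: l successive squarings modulo F, i.e. P^(2^l) mod F.
m = numel(F) - 1;
for k = 1:l
    P = func_poly_mult(P, P, m, F);
end
end
```

With these sketches in place, func_EEA([1 0],[1 0 1 1]) and main_inv([1 0],[1 0 1 1]) can both be traced by hand in GF(2^3) with F = x^3+x+1: each returns x^2+1 as the inverse of x, since x(x^2+1) = x^3+x reduces to 1 modulo F.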
Bibliography
[1] A. A. Zadeh, “Division and inversion over finite fields,” in Cryptography and
Security in Computing. InTech, 2012.
[2] D. Hankerson, A. J. Menezes, and S. Vanstone, Guide to elliptic curve cryptog-
raphy. Springer Science & Business Media, 2006.
[3] I. H. Hazmi, F. Zhou, F. Gebali, and T. F. Al-Somani, “Review of elliptic curve
processor architectures,” in Communications, Computers and Signal Processing
(PACRIM), 2015 IEEE Pacific Rim Conference on. IEEE, 2015, pp. 192–200.
[4] A. K. Daneshbeh and M. A. Hasan, “A class of unidirectional bit serial sys-
tolic architectures for multiplicative inversion and division over GF(2m),” IEEE
Transactions on Computers, vol. 54, no. 3, pp. 370–380, 2005.
[5] Z. Yan, D. V. Sarwate, and Z. Liu, “High-speed systolic architectures for finite
field inversion,” Integration, the VLSI Journal, vol. 38, no. 3, pp. 383–398, 2005.
[6] A. P. Fournaris and O. Koufopavlou, “Applying systolic multiplication–inversion
architectures based on modified Extended Euclidean algorithm for GF(2k) in el-
liptic curve cryptography,” Computers and Electrical Engineering, vol. 33, no. 5,
pp. 333–348, 2007.
[7] C. Rebeiro, S. S. Roy, D. S. Reddy, and D. Mukhopadhyay, “Revisiting the Itoh-
Tsujii inversion algorithm for FPGA platforms,” IEEE Transactions on Very
Large Scale Integration (VLSI) Systems, vol. 19, no. 8, pp. 1508–1512, 2011.
[8] L. Li and S. Li, “Fast inversion in GF(2m) with polynomial basis using optimal
addition chains,” in Circuits and Systems (ISCAS), 2017 IEEE International
Symposium on. IEEE, 2017, pp. 1–4.
27
[9] F. Rodrıguez-Henrıquez, G. Morales-Luna, N. A. Saqib, and N. Cruz-Cortes,
“Parallel Itoh–Tsujii multiplicative inversion algorithm for a special class of tri-
nomials,” Designs, Codes and Cryptography, vol. 45, no. 1, pp. 19–37, 2007.
[10] B. Rashidi, R. R. Farashahi, and S. M. Sayedi, “High-performance and high-
speed implementation of polynomial basis itoh–tsujii inversion algorithm over gf
(2 m),” IET Information Security, vol. 11, no. 2, pp. 66–77, 2017.
[11] J. Vliegen, N. Mentens, J. Genoe, A. Braeken, S. Kubera, A. Touhafi, and I. Ver-
bauwhede, “A compact FPGA-based architecture for elliptic curve cryptography
over prime fields,” in Application-specific Systems Architectures and Processors
(ASAP), 2010 21st IEEE International Conference on. IEEE, 2010, pp. 313–
316.
[12] M. Olofsson, VLSI Aspects on Inversion in finite fields. Department of Electrical
Engineering, Linkopings university, 2002.
[13] I. H. Hazmi, “Project: EEA-based polynomial inversion over GF(2m): FPGA
design and implementation,” ECE, University of Victoria, Tech. Rep., 2015.
[14] T. Itoh and S. Tsujii, “A fast algorithm for computing multiplicative inverses in
GF(2m) using normal bases,” Information and computation, vol. 78, no. 3, pp.
171–177, 1988.
[15] A. Ibrahim, F. Gebali, and T. F. Al-Somani, “Systolic array architectures for
Sunar–Koc optimal normal basis type ii multiplier,” IEEE Transactions on Very
Large Scale Integration (VLSI) Systems, vol. 23, no. 10, pp. 2090–2102, 2015.
[16] F. Gebali and T. Al-Somani, “Finite field multiplication using reordered normal
basis multiplier,” in Broadband and Wireless Computing, Communication and
Applications (BWCCA), 2011 International Conference on. IEEE, 2011, pp.
320–326.
[17] D. J. Bernstein and T. Lange, “Type-II optimal polynomial bases,” in Interna-
tional Workshop on the Arithmetic of Finite Fields. Springer, 2010, pp. 41–61.
[18] B. Sunar and C. K. Koc, “An efficient optimal normal basis type II multiplier,”
IEEE Transactions on Computers, vol. 50, no. 1, pp. 83–87, 2001.
28
[19] F. Rodriguez-Henriquez, N. Cruz-Cortes, and N. Saqib, “A fast implementation
of multiplicative inversion over GF(2m),” in Information Technology: Coding and
Computing, 2005. ITCC 2005. International Conference on, vol. 1. IEEE, 2005,
pp. 574–579.
[20] M. Morii, M. Kasahara, and D. L. Whiting, “Efficient bit-serial multiplication
and the discrete-time Wiener-Hopf equation over finite fields,” IEEE Transac-
tions on Information Theory, vol. 35, no. 6, pp. 1177–1183, 1989.